acl acl2011 acl2011-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. [sent-3, score-0.272]
2 We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. [sent-4, score-0.125]
3 Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. [sent-5, score-0.575]
4 We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions. [sent-6, score-0.302]
5 1 Introduction Statistical constituent parsers have gradually increased in accuracy over the past ten years. [sent-7, score-0.205]
6 Prior work incorporating parse structure into machine translation (Chiang, 2010) and Semantic Role Labeling (Tsai et al. [sent-9, score-0.118]
7 Although syntax is becoming increasingly important for large-scale NLP applications, constituent parsing is slow: too slow to scale to the size of many potential consumer applications. [sent-12, score-0.282]
8 |G| is the number of grammar productions, a non-negligible constant. [sent-16, score-0.138]
9 Increases in accuracy have primarily been accomplished through an increase in the size of the grammar, allowing individual grammar rules to be more sensitive to their surrounding context, at a considerable cost in efficiency. [sent-17, score-0.201]
10 , 2006) have increased the grammar size |G| from a few thousand rules to several million in an explicitly enumerable grammar, or even more in an implicit grammar. [sent-20, score-0.164]
11 Exhaustive search for the maximum likelihood parse tree with a state-of-the-art grammar can require over a minute of processing for a single sentence of 25 words, an unacceptable amount of time for real-time applications or when processing millions of sentences. [sent-21, score-0.44]
12 Deterministic algorithms for dependency parsing exist that can extract syntactic dependency structure very quickly (Nivre, 2008), but this approach is often undesirable as constituent parsers are more accurate and more adaptable to new domains (Petrov et al. [sent-22, score-0.332]
13 , Charniak (2000), Petrov and Klein (2007a), make use of approximate inference, limiting their search to a fraction of the total search space and achieving speeds of between one and four newspaper sentences per second. [sent-26, score-0.129]
14 The paradigm for building state-of-the-art parsing models is to first design a model structure that can achieve high accuracy and then, after the model has been built, design effective approximate inference methods around that particular model; e. [sent-27, score-0.247]
15 While both of the above-mentioned papers use the CYK dynamic programming algorithm to search through possible solutions, their particular methods of approximate inference are quite distinct. [sent-32, score-0.118]
16 In this paper, we examine a general approach to approximate inference in constituent parsing that learns cell-specific thresholds for arbitrary grammars. [sent-33, score-0.365]
17 For each cell in the CYK chart, we sort all potential constituents in a local agenda, ordered by an estimate of their posterior probability. [sent-34, score-0.487]
18 Given features extracted from the chart cell context, e.g. [sent-35, score-0.787]
19 , span width; POS-tags and words surrounding the boundary of the cell, we train a log linear model to predict how many constituents should be popped from the local agenda and added to the chart. [sent-37, score-0.873]
20 More generally, instead of a binary classification decision, we can also use this method to predict the desired cell population directly and get cell closure for free when the classifier predicts a beam-width of zero. [sent-39, score-0.914]
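To make this mechanism concrete, the sketch below shows how a CYK inner loop might consult a per-cell beam-width prediction: candidate constituents for a cell are kept in a local agenda ordered by FOM, and only the predicted number of entries is moved into the chart, with a predicted width of zero closing the cell entirely. This is an illustrative reconstruction, not the authors' implementation; the `fom` function, the edge representation, and the `predicted_width` table are assumed helpers.

```python
# Illustrative sketch (not the authors' code): filling one CYK chart cell with a
# local FOM-ordered agenda, keeping only as many constituents as the predicted
# cell-specific beam-width allows. `chart` maps (i, j) spans to lists of edges.
def fill_cell(chart, i, j, candidate_edges, fom, predicted_width):
    beam = predicted_width[(i, j)]                    # learned, cell-specific beam-width
    if beam == 0:
        return                                        # cell closed: skip all work here
    local_agenda = sorted(candidate_edges, key=fom, reverse=True)  # best FOM first
    for edge in local_agenda[:beam]:                  # only the top `beam` constituents survive
        chart[(i, j)].append(edge)
```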
21 Rather, the beam-width prediction model is trained to learn the rank of constituents in the maximum likelihood trees. [sent-42, score-0.271]
22 We simply parse sections 2-21 of the WSJ treebank and train our search models from the output of these trees, with no prior knowledge of the non-terminal set or other grammar characteristics to guide the process. [sent-44, score-0.327]
23 We emphasize that we are learning to search using only maximum likelihood trees, not that we are doing unsupervised parsing. [sent-46, score-0.113]
24 Figure 1: Inside (grey) and outside (white) representations of an example chart edge Ni,j . [sent-47, score-0.542]
25 In the next section, we present prior work on approximate inference in parsing, and discuss how our method to learn optimal beam-search parameters unites many of their strengths into a single framework. [sent-49, score-0.098]
26 We then explore using our approach to open or close cells in the chart as an alternative to Roark and Hollingshead (2008; 2009). [sent-50, score-0.655]
27 Finally, we present results which combine cell closure and adaptive beam-width prediction to achieve the most efficient parser. [sent-51, score-0.58]
28 We use the term chart edge to refer to a non-terminal spanning a specific substring of the input sentence. [sent-60, score-0.586]
29 Let Ni,j denote the edge labeled with non-terminal N spanning wi,j, for example NP3,7. [sent-61, score-0.134]
30 We define an edge’s figure-of-merit (FOM) as an estimate of the product of its inside (β) and outside (α) scores, conceptually the relative merit the edge has to participate in the final parse tree (see Figure 1). [sent-62, score-0.317]
31 In this paper, we use a modified version of the Caraballo and Charniak (1998) Boundary FOM for local edge comparison, which computes αˆ(Ni,j) using POS forward-backward scores and POS-to-nonterminal constituent boundary transition probabilities. [sent-64, score-0.351]
32 We also note that in this paper we only use the FOM scoring function to rank constituents in a local agenda. [sent-67, score-0.124]
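As a rough illustration of how such a boundary-style FOM can be assembled, the sketch below combines an edge's inside score with an outside estimate built from POS forward/backward scores and boundary transition tables, all in log space. The index conventions and table names are assumptions; the exact formulation in Caraballo and Charniak (1998), and the modification used here, differ in detail.

```python
# Hedged sketch of a Boundary-style FOM in log space. `fwd[i]`/`bwd[j]` are POS
# forward/backward log-scores at the cell boundaries, and `pos_to_nt`/`nt_to_pos`
# are boundary transition tables; names and index conventions are illustrative.
def boundary_fom(nt, i, j, inside, fwd, bwd, pos_to_nt, nt_to_pos, tags):
    """Approximate log(outside * inside) for non-terminal `nt` spanning words i..j-1."""
    left_tag = tags[i - 1] if i > 0 else "<s>"        # POS tag just outside the left boundary
    right_tag = tags[j] if j < len(tags) else "</s>"  # POS tag just outside the right boundary
    outside_est = (fwd[i] + pos_to_nt[(left_tag, nt)]
                   + nt_to_pos[(nt, right_tag)] + bwd[j])
    return outside_est + inside                       # outside estimate plus inside score (logs)
```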
33 2 Agenda-based parsing Agenda-based parsers maintain a global agenda of edges, ranked by FOM score. [sent-71, score-0.339]
34 At each iteration, the highest-scoring edge is popped off of the agenda, added to the chart, and combined with other edges already in the chart. [sent-72, score-0.261]
35 The agenda-based approach includes best-first parsing (Bobrow, 1990) and A* parsing (Klein and Manning, 2003a), which differ in whether an admissible FOM estimate αˆ (Ni,j) is required. [sent-73, score-0.331]
36 A* uses an admissible FOM, and thus guarantees finding the maximum likelihood parse, whereas an inadmissible heuristic (best-first) may require less exploration of the search space. [sent-74, score-0.216]
37 Much work has been pursued in both admissible and inadmissible heuristics for agenda parsing (Caraballo and Charniak, 1998; Klein and Manning, 2003a; Pauls et al. [sent-75, score-0.392]
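For readers less familiar with agenda-driven search, a minimal best-first loop looks roughly like the following; the `combine` and `is_goal` helpers are placeholders, edges are assumed hashable, and an A* variant differs only in requiring the FOM's outside estimate to be admissible.

```python
# Minimal best-first (agenda-based) parsing loop, for illustration only.
import heapq
from itertools import count

def best_first_parse(init_edges, combine, fom, is_goal):
    tick = count()                                     # tie-breaker so edges are never compared
    agenda, chart = [], set()
    for e in init_edges:
        heapq.heappush(agenda, (-fom(e), next(tick), e))
    while agenda:
        _, _, edge = heapq.heappop(agenda)             # pop the highest-FOM edge
        if edge in chart:
            continue
        chart.add(edge)
        if is_goal(edge):
            return edge                                # first goal popped ends the search
        for new_edge in combine(edge, chart):          # combine with adjacent chart edges
            heapq.heappush(agenda, (-fom(new_edge), next(tick), new_edge))
    return None
```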
38 3 Beam-search parsing CYK parsing with a beam-search is a local pruning strategy, comparing edges within the same chart cell. [sent-80, score-0.96]
39 The beam-width can be defined in terms of a threshold in the number of edges allowed, or in terms of a threshold on the difference in probability relative to the highest scoring edge (Collins, 1999; Zhang et al. [sent-81, score-0.193]
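Both notions of beam-width mentioned above, a cap on the number of edges and a maximum score difference relative to the best edge, can be applied to a single chart cell as in this small sketch (illustrative only; log-space scores assumed).

```python
# Local beam pruning within one chart cell: keep at most `beam_width` edges and
# drop anything whose score falls more than `delta` below the best edge.
def prune_cell(candidates, beam_width, delta):
    """candidates: list of (log_score, edge); returns the surviving edges."""
    if beam_width == 0 or not candidates:
        return []                                      # cell closed entirely
    ranked = sorted(candidates, key=lambda se: se[0], reverse=True)
    best = ranked[0][0]
    kept = [(s, e) for s, e in ranked[:beam_width] if best - s <= delta]
    return [e for _, e in kept]
```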
40 Since the coarse grammar is quite small, parsing is much faster than with the fine grammar, and can quickly yield an estimate of the outside probability α(·) for use in subsequent agenda or beam-search parsing with the fine grammar. [sent-88, score-0.764]
41 Building a coarse grammar from a fine grammar is a non-trivial problem, and most often approached with detailed knowledge of the fine grammar being used. [sent-90, score-0.591]
42 For example, Goodman (1997) suggests using a coarse grammar consisting of regular non-terminals, such as NP and VP, and then non-terminals augmented with head-word information for the more accurate second-pass grammar. [sent-91, score-0.245]
43 Petrov and Klein (2007a) derive coarse grammars in a more statistically principled way, although the technique is closely tied to their latent variable grammar representation. [sent-93, score-0.283]
44 To the extent that our cell-specific threshold classifier predicts that a chart cell should contain zero edges or more than zero edges, it is making coarse predictions about the unlabeled constituent structure of the target parse tree. [sent-94, score-1.277]
45 5 Chart Constraints Roark and Hollingshead (2008; 2009) introduced a pruning technique that ignores entire chart cells based on lexical and POS features of the input sentence. [sent-97, score-0.752]
46 They train two finite-state binary taggers: one that allows multi-word constituents to start at a word, and one that allows constituents to end at a word. [sent-98, score-0.18]
47 Given these tags, it is straightforward to completely skip many chart cells during processing. [sent-99, score-0.627]
48 In this paper, instead of tagging word positions to infer valid constituent spans, we classify chart cells directly. [sent-100, score-0.774]
49 We further generalize this cell classification to predict the beam-width of the chart cell, where a beam-width of zero indicates that the cell is completely closed. [sent-101, score-1.245]
50 1 Constituent Closure We first look at the binary classification of chart cells as either open or closed to full constituents, and predict this value from the input sentence alone. [sent-104, score-0.341]
51 This is the same problem that Roark and Hollingshead (2008; 2009) solve with Chart Constraints; however, where they classify lexical items as either beginning or ending a constituent, we classify individual chart cells as open or closed, an approach we call Constituent Closure. [sent-105, score-0.713]
52 Although the number of classifications scales quadratically with our approach, the total parse time is still dominated by the O(n3|G|) parsing complexity and we find that the added level of specificity reduces the search space significantly. [sent-106, score-0.473]
53 To learn to classify a chart cell spanning words wi+1 . [sent-107, score-0.884]
54 The feature vector x is encoded with the chart cell’s absolute and relative span width, as well as unigram and bigram lexical and part-of-speech tag items from wi−1 . [sent-111, score-0.457]
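A hypothetical encoding of such a feature vector is sketched below; the templates follow the description above (absolute and relative span width, surrounding words and POS tags), but the exact template set and naming used by the authors are not reproduced here.

```python
# Hypothetical feature extraction for the chart cell spanning words[i:j]
# (0-based, end-exclusive). Feature names and the width cap are assumptions.
def cell_features(words, tags, i, j):
    n = len(words)
    left_w = words[i - 1] if i > 0 else "<s>"
    right_w = words[j] if j < n else "</s>"
    left_t = tags[i - 1] if i > 0 else "<s>"
    right_t = tags[j] if j < n else "</s>"
    return {
        "span_width=%d" % min(j - i, 10): 1.0,          # absolute span width (capped)
        "rel_span=%.1f" % ((j - i) / n): 1.0,           # span width relative to sentence length
        "left_word=" + left_w: 1.0, "right_word=" + right_w: 1.0,
        "left_tag=" + left_t: 1.0, "right_tag=" + right_t: 1.0,
        "left_tag_bigram=%s_%s" % (left_t, tags[i]): 1.0,
        "right_tag_bigram=%s_%s" % (tags[j - 1], right_t): 1.0,
    }
```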
55 When predicting cell closure, not all misclassifications are equal. [sent-116, score-0.363]
56 If we leave open a cell which contains no edges in the maximum likelihood (ML) parse, we incur the cost of additional processing, but are still able to recover the ML tree. [sent-117, score-0.559]
57 However, if we close a chart cell which contains an ML edge, search errors occur. [sent-118, score-0.833]
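One simple way to encode this asymmetry is to weight the loss on cells that contain an ML edge much more heavily than the loss on cells that do not, as in the following stochastic-gradient sketch for a log-linear open/closed classifier. The weighting scheme and hyperparameters are illustrative assumptions, not the authors' exact training procedure.

```python
# Sketch: asymmetric-loss training of the open/closed cell classifier. Closing a
# cell that contains an ML edge is penalized `lambda_close` times more heavily
# than leaving a useless cell open. Plain SGD on a log-linear model.
import math

def train_closure_classifier(examples, lambda_close=100.0, lr=0.1, epochs=5):
    """examples: list of (feature_dict, label) with label 1 = cell contains an ML edge."""
    w = {}
    for _ in range(epochs):
        for feats, label in examples:
            z = sum(w.get(f, 0.0) * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))              # P(open | features)
            weight = lambda_close if label == 1 else 1.0
            grad = weight * (p - label)                  # weighted log-loss gradient w.r.t. z
            for f, v in feats.items():
                w[f] = w.get(f, 0.0) - lr * grad * v
    return w
```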
58 During decoding, we assign the beam-width for the chart cell spanning wi+1 . [sent-124, score-0.363]
59 Choosing the number of classification bins (the level of granularity) to minimize total parsing time is dependent on the FOM function and how it ranks ML edges. [sent-133, score-0.172]
60 8% of ML edges have a local rank less than five and we find that the added cost of computing b decision boundaries for each cell is not worth the added specificity. [sent-135, score-0.548]
61 We searched over possible classification bins and found that training four classifiers with beam-width decision boundaries at 0, 1, 2, and 4 is faster than 15 individual classifiers and more memory efficient, since each model θk has over 800,000 parameters. [sent-136, score-0.128]
62 All beam-width prediction results reported in this paper use these settings. [sent-137, score-0.088]
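At decode time, those four binary classifiers can be combined into a single beam-width decision, for example by returning the smallest bin whose classifier predicts that no ML edge will be ranked beyond it. The sketch below is one plausible reading of this scheme; the decision rule and the 0.5 threshold are assumptions, and the feature dictionary could come from a `cell_features`-style extractor like the one sketched earlier.

```python
# Illustrative combination of binned binary classifiers into one beam-width
# prediction. models[k] holds log-linear weights for the question "does this
# cell need a beam wider than BEAM_BINS[k]?"; the decision rule is an assumption.
import math

BEAM_BINS = [0, 1, 2, 4]     # decision boundaries reported in the paper
MAX_BEAM = 15                # global maximum beam-width b

def predict_beam_width(models, feats):
    for bound, weights in zip(BEAM_BINS, models):
        z = sum(weights.get(f, 0.0) * v for f, v in feats.items())
        p = 1.0 / (1.0 + math.exp(-z))   # P(cell needs a beam wider than `bound`)
        if p < 0.5:
            return bound                 # smallest sufficient beam-width (0 closes the cell)
    return MAX_BEAM
```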
63 Figure 3 is a visual representation of beam-width prediction on a single sentence of the development set using the Berkeley latent-variable grammar and Boundary FOM. [sent-138, score-0.226]
64 In this figure, the gray scale represents the relative size of the beam-width, black being the maximum beam-width value, b, and the lightest gray being a beam-width of size one. [sent-139, score-0.156]
65 We can see from this figure that very few chart cells are classified as needing the full 15 edges, apart from span-1 cells which we do not classify. [sent-140, score-0.83]
66 The grey scale represents the size of the predicted beam-width: white is 0 (cell is skipped) and black is the maximum value b (b=15 in this example). [sent-142, score-0.089]
67 We preprocess the treebank by removing empty nodes, temporal labels, and spurious unary productions (X→X), as is standard in published work on syntactic parsing. [sent-145, score-0.105]
68 The pruning methods we present in this paper can be used to parse with any grammar. [sent-146, score-0.243]
69 To achieve state-of-the-art accuracy levels, we parse with the Berkeley SM6 latent-variable grammar (Petrov and Klein, 2007b) where the original treebank non-terminals are automatically split into subclasses to optimize parsing accuracy. [sent-147, score-0.456]
70 Exhaustive CYK parsing with the grammar takes more than a minute per sentence. [sent-151, score-0.313]
71 Alternative decoding methods, such as marginalizing over the latent variables in the grammar or MaxRule decoding (Petrov and Klein, 2007a), are certainly possible in our framework, but it is unknown how effective these methods will be given the heavily pruned nature of the chart. [sent-153, score-0.233]
72 We compute the precision and recall of constituents from the 1-best Viterbi trees using the standard EVALB script. [sent-155, score-0.09]
73 6 Results We empirically demonstrate the advantages of our pruning methods by comparing the total parse time of each system, including FOM initialization, chart cell classification, and beam-width prediction. [sent-164, score-1.064]
74 The parse times reported for Chart Constraints do not include tagging times as we were provided with this pre-tagged data, but tagging all of Section 22 takes less than three seconds and we choose to ignore this contribution for simplicity. [sent-165, score-0.118]
75 As expected, the O(n3|G|) beam-search begins to dominate as the sentence length grows, but Boundary FOM initialization is not cheap, and absorbs, on average, 20% of the total parse time. [sent-169, score-0.15]
76 Beam-width prediction, on the other hand, is almost negligible in terms of processing time even though it scales quadratically with the length of the sentence. [sent-170, score-0.114]
77 We compare the accuracy degradation of beam-width prediction and Chart Constraints in Figure 5 as we incrementally tighten their respective pruning parameters. [sent-171, score-0.25]
78 We also include the baseline beam-search parser with Boundary FOM in this figure to demonstrate the accuracy/speed trade-off of adjusting a global beam-width alone. [sent-172, score-0.114]
79 In Table 1 we present the accuracy and parse time for three baseline parsers on the development set: exhaustive CYK parsing, beam-search parsing using only the inside score β(·), and beam-search parsing using the Boundary FOM. [sent-174, score-0.616]
80 As expected, the relative speedup of these methods across the various baselines is similar since the open/closed cell classification does not change across parsers. [sent-176, score-0.395]
81 accuracy curves comparing beam-width prediction (Beam-Predict) and Chart Constraints. [sent-178, score-0.125]
82 We also see that Complete Closure is between 22% and 31% faster than Constituent Closure, indicating that the greater number of cells closed translates directly into a reduction in parse time. [sent-179, score-0.416]
83 We can further apply beam-width prediction to the two beam-search baseline parsers in Table 1. [sent-180, score-0.138]
84 Dynamically adjusting the beam-width for the remaining open cells decreases parse time by an additional 25% when using the Inside FOM, and 28% with the boundary FOM. [sent-181, score-0.52]
85 Both our parser and the Berkeley parser are written in Java, both are run with Viterbi decoding, and both parse with the same grammar, so a direct comparison of speed and accuracy is fair. [sent-185, score-0.237]
86 We note that 3 of 2416 sentences fail to parse under these settings. [sent-187, score-0.118]
87 Using the ‘-accurate’ option provides a valid parse for all sentences, but increases parsing time of section 23 to 0. [sent-188, score-0.29]
88 We assume a back-off strategy for failed parses could be implemented to parse all sentences with a parsing time close to the default parameterization. [sent-190, score-0.29]
89 Beam-Search (Beam) parsing using the Berkeley latent-variable grammar. [sent-191, score-0.138]
90 Furthermore, our pruning method is trained using only maximum likelihood trees, allowing it to be tuned to specific domains without labeled data. [sent-193, score-0.192]
91 Using this framework, we have shown that we can decrease parsing time by 65% over a standard beam-search without any loss in accuracy, and parse significantly faster than both the Berkeley parser and Chart Constraints. [sent-194, score-0.409]
92 On the other hand, our parser currently spends 20% of the total parse time initializing the FOM, and adding additional preprocessing costs, such as parsing with a coarse grammar, may not outweigh the benefits gained in the final search. [sent-198, score-0.412]
93 Second, as with Chart Constraints we do not prune lexical or unary edges in the span-1 chart cells (i. [sent-199, score-0.763]
94 We expect pruning entries in these cells would notably reduce parse time since they cause exponentially many chart edges to be built in larger spans. [sent-202, score-1.005]
95 Initial work constraining span-1 chart cells has promising results (Bodenstab et al. [sent-203, score-0.627]
96 Finally, the size and structure of the grammar is the single largest contributor to parse efficiency. [sent-206, score-0.282]
97 In contrast to the current paradigm, we plan to investigate new algorithms that jointly optimize accuracy and efficiency during grammar induction, leading to more efficient decoding. [sent-207, score-0.175]
98 New figures of merit for best-first probabilistic chart parsing. [sent-222, score-0.453]
99 The importance of syntactic parsing and inference in semantic role labeling. [sent-301, score-0.173]
100 Exploiting full parsing information to label semantic roles using an ensemble of ME and SVM via integer linear programming. [sent-313, score-0.138]
wordName wordTfidf (topN-words)
[('chart', 0.424), ('fom', 0.411), ('cell', 0.363), ('cells', 0.203), ('agenda', 0.151), ('grammar', 0.138), ('parsing', 0.138), ('closure', 0.129), ('pruning', 0.125), ('cyk', 0.123), ('constituent', 0.118), ('parse', 0.118), ('petrov', 0.111), ('boundary', 0.107), ('hollingshead', 0.105), ('roark', 0.104), ('edges', 0.101), ('edge', 0.092), ('berkeley', 0.09), ('constituents', 0.09), ('prediction', 0.088), ('coarse', 0.081), ('ml', 0.078), ('klein', 0.074), ('kristy', 0.074), ('caraballo', 0.073), ('width', 0.067), ('charniak', 0.067), ('bodenstab', 0.064), ('admissible', 0.055), ('inside', 0.052), ('closed', 0.051), ('parsers', 0.05), ('exhaustive', 0.049), ('brian', 0.049), ('agendas', 0.048), ('inadmissible', 0.048), ('timing', 0.048), ('fine', 0.048), ('wj', 0.048), ('search', 0.046), ('productions', 0.045), ('faster', 0.044), ('slav', 0.043), ('beamsearch', 0.043), ('popped', 0.043), ('tsai', 0.043), ('spanning', 0.042), ('parser', 0.041), ('likelihood', 0.041), ('scales', 0.041), ('punyakanok', 0.039), ('gray', 0.039), ('quadratically', 0.039), ('operating', 0.038), ('thresholds', 0.037), ('grammars', 0.037), ('approximate', 0.037), ('grey', 0.037), ('minute', 0.037), ('constraints', 0.037), ('ann', 0.037), ('accuracy', 0.037), ('zero', 0.036), ('unary', 0.035), ('inference', 0.035), ('local', 0.034), ('decoding', 0.034), ('loss', 0.034), ('time', 0.034), ('matsuzaki', 0.034), ('span', 0.033), ('complexity', 0.032), ('classification', 0.032), ('initialization', 0.032), ('deterministic', 0.032), ('pauls', 0.031), ('viterbi', 0.031), ('adjusting', 0.03), ('asymmetric', 0.03), ('nathan', 0.03), ('dan', 0.03), ('classify', 0.029), ('imbalance', 0.029), ('eugene', 0.029), ('merit', 0.029), ('open', 0.028), ('wi', 0.028), ('substring', 0.028), ('latent', 0.027), ('predict', 0.027), ('outside', 0.026), ('accurate', 0.026), ('maximum', 0.026), ('learn', 0.026), ('classifiers', 0.026), ('size', 0.026), ('added', 0.025), ('tuples', 0.025), ('treebank', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
2 0.44623733 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
3 0.18433012 285 acl-2011-Simple supervised document geolocation with geodesic grids
Author: Benjamin Wing ; Jason Baldridge
Abstract: We investigate automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and it is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just 11.8 kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.
4 0.17433123 298 acl-2011-The ACL Anthology Searchbench
Author: Ulrich Schafer ; Bernd Kiefer ; Christian Spurk ; Jorg Steffen ; Rui Wang
Abstract: We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and semantically analyzed full textual content. By combining these two features, very efficient and focused queries are possible. At the same time, the application serves as a showcase for the recent progress in natural language processing (NLP) research and language technology. The system currently indexes the textual content of 7,500 anthology papers from 2002–2009 with predicateargument-like semantic structures. It also provides useful search filters based on bibliographic metadata. It will be extended to provide the full anthology content and en- . hanced functionality based on further NLP techniques. 1 Introduction and Motivation Scientists in all disciplines nowadays are faced with a flood of new publications every day. In addition, more and more publications from the past become digitally available and thus even increase the amount. Finding relevant information and avoiding duplication of work have become urgent issues to be addressed by the scientific community. The organization and preservation of scientific knowledge in scientific publications, vulgo text documents, thwarts these efforts. From a viewpoint of 7 dfki .de / lt a computer scientist, scientific papers are just ‘unstructured information’ . At least in our own scientific community, Computational Linguistics, it is generally assumed that NLP could help to support search in such document collections. The ACL Anthology1 is a comprehensive elec- tronic collection of scientific papers in our own field (Bird et al., 2008). It is updated regularly with new publications, but also older papers have been scanned and are made available electronically. We have implemented the ACL Anthology Searchbench2 for two reasons: Our first aim is to provide a more targeted search facility in this collection than standard web search on the anthology website. In this sense, the Searchbench is meant to become a service to our own community. Our second motivation is to use the developed system as a showcase for the progress that has been made over the last years in precision-oriented deep linguistic parsing in terms of both efficiency and coverage, specifically in the context of the DELPHIN community3. Our system also uses further NLP techniques such as unsupervised term extraction, named entity recognition and part-of-speech (PoS) tagging. By automatically precomputing normalized semantic representations (predicate-argument structure) of each sentence in the anthology, the search space is structured and allows to find equivalent or related predicates even if they are expressed differ- 1http : / /www . aclweb .org/ anthology 2http : //aclasb . dfki . de 3http : / /www . de lph-in . net – DELPH-IN stands for DEep Linguistic Processing with HPSG INitiative. Portland,P Orroecge ondi,n UgSsA o,f 2 th1e J AunCeL 2-H0L1 T. 2 ?0c 1210 1S1ys Atesmso Dcieamtio n s ftorart Cio nms,p puatgaetiso 7n–al1 L3i,nguistics ently, e.g. in passive constructions, using synonyms, etc. By storing the semantic sentence structure along with the original text in a structured full-text search engine, it can be guaranteed that recall cannot fall behind the baseline of a fulltext search. 
In addition, the Searchbench also provides detailed bibliographic metadata for filtering as well as autosuggest texts for input fields computed from the corpus two further key features one can expect from such systems today, nevertheless very important for efficient search in digital libraries. We describe the offline preprocessing and deep parsing approach in Section 2. Section 3 concentrates on the generation of the semantic search index. In Section 4, we describe the search interface. We conclude in Section 5 and present an outlook to future extensions. – 2 Parsing the ACL Anthology The basis of the search index for the ACL Anthology are its original PDF documents, currently 8,200 from the years 2002 through 2009. To overcome quality problems in text extraction from PDF, we use a commercial PDF extractor based on OCR techniques. This approach guarantees uniform and highquality textual representations even from older papers in the anthology (before 2000) which mostly were scanned from printed paper versions. The general idea of the semantics-oriented access to scholarly paper content is to parse each sentence they contain with the open-source HPSG (Pollard and Sag, 1994) grammar for English (ERG; Flickinger (2002)) and then distill and index semantically structured representations for search. To make the deep parser robust, it is embedded in a NLP workflow. The coverage (percentage of full deeply parsed sentences) on the anthology corpus could be increased from 65 % to now more than 85 % through careful combination of several robustness techniques; for example: (1) chart pruning, directed search during parsing to increase per- formance, and also coverage for longer sentences (Cramer and Zhang, 2010); (2) chart mapping, a novel method for integrating preprocessing information in exactly the way the deep grammar expects it (Adolphs et al., 2008); (3) new version of the ERG with better handling of open word classes; (4) 8 more fine-grained named entity recognition, including recognition of citation patterns; (5) new, better suited parse ranking model (WeScience; Flickinger et al. (2010)). Because of limited space, we will focus on (1) and (2) below. A more detailed description and further results are available in (Sch a¨fer and Kiefer, 2011). Except for a small part of the named entity recognition components (citations, some terminology) and the parse ranking model, there are no further adaptations to genre or domain of the text corpus. This implies that the NLP workflow could be easily and modularly adapted to other (scientific or nonscientific) domains—mainly thanks to the generic and comprehensive language modelling in the ERG. The NLP preprocessing component workflow is implemented using the Heart of Gold NLP middleware architecture (Sch a¨fer, 2006). It starts with sentence boundary detection (SBR) and regular expression-based tokenization using its built-in component JTok, followed by the trigram-based PoS tagger TnT (Brants, 2000) trained on the Penn Treebank (Marcus et al., 1993) and the named entity recognizer SProUT (Dro z˙d z˙y n´ski et al., 2004). 2.1 Precise Preprocessing Integration with Chart Mapping Tagger output is combined with information from the named entity recognizer, e.g. delivering hypothetical information on citation expressions. The combined result is delivered as input to the deep parser PET (Callmeier, 2000) running the ERG. Here, citations, for example, can be treated as either persons, locations or appositions. 
Concerning punctuation, the ERG can make use of information on opening and closing quotation marks. Such information is often not explicit in the input text, e.g. when, as in our setup, gained through OCR which does not distinguish between ‘ and ’ or “ and However, a tokenizer can often guess (recon- ”. struct) leftness and rightness correctly. This information, passed to the deep parser via chart mapping, helps it to disambiguate. 2.2 Increased Processing Speed and Coverage through Chart Pruning In addition to a well-established discriminative maximum entropy model for post-analysis parse selection, we use an additional generative model as described in Cramer and Zhang (2010) to restrict the search space during parsing. This restriction increases efficiency, but also coverage, because the parse time was restricted to at most 60 CPU seconds on a standard PC, and more sentences could now be parsed within these bounds. A 4 GB limit for main memory consumption was far beyond what was ever needed. We saw a small but negligible decrease in parsing accuracy, 5.4 % best parses were not found due to the pruning of important chart edges. Ninomiya et al. (2006) did a very thorough comparison ofdifferent performance optimization strategies, and among those also a local pruning strategy similar to the one used here. There is an important difference between the systems, in that theirs works on a reduced context-free backbone first and reconstructs the results with the full grammar, while PET uses the HPSG grammar directly, with subsumption packing and partial unpacking to achieve a similar effect as the packed chart of a context-free parser. sentence length −→ Figure 1: Distribution of sentence length and mean parse times for mild pruning In total, we parsed 1,537,801 sentences, of which 57,832 (3.8 %) could not be parsed because of lexicon errors. Most of them were caused by OCR ar- tifacts resulting in unexpected punctuation character combinations. These can be identified and will be deleted in the future. Figure 1 displays the average parse time of processing with a mild chart pruning setting, together with the mean quadratic error. In addition, it contains the distribution of input sentences over sentence length. Obviously, the vast majority of sen9 tences has a length of at most 60 words4. The parse times only grow mildly due to the many optimization techniques in the original system, and also the new chart pruning method. The sentence length distribution has been integrated into Figure 1 to show that the predominant part of our real-world corpus can be processed using this information-rich method with very low parse times (overall average parse time < 2 s per sentence). The large amount of short inputs is at first surprising, even more so that most of these inputs can not be parsed. Most of these inputs are non-sentences such as headings, enumerations, footnotes, table cell content. There are several alternatives to deal with such input, one to identify and handle them in a preprocessing step, another to use a special root condition in the deep analysis component that is able to combine phrases with well-defined properties for inputs where no spanning result could be found. We employed the second method, which has the advantage that it handles a larger range of phenomena in a homogeneous way. Figure 2 shows the change in percentage of unparsed and timed out inputs for the mild pruning method with and without the root condition combining fragments. 
sentence length −→ Figure 2: Unparsed and timed out sentences with and without fragment combination Figure 2 shows that this changes the curve for unparsed sentences towards more expected characteristics and removes the uncommonly high percentage of short sentences for which no parse can be computed. Together with the parses for fragmented 4It has to be pointed out that extremely long sentences also may be non-sentences resulting from PDF extraction errors, missing punctuation etc. No manual correction took place. Figure 3: Multiple semantic tuples may be generated for a sentence input, we get a recall (sentences with at least one parse) over the whole corpus of 85.9 % (1,321,336 sentences), without a significant change for any of the other measures, and with potential for further improvement. 3 Semantic Tuple Extraction with DMRS In contrast to shallow parsers, the ERG not only handles detailed syntactic analyses of phrases, com- pounds, coordination, negation and other linguistic phenomena that are important for extracting semantic relations, but also generates a formal semantic representation of the meaning of the input sentence in the Minimal Recursion Semantics (MRS) representation format (Copestake et al., 2005). It consists of elementary predications for each word and larger constituents, connected via argument positions and variables, from which predicate-argument structure can be extracted. MRS representations resulting from deep parsing are still relatively close to linguistic structures and contain more detailed information than a user would like to query and search for. Therefore, an additional extraction and abstraction step is performed before storing semantic structures in the search index. Firstly, MRS is converted to DMRS (Copestake, 2009), a dependency-style version of MRS that eases extraction of predicate-argument structure using the implementation in LKB (Copestake, 2002). The representation format we devised for the search index we call semantic tuples, in fact quintuples
5 0.14394943 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
6 0.1365934 282 acl-2011-Shift-Reduce CCG Parsing
7 0.13613625 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
8 0.11671352 44 acl-2011-An exponential translation model for target language morphology
9 0.11416382 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
10 0.11400967 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
11 0.11206255 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing
12 0.11063942 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
13 0.10386331 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
14 0.10369589 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
15 0.10055373 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
16 0.0997715 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
17 0.097575188 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
18 0.09557277 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
19 0.095336154 333 acl-2011-Web-Scale Features for Full-Scale Parsing
20 0.091816075 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
topicId topicWeight
[(0, 0.226), (1, -0.086), (2, -0.04), (3, -0.241), (4, -0.036), (5, -0.063), (6, -0.101), (7, 0.037), (8, -0.044), (9, -0.044), (10, -0.03), (11, 0.066), (12, 0.027), (13, -0.021), (14, -0.024), (15, 0.061), (16, -0.008), (17, -0.006), (18, 0.053), (19, -0.015), (20, 0.182), (21, 0.022), (22, -0.007), (23, -0.106), (24, -0.124), (25, 0.219), (26, 0.018), (27, -0.049), (28, 0.013), (29, -0.046), (30, -0.02), (31, 0.201), (32, -0.006), (33, -0.124), (34, 0.186), (35, -0.105), (36, -0.009), (37, 0.036), (38, 0.163), (39, -0.147), (40, 0.081), (41, 0.032), (42, 0.103), (43, 0.015), (44, 0.145), (45, -0.017), (46, 0.162), (47, -0.096), (48, 0.04), (49, -0.061)]
simIndex simValue paperId paperTitle
1 0.96741056 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
same-paper 2 0.947375 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
3 0.75086457 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
4 0.68212414 298 acl-2011-The ACL Anthology Searchbench
Author: Ulrich Schafer ; Bernd Kiefer ; Christian Spurk ; Jorg Steffen ; Rui Wang
Abstract: We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and semantically analyzed full textual content. By combining these two features, very efficient and focused queries are possible. At the same time, the application serves as a showcase for the recent progress in natural language processing (NLP) research and language technology. The system currently indexes the textual content of 7,500 anthology papers from 2002–2009 with predicateargument-like semantic structures. It also provides useful search filters based on bibliographic metadata. It will be extended to provide the full anthology content and en- . hanced functionality based on further NLP techniques. 1 Introduction and Motivation Scientists in all disciplines nowadays are faced with a flood of new publications every day. In addition, more and more publications from the past become digitally available and thus even increase the amount. Finding relevant information and avoiding duplication of work have become urgent issues to be addressed by the scientific community. The organization and preservation of scientific knowledge in scientific publications, vulgo text documents, thwarts these efforts. From a viewpoint of 7 dfki .de / lt a computer scientist, scientific papers are just ‘unstructured information’ . At least in our own scientific community, Computational Linguistics, it is generally assumed that NLP could help to support search in such document collections. The ACL Anthology1 is a comprehensive elec- tronic collection of scientific papers in our own field (Bird et al., 2008). It is updated regularly with new publications, but also older papers have been scanned and are made available electronically. We have implemented the ACL Anthology Searchbench2 for two reasons: Our first aim is to provide a more targeted search facility in this collection than standard web search on the anthology website. In this sense, the Searchbench is meant to become a service to our own community. Our second motivation is to use the developed system as a showcase for the progress that has been made over the last years in precision-oriented deep linguistic parsing in terms of both efficiency and coverage, specifically in the context of the DELPHIN community3. Our system also uses further NLP techniques such as unsupervised term extraction, named entity recognition and part-of-speech (PoS) tagging. By automatically precomputing normalized semantic representations (predicate-argument structure) of each sentence in the anthology, the search space is structured and allows to find equivalent or related predicates even if they are expressed differ- 1http : / /www . aclweb .org/ anthology 2http : //aclasb . dfki . de 3http : / /www . de lph-in . net – DELPH-IN stands for DEep Linguistic Processing with HPSG INitiative. Portland,P Orroecge ondi,n UgSsA o,f 2 th1e J AunCeL 2-H0L1 T. 2 ?0c 1210 1S1ys Atesmso Dcieamtio n s ftorart Cio nms,p puatgaetiso 7n–al1 L3i,nguistics ently, e.g. in passive constructions, using synonyms, etc. By storing the semantic sentence structure along with the original text in a structured full-text search engine, it can be guaranteed that recall cannot fall behind the baseline of a fulltext search. 
In addition, the Searchbench also provides detailed bibliographic metadata for filtering as well as autosuggest texts for input fields computed from the corpus two further key features one can expect from such systems today, nevertheless very important for efficient search in digital libraries. We describe the offline preprocessing and deep parsing approach in Section 2. Section 3 concentrates on the generation of the semantic search index. In Section 4, we describe the search interface. We conclude in Section 5 and present an outlook to future extensions. – 2 Parsing the ACL Anthology The basis of the search index for the ACL Anthology are its original PDF documents, currently 8,200 from the years 2002 through 2009. To overcome quality problems in text extraction from PDF, we use a commercial PDF extractor based on OCR techniques. This approach guarantees uniform and highquality textual representations even from older papers in the anthology (before 2000) which mostly were scanned from printed paper versions. The general idea of the semantics-oriented access to scholarly paper content is to parse each sentence they contain with the open-source HPSG (Pollard and Sag, 1994) grammar for English (ERG; Flickinger (2002)) and then distill and index semantically structured representations for search. To make the deep parser robust, it is embedded in a NLP workflow. The coverage (percentage of full deeply parsed sentences) on the anthology corpus could be increased from 65 % to now more than 85 % through careful combination of several robustness techniques; for example: (1) chart pruning, directed search during parsing to increase per- formance, and also coverage for longer sentences (Cramer and Zhang, 2010); (2) chart mapping, a novel method for integrating preprocessing information in exactly the way the deep grammar expects it (Adolphs et al., 2008); (3) new version of the ERG with better handling of open word classes; (4) 8 more fine-grained named entity recognition, including recognition of citation patterns; (5) new, better suited parse ranking model (WeScience; Flickinger et al. (2010)). Because of limited space, we will focus on (1) and (2) below. A more detailed description and further results are available in (Sch a¨fer and Kiefer, 2011). Except for a small part of the named entity recognition components (citations, some terminology) and the parse ranking model, there are no further adaptations to genre or domain of the text corpus. This implies that the NLP workflow could be easily and modularly adapted to other (scientific or nonscientific) domains—mainly thanks to the generic and comprehensive language modelling in the ERG. The NLP preprocessing component workflow is implemented using the Heart of Gold NLP middleware architecture (Sch a¨fer, 2006). It starts with sentence boundary detection (SBR) and regular expression-based tokenization using its built-in component JTok, followed by the trigram-based PoS tagger TnT (Brants, 2000) trained on the Penn Treebank (Marcus et al., 1993) and the named entity recognizer SProUT (Dro z˙d z˙y n´ski et al., 2004). 2.1 Precise Preprocessing Integration with Chart Mapping Tagger output is combined with information from the named entity recognizer, e.g. delivering hypothetical information on citation expressions. The combined result is delivered as input to the deep parser PET (Callmeier, 2000) running the ERG. Here, citations, for example, can be treated as either persons, locations or appositions. 
Concerning punctuation, the ERG can make use of information on opening and closing quotation marks. Such information is often not explicit in the input text, e.g. when, as in our setup, gained through OCR which does not distinguish between ‘ and ’ or “ and However, a tokenizer can often guess (recon- ”. struct) leftness and rightness correctly. This information, passed to the deep parser via chart mapping, helps it to disambiguate. 2.2 Increased Processing Speed and Coverage through Chart Pruning In addition to a well-established discriminative maximum entropy model for post-analysis parse selection, we use an additional generative model as described in Cramer and Zhang (2010) to restrict the search space during parsing. This restriction increases efficiency, but also coverage, because the parse time was restricted to at most 60 CPU seconds on a standard PC, and more sentences could now be parsed within these bounds. A 4 GB limit for main memory consumption was far beyond what was ever needed. We saw a small but negligible decrease in parsing accuracy, 5.4 % best parses were not found due to the pruning of important chart edges. Ninomiya et al. (2006) did a very thorough comparison ofdifferent performance optimization strategies, and among those also a local pruning strategy similar to the one used here. There is an important difference between the systems, in that theirs works on a reduced context-free backbone first and reconstructs the results with the full grammar, while PET uses the HPSG grammar directly, with subsumption packing and partial unpacking to achieve a similar effect as the packed chart of a context-free parser. sentence length −→ Figure 1: Distribution of sentence length and mean parse times for mild pruning In total, we parsed 1,537,801 sentences, of which 57,832 (3.8 %) could not be parsed because of lexicon errors. Most of them were caused by OCR ar- tifacts resulting in unexpected punctuation character combinations. These can be identified and will be deleted in the future. Figure 1 displays the average parse time of processing with a mild chart pruning setting, together with the mean quadratic error. In addition, it contains the distribution of input sentences over sentence length. Obviously, the vast majority of sen9 tences has a length of at most 60 words4. The parse times only grow mildly due to the many optimization techniques in the original system, and also the new chart pruning method. The sentence length distribution has been integrated into Figure 1 to show that the predominant part of our real-world corpus can be processed using this information-rich method with very low parse times (overall average parse time < 2 s per sentence). The large amount of short inputs is at first surprising, even more so that most of these inputs can not be parsed. Most of these inputs are non-sentences such as headings, enumerations, footnotes, table cell content. There are several alternatives to deal with such input, one to identify and handle them in a preprocessing step, another to use a special root condition in the deep analysis component that is able to combine phrases with well-defined properties for inputs where no spanning result could be found. We employed the second method, which has the advantage that it handles a larger range of phenomena in a homogeneous way. Figure 2 shows the change in percentage of unparsed and timed out inputs for the mild pruning method with and without the root condition combining fragments. 
sentence length −→ Figure 2: Unparsed and timed out sentences with and without fragment combination Figure 2 shows that this changes the curve for unparsed sentences towards more expected characteristics and removes the uncommonly high percentage of short sentences for which no parse can be computed. Together with the parses for fragmented 4It has to be pointed out that extremely long sentences also may be non-sentences resulting from PDF extraction errors, missing punctuation etc. No manual correction took place. Figure 3: Multiple semantic tuples may be generated for a sentence input, we get a recall (sentences with at least one parse) over the whole corpus of 85.9 % (1,321,336 sentences), without a significant change for any of the other measures, and with potential for further improvement. 3 Semantic Tuple Extraction with DMRS In contrast to shallow parsers, the ERG not only handles detailed syntactic analyses of phrases, com- pounds, coordination, negation and other linguistic phenomena that are important for extracting semantic relations, but also generates a formal semantic representation of the meaning of the input sentence in the Minimal Recursion Semantics (MRS) representation format (Copestake et al., 2005). It consists of elementary predications for each word and larger constituents, connected via argument positions and variables, from which predicate-argument structure can be extracted. MRS representations resulting from deep parsing are still relatively close to linguistic structures and contain more detailed information than a user would like to query and search for. Therefore, an additional extraction and abstraction step is performed before storing semantic structures in the search index. Firstly, MRS is converted to DMRS (Copestake, 2009), a dependency-style version of MRS that eases extraction of predicate-argument structure using the implementation in LKB (Copestake, 2002). The representation format we devised for the search index we call semantic tuples, in fact quintuples
5 0.55025506 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
Author: Elias Ponvert ; Jason Baldridge ; Katrin Erk
Abstract: We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing—the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer’s (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
6 0.53192592 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
7 0.50968504 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
8 0.50575769 285 acl-2011-Simple supervised document geolocation with geodesic grids
9 0.49086282 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
10 0.48250604 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
11 0.48051634 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
12 0.45901847 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
13 0.45530033 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
14 0.44273868 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
15 0.43836477 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
16 0.39543146 192 acl-2011-Language-Independent Parsing with Empty Elements
17 0.39047533 282 acl-2011-Shift-Reduce CCG Parsing
18 0.35894838 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
20 0.35547915 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
topicId topicWeight
[(5, 0.058), (17, 0.04), (26, 0.017), (37, 0.116), (39, 0.103), (41, 0.083), (55, 0.045), (59, 0.038), (72, 0.03), (73, 0.173), (87, 0.021), (91, 0.049), (96, 0.125)]
simIndex simValue paperId paperTitle
1 0.85769135 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look
Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder
Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well.
same-paper 2 0.85336637 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
3 0.80035198 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
4 0.77733642 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach
Author: Preslav Nakov ; Hwee Tou Ng
Abstract: We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).
5 0.76724643 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang
Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.
6 0.75578725 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
7 0.75531214 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
8 0.75181842 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
9 0.75130028 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
10 0.7472235 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
11 0.74552166 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
12 0.74296701 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
13 0.74222201 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
14 0.74215227 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
15 0.74087441 182 acl-2011-Joint Annotation of Search Queries
16 0.74029148 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
17 0.73922765 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
18 0.73912132 178 acl-2011-Interactive Topic Modeling
19 0.73523962 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
20 0.73401141 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments