acl acl2011 acl2011-316 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. [sent-3, score-1.756]
2 Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. [sent-4, score-0.915]
3 We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. [sent-5, score-1.096]
4 Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. [sent-6, score-0.837]
5 We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning. [sent-7, score-0.267]
6 Graph-based pruning methods such as best-first and beam-search have both been used within context-free parsers to increase their efficiency. [sent-9, score-0.249]
7 Pipeline systems make use of simpler models to reduce the search space of the full model. [sent-10, score-0.054]
8 For example, the well-known Charniak parser (Charniak, 2000) uses a simple grammar to prune the search space for a richer model in a second pass. [sent-11, score-0.338]
9 Roark and Hollingshead (2008; 2009) have recently shown that using a finite-state tagger to close cells within the CKY chart can reduce the worst-case and average-case complexity of context-free parsing, without reducing accuracy. [sent-14, score-0.69]
10 In their work, word positions are classified as beginning and/or ending multi-word constituents, and all chart cells not conforming to these constraints can be pruned. [sent-15, score-0.748]
11 (2011) both extend this approach by classifying chart cells with a finer granularity. [sent-18, score-0.61]
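The cell-closing constraints being extended here reduce to a simple per-cell admissibility test at chart-construction time. The sketch below is an illustrative rendering under the assumption of boolean begin/end vectors produced by a tagger; it is not the original implementation.

```python
def cell_allowed(start, end, can_begin, can_end):
    """Cell-closing test in the spirit of Roark and Hollingshead: a chart
    cell over words [start, end) is kept only if word start may begin and
    word end - 1 may end a multi-word constituent. The boolean vectors come
    from a finite-state tagger; span-1 cells are always kept, matching the
    fact that this earlier work leaves them unpruned."""
    if end - start == 1:
        return True
    return can_begin[start] and can_end[end - 1]

# Word 2 cannot begin a multi-word constituent, so cell (2, 4) is closed,
# while the span-1 cell (2, 3) is untouched.
can_begin = [True, True, False, True]
can_end = [True, True, True, True]
print(cell_allowed(2, 4, can_begin, can_end))  # False
print(cell_allowed(2, 3, can_begin, can_end))  # True
```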
12 Pruning based on constituent span is straightforwardly applicable to all parsing architectures, yet the methods mentioned above only consider spans of length two or greater. [sent-19, score-0.141]
13 Lexical and unary productions spanning a single word are never pruned, and these can, in many cases, contribute significantly to the parsing effort. [sent-20, score-0.871]
14 In this paper, we investigate complementary methods to prune chart cells with finite-state preprocessing. [sent-21, score-0.65]
15 Informally, we use a tagger to restrict the number of unary productions with nonterminals on the right-hand side that can be included in cells spanning a single word. [sent-22, score-1.219]
16 We term these single word constituents (SWCs) (see Section 2 for a formal definition). [sent-23, score-0.076]
17 Disallowing SWCs alters span-1 cell population from potentially containing all nonterminals to just pre-terminal part-of-speech (POS) non-terminals. [sent-24, score-0.166]
18 In practice, this decreases the number of active states in span-1 chart cells by 70%, significantly reducing the number of allowable constituents in larger spans. [sent-25, score-0.784]
19 Span-1 chart cells are also the most frequently queried cells in the CKY algorithm. [sent-26, score-0.956]
20 The search over possible midpoints will always include two cells spanning a single word: one as the first left child and one as the last right child. [sent-27, score-0.502]
21 It is therefore critical that the number of active states [sent-28, score-0.148]
22 The black cells in (c) indicate CKY chart cells containing a single-word constituent from the transformed tree. [sent-31, score-1.068]
23 in these cells be minimized so that the number of grammar access requests is also minimized. [sent-32, score-0.557]
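Why span-1 cells are consulted so often can be seen from the shape of the CKY midpoint loop. The following sketch assumes a dictionary-based chart and a hypothetical `binary_rules` index, chosen only for illustration.

```python
from collections import defaultdict

def cky_fill_cell(chart, binary_rules, start, end):
    """Fill chart[(start, end)] with Viterbi scores over all midpoints.
    Every split consults a span-1 cell: when mid == start + 1 the left
    child spans one word, and when mid == end - 1 the right child does,
    so sparser span-1 cells mean fewer (B, C) lookups below."""
    cell = defaultdict(float)
    for mid in range(start + 1, end):
        left, right = chart[(start, mid)], chart[(mid, end)]
        for B, b_score in left.items():
            for C, c_score in right.items():
                for A, rule_prob in binary_rules.get((B, C), []):
                    score = b_score * c_score * rule_prob
                    if score > cell[A]:
                        cell[A] = score
    chart[(start, end)] = dict(cell)
```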
24 Note, however, that some methods of grammar access, such as scanning through the rules of a grammar and looking for matches in the chart, achieve less of a speedup from diminished cell population than others, something we investigate in this paper. [sent-33, score-0.74]
25 Importantly, our method is orthogonal to prior work on tagging chart constraints and we expect efficiency gains to be additive. [sent-34, score-0.554]
26 In what follows, we will demonstrate that a finite-state tagger can learn, with high accuracy, which span-1 chart cells can be closed to SWCs, and how such pruning can increase the efficiency of context-free parsing. [sent-35, score-0.924]
27 We define a single word constituent (SWC) unary production as any production A → B ∈ P such that A ∈ VPHR and A spans (derives) a single word. [sent-37, score-0.734]
28 An example SWC unary production, VP → VBP, can be seen in Figure 1b. [sent-38, score-0.462]
29 Note that ROOT → SBAR and RB → “quickly” in Figure 1b are also unary productions, but by definition they are not SWC unary productions. [sent-39, score-0.924]
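Under this definition, the SWC test depends only on whether the parent label is phrase-level and on the span width. A minimal sketch, with a hypothetical phrase-level label set standing in for VPHR:

```python
# Hypothetical phrase-level label set standing in for V_PHR; the POS set
# (V_POS) would contain pre-terminals such as VBP, NN, RB.
PHRASE_LEVEL = {"NP", "VP", "S", "SBAR", "ADJP", "ADVP"}

def is_swc_unary(parent_label, span_length):
    """A -> B is an SWC unary production iff A is phrase-level and A spans
    exactly one word; ROOT -> SBAR over a whole sentence is unary but not SWC."""
    return parent_label in PHRASE_LEVEL and span_length == 1

assert is_swc_unary("VP", 1)        # VP -> VBP over a single word
assert not is_swc_unary("VP", 3)    # same rule over a longer span
assert not is_swc_unary("ROOT", 9)  # ROOT -> SBAR spanning the sentence
```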
30 One implementation detail necessary to leverage the benefits of sparsely populated chart cells is the grammar access method used by the inner loop of the CKY algorithm. [sent-40, score-0.899]
31 With the cross-product approach, fewer active states in either child cell leads to fewer grammar access operations. [sent-42, score-0.463]
32 Thus, pruning constituents in lower cells directly affects the overall efficiency of parsing. [sent-43, score-0.657]
33 On the other hand, with the grammar loop method there is a constant number of grammar access operations (i.e., the number of grammar rules) [sent-44, score-0.456]
34 and the number of active states in each child cell has no impact on efficiency. [sent-46, score-0.446]
35 Therefore, with the grammar loop implementation of the CYK algorithm, pruning techniques such as unary constraints will have very little impact on the final run-time efficiency of the parser. [sent-47, score-1.107]
36 We will report results in Section 5 with parsers using both approaches. [sent-48, score-0.069]
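The contrast between the two grammar access strategies discussed above can be sketched as two inner loops. Both functions below are schematic and assume simple dictionary grammar indexes rather than any of the evaluated parsers' data structures.

```python
def fill_cross_product(left, right, rules_by_children):
    """Cross-product access: iterate over the active states of the two child
    cells, so emptier span-1 cells directly cut grammar lookups."""
    cell = {}
    for B, b_score in left.items():
        for C, c_score in right.items():
            for A, p in rules_by_children.get((B, C), []):
                score = b_score * c_score * p
                if score > cell.get(A, 0.0):
                    cell[A] = score
    return cell

def fill_grammar_loop(left, right, all_rules):
    """Grammar-loop access: scan every binary rule A -> B C once per cell,
    so the lookup count is constant and child-cell sparsity barely helps."""
    cell = {}
    for A, B, C, p in all_rules:
        if B in left and C in right:
            score = left[B] * right[C] * p
            if score > cell.get(A, 0.0):
                cell[A] = score
    return cell
```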
37 3 Treebank Unary Productions In this section, we discuss the use of unary productions both in the Penn WSJ treebank (Marcus et al., 1999) [sent-49, score-0.835]
38 and during parsing by analyzing their function and frequency. [sent-50, score-0.077]
39 All statistics reported here are computed from sections 2-21 of the treebank. [sent-51, score-0.033]
40 A common pre-processing step in treebank parsing is to transform the original WSJ treebank before training and evaluation. [sent-52, score-0.322]
41 Table 1: Unary production counts from sections 2-21 of the original and transformed WSJ treebank. [sent-60, score-0.24]
42 There is flexibility in this process, but most pre-processing efforts include (1) affixing a ROOT unary production to the root symbol of the original tree, (2) removal of empty nodes, and (3) stripping functional tags and cross-referencing annotations. [sent-63, score-0.649]
43 For this paper we only apply transforms 1-3 and otherwise leave the treebank in its original form. [sent-66, score-0.166]
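A rough sketch of transforms (1)-(3), using a nested-tuple tree encoding chosen purely for illustration (the actual pre-processing operates on Penn Treebank bracketings):

```python
import re

def transform(tree):
    """Sketch of transforms (1)-(3) on a tree encoded as (label, children),
    with plain strings as lexical leaves: affix a ROOT node, remove empty
    (-NONE-) nodes, and strip function tags / cross-reference indices
    (e.g. NP-SBJ-1 -> NP)."""
    def walk(node):
        if isinstance(node, str):                  # lexical leaf
            return node
        label, children = node
        if label == "-NONE-":                      # (2) drop empty nodes
            return None
        if not label.startswith("-"):              # keep -LRB-, -RRB- intact
            label = re.sub(r"[-=].*$", "", label)  # (3) strip -SBJ, =1, ...
        kept = [c for c in (walk(ch) for ch in children) if c is not None]
        return (label, kept) if kept else None
    return ("ROOT", [walk(tree)])                  # (1) affix ROOT

# Example: NP-SBJ loses its function tag and, having only an empty child,
# disappears entirely after the empty node is removed.
tree = ("S", [("NP-SBJ", [("-NONE-", ["*"])]), ("VP", [("VBZ", ["walks"])])])
print(transform(tree))  # ('ROOT', [('S', [('VP', [('VBZ', ['walks'])])])])
```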
44 We also note that ROOT unaries are a special case that do not affect search, and we choose to ignore them for the remainder of this paper. [sent-67, score-0.079]
45 These tree transformations have a large impact on the number and type of unary productions in the treebank. [sent-68, score-0.779]
46 Table 1 displays the absolute counts of unaries in the treebank before and after processing. [sent-69, score-0.219]
47 Multi-word constituent unary productions in the original treebank are rare and used primarily to mark quantifier phrases as noun phrases. [sent-70, score-0.924]
48 But due to the removal of empty nodes, the transformed treebank contains many more unary productions that span multiple words, such as S → VP, where the noun phrase was left unspecified in the original clause. [sent-71, score-0.936]
49 The number of SWC unaries is relatively unchanged after processing the original treebank, but note that only [sent-72, score-0.104]
50 11.2% of words in the transformed treebank are covered by SWCs. [sent-73, score-0.158]
51 This implies that we are unnecessarily adding SWC productions to almost 90% of span-1 chart cells during search. [sent-74, score-0.873]
52 One may argue that an unsmoothed grammar will naturally disallow most SWC productions since they are never observed in the training data. [sent-75, score-0.487]
53 Table 2: Grammar statistics and averaged span-1 active state counts for exhaustive parsing of section 24 using a Markov order-2 (Mk2), a smoothed Markov order-2 (Mk2+S), and the Berkeley latent variable (Latent) grammars. [sent-76, score-0.299]
54 This is true to some extent, but grammars induced from the WSJ treebank are notorious for over-generation. [sent-78, score-0.189]
55 In addition, state-of-the-art accuracy in context-free parsing is often achieved by smoothing the grammar, so that rewrites from any one non-terminal to another are permissible, albeit with low probability. [sent-79, score-0.101]
56 To empirically evaluate the impact of SWCs on span-1 chart cells, we parse the development set (section 24) with three different grammars induced from sections 2-21. [sent-80, score-0.428]
57 Table 2 lists averaged counts of active Viterbi states (derivations with probability greater than zero) from span-1 cells within the dynamic programming chart, as well as relevant grammar statistics. [sent-81, score-0.716]
58 Note that these counts are extracted from exhaustive parsing; no pruning has been applied. [sent-82, score-0.355]
59 First, although |VPOS| > |VPHR|, for the unsmoothed grammars more phrase-level states are active within the span-1 cells than states derived from POS tags. [sent-84, score-0.6]
60 When parsing with the Markov order-2 grammar, 70% of active states are non-terminals from VPHR, and with the latent-variable grammar, 67% (152 of 227). [sent-85, score-0.225]
61 Second, although using a smoothed grammar maximizes the number of active states, the unsmoothed grammars still provide many possible derivations per word. [sent-87, score-0.39]
62 Given the infrequent use of SWCs in the treebank, and the search-space explosion incurred by including them in exhaustive search, it is clear that restricting SWCs in contexts where they are unlikely to occur has the potential for large efficiency gains. [sent-88, score-0.173]
63 4 Tagging Unary Constraints To automatically predict if word wi from sentence w can be spanned by an SWC production, we train a binary classifier from supervised data using sections 2-21 of the Penn WSJ Treebank for training, section 00 as heldout, and section 24 as development. [sent-90, score-0.102]
64 The class labels of all words in the training data are extracted from the treebank, where wi ∈ U if wi is observed with an SWC production and wi ∈ Ū otherwise. [sent-91, score-0.311]
65 We train a log-linear model with the averaged perceptron algorithm (Collins, 2002) using unigram word and POS-tag features from a five-word window. [sent-92, score-0.057]
66 We also trained models with bigram and trigram features, but tagging accuracy did not improve. [sent-93, score-0.055]
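The unigram word and POS-tag features from a five-word window are simple to reproduce. The feature-string naming below is an assumption made for illustration, not the paper's exact encoding.

```python
def unigram_features(words, tags, i, window=2):
    """Unigram word and POS-tag features from a five-word window around i."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        w = words[j] if 0 <= j < len(words) else "<PAD>"
        t = tags[j] if 0 <= j < len(tags) else "<PAD>"
        feats.append(f"w[{offset}]={w}")
        feats.append(f"t[{offset}]={t}")
    return feats

# Features for deciding whether "walks" may be spanned by an SWC production.
words = ["the", "dog", "walks", "quickly", "home"]
tags = ["DT", "NN", "VBZ", "RB", "NN"]
print(unigram_features(words, tags, 2))
```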
67 Because the classifier output is imposing hard constraints on the search space of the parser, we may want to choose a tagger operating point that favors precision over recall to avoid over-constraining the downstream parser. [sent-94, score-0.305]
68 The default decision rule simply chooses the most likely class, but to increase precision we can move this threshold to favor U over Ū. [sent-97, score-0.025]
69 If the total number of words wi with P(U|wi, θ) < 0. [sent-99, score-0.069]
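The precise λ-based adjustment is truncated in the extracted sentence above, but one plausible reading of a precision-biased decision rule is to shift the posterior threshold away from 0.5, as in this sketch (the parameterization is an assumption, not the paper's exact formula):

```python
def allow_swc(p_open, lam):
    """Return True if the span-1 cell over a word stays open to SWC
    productions. p_open plays the role of P(U | w_i, theta); lam in
    [0, 0.5) biases decisions toward keeping cells open, trading recall
    of closures for precision (lam = 0 is the most-likely-class rule)."""
    return p_open >= 0.5 - lam

print(allow_swc(0.35, 0.2))  # True: kept open even though p_open < 0.5
print(allow_swc(0.25, 0.2))  # False: closed to SWC productions
```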
70 Similar methods are used to replicate cell-closing constraints, which are combined with unary constraints in the next section. [sent-108, score-0.653]
71 , 2011), a single-pass beam-search parser with a figure-of-merit constituent ranking function. [sent-111, score-0.141]
72 The Berkeley and BUBS parsers both parse with the Berkeley latent-variable grammar (Petrov and Klein, 2007b), while the Charniak parser uses a lexicalized grammar, and the exhaustive CKY algorithm is run with a simple Markov order-2 grammar. [sent-112, score-0.431]
73 All grammars are induced from the same data: sections 2-21 of the WSJ treebank. [sent-113, score-0.112]
74 Figure 2 contrasts the merit of unary constraints on the three high-accuracy parsers, and several interesting comparisons emerge. [sent-114, score-0.6]
75 First, as recall is traded for precision within the tagger, each parser reacts quite differently to the imposed constraints. [sent-115, score-0.108]
76 We apply constraints to the Berkeley parser during the initial coarse-pass search, which is simply an exhaustive CKY search with a coarse grammar. [sent-116, score-0.362]
77 Applying unary and cell-closing constraints at this point in the coarse-to-fine pipeline speeds up the initial coarse pass significantly, which accounted for almost half of the total parse time in the Berkeley parser. [sent-117, score-0.656]
78 In addition, all subsequent fine-pass searches also benefit from additional pruning as their search is guided by the remaining constituents of the previous pass, which is the intersection of standard coarse-to-fine pruning and our imposed constraints. [sent-118, score-0.471]
79 We apply constraints to the Charniak parser during the first-pass agenda-based search. [sent-119, score-0.215]
80 Because an agenda-based search operates at a constituent level instead of a cell/span level, applying unary constraints alters the search frontier instead of reducing the absolute number of constituents placed in the chart. [sent-120, score-0.917]
81 We jointly tune λ and the internal search parameters of the Charniak parser until accuracy degrades. [sent-121, score-0.155]
82 Application of constraints to the CKY and BUBS parsers is straightforward as they are both single-pass parsers: any constituent violating the constraints is pruned. [sent-122, score-0.478]
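For the single-pass parsers, applying the unary constraints amounts to a filter when populating chart cells with unary productions. The sketch below assumes a boolean per-word closure vector from the tagger and a dictionary-based cell, both illustrative choices rather than the parsers' actual data structures.

```python
def add_unary_edges(cell, start, end, unary_rules, swc_closed, phrase_level):
    """Apply unary rules A -> B to a chart cell once, skipping phrase-level
    parents in span-1 cells the tagger has closed to SWCs.
    swc_closed[i] is True if word i may not be spanned by an SWC."""
    span1_blocked = (end - start == 1) and swc_closed[start]
    for B, b_score in list(cell.items()):
        for A, p in unary_rules.get(B, []):
            if span1_blocked and A in phrase_level:
                continue  # constituent violates the unary constraint
            score = b_score * p
            if score > cell.get(A, 0.0):
                cell[A] = score
```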
83 Figure 2: Development set results applying unary constraints at multiple values of λ to three parsers. We also note that the CKY and [sent-123, score-0.6]
84 BUBS parsers both employ the cross-product grammar access method discussed in Section 2, while the Berkeley parser uses the grammar loop method. [sent-124, score-0.602]
85 This grammar access difference dampens the benefit of unary constraints for the Berkeley parser. [sent-125, score-0.811]
86 Referring back to Figure 2, we see that both speed and accuracy increase in all but the Berkeley parser. [sent-126, score-0.049]
87 Although it is unusual that pruning leads to higher accuracy during search, it is not unexpected here as our finite-state tagger makes use of lexical relationships that the PCFG does not. [sent-127, score-0.233]
88 By leveraging this new information to constrain the search space, we are indirectly improving the quality of the model. [sent-128, score-0.089]
89 Finally, there is an obvious operating point for each parser at which the unary constraints are too severe and accuracy deteriorates rapidly. [sent-129, score-0.752]
90 For test conditions, we set the tuning parameter λ based on the development set results to prune as much of the search space as possible before reaching this degradation point. [sent-130, score-0.094]
91 We see that in all cases, unary constraints improve the efficiency of parsing without significant accuracy loss. [sent-132, score-0.781]
92 As one might expect, exhaustive CKY parsing benefits the most from unary constraints since no other pruning is applied. [sent-133, score-0.925]
93 But even heavily pruned parsers using graph-based and pipelining techniques still see substantial speedups. (Footnote 3: The Berkeley parser does maintain meta-information about where non-terminals have been placed in the chart, giving it some of the advantages of cross-product grammar access.) [sent-134, score-0.392]
94 Table 3: Test set results applying unary constraints (UC) and cell-closing (CC) constraints (Roark and Hollingshead, 2008) to various parsers. [sent-135, score-0.738]
95 Furthermore, unary constraints consistently provide an additive efficiency gain when combined with cell-closing constraints. [sent-137, score-0.733]
96 6 Conclusion We have presented a new method to constrain context-free chart parsing and have shown it to be orthogonal to many forms of graph-based and pipeline pruning methods. [sent-138, score-0.603]
97 In addition, our method parallels the cell-closing paradigm and is an elegant complement to recent work, providing a finite-state tagging framework to potentially constrain all areas of the search space, both multi-word and single-word constituents. [sent-139, score-0.249]
98 Statistical parsing with a context-free grammar and word statistics. [sent-150, score-0.244]
99 Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. [sent-170, score-0.032]
100 Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn treebank. [sent-176, score-0.11]
wordName wordTfidf (topN-words)
[('unary', 0.462), ('cells', 0.346), ('chart', 0.264), ('productions', 0.263), ('swc', 0.211), ('cky', 0.202), ('grammar', 0.167), ('swcs', 0.158), ('pruning', 0.155), ('constraints', 0.138), ('vphr', 0.132), ('roark', 0.127), ('treebank', 0.11), ('bubs', 0.105), ('production', 0.104), ('berkeley', 0.098), ('exhaustive', 0.093), ('bodenstab', 0.093), ('charniak', 0.083), ('efficiency', 0.08), ('unaries', 0.079), ('vpos', 0.079), ('loop', 0.078), ('parsing', 0.077), ('parser', 0.077), ('hollingshead', 0.076), ('constituents', 0.076), ('states', 0.074), ('active', 0.074), ('cell', 0.071), ('parsers', 0.069), ('wi', 0.069), ('spanning', 0.069), ('brian', 0.067), ('constituent', 0.064), ('wsj', 0.063), ('kristy', 0.06), ('unsmoothed', 0.057), ('tagger', 0.054), ('search', 0.054), ('cellclosing', 0.053), ('pipelining', 0.053), ('grammars', 0.049), ('transformed', 0.048), ('access', 0.044), ('petrov', 0.044), ('derivations', 0.043), ('alters', 0.043), ('cocke', 0.043), ('orthogonal', 0.041), ('dunlop', 0.04), ('disallowing', 0.04), ('prune', 0.04), ('constrain', 0.035), ('architectures', 0.035), ('closing', 0.034), ('collapsing', 0.034), ('child', 0.033), ('nathan', 0.033), ('sections', 0.033), ('markov', 0.032), ('substrings', 0.032), ('perceptron', 0.032), ('tagging', 0.031), ('pipeline', 0.031), ('imposed', 0.031), ('downstream', 0.031), ('transforms', 0.031), ('penn', 0.031), ('aaron', 0.03), ('root', 0.03), ('induced', 0.03), ('vp', 0.03), ('counts', 0.03), ('klein', 0.029), ('removal', 0.028), ('operating', 0.028), ('transformations', 0.027), ('population', 0.027), ('impact', 0.027), ('park', 0.027), ('reducing', 0.026), ('pruned', 0.026), ('original', 0.025), ('association', 0.025), ('averaged', 0.025), ('nonterminals', 0.025), ('parse', 0.025), ('increase', 0.025), ('paradigm', 0.024), ('accuracy', 0.024), ('pcfg', 0.024), ('slav', 0.024), ('deteriorates', 0.023), ('tolerable', 0.023), ('cfh', 0.023), ('bodenst', 0.023), ('byung', 0.023), ('gyu', 0.023), ('nectar', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
2 0.44623733 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
3 0.23976578 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
Author: Tagyoung Chung ; Licheng Fang ; Daniel Gildea
Abstract: We discuss some of the practical issues that arise from decoding with general synchronous context-free grammars. We examine problems caused by unary rules and we also examine how virtual nonterminals resulting from binarization can best be handled. We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases.
4 0.14361057 282 acl-2011-Shift-Reduce CCG Parsing
Author: Yue Zhang ; Stephen Clark
Abstract: CCGs are directly compatible with binarybranching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C.; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.
5 0.13610165 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
Author: Michael Auli ; Adam Lopez
Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
6 0.12634532 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
7 0.12191756 298 acl-2011-The ACL Anthology Searchbench
8 0.10410403 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
9 0.10244944 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
10 0.09705884 285 acl-2011-Simple supervised document geolocation with geodesic grids
11 0.0969804 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
12 0.093844227 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
13 0.089895882 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
14 0.086445332 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
15 0.08314874 192 acl-2011-Language-Independent Parsing with Empty Elements
16 0.082722068 44 acl-2011-An exponential translation model for target language morphology
17 0.08168719 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
18 0.080037899 333 acl-2011-Web-Scale Features for Full-Scale Parsing
19 0.080030777 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
20 0.07997717 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
topicId topicWeight
[(0, 0.179), (1, -0.094), (2, -0.041), (3, -0.236), (4, -0.045), (5, -0.063), (6, -0.114), (7, 0.023), (8, -0.057), (9, -0.06), (10, -0.041), (11, 0.075), (12, 0.022), (13, 0.009), (14, 0.001), (15, 0.044), (16, 0.002), (17, 0.014), (18, 0.039), (19, 0.011), (20, 0.176), (21, 0.04), (22, -0.045), (23, -0.121), (24, -0.14), (25, 0.254), (26, 0.032), (27, -0.039), (28, 0.019), (29, -0.051), (30, 0.027), (31, 0.207), (32, -0.019), (33, -0.134), (34, 0.173), (35, -0.066), (36, 0.031), (37, 0.066), (38, 0.176), (39, -0.134), (40, 0.058), (41, 0.058), (42, 0.12), (43, -0.014), (44, 0.131), (45, 0.012), (46, 0.151), (47, -0.092), (48, 0.047), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.96678126 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
2 0.90252763 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
3 0.71544373 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
4 0.61065 298 acl-2011-The ACL Anthology Searchbench
Author: Ulrich Schafer ; Bernd Kiefer ; Christian Spurk ; Jorg Steffen ; Rui Wang
Abstract: We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and semantically analyzed full textual content. By combining these two features, very efficient and focused queries are possible. At the same time, the application serves as a showcase for the recent progress in natural language processing (NLP) research and language technology. The system currently indexes the textual content of 7,500 anthology papers from 2002–2009 with predicateargument-like semantic structures. It also provides useful search filters based on bibliographic metadata. It will be extended to provide the full anthology content and en- . hanced functionality based on further NLP techniques. 1 Introduction and Motivation Scientists in all disciplines nowadays are faced with a flood of new publications every day. In addition, more and more publications from the past become digitally available and thus even increase the amount. Finding relevant information and avoiding duplication of work have become urgent issues to be addressed by the scientific community. The organization and preservation of scientific knowledge in scientific publications, vulgo text documents, thwarts these efforts. From a viewpoint of 7 dfki .de / lt a computer scientist, scientific papers are just ‘unstructured information’ . At least in our own scientific community, Computational Linguistics, it is generally assumed that NLP could help to support search in such document collections. The ACL Anthology1 is a comprehensive elec- tronic collection of scientific papers in our own field (Bird et al., 2008). It is updated regularly with new publications, but also older papers have been scanned and are made available electronically. We have implemented the ACL Anthology Searchbench2 for two reasons: Our first aim is to provide a more targeted search facility in this collection than standard web search on the anthology website. In this sense, the Searchbench is meant to become a service to our own community. Our second motivation is to use the developed system as a showcase for the progress that has been made over the last years in precision-oriented deep linguistic parsing in terms of both efficiency and coverage, specifically in the context of the DELPHIN community3. Our system also uses further NLP techniques such as unsupervised term extraction, named entity recognition and part-of-speech (PoS) tagging. By automatically precomputing normalized semantic representations (predicate-argument structure) of each sentence in the anthology, the search space is structured and allows to find equivalent or related predicates even if they are expressed differ- 1http : / /www . aclweb .org/ anthology 2http : //aclasb . dfki . de 3http : / /www . de lph-in . net – DELPH-IN stands for DEep Linguistic Processing with HPSG INitiative. Portland,P Orroecge ondi,n UgSsA o,f 2 th1e J AunCeL 2-H0L1 T. 2 ?0c 1210 1S1ys Atesmso Dcieamtio n s ftorart Cio nms,p puatgaetiso 7n–al1 L3i,nguistics ently, e.g. in passive constructions, using synonyms, etc. By storing the semantic sentence structure along with the original text in a structured full-text search engine, it can be guaranteed that recall cannot fall behind the baseline of a fulltext search. 
In addition, the Searchbench also provides detailed bibliographic metadata for filtering as well as autosuggest texts for input fields computed from the corpus two further key features one can expect from such systems today, nevertheless very important for efficient search in digital libraries. We describe the offline preprocessing and deep parsing approach in Section 2. Section 3 concentrates on the generation of the semantic search index. In Section 4, we describe the search interface. We conclude in Section 5 and present an outlook to future extensions. – 2 Parsing the ACL Anthology The basis of the search index for the ACL Anthology are its original PDF documents, currently 8,200 from the years 2002 through 2009. To overcome quality problems in text extraction from PDF, we use a commercial PDF extractor based on OCR techniques. This approach guarantees uniform and highquality textual representations even from older papers in the anthology (before 2000) which mostly were scanned from printed paper versions. The general idea of the semantics-oriented access to scholarly paper content is to parse each sentence they contain with the open-source HPSG (Pollard and Sag, 1994) grammar for English (ERG; Flickinger (2002)) and then distill and index semantically structured representations for search. To make the deep parser robust, it is embedded in a NLP workflow. The coverage (percentage of full deeply parsed sentences) on the anthology corpus could be increased from 65 % to now more than 85 % through careful combination of several robustness techniques; for example: (1) chart pruning, directed search during parsing to increase per- formance, and also coverage for longer sentences (Cramer and Zhang, 2010); (2) chart mapping, a novel method for integrating preprocessing information in exactly the way the deep grammar expects it (Adolphs et al., 2008); (3) new version of the ERG with better handling of open word classes; (4) 8 more fine-grained named entity recognition, including recognition of citation patterns; (5) new, better suited parse ranking model (WeScience; Flickinger et al. (2010)). Because of limited space, we will focus on (1) and (2) below. A more detailed description and further results are available in (Sch a¨fer and Kiefer, 2011). Except for a small part of the named entity recognition components (citations, some terminology) and the parse ranking model, there are no further adaptations to genre or domain of the text corpus. This implies that the NLP workflow could be easily and modularly adapted to other (scientific or nonscientific) domains—mainly thanks to the generic and comprehensive language modelling in the ERG. The NLP preprocessing component workflow is implemented using the Heart of Gold NLP middleware architecture (Sch a¨fer, 2006). It starts with sentence boundary detection (SBR) and regular expression-based tokenization using its built-in component JTok, followed by the trigram-based PoS tagger TnT (Brants, 2000) trained on the Penn Treebank (Marcus et al., 1993) and the named entity recognizer SProUT (Dro z˙d z˙y n´ski et al., 2004). 2.1 Precise Preprocessing Integration with Chart Mapping Tagger output is combined with information from the named entity recognizer, e.g. delivering hypothetical information on citation expressions. The combined result is delivered as input to the deep parser PET (Callmeier, 2000) running the ERG. Here, citations, for example, can be treated as either persons, locations or appositions. 
Concerning punctuation, the ERG can make use of information on opening and closing quotation marks. Such information is often not explicit in the input text, e.g. when, as in our setup, gained through OCR which does not distinguish between ‘ and ’ or “ and However, a tokenizer can often guess (recon- ”. struct) leftness and rightness correctly. This information, passed to the deep parser via chart mapping, helps it to disambiguate. 2.2 Increased Processing Speed and Coverage through Chart Pruning In addition to a well-established discriminative maximum entropy model for post-analysis parse selection, we use an additional generative model as described in Cramer and Zhang (2010) to restrict the search space during parsing. This restriction increases efficiency, but also coverage, because the parse time was restricted to at most 60 CPU seconds on a standard PC, and more sentences could now be parsed within these bounds. A 4 GB limit for main memory consumption was far beyond what was ever needed. We saw a small but negligible decrease in parsing accuracy, 5.4 % best parses were not found due to the pruning of important chart edges. Ninomiya et al. (2006) did a very thorough comparison ofdifferent performance optimization strategies, and among those also a local pruning strategy similar to the one used here. There is an important difference between the systems, in that theirs works on a reduced context-free backbone first and reconstructs the results with the full grammar, while PET uses the HPSG grammar directly, with subsumption packing and partial unpacking to achieve a similar effect as the packed chart of a context-free parser. sentence length −→ Figure 1: Distribution of sentence length and mean parse times for mild pruning In total, we parsed 1,537,801 sentences, of which 57,832 (3.8 %) could not be parsed because of lexicon errors. Most of them were caused by OCR ar- tifacts resulting in unexpected punctuation character combinations. These can be identified and will be deleted in the future. Figure 1 displays the average parse time of processing with a mild chart pruning setting, together with the mean quadratic error. In addition, it contains the distribution of input sentences over sentence length. Obviously, the vast majority of sen9 tences has a length of at most 60 words4. The parse times only grow mildly due to the many optimization techniques in the original system, and also the new chart pruning method. The sentence length distribution has been integrated into Figure 1 to show that the predominant part of our real-world corpus can be processed using this information-rich method with very low parse times (overall average parse time < 2 s per sentence). The large amount of short inputs is at first surprising, even more so that most of these inputs can not be parsed. Most of these inputs are non-sentences such as headings, enumerations, footnotes, table cell content. There are several alternatives to deal with such input, one to identify and handle them in a preprocessing step, another to use a special root condition in the deep analysis component that is able to combine phrases with well-defined properties for inputs where no spanning result could be found. We employed the second method, which has the advantage that it handles a larger range of phenomena in a homogeneous way. Figure 2 shows the change in percentage of unparsed and timed out inputs for the mild pruning method with and without the root condition combining fragments. 
sentence length −→ Figure 2: Unparsed and timed out sentences with and without fragment combination Figure 2 shows that this changes the curve for unparsed sentences towards more expected characteristics and removes the uncommonly high percentage of short sentences for which no parse can be computed. Together with the parses for fragmented 4It has to be pointed out that extremely long sentences also may be non-sentences resulting from PDF extraction errors, missing punctuation etc. No manual correction took place. Figure 3: Multiple semantic tuples may be generated for a sentence input, we get a recall (sentences with at least one parse) over the whole corpus of 85.9 % (1,321,336 sentences), without a significant change for any of the other measures, and with potential for further improvement. 3 Semantic Tuple Extraction with DMRS In contrast to shallow parsers, the ERG not only handles detailed syntactic analyses of phrases, com- pounds, coordination, negation and other linguistic phenomena that are important for extracting semantic relations, but also generates a formal semantic representation of the meaning of the input sentence in the Minimal Recursion Semantics (MRS) representation format (Copestake et al., 2005). It consists of elementary predications for each word and larger constituents, connected via argument positions and variables, from which predicate-argument structure can be extracted. MRS representations resulting from deep parsing are still relatively close to linguistic structures and contain more detailed information than a user would like to query and search for. Therefore, an additional extraction and abstraction step is performed before storing semantic structures in the search index. Firstly, MRS is converted to DMRS (Copestake, 2009), a dependency-style version of MRS that eases extraction of predicate-argument structure using the implementation in LKB (Copestake, 2002). The representation format we devised for the search index we call semantic tuples, in fact quintuples
5 0.5233506 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
Author: Antske Fokkens
Abstract: When designing grammars of natural language, typically, more than one formal analysis can account for a given phenomenon. Moreover, because analyses interact, the choices made by the engineer influence the possibilities available in further grammar development. The order in which phenomena are treated may therefore have a major impact on the resulting grammar. This paper proposes to tackle this problem by using metagrammar development as a methodology for grammar engineering. Iargue that metagrammar engineering as an approach facilitates the systematic exploration of grammars through comparison of competing analyses. The idea is illustrated through a comparative study of auxiliary structures in HPSG-based grammars for German and Dutch. Auxiliaries form a central phenomenon of German and Dutch and are likely to influence many components of the grammar. This study shows that a special auxiliary+verb construction significantly improves efficiency compared to the standard argument-composition analysis for both parsing and generation.
6 0.51467562 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
7 0.50764024 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
8 0.46674997 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
9 0.4565779 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
10 0.44920316 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
11 0.44841743 285 acl-2011-Simple supervised document geolocation with geodesic grids
12 0.42459518 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
13 0.41294929 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
14 0.3834942 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
15 0.37364107 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
16 0.35012174 192 acl-2011-Language-Independent Parsing with Empty Elements
17 0.34784147 282 acl-2011-Shift-Reduce CCG Parsing
18 0.32723504 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
19 0.31187695 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
topicId topicWeight
[(5, 0.056), (17, 0.038), (26, 0.016), (28, 0.01), (37, 0.088), (39, 0.101), (41, 0.089), (55, 0.061), (59, 0.041), (72, 0.03), (73, 0.023), (87, 0.23), (91, 0.035), (96, 0.09)]
simIndex simValue paperId paperTitle
1 0.87813145 239 acl-2011-P11-5002 k2opt.pdf
Author: empty-author
Abstract: unkown-abstract
same-paper 2 0.81369394 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
3 0.72190791 178 acl-2011-Interactive Topic Modeling
Author: Yuening Hu ; Jordan Boyd-Graber ; Brianna Satinoff
Abstract: Topic models have been used extensively as a tool for corpus exploration, and a cottage industry has developed to tweak topic models to better encode human intuitions or to better model data. However, creating such extensions requires expertise in machine learning unavailable to potential end-users of topic modeling software. In this work, we develop a framework for allowing users to iteratively refine the topics discovered by models such as latent Dirichlet allocation (LDA) by adding constraints that enforce that sets of words must appear together in the same topic. We incorporate these constraints interactively by selectively removing elements in the state of a Markov Chain used for inference; we investigate a variety of methods for incorporating this information and demonstrate that these interactively added constraints improve topic usefulness for simulated and actual user sessions.
4 0.6156472 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
5 0.60925871 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
Author: Jacob Eisenstein ; Noah A. Smith ; Eric P. Xing
Abstract: We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ‘1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
6 0.60243857 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
7 0.59777951 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
8 0.59417289 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
9 0.59402162 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
10 0.59050405 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
11 0.58995491 192 acl-2011-Language-Independent Parsing with Empty Elements
12 0.58838731 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
13 0.58669585 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
14 0.58565879 182 acl-2011-Joint Annotation of Search Queries
15 0.5851078 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
16 0.58402467 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
17 0.58329463 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
18 0.58090883 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
19 0.57962918 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
20 0.57861829 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing