emnlp emnlp2010 emnlp2010-106 knowledge-graph by maker-knowledge-mining

106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing


Source: pdf

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for top-down parsers of 89.4%, a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001). The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. [sent-2, score-0.155]

2 The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. [sent-5, score-0.257]

3 Our parser achieves a new best result for topdown parsers of 89.4%, [sent-6, score-0.333]

4 a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001). [sent-7, score-0.109]

5 1 Introduction We present a new syntactic parser that works top-down and left-to-right, maintaining a fully-connected parse tree for a few alternative parse hypotheses. [sent-10, score-0.461]

6 , 1993) parser in that it is capable of parsing the Penn treebank test sets, and is trained on the now standard training set. [sent-12, score-0.297]

7 It achieves a new best result for this parser type. [sent-13, score-0.166]

8 All of the commonly used statistical parsers available on the web such as the Collins(/Bikel) … [sent-14, score-0.109]

9 , 2006) , parsers use context-free dynamic programming algorithms so they work bottom up on the entire sentence. [sent-17, score-0.155]

10 Thus any parser claiming cognitive plausibility must, to a first approximation, work in this left-to-right top-down fashion. [sent-20, score-0.208]

11 Our parser obtains a new best result for topdown parsers of 89.4%. [sent-21, score-0.333]

12 It differs from Roark in its explicit recognition that by giving up context-free dynamic programming we may embrace near context sensitivity and condition on many diverse pieces of information. [sent-27, score-0.136]

13 Section three describes how random forests allow us to integrate the diverse information sources that context-sensitive parsing allows. [sent-34, score-0.298]

14 Finally, section six suggests that because this parser type is comparatively little explored one may hope for further substantial improvements, and proposes avenues to be explored. [sent-37, score-0.21]

15 2 Previous Work on Top-Down Parsing and the Roark Model We care about top-down incremental parsing because it automatically satisfies the criteria we have established for cognitive plausibility. [sent-38, score-0.165]

16 In particular, in top-down strategies a node is enumerated before any of its descendents. [sent-43, score-0.185]

17 (Abney and Johnson, 1991) In this era of statistical parsers it is useful to think in terms of possible conditioning information. [sent-44, score-0.218]

18 In typical bottom up CKY parsing when creating, say, a constituent X from positions i to j we may not condition on its parent. [sent-45, score-0.182]

19 Note that this means that none of the words or sub-constituents of either the NP or VP are integrated into a single overall tree until the very end. [sent-60, score-0.161]

20 Since Nivre’s parser is a dependency parser this exact case does not apply (as it does not use CFG rules) , but similar situations arise. [sent-62, score-0.39]

21 Naturally this is transitive so that the parser can, and presumably does, process an unbounded number of words before connecting them all together. [sent-64, score-0.235]

22 ) The input to the parser is a string of n words w0,n−1 . [sent-74, score-0.166]

23 This has the special property that p(⊳ | t) = 1 for a complete tree t of w0,n−1, zero otherwise. [sent-76, score-0.161]

24 A hypothesis h is a three-tuple < q, r, t > where q is the probability assigned to the current tree t. [sent-78, score-0.269]

25 For example, the tree (ROOT) would be a vector of two elements, ROOT and “)” , the latter indicating that the constituent labeled root is completed. [sent-81, score-0.261]
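
The hypothesis representation in sentences 22-25 maps onto a very small data structure; below is a minimal sketch with illustrative field names (the summary itself writes the tuple as < q, r, t > and elsewhere orders hypotheses by p · q, so the naming here is an assumption, not the paper's):

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Hypothesis:
        prob: float              # probability of the partial tree built so far
        lap: float               # look-ahead estimate for the next word to integrate
        tree: Tuple[str, ...]    # flat symbol sequence, e.g. ("S", "NP", "NNS", "Terms")

        def score(self) -> float:
            # Queues are ordered by the product of the two terms (sentence 26).
            return self.prob * self.lap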

26 At the start of our processing of wi we have hypotheses on C ordered by p · q — the probability p of the hypothesis so far times an estimate q of the probability cost we encounter when trying to now integrate wi. [sent-87, score-0.351]

27 We remove each h from C and integrate a new tree symbol x. [sent-88, score-0.161]

28 If x = wi it means that we have successfully added the new word to the tree and this hypothesis goes into the queue for the next word N. [sent-89, score-0.528]

29 Otherwise h does not yet represent an extension to wi and we put it back on C to compete with the other hypotheses waiting to be extended. [sent-90, score-0.198]

30 If we put h back onto C its probability p is lowered by the factor p(x | h). [sent-92, score-0.157]

31 Without going into details, we have adopted exactly the decision criteria and associated parameters used by Roark so that the accuracy numbers presumably reflect the same amount of search. [sent-95, score-0.168]
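
Sentences 26-31 describe a word-synchronous best-first search over a current queue C and a next-word queue N; below is a rough sketch reusing the Hypothesis class sketched above. next_symbols is an assumed callback yielding candidate next tree symbols x with p(x | h); re-estimating the look-ahead term and Roark's exact stopping criteria are omitted, so this is only an illustration of the control flow:

    import heapq
    import itertools

    _tie = itertools.count()  # tie-breaker so equal scores never compare Hypothesis objects

    def advance_one_word(C, word, next_symbols, max_pops=10000):
        """One step of the word-synchronous search of sentences 26-31.
        C holds (-score, tiebreak, Hypothesis) entries; the returned queue N
        has the same shape and becomes C when processing the next word."""
        N, pops = [], 0
        while C and pops < max_pops:
            pops += 1
            _, _, h = heapq.heappop(C)                    # best p * lap hypothesis first
            for x, p_x in next_symbols(h):                # candidate next tree symbols with p(x | h)
                h2 = Hypothesis(prob=h.prob * p_x, lap=h.lap, tree=h.tree + (x,))
                target = N if x == word else C            # word integrated vs. back on C to compete
                heapq.heappush(target, (-h2.score(), next(_tie), h2))
        return N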

32 ” Lines one and two show the current queue at the start of processing. [sent-98, score-0.222]

33 Line one has the ultimately correct partial tree (S (NP (NNS Terms) . [sent-99, score-0.161]

34 Note that the NP is not closed off as the parser defers closing constituents until necessary. [sent-100, score-0.166]

35 On the right of line 1 we show the possible next tree pieces that could be added. [sent-101, score-0.262]

36 ) The result is that the hypothesis of line 1 is removed from the queue, and a new hypothesis is added back on C as this new hypothesis does not incorporate the second word. [sent-104, score-0.249]

37 With line 3 removed from the queue, and its two extensions added, we get the new queue state shown in lines 6,7 and 8. [sent-108, score-0.323]

38 This still has not yet incorporated the next word into the parse, so this extension is inserted in the current queue giving us the queue state shown in 9,10, 11. [sent-110, score-0.486]

39 ” This addition incorporates the current word, and the resulting extension, shown in line 12 is inserted in N, not C, ending this example. [sent-112, score-0.143]

40 3 Random Forests The Roark model we emulate requires the estimation of two probability distributions: one for the next tree element (non-terminals, terminals, and “)”) in the grammar, and one for the lookahead probability of the yet-to-be-incorporated next word. [sent-113, score-0.345]

41 We first consider how to construct a single (non-random) decision tree for estimating this distribution. [sent-115, score-0.26]

42 A tree is a fully-connected directed acyclic graph where each node has one input arc (except for the root) and, for reasons we go into later, either zero or two output arcs — the tree is binary. [sent-116, score-0.635]

43 A node is a four-tuple < d, s, p, q >, where d is a set of training instances, p, a probability distribution of the correct decisions for all of the examples in d, and q a binary question about the conditioning information for the examples in d. [sent-117, score-0.366]

44 The 0/1 answer to this question causes the decision-tree program to follow the left/right arc out of the node to the children nodes. [sent-118, score-0.274]

45 s is a strict subset of the domain of the q for the parent of h. [sent-120, score-0.147]
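
Sentences 43-45 describe each decision-tree node as a four-tuple < d, s, p, q >; below is a minimal sketch of that structure. The Example shape and the left/right convention are my assumptions, not details given in the summary:

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Optional, Tuple

    Example = Tuple[dict, str]   # (conditioning context, correct next tree symbol)

    @dataclass
    class Node:
        d: List[Example]                              # training instances reaching this node
        p: Dict[str, float]                           # distribution over the correct decisions in d
        q: Optional[Callable[[dict], bool]] = None    # binary question about the conditioning context
        s: Optional[set] = None                       # subset of the domain of the parent's question
        left: Optional["Node"] = None                 # child followed on one answer to q
        right: Optional["Node"] = None                # child followed on the other answer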

46 Decision tree construction starts with the root node n where d consists of the several million situations in the training data where the next tree element needs to be guessed (according to our probability distribution) based upon previous words and the analysis so far. [sent-121, score-0.728]

47 At each iteration one node is selected from a queue of unprocessed nodes. [sent-122, score-0.456]

48 These are inserted in the queue of unprocessed nodes and the process repeats. [sent-124, score-0.368]

49 We have chosen to simply pick the number of nodes we create. [sent-126, score-0.116]

50 Nodes left on the queue are the leaf nodes of the decision tree. [sent-127, score-0.376]

51 We pick nodes from the heap based upon how much they increased the probability of the data. [sent-128, score-0.188]
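
Sentences 46-51 describe growing the tree greedily under a fixed node budget, with unprocessed nodes kept on a heap ordered by how much their best split raises the training-data probability. The sketch below follows that loop over the Node structure above; best_split is an assumed helper returning the gain, the binary question, and the two resulting example sets:

    import heapq
    from collections import Counter

    def ml_distribution(examples):
        # Maximum-likelihood distribution over the correct decisions in a node's d.
        counts = Counter(label for _, label in examples)
        total = sum(counts.values())
        return {label: c / total for label, c in counts.items()}

    def grow_tree(examples, best_split, max_nodes=1600):
        """Greedy, budgeted tree growth in the spirit of sentences 46-51.
        best_split(examples) returns (gain, question, d_left, d_right), where
        gain is the increase in training-data log probability the split gives."""
        root = Node(d=examples, p=ml_distribution(examples))
        heap, tiebreak = [], 0            # heap of unprocessed nodes, ordered by gain
        heapq.heappush(heap, (-best_split(examples)[0], tiebreak, root))
        n_nodes = 1
        while heap and n_nodes < max_nodes:
            _, _, node = heapq.heappop(heap)
            gain, question, d1, d2 = best_split(node.d)
            if not d1 or not d2:
                continue                  # no useful split; the node stays a leaf
            node.q = question
            node.left = Node(d=d1, p=ml_distribution(d1))
            node.right = Node(d=d2, p=ml_distribution(d2))
            n_nodes += 2
            for child in (node.left, node.right):
                tiebreak += 1
                heapq.heappush(heap, (-best_split(child.d)[0], tiebreak, child))
        return root                       # nodes still on the heap remain the leaves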

52 First pick a query type qt from a user supplied set. [sent-130, score-0.237]

53 Examples include the parent of the non-terminal to be created, its predecessor, 2 previous, etc. [sent-132, score-0.147]

54 Secondly we turn our qt into a binary question by creating two disjoint sets s1 and s2, with s1 ∪ s2 = s, where s is the domain of qt. [sent-135, score-0.163]

55 For example, if qt is the parent relation, and the parent in h is NP, then h goes in d1 iff NP ∈ s1. [sent-138, score-0.459]
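
Sentences 52-55 turn a query type into a binary question by splitting its domain; a small sketch of that step follows (query_fn and the chosen subset s1 are assumed inputs, not names from the paper):

    def make_binary_question(query_fn, s1):
        # query_fn maps a conditioning context to a value in the query's domain
        # (e.g. the label of the parent); the binary question is membership in s1.
        return lambda context: query_fn(context) in s1

    def partition(examples, question):
        d1 = [ex for ex in examples if question(ex[0])]       # answer 1
        d2 = [ex for ex in examples if not question(ex[0])]   # answer 0
        return d1, d2

    # With query_fn returning the parent label and s1 = {"NP"}, an example whose
    # parent is NP lands in d1, matching the example in sentence 55.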

56 [Figure 3: Some nodes in a decision tree for p(wi | t)] …ator of parsers is the propensity of prepositional phrases (PP) headed by “of” to attach to noun phrases (NP) rather than verb phrases (VP). [sent-157, score-0.467]

57 Here we illustrate how a decision tree for p(wi | t) captures this. [sent-158, score-0.26]

58 Each line gives a node number, Q — the question asked at that node, examples of answers, and probabilities for “of” and “in”. [sent-161, score-0.326]

59 Looking at node 0 we see that the first question type is 1 — parent of the proposed word. [sent-163, score-0.372]

60 We see that prepositions (IN) have been put in node 1. [sent-165, score-0.275]

61 To get a feel for who is sharing this node with prepositions each line gives two examples. [sent-167, score-0.331]

62 For node 1 this includes a lot of very different types, including NN (common singular noun) . [sent-168, score-0.185]

63 Node 1 again asks about the preterminal, leading to node 4. [sent-169, score-0.229]

64 Node 4 again asks about the preterminal, leading to node 12. [sent-171, score-0.229]

65 Also note that at each node we give the probability of both “of” and “to” given the questions and answers leading to that node. [sent-175, score-0.403]

66 We can see that the probability of “of” goes up from 0. [sent-176, score-0.114]

67 By node 16 we are concentrating on prepositions heading prepositional phrases, but nothing has been asked that would distinguish between these two prepositions. [sent-180, score-0.316]

68 However, at node 16 we ask the question “who is the grandparent”, leading to nodes 39 and 40. [sent-181, score-0.284]

69 At this point note how the probability of “of” dramatically increases for node 39, and decreases for 40. [sent-185, score-0.257]
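
The walk through nodes 0, 1, 4, 12, 16 and then 39 or 40 described above is just a root-to-leaf traversal that progressively refines the word distribution; a minimal sketch over the Node structure from the earlier sketch:

    def tree_probability(root, context, word):
        # Follow the binary questions from the root until a leaf is reached,
        # then read the proposed word's probability off the leaf distribution.
        node = root
        while node.q is not None and node.left is not None:
            node = node.left if node.q(context) else node.right
        return node.p.get(word, 0.0)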

70 That the tree is binary forces the decision tree to use information about words and nonterminals one bit at a time. [sent-186, score-0.421]

71 We go from a single decision tree to a random forest by creating many trees, randomly changing the questions used at every node. [sent-188, score-0.495]

72 Secondly, each qt produces its own binary version q. [sent-190, score-0.123]

73 Rather than picking the one that raises the data probability the most, we choose it with probability m. [sent-191, score-0.18]

74 With probability 1 − m we repeat this procedure on the list of q’s minus the best. [sent-192, score-0.13]

75 Given a forest of f trees we compute the final probability by taking the average over all the trees: p(x | t) = (1/f) Σ_{j=1}^{f} p_j(x | t), where p_j denotes the probability computed by tree j. [sent-193, score-0.424]
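
Sentences 71-75 describe the two pieces that turn one decision tree into a random forest: randomized question choice and averaging p(x | t) over the f trees. A sketch under the earlier Node/tree_probability sketches follows; the value of m is not given in this summary, so it is left as a parameter:

    import random

    def choose_question(ranked_questions, m):
        # Walk the candidate questions from best to worst; keep the current one
        # with probability m, otherwise drop it and move on (sentences 73-74).
        for q in ranked_questions[:-1]:
            if random.random() < m:
                return q
        return ranked_questions[-1]

    def forest_probability(forest, context, word):
        # p(x | t) = (1/f) * sum_j p_j(x | t): average the per-tree estimates.
        return sum(tree_probability(root, context, word) for root in forest) / len(forest)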

76 4 Implementation Details We have twenty seven basic query types as shown in Figure 4. [sent-194, score-0.107]

77 The description is from the point of view of the tree entry x to be added to the tree. [sent-196, score-0.161]

78 So the first line of Figure 4 specifies six query types, the most local of which is the label of the parent of x. [sent-197, score-0.301]

79 For example, if we have the local context “(S (NP (DT the)” and we are assigning a probability to the preterminal after DT, (e. [sent-198, score-0.158]

80 Similarly one of the query types from line two is one-previous, which is DT. [sent-201, score-0.154]

81 For the look-ahead probability, LAP, we use a single decision tree with greedy optimal questions and 1600 nodes. [sent-208, score-0.302]

82 We smooth our random forest probabilities by successively backing off to distributions three levels earlier in the decision tree. [sent-209, score-0.213]

83 We use linear interpolation so p_l(x | t) = λ(c_l) · p̂_l(x | t) + (1 − λ(c_l)) · p_{l−3}(x | t). Here p_l is the smoothed distribution for level l of the tree and p̂_l is the maximum likelihood (unsmoothed) distribution. [sent-210, score-0.295]
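
A direct transcription of that interpolation, assuming the maximum-likelihood distribution and a mixing weight λ are available at each level on the root-to-leaf path (how λ(c_l) is set from counts is not specified in this summary):

    def smoothed_probability(ml_path, lambdas, word, level=None):
        """ml_path[l] is the maximum-likelihood distribution at depth l on the
        root-to-leaf path and lambdas[l] its interpolation weight; back off
        three levels at a time, as in the formula above."""
        if level is None:
            level = len(ml_path) - 1          # start at the leaf
        p_hat = ml_path[level].get(word, 0.0)
        if level < 3:
            return p_hat                      # nothing three levels earlier to back off to
        lam = lambdas[level]
        return lam * p_hat + (1 - lam) * smoothed_probability(ml_path, lambdas, word, level - 3)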

84 Because random forests have so much latitude in picking combinations of words for specific situations we have the impression that it can overfit the training data, although we have not done an explicit study to confirm this. [sent-219, score-0.309]

85 Because the inner loop of random-forest training involves moving a conditioning event to the other descendant node to see if this raises training data probability, this also substantially speeds up training time. [sent-221, score-0.294]

86 Lastly Roark obtained the results we quote here with selective use of left-corner transforms (Demers, 1977; Johnson and Roark, 2000) . [sent-222, score-0.121]

87 However, we are also aware that in some respects left-corner transforms work against the fully-connected tree rule as operationalizing the “understand as you go along” cognitive constraint. [sent-226, score-0.359]

88 Thus using left corner transforms on all NP’s allows the parser to conflate differing semantic situations into a single tree. [sent-229, score-0.301]

89 ) 5 Results and Analysis We trained the parser on the standard sections 2-21 of the Penn Tree-bank, and tested on all sentences of length ≤ 100 of section 23. [sent-258, score-0.166]

90 The first group of results show the performance of standard parsers now in use. [sent-261, score-0.109]

91 While the 89.4% f-measure needs improvement before it would be worthwhile using this parser for routine work, it has moved past the accuracy of the Collins-Bikel (Collins, 2003; Bikel, 2004) parser and is not statistically distinguishable from (Charniak, 2000). [sent-263, score-0.332]

92 Naturally, a new combination using our parser would almost surely register another significant gain. [sent-274, score-0.166]

93 In Figure 6 we show results illustrating how parser performance improves as the probability distributions are conditioned on more diverse information from the partial trees. [sent-283, score-0.28]

94 The first line has results when we condition on only the "closest" eight non-terminals and the previous word. [sent-284, score-0.149]

95 ) One other illustrative result: if we keep all system settings constant and replace the random forest mechanism by a single greedy optimal decision tree for probability computation, performance is reduced to 86. [sent-291, score-0.446]

96 While this almost certainly overstates the performance improvement due to many random trees (the system parameters could be better adjusted to the one-tree case) , it strongly suggests that nothing like our performance could have been obtained without the forests in random forests. [sent-293, score-0.359]

97 ” Initially the parser believes that this is a prepositional phrase as shown in Figure 7. [sent-308, score-0.209]

98 However, the correct tree-bank parse incorporates a subordinate sentential clause “than General Motors is worth” , as in Figure 8. [sent-309, score-0.164]

99 It is also the case that the parsing model gives the correct parse a higher probability if it is available, showing that this is a search error, not a model error. [sent-312, score-0.224]

100 While it is certainly possible that this would prove to be the same for this new model, the use of random forests to integrate more diverse information sources might help us to reverse this state of affairs. [sent-324, score-0.265]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('roark', 0.384), ('np', 0.269), ('nns', 0.223), ('queue', 0.222), ('motors', 0.202), ('node', 0.185), ('parser', 0.166), ('tree', 0.161), ('parent', 0.147), ('forests', 0.127), ('qt', 0.123), ('lap', 0.115), ('vp', 0.109), ('parsers', 0.109), ('conditioning', 0.109), ('line', 0.101), ('decision', 0.099), ('earley', 0.099), ('preterminal', 0.086), ('parsing', 0.085), ('go', 0.079), ('transforms', 0.077), ('johnson', 0.074), ('probability', 0.072), ('forest', 0.07), ('presumably', 0.069), ('wi', 0.067), ('parse', 0.067), ('abney', 0.062), ('grandparent', 0.062), ('charniak', 0.061), ('pick', 0.061), ('brian', 0.06), ('answers', 0.06), ('penn', 0.058), ('situations', 0.058), ('gompel', 0.058), ('parenthesis', 0.058), ('subordinate', 0.058), ('topdown', 0.058), ('toyf', 0.058), ('nivre', 0.058), ('aux', 0.058), ('collins', 0.056), ('nodes', 0.055), ('twenty', 0.054), ('terminals', 0.054), ('query', 0.053), ('worth', 0.052), ('certainly', 0.052), ('root', 0.051), ('trees', 0.049), ('unprocessed', 0.049), ('bllip', 0.049), ('arc', 0.049), ('pl', 0.049), ('constituent', 0.049), ('nn', 0.049), ('condition', 0.048), ('treebank', 0.046), ('hypotheses', 0.046), ('programming', 0.046), ('cl', 0.045), ('prepositions', 0.045), ('put', 0.045), ('selective', 0.044), ('amit', 0.044), ('impression', 0.044), ('possessive', 0.044), ('sbar', 0.044), ('henderson', 0.044), ('exchange', 0.044), ('comparatively', 0.044), ('gorithm', 0.044), ('graham', 0.044), ('random', 0.044), ('leading', 0.044), ('perceptron', 0.044), ('prepositional', 0.043), ('nothing', 0.043), ('cognitive', 0.042), ('questions', 0.042), ('diverse', 0.042), ('inserted', 0.042), ('goes', 0.042), ('restricted', 0.041), ('stack', 0.041), ('back', 0.04), ('element', 0.04), ('question', 0.04), ('clause', 0.039), ('petrov', 0.039), ('lastly', 0.038), ('cm', 0.038), ('chen', 0.038), ('eugene', 0.038), ('incremental', 0.038), ('hypothesis', 0.036), ('picking', 0.036), ('interpolation', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

2 0.19530399 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

3 0.17393626 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in averagecase polynomial-time in theory, and lineartime with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

4 0.16765478 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treebank, and show that syntactic consistency is pervasive across productions with various lefthand side nonterminals. Then, we implement a reranking constituent parser that makes use of extra-sentential context in its feature set. Using a linear-chain conditional random field, we improve parsing accuracy over the generative baseline parser on the Penn Treebank WSJ corpus, rivalling a similar model that does not make use of context. We show that the context-aware and the context-ignorant rerankers perform well on different subsets of the evaluation data, suggesting a combined approach would provide further improvement. We also compare parses made by models, and suggest that context can be useful for parsing by capturing structural dependencies between sentences as opposed to lexically governed dependencies.

5 0.16660295 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

6 0.16496497 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

7 0.13972357 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

8 0.11664449 114 emnlp-2010-Unsupervised Parse Selection for HPSG

9 0.10938933 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

10 0.10751718 40 emnlp-2010-Effects of Empty Categories on Machine Translation

11 0.093893506 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

12 0.086345226 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

13 0.08614748 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

14 0.083279483 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

15 0.07953462 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

16 0.079411581 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

17 0.075960636 2 emnlp-2010-A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model

18 0.075549655 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

19 0.070248157 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

20 0.066686139 51 emnlp-2010-Function-Based Question Classification for General QA


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.277), (1, 0.068), (2, 0.295), (3, 0.121), (4, 0.131), (5, 0.212), (6, 0.048), (7, -0.065), (8, -0.114), (9, -0.087), (10, 0.122), (11, -0.07), (12, 0.087), (13, 0.103), (14, -0.013), (15, 0.021), (16, 0.082), (17, -0.004), (18, 0.081), (19, -0.072), (20, -0.019), (21, 0.212), (22, -0.034), (23, 0.069), (24, -0.092), (25, -0.01), (26, 0.004), (27, 0.094), (28, 0.041), (29, 0.044), (30, -0.055), (31, 0.039), (32, 0.013), (33, -0.023), (34, 0.11), (35, 0.018), (36, -0.016), (37, 0.051), (38, 0.054), (39, 0.07), (40, 0.049), (41, -0.027), (42, -0.008), (43, 0.072), (44, 0.053), (45, 0.016), (46, -0.017), (47, 0.016), (48, -0.059), (49, -0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97361052 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

2 0.72715074 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treebank, and show that syntactic consistency is pervasive across productions with various lefthand side nonterminals. Then, we implement a reranking constituent parser that makes use of extra-sentential context in its feature set. Using a linear-chain conditional random field, we improve parsing accuracy over the generative baseline parser on the Penn Treebank WSJ corpus, rivalling a similar model that does not make use of context. We show that the context-aware and the context-ignorant rerankers perform well on different subsets of the evaluation data, suggesting a combined approach would provide further improvement. We also compare parses made by models, and suggest that context can be useful for parsing by capturing structural dependencies between sentences as opposed to lexically governed dependencies.

3 0.68686992 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in averagecase polynomial-time in theory, and lineartime with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

4 0.57728434 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

5 0.54695213 40 emnlp-2010-Effects of Empty Categories on Machine Translation

Author: Tagyoung Chung ; Daniel Gildea

Abstract: We examine effects that empty categories have on machine translation. Empty categories are elements in parse trees that lack corresponding overt surface forms (words) such as dropped pronouns and markers for control constructions. We start by training machine translation systems with manually inserted empty elements. We find that inclusion of some empty categories in training data improves the translation result. We expand the experiment by automatically inserting these elements into a larger data set using various methods and training on the modified corpus. We show that even when automatic prediction of null elements is not highly accurate, it nevertheless improves the end translation result.

6 0.52205932 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

7 0.50219208 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

8 0.47822815 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

9 0.45613822 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

10 0.43446589 114 emnlp-2010-Unsupervised Parse Selection for HPSG

11 0.42170024 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

12 0.38783592 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

13 0.3333202 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.33221719 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

15 0.29782981 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

16 0.29335967 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

17 0.28116331 2 emnlp-2010-A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model

18 0.27886713 94 emnlp-2010-SCFG Decoding Without Binarization

19 0.2701174 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

20 0.25822046 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.019), (12, 0.033), (29, 0.106), (30, 0.023), (32, 0.015), (52, 0.018), (56, 0.064), (62, 0.032), (66, 0.09), (72, 0.049), (76, 0.064), (79, 0.011), (87, 0.376)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84275132 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

2 0.80767453 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

Author: Avihai Mejer ; Koby Crammer

Abstract: Confidence-Weighted linear classifiers (CW) and its successors were shown to perform well on binary and multiclass NLP problems. In this paper we extend the CW approach for sequence learning and show that it achieves state-of-the-art performance on four noun phrase chucking and named entity recognition tasks. We then derive few algorithmic approaches to estimate the prediction’s correctness of each label in the output sequence. We show that our approach provides a reliable relative correctness information as it outperforms other alternatives in ranking label-predictions according to their error. We also show empirically that our methods output close to absolute estimation of error. Finally, we show how to use this information to improve active learning.

3 0.74313271 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition

Author: Zhiyuan Liu ; Wenyi Huang ; Yabin Zheng ; Maosong Sun

Abstract: Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented by a mixture of semantic topics, we propose to decompose traditional random walk into multiple random walks specific to various topics. We thus build a Topical PageRank (TPR) on word graph to measure word importance with respect to different topics. After that, given the topic distribution of the document, we further calculate the ranking scores of words and extract the top ranked ones as keyphrases. Experimental results show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.

4 0.53578675 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

5 0.48544356 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in averagecase polynomial-time in theory, and lineartime with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

6 0.47933221 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

7 0.47863099 40 emnlp-2010-Effects of Empty Categories on Machine Translation

8 0.47531325 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

9 0.4646737 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

10 0.45941204 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

11 0.45790499 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

12 0.448421 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

13 0.447512 84 emnlp-2010-NLP on Spoken Documents Without ASR

14 0.44640833 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

15 0.44541219 20 emnlp-2010-Automatic Detection and Classification of Social Events

16 0.44133553 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

17 0.44051614 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

18 0.43988374 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

19 0.43957615 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

20 0.43624967 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing