emnlp emnlp2010 emnlp2010-115 knowledge-graph by maker-knowledge-mining

115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing


Source: pdf

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. [sent-2, score-0.286]

2 What is less known is that some parsers suffer more from domain shifts than others. [sent-3, score-0.412]

3 We show that dependency parsers have more difficulty parsing questions than constituency parsers. [sent-4, score-0.952]

4 In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. [sent-5, score-0.54]

5 We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). [sent-6, score-1.385]

6 Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. [sent-7, score-0.7]

7 With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance. [sent-8, score-0.762]

8 At this point, we have many different parsing models that reach and even surpass 90% dependency or constituency accuracy on this test set (McDonald et al. [sent-10, score-0.394]

9 Quite impressively, models based on deterministic shift-reduce parsing algorithms are able to rival the other computationally more expensive models (see Nivre (2008) and references therein for more details). [sent-15, score-0.293]

10 Unfortunately, the parsing accuracies of all models have been reported to drop significantly on out-of-domain test sets, due to shifts in vocabulary and grammar usage (Gildea, 2001; McClosky et al. [sent-17, score-0.286]

11 Questions pose interesting challenges for WSJ-trained parsers because they are heavily underrepresented in the training data (there are only 334 questions among the 39,832 training sentences). [sent-20, score-0.558]

12 At the same time, questions are of particular interest for user-facing applications like question answering or web search, which require parsers that can process questions quickly and accurately. [sent-21, score-0.896]

13 We start our investigation in Section 3 by training several state-of-the-art (dependency and constituency) parsers on the standard WSJ training set. [sent-22, score-0.326]

14 When evaluated on a question corpus, we observe dramatic accuracy drops exceeding 20% for the deterministic shift-reduce parsers. [sent-23, score-0.27]

15 , 2007), seem to suffer more from this domain change than constituency parsers (Charniak and Johnson, 2005; Petrov et al. [sent-26, score-0.65]

16 Unfortunately, the parsers that generalize better to this new domain have time complexities that are cubic in the sentence length (or even higher), rendering them impractical for web-scale text processing. [sent-30, score-0.442]

17 Figure 1: Example constituency tree from the QuestionBank (a) converted to labeled Stanford dependencies (b). [sent-36, score-0.373]

18 We therefore propose an uptraining method, in which a deterministic shift-reduce parser is trained on the output of a more accurate, but slower parser (Section 4). [sent-37, score-1.081]

19 Instead, our aim is to train a computationally cheaper model (a linear time dependency parser) to match the performance of the best model (a cubic time constituency parser), resulting in a computationally efficient, yet highly accurate model. [sent-40, score-0.381]

20 In practice, we parse a large amount of unlabeled data from the target domain with the constituency parser of Petrov et al. [sent-41, score-0.649]

21 (2006) and then train a deterministic dependency parser on this noisy, automatically parsed data. [sent-42, score-0.554]

22 The accuracy of the linear time parser on a question test set goes up from 60. [sent-43, score-0.265]

23 94% after uptraining, which is comparable to adding 2,000 labeled questions to the training data. [sent-45, score-0.35]

24 Combining uptraining with 2,000 labeled questions further improves the accuracy to 84. [sent-46, score-0.79]

25 , 2006), which includes a set of manually annotated questions from a TREC question answering task. [sent-55, score-0.295]

26 The questions in the QuestionBank are very different from our training data in terms of grammatical constructions and vocabulary usage, making this a rather extreme case of domainadaptation. [sent-56, score-0.232]

27 We split the 4,000 questions contained in this corpus into three parts: the first 2,000 questions are reserved as a small target-domain training set; the remaining 2,000 questions are split into two equal parts, the first serving as development set and the second as our final test set. [sent-57, score-0.729]
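A minimal sketch of the split just described, assuming the 4,000 QuestionBank trees are already loaded in corpus order (the function and variable names are illustrative, not from the paper):

```python
def split_questionbank(questions):
    """Split the 4,000 QuestionBank questions: the first 2,000 form the small
    target-domain training set; the remaining 2,000 are halved into a
    development set and a final test set."""
    assert len(questions) == 4000
    train = questions[:2000]
    dev = questions[2000:3000]
    test = questions[3000:]
    return train, dev, test
```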

28 We convert the trees in both treebanks from constituencies to labeled dependencies (see Figure 1) using the Stanford converter, which produces 46 types of labeled dependencies (de Marneffe et al. [sent-59, score-0.286]

29 We evaluate on both unlabeled (UAS) and labeled (LAS) dependency accuracy. Additionally, we use a set of 2 million questions collected from Internet search queries as unlabeled target-domain data. [sent-61, score-0.842]
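For readers unfamiliar with the metrics, here is a small sketch of how unlabeled (UAS) and labeled (LAS) attachment scores are conventionally computed; the data layout is an assumption made for illustration, not the authors' evaluation code:

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of sentences; each sentence is a list of
    (head_index, dependency_label) pairs, one per token."""
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for (g_head, g_lab), (p_head, p_lab) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1
                if g_lab == p_lab:
                    correct_labeled += 1
    uas = correct_heads / total    # unlabeled attachment score
    las = correct_labeled / total  # labeled attachment score
    return uas, las
```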

30 Table 1: Parsing accuracies for parsers trained on newswire data and evaluated on newswire and question test sets. [sent-93, score-0.608]

31 similar in style to the questions in the QuestionBank: (i) the queries must start with an English function word that can be used to start a question (what, who, when, how, why, can, does, etc. [sent-94, score-0.324]
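A hedged sketch of the filtering heuristic in sentence 31; the exact word list and any additional criteria (length cutoffs, ending in a question mark, etc.) are assumptions here, since the summary only shows criterion (i):

```python
# Function words that can begin an English question (extended beyond the
# "what, who, when, how, why, can, does, etc." given in the text; illustrative only).
QUESTION_STARTERS = {"what", "who", "when", "where", "which", "how", "why",
                     "can", "could", "do", "does", "did", "is", "are", "will"}

def looks_like_question(query):
    """Keep web queries whose first token can start a question."""
    tokens = query.lower().split()
    return bool(tokens) and tokens[0] in QUESTION_STARTERS
```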

32 2 Parsers: We use multiple publicly available parsers, as well as our own implementation of a deterministic shift-reduce parser, in our experiments. [sent-97, score-0.466]

33 The dependency parsers that we compare are the deterministic shift-reduce MaltParser (Nivre et al. [sent-98, score-0.636]

34 Our shift-reduce parser is a re-implementation of the MaltParser, using a standard set of features and a linear kernel SVM for classification. [sent-101, score-0.259]
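To make the deterministic shift-reduce idea concrete, below is a highly simplified, unlabeled arc-standard parsing loop: one classifier decision per transition, which is what gives the parser its linear running time. The classifier interface and the tiny feature template are placeholders, not the authors' MaltParser re-implementation:

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "shift", "left_arc", "right_arc"

def extract_features(stack, buffer, words, tags):
    """Illustrative feature set: word and tag of the stack top and buffer front."""
    s0 = stack[-1] if stack else None
    b0 = buffer[0] if buffer else None
    return {
        "s0.word": words[s0] if s0 is not None else "<none>",
        "s0.tag": tags[s0] if s0 is not None else "<none>",
        "b0.word": words[b0] if b0 is not None else "<none>",
        "b0.tag": tags[b0] if b0 is not None else "<none>",
    }

def parse(words, tags, classifier):
    """Deterministic parsing: the classifier (e.g. a linear-kernel SVM)
    greedily picks one transition per step; no search, no backtracking."""
    buffer = list(range(len(words)))   # token indices still to be read
    stack = []                         # partially processed token indices
    heads = [-1] * len(words)          # heads[i] = index of token i's head

    while buffer or len(stack) > 1:
        action = classifier.predict(extract_features(stack, buffer, words, tags))
        if action == LEFT_ARC and len(stack) >= 2:
            dep = stack.pop(-2)        # second-topmost becomes dependent of the top
            heads[dep] = stack[-1]
        elif action == RIGHT_ARC and len(stack) >= 2:
            dep = stack.pop()          # topmost becomes dependent of the new top
            heads[dep] = stack[-1]
        elif buffer:                   # SHIFT (also the fallback for illegal moves)
            stack.append(buffer.pop(0))
        else:                          # buffer empty: force a reduction
            dep = stack.pop()
            heads[dep] = stack[-1] if stack else -1
    return heads
```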

35 We also train and evaluate the generative lexicalized parser of Charniak (2000) on its own, as well as in combination with the discriminative reranker of Charniak and Johnson (2005). [sent-102, score-0.241]

36 To facilitate comparisons between constituency and dependency parsers, we convert the output of the constituency parsers to labeled dependencies using the same procedure that is applied to the treebanks. [sent-108, score-1.007]

37 While the constituency parsers used in our experiments view part-of-speech (POS) tagging as an integral part of parsing, the dependency parsers require the input to be tagged with a separate POS tagger. [sent-110, score-0.96]

38 Tagger and parser are always trained on the same data. [sent-112, score-0.202]

39 1 No Labeled Target Domain Data: We first trained all parsers on the WSJ training set and evaluated their performance on the two domain-specific evaluation sets (newswire and questions). [sent-121, score-0.412]

40 As can be seen in the left columns of Table 1, all parsers perform very well on the WSJ development set. [sent-122, score-0.356]

41 , 2005) that constituency parsers are more accurate at producing dependencies than dependency parsers (at least when the dependencies were produced by a deterministic transformation of a constituency treebank, as is the case here). [sent-125, score-1.515]

42 Table 2: Parsing accuracies for parsers trained on newswire and question data and evaluated on a question test set. [sent-156, score-0.637]

43 might have expected, the accuracies are significantly lower; however, the drop for some of the parsers is shocking. [sent-157, score-0.526]

44 Most notably, the deterministic shift-reduce parsers lose almost 25% (absolute) on labeled accuracies, while the latent variable parsers lose around 12%. [sent-158, score-1.239]

45 Note also that even with gold POS tags, LAS is below 70% for our deterministic shift-reduce parser, suggesting that the drop in accuracy is primarily due to a syntactic shift rather than a lexical shift. [sent-159, score-0.256]

46 These low accuracies are especially disturbing when one considers that the average question in the evaluation set is only nine words long and therefore potentially much less ambiguous than WSJ sentences. [sent-160, score-0.214]

47 Overall, the dependency parsers seem to suffer more from the domain change than the constituency parsers. [sent-162, score-0.753]

48 This is not a limitation of dependency parsers in general. [sent-166, score-0.326]

49 Looking at the constituency parsers, we observe (footnote 3: the difference between our shift-reduce parser and the MaltParser is due to small differences in the feature sets) [sent-170, score-0.407]

50 that the lexicalized (reranking) parser of Charniak and Johnson (2005) loses more than the latent variable approach of Petrov et al. [sent-171, score-0.414]

51 Intuitively speaking, some of the latent variables seem to get allocated for modeling the few questions present in the training data, while the lexicalization contexts are not able to distinguish between declarative sentences and questions. [sent-176, score-0.374]

52 When the training and test data are processed this way, the lexicalized parser loses 1. [sent-179, score-0.285]

53 5% F1, while the latent variable parser loses only 0. [sent-180, score-0.375]

54 In the second experiment, we removed all questions from the WSJ training set and retrained both parsers. [sent-183, score-0.232]

55 The lexicalized parser came out ahead in this experiment, confirming our hypothesis that the latent variable model is better able to pick up the small amount of relevant evidence that is present in the WSJ training data (rather than being systematically ...). Footnote 4: The F1 scores were 52. [sent-185, score-0.37]

56 We now consider a situation where a small amount of labeled data (2,000 manually parsed sentences) from the domain of interest is available for training. [sent-192, score-0.246]

57 As Table 2 shows (left columns), even a modest amount of labeled data from the target domain can significantly boost parsing performance, giving double-digit improvements in some cases. [sent-195, score-0.328]

58 While not shown in the table, the parsing accuracies on the WSJ development set were largely unaffected by the additional training data. [sent-196, score-0.237]

59 The parsing accuracies of these domain-specific models are shown in the right columns of Table 2, and are significantly lower than those of models trained on the concatenated training sets. [sent-198, score-0.267]

60 They are often even lower than the results of parsers trained exclusively on the WSJ, indicating that 2,000 sentences are not sufficient to train accurate parsers, even for quite narrow domains. [sent-199, score-0.369]

61 4 Uptraining for Domain-Adaptation The results in the previous section suggest that parsers without global constraints have difficulties dealing with the syntactic differences between declarative sentences and questions. [sent-200, score-0.362]

62 , 2006), we propose to use automatically labeled target domain data to learn the target domain distribution directly. [sent-206, score-0.366]

63 Self-training: The idea of training parsers on their own output has been around for as long as there have been statistical parsers, but typically does not work well at all (Charniak, 1997). [sent-209, score-0.326]

64 (2003) present co-training procedures for parsers and taggers respectively, which are effective when only very little labeled data is available. [sent-212, score-0.444]

65 (2006a) were the first to improve a state-of-the-art constituency parsing system by utilizing unlabeled data for self-training. [sent-214, score-0.409]

66 In subsequent work, they show that the same idea can be used for domain adaptation if the unlabeled data is chosen accordingly (McClosky et al. [sent-215, score-0.245]

67 Sagae and Tsujii (2007) co-train two dependency parsers by adding automatically parsed sentences for which the parsers agree to the training data. [sent-217, score-0.797]

68 performance of the best parser, we want to build a more efficient parser that comes close to the accuracy of the best parser. [sent-241, score-0.202]

69 To do this, we parse the unlabeled data with our most accurate parser and generate noisy, but fairly accurate labels (parse trees) for the unlabeled data. [sent-242, score-0.554]

70 We refer to the parser used for producing the automatic labels as the base parser (unless otherwise noted, we used the latent variable parser of Petrov et al. [sent-243, score-0.803]

71 Because the most accurate base parsers are constituency parsers, we need to convert the parse trees to dependencies using the Stanford converter (see Section 2). [sent-245, score-0.7]

72 The automatically parsed sentences are appended to the labeled training data, and the shift-reduce parser (and the part-of-speech tagger) are trained on this new training set. [sent-246, score-0.362]
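Putting sentences 69-72 together, the uptraining pipeline amounts to the following hedged pseudocode; the function names (`base_parser.parse`, `to_stanford_dependencies`, `train_tagger`, `train_shift_reduce`) stand in for the components described in the text and are not actual APIs:

```python
def uptrain(labeled_dep_trees, unlabeled_questions,
            base_parser, to_stanford_dependencies,
            train_tagger, train_shift_reduce):
    """Uptraining: train a fast deterministic parser (and its POS tagger) on the
    output of a slower but more accurate constituency parser."""
    auto_labeled = []
    for sentence in unlabeled_questions:                      # e.g. 100K-500K questions
        tree = base_parser.parse(sentence)                    # slow, accurate base parser
        auto_labeled.append(to_stanford_dependencies(tree))   # noisy but fairly accurate

    # Append the automatically parsed sentences to the labeled training data
    # and retrain both the POS tagger and the shift-reduce parser on it.
    training_set = labeled_dep_trees + auto_labeled
    tagger = train_tagger(training_set)
    return train_shift_reduce(training_set, tagger)
```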

73 2 Varying amounts of unlabeled data: Figure 2 shows the efficacy of uptraining as a function of the size of the unlabeled data. [sent-249, score-0.676]

74 Both labeled (LAS) and unlabeled accuracies (UAS) improve sharply when automatically parsed sentences from the target domain are added to the training data, and level off after 100,000 sentences. [sent-250, score-0.553]

75 3 Varying the base parser: Table 3 then compares uptraining on the output of different base parsers to pure self-training. [sent-259, score-1.044]

76 In these experiments, the same set of 500,000 questions was parsed by different base parsers. [sent-260, score-0.312]

77 The automatic parses were then added to the labeled training data and the parser was retrained. [sent-261, score-0.32]

78 As the results show, self-training provides only modest improvements of less than 2%, while uptraining gives double-digit improvements in some cases. [sent-262, score-0.44]

79 Interestingly, there seems to be no substantial difference between uptraining on the output of a single latent variable parser (Petrov et al. [sent-263, score-0.771]

80 It appears that the roughly 1% accuracy difference between the two base parsers is not important for uptraining. [sent-265, score-0.364]

81 4 POS-less parsing: Our uptraining procedure improves parse quality on out-of-domain data to the level of in-domain accuracy. [sent-267, score-0.526]

82 , 1992) to produce a deterministic hierarchical clustering of our input vocabulary. [sent-276, score-0.207]

83 Table 4: Parsing accuracies of uptrained parsers with and without part-of-speech tags and word cluster features. [sent-282, score-0.605]

84 This change makes our parser completely deterministic and enables us to process sentences in a single left-to-right pass. [sent-285, score-0.409]
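A minimal sketch of the POS-less variant in sentences 82-84: hierarchical (Brown-style) cluster bit strings replace POS tags as features, so no separate tagger is needed and the parser runs in a single deterministic left-to-right pass. The prefix lengths and feature names are illustrative assumptions:

```python
def cluster_features(word, clusters, prefix_lengths=(4, 6, 10)):
    """clusters: dict mapping a word to its Brown-cluster bit string,
    e.g. "films" -> "11010100110".  Prefixes of the bit string act as
    coarse-to-fine substitutes for POS tags."""
    bits = clusters.get(word.lower(), "")
    feats = {"word": word.lower()}
    for k in prefix_lengths:
        feats["cluster_prefix_%d" % k] = bits[:k] if bits else "<unk>"
    return feats
```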

85 5 Error Analysis: To provide a better understanding of the challenges involved in parsing questions, we analyzed the errors made by our WSJ-trained shift-reduce parser and also compared them to the errors that are left after uptraining. [sent-286, score-0.356]

86 The parsing accuracies of our shift-reduce parser using gold POS tags are listed in the last rows of Tables 1 and 2. [sent-312, score-0.479]

87 Even with gold POS tags, the deterministic shift-reduce parser falls short of the accuracies of the constituency parsers (with automatic tags), presumably because the shift-reduce model is making only local decisions and is lacking the global constraints provided by the context-free grammar. [sent-313, score-1.091]

88 ” should be enzymes, but the WSJ-trained parser labels “What” as the nsubj, which makes sense in a statement but not in a question. [sent-324, score-0.232]

89 (Figure 3 residue: dependency labels and POS tags for the example question “What films featured ...”) [sent-338, score-0.253]

90 Figure 3: Example questions from the QuestionBank development set (residual dependency labels and POS tags from the figure, including the example “How many people did Randy Craft ...”, omitted). [sent-344, score-0.289]

91 the WSJ model often makes this mistake and therefore the precision is much lower when it doesn’t see more questions in the training data. [sent-347, score-0.232]

92 As a consequence, the WSJ model cannot predict this label in questions very well. [sent-358, score-0.232]

93 6 Conclusions: We presented a method for domain adaptation of deterministic shift-reduce parsers. [sent-367, score-0.334]

94 We evaluated multiple state-of-the-art parsers on a question corpus and showed that parsing accuracies degrade substantially on this out-of-domain task. [sent-368, score-0.626]

95 Most notably, deterministic shift-reduce parsers have difficulty dealing with the modified word order and lose more than 20% in accuracy. [sent-369, score-0.571]

96 We then proposed a simple, yet very effective uptraining method for domain adaptation. [sent-370, score-0.44]

97 In a nutshell, we trained a deterministic shift-reduce parser on the output of a more accurate, but slower parser. [sent-371, score-0.439]

98 Uptraining with large amounts of unlabeled data gives improvements similar to having access to 2,000 labeled sentences from the target domain. [sent-372, score-0.274]

99 With 2,000 labeled questions and a large amount of unlabeled questions, uptraining is able to close the gap between in-domain and out-of-domain accuracy. [sent-373, score-0.908]

100 Dependency parsing and domain adaptation with LR models and parser ensembles.


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uptraining', 0.44), ('parsers', 0.326), ('questionbank', 0.278), ('wsj', 0.233), ('questions', 0.232), ('deterministic', 0.207), ('constituency', 0.205), ('parser', 0.202), ('accuracies', 0.151), ('nnp', 0.135), ('unlabeled', 0.118), ('labeled', 0.118), ('dependency', 0.103), ('petrov', 0.095), ('nivre', 0.089), ('parsing', 0.086), ('domain', 0.086), ('nsubj', 0.082), ('mcdonald', 0.079), ('osbourne', 0.076), ('profession', 0.076), ('wrb', 0.076), ('latent', 0.073), ('charniak', 0.073), ('las', 0.068), ('pos', 0.067), ('maltparser', 0.065), ('question', 0.063), ('root', 0.062), ('mcclosky', 0.06), ('kill', 0.059), ('uas', 0.059), ('attr', 0.057), ('craft', 0.057), ('mstparser', 0.057), ('nnnsubjp', 0.057), ('ozzy', 0.057), ('shiftreduce', 0.057), ('uptrained', 0.057), ('wdt', 0.057), ('koo', 0.057), ('variable', 0.056), ('stanford', 0.055), ('vbd', 0.051), ('dependencies', 0.05), ('drop', 0.049), ('born', 0.049), ('randy', 0.049), ('qb', 0.049), ('sagae', 0.048), ('vbz', 0.048), ('nns', 0.046), ('loses', 0.044), ('amod', 0.044), ('wp', 0.043), ('accurate', 0.043), ('parsed', 0.042), ('adaptation', 0.041), ('marneffe', 0.041), ('oldest', 0.041), ('tags', 0.04), ('lexicalized', 0.039), ('abbreviated', 0.038), ('compl', 0.038), ('converter', 0.038), ('dobj', 0.038), ('doyle', 0.038), ('ental', 0.038), ('films', 0.038), ('jjs', 0.038), ('manufacture', 0.038), ('nmivcdreo', 0.038), ('peugeot', 0.038), ('popeye', 0.038), ('sbarq', 0.038), ('lose', 0.038), ('target', 0.038), ('base', 0.038), ('carreras', 0.037), ('declarative', 0.036), ('dep', 0.036), ('newswire', 0.034), ('errors', 0.034), ('enzymes', 0.033), ('osn', 0.033), ('tnt', 0.033), ('serving', 0.033), ('featured', 0.033), ('seem', 0.033), ('nn', 0.032), ('cluster', 0.031), ('dt', 0.031), ('labels', 0.03), ('slower', 0.03), ('columns', 0.03), ('tagger', 0.03), ('cubic', 0.03), ('whnp', 0.03), ('queries', 0.029), ('doesn', 0.029)]
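The per-sentence scores in the summary above appear to combine weights like these. Below is a hedged sketch of one way such scores could be computed (summing the listed tf-idf weights of the words each sentence contains); it illustrates the idea only and is not the actual pipeline behind this page:

```python
def score_sentence(sentence, word_weights):
    """word_weights: dict such as {"uptraining": 0.44, "parsers": 0.326, ...}
    taken from the topN-words list above."""
    return sum(word_weights.get(tok, 0.0) for tok in sentence.lower().split())

def rank_sentences(sentences, word_weights, top_k=100):
    """Return the top_k sentences ranked by summed tf-idf weight."""
    scored = sorted(((score_sentence(s, word_weights), i, s)
                     for i, s in enumerate(sentences)), reverse=True)
    return scored[:top_k]
```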

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

2 0.16607434 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependen— cies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

3 0.16496497 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

4 0.15043038 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

Author: Roi Reichart ; Ari Rappoport

Abstract: We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results on its paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm’s effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.

5 0.14941168 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

6 0.14029106 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

7 0.14013091 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

8 0.12369016 51 emnlp-2010-Function-Based Question Classification for General QA

9 0.11426165 74 emnlp-2010-Learning the Relative Usefulness of Questions in Community QA

10 0.11369541 114 emnlp-2010-Unsupervised Parse Selection for HPSG

11 0.11251215 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

12 0.11003923 104 emnlp-2010-The Necessity of Combining Adaptation Methods

13 0.084594987 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

14 0.083301909 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

15 0.079744898 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

16 0.079526164 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

17 0.075583756 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

18 0.074740134 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

19 0.067737274 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

20 0.066595346 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.232), (1, 0.158), (2, 0.228), (3, 0.072), (4, 0.0), (5, 0.187), (6, 0.129), (7, 0.191), (8, 0.105), (9, 0.079), (10, 0.054), (11, 0.031), (12, 0.187), (13, 0.205), (14, 0.01), (15, 0.175), (16, -0.054), (17, 0.066), (18, 0.088), (19, 0.023), (20, 0.1), (21, 0.097), (22, -0.069), (23, 0.015), (24, 0.22), (25, -0.061), (26, -0.033), (27, 0.086), (28, 0.105), (29, 0.02), (30, -0.024), (31, 0.015), (32, 0.046), (33, 0.109), (34, 0.012), (35, -0.014), (36, -0.054), (37, 0.033), (38, -0.105), (39, 0.043), (40, -0.013), (41, -0.015), (42, -0.023), (43, 0.0), (44, -0.069), (45, -0.039), (46, -0.024), (47, 0.034), (48, 0.006), (49, 0.09)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9722749 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

2 0.73978913 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

Author: Roi Reichart ; Ari Rappoport

Abstract: We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results on its paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm’s effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.

3 0.55153435 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treebank, and show that syntactic consistency is pervasive across productions with various lefthand side nonterminals. Then, we implement a reranking constituent parser that makes use of extra-sentential context in its feature set. Using a linear-chain conditional random field, we improve parsing accuracy over the generative baseline parser on the Penn Treebank WSJ corpus, rivalling a similar model that does not make use of context. We show that the context-aware and the context-ignorant rerankers perform well on different subsets of the evaluation data, suggesting a combined approach would provide further improvement. We also compare parses made by models, and suggest that context can be useful for parsing by capturing structural dependencies between sentences as opposed to lexically governed dependencies.

4 0.51690036 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependen— cies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

5 0.48566046 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

6 0.46589741 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

7 0.46449617 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

8 0.44931772 114 emnlp-2010-Unsupervised Parse Selection for HPSG

9 0.44014227 74 emnlp-2010-Learning the Relative Usefulness of Questions in Community QA

10 0.40806708 51 emnlp-2010-Function-Based Question Classification for General QA

11 0.38619614 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

12 0.37507871 104 emnlp-2010-The Necessity of Combining Adaptation Methods

13 0.30159912 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

14 0.29197824 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

15 0.2827116 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

16 0.27744734 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

17 0.27446008 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

18 0.27210274 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

19 0.26824579 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

20 0.26020649 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.01), (10, 0.039), (12, 0.03), (14, 0.012), (29, 0.138), (30, 0.015), (32, 0.012), (52, 0.02), (56, 0.06), (62, 0.031), (66, 0.139), (72, 0.041), (76, 0.042), (77, 0.011), (79, 0.017), (87, 0.022), (89, 0.017), (91, 0.278)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7831232 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

2 0.59696078 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

3 0.59609663 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.

4 0.59596825 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

Author: Samidh Chatterjee ; Nicola Cancedda

Abstract: Minimum Error Rate Training is the algorithm for log-linear model parameter training most used in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which the actual optimization is performed. Recent work has been done to extend the algorithm to use the entire translation lattice built by the decoder, instead of N-best lists. We propose here a third, intermediate way, consisting in growing the translation pool using samples randomly drawn from the translation lattice. We empirically measure a systematic im- provement in the BLEU scores compared to training using N-best lists, without suffering the increase in computational complexity associated with operating with the whole lattice.

5 0.59414393 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

6 0.59218884 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

7 0.59215599 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

8 0.58867395 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

9 0.58783644 84 emnlp-2010-NLP on Spoken Documents Without ASR

10 0.58725899 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

11 0.58681369 114 emnlp-2010-Unsupervised Parse Selection for HPSG

12 0.58542895 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

13 0.58542603 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

14 0.58527386 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

15 0.58510751 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

16 0.58458722 63 emnlp-2010-Improving Translation via Targeted Paraphrasing

17 0.58436275 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

18 0.58318985 109 emnlp-2010-Translingual Document Representations from Discriminative Projections

19 0.58190471 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

20 0.58147174 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media