acl acl2010 acl2010-241 knowledge-graph by maker-knowledge-mining

241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification


Source: pdf

Author: Martin Haulrich

Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidence-weighted classification.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Transition-based parsing with Confidence-Weighted Classification Martin Haulrich Dept. [sent-1, score-0.143]

2 Abstract We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. [sent-4, score-0.489]

3 We also compare with other online learning algorithms and investigate the effect of pruning features when using confidence-weighted classification. [sent-5, score-0.486]

4 1 Introduction There has been a lot of work on data-driven dependency parsing. [sent-6, score-0.119]

5 , 2005a; McDonald and Pereira, 2006) and for transition-based parsing Support-Vector Machines (Hall et al. [sent-16, score-0.143]

6 Dredze et al. (2008) introduce a new approach to margin-based online learning called confidence-weighted classification (CW) and show that the performance of this approach is comparable to that of Support-Vector Machines. [sent-20, score-0.397]

7 In this work we use confidence-weighted classification with transition-based parsing and show that this leads to results comparable to the state-of-the-art results obtained using SVMs. [sent-21, score-0.291]

8 We also compare training time and the effect of pruning when using confidence-weighted learning. [sent-22, score-0.178]

9 2 Transition-based parsing Transition-based parsing builds on the idea that parsing can be viewed as a sequence of transitions between states. [sent-23, score-0.544]

10 The focus here is on the classifier, but we will briefly describe the parsing algorithm in order to better understand the classification task. [sent-27, score-0.344]

11 The parsing algorithm consists of two components, a transition system and an oracle. [sent-28, score-0.198]

12 Nivre (2008) defines a transition system S = (C, T, c_s, C_t) in the following way: 1. [sent-29, score-0.055]

13 C is a set of configurations, each of which contains a buffer β of (remaining) nodes and a set A of dependency arcs, 2. [sent-30, score-0.277]

14 A transition sequence for a sentence x in S is a sequence C_{0,m} = (c_0, c_1, ..., c_m) such that [sent-40, score-0.055]

15 for every i (1 ≤ i ≤ m), c_i = t(c_{i−1}) for some t ∈ T. The oracle is used during training to determine a transition sequence that leads to the correct parse. [sent-46, score-0.22]

16 The job of the classifier is to ’imitate’ the oracle, i.e. [sent-47, score-0.065]

17 to try to always pick the transitions that the oracle would choose. [sent-49, score-0.155]

18 The information given to the classifier is the current configuration. [sent-52, score-0.065]

19 Therefore the training data for the classifier consists of a number of configurations and the transitions the oracle chose in these configurations. [sent-53, score-0.36]

20 σ is a stack of tokens i ≤ k (for some k ≤ n), 2. [sent-59, score-0.093]

21 A is a set of dependency arcs such that G = ({0, 1, ..., n}, A) is a dependency graph for the sentence x. [sent-61, score-0.166]

22 (Nivre, 2008) In the work presented here we use the NivreEager algorithm, which has four transitions: Shift: Push the token at the head of the buffer onto the stack. [sent-65, score-0.336]

23 Left-Arc_l: Add to the analysis an arc with label l from the token at the head of the buffer to the token on the top of the stack, and pop the stack. [sent-67, score-0.524]

24 Right-Arc_l: Add to the analysis an arc with label l from the token on the top of the stack to the token at the head of the buffer, and push the buffer token onto the stack. [sent-68, score-0.427]
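
For concreteness, here is a minimal Python sketch of these transitions, representing tokens by indices and arcs as (head, label, dependent) triples. All class and function names are illustrative, and preconditions are omitted. Note that the fourth transition of the NivreEager system, Reduce, is not among the extracted sentences above; its inclusion here follows Nivre (2008).

```python
# Minimal sketch of the NivreEager (arc-eager) transitions; names are
# illustrative and transition preconditions are omitted for brevity.

class Configuration:
    def __init__(self, n_tokens):
        self.stack = [0]                            # token 0 is the artificial root
        self.buffer = list(range(1, n_tokens + 1))  # remaining input tokens
        self.arcs = set()                           # (head, label, dependent) triples

def shift(c):
    # Push the token at the head of the buffer onto the stack.
    c.stack.append(c.buffer.pop(0))

def left_arc(c, label):
    # Arc from the buffer-head token to the stack-top token; pop the stack.
    dependent = c.stack.pop()
    c.arcs.add((c.buffer[0], label, dependent))

def right_arc(c, label):
    # Arc from the stack-top token to the buffer-head token; push the
    # buffer-head token onto the stack.
    dependent = c.buffer.pop(0)
    c.arcs.add((c.stack[-1], label, dependent))
    c.stack.append(dependent)

def reduce_(c):
    # Pop the stack; legal only once the stack-top token has a head.
    c.stack.pop()
```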

25 1 Classification Transition-based dependency parsing reduces parsing to consecutive multiclass classification. [sent-70, score-0.458]

26 From each configuration, one among a predefined set of transitions has to be chosen. [sent-71, score-0.115]

27 This means that any classifier can be plugged into the system. [sent-72, score-0.065]

28 The training instances are created by the oracle, so the training is offline. [sent-73, score-0.172]

29 So even though we use online learners in the experiments, these are used in a batch setting. [sent-74, score-0.121]
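
A minimal sketch of this batch setup, reusing the Configuration class from the sketch above; `oracle`, `extract_features`, and `apply_transition` are hypothetical callables, and `learner` stands for any online classifier with an `update` method:

```python
def make_training_instances(sentences, oracle, extract_features, apply_transition):
    # The oracle is run once over the gold-standard sentences, recording one
    # (feature vector, transition) pair per configuration it passes through.
    instances = []
    for sentence in sentences:
        c = Configuration(len(sentence))
        while c.buffer:                      # until the input is consumed
            t = oracle(c, sentence)          # e.g. "SHIFT" or "RIGHT-ARC:nsubj"
            instances.append((extract_features(c), t))
            apply_transition(c, t)           # mutate the configuration
    return instances

def train(learner, instances, epochs=10):
    # An online learner used in a batch setting: repeated passes over the
    # same fixed set of training instances.
    for _ in range(epochs):
        for features, transition in instances:
            learner.update(features, transition)
    return learner
```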

30 The best results have been achieved using Support-Vector Machines, placing the MaltParser very high in both the CoNLL shared tasks on dependency parsing in 2006 and 2007 (Buchholz and Marsi, 2006; Nivre et al. [sent-75, score-0.262]

31 The standard setting in the MaltParser is to use a 2nd-degree polynomial kernel with the SVM. [sent-78, score-0.126]

32 Dredze et al. (2008) introduce confidence-weighted linear classifiers, which are online classifiers that maintain a confidence parameter for each weight and use it to control how much to change the weights in each update. [sent-80, score-0.476]

33 A problem with online algorithms is that, because they have no memory of previously seen examples, they do not know whether a given weight has been updated many times or few times. [sent-81, score-0.212]

34 If a weight has been updated many times, the current estimate of the weight is probably relatively good and therefore should not be changed too much. [sent-82, score-0.129]

35 On the other hand, if it has never been updated before, the estimate is probably very bad. [sent-83, score-0.053]

36 CW classification deals with this by having a confidence-parameter for each weight, modeled by a Gaussian distribution, and this parameter is used to make more aggressive updates on weights with lower confidence (Dredze et al. [sent-84, score-0.147]

37 The classifiers also use Passive-Aggressive updates (Crammer et al. [sent-86, score-0.211]

38 , 2006) to try to maximize the margin between positive and negative training instances. [sent-87, score-0.087]
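
The following Python sketch illustrates the mechanism described above for the binary case: each weight carries a variance, low-confidence (high-variance) weights move the most, and every update shrinks the variance of the features involved. It is a simplified diagonal variant in the spirit of CW/AROW, not the exact closed-form update of Dredze et al. (2008); all names and constants are illustrative.

```python
import numpy as np

class SketchCW:
    """Schematic confidence-weighted-style binary linear classifier."""

    def __init__(self, n_features, initial_variance=1.0, r=1.0):
        self.mu = np.zeros(n_features)                      # mean weights
        self.sigma = np.full(n_features, initial_variance)  # per-weight variance
        self.r = r                                          # regularization constant

    def update(self, x, y):
        # x: feature vector (numpy array), y: label in {-1, +1}.
        margin = y * self.mu.dot(x)
        if margin < 1.0:                        # Passive-Aggressive-style trigger
            confidence = x.dot(self.sigma * x)  # variance along this example
            alpha = (1.0 - margin) / (confidence + self.r)
            # Low-confidence weights (large sigma) receive the largest updates.
            self.mu += alpha * y * self.sigma * x
            # Updated features become more confident: their variance shrinks.
            self.sigma = 1.0 / (1.0 / self.sigma + (x * x) / self.r)

    def predict(self, x):
        return 1 if self.mu.dot(x) >= 0 else -1
```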

39 CW classifiers are online algorithms and are therefore fast to train, and it is not necessary to keep all training examples in memory. [sent-88, score-0.217]

40 Crammer et al. (2009) extend the approach to multiclass classification and show that also in this setting the classifiers often outperform SVMs. [sent-92, score-0.294]

41 (2008) present different update rules for CW classification and show that the ones based on standard deviation rather than variance yield the best results. [sent-96, score-0.071]

42 We have integrated confidence-weighted, perceptron, and MIRA classifiers into the code. [sent-101, score-0.269]

43 The code for the online classifiers has … (Footnote 1: We have used version 1. …) [sent-102, score-0.291]

44 3 Features The standard setting for the MaltParser is to use SVMs with polynomial kernels, and because of this it uses a relatively small number of features. [sent-110, score-0.077]

45 In most of our experiments the default feature set of MaltParser, consisting of 14 features, has been used. [sent-111, score-0.202]

46 When using a linear classifier without a kernel, we need to extend the feature set in order to achieve good results. [sent-112, score-0.13]

47 We have done this very uncritically by adding all pairwise combinations of all features. [sent-113, score-0.245]

48 This leads to 91 additional features when using the standard 14 features. [sent-114, score-0.106]
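
A minimal sketch of this expansion (the feature-naming scheme is illustrative): with 14 base features, itertools.combinations yields C(14, 2) = 91 pairwise conjunctions, matching the count above.

```python
from itertools import combinations

def expand_features(base):
    """Add all pairwise conjunctions of the base features.

    base: dict mapping feature names to their (string) values.
    With 14 base features this returns 14 + C(14, 2) = 105 features.
    """
    base_feats = [f"{name}={value}" for name, value in sorted(base.items())]
    pair_feats = [f"{a}&{b}" for a, b in combinations(base_feats, 2)]
    return base_feats + pair_feats
```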

49 1 Online classifiers We compare CW-classifiers with other online algorithms for linear classification. [sent-117, score-0.356]

50 We compare with perceptron (Rosenblatt, 1958) and MIRA (Crammer et al. [sent-118, score-0.099]

51 With both these classifiers we use the same top-1 approach as with the CW-classifiers and also averaging, which has been shown to alleviate overfitting (Collins, 2002). [sent-120, score-0.17]
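
A minimal sketch of the averaging trick (Collins, 2002) for the perceptron in the batch setting used here; names are illustrative and the binary case is shown for simplicity.

```python
import numpy as np

def train_averaged_perceptron(examples, n_features, epochs=10):
    # examples: list of (x, y) pairs with x a numpy vector and y in {-1, +1}.
    w = np.zeros(n_features)
    total = np.zeros(n_features)   # running sum of the weight vector
    steps = 0
    for _ in range(epochs):
        for x, y in examples:
            if y * w.dot(x) <= 0:  # mistake-driven update
                w += y * x
            total += w
            steps += 1
    # The averaged weights are used at test time, which alleviates
    # overfitting (Collins, 2002).
    return total / steps
```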

52 Table 2 shows Labeled Attachment Score obtained with the three online classifiers. [sent-121, score-0.121]

53 The results are in line with those of Crammer et al. (2009) and show that confidence-weighted classifiers are better than both perceptron and MIRA. [sent-124, score-0.269]

54 2 Training and parsing time The training time of the CW-classifiers depends on the number of iterations used, and this of course affects the accuracy of the parser. [sent-126, score-0.272]

55 Figure 1 shows Labeled Attachment Score as a function of the number of iterations used in training. [sent-127, score-0.082]

56 The horizontal line shows the LAS obtained with SVM. [sent-128, score-0.051]

57 Figure 1: LAS as a function of the number of training iterations on Danish data. [sent-129, score-0.129]

58 The dotted horizontal line shows the performance of the parser trained with SVM. [sent-130, score-0.138]

59 We see that after 4 iterations the CW-classifier has the best performance for the data set (Danish) used in this experiment. [sent-131, score-0.082]

60 Table 1 compares training time (10 iterations) and parsing time of a parser using CW-classifiers and a parser using SVM on the same data set. [sent-133, score-0.274]

61 Table 1: Training and parsing time on Danish data. [sent-137, score-0.143]

62 3 Pruning features Because we explicitly represent pairwise combinations of all of the original features, we get an extremely high number of binary features. [sent-139, score-0.293]

63 For some of the larger data sets, the number of features is so big that we cannot hold the weight-vector in memory. [sent-140, score-0.124]

64 Table 2: LAS on development data for three online classifiers, CW-classifiers with manual feature selection, and SVM. [sent-143, score-0.279]

65 Statistical significance is measured between CW-classifiers without feature selection and SVMs. [sent-144, score-0.135]

66 To solve this problem we have tried using pruning to remove the features that occur fewest times in the training data. [sent-145, score-0.292]

67 If a feature occurs fewer times than a given cutoff limit, the feature is not included. [sent-146, score-0.162]
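
A minimal sketch of this frequency cutoff, assuming training instances are lists of binary feature strings; names are illustrative.

```python
from collections import Counter

def prune_by_cutoff(instances, cutoff):
    """Keep only features occurring at least `cutoff` times.

    instances: iterable of feature-string lists, one list per training
    configuration. Returns the pruned feature vocabulary as a set.
    """
    counts = Counter(f for feats in instances for f in feats)
    return {f for f, n in counts.items() if n >= cutoff}
```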

68 This goes against the idea of CW classifiers, which were developed precisely so that rare features can be used. [sent-147, score-0.236]

69 Figure 2 shows the labeled attachment score as a function of the cutoff limit on the Danish data. [sent-149, score-0.174]

70 Cutoff limit Figure 2: LAS as a function of the cutoff limit when pruning rare features. [sent-150, score-0.28]

71 The dotted line shows the number of features left after pruning. [sent-151, score-0.111]

72 4 Manual feature selection Instead of pruning the features, we tried manually removing some of the pairwise feature combinations. [sent-153, score-0.543]

73 We removed some of the combinations that led to the most extra features, which is especially the case with combinations of lexical features. [sent-154, score-0.158]

74 In the extended default feature set, for instance, we removed all combinations of lexical features except the combination of the word form of the token on top of the stack with the word form of the token at the head of the buffer. [sent-155, score-0.616]
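
This manual selection can be sketched as a filter over the pairwise combinations: every conjunction of two lexical (word-form) features is dropped unless it is an explicitly whitelisted pair, such as the stack-top word form combined with the buffer-head word form. Feature names here are illustrative.

```python
def keep_combination(f1, f2, lexical_features, allowed_lexical_pairs):
    # f1, f2: names of the two base features being combined.
    # lexical_features: set of word-form feature names.
    # allowed_lexical_pairs: e.g. {frozenset({"stack_top.form", "buffer_head.form"})}
    if f1 in lexical_features and f2 in lexical_features:
        return frozenset({f1, f2}) in allowed_lexical_pairs
    return True   # non-lexical combinations are always kept
```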

75 For comparison we have included the results from using the standard classifier in the MaltParser, i.e. [sent-159, score-0.065]

76 6 Results with optimization The results presented above are suboptimal for the SVMs because default parameters have been used for these, and optimizing these can improve accuracy. (Footnote 3: In all tables statistical significance is marked with †.) [sent-165, score-0.055]

77 Manual feature selection has been used for languages marked with an *. [sent-170, score-0.135]

78 In CoNLL-X both the hyperparameters for the SVMs and the features have been optimized. [sent-173, score-0.14]

79 Here we do not do feature selection but use the features used by the MaltParser in CoNLL-X (see footnote 4). [sent-174, score-0.201]

80 The only hyperparameter for CW classification is the number of iterations. [sent-175, score-0.145]

81 Although the manual feature selection has been shown to decrease accuracy, it has been used for some languages to reduce the size of the model. [sent-177, score-0.175]

82 We see that even though the feature sets used are optimized for the SVMs, there are not big differences between the parsers that use SVMs and the parsers that use CW classification. [sent-179, score-0.221]

83 In general, though, the parsers with SVMs do better than the parsers with CW classifiers, and the difference seems to be biggest on the languages where we did manual feature selection. [sent-180, score-0.422]

84 6 Conclusion We have shown that using confidence-weighted classifiers with transition-based dependency parsing yields results comparable with the state-of-the-art results achieved with Support Vector Machines, with faster training and parsing times. [sent-181, score-0.707]

85 Currently we need a very high number of features to achieve these results, and we have shown that pruning this big feature set uncritically hurts the performance of the confidence-weighted classifiers. [sent-182, score-0.572]

86 (Footnote 4: Available at http://maltparser.org/conll/conllx/) [sent-183, score-0.084]

87 7 Future work Currently the biggest challenge in the approach outlined here is the very high number of features needed to achieve good results. [sent-184, score-0.117]

88 A possible solution is to use kernels with confidence-weighted classification in the same way they are used with the SVMs. [sent-185, score-0.116]

89 Another possibility is to extend the feature set in a more critical way than is done now. [sent-186, score-0.081]

90 This feature does not convey any information that the POS-tag feature itself does not. [sent-188, score-0.081]

91 All in all, a lot of non-informative features are added as things are now. [sent-190, score-0.066]

92 We have not yet tried to use automatic feature selection to select only the combinations that increase accuracy. [sent-191, score-0.247]

93 We will also try to do feature selection on a more general level, as this can boost accuracy a lot. [sent-192, score-0.175]

94 The results in Table 3 are obtained with the features optimized for the SVMs. [sent-193, score-0.108]

95 These are not necessarily the optimal features for the CW-classifiers. [sent-194, score-0.066]

96 Another comparison we would like to do is with linear SVMs. [sent-195, score-0.065]

97 Unlike the polynomial-kernel SVMs used as default in the MaltParser, linear SVMs can be trained in linear time (Joachims, 2006). [sent-196, score-0.311]

98 Trying to use the same extended feature set we use with the CW-classifiers with a linear SVM would provide an interesting comparison. [sent-197, score-0.146]

99 Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. [sent-206, score-0.146]

100 Malteval: An evaluation and visualization tool for dependency parsing. [sent-266, score-0.119]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('maltparser', 0.33), ('svms', 0.243), ('crammer', 0.233), ('nivre', 0.214), ('cw', 0.211), ('dredze', 0.192), ('classifiers', 0.17), ('confidenceweighted', 0.168), ('buffer', 0.158), ('parsing', 0.143), ('pruning', 0.131), ('online', 0.121), ('joakim', 0.119), ('dependency', 0.119), ('mcdonald', 0.119), ('jens', 0.117), ('transitions', 0.115), ('koby', 0.114), ('hall', 0.111), ('danish', 0.106), ('token', 0.102), ('perceptron', 0.099), ('nilsson', 0.096), ('las', 0.096), ('stack', 0.093), ('mira', 0.09), ('conl', 0.084), ('malteval', 0.084), ('uncritically', 0.084), ('iterations', 0.082), ('wise', 0.082), ('johan', 0.081), ('feature', 0.081), ('fernando', 0.08), ('combinations', 0.079), ('buchholz', 0.079), ('ryan', 0.078), ('oracle', 0.078), ('cutoff', 0.077), ('polynomial', 0.077), ('hyper', 0.074), ('classification', 0.071), ('svm', 0.069), ('hurts', 0.068), ('features', 0.066), ('classifier', 0.065), ('linear', 0.065), ('attachment', 0.061), ('big', 0.058), ('marsi', 0.057), ('machines', 0.056), ('configurations', 0.055), ('default', 0.055), ('transition', 0.055), ('selection', 0.054), ('updated', 0.053), ('multiclass', 0.053), ('biggest', 0.051), ('horizontal', 0.051), ('cm', 0.051), ('pop', 0.049), ('kernel', 0.049), ('tenth', 0.048), ('tried', 0.048), ('faster', 0.048), ('training', 0.047), ('arcs', 0.047), ('ct', 0.046), ('dotted', 0.045), ('kernels', 0.045), ('york', 0.045), ('arc', 0.043), ('push', 0.043), ('cs', 0.042), ('optimized', 0.042), ('parser', 0.042), ('updates', 0.041), ('try', 0.04), ('manual', 0.04), ('leads', 0.04), ('parsers', 0.04), ('head', 0.038), ('onto', 0.038), ('weight', 0.038), ('smallest', 0.037), ('ci', 0.037), ('ofer', 0.037), ('curacy', 0.037), ('mwh', 0.037), ('eryi', 0.037), ('lection', 0.037), ('aell', 0.037), ('isr', 0.037), ('morocco', 0.037), ('transitionbased', 0.037), ('yoshua', 0.037), ('comparable', 0.037), ('limit', 0.036), ('wn', 0.036), ('confidence', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

Author: Martin Haulrich

Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidence-weighted classification.

2 0.2145578 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii

Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n^2), the experimental results demonstrated an average parsing time not much slower than O(n).

3 0.17706279 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

Author: Carlos Gomez-Rodriguez ; Joakim Nivre

Abstract: Finding a class of structures that is rich enough for adequate linguistic representation yet restricted enough for efficient computational processing is an important problem for dependency parsing. In this paper, we present a transition system for 2-planar dependency trees (trees that can be decomposed into at most two planar graphs) and show that it can be used to implement a classifier-based parser that runs in linear time and outperforms a state-of-the-art transition-based parser on four data sets from the CoNLL-X shared task. In addition, we present an efficient method for determining whether an arbitrary tree is 2-planar and show that 99% or more of the trees in existing treebanks are 2-planar.

4 0.16278379 99 acl-2010-Efficient Third-Order Dependency Parsers

Author: Terry Koo ; Michael Collins

Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.

5 0.14844255 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

6 0.14482853 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

7 0.11888769 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

8 0.11854349 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

9 0.093432315 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing

10 0.086776823 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers

11 0.081408314 133 acl-2010-Hierarchical Search for Word Alignment

12 0.077557832 130 acl-2010-Hard Constraints for Grammatical Function Labelling

13 0.076881118 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

14 0.076811381 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

15 0.075521909 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

16 0.073847041 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

17 0.069676906 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms

18 0.066332243 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

19 0.065839216 114 acl-2010-Faster Parsing by Supertagger Adaptation

20 0.065744638 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.201), (1, -0.029), (2, 0.066), (3, 0.034), (4, -0.066), (5, -0.075), (6, 0.056), (7, 0.012), (8, -0.038), (9, 0.274), (10, -0.244), (11, 0.049), (12, -0.056), (13, 0.132), (14, 0.1), (15, -0.024), (16, -0.015), (17, -0.019), (18, 0.004), (19, -0.049), (20, 0.042), (21, 0.001), (22, 0.024), (23, -0.08), (24, -0.012), (25, -0.036), (26, 0.062), (27, 0.046), (28, -0.024), (29, 0.101), (30, 0.046), (31, 0.042), (32, 0.011), (33, 0.106), (34, -0.045), (35, 0.062), (36, -0.05), (37, 0.006), (38, -0.003), (39, 0.029), (40, 0.005), (41, 0.051), (42, 0.015), (43, -0.069), (44, 0.009), (45, 0.063), (46, -0.033), (47, 0.029), (48, -0.075), (49, -0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95782715 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

Author: Martin Haulrich

Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidence-weighted classification.

2 0.82829195 99 acl-2010-Efficient Third-Order Dependency Parsers

Author: Terry Koo ; Michael Collins

Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.

3 0.82293183 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii

Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n^2), the experimental results demonstrated an average parsing time not much slower than O(n).

4 0.77759093 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

5 0.75791538 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing

Author: Manabu Sassano ; Sadao Kurohashi

Abstract: We investigate active learning methods for Japanese dependency parsing. We propose active learning methods of using partial dependency relations in a given sentence for parsing and evaluate their effectiveness empirically. Furthermore, we utilize syntactic constraints of Japanese to obtain more labeled examples from precious labeled ones that annotators give. Experimental results show that our proposed methods improve considerably the learning curve of Japanese dependency parsing. In order to achieve an accuracy of over 88.3%, one of our methods requires only 34.4% of labeled examples as compared to passive learning.

6 0.74765038 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

7 0.72794634 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

8 0.65366387 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

9 0.56859672 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

10 0.47746062 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

11 0.47497264 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

12 0.46820647 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers

13 0.45595902 130 acl-2010-Hard Constraints for Grammatical Function Labelling

14 0.45292497 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning

15 0.45167601 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

16 0.43207577 256 acl-2010-Vocabulary Choice as an Indicator of Perspective

17 0.41865563 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning

18 0.4185816 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

19 0.41467094 114 acl-2010-Faster Parsing by Supertagger Adaptation

20 0.41029313 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.029), (25, 0.05), (44, 0.016), (59, 0.13), (73, 0.037), (76, 0.382), (78, 0.019), (83, 0.073), (84, 0.019), (98, 0.166)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8173036 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

Author: Martin Haulrich

Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidence-weighted classification.

2 0.79486442 139 acl-2010-Identifying Generic Noun Phrases

Author: Nils Reiter ; Anette Frank

Abstract: This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rulelike knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level information. Generic expressions have been studied extensively in formal semantics. Building on this work, we explore a corpus-based learning approach for identifying generic NPs, using selections of linguistically motivated features. Our results perform well above the baseline and existing prior work.

3 0.76428771 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

Author: Peng Li ; Jing Jiang ; Yinglin Wang

Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.

4 0.6653105 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

Author: Mohit Bansal ; Dan Klein

Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.

5 0.55869687 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Author: Bharat Ram Ambati

Abstract: Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, then the usefulness of these statistical systems improve a lot. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. We first describe the importance of this constraint considering Machine Translation systems which use dependency parser output, as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech.

6 0.54746687 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

7 0.54667771 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

8 0.54553896 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

9 0.54524612 133 acl-2010-Hierarchical Search for Word Alignment

10 0.54073244 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

11 0.5374102 130 acl-2010-Hard Constraints for Grammatical Function Labelling

12 0.53691053 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

13 0.5368191 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

14 0.53661144 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

15 0.53455687 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

16 0.5333339 162 acl-2010-Learning Common Grammar from Multilingual Corpus

17 0.52911872 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

18 0.52785617 195 acl-2010-Phylogenetic Grammar Induction

19 0.52768081 114 acl-2010-Faster Parsing by Supertagger Adaptation

20 0.52695537 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers