acl acl2012 acl2012-213 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly, we utilize rich high-order features defined over a view of large scope and an additional large raw corpus. Secondly, our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. [sent-2, score-0.714]
2 The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. [sent-3, score-0.22]
3 Based on the dependency language model, we represent a set of features for the parsing model. [sent-4, score-0.457]
4 Finally, the features are efficiently integrated into the parsing model during decoding using beam search. [sent-5, score-0.429]
5 The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data. [sent-10, score-0.187]
6 In the graph-based models, dependency parsing is treated as a structured prediction problem in which the graphs are usually represented as factored structures. [sent-16, score-0.388]
7 How to enrich high-order feature representations without increasing the decoding complexity for graph-based models becomes a very challenging problem in the dependency parsing task. [sent-26, score-0.584]
8 The parsing model searches for the final dependency trees by considering both the original scores and the DLM scores. [sent-35, score-0.458]
9 In our approach, the DLM is built on a large amount of auto-parsed data, which is processed by an original first-order parser (McDonald et al. [sent-36, score-0.187]
10 The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model. [sent-39, score-0.237]
11 Our new parsing model can utilize rich high-order feature representations without increasing the complexity. [sent-41, score-0.286]
12 In summary, we make the following contributions: • We utilize the dependency language model to enhance the graph-based parsing model. [sent-44, score-0.434]
13 The new parsing model uses the rich high-order features defined over a view of large scope and an additional large raw corpus, but without increasing the decoding complexity. [sent-46, score-0.292]
14 Our parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data. [sent-47, score-0.187]
15 Shen et al. (2008) proposed a dependency language model (DLM) to exploit long-distance word relations for SMT. [sent-53, score-0.22]
16 The N-gram DLM predicts the next child of a head based on the N − 1 immediate previous children and the head itself. [sent-54, score-0.251]
17 We denote an input sentence as x = (x0, x1, . . . , xn), where x0 = ROOT and does not depend on any other token in x and each token xi refers to a word. [sent-63, score-0.244]
18 For each xh ∈ H(y), we have a dependency structure Dh = (xLk, . . . , xL1, xh, xR1, . . . , xRm). [sent-65, score-0.423]
19 xL1 are the children on the left side from the farthest to the nearest and xR1 . [sent-74, score-0.209]
20 xRm are the children on the right side from the nearest to the farthest. [sent-77, score-0.216]
21 For a dependency tree, we calculate the probability as follows: P(y) = ∏_{xh ∈ H(y)} P(Dh) (3). In this paper, we use a linear model to calculate the scores for the parsing models (defined in Section 3. [sent-87, score-0.626]
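To make the factorization concrete, the following sketch computes P(y) as the product of the per-head probabilities P(Dh), with each child conditioned on its N − 1 nearer siblings on the same side and the head. The class and function names, the dictionary-backed probability table, and the small floor used for unseen events are illustrative assumptions, not the paper's implementation (the paper avoids smoothing by mapping probabilities into features).

    class NGramDLM:
        """A toy N-gram dependency language model: P(child | N-1 nearer siblings, head)."""
        def __init__(self, n, probs):
            self.n = n            # the N of the N-gram DLM
            self.probs = probs    # dict mapping (child, context_tuple) -> probability

        def child_prob(self, child, previous_children, head):
            # HIS: the N-1 immediate previous (nearer) children plus the head itself
            start = max(0, len(previous_children) - (self.n - 1))
            context = tuple(previous_children[start:]) + (head,)
            return self.probs.get((child, context), 1e-8)  # tiny floor for unseen events (a simplification)

    def tree_dlm_prob(dlm, tree):
        """P(y) = product over heads xh in H(y) of P(Dh); children are generated from
        nearest to farthest on each side. `tree` maps each head to a pair
        (left_children, right_children), both ordered nearest-first."""
        p = 1.0
        for head, (left, right) in tree.items():
            for side in (left, right):
                for i, child in enumerate(side):
                    p *= dlm.child_prob(child, side[:i], head)
        return p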
22 scoreDLM(y) = fDLM · wDLM (4) 3 Parsing with dependency language model In this section, we propose a parsing model which includes the dependency language model by extending the model of McDonald et al. [sent-92, score-0.608]
23 1 Graph-based parsing model The graph-based parsing model aims to search for the maximum spanning tree (MST) in a graph (McDonald et al. [sent-95, score-0.435]
24 We write (xi, xj) ∈ y if there is a dependency in tree y from word xi to word xj (xi is the head and xj is the dependent). [sent-97, score-0.468]
25 Let T(Gx) be the set of all the subgraphs of Gx that are valid dependency trees (McDonald and Nivre, 2007) for sentence x. [sent-105, score-0.22]
26 The formulation defines the score of a dependency tree y ∈ T(Gx) to be the sum of the edge scores, s(x, y) = Σ_{g ∈ y} score(w, x, g) (5), where g is a spanning subgraph of y. [sent-106, score-0.319]
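As a small illustration of this factored scoring, the sketch below sums a sparse dot product over the factors of a tree; the weights dictionary and the feature function are placeholders rather than the features actually used by MSTParser.

    def factor_score(weights, features):
        """score(w, x, g) = w . f(x, g): a sparse dot product for a single factor g."""
        return sum(weights.get(f, 0.0) for f in features)

    def tree_score(weights, feature_fn, factors):
        """s(x, y) = sum over factors g in y of score(w, x, g). In the first-order model
        each factor is a single (head, dependent) arc; in higher-order models a factor
        may cover adjacent dependencies."""
        return sum(factor_score(weights, feature_fn(g)) for g in factors)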
27 g can be a single dependency or adjacent dependencies. [sent-107, score-0.22]
28 The parsing model finds a maximum spanning tree (MST), which is the highest scoring tree in T(Gx). [sent-111, score-0.312]
29 Then for a given sentence x, we find y*DLM. After adding the DLM scores, the new parsing model can capture richer information. [sent-114, score-0.168]
30 In the original first-order parsing model, we only utilize the information of the single arc (xh, xL(k−1)) for xL(k−1), as shown in Figure 1-(a). [sent-116, score-0.214]
31 If we use a 3-gram DLM, we can utilize the additional information of the two previous children (nearer to xh than xL(k−1)): xL(k−2) and xL(k−3), as shown in Figure 1-(b). [sent-117, score-0.352]
32 Figure 1: Adding the DLM scores to the parsing model. [sent-118, score-0.203]
33 For each child xch on the left side, we have PLc(xch|HIS), where HIS refers to the N − 1 immediate previous right children and the head xh. [sent-126, score-0.48]
34 The feature templates are outlined in Table 1, where TYPE refers to one of the types: PL or PR, h_pos refers to the part-of-speech tag of xh, h_word refers to the lexical form of xh, ch_pos refers to the part-of-speech tag of xch, and ch_word refers to the lexical form of xch. [sent-131, score-1.424]
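Table 1 itself is not reproduced in this excerpt, so the sketch below only shows the general shape of such DLM-based features: a TYPE (PL or PR), a label derived from the DLM probability, and combinations with the head/child POS tags and word forms. The exact combinations and the probability label are assumptions, not the paper's template set.

    def dlm_features(dlm_type, prob_label, h_pos, h_word, ch_pos, ch_word):
        """Instantiate DLM-based feature strings for one (head, child) pair.
        dlm_type is 'PL' or 'PR'; prob_label stands in for however the DLM probability
        is discretized; the combinations below are illustrative."""
        base = dlm_type + ':' + prob_label
        return [
            base,
            base + ':h_pos=' + h_pos,
            base + ':h_word=' + h_word,
            base + ':ch_pos=' + ch_pos,
            base + ':ch_word=' + ch_word,
            base + ':h_pos=' + h_pos + ':ch_pos=' + ch_pos,
            base + ':h_word=' + h_word + ':ch_word=' + ch_word,
        ]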
35 1 Rescoring We add the DLM-based features into the decoding procedure by using the rescoring technique used in (Shen et al. [sent-135, score-0.299]
36 We can use an original parser to produce the K-best list. [sent-137, score-0.187]
37 However, because the performance of this method is restricted to the K-best list, we may have to set K to a high number in order to find the best parsing tree (with DLM) or a tree acceptably close to the best (Shen et al. [sent-139, score-0.258]
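A minimal sketch of the rescoring variant, assuming the baseline parser returns a K-best list of (tree, score) pairs and that the DLM-based score is simply added to the baseline score; both assumptions are ours.

    def rescore_kbest(kbest, dlm_score):
        """Rescoring: pick, from the baseline parser's K-best list, the tree whose
        baseline score plus DLM-based score is highest. Accuracy is limited by
        whether a good tree appears in the K-best list at all."""
        return max(kbest, key=lambda ts: ts[1] + dlm_score(ts[0]))[0]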
38 2 Intersect Then, we add the DLM-based features into the decoding algorithm directly. [sent-142, score-0.277]
39 The algorithm is an extension of the parsing algorithm of Eisner (1996), which is a modified version of the CKY chart parsing algorithm. [sent-146, score-0.498]
40 The parsing algorithm independently parses the left and right dependents of a word and combines them later. [sent-149, score-0.361]
41 There are two types of chart items (McDonald and Pereira, 2006): 1) a complete item in which the words are unable to accept more dependents in a certain direction; and 2) an incomplete item in which the words can accept more dependents in a certain direction. [sent-150, score-0.441]
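A minimal representation of such chart items, with field names of our choosing:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ChartItem:
        """An Eisner-style item over span (s, t). Complete items cannot take more
        dependents in their direction; incomplete items still can. direction='right'
        means the head is at s, direction='left' means the head is at t."""
        s: int
        t: int
        complete: bool
        direction: str
        score: float = 0.0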
42 The direction of a dependency is from the head to the dependent. [sent-152, score-0.307]
43 The right (left) direction indicates the dependent is on the right (left) side of the head. [sent-153, score-0.209]
44 Figure 2 illustrates the cubic parsing actions of the algorithm (Eisner, 1996) in the right direction, where s, r, and t refer to the start and end indices of the chart items. [sent-156, score-0.442]
45 In Figure 2-(a), all the items on the left side are complete and the algorithm creates the incomplete item (trapezoid on the right side) of s–t. [sent-157, score-0.374]
46 This action builds a dependency relation from s to t. [sent-158, score-0.22]
47 In Figure 2-(b), the item of s–r is incomplete and the item of r–t is complete. [sent-159, score-0.2]
48 Figure 2: Cubic parsing actions of Eisner (Eisner, 1996). Then, we add the DLM-based features into the parsing actions. [sent-165, score-0.497]
49 Because the parsing algorithm is in the bottom-up style, the nearer children are generated earlier than the farther ones of the same head. [sent-166, score-0.371]
50 Thus, we calculate the left or right side probability for a new child when a new dependency relation is built. [sent-167, score-0.42]
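The sketch below shows how that lookup can be done at the moment an arc is created, reusing the NGramDLM sketch above; because decoding is bottom-up, the nearer siblings of the new child are already fixed when the arc is built. The helper name and the PL/PR labelling are ours.

    def new_arc_dlm_prob(dlm, head, nearer_siblings, new_child, side):
        """When a new dependency relation (head -> new_child) is built, look up the
        left- or right-side DLM probability of the new child given its nearer
        siblings on the same side and the head."""
        p = dlm.child_prob(new_child, nearer_siblings, head)
        return ('PL' if side == 'left' else 'PR'), p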
51 Figure 3: Add DLM-based features in cubic parsing. We use beam search to choose the one having the overall best score as the final parse, where K spans are built at each step (Zhang and Clark, 2008). [sent-172, score-0.357]
52 At each step, we perform the parsing actions in the current beam and then choose the best K resulting spans for the next step. [sent-173, score-0.285]
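One beam step can be sketched as follows; the representation of spans and the expand function are assumptions, and only the keep-the-best-K pruning is the point.

    import heapq

    def beam_step(beam, expand, k):
        """Apply the parsing actions to every (score, span) pair in the current beam,
        then keep only the K highest-scoring resulting spans for the next step."""
        candidates = []
        for scored_span in beam:
            candidates.extend(expand(scored_span))
        return heapq.nlargest(k, candidates, key=lambda pair: pair[0])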
53 1 Baseline parser We implement our parsers based on the MSTParser, a freely available implementation of the graph-based model proposed by McDonald and Pereira (2006). [sent-178, score-0.187]
54 We train a first-order parser on the training data (described in Section 6. [sent-179, score-0.187]
55 2 Build dependency language models We use a large amount of unannotated data to build the dependency language model. [sent-184, score-0.532]
56 Given the dependency trees, we estimate the probability distribution by relative frequency: Pu(xch|HIS) = count(xch, HIS) / Σ_{x′ch} count(x′ch, HIS) (7). No smoothing is performed because we use the mapping function for the feature representations. [sent-189, score-0.254]
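A small sketch of this relative-frequency estimation over the auto-parsed trees; the tree representation (each head mapped to nearest-first child lists) mirrors the earlier sketches and is an assumption.

    from collections import defaultdict

    def estimate_dlm_probs(parsed_trees, n):
        """Pu(x_ch | HIS) = count(x_ch, HIS) / sum over x'_ch of count(x'_ch, HIS).
        Each tree maps a head to (left_children, right_children), children ordered
        from nearest to farthest."""
        pair_counts = defaultdict(int)
        his_counts = defaultdict(int)
        for tree in parsed_trees:
            for head, (left, right) in tree.items():
                for side in (left, right):
                    for i, child in enumerate(side):
                        his = tuple(side[max(0, i - (n - 1)):i]) + (head,)
                        pair_counts[(child, his)] += 1
                        his_counts[his] += 1
        return {pair: c / his_counts[pair[1]] for pair, c in pair_counts.items()}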
57 The “Penn2Malt” tool was used to convert the data into dependency structures using a standard set of head rules (Yamada and Matsumoto, 2003). [sent-206, score-0.263]
58 We used the MXPOST tagger trained on the training data to assign part-of-speech tags and used the Baseline parser to process the sentences of the BLLIP corpus. [sent-211, score-0.187]
59 We also used the “Penn2Malt” tool to convert the data and created a data split: files 1-270 and files 400-931 for training, files 271-300 for testing, and files 301-325 for development. [sent-214, score-0.208]
60 We used the segmenter and POS tagger of Kruengkrai et al. (2009), trained on the training data, to perform word segmentation and POS tagging and used the Baseline parser to parse all the sentences in the data. [sent-233, score-0.224]
61 2 Features for basic and enhanced parsers Previous studies have defined four types of features: (FT1) the first-order features defined in McDonald et al. [sent-235, score-0.262]
62 (2005), (FT2SB) the second-order parent-siblings features defined in McDonald and Pereira (2006), (FT2GC) the second-order parent-child-grandchild features defined in Carreras (2007), and (FT3) the third-order features defined in Koo and Collins (2010). [sent-236, score-0.207]
63 Table 2 shows the feature settings of the systems, where MST1/2 refers to the basic first-/second-order parser and MSTB1/2 refers to the enhanced first-/second-order parser. [sent-239, score-0.671]
64 Table 2: Feature settings of the Baseline and enhanced systems (combinations of FT1, FT2SB, FT2GC, and FT3). We measured the parser quality by the unlabeled attachment score (UAS), i.e., the percentage of words that are assigned the correct heads. [sent-243, score-0.187]
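For reference, UAS can be computed as below; whether punctuation tokens are excluded varies between setups, and this sketch simply counts every word.

    def unlabeled_attachment_score(gold_heads, pred_heads):
        """UAS: percentage of words whose predicted head index equals the gold head index.
        Both arguments are lists of sentences; each sentence is a list of head indices,
        one per word (the artificial ROOT itself is excluded)."""
        correct = total = 0
        for gold, pred in zip(gold_heads, pred_heads):
            for g, p in zip(gold, pred):
                correct += int(g == p)
                total += 1
        return 100.0 * correct / total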
65 In the following experiments, we used “Inter” to refer to the parser with Intersect, and “Rescore” to refer to the parser with Rescoring. [sent-246, score-0.374]
66 The parsing performance generally increased as K increased. [sent-251, score-0.168]
67 The parser with Intersect always outperformed the one with Rescoring. [sent-252, score-0.187]
68 Table 3: The parsing times on the development set (seconds for all the sentences). Table 3 shows the parsing times of Intersect on the development set for English. [sent-260, score-0.336]
69 For Chinese, the parser obtained the best score when N=4. [sent-272, score-0.187]
70 The results are shown in Table 5, where DLM refers to adding the DLM-based features to the Baselines. [sent-277, score-0.261]
71 As in the English experiments, the parsers using the DLM-based features consistently outperformed the Baselines. [sent-291, score-0.196]
72 6 Compare with previous work on English Table 7 shows the performance of the graph-based systems that were compared, where McDonald06 refers to the second-order parser of McDonald and Pereira (2006), Koo08-standard refers to the second-order parser with the features defined in Koo et al. [sent-304, score-0.827]
73 (2008), Koo10-model1 refers to the third-order parser with model1 of Koo and Collins (2010), Koo08-dep2c refers to the second-order parser with cluster-based features of (Koo et al. [sent-305, score-0.827]
74 , 2008), Suzuki09 refers to the parser of Suzuki et al. [sent-306, score-0.379]
75 (2009), Chen09-ord2s refers to the second-order parser with subtree-based features of Chen et al. [sent-307, score-0.448]
76 (2009), and Zhou11 refers to the second-order parser with web-derived selectional preference features of Zhou et al. [sent-308, score-0.256]
77 However, their decoding complexities were higher than ours and we believe that the performance of our parser can be further enhanced by integrating their methods with our parser. [sent-315, score-0.377]
78 supervised graph-based parsers, S denotes the graph-based parsers with semi-supervised methods, and D denotes our new parsers. [sent-316, score-0.254]
79 7 Compare with previous work on Chinese Table 8 shows the comparative results, where Chen08 refers to the parser of (Chen et al. [sent-317, score-0.379]
80 , 2008), Zhao09 refers to the parser of (Zhao et al. [sent-319, score-0.379]
81 , 2009), and Chen09-ord2s refers to the second-order parser with subtree-based features of Chen et al. [sent-320, score-0.448]
82 Figure 5 shows the results for English, where the X-axis represents the number of children, the Y-axis represents the accuracies, OURS refers to MSTDLM1, and Baseline refers to MST1. [sent-329, score-0.384]
83 Figure 5: Improvement relative to numbers of children (X-axis: number of children; legend: Baseline, OURS). 8 Related work Several previous studies related to our work have been conducted. [sent-335, score-0.206]
84 Koo et al. (2008) used a clustering algorithm to produce word clusters on a large amount of unannotated data and represented new features based on the clusters for dependency parsing models. [sent-337, score-0.59]
85 Chen et al. (2009) proposed an approach that extracted partial tree structures from a large amount of data and used them as the additional features to improve dependency parsing. [sent-339, score-0.334]
86 They extended a Semi-supervised Structured Conditional Model (SSSCM) (Suzuki and Isozaki, 2008) to the dependency parsing problem and combined their method with the approach of Koo et al. [sent-343, score-0.388]
87 First, two parsers were used to parse the sentences in unannotated data. [sent-350, score-0.219]
88 They retrained a parser on newly parsed sentences and the original labeled data. [sent-352, score-0.187]
89 9 Conclusion We have presented an approach to utilizing the dependency language model to improve graph-based dependency parsing. [sent-354, score-0.44]
90 We represent new features based on the dependency language model and integrate them into the decoding algorithm directly using beam search. [sent-355, score-0.454]
91 Our approach enriches the feature representations without increasing the decoding complexity. [sent-356, score-0.196]
92 Dependency parsing with short dependency relations in unlabeled data. [sent-377, score-0.388]
93 Three new probabilistic models for dependency parsing: An exploration. [sent-399, score-0.22]
94 Dependency parsing and domain adaptation with LR models and parser ensembles. [sent-477, score-0.355]
95 A new string-to-dependency machine translation algorithm with a target dependency language model. [sent-481, score-0.261]
96 An empirical study of semi-supervised structured conditional models for dependency parsing. [sent-491, score-0.22]
97 Chinese dependency parsing with large scale automatically constructed case structures. [sent-503, score-0.388]
98 A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. [sent-509, score-0.22]
99 Cross language dependency parsing using a bilingual lexicon. IJCNLP2009. [sent-513, score-0.388]
100 Exploiting web-derived selectional preference to improve statistical dependency parsing. ACL-HLT2011. [sent-517, score-0.263]
wordName wordTfidf (topN-words)
[('dlm', 0.348), ('dependency', 0.22), ('pu', 0.206), ('xh', 0.203), ('intersect', 0.199), ('refers', 0.192), ('parser', 0.187), ('mcdonald', 0.18), ('ch', 0.174), ('koo', 0.171), ('parsing', 0.168), ('dh', 0.166), ('xch', 0.149), ('parsers', 0.127), ('decoding', 0.124), ('gx', 0.124), ('mstb', 0.124), ('plc', 0.124), ('xl', 0.107), ('children', 0.103), ('xlk', 0.099), ('suzuki', 0.097), ('unannotated', 0.092), ('chinese', 0.086), ('chart', 0.08), ('item', 0.076), ('ayr', 0.075), ('fdlm', 0.075), ('prc', 0.075), ('vx', 0.074), ('pl', 0.071), ('pereira', 0.071), ('features', 0.069), ('beam', 0.068), ('enhanced', 0.066), ('rescoring', 0.063), ('side', 0.061), ('chen', 0.06), ('bllip', 0.059), ('nearer', 0.059), ('dependents', 0.055), ('spanning', 0.054), ('xj', 0.054), ('kiyotaka', 0.052), ('cubic', 0.052), ('files', 0.052), ('nivre', 0.052), ('xi', 0.052), ('right', 0.052), ('items', 0.051), ('dlms', 0.05), ('mgaxx', 0.05), ('mxpost', 0.05), ('scoredlm', 0.05), ('subroot', 0.05), ('trapezoid', 0.05), ('yscore', 0.05), ('actions', 0.049), ('singapore', 0.049), ('incomplete', 0.048), ('eisner', 0.048), ('uas', 0.048), ('uchimoto', 0.048), ('conll', 0.047), ('utilize', 0.046), ('left', 0.045), ('tree', 0.045), ('arcs', 0.044), ('wenliang', 0.044), ('isozaki', 0.044), ('direction', 0.044), ('add', 0.043), ('crs', 0.043), ('xg', 0.043), ('head', 0.043), ('jun', 0.043), ('ctb', 0.043), ('pages', 0.043), ('child', 0.042), ('algorithm', 0.041), ('carreras', 0.041), ('pos', 0.041), ('improvements', 0.04), ('crammer', 0.04), ('shen', 0.04), ('representations', 0.038), ('ohio', 0.038), ('heads', 0.038), ('english', 0.038), ('gt', 0.037), ('segmentation', 0.037), ('columbus', 0.035), ('ngram', 0.035), ('scores', 0.035), ('buchholz', 0.035), ('kawahara', 0.035), ('mcclosky', 0.035), ('kruengkrai', 0.035), ('yd', 0.035), ('feature', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly, we utilize rich high-order features defined over a view of large scope and an additional large raw corpus. Secondly, our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
2 0.29234242 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.
3 0.23077855 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
4 0.22259112 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
5 0.18913926 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata
Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1
6 0.18780814 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
7 0.16501904 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
8 0.16265596 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
9 0.15039136 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
10 0.14576611 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
11 0.139972 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
12 0.13937467 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
13 0.13639276 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
14 0.11341988 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
15 0.11094315 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
16 0.10759962 71 acl-2012-Dependency Hashing for n-best CCG Parsing
17 0.10070937 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
18 0.09740027 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
19 0.09541893 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
20 0.094868906 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
topicId topicWeight
[(0, -0.276), (1, -0.053), (2, -0.265), (3, -0.212), (4, -0.104), (5, -0.114), (6, 0.029), (7, -0.117), (8, 0.097), (9, -0.004), (10, 0.131), (11, 0.145), (12, -0.038), (13, -0.035), (14, 0.008), (15, 0.054), (16, 0.03), (17, 0.073), (18, -0.036), (19, 0.043), (20, 0.022), (21, -0.082), (22, 0.009), (23, -0.078), (24, -0.078), (25, -0.021), (26, -0.02), (27, 0.136), (28, -0.018), (29, -0.022), (30, -0.121), (31, -0.098), (32, 0.028), (33, 0.049), (34, -0.108), (35, 0.037), (36, 0.071), (37, -0.006), (38, -0.0), (39, -0.038), (40, 0.034), (41, -0.035), (42, 0.028), (43, 0.016), (44, -0.065), (45, 0.049), (46, 0.025), (47, 0.019), (48, 0.026), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.96549124 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly, we utilize rich high-order features defined over a view of large scope and an additional large raw corpus. Secondly, our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
2 0.93725282 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.
3 0.85296297 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
4 0.84770459 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
Author: Emily Pitler
Abstract: Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.
5 0.82812876 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Author: Zhenghua Li ; Ting Liu ; Wanxiang Che
Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
6 0.77788079 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
7 0.75909609 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
8 0.71757698 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
10 0.66198516 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
11 0.6221022 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
12 0.60955119 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
13 0.5780983 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
14 0.57377774 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
15 0.53747499 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
16 0.4910816 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
17 0.48660865 83 acl-2012-Error Mining on Dependency Trees
18 0.4834975 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
19 0.46735698 71 acl-2012-Dependency Hashing for n-best CCG Parsing
20 0.45792374 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
topicId topicWeight
[(7, 0.019), (26, 0.028), (28, 0.061), (30, 0.033), (37, 0.052), (39, 0.037), (49, 0.018), (71, 0.074), (74, 0.03), (76, 0.22), (82, 0.029), (84, 0.023), (85, 0.037), (90, 0.154), (92, 0.042), (94, 0.021), (99, 0.044)]
simIndex simValue paperId paperTitle
1 0.82260984 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
Author: Xin Zhao ; Rishan Chen ; Kai Fan ; Hongfei Yan ; Xiaoming Li
Abstract: Mining retrospective events from text streams has been an important research topic. Classic text representation model (i.e., vector space model) cannot model temporal aspects of documents. To address it, we proposed a novel burst-based text representation model, denoted as BurstVSM. BurstVSM corresponds dimensions to bursty features instead of terms, which can capture semantic and temporal information. Meanwhile, it significantly reduces the number of non-zero entries in the representation. We test it via scalable event detection, and experiments in a 10-year news archive show that our methods are both effective and efficient.
same-paper 2 0.74589127 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly, we utilize rich high-order features defined over a view of large scope and an additional large raw corpus. Secondly, our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
3 0.67470986 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
4 0.67338371 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata
Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1
5 0.66738719 98 acl-2012-Finding Bursty Topics from Microblogs
Author: Qiming Diao ; Jing Jiang ; Feida Zhu ; Ee-Peng Lim
Abstract: Microblogs such as Twitter reflect the general public’s reactions to major events. Bursty topics from microblogs reveal what events have attracted the most online attention. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic. The former helps find eventdriven posts while the latter helps identify and filter out “personal” posts. Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. We also show some case studies that demonstrate the importance of considering both the temporal information and users’ personal interests for bursty topic detection from microblogs.
7 0.63017553 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
8 0.62967765 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
9 0.62862641 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
10 0.62728208 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
11 0.62602556 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
12 0.62602252 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
13 0.62600696 140 acl-2012-Machine Translation without Words through Substring Alignment
14 0.62522858 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
15 0.6235413 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
16 0.62228703 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
17 0.62204164 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
18 0.62124294 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
19 0.62102145 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
20 0.6198144 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization