acl acl2012 acl2012-106 knowledge-graph by maker-knowledge-mining

106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction


Source: pdf

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. [sent-7, score-0.737]

2 This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. [sent-8, score-0.211]

3 Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms. [sent-9, score-0.524]

4 1 Introduction: Transition-based parsing algorithms, such as shift-reduce algorithms (Nivre, 2004; Zhang and Clark, 2008), are widely used for dependency analysis because of their efficiency and comparatively good performance. [sent-10, score-0.564]

5 However, these parsers have one major problem: they can handle only local information. [sent-11, score-0.153]

6 Isozaki et al. (2004) pointed out that the drawbacks of the shift-reduce parser could be resolved by incorporating top-down information such as root finding. [sent-13, score-0.416]

7 This work presents an O(n2) top-down head-driven transition-based parsing algorithm which can parse complex structures that are not trivial for shift-reduce parsers. [sent-14, score-0.554]

8 The deductive system is very similar to Earley parsing (Earley, 1970). [sent-15, score-0.306]

9 The Earley prediction is tied to a particular grammar rule, but the proposed algorithm is data-driven, following the current trends of dependency parsing (Nivre, 2006; McDonald and Pereira, 2006; Koo et al. [sent-16, score-0.707]

10 To do the prediction without any grammar rules, we introduce a weighted prediction that predicts lower nodes from higher nodes with a statistical model. [sent-18, score-0.506]

11 To improve parsing flexibility in deterministic parsing, our top-down parser uses a beam search algorithm with dynamic programming (Huang and Sagae, 2010). [sent-22, score-0.877]

12 The complexity becomes O(n2 ∗ b), where b is the beam size. [sent-23, score-0.183]

13 To reduce prediction errors, we propose a lookahead technique based on a FIRST function, inspired by the LL(1) parser (Aho and Ullman, 1972). [sent-24, score-0.515]

14 Experimental results show that the proposed top-down parser achieves competitive results with other data-driven parsing algorithms. [sent-25, score-0.474]

15 2 Definition of Dependency Graph: A dependency graph is defined as follows. [sent-26, score-0.293]

16 Definition 1 (Dependency Graph) Given an input sentence W = n0 . . . nn, [sent-28, score-0.076]

17 where n0 is a special root node $, a directed graph is defined as GW = (VW, AW), [sent-31, score-0.442]

18 where VW = {0, 1, . . . , n} is a set of (indices of) nodes and AW ⊆ VW × VW is a set of directed arcs. [sent-34, score-0.141]

19 The set of arcs is a set of pairs (x, y), where x is a head and y is a dependent of x. [sent-35, score-0.182]

20 A directed graph GW = (VW, AW) is well-formed if and only if: there is no node x such that (x, 0) ∈ AW; [sent-37, score-0.311]

21 if (x, y) ∈ AW then there is no node x′ such that (x′, y) ∈ AW and x′ ≠ x. [sent-38, score-0.16]

22 These conditions are referred to as ROOT, SINGLE-HEAD, and ACYCLICITY, and we call a well-formed directed graph a dependency graph. [sent-43, score-0.364]

23 Definition 2 (PROJECTIVITY) A dependency graph GW = (VW, AW) is projective if and only if, [sent-45, score-0.433]

24 Figure 1: the non-weighted deductive system of the top-down dependency parsing algorithm, whose goal item is 3n : ⟨n + 1, 0, n + 1, s0⟩ : ∅; the wildcard symbol means “take anything”. [sent-74, score-0.519]

25 for every arc (x, y) ∈ AW and every node l such that x < l < y or y < l < x, there is a path x →∗ l or y →∗ l. [sent-75, score-0.211]

26 The proposed algorithm in this paper is for projective dependency graphs. [sent-76, score-0.475]

27 If a projective dependency graph is connected, we call it a dependency tree, and if not, a dependency forest. [sent-77, score-0.859]
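The following Python sketch (not from the paper; the function and variable names are invented for illustration) checks these conditions for arcs given as (head, dependent) index pairs over nodes 0..n, with node 0 as the artificial root $:

```python
def is_well_formed(n, arcs):
    """ROOT, SINGLE-HEAD and ACYCLICITY over nodes 0..n (node 0 is the root $)."""
    # ROOT: no arc may point at the artificial root node 0.
    if any(dep == 0 for _, dep in arcs):
        return False
    # SINGLE-HEAD: every dependent has at most one head.
    head = {}
    for h, d in arcs:
        if d in head:
            return False
        head[d] = h
    # ACYCLICITY: following head links upward never revisits a node.
    for start in range(n + 1):
        seen, node = set(), start
        while node in head:
            if node in seen:
                return False
            seen.add(node)
            node = head[node]
    return True


def is_projective(arcs):
    """PROJECTIVITY: every node l strictly between x and y of an arc (x, y)
    is reachable by a directed path from x or from y."""
    children = {}
    for h, d in arcs:
        children.setdefault(h, set()).add(d)

    def reachable(src, target):
        stack, seen = [src], {src}
        while stack:
            v = stack.pop()
            if v == target:
                return True
            for c in children.get(v, ()):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return False

    return all(reachable(x, l) or reachable(y, l)
               for x, y in arcs
               for l in range(min(x, y) + 1, max(x, y)))
```

For example, the arcs {(0, 2), (2, 1), (2, 3)} over n = 3 satisfy all four conditions, while adding (1, 3) would violate SINGLE-HEAD.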

28 3 Top-down Parsing Algorithm: Our proposed algorithm is a transition-based algorithm, which uses stack and queue data structures. [sent-78, score-0.358]

29 This algorithm formally uses the following state: ℓ : ⟨i, h, j, S⟩ : π, where ℓ is a step size and S is a stack of trees sd| . . . |s0, [sent-79, score-0.23]

30 where s0 is the top tree and d is a window size. [sent-82, score-0.079]

31 In the deterministic case, π is a singleton set except for the initial state. [sent-85, score-0.098]

32 This algorithm has four actions: predictx (predx), predicty (predy), scan, and complete (comp). [sent-86, score-0.281]

33 The deductive system of the top-down algorithm is shown in Figure 1. [sent-87, score-0.239]

34 The initial state p0 is initialized with the artificial root node n0. [sent-88, score-0.367]

35 This algorithm applies one action to each state, selected from the applicable actions in each step. [sent-89, score-0.336]

36 Each of the three kinds of actions, pred, scan, and comp, occurs n times, and this system takes 3n steps for a complete analysis. [sent-90, score-0.044]

37 Action predx puts a node nk onto stack S, selected from the input queue in the range i ≤ k < h, which is to the left of the root nh in the stack top. [sent-91, score-0.634]

38 Similarly, action predy puts a node nk onto stack S, selected from the input queue in the range h < i ≤ k < j, which is to the right of the root nh in the stack top. [sent-92, score-1.184]
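To make the parser state and the two prediction actions concrete, here is a small hedged sketch; the State fields and the candidate ranges follow the description above but are otherwise illustrative assumptions, not the paper's pseudo-code:

```python
from collections import namedtuple

# Illustrative parser state: step counter ell, span indices <i, h, j>,
# stack of trees S (top tree last), and predictor-state set pi.
State = namedtuple("State", "step i h j stack preds")


def predx_candidates(state):
    # predx: candidate nodes n_k with i <= k < h,
    # i.e. to the left of the root n_h of the stack-top tree.
    return range(state.i, state.h)


def predy_candidates(state):
    # predy: candidate nodes n_k with h < i <= k < j,
    # i.e. to the right of the root n_h of the stack-top tree.
    return range(max(state.i, state.h + 1), state.j)
```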

39 A state p with costs (cfw, cin) is preferred over a state p′ with costs (c′fw, c′in), written p ≻ p′, if and only if cfw < c′fw or cfw = c′fw ∧ cin < c′in (9). We prioritize the forward cost over the inside cost since the forward cost pertains to a longer action sequence and is better suited to evaluate hypothesis states than the inside cost (Nederhof, 2003). [sent-93, score-0.298]
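A minimal sketch of the preference relation in Eq. (9), under the assumption that both quantities are costs for which smaller is better:

```python
def preferred(p_cost, q_cost):
    """p is preferred over q iff its forward cost is smaller,
    with the inside cost used only to break ties."""
    (p_fw, p_in), (q_fw, q_in) = p_cost, q_cost
    return p_fw < q_fw or (p_fw == q_fw and p_in < q_in)

# Equivalently, beam items can be sorted by the tuple (forward cost, inside cost):
# beam.sort(key=lambda state: (state.forward_cost, state.inside_cost))
```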

40 4 FIRST Function for Lookahead: A top-down backtracking parser usually reduces backtracking by precomputing the set FIRST(·) (Aho and Ullman, 1972). [sent-95, score-0.349]

41 We define the set FIRST(·) for our top-down dependency parser: FIRST(t′) = {ld.t | ld ∈ lmdescendant(Tree, t′), Tree ∈ Corpus} (10), [sent-96, score-0.213]

42 where t′ is a POS-tag, Tree is a correct dependency tree in Corpus, the function lmdescendant(Tree, t′) returns the set of leftmost descendant nodes ld of the nodes in Tree whose POS-tag is t′, and ld.t is the POS-tag of ld. [sent-97, score-0.347]

43 Though our parser does not backtrack, it looks ahead when selecting possible child nodes at the prediction step by using the function FIRST. [sent-99, score-0.584]

44 Here, ni.t is the POS-tag of the node ni on the top of the queue, and nk.t is that of the candidate node nk. [sent-109, score-0.16]

45 If there are no nodes that satisfy the condition, our top-down parser creates new states for all nodes and pushes them into hypo in line 9 of Algorithm 1. [sent-112, score-0.45]
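The construction of FIRST(·) in Eq. (10) and its use as a prediction-time filter can be sketched as follows; the Node class, its attribute names, and the filtering condition (the queue-front POS-tag must be in FIRST of the candidate's POS-tag) are assumptions made for this illustration:

```python
from collections import defaultdict


class Node:
    """Minimal dependency-tree node: POS-tag, surface position, children."""
    def __init__(self, pos_tag, position, children=()):
        self.pos_tag, self.position, self.children = pos_tag, position, list(children)


def descendants(node):
    yield node
    for child in node.children:
        yield from descendants(child)


def build_first(corpus):
    """FIRST(t'): POS-tags of the leftmost descendants of nodes tagged t',
    collected over all correct training trees, cf. Eq. (10)."""
    first = defaultdict(set)
    for root in corpus:
        for node in descendants(root):
            leftmost = min(descendants(node), key=lambda d: d.position)
            first[node.pos_tag].add(leftmost.pos_tag)
    return first


def lookahead_ok(first, queue_front_tag, candidate_tag):
    """Keep a predicted child n_k only if the POS-tag at the front of the
    queue may begin a subtree headed by a node tagged like n_k."""
    return queue_front_tag in first.get(candidate_tag, set())
```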

46 6 Time Complexity: Our proposed top-down algorithm has three kinds of actions: scan, comp, and predict. [sent-113, score-0.321]

47 The scan and comp actions each occur n times when parsing a sentence of length n. [sent-114, score-0.585]

48 The predict action also occurs n times, each time selecting a child node from a node sequence in the input queue. [sent-115, score-0.489]

49 Thus, the algorithm takes the following number of steps for prediction: n + (n − 1) + · · · + 1 = ∑_{i=1}^{n} i = n(n + 1)/2. (11) [sent-116, score-0.122]

50 As the n2 term for prediction is the most dominant factor, the time complexity of the algorithm is O(n2), and that of the algorithm with beam search is O(n2 ∗ b). [sent-117, score-0.61]
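A tiny check of the arithmetic in Eq. (11), purely illustrative:

```python
def prediction_candidates(n):
    # n + (n - 1) + ... + 1 candidate children considered over the n predictions
    return sum(range(1, n + 1))

assert prediction_candidates(10) == 10 * 11 // 2   # n(n + 1)/2, hence O(n^2)
```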

51 7 Related Work: Alshawi (1996) proposed the head automaton, which recognizes an input sentence top-down. [sent-118, score-0.292]

52 Eisner and Satta (1999) showed that there is a cubic-time parsing algorithm for the formalism of head automaton grammars, which can be equivalently converted into split-head bilexical context-free grammars (SBCFGs) (McAllester, 1999; Johnson, 2007). [sent-119, score-0.665]

53 Although our proposed algorithm does not employ the formalism of SBCFGs, it creates left children before right children, implying that, like parsing algorithms on SBCFGs, it does not have spurious ambiguities. [sent-120, score-0.395]

54 The head-corner parsing algorithm (Kay, 1989) creates a dependency tree top-down, and in this respect our algorithm has a similar spirit. [sent-121, score-0.775]

55 Yamada and Matsumoto (2003) applied a shift-reduce algorithm to dependency analysis, which is known as the arc-standard transition-based algorithm (Nivre, 2004). [sent-122, score-0.585]

56 The arc-eager algorithm processes right-dependents top-down, but this does not involve the prediction of lower nodes from higher nodes. [sent-124, score-0.375]

57 Therefore, the arc-eager algorithm is a totally bottom-up algorithm. [sent-125, score-0.122]

58 Zhang and Clark (2008) proposed an approach combining the transition-based algorithm with the graph-based algorithm (McDonald and Pereira, 2006), which is the same approach as our combination of the stack-based and prediction models. [sent-126, score-0.427]

59 We used Yamada and Matsumoto (2003)’s head rules to convert phrase structure to dependency structure. [sent-129, score-0.352]

60 For the Chinese data we report the attachment score; complete is the sentence complete rate, and root is the correct root rate. [sent-130, score-0.388]

61 Figure 5: scatter plot of parsing time against input sentence length, comparing the top-down, 2nd-MST and shift-reduce parsers (beam size: 8, pred size: 5). We used the information of words and fine-grained POS-tags for features. [sent-132, score-0.616]

62 We used an early-update version of the averaged perceptron algorithm (Collins and Roark, 2004) for training the shift-reduce and top-down parsers. [sent-135, score-0.171]

63 The feature templates of Huang and Sagae (2010) were used for the stack-based model, and those of McDonald and Pereira (2006) for the 2nd-order prediction model. [sent-136, score-0.183]

64 The weighted prediction and stack-based models of the top-down parser were jointly trained. [sent-137, score-0.627]
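The joint training regime can be pictured roughly as follows: a single weight vector scores the union of stack-based and prediction features, and weights are updated with the early-update rule as soon as the gold partial derivation drops out of the beam. This is a hedged sketch under those assumptions; the feature extractors and the decoding loop are placeholders, not the paper's actual implementation:

```python
def joint_score(weights, state, stack_feats, pred_feats):
    # One linear model over the union of stack-based and prediction features
    # (feature spaces assumed disjoint for this sketch).
    feats = {**stack_feats(state), **pred_feats(state)}
    return sum(weights.get(f, 0.0) * v for f, v in feats.items()), feats


def early_update(weights, beam, gold_state, feats_of):
    """If the gold partial analysis fell out of the beam, update towards the
    gold features and away from the current best hypothesis, then stop
    decoding this sentence (Collins and Roark, 2004)."""
    if any(state == gold_state for state in beam):
        return False                      # gold still in the beam: keep decoding
    best = beam[0]
    for f, v in feats_of(gold_state).items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in feats_of(best).items():
        weights[f] = weights.get(f, 0.0) - v
    return True                           # early update performed
```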

65 8.1 Results for English Data: During training, we fixed the prediction size and beam size to 5 and 16, respectively, judged by preliminary experiments. [sent-139, score-0.488]

66 Table 2 reports oracle scores, choosing the best parse for each sentence on test data from the results of the top-down (beam 8, pred 5), shift-reduce (beam 8), and MST (2nd) parsers in Table 1. [sent-145, score-0.382]

67 After 25 iterations of perceptron training, we achieved 92.94 unlabeled accuracy for the top-down parser with the FIRST function on development data. [sent-147, score-0.049]

68 We achieved 93.01 unlabeled accuracy for the shift-reduce parser on development data. [sent-148, score-0.32]

69 We set the beam size to 8 for both parsers and the prediction size to 5 for the top-down parser. [sent-149, score-0.92]

70 We compared the top-down parsing algorithm with other data-driven parsing algorithms in Table 1. [sent-151, score-0.534]

71 The top-down parser achieved unlabeled accuracy comparable to the others, and outperformed them on the sentence complete rate. [sent-152, score-0.402]

72 On the other hand, the top-down parser was less accurate than the shift-reduce parser in root accuracy. [sent-153, score-0.444]

73 In the examples of Table 4, the marked portion is the head of the underlined portion. [sent-158, score-0.139]

74 In step 0, the top-down parser predicts a child node, the root node of the complete tree, using little syntactic information, which may lead to errors in root node selection. [sent-161, score-1.116]

75 Therefore, we think that it is important to seek more suitable features for the prediction in future work. [sent-162, score-0.183]

76 Figure 5 presents the parsing time against sentence length. [sent-163, score-0.227]

77 Our proposed top-down parser is theoretically slower than the shift-reduce parser, and Figure 5 empirically confirms this trend. [sent-164, score-0.57]

78 This indicates that the parses produced by each parser are different from each other. [sent-167, score-0.285]

79 However, the gains obtained by the combination of the top-down and 2nd-MST parsers are smaller than those of other combinations. [sent-168, score-0.112]

80 This is because the top-down parser uses the same features as the 2nd-MST parser, and these are more effective than those of the stack-based model. [sent-169, score-0.285]

81 It is worth noting that, as shown in Figure 5, our O(n2 ∗ b) (b = 8) top-down parser is much faster than O(n3 ∗ b) (b = 8) Eisner-Satta-style top-down CKY parsing. [sent-170, score-0.285]

82 Following the English experiments, the shift-reduce parser was trained with a beam size of 16, and the top-down parser was trained with a beam size of 16 and a prediction size of 5. [sent-173, score-1.302]

83 Table 3 shows the results on the Chinese test data, setting the beam size to 8 for both parsers and the prediction size to 5 for the top-down parser. [sent-174, score-0.6]

84 8.3 Analysis of Results: Table 4 shows two interesting results, on which the top-down parser is superior to either the shift-reduce parser or the 2nd-MST parser. [sent-177, score-0.729]

85 The first sentence (no. 717) contains an adverbial clause structure between the subject and the main verb. [sent-179, score-0.046]

86 The top-down parser is able to handle the long-distance dependency, while the shift-reduce parser cannot correctly analyze it. [sent-180, score-0.824]

87 The effectiveness on clause structures implies that our head-driven parser may also handle well the non-projective structures introduced by Johansson's head rules (Johansson and Nugues, 2007). [sent-181, score-0.595]

88 The second sentence (no. 127) contains a coordination structure, which is difficult for bottom-up parsers to handle, but the top-down parser handles it well because its top-down prediction globally captures the coordination. [sent-183, score-0.661]

89 9 Conclusion: This paper presents a novel head-driven parsing algorithm and empirically shows that it is as practical as other dependency parsing algorithms. [sent-184, score-0.591]

90 Our head-driven parser has potential for handling non-projective structures better than other non-projective dependency algorithms (McDonald et al. [sent-185, score-0.638]

91 We are in the process of extending our head-driven parser for non-projective structures as our future work. [sent-188, score-0.327]

92 Efficient parsing for bilexical context-free grammars and head automaton grammars. [sent-227, score-0.543]

93 A deterministic word dependency analyzer enhanced with preference learning. [sent-258, score-0.311]

94 Transforming projective bilexical dependency grammars into efficiently-parsable CFGs with unfold-fold. [sent-271, score-0.491]

95 Tree-based deterministic dependency parsing — an application to Nivre's method. In Proc. [sent-284, score-0.5]

96 the 48th ACL 2010 Short Papers, pages 189–193, July. [sent-285, score-0.051]

97 A reformulation of Eisner and Satta's cubic time parser for split head automata grammars. [sent-308, score-0.52]

98 the ACL Workshop Incremental Parsing: Bringing Engineering and Cognition Together, pages 50–57. [sent-353, score-0.051]

99 An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. [sent-373, score-0.349]

100 A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing using beam-search. [sent-386, score-0.402]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('parser', 0.285), ('vw', 0.22), ('dependency', 0.213), ('parsing', 0.189), ('prediction', 0.183), ('predy', 0.183), ('beam', 0.183), ('aw', 0.168), ('node', 0.16), ('scan', 0.159), ('topdown', 0.159), ('predx', 0.146), ('projective', 0.14), ('head', 0.139), ('root', 0.131), ('queue', 0.128), ('shiftreduce', 0.128), ('algorithm', 0.122), ('deductive', 0.117), ('parsers', 0.112), ('earley', 0.11), ('zsta', 0.11), ('stack', 0.108), ('comp', 0.108), ('mcdonald', 0.103), ('deterministic', 0.098), ('nivre', 0.092), ('actions', 0.091), ('bilexical', 0.087), ('action', 0.085), ('koo', 0.084), ('aho', 0.082), ('graph', 0.08), ('tree', 0.079), ('automaton', 0.077), ('pred', 0.073), ('cfw', 0.073), ('headdriven', 0.073), ('lmdescendant', 0.073), ('sbcfgs', 0.073), ('ullman', 0.073), ('directed', 0.071), ('nodes', 0.07), ('gw', 0.07), ('pereira', 0.069), ('nonprojective', 0.064), ('cin', 0.064), ('backtrack', 0.064), ('hayashi', 0.064), ('sagae', 0.063), ('size', 0.061), ('nh', 0.058), ('incremental', 0.058), ('onto', 0.056), ('yamada', 0.055), ('eisner', 0.053), ('matsumoto', 0.053), ('nara', 0.051), ('ly', 0.051), ('puts', 0.051), ('watanabe', 0.051), ('pages', 0.051), ('grammars', 0.051), ('creates', 0.05), ('fw', 0.049), ('asahara', 0.049), ('nk', 0.049), ('japan', 0.049), ('perceptron', 0.049), ('satta', 0.047), ('lookahead', 0.047), ('child', 0.046), ('clause', 0.046), ('xl', 0.045), ('states', 0.045), ('chinese', 0.045), ('complete', 0.044), ('arcs', 0.043), ('iwpt', 0.043), ('handles', 0.043), ('automata', 0.043), ('isozaki', 0.043), ('cost', 0.042), ('structures', 0.042), ('huang', 0.042), ('handle', 0.041), ('jp', 0.041), ('te', 0.04), ('johansson', 0.039), ('ld', 0.038), ('coordination', 0.038), ('state', 0.038), ('efficient', 0.038), ('input', 0.038), ('sentence', 0.038), ('communications', 0.037), ('unlabeled', 0.035), ('index', 0.035), ('collins', 0.035), ('algorithms', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

2 0.29234242 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

Author: Wenliang Chen ; Min Zhang ; Haizhou Li

Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-orderfeature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large-amount of additional autoparsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.

3 0.23216327 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1

4 0.22316976 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens

Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.

5 0.20205224 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

Author: Xiao Chen ; Chunyu Kit

Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.

6 0.19869421 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

7 0.1884153 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

8 0.17876443 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

9 0.158508 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

10 0.15434782 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

11 0.14930266 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

12 0.14635389 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

13 0.12561344 71 acl-2012-Dependency Hashing for n-best CCG Parsing

14 0.10919608 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

15 0.1073867 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

16 0.099711768 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing

17 0.097156748 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

18 0.092669576 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

19 0.089385852 139 acl-2012-MIX Is Not a Tree-Adjoining Language

20 0.087619372 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.251), (1, -0.041), (2, -0.298), (3, -0.215), (4, -0.133), (5, -0.158), (6, 0.018), (7, -0.045), (8, 0.125), (9, -0.009), (10, 0.122), (11, 0.185), (12, -0.042), (13, -0.021), (14, 0.044), (15, 0.003), (16, 0.001), (17, 0.062), (18, -0.037), (19, 0.046), (20, 0.031), (21, -0.089), (22, 0.039), (23, -0.116), (24, -0.076), (25, -0.01), (26, -0.011), (27, 0.12), (28, -0.011), (29, -0.028), (30, -0.15), (31, -0.046), (32, -0.019), (33, -0.001), (34, -0.044), (35, 0.102), (36, 0.052), (37, 0.04), (38, -0.092), (39, -0.016), (40, -0.05), (41, -0.032), (42, 0.097), (43, 0.088), (44, -0.032), (45, 0.017), (46, 0.004), (47, 0.031), (48, 0.026), (49, -0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98356193 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

2 0.88131428 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

Author: Wenliang Chen ; Min Zhang ; Haizhou Li

Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-orderfeature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large-amount of additional autoparsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.

3 0.79238832 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1

4 0.78555709 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

Author: Emily Pitler

Abstract: Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.

5 0.74961442 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu

Abstract: Stanford dependencies are widely used in natural language processing as a semanticallyoriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) partof-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.

6 0.70842993 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

7 0.67808807 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

8 0.67407811 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

9 0.6592682 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing

10 0.6475262 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

11 0.62406057 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

12 0.54280657 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

13 0.53398955 83 acl-2012-Error Mining on Dependency Trees

14 0.51717633 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

15 0.49103674 71 acl-2012-Dependency Hashing for n-best CCG Parsing

16 0.4717868 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

17 0.43146399 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

18 0.42361704 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

19 0.4214651 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

20 0.40925646 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.014), (26, 0.024), (28, 0.042), (30, 0.03), (37, 0.051), (39, 0.027), (49, 0.343), (71, 0.046), (74, 0.048), (82, 0.046), (85, 0.024), (90, 0.091), (92, 0.061), (94, 0.029), (99, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75411552 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

2 0.65311998 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

Author: Asher Stern ; Ido Dagan

Abstract: This paper introduces BIUTEE1 , an opensource system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, that can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these re- searchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.

3 0.64608997 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

Author: Christian Chiarcos

Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theory-based assumptions.

4 0.38674593 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

Author: Ce Zhang ; Feng Niu ; Christopher Re ; Jude Shavlik

Abstract: Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data from distant supervision and crowd sourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-source labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.

5 0.38539878 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1

6 0.38529259 191 acl-2012-Temporally Anchored Relation Extraction

7 0.38454288 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

8 0.38377774 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

9 0.38368231 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

10 0.38288331 31 acl-2012-Authorship Attribution with Author-aware Topic Models

11 0.38090873 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

12 0.38025582 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

13 0.3780767 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

14 0.37792203 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

15 0.37630808 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

16 0.37538213 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

17 0.37488821 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

18 0.37429124 71 acl-2012-Dependency Hashing for n-best CCG Parsing

19 0.37405846 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

20 0.37327924 187 acl-2012-Subgroup Detection in Ideological Discussions