acl acl2013 acl2013-26 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
Reference: text
sentIndex sentText sentNum sentScore
1 Joakim Nivre, Department of Linguistics and Philology, Uppsala University, Sweden. Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. [sent-3, score-0.347]
2 While all previous transition systems assume a static parsing strategy with respect to top-down and bottom-up processing, our new system allows a dynamic strategy for ordering parsing decisions. [sent-22, score-0.838]
3 This has the advantage that the parser can postpone difficult decisions until the relevant information becomes available, in a way that is not possible in existing transition systems. [sent-23, score-0.497]
4 Our experiments show that these advantages lead to significant improvements in parsing accuracy, compared to a baseline parser that uses the arc-eager transition system of Nivre (2003), which is one of the most [sent-25, score-0.589]
5 widely used static transition systems. [sent-27, score-0.334]
6 In the context of dependency parsing, a parsing strategy is called purely bottom-up if every dependency h → d is constructed only after all dependencies of the form d → i have been constructed. [sent-30, score-0.573]
7 In contrast, a parsing strategy is called purely top-down if h → d is constructed before any dependency of the form d → i. [sent-32, score-0.442]
8 If we consider transition-based dependency parsing (Nivre, 2008), the purely bottom-up strategy is implemented by the arc-standard model of Nivre (2004). [sent-33, score-0.397]
9 In this model, a dependency h → d is constructed using a purely bottom-up strategy if it represents a left-arc, that is, if the dependent d is placed to the left of the head h in the input string. [sent-36, score-0.407]
10 By this we mean that each dependency is constructed according to some fixed criterion, depending on structural conditions such as the fact that the dependency represents a left or a right arc. [sent-39, score-0.447]
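To make the purely bottom-up condition concrete, the following minimal Python sketch (ours, not the paper's) checks whether a given arc-construction order is purely bottom-up; `gold_heads` maps each dependent to its head, and the toy trees are hypothetical.

```python
def is_purely_bottom_up(gold_heads, construction_order):
    """Check that every arc h -> d is built only after all arcs d -> i.

    gold_heads[d] = h encodes the arc h -> d; construction_order lists the
    dependents d in the order in which their arcs are constructed.
    """
    built = set()
    for d in construction_order:
        children = [i for i, h in gold_heads.items() if h == d]
        if any(c not in built for c in children):
            return False          # d still misses one of its own dependents
        built.add(d)
    return True

# toy tree: w2 is the root, w1 and w3 depend on w2, w4 depends on w3
heads = {1: 2, 3: 2, 4: 3}
print(is_purely_bottom_up(heads, [1, 4, 3]))   # True: w3 -> w4 built first
print(is_purely_bottom_up(heads, [1, 3, 4]))   # False: w2 -> w3 built too early
```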
11 This should be contrasted with dynamic parsing strategies in which several parsing options are simultaneously available for the dependencies being constructed. [sent-40, score-0.349]
12 In contrast with this scenario, in the next sections we implement a dynamic parsing strategy that allows a transition system to decide between the attachments α2 and α3 after it has seen all of the four nodes V, N1, P and N2. [sent-50, score-0.615]
13 Additional advantages of dynamic parsing strategies with respect to static strategies are related to the increase in the feature inventory that we apply to parser states, and to the increase of spurious ambiguity. [sent-51, score-0.57]
14 3 Dependency Parser In this section we present a novel transition-based parser for projective dependency trees, implementing a dynamic parsing strategy. [sent-53, score-0.547]
15 A dependency tree for w is a directed, ordered tree Tw = (Vw, Aw), where Vw is the set of nodes wi of w. Figure 2 shows a dependency tree with left spine ⟨w4, w2, w1⟩ and right spine ⟨w4, w7⟩. [sent-61, score-1.438]
16 The left spine of Tw is an ordered sequence ⟨u1, …⟩. [sent-66, score-0.447]
17 The right spine of Tw is defined symmetrically; see again Figure 2. [sent-71, score-0.45]
18 Note that the left and the right spines share the root node and no other node. [sent-72, score-0.396]
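The two spines can be computed directly from a head assignment. The sketch below is our illustration of the definition, not the authors' code; the head map in the example is one plausible assignment consistent with the spines reported for Figure 2.

```python
def spines(heads, root):
    """Left spine: root plus the chain of leftmost children;
    right spine: root plus the chain of rightmost children
    (a plausible reading of the definition above)."""
    children = {}
    for dep, head in heads.items():
        children.setdefault(head, []).append(dep)
    left, node = [root], root
    while True:
        left_kids = [c for c in children.get(node, []) if c < node]
        if not left_kids:
            break
        node = min(left_kids)            # leftmost child
        left.append(node)
    right, node = [root], root
    while True:
        right_kids = [c for c in children.get(node, []) if c > node]
        if not right_kids:
            break
        node = max(right_kids)           # rightmost child
        right.append(node)
    return left, right

# hypothetical heads for the tree of Figure 2 (root w4)
heads = {2: 4, 7: 4, 1: 2, 3: 2, 5: 7, 6: 7}
print(spines(heads, 4))                  # ([4, 2, 1], [4, 7])
```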
19 2 Basic Idea Transition-based dependency parsers use a stack data structure, where each stack element is associated with a tree spanning some (contiguous) substring of the input w. [sent-74, score-1.02]
20 The parser can combine two trees T and T′ through attachment operations, called left-arc or right-arc, under the condition that T and T′ appear at the two topmost positions in the stack. [sent-75, score-0.381]
21 In contrast, a stack element in our parser records the entire left spine and right spine of the associated tree. [sent-77, score-1.439]
22 This allows us to extend the inventory of the attachment operations of the parser by including the attachment of tree T as a dependent of any node in the left or in the right spine of a second tree T′, provided that this does not violate projectivity. [sent-78, score-1.182]
23 Furthermore, the new parsing strategy is clearly dynamic. (A dependency tree for w is projective if every subtree has a contiguous yield in w.) [sent-81, score-0.489]
24 The new strategy is more powerful than the strategy of the arc-eager model, since we can use top-down parsing at left arcs, which is not allowed in arc-eager parsing, and we do not have the restrictions of parsing right arcs (h → d) before the attachment of right dependents at node d. [sent-84, score-0.897]
25 We observe that the new parsing strategy allows the construction of a tree T′ consisting only of the dependency V → N1 and a tree T, placed at the right of T′, consisting only of the dependency P → N2. [sent-86, score-0.797]
26 Since the right spine of T′ consists of nodes V and N1, we can freely choose between the attachment V → P and the attachment N1 → P. [sent-87, score-0.68]
27 Our transition-based parser uses a stack data structure to store partial parses for the input string w. [sent-92, score-0.511]
28 We represent the stack as an ordered sequence σ = [σd, …, σ1], d ≥ 0, of stack elements, [sent-93, score-0.329]
29 with the topmost element placed at the right. [sent-96, score-0.419]
30 The parser also uses a buffer to store the portion of the input string still to be processed. [sent-113, score-0.317]
31 A configuration of the parser relative to w is a triple c = (σ, β, A), where σ and β are a stack and a buffer, respectively, and A ⊆ Vw × Vw is a set of arcs. [sent-125, score-0.666]
32 The set of terminal configurations consists of all configurations of the form ([σ1], [], A), where σ1 is associated with a tree having root w0, that is, u1,1 = v1,1 = w0, and A is any set of arcs. [sent-130, score-0.359]
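A minimal sketch of how the configurations described above might be represented; the class and field names are ours, and the artificial root w0 (attached at the end, following the 'None' approach mentioned later in the text) is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class StackElement:
    # root-first lists of word indices; the two spines share the root,
    # i.e. left_spine[0] == right_spine[0]
    left_spine: List[int]
    right_spine: List[int]

@dataclass
class Configuration:
    stack: List[StackElement]                 # sigma, topmost element last
    buffer: List[int]                         # beta, word indices still to read
    arcs: Set[Tuple[int, int]] = field(default_factory=set)   # (head, dependent)

def initial_configuration(n_words: int) -> Configuration:
    # empty stack, all input words in the buffer, no arcs
    return Configuration(stack=[], buffer=list(range(n_words)), arcs=set())
```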
33 Each transition is a binary relation defined over the set of configurations of the parser. [sent-132, score-0.351]
34 Since the set of configurations is infinite, a transition is infinite as well, when viewed as a set. [sent-133, score-0.351]
35 This transition removes the first node from the buffer and pushes into the stack a new element, consisting of the left and right spines of the associated tree. [sent-137, score-1.022]
36 Let h be the k-th node, k ≥ 1, in the left spine of the topmost tree in the stack, and let d be the root node of the second topmost tree in the stack. [sent-139, score-0.957]
37 Furthermore, the two topmost stack elements are replaced by a new element associated with the tree resulting from the h → d attachment. [sent-141, score-0.562]
38 This transition is defined, for k ≥ 1, symmetrically with respect to lak. [sent-166, score-0.334]
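Reusing the StackElement/Configuration sketch above, the three transition types can be rendered as follows. This is our reading of the text (k is 1-based; applicability and projectivity checks are omitted), not the authors' implementation: la_k keeps the first k nodes of the topmost element's left spine and appends the other element's left spine, and ra_k is symmetric.

```python
def sh(config):
    """sh: move the first buffer node onto the stack as a one-node tree."""
    w = config.buffer.pop(0)
    config.stack.append(StackElement(left_spine=[w], right_spine=[w]))

def la(config, k):
    """la_k: h is the k-th node on the left spine of the topmost tree,
    d is the root of the second topmost tree; add h -> d and merge."""
    top = config.stack.pop()
    second = config.stack.pop()
    h, d = top.left_spine[k - 1], second.left_spine[0]
    config.arcs.add((h, d))
    config.stack.append(StackElement(top.left_spine[:k] + second.left_spine,
                                     top.right_spine))

def ra(config, k):
    """ra_k: symmetric to la_k, with h the k-th node on the right spine of
    the second topmost tree and d the root of the topmost tree."""
    top = config.stack.pop()
    second = config.stack.pop()
    h, d = second.right_spine[k - 1], top.right_spine[0]
    config.arcs.add((h, d))
    config.stack.append(StackElement(second.left_spine,
                                     second.right_spine[:k] + top.right_spine))
```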
39 Transitions lak and rak are parametric in k, where k is bounded by the length of the input string and not by a fixed constant (but see also the experimental findings in §5). [sent-177, score-0.5]
40 It is not difficult to see that the transition-based parser specified above is sound, meaning that the set of arcs constructed in any complete computation on w is always a dependency tree for w. [sent-185, score-0.593]
41 The parser is also complete, meaning that every (projective) dependency tree for w is constructed by some complete computation on w. [sent-186, score-0.515]
42 4 Deterministic Parsing Algorithm The transition-based parser of the previous section is a nondeterministic device, since several transitions can be applied to a given configuration. [sent-189, score-0.428]
43 We present here an algorithm that runs the parser in pseudo-deterministic mode, greedily choosing at each configuration the transition that maximizes some score function. [sent-194, score-0.677]
44 Algorithm 1 takes as input a string w and a scoring function score() defined over parser transitions and parser configurations. [sent-195, score-0.641]
45 The output of the parser is a dependency tree for w. [sent-197, score-0.409]
46 At each iteration the algorithm checks whether there are at least two elements in the stack and, if this is not the case, it shifts elements from the buffer to the stack. [sent-198, score-0.495]
47 Then the algorithm uses the function score() to evaluate all transitions that can be applied under the current configuration c = (σ, β, A), and it applies the transition with the highest score, updating the current configuration. [sent-199, score-0.764]
48 In the worst case, each transition application involves 1+ p + s transition evaluations. [sent-201, score-0.581]
49 We therefore conclude that the algorithm always reaches a configuration with an empty buffer and a stack which contains only one element. [sent-202, score-0.644]
50 Then the algorithm stops, returning the dependency tree whose arc set is defined as in the current configuration. [sent-203, score-0.403]
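A compact sketch of Algorithm 1 as described here, reusing the transitions above; `score(config, transition)` is an assumed scoring callback, and tie-breaking is left unspecified.

```python
def candidate_transitions(config):
    """Enumerate the applicable transitions: sh (if the buffer is non-empty)
    plus la_k / ra_k for every admissible value of k."""
    cands = []
    if config.buffer:
        cands.append(("sh", 0))
    if len(config.stack) >= 2:
        top, second = config.stack[-1], config.stack[-2]
        cands += [("la", k) for k in range(1, len(top.left_spine) + 1)]
        cands += [("ra", k) for k in range(1, len(second.right_spine) + 1)]
    return cands

def greedy_parse(words, score):
    config = initial_configuration(len(words))
    while config.buffer or len(config.stack) > 1:
        # ensure at least two stack elements whenever the buffer allows it
        while len(config.stack) < 2 and config.buffer:
            sh(config)
        cands = candidate_transitions(config)
        if not cands:
            break
        name, k = max(cands, key=lambda t: score(config, t))
        if name == "sh":
            sh(config)
        elif name == "la":
            la(config, k)
        else:
            ra(config, k)
    return config.arcs     # arc set of the resulting dependency tree
```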
51 The learning algorithm parses the sentence using the current ω, until the highest-scoring selected transition bestT is incorrect according to Ag. [sent-220, score-0.339]
52 When this happens, ω is updated by decreasing the weights of the features associated with the incorrect bestT and by increasing the weights of the features associated with the transition bestCorrectT having the highest score among all possible correct transitions. [sent-221, score-0.462]
53 After each update, the learning algorithm resumes parsing from the current configuration by applying bestCorrectT, and moves on using the updated weights. [sent-222, score-0.344]
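The update just described could be sketched as below, again reusing the earlier functions; `features(config, t)` and the `is_correct(config, t, gold_arcs)` oracle are placeholders for the paper's feature templates and its correctness test, not actual implementations, and `weights` is any dict mapping feature keys to floats.

```python
def train_on_sentence(words, gold_arcs, weights, features, is_correct):
    config = initial_configuration(len(words))

    def score(t):
        return sum(weights.get(f, 0.0) for f in features(config, t))

    while config.buffer or len(config.stack) > 1:
        while len(config.stack) < 2 and config.buffer:
            sh(config)
        cands = candidate_transitions(config)
        if not cands:
            break
        best = max(cands, key=score)                     # bestT
        if not is_correct(config, best, gold_arcs):
            correct = [t for t in cands if is_correct(config, t, gold_arcs)]
            best_correct = max(correct, key=score)       # bestCorrectT
            for f in features(config, best):
                weights[f] = weights.get(f, 0.0) - 1.0   # demote incorrect bestT
            for f in features(config, best_correct):
                weights[f] = weights.get(f, 0.0) + 1.0   # promote bestCorrectT
            best = best_correct        # resume from the correct transition
        name, k = best
        if name == "sh":
            sh(config)
        elif name == "la":
            la(config, k)
        else:
            ra(config, k)
    return weights
```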
54 2 Correct and Incorrect Transitions Standard transition-based dependency parsers are trained by associating each gold tree with a canonical complete computation. [sent-224, score-0.403]
55 This means that, for each configuration of interest, only one transition is considered correct. Figure 4: Graphical representation of configurations; drawn arcs are in Ag but have not yet been added to the configuration. [sent-225, score-0.537]
56 Transition sh is incorrect for configurations (a) and (b); sh and ra1 are correct for (c); sh and la1 are correct for (d). [sent-226, score-0.724]
57 In this paper we depart from such a methodology, and follow Goldberg and Nivre (2012) in allowing more than one correct transition for each configuration, as explained in detail below. [sent-228, score-0.335]
58 Observe that all complete computations for Ag share the same initial configuration cI,w and final configuration cF,Ag . [sent-234, score-0.436]
59 Consider now the set C(w) of all configurations c that are reachable from cI,w, meaning that there exists a sequence of transitions that takes the parser from cI,w to c. [sent-235, score-0.55]
60 A configuration c ∈ C(w) is correct for Ag if cF,Ag is reachable from c; otherwise, c is incorrect for Ag. [sent-236, score-0.352]
61 Let also v1,k, k ∈ [1, q], be the nodes in the right spine of σ1. [sent-242, score-0.507]
62 Assuming that transition rak creates a new arc (h → d) ∈ Ag, we argue that from the configuration c′ with c ⊢rak c′ we can still reach the final configuration associated with Ag. [sent-247, score-0.983]
63 The tree fragments in σ with roots v2,k+1 and u1,1 must be adjacent siblings in the tree associated with Ag, since c is a correct configuration for Ag and (v2,k → u1,1) ∈ Ag. [sent-249, score-0.494]
64 Each of the nodes v2,k+1, …, v2,s in the right spine of σ2 in c must have already acquired all of its right dependents, since the tree is projective. [sent-253, score-0.63]
65 Therefore it is safe for transition rak to eliminate the nodes v2,k+1, …, v2,s. [sent-254, score-0.528]
66 Arc (h → d) is not in A, and the only way we could create (h → d) from c′ is by reaching a new configuration with d in the topmost stack symbol, which amounts to saying that σ1′ can be reduced by a correct transition. [sent-263, score-0.621]
67 From stack σ′ = σ′′|σ2′|σ1′ it is always possible to construct (h → d), consuming the substring of β up to wi and ending up with stack σ′′|σred, where σred is a stack element with root wi. [sent-269, score-1.103]
68 From there, the parser can move on to the final configuration cF,Ag. [sent-270, score-0.368]
69 From condition (i) in Lemma 1 and from the fact that there are no cycles in Ag, it follows that there is at most one correct transition among the transitions of type lak or rak. [sent-273, score-0.809]
70 From condition (ii) in the lemma we can also see that the existence of a correct transition of type lak or rak for some configuration does not imply that the sh transition is incorrect for the same configuration; see Figures 4(c,d) for examples. [sent-274, score-1.436]
71 As already mentioned, we do not impose any canonical form on complete computations that would hardwire a preference for some correct transition and get rid of spurious ambiguity. [sent-278, score-0.514]
72 Our main goal is that the training algorithm learns to prefer a sh transition in a configuration that does not provide enough information for the choice of the correct arc. [sent-280, score-0.671]
73 In the context of dependency parsing, the strategy of delaying arc construction when the current configuration is not informative is called the easy-first strategy, and was first explored by Goldberg and Elhadad (2010). [sent-281, score-0.554]
74 This is not possible in our system, since the number of transitions lak and rak is not bounded by a constant. [sent-285, score-0.715]
75 Furthermore, it is not meaningful to associate transitions lak and rak, for any k ≥ 1, always with the same features, since the constructed arcs impinge on nodes at different depths in the involved spines. [sent-286, score-0.609]
76 It seems indeed more significant to extract information that is local to the arc h → d being constructed by each transition, such as for instance the grandparent and the great-grandparent nodes of d. [sent-287, score-0.368]
77 We index the nodes in the stack σ relative to the head node of the arc being constructed, in case of the transitions lak or rak, or else relative to the root node of σ1, in case of the transition sh. [sent-290, score-1.491]
78 Figure 5 shows an example of feature extraction for the displayed configuration c = (σ, β, A) and the transition la2. [sent-294, score-0.459]
79 Note that in Table 1 placeholders are dynamically assigned in such a way that s1 and s2 refer to the nodes in the constructed arc h → d, and gp, gg refer to the grandparent and the great-grandparent nodes, respectively, of d. [sent-296, score-0.433]
80 Furthermore, the node assigned to s3 is the parent node of s2, if such a node is defined; otherwise, the node assigned to s3 is the root of the tree fragment in the stack underneath σ2. [sent-297, score-0.787]
81 Symmetrically, placeholders q1 and q2 refer to the parent and grandparent nodes of s1, respectively, when these nodes are defined; otherwise, these placeholders get assigned tokens from the buffer. [sent-298, score-0.348]
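A deliberately simplified sketch of the placeholder assignment for feature extraction: it covers s1, s2, gp, gg and the first buffer token only, omits s3, q1, q2 and the fallbacks of Table 1, and the choice of s1 = h, s2 = d is our assumption.

```python
def feature_context(config, transition):
    name, k = transition
    top = config.stack[-1]
    second = config.stack[-2] if len(config.stack) >= 2 else None
    if name == "la":            # h on the left spine of the topmost tree
        spine = top.left_spine
        h, d = spine[k - 1], second.left_spine[0]
    elif name == "ra":          # h on the right spine of the second topmost tree
        spine = second.right_spine
        h, d = spine[k - 1], top.right_spine[0]
    else:                       # sh: placeholders relative to the root of sigma_1
        spine, k = top.left_spine, 1
        h, d = spine[0], None
    gp = spine[k - 2] if k >= 2 else None      # grandparent of d
    gg = spine[k - 3] if k >= 3 else None      # great-grandparent of d
    b1 = config.buffer[0] if config.buffer else None
    return {"s1": h, "s2": d, "gp": gp, "gg": gg, "b1": b1}
```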
82 The extension is obtained by adding top-down features for left-arcs (based on placeholders gp and gg), and by adding right child features for the first stack element. [sent-302, score-0.585]
83 Symbols uj,k and vj,k are the k-th nodes in the left and right spines, respectively, of stack element σj, with uj,1 = vj,1 being the shared root of σj; none is an artificial element used when some context placeholder is not available. [sent-313, score-0.642]
84 Figure 5: Extraction of atomic features for context C(c, la2) = (s3, s2, s1, q1, q2, gp, gg), c = (σ, β, A). [sent-314, score-0.428]
85 The same template is adapted to the arc-standard parser, by removing the top-down parent features and by adding the right child features for the first stack element. [sent-327, score-0.461]
86 All our parsers attach the root node at the end of the parsing process, following the ‘None’ approach discussed by Ballesteros and Nivre (2013). [sent-331, score-0.439]
87 The first component is the left/right spine representation for stack elements, introduced in §3. [sent-340, score-0.661]
88 The second component is the easy-first strategy, implemented on the basis of the spurious ambiguity of our parser and the definition of correct/incorrect transitions in §4. [sent-342, score-0.501]
89 In this perspective, we observe that our parser can indeed be viewed as an arc-standard model augmented with (i) the spine representation, and (ii) the easy-first strategy. [sent-344, score-0.548]
90 On the other hand, (ii) drops the limitation of canonical computations for the arc-standard, and leverages the spurious ambiguity of the parser to enlarge the search space. [sent-346, score-0.331]
91 More precisely, the arc-standard + spine model uses the transitions lak/rak but retains the definition of canonical computation, defined by applying each lak/rak transition as soon as possible. [sent-348, score-0.944]
92 On the other hand, the arc-standard + easy-first model retains the original la/ra transitions but is trained allowing any correct transition at each configuration. [sent-349, score-0.604]
93 In this case the characterization of correct and incorrect configurations in Lemma 1 has been adapted to transitions la/ra, taking into account the bottom-up constraint. [sent-350, score-0.473]
94 Analyzing these results, and comparing with the plain arc-standard, we see that the spine representation and the easy-first strategy individually improve accuracy. [sent-352, score-0.457]
95 At each iteration our parser evaluates a number of transitions bounded by γ + 1, with γ the maximum value of the sum of the lengths of the left spine in σ1 and of the right spine in σ2. [sent-355, score-1.34]
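A tiny check of the γ + 1 bound, as a usage example under the assumptions of the sketches above:

```python
# after three sh transitions the stack holds three one-node trees, so
# gamma = |left spine of sigma_1| + |right spine of sigma_2| = 2
cfg = initial_configuration(4)
sh(cfg); sh(cfg); sh(cfg)
gamma = len(cfg.stack[-1].left_spine) + len(cfg.stack[-2].right_spine)
print(len(candidate_transitions(cfg)) == gamma + 1)   # True
```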
96 The idea of representing the right spine of a tree within the stack elements of a shift-reduce device is quite old in parsing, predating empirical approaches. [sent-371, score-0.871]
97 In the context of transition-based dependency parsers, right spines have also been exploited by Kitagawa and Tanaka-Ishii (2010) to decide where to attach the next word from the buffer. [sent-374, score-0.351]
98 Since one can regard a spine as a stack in itself, whose elements are tree nodes, our model is reminiscent of the embedded pushdown automata of Schabes and Vijay-Shanker (1990), used to parse tree adjoining grammars (Joshi and Schabes, 1997) and exploiting a stack of stacks. [sent-376, score-1.175]
99 An interesting line of future research is to combine our dynamic parsing strategy with a training method that allows the parser to explore transitions that apply to incorrect configurations, as in Goldberg and Nivre (2012). [sent-378, score-0.773]
100 Deterministic left to right parsing of tree adjoining languages. [sent-456, score-0.361]
wordName wordTfidf (topN-words)
[('spine', 0.363), ('stack', 0.298), ('transition', 0.276), ('transitions', 0.243), ('lak', 0.231), ('rak', 0.195), ('ag', 0.193), ('parser', 0.185), ('configuration', 0.183), ('arc', 0.146), ('nivre', 0.135), ('dependency', 0.131), ('parsing', 0.128), ('vw', 0.126), ('sh', 0.12), ('attachment', 0.115), ('parsers', 0.11), ('buffer', 0.104), ('spines', 0.094), ('strategy', 0.094), ('tree', 0.093), ('right', 0.087), ('placeholders', 0.087), ('root', 0.084), ('topmost', 0.081), ('node', 0.078), ('arcs', 0.078), ('gp', 0.076), ('configurations', 0.075), ('joakim', 0.073), ('spurious', 0.073), ('bestcorrectt', 0.071), ('bestt', 0.071), ('goldberg', 0.069), ('wi', 0.067), ('gg', 0.065), ('incorrect', 0.063), ('tw', 0.06), ('dynamic', 0.06), ('grandparent', 0.06), ('correct', 0.059), ('static', 0.058), ('symmetrically', 0.058), ('element', 0.058), ('nodes', 0.057), ('wn', 0.057), ('argmaxt', 0.053), ('kitagawa', 0.053), ('left', 0.053), ('ii', 0.047), ('reachable', 0.047), ('bounded', 0.046), ('constructed', 0.045), ('purely', 0.044), ('projective', 0.043), ('deterministic', 0.042), ('placed', 0.04), ('template', 0.039), ('attach', 0.039), ('child', 0.037), ('computations', 0.037), ('schabes', 0.037), ('canonical', 0.036), ('hwii', 0.036), ('iscorrect', 0.036), ('padua', 0.036), ('postpone', 0.036), ('ttra', 0.036), ('roots', 0.034), ('algorithm', 0.033), ('strategies', 0.033), ('complete', 0.033), ('lemma', 0.033), ('characterization', 0.033), ('dependents', 0.033), ('associated', 0.032), ('ballesteros', 0.031), ('subba', 0.031), ('ordered', 0.031), ('greedy', 0.031), ('let', 0.031), ('elements', 0.03), ('ui', 0.03), ('worst', 0.029), ('update', 0.029), ('bes', 0.029), ('applies', 0.029), ('string', 0.028), ('computation', 0.028), ('incremental', 0.028), ('unlabeled', 0.028), ('wj', 0.028), ('kf', 0.027), ('append', 0.027), ('reduction', 0.027), ('yoav', 0.027), ('empty', 0.026), ('retains', 0.026), ('symmetrical', 0.026), ('atomic', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
2 0.35381162 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transition-based parser that uses beam search.
3 0.32765007 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature-extraction and dot-product across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
4 0.22133279 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
5 0.19323246 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
Author: Yang Liu
Abstract: We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
6 0.17316742 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
7 0.14022315 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
8 0.13811813 288 acl-2013-Punctuation Prediction with Transition-based Parsing
9 0.13689299 80 acl-2013-Chinese Parsing Exploiting Characters
10 0.13441397 4 acl-2013-A Context Free TAG Variant
11 0.12618878 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
12 0.12529378 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
13 0.12468731 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
14 0.1226777 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
15 0.11511196 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
16 0.098210976 335 acl-2013-Survey on parsing three dependency representations for English
17 0.097641818 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
18 0.096271247 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
19 0.089942515 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
20 0.087692663 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation
topicId topicWeight
[(0, 0.205), (1, -0.144), (2, -0.264), (3, 0.026), (4, -0.17), (5, 0.008), (6, 0.144), (7, -0.021), (8, 0.026), (9, -0.14), (10, 0.003), (11, 0.067), (12, -0.089), (13, -0.0), (14, 0.198), (15, 0.008), (16, -0.15), (17, -0.025), (18, -0.004), (19, -0.029), (20, -0.046), (21, 0.072), (22, 0.095), (23, 0.029), (24, -0.044), (25, 0.001), (26, -0.021), (27, -0.004), (28, 0.054), (29, 0.035), (30, -0.097), (31, -0.036), (32, -0.018), (33, -0.049), (34, -0.036), (35, -0.019), (36, -0.034), (37, 0.052), (38, -0.143), (39, 0.053), (40, 0.021), (41, -0.045), (42, 0.063), (43, 0.06), (44, -0.068), (45, 0.035), (46, 0.02), (47, -0.008), (48, 0.015), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95295852 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
2 0.94232959 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature-extraction and dot-product across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
3 0.90635973 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transition-based parser that uses beam search.
4 0.86348122 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
Author: Ji Ma ; Jingbo Zhu ; Tong Xiao ; Nan Yang
Abstract: In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. We propose a simple variant of “early-update” to ensure valid update in the training process. The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. On CTB, we achieve 94.01% tagging accuracy and 86.33% unlabeled attachment score with a relatively small beam width. On PTB, we also achieve state-of-the-art performance.
5 0.78343445 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
6 0.75502163 288 acl-2013-Punctuation Prediction with Transition-based Parsing
7 0.68697286 335 acl-2013-Survey on parsing three dependency representations for English
8 0.68306315 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
9 0.67413658 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
10 0.63314319 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
11 0.56931788 176 acl-2013-Grounded Unsupervised Semantic Parsing
12 0.5658136 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
13 0.54644108 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
14 0.52396297 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
15 0.48598224 275 acl-2013-Parsing with Compositional Vector Grammars
16 0.47752735 163 acl-2013-From Natural Language Specifications to Program Input Parsers
17 0.4602097 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
18 0.46011525 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
19 0.45490938 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
20 0.44717366 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
topicId topicWeight
[(0, 0.062), (6, 0.037), (11, 0.485), (14, 0.013), (24, 0.022), (26, 0.055), (35, 0.051), (42, 0.048), (48, 0.034), (70, 0.043), (88, 0.02), (90, 0.016), (95, 0.038)]
simIndex simValue paperId paperTitle
1 0.98080873 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki
Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.
2 0.97538811 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
Author: Flavio De Benedictis ; Stefano Faralli ; Roberto Navigli
Abstract: We present GlossBoot, an effective minimally-supervised approach to acquiring wide-coverage domain glossaries for many languages. For each language of interest, given a small number of hypernymy relation seeds concerning a target domain, we bootstrap a glossary from the Web for that domain by means of iteratively acquired term/gloss extraction patterns. Our experiments show high performance in the acquisition of domain terminologies and glossaries for three different languages.
same-paper 3 0.9710753 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
Author: Tirthankar Dasgupta
Abstract: In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or are they organized along the morphological line. For this, we have conducted a series of psycholinguistic experiments to build up hypothesis on the possible organizational structure of the mental lexicon. Next, we develop computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing and compositionality between the stem . and the suffix plays an important role in the decomposition process. We observed the same phenomena for Bangla verb sequences where experiments showed noncompositional verb sequences are in general stored as a whole in the ML and low traces of compositional verbs are found in the mental lexicon. 1 IInnttrroodduuccttiioonn Mental lexicon is the representation of the words in the human mind and their associations that help fast retrieval and comprehension (Aitchison, 1987). Words are known to be associated with each other in terms of, orthography, phonology, morphology and semantics. However, the precise nature of these relations is unknown. An important issue that has been a subject of study for a long time is to identify the fundamental units in terms of which the mental lexicon is i itkgp .ernet . in organized. That is, whether lexical representations in the mental lexicon are word based or are they organized along morphological lines. For example, whether a word such as “unimaginable” is stored in the mental lexicon as a whole word or do we break it up “un-” , “imagine” and “able”, understand the meaning of each of these constituent and then recombine the units to comprehend the whole word. Such questions are typically answered by designing appropriate priming experiments (Marslen-Wilson et al., 1994) or other lexical decision tasks. The reaction time of the subjects for recognizing various lexical items under appropriate conditions reveals important facts about their organization in the brain. (See Sec. 2 for models of morphological organization and access and related experiments). A clear understanding of the structure and the processing mechanism of the mental lexicon will further our knowledge of how the human brain processes language. Further, these linguistically important and interesting questions are also highly significant for computational linguistics (CL) and natural language processing (NLP) applications. Their computational significance arises from the issue of their storage in lexical resources like WordNet (Fellbaum, 1998) and raises the questions like, how to store morphologically complex words, in a lexical resource like WordNet keeping in mind the storage and access efficiency. There is a rich literature on organization and lexical access of morphologically complex words where experiments have been conducted mainly for derivational suffixed words of English, Hebrew, Italian, French, Dutch, and few other languages (Marslen-Wilson et al., 2008; Frost et al., 1997; Grainger, et al., 1991 ; Drews and Zwitserlood, 1995). 
However, we do not know of any such investigations for Indian languages, which 123 Sofia, BuPrlgoacreiead, iAngusgu osft 4h-e9 A 2C01L3 S.tu ?c d2en0t1 3Re Ases aorc hiat Wio nrk fsohro Cp,om papguesta 1ti2o3n–a1l2 L9in,guistics are morphologically richer than many of their Indo-European cousins. Moreover, Indian languages show some distinct phenomena like, compound and composite verbs for which no such investigations have been conducted yet. On the other hand, experiments indicate that mental representation and processing of morphologically complex words are not quite language independent (Taft, 2004). Therefore, the findings from experiments in one language cannot be generalized to all languages making it important to conduct similar experimentations in other languages. This work aims to design cognitively motivated computational models that can explain the organization and processing of Bangla morphologically complex words in the mental lexicon. Presently we will concentrate on the following two aspects: OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa PPo o l yy-mmoorrpphheemmiicc wwoorrddss:: our objective here is to determine whether the mental lexicon decomposes morphologically complex words into its constituent morphemes or does it represent the unanalyzed surface form of a word. OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa ccoomm-ppoouunndd vveerrbbss ((CCVV)) :: compound verbs are the subject of much debate in linguistic theory. No consensus has been reached yet with respect to the issue that whether to consider them as unitary lexical units or are they syntactically assembled combinations of two independent lexical units. As linguistic arguments have so far not led to a consensus, we here use cognitive experiments to probe the brain signatures of verb-verb combinations and propose cognitive as well as computational models regarding the possible organization and processing of Bangla CVs in the mental lexicon (ML). With respect to this, we apply the different priming and other lexical decision experiments, described in literature (Marslen-Wilson et al., 1994; Bentin, S. and Feldman, 1990) specifically for derivationally suffixed polymorphemic words and compound verbs of Bangla. Our cross-modal and masked priming experiment on Bangla derivationally suffixed words shows that morphological relatedness between lexical items triggers a significant priming effect, even when the forms are phonologically/orthographically unrelated. These observations are similar to those reported for English and indicate that derivationally suffixed words in Bangla are in general accessed through decomposition of the word into its constituent morphemes. Further, based on the experimental data we have developed a series of computational models that can be used to predict the decomposition of Bangla polymorphemic words. Our evaluation result shows that decom- position of a polymorphemic word depends on several factors like, frequency, productivity of the suffix and the compositionality between the stem and the suffix. The organization of the paper is as follows: Sec. 2 presents related works; Sec. 3 describes experiment design and procedure; Sec. 4 presents the processing of CVs; and finally, Sec. 5 concludes the paper by presenting the future direction of the work. 2 RReellaatteedd WWoorrkkss 2. . 
11 RReepprreesseennttaattiioonn ooff ppoollyymmoorrpphheemmiicc wwoorrddss Over the last few decades many studies have attempted to understand the representation and processing of morphologically complex words in the brain for various languages. Most of the studies are designed to support one of the two mutually exclusive paradigms: the full-listing and the morphemic model. The full-listing model claims that polymorphic words are represented as a whole in the human mental lexicon (Bradley, 1980; Butterworth, 1983). On the other hand, morphemic model argues that morphologically complex words are decomposed and represented in terms of the smaller morphemic units. The affixes are stripped away from the root form, which in turn are used to access the mental lexicon (Taft and Forster, 1975; Taft, 1981 ; MacKay, 1978). Intermediate to these two paradigms is the partial decomposition model that argues that different types of morphological forms are processed separately. For instance, the derived morphological forms are believed to be represented as a whole, whereas the representation of the inflected forms follows the morphemic model (Caramazza et al., 1988). Traditionally, priming experiments have been used to study the effects of morphology in language processing. Priming is a process that results in increase in speed or accuracy of response to a stimulus, called the target, based on the occurrence of a prior exposure of another stimulus, called the prime (Tulving et al., 1982). Here, subjects are exposed to a prime word for a short duration, and are subsequently shown a target word. The prime and target words may be morphologically, phonologically or semantically re124 lated. An analysis of the effect of the reaction time of subjects reveals the actual organization and representation of the lexicon at the relevant level. See Pulvermüller (2002) for a detailed account of such phenomena. It has been argued that frequency of a word influences the speed of lexical processing and thus, can serve as a diagnostic tool to observe the nature and organization of lexical representations. (Taft, 1975) with his experiment on English inflected words, argued that lexical decision responses of polymorphemic words depends upon the base word frequency. Similar observation for surface word frequency was also observed by (Bertram et al., 2000;Bradley, 1980;Burani et al., 1987;Burani et al., 1984;Schreuder et al., 1997; Taft 1975;Taft, 2004) where it has been claimed that words having low surface frequency tends to decompose. Later, Baayen(2000) proposed the dual processing race model that proposes that a specific morphologically complex form is accessed via its parts if the frequency of that word is above a certain threshold of frequency, then the direct route will win, and the word will be accessed as a whole. If it is below that same threshold of frequency, the parsing route will win, and the word will be accessed via its parts. 2. . 22 RReepprreesseennttaattiioonn ooff CCoommppoouunndd A compound verb (CV) consists of two verbs (V1 and V2) acting as and expresses a single expression For example, in the sentence VVeerrbbss a sequence of a single verb of meaning. রুটিগুল ো খেল খেল ো (/ruTigulo kheYe phela/) ―bread-plural-the eat and drop-pres. Imp‖ ―Eat the breads‖ the verb sequence “খেল খেল ো (eat drop)” is an example of CV. Compound verbs are a special phenomena that are abundantly found in IndoEuropean languages like Indian languages. 
A plethora of works has been done to provide linguistic explanations on the formation of such word, yet none so far has led to any consensus. Hook (1981) considers the second verb V2 as an aspectual complex comparable to the auxiliaries. Butt (1993) argues CV formations in Hindi and Urdu are either morphological or syntactical and their formation take place at the argument struc- ture. Bashir (1993) tried to construct a semantic analysis based on “prepared” and “unprepared mind”. Similar findings have been proposed by Pandharipande (1993) that points out V1 and V2 are paired on the basis of their semantic compatibility, which is subject to syntactic constraints. Paul (2004) tried to represent Bangla CVs in terms of HPSG formalism. She proposes that the selection of a V2 by a V1 is determined at the semantic level because the two verbs will unify if and only if they are semantically compatible. Since none of the linguistic formalism could satisfactorily explain the unique phenomena of CV formation, we here for the first time drew our attention towards psycholinguistic and neurolinguistic studies to model the processing of verb-verb combinations in the ML and compare these responses with that of the existing models. 3 TThhee PPrrooppoosseedd AApppprrooaacchheess 3. . 11 TThhee ppssyycchhoolliinngguuiissttiicc eexxppeerriimmeennttss We apply two different priming experiments namely, the cross modal priming and masked priming experiment discussed in (Forster and Davis, 1984; Rastle et al., 2000;Marslen-Wilson et al., 1994; Marslen-Wilson et al., 2008) for Bangla morphologically complex words. Here, the prime is morphologically derived form of the target presented auditorily (for cross modal priming) or visually (for masked priming). The subjects were asked to make a lexical decision whether the given target is a valid word in that language. The same target word is again probed but with a different audio or visual probe called the control word. The control shows no relationship with the target. For example, baYaska (aged) and baYasa (age) is a prime-target pair, for which the corresponding control-target pair could be naYana (eye) and baYasa (age). Similar to (Marslen-Wilson et al., 2008) the masked priming has been conducted for three different SOA (Stimulus Onset Asynchrony), 48ms, 72ms and 120ms. The SOA is measured as the amount of time between the start the first stimulus till the start of the next stimulus. TCM abl-’+ Sse-+ O1 +:-DatjdgnmAshielbatArDu)f(osiAMrawnteihmsgcdaoe)lEx-npgmAchebamr)iD-gnatmprhdiYlbeaA(n ftrTsli,ae(+gnrmdisc)phroielctn)osrelated, and - implies unrelated. There were 500 prime-target and controltarget pairs classified into five classes. Depending on the class, the prime is related to the target 125 either in terms of morphology, semantics, orthography and/or Phonology (See Table 1). The experiments were conducted on 24 highly educated native Bangla speakers. Nineteen of them have a graduate degree and five hold a post graduate degree. The age of the subjects varies between 22 to 35 years. RReessuullttss:: The RTs with extreme values and incorrect decisions were excluded from the data. The data has been analyzed using two ways ANOVA with three factors: priming (prime and control), conditions (five classes) and prime durations (three different SOA). 
We observe strong priming effects (p<0.05) when the target word is morphologically derived and has a recognizable suffix, semantically and orthographically related with respect to the prime; no priming effects are observed when the prime and target words are orthographically related but share no morphological or semantic relationship; although not statistically significant (p>0.07), but weak priming is observed for prime target pairs that are only semantically related. We see no significant difference between the prime and control RTs for other classes. We also looked at the RTs for each of the 500 target words. We observe that maximum priming occurs for words in [M+S+O+](69%), some priming is evident in [M+S+O-](51%) and [M'+S-O+](48%), but for most of the words in [M-S+O-](86%) and [M-S-O+](92%) no priming effect was observed. 3. . 22 FFrreeqquueennccyy DDiissttrriibbuuttiioonn MMooddeellss ooff MMoo rrpphhoo-llooggiiccaall PPrroocceessssiinngg From the above results we saw that not all polymorphemic words tend to decompose during processing, thus we need to further investigate the processing phenomena of Bangla derived words. One notable means is to identify whether the stem or suffix frequency is involved in the processing stage of that word. For this, we apply different frequency based models to the Bangla polymorphemic words and try to evaluate their performance by comparing their predicted results with the result obtained through the priming experiment. MMooddeell --11:: BBaassee aanndd SSuurrffaaccee wwoorrdd ffrreeqquueennccyy ee ff-ffeecctt -- It states that the probability of decomposition of a Bangla polymorphemic word depends upon the frequency of its base word. Thus, if the stem frequency of a polymorphemic word crosses a given threshold value, then the word will decomposed into its constituent morpheme. Similar claim has been made for surface word frequency model where decomposition depends upon the frequency of the surface word itself. We have evaluated both the models with the 500 words used in the priming experiments discussed above. We have achieved an accuracy of 62% and 49% respectively for base and surface word frequency models. MMooddeell --22:: CCoommbbiinniinngg tthhee bbaassee aanndd ssuurrffaaccee wwoorrdd ffrreeq quueennccyy -- In a pursuit towards an extended model, we combine model 1 and 2 together. We took the log frequencies of both the base and the derived words and plotted the best-fit regression curve over the given dataset. The evaluation of this model over the same set of 500 target words returns an accuracy of 68% which is better than the base and surface word frequency models. However, the proposed model still fails to predict processing of around 32% of words. This led us to further enhance the model. For this, we analyze the role of suffixes in morphological processing. MMooddeell -- 33:: DDeeggrreeee ooff AAffffiixxaattiioonn aanndd SSuuffffiixx PPrroodd-uuccttiivviittyy:: we examine whether the regression analysis between base and derived frequency of Bangla words varies between suffixes and how these variations affect morphological decomposition. With respect to this, we try to compute the degree of affixation between the suffix and the base word. For this, we perform regression analysis on sixteen different Bangla suffixes with varying degree of type and token frequencies. For each suffix, we choose 100 different derived words. 
We observe that those suffixes having high value of intercept are forming derived words whose base frequencies are substantially high as compared to their derived forms. Moreover we also observe that high intercept value for a given suffix indicates higher inclination towards decomposition. Next, we try to analyze the role of suffix type/token ratio and compare them with the base/derived frequency ratio model. This has been done by regression analysis between the suffix type-token ratios with the base-surface frequency ratio. We further tried to observe the role of suffix productivity in morphological processing. For this, we computed the three components of productivity P, P* and V as discussed in (Hay and Plag, 2004). P is the “conditioned degree of productivity” and is the probability that we are encountering a word with an affix and it is representing a new type. P* is the “hapaxedconditioned degree of productivity”. It expresses the probability that when an entirely new word is 126 encountered it will contain the suffix. V is the “type frequency”. Finally, we computed the productivity of a suffix through its P, P* and V values. We found that decomposition of Bangla polymorphemic word is directly proportional to the productivity of the suffix. Therefore, words that are composed of productive suffixes (P value ranges between 0.6 and 0.9) like “-oYAlA”, “-giri”, “-tba” and “-panA” are highly decomposable than low productive suffixes like “-Ani”, “-lA”, “-k”, and “-tama”. The evaluation of the proposed model returns an accuracy of 76% which comes to be 8% better than the preceding models. CCoommbbiinniinngg MMooddeell --22 aanndd MMooddeell -- 33:: One important observation that can be made from the above results is that, model-3 performs best in determining the true negative values. It also possesses a high recall value of (85%) but having a low precision of (50%). In other words, the model can predict those words for which decomposition will not take place. On the other hand, results of Model-2 posses a high precision of 70%. Thus, we argue that combining the above two models can better predict the decomposition of Bangla polymorphemic words. Hence, we combine the two models together and finally achieved an overall accuracy of 80% with a precision of 87% and a recall of 78%. This surpasses the performance of the other models discussed earlier. However, around 22% of the test words were wrongly classified which the model fails to justify. Thus, a more rigorous set of experiments and data analysis are required to predict access mechanisms of such Bangla polymorphemic words. 3. . 33 SStteemm- -SSuuffffiixx CCoommppoossiittiioonnaalliittyy Compositionality refers to the fact that meaning of a complex expression is inferred from the meaning of its constituents. Therefore, the cost of retrieving a word from the secondary memory is directly proportional to the cost of retrieving the individual parts (i.e the stem and the suffix). Thus, following the work of (Milin et al., 2009) we define the compositionality of a morphologically complex word (We) as: C(We)=α 1H(We)+α α2H(e)+α α3H(W|e)+ α4H(e|W) Where, H(x) is entropy of an expression x, H(W|e) is the conditional entropy between the stem W and suffix e and is the proportionality factor whose value is computed through regression analysis. Next, we tried to compute the compositionality of the stem and suffixes in terms of relative entropy D(W||e) and Point wise mutual information (PMI). 
The relative entropy is the measure of the distance between the probability distribution of the stem W and the suffix e. The PMI measures the amount of information that one random variable (the stem) contains about the other (the suffix). We have compared the above three techniques with the actual reaction time data collected through the priming and lexical decision experiment. We observed that all the three information theoretic models perform much better than the frequency based models discussed in the earlier section, for predicting the decomposability of Bangla polymorphemic words. However, we think it is still premature to claim anything concrete at this stage of our work. We believe much more rigorous experiments are needed to be per- formed in order to validate our proposed models. Further, the present paper does not consider factors related to age of acquisition, and word familiarity effects that plays important role in the processing of morphologically complex words. Moreover, it is also very interesting to see how stacking of multiple suffixes in a word are processed by the human brain. 44 OOrrggaanniizzaattiioonn aanndd PPrroocceessssiinngg ooff CCoomm-ppoouunndd VVeerrbbss iinn tthhee MMeennttaall LLeexxiiccoonn Compound verbs, as discussed above, are special type of verb sequences consisting of two or more verbs acting as a single verb and express a single expression of meaning. The verb V1 is known as pole and V2 is called as vector. For example, “ওঠে পড়া ” (getting up) is a compound verb where individual words do not entirely reflects the meaning of the whole expression. However, not all V1+V2 combinations are CVs. For example, expressions like, “নিঠে য়াও ”(take and then go) and “ নিঠে আঠ ়া” (return back) are the examples of verb sequences where meaning of the whole expression can be derived from the mean- ing of the individual component and thus, these verb sequences are not considered as CV. The key question linguists are trying to identify for a long time and debating a lot is whether to consider CVs as a single lexical units or consider them as two separate units. Since linguistic rules fails to explain the process, we for the first time tried to perform cognitive experiments to understand the organization and processing of such verb sequences in the human mind. A clear understanding about these phenomena may help us to classify or extract actual CVs from other verb 127 sequences. In order to do so, presently we have applied three different techniques to collect user data. In the first technique, we annotated 4500 V1+V2 sequences, along with their example sentences, using a group of three linguists (the expert subjects). We asked the experts to classify the verb sequences into three classes namely, CV, not a CV and not sure. Each linguist has received 2000 verb pairs along with their respective example sentences. Out of this, 1500 verb sequences are unique to each of them and rest 500 are overlapping. We measure the inter annotator agreement using the Fleiss Kappa (Fleiss et al., 1981) measure (κ) where the agreement lies around 0.79. Next, out of the 500 common verb sequences that were annotated by all the three linguists, we randomly choose 300 V1+V2 pairs and presented them to 36 native Bangla speakers. We ask each subjects to give a compositionality score of each verb sequences under 1-10 point scale, 10 being highly compositional and 1 for noncompositional. We found an agreement of κ=0.69 among the subjects. 
We also observe a continuum of compositionality scores among the verb sequences, which indicates that it is difficult to classify Bangla verb sequences discretely into CV and non-CV classes. We then compared the compositionality scores with the expert annotations and found a significant correlation between the two. Verb sequences annotated as CVs (like খেঠে খিল, কঠে খি, ওঠে পড) receive low compositionality scores (average scores ranging between 1 and 4), whereas sequences with high compositionality scores are in general tagged as not a CV (নিঠে য়া (come and get), নিঠে আে (return back), তুঠল খেঠেনি (kept), গনিঠে পিল (roll on floor)). This indicates that verb sequences which are not CVs show a high degree of compositionality; in other words, non-CV sequences can be interpreted directly from their constituent verbs. This raises the possibility that compositional verb sequences require their individual verbs to be recognized separately, so the time needed to recognize such expressions should be greater than for non-compositional verbs, which map onto a single unit of meaning. To validate this claim we performed a lexical decision experiment with 32 native Bangla speakers and 92 different verb sequences. We followed the same experimental procedure as discussed in (Taft, 2004) for English polymorphemic words, except that rather than derived words, the subjects were shown a verb sequence and asked whether they recognized it as a valid combination. The reaction time (RT) of each subject was recorded. Our preliminary RT analysis shows that, in line with our claim, the RTs for verb sequences with high compositionality values are significantly higher than the RTs for low- or non-compositional verbs. This supports our hypothesis that Bangla compound verbs showing little compositionality are stored as a whole in the mental lexicon, following the full-listing model, whereas compositional verb phrases are parsed into their individual components. However, our experiment uses a very small data set, and it is premature to draw firm conclusions from the current results alone.

5 Future Directions
In the next phase of our work we will focus on the following aspects of Bangla morphologically complex words:

The word familiarity effect: Our aim here is to study the role of a word's familiarity during its processing. We define the familiarity of a word in terms of corpus frequency, age of acquisition, the level of language exposure of a person, the RT of the word, etc.

Role of suffix types in morphological decomposition: For native Bangla speakers, which morphological suffixes are internalized and which are merely learnt in school but never internalized? We can compare the representation of native, Sanskrit-derived and foreign suffixes in Bangla words.

Computational models of the organization and processing of Bangla compound verbs: So far we have performed only a small set of experiments on the processing of compound verbs in the mental lexicon. In the next phase of our work we will extend the existing experiments and also apply further techniques, such as crowdsourcing and language games, to collect more RT and compositionality data.
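As a toy illustration of the reaction-time comparison underlying the lexical decision results above, and a possible starting point for the computational modelling planned here, the following sketch contrasts RTs for high- versus low-compositionality verb sequences with Welch's two-sample t-test. All numbers and group labels are invented placeholders, not the paper's data or analysis code.

```python
from scipy import stats

# Invented placeholder RTs in milliseconds; the experimental data are not reproduced here.
rt_high_compositionality = [720, 745, 760, 705, 738, 752]  # e.g. non-CV sequences
rt_low_compositionality = [630, 655, 640, 662, 648, 635]   # e.g. compound verbs (CVs)

# Welch's t-test (unequal variances) for the difference in mean RT.
t_stat, p_value = stats.ttest_ind(rt_high_compositionality,
                                  rt_low_compositionality,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Under the full-listing account, high-compositionality (non-CV) sequences should
# show significantly longer RTs than low-compositionality compound verbs.
```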
Finally, based on the collected data we will develop computational models that can explain the possible organizational structure and processing mechanism of morphologically complex Bangla words in the mental lexicon.

References
Aitchison, J. (1987). Words in the Mind: An Introduction to the Mental Lexicon. Wiley-Blackwell.
Baayen, R. H. (2000). On frequency, transparency and productivity. In G. Booij and J. van Marle (eds), Yearbook of Morphology, pages 181-208.
Baayen, R. H. (2003). Probabilistic approaches to morphology. Probabilistic Linguistics, pages 229-287.
Baayen, R. H., Dijkstra, T., and Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1):94-117.
Bashir, E. (1993). Causal chains and compound verbs. In M. K. Verma (ed.) (1993).
Bentin, S., and Feldman, L. B. (1990). The contribution of morphological and semantic relatedness to repetition priming at short and long lags: Evidence from Hebrew. Quarterly Journal of Experimental Psychology, 42, pp. 693-711.
Bradley, D. (1980). Lexical representation of derivational relation. Juncture, Saratoga, CA: Anma Libri, pp. 37-55.
Butt, M. (1993). Conscious choice and some light verbs in Urdu. In M. K. Verma (ed.) (1993).
Butterworth, B. (1983). Lexical representation. Language Production, Vol. 2, pp. 257-294. San Diego, CA: Academic Press.
Caramazza, A., Laudanna, A., and Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, pp. 297-332.
Drews, E., and Zwitserlood, P. (1995). Morphological and orthographic similarity in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 21, 1098-1116.
Fellbaum, C. (ed.). (1998). WordNet: An Electronic Lexical Database. MIT Press.
Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-698.
Frost, R., Forster, K. I., and Deutsch, A. (1997). What can we learn from the morphology of Hebrew? A masked-priming investigation of morphological representation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 829-856.
Grainger, J., Cole, P., and Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370-384.
Hook, P. E. (1981). Hindi Structures: Intermediate Level. Michigan Papers on South and Southeast Asia, The University of Michigan Center for South and Southeast Studies, Ann Arbor, Michigan.
Fleiss, J. L., Levin, B., and Paik, M. C. (1981). The measurement of interrater agreement. Statistical Methods for Rates and Proportions, 2:212-236.
MacKay, D. G. (1978). Derivational rules and the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 17, pp. 61-71.
Marslen-Wilson, W. D., and Tyler, L. K. (1997). Dissociating types of mental computation. Nature, 387, pp. 592-594.
Marslen-Wilson, W. D., and Tyler, L. K. (1998). Rules, representations, and the English past tense. Trends in Cognitive Sciences, 2, pp. 428-435.
Marslen-Wilson, W. D., Tyler, L. K., Waksler, R., and Older, L. (1994). Morphology and meaning in the English mental lexicon. Psychological Review, 101, pp. 3-33.
Marslen-Wilson, W. D., and Zhou, X. (1999). Abstractness, allomorphy, and lexical architecture. Language and Cognitive Processes, 14, 321-352.
Milin, P., Kuperman, V., Kostić, A., and Baayen, R. H. (2009). Paradigms bit by bit: an information-theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Analogy in Grammar: Form and Acquisition, pp. 214-252.
Pandharipande, R. (1993). Serial verb construction in Marathi. In M. K. Verma (ed.) (1993).
Paul, S. (2004). An HPSG Account of Bangla Compound Verbs with LKB Implementation. Ph.D. Dissertation, CALT, University of Hyderabad.
Pulvermüller, F. (2002). The Neuroscience of Language. Cambridge University Press.
Stolz, J. A., and Feldman, L. B. (1995). The role of orthographic and semantic transparency of the base morpheme in morphological processing. In L. B. Feldman (ed.), Morphological Aspects of Language Processing. Hillsdale, NJ: Lawrence Erlbaum Associates.
Taft, M., and Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14, pp. 638-647.
Taft, M. (1988). A morphological decomposition model of lexical access. Linguistics, 26, pp. 657-667.
Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, pp. 745-765.
Tulving, E., Schacter, D. L., and Stark, H. A. (1982). Priming effects in word fragment completion are independent of recognition memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 8(4).
5 0.95326382 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
Author: Ruey-Cheng Chen
Abstract: We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
6 0.94760752 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
7 0.91645974 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
8 0.8898778 61 acl-2013-Automatic Interpretation of the English Possessive
9 0.83678585 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
10 0.75036907 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
11 0.74309206 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
12 0.72780252 242 acl-2013-Mining Equivalent Relations from Linked Data
13 0.69964486 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
14 0.68571585 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
15 0.66477239 154 acl-2013-Extracting bilingual terminologies from comparable corpora
16 0.65046507 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
17 0.64839453 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
18 0.64584208 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
19 0.6420629 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
20 0.64067298 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web