acl acl2013 acl2013-362 knowledge-graph by maker-knowledge-mining

362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers


Source: pdf

Author: Andre Martins ; Miguel Almeida ; Noah A. Smith

Abstract: We present fast, accurate, direct non-projective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive with projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present fast, accurate, direct non-projective dependency parsers with third-order features. [sent-7, score-0.276]

2 Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. [sent-8, score-0.855]

3 Experiments in fourteen languages yield parsing speeds competitive with projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German). [sent-9, score-0.407]

4 1 Introduction Dependency parsing has become a prominent approach to syntax in the last few years, with increasingly fast and accurate models being devised (Kübler et al. [sent-10, score-0.157]

5 In projective parsing, the arcs in the dependency tree are constrained to be nested, and the problem of finding the best tree can be addressed with dynamic programming. [sent-12, score-0.509]

6 This results in cubic-time decoders for arc-factored and sibling second-order models (Eisner, 1996; McDonald and Pereira, 2006), and quartic-time for grandparent models (Carreras, 2007) and third-order models (Koo and Collins, 2010). [sent-13, score-0.072]

7 Recently, Rush and Petrov (2012) trained third-order parsers with vine pruning cascades, achieving runtimes only a small factor slower than first-order systems. [sent-14, score-0.48]

8 Third-order features have also been included in transition systems (Zhang and Nivre, 2011) and graph-based parsers with cube-pruning (Zhang and McDonald, 2012). [sent-15, score-0.171]

9 Unfortunately, non-projective dependency parsers (appropriate for languages with a more flexible word order, such as Czech, Dutch, and German) lag behind these recent advances. [sent-16, score-0.276]

10 The main obstacle is that non-projective parsing is NP-hard beyond arc-factored models (McDonald [sent-17, score-0.084]

11 Approximate parsers have therefore been introduced, based on belief propagation (Smith and Eisner, 2008), dual decomposition (Koo et al. [sent-20, score-0.492]

12 These are all instances of turbo parsers, as shown by Martins et al. [sent-23, score-0.103]

13 While this line of research has led to accuracy gains, none of these parsers use third-order contexts, and their speeds are well behind those of projective parsers. [sent-25, score-0.417]

14 This extension is non-trivial since exact dynamic programming is not applicable. [sent-27, score-0.039]

15 Instead, we adapt AD3, the dual decomposition algorithm proposed by Martins et al. [sent-28, score-0.321]

16 (2011), to handle third-order features, by introducing specialized head automata. [sent-29, score-0.224]

17 • We make our parser substantially faster than the many-components approach of Martins et al. [sent-30, score-0.067]

18 While AD3 requires solving quadratic subproblems as an intermediate step, recent results (Martins et al. [sent-32, score-0.248]

19 , 2012) show that they can be addressed with the same oracles used in the subgradient method (Koo et al. [sent-33, score-0.107]

20 This enables AD3 to exploit combinatorial subproblems like the head automata above. [sent-35, score-0.465]

21 2 Dependency Parsing with AD3 Dual decomposition is a class of optimization techniques that tackle the dual of combinatorial problems. (Released as TurboParser 2.) [sent-37, score-0.38]

22 First-order models factor over arcs (Eisner, 1996; McDonald et al. [sent-46, score-0.127]

23 , 2005), and second-order models include also consecutive siblings and grandparents (Carreras, 2007). [sent-47, score-0.101]

24 Our parsers also add arbitrary siblings (not necessarily consecutive) and head bigrams, as in Martins et al. [sent-48, score-0.362]

25 In this paper, we employ alternating directions dual decomposition (AD3; Martins et al. [sent-53, score-0.229]

26 The difference is that the AD3 subproblems have an additional quadratic term to accelerate consensus. [sent-57, score-0.215]

27 , 2012) has shown that: (i) AD3 converges at a faster rate,2 and (ii) the quadratic subproblems can be solved using the same combinatorial machinery that is used in the subgradient algorithm. [sent-59, score-0.454]

28 This opens the door for larger subproblems (such as the combination of trees and head automata in Koo et al. [sent-60, score-0.406]

29 We parameterize a dependency tree via an indicator vector u := ⟨ua⟩a∈A, where ua is 1 if the arc a is in the tree, and 0 otherwise, and we denote by Y ⊆ R|A| the set of such vectors that are indicators of well-formed trees. [sent-74, score-0.252]

30 We assume that the score of a parse tree u ∈ Y decomposes as f(u) := Σs fs(zs), where each zs := ⟨zs,a⟩a∈As is a “partial view” of u, and each local score function fs comes from a feature-based linear model. [sent-81, score-0.542]

31 Past work in dependency parsing considered either (i) a few “large” components, such as trees and head automata (Smith and Eisner, 2008; Koo et al. [sent-82, score-0.458]

32 , those that are partial views of an actual parse tree. [sent-87, score-0.088]

33 We assume each parse u ∈ Y corresponds uniquely to a globally consistent tuple of views, and vice-versa. [sent-92, score-0.086]

34 (2011), the problem of obtaining the best-scored tree can be written as follows: maximize Σs=1..S fs(zs) w.r.t. u ∈ R|A|, zs ∈ Ys, [sent-94, score-0.339]

35 s.t. zs,a = ua, ∀s, ∀a ∈ As, (1) where the equality constraints ensure that the partial views “glue” together to form a coherent parse tree. [sent-99, score-0.121]
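The scattered fragments above describe a single constrained problem; gathered into display form (a cleaned-up transcription of Eq. 1, using the symbols defined earlier, not new content), it reads:

```latex
\max_{u,\, z_1, \dots, z_S} \quad \sum_{s=1}^{S} f_s(z_s)
\qquad \text{w.r.t.} \quad u \in \mathbb{R}^{|A|},\; z_s \in Y_s \;\; (s = 1, \dots, S),
\qquad \text{s.t.} \quad z_{s,a} = u_a, \;\; \forall s,\; \forall a \in A_s. \tag{1}
```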

36 2 Dual Decomposition and AD3 Dual decomposition methods dualize out the equality constraint in Eq. [sent-101, score-0.16]

37 1 by introducing Lagrange multipliers. In doing so, they solve a relaxation where the combinatorial sets Ys are replaced by their convex hulls Zs := conv(Ys). [sent-102, score-0.13]

38 λs,a zs,a (2) Typically, Assumption 1 is met whenever the maximization of fs over Ys is tractable, since the objective in Eq. [sent-115, score-0.077]

39 The convex hull of Ys is the set conv(Ys) := {Σys∈Ys αys ys | α ∈ Δ|Ys|}. [sent-122, score-0.38]

40 Its members represent marginal probabilities over the arcs in As. [sent-123, score-0.127]

41 (3) Above, ρ is a constant and the quadratic term penalizes deviations from the current global solution (stored in u(t)). We will see (Prop. [sent-131, score-0.078]

42 2) that this problem can be solved iteratively. [sent-132, score-0.035]

43 (4) λ-updates, where the Lagrange multipliers are adjusted to penalize disagreements: λs,a(t+1) := λs,a(t) − ρ(zs,a(t+1) − ua(t+1)). (5) [sent-136, score-0.041]
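A minimal sketch of this update pattern, on a toy consensus problem rather than the paper's actual tree and automata factors: two overlapping factors score the same binary arc variable and must be driven to agreement. The box-shaped feasible set, its closed-form z-update, and the toy scores (2.0 and -1.0) are illustrative assumptions, chosen only so the quadratic subproblem has a one-line solution.

```python
def solve_box_subproblem(w, lam, u, rho):
    """z-update for a factor relaxed to the box [0, 1]:
    argmax_z  w*z + lam*z - (rho/2)*(z - u)^2, solved in closed form."""
    return min(1.0, max(0.0, u + (w + lam) / rho))

def ad3(scores, rho=1.0, iters=100):
    u = 0.5                                   # global consensus value
    lams = [0.0] * len(scores)                # one multiplier per factor
    zs = [u] * len(scores)
    for _ in range(iters):
        # z-updates: each factor solves its quadratic subproblem
        zs = [solve_box_subproblem(w, lam, u, rho)
              for w, lam in zip(scores, lams)]
        u = sum(zs) / len(zs)                 # u-update: average the views
        # lambda-updates penalize disagreements, as in Eq. 5
        lams = [lam - rho * (z - u) for lam, z in zip(lams, zs)]
    return u, zs

u, zs = ad3([2.0, -1.0])   # combined score 2 - 1 > 0 favors the arc: u -> 1.0
```

The subgradient method would use the same z-oracles but without the quadratic term; the quadratic term is what accelerates consensus.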

44 (5) In sum, the only difference between AD3 and the subgradient method is in the z-updates, which in AD3 require solving a quadratic problem. [sent-137, score-0.218]

45 While closed-form solutions have been developed for some specialized components (Martins et al. [sent-138, score-0.111]

46 , 2011), this problem is in general more difficult than the one arising in the subgradient algorithm. [sent-139, score-0.107]

47 , 2012) that maintains an estimate of W by iteratively adding and removing elements computed through the oracle in Eq. [sent-148, score-0.037]

48 This has a huge impact in practice and is crucial to obtain the fast runtimes in §4 (see Fig. [sent-150, score-0.125]

49 In our experiments (§4), we set ρ = 0. [sent-153, score-0.04]

50 We show averaged runtimes in PTB §22 as a function. [sent-167, score-0.085]

51 3 Solving the Subproblems We next describe the actual components used in our third-order parsers. [sent-173, score-0.03]

52 , 2005): fTREE(z) = Σm σARC(π(m), m), where π(m) is the parent of the mth word according to the parse tree z, and σARC(h, m) is the score of an individual arc. [sent-176, score-0.103]

53 The parse tree that maximizes this function can be found in time O(L3) via the Chu-Liu-Edmonds’ algorithm (Chu and Liu, 1965; Edmonds, 1967). [sent-177, score-0.103]
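For illustration, here is a brute-force maximizer of that arc-factored objective, usable only for tiny sentences; Chu-Liu-Edmonds computes the same argmax in O(L3) (and an O(L2) variant exists). The function and variable names are ours, not the paper's, and `score[h][m]` stands in for σARC(h, m) with index 0 playing the root.

```python
from itertools import product

def tree_score(parents, score):
    """Arc-factored score: sum of score[pi(m)][m] over modifiers m."""
    return sum(score[parents[m]][m] for m in range(1, len(parents)))

def is_tree(parents):
    """Every word must reach the root (index 0) without a cycle."""
    for m in range(1, len(parents)):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

def best_tree(score):
    """Enumerate all head assignments, keep the best well-formed tree."""
    L = len(score) - 1
    best, best_parents = float("-inf"), None
    for heads in product(*[[h for h in range(L + 1) if h != m]
                           for m in range(1, L + 1)]):
        parents = [0] + list(heads)
        if is_tree(parents) and tree_score(parents, score) > best:
            best, best_parents = tree_score(parents, score), parents
    return best_parents, best

# two-word toy: score[0][1]=5, score[1][2]=4 give the best tree
parents, total = best_tree([[0, 5, 1], [0, 0, 4], [0, 2, 0]])
# parents == [0, 0, 1], total == 9
```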

54 Let A_h^in and A_h^out denote respectively the sets of incoming and outgoing candidate arcs for the hth word, where the latter subdivides into arcs pointing to the right, A_h^out,→, and to the left, A_h^out,←. [sent-179, score-0.33]

55 For each head word h in the parse tree z, define g := π(h), and let ⟨m0, m1, . [sent-182, score-0.247]

56 , where we use the shorthand z|B to denote the subvector of z indexed by the arcs in B ⊆ A. [sent-189, score-0.127]

57 Note that this score function absorbs grandparent and consecutive sibling scores, in addition to the grand-sibling scores. [sent-190, score-0.126]

58 For each h, f_h^GSIB,→ can be (footnote: in fact, there is an asymptotically faster O(L2) algorithm; Tarjan, 1977). [sent-191, score-0.038]

59 Moreover, if the set of possible arcs is reduced to a subset B ⊆ A (via pruning), then the fastest known algorithm (Gabow et al. [sent-192, score-0.127]

60 Table 1: Theoretical runtimes of each subproblem without pruning, limiting the number of candidate heads, and limiting (in addition) the number of modifiers. [sent-197, score-0.214]

61 Note the O(L log L) total runtime per AD3 iteration in the latter case. [sent-198, score-0.04]

62 maximized in time O(L3) with dynamic programming, yielding O(L4) total runtime. [sent-199, score-0.089]

63 In addition, we define left and right-side tri-sibling head automata that remember the previous two modifiers of a head word. [sent-201, score-0.413]

64 Again, each of these functions can be maximized in time O(L3), yielding O(L4) runtime. [sent-203, score-0.059]
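The one-sided head-automaton DPs can be sketched as follows. This is a simplified version that conditions only on the previous chosen sibling (the paper's automata also track the grandparent, or the previous two modifiers for tri-siblings); the function names and candidate lists are illustrative, not the paper's API.

```python
def best_modifier_sequence(cands, score):
    """For one head and one side: pick an increasing sequence of
    modifiers from `cands`, maximizing the sum of score(prev, m),
    with prev=None for the first modifier. O(K^2) for K candidates."""
    best, back = [], []
    for j, m in enumerate(cands):
        b, p = score(None, m), None            # start a new sequence at m
        for i in range(j):                     # or extend one ending at cands[i]
            cand = best[i] + score(cands[i], m)
            if cand > b:
                b, p = cand, i
        best.append(b)
        back.append(p)
    if not best or max(best) <= 0.0:
        return [], 0.0                         # taking no modifiers is allowed
    j = max(range(len(best)), key=best.__getitem__)
    total, seq = best[j], []
    while j is not None:                       # follow backpointers
        seq.append(cands[j])
        j = back[j]
    return seq[::-1], total
```

Running this once per head word, with the head's candidate modifiers as `cands`, gives the per-head maximization the text describes.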

65 Each score σHPB(m, h, h′) is obtained via features that look at the heads of consecutive words (as in Martins et al. [sent-206, score-0.097]

66 This function can be maximized in time O(L3) with the Viterbi algorithm. [sent-208, score-0.059]
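A sketch of that Viterbi pass, treating the head of each word as the state of a chain; `unary` and `bigram` are stand-ins for the σHPB feature scores (the names and the split into unary/bigram parts are our illustrative assumptions).

```python
def viterbi_head_bigrams(L, unary, bigram):
    """Viterbi over head sequences: the state at position m is the head
    of word m (0 is the root); bigram(m, h, h_prev) scores the heads of
    the consecutive words m-1 and m. O(L^3): L positions, L^2 transitions."""
    heads = lambda m: [h for h in range(L + 1) if h != m]
    dp = {h: unary(1, h) for h in heads(1)}
    back = [{}]                        # back[m-1][h]: best head of word m-1
    for m in range(2, L + 1):
        ndp, nb = {}, {}
        for h in heads(m):
            prev = max(dp, key=lambda hp: dp[hp] + bigram(m, h, hp))
            ndp[h] = dp[prev] + bigram(m, h, prev) + unary(m, h)
            nb[h] = prev
        dp, back = ndp, back + [nb]
    h = max(dp, key=dp.get)
    total, seq = dp[h], [h]
    for m in range(L, 1, -1):          # follow backpointers right to left
        h = back[m - 1][h]
        seq.append(h)
    return seq[::-1], total
```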

67 We handle arbitrary siblings as in Martins et al. [sent-210, score-0.076]

68 With a simple strategy that limits the number of candidate heads per word to a constant K, this drops to cubic time. [sent-217, score-0.043]

69 10 Further speed-ups are possible with more pruning: by limiting the number of possible modifiers to a con- stant J, the runtime would reduce to O(L log L). [sent-218, score-0.083]

70 In our experiments, we employed this strategy with K = 10, by pruning with a first-order probabilistic model. [sent-219, score-0.103]

71 Following Koo and Collins (2010), for each word m, we also pruned away incoming arcs ⟨h, m⟩ with posterior probability less than 0. [sent-220, score-0.166]
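The two pruning steps described above can be sketched as follows. The posterior threshold is truncated in this copy of the text, so the default below is only a placeholder, and `posterior` stands in for the first-order model's arc marginals (names are ours).

```python
def prune_arcs(posterior, K=10, min_posterior=1e-4):
    """For each modifier m, keep its K highest-posterior candidate heads,
    then drop any kept arc whose posterior falls below min_posterior
    (placeholder value; the paper's exact threshold is cut off here).
    posterior[m] maps each candidate head h to the probability of arc <h, m>."""
    pruned = {}
    for m, heads in posterior.items():
        top = sorted(heads, key=heads.get, reverse=True)[:K]
        pruned[m] = [h for h in top if heads[h] >= min_posterior]
    return pruned
```

With K candidate heads per word, the head-automaton subproblems shrink from O(L^3) to O(K^2 L) style costs, which is what drives the runtimes in Table 1.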

72 Table 2: Results for the projective English dataset. [sent-226, score-0.139]

73 We report unlabeled attachment scores (UAS) ignoring punctuation, and parsing speeds in tokens per second. [sent-227, score-0.191]

74 Our speeds include the time necessary for pruning, evaluating features, and decoding, as measured on an Intel Core i7 processor @3. [sent-228, score-0.107]

75 The others are speeds reported in the cited papers; those marked with † were converted from times per sentence. [sent-230, score-0.107]

76 4 Experiments We first evaluated our non-projective parser on a projective English dataset, to see how its speed and accuracy compare with recent projective parsers, which can take advantage of dynamic programming. [sent-231, score-0.337]

77 To this end, we converted the Penn Treebank to dependencies through (i) the head rules of Yamada and Matsumoto (2003) (PTB-YM) and (ii) basic dependencies from the Stanford parser 2. [sent-232, score-0.173]

78 To ensure valid parse trees at test time, we rounded fractional solutions as in Martins et al. [sent-237, score-0.079]

79 (2009)— yet, solutions were integral ≈ 95% of the time. [sent-238, score-0.03]

80 On the dev set, we see consistent gains when more expressive features are added, the best accuracies being achieved with the full third-order model; this comes at the cost of a 6-fold drop in runtime compared with a first-order model. [sent-241, score-0.087]

81 By looking at the two bottom blocks, we observe that our parser has slightly better accuracies than recent projective parsers, with comparable speed levels (with the exception of the highly optimized vine cascade approach of Rush and Petrov, 2012). [sent-242, score-0.304]

82 We trained a simple 2nd-order tagger to obtain automatic part-of-speech tags for §22–23, with accuracies 97. [sent-245, score-0.047]

83 The last for the CoNLL-2006 datasets and the non-projective English dataset of CoNLL-2008. [sent-249, score-0.03]

84 “UAS” includes the most accurate parsers among Nivre et al. [sent-250, score-0.03]

85 Our third-order model achieved the best reported scores for English, Czech, German, and Dutch— which includes the three largest datasets and the ones with the most non-projective dependencies— and is on par with the state of the art for the remaining languages. [sent-258, score-0.03]

86 To our knowledge, the speeds are the highest reported among higher-order non-projective parsers, and only about 3–4 times slower than the vine parser of Rush and Petrov (2012), which has lower accuracies. [sent-259, score-0.257]

87 5 Conclusions We presented new third-order non-projective parsers which are both fast and accurate. [sent-260, score-0.211]

88 We decoded with AD3, an accelerated dual decomposition algorithm which we adapted to handle large components, including specialized head automata for the third-order features, and a sequence model for head bigrams. [sent-261, score-0.855]

89 Results are above the state of the art for large datasets and non-projective languages. [sent-262, score-0.03]

90 In the hope that other researchers may find our implementation useful or are willing to contribute further improvements, we made our parsers publicly available as open source software. [sent-263, score-0.171]

91 Three new probabilistic models for dependency parsing: An exploration. [sent-303, score-0.105]

92 Multilingual dependency analysis with a two-stage discriminative parser. [sent-452, score-0.105]

93 On dual decomposition and linear programming relaxations for natural language processing. [sent-483, score-0.36]

94 The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. [sent-500, score-0.084]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('martins', 0.366), ('ys', 0.35), ('zs', 0.285), ('dual', 0.194), ('rush', 0.184), ('parsers', 0.171), ('koo', 0.168), ('head', 0.144), ('projective', 0.139), ('subproblems', 0.137), ('mk', 0.134), ('arcs', 0.127), ('decomposition', 0.127), ('automata', 0.125), ('mcdonald', 0.117), ('ib', 0.11), ('subgradient', 0.107), ('speeds', 0.107), ('dependency', 0.105), ('turbo', 0.103), ('pruning', 0.103), ('vine', 0.089), ('aihn', 0.087), ('pys', 0.087), ('runtimes', 0.085), ('parsing', 0.084), ('quadratic', 0.078), ('fs', 0.077), ('smith', 0.068), ('lisboa', 0.067), ('combinatorial', 0.059), ('maximized', 0.059), ('aghs', 0.058), ('ahou', 0.058), ('pkp', 0.058), ('qss', 0.058), ('zsi', 0.058), ('portugal', 0.057), ('eisner', 0.055), ('petrov', 0.055), ('consecutive', 0.054), ('tree', 0.054), ('priberam', 0.052), ('sib', 0.052), ('conv', 0.052), ('plm', 0.052), ('specialized', 0.051), ('parse', 0.049), ('arc', 0.048), ('komodakis', 0.048), ('figueiredo', 0.048), ('collins', 0.047), ('siblings', 0.047), ('accuracies', 0.047), ('nivre', 0.047), ('ua', 0.045), ('chu', 0.045), ('limiting', 0.043), ('heads', 0.043), ('subproblem', 0.043), ('instituto', 0.043), ('multipliers', 0.041), ('accelerated', 0.041), ('aguiar', 0.041), ('runtime', 0.04), ('fast', 0.04), ('grandparent', 0.039), ('incoming', 0.039), ('views', 0.039), ('programming', 0.039), ('hh', 0.038), ('faster', 0.038), ('oracle', 0.037), ('hth', 0.037), ('nocedal', 0.037), ('tuple', 0.037), ('mp', 0.035), ('alternating', 0.035), ('zhang', 0.035), ('solved', 0.035), ('sibling', 0.033), ('uas', 0.033), ('ubler', 0.033), ('solving', 0.033), ('equality', 0.033), ('arxiv', 0.033), ('slower', 0.032), ('carreras', 0.031), ('dynamic', 0.03), ('optimum', 0.03), ('convex', 0.03), ('datasets', 0.03), ('solutions', 0.03), ('components', 0.03), ('blocks', 0.03), ('dutch', 0.03), ('handle', 0.029), ('parser', 0.029), ('cmu', 0.029), ('surdeanu', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers

Author: Andre Martins ; Miguel Almeida ; Noah A. Smith

Abstract: We present fast, accurate, direct non-projective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive with projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).

2 0.24676585 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

Author: Miguel Almeida ; Andre Martins

Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.

3 0.19538078 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching

Author: Jinho D. Choi ; Andrew McCallum

Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transitionbased parser that uses beam search.

4 0.18685035 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

Author: Greg Coppola ; Mark Steedman

Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.

5 0.13633691 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing

Author: Guangyou Zhou ; Jun Zhao

Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.

6 0.1296552 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

7 0.12613244 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint

8 0.12468731 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy

9 0.12360664 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model

10 0.12046111 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition

11 0.11357526 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

12 0.11352443 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

13 0.11286007 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

14 0.099423006 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

15 0.098463148 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction

16 0.093679965 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

17 0.092585027 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

18 0.084561169 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

19 0.082221419 237 acl-2013-Margin-based Decomposed Amortized Inference

20 0.080953248 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.164), (1, -0.109), (2, -0.176), (3, 0.037), (4, -0.107), (5, -0.005), (6, 0.114), (7, -0.025), (8, -0.069), (9, -0.127), (10, 0.022), (11, -0.057), (12, -0.116), (13, -0.082), (14, 0.082), (15, 0.078), (16, -0.04), (17, -0.037), (18, 0.066), (19, -0.025), (20, 0.01), (21, 0.078), (22, -0.039), (23, 0.081), (24, -0.069), (25, 0.034), (26, -0.077), (27, -0.016), (28, 0.024), (29, -0.016), (30, 0.086), (31, -0.01), (32, -0.053), (33, 0.078), (34, 0.017), (35, 0.068), (36, 0.023), (37, -0.061), (38, 0.14), (39, 0.053), (40, -0.038), (41, -0.04), (42, -0.042), (43, -0.082), (44, -0.061), (45, 0.002), (46, -0.028), (47, 0.013), (48, -0.048), (49, -0.13)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94831127 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers

Author: Andre Martins ; Miguel Almeida ; Noah A. Smith

Abstract: We present fast, accurate, direct non-projective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive with projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).

2 0.73110974 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

Author: Matthew R. Gormley ; Jason Eisner

Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ?) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.

3 0.73037076 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing

Author: Guangyou Zhou ; Jun Zhao

Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.

4 0.68433458 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

Author: Greg Coppola ; Mark Steedman

Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.

5 0.66234112 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint

Author: Jun Suzuki ; Masaaki Nagata

Abstract: This paper proposes a framework of supervised model learning that realizes feature grouping to obtain lower complexity models. The main idea of our method is to integrate a discrete constraint into model learning with the help of the dual decomposition technique. Experiments on two well-studied NLP tasks, dependency parsing and NER, demonstrate that our method can provide state-of-the-art performance even if the degrees of freedom in trained models are surprisingly small, i.e., 8 or even 2. This significant benefit enables us to provide compact model representation, which is especially useful in actual use.

6 0.6392436 237 acl-2013-Margin-based Decomposed Amortized Inference

7 0.63141811 382 acl-2013-Variational Inference for Structured NLP Models

8 0.63073003 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

9 0.60287571 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching

10 0.58925223 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy

11 0.55444509 335 acl-2013-Survey on parsing three dependency representations for English

12 0.53452986 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

13 0.52460259 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

14 0.52180737 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

15 0.51623148 94 acl-2013-Coordination Structures in Dependency Treebanks

16 0.51134515 288 acl-2013-Punctuation Prediction with Transition-based Parsing

17 0.49996352 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model

18 0.49488083 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

19 0.46140769 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

20 0.45558527 275 acl-2013-Parsing with Compositional Vector Grammars


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.51), (6, 0.046), (11, 0.06), (24, 0.019), (26, 0.034), (28, 0.014), (35, 0.041), (42, 0.037), (48, 0.048), (70, 0.035), (88, 0.018), (90, 0.015), (95, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97720563 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity

Author: Daniel Bar ; Torsten Zesch ; Iryna Gurevych

Abstract: We present DKPro Similarity, an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. DKPro Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures. In order to promote the reproducibility of experimental results and to provide reliable, permanent experimental conditions for future studies, DKPro Similarity additionally comes with a set of full-featured experimental setups which can be run out-of-the-box and be used for future systems to built upon.

2 0.96920329 269 acl-2013-PLIS: a Probabilistic Lexical Inference System

Author: Eyal Shnarch ; Erel Segal-haLevi ; Jacob Goldberger ; Ido Dagan

Abstract: This paper presents PLIS, an open source Probabilistic Lexical Inference System which combines two functionalities: (i) a tool for integrating lexical inference knowledge from diverse resources, and (ii) a framework for scoring textual inferences based on the integrated knowledge. We provide PLIS with two probabilistic implementations of this framework. PLIS is available for download and developers of text processing applications can use it as an off-the-shelf component for injecting lexical knowledge into their applications. PLIS is easily configurable, components can be extended or replaced with user generated ones to enable system customization and further research. PLIS includes an online interactive viewer, which is a powerful tool for investigating lexical inference processes.
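The chain-building idea the abstract describes — combining directed inference links drawn from several lexical resources into transitive chains between text and hypothesis terms, then scoring each chain — can be sketched in a few lines. Everything below (the link sets, the per-resource reliabilities, the length-decay factor) is invented for illustration; PLIS's actual probabilistic models (HN-PLM and M-PLM) are more elaborate.

```python
# Hypothetical inference links: resource name -> set of directed (from, to) pairs.
links = {
    "wordnet":   {("reveal", "discover"), ("navigator", "explorer")},
    "wikipedia": {("Columbus", "navigator"), ("America", "New World")},
}
# Invented per-resource reliability priors.
reliability = {"wordnet": 0.9, "wikipedia": 0.7}

def chains(src, dst, max_len):
    """All inference chains from src to dst of length <= max_len.

    Links are assumed transitive, so each frontier term can be expanded
    again until the user-set transitivity limit is reached.
    """
    frontier = [(src, [])]
    found = []
    for _ in range(max_len):
        nxt = []
        for term, path in frontier:
            for res, edges in links.items():
                for a, b in edges:
                    if a == term:
                        new_path = path + [(res, a, b)]
                        if b == dst:
                            found.append(new_path)
                        else:
                            nxt.append((b, new_path))
        frontier = nxt
    return found

def chain_score(chain, decay=0.8):
    """Toy validity score: product of resource reliabilities with a length penalty."""
    score = 1.0
    for res, _, _ in chain:
        score *= reliability[res] * decay
    return score

# A two-link chain combining knowledge from both resources.
cols_chains = chains("Columbus", "explorer", max_len=2)
```

A longer chain multiplies in more decay factors, so its score drops with length — one of the behaviors a real inference model over such chains would want.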

3 0.96478772 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures

Author: Sean Szumlanski ; Fernando Gomez ; Valerie K. Sims

Abstract: We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
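The per-subject correlation reported above can be reproduced in miniature. The ratings below are invented, and the subject-vs-means correlation is computed in the simplified form where each subject's own ratings contribute to the item means (which slightly inflates r); this is a sketch of the evaluation idea, not the authors' exact procedure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical relatedness ratings: rows are subjects, columns are noun pairs.
ratings = [
    [1.0, 3.0, 4.5, 2.0, 5.0],
    [1.5, 2.5, 4.0, 2.5, 4.5],
    [2.0, 3.5, 5.0, 1.5, 4.0],
]
# Per-item means over subjects (the "norms").
means = [sum(col) / len(col) for col in zip(*ratings)]
# Each subject's correlation to the norms, and the average across subjects.
subject_rs = [pearson(subj, means) for subj in ratings]
avg_r = sum(subject_rs) / len(subject_rs)
```

With real data one would also compute a leave-one-out version (each subject correlated against the means of the remaining subjects) to avoid the inflation noted above.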

same-paper 4 0.9645099 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers

Author: Andre Martins ; Miguel Almeida ; Noah A. Smith

Abstract: We present fast, accurate, direct non-projective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive with projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).

5 0.95687062 150 acl-2013-Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications

Author: Georgios Kontonatsios ; Paul Thompson ; Riza Theresa Batista-Navarro ; Claudiu Mihaila ; Ioannis Korkontzelos ; Sophia Ananiadou

Abstract: U-Compare is a UIMA-based workflow construction platform for building natural language processing (NLP) applications from heterogeneous language resources (LRs), without the need for programming skills. U-Compare has been adopted within the context of the METANET Network of Excellence, and over 40 LRs that process 15 European languages have been added to the U-Compare component library. In line with METANET's aims of increasing communication between citizens of different European countries, U-Compare has been extended to facilitate the development of a wider range of applications, including both multilingual and multimodal workflows. The enhancements exploit the UIMA Subject of Analysis (Sofa) mechanism, that allows different facets of the input data to be represented. We demonstrate how our customised extensions to U-Compare allow the construction and testing of NLP applications that transform the input data in different ways, e.g., machine translation, automatic summarisation and text-to-speech.

6 0.95510554 277 acl-2013-Part-of-speech tagging with antagonistic adversaries

7 0.8996107 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions

8 0.84158647 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling

9 0.80653435 118 acl-2013-Development and Analysis of NLP Pipelines in Argo

10 0.75569582 105 acl-2013-DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation

11 0.73791784 51 acl-2013-AnnoMarket: An Open Cloud Platform for NLP

12 0.67622447 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

13 0.67430782 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

14 0.67029279 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit

15 0.66919088 297 acl-2013-Recognizing Partial Textual Entailment

16 0.65104163 237 acl-2013-Margin-based Decomposed Amortized Inference

17 0.64630359 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

18 0.64404845 385 acl-2013-WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

19 0.64240557 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments

20 0.61967951 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
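The similarity scores attached to the entries above come from a TF-IDF model (per the page header). A minimal sketch of how such scores could arise — TF-IDF weighting plus cosine similarity against the query paper — is below. The toy tokenized "abstracts" and the smoothed IDF formula are illustrative assumptions, not the site's actual pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Smoothed IDF: +1 so terms appearing in every document keep a small weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_similar(query_idx, docs):
    """Rank all other documents by cosine similarity to the query document."""
    vecs = tfidf_vectors(docs)
    q = vecs[query_idx]
    scores = [(i, cosine(q, v)) for i, v in enumerate(vecs) if i != query_idx]
    return sorted(scores, key=lambda p: -p[1])

abstracts = [
    "non-projective dependency parsers with third-order features and dual decomposition".split(),
    "probabilistic lexical inference system combining lexical knowledge resources".split(),
    "fast dependency parsers with dual decomposition for non-projective parsing".split(),
]
ranked = rank_similar(0, abstracts)
```

Document 2 shares many terms with the query (document 0), so it ranks first; document 1 shares none, so its cosine score is zero — mirroring how the listing above orders papers by descending score.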