emnlp emnlp2010 emnlp2010-110 knowledge-graph by maker-knowledge-mining

110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference


Source: pdf

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 pt Abstract We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. [sent-12, score-0.274]

2 By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. [sent-14, score-0.175]

3 We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. [sent-15, score-0.159]

4 The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. [sent-16, score-0.092]

5 Often, inference with such models becomes computationally intractable, causing a demand for understanding and improving approximate parsing algorithms. [sent-22, score-0.115]

6 In this paper, we show a formal connection between two recently-proposed approximate inference techniques for non-projective dependency parsing: loopy belief propagation (Smith and Eisner, 2008) and linear programming relaxation (Martins et al. [sent-23, score-0.399]

7 While those two parsers are differently motivated, we show that both correspond to inference in [sent-25, score-0.071]

8 a factor graph, and both optimize objective functions over local approximations of the marginal polytope. [sent-29, score-0.385]

9 The connection is made clear by writing the explicit declarative optimization problem underlying Smith and Eisner (2008) and by showing the factor graph underlying Martins et al. [sent-30, score-0.303]

10 Our contributions are not limited to dependency parsing: we present a general method for inference in factor graphs with hard constraints (§2), which extends some combinatorial factors considered by Smith and Eisner (2008). [sent-34, score-0.383]

11 After presenting a geometric view of the variational approximations underlying message-passing algorithms (§3), and closing the gap between the two aforementioned parsers (§4), we consider the problem of learning the model parameters (§5). [sent-35, score-0.26]

12 Our experiments (§6) show state-of-the-art performance on dependency parsing benchmarks. [sent-43, score-0.083]

13 (1993) for which decoding algorithms are equivalent to running belief propagation in a graph with loops (McEliece et al. [sent-45, score-0.227]

14 For example, in unlabeled dependency parsing, I is the number of candidate dependency arcs (quadratic in the sentence length), and each Yi = {0, 1}. [sent-62, score-0.181]

15 1: it is a bipartite graph Gx comprised of variable nodes {1, . [sent-73, score-0.094]

16 , I} and factor nodes C ∈ C, with an edge connecting the i-th variable node and a factor node C iff i ∈ C. [sent-76, score-0.175]

17 Hence, the factor graph Gx makes the direct dependencies explicit. [sent-77, score-0.066]

18 This requires hard constraint factors that rule out forbidden partial assignments by mapping them to zero potential values. [sent-88, score-0.25]

19 See Table 1 for an inventory of hard constraint factors used in this paper. [sent-89, score-0.275]

20 We let the soft factor potentials take the form ΨC(x, yC) = exp(θ⊤φC(x, yC)), where θ ∈ Rd is a vector of parameters (shared across factors) and φC(x, yC) is a local feature vector. [sent-92, score-0.292]
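
(Illustrative sketch, not from the paper.) In code, a single soft-factor potential of this form is just an exponentiated dot product; the numbers below are arbitrary:

    import numpy as np

    theta = np.array([0.5, -1.0, 2.0])      # parameter vector, shared across factors
    phi_C = np.array([1.0, 0.0, 1.0])       # local feature vector phi_C(x, y_C)
    Psi_C = float(np.exp(theta @ phi_C))    # Psi_C(x, y_C) = exp(theta . phi_C(x, y_C))
    print(Psi_C)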

21 Smith and Eisner (2008) proposed a factor graph representation for dependency parsing (Fig. [sent-96, score-0.324]

22 The graph has O(n2) variable nodes (n is the sentence length), one per candidate arc a = ⟨h, m⟩ linking a head h and a modifier m. [sent-98, score-0.329]

23 Outputs are binary, with ya = 1 iff arc a belongs to the dependency tree. [sent-99, score-0.233]

24 There is a hard factor TREE connected to all variables, that constrains the overall arc configurations to form a spanning tree. [sent-100, score-0.396]

25 There is a unary soft factor per arc, whose log-potential reflects the score of that arc. [sent-101, score-0.221]

26 There are also O(n3) pairwise factors; their log-potentials reflect the scores of sibling and grandparent arcs. [sent-102, score-0.118]

27 These factors create loops, thus calling for approximate inference. [sent-103, score-0.228]
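
(Illustrative sketch, not from the paper.) A minimal Python enumeration of the factor graph just described for a sentence of length n; variable and factor names are ours, and the exact set of sibling/grandparent factors used by Smith and Eisner (2008) may differ slightly:

    from itertools import product

    def build_factor_graph(n):
        """Skeleton of the dependency factor graph for n words (0 = root)."""
        # O(n^2) binary arc variables <h, m>, one per candidate head/modifier pair
        arcs = [(h, m) for h in range(n + 1) for m in range(1, n + 1) if h != m]
        # one hard TREE factor connected to every arc variable
        factors = [("TREE", tuple(arcs))]
        # O(n^3) soft GRAND factors over arc pairs <g, h> and <h, m>
        factors += [("GRAND", ((g, h), (h, m)))
                    for g, h, m in product(range(n + 1), range(1, n + 1), range(1, n + 1))
                    if len({g, h, m}) == 3]
        # O(n^3) soft SIB factors over arc pairs <h, m> and <h, m2> sharing a head
        factors += [("SIB", ((h, m), (h, m2)))
                    for h in range(n + 1)
                    for m in range(1, n + 1) for m2 in range(m + 1, n + 1)
                    if h != m and h != m2]
        return arcs, factors

    arcs, factors = build_factor_graph(3)
    print(len(arcs), "arc variables,", len(factors), "factors")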

28 , 2005),2 and computing posterior marginals for all arcs takes O(n3) time via the matrix-tree theorem (Smith and Smith, 2007; Koo et al. [sent-105, score-0.18]
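
(Illustrative sketch, not from the paper.) The matrix-tree computation referred to here can be written out as follows, using the standard formulation of Smith and Smith (2007) / Koo et al. (2007) for (multi-root) spanning arborescences; the function and array names are ours:

    import numpy as np

    def arc_marginals(scores):
        """scores[h, m] = log-potential of arc <h, m>, h in 0..n (0 = root), m in 1..n.
        Returns log Z(x) and the posterior marginal of every candidate arc."""
        n = scores.shape[1] - 1
        psi = np.exp(scores)
        L = np.zeros((n, n))                      # Laplacian with the root row/column removed
        for m in range(1, n + 1):
            L[m - 1, m - 1] = sum(psi[h, m] for h in range(n + 1) if h != m)
            for h in range(1, n + 1):
                if h != m:
                    L[h - 1, m - 1] = -psi[h, m]
        _, logZ = np.linalg.slogdet(L)            # matrix-tree theorem: Z = det(L)
        Linv = np.linalg.inv(L)
        mu = np.zeros_like(psi)
        for m in range(1, n + 1):
            mu[0, m] = psi[0, m] * Linv[m - 1, m - 1]
            for h in range(1, n + 1):
                if h != m:
                    mu[h, m] = psi[h, m] * (Linv[m - 1, m - 1] - Linv[m - 1, h - 1])
        return logZ, mu

    scores = np.random.randn(4, 4)                # toy sentence with n = 3 words
    logZ, mu = arc_marginals(scores)
    print(mu[:, 1:].sum(axis=0))                  # each word's incoming marginals sum to 1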

29 Two of these factors (TREE and XOR) had been proposed by Smith and Eisner (2008); we provide further information (max-product messages, entropies, and local agreement constraints). [sent-123, score-0.251]

30 This inventory covers many cases, since the above formulae can be extended to the case where some inputs are negated: just replace the corresponding messages by their reciprocal, vi by 1 − vi, etc. [sent-125, score-0.107]

31 This allows building factors NAND (an OR factor with negated inputs), IMPLY (a 2-input OR with the first input negated), and XOR-WITH-OUTPUT (an XOR factor with the last input negated). [sent-126, score-0.593]

32 In sum-product BP, the messages take the form:3 Mi→C(yi) ∝ Π_{D≠C} MD→i(yi) (4) and MC→i(yi) ∝ Σ_{yC∼yi} ΨC(yC) Π_{j≠i} Mj→C(yj) (5). [sent-127, score-0.082]

33 Upon convergence, variable and factor beliefs are computed as: τi(yi) ∝ Π_C MC→i(yi) (6) and τC(yC) ∝ ΨC(yC) Π_{i∈C} Mi→C(yi) (7). [sent-130, score-0.244]

34 BP is exact when the factor graph is a tree: in the sum-product case, the beliefs in Eqs. [sent-131, score-0.216]

35 6–7 correspond to the exact marginals. 3We employ the standard ∼ notation, where a summation or maximization indexed by yC ∼ yi is over all yC with the i-th component held fixed and set to yi. [sent-132, score-0.203]
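
(Illustrative sketch, not from the paper.) A compact transcription of the sum-product updates in Eqs. 4–7 for tabular factors over binary variables; the data structures are ours, and the toy example is tree-structured, so the beliefs it prints are exact:

    import itertools
    import numpy as np

    def normalize(m):
        s = m.sum()
        return m / s if s > 0 else np.full_like(m, 1.0 / len(m))

    def sum_product_bp(num_vars, factors, iters=50):
        """factors: list of (vars, table); `table` maps each 0/1 assignment of `vars`
        (a tuple) to a nonnegative potential Psi_C(yC). Returns variable beliefs (Eq. 6)."""
        edges = [(i, f) for f, (vs, _) in enumerate(factors) for i in vs]
        msg_vf = {(i, f): np.ones(2) for (i, f) in edges}        # variable -> factor
        msg_fv = {(f, i): np.ones(2) for (i, f) in edges}        # factor -> variable
        for _ in range(iters):
            for (i, f) in edges:                                 # Eq. 4
                m = np.ones(2)
                for (j, g) in edges:
                    if j == i and g != f:
                        m = m * msg_fv[(g, j)]
                msg_vf[(i, f)] = normalize(m)
            for f, (vs, table) in enumerate(factors):            # Eq. 5: sum over yC ~ yi
                for i in vs:
                    m = np.zeros(2)
                    for assign in itertools.product([0, 1], repeat=len(vs)):
                        val = table[assign]
                        for j, yj in zip(vs, assign):
                            if j != i:
                                val *= msg_vf[(j, f)][yj]
                        m[assign[vs.index(i)]] += val
                    msg_fv[(f, i)] = normalize(m)
        return [normalize(np.prod([msg_fv[(f, i)] for (j, f) in edges if j == i], axis=0))
                for i in range(num_vars)]                        # Eq. 6

    # toy example: an XOR factor over three variables plus unary soft factors;
    # this graph has no loops, so the printed beliefs are exact marginals
    xor = {a: float(sum(a) == 1) for a in itertools.product([0, 1], repeat=3)}
    def unary(w): return {(0,): 1.0, (1,): w}
    factors = [((0, 1, 2), xor), ((0,), unary(2.0)), ((1,), unary(1.0)), ((2,), unary(1.0))]
    print(sum_product_bp(3, factors))   # approx. [0.5, 0.5], [0.75, 0.25], [0.75, 0.25]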

36 In graphs with loops, BP is an approximate method, not guaranteed to converge, nicknamed loopy BP. [sent-134, score-0.258]

37 We highlight a variational perspective of loopy BP in §3; for now, we consider algorithmic issues. [sent-135, score-0.278]

38 Note that computing the factor-to-variable messages for each factor C (Eq. [sent-136, score-0.257]

39 Fortunately, for all the hard constraint factors in rows 3–5 of Table 1, this computation can be done in linear time (and polynomial for the TREE factor)—this extends results presented in Smith and Eisner (2008). [sent-138, score-0.25]

40 4The insight behind these speed-ups is that messages on binary-valued potentials can be expressed as MC→i(yi) ∝ [...]. Figure 1: Factor graph corresponding to the dependency parsing model of Smith and Eisner (2008) with sibling and grandparent features. [sent-139, score-0.338]

41 Circles denote variable nodes, and squares denote factor nodes. [sent-140, score-0.203]

42 Note the loops created by the inclusion of pairwise factors (GRAND and SIB). [sent-141, score-0.296]
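
(Illustrative sketch, not from the paper.) One instance of the linear-time speed-up mentioned above, worked out for the XOR factor (exactly one of the connected variables may be 1); this is our own transcription of the standard trick, assuming strictly positive incoming messages:

    def xor_factor_messages(incoming):
        """incoming[i] = (M_{i->C}(0), M_{i->C}(1)). Returns all factor-to-variable
        messages of an XOR factor in O(I) total time instead of the naive O(I^2)."""
        prod0 = 1.0                               # prod_j M_j(0)
        ratio_sum = 0.0                           # sum_j M_j(1) / M_j(0)
        for m0, m1 in incoming:
            prod0 *= m0
            ratio_sum += m1 / m0
        out = []
        for m0, m1 in incoming:
            rest0 = prod0 / m0                    # prod_{j != i} M_j(0)
            msg1 = rest0                          # y_i = 1: all other variables are 0
            msg0 = rest0 * (ratio_sum - m1 / m0)  # y_i = 0: exactly one other variable is 1
            z = msg0 + msg1
            out.append((msg0 / z, msg1 / z))
        return out

    print(xor_factor_messages([(1.0, 2.0), (1.0, 1.0), (1.0, 1.0)]))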

43 We next present an alternative parametrization for the distributions in Px in terms of factor marginals. [sent-149, score-0.235]

44 We will see that each distribution can be seen as a point in the socalled marginal polytope (Wainwright and Jordan, 2008); this will pave the way for the variational representations to be derived next. [sent-150, score-0.434]

45 A part is a pair ⟨C, yC⟩, where C is a soft factor and yC a partial output assignment. [sent-152, score-0.221]

46 The marginal polytope is the convex hull5 of all the “valid” output indicator vectors: M(Gx) ≜ conv{χ(y) | y ∈ Y(x)}. [sent-167, score-0.368]

47 Note that M(Gx) only depends on the factor graph Gx and the hard constraints (i. [sent-168, score-0.323]

48 The following variational representation for the log-partition function (mentioned in Eq. [sent-190, score-0.128]

49 Figure 2: Dual parametrization of the distributions in Px (parameter space, factor log-potentials, marginal polytope). [sent-197, score-0.06]

50 Our parameter space (left) is first linearly mapped to the space of factor log-potentials (middle). [sent-198, score-0.175]

51 The latter is mapped to the marginal polytope M(Gx) (right). [sent-199, score-0.306]

52 9 is convex and its solution is attained at the factor marginals, i. [sent-204, score-0.237]

53 4 Approximate Inference & Turbo Parsing We now show how the variational machinery just described relates to message-passing algorithms and provides a common framework for analyzing two recent dependency parsers. [sent-217, score-0.181]

54 1 Loopy BP as a Variational Approximation For general factor graphs with loops, the marginal polytope M(Gx) cannot be compactly specified and the entropy term H(µ) lacks a closed form, rendering exact optimizations in Eqs. [sent-221, score-0.529]

55 A popular approximate algorithm for marginal inference is sum-product loopy BP, which passes messages as described in §2 and, upon convergence, computes beliefs via Eqs. [sent-223, score-0.463]

56 Were loopy BP exact, these beliefs would be the true marginals and hence a point in the marginal polytope M(Gx). [sent-225, score-0.602]

57 (2001) and others, who first analyzed loopy BP from a variational perspective. [sent-227, score-0.278]

58 The following two approximations underlie loopy BP: • The marginal polytope M(Gx) is approximated by the local polytope L(Gx). [sent-228, score-0.75]

59 This is an outer bound; its name derives from the fact that it only imposes local agreement constraints, ∀i, yi ∈ Yi, C ∈ C: Σ_{yi} τi(yi) = 1 and Σ_{yC∼yi} τC(yC) = τi(yi). [sent-229, score-0.327]

60 • The entropy H is replaced by its Bethe approximation HBethe(τ) ≜ Σ_i (1 − di) H(τi) + Σ_{C∈C} H(τC), where di ≜ |{C | i ∈ C}| is the number of factors connected to the i-th variable, H(τi) ≜ −Σ_{yi} τi(yi) log τi(yi), and H(τC) ≜ −Σ_{yC} τC(yC) log τC(yC). [sent-236, score-0.197]
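
(Illustrative sketch, not from the paper.) A direct transcription of the Bethe approximation above; the inputs would be beliefs such as those returned by the BP sketch earlier, and the names are ours:

    import numpy as np

    def bethe_entropy(var_beliefs, factor_beliefs, factor_vars):
        """H_Bethe(tau) = sum_i (1 - d_i) H(tau_i) + sum_C H(tau_C),
        with d_i the number of factors touching variable i."""
        def H(p):
            p = np.asarray(p, dtype=float).ravel()
            p = p[p > 0]
            return float(-np.sum(p * np.log(p)))
        degree = {}
        for vs in factor_vars:
            for i in vs:
                degree[i] = degree.get(i, 0) + 1
        return (sum((1 - degree.get(i, 0)) * H(b) for i, b in enumerate(var_beliefs))
                + sum(H(b) for b in factor_beliefs))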

61 Any stationary point of sum-product BP is a local optimum of the variational problem in Eq. [sent-237, score-0.172]

62 Table 1 shows closed form expressions for the local agreement constraints and entropies of some hard-constraint factors, obtained by invoking Eq. [sent-241, score-0.169]

63 2 Two Dependency Turbo Parsers We next present our main contribution: a formal connection between two recent approximate dependency parsers, which at first sight appear unrelated. [sent-246, score-0.113]

64 Recall that (i) Smith and Eisner (2008) proposed a factor graph (Fig. [sent-247, score-0.241]

65 1) in which they run loopy BP, and that (ii) Martins et al. [sent-248, score-0.15]

66 (2009) approximate parsing as the solution of a linear program. [sent-249, score-0.09]

67 Here, we fill the blanks in the two approaches: we derive explicitly the variational problem addressed in (i) and we provide the underlying factor graph in (ii). [sent-250, score-0.4]

68 This puts the two approaches side-by-side as approximate methods for marginal and MAP inference. [sent-251, score-0.171]

69 11) that ignore the loops in their graphical models, we dub them turbo parsers by analogy with error-correcting turbo decoders (see footnote 1). [sent-253, score-0.416]

70 1—call it Gx—includes pairwise soft factors connecting sibling and grandparent arcs. [sent-256, score-0.332]

71 6 We next characterize the local polytope L (Gx) and the Bethe approximation HBethe inherent in Smith and Eisner’s loopy BP algorithm. [sent-257, score-0.389]

72 Let A be the set of candidate arcs, and P ⊆ A2 the set of pairs of arcs that have factors. [sent-258, score-0.075]

73 Since all variables are binary, we may write, for each a ∈ A, τa(1) = za and τa(0) = 1 − za, where each za is a variable constrained to [0, 1]. [sent-260, score-0.432]

74 Let zA ≜ ⟨za⟩a∈A; the local agreement constraints at the TREE factor (see Table 1) are written as zA ∈ Ztree(x), where Ztree(x) is the arborescence polytope, i. [sent-261, score-0.124]

75 , the convex hull of all incidence vectors of dependency trees (Martins et al. [sent-263, score-0.145]
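
(Illustrative sketch, not from the paper.) For a toy sentence the arborescence polytope can be made concrete by brute force: enumerate every head assignment, keep those that form a tree rooted at 0, and take the convex hull of their incidence vectors. The sketch below uses the multi-root variant, in which the root may take several children:

    from itertools import product

    def tree_incidence_vectors(n):
        """All incidence vectors of dependency trees for n words (brute force)."""
        arcs = [(h, m) for h in range(n + 1) for m in range(1, n + 1) if h != m]
        vectors = []
        for heads in product(range(n + 1), repeat=n):    # heads[m-1] = head of word m
            if any(heads[m - 1] == m for m in range(1, n + 1)):
                continue
            ok = True
            for m in range(1, n + 1):                    # every word must reach the root
                seen, cur = set(), m
                while cur != 0 and ok:
                    if cur in seen:
                        ok = False
                    seen.add(cur)
                    cur = heads[cur - 1]
            if ok:
                chosen = {(heads[m - 1], m) for m in range(1, n + 1)}
                vectors.append([int(a in chosen) for a in arcs])
        return arcs, vectors

    arcs, vecs = tree_incidence_vectors(2)
    print(len(vecs), "trees; Ztree(x) = conv", vecs)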

76 The approximate variational expression becomes log Zx(θ) ≈ max_z θ⊤F(x)z + Htree(zA) − Σ_{⟨a,b⟩∈P} Ia;b(za, zb, zab), subject to: [sent-270, score-0.089]

77 zab ≤ za, zab ≤ zb, zab ≥ za + zb − 1, ∀⟨a, b⟩ ∈ P, with τab(1, 1) = zab, τab(0, 0) = 1 − za − zb + zab, τab(1, 0) = za − zab, τab(0, 1) = zb − zab. [sent-272, score-1.455]

78 These inequalities (Eq. 12), along with zA ∈ Ztree(x), define the local polytope L(Gx). [sent-275, score-0.239]
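
(Illustrative sketch, not from the paper.) The pairwise part of L(Gx) is easy to sanity-check in code: any (za, zb, zab) satisfying the three inequalities yields a valid joint distribution τab consistent with the unary values. Names are ours:

    def pairwise_tau(za, zb, zab, tol=1e-9):
        """Build tau_ab from (za, zb, zab) under the local-polytope inequalities
        zab <= za, zab <= zb, zab >= za + zb - 1."""
        assert zab <= za + tol and zab <= zb + tol and zab >= za + zb - 1 - tol
        tau = {(1, 1): zab,
               (0, 0): 1 - za - zb + zab,
               (1, 0): za - zab,
               (0, 1): zb - zab}
        assert all(v >= -tol for v in tau.values())        # a proper distribution...
        assert abs(sum(tau.values()) - 1) < 1e-6
        assert abs(tau[(1, 1)] + tau[(1, 0)] - za) < 1e-6  # ...that marginalizes to za
        return tau

    print(pairwise_tau(0.6, 0.7, 0.5))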

79 As for the factor entropies, start by noting that the TREE-factor entropy Htree can be obtained in closed form by computing the marginals zA and the partition function Zx(θ) (via the matrix-tree theorem) and recalling the variational representation in Eq. [sent-276, score-0.437]

80 We next construct a factor graph G0x and show that the LP relaxation corresponds to an optimization of the form in Eq. [sent-282, score-0.281]

81 10, with the marginal polytope M(G0x) replaced by L (G0x). [sent-283, score-0.306]

82 G0x includes the following auxiliary variable nodes: path variables ⟨pij⟩i,j=0,. [sent-284, score-0.093]

83 ,n, which indicate whether word j descends from word i in the dependency tree, and flow variables ⟨f_a^k⟩a∈A,k=1,. [sent-290, score-0.175]

84 ,n, which evaluate to 1 iff arc a “carries flow” to k, i. [sent-293, score-0.18]

85 , iff there is a path from the root to k that passes through a. [sent-295, score-0.065]

86 , any word descends from the root and from itself, and arcs leaving a word carry no flow to that word. [sent-298, score-0.162]

87 This can be done with unary hard constraint factors. [sent-299, score-0.082]

88 3: • O(n) XOR factors, each connecting all arc variables of the form {⟨h, m⟩}h=0,. [sent-302, score-0.235]

89 Each such factor yields a local agreement constraint (see Table 1): Σ_{h=0}^{n} z⟨h,m⟩ = 1, for every m ∈ {1, . [sent-307, score-0.299]

90 , n} (16). • O(n3) IMPLY factors, each expressing that if an arc carries flow, then that arc must be active. [sent-310, score-0.181]

91 Such factors are OR factors with the first input negated; hence, the local agreement constraints are: f_a^k ≤ za, for every a ∈ A, k ∈ {1, . [sent-311, score-0.46]

92 (17) • O(n2) XOR-WITH-OUTPUT factors, which impose the constraint that each path variable pmk is active if and only if exactly one incoming arc in {⟨h, m⟩}h=0,. [sent-315, score-0.399]

93 Such factors are XOR factors with the last input negated, and hence their local constraints are: pmk = Σ_{h=0}^{n} f_{⟨h,m⟩}^k, for m, k ∈ {1, . [sent-319, score-0.456]

94 , n} (18). • O(n2) XOR-WITH-OUTPUT factors to impose the constraint that words don’t consume other words’ commodities; i. [sent-322, score-0.168]

95 e., phk is active, for h ≠ k, iff exactly one outgoing arc in {⟨h, m⟩}m=1,. [sent-325, score-0.262]

96 ,n carries flow to k: phk = Σ_{m=1}^{n} f_{⟨h,m⟩}^k, for h, k ∈ {0, . [sent-328, score-0.087]

97 (2009) in their multi-commodity flow model, for the configuration with siblings and grandparent features. [sent-340, score-0.169]
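
(Illustrative sketch, not from the paper.) The local agreement constraints (16)–(19) can be enumerated mechanically for a small sentence; the sketch below prints them as strings, with variable names of our choosing (z = arc indicators, f = flow, p = path). The full LP of Martins et al. (2009) additionally has the objective and the pairwise constraints given earlier:

    def flow_constraints(n):
        """List constraints (16)-(19) for a sentence of n words (plus root 0)."""
        words = list(range(1, n + 1))
        heads = [0] + words
        arcs = [(h, m) for h in heads for m in words if h != m]
        cons = []
        for m in words:                               # (16) one head per word
            cons.append("sum_{h != %d} z[h,%d] = 1" % (m, m))
        for (h, m) in arcs:                           # (17) flow only on active arcs
            for k in words:
                cons.append("f[%d,%d,%d] <= z[%d,%d]" % (h, m, k, h, m))
        for m in words:                               # (18) k descends from m iff some
            for k in words:                           #      arc into m carries flow to k
                cons.append("p[%d,%d] = sum_{h != %d} f[h,%d,%d]" % (m, k, m, m, k))
        for h in heads:                               # (19) flow conservation out of h
            for k in words:
                if h != k:
                    cons.append("p[%d,%d] = sum_{m != %d} f[%d,m,%d]" % (h, k, h, h, k))
        return cons

    for c in flow_constraints(2)[:6]:
        print(c)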

98 7 They also considered a configuration with non-projectivity features—which fire if an arc is non-projective. [sent-341, score-0.212]

99 That configuration can also be obtained here if variables {n⟨h,m⟩} are [...]. 7To be precise, the constraints of Martins et al. [sent-342, score-0.108]

100 8An arc ⟨h, m⟩ is non-projective if there is some word in its span not descending from h (Kahane et al.


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('yc', 0.42), ('gx', 0.39), ('za', 0.216), ('yi', 0.203), ('polytope', 0.195), ('arc', 0.18), ('factor', 0.175), ('factors', 0.168), ('bp', 0.163), ('martins', 0.152), ('loopy', 0.15), ('turbo', 0.14), ('pr', 0.13), ('variational', 0.128), ('zab', 0.123), ('marginal', 0.111), ('zb', 0.105), ('marginals', 0.105), ('eisner', 0.103), ('loops', 0.09), ('xor', 0.09), ('hbethe', 0.088), ('flow', 0.087), ('messages', 0.082), ('hh', 0.075), ('negated', 0.075), ('arcs', 0.075), ('mc', 0.075), ('smith', 0.072), ('htree', 0.07), ('instituto', 0.07), ('sib', 0.07), ('yci', 0.07), ('ztree', 0.07), ('zx', 0.068), ('px', 0.068), ('graph', 0.066), ('convex', 0.062), ('parametrization', 0.06), ('hc', 0.06), ('approximate', 0.06), ('mi', 0.055), ('approximations', 0.055), ('ab', 0.055), ('dependency', 0.053), ('bethe', 0.053), ('grandparent', 0.05), ('graphs', 0.048), ('parsers', 0.046), ('soft', 0.046), ('entropies', 0.045), ('pyc', 0.045), ('local', 0.044), ('qi', 0.044), ('hard', 0.041), ('constraint', 0.041), ('qc', 0.041), ('beliefs', 0.041), ('constraints', 0.041), ('relaxation', 0.04), ('agreement', 0.039), ('pairwise', 0.038), ('belief', 0.038), ('aguiar', 0.035), ('csoft', 0.035), ('ecnico', 0.035), ('lisboa', 0.035), ('logzx', 0.035), ('phn', 0.035), ('pmk', 0.035), ('portugal', 0.035), ('pyi', 0.035), ('yedidia', 0.035), ('passes', 0.035), ('variables', 0.035), ('propagation', 0.033), ('configuration', 0.032), ('underlying', 0.031), ('map', 0.03), ('incoming', 0.03), ('hxe', 0.03), ('argmaxy', 0.03), ('hull', 0.03), ('family', 0.03), ('sibling', 0.03), ('path', 0.03), ('parsing', 0.03), ('partition', 0.029), ('log', 0.029), ('lp', 0.029), ('ia', 0.029), ('tree', 0.028), ('variable', 0.028), ('ex', 0.027), ('ey', 0.027), ('potentials', 0.027), ('outgoing', 0.027), ('crfs', 0.026), ('inference', 0.025), ('inventory', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999911 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

2 0.14058644 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

3 0.11566744 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

4 0.099147573 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

5 0.073010519 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

Author: Xian Qian ; Qi Zhang ; Yaqian Zhou ; Xuanjing Huang ; Lide Wu

Abstract: Many sequence labeling tasks in NLP require solving a cascade of segmentation and tagging subtasks, such as Chinese POS tagging, named entity recognition, and so on. Traditional pipeline approaches usually suffer from error propagation. Joint training/decoding in the cross-product state space could cause too many parameters and high inference complexity. In this paper, we present a novel method which integrates graph structures of two subtasks into one using virtual nodes, and performs joint training and decoding in the factorized state space. Experimental evaluations on CoNLL 2000 shallow parsing data set and Fourth SIGHAN Bakeoff CTB POS tagging data set demonstrate the superiority of our method over cross-product, pipeline and candidate reranking approaches.

6 0.072276451 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

7 0.061052695 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

8 0.058387265 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

9 0.047665726 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

10 0.046528596 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

11 0.046294346 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

12 0.045906611 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

13 0.043615907 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

14 0.039358307 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

15 0.038287055 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs

16 0.037448775 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping

17 0.033522319 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

18 0.033465024 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

19 0.033056132 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

20 0.032578804 74 emnlp-2010-Learning the Relative Usefulness of Questions in Community QA


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.136), (1, 0.069), (2, 0.098), (3, -0.045), (4, 0.007), (5, 0.103), (6, -0.003), (7, 0.169), (8, -0.019), (9, -0.067), (10, -0.209), (11, 0.127), (12, -0.116), (13, -0.058), (14, -0.01), (15, 0.012), (16, -0.022), (17, -0.009), (18, 0.064), (19, -0.063), (20, -0.014), (21, -0.026), (22, -0.042), (23, -0.094), (24, 0.03), (25, -0.02), (26, 0.029), (27, 0.045), (28, -0.128), (29, -0.057), (30, -0.083), (31, 0.042), (32, 0.152), (33, -0.008), (34, -0.03), (35, -0.097), (36, -0.099), (37, -0.077), (38, 0.055), (39, -0.035), (40, 0.16), (41, -0.165), (42, 0.264), (43, -0.033), (44, 0.063), (45, 0.11), (46, -0.12), (47, -0.16), (48, -0.218), (49, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97808248 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

2 0.41449752 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

3 0.39455456 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

4 0.36013904 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

5 0.29545391 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs

Author: Ioannis Klapaftis ; Suresh Manandhar

Abstract: Graph-based methods have gained attention in many areas of Natural Language Processing (NLP) including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulate their problem in a graph-based setting and apply unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs sig- nificantly better than traditional graph-based methods and agglomerative clustering yielding improvements over state-of-the-art WSD systems based on sense induction.

6 0.26770234 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping

7 0.25557297 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

8 0.2444118 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

9 0.24186821 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition

10 0.2312701 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

11 0.22644864 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

12 0.17786571 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

13 0.16782829 117 emnlp-2010-Using Unknown Word Techniques to Learn Known Words

14 0.16074206 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs

15 0.15915965 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

16 0.15843333 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

17 0.14838232 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

18 0.14578579 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

19 0.14028271 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

20 0.1401263 56 emnlp-2010-Hashing-Based Approaches to Spelling Correction of Personal Names


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.01), (10, 0.014), (12, 0.031), (14, 0.023), (29, 0.096), (30, 0.012), (32, 0.017), (45, 0.293), (52, 0.034), (56, 0.072), (60, 0.028), (62, 0.076), (66, 0.052), (72, 0.034), (76, 0.014), (77, 0.018), (79, 0.016), (82, 0.015), (87, 0.023), (89, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83873916 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

2 0.53727132 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

Author: Niklas Jakob ; Iryna Gurevych

Abstract: In this paper, we focus on the opinion target extraction as part of the opinion mining task. We model the problem as an information extraction task, which we address based on Conditional Random Fields (CRF). As a baseline we employ the supervised algorithm by Zhuang et al. (2006), which represents the state-of-the-art on the employed data. We evaluate the algorithms comprehensively on datasets from four different domains annotated with individual opinion target instances on a sentence level. Furthermore, we investigate the performance of our CRF-based approach and the baseline in a single- and cross-domain opinion target extraction setting. Our CRF-based approach improves the performance by 0.077, 0.126, 0.071 and 0. 178 regarding F-Measure in the single-domain extraction in the four domains. In the crossdomain setting our approach improves the performance by 0.409, 0.242, 0.294 and 0.343 regarding F-Measure over the baseline.

3 0.46269873 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

4 0.44705677 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: input. Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

5 0.44060978 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

6 0.42061818 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

7 0.4188745 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

8 0.41528705 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

9 0.41069037 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

10 0.40826517 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

11 0.40406257 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

12 0.40102813 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

13 0.40034363 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.39956698 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

15 0.39891627 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

16 0.39725348 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

17 0.39340484 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

18 0.39108455 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

19 0.39073062 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

20 0.39064509 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment