emnlp emnlp2010 emnlp2010-88 knowledge-graph by maker-knowledge-mining

88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing


Source: pdf

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. [sent-4, score-0.669]

2 The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. [sent-5, score-0.215]

3 The approach provably solves a linear programming (LP) relaxation of the global inference problem. [sent-6, score-0.511]

4 It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. [sent-7, score-0.3]

5 We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger. [sent-8, score-0.324]

6 1 Introduction Dynamic programming algorithms have been remarkably useful for inference in many NLP problems. [sent-9, score-0.287]

7 Unfortunately, as models become more complex, for example through the addition of new features or components, dynamic programming algorithms can quickly explode in terms of computational or implementational complexity. [sent-10, score-0.306]

8 This paper introduces dual decomposition (Dantzig and Wolfe, 1960; Komodakis et al. [sent-12, score-0.521]

9 Dual decomposition leverages the observation that complex inference problems can often be decomposed into efficiently solvable sub-problems. [sent-14, score-0.284]

10 The approach leads to inference algorithms with the following properties: 1The same is true for NLP inference algorithms based on other exact combinatorial methods, for example methods based on minimum-weight spanning trees (McDonald et al. [sent-15, score-0.436]

11 • The resulting algorithms are simple and efficient, building on standard dynamic-programming algorithms as oracle solvers for sub-problems,2 together with a method for forcing agreement between the oracles. [sent-17, score-0.305]

12 • The algorithms provably solve a linear programming (LP) relaxation of the original inference problem. [sent-18, score-0.386]

13 Empirically, the LP relaxation often leads to an exact solution to the original problem. [sent-19, score-0.153]

14 The connection to linear programming ensures that the algorithms provide a certificate of optimality when they recover the exact solution, and also opens up the possibility of methods that incrementally tighten the LP relaxation until it is exact (Sherali and Adams, 1994; Sontag et al. [sent-21, score-0.653]

15 We first give two examples as an illustration of the approach: 1) integrated parsing and trigram part-of-speech (POS) tagging; and 2) combined phrase-structure and dependency parsing. [sent-24, score-0.299]

16 In both settings, it is possible to solve the integrated problem through an “intersected” dynamic program (e. [sent-25, score-0.174]

17 Next, we describe exact polyhedral formulations for the two problems, building on connections between dynamic programming algorithms and marginal polytopes, as described in Martin et al. [sent-30, score-0.537]

18 These allow us to precisely characterize the relationship between the exact formulations and the 2More generally, other exact inference methods can be used as oracles, for example spanning tree algorithms for nonprojective dependency structures. [sent-32, score-0.439]

19 LP relaxations that we solve. [sent-35, score-0.169]

20 We then give guarantees of convergence for our algorithms by showing that they are instantiations of Lagrangian relaxation, a general method for solving linear programs of a particular form. [sent-36, score-0.314]

21 First, we consider the integration of the generative model for phrase-structure parsing of Collins (2003), with the second-order discriminative dependency parser of Koo et al. [sent-38, score-0.241]

22 7%; and the dual decomposition method has an F1 score of 90. [sent-44, score-0.521]

23 In a second set of experiments, we use dual decomposition to integrate the trigram POS tagger of Toutanova and Manning (2000) with the parser of Collins (2003). [sent-46, score-0.676]

24 We again find that the method finds an exact solution in almost all cases, with convergence in just a few iterations of decoding. [sent-47, score-0.167]

25 Although the focus of this paper is on dynamic programming algorithms—both in the experiments, and also in the formal results concerning marginal polytopes—it is straightforward to use other combinatorial algorithms within the approach. [sent-48, score-0.499]

26 (2010) describe a dual decomposition approach for non-projective dependency parsing, which makes use of both dynamic programming and spanning tree inference algorithms. [sent-50, score-0.908]

27 2 Related Work Dual decomposition is a classical method for solving optimization problems that can be decomposed into efficiently solvable sub-problems. [sent-51, score-0.306]

28 Our work is inspired by dual decomposition methods for inference in Markov random fields (MRFs) (Wainwright 2 et al. [sent-52, score-0.579]

29 The resulting inference algorithms provably solve an LP relaxation of the MRF inference problem, often significantly faster than commercial LP solvers (Yanover et al. [sent-56, score-0.434]

30 Our work is also related to methods that incorporate combinatorial solvers within loopy belief propagation (LBP), either for MAP inference (Duchi et al. [sent-58, score-0.252]

31 Our approach similarly makes use of combinatorial algorithms to efficiently solve subproblems of the global inference problem. [sent-60, score-0.271]

32 However, unlike LBP, our algorithms have strong theoretical guarantees, such as guaranteed convergence and the possibility of a certificate of optimality. [sent-61, score-0.167]

33 These guarantees are possible because our algorithms directly solve an LP relaxation. [sent-62, score-0.178]

34 Other work has considered LP or integer linear programming (ILP) formulations of inference in NLP (Martins et al. [sent-63, score-0.323]

35 We will see that dynamic programming algorithms such as CKY can be considered to be very efficient solvers for particular LPs. [sent-67, score-0.384]

36 In dual decomposition, these LPs—and their efficient solvers—can be embedded within larger LPs corresponding to more complex inference problems. [sent-68, score-0.4]

37 We take some care to set up notation that will allow us to make a clear connection between inference problems and linear programming. [sent-70, score-0.175]

38 wn, a parse tree is a set of rule productions of the form ⟨A → B C, i, k, j⟩ where A, B, C ∈ N, and 1 ≤ i ≤ k < j ≤ n. [sent-80, score-0.164]

39 We now define the index set for CFG parsing as I = {⟨A → B C, i, k, j⟩ : A, B, C ∈ N, 1 ≤ i ≤ k < j ≤ n}. Each parse tree is a vector y = {y_r : r ∈ I}, with y_r = 1 if rule r is in the parse tree, and y_r = 0 otherwise. [sent-94, score-0.478]

40 The optimal parse tree is y* = arg max_{y∈Y} y · θ, where y · θ = Σ_r y_r θ_r is the inner product between y and θ. We use y_r and y(r) interchangeably (similarly for θ_r and θ(r)) to refer to the r'th component of the vector y. [sent-99, score-0.179]
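
To make this vector view concrete, here is a minimal sketch (invented toy grammar, productions, and weights; not code from the paper): a parse is stored as the set of index-set entries r with y_r = 1, and its score is the inner product y · θ.

```python
# Minimal sketch (illustrative only): a parse tree as a sparse 0/1 vector
# over the CFG index set I = {(A -> B C, i, k, j)}, scored by y . theta.
from collections import defaultdict

# Hypothetical parse of a 3-word sentence: each listed entry r has y_r = 1.
parse = {
    ("S", "NP", "VP", 1, 1, 3),   # <S -> NP VP, i=1, k=1, j=3>
    ("VP", "V", "NP", 2, 2, 3),   # <VP -> V NP, i=2, k=2, j=3>
}

# Hypothetical weights theta(r); entries not listed score 0.
theta = defaultdict(float, {
    ("S", "NP", "VP", 1, 1, 3): 1.2,
    ("VP", "V", "NP", 2, 2, 3): 0.7,
})

def score(y, theta):
    """Inner product y . theta: sum theta_r over the entries with y_r = 1."""
    return sum(theta[r] for r in y)

print(score(parse, theta))  # 1.9
```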

41 We assume a trigram tagger, where a tag sequence is represented through decisions ⟨(A, B) → C, i⟩ where A, B, C ∈ T, and i ∈ {3 . [sent-112, score-0.193]

42 Each production of this form represents the decision that C is the tag of word w_i and that (A, B) are the two preceding tags. 3We do not require rules of the form A → w_i in this representation, as they are redundant: specifically, a rule production ⟨A → B C, i, k, j⟩ implies a rule B → w_i iff i = k, and a rule C → w_j if j = k + 1. [sent-121, score-0.305]

43 Under a PCFG we define θ(A → B C, i, k, j) = log P(A → B C | A) + δ_{i,k} log P(B → w_i | B) + δ_{k+1,j} log P(C → w_j | C), where δ_{x,y} = 1 if x = y and 0 otherwise. [sent-125, score-0.235]
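
A small hedged sketch of this scoring rule, with made-up log-probabilities: the delta terms attach a lexical score only when the corresponding child span covers a single word (i = k, or j = k + 1).

```python
import math

# Hypothetical PCFG log-probabilities (illustrative values only).
log_p_rule = {("S", "NP", "VP"): math.log(0.9)}        # log P(A -> B C | A)
log_p_word = {("NP", "dogs"): math.log(0.2),            # log P(B -> w | B)
              ("VP", "bark"): math.log(0.1)}

def theta(A, B, C, i, k, j, words):
    """theta(A -> B C, i, k, j) = log P(A -> B C | A)
       + [i == k]     * log P(B -> w_i | B)
       + [k + 1 == j] * log P(C -> w_j | C)     (words are 1-indexed)."""
    s = log_p_rule[(A, B, C)]
    if i == k:
        s += log_p_word[(B, words[i - 1])]
    if k + 1 == j:
        s += log_p_word[(C, words[j - 1])]
    return s

print(theta("S", "NP", "VP", 1, 1, 2, ["dogs", "bark"]))
```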

44 Each tag sequence is represented as a vector z = {z_r : r ∈ I_tag}, and we denote the set of valid tag sequences, a subset of {0, 1}^{|I_tag|}, as Z. [sent-128, score-0.19]

45 ) Note that this representation is over-complete, since a parse tree determines a unique tagging for a sentence: more explicitly, for any i ∈ {1. [sent-139, score-0.168]

46 From here on we will make exclusive use of extended index sets for CFG parsing and trigram tagging. [sent-145, score-0.232]

47 4 Two Examples This section describes the dual decomposition approach for two inference problems in NLP. [sent-151, score-0.626]

48 1 Integrated Parsing and Trigram Tagging We now describe the dual decomposition approach for integrated parsing and trigram tagging. [sent-153, score-0.743]

49 The integrated parsing and trigram tagging problem is then to solve max_{(y,z)∈Q} … [sent-155, score-0.324]

50 2 is that it makes explicit the idea of maximizing over all pairs (y, z) under a set of agreement constraints y(i, t) = z(i, t)—this concept will be central to the algorithms in this paper. [sent-161, score-0.215]
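
For readability, the coupled problem being described can be written out as follows (a paraphrase with adapted notation, not a quotation of the paper's equation; the superscripts simply label the two models' weight vectors).

```latex
% Paraphrase of the coupled problem; Q is the set of agreeing pairs.
\max_{y \in \mathcal{Y},\; z \in \mathcal{Z}} \; y \cdot \theta^{(1)} + z \cdot \theta^{(2)}
\quad \text{s.t.} \quad y(i,t) = z(i,t) \;\; \forall (i,t),
\qquad
Q = \{(y,z) : y \in \mathcal{Y},\, z \in \mathcal{Z},\, y(i,t) = z(i,t)\ \forall (i,t)\}.
```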

51 With this in mind, we note that we have efficient methods for the inference problems of tagging and parsing alone, and that our combined objective almost separates into these two independent problems. [sent-162, score-0.255]

52 The two arg max problems can be solved using dynamic programming. [sent-169, score-0.205]

53 solves the harder optimization problem using an existing CFG parser and trigram tagger. [sent-170, score-0.247]

54 1 we will show that the variables u(i, t) are Lagrange multipliers enforcing agreement constraints, and that the algorithm corresponds to a (sub)gradient method for optimization of a dual function. [sent-173, score-0.5]
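
The overall shape of such a loop can be sketched as follows; `parse_oracle` and `tag_oracle` are hypothetical stand-ins for the two dynamic-programming decoders, and the step-size rule and stopping test are simplified relative to the paper.

```python
# Hedged sketch of a dual-decomposition loop for integrated parsing and
# trigram tagging.  Each oracle takes per-(position, tag) score adjustments
# and returns a dict mapping (i, t) -> 0/1, the tag assignments implied by
# its best structure under the adjusted objective.

def dual_decompose(parse_oracle, tag_oracle, positions, tags, K=50, alpha=1.0):
    u = {(i, t): 0.0 for i in positions for t in tags}   # Lagrange multipliers
    for k in range(1, K + 1):
        # Each sub-problem is solved exactly: the parser sees +u, the tagger -u.
        y = parse_oracle(u)
        z = tag_oracle({r: -v for r, v in u.items()})
        if all(y[r] == z[r] for r in u):                 # agreement: exact solution
            return y, z, True
        # Subgradient step on the disagreements y(i,t) - z(i,t).
        step = alpha / k
        for r in u:
            u[r] -= step * (y[r] - z[r])
    return y, z, False                                    # no certificate within K iterations
```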

55 2 Integrating Two Lexicalized Parsers Our second example problem is the integration of a phrase-structure parser with a higher-order dependency parser. [sent-176, score-0.17]

56 First, we define an index set for second-order unlabeled projective dependency parsing. [sent-178, score-0.176]

57 Here (i, j) represents a dependency with head w_i and modifier w_j (i = 0 corresponds to the root symbol in the parse). [sent-189, score-0.159]

58 (3) R = {(y, d) : y ∈ H, d ∈ D, y(i, j) = d(i, j) for all (i, j) ∈ Ifirst} This problem has a very similar structure to the problem of integrated parsing and tagging, and we can derive a similar dual decomposition algorithm. [sent-212, score-0.645]

59 The Lagrange multipliers u are a vector in R^{|I_first|} enforcing agreement between dependency assignments. [sent-213, score-0.191]

60 5 Marginal Polytopes and LP Relaxations We now give formal guarantees for the algorithms in the previous section, showing that they solve LP relaxations of the problems in Eqs. [sent-216, score-0.394]

61 To make the connection to linear programming, we first introduce the idea of marginal polytopes in section 5. [sent-218, score-0.285]

62 2, we give a precise statement of the LP relaxations that are being solved by the example algorithms, making direct use of marginal polytopes. [sent-221, score-0.322]

63 The set of all possible marginal vectors, known as the marginal polytope, is defined as follows: M = {µ ∈ R^m : ∃α ∈ ∆ such that µ = Σ_{y∈Y} α_y y}. M is also frequently referred to as the convex hull of Y, written as conv(Y). [sent-226, score-0.324]
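
A toy illustration of why conv(Y) strictly contains Y (not an example from the paper): averaging two parses that disagree on some rule yields a fractional marginal vector.

```latex
% Toy illustration: a fractional point of the marginal polytope.
% Take alpha = (1/2, 1/2) over two parses y^{(1)}, y^{(2)} in Y
% with y^{(1)}_r = 1 and y^{(2)}_r = 0 for some rule index r.
\mu \;=\; \tfrac{1}{2}\, y^{(1)} + \tfrac{1}{2}\, y^{(2)} \in \mathrm{conv}(\mathcal{Y}),
\qquad
\mu_r \;=\; \tfrac{1}{2} \notin \{0,1\},
\;\text{so}\; \mu \notin \mathcal{Y}.
```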

64 For an arbitrary set Y, the marginal polytope conv(Y) can be complex to describe. [sent-229, score-0.201]

65 We now give an explicit description of the resulting constraints for CFG parsing:7 similar constraints arise for other dynamic programming algorithms for parsing, for example the algorithms of Eisner (2000). [sent-233, score-0.552]

66 However, a description of the constraints gives valuable intuition for the structure of the marginal polytope. [sent-235, score-0.192]

67 Figure 2: The linear constraints defining the marginal polytope for CFG parsing. [sent-270, score-0.349]

68 ⟨X → Y Z, i, k, j⟩ in a parse tree, there must be exactly one production higher in the tree that generates (X, i, j) as one of its children. [sent-271, score-0.172]

69 The marginal polytope for tagging, conv(Z), can also be expressed using linear constraints, as in Eq. [sent-284, score-0.201]

70 As a final point, the following theorem gives an important property of marginal polytopes, which we will use at several points in this paper: Theorem 5. [sent-288, score-0.226]

71 ∀X ∈ T, i ∈ {3 . . . n} : ν(i, X) = Σ_{Y,Z∈T} ν((Y, Z) → X, i); ∀X ∈ T : ν(1, X) = Σ_{Y,Z∈T} ν((X, Y) → Z, 3); ∀X ∈ T : ν(2, X) = Σ_{Y,Z∈T} ν((Y, X) → Z, 3). Figure 3: The linear constraints defining the marginal polytope for trigram POS tagging. [sent-299, score-0.447]

72 The problem max_{µ∈conv(Y)} µ · θ is a linear programming problem. [sent-301, score-0.209]

73 Weighted CFG parsing can be framed as a linear programming problem, of the form max_{µ∈conv(Y)} µ · θ, where conv(Y) is specified by a polynomial number of linear constraints. [sent-303, score-0.28]

74 Conversely, dynamic programming algorithms such as the CKY algorithm can be considered to be oracles that efficiently solve LPs of the form max_{µ∈conv(Y)} µ · θ. [sent-305, score-0.401]
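
The equivalence being invoked here is the standard LP fact that a linear objective over a polytope attains its maximum at a vertex, and the vertices of conv(Y) are exactly the members of Y. Stated in one line (paraphrased, not quoted from the paper):

```latex
% Why a dynamic-programming decoder can act as an LP oracle:
\max_{\mu \in \mathrm{conv}(\mathcal{Y})} \mu \cdot \theta
  \;=\; \max_{y \in \mathcal{Y}} y \cdot \theta ,
\quad \text{since a linear objective over a polytope is maximized at a vertex.}
```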

75 2 Linear Programming Relaxations We now describe the LP relaxations that are solved by the example algorithms in section 4. [sent-308, score-0.298]

76 The LP relaxation then corresponds to the following optimization problem: max_{(µ,ν)∈Q′} … [sent-325, score-0.153]

77 In many applications of LP relaxations—including the examples discussed in this paper—the relaxation in Eq. [sent-343, score-0.153]

78 In these cases, solving the LP relaxation exactly … [sent-347, score-0.189]

79 1 Lagrangian Relaxation We now show that the example algorithms solve their respective LP relaxations given in the previous section. [sent-353, score-0.303]

80 Note that if we set X1 = conv(Y), X2 = conv(Z), and define E and F to specify the agreement constraints µ(i, t) = ν(i, t), then we have the LP relaxation in Eq. [sent-363, score-0.19]

81 It is natural to apply Lagrangian relaxation in cases where the sub-problems max_{x1∈X1} θ1 · x1 and max_{x2∈X2} θ2 · x2 can be efficiently solved by combinatorial algorithms for any values of θ1, θ2, but where the constraints Ex1 = Fx2 “complicate” the problem. [sent-365, score-0.439]

82 We introduce Lagrange multipliers u for the latter set of constraints, giving the Lagrangian: L(u, x1, x2) = θ1 · x1 + θ2 · x2 + u · (Ex1 − Fx2). The dual objective function is L(u) = max_{x1∈X1, x2∈X2} L(u, x1, x2), and the dual problem is to find min_{u∈R^q} L(u). [sent-367, score-0.684]

83 Subgradients are tangent lines that lower bound a function even at points of non-differentiability: formally, a subgradient of a convex function L : R^n → R at a point u is a vector g_u such that for all v, L(v) ≥ L(u) + g_u · (v − u). [sent-376, score-0.298]

84 Subgradient algorithms perform updates that are similar to gradient descent: u^{(k+1)} ← u^{(k)} − α_k g^{(k)}, where g^{(k)} is the subgradient of L at u^{(k)} and α_k > 0 is the step size of the update. [sent-379, score-0.22]
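
A compact sketch of this update in the general Ex1 = Fx2 setting; the oracle interfaces and step-size handling here are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def subgradient_step(u, oracle1, oracle2, E, F, theta1, theta2, alpha):
    """One update u <- u - alpha * g, where g = E x1* - F x2* is a
    subgradient of the dual L(u) = max_{x1, x2} L(u, x1, x2)."""
    x1 = oracle1(theta1 + E.T @ u)   # argmax over X1 of (theta1 + E^T u) . x1
    x2 = oracle2(theta2 - F.T @ u)   # argmax over X2 of (theta2 - F^T u) . x2
    g = E @ x1 - F @ x2              # disagreement between the two solutions
    dual_value = theta1 @ x1 + theta2 @ x2 + u @ g
    return u - alpha * g, dual_value
```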

85 Under an appropriate definition of the step sizes α_k, it follows that the algorithm in figure 1 defines a sequence of Lagrange multipliers minimizing a dual of the LP relaxation in Eq. [sent-387, score-0.495]

86 2 Recovering the LP Solution The previous section described how the method in figure 4 can be used to minimize the dual L(u) of the original linear program. [sent-393, score-0.412]

87 Each column gives the percentage of sentences whose exact solutions were found in a given range of subgradient iterations. [sent-433, score-0.191]

88 12 This problem is similar to the second example in section 4; a very similar dual decomposition algorithm to that described in section 4. [sent-439, score-0.521]

89 We ran the dual decomposition algorithm with a limit of K = 50 iterations. [sent-444, score-0.521]

90 The dual decomposition algorithm returns an exact solution if case 1 occurs as defined in section 6. [sent-445, score-0.641]

91 Over 80% of the examples converge in 5 iterations or fewer; over 90% converge in 10 iterations or fewer. [sent-449, score-0.19]

92 The dual decomposition method gives a significant gain in precision and recall over the naive combination method, and boosts the performance of Model 1 to a level that is close to some of the best single-pass parsers on the Penn treebank test set. [sent-466, score-0.521]

93 Figure 5 shows performance of the approach as a function of K, the maximum number of iterations of dual decomposition. [sent-469, score-0.389]

94 2 Integrated Phrase-Structure Parsing and Trigram POS tagging In a second experiment, we used dual decomposition to integrate the Model 1 parser with the Stanford max-ent trigram POS tagger (Toutanova and Manning, 2000), using a very similar algorithm to that described in section 4. [sent-483, score-0.755]

95 Table 1 gives statistics showing the speed of convergence for different examples: over 94% of the examples converge to an exact solution in 10 iterations or fewer. [sent-491, score-0.249]

96 Given the widespread use of dynamic programming in NLP, there should be many applications for the approach. [sent-498, score-0.216]

97 Let ν be the convex combination of the following two tag sequences, each with probability 0. [sent-515, score-0.191]

98 This learning rate drops at a rate of 1/2t, where t is the number of times that the dual increases from one iteration to the next. [sent-540, score-0.342]
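
Since this summary sentence is fragmentary, the exact schedule is unclear; one plausible reading (an assumption, flagged as such below) is that the step size shrinks each time the dual value goes up rather than down.

```python
def step_size(alpha0, num_dual_increases):
    """Step size that shrinks with the number of dual increases seen so far.
    The specific choice of halving per increase is an assumption made for
    illustration; the summary only says the rate drops with t."""
    return alpha0 * 0.5 ** num_dual_increases

# Usage inside a subgradient loop, where `duals` is the history of L(u^(k)):
# t = sum(1 for a, b in zip(duals, duals[1:]) if b > a)
# alpha_k = step_size(1.0, t)
```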

99 Approximate primal solutions and rate analysis for dual subgradient methods. [sent-681, score-0.563]

100 A hierarchy of relaxations and convex hull characterizations for mixed-integer zero–one programming problems. [sent-708, score-0.404]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('conv', 0.39), ('dual', 0.342), ('lp', 0.257), ('decomposition', 0.179), ('relaxations', 0.169), ('cfg', 0.156), ('relaxation', 0.153), ('programming', 0.139), ('iuni', 0.135), ('subgradient', 0.13), ('ifirst', 0.118), ('xz', 0.115), ('marginal', 0.114), ('theorem', 0.112), ('lagrange', 0.101), ('polytopes', 0.101), ('yr', 0.101), ('koo', 0.1), ('trigram', 0.098), ('convex', 0.096), ('tag', 0.095), ('primal', 0.091), ('algorithms', 0.09), ('lagrangian', 0.087), ('polytope', 0.087), ('tagging', 0.079), ('combinatorial', 0.079), ('solvers', 0.078), ('constraints', 0.078), ('dynamic', 0.077), ('dependency', 0.077), ('komodakis', 0.072), ('jaakkola', 0.072), ('parsing', 0.071), ('linear', 0.07), ('multipliers', 0.067), ('wainwright', 0.067), ('index', 0.063), ('exact', 0.061), ('collins', 0.06), ('solution', 0.059), ('inference', 0.058), ('sontag', 0.058), ('parser', 0.057), ('formulations', 0.056), ('integrated', 0.053), ('parse', 0.053), ('rm', 0.053), ('rq', 0.052), ('log', 0.051), ('argmy', 0.051), ('ayx', 0.051), ('itag', 0.051), ('lps', 0.051), ('oracles', 0.051), ('vygen', 0.051), ('ha', 0.048), ('converge', 0.048), ('solves', 0.048), ('problems', 0.047), ('agreement', 0.047), ('iterations', 0.047), ('wi', 0.045), ('production', 0.044), ('optimization', 0.044), ('guarantees', 0.044), ('solve', 0.044), ('certificate', 0.043), ('globerson', 0.043), ('korte', 0.043), ('mrf', 0.043), ('provably', 0.043), ('rush', 0.043), ('fractional', 0.042), ('arg', 0.042), ('lexicalized', 0.042), ('programs', 0.04), ('ji', 0.039), ('solved', 0.039), ('specify', 0.037), ('wj', 0.037), ('belief', 0.037), ('tree', 0.036), ('solving', 0.036), ('gu', 0.036), ('integral', 0.036), ('optimality', 0.036), ('projective', 0.036), ('productions', 0.036), ('integration', 0.036), ('pos', 0.035), ('martin', 0.034), ('convergence', 0.034), ('argx', 0.034), ('ayxy', 0.034), ('certificates', 0.034), ('cvo', 0.034), ('dantzig', 0.034), ('iasl', 0.034), ('idep', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

2 0.48840532 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

3 0.14058644 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

4 0.10358308 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

5 0.10173455 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

Author: Yoong Keok Lee ; Aria Haghighi ; Regina Barzilay

Abstract: Part-of-speech (POS) tag distributions are known to exhibit sparsity a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorporating this sparsity constraint improves tagging accuracy. However, in existing systems, this expansion come with a steep increase in model complexity. This paper proposes a simple and effective tagging method that directly models tag sparsity and other distributional properties of valid POS tag assignments. In addition, this formulation results in a dramatic reduction in the number of model parameters thereby, enabling unusually rapid training. Our experiments consistently demonstrate that this model architecture yields substantial performance gains over more complex tagging — counterparts. On several languages, we report performance exceeding that of more complex state-of-the art systems.1

6 0.093751237 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

7 0.083965287 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

8 0.081386641 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

9 0.081124552 114 emnlp-2010-Unsupervised Parse Selection for HPSG

10 0.079840869 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

11 0.079744898 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

12 0.076419219 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

13 0.075549655 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

14 0.072633766 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

15 0.068749264 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

16 0.068728112 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

17 0.062237728 94 emnlp-2010-SCFG Decoding Without Binarization

18 0.058478545 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

19 0.054050073 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

20 0.052528728 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.216), (1, 0.104), (2, 0.285), (3, -0.091), (4, 0.023), (5, 0.226), (6, 0.003), (7, 0.322), (8, -0.001), (9, -0.233), (10, -0.373), (11, 0.238), (12, -0.242), (13, -0.15), (14, 0.015), (15, -0.057), (16, -0.092), (17, 0.075), (18, -0.039), (19, -0.008), (20, -0.133), (21, 0.014), (22, 0.062), (23, -0.034), (24, 0.055), (25, 0.031), (26, -0.042), (27, -0.031), (28, 0.002), (29, 0.042), (30, 0.009), (31, -0.057), (32, -0.053), (33, -0.053), (34, 0.004), (35, 0.027), (36, 0.049), (37, -0.019), (38, -0.009), (39, 0.02), (40, -0.024), (41, 0.031), (42, -0.058), (43, -0.014), (44, -0.06), (45, 0.011), (46, 0.036), (47, -0.009), (48, 0.059), (49, -0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96338689 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

2 0.9601478 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

3 0.47043481 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

4 0.25598463 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

Author: Michael Lamar ; Yariv Maron ; Elie Bienenstock

Abstract: We present a novel approach to distributionalonly, fully unsupervised, POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.

5 0.23484072 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

6 0.22701663 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

7 0.22526208 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

8 0.22088981 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

9 0.21595213 114 emnlp-2010-Unsupervised Parse Selection for HPSG

10 0.21002625 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

11 0.20902321 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

12 0.18915142 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

13 0.18625817 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.18561803 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

15 0.18485099 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

16 0.18203633 94 emnlp-2010-SCFG Decoding Without Binarization

17 0.18060896 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

18 0.1802998 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

19 0.16835582 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

20 0.15833187 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.013), (12, 0.018), (14, 0.044), (29, 0.073), (30, 0.014), (32, 0.019), (52, 0.034), (56, 0.057), (62, 0.483), (66, 0.066), (72, 0.031), (76, 0.026), (77, 0.01), (79, 0.016), (87, 0.013), (89, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88131791 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

Author: Alexander M Rush ; David Sontag ; Michael Collins ; Tommi Jaakkola

Abstract: This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.

2 0.7442838 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

Author: Florian Boudin ; Jian-Yun Nie ; Martin Dawes

Abstract: The PECO framework is a knowledge representation for formulating clinical questions. Queries are decomposed into four aspects, which are Patient-Problem (P), Exposure (E), Comparison (C) and Outcome (O). However, no test collection is available to evaluate such framework in information retrieval. In this work, we first present the construction of a large test collection extracted from systematic literature reviews. We then describe an analysis of the distribution of PECO elements throughout the relevant documents and propose a language modeling approach that uses these distributions as a weighting strategy. In our experiments carried out on a collection of 1.5 million documents and 423 queries, our method was found to lead to an improvement of 28% in MAP and 50% in P@5, as com- pared to the state-of-the-art method.

3 0.70356035 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: input. Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

4 0.67422247 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

5 0.45357493 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

Author: Andre Martins ; Noah Smith ; Eric Xing ; Pedro Aguiar ; Mario Figueiredo

Abstract: We present a unified view of two state-of-theart non-projective dependency parsers, both approximate: the loopy belief propagation parser of Smith and Eisner (2008) and the relaxed linear program of Martins et al. (2009). By representing the model assumptions with a factor graph, we shed light on the optimization problems tackled in each method. We also propose a new aggressive online algorithm to learn the model parameters, which makes use of the underlying variational representation. The algorithm does not require a learning rate parameter and provides a single framework for a wide family of convex loss functions, includ- ing CRFs and structured SVMs. Experiments show state-of-the-art performance for 14 languages.

6 0.3784312 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

7 0.34635431 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

8 0.34471607 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

9 0.3416298 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

10 0.3378121 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

11 0.3206332 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

12 0.31995404 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

13 0.31895572 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

14 0.31895408 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

15 0.31111372 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

16 0.30490589 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

17 0.30282074 80 emnlp-2010-Modeling Organization in Student Essays

18 0.30176997 94 emnlp-2010-SCFG Decoding Without Binarization

19 0.30154315 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling

20 0.29952976 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction