emnlp emnlp2011 emnlp2011-45 knowledge-graph by maker-knowledge-mining

45 emnlp-2011-Dual Decomposition with Many Overlapping Components


Source: pdf

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. [sent-15, score-0.158]

2 However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. [sent-18, score-0.264]

3 We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. [sent-20, score-0.342]

4 Rush et al. (2010) applied dual decomposition as a way of combining models which alone permit efficient decoding, but whose combination is intractable. [sent-31, score-0.391]

5 This results in a relaxation of the original problem that is elegantly solved with the subgradient algorithm. [sent-32, score-0.381]

6 With many components, the subgradient algorithm exhibits extremely slow convergence (cf. [sent-37, score-0.26]

7 Unfortunately, a lightweight decomposition is not always at hand, either because the problem does not factor in a natural way, or because one would like to incorporate features that cannot be easily absorbed into a few tractable components. [sent-40, score-0.188]

8 1), a recently proposed algorithm that accelerates dual decomposition (Martins et al. [sent-43, score-0.391]

9 DD-ADMM retains the modularity of the subgradient-based method, but it speeds up consensus by regularizing each slave subproblem towards the averaged votes obtained in the previous round (cf. [sent-45, score-0.318]

10 While this yields more involved subproblems (with a quadratic term), we show that exact solutions can still be efficiently computed for all cases of interest, by using sort operations. [sent-48, score-0.284]

11 Rush et al. (2010) have proposed a dual decomposition framework to address NLP problems in which the global score decomposes as f(y) = f1(z1) + f2(z2), where z1 and z2 are two overlapping “views” of the output, so that Eq. [sent-71, score-0.466]

12 Y′ = {⟨z1, z2⟩ ∈ Y1 × Y2 | z1 ∼ z2}. We next formalize these notions and proceed to decompositions of an arbitrary number of models. [sent-79, score-0.425]

13 For example, in dependency parsing, R can be the set of all possible dependency arcs (see Fig. [sent-87, score-0.139]

14 Consecutive siblings and grandparent parts introduce horizontal and vertical Markovization (McDonald et al. [sent-99, score-0.233]

15 We break the horizontal Markov assumption via all-siblings parts, and the vertical one through parts which indicate a directed path between two words. [sent-101, score-0.279]

16 Each Ys is associated with its own set of parts Rs, in the same sense as above; we represent the elements of Ys as binary vectors zs = ⟨zs(r)⟩r∈R̄s. [sent-107, score-0.519]

17 Each zs ∈ Ys is completely defined by its entries indexed by elements of R̄s, from which we can guess the ones in Rs \ R̄s. [sent-121, score-0.425]

18 This implies that each y ∈ Y has a unique decomposition ⟨z1, . . . [sent-122, score-0.158]

19 Fig. 1 shows several parts used in dependency parsing models; in phrase-based parsing, these could be spans and production rules anchored in the surface string; in sequence labeling, they can be unigram, bigram, and trigram labels. [sent-127, score-0.169]

20 Two components zs ∈ Ys and zt ∈ Yt are said to be consistent (denoted zs ∼ zt) if they agree on their overlaps, i.e. [sent-134, score-0.938]

21 With this setup, assuming that the score function decomposes as f(z) = Σs fs(zs), the decoding problem (which extends Eq. [sent-142, score-0.119]

22 3 Dual Decomposition We describe dual decomposition in a slightly different manner than Rush et al. [sent-155, score-0.391]

23 (5) If the score functions fs are convex, P0 becomes a convex program (unlike P, which is discrete); being a relaxation, it provides an upper bound on P. [sent-185, score-0.128]

24 minimize Σs gs(λs) w.r.t. λ1, . . . , λS, subject to Σ_{s : r∈R̄s} λs(r) = 0, ∀r ∈ R (7), where the gs(λs) are the solution values of the following subproblems (the slaves): [sent-203, score-0.274]

25 The slaves s = 1, . . . , S are now decoupled, which makes things easier, provided each slave subproblem (Eq. [sent-219, score-0.251]

26 In fact, this is always a concern in the mind of the model’s designer when she chooses a decomposition (the framework that we describe in §3, in some sense, alleviates her from this concern). [sent-221, score-0.158]

27 Eq. 8 becomes a linear program, for which a solution exists at a vertex of Zs (this is guaranteed if the score functions fs are linear). [sent-225, score-0.13]

28 Applying the subgradient method to the dual (Eq. 7) yields a remarkably simple algorithm, which at each round t solves the subproblems in Eq. [sent-234, score-0.219]

29 Intuitively, the algorithm pushes for a consensus among the slaves (Eq. [sent-240, score-0.216]
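
As an illustration of the update just described, here is a minimal sketch of one round of subgradient-based dual decomposition (Python/NumPy; the data layout and function names are our own illustration, not the paper's code): each slave decodes with its current multipliers, and each multiplier is then pushed towards the averaged votes, which preserves the dual feasibility constraint that the multipliers for each shared part sum to zero.

import numpy as np

def subgradient_round(slaves, coverage, lam, eta):
    # slaves[s](lam_s) returns slave s's argmax of f_s(z) + lam_s . z
    # as a NumPy 0/1 vector over its own parts; coverage[r] lists
    # (s, i) pairs meaning slave s stores global part r at index i;
    # lam[s] is slave s's vector of Lagrange multipliers
    z = [slaves[s](lam[s]) for s in range(len(slaves))]
    for r, members in coverage.items():
        avg = np.mean([z[s][i] for (s, i) in members])  # averaged votes
        for (s, i) in members:
            # penalize disagreement with the average; the updates over
            # each part r sum to zero, so sum_s lam_s(r) = 0 is kept
            lam[s][i] -= eta * (z[s][i] - avg)
    return z, lam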

30 The subgradient method is guaranteed to converge to the solution of D (Eq. [sent-243, score-0.282]

31 (Bertsekas, 1999); it also provides a certificate of optimality in case the relaxation is tight (i.e. [sent-245, score-0.214]

32 However, convergence is slow when S is large (as we will show in the experimental section), and no certificates are available when there is a relaxation gap (P < P0). [sent-248, score-0.208]

33 Taking a look back at the relaxed primal problem P0 (Eq. [sent-254, score-0.17]

34 5), we see that any primal feasible solution must satisfy the agreement constraints. [sent-255, score-0.177]

35 The updates in Eq. 13 have a closed form, which is precisely the averaging operation performed by the subgradient method (Eq. [sent-274, score-0.227]

36 As in the subgradient approach, the maximization in Eq. [sent-277, score-0.227]

37 Eq. 12 can be separated into S independent slave subproblems, which now take the form: maximize fs(zs) + Σ_{r∈R̄s} λs(r) zs(r) − (ρ/2) Σ_{r∈R̄s} (zs(r) − ut(r))², w.r.t. [sent-278, score-0.209]
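
The corresponding DD-ADMM round can be sketched under the same illustrative data layout (the z-step calls the quadratically regularized slaves of Eq. 14, the u-step is the closed-form averaging of Eq. 13, and the λ-step follows the usual consensus-ADMM template; treat the exact step sizes and signs as assumptions rather than the paper's pseudocode):

import numpy as np

def admm_round(slaves, coverage, parts_of, lam, u, rho):
    # parts_of[s] lists the global parts of slave s in the order of
    # its local indices, so parts_of[s][i] = r iff (s, i) is in
    # coverage[r]; u maps each global part to its current average
    S = len(slaves)
    # z-step: slave s maximizes
    #   f_s(z) + lam_s . z - (rho / 2) * ||z - u_s||^2
    z = [slaves[s](lam[s], np.array([u[r] for r in parts_of[s]]), rho)
         for s in range(S)]
    # u-step: closed form -- just average the slaves' votes per part
    for r, members in coverage.items():
        u[r] = np.mean([z[s][i] for (s, i) in members])
    # lambda-step: dual update on the agreement constraints
    for r, members in coverage.items():
        for (s, i) in members:
            lam[s][i] -= rho * (z[s][i] - u[r])
    return z, u, lam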

38 The solution of Eq. 14 for linear score functions may not be at a vertex (in contrast to the subgradient method). [sent-286, score-0.227]

39 Before going into details, we mention another advantage of ADMM over the subgradient algorithm: it knows when to stop. [sent-289, score-0.227]

40 Recall that the subgradient method provides optimality certificates when the relaxation is tight (P = P0) and an exact solution of P has been found. [sent-291, score-0.372]

41 We would like to have similar guarantees concerning the relaxed primal P0. [sent-295, score-0.17]

42 5 A general weakness of subgradient algorithms is that they do not have this capacity, and so are usually stopped by specifying a maximum number of iterations. [sent-296, score-0.227]

43 In contrast, ADMM allows one to keep track of primal and dual residuals (Boyd et al. [sent-297, score-0.394]

44 This allows providing certificates not only for the exact solution of P (when the relaxation is tight), but also for terminating when a near-optimal solution of the relaxed problem P0 has been found. [sent-299, score-0.366]

45 The primal residual rPt measures the amount by which the agreement constraints are violated: rPt = Σ_{s=1}^{S} Σ_{r∈R̄s} (zst(r) − ut(r))² (15); the dual residual rDt is the amount by which a dual optimality condition is violated (see Boyd et al. [sent-300, score-0.767]
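
A sketch of the two residuals under the same layout (we omit the normalization and the ρ scaling used in standard presentations, so the exact constants are an assumption): the primal residual sums the squared disagreements between slave votes and the averages, and the dual residual tracks how much the averages moved between rounds.

def residuals(z, coverage, u, u_prev):
    # primal residual (Eq. 15): violated agreement constraints
    r_primal = sum((z[s][i] - u[r]) ** 2
                   for r, members in coverage.items()
                   for (s, i) in members)
    # dual residual: movement of the averages between consecutive
    # rounds, proportional (up to rho and replication counts) to the
    # usual consensus-ADMM dual residual
    r_dual = sum((u[r] - u_prev[r]) ** 2 for r in u)
    return r_primal, r_dual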

46 Relaxations of problems with many slaves tend to be less often tight, hence relaxation gaps are frequent. [sent-309, score-0.254]

47 Also, when decoding is embedded in training, it is useful to obtain the fractional solution of the relaxed primal P0 (rather than an approximate integer solution). [sent-310, score-0.303]

48 (Eqs. 15–16) output: relaxed primal and dual solutions u, z, λ (Martins et al. [sent-327, score-0.435]

49 4 Solving the Subproblems: In this section, we address the slave subproblems of DD-ADMM (Eq. [sent-336, score-0.353]

50 We show how these subproblems can be solved efficiently for several important cases that arise in NLP applications. [sent-338, score-0.268]

51 It is also the scenario studied in previous work in dual decomposition (Rush et al. [sent-343, score-0.391]

52 Under this assumption, and discarding constant terms, the slave subproblem in Eq. [sent-345, score-0.251]

53 We initialize ρ = 0.03 and then increase/decrease ρ by a factor of 2 whenever the primal residual becomes > 10 times larger/smaller than the dual residual. [sent-353, score-0.387]
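
The adaptive penalty rule just quoted is easy to state in code (a direct sketch; the thresholds 10 and 2 are the values given above):

def update_rho(rho, r_primal, r_dual, ratio=10.0, factor=2.0):
    # grow rho when the primal residual dominates (agreement is the
    # bottleneck); shrink it when the dual residual dominates
    if r_primal > ratio * r_dual:
        return rho * factor
    if r_dual > ratio * r_primal:
        return rho / factor
    return rho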

54 the grandparent, sibling, and head bigram parts in Fig. [sent-362, score-0.128]

55 Many problems involve constraining a set of variables so that exactly one of them takes the value 1: for example, in dependency parsing, a modifier can take only one head. [sent-374, score-0.124]

56 YXOR = {⟨z1, . . . , zn⟩ ∈ {0, 1}n | Σ_{i=1}^{n} zi = 1}, and the convex hull of YXOR is ZXOR = {⟨z1, . . . , zn⟩ ∈ [0, 1]n | Σ_{i=1}^{n} zi = 1}, the probability simplex. [sent-385, score-0.169]

57 Up to a constant, the slave subproblem becomes: minimize w. [sent-391, score-0.251]
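
For the XOR case this slave is a Euclidean projection onto the probability simplex, solvable in O(n log n) by the classic sort-based routine; a minimal sketch (the implementation below is the standard textbook algorithm, not the paper's own code):

import numpy as np

def project_onto_simplex(v):
    # Euclidean projection of v onto {z : z >= 0, sum(z) = 1}
    n = len(v)
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u)
    # largest k with u[k] - (css[k] - 1) / (k + 1) > 0
    k = np.nonzero(u * np.arange(1, n + 1) > css - 1.0)[0][-1]
    theta = (css[k] - 1.0) / (k + 1.0)        # soft-threshold level
    return np.maximum(v - theta, 0.0)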

58 This matches the asymptotic time that would be necessary to solve the corresponding problems in the subgradient method, for which algorithms are straightforward to derive. [sent-402, score-0.26]

59 The point is that with ADMM fewer instances of these subproblems need to be solved, due to faster convergence of the master problem. [sent-403, score-0.252]

60 YOR = {⟨z1, . . . , zn⟩ ∈ {0, 1}n | ⋁_{i=1}^{n} zi = 1}, and the convex hull is ZOR = {⟨z1, . . . , zn⟩ ∈ [0, 1]n | Σ_{i=1}^{n} zi ≥ 1}. [sent-419, score-0.254]
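
Projection onto ZOR admits a simple two-case sketch (reusing project_onto_simplex from above, and relying on the standard argument that if clipping to the unit box violates the sum constraint, that constraint must be active at the solution):

import numpy as np

def project_onto_or(v):
    # ZOR = {z in [0, 1]^n : sum(z) >= 1}
    p = np.clip(v, 0.0, 1.0)
    if p.sum() >= 1.0:
        return p                  # the OR constraint is already met
    # otherwise project onto the face sum(z) = 1, which (given the
    # box bounds) coincides with the probability simplex
    return project_onto_simplex(v)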

61 For i = 1, . . . , n, set zi = 1 − zi0 if the ith variable is negated, and zi = zi0 otherwise. [sent-459, score-0.268]
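
The negation trick is a reflection before and after the projection; a small sketch (function names ours):

import numpy as np

def project_with_negations(v, negated, project):
    # reflect the negated coordinates (v -> 1 - v), project onto the
    # un-negated polytope, then reflect the result back
    w = np.where(negated, 1.0 - v, v)
    z = project(w)
    return np.where(negated, 1.0 - z, z)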

62 The non-basic parts are the pairwise factors dictating that the sibling (h, m, s), grandparent (g, h, m), and head bigram (b, h, m) parts occur, as well as each logical formula. [sent-482, score-0.131]

63 Columns 3–4 indicate the number of parts of each kind, and the time complexity for solving each subproblem. [sent-483, score-0.137]

64 , zn⟩ ∈ [0, 1]n | z0 ≥ Σ_{i=1}^{n} zi, z0 ≤ zi, ∀ i = 1, . . . [sent-489, score-0.12]

65 The only disadvantage of DD-ADMM in comparison with the subgradient algorithm is that there is not an obvious way of solving the subproblem in Eq. [sent-519, score-0.387]

66 Hence the method may still work if the slaves are solved numerically up to some accuracy. [sent-524, score-0.198]

67 Note that a subgradient-based method could handle some of those parts efficiently (arcs, consecutive siblings, grandparents, and head bigrams) by composing arc-factored models, head automata, and a sequence labeler. [sent-542, score-0.222]

68 However, no lightweight decomposition seems possible for incorporating parts for all siblings, directed paths, and non-projective arcs. [sent-543, score-0.282]

69 The last two columns show a consistent improvement (with the exceptions of Chinese and Arabic) when using the full set of features over a second-order model with grandparent and consecutive siblings, which is our reproduction of the model of Koo et al. [sent-562, score-0.147]

70 (2009a) also incorporated consecutive siblings in one of their configurations; our constraints are tighter than theirs. [sent-565, score-0.193]

71 Fig. 2 compares the performance of the DD-ADMM and subgradient algorithms on the validation section of the PTB. [sent-603, score-0.227]

72 For the second-order model, the learning rate in the subgradient method was set as ηt = η0/(1 + Nincr(t)), as in Koo et al. [sent-604, score-0.454]

73 (2010), where Nincr(t) is the number of dual increases up to the tth iteration, and η0 is chosen to maximize the dual decrease after 20 iterations (on a per-sentence basis). [sent-605, score-0.466]
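
For reference, the quoted step-size schedule, transcribed directly:

def learning_rate(eta0, n_incr):
    # eta_t = eta0 / (1 + N_incr(t)), where N_incr(t) is the number
    # of dual increases seen up to iteration t
    return eta0 / (1.0 + n_incr)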

74 (2010): it has a slave imposing the TREE constraint (whose subproblem consists of finding a minimum spanning tree) and several slaves for the all-sibling parts, yielding an average number of 310. [sent-609, score-0.386]

75 The ADMM method has many more slaves due to the multicommodity flow constraints (average 1870. [sent-612, score-0.191]

76 The reason is the number of slaves: in this configuration and dataset the average number of slaves per instance is 3327. [sent-615, score-0.149]

77 In contrast, the ADMM method maintains robust performance, with a large fraction of optimality certificates in early iterations. [sent-617, score-0.143]

78 If by iteration t all variables in a subproblem s are idle, then zst+1(r) = zst(r); hence the subproblem does not need to be re-solved. [sent-642, score-0.386]
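
The caching test is then a one-line check per slave (the active-set bookkeeping below is our own illustration of the idea):

def solve_slaves_with_caching(slaves, is_active, z_cache, args):
    # is_active[s] is False when every variable of slave s is idle,
    # in which case z_s^{t+1} = z_s^t and no re-solve is needed
    for s, slave in enumerate(slaves):
        if is_active[s]:
            z_cache[s] = slave(*args[s])
    return z_cache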

79 Fig. 3 shows that . . . Even if not all variables are idle in s, caching may still be useful: note that the z-updates in Eq. [sent-644, score-0.127]

80 14 tend to be sparse for the subproblems described in §4 (these are Euclidean projections onto polytopes with 0/1 vertices, which tend to hit the corners). [sent-645, score-0.249]

81 Another trick that may accelerate the algorithm is warm-starting: since many subproblems involve a sort operation, storing the sorted indexes may speed up the next round. [sent-646, score-0.219]

82 Figure 3: Fraction of active variables, subproblems and messages along DD-ADMM iterations (full model). [sent-647, score-0.219]

83 many variables and subproblems are left untouched after the first few rounds. [sent-658, score-0.266]

84 Fig. 4 compares the runtimes of our implementation of DD-ADMM with those achieved by a state-of-the-art LP solver, CPLEX, in its best-performing configuration: the simplex algorithm applied to the dual LP. [sent-660, score-0.315]

85 6 Related Work: Riedel and Clarke (2006) first formulated dependency parsing as an integer program, along with logical constraints. [sent-666, score-0.146]

86 Figure 2: UAS including punctuation (left) and fraction of optimality certificates (right) across iterations of the subgradient and DD-ADMM algorithms, in PTB §22. [sent-668, score-0.37]

87 “Full” is our full model; “Sec Ord” is a second-order model with grandparents and all siblings, for which the subgradient method uses a coarser decomposition with a TREE factor. [sent-669, score-0.424]

88 Since subgradient and DD-ADMM are solving the same problems, the solid lines (like the dashed ones) would meet in the limit; however, the subgradient method converges very slowly for the full model. [sent-670, score-0.497]

89 The right plot shows optimality certificates for both methods, indicating that an exact solution of P has been found; for DD-ADMM we also plot the fraction of instances that converged to an accurate solution of P0 (primal and dual residuals < 10^−3) and hence can be stopped. [sent-671, score-0.589]

90 Koo et al. (2010) proposed a subgradient-based dual decomposition method that elegantly combines head automata with maximum spanning tree algorithms; these parsers, as well as the loopy belief propagation method of Smith and Eisner (2008), are all instances of turbo parsers (Martins et al. [sent-675, score-0.522]

91 Other methods have been recently proposed to accelerate dual decomposition, such as Jojic et al. [sent-681, score-0.233]

92 (2010) and Meshi and Globerson (2011) (the latter applying ADMM in the dual rather than the primal). [sent-682, score-0.233]

93 While our paper shows limitations of the subgradient method when there are many overlapping components, this method may still be advantageous over ADMM in problems that are nicely decomposable, since it often allows reusing existing combinatorial machinery. [sent-683, score-0.363]

94 Since exact decoding is intractable, we solve an LP relaxation through a recently proposed consensus algorithm, DD-ADMM, which is suitable for problems with many overlapping components. [sent-688, score-0.324]

95 DD-ADMM compares favourably against the subgradient method in several aspects: it is faster to reach a consensus, it has better stopping conditions, and it works better in non-lightweight decompositions. [sent-691, score-0.227]

96 While its slave subproblems are more involved, we derived closed-form solutions for many cases of interest, such as first-order logic formulas and combinatorial factors. [sent-692, score-0.486]

97 A dual algorithm for the solution of nonlinear variational problems via finite element approximation. [sent-836, score-0.321]

98 Exact decoding of syntactic translation models through Lagrangian relaxation. [sent-1067, score-0.13]

99 On dual decomposition and linear programming relaxations for natural language processing. [sent-1075, score-0.484]

100 A fast finite-state relaxation method for enforcing global constraints on sequence decoding. [sent-1123, score-0.147]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zs', 0.425), ('martins', 0.297), ('dual', 0.233), ('subgradient', 0.227), ('subproblems', 0.219), ('admm', 0.179), ('decomposition', 0.158), ('rush', 0.157), ('slaves', 0.149), ('koo', 0.14), ('slave', 0.134), ('zi', 0.134), ('primal', 0.122), ('zni', 0.12), ('subproblem', 0.117), ('rs', 0.106), ('relaxation', 0.105), ('zst', 0.105), ('parts', 0.094), ('siblings', 0.091), ('pss', 0.09), ('lagrangian', 0.086), ('ut', 0.083), ('ys', 0.079), ('fs', 0.075), ('bertsekas', 0.075), ('glowinski', 0.075), ('optimality', 0.073), ('certificates', 0.07), ('consensus', 0.067), ('pin', 0.067), ('boyd', 0.065), ('cplex', 0.064), ('combinatorial', 0.061), ('consecutive', 0.06), ('smith', 0.058), ('turbo', 0.058), ('solution', 0.055), ('relaxations', 0.055), ('zt', 0.054), ('convex', 0.053), ('ptb', 0.052), ('arcs', 0.051), ('solved', 0.049), ('relaxed', 0.048), ('grandparent', 0.048), ('variables', 0.047), ('instituto', 0.047), ('negated', 0.046), ('ablation', 0.045), ('gabay', 0.045), ('idle', 0.045), ('rdt', 0.045), ('zsi', 0.045), ('decoding', 0.044), ('dependency', 0.044), ('lagrange', 0.043), ('simplex', 0.043), ('solving', 0.043), ('constraints', 0.042), ('overlapping', 0.042), ('pr', 0.042), ('logic', 0.04), ('parsers', 0.039), ('residuals', 0.039), ('quantification', 0.039), ('runtimes', 0.039), ('figueiredo', 0.039), ('grandparents', 0.039), ('nonlinear', 0.039), ('reproduction', 0.039), ('collins', 0.038), ('sontag', 0.038), ('programming', 0.038), ('logical', 0.037), ('tight', 0.036), ('caching', 0.035), ('win', 0.035), ('hull', 0.035), ('komodakis', 0.035), ('alternating', 0.035), ('runtime', 0.035), ('augmented', 0.034), ('integer', 0.034), ('head', 0.034), ('components', 0.034), ('yielding', 0.033), ('exact', 0.033), ('convergence', 0.033), ('problems', 0.033), ('residual', 0.032), ('solutions', 0.032), ('ps', 0.031), ('parsing', 0.031), ('lp', 0.031), ('indicating', 0.031), ('lightweight', 0.03), ('projections', 0.03), ('uas', 0.03), ('ir', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 45 emnlp-2011-Dual Decomposition with Many Overlapping Components

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1

2 0.27856314 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

Author: Yin-Wen Chang ; Michael Collins

Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.

3 0.11863629 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

Author: Sebastian Riedel ; Andrew McCallum

Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.

4 0.11782721 129 emnlp-2011-Structured Sparsity in Structured Prediction

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Linear models have enjoyed great success in structured prediction in NLP. While a lot of progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1-regularization; both ignore the structure of the feature space, preventing practitioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.

5 0.10866226 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems

Author: Moshe Dubiner ; Yoram Singer

Abstract: We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.

6 0.10797843 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

7 0.10368689 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels

8 0.086084202 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

9 0.077547394 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

10 0.076764658 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

11 0.064385638 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

12 0.063985199 96 emnlp-2011-Multilayer Sequence Labeling

13 0.062729545 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation

14 0.061729692 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

15 0.060572617 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

16 0.05626436 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

17 0.055655591 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests

18 0.050633639 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax

19 0.049985621 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

20 0.049972016 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.189), (1, 0.05), (2, -0.027), (3, 0.072), (4, 0.066), (5, 0.017), (6, -0.007), (7, -0.359), (8, -0.01), (9, -0.02), (10, -0.033), (11, -0.115), (12, 0.233), (13, -0.253), (14, -0.031), (15, -0.171), (16, -0.061), (17, -0.104), (18, -0.027), (19, -0.107), (20, -0.102), (21, -0.08), (22, -0.012), (23, -0.039), (24, -0.025), (25, 0.02), (26, 0.026), (27, -0.089), (28, -0.044), (29, -0.108), (30, -0.016), (31, 0.064), (32, 0.009), (33, -0.113), (34, -0.03), (35, -0.022), (36, -0.049), (37, -0.05), (38, -0.066), (39, -0.038), (40, -0.059), (41, -0.076), (42, 0.003), (43, -0.004), (44, -0.052), (45, 0.005), (46, 0.065), (47, 0.059), (48, -0.064), (49, -0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95438838 45 emnlp-2011-Dual Decomposition with Many Overlapping Components

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1

2 0.78171766 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

Author: Yin-Wen Chang ; Michael Collins

Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.

3 0.77044463 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems

Author: Moshe Dubiner ; Yoram Singer

Abstract: We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.

4 0.50151086 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

Author: Sebastian Riedel ; Andrew McCallum

Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.

5 0.38270736 129 emnlp-2011-Structured Sparsity in Structured Prediction

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Linear models have enjoyed great success in structured prediction in NLP. While a lot of progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1-regularization; both ignore the structure of the feature space, preventing practitioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.

6 0.33313164 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

7 0.30465189 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

8 0.29661137 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels

9 0.29397672 96 emnlp-2011-Multilayer Sequence Labeling

10 0.25450206 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

11 0.23098426 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

12 0.23031814 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

13 0.22085428 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

14 0.21273981 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

15 0.21248701 32 emnlp-2011-Computing Logical Form on Regulatory Texts

16 0.20530416 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

17 0.20439848 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

18 0.20026888 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing

19 0.19546616 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types

20 0.19297723 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.08), (36, 0.027), (37, 0.019), (45, 0.049), (53, 0.016), (54, 0.019), (57, 0.011), (62, 0.027), (64, 0.444), (66, 0.019), (69, 0.05), (79, 0.043), (82, 0.053), (87, 0.012), (96, 0.036), (98, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9209196 45 emnlp-2011-Dual Decomposition with Many Overlapping Components

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1

2 0.84723067 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

Author: Ryan McDonald ; Slav Petrov ; Keith Hall

Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.

3 0.84262437 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith

Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.

4 0.51374406 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

Author: Sebastian Riedel ; Andrew McCallum

Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.

5 0.49557918 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization

Author: Taylor Berg-Kirkpatrick ; Dan Klein

Abstract: We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.

6 0.45203328 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

7 0.44414908 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

8 0.43372503 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

9 0.43082368 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

10 0.42376095 77 emnlp-2011-Large-Scale Cognate Recovery

11 0.42375675 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

12 0.41407678 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

13 0.41090095 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

14 0.4106715 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion

15 0.40916589 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model

16 0.40369949 136 emnlp-2011-Training a Parser for Machine Translation Reordering

17 0.40277678 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

18 0.39910087 129 emnlp-2011-Structured Sparsity in Structured Prediction

19 0.39870298 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests

20 0.39864534 118 emnlp-2011-SMT Helps Bitext Dependency Parsing