acl acl2013 acl2013-260 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
Reference: text
sentIndex sentText sentNum sentScore
1 Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. [sent-5, score-0.329]
2 ) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. [sent-9, score-0.348]
3 We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. [sent-12, score-0.446]
4 Figure 1: Each node contains a local upper bound for its subspace, computed by a relaxation. [sent-38, score-0.335]
5 Others have achieved accuracy improvements by enforcing linguistically motivated posterior constraints on the parameters (Gillenwater et al. [sent-57, score-0.406]
6 We can also impose posterior constraints on the latent structure. [sent-65, score-0.376]
7 As we show, maximizing the joint log-likelihood of the parses and the parameters can be formulated as a mathematical program (MP) with a nonconvex quadratic objective and with integer linear and nonlinear constraints. [sent-66, score-0.847]
8 1 To globally optimize the objective function, we employ a branch-and-bound algorithm that searches the continuous space of the model parameters by branching on individual parameters (see Figure 1). [sent-68, score-0.251]
9 We compute an upper bound at each node by solving a relaxed maximization problem tailored to its box. [sent-74, score-0.374]
10 At each node, our relaxation derives a linear programming problem (LP) that can be efficiently solved by the dual simplex method. [sent-77, score-0.356]
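As a toy illustration of this bounding step, the sketch below solves a tiny LP with a dual simplex solver; it assumes SciPy is available and uses made-up coefficients, not the LP actually derived at a branch-and-bound node.

    # Toy sketch: compute a node's optimistic bound by solving a small LP
    # with a dual simplex solver. Coefficients are illustrative placeholders.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-1.0, -2.0])            # maximize x1 + 2*x2  ->  minimize -(x1 + 2*x2)
    A_ub = np.array([[1.0, 1.0],          # x1 + x2 <= 4
                     [1.0, 3.0]])         # x1 + 3*x2 <= 6
    b_ub = np.array([4.0, 6.0])
    bounds = [(0.0, None), (0.0, None)]   # box constraints for the current node

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs-ds")
    node_upper_bound = -res.fun           # optimistic bound for this subspace
    print(node_upper_bound)               # 5.0 for these toy numbers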
11 First, we linearly relax the constraints that grammar rule probabilities sum to 1—these constraints are nonlinear in our parameters, which are log-probabilities. [sent-78, score-0.625]
12 Finally, if the node is not pruned, we search for a better incumbent solution under that node by projecting the solution of the RLT relaxation back onto the feasible region. [sent-80, score-0.932]
13 Furthermore, we show how posterior constraints inspired by Gillenwater et al. [sent-88, score-0.339]
14 This is why nonconvex quadratic optimization by LP-based branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009). [sent-95, score-0.565]
15 2 The Constrained Optimization Task: We begin by describing how, for our typical model, the Viterbi EM objective can be formulated as a mixed integer quadratic programming (MIQP) problem with nonlinear constraints (Figure 2). [sent-98, score-0.766]
16 However, the objective becomes quadratic in unsupervised learning, because the feature counts are themselves unknown variables to be optimized. [sent-102, score-0.417]
17 The nonlinear constraints ensure that the model parameters are true log-probabilities. [sent-106, score-0.407]
18 The linear constraints in (3) will ensure that the arc variables for each sentence es encode a valid latent dependency tree, and that the f variables count up the features of these trees. [sent-109, score-0.691]
19 The final constraints (4) simply specify the range of possible values for the model parameters and their integer count variables. [sent-110, score-0.392]
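Under the notation above, the overall program has roughly the following shape; this is a hedged schematic of (1)–(4), with the exact index sets, linking matrix, and bounds left abstract rather than copied from the paper.

    % Hedged schematic of the program (1)-(4); exact index sets and matrices are given in the paper.
    \max_{\theta,\, f,\, e} \quad \sum_m f_m \theta_m                                            % (1) quadratic: counts f and log-probs theta are both variables
    \text{subject to} \quad \sum_{m \in \mathcal{M}_c} \exp(\theta_m) = 1 \quad \forall c        % (2) nonlinear sum-to-one per conditional distribution
    \quad A \, [\, e \,;\, f \,] \le b                                                           % (3) linear tree and feature-count constraints on e and f
    \quad \theta_m^{\min} \le \theta_m \le \theta_m^{\max}, \quad f_m \in \mathbb{Z}_{\ge 0}, \quad e_{s,ij} \in \{0,1\}   % (4) ranges and integrality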
20 We encode the DMV using integer linear constraints on the arc variables e and feature counts f. [sent-113, score-0.569]
21 The constraints must declaratively specify that the arcs form a valid dependency tree and that the resulting feature values are as defined by the DMV. [sent-115, score-0.42]
22 Tree Constraints To ensure that our arc variables, es, form a dependency tree, we employ the same single-commodity flow constraints of Magnanti and Wolsey (1994) as adapted by Martins et al. [sent-116, score-0.409]
23 The single-commodity flow constraints simultaneously enforce that each node has exactly one parent, the special root node (position 0) has no incoming arcs, and the arcs form a connected graph. [sent-120, score-0.593]
24 The constraints below specify that the root node emits Ns units of flow (5), that one unit of flow is consumed by each node (6), that the flow is zero on each disabled arc (7), and that the arcs are binary variables (8). [sent-122, score-0.865]
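Spelled out in the style of Magnanti and Wolsey (1994), such constraints look roughly as follows, with φ_{s,ij} the flow on arc (i, j) of sentence s; the indexing is a hedged reconstruction rather than the paper's exact equations (5)–(8).

    % Single-commodity flow sketch for one sentence s of length N_s (assumed notation)
    \sum_{j \ge 1} \phi_{s,0j} = N_s                                        % (5) root emits N_s units of flow
    \sum_{i} \phi_{s,ij} - \sum_{k} \phi_{s,jk} = 1 \quad \forall j \ge 1   % (6) each word consumes one unit
    0 \le \phi_{s,ij} \le N_s \, e_{s,ij}                                   % (7) no flow on a disabled arc
    e_{s,ij} \in \{0, 1\}                                                   % (8) arcs are binary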
25 The constraint (10) defines the feature count for model parameter θroot,t as the number of all enabled arcs connecting the root node to a word of type t, summing over all sentences s. [sent-145, score-0.267]
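In the same assumed notation, with w_{s,j} the word type at position j of sentence s, the root-attachment count of (10) can be written along these lines:

    f_{\mathrm{root},\, t} \;=\; \sum_{s} \sum_{j \,:\, w_{s,j} = t} e_{s,0j}    % enabled root arcs to words of type t, summed over sentences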
26 3 A Branch-and-Bound Algorithm: The mixed integer quadratic program with nonlinear constraints, given in the previous section, maximizes the nonconvex Viterbi EM objective and is NP-hard to solve (Cohen and Smith, 2010). [sent-156, score-0.682]
27 Yet local search can only provide a lower (pessimistic) bound on the global maximum. [sent-158, score-0.281]
28 This algorithm may be halted at any time, to obtain the best current solution and a bound on how much better the global optimum could be. [sent-160, score-0.285]
29 A feasible solution is an assignment to all the variables—both model parameters and corpus parse—that satisfies all constraints. [sent-161, score-0.267]
30 Our branch-and-bound algorithm maintains an incumbent solution: the best known feasible solution according to the objective function. [sent-162, score-0.424]
31 Our algorithm implicitly defines a search tree in which each node corresponds to a region of model parameter space. [sent-164, score-0.272]
32 In the bounding step, we solve a relaxation of the original problem to provide an upper bound on the objective achievable within that node’s subregion. [sent-167, score-0.502]
33 The overall optimistic bound is given by the worst optimistic bound of all current leaf nodes. [sent-173, score-0.378]
34 The projecting step, if the node is not pruned, projects the solution of the relaxation back to the feasible region, replacing the current incumbent if this projection provides a better lower bound. [sent-174, score-0.692]
35 We do binary branching on the variable θm with the highest regret, defined as zm − θmfm, where zm is the auxiliary objective variable we will introduce in § 4. [sent-179, score-0.3]
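Putting the bounding, pruning, projecting, and branching steps together, a minimal best-first branch-and-bound loop for maximization might look like the sketch below. solve_relaxation, project_to_feasible, and branch_on_highest_regret are stand-ins for the components described above, not the paper's implementation.

    # Minimal best-first branch-and-bound sketch for a maximization problem.
    #   solve_relaxation(box)              -> (upper_bound, relaxed_solution)
    #   project_to_feasible(sol)           -> (objective_value, feasible_solution)
    #   branch_on_highest_regret(box, sol) -> list of child boxes
    # All three are placeholders for the relaxation, projection, and branching rules above.
    import heapq

    def branch_and_bound(root_box, solve_relaxation, project_to_feasible,
                         branch_on_highest_regret, epsilon=1e-4):
        incumbent_val, incumbent = float("-inf"), None
        ub, rel = solve_relaxation(root_box)
        frontier = [(-ub, 0, root_box, rel)]          # max-heap via negated bounds
        counter = 1                                   # tie-breaker so boxes are never compared
        while frontier:
            neg_ub, _, box, rel = heapq.heappop(frontier)
            if -neg_ub <= incumbent_val + epsilon:
                continue                              # prune: cannot beat the incumbent by more than epsilon
            val, feasible = project_to_feasible(rel)  # projecting step: pessimistic bound
            if val > incumbent_val:
                incumbent_val, incumbent = val, feasible
            for child_box in branch_on_highest_regret(box, rel):
                child_ub, child_rel = solve_relaxation(child_box)
                if child_ub > incumbent_val + epsilon:
                    heapq.heappush(frontier, (-child_ub, counter, child_box, child_rel))
                    counter += 1
            # -frontier[0][0] (if the frontier is nonempty) is the largest upper bound still open.
        return incumbent_val, incumbent

Halting the loop early returns the current incumbent together with the gap to the largest upper bound still on the frontier, which matches the any-time behavior described above.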
36 4 Relaxations: The relaxation in the bounding step computes an optimistic bound for a subspace of the model parameters. [sent-187, score-0.447]
37 This upper bound would ideally be not much greater than the true maximum achievable on that region, but looser upper bounds are generally faster to compute. [sent-188, score-0.369]
38 We present successive relaxations to the original nonconvex mixed integer quadratic program with nonlinear constraints from (1)–(4). [sent-189, score-0.939]
39 First, we show how the nonlinear sum-to-one constraints can be relaxed into linear constraints and tightened. [sent-190, score-0.678]
40 Second, we apply a classic approach to bound the nonconvex quadratic objective by a linear concave envelope. [sent-191, score-0.88]
41 1 Relaxing the sum-to-one constraint: In this section, we use cutting planes to create a linear relaxation for the sum-to-one constraint (2). [sent-195, score-0.369]
42 In most cases, the relaxation is not perfectly tight and so will have an enlarged space of feasible solutions. [sent-199, score-0.348]
43 We begin by weakening constraint (2) to ∑_{m∈Mc} exp(θm) ≤ 1 (17). The optimal solution under (17) still satisfies the original equality constraint (2) because of the maximization. [sent-200, score-0.241]
44 Figure 3a shows the feasible region constructed from N=3 linear functions on two log-probabilities θ1, θ2. [sent-204, score-0.238]
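One standard way to obtain such linear functions, and plausibly what Figure 3a depicts, is to under-estimate the convex function exp by tangent lines at a few cut points; replacing exp by an under-estimate only enlarges the feasible region, so the result is a valid relaxation. The cut form below is a hedged reconstruction, not necessarily the paper's exact constraint (18).

    % Tangent under-estimate of the convex function exp at a cut point \hat\theta:
    %   \exp(\theta) \ge \exp(\hat\theta)\,(1 + \theta - \hat\theta)
    \sum_{m \in \mathcal{M}_c} \exp(\hat\theta_{m,i}) \bigl( 1 + \theta_m - \hat\theta_{m,i} \bigr) \;\le\; 1
        \qquad \text{for each choice } i \text{ of cut points}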
45 The piecewise-linear upper surface is its concave envelope (raised by 20 for illustration; in reality they touch). [sent-219, score-0.428]
46 Although the parameters are not fixed, the branch-and-bound algorithm does box them into a small region, where the quadratic objective is “more linear. [sent-221, score-0.339]
47 ” Since it is easy to maximize a concave function, we will maximize the concave envelope—the concave function that most tightly upper-bounds our objective over the region. [sent-222, score-0.662]
48 Each node of the branch-and-bound tree specifies a region via bounds constraints θm^min < θm < θm^max, ∀m. [sent-225, score-0.569]
49 McCormick (1976) described the concave envelope for a single quadratic term subject to bounds constraints (Figure 3b). [sent-227, score-0.9]
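For a single product zm = θm fm with θm ∈ [θm^min, θm^max] and fm ∈ [fm^min, fm^max], the textbook McCormick concave (upper) envelope consists of two planes; it is assumed here that these correspond to constraints (20)–(21).

    % McCormick upper planes for z_m = theta_m * f_m over the box (textbook form)
    z_m \;\le\; \theta_m^{\max} f_m \;+\; f_m^{\min} \theta_m \;-\; \theta_m^{\max} f_m^{\min}
    z_m \;\le\; \theta_m^{\min} f_m \;+\; f_m^{\max} \theta_m \;-\; \theta_m^{\min} f_m^{\max}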
50 The quadratic terms in the objective and in these new constraints are redefined as auxiliary variables, thereby linearizing the program. [sent-234, score-0.554]
51 In this section, we will show how the RLT can be applied to our grammar induction problem and contrast it with the concave envelope relaxation presented in section 4. [sent-235, score-0.602]
52 Consider the original MP in equations (1)–(4), with the nonlinear sum-to-one constraints in (2) replaced by our linear constraints proposed in (18). [sent-237, score-0.639]
53 If we remove the integer constraints in (4), the result is a quadratic program with purely linear constraints. [sent-238, score-0.576]
54 For convenience of presentation, we represent both the linear inequality constraints and the bounds constraints, under a different parameterization using the matrix G and vector g. [sent-244, score-0.408]
55 [ bk − Ak x ≥ 0, 1 ≤ k ≤ m;  Ui − xi ≥ 0, xi − Li ≥ 0, 1 ≤ i ≤ n ] ≡ [ gi − Gi x ≥ 0, 1 ≤ i ≤ m + 2n ]. The reformulation step forms all possible products of these linear constraints and then adds them to the original quadratic program. [sent-245, score-0.575]
56 This yields the following RLT relaxation; the original inequality constraints (23) and bounds (24) can be dropped, because they are fully enforced by the new RLT constraints (26) from the reformulation step (Sherali and Tuncbilek, 1995). [sent-249, score-0.67]
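The mechanics of the reformulation step can be sketched as follows: every pair of linear factors gi − Gi x ≥ 0 is multiplied out, and each product x_a x_b is replaced by an auxiliary variable w_ab. The code is a generic illustration under the notation above, not the paper's construction of constraints (26).

    # Sketch of first-level RLT: multiply pairs of factors (g_i - G_i.x >= 0)
    # and linearize products x_a * x_b as auxiliary variables w_ab.
    import itertools
    import numpy as np

    def rlt_constraints(G, g):
        """G: (K, n) array, g: (K,) array, encoding factors g_i - G_i.x >= 0.
        Returns triples (const, lin, quad) meaning const + lin.x + <quad, W> >= 0,
        where W is the symmetric matrix of auxiliary variables w_ab = x_a * x_b."""
        K, n = G.shape
        out = []
        for i, j in itertools.combinations_with_replacement(range(K), 2):
            # Expand (g_i - G_i.x)(g_j - G_j.x) >= 0:
            const = g[i] * g[j]
            lin = -(g[i] * G[j] + g[j] * G[i])
            quad = np.outer(G[i], G[j])
            quad = 0.5 * (quad + quad.T)      # symmetrize, since W is symmetric
            out.append((const, lin, quad))
        return out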
57 If the original QP contains equality constraints of the form Ge x = ge, then we can form constraints by multiplying each such constraint by each variable xi. [sent-251, score-0.509]
58 (26) will impose the concave envelope constraints (20)–(21) (Anstreicher, 2009). [sent-254, score-0.602]
59 The constraints presented above are considered to be first-level constraints corresponding to the first-level variables wij. [sent-255, score-0.588]
60 These higher level constraints/variables have been shown to provide increasingly tighter relaxations (Sherali and Adams, 1990) at the cost of a large number of variables and constraints. [sent-257, score-0.274]
61 In the case where x ∈ {0, 1}^n, the degree-n RLT constraints will restrict to the convex hull of the feasible solutions (Sherali and Adams, 1990). [sent-258, score-0.511]
62 This is in direct contrast to the concave envelope relaxation presented in section 4.2, [sent-259, score-0.554]
63 which relaxes to the convex hull of each quadratic term independently. [sent-260, score-0.308]
64 Yet when we project to a higher-dimensional space by including the auxiliary variables, the linearized constraints cut off portions of the feasible region given by only the concave envelope relaxation in eqs. [sent-262, score-1.012]
65 4 Adding Posterior Constraints: It is a simple extension to impose posterior constraints within our framework. [sent-265, score-0.339]
66 Here we emphasize constraints that are analogous to the universal linguistic constraints from Naseem et al. [sent-266, score-0.474]
67 Projections: A pessimistic bound, from the projecting step, will correspond to a feasible but not necessarily optimal solution to the original problem. [sent-277, score-0.31]
68 We propose several methods for obtaining pessimistic bounds during the branch-and-bound search, by projecting and improving the solutions found by the relaxation. [sent-278, score-0.262]
69 A solution to the relaxation may be infeasible in the original problem for two reasons: the model parameters might not sum to one, and/or the parse may contain fractional edges. [sent-279, score-0.433]
70 Given a relaxed joint solution to the parameters and the latent variables, one must be able to project it to a nearby feasible one, by projecting either the fractional parameters or the fractional latent variables into the feasible space and then solving exactly for the other. [sent-288, score-0.879]
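A hedged sketch of the parameter-side projection: renormalize each relaxed conditional distribution so it sums to one, then re-parse exactly with the projected parameters to recover an integral corpus parse and a feasible objective value. parse_viterbi is a stand-in for an exact parser for the model (e.g., a DMV Viterbi parser); none of the names below come from the paper's code.

    # Sketch: project a relaxed (fractional) solution back to the feasible region.
    #   theta_relaxed: dict mapping distribution id -> numpy array of log-probabilities
    #   parse_viterbi(theta) -> (feature_counts, parses): placeholder for an exact parser
    import numpy as np

    def project_parameters(theta_relaxed):
        """Renormalize each conditional distribution so its probabilities sum to one."""
        projected = {}
        for c, logp in theta_relaxed.items():
            p = np.exp(logp)
            projected[c] = np.log(p / p.sum())
        return projected

    def project_solution(theta_relaxed, parse_viterbi):
        theta = project_parameters(theta_relaxed)
        counts, parses = parse_viterbi(theta)     # exact parse given the fixed parameters
        # Feasible (pessimistic) objective: sum over counts times log-probabilities.
        value = sum(float(np.dot(counts[c], theta[c])) for c in theta)
        return value, (theta, parses)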
71 Gimpel and Smith (2012) proposed a concave model for unsupervised dependency parsing using IBM Model 1. [sent-290, score-0.283]
72 By contrast, our approach incorporates the tree constraints directly into our convex relaxation and embeds the relaxation in a branch-and-bound algorithm capable of solving the original DMV maximum-likelihood estimation problem. [sent-292, score-0.778]
73 Several integer linear programming (ILP) formulations of dependency parsing (Riedel and Clarke, 2006; Martins et al. [sent-300, score-0.275]
74 Branch-and-bound has also been applied to semi-supervised SVM training, a nonconvex search problem (Chapelle et al. [sent-307, score-0.268]
75 We add posterior constraints to Viterbi EM’s E-step. [sent-318, score-0.339]
76 First, we run a relaxed linear programming (LP) parser, then project the (possibly fractional) parses back to the feasible region. [sent-319, score-0.315]
77 The posterior constraint in the LP parser is tighter than the one used in the true optimization problem, so the projections tend to be feasible under the true (looser) posterior constraints. [sent-321, score-0.418]
78 1k uses the concave envelope and also randomly samples 1,000 other RLT constraints, and so on for Max. [sent-334, score-0.365]
79 Figure 4: The bound quality at the root improves as the proportion of RLT constraints increases, on 5 synthetic sentences. [sent-353, score-0.473]
80 A random subset of 70% of the 320,126 possible RLT constraints matches the relaxation quality of the full set. [sent-354, score-0.426]
81 This bound is very tight: the relaxations in Figure 5 solve hundreds of nodes before such a bound is achieved. [sent-355, score-0.371]
82 Figure 5: The global upper and lower bounds improve over time for branch-and-bound using different subsets of RLT constraints on 5 synthetic sentences. [sent-356, score-0.537]
83 ) constraints with a nonzero coefficient for one of the RLT variables zm from the linearized objective. [sent-363, score-0.42]
84 3 Real Data: In this section, we compare our global search method to Viterbi EM with random restarts, each with or without posterior constraints. [sent-389, score-0.258]
85 In our global search method, like Viterbi EM, the posterior constraints lead to lower log-likelihoods. [sent-399, score-0.451]
86 In both global and local search, the posterior constraints lead to higher accuracies. [sent-403, score-0.437]
87 Viterbi EM with posterior constraints demonstrates the oscillation of incumbent accuracy: starting at 58. [sent-404, score-0.48]
88 0k with posterior constraints obtains the highest overall accuracy of 61. [sent-409, score-0.339]
89 The method can also be used with posterior constraints or a regularized objective. [sent-415, score-0.339]
90 Future work includes algorithmic improvements for solving the relaxation and the development of tighter relaxations. [sent-416, score-0.281]
91 The Dantzig-Wolfe decomposition (Dantzig and Wolfe, 1960) or Lagrangian Relaxation (Held and Karp, 1970) might satisfy both of these goals by pushing the integer tree constraints into a subproblem solved by a dynamic programming parser. [sent-417, score-0.447]
92 Recent work on semidefinite relaxations (Anstreicher, 2009) suggests they may provide tighter bounds at the expense of greater computation time. [sent-418, score-0.269]
93 Perhaps even more important than tightening the bounds at each node are search heuristics (e. [sent-419, score-0.261]
94 A tight linearization and an algorithm for zero-one quadratic programming problems. [sent-434, score-0.393]
95 programming technique for quadratic proOptimization, Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. [sent-442, score-0.255]
96 Globally solving box-constrained nonconvex quadratic programs with semidefinite-based finite branch-and-bound. [sent-449, score-0.443]
97 Concise integer linear programming formulations for dependency parsing. [sent-576, score-0.275]
98 Computability of global solutions to factorable nonconvex programs: Part I—Convex underestimating problems. [sent-582, score-0.325]
99 A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. [sent-618, score-0.288]
100 A reformulation-convexification approach for solving nonconvex quadratic programming problems. [sent-630, score-0.509]
wordName wordTfidf (topN-words)
[('rlt', 0.469), ('constraints', 0.237), ('nonconvex', 0.219), ('concave', 0.193), ('quadratic', 0.189), ('relaxation', 0.189), ('sherali', 0.187), ('viterbi', 0.173), ('envelope', 0.172), ('incumbent', 0.141), ('bound', 0.134), ('em', 0.115), ('variables', 0.114), ('feasible', 0.112), ('bounds', 0.109), ('node', 0.103), ('relaxations', 0.103), ('nonlinear', 0.103), ('posterior', 0.102), ('dmv', 0.102), ('linearization', 0.091), ('adams', 0.089), ('integer', 0.088), ('solution', 0.088), ('reformulation', 0.087), ('mc', 0.087), ('objective', 0.083), ('mfm', 0.078), ('convex', 0.072), ('spectral', 0.071), ('zm', 0.069), ('spitkovsky', 0.068), ('arcs', 0.068), ('arc', 0.068), ('parameters', 0.067), ('programming', 0.066), ('synthetic', 0.065), ('region', 0.064), ('global', 0.063), ('upper', 0.063), ('linear', 0.062), ('constraint', 0.059), ('smith', 0.059), ('dependency', 0.059), ('projecting', 0.059), ('naseem', 0.058), ('tighter', 0.057), ('fractional', 0.056), ('tree', 0.056), ('optimistic', 0.055), ('cohen', 0.052), ('pessimistic', 0.051), ('search', 0.049), ('grammar', 0.048), ('hanif', 0.047), ('hull', 0.047), ('lglobal', 0.047), ('mmin', 0.047), ('tight', 0.047), ('coin', 0.046), ('auxiliary', 0.045), ('flow', 0.045), ('restarts', 0.044), ('optimization', 0.043), ('solutions', 0.043), ('toy', 0.041), ('simplex', 0.039), ('relaxed', 0.039), ('eisner', 0.037), ('latent', 0.037), ('root', 0.037), ('parses', 0.036), ('subspace', 0.036), ('mmax', 0.036), ('shay', 0.036), ('local', 0.035), ('solving', 0.035), ('equality', 0.035), ('pm', 0.035), ('branching', 0.034), ('martins', 0.034), ('child', 0.033), ('parse', 0.033), ('bounding', 0.033), ('lp', 0.032), ('mx', 0.032), ('gillenwater', 0.032), ('anstreicher', 0.031), ('burer', 0.031), ('dantzig', 0.031), ('fmmax', 0.031), ('fmmin', 0.031), ('lema', 0.031), ('lobjois', 0.031), ('lps', 0.031), ('luque', 0.031), ('magnanti', 0.031), ('tuncbilek', 0.031), ('ulocal', 0.031), ('unsupervised', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ?) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
2 0.13280357 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
Author: Kai Liu ; Yajuan Lu ; Wenbin Jiang ; Qun Liu
Abstract: This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependency. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual projection and unsupervised induction, so as to induce a monolingual grammar much better than previous models only using bilingual projection or unsupervised induction. We induced dependency grammar for five different languages under the guidance of dependency information projected from the parsed English translation; experiments show that the bilingually-guided method achieves a significant improvement of 28.5% over the unsupervised baseline and 3.0% over the best projection baseline on average.
3 0.11357526 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
Author: Andre Martins ; Miguel Almeida ; Noah A. Smith
Abstract: We present fast, accurate, direct nonprojective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive to projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).
4 0.10046921 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
5 0.098980203 237 acl-2013-Margin-based Decomposed Amortized Inference
Author: Gourab Kundu ; Vivek Srikumar ; Dan Roth
Abstract: Given that structured output prediction is typically performed over entire datasets, one natural question is whether it is possible to re-use computation from earlier inference instances to speed up inference for future instances. Amortized inference has been proposed as a way to accomplish this. In this paper, first, we introduce a new amortized inference algorithm called the Margin-based Amortized Inference, which uses the notion of structured margin to identify inference problems for which previous solutions are provably optimal. Second, we introduce decomposed amortized inference, which is designed to address very large inference problems, where earlier amortization methods become less effective. This approach works by decomposing the output structure and applying amortization piece-wise, thus increasing the chance that we can re-use previous solutions for parts of the output structure. These parts are then combined to a global coherent solution using Lagrangian relaxation. In our experiments, using the NLP tasks of semantic role labeling and entity-relation extraction, we demonstrate that with the margin-based algorithm, we need to call the inference engine only for a third of the test examples. Further, we show that the decomposed variant of margin-based amortized inference achieves a greater reduction in the number of inference calls.
6 0.094751358 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
7 0.088047147 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs
8 0.085682541 109 acl-2013-Decipherment Complexity in 1:1 Substitution Ciphers
9 0.084792547 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
10 0.079228997 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
11 0.074500605 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation
12 0.068049252 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
13 0.066652246 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
14 0.066205949 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
15 0.065691181 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
16 0.0647204 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
17 0.064264119 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
18 0.064024515 4 acl-2013-A Context Free TAG Variant
19 0.063495941 325 acl-2013-Smoothed marginal distribution constraints for language modeling
20 0.062460691 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
topicId topicWeight
[(0, 0.164), (1, -0.053), (2, -0.084), (3, 0.002), (4, -0.063), (5, -0.01), (6, 0.074), (7, -0.005), (8, -0.084), (9, -0.077), (10, 0.018), (11, -0.08), (12, -0.047), (13, -0.111), (14, -0.034), (15, -0.036), (16, 0.029), (17, 0.057), (18, 0.047), (19, -0.04), (20, 0.039), (21, 0.062), (22, 0.016), (23, 0.034), (24, -0.072), (25, -0.025), (26, -0.034), (27, 0.051), (28, -0.03), (29, -0.006), (30, 0.11), (31, -0.007), (32, -0.021), (33, 0.031), (34, 0.066), (35, 0.005), (36, 0.009), (37, 0.004), (38, 0.069), (39, 0.01), (40, 0.023), (41, -0.032), (42, 0.004), (43, -0.043), (44, -0.071), (45, 0.052), (46, -0.019), (47, 0.029), (48, -0.066), (49, -0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.94725811 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ?) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
2 0.72607487 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
Author: Andre Martins ; Miguel Almeida ; Noah A. Smith
Abstract: We present fast, accurate, direct nonprojective dependency parsers with third-order features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive to projective parsers, with state-of-the-art accuracies for the largest datasets (English, Czech, and German).
3 0.70999086 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
Author: Jun Suzuki ; Masaaki Nagata
Abstract: This paper proposes a framework of supervised model learning that realizes feature grouping to obtain lower complexity models. The main idea of our method is to integrate a discrete constraint into model learning with the help of the dual decomposition technique. Experiments on two well-studied NLP tasks, dependency parsing and NER, demonstrate that our method can provide state-of-the-art performance even if the degrees of freedom in trained models are surprisingly small, i.e., 8 or even 2. This significant benefit enables us to provide compact model representation, which is especially useful in actual use.
4 0.70190525 237 acl-2013-Margin-based Decomposed Amortized Inference
Author: Gourab Kundu ; Vivek Srikumar ; Dan Roth
Abstract: Given that structured output prediction is typically performed over entire datasets, one natural question is whether it is possible to re-use computation from earlier inference instances to speed up inference for future instances. Amortized inference has been proposed as a way to accomplish this. In this paper, first, we introduce a new amortized inference algorithm called the Margin-based Amortized Inference, which uses the notion of structured margin to identify inference problems for which previous solutions are provably optimal. Second, we introduce decomposed amortized inference, which is designed to address very large inference problems, where earlier amortization methods become less effective. This approach works by decomposing the output structure and applying amortization piece-wise, thus increasing the chance that we can re-use previous solutions for parts of the output structure. These parts are then combined to a global coherent solution using Lagrangian relaxation. In our experiments, using the NLP tasks of semantic role labeling and entity-relation extraction, we demonstrate that with the margin-based algorithm, we need to call the inference engine only for a third of the test examples. Further, we show that the decomposed variant of margin-based amortized inference achieves a greater reduction in the number of inference calls.
5 0.67781323 382 acl-2013-Variational Inference for Structured NLP Models
Author: David Burkett ; Dan Klein
Abstract: unkown-abstract
6 0.66443038 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs
7 0.6211437 349 acl-2013-The mathematics of language learning
8 0.6016171 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
9 0.59880829 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
10 0.55333424 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
11 0.54573601 311 acl-2013-Semantic Neighborhoods as Hypergraphs
12 0.54359925 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
13 0.52672696 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
14 0.5264352 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
15 0.52048743 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
16 0.499136 251 acl-2013-Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
17 0.49670464 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
18 0.4966858 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
19 0.49014601 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
20 0.48962888 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
topicId topicWeight
[(0, 0.08), (6, 0.051), (11, 0.061), (24, 0.026), (26, 0.061), (28, 0.016), (35, 0.058), (40, 0.257), (42, 0.053), (48, 0.082), (64, 0.011), (70, 0.059), (88, 0.03), (90, 0.022), (95, 0.035)]
simIndex simValue paperId paperTitle
1 0.82116544 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
same-paper 2 0.80529141 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ?) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
3 0.7953701 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Marecek ; Jan StÄłpanek ; Daniel Zeman ; ZdÄłnÄłk Zabokrtsky
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
4 0.77656639 163 acl-2013-From Natural Language Specifications to Program Input Parsers
Author: Tao Lei ; Fan Long ; Regina Barzilay ; Martin Rinard
Abstract: We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy, compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).
5 0.66625941 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
Author: Yuki Arase ; Ming Zhou
Abstract: We propose a method for automatically detecting low-quality Web-text translated by statistical machine translation (SMT) systems. We focus on the phrase salad phenomenon that is observed in existing SMT results and propose a set of computationally inexpensive features to effectively detect such machine-translated sentences from a large-scale Web-mined text. Unlike previous approaches that require bilingual data, our method uses only monolingual text as input; therefore it is applicable for refining data produced by a variety of Web-mining activities. Evaluation results show that the proposed method achieves an accuracy of 95.8% for sentences and 80.6% for text in noisy Web pages.
6 0.62736338 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
7 0.55968708 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
8 0.5580256 275 acl-2013-Parsing with Compositional Vector Grammars
9 0.55418152 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
10 0.55267805 237 acl-2013-Margin-based Decomposed Amortized Inference
11 0.55164284 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
12 0.54873151 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
13 0.54611254 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.54501033 225 acl-2013-Learning to Order Natural Language Texts
15 0.54290837 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
16 0.54254574 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
17 0.54199868 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
18 0.53796333 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
19 0.53669852 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
20 0.53640491 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation