acl acl2011 acl2011-340 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hui Lin ; Jeff Bilmes
Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 hlin@ee.washington.edu Abstract We cast the word alignment problem as maximizing a submodular function under matroid constraints. [sent-4, score-1.426]
2 Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. [sent-5, score-0.737]
3 We show that submodularity naturally arises when modeling word fertility. [sent-6, score-0.204]
4 Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches. [sent-7, score-0.525]
5 1 Introduction Word alignment is a key component in most statistical machine translation systems. [sent-8, score-0.218]
6 While classical approaches for word alignment are based on generative models (e. [sent-9, score-0.218]
7 , 1996)), word alignment can also be viewed as a matching problem, where each word pair is associated with a score reflecting the desirability of aligning that pair, and the alignment is then the highest scored matching under some constraints. [sent-13, score-0.703]
8 Melamed (2000) introduces the competitive linking algorithm which greedily constructs matchings under the one-to-one mapping assumption. [sent-15, score-0.129]
9 , 2004), matchings are found using an algorithm for constructing a maximum weighted bipartite graph matching (Schrijver, 2003), where word pair scores come from alignment posteriors of generative models. [sent-17, score-0.374]
10 (2005) cast word alignment as a maximum weighted matching problem and propose a framework for learning word pair scores as a function of arbitrary features of that pair. [sent-22, score-0.058]
13 These approaches, however, have two potentially substantial limitations: words have fertility of at most one, and interactions between alignment decisions are not representable. [sent-23, score-0.497]
14 (2006) address this issue by formulating the alignment problem as a quadratic assignment problem, and off-the-shelf integer linear programming (ILP) solvers are used to solve the optimization problem. [sent-25, score-0.273]
15 While efficient for some medium-scale problems, ILP-based approaches are limited since, when modeling more sophisticated interactions, the number of variables (and/or constraints) required grows polynomially, or even exponentially, making the resultant optimization impractical to solve. [sent-26, score-0.06]
16 In this paper, we treat the word alignment problem as maximizing a submodular function subject to matroid constraints (to be defined in Section 2). [sent-27, score-1.423]
17 Submodular objective functions can represent complex interactions among alignment decisions, and essentially extend the modular (linear) objectives used in the aforementioned approaches. [sent-28, score-0.309]
18 This is because maximizing a monotone submodular function under a matroid constraint can be solved efficiently using a simple greedy algorithm. [sent-30, score-1.456]
19 The greedy algorithm, moreover, is a constant factor approximation algorithm that guarantees a near-optimal solution. [sent-31, score-0.298]
20 In this paper, we moreover show that submodularity naturally arises in word alignment problems when modeling word fertility (see Section 4). [sent-32, score-0.647]
21 Experimental results on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to the maximum weighted matching approach, while being at least 50 times faster. [sent-33, score-0.525]
22 2 Background Matroids and submodularity both play important roles in combinatorial optimization. [sent-36, score-0.174]
23 Matroids are combinatorial structures that generalize the notion of linear independence in matrices. [sent-38, score-0.085]
24 A pair (V, I) is called a matroid if V is a finite ground set and I is a nonempty collection of subsets of V that are independent. [sent-39, score-0.568]
25 In particular, I must satisfy: (i) if X ⊂ Y and Y ∈ I, then X ∈ I; (ii) if X, Y ∈ I and |X| < |Y|, then X ∪ {e} ∈ I for some e ∈ Y \ X. [sent-40, score-0.073]
26 We typically refer to a matroid by listing its ground set and its family of independent sets: M = (V, I). [sent-41, score-0.096]
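As a concrete illustration of this definition (our sketch, not from the paper), the two axioms can be brute-force checked on a tiny uniform matroid, where a set is independent iff its size is at most 2:

```python
# Our illustration: brute-force check of the two matroid axioms for the
# uniform matroid on V = {1, 2, 3} with independence |X| <= 2.
from itertools import combinations

V = {1, 2, 3}

def powerset(s):
    return [set(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

I = [X for X in powerset(V) if len(X) <= 2]  # independent sets

# (i) downward closure: if X ⊂ Y and Y ∈ I, then X ∈ I
assert all(X in I for Y in I for X in powerset(Y))

# (ii) exchange: if X, Y ∈ I and |X| < |Y|, then X ∪ {e} ∈ I for some e ∈ Y \ X
for X in I:
    for Y in I:
        if len(X) < len(Y):
            assert any(X | {e} in I for e in Y - X)
```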
27 A set function f : 2V → R is called submodular (Edmonds, 1970) if it satisfies the property of diminishing returns: for any X ⊆ Y ⊆ V \ v, a submodular function f must satisfy f(X + v) − f(X) ≥ f(Y + v) − f(Y). [sent-43, score-0.707]
28 That is, the incremental "value" of v decreases as the context in which v is considered grows from X to Y. [sent-44, score-0.033]
29 If this is satisfied everywhere with equality, then the function f is called modular. [sent-45, score-0.085]
30 A set function f is monotone nondecreasing if ∀X ⊆ Y, f(X) ≤ f(Y). [sent-46, score-0.283]
31 As shorthand, in this paper, monotone nondecreasing submodular functions will simply be referred to as monotone submodular. [sent-47, score-0.727]
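A minimal numerical sketch of these definitions (ours, with made-up element scores): a concave function of a modular, i.e., additive, score is a standard way to build a monotone submodular function, so diminishing returns must hold.

```python
# Our sketch with hypothetical scores: sqrt of a sum of nonnegative element
# scores is monotone submodular (concave composed with modular).
scores = {1: 0.5, 2: 0.3, 3: 0.2, 4: 0.4}  # assumed element scores

def f(S):
    return sum(scores[e] for e in S) ** 0.5

X, Y, v = {1}, {1, 2, 3}, 4   # X ⊆ Y, and v lies outside Y

gain_X = f(X | {v}) - f(X)
gain_Y = f(Y | {v}) - f(Y)
assert gain_X >= gain_Y       # diminishing returns: f(X + v) - f(X) >= f(Y + v) - f(Y)
assert f(X) <= f(Y)           # monotone nondecreasing
```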
32 Historically, submodular functions have their roots in economics, game theory, combinatorial optimization, and operations research. [sent-48, score-0.557]
33 More recently, submodular functions have started receiving attention in the machine learning and computer vision community (Kempe et al. [sent-49, score-0.502]
34 3 Approach We are given a source language (English) string e1I = e1, · · · , ei, · · · , eI and a target language (French) string f1J = f1, · · · , fj, · · · , fJ that have to be aligned. [sent-52, score-0.134]
35 Define the word positions in the English string as set E = {1, · · · , I} and positions in the French string as set F = {1, · · · , J}. [sent-53, score-0.162]
36 An alignment A between the two word strings can then be seen as a subset of the Cartesian product of the word positions, i.e., A ⊆ {(i, j) : i ∈ E, j ∈ F} = V, where V = E × F is the ground set. [sent-56, score-0.066]
38 For convenience, we refer to element (i, j) ∈ A as an edge that connects i and j in alignment A. [sent-57, score-0.264]
39 Restricting the fertility of word fj to be at most kj is mathematically equivalent to having |A ∩ PjE| ≤ kj, where A ⊆ V is an alignment and PjE = E × {j}. [sent-58, score-0.672]
40 Intuitively, PjE is the set of all possible edges in the ground set that connect to j, and the cardinality of the intersection between A and PjE indicates how many edges in A are connected to j. [sent-59, score-0.066]
41 Similarly, we can impose constraints on the fertility of English words by constraining the alignment A to satisfy |A ∩ PiF| ≤ ki for i ∈ E, where PiF = {i} × F. [sent-60, score-0.662]
42 Note that either of {PjE : j ∈ F} or {PiF : i ∈ E} constitutes a partition of V. [sent-61, score-0.04]
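For concreteness, a sketch (ours, not the paper's code) of the independence test for this partition matroid: an alignment A is independent iff no French position j exceeds its fertility bound kj.

```python
# Our sketch: independence in the partition matroid induced by French-side
# fertility bounds, i.e., |A ∩ PjE| <= kj for all j in F, with PjE = E × {j}.
def respects_french_fertility(A, F, k):
    """A: set of edges (i, j); F: French positions; k: dict j -> bound kj."""
    for j in F:
        # |A ∩ PjE| is the number of edges in A whose French endpoint is j
        if sum(1 for (_, jj) in A if jj == j) > k[j]:
            return False
    return True

F = {1, 2}
k = {1: 1, 2: 1}                                          # hypothetical kj = 1 bounds
print(respects_french_fertility({(1, 1), (2, 2)}, F, k))  # True
print(respects_french_fertility({(1, 2), (2, 2)}, F, k))  # False: f2 used twice
```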
43 Suppose we have a set function f : 2V → R+ that measures the quality (or score) of an alignment A ⊆ V; then, when also considering fertility constraints, we can treat the word alignment problem as maximizing a set function subject to a matroid constraint: Problem 1. [sent-63, score-1.415]
44 maxA⊆V f(A), subject to: A ∈ I, where I is the set of independent sets of a matroid (or it might be the set of independent sets simultaneously in two matroids, as we shall see later). [sent-64, score-0.575]
45 Independence in partition matroids generalizes the typical matching constraints for word alignment, where each word aligns to at most one word (kj = 1, ∀j) in the other sentence (Matusov et al. [sent-65, score-0.31]
46 Our matroid generalizations provide flexibility in modeling fertility, and also strategies for solving the word alignment problem efficiently and near-optimally. [sent-68, score-0.748]
47 In particular, when f is monotone submodular, near-optimal solutions for Problem 1 can be efficiently guaranteed. [sent-69, score-0.196]
48 , 1978), a simple greedy algorithm for monotone submodular function maximization with a matroid constraint is shown to have a constant approximation factor. [sent-71, score-1.504]
49 Precisely, the greedy algorithm finds a solution A such that f(A) ≥ 1/(m+1) · f(A∗), where A∗ is the optimal solution and m is the number of matroid constraints. [sent-72, score-0.706]
50 When there is only one matroid constraint, we get an approximation factor of 1/2. [sent-73, score-0.504]
51 Constant factor approximation algorithms are particularly attractive since the quality of the solution does not depend on the size of the problem, so even very large problems do well. [sent-74, score-0.123]
52 It is also important to note that this is a worst case bound, and in most cases the quality of the solution obtained will be much better than this bound suggests. [sent-75, score-0.031]
53 Vondrák (2008) shows a continuous greedy algorithm followed by pipage rounding with approximation factor 1 − 1/e (≈ 0.63) for maximizing a monotone submodular function over a matroid. [sent-77, score-0.261]
55 (2009) improve the approximation result in (Fisher et al. [sent-80, score-0.06]
56 , 1978) by showing that a local-search algorithm has an approximation guarantee of 1/(m + ε) for the problem of maximizing a monotone submodular function subject to m matroid constraints (m ≥ 2 and ε > 0).
58 In this paper, however, we use the simple greedy algorithm for the sake of efficiency. [sent-84, score-0.172]
59 We outline our greedy algorithm for Problem 1 in Algorithm 1, which is slightly different from the one in (Fisher et al. [sent-85, score-0.172]
60 , 1978) as, in line 4 of Algorithm 1, we have an additional requirement on a such that the increment of adding a is strictly greater than zero. [sent-86, score-0.03]
61 This additional requirement is to maintain a higher-precision word alignment solution. [sent-87, score-0.248]
62 The 1/(m+1) theoretical guarantee still holds as f is monotone, i.e., Algorithm 1 is a 1/2-approximation algorithm for Problem 1 (only one matroid constraint) when f is monotone submodular. [sent-90, score-0.685]
64 Algorithm 1: A greedy algorithm for Problem 1. [sent-91, score-0.172]
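Algorithm 1's pseudocode itself does not survive in this extraction, so the following is our reconstruction of a greedy procedure of the kind the text describes, not the authors' verbatim listing: repeatedly add the feasible edge of largest marginal gain, requiring (as the text notes for line 4) that the gain be strictly positive. Here f, is_independent, and the edge set V are assumed oracles.

```python
# Our sketch of a greedy algorithm for Problem 1 (reconstruction, not the
# paper's exact Algorithm 1). f: set-function oracle; is_independent:
# membership oracle for I; V: ground set of candidate edges.
def greedy(V, f, is_independent):
    A = set()
    remaining = set(V)
    while remaining:
        best, best_gain = None, 0.0
        for a in remaining:
            if is_independent(A | {a}):
                gain = f(A | {a}) - f(A)
                if gain > best_gain:        # strictly positive increment only
                    best, best_gain = a, gain
        if best is None:                    # no feasible positive-gain edge left
            return A
        A.add(best)
        remaining.remove(best)
    return A
```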
65 In practice, the argmax in Algorithm 1 can be efficiently implemented with a priority queue when f is submodular (Minoux, 1978), which brings the complexity down to O(|V| log |V|) oracle function calls. [sent-96, score-0.523]
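A sketch of this Minoux-style lazy trick, with the same assumed oracles as above: because marginal gains of a submodular f only shrink as A grows, stale cached gains are valid upper bounds, so a popped entry whose refreshed gain still tops the heap must be the true argmax.

```python
# Our sketch of lazy greedy evaluation; assumes f is monotone submodular and
# heap elements are comparable (e.g., (i, j) edge tuples).
import heapq

def lazy_greedy(V, f, is_independent):
    A = set()
    base = f(set())
    heap = [(-(f({a}) - base), a) for a in V]   # max-heap via negated gains
    heapq.heapify(heap)
    while heap:
        _, a = heapq.heappop(heap)
        if not is_independent(A | {a}):
            continue                  # dependence is inherited by supersets of A
        gain = f(A | {a}) - f(A)      # refresh the possibly stale cached gain
        if gain <= 0:
            continue                  # gains only shrink, so a is dead for good
        if not heap or gain >= -heap[0][0]:
            A.add(a)                  # beats every other upper bound: true argmax
        else:
            heapq.heappush(heap, (-gain, a))
    return A
```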
66 4 Submodular Fertility We begin this section by demonstrating that submodularity arises naturally when modeling word fertility. [sent-97, score-0.204]
67 To do so, we borrow an example of fertility from (Melamed, 2000). [sent-98, score-0.225]
68 01, where s(ei, fj) represents the score of aligning ei and fj. [sent-102, score-0.197]
69 To find the correct alignment (e1, f1) and (e2, f2), the competitive linking algorithm in (Melamed, 2000) imposes a one-to-one assumption to prevent choosing (e1, f2) over (e2, f2). [sent-103, score-0.296]
70 The one-to-one assumption, however, limits the algorithm’s capability of handling models with fertility larger than one. [sent-104, score-0.225]
71 The scores estimated from the data for aligning word pairs (the, le), (the, de) and (of, de) are 0. [sent-107, score-0.121]
72 44, regardless of the fact that the is already aligned with le. [sent-115, score-0.028]
73 Now if we use a submodular function to model the score of aligning an English word to a set of French words, we might obtain the correct alignments (the, le) and (of, de) by incorporating the diminishing returns property (i. [sent-116, score-0.823]
74 Formally, for each i in E, we define a mapping δi : 2V → 2F with δi(A) = {j ∈ F | (i, j) ∈ A}, (1) i.e., δi(A) is the set of positions in F that are aligned with position i in alignment A. [sent-123, score-0.342]
76 We use function fi : 2F → R+ to represent the benefit of aligning position i ∈ E to a set of positions in F. [sent-124, score-0.23]
77 Given score si,j of aligning i and j, we could have, for S ⊆ F, fi(S) = (Σj∈S si,j)^α, (2) where 0 < α ≤ 1, i.e.
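A small sketch of Eq. (2)'s concave composition; the word-pair scores below are hypothetical stand-ins (the example's actual numbers did not survive extraction), chosen only to exhibit the diminishing-returns effect the section describes.

```python
# Our sketch of the fertility score f_i(S) = (sum_{j in S} s_ij)^alpha, 0 < alpha <= 1.
def f_i(S, s_i, alpha=0.5):
    """Benefit of aligning English position i to the French positions in S."""
    return sum(s_i[j] for j in S) ** alpha

s_the = {"le": 0.7, "de": 0.4}   # assumed scores s(the, le) and s(the, de)
first = f_i({"le"}, s_the)                              # value of (the, le) alone
extra = f_i({"le", "de"}, s_the) - f_i({"le"}, s_the)   # marginal value of (the, de)
print(first, extra)
# The marginal value of (the, de) is much smaller than f_i({"de"}) alone,
# since "the" is already aligned with "le": exactly the diminishing returns
# that lets a submodular model prefer (of, de) instead.
```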
wordName wordTfidf (topN-words)
[('matroid', 0.472), ('submodular', 0.465), ('pje', 0.236), ('fertility', 0.225), ('alignment', 0.218), ('matroids', 0.168), ('monotone', 0.166), ('bilmes', 0.152), ('pif', 0.135), ('kj', 0.128), ('greedy', 0.125), ('aligning', 0.121), ('submodularity', 0.119), ('maximizing', 0.095), ('narasimhan', 0.089), ('hansards', 0.082), ('diminishing', 0.082), ('ei', 0.076), ('fj', 0.074), ('satisfy', 0.073), ('schrijver', 0.067), ('ki', 0.067), ('melamed', 0.067), ('ground', 0.066), ('approximation', 0.06), ('krause', 0.059), ('nondecreasing', 0.059), ('ithe', 0.059), ('fisher', 0.058), ('matching', 0.058), ('function', 0.058), ('combinatorial', 0.055), ('matusov', 0.055), ('interactions', 0.054), ('electrical', 0.051), ('matchings', 0.051), ('positions', 0.051), ('arises', 0.051), ('algorithm', 0.047), ('le', 0.046), ('constraint', 0.045), ('iin', 0.045), ('constraints', 0.044), ('subject', 0.043), ('de', 0.041), ('partition', 0.04), ('functions', 0.037), ('returns', 0.037), ('impose', 0.035), ('french', 0.035), ('constant', 0.034), ('naturally', 0.034), ('wa', 0.034), ('grows', 0.033), ('factor', 0.032), ('maximization', 0.032), ('seattle', 0.032), ('linking', 0.031), ('cast', 0.031), ('rates', 0.031), ('alignments', 0.031), ('solution', 0.031), ('independent', 0.03), ('washington', 0.03), ('requirement', 0.03), ('independence', 0.03), ('bilme', 0.03), ('jegelka', 0.03), ('minoux', 0.03), ('washingt', 0.03), ('zabin', 0.03), ('desirability', 0.03), ('ownership', 0.03), ('pthat', 0.03), ('ior', 0.03), ('vre', 0.03), ('iith', 0.03), ('cdc', 0.03), ('lue', 0.03), ('anr', 0.03), ('nonempty', 0.03), ('maxa', 0.03), ('comme', 0.03), ('wfor', 0.03), ('polynomially', 0.03), ('nao', 0.03), ('nvd', 0.03), ('string', 0.03), ('efficiently', 0.03), ('ee', 0.029), ('property', 0.029), ('guarantee', 0.028), ('aligned', 0.028), ('problem', 0.028), ('optimization', 0.027), ('concave', 0.027), ('everywhere', 0.027), ('kolmogorov', 0.027), ('mathematically', 0.027), ('diminishes', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
Author: Hui Lin ; Jeff Bilmes
Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.
2 0.49651638 4 acl-2011-A Class of Submodular Functions for Document Summarization
Author: Hui Lin ; Jeff Bilmes
Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
3 0.1403815 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions. 1
4 0.13135934 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. .t r
5 0.10773331 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith
Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.
6 0.097744524 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
7 0.093742907 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
8 0.091146447 141 acl-2011-Gappy Phrasal Alignment By Agreement
9 0.088930525 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
10 0.08042907 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
11 0.075614028 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
12 0.069413744 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
13 0.06920746 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
14 0.062737726 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
15 0.058698226 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
16 0.055400066 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
17 0.052984271 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
18 0.047403913 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations
19 0.046323813 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
20 0.046234496 106 acl-2011-Dual Decomposition for Natural Language Processing
topicId topicWeight
[(0, 0.118), (1, -0.055), (2, 0.025), (3, 0.078), (4, 0.001), (5, -0.019), (6, -0.01), (7, 0.078), (8, -0.013), (9, 0.058), (10, 0.122), (11, 0.123), (12, -0.032), (13, 0.119), (14, -0.224), (15, 0.032), (16, 0.093), (17, -0.03), (18, -0.101), (19, 0.022), (20, -0.053), (21, -0.072), (22, -0.018), (23, 0.036), (24, -0.037), (25, -0.054), (26, 0.049), (27, -0.095), (28, 0.086), (29, -0.212), (30, 0.043), (31, -0.046), (32, 0.115), (33, -0.066), (34, 0.077), (35, 0.254), (36, -0.124), (37, 0.009), (38, 0.031), (39, -0.039), (40, -0.068), (41, -0.097), (42, 0.062), (43, -0.025), (44, 0.155), (45, 0.308), (46, -0.055), (47, 0.082), (48, -0.106), (49, -0.078)]
simIndex simValue paperId paperTitle
same-paper 1 0.91972202 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
Author: Hui Lin ; Jeff Bilmes
Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.
2 0.77385831 4 acl-2011-A Class of Submodular Functions for Document Summarization
Author: Hui Lin ; Jeff Bilmes
Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
3 0.41373894 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions. 1
4 0.40984511 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntacticallyinformed constraints on alignments which significantly increase the precision of the model.
5 0.37884954 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
Author: John DeNero ; Klaus Macherey
Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
6 0.37631604 141 acl-2011-Gappy Phrasal Alignment By Agreement
7 0.37300503 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
8 0.32530585 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
9 0.32412744 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
10 0.32202947 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
11 0.31310034 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
12 0.30616817 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
13 0.28756991 187 acl-2011-Jointly Learning to Extract and Compress
14 0.28306541 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
15 0.27725929 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
16 0.27285585 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
17 0.26310202 291 acl-2011-SystemT: A Declarative Information Extraction System
18 0.25989023 149 acl-2011-Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation
19 0.24018277 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations
20 0.22990213 303 acl-2011-Tier-based Strictly Local Constraints for Phonology
topicId topicWeight
[(5, 0.023), (17, 0.041), (25, 0.369), (26, 0.016), (37, 0.066), (39, 0.024), (41, 0.064), (50, 0.019), (55, 0.029), (59, 0.041), (62, 0.012), (72, 0.038), (91, 0.031), (96, 0.14)]
simIndex simValue paperId paperTitle
same-paper 1 0.75956345 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
Author: Hui Lin ; Jeff Bilmes
Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.
2 0.63480616 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin
Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.
3 0.5349232 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
Author: Anders Sgaard
Abstract: We consider a very simple, yet effective, approach to cross language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and stateof-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross language adaptation algorithms.
4 0.51115656 4 acl-2011-A Class of Submodular Functions for Document Summarization
Author: Hui Lin ; Jeff Bilmes
Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
5 0.45012945 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded ( Jobs Apple ) and CEO-o f ( Jobs Apple ) . , , This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extrac- , tion model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
6 0.44913679 187 acl-2011-Jointly Learning to Extract and Compress
7 0.4488138 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
8 0.44702524 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
9 0.44693586 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
10 0.44684619 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
11 0.44566602 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
12 0.44534653 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
13 0.44478515 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
14 0.4443976 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
15 0.44425598 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
16 0.44380721 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
17 0.44308376 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
18 0.44301131 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
19 0.44271654 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
20 0.44271171 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction