acl acl2011 acl2011-325 knowledge-graph by maker-knowledge-mining

325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Source: pdf

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Smith, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA jhclark alavie nasmith}@cs. [sent-2, score-0.096]

2 edu Abstract We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. [sent-4, score-0.581]

3 In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. [sent-6, score-0.865]

4 However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. [sent-7, score-0.85]

5 Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs. [sent-8, score-0.734]

6 1 Introduction Word alignment is an important subtask in statistical machine translation which is typically solved in one of two ways. [sent-9, score-0.452]

7 The more common approach uses a generative translation model that relates bilingual string pairs using a latent alignment variable to designate which source words (or phrases) generate which target words. [sent-10, score-0.987]

8 The parameters in these models can be learned straightforwardly from parallel sentences using EM, and standard inference techniques can recover most probable alignments (Brown et al. [sent-11, score-0.489]

9 This approach is attractive because it only requires parallel training data. [sent-13, score-0.229]

10 An alternative to the generative approach uses a discriminatively trained alignment model to predict word alignments in the parallel corpus. [sent-14, score-0.98]

11 Discriminative models are attractive because they can incorporate arbitrary, overlapping features, meaning that errors observed in the predictions made by the model can be addressed by engineering new and better features. [sent-15, score-0.488]

12 In the case of discriminative alignment models, manual alignment data is required for training, which is problematic for at least three reasons. [sent-17, score-0.761]

13 Manual alignments are notoriously difficult to create and are available only for a handful of language pairs. [sent-18, score-0.35]

14 Third, the “correct” alignment annotation for different tasks may vary: for example, relatively denser or sparser alignments may be optimal for different approaches to (downstream) translation model induction (Lopez, 2008; Fraser, 2007). [sent-20, score-0.797]

15 Generative models have a different limitation: the joint probability of a particular setting of the random variables must factorize according to steps in a process that successively “generates” the values of the variables. [sent-21, score-0.331]

16 At each step, the probability of some value being generated may depend only on the generation history (or a subset thereof), and the possible values a variable will take must form a locally normalized conditional probability distribution (CPD). [sent-22, score-0.453]

17-19 While these locally normalized CPDs may be parameterized so as to make use of multiple, overlapping features (Berg-Kirkpatrick et al., 2010), the requirement that models factorize according to a particular generative process imposes a considerable restriction on the kinds of features that can be incorporated. [sent-23, score-0.201] [sent-25, score-0.332] [sent-26, score-0.579]

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 409–419, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics

20 In this paper, we introduce a discriminatively trained, globally normalized log-linear model of lexical translation that can incorporate arbitrary, overlapping features, and use it to infer word alignments. [sent-29, score-0.941]

21-22 Our model enjoys the usual benefits of discriminative modeling (e.g., parameter regularization, well-understood learning algorithms), but is trained entirely from parallel sentences without gold-standard word alignments. [sent-30, score-0.154] [sent-32, score-0.114]

23 Thus, it addresses the two limitations of current word alignment approaches. [sent-33, score-0.228]

24 We begin by introducing our model (§2), and follow this with a discussion of tractability, parameter estimation, and inference using finite-state techniques (§3). [sent-35, score-0.186]

25 We then describe the specific features we use (§4) and provide experimental evaluations of the model, showing substantial improvements in three diverse language pairs (§5). [sent-36, score-0.083]

26 2 Model In this section, we develop a conditional model p(t | s) that, given a source language sentence s with length m = |s|, assigns probabilities to a target sentence t with length n, where each word t_j is an element in the finite target vocabulary Ω. [sent-39, score-0.869]

27 We begin by using the chain rule to factor this probability into two components, a translation model and a length model. [sent-40, score-0.435]

28 $p(t \mid s) = p(t, n \mid s) = \underbrace{p(t \mid s, n)}_{\text{translation model}} \; \underbrace{p(n \mid s)}_{\text{length model}}$ [sent-41, score-0.24]

¹Moore (2005) likewise uses this example to motivate the need for models that support arbitrary, overlapping features.

29 In the translation model, we then assume that each word t_j is a translation of one source word, or a special null token. [sent-42, score-0.576]

30-31 We therefore introduce a latent alignment variable $a = \langle a_1, a_2, \ldots, a_n \rangle \in [0, m]^n$, where $a_j = 0$ represents a special null token. [sent-43, score-0.369] [sent-46, score-0.195]

32 $p(t \mid s, n) = \sum_{a} p(t, a \mid s, n)$. So far, our model is identical to that of (Brown et al. [sent-47, score-0.06]
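The factorization in sentences 26-32 can be checked numerically on a toy example. This is a sketch under stated assumptions, not the authors' implementation: it assumes each target position's log-potential depends only on (t_j, a_j, s), uses made-up weights in place of learned feature weights, and uses a Poisson length model chosen purely for illustration. The names `SOURCE`, `VOCAB`, `WEIGHTS` and all functions are hypothetical. Under the per-position assumption, the sum over all (m+1)^n alignments becomes a product over target positions, and the global normalizer Z(s, n) over every length-n target string factorizes the same way.

```python
import itertools
import math

# Toy, hypothetical setup: index 0 of SOURCE is the special null token.
SOURCE = ["NULL", "das", "haus"]
VOCAB = ["the", "house", "home"]  # finite target vocabulary (Omega)

# Hypothetical weights standing in for learned feature weights.
# Assumption: the log-potential of target position j depends only on
# the target word t_j, its alignment point a_j, and the source sentence.
WEIGHTS = {("the", "das"): 2.0, ("house", "haus"): 2.0, ("home", "haus"): 1.0}


def score(t_word, src_idx):
    """Unnormalized log-potential for aligning t_word to SOURCE[src_idx]."""
    return WEIGHTS.get((t_word, SOURCE[src_idx]), 0.0)


def brute_force_marginal(target):
    """Sum over every alignment a in [0, m]^n of exp(sum_j score(t_j, a_j))."""
    total = 0.0
    for a in itertools.product(range(len(SOURCE)), repeat=len(target)):
        total += math.exp(sum(score(t, i) for t, i in zip(target, a)))
    return total


def factored_marginal(target):
    """Same sum, computed as prod_j sum_i exp(score(t_j, i))."""
    return math.prod(
        sum(math.exp(score(t, i)) for i in range(len(SOURCE))) for t in target
    )


def partition(n):
    """Z(s, n): sum over all length-n target strings and all alignments."""
    per_position = sum(
        math.exp(score(w, i)) for w in VOCAB for i in range(len(SOURCE))
    )
    return per_position ** n


def translation_prob(target):
    """p(t | s, n) = sum_a p(t, a | s, n), with n = len(target)."""
    return factored_marginal(target) / partition(len(target))


def length_prob(n, mean):
    """Hypothetical Poisson length model p(n | s), for illustration only."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def sentence_prob(target):
    """p(t | s) = p(t | s, n) * p(n | s); mean = number of real source words."""
    return translation_prob(target) * length_prob(len(target), len(SOURCE) - 1)


if __name__ == "__main__":
    t = ["the", "house"]
    print(brute_force_marginal(t), factored_marginal(t))  # equal up to rounding
    print(sentence_prob(t))
```

The factored form costs O(n(m+1)) per sentence instead of O((m+1)^n). Features that couple neighboring alignment decisions would break this simple per-position factorization and call for more general dynamic programming.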

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('alignment', 0.228), ('alignments', 0.201), ('generative', 0.193), ('arbitrary', 0.192), ('overlapping', 0.185), ('discriminatively', 0.184), ('translation', 0.163), ('factorize', 0.146), ('rk', 0.141), ('problematic', 0.137), ('brown', 0.121), ('mo', 0.12), ('globally', 0.12), ('attractive', 0.115), ('tj', 0.115), ('parallel', 0.114), ('locally', 0.104), ('normalized', 0.097), ('tord', 0.096), ('hclark', 0.096), ('independencies', 0.096), ('nasmith', 0.096), ('abandoned', 0.096), ('rameterized', 0.096), ('discriminative', 0.094), ('limitation', 0.09), ('cpd', 0.089), ('xexp', 0.089), ('malized', 0.089), ('sadat', 0.089), ('chain', 0.084), ('ues', 0.083), ('regime', 0.083), ('commitment', 0.083), ('tractability', 0.083), ('notoriously', 0.083), ('imposes', 0.083), ('variable', 0.082), ('finite', 0.082), ('assumptions', 0.081), ('denser', 0.079), ('wso', 0.079), ('wanted', 0.079), ('null', 0.077), ('designate', 0.076), ('successively', 0.076), ('begin', 0.074), ('manual', 0.074), ('incorporate', 0.073), ('fertility', 0.073), ('opportunities', 0.073), ('target', 0.071), ('ple', 0.07), ('els', 0.07), ('overcoming', 0.07), ('conditional', 0.07), ('lopez', 0.068), ('tra', 0.068), ('thereof', 0.068), ('extrinsic', 0.066), ('sparser', 0.066), ('handful', 0.066), ('straightforwardly', 0.066), ('habash', 0.064), ('downstream', 0.064), ('xa', 0.061), ('permits', 0.061), ('subtask', 0.061), ('model', 0.06), ('ht', 0.06), ('ani', 0.06), ('motivate', 0.06), ('xp', 0.06), ('introduce', 0.059), ('aj', 0.058), ('source', 0.058), ('impose', 0.057), ('fraser', 0.057), ('relates', 0.056), ('vocabulary', 0.056), ('decompose', 0.055), ('models', 0.055), ('targets', 0.054), ('probability', 0.054), ('unannotated', 0.053), ('parameters', 0.053), ('alon', 0.052), ('da', 0.052), ('components', 0.052), ('lavie', 0.051), ('restriction', 0.051), ('sw', 0.051), ('features', 0.051), ('freely', 0.051), ('intrinsic', 0.051), ('requirements', 0.051), ('pittsburgh', 0.051), ('del', 0.051), 
('mellon', 0.051), ('lti', 0.05)]
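The page does not say how its word weights or sentence scores are computed. As a hedged sketch, assuming the common definitions tf = count / document length and idf = log(N / df), and assuming a sentence is scored by the mean tf-idf of its tokens, a keyword list and a sentence ranking like the ones above could be produced along these lines. The function names are hypothetical. Tokens such as 'tord' and 'ues' in the list also appear in the garbled extraction text (compare sentence 25), which is consistent with the weights being computed over the raw extracted text.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """Return {word: tf * idf} for one tokenized document.

    Assumed definitions: tf = count / document length,
    idf = log(N / df), with df = number of corpus documents containing the word.
    """
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {
        w: (c / total) * math.log(len(corpus) / df[w])
        for w, c in counts.items()
        if df[w] > 0  # skip words that never occur in the corpus
    }


def top_n(weights, n):
    """Highest-weighted words first, ties broken alphabetically."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def score_sentences(sentences, weights):
    """Rank tokenized sentences by the mean tf-idf of their tokens."""
    scored = []
    for idx, sent in enumerate(sentences):
        vals = [weights.get(w, 0.0) for w in sent]
        scored.append((idx, sum(vals) / len(vals) if vals else 0.0))
    return sorted(scored, key=lambda pair: -pair[1])
```

A word that occurs in every document (such as "model" in a corpus of alignment papers) gets idf = log(1) = 0, which is why generic words rarely reach the top of such a list.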

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

2 0.21590884 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

Author: Coskun Mermer ; Murat Saraclar

Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.

3 0.2124085 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

4 0.17414029 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

5 0.16524047 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

6 0.1343466 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

7 0.12833628 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

8 0.12288645 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

9 0.12006404 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

10 0.11812809 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

11 0.10773331 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

12 0.10559777 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

13 0.10530087 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

14 0.10490607 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

15 0.10225648 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

16 0.10127968 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

17 0.10025707 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

18 0.098019138 44 acl-2011-An exponential translation model for target language morphology

19 0.096943192 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

20 0.096399635 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
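A similarity list like the one above is commonly produced by ranking papers by the cosine similarity of their tf-idf vectors. The page does not state its measure, so the following is a minimal sketch under that assumption; `cosine` and `rank_similar` are hypothetical names, and vectors are stored as sparse {term: weight} dictionaries.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_similar(query, docs, k=20):
    """Return the k (doc_id, similarity) pairs closest to query, best first."""
    sims = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    sims.sort(key=lambda pair: -pair[1])
    return sims[:k]
```

In exact arithmetic a paper's similarity to itself is 1; a listed value such as 1.0000001 is consistent with floating-point rounding (for example, in single precision) rather than a meaningful score above 1.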

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.241), (1, -0.141), (2, 0.1), (3, 0.103), (4, 0.051), (5, 0.015), (6, 0.028), (7, 0.053), (8, -0.017), (9, 0.113), (10, 0.148), (11, 0.124), (12, 0.033), (13, 0.095), (14, -0.114), (15, 0.054), (16, 0.04), (17, -0.014), (18, -0.076), (19, 0.013), (20, -0.045), (21, -0.017), (22, -0.055), (23, 0.016), (24, -0.007), (25, 0.011), (26, -0.025), (27, 0.009), (28, -0.014), (29, -0.043), (30, 0.028), (31, -0.015), (32, -0.065), (33, 0.002), (34, 0.004), (35, 0.04), (36, 0.045), (37, -0.011), (38, 0.035), (39, 0.034), (40, -0.073), (41, 0.016), (42, 0.102), (43, 0.019), (44, -0.074), (45, -0.009), (46, 0.001), (47, -0.022), (48, 0.066), (49, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95528656 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

2 0.91850549 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

3 0.91165996 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

4 0.8770653 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

Author: Coskun Mermer ; Murat Saraclar

5 0.86614317 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

6 0.80907601 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

7 0.79102194 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

8 0.77658069 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

9 0.74684739 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

10 0.74126172 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

11 0.72540128 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

12 0.71646315 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

13 0.66905785 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

14 0.63147283 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

15 0.60411644 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

16 0.56783068 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

17 0.55998832 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations

18 0.55077296 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

19 0.5382477 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

20 0.53147572 44 acl-2011-An exponential translation model for target language morphology
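Unlike tf-idf weights, the LSI topic weights listed above include negative values, which is what a truncated singular value decomposition of a term-document matrix produces. The following NumPy sketch shows that textbook construction, assuming the site's LSI model follows it; `lsi_project` and `cosine_matrix` are hypothetical names, and the number of dimensions `k` is a parameter (topic ids 0-49 above hint at 50, but that is an inference). Note that the same-paper score of 0.955 rather than 1 suggests the site's comparison is not exactly this one (for instance, the query may be projected from different text), so treat the sketch as illustrative only.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Map documents to k latent dimensions with a truncated SVD.

    term_doc: (n_terms, n_docs) array, e.g. of tf-idf weights.
    Returns an (n_docs, k) array; rows are document topic weights and
    may contain negative entries.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # U_k^T A = S_k V_k^T: each column is one document's latent coordinates.
    return (s[:k, None] * vt[:k]).T


def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T
```

Because the singular vectors are determined only up to sign, individual topic weights can flip sign between runs or libraries, while cosine similarities between documents are unaffected.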

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.024), (17, 0.068), (26, 0.018), (37, 0.106), (39, 0.05), (41, 0.073), (55, 0.04), (59, 0.073), (62, 0.25), (72, 0.039), (91, 0.029), (96, 0.171)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94122219 74 acl-2011-Combining Indicators of Allophony

Author: Luc Boruta

Abstract: Allophonic rules are responsible for the great variety in phoneme realizations. Infants cannot reliably infer abstract word representations without knowledge of their native allophonic grammar. We explore the hypothesis that some properties of infants’ input, referred to as indicators, are correlated with allophony. First, we provide an extensive evaluation of individual indicators that rely on distributional or lexical information. Then, we present a first evaluation of the combination of indicators of different types, considering both logical and numerical combination schemes. Though distributional and lexical indicators are not redundant, straightforward combinations do not outperform individual indicators.

2 0.8915447 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

Author: Daniel Bar ; Nicolai Erbs ; Torsten Zesch ; Iryna Gurevych

Abstract: We present Wikulu, a system focusing on supporting wiki users with their everyday tasks by means of an intelligent interface. Wikulu is implemented as an extensible architecture which transparently integrates natural language processing (NLP) techniques with wikis. It is designed to be deployed with any wiki platform, and the current prototype integrates a wide range of NLP algorithms such as keyphrase extraction, link discovery, text segmentation, summarization, or text similarity. Additionally, we show how Wikulu can be applied for visually analyzing the results of NLP algorithms, educational purposes, and enabling semantic wikis.

3 0.83049619 217 acl-2011-Machine Translation System Combination by Confusion Forest

Author: Taro Watanabe ; Eiichiro Sumita

Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.

same-paper 4 0.82053566 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

5 0.78476083 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua

Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application of document-level sentiment classification, and improve the performance significantly.

6 0.7000528 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

7 0.69887125 121 acl-2011-Event Discovery in Social Media Feeds

8 0.69489729 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

9 0.694592 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

10 0.69326425 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

11 0.69066197 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

12 0.68569589 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

13 0.68452245 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

14 0.68420833 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

15 0.68414479 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering

16 0.6834867 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

17 0.68225384 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

18 0.68137115 311 acl-2011-Translationese and Its Dialects

19 0.68124735 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

20 0.68044853 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment