emnlp emnlp2011 emnlp2011-60 knowledge-graph by maker-knowledge-mining

60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation


Source: pdf

Author: Jason Riesa ; Ann Irvine ; Daniel Marcu

Abstract: unknown-abstract

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 497–507, Edinburgh, Scotland, UK, July 27–31, 2011. [sent-3137, score-0.052]

2 © 2011 Association for Computational Linguistics [sent-3139, score-0.073]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('np', 0.363), ('chpinap', 0.318), ('lcp', 0.318), ('lpc', 0.318), ('ossnforpeji', 0.318), ('tgrnaden', 0.318), ('innp', 0.274), ('epochs', 0.23), ('yi', 0.23), ('pp', 0.208), ('xi', 0.161), ('jj', 0.155), ('gy', 0.137), ('ayr', 0.115), ('nn', 0.106), ('lc', 0.102), ('rp', 0.089), ('cr', 0.089), ('tn', 0.067), ('ae', 0.065), ('rule', 0.03), ('nlp', 0.026), ('de', 0.026), ('weight', 0.024), ('iningbsu', 0.023), ('pgaugies', 0.02), ('proce', 0.02), ('cuaogmep', 0.02), ('fonrg', 0.02), ('jcuely', 0.02), ('nsoactuiartaioln', 0.02), ('orfg', 0.02), ('purtoatcieosnsainlg', 0.02), ('ans', 0.019), ('time', 0.018), ('ec', 0.018), ('extracted', 0.017), ('ed', 0.016), ('ti', 0.015), ('la', 0.014), ('feature', 0.011), ('training', 0.009), ('th', 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation

Author: Jason Riesa ; Ann Irvine ; Daniel Marcu

Abstract: unknown-abstract

2 0.08407709 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

Author: Amit Dubey ; Frank Keller ; Patrick Sturt

Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information-theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.

3 0.083575629 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

Author: Juri Ganitkevitch ; Chris Callison-Burch ; Courtney Napoles ; Benjamin Van Durme

Abstract: Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.

4 0.081240505 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

Author: Sze-Meng Jojo Wong ; Mark Dras

Abstract: Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features (horizontal slices of trees, and the more general feature schemas from discriminative parse reranking) and show that using this kind of syntactic feature results in an accuracy score in classification of seven native languages of around 80%, an error reduction of more than 30%.

5 0.076841339 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

Author: Zhifei Li ; Ziyuan Wang ; Jason Eisner ; Sanjeev Khudanpur ; Brian Roark

Abstract: Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation system. Intuitively, our method strives to ensure that probabilistic “round-trip” translation from a target-language sentence to the source-language and back will have low expected loss. Theoretically, this may be justified as (discriminatively) minimizing an imputed empirical risk. Empirically, we demonstrate that augmenting supervised training with unsupervised data improves translation performance over the supervised case for both IWSLT and NIST tasks.

6 0.071315795 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

7 0.070823282 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

8 0.053705264 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues

9 0.048029624 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing

10 0.032768127 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

11 0.032725412 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

12 0.026234936 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

13 0.025040621 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees

14 0.023073185 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

15 0.021974429 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge

16 0.019661803 129 emnlp-2011-Structured Sparsity in Structured Prediction

17 0.019138465 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing

18 0.019017641 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

19 0.017737215 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

20 0.017661743 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.056), (1, 0.023), (2, 0.01), (3, -0.013), (4, 0.031), (5, -0.045), (6, -0.088), (7, 0.056), (8, 0.066), (9, -0.041), (10, -0.045), (11, 0.034), (12, -0.014), (13, 0.06), (14, 0.023), (15, -0.161), (16, 0.023), (17, -0.036), (18, -0.047), (19, -0.038), (20, -0.148), (21, 0.112), (22, 0.141), (23, -0.123), (24, 0.095), (25, 0.188), (26, -0.122), (27, 0.065), (28, -0.077), (29, 0.304), (30, -0.091), (31, 0.084), (32, -0.244), (33, -0.191), (34, -0.128), (35, 0.09), (36, 0.007), (37, 0.009), (38, -0.063), (39, -0.161), (40, -0.099), (41, 0.095), (42, 0.033), (43, 0.11), (44, -0.123), (45, -0.034), (46, 0.1), (47, 0.079), (48, 0.11), (49, -0.193)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99604934 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation

Author: Jason Riesa ; Ann Irvine ; Daniel Marcu

Abstract: unknown-abstract

2 0.47831076 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

Author: Amit Dubey ; Frank Keller ; Patrick Sturt

Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information-theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.

3 0.41435409 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

Author: Sze-Meng Jojo Wong ; Mark Dras

Abstract: Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features (horizontal slices of trees, and the more general feature schemas from discriminative parse reranking) and show that using this kind of syntactic feature results in an accuracy score in classification of seven native languages of around 80%, an error reduction of more than 30%.

4 0.3884764 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

Author: Zhifei Li ; Ziyuan Wang ; Jason Eisner ; Sanjeev Khudanpur ; Brian Roark

Abstract: Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation system. Intuitively, our method strives to ensure that probabilistic “round-trip” translation from a target-language sentence to the source-language and back will have low expected loss. Theoretically, this may be justified as (discriminatively) minimizing an imputed empirical risk. Empirically, we demonstrate that augmenting supervised training with unsupervised data improves translation performance over the supervised case for both IWSLT and NIST tasks.

5 0.37651026 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

Author: Federico Sangati ; Willem Zuidema

Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.

6 0.3114152 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues

7 0.23780212 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

8 0.20844041 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

9 0.16122253 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

10 0.14573818 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

11 0.13065869 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge

12 0.12438175 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing

13 0.1226286 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

14 0.09504573 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

15 0.087994695 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation

16 0.082211941 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

17 0.082118377 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

18 0.078396618 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees

19 0.073693104 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models

20 0.071302183 70 emnlp-2011-Identifying Relations for Open Information Extraction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(66, 0.813)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9845612 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation

Author: Jason Riesa ; Ann Irvine ; Daniel Marcu

Abstract: unknown-abstract

2 0.7729786 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen

Abstract: This paper presents a novel method for the computation of word meaning in context. We make use of a factorization model in which words, together with their window-based context words and their dependency relations, are linked to latent dimensions. The factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. The evaluation on a lexical substitution task, carried out for both English and French, indicates that our approach is able to reach better results than state-of-the-art methods in lexical substitution, while at the same time providing more accurate meaning representations.

3 0.74908924 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky

Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% higher than using gold tags.

4 0.61795741 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder

Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.

5 0.31175748 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.

6 0.29135951 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

7 0.27970481 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

8 0.26119298 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

9 0.24992087 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

10 0.24078867 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

11 0.23800713 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

12 0.23610133 138 emnlp-2011-Tuning as Ranking

13 0.22126187 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

14 0.22122955 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

15 0.2200716 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization

16 0.21836078 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

17 0.20463642 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

18 0.20038936 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction

19 0.19597708 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

20 0.19351657 106 emnlp-2011-Predicting a Scientific Community's Response to an Article