emnlp emnlp2011 emnlp2011-60 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unknown-abstract
Reference: text
sentIndex sentText sentNum sentScore
1 Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 497–507, Edinburgh, Scotland, UK, July 27–31, 2011. [sent-3137, score-0.052]
2 ©2011 Association for Computational Linguistics. [sent-3139, score-0.073]
wordName wordTfidf (topN-words)
[('np', 0.363), ('chpinap', 0.318), ('lcp', 0.318), ('lpc', 0.318), ('ossnforpeji', 0.318), ('tgrnaden', 0.318), ('innp', 0.274), ('epochs', 0.23), ('yi', 0.23), ('pp', 0.208), ('xi', 0.161), ('jj', 0.155), ('gy', 0.137), ('ayr', 0.115), ('nn', 0.106), ('lc', 0.102), ('rp', 0.089), ('cr', 0.089), ('tn', 0.067), ('ae', 0.065), ('rule', 0.03), ('nlp', 0.026), ('de', 0.026), ('weight', 0.024), ('iningbsu', 0.023), ('pgaugies', 0.02), ('proce', 0.02), ('cuaogmep', 0.02), ('fonrg', 0.02), ('jcuely', 0.02), ('nsoactuiartaioln', 0.02), ('orfg', 0.02), ('purtoatcieosnsainlg', 0.02), ('ans', 0.019), ('time', 0.018), ('ec', 0.018), ('extracted', 0.017), ('ed', 0.016), ('ti', 0.015), ('la', 0.014), ('feature', 0.011), ('training', 0.009), ('th', 0.009)]
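The (topN-words) list above pairs each extracted token with a TF-IDF-style weight; several entries (e.g. 'chpinap', 'ossnforpeji') reflect the garbled PDF extraction rather than real vocabulary. The following is a minimal sketch of how such a top-N list could be produced, assuming a plain scikit-learn TF-IDF over the paper's extracted sentences against a background corpus; this is an assumption about the mining pipeline, not a documented part of it, and names like `paper_sentences` and `background_docs` are illustrative.

# Hedged sketch (not the maker-knowledge-mining pipeline itself): compute a
# top-N TF-IDF term list for one paper, using a background corpus for the
# document-frequency statistics.
from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_terms(paper_sentences, background_docs, top_n=40):
    docs = [" ".join(paper_sentences)] + list(background_docs)  # paper is row 0
    vectorizer = TfidfVectorizer(lowercase=True)
    tfidf = vectorizer.fit_transform(docs)
    terms = vectorizer.get_feature_names_out()
    weights = tfidf[0].toarray().ravel()  # TF-IDF row for the paper itself
    ranked = sorted(zip(terms, weights), key=lambda tw: tw[1], reverse=True)
    return [(t, round(float(w), 3)) for t, w in ranked[:top_n] if w > 0]

# Example usage with toy data:
# top_tfidf_terms(["np vp alignment rule", "feature weight epochs"],
#                 ["an unrelated background document"], top_n=10)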
simIndex simValue paperId paperTitle
same-paper 1 1.0 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unknown-abstract
2 0.08407709 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
Author: Amit Dubey ; Frank Keller ; Patrick Sturt
Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.
3 0.083575629 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Author: Juri Ganitkevitch ; Chris Callison-Burch ; Courtney Napoles ; Benjamin Van Durme
Abstract: Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.
4 0.081240505 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
Author: Sze-Meng Jojo Wong ; Mark Dras
Abstract: Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features—horizontal slices of trees, and the more general feature schemas from discriminative parse reranking—and show that using this kind of syntactic feature results in an accuracy score in classification of seven native languages of around 80%, an error reduction of more than 30%.
5 0.076841339 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
Author: Zhifei Li ; Ziyuan Wang ; Jason Eisner ; Sanjeev Khudanpur ; Brian Roark
Abstract: Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation system. Intuitively, our method strives to ensure that probabilistic “round-trip” translation from a target-language sentence to the source-language and back will have low expected loss. Theoretically, this may be justified as (discriminatively) minimizing an imputed empirical risk. Empirically, we demonstrate that augmenting supervised training with unsupervised data improves translation performance over the supervised case for both IWSLT and NIST tasks.
6 0.071315795 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
7 0.070823282 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
8 0.053705264 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
9 0.048029624 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
10 0.032768127 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
11 0.032725412 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
12 0.026234936 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
13 0.025040621 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
14 0.023073185 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
15 0.021974429 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
16 0.019661803 129 emnlp-2011-Structured Sparsity in Structured Prediction
17 0.019138465 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
18 0.019017641 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French
19 0.017737215 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
20 0.017661743 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
topicId topicWeight
[(0, 0.056), (1, 0.023), (2, 0.01), (3, -0.013), (4, 0.031), (5, -0.045), (6, -0.088), (7, 0.056), (8, 0.066), (9, -0.041), (10, -0.045), (11, 0.034), (12, -0.014), (13, 0.06), (14, 0.023), (15, -0.161), (16, 0.023), (17, -0.036), (18, -0.047), (19, -0.038), (20, -0.148), (21, 0.112), (22, 0.141), (23, -0.123), (24, 0.095), (25, 0.188), (26, -0.122), (27, 0.065), (28, -0.077), (29, 0.304), (30, -0.091), (31, 0.084), (32, -0.244), (33, -0.191), (34, -0.128), (35, 0.09), (36, 0.007), (37, 0.009), (38, -0.063), (39, -0.161), (40, -0.099), (41, 0.095), (42, 0.033), (43, 0.11), (44, -0.123), (45, -0.034), (46, 0.1), (47, 0.079), (48, 0.11), (49, -0.193)]
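The topicId/topicWeight row above assigns the paper a dense vector of signed weights over latent topics, and the simValue scores in the listing that follows are consistent with a direct comparison of such vectors. The sketch below assumes plain cosine similarity over topic-weight vectors; the actual scoring used by the mining pipeline is not documented on this page.

# Hedged sketch: rank papers by cosine similarity of their topic-weight
# vectors. Plain cosine similarity is an assumption; the pipeline that
# produced the simValue scores above is not specified here.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def rank_similar(query_vec, paper_vecs):
    # paper_vecs: dict mapping paperId -> topic-weight vector of the same length
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in paper_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)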
simIndex simValue paperId paperTitle
same-paper 1 0.99604934 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unknown-abstract
2 0.47831076 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
Author: Amit Dubey ; Frank Keller ; Patrick Sturt
Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.
3 0.41435409 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
Author: Sze-Meng Jojo Wong ; Mark Dras
Abstract: Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features—horizontal slices of trees, and the more general feature schemas from discriminative parse reranking—and show that using this kind of syntactic feature results in an accuracy score in classification of seven native languages of around 80%, an error reduction of more than 30%.
4 0.3884764 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
Author: Zhifei Li ; Ziyuan Wang ; Jason Eisner ; Sanjeev Khudanpur ; Brian Roark
Abstract: Discriminative training for machine translation has been well studied in the recent past. A limitation of the work to date is that it relies on the availability of high-quality in-domain bilingual text for supervised training. We present an unsupervised discriminative training framework to incorporate the usually plentiful target-language monolingual data by using a rough “reverse” translation system. Intuitively, our method strives to ensure that probabilistic “round-trip” translation from a target-language sentence to the source-language and back will have low expected loss. Theoretically, this may be justified as (discriminatively) minimizing an imputed empirical risk. Empirically, we demonstrate that augmenting supervised training with unsupervised data improves translation performance over the supervised case for both IWSLT and NIST tasks.
5 0.37651026 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
6 0.3114152 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
7 0.23780212 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
8 0.20844041 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
9 0.16122253 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
10 0.14573818 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
11 0.13065869 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
12 0.12438175 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
13 0.1226286 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
14 0.09504573 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
15 0.087994695 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
16 0.082211941 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models
17 0.082118377 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
18 0.078396618 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
19 0.073693104 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
20 0.071302183 70 emnlp-2011-Identifying Relations for Open Information Extraction
topicId topicWeight
[(66, 0.813)]
simIndex simValue paperId paperTitle
same-paper 1 0.9845612 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unknown-abstract
2 0.7729786 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen
Abstract: This paper presents a novel method for the computation of word meaning in context. We make use of a factorization model in which words, together with their window-based context words and their dependency relations, are linked to latent dimensions. The factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. The evaluation on a lexical substitution task – carried out for both English and French – indicates that our approach is able to reach better results than state-of-the-art methods in lexical substitution, while at the same time providing more accurate meaning representations.
3 0.74908924 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky
Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus — 0.7% higher than using gold tags.
4 0.61795741 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
5 0.31175748 107 emnlp-2011-Probabilistic models of similarity in syntactic context
Author: Diarmuid O Seaghdha ; Anna Korhonen
Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.
6 0.29135951 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
7 0.27970481 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
8 0.26119298 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
9 0.24992087 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
10 0.24078867 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
11 0.23800713 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
12 0.23610133 138 emnlp-2011-Tuning as Ranking
13 0.22126187 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
14 0.22122955 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French
15 0.2200716 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
16 0.21836078 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
17 0.20463642 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
18 0.20038936 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.19597708 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
20 0.19351657 106 emnlp-2011-Predicting a Scientific Communitys Response to an Article