emnlp emnlp2011 knowledge-graph by maker-knowledge-mining
1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
2 emnlp-2011-A Cascaded Classification Approach to Semantic Head Recognition
Author: Lukas Michelbacher ; Alok Kothari ; Martin Forst ; Christina Lioma ; Hinrich Schutze
Abstract: Most NLP systems use tokenization as part of preprocessing. Generally, tokenizers are based on simple heuristics and do not recognize multi-word units (MWUs) like hot dog or black hole unless a precompiled list of MWUs is available. In this paper, we propose a new cascaded model for detecting MWUs of arbitrary length for tokenization, focusing on noun phrases in the physics domain. We adopt a classification approach because, unlike other work on MWUs, tokenization requires a completely automatic approach. We achieve an accuracy of 68% for recognizing non-compositional MWUs and show that our MWU recognizer improves retrieval performance when used as part of an information retrieval system.
3 emnlp-2011-A Correction Model for Word Alignments
Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu
Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequence-based aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.
4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of non-projectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church
Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in natural language. Two-pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N-best lists. The stronger language model is intended to capture more long-distance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long-span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N-best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.
6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
Author: Prodromos Malakasiotis ; Ion Androutsopoulos
Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state-of-the-art paraphrase generator when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling
Author: Vivek Srikumar ; Dan Roth
Abstract: This paper presents a model that extends semantic role labeling. Existing approaches independently analyze relations expressed by verb predicates or those expressed as nominalizations. However, sentences express relations via other linguistic phenomena as well. Furthermore, these phenomena interact with each other, thus restricting the structures they articulate. In this paper, we use this intuition to define a joint inference model that captures the inter-dependencies between verb semantic role labeling and relations expressed using prepositions. The scarcity of jointly labeled data presents a crucial technical challenge for learning a joint model. The key strength of our model is that we use existing structure predictors as black boxes. By enforcing consistency constraints between their predictions, we show improvements in the performance of both tasks without retraining the individual models.
8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
Author: Amit Dubey ; Frank Keller ; Patrick Sturt
Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data and show that our model achieves a better fit with human reading times than a syntax-only model which does not have access to co-reference information.
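As a brief illustration of the surprisal measure mentioned in this abstract: surprisal is conventionally defined as the negative log probability of a word given its context. The sketch below only shows that definition; the function name and the probabilities passed to it are hypothetical placeholders and are not outputs of the paper's chunker or co-reference model.

```python
import math


def surprisal(prob: float) -> float:
    """Return the surprisal, in bits, of an event with probability `prob`.

    `prob` stands in for P(word | preceding context) as assigned by some
    incremental generative model; here it is just a number supplied by hand.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# A word the model expects strongly carries little surprisal;
# an unexpected word carries more.
expected_word = surprisal(0.5)    # 1 bit
unexpected_word = surprisal(0.25)  # 2 bits
```

Lower-probability words get higher surprisal, which is the quantity the abstract relates to reading times.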
Author: Chao Shen ; Tao Li
Abstract: In active dual supervision, not only informative examples but also features are selected for labeling to build a high quality classifier with low cost. However, how to measure the informativeness of both examples and features on the same scale has not been well solved. In this paper, we propose a non-negative matrix factorization based approach to address this issue. We first extend the matrix factorization framework to explicitly model the corresponding relationships between feature classes and example classes. Then by making use of the reconstruction error, we propose a unified scheme to determine which feature or example a classifier is most likely to benefit from having labeled. Empirical results demonstrate the effectiveness of our proposed methods.
Author: Wei Lu ; Hwee Tou Ng
Abstract: This paper describes a novel probabilistic approach for generating natural language sentences from their underlying semantics in the form of typed lambda calculus. The approach is built on top of a novel reduction-based weighted synchronous context free grammar formalism, which facilitates the transformation process from typed lambda calculus into natural language sentences. Sentences can then be generated based on such grammar rules with a log-linear model. To acquire such grammar rules automatically in an unsupervised manner, we also propose a novel approach with a generative model, which maps from sub-expressions of logical forms to word sequences in natural language sentences. Experiments on benchmark datasets for both English and Chinese generation tasks yield significant improvements over results obtained by two state-of-the-art machine translation models, in terms of both automatic metrics and human evaluation.
11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion
Author: Zhiyuan Liu ; Xinxiong Chen ; Maosong Sun
Abstract: It is popular for users in the Web 2.0 era to freely annotate online resources with tags. To ease the annotation process, there has been great interest in automatic tag suggestion. We propose a method to suggest tags according to the text description of a resource. By considering both the description and tags of a given resource as summaries of the resource written in two languages, we adopt word alignment models in statistical machine translation to bridge their vocabulary gap. Based on the translation probabilities between the words in descriptions and the tags estimated on a large set of description-tags pairs, we build a word trigger method (WTM) to suggest tags according to the words in a resource description. Experiments on real world datasets show that WTM is effective and robust compared with other methods. Moreover, WTM is relatively simple and efficient, which is practical for Web applications.
12 emnlp-2011-A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents
Author: Yufan Guo ; Anna Korhonen ; Thierry Poibeau
Abstract: Argumentative Zoning (AZ), the analysis of the argumentative structure of a scientific paper, has proved useful for a number of information access tasks. Current approaches to AZ rely on supervised machine learning (ML). Requiring large amounts of annotated data, these approaches are expensive to develop and port to different domains and tasks. A potential solution to this problem is to use weakly-supervised ML instead. We investigate the performance of four weakly-supervised classifiers on scientific abstract data annotated for multiple AZ classes. Our best classifier based on the combination of active learning and self-training outperforms our best supervised classifier, yielding a high accuracy of 81% when using just 10% of the labeled data. This result suggests that weakly-supervised learning could be employed to improve the practical applicability and portability of AZ across different information access tasks.
13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
Author: Karthik Visweswariah ; Rajakrishnan Rajkumar ; Ankur Gandhe ; Ananthakrishnan Ramanathan ; Jiri Navratil
Abstract: Preordering of source side sentences has proved to be useful in improving statistical machine translation. Most work has used a parser in the source language along with rules to map the source language word order into the target language word order. The requirement to have a source language parser is a major drawback, which we seek to overcome in this paper. Instead of using a parser and then using rules to order the source side sentence, we learn a model that can directly reorder source side sentences to match target word order using a small parallel corpus with high-quality word alignments. Our model learns pairwise costs of a word immediately preceding another word. We use the Lin-Kernighan heuristic to find the best source reordering efficiently during training and testing and show that it suffices to provide good quality reordering. We show gains in translation performance based on our reordering model for translating from Hindi to English, Urdu to English (with a public dataset), and English to Hindi. For English to Hindi we show that our technique achieves better performance than a method that uses rules applied to the source side English parse.
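To make the idea of pairwise "immediately precedes" costs concrete, here is a minimal sketch. It scores an ordering by summing the cost of each adjacent word pair and picks the cheapest ordering. Note the assumptions: the cost table is invented by hand rather than learned, and exhaustive search over permutations is swapped in for the Lin-Kernighan heuristic named in the abstract, so this only works for very short sentences.

```python
from itertools import permutations


def ordering_cost(order, cost):
    """Sum the cost of every adjacent pair (a immediately before b).

    Pairs missing from the hypothetical `cost` table are treated as free.
    """
    return sum(cost.get((a, b), 0.0) for a, b in zip(order, order[1:]))


def best_ordering(words, cost):
    """Return the cheapest permutation of `words` by brute force.

    This is factorial in sentence length; it stands in for a real search
    heuristic and assumes the tokens in `words` are distinct.
    """
    return min(permutations(words), key=lambda order: ordering_cost(order, cost))


# Hand-made costs that prefer verb-final order for a toy English sentence.
toy_cost = {
    ("Ram", "mango"): 0.1, ("mango", "ate"): 0.2,
    ("Ram", "ate"): 0.9, ("ate", "mango"): 0.8,
    ("mango", "Ram"): 1.0, ("ate", "Ram"): 1.0,
}
reordered = best_ordering(["Ram", "ate", "mango"], toy_cost)
```

With these toy costs the search moves the verb to the end of the sentence, mimicking a subject-object-verb target order.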
Author: Bryan Rink ; Sanda Harabagiu
Abstract: This paper presents a generative model for the automatic discovery of relations between entities in electronic medical records. The model discovers relation instances and their types by determining which context tokens express the relation. Additionally, the valid semantic classes for each type of relation are determined. We show that the model produces clusters of relation trigger words which better correspond with manually annotated relations than several existing clustering techniques. The discovered relations reveal some of the implicit semantic structure present in patient records.
15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
Author: Jun Xie ; Haitao Mi ; Qun Liu
Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful to improve translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which makes it difficult to specify ordering information in translation rules. In the model presented in this paper, we handle this problem by directly specifying the ordering information in head-dependents rules which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only the substitution operation, thus our model requires none of the heuristics or separate ordering models that previous works use to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering, and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets without resorting to phrases or parse forest. For the first time, a source dependency structure based model catches up with and surpasses the state-of-the-art translation models.
16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
17 emnlp-2011-Active Learning with Amazon Mechanical Turk
Author: Florian Laws ; Christian Scheible ; Hinrich Schutze
Abstract: Supervised classification needs large amounts of annotated training data that is expensive to create. Two approaches that reduce the cost of annotation are active learning and crowdsourcing. However, these two approaches have not been combined successfully to date. We evaluate the utility of active learning in crowdsourcing on two tasks, named entity recognition and sentiment detection, and show that active learning outperforms random selection of annotation examples in a noisy crowdsourcing scenario.
18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
Author: Xabier Saralegi ; Iker Manterola ; Inaki San Vicente
Abstract: An A-C bilingual dictionary can be inferred by merging A-B and B-C dictionaries using B as pivot. However, polysemous pivot words often produce wrong translation candidates. This paper analyzes two methods for pruning wrong candidates: one based on exploiting the structure of the source dictionaries, and the other based on distributional similarity computed from comparable corpora. As both methods depend exclusively on easily available resources, they are well suited to less resourced languages. We studied whether these two techniques complement each other given that they are based on different paradigms. We also researched combining them by looking for the best adequacy depending on various application scenarios.
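The pivot merge described in this abstract, and the way a polysemous pivot word introduces wrong candidates, can be sketched in a few lines. This is only an illustration of the naive merge step: the function name and the toy Spanish-English-French entries are made up for the example, and neither of the paper's pruning methods is implemented.

```python
from collections import defaultdict


def merge_via_pivot(a_to_b, b_to_c):
    """Infer A-C candidates by chaining A-B and B-C entries through B.

    Both arguments map a word to a set of translations. Every C-side
    translation of every pivot word is kept, so polysemous pivots can
    introduce wrong candidates that a later pruning step must remove.
    """
    a_to_c = defaultdict(set)
    for a_word, pivots in a_to_b.items():
        for pivot in pivots:
            a_to_c[a_word].update(b_to_c.get(pivot, ()))
    return dict(a_to_c)


# Toy entries: English "bank" is polysemous (financial institution, river bank).
es_en = {"banco": {"bank"}}
en_fr = {"bank": {"banque", "rive"}}
es_fr = merge_via_pivot(es_en, en_fr)
```

Here "banco" picks up both "banque" and "rive"; the second comes from the river-bank sense of the pivot and is the kind of wrong candidate the abstract's pruning methods target.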
19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
Author: Amit Goyal ; Hal Daume III
Abstract: We exploit sketch techniques, especially the Count-Min sketch, a memory- and time-efficient framework which approximates the frequency of a word pair in the corpus without explicitly storing the word pair itself. These methods use hashing to deal with massive amounts of streaming text. We apply the Count-Min sketch to approximate word pair counts and exhibit their effectiveness on three important NLP tasks. Our experiments demonstrate that on all three tasks, we get performance comparable to the exact word pair counts setting and to a state-of-the-art system. Our method scales to 49 GB of unzipped web data using bounded space of 2 billion counters (8 GB memory).
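For readers unfamiliar with the Count-Min sketch, the following minimal sketch shows the basic data structure applied to word-pair counts. It is an illustration under stated assumptions, not the paper's implementation: the class name, the table sizes, and the use of SHA-256 with a row prefix as the hash family are all choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter using a fixed depth x width table of integers."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row, simulated by prefixing the row number.
        digest = hashlib.sha256(f"{row}\x00{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, w1, w2, count=1):
        """Record `count` occurrences of the pair without storing the pair."""
        key = f"{w1}\t{w2}"
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, w1, w2):
        """Return the minimum counter over all rows.

        Collisions can only inflate counters, so with non-negative updates
        the estimate never falls below the true count.
        """
        key = f"{w1}\t{w2}"
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=3)
for _ in range(3):
    sketch.add("new", "york")
sketch.add("hot", "dog")
```

Memory use depends only on `width` and `depth`, not on the number of distinct pairs, which is the bounded-space property the abstract relies on; the price is that estimates may overcount when pairs collide.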
20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
Author: Jiajun Zhang ; Feifei Zhai ; Chengqing Zong
Abstract: Due to its explicit modeling of the grammaticality of the output via target-side syntax, the string-to-tree model has been shown to be one of the most successful syntax-based translation models. However, a major limitation of this model is that it does not utilize any useful syntactic information on the source side. In this paper, we analyze the difficulties of incorporating source syntax in a string-to-tree model. We then propose a new way to use the source syntax in a fuzzy manner, both in source syntactic annotation and in rule matching. We further explore three algorithms in rule matching: 0-1 matching, likelihood matching, and deep similarity matching. Our method not only guarantees grammatical output with an explicit target tree, but also enables the system to choose the proper translation rules via fuzzy use of the source syntax. Our extensive experiments have shown significant improvements over the state-of-the-art string-to-tree system.
21 emnlp-2011-Bayesian Checking for Topic Models
22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction
24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
26 emnlp-2011-Class Label Enhancement via Related Instances
27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
29 emnlp-2011-Collaborative Ranking: A Case Study on Entity Linking
30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
32 emnlp-2011-Computing Logical Form on Regulatory Texts
34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images
35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
36 emnlp-2011-Corroborating Text Evaluation Results with Heterogeneous Measures
37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
38 emnlp-2011-Data-Driven Response Generation in Social Media
40 emnlp-2011-Discovering Relations between Noun Categories
41 emnlp-2011-Discriminating Gender on Twitter
44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
45 emnlp-2011-Dual Decomposition with Many Overlapping Components
46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data
49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
57 emnlp-2011-Extreme Extraction - Machine Reading in a Week
58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use
63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification
65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
66 emnlp-2011-Hierarchical Phrase-based Translation Representations
67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
70 emnlp-2011-Identifying Relations for Open Information Extraction
71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs
72 emnlp-2011-Improved Transliteration Mining Using Graph Reinforcement
73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering
75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts
77 emnlp-2011-Large-Scale Cognate Recovery
78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms
82 emnlp-2011-Learning Local Content Shift Detectors from Document-level Information
84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association
87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
89 emnlp-2011-Linguistic Redundancy in Twitter
90 emnlp-2011-Linking Entities to a Knowledge Base with Query Expansion
91 emnlp-2011-Literal and Metaphorical Sense Identification through Concrete and Abstract Context
92 emnlp-2011-Minimally Supervised Event Causality Identification
93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
94 emnlp-2011-Modelling Discourse Relations for Arabic
95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
96 emnlp-2011-Multilayer Sequence Labeling
98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
100 emnlp-2011-Optimal Search for Minimum Error Rate Training
101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models
105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
106 emnlp-2011-Predicting a Scientific Community's Response to an Article
107 emnlp-2011-Probabilistic models of similarity in syntactic context
108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base
110 emnlp-2011-Ranking Human and Machine Summarization Systems
111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures
113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns
114 emnlp-2011-Relation Extraction with Relation Topics
115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
116 emnlp-2011-Robust Disambiguation of Named Entities in Text
117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs
118 emnlp-2011-SMT Helps Bitext Dependency Parsing
120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
121 emnlp-2011-Semi-supervised CCG Lexicon Extension
122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
125 emnlp-2011-Statistical Machine Translation with Local Language Models
126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
128 emnlp-2011-Structured Relation Discovery using Generative Models
129 emnlp-2011-Structured Sparsity in Structured Prediction
131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
135 emnlp-2011-Timeline Generation through Evolutionary Trans-Temporal Summarization
136 emnlp-2011-Training a Parser for Machine Translation Reordering
137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
138 emnlp-2011-Tuning as Ranking
139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter
140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance