59 emnlp-2010-Identifying Functional Relations in Web Text


Source: pdf

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freely available knowledge resources such as Freebase. It first computes multiple typed functionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. It then aggregates these scores to predict the global functionality for the phrase. LEIBNIZ outperforms previous work, increasing area under the precision-recall curve from 0.61 to 0.88. We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations.
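
The two-step idea in the abstract (score the phrase per argument-type pair, then aggregate) can be illustrated with a small sketch. This is not the LEIBNIZ implementation: the function names, the `entity_type` lookup (a stand-in for a resource such as Freebase), the "fraction of arg1 values with exactly one arg2" score, and the weighted-average aggregation are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

def typed_functionality_scores(pairs, entity_type):
    """Score one relation phrase separately for each (arg1 type, arg2 type) pair.

    pairs:       iterable of (arg1, arg2) extracted for a single phrase.
    entity_type: hypothetical dict mapping each entity string to a type name;
                 an entity missing from it raises KeyError.
    Returns {(type1, type2): (score, n_arg1)}, where score is the fraction of
    arg1 values that occur with exactly one distinct arg2 (an assumed measure).
    """
    by_type = defaultdict(lambda: defaultdict(set))
    for arg1, arg2 in pairs:
        key = (entity_type[arg1], entity_type[arg2])
        by_type[key][arg1].add(arg2)

    scores = {}
    for key, values in by_type.items():
        n_unique = sum(1 for v in values.values() if len(v) == 1)
        scores[key] = (n_unique / len(values), len(values))
    return scores

def aggregate(scores):
    """Combine typed scores into one number, weighting by arg1 count (assumed)."""
    total = sum(n for _, n in scores.values())
    if total == 0:
        return 0.0
    return sum(s * n for s, n in scores.values()) / total

# Toy data: untyped, each person has two arg2 values for 'was born in',
# but within each type pair every person has exactly one.
pairs = [
    ("George Washington", "Virginia"), ("George Washington", "1732"),
    ("Abraham Lincoln", "Kentucky"), ("Abraham Lincoln", "1809"),
]
types = {
    "George Washington": "person", "Abraham Lincoln": "person",
    "Virginia": "location", "Kentucky": "location",
    "1732": "year", "1809": "year",
}
print(aggregate(typed_functionality_scores(pairs, types)))
```

On this toy input an untyped count would see two values per person, while each typed view sees one, which is the motivation for constraining argument types before scoring.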

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Identifying Functional Relations in Web Text Thomas Lin, Mausam, Oren Etzioni Turing Center University of Washington Seattle, WA 98195, USA {t l maus am, et z ioni} @ cs . [sent-1, score-0.026]

2 Abstract Determining whether a textual phrase denotes a functional relation (i. [sent-3, score-0.742]

3 , a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. [sent-5, score-0.72]

4 Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. [sent-6, score-0.079]

5 However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. [sent-7, score-0.528]

6 We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freely available knowledge resources such as Freebase. [sent-8, score-0.079]

7 It first computes multiple typed functionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. [sent-9, score-0.855]

8 It then aggregates these scores to predict the global functionality for the phrase. [sent-10, score-0.579]

9 LEIBNIZ outperforms previous work, increasing area under the precision-recall curve from 0.61 to 0.88. [sent-11, score-0.124]

10 We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations. [sent-14, score-0.572]

11 1 Introduction The paradigm of Open Information Extraction (IE) (Banko et al. [sent-15, score-0.034]

12 , 2007; Banko and Etzioni, 2008) has scaled extraction technology to the massive set of relations expressed in Web text. [sent-16, score-0.306]

13 However, additional work is needed to better understand these relations, and to place them in richer semantic structures. [sent-17, score-0.095]

14 A step in that direction is identifying the properties of these relations, e. [sent-18, score-0.027]

15 , symmetry, transitivity and (our focus in this paper) functionality. [sent-20, score-0.053]

16 A binary relation is functional if, for a given arg1, there is exactly one unique value for arg2. [sent-22, score-0.647]

17 Examples of functional relations are father, death date, birth city, etc. [sent-23, score-0.83]

18 We define a relation phrase to be functional if all semantic relations commonly expressed by that phrase are functional. [sent-24, score-0.994]

19 For example, we say that the phrase ‘was born in’ denotes a functional relation, because the different semantic relations expressed by the phrase (e. [sent-25, score-1.09]

20 Knowing that a relation is functional is helpful for numerous NLP inference tasks. [sent-29, score-0.688]

21 Previous work has used functionality for the tasks of contradiction detection (Ritter et al. [sent-30, score-0.738]

22 , 2008), quantifier scope disambiguation (Srinivasan and Yates, 2009), and synonym resolution (Yates and Etzioni, 2009). [sent-31, score-0.178]

23 It could also aid in other tasks such as ontology generation and information extraction. [sent-32, score-0.088]

24 For example, consider two sentences from a contradiction detection task: (1) “George Washington was born in Virginia. [sent-33, score-0.474]

25 (2008) points out, we can only determine that the two sentences are contradictory if we know that the semantic relation referred to by the phrase ‘was born in’ is functional, and that both Virginia and Texas are distinct states. [sent-36, score-0.662]

26 Automatic functionality identification is essential when dealing with a large number of relations as in Open IE, or in complex domains where expert help (page footer: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1266–1276, MIT, Massachusetts, USA, 9-11 October 2010. © 2010 Association for Computational Linguistics) [sent-37, score-0.852]

27 base to determine functionality of Web relations. [sent-39, score-0.547]

28 This paper tackles automatic functionality identification using Web text. [sent-43, score-0.66]

29 While functionality identification has been utilized as a module in various NLP systems, this is the first paper to focus exclusively on functionality identification as a bona fide NLP inference task. [sent-44, score-1.327]

30 It is natural to identify functions based on triples extracted from text instead of analyzing sentences directly. [sent-45, score-0.108]

31 Thus, as our input, we utilize tuples extracted by TEXTRUNNER (Banko and Etzioni, 2008) when run over a corpus of 500 million webpages. [sent-46, score-0.168]
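
The definition of a functional relation and the contradiction example in the sentences above can be made concrete with a short sketch over (arg1, relation, arg2) tuples. It is a naive counting check written for illustration, not the paper's method; the function names and the assumption that two different strings denote two different entities are hypothetical simplifications, and the extracted text itself notes that ambiguity, synonymy and anaphora make such direct counting unreliable on real Web text.

```python
from collections import defaultdict

def is_functional(tuples, relation):
    """True if every arg1 seen with `relation` has exactly one distinct arg2.

    Vacuously True when the relation never occurs in `tuples`.
    """
    values = defaultdict(set)
    for arg1, rel, arg2 in tuples:
        if rel == relation:
            values[arg1].add(arg2)
    return all(len(v) == 1 for v in values.values())

def contradicts(fact_a, fact_b, functional_relations):
    """Two facts conflict if they share arg1 and a functional relation
    but disagree on arg2 (string inequality is assumed to mean distinctness)."""
    a1, rel_a, a2 = fact_a
    b1, rel_b, b2 = fact_b
    return (rel_a == rel_b and rel_a in functional_relations
            and a1 == b1 and a2 != b2)

toy = [
    ("George Washington", "was born in", "Virginia"),
    ("Abraham Lincoln", "was born in", "Kentucky"),
    ("George Washington", "visited", "Boston"),
    ("George Washington", "visited", "New York"),
]
print(is_functional(toy, "was born in"))  # one birthplace per person in toy
print(is_functional(toy, "visited"))      # two visited places for one person
print(contradicts(
    ("George Washington", "was born in", "Virginia"),
    ("George Washington", "was born in", "Texas"),
    {"was born in"},
))
```

Without the knowledge that 'was born in' is functional, the last pair of facts would simply look like two compatible statements, which is why functionality matters for contradiction detection.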


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('functionality', 0.521), ('functional', 0.426), ('born', 0.257), ('leibniz', 0.225), ('birth', 0.193), ('relation', 0.184), ('contradiction', 0.174), ('banko', 0.15), ('relations', 0.147), ('etzioni', 0.145), ('textrunner', 0.128), ('washington', 0.122), ('ritter', 0.116), ('yates', 0.094), ('synonym', 0.089), ('identification', 0.086), ('tuples', 0.082), ('numerous', 0.078), ('web', 0.069), ('virginia', 0.064), ('contradictory', 0.064), ('death', 0.064), ('ioni', 0.064), ('precisionrecall', 0.064), ('srinivasan', 0.064), ('synonymy', 0.064), ('determining', 0.064), ('city', 0.062), ('phrase', 0.061), ('ie', 0.06), ('analyzing', 0.058), ('george', 0.058), ('turing', 0.058), ('father', 0.058), ('aggregates', 0.058), ('resolution', 0.055), ('utilize', 0.055), ('transitivity', 0.053), ('ontology', 0.053), ('tackles', 0.053), ('overcomes', 0.053), ('repository', 0.053), ('scarce', 0.053), ('element', 0.052), ('maps', 0.051), ('triples', 0.05), ('nlp', 0.049), ('expressed', 0.049), ('denotes', 0.047), ('relied', 0.047), ('scaled', 0.047), ('module', 0.045), ('texas', 0.045), ('year', 0.045), ('seattle', 0.045), ('detection', 0.043), ('anaphora', 0.043), ('wa', 0.043), ('oren', 0.043), ('semantic', 0.042), ('knowing', 0.039), ('massive', 0.039), ('public', 0.038), ('biomedical', 0.038), ('mentions', 0.038), ('open', 0.038), ('unique', 0.037), ('date', 0.036), ('utilized', 0.036), ('expert', 0.035), ('aid', 0.035), ('curve', 0.035), ('paradigm', 0.034), ('computes', 0.034), ('scope', 0.034), ('essential', 0.033), ('exclusively', 0.032), ('counting', 0.032), ('million', 0.031), ('dealing', 0.03), ('arguments', 0.029), ('expensive', 0.029), ('proce', 0.029), ('lioan', 0.028), ('prpoucteastisoin', 0.028), ('thomas', 0.028), ('referred', 0.028), ('richer', 0.028), ('identifying', 0.027), ('determine', 0.026), ('challenges', 0.026), ('constrained', 0.026), ('cs', 0.026), ('area', 0.025), ('center', 0.025), ('place', 0.025), ('textual', 0.024), ('extraction', 0.024), 
('commonly', 0.024), ('aastsuorcaialt', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 59 emnlp-2010-Identifying Functional Relations in Web Text

2 0.097879976 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.

3 0.078988738 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

Author: Hugo Hernault ; Danushka Bollegala ; Mitsuru Ishizuka

Abstract: Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of cooccurring features in unlabeled data, which is then taken into account for extending the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of c.a. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score up to 50% compared to a baseline classifier. We believe that the proposed method is a first step towards detecting low-occurrence relations, which is useful for domains with a lack of annotated data.

4 0.070166051 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

Author: Quang Do ; Dan Roth

Abstract: Determining whether two terms in text have an ancestor relation (e.g. Toyota and car) or a sibling relation (e.g. Toyota and Honda) is an essential component of textual inference in NLP applications such as Question Answering, Summarization, and Recognizing Textual Entailment. Significant work has been done on developing stationary knowledge sources that could potentially support these tasks, but these resources often suffer from low coverage, noise, and are inflexible when needed to support terms that are not identical to those placed in them, making their use as general purpose background knowledge resources difficult. In this paper, rather than building a stationary hierarchical structure of terms and relations, we describe a system that, given two terms, determines the taxonomic relation between them using a machine learning-based approach that makes use of existing resources. Moreover, we develop a global constraint optimization inference process and use it to leverage an existing knowledge base also to enforce relational constraints among terms and thus improve the classifier predictions. Our experimental evaluation shows that our approach significantly outperforms other systems built upon existing well-known knowledge sources.

5 0.060519055 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

Author: Longhua Qian ; Guodong Zhou

Abstract: Seed sampling is critical in semi-supervised learning. This paper proposes a clustering-based stratified seed sampling approach to semi-supervised learning. First, various clustering algorithms are explored to partition the unlabeled instances into different strata with each stratum represented by a center. Then, diversity-motivated intra-stratum sampling is adopted to choose the center and additional instances from each stratum to form the unlabeled seed set for an oracle to annotate. Finally, the labeled seed set is fed into a bootstrapping procedure as the initial labeled data. We systematically evaluate our stratified bootstrapping approach in the semantic relation classification subtask of the ACE RDC (Relation Detection and Classification) task. In particular, we compare various clustering algorithms on the stratified bootstrapping performance. Experimental results on the ACE RDC 2004 corpus show that our clustering-based stratified bootstrapping approach achieves the best F1-score of 75.9 on the subtask of semantic relation classification, approaching the one with golden clustering.

6 0.059913978 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

7 0.057216603 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

8 0.050549563 20 emnlp-2010-Automatic Detection and Classification of Social Events

9 0.048932761 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding

10 0.045212865 12 emnlp-2010-A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web

11 0.036807686 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

12 0.036470681 62 emnlp-2010-Improving Mention Detection Robustness to Noisy Input

13 0.035232991 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

14 0.035113852 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

15 0.034378041 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

16 0.032661103 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

17 0.028860511 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

18 0.024402166 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering

19 0.023430143 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

20 0.022276856 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.087), (1, 0.059), (2, -0.014), (3, 0.177), (4, 0.044), (5, -0.122), (6, 0.004), (7, 0.07), (8, 0.059), (9, -0.035), (10, -0.044), (11, -0.113), (12, -0.009), (13, -0.118), (14, -0.0), (15, 0.036), (16, 0.025), (17, 0.038), (18, -0.026), (19, 0.044), (20, 0.122), (21, -0.053), (22, 0.101), (23, 0.132), (24, -0.093), (25, -0.094), (26, -0.128), (27, 0.06), (28, 0.013), (29, 0.17), (30, 0.168), (31, 0.068), (32, 0.257), (33, 0.025), (34, -0.13), (35, -0.049), (36, 0.012), (37, -0.112), (38, -0.044), (39, -0.052), (40, -0.092), (41, -0.059), (42, -0.165), (43, 0.042), (44, -0.118), (45, 0.053), (46, 0.069), (47, 0.045), (48, -0.019), (49, -0.135)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97868353 59 emnlp-2010-Identifying Functional Relations in Web Text

2 0.57556498 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.

3 0.45082217 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

Author: Hugo Hernault ; Danushka Bollegala ; Mitsuru Ishizuka

Abstract: Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of cooccurring features in unlabeled data, which is then taken into account for extending the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of c.a. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score up to 50% compared to a baseline classifier. We believe that the proposed method is a first step towards detecting low-occurrence relations, which is useful for domains with a lack of annotated data.

4 0.44795567 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a method for the automatic discovery of MANNER relations from text. An extended definition of MANNER is proposed, including restrictions on the sorts of concepts that can be part of its domain and range. The connections with other relations and the lexico-syntactic patterns that encode MANNER are analyzed. A new feature set specialized on MANNER detection is depicted and justified. Experimental results show improvement over previous attempts to extract MANNER. Combinations of MANNER with other semantic relations are also discussed.

5 0.43525875 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding

Author: Ching-Yun Chang ; Stephen Clark

Abstract: Linguistic Steganography is concerned with hiding information in natural language text. One of the major transformations used in Linguistic Steganography is synonym substitution. However, few existing studies have studied the practical application of this approach. In this paper we propose two improvements to the use of synonym substitution for encoding hidden bits of information. First, we use the Web 1T Google n-gram corpus for checking the applicability of a synonym in context, and we evaluate this method using data from the SemEval lexical substitution task. Second, we address the problem that arises from words with more than one sense, which creates a potential ambiguity in terms of which bits are encoded by a particular word. We develop a novel method in which words are the vertices in a graph, synonyms are linked by edges, and the bits assigned to a word are determined by a vertex colouring algorithm. This method ensures that each word encodes a unique sequence of bits, without cutting out large number of synonyms, and thus maintaining a reasonable embedding capacity.

6 0.34327224 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

7 0.31144997 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

8 0.30630875 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

9 0.2664611 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

10 0.26335704 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

11 0.19339935 16 emnlp-2010-An Approach of Generating Personalized Views from Normalized Electronic Dictionaries : A Practical Experiment on Arabic Language

12 0.17885743 20 emnlp-2010-Automatic Detection and Classification of Social Events

13 0.17614052 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

14 0.1649093 123 emnlp-2010-Word-Based Dialect Identification with Georeferenced Rules

15 0.15310962 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

16 0.13628373 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues

17 0.13611379 12 emnlp-2010-A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web

18 0.12829635 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

19 0.12232924 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

20 0.11891396 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.025), (10, 0.674), (12, 0.021), (29, 0.017), (56, 0.048), (62, 0.014), (66, 0.051), (72, 0.018), (76, 0.017), (89, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95086408 59 emnlp-2010-Identifying Functional Relations in Web Text

2 0.7464875 74 emnlp-2010-Learning the Relative Usefulness of Questions in Community QA

Author: Razvan Bunescu ; Yunfeng Huang

Abstract: We present a machine learning approach for the task of ranking previously answered questions in a question repository with respect to their relevance to a new, unanswered reference question. The ranking model is trained on a collection of question groups manually annotated with a partial order relation reflecting the relative utility of questions inside each group. Based on a set of meaning and structure aware features, the new ranking model is able to substantially outperform more straightforward, unsupervised similarity measures.

3 0.58682591 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

Author: Minh-Thang Luong ; Preslav Nakov ; Min-Yen Kan

Abstract: We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.

4 0.32005158 51 emnlp-2010-Function-Based Question Classification for General QA

Author: Fan Bu ; Xingwei Zhu ; Yu Hao ; Xiaoyan Zhu

Abstract: In contrast with the booming increase of internet data, state-of-art QA (question answering) systems, otherwise, concerned data from specific domains or resources such as search engine snippets, online forums and Wikipedia in a somewhat isolated way. Users may welcome a more general QA system for its capability to answer questions of various sources, integrated from existed specialized sub-QA engines. In this framework, question classification is the primary task. However, the current paradigms of question classification were focused on some specified type of questions, i.e. factoid questions, which are inappropriate for the general QA. In this paper, we propose a new question classification paradigm, which includes a question taxonomy suitable to the general QA and a question classifier based on MLN (Markov logic network), where rule-based methods and statistical methods are unified into a single framework in a fuzzy discriminative learning approach. Experiments show that our method outperforms traditional question classification approaches.

5 0.20131251 12 emnlp-2010-A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web

Author: Zornitsa Kozareva ; Eduard Hovy

Abstract: Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root concept, a basic level concept, and recursive surface patterns to learn automatically from the Web hyponym-hypernym pairs subordinated to the root; (2) a Web based concept positioning procedure to validate the learned pairs’ is-a relations; and (3) a graph algorithm that derives from scratch the integrated taxonomy structure of all the terms. Comparing results with WordNet, we find that the algorithm misses some concepts and links, but also that it discovers many additional ones lacking in WordNet. We evaluate the taxonomization power of our method on reconstructing parts of the WordNet taxonomy. Experiments show that starting from scratch, the algorithm can reconstruct 62% of the WordNet taxonomy for the regions tested.

6 0.20083998 37 emnlp-2010-Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks

7 0.19833501 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

8 0.19576624 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

9 0.18999173 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

10 0.18990673 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

11 0.18379651 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

12 0.18311918 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

13 0.17541261 123 emnlp-2010-Word-Based Dialect Identification with Georeferenced Rules

14 0.1717311 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

15 0.16137312 19 emnlp-2010-Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation

16 0.15988736 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

17 0.1553289 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

18 0.15112728 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

19 0.15032572 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

20 0.14876331 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text