acl acl2011 acl2011-63 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hamidreza Kobdani ; Hinrich Schuetze ; Michael Schiehlen ; Hans Kamp
Abstract: In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe – e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.
Reference: text
sentIndex sentText sentNum sentScore
1 Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. [sent-6, score-0.227]
2 We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data. [sent-8, score-0.271]
3 1 Introduction Coreference resolution (CoRe) is the process of finding markables (noun phrases) referring to the same real-world entity or concept. [sent-10, score-0.473]
4 Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data. [sent-11, score-0.544]
5 Alternatively, a model that determines whether a markable is coreferent with a preceding cluster can be used. [sent-12, score-0.309]
6 To address this challenge, we pursue an unsupervised self-training approach. [sent-21, score-0.175]
7 In contrast, our self-trained system is not trained on any manually labeled data and is therefore a completely unsupervised system. [sent-24, score-0.271]
8 Although training on automatically labeled data can be viewed as a form of supervision, we reserve the term supervised system for systems that are trained on manually labeled data. [sent-25, score-0.291]
9 The key novelty of our approach is that we bootstrap a competitive CoRe system from association information that is mined from an unlabeled corpus in a completely unsupervised fashion. [sent-26, score-0.348]
10 It is a likely coreference pair, but this information is not accessible to standard CoRe systems because they only use string-based features (often called lexical features), named entity features and semantic word class features (e. [sent-29, score-0.323]
11 In our approach, word association information is used for clustering markables in unsupervised learning. [sent-34, score-0.579]
12 Association information is calculated as association scores between heads of markables as described below. [sent-35, score-0.426]
13 We view association information as an example of a shallow feature space which contrasts with the rich feature space that is generally used in CoRe. [sent-36, score-0.182]
14 MCORE can operate in three different settings: unsupervised (subsystem A-INF), supervised (subsystem SUCRE (Kobdani and Schütze, 2010)), and self-trained (subsystem UNSEL). [sent-38, score-0.279]
15 The unsupervised subsystem A-INF (“Association INFormation”) uses the association scores between heads as the distance measure when clustering markables. [sent-39, score-0.407]
16 Finally, the unsupervised self-trained subsystem UNSEL (“UNsupervised SELf-trained”) uses the unsupervised subsystem A-INF to automatically label an unlabeled corpus that is then used as a training set for SUCRE. [sent-41, score-0.657]
17 We demonstrate that word association information can be used to develop an unsupervised model for shallow coreference resolution (subsystem A-INF). [sent-43, score-0.705]
18 We introduce an unsupervised self-trained method (UNSEL) that takes a two-learner twofeature-space approach. [sent-45, score-0.175]
19 The feature spaces are the shallow and rich feature spaces. [sent-47, score-0.181]
20 We show that the performance of UNSEL is better than the performance of other unsupervised systems when it is self-trained on the automatically labeled corpus and uses the leveraging effect of a rich feature space. [sent-49, score-0.338]
21 Not only is it able to deal with shallow information spaces (A-INF), but it can also deliver competitive results for rich feature spaces (SUCRE and UNSEL). [sent-54, score-0.232]
22 We use the term semi-supervised for approaches that use some amount of human-labeled coreference pairs. [sent-61, score-0.323]
23 (2002) used co-training for coreference resolution, a semi-supervised method. [sent-63, score-0.323]
24 Cotraining puts features into disjoint subsets when learning from labeled and unlabeled data and tries to leverage this split for better performance. [sent-64, score-0.162]
25 (2002) and Ng and Cardie (2003) is that we do not use any human-labeled coreference pairs. [sent-69, score-0.323]
26 Turning to unsupervised CoRe, Haghighi and Klein (2007) proposed a generative Bayesian model with good performance. [sent-72, score-0.175]
27 Poon and Domingos (2008) introduced an unsupervised system in the framework of Markov logic. [sent-73, score-0.205]
28 Ng (2008) presented a generative model that views coreference as an EM clustering process. [sent-74, score-0.355]
29 In this paper, we only compare with completely unsupervised approaches, not with approaches that make some limited use of labeled data. [sent-77, score-0.241]
30 Using such additional resources in our unsupervised system should further improve CoRe performance. [sent-89, score-0.205]
31 (2009) present an unsupervised algorithm for identifying clusters of entities that belong to the same named entity (NE) class. [sent-91, score-0.175]
32 Determining common membership in an NE class like person is an easier task than determining coreference of two NEs. [sent-92, score-0.323]
33 3 System Architecture Figure 1 illustrates the system architecture of our unsupervised self-trained CoRe system (UNSEL). [sent-93, score-0.279]
34 We take a self-training approach to coreference resolution: We first label the corpus using the unsupervised model A-INF and then train the supervised model SUCRE on this automatically labeled training corpus. [sent-95, score-0.693]
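To make this self-training loop concrete, here is a minimal sketch, not the authors' code: the unsupervised subsystem A-INF labels an unlabeled corpus, and a supervised pair classifier standing in for SUCRE is then trained on those automatic labels. The function names and the `a_inf_label` / `train_pair_classifier` callables are hypothetical.

```python
def self_train(unlabeled_docs, a_inf_label, train_pair_classifier):
    """Unsupervised self-training sketch: A-INF -> automatic labels -> SUCRE-style model.

    a_inf_label(doc) is assumed to cluster the markables of a document using
    word-association scores and return (feature_vector, label) training pairs;
    train_pair_classifier fits the rich-feature-space model on those pairs.
    """
    auto_labeled = []
    for doc in unlabeled_docs:
        auto_labeled.extend(a_inf_label(doc))      # automatic, possibly noisy labels
    return train_pair_classifier(auto_labeled)     # the self-trained model (UNSEL)
```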
35 The MCORE architecture is very flexible; in particular, as will be explained presently, it can be easily adapted for supervised as well as unsupervised settings. [sent-99, score-0.323]
36 The unsupervised and supervised models have an identical top level architecture; we illustrate this in Figure 2. [sent-100, score-0.279]
37 In preprocessing, tokens (words), markables and their attributes are extracted from the input text. [sent-101, score-0.341]
38 The key difference between the unsupervised and supervised approaches is in how pair estimation is accomplished; see Sections 3. [sent-102, score-0.356]
39 Figure 3 presents our clustering method, which is used for both supervised and unsupervised CoRe. [sent-106, score-0.311]
40 We search for the best predicted antecedent (with coreference probability p ≥ 0.5). [sent-107, score-0.387]
41 We use a feature definition language to define the templates according to which the filters and features are calculated. [sent-115, score-0.145]
42 50:1 go to step 3 Pair Estimation (Mi, Mj): If Filtering(Mi, Mj)==FALSE then return 0; else return the probability p (or association score N) of markable pair (Mi, Mj) being coreferent. [sent-134, score-0.289]
43 Filtering(Mi, Mj): return TRUE if all filters for (Mi, Mj) are TRUE, else FALSE. (Figure 3: MCORE chain estimation (clustering) algorithm (test).) [sent-135, score-0.171]
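A minimal sketch of this best-antecedent clustering, assuming `pair_score` returns either the coreference probability p (supervised) or the association score N (unsupervised) and `passes_filters` implements the hard filters; the exact control flow and step numbering of Figure 3 are not reproduced, and a cutoff other than 0.5 would presumably be needed for the raw association score N.

```python
def build_chains(markables, pair_score, passes_filters, threshold=0.5):
    # For each markable, link it to the best-scoring preceding markable whose
    # score reaches the threshold; otherwise it starts a new chain.
    chain_of = {}                       # markable index -> chain id
    next_chain = 0
    for j in range(len(markables)):
        best_i, best_score = None, threshold
        for i in range(j):              # candidate antecedents
            if not passes_filters(markables[i], markables[j]):
                continue                # filtered pairs are treated as score 0
            score = pair_score(markables[i], markables[j])
            if score >= best_score:
                best_i, best_score = i, score
        if best_i is None:
            chain_of[j] = next_chain    # no antecedent found: new (singleton) chain
            next_chain += 1
        else:
            chain_of[j] = chain_of[best_i]
    return chain_of
```

Because the probability p and the association score N play the same role here, the same routine serves both the supervised and the unsupervised subsystems.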
44 disagree in number; (iv) a coreferent pair of two pronouns must not disagree in gender. [sent-138, score-0.137]
45 These four filters are used in supervised and unsupervised modes of MCORE. [sent-139, score-0.412]
46 In addition to the filters (i)–(iv) described above, we use the following filter: (v) If the head of markable M2 matches the head of the preceding markable M1, then we ignore all other pairs for M2 in the calculation of association scores. [sent-145, score-0.598]
47 As we will show below, even the simple filters (i)–(v) are sufficient to learn high-quality association scores; this means that we do not need the complex features of “deterministic” systems. [sent-147, score-0.141]
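A sketch of the hard filters as far as they are spelled out in this excerpt; filters (i) and (ii) are not quoted here, so only (iii)-(v) are shown, and the markable attributes (`number`, `gender`, `is_pronoun`, `head`) are assumed.

```python
def passes_filters(mi, mj):
    # (iii) the pair must not disagree in number (checked only when both are known)
    if mi.number and mj.number and mi.number != mj.number:
        return False
    # (iv) a pair of two pronouns must not disagree in gender
    if (mi.is_pronoun and mj.is_pronoun
            and mi.gender and mj.gender and mi.gender != mj.gender):
        return False
    return True  # filters (i)-(ii) omitted: not described in this excerpt

def association_candidates(markables, j):
    # (v) if the head of M_j matches the head of a preceding markable, ignore
    # all other pairs for M_j when collecting association-score statistics.
    matches = [i for i in range(j) if markables[i].head == markables[j].head]
    return matches if matches else list(range(j))
```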
48 To learn word association information from an unlabeled corpus (see Section 4), we compute mutual information (MI) scores between heads ofmarkables. [sent-149, score-0.137]
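The association scores themselves can be sketched as pointwise mutual information over head co-occurrence counts; the exact MI variant, counting window, and normalization used in the paper are not given in this excerpt, so treat the following as an illustration only.

```python
import math
from collections import Counter

def mi_scores(head_pairs, min_count=2):
    """Pointwise MI between markable heads mined from an unlabeled corpus.

    head_pairs: iterable of (head_i, head_j) tuples for candidate markable
    pairs that survived the filters (a hypothetical input format).
    """
    pair_counts = Counter(head_pairs)
    head_counts = Counter()
    for (h1, h2), c in pair_counts.items():
        head_counts[h1] += c
        head_counts[h2] += c
    total_pairs = sum(pair_counts.values())
    total_heads = sum(head_counts.values())
    scores = {}
    for (h1, h2), c in pair_counts.items():
        if c < min_count:
            continue                      # discard unreliable low-frequency pairs
        p_xy = c / total_pairs
        p_x = head_counts[h1] / total_heads
        p_y = head_counts[h2] / total_heads
        scores[(h1, h2)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

Under such a score, a pair like (Obama, President) co-occurs far more often than the marginal frequencies of its heads predict and therefore receives a high association value.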
49 A key virtue of our approach is that in the classification of pairs as coreferent/disreferent, the coreference probability p estimated in supervised learning plays exactly the same role as the association information score N (defined below). [sent-154, score-0.485]
50 However, the weaknesses of this approach are (i) the failure to cover pairs that do not occur in the unlabeled corpus (negatively affecting recall) and (ii) the generation of pairs that are not plausible candidates for coreference (negatively affecting precision). [sent-166, score-0.475]
51 2 Supervised Model (SUCRE) Figure 4 (bottom) presents the architecture of pair estimation for the supervised approach (SUCRE). [sent-169, score-0.225]
52 In the pair generation step for training, we take each coreferent markable pair (Mi, Mj) without intervening coreferent markables and use (Mi, Mj) as a positive training instance and (Mi, Mk), i < k < j, as negative training instances. [sent-170, score-0.83]
53 After filtering, we then calculate a feature vector for each generated pair that survived filters (i)–(iv). [sent-172, score-0.188]
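A sketch of that pair generation step, following the description above; `gold_chain_of` maps each markable index to its gold (or, in self-training, automatically assigned) chain id, and the surviving pairs would then be run through the filters and the feature extractor.

```python
def generate_training_pairs(gold_chain_of):
    # gold_chain_of: list mapping markable index -> chain id (None for singletons).
    positives, negatives = [], []
    n = len(gold_chain_of)
    for j in range(n):
        if gold_chain_of[j] is None:
            continue
        # nearest preceding markable in the same chain, i.e. (Mi, Mj) has
        # no intervening coreferent markable
        i = next((i for i in range(j - 1, -1, -1)
                  if gold_chain_of[i] == gold_chain_of[j]), None)
        if i is None:
            continue
        positives.append((i, j))                            # (Mi, Mj) positive
        negatives.extend((i, k) for k in range(i + 1, j))   # (Mi, Mk), i < k < j
    return positives, negatives
```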
54 We believe that the good performance of our supervised system SUCRE (tables 1 and 2) is the result of our feature engineering approach. [sent-180, score-0.169]
55 a decision tree (Quinlan, 1993) that is trained on the training set to estimate the coreference probability p for a pair and then applied to the test set. [sent-193, score-0.366]
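The pair classifier itself is a decision tree; the sketch below uses scikit-learn's DecisionTreeClassifier purely as a stand-in for the C4.5-style tree (Quinlan, 1993) referred to above, and the hyper-parameter shown is hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

def train_pair_classifier(feature_vectors, labels):
    # Fit a decision tree on (feature vector, coreferent/disreferent) instances.
    # Categorical features (e.g. markable type) are assumed to be integer-encoded.
    model = DecisionTreeClassifier(min_samples_leaf=5)   # hypothetical setting
    model.fit(feature_vectors, labels)
    return model

def coreference_probability(model, feature_vector):
    # Probability p that the pair is coreferent, consumed by chain estimation.
    return model.predict_proba([feature_vector])[0][1]
```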
56 The number of detected markables (all noun phrases extracted from parse trees) is about 9 million. [sent-201, score-0.373]
57 4 This data set is one of the most widely used CoRe benchmarks and was used by the systems that are most comparable to our approach; in particular, it was used in most prior work on unsupervised CoRe. [sent-204, score-0.175]
58 We report results for true markables (markables extracted from the answer keys) to be able to compare with other systems that use true markables. [sent-209, score-0.403]
59 automatic setting is directly related to a second important evaluation issue: the influence of markable detection on CoRe evaluation measures. [sent-220, score-0.245]
60 In a real application, we do not have access to true markables, so an evaluation on system markables (markables automatically detected by the system) reflects actual expected performance better. [sent-221, score-0.427]
61 However, reporting only CoRe numbers (even for system markables) is not sufficient either since accuracy of markable detection is necessary to interpret CoRe scores. [sent-222, score-0.245]
62 Thus, we need (i) measures of the quality of system markables (i. [sent-223, score-0.371]
63 , an evaluation of the markable detection subtask) and CoRe performance on system markables as well as (ii) a measure of CoRe performance on true markables. [sent-225, score-0.617]
64 , 2010): the automatic setting with system markables and the gold setting with true markables. [sent-228, score-0.497]
65 For the experiments with UNSEL, we use its unsupervised subsystem A-INF (which uses Wikipedia association scores) to automatically label the training sets of ACE and OntoNotes. [sent-231, score-0.346]
66 Then for each data set, the supervised subsystem of UNSEL (i. [sent-232, score-0.219]
67 Finally, for the supervised experiments, we use the manually labeled training sets and evaluate on the corresponding test sets. [sent-235, score-0.17]
68 We selected these three metrics because a single metric is often misleading and because we need to use metrics that were used in previous unsupervised work. [sent-239, score-0.175]
69 It is well known that MUC by itself is insufficient because it gives misleadingly high scores to the “single-chain” system that puts all markables into one chain (Luo et al. [sent-240, score-0.47]
70 However, B3 and CEAF have a different bias: they give high scores to the “all-singletons” system that puts each markable in a separate chain. [sent-242, score-0.317]
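The single-chain bias of MUC can be checked with a few lines of code; the sketch below implements the standard link-based MUC recall and precision (Vilain et al., 1995) and shows that putting every markable into one chain still yields a high MUC F1 on a toy key, while B3 and CEAF would penalize the same response.

```python
def muc_recall(key_chains, response_chains):
    """MUC recall (Vilain et al., 1995): missing links per key chain."""
    resp_of = {m: cid for cid, chain in enumerate(response_chains) for m in chain}
    num = den = 0
    for chain in key_chains:
        # number of response partitions this key chain is split into
        # (mentions absent from the response count as singletons)
        parts = {resp_of.get(m, ('singleton', m)) for m in chain}
        num += len(chain) - len(parts)
        den += len(chain) - 1
    return num / den if den else 0.0

def muc_f1(key_chains, response_chains):
    r = muc_recall(key_chains, response_chains)
    p = muc_recall(response_chains, key_chains)   # precision = recall, roles swapped
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

key = [{1, 2, 3}, {4, 5}, {6, 7, 8, 9}]
single_chain = [set(range(1, 10))]               # the "single-chain" response
print(muc_f1(key, single_chain))                 # ~0.86: misleadingly high
```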
71 5 Results and Discussion Table 1 compares our unsupervised self-trained model UNSEL and unsupervised model A-INF to previously published unsupervised systems. [sent-251, score-0.175]
72 To our knowledge, these three papers are the best and most recent evaluation results for unsupervised learning and they all report results on ACE-2 and ACE-2003. [sent-316, score-0.175]
73 A-INF scores are below some of the earlier unsupervised work reported in the literature (lines 2, 6, 10) although they are close to competitive on two of the datasets (lines 15 and 20: MUC scores are equal or better, CEAF scores are worse). [sent-318, score-0.292]
74 Turning to UNSEL, we see that F1 is always better for UNSEL than for A-INF, for all three measures (lines 3 vs 2, 7 vs 6, 11 vs 10, 16 vs 15, 21 vs 20). [sent-321, score-0.63]
75 When comparing the unsupervised system UNSEL to previous unsupervised results, we find that UNSEL’s F1 is higher in all runs (lines 3 vs 1, 7 vs 5, 11 vs 9, 16 vs 13&14, 21 vs 18&19). [sent-324, score-1.01]
76 Given that MCORE is a simpler and more efficient system than this prior work on unsupervised CoRe, these results are promising. [sent-329, score-0.205]
77 For example, P&D is better than UNSEL on MUC recall for BNEWS-ACE-2 (lines 1 vs 3) and H&K is better than UNSEL on CEAF precision for NWIRE-ACE2003 (lines 18 vs 21). [sent-331, score-0.335]
78 Consider the markable pair (Novoselov, he) in the test set; Novoselov is the 2010 physics Nobel laureate. [sent-338, score-0.258]
79 Using the same representation of pairs, suppose that for the sequence of markables Biden, Obama, President the markable pairs (Biden, President) and (Obama, President) are assigned the feature vectors <8, No, Proper Noun, Proper Noun, Yes> and <1, No, Proper Noun, Proper Noun, Yes>, respectively. [sent-347, score-0.618]
80 5, A-INF incorrectly puts the three markables into one cluster. [sent-349, score-0.385]
81 But as we would expect, A-INF labels many more markable pairs [sent-350, score-0.242]
82 with the second feature vector (distance=1) as coreferent than with the first one (distance=8) in the entire automatically labeled training set. [sent-352, score-0.22]
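A sketch of how such a rich feature vector might be assembled; the interpretation of the individual slots in the <distance, No, Proper Noun, Proper Noun, Yes> example is not fully specified in this excerpt, so the concrete features below (string match, NE-class agreement) are assumptions.

```python
def pair_feature_vector(markables, i, j):
    mi, mj = markables[i], markables[j]
    return [
        j - i,                                   # markable distance (the "8" vs "1" above)
        mi.text.lower() == mj.text.lower(),      # string match (assumed slot)
        mi.pos,                                  # markable type of M_i, e.g. "Proper Noun"
        mj.pos,                                  # markable type of M_j
        mi.ne_class == mj.ne_class,              # NE/semantic-class agreement (assumed slot)
    ]
```

Trained on the automatically labeled corpus, the decision tree can then prefer the closer (Obama, President) pair over (Biden, President) through the distance feature, which is the leveraging effect of the rich feature space described here.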
83 To summarize, the advantages of our self-training approach are: (i) We cover cases that do not occur in the unlabeled corpus (better recall effect); and (ii) we use the leveraging effect of a rich feature space including distance, person, number, gender etc. [sent-357, score-0.167]
84 Figure 5 presents MUC scores of A-INF as a function of the number of Wikipedia articles used in unsupervised learning. [sent-360, score-0.203]
85 Comparison of UNSEL with SUCRE Table 2 compares our unsupervised self-trained (UNSEL) and supervised (SUCRE) models with the recently published SemEval-2010 OntoNotes results. [sent-368, score-0.279]
86 We compare with the scores of the two best systems, Relax and SUCRE2010 (for the gold setting with true markables) and SUCRE2010 and Tanl-1 (for the automatic setting with system markables, 89. [sent-379, score-0.184]
87 It is apparent from this table that our supervised and unsupervised self-trained models outperform Relax, SUCRE2010 and Tanl-1. [sent-381, score-0.279]
88 Table 1 shows that the unsupervised self-trained system (UNSEL) does a lot worse than the supervised system (SUCRE) on ACE. [sent-383, score-0.339]
89 This may indicate that part of our improved unsupervised performance in Table 1 is due to better feature engineering implemented in SUCRE. [sent-396, score-0.21]
90 For an unsupervised approach, which only needs unlabeled data, there is little cost to creating large training sets. [sent-399, score-0.227]
91 Thus, this comparison of ACE-2/Ontonotes results is evidence that in a realistic scenario using association information in an unsupervised self-trained system is almost as good as a system trained on manually labeled data. [sent-400, score-0.332]
92 It is important to note that the comparison of SUCRE to UNSEL is the most direct comparison of supervised and unsupervised CoRe learning we are aware of. [sent-401, score-0.279]
93 6 Conclusion In this paper, we have demonstrated the utility of association information for coreference resolution. [sent-404, score-0.354]
94 We first developed a simple unsupervised model for shallow CoRe that only uses association information for finding coreference chains. [sent-405, score-0.573]
95 We then introduced an unsupervised self-trained approach where a supervised model is trained on a corpus that was automatically labeled by the unsupervised model based on the association information. [sent-406, score-0.576]
96 The results of the experiments indicate that the performance of the unsupervised self-trained approach is better than the performance of other unsupervised learning systems. [sent-407, score-0.35]
97 In addition, we showed that our system is a flexible and modular framework that is able to learn from data with different quality (perfect vs noisy markable detection) and domain; and is able to deliver good results for shallow information spaces and competitive results for rich feature spaces. [sent-408, score-0.645]
98 Simple coreference resolution with rich syntactic and semantic features. [sent-437, score-0.492]
99 Optimization in coreference resolution is not needed: A nearly-optimal algorithm with intensional constraints. [sent-450, score-0.455]
100 A machine learning approach to coreference resolution of noun phrases. [sent-528, score-0.487]
wordName wordTfidf (topN-words)
[('unsel', 0.44), ('sucre', 0.373), ('markables', 0.341), ('coreference', 0.323), ('markable', 0.215), ('unsupervised', 0.175), ('core', 0.173), ('ontonotes', 0.141), ('muc', 0.134), ('resolution', 0.132), ('vs', 0.126), ('subsystem', 0.115), ('mj', 0.11), ('filters', 0.11), ('mcore', 0.106), ('mi', 0.106), ('supervised', 0.104), ('coreferent', 0.094), ('kobdani', 0.093), ('ceaf', 0.093), ('recasens', 0.078), ('labeled', 0.066), ('antecedent', 0.064), ('uller', 0.054), ('haghighi', 0.053), ('hamidreza', 0.053), ('unlabeled', 0.052), ('ng', 0.048), ('wikipedia', 0.046), ('pronoun', 0.045), ('lines', 0.045), ('modular', 0.044), ('ii', 0.044), ('puts', 0.044), ('shallow', 0.044), ('architecture', 0.044), ('pair', 0.043), ('recall', 0.043), ('precision', 0.04), ('obama', 0.037), ('rich', 0.037), ('gold', 0.035), ('disreferent', 0.035), ('klenner', 0.035), ('mcenery', 0.035), ('luo', 0.035), ('feature', 0.035), ('iv', 0.034), ('estimation', 0.034), ('competitive', 0.033), ('filtering', 0.033), ('clustering', 0.032), ('noun', 0.032), ('true', 0.031), ('poon', 0.031), ('npaper', 0.031), ('raghunathan', 0.031), ('vilain', 0.031), ('association', 0.031), ('deterministic', 0.031), ('system', 0.03), ('yes', 0.03), ('sch', 0.03), ('setting', 0.03), ('spaces', 0.03), ('cardie', 0.03), ('ace', 0.029), ('associations', 0.029), ('stoyanov', 0.029), ('bnews', 0.029), ('nwire', 0.029), ('bagga', 0.029), ('kehler', 0.029), ('hinrich', 0.028), ('flexible', 0.028), ('tze', 0.028), ('scores', 0.028), ('pairs', 0.027), ('chain', 0.027), ('soon', 0.027), ('xiaoqiang', 0.027), ('singletons', 0.027), ('ci', 0.027), ('mined', 0.027), ('president', 0.026), ('heads', 0.026), ('pradhan', 0.026), ('automatically', 0.025), ('aria', 0.025), ('proper', 0.025), ('klein', 0.025), ('elsner', 0.024), ('relational', 0.024), ('relax', 0.023), ('negatively', 0.023), ('deliver', 0.023), ('modes', 0.023), ('affecting', 0.023), ('turning', 0.023), ('semeval', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999893 63 acl-2011-Bootstrapping coreference resolution using word associations
Author: Hamidreza Kobdani ; Hinrich Schuetze ; Michael Schiehlen ; Hans Kamp
Abstract: In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe – e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data. .
2 0.22986092 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
Author: Dingcheng Li ; Tim Miller ; William Schuler
Abstract: This paper presents a supervised pronoun anaphora resolution system based on factorial hidden Markov models (FHMMs). The basic idea is that the hidden states of FHMMs are an explicit short-term memory with an antecedent buffer containing recently described referents. Thus an observed pronoun can find its antecedent from the hidden buffer, or in terms of a generative model, the entries in the hidden buffer generate the corresponding pronouns. A system implementing this model is evaluated on the ACE corpus with promising performance.
3 0.18138067 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
4 0.18010348 9 acl-2011-A Cross-Lingual ILP Solution to Zero Anaphora Resolution
Author: Ryu Iida ; Massimo Poesio
Abstract: We present an ILP-based model of zero anaphora detection and resolution that builds on the joint determination of anaphoricity and coreference model proposed by Denis and Baldridge (2007), but revises it and extends it into a three-way ILP problem also incorporating subject detection. We show that this new model outperforms several baselines and competing models, as well as a direct translation of the Denis / Baldridge model, for both Italian and Japanese zero anaphora. We incorporate our model in complete anaphoric resolvers for both Italian and Japanese, showing that our approach leads to improved performance also when not used in isolation, provided that separate classifiers are used for zeros and for explicitly realized anaphors.
5 0.15935294 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
6 0.15847063 85 acl-2011-Coreference Resolution with World Knowledge
7 0.13438725 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
8 0.11701045 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
9 0.071216054 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
10 0.068709269 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
11 0.066094212 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
12 0.065450139 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
13 0.065141469 293 acl-2011-Template-Based Information Extraction without the Templates
14 0.064484574 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
15 0.062783182 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
16 0.060786925 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
17 0.057924259 334 acl-2011-Which Noun Phrases Denote Which Concepts?
18 0.057562795 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
19 0.055450089 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
20 0.05420142 8 acl-2011-A Corpus of Scope-disambiguated English Text
topicId topicWeight
[(0, 0.164), (1, 0.035), (2, -0.109), (3, -0.007), (4, 0.081), (5, 0.033), (6, 0.049), (7, -0.056), (8, -0.201), (9, 0.037), (10, 0.072), (11, -0.032), (12, -0.072), (13, -0.049), (14, 0.015), (15, 0.049), (16, -0.055), (17, 0.039), (18, 0.033), (19, 0.04), (20, -0.057), (21, 0.006), (22, 0.03), (23, 0.116), (24, -0.064), (25, 0.025), (26, -0.051), (27, -0.075), (28, -0.108), (29, -0.18), (30, 0.102), (31, -0.112), (32, -0.055), (33, -0.059), (34, -0.117), (35, -0.051), (36, 0.037), (37, 0.005), (38, 0.108), (39, 0.017), (40, 0.146), (41, -0.011), (42, -0.019), (43, 0.03), (44, 0.087), (45, -0.01), (46, -0.02), (47, -0.07), (48, -0.014), (49, -0.086)]
simIndex simValue paperId paperTitle
1 0.90815598 9 acl-2011-A Cross-Lingual ILP Solution to Zero Anaphora Resolution
Author: Ryu Iida ; Massimo Poesio
Abstract: We present an ILP-based model of zero anaphora detection and resolution that builds on the joint determination of anaphoricity and coreference model proposed by Denis and Baldridge (2007), but revises it and extends it into a three-way ILP problem also incorporating subject detection. We show that this new model outperforms several baselines and competing models, as well as a direct translation of the Denis / Baldridge model, for both Italian and Japanese zero anaphora. We incorporate our model in complete anaphoric resolvers for both Italian and Japanese, showing that our approach leads to improved performance also when not used in isolation, provided that separate classifiers are used for zeros and for explicitly realized anaphors.
same-paper 2 0.90012509 63 acl-2011-Bootstrapping coreference resolution using word associations
Author: Hamidreza Kobdani ; Hinrich Schuetze ; Michael Schiehlen ; Hans Kamp
Abstract: In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe – e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data. .
3 0.87405759 85 acl-2011-Coreference Resolution with World Knowledge
Author: Altaf Rahman ; Vincent Ng
Abstract: While world knowledge has been shown to improve learning-based coreference resolvers, the improvements were typically obtained by incorporating world knowledge into a fairly weak baseline resolver. Hence, it is not clear whether these benefits can carry over to a stronger baseline. Moreover, since there has been no attempt to apply different sources of world knowledge in combination to coreference resolution, it is not clear whether they offer complementary benefits to a resolver. We systematically compare commonly-used and under-investigated sources of world knowledge for coreference resolution by applying them to two learning-based coreference models and evaluating them on documents annotated with two different annotation schemes.
4 0.84567809 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
Author: Dingcheng Li ; Tim Miller ; William Schuler
Abstract: This paper presents a supervised pronoun anaphora resolution system based on factorial hidden Markov models (FHMMs). The basic idea is that the hidden states of FHMMs are an explicit short-term memory with an antecedent buffer containing recently described referents. Thus an observed pronoun can find its antecedent from the hidden buffer, or in terms of a generative model, the entries in the hidden buffer generate the corresponding pronouns. A system implementing this model is evaluated on the ACE corpus with promising performance.
5 0.69398832 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
6 0.47055903 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
7 0.46921247 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
8 0.38722262 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
9 0.33776906 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
10 0.32181418 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
11 0.31998876 8 acl-2011-A Corpus of Scope-disambiguated English Text
12 0.31064996 297 acl-2011-That's What She Said: Double Entendre Identification
13 0.3062869 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
14 0.30341473 334 acl-2011-Which Noun Phrases Denote Which Concepts?
15 0.30097002 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
16 0.30054724 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
17 0.29812002 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
18 0.29242298 199 acl-2011-Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning
19 0.28947023 293 acl-2011-Template-Based Information Extraction without the Templates
20 0.28568855 194 acl-2011-Language Use: What can it tell us?
topicId topicWeight
[(5, 0.035), (13, 0.264), (17, 0.055), (26, 0.021), (37, 0.13), (39, 0.04), (41, 0.059), (55, 0.021), (59, 0.04), (72, 0.045), (91, 0.036), (96, 0.135)]
simIndex simValue paperId paperTitle
1 0.86916918 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
Author: Balaji Soundrarajan ; Thomas Ginter ; Scott DuVall
Abstract: This demonstration presents the Annotation Librarian, an application programming interface that supports rapid development of natural language processing (NLP) projects built in Apache Unstructured Information Management Architecture (UIMA). The flexibility of UIMA to support all types of unstructured data – images, audio, and text – increases the complexity of some of the most common NLP development tasks. The Annotation Librarian interface handles these common functions and allows the creation and management of annotations by mirroring Java methods used to manipulate Strings. The familiar syntax and NLP-centric design allows developers to adopt and rapidly develop NLP algorithms in UIMA. The general functionality of the interface is described in relation to the use cases that necessitated its creation. 1
same-paper 2 0.75725377 63 acl-2011-Bootstrapping coreference resolution using word associations
Author: Hamidreza Kobdani ; Hinrich Schuetze ; Michael Schiehlen ; Hans Kamp
Abstract: In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe – e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data. .
3 0.68552929 11 acl-2011-A Fast and Accurate Method for Approximate String Search
Author: Ziqi Wang ; Gu Xu ; Hang Li ; Ming Zhang
Abstract: This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction, which is a task as follows. Given a misspelled word, the system finds words in a dictionary, which are most “similar” to the misspelled word. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for finding the top k candidates. The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word. The learning method employs the criterion in candidate generation as loss function. The retrieval algorithm is efficient and is guaranteed to find the optimal k candidates. Experimental results on large scale data show that the proposed approach improves upon existing methods in terms of accuracy in different settings.
4 0.65639889 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
Author: Houda Bouamor ; Aurelien Max ; Anne Vilnat
Abstract: In this paper, we present a novel way of tackling the monolingual alignment problem on pairs of sentential paraphrases by means of edit rate computation. In order to inform the edit rate, information in the form of subsentential paraphrases is provided by a range of techniques built for different purposes. We show that the tunable TER-PLUS metric from Machine Translation evaluation can achieve good performance on this task and that it can effectively exploit information coming from complementary sources.
5 0.63234103 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
6 0.62843513 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
7 0.62674141 292 acl-2011-Target-dependent Twitter Sentiment Classification
8 0.6251294 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
9 0.62485176 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
10 0.62291366 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
11 0.62231112 311 acl-2011-Translationese and Its Dialects
12 0.62211251 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
13 0.62163782 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts
14 0.6211617 85 acl-2011-Coreference Resolution with World Knowledge
15 0.62062836 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
16 0.61913431 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
17 0.61912477 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
18 0.61892653 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
19 0.61815643 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
20 0.61790645 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation