acl acl2011 acl2011-334 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: Resolving polysemy and synonymy is required for high-quality information extraction. We present ConceptResolver, a component for the Never-Ending Language Learner (NELL) (Carlson et al., 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. Domain knowledge (the ontology) guides concept creation by defining a set of possible semantic types for concepts. Word sense induction is performed by inferring a set of semantic types for each noun phrase. Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. When ConceptResolver is run on NELL’s knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms.
Reference: text
sentIndex sentText sentNum sentScore
1 , 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. [sent-6, score-0.476]
2 ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. [sent-7, score-0.713]
3 Word sense induction is performed by inferring a set of semantic types for each noun phrase. [sent-9, score-0.341]
4 When ConceptResolver is run on NELL’s knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms. [sent-11, score-0.707]
5 A major limitation of many of these systems is that they fail to distinguish between noun phrases and the underlying concepts they refer to. [sent-17, score-0.476]
6 As a result, a polysemous phrase like “apple” will refer sometimes to the concept Apple Computer (the company), and other times to the concept apple (the fruit). [sent-18, score-0.341]
7 Furthermore, two synonymous noun phrases like “apple” and “Apple Computer” ... [sent-19, score-0.316]
8 Arrows indicate which noun phrases can refer to which concepts. [sent-23, score-0.289]
9 (figure excerpt) [... cellular] [j c penney, j c penny] [nielsen media research, nielsen company] [universal studios, universal music group, universal] [amr corporation, amr] [intel corp, intel corp.] [sent-27, score-0.338]
10 The result of ignoring this many-to-many mapping between noun phrases and underlying concepts (see Figure 1) is confusion about the meaning of extracted information. [sent-33, score-0.455]
11 To minimize such confusion, a system must separately represent noun phrases, the underlying concepts to which they can refer, and the many-to-many “can refer to” relation between them. [sent-34, score-0.448]
12 ... the system extracts the relation ceoOf(x1, x2) between the noun phrases x1 and x2. [sent-38, score-0.289]
13 The correct interpretation of this extracted relation is that there exist concepts c1 and c2 such that x1 can refer to c1, x2 can refer to c2, and ceoOf(c1 , c2). [sent-39, score-0.362]
14 We define concept discovery as the problem of (1) identifying concepts like c1 and c2 from extracted predicates like ceoOf(x1 , x2) and (2) mapping noun phrases like x1, x2 to the concepts they can refer to. [sent-42, score-0.894]
15 The main input to ConceptResolver is a set of extracted category and relation instances over noun phrases, like person(x1) and ceoOf(x1 , x2), produced by running NELL. [sent-43, score-0.365]
16 ..., cn}, and a mapping from each noun phrase in the input to the set of concepts it can refer to. [sent-48, score-0.435]
17 Like many other systems (Miller, 1995; Yates and Etzioni, 2007; Lin and Pantel, 2002), ConceptResolver represents each output concept ci as a set of synonymous noun phrases. [sent-49, score-0.342]
18 For example, Figure 2 shows several concepts output by ConceptResolver; each concept clearly reveals which noun phrases can refer to it. [sent-55, score-0.579]
19 Each concept also has a semantic type that corresponds to a category in ConceptResolver’s ontology; for instance, the first 10 concepts in Figure 2 belong to the category company. [sent-56, score-0.513]
20 Previous approaches to concept discovery use little prior knowledge, clustering noun phrases based on co-occurrence statistics (Pantel and Lin, 2002). [sent-57, score-0.444]
21 The ontology contains a schema for the relation and category predicates found in the input instances, including properties of predicates like type restrictions on their domains and ranges. [sent-60, score-0.441]
22 The category predicates are used to assign semantic types to each concept, and the properties of relation predicates are used to create evidence for synonym resolution. [sent-61, score-0.464]
23 It first performs word sense induction, using the extracted category instances to create one or more unambiguous word senses for each noun phrase in the knowledge base. [sent-65, score-0.77]
24 Each word sense is a copy of the original noun phrase paired with a semantic type (a category) that restricts the concepts it can refer to. [sent-66, score-0.558]
25 ConceptResolver then performs synonym resolution on these word senses. [sent-67, score-0.289]
26 This step treats the senses of each semantic type independently, first training a synonym classifier and then clustering the senses based on the classifier’s decisions. [sent-68, score-0.823]
27 The evaluation shows that, on average, 87% of the word senses created by ConceptResolver correspond to real-world concepts. [sent-72, score-0.291]
28 We additionally find that, on average, 85% of the noun phrases in each concept refer to the same real-world entity. [sent-73, score-0.392]
29 2 Prior Work Previous work on concept discovery has focused on the subproblems of word sense induction and synonym resolution. [sent-74, score-0.491]
30 , 2010), all of the submissions in 2007 created senses by clustering the contexts each word occurs in, and the 2010 event explicitly disallowed the use of external resources like ontologies. [sent-77, score-0.36]
31 Other systems cluster words to find both word senses and concepts (Pantel and Lin, 2002; Lin and Pantel, 2002). [sent-78, score-0.485]
32 Synonym resolution on relations extracted from web text has been previously studied by Resolver (Yates and Etzioni, 2007), which finds synonyms in relation triples extracted by TextRunner (Banko et al. [sent-85, score-0.326]
33 However, our evaluation shows that ConceptResolver has higher synonym resolution precision than Resolver, which we attribute to our semi-supervised approach and the known relation schema. [sent-89, score-0.366]
34 Other synonym resolution work is fully supervised (Singla and Domingos, 2006; McCallum and Wellner, 2004; Snow et al. [sent-93, score-0.289]
35 The difference between the two problems is that coreference resolution finds noun phrases that refer to the same concept within a specific document. [sent-106, score-0.539]
36 We think the concepts produced by a system like ConceptResolver could be used to improve coreference resolution by providing prior knowledge about noun phrases that can refer to the same concept. [sent-107, score-0.65]
37 This knowledge could be especially helpful for crossdocument coreference resolution systems (Haghighi and Klein, 2010), which actually represent concepts and track mentions of them across documents. [sent-108, score-0.361]
38 As in other information extraction systems, the category and relation instances extracted by NELL contain polysemous and synonymous noun phrases. [sent-137, score-0.463]
39 It uses a two-step procedure, first creating one or more senses for each noun phrase, then clustering synonymous senses to create concepts. [sent-140, score-0.836]
40 Word Sense Induction: ConceptResolver induces word senses using a simple assumption about noun phrases and concepts. [sent-142, score-0.504]
41 If a noun phrase has multiple senses, the senses should be distinguishable from context. [sent-143, score-0.463]
42 As the category predicates in NELL’s ontology define a set of possible semantic types, this assumption is equivalent to the one-sense-per-category assumption: a noun phrase refers to at most one concept in each category of NELL’s ontology. [sent-147, score-0.706]
43 For example, this means that a noun phrase can refer to a company and a fruit, but not multiple companies. [sent-148, score-0.333]
44 Each word sense is represented as a tuple containing a noun phrase and a category. [sent-150, score-0.301]
45 In synonym resolution, the category acts like a type constraint, and only senses with the same category type can be synonymous. [sent-151, score-0.686]
46 To create senses, the system interprets each extracted category predicate c(x) as evidence that category c contains a concept denoted by noun phrase x. [sent-152, score-0.532]
47 Sense induction creates two senses for “apple”: (“apple”, company) and (“apple”, fruit). [sent-155, score-0.364]
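As a concrete illustration of this step, here is a minimal sketch (illustrative Python, not the authors' code; the data structures are assumptions): each extracted category instance c(x) yields a (noun phrase, category) sense tuple, and a noun phrase gets at most one sense per category.

    from collections import defaultdict

    def induce_senses(category_instances):
        # category_instances: iterable of (category, noun_phrase) pairs, e.g. ("company", "apple")
        senses = defaultdict(set)
        for category, noun_phrase in category_instances:
            # one-sense-per-category: the set keeps a single (phrase, category) sense per pair
            senses[noun_phrase].add((noun_phrase, category))
        return senses

    # "apple" extracted as both a company and a fruit yields the two senses mentioned above
    senses = induce_senses([("company", "apple"), ("fruit", "apple")])
    # senses["apple"] == {("apple", "company"), ("apple", "fruit")}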
48 The second step of sense induction produces evidence for synonym resolution by creating relations between word senses. [sent-156, score-0.529]
49 This process is effective because the relations in the ontology have restrictive domains and ranges, so only a small fraction of sense pairs satisfy the argument type restrictions. [sent-161, score-0.343]
50 It is also not vital that this mapping be perfect, as the sense relations are only used as evidence for synonym resolution. [sent-162, score-0.342]
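A rough sketch of how relations over noun phrases might be lifted to relations over senses using the ontology's domain and range type restrictions (the schema format and helper names are assumptions for illustration):

    def lift_relations(np_relations, schema, senses):
        # np_relations: (relation, x1, x2) triples over noun phrases
        # schema: relation -> (domain_category, range_category)
        # senses: noun phrase -> set of (noun_phrase, category) senses
        sense_relations = []
        for rel, x1, x2 in np_relations:
            domain_cat, range_cat = schema[rel]
            for s1 in senses.get(x1, ()):
                for s2 in senses.get(x2, ()):
                    # keep only sense pairs that satisfy the argument type restrictions
                    if s1[1] == domain_cat and s2[1] == range_cat:
                        sense_relations.append((rel, s1, s2))
        return sense_relations

With ceoOf restricted to a person domain and a company range, for instance, a ceoOf relation extracted between "steve jobs" and "apple" would attach only to the company sense of "apple".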
51 The final output of sense induction is a sense-disambiguated knowledge base, where each noun phrase has been converted into one or more word senses, and relations hold between pairs of senses. [sent-163, score-0.466]
52 2 Synonym Resolution After mapping each noun phrase to one or more senses (each with a distinct category type), ConceptResolver performs semi-supervised clustering to find synonymous senses. [sent-165, score-0.709]
53 As only senses with the same category type can be synonymous, our synonym resolution algorithm treats senses of each type independently. [sent-166, score-0.96]
54 Our key insight is that semantic relations and string attributes provide independent views of the data: we can predict that two noun phrases are synonymous either based on the similarity of their text strings, or based on similarity in the relations NELL has extracted about them. [sent-168, score-0.558]
55 Co-Training the Synonym Classifier: For each category, ConceptResolver co-trains a pair of synonym classifiers using a handful of labeled synonymous senses and a large number of automatically created unlabeled sense pairs. [sent-173, score-0.669]
56 Ideally, U would contain all pairs of senses in the category, but this set grows quadratically in category size. [sent-185, score-0.365]
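The following is a rough Blum-and-Mitchell-style co-training loop, sketched for illustration only (scikit-learn classifiers stand in for the actual learners, and the two feature views, string similarity and extracted relations, are assumed to be provided as functions):

    from sklearn.linear_model import LogisticRegression

    def co_train(labeled, unlabeled, string_view, relation_view, rounds=10, k=5):
        # labeled: list of (sense_pair, label) with both classes present
        # unlabeled: list of sense pairs; string_view / relation_view map a pair to a feature vector
        clf_s, clf_r = LogisticRegression(), LogisticRegression()
        train_s, train_r, pool = list(labeled), list(labeled), list(unlabeled)
        for _ in range(rounds):
            clf_s.fit([string_view(x) for x, y in train_s], [y for x, y in train_s])
            clf_r.fit([relation_view(x) for x, y in train_r], [y for x, y in train_r])
            if not pool:
                break
            # each classifier labels its k most confident pairs for the other view
            by_s = sorted(pool, key=lambda x: max(clf_s.predict_proba([string_view(x)])[0]), reverse=True)[:k]
            by_r = sorted(pool, key=lambda x: max(clf_r.predict_proba([relation_view(x)])[0]), reverse=True)[:k]
            train_r += [(x, clf_s.predict([string_view(x)])[0]) for x in by_s]
            train_s += [(x, clf_r.predict([relation_view(x)])[0]) for x in by_r]
            pool = [x for x in pool if x not in by_s and x not in by_r]
        return clf_s, clf_r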
57 Agglomerative Clustering: The second step of our algorithm runs agglomerative clustering to enforce transitivity constraints on the predictions of the co-trained synonym classifier. [sent-224, score-0.31]
58 The algorithm is essentially bottom-up agglomerative clustering of word senses using a similarity score derived from P(Y |X1, X2). [sent-232, score-0.409]
59 The similarity score for two senses is defined as: log [P(Y = 1 | X1, X2) P(Y = 0)] / [P(Y = 0 | X1, X2) P(Y = 1)]. The similarity score for two clusters is the sum of the similarity scores for all pairs of senses. [sent-233, score-0.363]
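A minimal numeric sketch of this clustering step (illustrative Python, not the authors' implementation; the pairwise probability function p, the prior, and the stopping threshold are assumed inputs):

    import math
    from itertools import combinations

    def pair_similarity(p_syn, prior_syn):
        # log [P(Y=1|X1,X2) * P(Y=0)] / [P(Y=0|X1,X2) * P(Y=1)], as in the score above
        return math.log(((1 - prior_syn) * p_syn) / (prior_syn * (1 - p_syn)))

    def cluster_similarity(c1, c2, p, prior_syn):
        # sum of pairwise similarities over all cross-cluster sense pairs
        return sum(pair_similarity(p(s1, s2), prior_syn) for s1 in c1 for s2 in c2)

    def agglomerative_cluster(senses, p, prior_syn, threshold=0.0):
        clusters = [[s] for s in senses]
        while len(clusters) > 1:
            # find the best pair of clusters to merge
            score, i, j = max((cluster_similarity(a, b, p, prior_syn), i, j)
                              for (i, a), (j, b) in combinations(enumerate(clusters), 2))
            if score <= threshold:
                break
            clusters[i].extend(clusters.pop(j))
        return clusters

The greedy merge stops once no pair of clusters scores above the threshold; the surviving clusters are the concepts for the category.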
60 The clusters of word senses produced by this process are the concepts for each category. [sent-235, score-0.451]
61 The second experiment evaluates synonym resolution by comparing ConceptResolver’s sense clusters to a gold standard clustering. [sent-238, score-0.442]
62 We preprocessed this knowledge base by removing all noun phrases with zero extracted relations. [sent-240, score-0.331]
63 As ConceptResolver treats the instances of each category predicate independently, we chose 7 categories from NELL’s ontology to use in the evaluation. [sent-241, score-0.308]
64 The number of noun phrases in each category is shown in Table 2. [sent-243, score-0.341]
65 We manually labeled 10 pairs of synonymous senses for each of these categories. [sent-244, score-0.362]
66 We estimate two quantities: (1) sense precision, the fraction of senses created by our system that correspond to real-world entities, and (2) sense recall, the fraction of real-world entities that ConceptResolver creates senses for. [sent-248, score-0.783]
67 Sense recall is only measured over entities which are represented by a noun phrase in ConceptResolver’s input assertions; it is a measure of ConceptResolver’s ability to create senses for the noun phrases it is given. [sent-249, score-0.762]
68 Sense precision is directly determined by how frequently NELL’s extractors propose correct senses for noun phrases, while sense recall is related to the correctness of the one-sense-per-category assumption. [sent-250, score-0.602]
69 Precision and recall were evaluated by comparing the senses created by ConceptResolver to concepts in Freebase (Bollacker et al. [sent-251, score-0.503]
70 We sampled 100 noun phrases from each category and matched each noun phrase to a set of Freebase concepts. [sent-253, score-0.54]
71 We interpret each matching Freebase concept as a sense of the noun phrase. [sent-254, score-0.368]
72 To align ConceptResolver’s senses with Freebase, we first matched each of our categories with a set of similar Freebase categories. [sent-256, score-0.313]
73 We then used a combination of Freebase’s search API and Mechanical Turk to align noun phrases with Freebase concepts: we searched for the noun phrase in Freebase, then had Mechanical Turk workers label which of the top 10 resulting Freebase concepts the noun phrase could refer to. (In Freebase, concepts are called Topics and categories are called Types.) [sent-257, score-0.675]
74 (Figure 5: Empirical distribution of the number of Freebase concepts per noun phrase in each category.) [sent-259, score-0.958]
75 After obtaining the list of matching Freebase concepts for each noun phrase, we computed sense precision as the number of noun phrases matching ≥ 1 Freebase concept divided by 100, the total number of noun phrases. [sent-260, score-0.986]
76 Sense recall is the reciprocal of the average number of Freebase concepts per noun phrase. [sent-261, score-0.35]
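Under one plausible reading of these two definitions (averaging over the sampled noun phrases that matched at least one concept; the paper's exact averaging may differ), the computation looks like this:

    def sense_precision_recall(freebase_match_counts):
        # freebase_match_counts: noun phrase -> number of matching Freebase concepts,
        # over the 100 noun phrases sampled per category
        n = len(freebase_match_counts)
        matched = [c for c in freebase_match_counts.values() if c >= 1]
        if not matched:
            return 0.0, 0.0
        precision = len(matched) / n          # fraction matching >= 1 concept
        recall = len(matched) / sum(matched)  # 1 / (avg. concepts per noun phrase)
        return precision, recall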
77 Other categories have almost 4 senses per noun phrase. [sent-266, score-0.427]
78 Figure 5 shows the distribution of the number of concepts per noun phrase in each category. [sent-268, score-0.386]
79 The distribution shows that most noun phrases are unambiguous, but a small number of noun phrases have a large number of senses. [sent-269, score-0.48]
80 An important footnote to this evaluation is that the categories in NELL’s ontology are somewhat arbitrary, and that creating subcategories would improve sense recall. [sent-274, score-0.314]
81 Synonym Resolution Evaluation: Our second experiment evaluates synonym resolution by comparing the concepts created by ConceptResolver to a gold standard set of concepts. [sent-280, score-0.554]
82 Specifically, the gold standard clustering contains noun phrases that refer to multiple concepts within the same category. [sent-282, score-0.57]
83 The first measure is the precision and recall of pairwise synonym decisions, typically known as cluster precision and recall. [sent-286, score-0.293]
84 The Resolver metric aligns each proposed cluster containing ≥ 2 senses with a gold standard cluster (i.e. [sent-289, score-0.323]
85 , a real-world concept) by selecting the cluster that a plurality of the senses in the proposed cluster refer to. [sent-291, score-0.381]
86 Precision is then the fraction of senses in the proposed cluster which are also in the gold standard cluster; recall is computed analogously by swapping the roles of the proposed and gold standard clusters. [sent-292, score-0.373]
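A small sketch of this alignment metric for a single proposed cluster (illustrative only; gold_of is an assumed mapping from each gold-standard sense to its gold cluster id):

    from collections import Counter

    def resolver_precision_recall(proposed_cluster, gold_of):
        gold_ids = [gold_of[s] for s in proposed_cluster if s in gold_of]
        if len(proposed_cluster) < 2 or not gold_ids:
            return None  # only clusters with >= 2 senses are scored
        best_gold, overlap = Counter(gold_ids).most_common(1)[0]  # plurality alignment
        gold_size = sum(1 for g in gold_of.values() if g == best_gold)
        precision = overlap / len(proposed_cluster)
        recall = overlap / gold_size
        return precision, recall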
87 Incorrect senses were removed from the data set before evaluating precision; however, these senses may still affect performance by influencing the clustering process. [sent-294, score-0.597]
88 Precision was evaluated by sampling 100 random concepts proposed by ConceptResolver, then manually scoring each concept using both of the metrics above. [sent-295, score-0.29]
89 To create this set, we randomly sampled noun phrases from each category and manually matched each noun phrase to one or more real-world entities. [sent-298, score-0.54]
90 We then found other noun phrases which referred to each entity and created a concept for each entity with at least one unambiguous reference. [sent-299, score-0.395]
91 This process can create multiple senses for a noun phrase, depending on the real-world entities represented in the input assertions. [sent-300, score-0.427]
92 We only included concepts containing at least 2 senses in the test set, as singleton concepts do not contribute to either recall metric. [sent-301, score-0.663]
93 We note that the synonym resolution portion of ConceptResolver is tuned for precision, and that perfect recall is not necessarily attainable. [sent-314, score-0.314]
94 Discussion: In order for information extraction systems to accurately represent knowledge, they must represent noun phrases, concepts, and the many-to-many mapping from noun phrases to concepts they denote. [sent-317, score-0.59]
95 We present ConceptResolver, a system which takes extracted relations between noun phrases and identifies latent concepts that the noun phrases refer to. [sent-318, score-0.806]
96 Two lessons from ConceptResolver are that (1) ontologies aid word sense induction, as the senses of polysemous words tend to have distinct semantic types, and (2) redundant information, in the form of string similarity and extracted relations, helps train accurate synonym classifiers. [sent-319, score-0.627]
97 Defining finer-grained categories will improve performance at word sense induction, as more precise categories will contain fewer ambiguous noun phrases. [sent-321, score-0.363]
98 Both extracting more relation instances and adding new relations to the ontology will improve synonym resolution. [sent-322, score-0.447]
99 However, the ontology contains compatible categories like male and politician, where a single concept can belong to both categories. [sent-326, score-0.306]
100 We currently address this problem with a heuristic post-processing step: we merge all pairs of concepts that belong to compatible categories and share at least one referring noun phrase. [sent-328, score-0.399]
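A sketch of this post-processing heuristic (illustrative; the concept and compatibility representations are assumptions):

    def merge_compatible(concepts, compatible_pairs):
        # concepts: list of (category, set_of_noun_phrases)
        # compatible_pairs: set of frozensets of compatible category pairs,
        # e.g. {frozenset({"male", "politician"})}
        merged = True
        while merged:
            merged = False
            for i in range(len(concepts)):
                for j in range(i + 1, len(concepts)):
                    (c1, nps1), (c2, nps2) = concepts[i], concepts[j]
                    if frozenset((c1, c2)) in compatible_pairs and nps1 & nps2:
                        concepts[i] = (c1, nps1 | nps2)  # merge the two concepts' noun phrases
                        del concepts[j]
                        merged = True
                        break
                if merged:
                    break
        return concepts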
wordName wordTfidf (topN-words)
[('conceptresolver', 0.698), ('nell', 0.303), ('senses', 0.264), ('concepts', 0.187), ('synonym', 0.178), ('noun', 0.163), ('freebase', 0.146), ('ontology', 0.134), ('apple', 0.131), ('resolution', 0.111), ('concept', 0.103), ('sense', 0.102), ('category', 0.101), ('company', 0.085), ('phrases', 0.077), ('induction', 0.076), ('synonymous', 0.076), ('clustering', 0.069), ('predicates', 0.068), ('relations', 0.062), ('fruit', 0.059), ('ceoof', 0.055), ('resolver', 0.052), ('categories', 0.049), ('refer', 0.049), ('relation', 0.049), ('synonyms', 0.048), ('bhattacharya', 0.044), ('ceoofcompany', 0.044), ('agglomerative', 0.043), ('coreference', 0.036), ('snow', 0.036), ('phrase', 0.036), ('base', 0.036), ('assertions', 0.034), ('cluster', 0.034), ('athlete', 0.033), ('inte', 0.033), ('kaspersky', 0.033), ('ravikumar', 0.033), ('univers', 0.033), ('carlson', 0.033), ('similarity', 0.033), ('discovery', 0.032), ('ceo', 0.032), ('jobs', 0.031), ('subcategories', 0.029), ('getoor', 0.029), ('precision', 0.028), ('extracted', 0.028), ('knowledge', 0.027), ('classifier', 0.027), ('created', 0.027), ('poon', 0.026), ('evaluates', 0.026), ('recall', 0.025), ('cohen', 0.025), ('yates', 0.025), ('unambiguous', 0.025), ('gold', 0.025), ('pantel', 0.024), ('argument', 0.024), ('synonymy', 0.024), ('steve', 0.024), ('creates', 0.024), ('instances', 0.024), ('rr', 0.024), ('views', 0.024), ('record', 0.023), ('cmu', 0.022), ('andruw', 0.022), ('basu', 0.022), ('careerbui', 0.022), ('chri', 0.022), ('clemente', 0.022), ('communi', 0.022), ('corp', 0.022), ('corporat', 0.022), ('fellegi', 0.022), ('franci', 0.022), ('indrajit', 0.022), ('lder', 0.022), ('losman', 0.022), ('lul', 0.022), ('monge', 0.022), ('pradeep', 0.022), ('rtw', 0.022), ('singla', 0.022), ('tsheen', 0.022), ('lise', 0.022), ('polysemous', 0.022), ('labeled', 0.022), ('abbreviation', 0.021), ('type', 0.021), ('blum', 0.021), ('predictions', 0.02), ('male', 0.02), ('extractors', 0.02), ('mining', 0.02), ('domingos', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 334 acl-2011-Which Noun Phrases Denote Which Concepts?
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: Resolving polysemy and synonymy is required for high-quality information extraction. We present ConceptResolver, a component for the Never-Ending Language Learner (NELL) (Carlson et al., 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. Domain knowledge (the ontology) guides concept creation by defining a set of possible semantic types for concepts. Word sense induction is performed by inferring a set of semantic types for each noun phrase. Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. When ConceptResolver is run on NELL’s knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms.
2 0.20423689 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
Author: Tim Van de Cruys ; Marianna Apidianaki
Abstract: In this paper, we present a unified model for the automatic induction of word senses from text, and the subsequent disambiguation of particular word instances using the automatically extracted sense inventory. The induction step and the disambiguation step are based on the same principle: words and contexts are mapped to a limited number of topical dimensions in a latent semantic word space. The intuition is that a particular sense is associated with a particular topic, so that different senses can be discriminated through their association with particular topical dimensions; in a similar vein, a particular instance of a word can be disambiguated by determining its most important topical dimensions. The model is evaluated on the SEMEVAL-2010 word sense induction and disambiguation task, on which it reaches state-of-the-art results.
3 0.19005263 158 acl-2011-Identification of Domain-Specific Senses in a Machine-Readable Dictionary
Author: Fumiyo Fukumoto ; Yoshimi Suzuki
Abstract: This paper focuses on domain-specific senses and presents a method for assigning category/domain label to each sense of words in a dictionary. The method first identifies each sense of a word in the dictionary to its corresponding category. We used a text classification technique to select appropriate senses for each domain. Then, senses were scored by computing the rank scores. We used Markov Random Walk (MRW) model. The method was tested on English and Japanese resources, WordNet 3.0 and EDR Japanese dictionary. For evaluation of the method, we compared English results with the Subject Field Codes (SFC) resources. We also compared each English and Japanese results to the first sense heuristics in the WSD task. These results suggest that identification of domain-specific senses (IDSS) may actually be of benefit.
4 0.15692212 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics
Author: Christian Rohrdantz ; Annette Hautli ; Thomas Mayer ; Miriam Butt ; Daniel A. Keim ; Frans Plank
Abstract: This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing diachronic development in word contexts. Previous studies have shown that computational models are capable of clustering and disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, allowing for an interactive visual exploration of semantic change.
5 0.10905162 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld
Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
6 0.10126331 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
7 0.090717211 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
8 0.086483747 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
9 0.085584357 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
10 0.077511117 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
11 0.077389397 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
12 0.075107276 167 acl-2011-Improving Dependency Parsing with Semantic Classes
13 0.074036382 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
14 0.071463503 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
15 0.071219735 76 acl-2011-Comparative News Summarization Using Linear Programming
16 0.071134806 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
17 0.068503559 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
18 0.067073956 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
19 0.066464193 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
20 0.066196106 150 acl-2011-Hierarchical Text Classification with Latent Concepts
topicId topicWeight
[(0, 0.148), (1, 0.042), (2, -0.108), (3, -0.001), (4, 0.052), (5, -0.012), (6, 0.088), (7, 0.03), (8, -0.112), (9, -0.043), (10, 0.054), (11, -0.158), (12, 0.124), (13, 0.018), (14, -0.036), (15, -0.157), (16, 0.078), (17, 0.091), (18, -0.037), (19, 0.173), (20, -0.006), (21, -0.018), (22, 0.006), (23, 0.005), (24, -0.035), (25, 0.036), (26, 0.087), (27, -0.053), (28, -0.016), (29, 0.083), (30, -0.061), (31, 0.049), (32, 0.002), (33, 0.028), (34, -0.124), (35, -0.039), (36, -0.01), (37, -0.007), (38, 0.013), (39, 0.032), (40, 0.015), (41, 0.028), (42, 0.052), (43, -0.008), (44, 0.043), (45, -0.026), (46, -0.097), (47, 0.034), (48, 0.06), (49, -0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.9491809 334 acl-2011-Which Noun Phrases Denote Which Concepts?
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: Resolving polysemy and synonymy is required for high-quality information extraction. We present ConceptResolver, a component for the Never-Ending Language Learner (NELL) (Carlson et al., 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. Domain knowledge (the ontology) guides concept creation by defining a set of possible semantic types for concepts. Word sense induction is performed by inferring a set of semantic types for each noun phrase. Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. When ConceptResolver is run on NELL’s knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms.
2 0.82864642 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics
Author: Christian Rohrdantz ; Annette Hautli ; Thomas Mayer ; Miriam Butt ; Daniel A. Keim ; Frans Plank
Abstract: This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing diachronic development in word contexts. Previous studies have shown that computational models are capable of clustering and disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, allowing for an interactive visual exploration of semantic change.
3 0.8174125 158 acl-2011-Identification of Domain-Specific Senses in a Machine-Readable Dictionary
Author: Fumiyo Fukumoto ; Yoshimi Suzuki
Abstract: This paper focuses on domain-specific senses and presents a method for assigning category/domain label to each sense of words in a dictionary. The method first identifies each sense of a word in the dictionary to its corresponding category. We used a text classification technique to select appropriate senses for each domain. Then, senses were scored by computing the rank scores. We used Markov Random Walk (MRW) model. The method was tested on English and Japanese resources, WordNet 3.0 and EDR Japanese dictionary. For evaluation of the method, we compared English results with the Subject Field Codes (SFC) resources. We also compared each English and Japanese results to the first sense heuristics in the WSD task. These results suggest that identification of domain-specific senses (IDSS) may actually be of benefit.
4 0.79595041 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
Author: Tim Van de Cruys ; Marianna Apidianaki
Abstract: In this paper, we present a unified model for the automatic induction of word senses from text, and the subsequent disambiguation of particular word instances using the automatically extracted sense inventory. The induction step and the disambiguation step are based on the same principle: words and contexts are mapped to a limited number of topical dimensions in a latent semantic word space. The intuition is that a particular sense is associated with a particular topic, so that different senses can be discriminated through their association with particular topical dimensions; in a similar vein, a particular instance of a word can be disambiguated by determining its most important topical dimensions. The model is evaluated on the SEMEVAL-2010 word sense induction and disambiguation task, on which it reaches state-of-the-art results.
5 0.63623762 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
Author: Dirk Hovy ; Ashish Vaswani ; Stephen Tratz ; David Chiang ; Eduard Hovy
Abstract: We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-frequent-sense baseline.
6 0.56628966 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
7 0.4820196 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
8 0.46545759 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations
9 0.46333945 167 acl-2011-Improving Dependency Parsing with Semantic Classes
10 0.44075236 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
11 0.43235278 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
12 0.42929676 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
13 0.42567456 319 acl-2011-Unsupervised Decomposition of a Document into Authorial Components
14 0.40154365 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
15 0.40110242 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
16 0.38986343 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
17 0.38825768 150 acl-2011-Hierarchical Text Classification with Latent Concepts
18 0.36769605 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge
19 0.36762702 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
20 0.3666763 85 acl-2011-Coreference Resolution with World Knowledge
topicId topicWeight
[(5, 0.016), (17, 0.048), (26, 0.02), (37, 0.471), (39, 0.027), (41, 0.038), (55, 0.035), (59, 0.052), (72, 0.028), (91, 0.036), (96, 0.097), (97, 0.014)]
simIndex simValue paperId paperTitle
1 0.97396272 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
Author: Kevin Duh ; Akinori Fujino ; Masaaki Nagata
Abstract: Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior work have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about cross-lingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully-designed experiments that led us to these conclusions.
2 0.94436091 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
Author: Roy Schwartz ; Omri Abend ; Roi Reichart ; Ari Rappoport
Abstract: Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon.
3 0.94221407 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai
Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to wordto-word selectional preferences by using webscale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.
4 0.9411996 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
5 0.93086588 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
Author: Bing Xiang ; Abraham Ittycheriah
Abstract: In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between the maximum-likelihood training and the discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.
same-paper 6 0.92722613 334 acl-2011-Which Noun Phrases Denote Which Concepts?
7 0.92367506 122 acl-2011-Event Extraction as Dependency Parsing
8 0.92250609 204 acl-2011-Learning Word Vectors for Sentiment Analysis
10 0.8173424 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
11 0.81665587 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
12 0.81477505 256 acl-2011-Query Weighting for Ranking Model Adaptation
13 0.81173319 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
14 0.80707908 85 acl-2011-Coreference Resolution with World Knowledge
15 0.80420703 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
16 0.78486478 292 acl-2011-Target-dependent Twitter Sentiment Classification
17 0.78334463 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
18 0.78187084 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
19 0.78161252 199 acl-2011-Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning
20 0.7738775 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing