acl acl2010 acl2010-27 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller
Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.
Reference: text
sentIndex sentText sentNum sentScore
1 The user begins by specifying one or more seed words. [sent-8, score-0.56]
2 The system then iteratively suggests a series of candidate words, which the user can either accept or reject. [sent-9, score-0.256]
3 Current techniques for this task typically bootstrap a classifier based on a fixed seed set. [sent-10, score-0.459]
4 In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. [sent-11, score-0.372]
5 Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc. [sent-13, score-0.709]
6 1 Introduction Set expansion is a well-studied NLP problem where a machine-learning algorithm is given a fixed set of seed words and asked to find additional members of the implied set. [sent-16, score-0.669]
7 For example, given the seed set {“elephant,” “horse,” “bat”}, the algorithm is expected to return other mammals. [sent-17, score-0.424]
8 “US Presidents,” particularly when given a large seed set. [sent-24, score-0.424]
9 Set expansion is more difficult with fewer seed words and for other kinds of sets. [sent-25, score-0.567]
10 The seed words may have multiple senses and the user may have in mind a variety of attributes that the answer must match. [sent-26, score-0.629]
11 We propose a system which addresses several shortcomings of many set expansion systems. [sent-30, score-0.228]
12 First, as shown by Vyas et al. (2009), non-expert users produce seed sets that lead to poor-quality expansions, for a variety of reasons including ambiguity and lack of coverage. [sent-33, score-0.424]
13 Even for expert users, constructing seed sets can be a laborious and timeconsuming process. [sent-34, score-0.424]
14 Second, most set expansion systems do not use negative examples, which can be very useful for weeding out other bad answers. [sent-35, score-0.263]
15 Third, many set expansion systems concentrate on noun classes such as “US Presidents” and are not effective or do not apply to other kinds of sets. [sent-36, score-0.209]
16 The user initially thinks of at least one seed word belonging to the desired set. [sent-38, score-0.629]
17 One at a time, the system presents candidate words to the user and asks whether the candidate fits the concept. [sent-39, score-0.325]
18 Our system uses both positive and negative examples to guide the search, allowing it to recover from initially poor seed words. [sent-42, score-0.656]
19 By using multiple sources of similarity data, our system captures a variety of kinds of similarity. [sent-43, score-0.436]
20 Our system replaces the potentially difficult problem of thinking of many seed words with the easier task of answering yes/no questions. [sent-44, score-0.541]
21 The downside is a possibly increased amount of user interaction (although standard set expansion requires a non-trivial amount of user interaction to build the seed set). [sent-45, score-0.871]
22 Another interesting direction not pursued in this paper is using our system as part of a more traditional set expansion system to build seed sets more quickly. [sent-56, score-0.705]
23 2 Set Expansion As input, we are provided with a small set of seed words s. [sent-57, score-0.459]
24 A particular seed set s can belong to many possible goal sets G, so additional information may be required to do well. [sent-59, score-0.424]
25 Vyas et al. (2009) discuss the issue of seed set size in detail, concluding that 5-20 seed words are often required for good performance. [sent-63, score-0.883]
26 There are several problems with the fixed seed set approach. [sent-64, score-0.459]
27 It is not always easy to think of even a single additional seed word. [sent-65, score-0.499]
28 Even if the user can think of additional seed words, time and effort might be saved by using active learning to find good suggestions. [sent-68, score-0.784]
29 As Vyas et al. (2009) show, non-expert users often produce poor-quality seed sets. [sent-70, score-0.424]
30 3 Active Learning System Any system for this task relies on information about similarity between words. [sent-71, score-0.331]
31 Each row corresponds to a unique dimension of similarity; the jth entry in row i, mij, is a number between 0 and 1, indicating the degree to which wj belongs to the ith similarity group. [sent-74, score-0.905]
32 Possible similarity dimensions include “How similar is word wj to the verb jump?” [sent-75, score-0.483]
33 and “Are the words which appear in the context of wj similar to those that appear in the context of boat?” [sent-77, score-0.206]
34 This may follow intuitively from the similarity axis. [sent-79, score-0.278]
35 Thus, θi should be large and positive if row i has large entries for positive but not negative examples; and it should be large and negative if row i has large entries for negative but not positive examples. [sent-85, score-1.025]
36 A natural way to generate a score zj for column j is to take the dot product of θ with column j: zj = Σi θi·mij. [sent-90, score-0.273]
37 This rewards word wj for having high membership in rows with positive θ, and low membership in rows with negative θ. [sent-91, score-0.42]
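To make the scoring concrete, here is a minimal Python sketch (not the authors' code) of the dot-product score zj = Σi θi·mij over a toy similarity matrix; the words, rows, and weights are invented for illustration:

```python
import numpy as np

# Toy similarity matrix M: each row is a similarity dimension, each column a
# word; m[i][j] in [0, 1] says how strongly word j belongs to group i.
words = ["crash", "collision", "wreck", "banana"]
M = np.array([
    [1.0, 0.8, 0.7, 0.0],   # dimension: "similar to 'crash'"
    [0.0, 0.0, 0.1, 0.9],   # dimension: "similar to 'fruit'"
])

# Row weights: positive for dimensions matching positive examples,
# negative for dimensions matching negative examples.
theta = np.array([1.0, -1.0])

z = theta @ M   # z_j = sum_i theta_i * m_ij, one score per word
print(dict(zip(words, z.tolist())))
# crash and collision score highest; banana is pushed negative
```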
38 Our system uses a “batch” approach to active learning. [sent-92, score-0.236]
39 At iteration i, it chooses a new θ based on all data labeled so far (for the 1st iteration, this data consists of the seed set s). [sent-93, score-0.542]
40 It then chooses the column (word) with the highest score (among words not yet labeled) as the candidate word wi. [sent-94, score-0.207]
41 The user answers “Yes” or “No,” indicating whether or not wi belongs to G. [sent-95, score-0.224]
42 wi is added to the positive set p or the negative set n based on the user’s answer. [sent-96, score-0.194]
43 Thus, we have a labeled data set that grows from iteration to iteration as the user labels each candidate word. [sent-97, score-0.385]
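A compact sketch of this batch loop, assuming row weights come from the simple ±1 labeling described just below (a trained model could be substituted); the function names, the answer_fn callback, and the row_labels layout are invented for illustration:

```python
import numpy as np

def untrained_theta(row_labels, positive, negative):
    # "Untrained" weighting: +1 for rows whose label is a positive example,
    # -1 for rows whose label is a negative example, 0 for unlabeled rows.
    return np.array([1.0 if l in positive else -1.0 if l in negative else 0.0
                     for l in row_labels])

def active_learning_loop(M, row_labels, words, seeds, answer_fn, n_iters=10):
    # Batch active learning: reweight rows from all labels collected so far,
    # then ask the user about the highest-scoring unlabeled word.
    positive, negative = set(seeds), set()        # p starts as the seed set s
    labeled, accepted = set(seeds), []
    idx = {w: j for j, w in enumerate(words)}
    for _ in range(n_iters):
        theta = untrained_theta(row_labels, positive, negative)
        z = theta @ M                             # z_j = sum_i theta_i * m_ij
        candidate = max((w for w in words if w not in labeled),
                        key=lambda w: z[idx[w]])
        if answer_fn(candidate):                  # user answers "Yes"
            positive.add(candidate)
            accepted.append(candidate)
        else:                                     # user answers "No"
            negative.add(candidate)
        labeled.add(candidate)
    return accepted
```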
44 Recall that each row i is associated with a label li. [sent-100, score-0.218]
45 We set θi = +1 if li ∈ p and θi = −1 if li ∈ n; we refer to this method as “Untrained,” although it is still adaptive: it takes into account the labeled examples the user has provided so far. [sent-102, score-0.226]
46 However, zj is passed through the logistic function to produce a score between 0 and 1: zj′ = 1/(1 + e^(−zj)). [sent-105, score-0.255]
47 We can interpret this score as the probability that wj is a positive example, Pθ (Y |wj). [sent-106, score-0.261]
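A hedged sketch of the trained variant, using scikit-learn's LogisticRegression as a stand-in for the paper's trainer; treating each labeled word (a column of M) as a training example follows the setup above, but the function name, regularization strength, and solver here are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def trained_scores(M, pos_cols, neg_cols):
    # Each labeled word (a column of M) becomes one training example whose
    # features are its memberships in the similarity rows.
    X = M[:, list(pos_cols) + list(neg_cols)].T
    y = [1] * len(pos_cols) + [0] * len(neg_cols)
    clf = LogisticRegression(C=1.0)   # L2 regularization; strength is a guess
    clf.fit(X, y)
    # Score every word as P(positive | w_j) = 1 / (1 + exp(-theta . M[:, j])).
    return clf.predict_proba(M.T)[:, 1]
```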
48 4 Data Sources We consider three similarity data sources: the Moby thesaurus1 , WordNet (Fellbaum, 1998), and distributional similarity based on a large corpus of text (Lin, 1998). [sent-122, score-0.607]
49 These sources capture different kinds of similarity information, which increases the representational power of our system. [sent-124, score-0.383]
50 For all sources, the similarity of a word with itself is set to 1. [sent-125, score-0.312]
51 For example, if we have a list of luxury items, and another list of cars, our system can learn weights so that it prefers items in the intersection, luxury cars. [sent-128, score-0.283]
52 The Moby thesaurus consists of a list of word-based thesaurus entries. [sent-129, score-0.331]
53 In the raw format, the similarity relation is not symmetric; for example, there are many words that occur only in similarity lists but do not have their own entries. [sent-135, score-0.621]
54 We augmented the thesaurus to make it symmetric: if “dog” is in the similarity entry for “cat,” we add “cat” to the similarity entry for “dog” (creating an entry for “dog” if it does not exist yet). [sent-136, score-0.951]
55 We then have a row i for every similarity entry in the augmented thesaurus; mij is 1 if wj appears in the similarity list of wi, and 0 otherwise. [sent-137, score-1.058]
56 The entries are not broken down by word sense or part of speech. [sent-140, score-0.202]
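A small sketch of this augmentation and of building the resulting binary rows; the toy thesaurus entries are invented:

```python
import numpy as np

def symmetrize(thesaurus):
    # If "dog" is in the similarity entry for "cat", add "cat" to the entry
    # for "dog", creating an entry for "dog" if it does not exist yet.
    aug = {w: set(sims) for w, sims in thesaurus.items()}
    for head, sims in thesaurus.items():
        for w in sims:
            aug.setdefault(w, set()).add(head)
    return aug

moby = {"cat": {"dog", "feline"}}       # toy raw thesaurus
aug = symmetrize(moby)                  # "dog" and "feline" gain entries

# One row per augmented entry; m_ij = 1 if w_j appears in entry i,
# and every word is fully similar to itself.
vocab = sorted(set(aug) | {w for sims in aug.values() for w in sims})
col = {w: j for j, w in enumerate(vocab)}
M = np.zeros((len(aug), len(vocab)))
for i, (head, sims) in enumerate(sorted(aug.items())):
    M[i, col[head]] = 1.0
    for w in sims:
        M[i, col[w]] = 1.0
```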
57 We focused on measuring similarity in WordNet using the hypernym hierarchy. [sent-145, score-0.278]
58 The number of types of similarity in WordNet tends to be less than that captured by Moby, because synsets in WordNet are (usually) only allowed to have a single parent. [sent-152, score-0.329]
59 We handle this by adding one row for every word sense with the right part of speech (rather than for every word); each row measures the similarity of every word to a particular word sense. [sent-155, score-0.736]
60 The label of each row is the (undisambiguated) word; multiple rows can have the same label. [sent-156, score-0.22]
61 For example, to determine how similar (the only sense of) “factory” is to the word “plant,” we compute the similarity of “factory” to the “industrial plant” sense of “plant” and to the “living thing” sense of “plant” and take the higher of the two (in this case, the former). [sent-158, score-0.456]
62 This yielded a final similarity score and greatly sparsified the similarity matrix M. [sent-164, score-0.585]
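A sketch of this max-over-senses lookup using NLTK's WordNet interface; path similarity is used here only as a stand-in, since the hypernym-based measure the paper actually computes is not fully specified in these extracts:

```python
from nltk.corpus import wordnet as wn   # needs the NLTK WordNet data installed

def word_to_sense_similarity(word, target_sense):
    # Similarity of an undisambiguated word to one target sense: take the
    # max over all of the word's senses with the right part of speech.
    scores = (s.path_similarity(target_sense)
              for s in wn.synsets(word, pos=wn.NOUN))
    return max((x for x in scores if x is not None), default=0.0)

industrial = wn.synset("plant.n.01")    # the "industrial plant" sense
living = wn.synset("plant.n.02")        # the "living thing" sense
print(word_to_sense_similarity("factory", industrial))   # the higher of the two
print(word_to_sense_similarity("factory", living))
```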
63 Like Moby, similarity entries are not divided by word sense; usually, only the dominant sense of each word is represented. [sent-173, score-0.483]
64 This type of similarity is considerably different from the other two types, tending to focus less on minor details and more on broad patterns. [sent-174, score-0.278]
65 Each similarity entry corresponds to a single word wi and is a list of scored similar words simji. [sent-175, score-0.548]
66 The scores vary between 0 and 1, but usually the highest-scored word in a similarity list gets a score of no more than 0. [sent-176, score-0.415]
67 Since each row is normalized individually, the similarity matrix M is not symmetric. [sent-179, score-0.461]
68 Also, there are separate similarity lists for each of nouns, verbs, and modifiers; we only used the lists matching the seed word’s part of speech. [sent-180, score-0.762]
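A minimal sketch of the per-row normalization mentioned above, which shows why the resulting matrix need not be symmetric; the exact normalization used in the paper may differ, so the max-to-1 scaling here is an assumption:

```python
import numpy as np

def normalize_rows(M):
    # Rescale each similarity row independently, here so its maximum is 1.
    # Because rows are normalized one by one, the result need not be
    # symmetric even if the raw scores were.
    peaks = M.max(axis=1, keepdims=True)
    peaks[peaks == 0] = 1.0   # leave all-zero rows untouched
    return M / peaks
```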
69 5 Experimental Setup Given a seed set s and a complete target set G, it is easy to evaluate our system; we say “Yes” to anything in G, “No” to everything else, and see how many of the candidate words are in G. [sent-181, score-0.526]
70 To evaluate a particular active learning algorithm, we can just run the algorithm manually, and see how many candidate words we say “Yes” to (note that this will not give us an accurate estimate of the recall of our algorithm). [sent-183, score-0.285]
71 At each step, we pick a random algorithm and either present its current candidate to the user or, if that candidate has already been labeled, supply that algorithm with the previously given answer. [sent-190, score-0.27]
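A sketch of this interleaved protocol with a simulated user that answers “Yes” exactly when a candidate is in G; the next_candidate()/observe() interface is hypothetical:

```python
import random

def evaluate_interleaved(algorithms, goal_set, per_alg=25):
    # Interleave algorithms in random order; the simulated user answers "Yes"
    # exactly when a candidate is in the target set G. Answers are cached so
    # a word labeled for one algorithm is reused for the others.
    cache = {}
    asked = {name: 0 for name in algorithms}
    yes = {name: 0 for name in algorithms}
    while any(n < per_alg for n in asked.values()):
        name = random.choice([k for k, n in asked.items() if n < per_alg])
        word = algorithms[name].next_candidate()
        if word not in cache:
            cache[word] = word in goal_set   # the only "user" interaction
        algorithms[name].observe(word, cache[word])
        asked[name] += 1
        yes[name] += int(cache[word])
    return yes                               # "Yes" counts per algorithm
```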
72 To evaluate the relative contribution of active learning, we consider a version of our system where active learning is disabled. [sent-193, score-0.419]
73 Instead of retraining the system every iteration, we train it once on the seed set s and keep the weight vector θ fixed from iteration to iteration. [sent-194, score-0.576]
74 Thus, logistic regression using Moby and no active learning is L(M,-). [sent-199, score-0.385]
75 The entries in this thesaurus are similar to Moby, except that each word may have multiple sense-disambiguated entries. [sent-203, score-0.269]
76 For each seed word w, we downloaded the page for w and extracted a set of synonym entries for that word. [sent-204, score-0.613]
77 Table 2 shows each category, with examples of specific similarity queries. [sent-209, score-0.314]
78 For each query, the first author built the seed set by writing down the first three words that came to mind. [sent-211, score-0.459]
79 However, for the similarity type Hard Synonyms, coming up with more than one seed word was considerably more difficult. [sent-213, score-0.736]
80 To build seed sets for these queries, we ran our evaluation system using a single seed word and took the first two positive candidates; this ensured that we were not biasing our seed set in favor of a particular algorithm or data set. [sent-214, score-1.443]
81 For each query, we ran our evaluation system until each algorithm had suggested 25 candidate words, for a total of 625 labeled words per algorithm. [sent-215, score-0.209]
82 The online-thesaurus baseline (Com) performs poorly overall; our best system, L(MWD,+), outscores it by 164%. [sent-221, score-0.188]
83 The next group of algorithms, U(*,-), adds together the similarity entries of the seed words for a particular similarity source. [sent-222, score-1.104]
84 The best of these uses distributional similarity; L(MWD,+) outscores it by 53%. [sent-223, score-0.209]
85 Combining all similarity types, U(MWD,-) improves by 10% over U(D,-). [sent-224, score-0.319]
86 Using logistic regression instead of the untrained weights significantly improves performance. [sent-226, score-0.243]
87 Using active learning also significantly improves performance: L(MWD,+) outscores L(MWD,-) by 13%. [sent-228, score-0.382]
88 This shows that active learning is useful even when a reasonable amount of initial information is available (three seed words for each test case). [sent-229, score-0.642]
89 The gains from logistic regression and active learning are cumulative; L(MWD,+) outscores U(MWD,-) by 38%. [sent-230, score-0.543]
90 Finally, our best system, L(MWD,+) improves over L(D,-), the best system using a single data source and no active learning, by 36%. [sent-231, score-0.277]
91 We consider L(D,-) to be a strong baseline; this comparison demonstrates the usefulness of the main contributions of this paper, the use of multiple data sources and active learning. [sent-232, score-0.254]
92 L(D,-) is still fairly sophisticated, since it combines information from the similarity entries for different words. [sent-233, score-0.4]
93 For this chart, we chose the best setting for each similarity type. [sent-235, score-0.278]
94 Meronyms were difficult. 7 Discussion and Related Work The biggest difference between our system and previous work is the use of active learning, especially in allowing the use of negative examples. [sent-238, score-0.353]
95 Most previous set expansion systems use bootstrapping from a small set of positive examples. [sent-239, score-0.23]
96 Recently, the use of negative examples for set expansion was proposed by Vyas and Pantel (2009), although in a different way. [sent-240, score-0.299]
97 First, set expansion is run as normal using a fixed seed set. [sent-241, score-0.634]
98 Also, we use a logistic regression model to robustly incorporate negative information, rather than deterministically ruling out words and features. [sent-244, score-0.325]
99 Semantic similarity based on corpus statistics and lexical taxonomy. [sent-273, score-0.278]
100 Helping editors choose better seed sets for entity expansion. [sent-319, score-0.424]
wordName wordTfidf (topN-words)
[('seed', 0.424), ('mwd', 0.284), ('similarity', 0.278), ('moby', 0.253), ('vyas', 0.189), ('active', 0.183), ('expansion', 0.175), ('wj', 0.171), ('outscores', 0.158), ('row', 0.154), ('thesaurus', 0.146), ('logistic', 0.137), ('user', 0.136), ('wordnet', 0.104), ('cat', 0.102), ('plant', 0.101), ('entries', 0.089), ('negative', 0.088), ('zj', 0.083), ('entry', 0.083), ('pantel', 0.082), ('dog', 0.079), ('yes', 0.076), ('luxury', 0.076), ('thesauri', 0.076), ('sources', 0.071), ('candidate', 0.067), ('synonyms', 0.066), ('regression', 0.065), ('cars', 0.064), ('iteration', 0.064), ('jump', 0.063), ('wyj', 0.063), ('boeing', 0.055), ('anford', 0.055), ('factory', 0.055), ('hughes', 0.055), ('ihas', 0.055), ('mij', 0.055), ('presidents', 0.055), ('ramage', 0.055), ('positive', 0.055), ('labeled', 0.054), ('system', 0.053), ('synsets', 0.051), ('distributional', 0.051), ('wi', 0.051), ('heller', 0.051), ('safety', 0.051), ('sense', 0.048), ('suppose', 0.048), ('boat', 0.047), ('conrath', 0.047), ('query', 0.047), ('expansions', 0.045), ('crestan', 0.043), ('actors', 0.043), ('durme', 0.043), ('nocedal', 0.043), ('cohen', 0.041), ('think', 0.041), ('improves', 0.041), ('list', 0.039), ('pasca', 0.038), ('roark', 0.037), ('belongs', 0.037), ('column', 0.036), ('examples', 0.036), ('snow', 0.036), ('rows', 0.036), ('desired', 0.035), ('words', 0.035), ('score', 0.035), ('regularization', 0.035), ('fixed', 0.035), ('senses', 0.034), ('word', 0.034), ('asks', 0.034), ('iis', 0.034), ('kinds', 0.034), ('ghahramani', 0.033), ('fairly', 0.033), ('choosing', 0.033), ('car', 0.032), ('broken', 0.031), ('poorly', 0.03), ('label', 0.03), ('lists', 0.03), ('matrix', 0.029), ('symmetric', 0.029), ('groups', 0.029), ('difficult', 0.029), ('favor', 0.029), ('scores', 0.029), ('jiang', 0.028), ('corresponds', 0.028), ('languageindependent', 0.028), ('nonexperts', 0.028), ('aurus', 0.028), ('clu', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 27 acl-2010-An Active Learning Approach to Finding Related Terms
Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller
Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.
2 0.23119146 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
Author: Yabin Zheng ; Zhiyuan Liu ; Lixing Xie
Abstract: Motivated by Google Sets, we study the problem of growing related words from a single seed word by leveraging user behaviors hiding in user records of Chinese input method. Our proposed method is motivated by the observation that the more frequently two words cooccur in user records, the more related they are. First, we utilize user behaviors to generate candidate words. Then, we utilize search engine to enrich candidate words with adequate semantic features. Finally, we reorder candidate words according to their semantic relatedness to the seed word. Experimental results on a Chinese input method dataset show that our method gains better performance.
3 0.21691072 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
Author: Xiao-Li Li ; Lei Zhang ; Bing Liu ; See-Kiong Ng
Abstract: Distributional similarity is a classic technique for entity set expansion, where the system is given a set of seed entities of a particular class, and is asked to expand the set using a corpus to obtain more entities of the same class as represented by the seeds. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. Based on the test results of 10 corpora, we show that a PU learning technique outperformed distributional similarity significantly.
4 0.14745595 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
Author: Boxing Chen ; George Foster ; Roland Kuhn
Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
5 0.14098807 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
Author: Ruihong Huang ; Ellen Riloff
Abstract: This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynam- ically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.
6 0.11945456 141 acl-2010-Identifying Text Polarity Using Random Walks
7 0.11268185 210 acl-2010-Sentiment Translation through Lexicon Induction
8 0.10164469 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
9 0.10015154 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
10 0.094904363 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
11 0.093546204 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
12 0.087821908 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
13 0.086407423 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
14 0.083689034 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
15 0.08278852 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
16 0.080463044 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
17 0.078225829 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
18 0.07781969 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
19 0.077069722 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
20 0.075802445 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
topicId topicWeight
[(0, -0.217), (1, 0.1), (2, -0.078), (3, 0.008), (4, 0.139), (5, -0.056), (6, 0.022), (7, 0.083), (8, -0.053), (9, 0.06), (10, -0.124), (11, 0.062), (12, -0.036), (13, -0.135), (14, 0.051), (15, 0.034), (16, 0.105), (17, -0.149), (18, -0.186), (19, 0.001), (20, 0.036), (21, -0.04), (22, 0.006), (23, 0.176), (24, 0.004), (25, -0.08), (26, -0.105), (27, 0.025), (28, 0.21), (29, 0.144), (30, 0.011), (31, 0.027), (32, -0.103), (33, -0.135), (34, -0.023), (35, -0.091), (36, 0.043), (37, -0.024), (38, 0.011), (39, -0.014), (40, 0.001), (41, 0.029), (42, -0.013), (43, 0.074), (44, -0.146), (45, -0.023), (46, -0.027), (47, 0.048), (48, 0.037), (49, -0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.96885735 27 acl-2010-An Active Learning Approach to Finding Related Terms
Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller
Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.
2 0.82842326 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
Author: Yabin Zheng ; Zhiyuan Liu ; Lixing Xie
Abstract: Motivated by Google Sets, we study the problem of growing related words from a single seed word by leveraging user behaviors hiding in user records of Chinese input method. Our proposed method is motivated by the observation that the more frequently two words cooccur in user records, the more related they are. First, we utilize user behaviors to generate candidate words. Then, we utilize search engine to enrich candidate words with adequate semantic features. Finally, we reorder candidate words according to their semantic relatedness to the seed word. Experimental results on a Chinese input method dataset show that our method gains better performance.
3 0.76256448 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
Author: Xiao-Li Li ; Lei Zhang ; Bing Liu ; See-Kiong Ng
Abstract: Distributional similarity is a classic technique for entity set expansion, where the system is given a set of seed entities of a particular class, and is asked to expand the set using a corpus to obtain more entities of the same class as represented by the seeds. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. Based on the test results of 10 corpora, we show that a PU learning technique outperformed distributional similarity significantly.
4 0.65607131 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities
Author: Jun'ichi Kazama ; Stijn De Saeger ; Kow Kuroda ; Masaki Murata ; Kentaro Torisawa
Abstract: Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robust distributional word similarities. The method uses a distribution of context profiles obtained by Bayesian estimation and takes the expectation of a base similarity measure under that distribution. When the context profiles are multinomial distributions, the priors are Dirichlet, and the base measure is the Bhattacharyya coefficient, we can derive an analytical form that allows efficient calculation. For the task of word similarity estimation using a large amount of Web data in Japanese, we show that the proposed measure gives better accuracies than other well-known similarity measures.
5 0.59996849 141 acl-2010-Identifying Text Polarity Using Random Walks
Author: Ahmed Hassan ; Dragomir Radev
Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product reviews, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms the state of the art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.
6 0.58789301 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures
7 0.53900862 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
8 0.48327696 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
9 0.47231841 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.
10 0.47066855 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
11 0.45422435 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
12 0.43666616 210 acl-2010-Sentiment Translation through Lexicon Induction
13 0.42919961 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
14 0.39130861 63 acl-2010-Comparable Entity Mining from Comparative Questions
15 0.3887291 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
16 0.38685685 257 acl-2010-WSD as a Distributed Constraint Optimization Problem
17 0.38407075 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
18 0.37969065 204 acl-2010-Recommendation in Internet Forums and Blogs
19 0.3738966 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
20 0.36925206 127 acl-2010-Global Learning of Focused Entailment Graphs
topicId topicWeight
[(25, 0.054), (42, 0.025), (59, 0.074), (72, 0.011), (73, 0.035), (78, 0.02), (83, 0.065), (84, 0.027), (98, 0.609)]
simIndex simValue paperId paperTitle
1 0.9967615 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
Author: Yabin Zheng ; Zhiyuan Liu ; Lixing Xie
Abstract: Motivated by Google Sets, we study the problem of growing related words from a single seed word by leveraging user behaviors hiding in user records of Chinese input method. Our proposed method is motivated by the observation that the more frequently two words cooccur in user records, the more related they are. First, we utilize user behaviors to generate candidate words. Then, we utilize search engine to enrich candidate words with adequate semantic features. Finally, we reorder candidate words according to their semantic relatedness to the seed word. Experimental results on a Chinese input method dataset show that our method gains better performance.
Author: Reyyan Yeniterzi ; Kemal Oflazer
Abstract: We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores. Our maximal set of source and target side transformations, coupled with some additional techniques, provides a 39% relative improvement from a baseline 17.08 to 23.78 BLEU, all averaged over 10 training and test sets. Now that the syntactic analysis on the English side is available, we also experiment with more long distance constituent reordering to bring the English constituent order close to Turkish, but find that these transformations do not provide any additional consistent tangible gains when averaged over the 10 sets.
3 0.99521399 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii
Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n2), the experimental results demonstrated an average parsing time not much slower than O(n).
same-paper 4 0.99406475 27 acl-2010-An Active Learning Approach to Finding Related Terms
Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller
Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.
5 0.98776793 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
Author: Xiangyu Duan ; Min Zhang ; Haizhou Li
Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus. But word appears to be too fine-grained in some cases such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to PBSMT pipeline has inborn deficiency. This paper proposes pseudo-word as a new start point for PB-SMT pipeline. Pseudo-word is a kind of basic multi-word expression that characterizes minimal sequence of consecutive words in sense of translation. By casting pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-word significantly outperforms word for PB-SMT model in both travel translation domain and news translation domain.
6 0.98712349 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization
7 0.97832334 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
8 0.95858628 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing
9 0.95667291 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
10 0.94436318 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
11 0.93351746 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
12 0.90144408 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
13 0.89900553 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
14 0.89353395 79 acl-2010-Cross-Lingual Latent Topic Extraction
15 0.8924911 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
16 0.87765408 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization
17 0.87728941 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
18 0.8722865 262 acl-2010-Word Alignment with Synonym Regularization
19 0.87063867 133 acl-2010-Hierarchical Search for Word Alignment
20 0.86569852 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features