
216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time


Source: pdf

Author: Rada Mihalcea ; Vivi Nastase

Abstract: In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. Through experiments run using word usage examples collected from three major periods of time (1800, 1900, 2000), we show that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. [sent-3, score-0.902]

2 Through experiments run using word usage examples collected from three major periods of time (1800, 1900, 2000), we show that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time. [sent-4, score-0.603]

3 Language is continually changing: we discard or coin new senses for old words; metaphoric and metonymic usages become so engrained that at some point they are considered literal; and we constantly add new words to our vocabulary. [sent-7, score-0.197]

4 The purpose of the current work is to look at language as an evolutionary phenomenon, which we can investigate and analyze and use when working with text collections that span a wide time frame. [sent-8, score-0.153]

5 This has changed thanks to the Google Books and Google Ngrams historical projects. [sent-10, score-0.181]

6 They make available in electronic format a large amount of textual data starting from the 17th century, as well as statistics on word usage. [sent-11, score-0.058]

7 We will exploit this data to find differences in word usage across wide periods of time. [sent-12, score-0.275]

8 1 While the Brown corpus does include documents from different years, it is far from the scale and time range of Google Books. [sent-13, score-0.061]

9 Vivi Nastase, Institute for Computational Linguistics, University of Heidelberg. [sent-14, score-0.052]

10 The phenomena involved in language change are numerous, and for now we focus on word usage in different time epochs. [sent-16, score-0.362]

11 As an example, the word gay, currently most frequently used to refer to a sexual orientation, was in the previous century used to express an emotion. [sent-17, score-0.235]

12 The word run, in the past used intransitively, has acquired a transitive sense, common in computational circles where we run processes, programs and such. [sent-18, score-0.058]

13 The purpose of the current research is to quantify changes in word usage, which can be the effect of various factors: changes in meaning (addition/removal of senses), changes in distribution, change in topics that co-occur more frequently with a given word, changes in word spelling, etc. [sent-19, score-1.079]

14 For now we test whether we can identify the epoch to which a word occurrence belongs. [sent-20, score-0.616]

15 We use two sets of words, one with monosemous words and the other with polysemous ones, to try to separate the effect of topic change over time from the effect of sense change. [sent-21, score-0.842]

16 We use examples from Google Books, split into three epochs: 1800+/-25 years, 1900+/-25, 2000+/-25. [sent-22, score-0.054]

17 We select open-class words that occur frequently in all these epochs, and words that occur frequently only in one of them. [sent-23, score-0.286]

18 We then treat each epoch as a “class,” and verify whether we can correctly predict this class for test instances from each epoch for the words in our lists. [sent-24, score-1.324]

19 To test whether word usage frequency or sense variation has an impact on this disambiguation task, we use lists of words that have different frequencies in different epochs, as well as different polysemies. [sent-25, score-1.09]

20 As mentioned before, we also compare the performance of monosemous (and thus sense-wise unchanged through time) and polysemous words, to verify whether we can in fact predict sense change as opposed to contextual variation. [sent-26, score-0.546]

21 2 Related Work: The purpose of this paper is to look at words and how they change in time. [sent-29, score-0.265]

22 Previous work that looks at diachronic language change works at a higher language level, and is not specifically concerned with how words themselves change. [sent-30, score-0.268]

23 The historical data provided by Google has quickly attracted researchers in various fields, and started the new field of culturomics (Michel et al., 2011). [sent-31, score-0.058]

24 The purpose of such research is to analyse changes in human culture, as evidenced by the rise and fall in usage of various terms. [sent-33, score-0.382]

25 Reali and Griffiths (2010) analyse the similarities between language and genetic evolution, with the transmission of frequency distributions over linguistic forms functioning as the mechanism behind the phenomenon of language change. [sent-34, score-0.228]

26 Blei and Lafferty (2006) and Blei and Lafferty (2007) track changes in scientific topics through a discrete dynamic topic model (dDTM) both as types of scientific topics at different time points, and as changing word probability distributions within these topics. [sent-35, score-0.549]

27 The “Photography” topic for example has changed dramatically since the beginning of the 20th century, with words related to digital photography appearing recently, and dominating the most current version of the topic. [sent-36, score-0.258]

28 (2008) develop time-specific topic models, where topics, as patterns of word use, are tracked across a time-changing text collection, and address the task of (fine-grained) time stamp prediction. [sent-38, score-0.306]

29 Wijaya and Yeniterzi (2011) investigate through topic models the change in context of a specific entity over time, based on the Google Ngram corpus. [sent-39, score-0.211]

30 They determine that changes in this context reflect events occurring in the same period of time. [sent-40, score-0.2]

31 3 Word Epoch Disambiguation: We formulate the task as a disambiguation problem, where we automatically classify the period of time when a word was used, based on its surrounding context. [sent-41, score-0.394]

32 We use a data-driven formulation, and draw examples from word occurrences over three different epochs. [sent-42, score-0.143]

33 For the purpose of this work, we consider an epoch to be a period of 50 years surrounding the beginning of a new century (1800+/-25 years, 1900+/-25, 2000+/-25). [sent-43, score-0.834]

34 The word usage examples are gathered from books, where the publication year of a book is judged to be representative of the time when that word was used. [sent-44, score-0.357]
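
To make the epoch labels concrete, here is a minimal sketch (ours, not the authors' code; the function name year_to_epoch is hypothetical) that buckets a publication year into one of the three epoch classes:

    # Illustrative helper: bucket a book's publication year into one of
    # the three 50-year epochs used as class labels.
    def year_to_epoch(year):
        for center in (1800, 1900, 2000):
            if center - 25 <= year <= center + 25:  # e.g., 1775-1825 for 1800
                return center
        return None  # outside all epochs; such an example would be discarded

    assert year_to_epoch(1812) == 1800
    assert year_to_epoch(1950) is None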

35 We select words with different characteristics to allow us to investigate whether there is an effect caused by sense change, or whether the disambiguation performance comes from the change of topics and vocabulary over time. [sent-45, score-0.646]

36 The choice of target words for our experiments is driven by the phenomena we aim to analyze. [sent-47, score-0.132]

37 According to these criteria, for each open class (nouns, verbs, adjectives, adverbs) we select 50 words, 25 of which have multiple senses, 25 with one sense only. [sent-49, score-0.191]

38 Each of these two sets has a 10-5-5-5 distribution: 10 words that are frequent in all three epochs, and 5 per epoch such that these words are frequent in only that epoch. [sent-50, score-0.878]

39 To avoid part-of-speech ambiguity, we also choose words that are unambiguous from this point of view. [sent-51, score-0.124]

40 This selection process was done based on Google 1gram historical data, used for computing the probability distribution of open-class words for each epoch. [sent-52, score-0.149]

41 The set of target words thus consists of 200 open-class words, uniformly distributed over the 4 parts of speech, uniformly distributed over multiple-sense/unique-sense words, and with the frequency-based sampling described above.2 [sent-53, score-0.473]

42 For some words in this initial set we could not identify enough examples in the three epochs considered, which left us with a final set of 165 words. [sent-54, score-0.505]

43 For each target word in our dataset, we collect the top 100 snippets returned by a search on Google Books for each of the three epochs we consider. [sent-56, score-0.552]

44 2 For each open-class word we create ranked lists of words, where the ranking score is an adjusted tfidf score (the epochs correspond to documents). [sent-57, score-0.569]

45 To choose words frequent in only one epoch, we choose the top words in the list; for words frequent in all epochs, we choose the bottom words in this list. [sent-58, score-1.017]
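
A minimal sketch of this selection heuristic in Python, assuming plain tfidf in place of the paper's unspecified “adjusted” score, with each epoch's concatenated text treated as one document (the function name and data layout are ours):

    import math
    from collections import Counter

    def rank_words_by_epoch_specificity(epoch_tokens):
        # epoch_tokens: dict mapping an epoch label to its list of tokens.
        n_docs = len(epoch_tokens)
        tf = {e: Counter(toks) for e, toks in epoch_tokens.items()}
        df = Counter()  # number of epochs in which each word occurs
        for counts in tf.values():
            df.update(counts.keys())
        rankings = {}
        for e, counts in tf.items():
            total = sum(counts.values())
            scores = {w: (c / total) * math.log(n_docs / df[w])
                      for w, c in counts.items()}
            rankings[e] = sorted(scores, key=scores.get, reverse=True)
        return rankings

    # Top of an epoch's list: words frequent only in that epoch (high idf).
    # Bottom of the list: words occurring in all epochs (idf = log(3/3) = 0).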

46 3 A minimum of 30 total examples was required for a word to be considered in the dataset. [sent-59, score-0.147]

47 All the extracted snippets are then processed: the text is tokenized and part-of-speech tagged using the Stanford tagger (Toutanova et al. [sent-60, score-0.037]

48 , 2003), and contexts that do not include the target word with the specified part-of-speech are removed. [sent-61, score-0.099]

49 The position of the target word is also identified and recorded as an offset along with the example. [sent-62, score-0.099]
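
This preprocessing can be sketched as follows; NLTK's tokenizer and tagger stand in here for the Stanford tagger used in the paper, and the helper name is ours (the NLTK "punkt" and POS-tagger models are assumed to be installed):

    import nltk  # stand-in for the Stanford tagger (Toutanova et al., 2003)

    def prepare_snippet(snippet, target, pos_prefix):
        # Keep the snippet only if it contains the target word with the
        # expected part of speech, and record the target's token offset.
        tokens = nltk.word_tokenize(snippet)
        tagged = nltk.pos_tag(tokens)  # Penn Treebank tags, e.g. NN, VB
        for i, (tok, pos) in enumerate(tagged):
            if tok.lower() == target and pos.startswith(pos_prefix):
                return tokens, i
        return None  # discarded: target absent or wrong part of speech

    # prepare_snippet("They run the tests nightly.", "run", "VB") -> (tokens, 1)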

50 For illustration, we show below an example drawn from each epoch for two different words, dinner: 1800: On reaching Mr. [sent-63, score-0.558]

51 Crane’s house, dinner was set before us ; but as is usual here in many places on the Sabbath, it was both dinner and tea combined into a single meal. [sent-64, score-0.26]

52 1900: The average dinner of today consists of relishes; of soup, either a consomme (clear soup) or a thick soup. [sent-65, score-0.13]

53 2000: Preparing dinner in a slow cooker is easy and convenient because the meal you’re making requires little to no attention while it cooks. [sent-66, score-0.186]

54 and surgeon: 1800: The apothecaries must instantly dispense what medicines the surgeons require for the use of the regiments. [sent-67, score-0.028]

55 1900: The surgeon operates, collects a fee, and sends to the physician one-third or onehalf of the fee, this last transaction being unknown to the patient. [sent-68, score-0.154]

56 2000: From a New York plastic surgeon comes all anyone ever wanted to know–and never imagined–about what goes on behind the scenes at the office of one of the world’s most prestigious plastic surgeons. [sent-69, score-0.312]

57 The classification algorithm we use is inspired by previous work on data-driven word sense disambiguation. [sent-71, score-0.182]

58 Specifically, we use a system that integrates both local and topical features. [sent-72, score-0.044]

59 The local features include: the current word and its part-of-speech; a local context of three words to the left and right of the ambiguous word; the parts-of-speech of the surrounding words; the first noun before and after the target word; the first verb before and after the target word. [sent-73, score-0.339]

60 The topical features are determined from the global context and are implemented through class-specific keywords, which are determined as a list of at most five words occurring at least three times in the contexts defining a certain word class (or epoch). [sent-74, score-0.26]
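
The feature set described in sentences 59-60 can be sketched as a dictionary-valued extractor (the key names and helper are ours, not the authors' code):

    def extract_features(tokens, pos_tags, i, epoch_keywords):
        # Local features: the target word, its POS, and a window of three
        # words (with their POS) on each side.
        feats = {"w0": tokens[i].lower(), "p0": pos_tags[i]}
        for k in (1, 2, 3):
            if i - k >= 0:
                feats["w-%d" % k] = tokens[i - k].lower()
                feats["p-%d" % k] = pos_tags[i - k]
            if i + k < len(tokens):
                feats["w+%d" % k] = tokens[i + k].lower()
                feats["p+%d" % k] = pos_tags[i + k]
        # First noun and first verb before and after the target word.
        for name, idxs in (("before", range(i - 1, -1, -1)),
                           ("after", range(i + 1, len(tokens)))):
            for prefix, label in (("NN", "noun_"), ("VB", "verb_")):
                for j in idxs:
                    if pos_tags[j].startswith(prefix):
                        feats[label + name] = tokens[j].lower()
                        break
        # Topical features: counts of class-specific (epoch) keywords
        # occurring anywhere in the global context.
        context = {t.lower() for t in tokens}
        for epoch, keywords in epoch_keywords.items():
            feats["kw_%s" % epoch] = sum(1 for w in keywords if w in context)
        return feats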

61 To evaluate word epoch disambiguation, we calculate the average accuracy obtained through ten-fold cross-validations applied on the data collected for each word. [sent-92, score-0.616]

62 To place results in perspective, we also calculate a simple baseline, which assigns the most frequent class by default. [sent-93, score-0.136]
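
The evaluation described in sentences 61-62 can be sketched with scikit-learn; the excerpts do not name the learning algorithm, so Naive Bayes stands in, and a dummy classifier supplies the most-frequent-class baseline:

    from sklearn.dummy import DummyClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def evaluate_word(feature_dicts, epoch_labels):
        # Ten-fold cross-validated accuracy for one target word versus the
        # most-frequent-class baseline (assumes each epoch class has at
        # least 10 examples, since the folds are stratified).
        clf = make_pipeline(DictVectorizer(), MultinomialNB())
        base = make_pipeline(DictVectorizer(),
                             DummyClassifier(strategy="most_frequent"))
        acc = cross_val_score(clf, feature_dicts, epoch_labels, cv=10).mean()
        bl = cross_val_score(base, feature_dicts, epoch_labels, cv=10).mean()
        return acc, bl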

63 Overall, the task appears to be feasible, as an absolute improvement of 18.5% over the most-frequent-class baseline is observed. [sent-95, score-0.072]

64 While improvements are obtained for all parts-of-speech, the nouns lead to the highest disambiguation results, with the largest improvement over the baseline, which interestingly aligns with previous observations from work on word sense disambiguation (Mihalcea and Edmonds, 2004; Agirre et al. [sent-97, score-0.606]

65 There are also words that experience very small improvements, such as “again” (3%) or “captivate” (7%), which are words that are frequently used in all three epochs. [sent-100, score-0.264]

66 There are even a few words (seven) for which the disambiguation accuracy is below the baseline, such as “oblige” (-1%) or “cruel” (-15%). [sent-101, score-0.284]

67 To understand to what extent the change in frequency over time has an impact on word epoch disambiguation, in Table 2 we report results for words that have high frequency in all three epochs considered, or in only one epoch at a time. [sent-102, score-2.023]

68 As expected, the words that are used more often in an epoch are also easier to disambiguate. [sent-103, score-0.649]

69 For instance, the verb “reassert” has higher frequency in 2000, and it has a disambiguation accuracy of 67. (Footnote 4: The difference in results does not come from a difference in size in the data, as the number of examples extracted for words of high or low frequency is approximately the same.) [sent-104, score-0.341]

70 Instead, the verb “conceal,” which appears with high frequency in all three epochs, has a disambiguation accuracy of 44. [sent-107, score-0.341]

71 70%, which is a relatively small improvement over the baseline of 38. [sent-108, score-0.054]

72 [Table 2 columns: POS, words, examples, Baseline, WED; section: High frequency in all epochs] [sent-113, score-0.643]

73 Table 2: Results for words that have high frequency in all epochs, or in one epoch at a time. The second analysis that we perform is concerned with the accuracy observed for polysemous words as compared to monosemous words. [sent-133, score-1.305]

74 Monosemous words do not have sense changes over time, so being able to classify them in different epochs relies exclusively on variations in their context over time. [sent-135, score-0.791]

75 Polysemous words' contexts change because of both changes in topics/vocabulary over time and changes in word senses. [sent-136, score-0.495]

76 The fact that we see a difference in accuracy between disambiguation results for monosemous and polysemous words is an indication that word sense change is reflected and can be captured in the context. [sent-137, score-0.973]

77 To better visualize the improvements obtained with word epoch disambiguation with respect to the baseline, Figure 1 plots the results. [sent-138, score-0.847]

78 6 Conclusions: In this paper, we introduced the novel task of word epoch disambiguation, which aims to quantify the changes in word usage over time. [sent-139, score-0.999]

79 Using examples collected from three major periods of time, for 165 words, we showed that the word epoch disambiguation algorithm can lead to an overall absolute improvement of 18. [sent-140, score-1.215]

80 [Figure 1 panels: By epoch frequency; By number of senses] Figure 1: Word epoch disambiguation compared to the baseline, for words that are frequent/not frequent (in a given epoch), and monosemous/polysemous. [sent-141, score-1.622]

81 [Table 3 columns: POS, words, examples, Baseline, WED; section: Polysemous words] [sent-145, score-0.236]

82 Table 3: Results for words that are polysemous vs. monosemous. [sent-165, score-0.285]

83 5%, as compared to a baseline that picks the most frequent class by default. [sent-167, score-0.19]

84 These results indicate that there are significant differences between occurrences of words in different periods of time. [sent-168, score-0.213]

85 Moreover, additional analyses suggest that changes in usage frequency and word senses contribute to these differences. [sent-169, score-0.497]

86 In future work, we plan to do an in-depth analysis of the features that best characterize the changes in word usage over time, and develop representations that allow us to track sense changes. [sent-170, score-0.468]

87 Acknowledgments This material is based in part upon work supported by the National Science Foundation CAREER award #0747340. [sent-171, score-0.028]

88 Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. [sent-172, score-0.028]

89 An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation. [sent-197, score-0.182]

90 Quantitative analysis of culture using millions of digitized books. [sent-221, score-0.046]

91 Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach. [sent-234, score-0.058]

92 Words as alleles: connecting language evolution with Bayesian learners to models of genetic drift. [sent-240, score-0.085]

93 Topics over time: A non-Markov continuous-time model of topical trends. [sent-254, score-0.044]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('epoch', 0.558), ('epochs', 0.416), ('monosemous', 0.196), ('polysemous', 0.194), ('disambiguation', 0.193), ('changes', 0.16), ('dinner', 0.13), ('usage', 0.126), ('sense', 0.124), ('change', 0.117), ('adverb', 0.1), ('surgeon', 0.098), ('wed', 0.098), ('century', 0.097), ('words', 0.091), ('periods', 0.091), ('frequency', 0.082), ('books', 0.08), ('google', 0.075), ('adjective', 0.071), ('senses', 0.071), ('frequent', 0.069), ('blei', 0.067), ('class', 0.067), ('verb', 0.066), ('photography', 0.065), ('plastic', 0.065), ('reali', 0.065), ('time', 0.061), ('topic', 0.059), ('historical', 0.058), ('word', 0.058), ('topics', 0.058), ('purpose', 0.057), ('edmonds', 0.057), ('soup', 0.057), ('baseline', 0.054), ('examples', 0.054), ('frequently', 0.052), ('wijaya', 0.052), ('genetic', 0.052), ('fee', 0.052), ('nastase', 0.052), ('mihalcea', 0.051), ('verify', 0.05), ('culture', 0.046), ('topical', 0.044), ('changed', 0.043), ('surrounding', 0.042), ('agirre', 0.042), ('target', 0.041), ('period', 0.04), ('years', 0.04), ('changing', 0.039), ('quantify', 0.039), ('analyse', 0.039), ('improvements', 0.038), ('snippets', 0.037), ('investigate', 0.035), ('considered', 0.035), ('absolute', 0.034), ('rada', 0.034), ('uniformly', 0.034), ('evolution', 0.033), ('choose', 0.033), ('feasible', 0.033), ('concerned', 0.032), ('occurrences', 0.031), ('toutanova', 0.03), ('experience', 0.03), ('install', 0.028), ('tfidf', 0.028), ('diachronic', 0.028), ('anyone', 0.028), ('meal', 0.028), ('sexual', 0.028), ('tracked', 0.028), ('lberg', 0.028), ('physician', 0.028), ('medicines', 0.028), ('transaction', 0.028), ('transmission', 0.028), ('cooker', 0.028), ('aiden', 0.028), ('clancy', 0.028), ('hoiberg', 0.028), ('nowak', 0.028), ('orwant', 0.028), ('pickett', 0.028), ('pinker', 0.028), ('veres', 0.028), ('prestigious', 0.028), ('imagined', 0.028), ('material', 0.028), ('scientific', 0.028), ('michel', 0.028), ('wang', 0.028), ('comes', 0.028), ('lafferty', 0.027), ('phenomenon', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time

Author: Rada Mihalcea ; Vivi Nastase

Abstract: In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. Through experiments run using word usage examples collected from three major periods of time (1800, 1900, 2000), we show that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time.

2 0.14745566 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer

Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.

3 0.12550618 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

Author: Yuri Lin ; Jean-Baptiste Michel ; Erez Aiden Lieberman ; Jon Orwant ; Will Brockman ; Slav Petrov

Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and headmodifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

4 0.10405669 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.

5 0.094897978 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.08656209 153 acl-2012-Named Entity Disambiguation in Streaming Data

7 0.08126004 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

8 0.073319107 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

9 0.068384796 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

10 0.067099296 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

11 0.065963343 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

12 0.0651711 171 acl-2012-SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations

13 0.05900868 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

14 0.057769593 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation

15 0.055118997 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

16 0.05376254 98 acl-2012-Finding Bursty Topics from Microblogs

17 0.053263821 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

18 0.051858764 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks

19 0.049908321 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

20 0.049750227 79 acl-2012-Efficient Tree-Based Topic Modeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.154), (1, 0.067), (2, 0.037), (3, 0.031), (4, -0.058), (5, 0.098), (6, -0.004), (7, -0.037), (8, -0.009), (9, -0.03), (10, 0.076), (11, -0.014), (12, 0.128), (13, 0.077), (14, -0.029), (15, -0.048), (16, 0.048), (17, 0.026), (18, -0.054), (19, -0.043), (20, -0.032), (21, -0.061), (22, -0.093), (23, -0.019), (24, 0.053), (25, 0.025), (26, -0.037), (27, 0.03), (28, 0.064), (29, -0.018), (30, 0.01), (31, 0.033), (32, 0.065), (33, 0.123), (34, 0.172), (35, 0.028), (36, 0.157), (37, -0.008), (38, -0.078), (39, 0.171), (40, 0.083), (41, 0.115), (42, -0.126), (43, 0.191), (44, -0.167), (45, 0.105), (46, -0.04), (47, -0.163), (48, 0.152), (49, 0.157)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94519323 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time

Author: Rada Mihalcea ; Vivi Nastase

Abstract: In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. Through experiments run using word usage examples collected from three major periods of time (1800, 1900, 2000), we show that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time.

2 0.68394172 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum

Author: Matt Garley ; Julia Hockenmaier

Abstract: We investigate how novel English-derived words (anglicisms) are used in a German-language Internet hip hop forum, and what factors contribute to their uptake.

3 0.53039432 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

Author: Yuri Lin ; Jean-Baptiste Michel ; Erez Aiden Lieberman ; Jon Orwant ; Will Brockman ; Slav Petrov

Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and headmodifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

4 0.46905258 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

Author: Eric Huang ; Richard Socher ; Christopher Manning ; Andrew Ng

Abstract: Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.

5 0.44810125 153 acl-2012-Named Entity Disambiguation in Streaming Data

Author: Alexandre Davis ; Adriano Veloso ; Altigran Soares ; Alberto Laender ; Wagner Meira Jr.

Abstract: The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and the unique real-world entity. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality sense-annotated data, however, are hard to obtain in streaming environments, since the training corpus would have to be constantly updated in order to accommodate the fresh data coming on the stream. On the other hand, few positive examples plus large amounts of unlabeled data may be easily acquired. Producing binary classifiers directly from this data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known Expectation-Maximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20%, when compared to the current state-of-the-art biased SVMs, being more than 120 times faster.

6 0.44622254 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

7 0.4024989 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

8 0.38027242 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

9 0.36988211 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

10 0.36029741 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection

11 0.36016473 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

12 0.35026371 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

13 0.34531829 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

14 0.34226543 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions

15 0.33914256 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model

16 0.33370879 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy

17 0.31014487 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

18 0.29865831 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords

19 0.28954884 195 acl-2012-The Creation of a Corpus of English Metalanguage

20 0.28851131 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.037), (25, 0.017), (26, 0.043), (28, 0.033), (30, 0.022), (37, 0.026), (39, 0.054), (74, 0.039), (80, 0.197), (82, 0.01), (84, 0.035), (85, 0.024), (90, 0.237), (92, 0.075), (94, 0.011), (99, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86531126 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time

Author: Rada Mihalcea ; Vivi Nastase

Abstract: In this paper we introduce the novel task of “word epoch disambiguation,” defined as the problem of identifying changes in word usage over time. Through experiments run using word usage examples collected from three major periods of time (1800, 1900, 2000), we show that the task is feasible, and significant differences can be observed between occurrences of words in different periods of time.

2 0.79300314 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

Author: Yuri Lin ; Jean-Baptiste Michel ; Erez Aiden Lieberman ; Jon Orwant ; Will Brockman ; Slav Petrov

Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and headmodifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

3 0.79207116 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Author: Weiwei Sun ; Hans Uszkoreit

Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated approaches yield a relative error reduction of 18% in total over a stateof-the-art baseline.

4 0.79054254 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

Author: Hyun-Je Song ; Jeong-Woo Son ; Tae-Gil Noh ; Seong-Bae Park ; Sang-Jo Lee

Abstract: All types of part-of-speech (POS) tagging errors have been equally treated by existing taggers. However, the errors are not equally important, since some errors affect the performance of subsequent natural language processing (NLP) tasks seriously while others do not. This paper aims to minimize these serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors. They are designed to assign a larger cost to serious errors and a smaller one to minor errors. Through a set of POS tagging experiments, it is shown that the classifier trained with the proposed loss functions reduces serious errors compared to state-of-the-art POS taggers. In addition, the experimental result on text chunking shows that fewer serious errors help to improve the performance of sub- sequent NLP tasks.

5 0.78907716 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

Author: Sungchul Kim ; Kristina Toutanova ; Hwanjo Yu

Abstract: In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata.

6 0.78906578 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

7 0.78860259 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

8 0.78776008 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

9 0.7873953 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

10 0.78721184 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

11 0.78341788 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

12 0.78325778 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

13 0.78316295 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

14 0.78203267 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions

15 0.78188813 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

16 0.78157127 73 acl-2012-Discriminative Learning for Joint Template Filling

17 0.7810486 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

18 0.78093731 137 acl-2012-Lemmatisation as a Tagging Task

19 0.78080302 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

20 0.78001422 131 acl-2012-Learning Translation Consensus with Structured Label Propagation