emnlp emnlp2013 emnlp2013-100 knowledge-graph by maker-knowledge-mining

100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models


Source: pdf

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that the n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. [sent-9, score-0.293]

2 In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document. [sent-10, score-0.281]

3 , 1991) or topic information (Gildea and Hofmann, 1999; Wallach, 2006), to grammaticality aware models (Pauls and Klein, 2012). [sent-15, score-0.281]

4 , unsupervised language model adaptation: we want a language model that can adapt to the domain or topic of the current situation (e. [sent-18, score-0.281]

5 , a document in SMT or a conversation in ASR) automatically and select the appropriate words using both topic and syntactic context. [sent-20, score-0.319]

6 Wallach (2006) is one such model, which generates each word based on local context and global topic information to capture the difference of lexical usage among different topics. [sent-21, score-0.338]

7 When the number of topics is K and the vocabulary size is V, the n-gram topic model has O(KV^n) parameters, which grow exponentially in n, making the local minima problem even more severe. [sent-27, score-0.415]

8 Our sampler resolves this problem by moving many customers in the hierarchical Chinese restaurant process at a time. [sent-28, score-0.555]

9 2 Basic Models All models presented in this paper are based on the Bayesian n-gram language model, the hierarchical Pitman-Yor process language model (HPYLM). [sent-33, score-0.158]

10 In the following, we first introduce the HPYLM, and then discuss the topic model extension of Wallach (2006) with HPYLM. [sent-34, score-0.318]

11 The generative story starts with the unigram word distribution Gφ, which is a V -dimensional multinomial where Gφ(w) represents the probability of word w. [sent-39, score-0.153]

12 The model first generates this distribution from the PYP as Gφ ∼ PYP(a, b, G0), where G0 is a V-dimensional uniform distribution (G0(u) = 1/V; ∀u ∈ W) and acts as a prior for Gφ, and a, b are hyperparameters called discount and concentration, respectively. [sent-40, score-0.182]
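
To make the roles of the discount a and concentration b concrete, here is a minimal sketch of the Pitman-Yor predictive probability in its Chinese restaurant representation, with a uniform base G0 over a vocabulary of size V as described above; the function and variable names are illustrative assumptions, not the authors' implementation.

```python
def pyp_predictive(w, customers, tables, a, b, base):
    """Predictive probability of word w in a single Pitman-Yor restaurant.
    customers[w]: customers eating dish w; tables[w]: tables serving dish w."""
    c_total = sum(customers.values())
    t_total = sum(tables.values())
    if c_total == 0:
        return base(w)                      # empty restaurant: back off to G0
    c_w, t_w = customers.get(w, 0), tables.get(w, 0)
    p_existing = max(c_w - a * t_w, 0.0) / (b + c_total)
    p_new_table = (b + a * t_total) / (b + c_total) * base(w)
    return p_existing + p_new_table

V = 10000                                   # assumed vocabulary size
G0 = lambda u: 1.0 / V                      # uniform base distribution over W
print(pyp_predictive("model", {"model": 7}, {"model": 2}, a=0.8, b=1.0, base=G0))
```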

13 For example, two contexts “he is” and “she is”, which share the suffix “is”, are generated from the same (bigram) distribution Gis, so they would have similar word distributions. [sent-43, score-0.159]

14 2 Wallach (2006) with HPYLM Wallach (2006) is a generative model for a document collection that combines the topic model with a Bayesian n-gram language model. [sent-52, score-0.364]

15 , 2003) is the most basic topic model, which generates each word in a document based on a unigram word distribution defined by a topic allocated to that word. [sent-54, score-0.796]

16 The bigram topic model of Wallach (2006) simply replaces this unigram word distribution (a multinomial) for each topic with a bigram word distribution. [sent-55, score-0.836]

17 In other words, ordinary LDA generates each word conditioning only on the latent topic, whereas the bigram topic model generates it conditioning on both the latent topic and the previous word, as in the bigram language model. [sent-56, score-0.844]

18 Extending this model with a higher order n-gram is trivial; all we have to do is to replace the bigram language model for each topic with an ngram language model. [sent-57, score-0.332]

19 The formal description of the generative story of this n-gram topic model is as follows. [sent-58, score-0.326]

20 First, for each topic k ∈ 1, · · · , K, where K is the number of topics, the model generates an n-gram language model Ghk. [sent-59, score-0.281]

21 We sometimes denote Ghk to represent a language model of topic k, not a specific multinomial for some context h, depending on the context. [sent-64, score-0.281]

22 dimensional topic distribution θj by a Dirichlet distribution Dir(α) where α = (α1 , α2, · · · , αK) is a prior. [sent-65, score-0.409]

23 Finally, for each word position i ∈ 1, · · · , Nj in document j, the i-th word's topic assignment zji is chosen according to θj, then a word type wji is generated from the topic-zji n-gram model given context hji, where hji is the last n − 1 words preceding wji. [sent-67, score-0.836]

24 Generate corpora: for each document j ∈ 1, · · · , D: draw θj ∼ Dir(α); for each word position i ∈ 1, · · · , Nj: draw zji ∼ θj, then generate wji from the topic-zji n-gram model given context hji. 3 Extended Models One serious drawback of the n-gram topic model presented in the previous section is sparseness. [sent-70, score-0.319]
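
The generative story just summarized can be rendered as a short forward simulation; this is an illustrative sketch only, and the topic_lms[k].sample(context) interface that stands in for the per-topic PYP n-gram models is an assumed placeholder.

```python
import random

def sample_dirichlet(alpha):
    # theta ~ Dir(alpha), via normalized Gamma draws
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def generate_document(N_j, alpha, topic_lms, n):
    """Forward-simulate one document: theta_j ~ Dir(alpha); for each position i,
    z_ji ~ theta_j and w_ji is drawn from topic z_ji's n-gram model given the
    last n-1 words (topic_lms[k].sample(context) is an assumed interface)."""
    theta_j = sample_dirichlet(alpha)
    words, topics = [], []
    for _ in range(N_j):
        z_ji = random.choices(range(len(alpha)), weights=theta_j)[0]
        h_ji = tuple(words[-(n - 1):]) if n > 1 else ()   # context = last n-1 words
        w_ji = topic_lms[z_ji].sample(h_ji)
        topics.append(z_ji)
        words.append(w_ji)
    return words, topics
```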

25 Roughly speaking, when the number of topics is K and the number of all n-grams in the training corpus is N, the language model of topic k, Ghk, is learned using only about O(N/K) instances of the n-grams assigned to topic k, making each Ghk a much sparser and less reliable distribution. [sent-72, score-0.657]

26 In one model, the HIERARCHICAL model, Gh0 is used as a prior for all other n-gram models, where Gh0 exploits global statistics across all topics {Gkh}. [sent-76, score-0.152]

27 Statistics are shared across Gh0 and {Gkh}, but some words are directly generated from Gh0 regardless of the topic distribution. [sent-78, score-0.281]

28 {u, v} are word types, k is a topic, and each Ghk is a language model. [sent-82, score-0.281]

29 For example, G2uv represents a word distribution following the context uv in topic 2. [sent-84, score-0.345]

30 A natural solution to this problem is the doubly hierarchical Pitman-Yor process (DHPYP) proposed in Wood and Teh (2009). [sent-89, score-0.197]

31 This shows us that λ determines the back-off behavior, i.e., which probability we should take into account: the shorter context of the same topic Ghk0 or the full context of the global model Gh0. [sent-100, score-0.338]
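
One concrete reading of this back-off behavior is a λh-weighted mixture of the two candidate distributions as the base measure of the topic-k restaurant for context h; the sketch below is illustrative only, and which term receives λh versus 1 − λh is an assumption taken from the sentence above.

```python
def dhpyp_base(w, lam_h, topic_shorter_context, global_full_context):
    """Base (back-off) measure implied by the switch weight lambda_h:
    G^k_{h'} (shorter context, same topic) versus G^0_h (full context, global)."""
    return lam_h * topic_shorter_context(w) + (1.0 - lam_h) * global_full_context(w)

# toy usage with made-up probabilities
p = dhpyp_base("unit", 0.3,
               topic_shorter_context=lambda w: 0.010,   # stands in for G^k_{h'}(w)
               global_full_context=lambda w: 0.002)     # stands in for G^0_h(w)
print(p)
```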

32 We encode this assumption by placing hierarchical Beta distributions on the suffix tree across all topics: λh ∼ Beta(γλh0, γ(1 − λh0)) = DP(γ, λh0), (5) where DP is the hierarchical Dirichlet process (Teh et al. [sent-108, score-0.327]

33 Having generated the topic component of the model, the corpus generating process is the same as the previous model because we only change the generating process of Ghk for k = 1, · · · , K. [sent-111, score-0.349]

34 In this model, the relationship of Gh0 to the other {Gkh} is flat, not hierarchical: Gh0 is a special topic that can generate a word. [sent-116, score-0.281]

35 Each Gkh for k = 1, 2, · · · , K is generated independently from a PYP; when generating a word, the model first determines whether to use the global model Gh0 or a topic model {Gkh}Kk=1. [sent-118, score-0.338]

36 Generate corpora: for each document j ∈ 1, · · · , D: draw θj ∼ Dir(α); for each word position i ∈ 1, · · · , Nj: draw lji ∼ Bern(λhji); if lji = 0: zji = 0; if lji = 1: zji ∼ θj; then wji is generated from the topic-zji n-gram model given context hji. The difference between the two models is their usage of the global model Gh0. [sent-123, score-1.218]
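
A per-word procedural sketch of the SWITCHING generation step, following the story above (topic 0 denotes the global model); the Bernoulli switch follows the text, while the sampling helper and the .sample(context) language-model interface are illustrative assumptions.

```python
import random

def switching_generate_word(theta_j, lam_h, h, global_lm, topic_lms):
    """One word of the SWITCHING model: l ~ Bern(lambda_h) decides between the
    global model (topic 0) and a document-specific topic drawn from theta_j."""
    l = 1 if random.random() < lam_h else 0
    if l == 0:
        z = 0                                            # the special global topic
        w = global_lm.sample(h)                          # assumed LM interface
    else:
        z = random.choices(range(1, len(theta_j) + 1), weights=theta_j)[0]
        w = topic_lms[z - 1].sample(h)                   # topics numbered 1..K
    return w, z
```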

37 In our models, all the latent variables are {Gkh, λh, θj, z, Θ}, where z is the set of topic assignments and Θ = {a, b, γ, α} are hyperparameters, which are treated later. [sent-126, score-0.281]

38 Given the training corpus w, the target posterior distribution is p(z, S | w, Θ), where S is the set of seating arrangements of all restaurants. [sent-132, score-0.358]

39 To distinguish the two types of restaurant, in the following, we refer to the restaurant of Ghk to indicate the collapsed state of Ghk (PYP). (Figure 2: Graphical model representations of our two models in the case of a 3-gram model.) [sent-133, score-0.174]

40 We refer to the restaurant of λh to indicate the collapsed state of λh (DP). [sent-135, score-0.256]

41 We present two different types of sampler: a token-based sampler and a table-based sampler. [sent-136, score-0.215]

42 1 Token-based Sampler The token-based sampler is almost identical to the collapsed sampler of the LDA (Griffiths and Steyvers, 2004). [sent-140, score-0.471]

43 Given the sampled topic zji, we update the language model of topic zji by adding customer wji to the restaurant specified by zji and context hji. [sent-142, score-1.252]
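
The add-customer update can be sketched with the usual hierarchical CRP bookkeeping: seat the new customer at an existing table with weight proportional to (count − a), or open a new table (with weight (b + a·t) times the back-off probability) and recursively send one customer to the parent restaurant. The class below is a generic hierarchical-PYP sketch for illustration, not the authors' data structure, and it omits the topic and λh machinery.

```python
import random
from collections import defaultdict

class Restaurant:
    """Generic hierarchical-PYP restaurant (one context in one topic): standard
    CRP bookkeeping for illustration, without the topic/lambda machinery."""
    def __init__(self, a, b, parent=None, base=None):
        self.a, self.b, self.parent, self.base = a, b, parent, base
        self.tables = defaultdict(list)        # dish (word) -> per-table customer counts
        self.n_customers = 0
        self.n_tables = 0

    def _backoff(self, w):
        return self.parent.prob(w) if self.parent is not None else self.base(w)

    def prob(self, w):
        if self.n_customers == 0:
            return self._backoff(w)
        tbl = self.tables.get(w, [])
        c_w, t_w = sum(tbl), len(tbl)
        return (max(c_w - self.a * t_w, 0.0)
                + (self.b + self.a * self.n_tables) * self._backoff(w)) / (self.b + self.n_customers)

    def add_customer(self, w):
        # Seat at an existing table serving w, or open a new table and then
        # recursively add one customer for w to the parent restaurant.
        existing = self.tables[w]
        weights = [c - self.a for c in existing]
        weights.append((self.b + self.a * self.n_tables) * self._backoff(w))
        r = random.random() * sum(weights)
        choice = len(weights) - 1
        for i, wt in enumerate(weights):
            r -= wt
            if r <= 0:
                choice = i
                break
        if choice == len(existing):            # a new table was opened
            existing.append(1)
            self.n_tables += 1
            if self.parent is not None:
                self.parent.add_customer(w)
        else:
            existing[choice] += 1
        self.n_customers += 1

# toy usage: a unigram (root) restaurant and one bigram restaurant below it
root = Restaurant(a=0.8, b=1.0, base=lambda w: 1.0 / 10000)
bigram = Restaurant(a=0.8, b=1.0, parent=root)
bigram.add_customer("model")
print(bigram.prob("model"))
```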

44 HIERARCHICAL: The adding-customer operation is slightly changed: when a new table is added to a restaurant, we must track the label l ∈ {0, 1} indicating the parent restaurant of that table, and add the customer corresponding to l to the restaurant of λh. [sent-144, score-0.235]

45 We need not assign a label to a new table, but rather we always add a customer to the restaurant of λh according to whether the sampled topic is 0 or not. [sent-147, score-0.553]

46 2 Table-based Sampler One problem with the token-based sampler is that the seating arrangement of the internal restaurant would never be changed unless a new table is created (or an old table is removed) in its child restaurant. [sent-149, score-0.594]

47 This probability is very low, particularly in the restaurants of shallow depth (e. [sent-150, score-0.176]

48 , unigram or bigram restaurants). Figure 3: Transition of the state of restaurants in the table-based sampler when the number of topics is 2 (construct a block; move the block to the sampled topic). [sent-152, score-0.55]

49 In this case, we can change the topic of the three 3-grams (vvw, vvw, uvw) in some documents from 1 to 2 at the same time. [sent-155, score-0.33]

50 bigram restaurants) because these restaurants have a larger number of customers and tables than those of deep depth. Algorithm 1 (Table-based sampler): for all tables in all restaurants do: Remove a customer from the parent restaurant. [sent-156, score-0.796]

51 Construct a block of seating arrangement S by descending the tree recursively. [sent-157, score-0.277]

52 Sample a new topic for S, move S to the sampled topic, and add a customer to the parent restaurant of the first selected table. [sent-159, score-0.211]

53 We continue this process recursively until reaching the leaf nodes, obtaining a block of seating arrangement S. [sent-161, score-0.311]

54 After calculating the conditional distribution, we sample a new topic assignment for this block. [sent-162, score-0.281]

55 Finally, we move this block to the sampled topic, which potentially changes the topic of many words across different documents, which are connected to customers in a block at leaf nodes (this connection is also arbitrary). [sent-163, score-0.605]
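
Putting the steps of Algorithm 1 together, the sweep has roughly the following shape; the four callbacks are placeholders for the real restaurant bookkeeping and are assumptions, so this is a structural outline rather than the authors' implementation.

```python
import random

def table_based_sweep(tables, num_topics, remove_from_parent, build_block,
                      block_topic_prob, move_block):
    """Structural outline of the table-based (blocked) sampler; the callbacks
    stand in for the real CRP bookkeeping and are placeholder assumptions."""
    for table in tables:
        remove_from_parent(table)            # remove a customer from the parent restaurant
        S = build_block(table)               # descend the tree recursively to build block S
        weights = [block_topic_prob(S, k) for k in range(1, num_topics + 1)]
        k_new = random.choices(range(1, num_topics + 1), weights=weights)[0]
        move_block(S, k_new)                 # move S; re-add a customer to the parent
```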

56 Conditional distribution: Let z_S be the block of topic assignments connected to S and zS be a variable indicating the topic assigned to the block. [sent-164, score-0.281]

57 Thanks to the exchangeability of all customers and tables in one of deep depth, leading to getting stuck in undesirable local minima. [sent-165, score-0.221]

58 For example, imagine a table in the restaurant of context “hidden” (depth is 2) and some topic, serving “unit”. [sent-166, score-0.213]

59 This table is connected to tables in its child restaurants corresponding to some 3-grams (e. [sent-167, score-0.202]

60 , “of hidden unit” or “train hidden unit”), whereas similar n-grams, such as those of “of hidden units” or “train hidden units” might be gathered in another topic, but collecting these ngrams into the same topic might be difficult under the token-based sampler. [sent-169, score-0.44]

61 The table-based sampler moves those different n-grams having common suffixes jointly into another topic. [sent-170, score-0.215]

62 Figure 3 shows a transition of state by the table-based sampler and Algorithm 4. [sent-171, score-0.215]

63 Because this connection cannot be preserved in common data structures for a restaurant described in Teh (2006a) or Blunsom et al. [sent-175, score-0.174]

64 This is correct because customers in the CRP are exchangeable (Teh, 2006a): we can imagine that customers and tables in S have been added to the restaurants last. [sent-177, score-0.488]

65 Each s ∈ S is a part of the seating arrangement in a restaurant, there being ts tables, the i-th of which has csi customers, with hs as the corresponding context. [sent-180, score-0.188]

66 A restaurant of context h and topic k has thkw tables serving dish w, the i-th of which has chkwi customers. [sent-181, score-0.572]

67 In (10), p(w|k0) corresponds to the first selected table, and the other term p(s|k0) is the seating arrangement of the other customers. [sent-184, score-0.205]

68 The likelihood for changing topic assignments across documents must also be considered, which is p(zS = k0 | z−S) and is decomposed as: p(zS = k0 | z−S) = ∏_j (n^{−S}_{jk0} + α_{k0})^{(n_j(S))} / (N^{−S}_j + Σ_k α_k)^{(n_j(S))}, (13) where nj(S) is the number of word tokens connected with S in document j. [sent-185, score-0.435]
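
Evaluating this product needs only per-document counts; a small sketch follows, assuming the superscript (n_j(S)) denotes the ascending factorial x^(n) = x(x+1)···(x+n−1), which is the standard form for moving a block of tokens in a Dirichlet-multinomial model, and assuming hypothetical field names for the counts.

```python
def ascending_factorial(x, n):
    # x^(n) = x * (x + 1) * ... * (x + n - 1); equals 1 when n == 0
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

def block_move_likelihood(k0, docs, alpha):
    """p(z_S = k0 | z_-S) as the product over documents in Eq. (13).
    docs: iterable of dicts with assumed keys
      'n_jk0' -- tokens of doc j currently in topic k0, excluding the block S
      'N_j'   -- topic-assigned tokens of doc j, excluding S
      'n_jS'  -- tokens of doc j connected with S (n_j(S) in the text)
    alpha: dict mapping topic id -> alpha_k."""
    alpha_sum = sum(alpha.values())
    p = 1.0
    for d in docs:
        p *= (ascending_factorial(d['n_jk0'] + alpha[k0], d['n_jS'])
              / ascending_factorial(d['N_j'] + alpha_sum, d['n_jS']))
    return p
```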

69 HIERARCHICAL We skip tables on restaurants of k = 0, because these tables are all from other topics and we cannot construct a block. [sent-186, score-0.375]

70 This problem is the same one addressed by Blunsom and Cohn (2011), and we follow the same approximation in which, when we calculate the probability, we fractionally add tables and customers recursively. [sent-189, score-0.221]

71 3 Inference of Hyperparameters We also place a prior on each hyperparameter and sample a value from the posterior distribution at every iteration. [sent-191, score-0.166]

72 We make the topic prior α asymmetric: α = βα0; β ∼ Gamma(1 , 1) , α0 ∼ Dir(1). [sent-194, score-0.281]
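
For reference, drawing α from this asymmetric prior takes only a few lines; the sketch shows the prior construction only (the paper resamples the hyperparameters from their posteriors at every iteration, which is not shown here).

```python
import random

def draw_alpha_prior(K):
    # alpha = beta * alpha0 with beta ~ Gamma(1, 1) and alpha0 ~ Dir(1, ..., 1)
    beta = random.gammavariate(1.0, 1.0)
    g = [random.gammavariate(1.0, 1.0) for _ in range(K)]
    s = sum(g)
    alpha0 = [x / s for x in g]
    return [beta * a0 for a0 in alpha0]

print(draw_alpha_prior(K=10))
```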

73 , 2005) is a composite model of HMM and LDA that assumes the words in a document are generated by HMM, where only one state has a document-specific topic distribution. [sent-196, score-0.319]

74 Other n-gram topic models have focused mainly on information retrieval. [sent-199, score-0.281]

75 (2007) is a topic model on automatically segmented chunks. [sent-205, score-0.281]

76 They also used switching variables, but for a different purpose: to determine the segmenting points. [sent-208, score-0.161]

77 Conventionally, this adaptation has relied on a heuristic combination of two separately trained models: an n-gram model p(w|h) and a topic model p(w|d). [sent-211, score-0.345]

78 , settings where we have to update the topic distribution as new inputs come in. [sent-216, score-0.345]

79 (d)–(f): Test perplexity of various 3-gram models as a function of number of topics on each corpus. [sent-231, score-0.158]

80 For BNC, we first randomly selected 400 documents from a written corpus and then split each document into smaller documents every 100 sentences, leading to 6,262 documents, from which we randomly selected 100 documents for testing; the others are used for training. [sent-238, score-0.185]

81 This is a product model of an n-gram model p(w|h) and a topic model p(w|d), where the n-gram model and the topic model are each trained separately and then combined by: p(w|h, d) ∝ ?
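
For intuition, a naive product-and-renormalize combination of the two models looks like the sketch below; the paper's actual combination and rescaling formula is elided above, so this simple form is an assumption for illustration only.

```python
def product_model_prob(w, h, d, ngram_prob, topic_prob, vocab):
    """Renormalized product of an n-gram model p(w|h) and a topic model p(w|d);
    ngram_prob and topic_prob are assumed callables defined over the vocabulary."""
    z = sum(ngram_prob(v, h) * topic_prob(v, d) for v in vocab)
    return ngram_prob(w, h) * topic_prob(w, d) / z
```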

82 2 Effects of Table-based Sampler We first evaluate the effects of our blocked sampler at training. [sent-252, score-0.293]

83 On all corpora, the model with the table-based sampler reached a higher probability space much faster on both 3-gram and 4-gram models. [sent-255, score-0.215]

84 3 Perplexity Results Training For burn-in, we ran the sampler as follows: For HPYLM, we ran 100 Gibbs iterations. [sent-257, score-0.215]

85 For all other models, we ran 500 Gibbs iterations; HPYTMtoken is trained with only the token-based sampler, while for the other models the table-based sampler is performed after the token-based sampler. [sent-259, score-0.215]

86 Evaluation We have to adapt to the topic distribution of unseen documents incrementally. [sent-260, score-0.394]

87 , 2009), which is a kind of particle filter updating the posterior topic distribution of a test document. [sent-262, score-0.412]
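
A single-particle, left-to-right sketch of this kind of incremental adaptation is given below; the real evaluation uses a proper particle filter with multiple particles, and topic_word_prob is an assumed interface to the trained topic n-gram models, so treat this as a simplified illustration.

```python
import math
import random

def incremental_log_likelihood(words, contexts, K, alpha, topic_word_prob):
    """Process a test document left to right: predict each word from the current
    topic posterior, then sample its topic and update the running counts
    (a one-particle approximation; topic_word_prob(k, w, h) is assumed)."""
    counts = [0.0] * K
    alpha_sum = sum(alpha)
    total = 0.0
    for i, (w, h) in enumerate(zip(words, contexts)):
        theta = [(counts[k] + alpha[k]) / (i + alpha_sum) for k in range(K)]
        p_w = sum(theta[k] * topic_word_prob(k, w, h) for k in range(K))
        total += math.log(p_w)
        post = [theta[k] * topic_word_prob(k, w, h) for k in range(K)]
        z = random.choices(range(K), weights=post)[0]
        counts[z] += 1.0
    return total   # exp(-total / len(words)) gives per-word perplexity
```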

88 One reason for this result might be the mismatch of the predicted topic distribution in the HIERARCHICAL model. [sent-286, score-0.281]

89 The HIERARCHICAL model must allocate some (not global) topic to every word in a document, so even the words to which the SWITCHING model might allocate the global topic (mainly function words; see below) must be allocated to some other topics, causing a mismatch in topic allocations. [sent-287, score-0.535]

90 4 Qualitative Results To observe the behavior in which the SWITCHING allocates some words to the global topic, in Figure 5, we show the posterior of allocating the topic 0 or not at each word in a part of the NIPS training corpus. [sent-289, score-0.405]

91 We can see that the model elegantly identified content and function words, learning the topic distribution appropriately using only semantic contexts. [sent-290, score-0.345]

92 Contexts that might be likely to precede nouns have a higher value of λh. Figure 5: The posterior for assigning topic 0 or not in NIPS by the ∞-gram SWITCHING. [sent-292, score-0.348]

93 Darker words indicate a higher probability of not being assigned topic 0. [sent-293, score-0.281]

94 The ∞-gram extension gives us the posterior of the n-gram order p(n|h), which can be used to calculate the probability of a word w with n-gram order n composing a phrase in topic k as p(w, n|k, h) ∝ p(n|h) p(w|k, n, h). [sent-299, score-0.348]

95 7 Conclusion We have presented modeling and algorithmic contributions to the existing Bayesian n-gram topic model. [sent-301, score-0.281]

96 A hierarchical Pitman-Yor process HMM for unsupervised part of speech induction. [sent-315, score-0.158]

97 A note on the implementation of hierarchical Dirichlet processes. [sent-322, score-0.174]

98 Unsupervised language model adaptation based on topic and role information in multiparty meetings. [sent-344, score-0.345]

99 A hierarchical Bayesian language model based on Pitman-Yor processes. [sent-401, score-0.213]

100 Topical n-grams: Phrase and topic discovery, with an application to information retrieval. [sent-417, score-0.281]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ghk', 0.294), ('zji', 0.294), ('topic', 0.281), ('sampler', 0.215), ('pyp', 0.204), ('gkh', 0.176), ('hpylm', 0.176), ('restaurant', 0.174), ('wallach', 0.167), ('switching', 0.161), ('customers', 0.143), ('hji', 0.137), ('lji', 0.137), ('seating', 0.137), ('hierarchical', 0.124), ('wji', 0.124), ('restaurants', 0.124), ('hpytm', 0.117), ('wood', 0.116), ('zs', 0.112), ('hpytmtoken', 0.098), ('topics', 0.095), ('teh', 0.094), ('bayesian', 0.089), ('ji', 0.085), ('bnc', 0.082), ('topicality', 0.078), ('blocked', 0.078), ('tam', 0.078), ('tables', 0.078), ('block', 0.072), ('arrangement', 0.068), ('rescaling', 0.068), ('nj', 0.067), ('posterior', 0.067), ('distribution', 0.064), ('adaptation', 0.064), ('perplexity', 0.063), ('customer', 0.061), ('gildea', 0.06), ('gibbs', 0.058), ('global', 0.057), ('generates', 0.054), ('depth', 0.052), ('whye', 0.052), ('arrangements', 0.051), ('exclusive', 0.051), ('bigram', 0.051), ('dp', 0.051), ('contexts', 0.05), ('dirichlet', 0.05), ('documents', 0.049), ('crp', 0.047), ('schultz', 0.047), ('lda', 0.046), ('generative', 0.045), ('suffix', 0.045), ('hofmann', 0.044), ('unigram', 0.044), ('brown', 0.044), ('mochihashi', 0.043), ('sparseness', 0.043), ('gu', 0.041), ('yee', 0.041), ('collapsed', 0.041), ('nips', 0.04), ('actkhk', 0.039), ('bthkw', 0.039), ('chw', 0.039), ('dchir', 0.039), ('dhpyp', 0.039), ('diro', 0.039), ('doubly', 0.039), ('ekha', 0.039), ('ghzjji', 0.039), ('lindsey', 0.039), ('minima', 0.039), ('rant', 0.039), ('sks', 0.039), ('tiyn', 0.039), ('trhee', 0.039), ('vvw', 0.039), ('served', 0.039), ('blunsom', 0.038), ('griffiths', 0.038), ('document', 0.038), ('extension', 0.037), ('gram', 0.037), ('sampled', 0.037), ('conditioning', 0.036), ('place', 0.035), ('ch', 0.034), ('allocated', 0.034), ('allocate', 0.034), ('tanja', 0.034), ('process', 0.034), ('incremental', 0.033), ('hidden', 0.032), ('ngrams', 0.031), ('samplers', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000017 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that the n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

2 0.15288669 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.

3 0.11107975 202 emnlp-2013-Where Not to Eat? Improving Public Policy by Predicting Hygiene Inspections Using Online Reviews

Author: Jun Seok Kang ; Polina Kuznetsova ; Michael Luca ; Yejin Choi

Abstract: This paper offers an approach for governments to harness the information contained in social media in order to make public inspections and disclosure more efficient. As a case study, we turn to restaurant hygiene inspections which are done for restaurants throughout the United States and in most of the world and are a frequently cited example of public inspections and disclosure. We present the first empirical study that shows the viability of statistical models that learn the mapping between textual signals in restaurant reviews and the hygiene inspection records from the Department of Public Health. The learned model achieves over 82% accuracy in discriminating severe – offenders from places with no violation, and provides insights into salient cues in reviews that are indicative of the restaurant’s sanitary conditions. Our study suggests that public disclosure policy can be improved by mining public opinions from social media to target inspections and to provide alternative forms of disclosure to customers.

4 0.10861824 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

Author: John Philip McCrae ; Philipp Cimiano ; Roman Klinger

Abstract: Cross-lingual topic modelling has applications in machine translation, word sense disambiguation and terminology alignment. Multilingual extensions of approaches based on latent (LSI), generative (LDA, PLSI) as well as explicit (ESA) topic modelling can induce an interlingual topic space allowing documents in different languages to be mapped into the same space and thus to be compared across languages. In this paper, we present a novel approach that combines latent and explicit topic modelling approaches in the sense that it builds on a set of explicitly defined topics, but then computes latent relations between these. Thus, the method combines the benefits of both explicit and latent topic modelling approaches. We show that on a crosslingual mate retrieval task, our model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.

5 0.10438029 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang

Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.

6 0.100346 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

7 0.095144905 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

8 0.093806639 121 emnlp-2013-Learning Topics and Positions from Debatepedia

9 0.084753826 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

10 0.080108374 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication

11 0.075625129 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

12 0.074738368 138 emnlp-2013-Naive Bayes Word Sense Induction

13 0.071533546 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

14 0.068563968 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

15 0.064339146 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation

16 0.063351616 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

17 0.062326577 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

18 0.061980616 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

19 0.06183356 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities

20 0.060751561 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.185), (1, 0.014), (2, -0.079), (3, 0.014), (4, -0.028), (5, -0.049), (6, 0.065), (7, 0.044), (8, -0.039), (9, -0.089), (10, -0.079), (11, -0.221), (12, -0.14), (13, 0.081), (14, 0.051), (15, 0.115), (16, 0.211), (17, -0.039), (18, -0.054), (19, -0.096), (20, -0.033), (21, 0.017), (22, -0.036), (23, 0.147), (24, -0.056), (25, 0.058), (26, 0.006), (27, 0.014), (28, 0.003), (29, 0.064), (30, 0.044), (31, -0.087), (32, -0.022), (33, 0.036), (34, -0.092), (35, -0.009), (36, -0.01), (37, 0.03), (38, -0.07), (39, 0.087), (40, -0.09), (41, -0.027), (42, -0.106), (43, 0.095), (44, 0.024), (45, 0.071), (46, -0.074), (47, -0.032), (48, 0.139), (49, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95420343 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that the n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

2 0.70103908 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.

3 0.62469494 121 emnlp-2013-Learning Topics and Positions from Debatepedia

Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith

Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.

4 0.57432276 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

Author: James Foulds ; Padhraic Smyth

Abstract: When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.

5 0.56819546 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang

Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.

6 0.53366745 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students

7 0.50672054 138 emnlp-2013-Naive Bayes Word Sense Induction

8 0.50668478 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

9 0.50316244 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication

10 0.46030274 94 emnlp-2013-Identifying Manipulated Offerings on Review Portals

11 0.45648423 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

12 0.45189661 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities

13 0.40104201 176 emnlp-2013-Structured Penalties for Log-Linear Language Models

14 0.39497393 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

15 0.38844791 202 emnlp-2013-Where Not to Eat? Improving Public Policy by Predicting Hygiene Inspections Using Online Reviews

16 0.38630089 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

17 0.38430712 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation

18 0.38159588 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

19 0.37665337 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

20 0.36179629 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.023), (9, 0.404), (18, 0.034), (22, 0.042), (30, 0.076), (45, 0.015), (50, 0.017), (51, 0.149), (66, 0.047), (71, 0.023), (75, 0.029), (77, 0.024), (96, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85741115 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues

Author: Matt Gardner ; Partha Pratim Talukdar ; Bryan Kisiel ; Tom Mitchell

Abstract: Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRAbased approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.

2 0.83552212 182 emnlp-2013-The Topology of Semantic Knowledge

Author: Jimmy Dubuisson ; Jean-Pierre Eckmann ; Christian Scheible ; Hinrich Schutze

Abstract: Studies of the graph of dictionary definitions (DD) (Picard et al., 2009; Levary et al., 2012) have revealed strong semantic coherence of local topological structures. The techniques used in these papers are simple and the main results are found by understanding the structure of cycles in the directed graph (where words point to definitions). Based on our earlier work (Levary et al., 2012), we study a different class of word definitions, namely those of the Free Association (FA) dataset (Nelson et al., 2004). These are responses by subjects to a cue word, which are then summarized by a directed, free association graph. We find that the structure of this network is quite different from both the Wordnet and the dictionary networks. This difference can be explained by the very nature of free association as compared to the more “logical” construction of dictionaries. It thus sheds some (quantitative) light on the psychology of free association. In NLP, semantic groups or clusters are interesting for various applications such as word sense disambiguation. The FA graph is tighter than the DD graph, because of the large number of triangles. This also makes drift of meaning quite measurable so that FA graphs provide a quantitative measure of the semantic coherence of small groups of words.

same-paper 3 0.77689737 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models

Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao

Abstract: One of the language phenomena that the n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.

4 0.48653218 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

5 0.47965476 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

Author: He He ; Hal Daume III ; Jason Eisner

Abstract: Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.

6 0.45183375 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

7 0.45055693 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

8 0.45016387 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

9 0.44957241 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression

10 0.44861326 138 emnlp-2013-Naive Bayes Word Sense Induction

11 0.44770586 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

12 0.44676501 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

13 0.44668713 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

14 0.44502285 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

15 0.44131365 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

16 0.4406971 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

17 0.43110177 149 emnlp-2013-Overcoming the Lack of Parallel Data in Sentence Compression

18 0.42966911 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation

19 0.42831659 121 emnlp-2013-Learning Topics and Positions from Debatepedia

20 0.42757854 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections