emnlp emnlp2012 emnlp2012-89 knowledge-graph by maker-knowledge-mining

89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling


Source: pdf

Author: Michael J. Paul

Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We extend this approach to allow each block of text to be a mixture of multiple classes. [sent-3, score-0.572]

2 Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. [sent-4, score-0.839]

3 We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. [sent-5, score-0.458]

4 Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models. [sent-6, score-0.289]

5 Compared to the naive approach of treating conversations as flat documents, models which include conversation structure have been shown to improve tasks such as forum search (Elsas and Carbonell, 2009; Seo et al. [sent-9, score-0.317]

6 , 2010), where each state corresponds to a conversational class or “act. [sent-18, score-0.285]

7 ” Unlike more traditional uses of HMMs in which a single token is emitted per time step, HMM emissions in conversations correspond to entire blocks of text, such that an entire message is generated at each step. [sent-19, score-0.522]

8 Because each time step is associated with a block of variables, we refer to this type of HMM as a block HMM (Fig. [sent-20, score-1.084]

9 While block HMMs offer a concise model of inter-message structure, they have the limitation that each text block (message) belongs to exactly one class. [sent-22, score-1.084]

10 An email might contain a request, a question, and an answer to a previous question: three distinct dialog acts within a single message. [sent-28, score-0.273]

11 This figure depicts the Bayesian variant of the block HMM (Ritter et al. [sent-34, score-0.542]

12 We generalize the block HMM approach so that there is no longer a one-to-one correspondence between states in the Markov chain and latent dis- course classes. [sent-37, score-0.697]

13 Instead, we allow a state in the HMM to correspond to a mixture of many classes: we refer to this family of models as mixed membership Markov models (M4). [sent-38, score-0.291]

14 Instead of defining explicit transition probabilities from one class to another as in a traditional HMM, we define the distribution over classes as a function of the entire histogram of class assignments of the previous text segment. [sent-39, score-0.845]

15 While we introduce a general model, we will focus on the task of unsupervised conversation modeling. [sent-41, score-0.268]

16 Specifically, we build off the Bayesian block HMMs used by Ritter et al. [sent-42, score-0.542]

17 95 2 M4: Mixed Membership Markov Models In this section, we extend the block HMM by introducing mixed membership Markov models (M4). [sent-48, score-0.732]

18 Under the block HMM, as utilized by Ritter et al. [sent-49, score-0.542]

19 (2010), messages in a conversation flow according to a Markov process, where the words of messages are generated according to language models associated with a state in a hidden Markov model. [sent-50, score-0.571]

20 The intuition is that HMM states should correspond to some notion of a conversation “act” such as QUESTION or ANSWER. [sent-51, score-0.3]

21 The intuition is the same under M4, but now each token in a message is given its own class assignment, according to a class distribution for that particular message. [sent-52, score-0.635]

22 A message’s class distribution depends on the class assignments of the previous message, yielding a model that retains sequential dependencies between messages, while allowing for finer grained class allocation than the block HMM. [sent-53, score-1.345]

23 Modeling messages (or more generally, text blocks) as a mixture of multiple classes rather than a single class gives rise to the “mixed membership” property. [sent-54, score-0.456]

24 Thus, a text block is a set of tokens, while a document consists of the discourse graph and all blocks associated with it. [sent-59, score-0.815]

25 In the context of modeling conversation threads, which will be the focus of our experiments later, we will assume a block corresponds to a single message in a thread. [sent-60, score-0.988]

26 The parent of a message m is the message to which it is a response; if a message is not in response to anything in particular, then it has no parent. [sent-61, score-0.628]

27 The graph as a whole may be a forest: for example, someone could write a new message in a conversation that is not directly in reply to any previous message, so this message would not have any parents, and would form the root of a new tree in the forest. [sent-69, score-0.652]

28 2 Generative Story Extending the block HMM, latent classes in M4 are now associated with each individual token, rather than one class for an entire block. [sent-71, score-0.95]

29 The key difference between the generative process behind M4 and the block HMM is that the transition distributions are defined with a log-linear model, which uses class assignments in a block as features to define the distribution over classes for the children of that block. [sent-72, score-1.741]

30 Put another way, a state in M4 corresponds to a class histogram, and transitions between states are functions of the log-linear parameters. [sent-73, score-0.346]

31 Given a block b, we will use the notation b to denote the block’s feature vector, which consists of the histogram of latent class assignments for the tokens of b. [sent-74, score-1.027]

32 Additionally, we assume each feature vector has an extra cell containing an indicator denoting whether the block has no parent; this allows us to learn transitions from a “start” state. [sent-76, score-0.702]
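
As a concrete illustration of the feature vector just described, here is a minimal sketch (not the authors' code; numpy and the function name are assumptions, and the layout of histogram, no-parent indicator, and bias cell is this summary's reading of the K×(K+2) transition matrix mentioned below):

```python
import numpy as np

def block_feature_vector(z_assignments, K, has_parent=True, normalize=True):
    """Length-(K+2) feature vector for a block: class histogram,
    no-parent ("start") indicator, and a constant bias cell."""
    hist = np.bincount(np.asarray(z_assignments, dtype=int), minlength=K).astype(float)
    if normalize and hist.sum() > 0:
        hist /= hist.sum()   # the experiments section below normalizes histograms to sum to 1
    no_parent = 0.0 if has_parent else 1.0
    return np.concatenate([hist, [no_parent, 1.0]])
```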

33 The features are weighted by transition parameters, denoted Λ. The random variable z denotes a latent class, and φz is a discrete distribution over word types; that is, each class is associated with a unigram language model. [sent-84, score-0.489]

34 The transition distribution over classes is denoted π, which is given in terms of Λ and the feature vector of the parent block. [sent-85, score-0.417]

35 For each (j, k) in the transition matrix Λ of size K × (K+2): (a) Draw transition weight λjk ∼ N(0, σ²). [sent-88, score-0.266]

36 For each block b of each document d in D: (a) Set class probability πbj = exp(λjᵀa) / Σj′ exp(λj′ᵀa) for all classes j, where a is the feature vector for block a, the parent of b. [sent-92, score-1.499]
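
A minimal sketch of step (a) as reconstructed above: the class distribution of block b is a softmax of the transition weights applied to the parent's feature vector. The function and variable names (Lambda, a) are illustrative, not the paper's API.

```python
import numpy as np

def transition_distribution(Lambda, a):
    """pi_b[j] = exp(lambda_j . a) / sum_j' exp(lambda_j' . a),
    where Lambda has shape (K, K+2) and a is the parent's feature vector."""
    scores = Lambda @ a
    scores = scores - scores.max()   # subtract the max for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()
```

With the feature layout sketched earlier, a block with no parent gets its distribution from the “start” indicator and bias columns of Λ.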

37 each message in a conversation), the distribution over classes π is computed as a function of the feature vector of the block’s parent and the transition parameters (feature weights) Λ. [sent-99, score-0.601]

38 Each λjk has an intuitive interpretation: a positive value means that the occurrence of class k in a parent block increases the probability that j will appear in the next block, while a negative value reduces this probability. [sent-100, score-0.785]

39 The observed words of each block are generated by repeatedly sampling classes from the block’s distribution π, and for each sampled class z, a single word is sampled from the class-specific distribution over words φz. [sent-101, score-1.047]

40 In contrast, under the block HMM, a class z is sampled once from the transition distribution, and words are repeatedly sampled from φz. [sent-102, score-0.842]
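
The contrast drawn in the two sentences above can be summarized in a short sketch (illustrative only; phi is assumed to be a K×V matrix of class-specific unigram distributions and pi the block's class distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

def emit_block_m4(pi, phi, n_tokens):
    """M4: draw a fresh class z for every token, then a word from phi[z]."""
    tokens = []
    for _ in range(n_tokens):
        z = rng.choice(len(pi), p=pi)
        w = rng.choice(phi.shape[1], p=phi[z])
        tokens.append((w, z))
    return tokens

def emit_block_bhmm(pi, phi, n_tokens):
    """Block HMM: draw one class z for the whole block, then all words from phi[z]."""
    z = rng.choice(len(pi), p=pi)
    words = rng.choice(phi.shape[1], p=phi[z], size=n_tokens)
    return list(words), z
```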

41 The graphical diagram is shown in Figure 1 along with the block HMM and LDA. [sent-104, score-0.542]

42 This figure shows how M4 combines the sequential dependencies of the block HMM with the token-specific class assignments of LDA. [sent-105, score-0.905]

43 3 Discussion Like the block HMM, M4 is a type of HMM. [sent-107, score-0.542]

44 A latent sequence under M4 forms a Markov chain in which a state corresponds to a histogram of classes. [sent-108, score-0.286]

45 ) If we assume a priori that the length of a block is unbounded, then this state space is ℕ^K, where 0 ∈ ℕ. [sent-110, score-0.585]

46 The probability of transitioning from a state b ∈ ℕ^K to another state b̃ ∈ ℕ^K is: P(b → b̃) ∝ ζN · Multinomial(b̃ | π(b), N) (1), where N = Σk b̃k, ζN is the probability that a block has N tokens, and π(b) is the transition distribution given a vector b. [sent-111, score-0.796]

47 This follows from the generative story defined above, with an additional step of generating the number of tokens N from the distribution ζ. We currently define a block b’s distribution πb in terms of the discrete feature vector a given by its parent a. [sent-112, score-0.811]

48 However, as a generative story we believe it makes more sense for a block’s distribution to depend on the actual class values which are emitted by the parent. [sent-114, score-0.276]

49 Under a block HMM with one class per block, there are K states corresponding to the K classes. [sent-116, score-0.753]

50 We derive a stochastic EM algorithm in which we alternate between sampling class assignments for the word tokens and optimizing the transition parameters, outlined in the following two subsections. [sent-126, score-0.487]
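
A schematic sketch of the stochastic EM loop just described. The two helper arguments are placeholders for the Gibbs step and gradient sketched further below; the plain gradient step and learning rate are simplifications, since this summary does not specify the optimizer used.

```python
def stochastic_em(corpus, Lambda, sample_token_classes, grad_lambda,
                  n_iters=100, lr=0.1):
    """Alternate between resampling token class assignments and
    taking a gradient step on the transition parameters Lambda."""
    for _ in range(n_iters):
        for block in corpus:                  # sampling step over all word tokens
            sample_token_classes(block, Lambda)
        Lambda = Lambda + lr * grad_lambda(corpus, Lambda)   # optimization step
    return Lambda
```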

51 1 Latent Class Sampling To explore the posterior distribution over latent classes, we use a collapsed Gibbs sampler such that we marginalize out each word multinomial φ and only need to sample the token assignments z conditioned on each other. [sent-128, score-0.359]

52 a is the parent block of b, and C is the set of b’s children. [sent-132, score-0.618]

53 is the class histogram plus the bias feature), where the histogram includes the incremented count of the candidate class k. [sent-135, score-0.311]

54 The rightmost term is a result of the dependency of the child blocks (C) on the class assignments of b. [sent-137, score-0.416]
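
Putting the sampling description above together, a simplified sketch of the collapsed Gibbs conditional for one token follows. The word term uses a symmetric Dirichlet smoothing parameter beta (an assumption of this sketch), the prior term is the parent-conditioned distribution, and the rightmost children term recomputes each child's distribution with the candidate class k added to b's histogram. Variable names and the feature layout are assumptions, not the paper's notation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_z(w, nkw, nk, beta, V, Lambda, a_parent, b_counts, child_class_counts, rng):
    """Resample the class of one token with word id w.
    nkw, nk            : collapsed word/class counts with this token removed
    a_parent           : feature vector of b's parent (histogram, start, bias)
    b_counts           : class histogram of b with this token removed
    child_class_counts : one class-count vector per child block of b"""
    K = len(nk)
    prior = softmax(Lambda @ a_parent)                    # pi for block b
    probs = np.empty(K)
    for k in range(K):
        lik = (nkw[k, w] + beta) / (nk[k] + V * beta)     # word term, phi marginalized out
        feats = np.concatenate([b_counts.astype(float), [0.0, 1.0]])
        feats[k] += 1.0                                   # candidate class enters b's histogram
        feats[:K] /= feats[:K].sum()
        pc = softmax(Lambda @ feats)                      # children's distribution under candidate k
        child = 1.0
        for c_counts in child_class_counts:
            child *= np.prod(pc ** c_counts)
        probs[k] = prior[k] * lik * child
    probs /= probs.sum()
    return rng.choice(K, p=probs)
```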

55 speech acts or discourse states) will be reasonably small. [sent-141, score-0.312]

56 If it is desired to have a large number of latent topics as is common in LDA, this model could be combined with a standard topic model without sequential dependencies, as explored by Ritter et al. [sent-142, score-0.362]

57 … − λzk/σ² (3), where a is the parent of block b, a is the feature vector associated with a, nbz is the number of times class z occurs in block b, and nb is the total number of tokens in block b. [sent-147, score-1.942]
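
The quantities named above (nbz, nb, the parent features a, and the Gaussian prior term −λzk/σ²) are consistent with the standard log-linear gradient, sketched below under that assumption; this is a reconstruction, not the paper's stated derivation.

```python
import numpy as np

def grad_lambda(blocks, Lambda, sigma2):
    """blocks: list of (a, n_bz) pairs, where a is the parent's feature vector
    and n_bz[z] counts class z in block b. Returns a gradient with Lambda's shape."""
    grad = -Lambda / sigma2                      # Gaussian prior term: -lambda_zk / sigma^2
    for a, n_bz in blocks:
        scores = Lambda @ a
        pi = np.exp(scores - scores.max())
        pi /= pi.sum()
        n_b = n_bz.sum()
        grad += np.outer(n_bz - n_b * pi, a)     # data term: (n_bz - n_b * pi_z) * a_k
    return grad
```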

58 (2000) used HMMs as a general model of discourse with an application to speech acts (or dialog acts) in conversations. [sent-152, score-0.366]

59 The authors also experimented with incorporating a topic model on top of the HMM to distinguish speech acts from topical clusters, with mixed results. [sent-159, score-0.455]

60 paragraph) in a document is associated with a latent class or topic, and the ordering of topics within a document is modeled as a deviation from some canonical ordering. [sent-169, score-0.397]

61 Extensions to the block HMM have incorporated mixed membership properties within blocks, notably the Markov Clustering Topic Model (Hospedales et al. [sent-170, score-0.732]

62 , 2009), which allows each HMM state to be associated with its own distribution over topics in a topic model. [sent-171, score-0.303]

63 Like the block HMM, this still assumes a relatively small number of HMM states, but with an extra layer of latent variables before the observations are emitted. [sent-172, score-0.679]

64 Decoupling HMM states from latent classes was considered by Beal et al. [sent-174, score-0.285]

65 The Factorial HMM is most often used to model independent Markov chains, whereas M4 has a dense graphical model topology: the probability of each of the latent classes depends on the counts of all of the classes in the previous block. [sent-176, score-0.371]

66 In topic models, log-linear formulations of latent class distributions are utilized in correlated topic models (Blei and Lafferty, 2007) as a means of incorporating covariance structure among topic probabilities. [sent-178, score-0.719]

67 In M4, such features would correspond to the class histograms of previous blocks, introducing additional dependencies between documents. [sent-180, score-0.266]

68 , 2010), which models a document as a sequence of segments (such as paragraphs) governed by a Pitman-Yor process, in which the latent topic distribution of one segment serves as the base distribution for the next segment. [sent-182, score-0.542]

69 of our work, where the latent classes in a segment depend on the class distribution of the previous segment. [sent-184, score-0.546]

70 This corpus includes 321 threads and a total of 1309 messages, with an average message length of 78 tokens after preprocessing. [sent-195, score-0.304]

71 We consider 36K conversation threads for a total of 100K messages with average length 13. [sent-198, score-0.44]

72 (2010), the model we refer to as the block HMM (BHMM), and we consider this our primary baseline. [sent-205, score-0.542]

73 ) We also compare against LDA, which makes latent assignments at the token level, but blocks of text are independent of one another. (Three messages in this corpus have multiple parents.) [sent-207, score-0.489]

74 In other words, BHMM models sequential dependencies but allows only single-class membership, whereas LDA uses no sequence information but has a mixed membership property. [sent-210, score-0.314]

75 We use standard Gibbs samplers for both baseline models, and we optimize the Dirichlet hyperparameters (for the transition and topic distributions) using Minka’s fixed-point iterations (2003). [sent-212, score-0.28]

76 speech acts) of the conversation data we consider here. [sent-216, score-0.289]

77 We instead handle this by extending our model to include a “background” distribution over words which is independent of the latent classes in a document; this was also done by Wang et al. [sent-217, score-0.319]

78 The idea is to introduce a binary switching variable x into the model which determines whether a word is generated from the general background distribution or from the distribution specific to a latent class z. [sent-219, score-0.504]
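
A minimal sketch of the switching variable x just described (illustrative names; p_bg is the probability of choosing the background distribution and phi_bg the background unigram distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

def emit_word(z, phi, phi_bg, p_bg):
    """Draw one word: from the background distribution if x = 1,
    otherwise from the class-specific distribution phi[z]."""
    x = rng.random() < p_bg
    dist = phi_bg if x else phi[z]
    w = rng.choice(len(dist), p=dist)
    return w, int(x)
```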

79 When computing the transition distributions for M4, we normalize the class histograms so that the counts sum to 1. [sent-239, score-0.393]

80 We run the sampler for 500 iterations using the word distributions and transition parameters learned during training; we compute the average perplexity from the final ten sampling iterations. [sent-246, score-0.315]
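
The evaluation step above could be computed as in the following hedged sketch, which uses the standard perplexity definition; whether per-iteration perplexities or log-likelihoods are averaged is not stated in this summary.

```python
import numpy as np

def average_perplexity(loglik_per_iter, n_tokens, last=10):
    """Average exp(-log-likelihood / token count) over the final `last` iterations."""
    lls = np.asarray(loglik_per_iter[-last:], dtype=float)
    return float(np.mean(np.exp(-lls / n_tokens)))
```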

81 Each block’s topic distribution is stochastically generated with LDA, whereas in the two sequence models, the distribution over classes is simply a deterministic function of the previous block. [sent-250, score-0.459]

82 Implementations of both M4 and the block HMM will be available at http://cs.

83 In the conversation domain, this corresponds to the task of thread reconstruction (Yeh and Harnly, 2006; Wang et al. [sent-263, score-0.449]

84 We do this by treating the parent of each block as a hidden variable to be inferred. [sent-270, score-0.66]

85 The parent of block b is the random variable rb, and we alternate between sampling values of the latent classes z and the parents r. [sent-271, score-0.911]

86 … nbj/τ (5). For each conversation thread, any message is a candidate for the parent of block b (except b itself), including the dummy “start” block. [sent-275, score-1.03]
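
A hedged sketch of the thread-reconstruction step described above: the parent r_b of block b is resampled in proportion to how well each candidate (including the dummy “start” block) predicts b's class counts nbj, tempered by τ. The exact form of Eq. (5) is only partially visible in this summary, so the scoring below is an approximation.

```python
import numpy as np

def sample_parent(b_counts, candidate_features, Lambda, tau, rng):
    """b_counts[j] = n_bj for block b; candidate_features holds one feature
    vector per candidate parent (the dummy start block included)."""
    scores = []
    for a in candidate_features:
        s = Lambda @ a
        pi = np.exp(s - s.max())
        pi /= pi.sum()
        scores.append(np.sum((b_counts / tau) * np.log(pi)))   # tempered log prob of b's counts
    scores = np.asarray(scores)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return rng.choice(len(candidate_features), p=probs)
```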

87 In the case of conversation data, our hope is that some of the latent classes represent speech acts or dialog acts (Searle, 1975). [sent-297, score-0.926]

88 We again see patterns that we might expect to find in social media conversations, and some classes appear to correspond to speech acts such as declarations, personal questions, and replies. [sent-329, score-0.449]

89 Conversely, the class containing URLs (which corresponds to the act of sharing news or media) is likely to begin a thread, but is not likely to follow other classes except itself. [sent-331, score-0.418]

90 How well unsupervised models can truly capture speech acts is an open question. [sent-332, score-0.272]

91 , 2009), the conversation classes inferred by unsupervised sequence models are similarly unlikely to be a perfect fit to human-assigned classes. [sent-334, score-0.424]

92 6 Conclusion We have presented mixed membership Markov models (M4), which extend the simple HMM approach to discourse modeling by positing class assignments at the level of individual tokens. [sent-337, score-0.535]

93 This allows blocks of text to belong to potentially multiple classes, a property that relates M4 to topic models. [sent-338, score-0.298]

94 Standard topic model extensions such as n-gram models (Wallach, 2006) can straightforwardly be applied here, and indeed we already applied such an extension by incorporating background distributions in §5. [sent-342, score-0.254]

95 into sentences) and constrain each segment to belong to one class or speech act; modifications along these lines have been applied to topic models as well (Gruber et al. [sent-346, score-0.435]

96 While we have focused on conversation modeling, M4 is a general probabilistic model that could be applied to other discourse applications, for example modeling sentences or paragraphs in articles rather than messages in conversations; it could also be applied to data beyond text. [sent-348, score-0.437]

97 Compared to a Bayesian block HMM, M4 performs much better at a variety of tasks. [sent-349, score-0.542]

98 Another potential avenue of future work is to model transitions such that a Dirichlet prior for the class distribution of a block, rather than the class distribution itself, depends on the previous class assignments. [sent-352, score-0.715]

99 Sequential latent dirichlet allocation: Discover underlying topic structures within a document. [sent-433, score-0.336]

100 Classifying sentences as speech acts in message board posts. [sent-486, score-0.416]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('block', 0.542), ('hmm', 0.234), ('conversation', 0.228), ('message', 0.184), ('acts', 0.171), ('lda', 0.168), ('class', 0.167), ('blocks', 0.151), ('topic', 0.147), ('ritter', 0.144), ('thread', 0.141), ('transition', 0.133), ('classes', 0.13), ('messages', 0.129), ('membership', 0.114), ('latent', 0.111), ('bhmm', 0.107), ('cnet', 0.107), ('assignments', 0.098), ('conversations', 0.089), ('act', 0.087), ('threads', 0.083), ('discourse', 0.08), ('distribution', 0.078), ('dirichlet', 0.078), ('hmms', 0.077), ('mixed', 0.076), ('parent', 0.076), ('histogram', 0.072), ('sequential', 0.069), ('twitter', 0.069), ('markov', 0.068), ('speech', 0.061), ('segment', 0.06), ('blei', 0.06), ('transitions', 0.058), ('reply', 0.056), ('dialog', 0.054), ('sampling', 0.052), ('distributions', 0.051), ('email', 0.048), ('factorial', 0.046), ('perplexity', 0.046), ('reconstruction', 0.046), ('states', 0.044), ('predictive', 0.043), ('state', 0.043), ('document', 0.042), ('histograms', 0.042), ('switching', 0.042), ('hidden', 0.042), ('conversational', 0.041), ('wang', 0.04), ('unsupervised', 0.04), ('token', 0.039), ('mimno', 0.039), ('tokens', 0.037), ('elsas', 0.036), ('hospedales', 0.036), ('mpaul', 0.036), ('nbz', 0.036), ('pjt', 0.036), ('qadir', 0.036), ('searle', 0.036), ('surendran', 0.036), ('topics', 0.035), ('corresponds', 0.034), ('social', 0.033), ('sampler', 0.033), ('tom', 0.031), ('bayesian', 0.031), ('variation', 0.031), ('timothy', 0.031), ('joty', 0.031), ('emitted', 0.031), ('carol', 0.031), ('asynchronous', 0.031), ('jk', 0.031), ('chemudugunta', 0.031), ('diehl', 0.031), ('hongning', 0.031), ('ros', 0.031), ('seo', 0.031), ('sigir', 0.03), ('mixture', 0.03), ('griffiths', 0.03), ('dependencies', 0.029), ('goldwater', 0.029), ('quantitatively', 0.029), ('correspond', 0.028), ('gruber', 0.028), ('straightforwardly', 0.028), ('bangalore', 0.028), ('beal', 0.028), ('background', 0.028), ('allocation', 0.028), ('rb', 0.028), ('sequence', 0.026), ('extra', 0.026), ('media', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

Author: Michael J. Paul

Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.

2 0.2198232 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model

Author: Lan Du ; Wray Buntine ; Huidong Jin

Abstract: Topic models are increasingly being used for text analysis tasks, often times replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model with the ability to adapt topics from both the previous segment and the parent document. For this proposed model, a Gibbs sampler is developed for doing posterior inference. Experimental results show that with topic adaptation, our model significantly improves over existing approaches in terms of perplexity, and is able to uncover clear sequential structure on, for example, Herman Melville’s book “Moby Dick”.

3 0.14670415 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

Author: Robert Lindsey ; William Headden ; Michael Stipicevic

Abstract: Topic models traditionally rely on the bagof-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bagof-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.

4 0.14288691 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

Author: Fei Huang ; Alexander Yates

Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.

5 0.12749316 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

Author: Jennifer Gillenwater ; Alex Kulesza ; Ben Taskar

Abstract: We propose a novel probabilistic technique for modeling and extracting salient structure from large document collections. As in clustering and topic modeling, our goal is to provide an organizing perspective into otherwise overwhelming amounts of information. We are particularly interested in revealing and exploiting relationships between documents. To this end, we focus on extracting diverse sets of threads—singlylinked, coherent chains of important documents. To illustrate, we extract research threads from citation graphs and construct timelines from news articles. Our method is highly scalable, running on a corpus of over 30 million words in about four minutes, more than 75 times faster than a dynamic topic model. Finally, the results from our model more closely resemble human news summaries according to several metrics and are also preferred by human judges.

6 0.12740651 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model

7 0.12596534 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

8 0.12111301 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories

9 0.10731131 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management

10 0.10515279 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

11 0.088731959 19 emnlp-2012-An Entity-Topic Model for Entity Linking

12 0.082502656 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models

13 0.079878397 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

14 0.076445535 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

15 0.068983614 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

16 0.065017857 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

17 0.063574895 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features

18 0.063506708 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

19 0.0635041 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions

20 0.062979966 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.239), (1, 0.053), (2, 0.127), (3, 0.162), (4, -0.283), (5, 0.188), (6, -0.051), (7, -0.13), (8, -0.02), (9, -0.029), (10, -0.01), (11, -0.012), (12, 0.047), (13, -0.04), (14, 0.005), (15, -0.01), (16, -0.159), (17, -0.024), (18, -0.01), (19, 0.035), (20, -0.015), (21, -0.145), (22, -0.024), (23, -0.066), (24, 0.213), (25, -0.038), (26, 0.005), (27, -0.09), (28, 0.06), (29, -0.023), (30, 0.154), (31, 0.118), (32, -0.041), (33, -0.098), (34, -0.085), (35, -0.036), (36, 0.064), (37, -0.005), (38, -0.069), (39, 0.066), (40, -0.007), (41, 0.006), (42, 0.06), (43, 0.002), (44, -0.034), (45, -0.006), (46, -0.004), (47, -0.061), (48, 0.012), (49, -0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96848232 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

Author: Michael J. Paul

Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.

2 0.71721601 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model

Author: Lan Du ; Wray Buntine ; Huidong Jin

Abstract: Topic models are increasingly being used for text analysis tasks, often times replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model with the ability to adapt topics from both the previous segment and the parent document. For this proposed model, a Gibbs sampler is developed for doing posterior inference. Experimental results show that with topic adaptation, our model significantly improves over existing approaches in terms of perplexity, and is able to uncover clear sequential structure on, for example, Herman Melville’s book “Moby Dick”.

3 0.61429054 115 emnlp-2012-SSHLDA: A Semi-Supervised Hierarchical Topic Model

Author: Xian-Ling Mao ; Zhao-Yan Ming ; Tat-Seng Chua ; Si Li ; Hongfei Yan ; Xiaoming Li

Abstract: Supervised hierarchical topic modeling and unsupervised hierarchical topic modeling are usually used to obtain hierarchical topics, such as hLLDA and hLDA. Supervised hierarchical topic modeling makes heavy use of the information from observed hierarchical labels, but cannot explore new topics; while unsupervised hierarchical topic modeling is able to detect automatically new topics in the data space, but does not make use of any information from hierarchical labels. In this paper, we propose a semi-supervised hierarchical topic model which aims to explore new topics automatically in the data space while incorporating the information from observed hierarchical labels into the modeling process, called SemiSupervised Hierarchical Latent Dirichlet Allocation (SSHLDA). We also prove that hLDA and hLLDA are special cases of SSHLDA. We . conduct experiments on Yahoo! Answers and ODP datasets, and assess the performance in terms of perplexity and clustering. The experimental results show that predictive ability of SSHLDA is better than that of baselines, and SSHLDA can also achieve significant improvement over baselines for clustering on the FScore measure.

4 0.60339814 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

Author: Robert Lindsey ; William Headden ; Michael Stipicevic

Abstract: Topic models traditionally rely on the bagof-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bagof-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.

5 0.55978537 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

Author: Jennifer Gillenwater ; Alex Kulesza ; Ben Taskar

Abstract: We propose a novel probabilistic technique for modeling and extracting salient structure from large document collections. As in clustering and topic modeling, our goal is to provide an organizing perspective into otherwise overwhelming amounts of information. We are particularly interested in revealing and exploiting relationships between documents. To this end, we focus on extracting diverse sets of threads—singlylinked, coherent chains of important documents. To illustrate, we extract research threads from citation graphs and construct timelines from news articles. Our method is highly scalable, running on a corpus of over 30 million words in about four minutes, more than 75 times faster than a dynamic topic model. Finally, the results from our model more closely resemble human news summaries according to several metrics and are also preferred by human judges.

6 0.54931563 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

7 0.45993575 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

8 0.45915848 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management

9 0.4505831 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

10 0.44893521 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

11 0.43696854 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories

12 0.38364911 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models

13 0.34098017 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

14 0.33139312 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

15 0.31611562 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation

16 0.30443165 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

17 0.29753411 9 emnlp-2012-A Sequence Labelling Approach to Quote Attribution

18 0.29630125 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

19 0.29579505 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

20 0.2869812 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.027), (16, 0.042), (25, 0.02), (34, 0.111), (39, 0.011), (45, 0.018), (60, 0.073), (63, 0.095), (64, 0.017), (65, 0.024), (70, 0.024), (73, 0.025), (74, 0.052), (76, 0.049), (80, 0.018), (86, 0.027), (94, 0.247), (95, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82560205 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

Author: Ndapandula Nakashole ; Gerhard Weikum ; Fabian Suchanek

Abstract: This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

same-paper 2 0.81692284 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

Author: Michael J. Paul

Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.

3 0.57598603 97 emnlp-2012-Natural Language Questions for the Web of Data

Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum

Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the . in question translation and the resulting query answering.

4 0.56982005 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black

Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the transla- . tion quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.

5 0.56904852 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion Questions Answering, which enables accurate question analysis and effective answer generation by making use the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to gener- ate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.

6 0.56440216 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

7 0.56399637 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

8 0.55979925 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

9 0.55894566 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

10 0.55659539 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

11 0.55391508 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

12 0.55139005 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

13 0.54967415 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

14 0.54855657 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

15 0.54797179 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

16 0.54564106 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation

17 0.54542822 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

18 0.5446111 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

19 0.54346758 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

20 0.5434255 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules