nips nips2009 nips2009-68 knowledge-graph by maker-knowledge-mining

68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora


Source: pdf

Author: Shuang-hong Yang, Hongyuan Zha, Bao-gang Hu

Abstract: We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e. [sent-7, score-0.155]

2 , paragraphs in the document) and belongs to multiple classes. [sent-11, score-0.096]

3 By casting predefined classes as latent Dirichlet variables (i. [sent-12, score-0.073]

4 , instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. [sent-14, score-0.42]

5 DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively. [sent-15, score-0.651]

6 1 Introduction We consider multi-class, multi-label and multi-instance classification (M3C), a task of learning decision rules from corpora in which each pattern consists of multiple instances and is associated with multiple classes. [sent-16, score-0.133]

7 M3C finds its application in many fields: For example, in web page classification, a web page (pattern) typically comprises different entities (instances) (e. [sent-17, score-0.157]

8 , texts, pictures and videos) and is usually associated with several different topics (e. [sent-19, score-0.128]

9 In such tasks, a pattern usually consists of a set of instances, and the possible instances may be too diverse in nature (e. [sent-22, score-0.155]

10 What makes the problem more complicated and challenging is that the pattern is usually ambiguous, i. [sent-25, score-0.102]

11 Even for corpora consisting of relatively homogeneous data, treating the tasks as M3C might still be advantageous since it enables us to explore the inner structures and the ambiguity of the data simultaneously. [sent-29, score-0.077]

12 For example, in text classification, a document usually comprises several separate semantic parts (e. [sent-30, score-0.151]

13 , paragraphs), and several different topics are evolving along these parts. [sent-32, score-0.091]

14 Since the class-labels are often only locally tied to the document (e. [sent-33, score-0.071]

15 , paragraphs are often far more topic-focused than the whole document), basing the classification on the whole document would incur too much noise and in turn harm the performance. [sent-35, score-0.145]

16 In addition, treating the task as M3C also offers a natural way to track the topic evolution along paragraphs, a task that is otherwise difficult to handle. [sent-36, score-0.071]

17 Ideal annotation requires a skilled expert to specify both the exact location and class label of each object in the image, which, though not completely impossible, involves too much human effort, especially for large image repositories. [sent-43, score-0.059]

18 By modeling a document as a mixture over topics, LDA allows each document to be associated with multiple topics with different proportions, and thus provides a promising way to capture the heterogeneity/ambiguity in the data. [sent-52, score-0.233]

19 However, the topics discovered by LDA are implicit (i. [sent-53, score-0.146]

20 , each topic is expressed as a distribution over words, comprehensible interpretation of which requires human expertise), and cannot be easily aligned to the topics of human interest. [sent-55, score-0.162]

21 , each multi-labeled pattern is a bag of single-labeled instances. [sent-61, score-0.115]

22 Through likelihood maximization, DBA automatically aligns the topics discovered from the data to the predefined classes of interest. [sent-64, score-0.188]

23 DBA can be naturally tailored to M3C tasks for both pattern classification and instance disambiguation. [sent-65, score-0.149]

24 Section 2 briefly reviews some related topics and Section 3 presents the formal description of the corpora used in M3C and the basic assumptions of our model. [sent-71, score-0.14]

25 In Section 6, we apply the DBA model to text classification and query disambiguation tasks. [sent-74, score-0.255]

26 However, the real world is more like a web of (sub-)patterns connected with a web of classes that they belong to. [sent-81, score-0.169]

27 MIC assumes that each pattern consists of multiple instances but belongs to a single class, whereas MLC studies single-instance patterns associated with multiple classes. [sent-86, score-0.243]

28 Recently, Cour et al. proposed a discriminative framework [6] based on convex surrogate loss minimization for classifying ambiguously labeled images; and Xu et al. established a hybrid generative/discriminative approach (i. [sent-93, score-0.049]

29 , a heuristically regularized LDA classifier) [12] for mining named entities from web search click-through data. [sent-95, score-0.271]

30 Our proposed DBA model can be viewed as a supervised version of topic models. [sent-97, score-0.071]

31 A widely used topic model for categorical data is the LDA model [4]. [sent-98, score-0.071]

32 By modeling a pattern as a random mixture over latent topics and a topic as a Multinomial distribution over features in a dictionary, LDA is effective in discovering implicit topics from a corpus. [sent-99, score-0.393]

33 The supervised LDA (sLDA) model [2], by linking the empirical topics to the label of each pattern, is able to learn classifiers using Generalized Linear Models. [sent-100, score-0.125]

34 [Figure 1 residue: node labels — pattern X, a, θ, z; class c; instance x; feature f.] [sent-102, score-0.174]

35 (b): A graphical representation of the DBA model with the multinomial bag-of-feature instance model. [sent-106, score-0.093]

36 3 Problem Formalization Intuitively, we can think of a pattern as a document, an instance as a paragraph, and a feature as a word. [sent-107, score-0.149]

37 In M3C, we are interested in inferring class labels for both the document and its paragraphs. [sent-108, score-0.136]

38 Formally, let X ⊂ R^D denote the instance space (e. [sent-109, score-0.065]

39 A multi-class, multi-label multi-instance corpus D consists of a set of input patterns {Xn}n=1,2,. [sent-118, score-0.046]

40 ,Mn contains a set of instances xmn ∈ X , and Yn ⊂ Y consists of a set of class labels. [sent-127, score-0.109]

41 Assumption 1 [Exchangeability]: A corpus is a bag of patterns, and each pattern is a bag of instances. [sent-130, score-0.192]

42 Assumption 2 [Distinguishability]: Each pattern can belong to several classes, but each instance belongs to a single class. [sent-131, score-0.192]

43 These assumptions are equivalent to assuming a tree structure for the corpus (Figure 1(a)). [sent-132, score-0.046]
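To make the tree-structured corpus concrete, here is a minimal Python sketch (not from the paper; the names Pattern and Corpus are illustrative) of a pattern as a bag of instance feature-count vectors together with its label set:

```python
from dataclasses import dataclass
from typing import List, Set
import numpy as np

@dataclass
class Pattern:
    """One pattern X_n: a bag of instances plus its label set Y_n."""
    instances: List[np.ndarray]  # each instance x_mn is a length-D feature-count vector
    labels: Set[int]             # subset of {0, ..., C-1}; a pattern may carry several labels

# A corpus D is simply a bag of patterns (Assumption 1: exchangeability).
Corpus = List[Pattern]

# Example: a two-paragraph "document" over a 5-word vocabulary, belonging to classes 0 and 2.
doc = Pattern(instances=[np.array([2, 0, 1, 0, 0]),
                         np.array([0, 3, 0, 1, 1])],
              labels={0, 2})
corpus: Corpus = [doc]
```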

44 4 Dirichlet-Bernoulli Alignment In this section, we present Dirichlet-Bernoulli Alignment (DBA), a probabilistic generative model for the multi-class, multi-label and multi-instance corpus described in Section 3. [sent-133, score-0.068]

45 In DBA, each pattern X in a corpus D is assumed to be generated by the following process: 1. [sent-134, score-0.13]

46 For each of the M instances in X: ⊲ Choose a class z ∼ Mult(θ); ⊲ Generate an instance x ∼ p(x|z, B); 3. [sent-137, score-0.065]

47 , a binary C-vector with the 1-of-C code: zc = 1 if the c-th class is chosen, and zi = 0 for all i ≠ c. [sent-149, score-0.222]

48 , yC ]⊤ is also a binary C-vector with yc = 1 if the pattern X belongs to the c-th class and yc = 0 otherwise. [sent-153, score-0.409]

49 In this paper, we assume the label of a pattern is generated by a cost-sensitive voting process according to the labels of the instances in it, which is intuitively reasonable. [sent-154, score-0.211]

50 In this paper, we use a logistic model: p(yc = 1 | z̄, λ) = exp(λc z̄c) / (1 + exp(λc z̄c)).  (1) [sent-171, score-0.197]

51 In practice, the set of possible instances can be quite diverse, such as pictures, texts, music and videos on a web page. [sent-172, score-0.303]

52 Without loss of generality, we follow the convention of topic models to assume that each instance x is a bag of discrete features {f1 , f2 , . [sent-173, score-0.167]

53 , bD ] is a C × D-matrix with the (c, d)-th entry bcd = p(fd = 1|zc = 1) and xd is the frequency of fd in x. [sent-187, score-0.077]

54 The joint probability is then given by: p(X, y, Z, θ | a, B, λ) = p(θ|a) [∏_{m=1}^{M} p(zm|θ) ∏_{l=1}^{L} p(fml|B, zm)] p(y|z̄, λ). [sent-188, score-0.09]
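As a hedged illustration of this generative process under the multinomial bag-of-feature instance model, the following sketch samples one pattern; the function name, the fixed number of features per instance L, and the fixed number of instances M are assumptions made for the example, not specifications from the paper:

```python
import numpy as np

def sample_pattern(a, B, lam, M, L, rng=np.random.default_rng(0)):
    """Sample one pattern from the DBA generative process (illustrative sketch).

    a   : length-C Dirichlet parameter
    B   : C x D matrix; row c is the multinomial over features for class c
    lam : length-C vector of label weights lambda_c
    M   : number of instances; L : features per instance (assumed fixed here)
    """
    C, D = B.shape
    theta = rng.dirichlet(a)                       # theta ~ Dir(a)
    Z = rng.multinomial(1, theta, size=M)          # z_m ~ Mult(theta), 1-of-C coded
    X = np.array([rng.multinomial(L, B[z.argmax()]) for z in Z])  # x_m ~ Mult(B_{z_m})
    z_bar = Z.mean(axis=0)                         # empirical average of topic assignments
    p_y = 1.0 / (1.0 + np.exp(-lam * z_bar))       # Eq. (1): logistic link
    y = (rng.random(C) < p_y).astype(int)          # y_c ~ Bernoulli(p_y[c])
    return X, y, Z, theta

# Example with C = 3 classes and D = 5 features:
X, y, Z, theta = sample_pattern(a=np.ones(3), B=np.full((3, 5), 0.2),
                                lam=np.array([2.0, 1.0, 1.5]), M=4, L=10)
```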

55 , the classification of each pattern as well as the instances within it. [sent-193, score-0.137]

56 1 Variational Approximations We use the following fully-factorized variational distribution to approximate the posterior distribution of the latent variables: q(Z, θ|γ, Φ) = q(θ|γ) ∏_{m=1}^{M} q(zm|φm) = [Γ(∑_{c=1}^{C} γc) / ∏_{c=1}^{C} Γ(γc)] ∏_{c=1}^{C} θc^{γc−1} ∏_{m=1}^{M} ∏_{c=1}^{C} φmc^{zmc},  (3) where γ and Φ=[φ1,. [sent-197, score-0.197]

57 ,φM ] are variational parameters for a pattern X. [sent-200, score-0.173]

58 This is only a simple special-case instance model for DBA. [sent-202, score-0.065]

59 It is quite straightforward to substitute other instance models such as Gaussian, Poisson and other more complicated models like Gaussian mixtures. [sent-203, score-0.065]

60 The first two terms and the fifth term (the entropy of the variational distribution) in the right-hand side of Eq. [sent-204, score-0.089]

61 , the variational expectation of the log likelihood for instance observations is: ∑_{m=1}^{M} Eq[log p(xm|B, zm)] = ∑_{m=1}^{M} ∑_{c=1}^{C} ∑_{d=1}^{D} φmc xmd log bcd.  (6) [sent-208, score-0.293]

62 The fourth term in the right-hand side of Eq. [sent-209, score-0.09]

63 (5) corresponds to the expected log likelihood of observing the labels given the topic assignments: Eq[log p(y|z̄, λ)] = (1/M) ∑_{m=1}^{M} ∑_{c=1}^{C} (yc − 1/2) λc φmc − ∑_{c=1}^{C} Eq[log(exp(λc z̄c/2) + exp(−λc z̄c/2))].  (7) [sent-210, score-0.56]

64 We bound the second term above by using the lower bound for the logistic function [9]: −log(exp(λc z̄c/2) + exp(−λc z̄c/2)) ≥ −log(1 + exp(−ξc)) − ξc/2 + ςc(λc² z̄c² − ξc²) ≈ −log(1 + exp(−ξc)) − ξc/2 + 2ςc(λc z̄c ξc − ξc²),  (8) [sent-211, score-0.197]

65 where ξ = [ξ1, . , ξC]⊤ are variational parameters, ςc = tanh(ξc/2)/(4ξc), and the second-order residue term is omitted since the lower bound is exact when ξc = −λc z̄c. [sent-214, score-0.286]
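For intuition, Eq. (8) uses the standard variational lower bound on the logistic partition term [9]; the sketch below checks it numerically with the usual Jaakkola-Jordan sign convention, which may differ from the paper's definition of ςc:

```python
import numpy as np

def log_cosh_term(x):
    """Exact value of -log(exp(x/2) + exp(-x/2))."""
    return -np.logaddexp(x / 2.0, -x / 2.0)

def jj_lower_bound(x, xi):
    """Jaakkola-Jordan lower bound on -log(exp(x/2)+exp(-x/2)); tight when xi = +/- x."""
    varsigma = np.tanh(xi / 2.0) / (4.0 * xi)   # plays the role of varsigma_c (sign convention may differ)
    return -np.log1p(np.exp(-xi)) - xi / 2.0 - varsigma * (x**2 - xi**2)

x = 1.7
for xi in [0.5, 1.0, 1.7, 3.0]:
    assert jj_lower_bound(x, xi) <= log_cosh_term(x) + 1e-12   # always a lower bound
print(np.isclose(jj_lower_bound(x, x), log_cosh_term(x)))       # tight at xi = x -> True
```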

66 Obtaining an approximate posterior distribution for the latent variables is then reduced to optimizing the objective max L(q) or min KL(q||p) with respect to the variational parameters. [sent-215, score-0.12]

67 Note that instead of only one feature contributing to φmc as in LDA, all the features appearing in an instance now contribute. [sent-217, score-0.065]

68 Also, DBA makes use of the supervision information with a term ∑_{c=1}^{C} λc z̄c (2yc − 1) in the variational likelihood bound L. [sent-219, score-0.286]

69 As a result, it tends to align the Dirichlet topics discovered from the data to the class labels (Bernoulli observations) y. [sent-222, score-0.186]

70 2 Parameter Estimation The maximum likelihood parameter estimation of DBA relies on the variational approximation procedure. [sent-225, score-0.089]

71 (10) involves two groups of parameters corresponding to the DBA model and its variational approximation, respectively. [sent-238, score-0.089]

72 Optimizing alternately between these two groups leads to a Variational Expectation Maximization (VEM) algorithm similar to the one used in LDA, where the E-step corresponds to the variational approximation for each pattern in the corpus. [sent-239, score-0.173]

73 The first task is to infer the latent variables for a given pattern, which is straightforward after the variational approximation. [sent-248, score-0.12]

74 The second task, pattern classification, addresses prediction of labels for a new pattern X: p(yc = 1|X; a, B, λ) ≈ exp(λc φ̄c)/(1 + exp(λc φ̄c)), where φ̄c = (1/M) ∑_{m=1}^{M} φmc and the term (λc/2M)[2yc − 1 + tanh(ξc/2)] is removed when updating φ in Eq. [sent-249, score-0.208]

75 The third task, instance disambiguation, finds labels for the instances within a pattern: p(zm|X, y) = ∫_θ p(zm, θ|X, y) dθ ≈ q(zm|φm), that is, p(zmc = 1|X, y) = φmc. [sent-251, score-0.158]
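Given the fitted variational responsibilities Φ for a pattern, both read-outs are simple; a minimal sketch assuming Φ is the M × C matrix produced by the E-step (the toy numbers are illustrative):

```python
import numpy as np

def pattern_label_probs(Phi, lam):
    """Pattern classification: p(y_c = 1 | X) ~= sigmoid(lambda_c * phi_bar_c)."""
    phi_bar = Phi.mean(axis=0)                   # phi_bar_c = (1/M) sum_m phi_mc
    return 1.0 / (1.0 + np.exp(-lam * phi_bar))

def instance_labels(Phi):
    """Instance disambiguation: p(z_mc = 1 | X, y) ~= phi_mc; pick the argmax per instance."""
    return Phi.argmax(axis=1)

# Toy example with M = 3 instances and C = 4 classes.
Phi = np.array([[0.7, 0.1, 0.1, 0.1],
                [0.2, 0.6, 0.1, 0.1],
                [0.8, 0.1, 0.05, 0.05]])
lam = np.array([2.0, 1.5, 1.0, 1.0])
print(pattern_label_probs(Phi, lam))   # per-class Bernoulli probabilities for the pattern
print(instance_labels(Phi))            # most probable class for each instance
```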

76 6 Experiments In this section, we conduct extensive experiments to test the DBA model as it is applied to pattern classification and instance disambiguation. [sent-252, score-0.322]

77 Then the instance disambiguation performance of DBA is tested on a novel real-world task, i. [sent-254, score-0.238]

78 1 Text Classification This experiment is conducted on the ModApte split of the Reuters-21578 text collection, which contains 10,788 documents belonging to the 10 most popular classes. [sent-259, score-0.089]

79 We use the top 500 words with the highest document frequency as features, and represent each document as a pattern with each of its paragraphs being an instance in order to exploit the semantic structure of documents explicitly. [sent-260, score-0.411]
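A sketch of that representation step (illustrative only: the paper's tokenizer and paragraph splitting are not specified here, so this simply splits on blank lines and counts occurrences of the chosen vocabulary):

```python
from collections import Counter
import numpy as np

def document_to_pattern(text, vocab):
    """Represent a document as a pattern: one bag-of-words count vector per paragraph."""
    index = {w: i for i, w in enumerate(vocab)}
    instances = []
    for paragraph in text.split("\n\n"):          # paragraphs act as instances
        counts = Counter(w for w in paragraph.lower().split() if w in index)
        x = np.zeros(len(vocab))
        for w, c in counts.items():
            x[index[w]] = c
        instances.append(x)
    return [x for x in instances if x.sum() > 0]  # drop paragraphs with no known features

vocab = ["oil", "prices", "crude", "market", "trade"]   # stand-in for the top-500 word list
doc = "Crude oil prices rose.\n\nThe oil market and trade volumes were steady."
print(document_to_pattern(doc, vocab))
```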

80 After eliminating the documents that have an empty label set or fewer than 20 features, we obtain a subset of 1879 documents, among which 721 documents (about 38. [sent-261, score-0.126]

81 6 and the average number of instances (paragraphs) per pattern (document) is 8. [sent-265, score-0.137]

82 The data set is further randomly partitioned into a subset of 1200 documents for training and the rest for testing. [sent-268, score-0.046]

83 For comparison, we also test two state-of-the-art M3C algorithms, MIMLSVM and MIMLBoost [13], and use the Multinomial Naïve Bayes (MNB) classifier trained on the vector space model of the whole documents as the baseline. [sent-269, score-0.046]

84 We can see that: (1) for most classes, the three … Table 2: Accuracy@N (N = 1, 2, 3) and micro-averaged and macro-averaged F-measures of DBA, MNB and SVM based disambiguation methods. [sent-273, score-0.173]

85 Figure 3: Precision and Recall scores for each of the 101 classes using DBA, MNB and SVM based methods. [sent-319, score-0.02]

86 A possible reason might be: if the documents are very short, splitting them might introduce severe data sparseness and in turn harm the performance. [sent-322, score-0.046]

87 2 Named Entity Disambiguation Query ambiguity is a fundamental obstacle for search engines to capture users’ search intentions. [sent-326, score-0.074]

88 In this section, we employ DBA to disambiguate the named entities in web search queries. [sent-327, score-0.199]

89 This is a very challenging problem because queries are usually very short (2 to 3 words on average), noisy (e. [sent-328, score-0.052]

90 A single named-entity query Q can be viewed as a combination of a single named entity e and a set of context words w (the remaining text in Q). [sent-331, score-0.258]

91 By differentiating the possible meanings of the named entity in a query and identifying the most probable one, entity disambiguation can help search engines capture the precise information need of the user and in turn improve search by responding with the truly most relevant documents. [sent-332, score-0.519]

92 ”, the system should be able to identify that the ambiguous named entity “Harry Potter” (i. [sent-334, score-0.195]

93 We treat the ambiguity of e as a hidden class z over e and make use of the query log as a data source for mining the relationship among e, w and z. [sent-337, score-0.142]

94 In particular, the query log can be viewed as a multi-class, multi-label and multi-instance corpus {(Xn , Yn )}n=1,2,. [sent-338, score-0.116]

95 ,N , in which each pattern X corresponds to a named-entity e and is characterized by a set of instances {xm }m=1,2,. [sent-341, score-0.137]

96 We manually collect 400 named entities and label them according to the labels of their co-occurring queries in Yahoo! [sent-351, score-0.231]

97 Table 2 demonstrates the Accuracy@N (N = 1, 2, 3) as well as micro-averaged and macro-averaged F-measure scores of each disambiguation approach. [sent-356, score-0.193]
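Here Accuracy@N is read as counting a test case correct when at least one true label appears among the N highest-scoring classes; this reading is an assumption, since the excerpt does not reproduce the paper's exact definition. A small sketch:

```python
import numpy as np

def accuracy_at_n(scores, true_labels, n):
    """Fraction of test cases whose top-n predicted classes contain at least one true label.

    scores      : (num_cases, C) array of per-class scores
    true_labels : list of sets of true class indices, one set per case
    """
    top_n = np.argsort(-scores, axis=1)[:, :n]
    hits = [len(set(row) & labels) > 0 for row, labels in zip(top_n, true_labels)]
    return float(np.mean(hits))

scores = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3]])
true_labels = [{0}, {2}]
print(accuracy_at_n(scores, true_labels, 1))   # 0.5
print(accuracy_at_n(scores, true_labels, 2))   # 1.0
```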

98 In particular, for Accuracy@1 scores, DBA can achieve a gain of about … (Footnote 3: Since SVM only outputs hard class assignments, there is no Accuracy@2,3 for SVM based methods.) [sent-359, score-0.066]

99 The proposed DBA model is useful for both pattern classification and instance disambiguation, as tested on text classification and named-entity disambiguation tasks, respectively. [sent-368, score-0.365]

100 An interesting observation in practice is that, although there might be a large number of classes/topics, usually a pattern is only associated with a very limited number of them. [sent-369, score-0.102]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dba', 0.8), ('zc', 0.197), ('disambiguation', 0.173), ('mnb', 0.169), ('yc', 0.139), ('mimlboost', 0.123), ('lda', 0.11), ('mimlsvm', 0.108), ('mc', 0.094), ('topics', 0.091), ('named', 0.091), ('zm', 0.09), ('variational', 0.089), ('entity', 0.085), ('pattern', 0.084), ('zmc', 0.077), ('paragraphs', 0.074), ('document', 0.071), ('topic', 0.071), ('instance', 0.065), ('eq', 0.063), ('mic', 0.062), ('mlc', 0.062), ('slda', 0.058), ('na', 0.056), ('instances', 0.053), ('web', 0.053), ('classi', 0.052), ('corpora', 0.049), ('corpus', 0.046), ('bcd', 0.046), ('documents', 0.046), ('text', 0.043), ('classes', 0.042), ('gain', 0.041), ('labels', 0.04), ('query', 0.039), ('alignment', 0.039), ('svm', 0.037), ('prede', 0.035), ('label', 0.034), ('queries', 0.034), ('ac', 0.033), ('tanh', 0.033), ('dirichlet', 0.032), ('entities', 0.032), ('fd', 0.031), ('zha', 0.031), ('log', 0.031), ('bag', 0.031), ('ambiguously', 0.031), ('hongyuan', 0.031), ('mnbtf', 0.031), ('mnbtfidf', 0.031), ('potter', 0.031), ('svmtf', 0.031), ('svmtfidf', 0.031), ('xmd', 0.031), ('xmn', 0.031), ('latent', 0.031), ('discovered', 0.03), ('dir', 0.028), ('ambiguity', 0.028), ('multinomial', 0.028), ('cour', 0.027), ('casts', 0.027), ('yn', 0.025), ('implicit', 0.025), ('fl', 0.025), ('aligns', 0.025), ('paragraph', 0.025), ('tfidf', 0.025), ('class', 0.025), ('cation', 0.025), ('exp', 0.024), ('bernoulli', 0.024), ('conditioned', 0.023), ('search', 0.023), ('kl', 0.023), ('xm', 0.022), ('inferential', 0.022), ('harry', 0.022), ('generative', 0.022), ('belongs', 0.022), ('belong', 0.021), ('blei', 0.021), ('georgia', 0.02), ('tech', 0.02), ('scores', 0.02), ('comprises', 0.019), ('texts', 0.019), ('tf', 0.019), ('mining', 0.019), ('xn', 0.019), ('pictures', 0.019), ('hu', 0.019), ('ambiguous', 0.019), ('usually', 0.018), ('discriminative', 0.018), ('concluding', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

Author: Shuang-hong Yang, Hongyuan Zha, Bao-gang Hu

Abstract: We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively.

2 0.102499 205 nips-2009-Rethinking LDA: Why Priors Matter

Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach

Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling. 1

3 0.10122196 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1

4 0.081377201 204 nips-2009-Replicated Softmax: an Undirected Topic Model

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

5 0.078634895 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Author: Chong Wang, David M. Blei

Abstract: We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the “topics”). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models. 1

6 0.078024961 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

7 0.073172368 190 nips-2009-Polynomial Semantic Indexing

8 0.070765659 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

9 0.067308173 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

10 0.062781632 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

11 0.060536671 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

12 0.058955651 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

13 0.053253312 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

14 0.051224887 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

15 0.050853826 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

16 0.050401062 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

17 0.049351081 72 nips-2009-Distribution Matching for Transduction

18 0.04756197 97 nips-2009-Free energy score space

19 0.046181817 71 nips-2009-Distribution-Calibrated Hierarchical Classification

20 0.044796713 87 nips-2009-Exponential Family Graph Matching and Ranking


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.132), (1, -0.054), (2, -0.087), (3, -0.092), (4, 0.056), (5, -0.082), (6, -0.078), (7, -0.004), (8, -0.028), (9, 0.141), (10, -0.026), (11, -0.003), (12, 0.006), (13, 0.06), (14, 0.039), (15, -0.011), (16, -0.055), (17, -0.007), (18, -0.009), (19, -0.018), (20, 0.07), (21, -0.037), (22, 0.004), (23, -0.022), (24, -0.005), (25, 0.013), (26, -0.016), (27, -0.027), (28, 0.065), (29, 0.005), (30, -0.0), (31, 0.017), (32, -0.078), (33, 0.05), (34, 0.007), (35, 0.038), (36, -0.081), (37, -0.044), (38, 0.047), (39, -0.062), (40, 0.012), (41, 0.035), (42, 0.021), (43, 0.009), (44, -0.013), (45, -0.043), (46, -0.051), (47, -0.001), (48, -0.093), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90010178 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

Author: Shuang-hong Yang, Hongyuan Zha, Bao-gang Hu

Abstract: We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively.

2 0.75550091 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda

Abstract: We propose a probabilistic topic model for analyzing and extracting contentrelated annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. In these services, since users can attach annotations freely, some annotations do not describe the semantics of the content, thus they are noisy, i.e. not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.

3 0.71787834 204 nips-2009-Replicated Softmax: an Undirected Topic Model

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

4 0.70827353 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Author: Chong Wang, David M. Blei

Abstract: We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the “topics”). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results will show that sparseTMs give better predictive performance with simpler inferred models. 1

5 0.68808281 205 nips-2009-Rethinking LDA: Why Priors Matter

Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach

Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling. 1

6 0.58035028 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

7 0.57630426 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

8 0.56230968 96 nips-2009-Filtering Abstract Senses From Image Search Results

9 0.54416418 190 nips-2009-Polynomial Semantic Indexing

10 0.53933787 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

11 0.49761492 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

12 0.47577026 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

13 0.42913797 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

14 0.41850007 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models

15 0.40891814 72 nips-2009-Distribution Matching for Transduction

16 0.38194481 260 nips-2009-Zero-shot Learning with Semantic Output Codes

17 0.37957665 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

18 0.37803444 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

19 0.37263244 97 nips-2009-Free energy score space

20 0.36553198 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.011), (24, 0.052), (25, 0.058), (31, 0.016), (35, 0.054), (36, 0.101), (39, 0.049), (58, 0.038), (61, 0.014), (71, 0.093), (73, 0.276), (81, 0.028), (86, 0.088), (91, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76940471 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

Author: Shuang-hong Yang, Hongyuan Zha, Bao-gang Hu

Abstract: We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively.

2 0.69321996 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

Author: Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, Emily B. Fox

Abstract: We propose a Bayesian nonparametric approach to the problem of modeling related time series. Using a beta process prior, our approach is based on the discovery of a set of latent dynamical behaviors that are shared among multiple time series. The size of the set and the sharing pattern are both inferred from data. We develop an efficient Markov chain Monte Carlo inference method that is based on the Indian buffet process representation of the predictive distribution of the beta process. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.

3 0.56908596 204 nips-2009-Replicated Softmax: an Undirected Topic Model

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produce an accurate estimate of the log-probability the model assigns to test data. This allows us to demonstrate that the proposed model is able to generalize much better compared to Latent Dirichlet Allocation in terms of both the log-probability of held-out documents and the retrieval accuracy.

4 0.56799448 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

5 0.56785029 260 nips-2009-Zero-shot Learning with Semantic Output Codes

Author: Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell

Abstract: We consider the problem of zero-shot learning, where the goal is to learn a classifier f : X → Y that must predict novel values of Y that were omitted from the training set. To achieve this, we define the notion of a semantic output code classifier (SOC) which utilizes a knowledge base of semantic properties of Y to extrapolate to novel classes. We provide a formalism for this type of classifier and study its theoretical properties in a PAC framework, showing conditions under which the classifier can accurately predict novel classes. As a case study, we build a SOC classifier for a neural decoding task and show that it can often predict words that people are thinking about from functional magnetic resonance images (fMRI) of their neural activity, even without training examples for those words. 1

6 0.56587976 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

7 0.55969125 129 nips-2009-Learning a Small Mixture of Trees

8 0.55915838 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

9 0.55875748 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

10 0.55715555 226 nips-2009-Spatial Normalized Gamma Processes

11 0.55621135 96 nips-2009-Filtering Abstract Senses From Image Search Results

12 0.5541743 56 nips-2009-Conditional Neural Fields

13 0.55308026 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

14 0.55213308 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

15 0.55166286 112 nips-2009-Human Rademacher Complexity

16 0.55151761 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

17 0.55085999 97 nips-2009-Free energy score space

18 0.55032265 90 nips-2009-Factor Modeling for Advertisement Targeting

19 0.5496949 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

20 0.54725438 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability