nips nips2012 nips2012-166 knowledge-graph by maker-knowledge-mining

166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features


Source: pdf

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. [sent-4, score-0.225]

2 The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. [sent-5, score-0.604]

3 A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. [sent-6, score-0.403]

4 The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. [sent-7, score-0.547]

5 Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. [sent-9, score-0.349]

6 The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data. [sent-10, score-0.505]

7 The analysis of legislative roll-call data provides an interesting setting for recent developments in the joint analysis of matrices and text [23, 8]. [sent-11, score-0.2]

8 The problem is made interesting because, in addition to the matrix of votes, we have access to the text of the legislation. [sent-13, score-0.466]

9 E.g., the text is characteristic of the columns of the matrix, with each column representing a piece of legislation and each row a legislator. [sent-15, score-0.481]

10 E.g., the text may correspond to content on a website: each column of the matrix may represent a website, and each row an individual, with the matrix entries representing numbers of visits. [sent-18, score-0.228]

11 In most such research the binary data are typically analyzed with a probit or logistic link function, and the underlying real matrix is assumed to have rank one. [sent-20, score-0.198]

12 Each legislator and piece of legislation exists at a point along this one dimension, which is interpreted as characterizing a (one-dimensional) political philosophy. [sent-21, score-0.484]

13 As in much matrix-completion research [17, 18], one typically can only infer votes that are missing at random. [sent-25, score-0.242]

14 It is not possible to predict the votes of legislators on a new piece of legislation (for which, for example, an entire column of votes is missing). [sent-26, score-0.807]

15 In [23, 8] a latent Dirichlet allocation (LDA) [5] topic model was employed for the text. [sent-29, score-0.342]

16 It has been demonstrated that LDA yields inferior perplexity scores when compared to modern Bayesian topic models, such as the focused topic model (FTM) [24]. [sent-30, score-0.468]

17 Another significant issue with [23, 8] concerns how the topic (text) and matrix models are coupled. [sent-31, score-0.269]

18 In [23, 8] the frequency with which a given topic is utilized in the text of the legislation is used to infer the associated matrix parameters. [sent-32, score-0.77]

19 E.g., it is used to infer the latent feature vector associated with the respective column of the matrix. [sent-34, score-0.215]

20 Motivated by these limitations, in this paper the FTM is employed to model the text of legislation, with each piece of legislation characterized by a latent binary vector that defines the sparse set of associated topics. [sent-37, score-0.743]

21 A new probabilistic low-rank matrix decomposition is developed for the votes, utilizing latent binary features; this leverages the merits of what were previously two distinct lines of matrix factorization methods [13, 17]. [sent-38, score-0.39]

22 For a piece of legislation, the latent binary feature vectors for the FTM and matrix decomposition are shared, yielding a new means of jointly modeling text and matrices. [sent-40, score-0.507]

23 This linkage between text and matrices is innovative as: (i) it’s based on whether a topic is relevant to a document/legislation, not on the frequency with which the topic is used in the document. [sent-41, score-0.713]

24 I.e., the linkage is not based on the style of writing; and (ii) it enables interpretation of the underlying latent binary features [13, 9] based upon available text data. [sent-43, score-0.37]

25 Section 2 first reviews the focused topic model, then introduces a new low-rank matrix decomposition method and the joint model of the two. [sent-45, score-0.403]

26 In Section 4 quantitative results are presented for prediction of columns of roll-call votes based on the associated text of the legislation, and the joint model is demonstrated qualitatively to infer meaning/insight into the characteristics of legislation and voting patterns; Section 5 concludes. [sent-47, score-0.711]

27 It is desirable to share a set of topics across all documents, but with the additional constraint that a given document only utilize a small subset of the topics; this tends to yield more descriptive/focused topics, characteristic of detailed properties of the documents. [sent-51, score-0.241]

28 An FTM is manifested as a compound linkage of the Indian buffet process (IBP) [10] and the Dirichlet process (DP). [sent-52, score-0.168]

29 Each document draws latent binary features from an IBP to select a finite subset of atoms/topics from the DP. [sent-53, score-0.326]

30 In the model details, the DP is represented in terms of a normalized gamma process [7] with weighting by the binary feature vector, constituting a document-specific topic distribution in which only a subset of topics are manifested with non-zero probability. [sent-54, score-0.53]

31 The document-specific topic distribution $\theta_j$ is constructed from $b_{j:}$ and $\lambda$, thereby selecting a subset of topics for document j (those for which the corresponding components of $b_{j:}$ are non-zero). [sent-58, score-0.206]

32 The rest of the FTM is constructed similarly to LDA [5], where for each token n in document j a topic indicator is drawn as $z_{jn} \mid \theta_j \sim \mathrm{Mult}(z_{jn} \mid 1, \theta_j)$. [sent-59, score-0.35]

33 Conditional on $z_{jn}$ and the topics $\{\beta_k\}_{k=1}^{K_r}$, a word is drawn as $w_{jn} \mid z_{jn}, \{\beta_k\}_{k=1}^{K_r} \sim \mathrm{Mult}(w_{jn} \mid 1, \beta_{z_{jn}})$, where $\beta_k \mid \eta \sim \mathrm{Dirichlet}(\beta_k \mid \eta)$. [sent-60, score-0.221]
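
The generative process just described can be summarized in a short sketch. This is a minimal illustration rather than the authors' code; the function name, the truncation of the IBP to a finite $K_r$, and the fallback for an all-zero binary vector are assumptions.

```python
import numpy as np

def sample_ftm_document(pi, lam, beta, n_tokens, rng):
    """Minimal sketch of the FTM generative process for one document.

    pi   : (K_r,) topic-inclusion probabilities (truncated IBP weights)
    lam  : (K_r,) global topic-prevalence weights (positive)
    beta : (K_r, V) topic-word distributions, beta_k ~ Dirichlet(eta)
    """
    b = rng.binomial(1, pi)                      # binary topic-usage vector b_j
    g = rng.gamma(lam, 1.0) * b                  # normalized-gamma construction
    theta = g / g.sum() if g.sum() > 0 else np.full(len(g), 1.0 / len(g))
    z = rng.choice(len(pi), size=n_tokens, p=theta)   # z_jn ~ Mult(1, theta_j)
    w = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])  # w_jn
    return b, theta, z, w
```

Only topics with $b_{jt} = 1$ receive non-zero mass in $\theta_j$, which is exactly the focusing behavior described above.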

34 Although in (1) bj: is mainly designed to map the global prevalence of topics across the corpus, λ, to a within-document proportion of topic usage, θj , latent features bj: are informative in their own right, as they indicate which subset of topics is relevant to a given document. [sent-61, score-0.657]

35 We therefore make the linkage between documents and an associated matrix via the $b_{j:}$, not via $\theta_j$ (where [23, 8] base the document-matrix linkage on $\theta_j$ or its empirical estimate). [sent-63, score-0.259]

36 Binary matrix factorization (BMF) [13, 14] is a general framework in which a real latent matrix $X \in \mathbb{R}^{P \times N}$ is decomposed as $X = LHR^T$, where $L \in \{0,1\}^{P \times K_l}$ and $R \in \{0,1\}^{N \times K_r}$ are binary and $H \in \mathbb{R}^{K_l \times K_r}$ is real. [sent-65, score-0.515]

37 We focus on binary observed matrices, $Y \in \{0,1\}^{P \times N}$, and utilize $f(\cdot)$ as a probit model [2]: $y_{ij} = 1$ if $\hat{x}_{ij} \ge 0$ and $y_{ij} = 0$ if $\hat{x}_{ij} < 0$, with $\hat{x}_{ij} = x_{ij} + \epsilon_{ij}$, where $\epsilon_{ij} \sim \mathcal{N}(0,1)$ (2). [sent-69, score-0.335]
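
The following is a minimal sketch of the probit BMF observation model in (2); the dimensions and random draws are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

P, N, K_l, K_r = 20, 30, 5, 6
L = rng.binomial(1, 0.3, size=(P, K_l))   # binary row (legislator) features
R = rng.binomial(1, 0.3, size=(N, K_r))   # binary column (legislation) features
H = rng.normal(size=(K_l, K_r))           # real linkage matrix

X = L @ H @ R.T                           # latent real matrix X = L H R^T
X_hat = X + rng.normal(size=(P, N))       # x_hat_ij = x_ij + eps_ij, eps ~ N(0, 1)
Y = (X_hat >= 0).astype(int)              # probit link: y_ij = 1[x_hat_ij >= 0]
```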

38 Thus the definition of Ψ and Φ via the binary matrices L and R and the linkage matrix H merges two previously distinct lines of matrix factorization methods. [sent-77, score-0.313]

39 In the context of the application considered here, the decomposition $X = LHR^T$ will prove convenient, as we may share the binary matrices L or R with the topic usage of the available documents. [sent-78, score-0.436]

40 The binary features in L and R are therefore characteristic of the presence/absence of underlying topics, or related latent processes, and the matrix H specifies how these binary features map to observed data. [sent-79, score-0.47]

41 We model the “significance” of each rank-1 term in the expansion explicitly, using a stochastic process $\{s_k\}_{k=1}^{K_c}$; therefore H can be decomposed as $H = \sum_{k=1}^{K_c} s_k u_{:k} v_{:k}^T$, where $K_c$ can in principle be infinite. [sent-82, score-0.16]

42 As a result, the hierarchical representation of the latent matrix X in the probit model can be summarized as $\hat{x}_{ij} \mid l_{i:}, r_{j:}, \{u_{:k}, v_{:k}, s_k\}_{k=1}^{K_c} \sim \mathcal{N}\big(\hat{x}_{ij} \mid \sum_{k=1}^{K_c} s_k (l_{i:} u_{:k})(r_{j:} v_{:k}),\ 1\big)$ (4); note that $s_k$ in (4) is similar in spirit to a singular value in an SVD. [sent-83, score-0.809]

43 Theorem 1 below formally states that if $s_k$ is modeled by the MGP as in (5), the rank-1 expansion in (4) will converge as $K_c \to \infty$. [sent-87, score-0.202]

44 When $\alpha_c > 1$, the sequence $\sum_{k=1}^{K_c} s_k (l_{i:} u_{:k})(r_{j:} v_{:k})$ converges in $L_2$ as $K_c \to \infty$. [sent-89, score-0.16]

45 Consider the tail sum $\sum_{k=K_c+1}^{\infty} s_k (l_{i:} u_{:k})(r_{j:} v_{:k})$, and define $a = \max_k \mathbb{E}(l_{i:} u_{:k})^2$ and $b = \max_k \mathbb{E}(r_{j:} v_{:k})^2$. [sent-92, score-0.16]
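
To make the shrinkage behind Theorem 1 concrete, the sketch below draws $\{s_k\}$ under a standard MGP construction, which is assumed here since equation (5) is not reproduced in this summary: $\tau_k = \prod_{l \le k} \delta_l$ with $\delta_l \sim \mathrm{Gamma}(\alpha_c, 1)$ and $s_k \sim \mathcal{N}(0, \tau_k^{-1})$.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_c, K_c = 3.0, 50
delta = rng.gamma(alpha_c, 1.0, size=K_c)   # delta_l ~ Gamma(alpha_c, 1)
tau = np.cumprod(delta)                     # tau_k = prod_{l <= k} delta_l
s = rng.normal(0.0, 1.0 / np.sqrt(tau))     # s_k ~ N(0, tau_k^{-1})

# With alpha_c > 1 the precisions tau_k grow geometrically in expectation,
# so |s_k| decays and the rank-1 expansion converges.
print(np.abs(s[:5]).round(3), np.abs(s[-5:]).round(6))
```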

46 Via the FTM and BMF frameworks of the previous subsections, each piece of legislation j is represented as two latent binary feature vectors $b_{j:}$ and $r_{j:}$. [sent-98, score-0.621]

47 To jointly model the matrix of votes with the associated text of the legislation, a natural choice is to impose $b_{j:} = r_{j:}$. [sent-99, score-0.343]

48 As a result, the full joint model can be specified by equations (1)-(5), with $b_{jt}$ in (1) replaced by $r_{jt}$. [sent-100, score-0.171]

49 In the context of the model for $Y = f(X)$, with $X = LHR^T$, if one were to learn L and H based upon available training data, then a new legislation column $y_{:N+1}$ could be predicted if we had access to $r_{:N+1}$. [sent-103, score-0.323]

50 Via the construction above, we not only gain a predictive advantage, because the new legislation’s latent binary features $r_{:N+1}$ can be obtained by modeling its document as in (1); the model also provides powerful interpretive insights. [sent-104, score-0.353]
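
As a sketch of this prediction pipeline: given L and H learned from the training votes, and a binary topic vector r_new inferred from a new bill's text, the held-out column of votes can be predicted under the probit link. The names are hypothetical, and a full treatment would average the probit probabilities over posterior samples rather than take a single MAP-style decision.

```python
import numpy as np

def predict_new_bill(L, H, r_new):
    """Predict a full column of votes for a held-out bill (a sketch).

    L     : (P, K_l) binary legislator features (learned from training votes)
    H     : (K_l, K_r) linkage matrix (learned from training votes)
    r_new : (K_r,) binary topic vector inferred from the bill's text via the FTM
    """
    x_col = L @ H @ r_new            # latent column x_{:, N+1}
    return (x_col >= 0).astype(int)  # probit prediction for each legislator
```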

51 Specifically, the topics inferred from the documents may be used to interpret the latent binary features associated with the matrix factorization. [sent-105, score-0.56]

52 The ideal point topic model (IPTM) was developed in [8], where the supervised latent Dirichlet allocation (sLDA) [4] model was used to link empirical topic-usage frequencies to the latent factors via regression. [sent-108, score-0.342]

53 In [23] the authors proposed to jointly analyze the voting matrix and the associated text through a mixture model, where each legislation’s latent feature factor is clustered to a mixture component that is coupled with that legislation’s document topic distribution θ. [sent-112, score-0.631]

54 Note that in their case each piece of legislation can only belong to one cluster, while in our case the latent binary features for each document can effectively be treated as being grouped into multiple clusters [13] (a mixed-membership model, manifested in terms of the binary feature vectors). [sent-113, score-0.878]

55 Similar research on linking collaborative filtering and topic models can also be found in web content recommendation [1], movie recommendation [19], and scientific paper recommendation [22]. [sent-114, score-0.3]

56 None of these methods makes use of the binary indicators to characterize the associated documents; instead, they perform the linking via the topic distribution θ and the latent (real) features in different ways. [sent-115, score-0.54]

57 Sampling $\{v_{:k}, u_{:k}\}_{k=1:K_c}$: based on (3) and (4), the conditional posterior of $v_{:k}$ can be written as $p(v_{:k} \mid -) \propto \prod_{j=1}^{N} \mathcal{N}(\hat{x}_{:j} \mid \sum_{k=1}^{K_c} s_k (L u_{:k})(r_{j:} v_{:k}),\ 1)\, \mathcal{N}(v_{:k} \mid 0, I_{K_r})$. [sent-119, score-0.188]

58 It can be shown that $p(v_{:k} \mid -) = \mathcal{N}(v_{:k} \mid \mu_{v_{:k}}, \Sigma_{v_{:k}})$, with mean $\mu_{v_{:k}} = s_k \Sigma_{v_{:k}} \sum_{j=1}^{N} (L u_{:k} r_{j:})^T \tilde{x}_{:j}^{-k}$ and covariance matrix $\Sigma_{v_{:k}} = [I_{K_r} + s_k^2 \sum_{j=1}^{N} (L u_{:k} r_{j:})^T (L u_{:k} r_{j:})]^{-1}$, where $\tilde{x}_{:j}^{-k} = \hat{x}_{:j} - L H r_{j:}^T + s_k L u_{:k} (r_{j:} v_{:k})$. [sent-120, score-0.213]

59 Sampling $\{s_k\}_{k=1:K_c}$: based on (4) and (5), the conditional posterior of $s_k$ can be written as $p(s_k \mid -) \propto \prod_{j=1}^{N} \mathcal{N}(\hat{x}_{:j} \mid \sum_{k=1}^{K_c} s_k (L u_{:k})(r_{j:} v_{:k}),\ 1)\, \mathcal{N}(s_k \mid 0, \tau_k^{-1})$, and it can be shown that this posterior is also Gaussian. [sent-122, score-0.348]
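
The conditional for each $s_k$ is a univariate Gaussian; the sketch below illustrates one such update. It assumes precomputed factor products A[i, k] = $l_{i:} u_{:k}$ and B[j, k] = $r_{j:} v_{:k}$, and all names are hypothetical.

```python
import numpy as np

def sample_s_k(X_hat, A, B, s, tau, k, rng):
    """One Gibbs update for s_k under likelihood (4) and prior N(s_k | 0, 1/tau_k).

    X_hat : (P, N) latent probit matrix;  A : (P, K_c);  B : (N, K_c)
    """
    # Residual of X_hat with the k-th rank-1 term removed.
    resid = X_hat - (A * s) @ B.T + s[k] * np.outer(A[:, k], B[:, k])
    C = np.outer(A[:, k], B[:, k])        # per-entry coefficient of s_k
    prec = tau[k] + np.sum(C * C)         # posterior precision
    mean = np.sum(C * resid) / prec       # posterior mean
    return rng.normal(mean, 1.0 / np.sqrt(prec))
```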

60 Sampling $\{r_{jt}\}_{j=1:N, t=1:K_r}$: similar to the derivation in [24], $p(r_{jt} = 1 \mid -) = 1$ if $N_{jt} > 0$, where $N_{jt}$ denotes the number of times document j uses topic t. [sent-126, score-0.288]

61 When $N_{jt} = 0$, based on (1) and (4) the conditional posterior of $r_{jt}$ can be written as $p(r_{jt} = 1 \mid -) \propto \frac{\pi_t}{\pi_t + 2^{\lambda_t}(1-\pi_t)} \exp\{-\frac{1}{2}[(L h_{t:}^T)^T (L h_{t:}^T) - 2 (L h_{t:}^T)^T \tilde{x}_{:j}^{-t}]\}$ and $p(r_{jt} = 0 \mid -) \propto \frac{2^{\lambda_t}(1-\pi_t)}{\pi_t + 2^{\lambda_t}(1-\pi_t)}$, where $h_{t:}$ represents the tth row of $H = \sum_{k=1}^{K_c} s_k u_{:k} v_{:k}^T$. [sent-127, score-0.303]
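
A sketch of this Bernoulli switch follows; log_lik1 and log_lik0 stand for the Gaussian log-likelihood terms for $r_{jt} = 1$ and $r_{jt} = 0$ and are assumed precomputed, so the helper arguments are hypothetical.

```python
import numpy as np

def sample_r_jt(N_jt, pi_t, lam_t, log_lik1, log_lik0, rng):
    """One Gibbs update for r_jt (a sketch under the assumptions above)."""
    if N_jt > 0:                # topic t is used by document j, so r_jt = 1
        return 1
    log_p1 = np.log(pi_t) + log_lik1
    log_p0 = lam_t * np.log(2.0) + np.log1p(-pi_t) + log_lik0
    p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))   # normalized Bernoulli probability
    return int(rng.random() < p1)
```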

62 We have performed joint matrix and text analysis, considering the House of Representatives (House), sessions 106-111; we model each session’s roll-call votes separately as a binary matrix Y. [sent-135, score-0.53]

63 Entry $y_{ij} = 1$ denotes that the ith legislator’s response to legislation j is either “Yea” or “Yes”, and $y_{ij} = 0$ denotes that the corresponding response is either “Nay” or “No”. [sent-136, score-0.393]

64 We recommend setting the IBP hyperparameters αl = αr = 1, the MGP hyperparameter αc = 3, the FTM hyperparameter γ = 5, and the topic model hyperparameter η = 0. [sent-138, score-0.216]

65 In this section we study the classical problem of estimating the values of matrix data that are missing uniformly at random (in-matrix missing votes), without the use of associated documents. [sent-147, score-0.4]

66 This is done by decomposing the latent matrix $X = \Psi\Phi^T$, where each row of Ψ and Φ is drawn from a Gaussian distribution with mean and covariance matrix modeled by a Gaussian-Wishart distribution. [sent-149, score-0.274]

67 In Figure 1 each panel corresponds to a certain percentage of missingness; the horizontal axis is the number of columns (rank), which varies as a free parameter of PMF, while the vertical axis is the prediction accuracy. [sent-156, score-0.266]

68 We study the predictive power of the proposed model when the legislative roll-call votes and the associated bill documents are modeled jointly, as described in Section 2. [sent-165, score-0.699]

69 We also compare our model with that in [23], where the authors proposed to combine the factor analysis model and topic model via a compounded mixture model, with all sessions of roll-call data modeled jointly via a Markov process. [sent-169, score-0.312]

70 Since our main goal is to predict new bills rather than to model the matrices dynamically, in the following experiments we remove the Markov process and model each session of House data separately; we call this model FATM. [sent-170, score-0.344]

71 In [23] the authors proposed to use a beta-Bernoulli distributed binary variable $b_k$ to model whether the kth rank-1 matrix is used in the matrix decomposition. [sent-171, score-0.224]

72 When performing posterior inference we find that $b_k$ tends to be easily trapped in local maxima, whereas with the MGP, which models the significance of usage (rather than the binary usage) of each kth rank-1 matrix via $s_k$, smoother estimates and better mixing were observed. [sent-172, score-0.43]

73 For each session the bills are partitioned into 6 folds, and we iteratively remove a fold and train the model with the remaining folds; predictions are then performed on the bills in the removed fold. [sent-173, score-0.549]

74 This may lead to the undesirable consequence that the latent features learned from text are not discriminative in predicting a new piece of legislation. [sent-176, score-0.404]

75 To reduce such risk, in practice we could either set $\alpha_r$ such that it strongly favors fewer latent binary features, or truncate the stick-breaking construction at a pre-defined level $K_r$. [sent-177, score-0.277]

76 Figure 1: Comparison of prediction accuracy for votes missing uniformly at random, for the 110th House data. [sent-201, score-0.217]

77 Different panels correspond to different percentages of missingness; in each panel the vertical axis represents accuracy and the horizontal axis represents the rank set for PMF. [sent-202, score-0.303]

78 Figure 2: Prediction accuracy for held-out legislation across 106th-111th House data; prediction of an entire column of missing votes based on text. [sent-251, score-0.572]

79 In each panel the vertical axis represents accuracy and the horizontal axis represents the number of topics used for each model. [sent-252, score-0.356]

80 There is an advantage of our proposed model when the truncation on the number of topics $K_r$ (horizontal axis) is small. [sent-254, score-0.186]

81 In this study we partition all the bills into two groups: (i) bills with near-unanimous agreement, for which “Yea” or “Yes” votes exceed 90%; and (ii) contentious bills, for which the percentage of “Yea” or “Yes” votes is less than 60%. [sent-260, score-1.118]

82 By linking the inferred binary latent features to the topics for those two groups, we can gain insight into the characteristics of legislation and voting patterns. [sent-261, score-0.822]

83 Figure 3 compares the latent feature usage patterns of those two groups; the horizontal axis represents the latent features, where we set $K_r = 100$ for illustration purposes, and the vertical axis is the aggregate frequency with which a feature/topic is used by the bills in each of the two groups. [sent-264, score-0.814]

84 For example, in the left panel the features highlighted in blue are widely used by bills in the left group, but rarely used by bills in the right group. [sent-267, score-0.615]

85 As observed in Figure 3, the two groups exhibit distinct binary feature usage patterns (panel titles: “Binary feature usage pattern for unanimously agreed bills” and “Binary feature usage pattern for highly debated bills”). [sent-268, score-0.734]

86 Figure 3: Comparison of the frequencies of binary feature usage between two groups of bills; left: near-unanimous affirmative bills. [sent-281, score-0.46]

87 E.g., bills for which the percentage of votes received as “Yes” or “Yea” is more than 90%. [sent-283, score-0.473]

88 Right: highly debated bills, e.g., bills for which the percentage of votes received as “Yes” or “Yea” is less than 60%. [sent-286, score-0.473]

89 The six most discriminative features/topics (labeled in the figure) are shown in Table 1. Table 1: Six discriminative topics of unanimously agreed/highly debated bills learned from the 110th House of Representatives, with the top-ten most probable words shown. [sent-289, score-0.617]

90 We also study the interpretation of those latent features by linking them to the topics inferred from the texts. [sent-292, score-0.376]

91 As an example, those six highlighted features are linked to their corresponding topics and depicted in Table 1, with the top-ten most probable words within each topic shown. [sent-293, score-0.42]

92 For the unanimously agreed bills, we can read from Table 1 that they are highly likely to relate to topics about the education of youth (Topic 22) or the prevention of terrorism (Topic 73). [sent-294, score-0.224]

93 The bills from the contentious group, in contrast, tend to relate more to making amendments to an existing piece of legislation (Topic 83) or to discussing taxation (Topic 38). [sent-295, score-0.717]

94 Note that, compared to conventional topic modeling, these inferred topics are not only informative about the semantic meaning of the bills, but also discriminative in predicting the outcome of the bills. [sent-296, score-0.431]

95 A new methodology has been developed for the joint analysis of a matrix with associated text, based on sharing latent binary features modeled via the Indian buffet process. [sent-297, score-0.464]

96 Imposition of a low-rank representation for the latent real matrix has proven important, done here in a new manner via the multiplicative gamma process. [sent-299, score-0.249]

97 The sharing of latent binary features provides a general joint learning framework for Indian buffet process based models [9], of which the focused topic model and binary matrix factorization are two examples; exploring other possibilities in different scenarios could be an interesting direction. [sent-301, score-0.761]

98 Infinite latent feature models and the Indian buffet process. [sent-363, score-0.178]

99 The IBP compound Dirichlet process and its application to focused topic modeling. [sent-462, score-0.252]

100 Hierarchical topic modeling for analysis of time-evolving personal choices. [sent-468, score-0.243]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kc', 0.582), ('legislation', 0.323), ('bills', 0.261), ('topic', 0.216), ('mgp', 0.186), ('votes', 0.168), ('sk', 0.16), ('kr', 0.139), ('topics', 0.134), ('latent', 0.126), ('ftm', 0.124), ('rj', 0.123), ('bmf', 0.115), ('iptm', 0.099), ('house', 0.098), ('piece', 0.091), ('text', 0.09), ('bj', 0.086), ('opic', 0.084), ('rjt', 0.084), ('binary', 0.081), ('pmf', 0.08), ('lu', 0.078), ('document', 0.072), ('usage', 0.071), ('yea', 0.07), ('ibp', 0.066), ('axis', 0.065), ('zjn', 0.062), ('linkage', 0.059), ('manifested', 0.057), ('bjt', 0.056), ('missingness', 0.056), ('documents', 0.056), ('yes', 0.055), ('sessions', 0.054), ('matrix', 0.053), ('truncation', 0.052), ('buffet', 0.052), ('legislative', 0.05), ('missing', 0.049), ('xij', 0.048), ('features', 0.047), ('indian', 0.047), ('dirichlet', 0.045), ('percentage', 0.044), ('voting', 0.042), ('gamma', 0.042), ('contentious', 0.042), ('ikr', 0.042), ('lhrt', 0.042), ('lht', 0.042), ('njt', 0.042), ('unanimous', 0.042), ('modeled', 0.042), ('decomposition', 0.039), ('vertical', 0.038), ('linking', 0.038), ('stick', 0.038), ('factorization', 0.038), ('legislator', 0.037), ('rank', 0.037), ('kth', 0.037), ('focused', 0.036), ('yij', 0.035), ('characteristic', 0.035), ('political', 0.033), ('column', 0.032), ('associated', 0.032), ('breaking', 0.032), ('frequency', 0.031), ('sampler', 0.031), ('joint', 0.031), ('tth', 0.031), ('horizontal', 0.031), ('inferred', 0.031), ('matrices', 0.029), ('hdp', 0.028), ('debated', 0.028), ('ibps', 0.028), ('lowrank', 0.028), ('posterior', 0.028), ('session', 0.027), ('probit', 0.027), ('modeling', 0.027), ('discriminative', 0.027), ('style', 0.026), ('infer', 0.025), ('dp', 0.025), ('legislators', 0.025), ('imposition', 0.025), ('wjn', 0.025), ('youth', 0.025), ('dunson', 0.024), ('vote', 0.024), ('predicting', 0.023), ('highlighted', 0.023), ('panel', 0.023), ('terrorist', 0.023), ('recommendation', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data.

2 0.21180925 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

Author: Sean Gerrish, David M. Blei

Abstract: We develop a probabilistic model of legislative data that uses the text of the bills to uncover lawmakers’ positions on specific political issues. Our model can be used to explore how a lawmaker’s voting patterns deviate from what is expected and how that deviation depends on what is being voted on. We derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we demonstrate both improvement in heldout predictive performance and the model’s utility in interpreting an inherently multi-dimensional space.

3 0.15456791 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.

4 0.14235859 233 nips-2012-Multiresolution Gaussian Processes

Author: David B. Dunson, Emily B. Fox

Abstract: We propose a multiresolution Gaussian process to capture long-range, non-Markovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the marginal likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of magnetoencephalography (MEG) recordings of brain activity.

5 0.13248257 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

Author: Michael Bryant, Erik B. Sudderth

Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.

6 0.12129839 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

7 0.10362002 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

8 0.10010615 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

9 0.097220913 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

10 0.094814375 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

11 0.093566887 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

12 0.092027672 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

13 0.082363285 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

14 0.081606217 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

15 0.080327667 12 nips-2012-A Neural Autoregressive Topic Model

16 0.077918015 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

17 0.077842377 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

18 0.074804813 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

19 0.072407685 22 nips-2012-A latent factor model for highly multi-relational data

20 0.07054887 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.15), (1, 0.072), (2, -0.025), (3, -0.017), (4, -0.187), (5, -0.039), (6, -0.011), (7, 0.024), (8, 0.127), (9, -0.045), (10, 0.121), (11, 0.162), (12, 0.0), (13, -0.046), (14, 0.033), (15, 0.065), (16, 0.075), (17, 0.103), (18, -0.0), (19, 0.053), (20, -0.014), (21, 0.04), (22, -0.02), (23, -0.088), (24, 0.007), (25, -0.016), (26, 0.04), (27, 0.096), (28, -0.034), (29, -0.013), (30, 0.04), (31, 0.085), (32, 0.012), (33, 0.054), (34, -0.009), (35, -0.013), (36, -0.099), (37, 0.021), (38, 0.02), (39, -0.053), (40, -0.069), (41, 0.029), (42, 0.07), (43, 0.055), (44, 0.018), (45, 0.19), (46, 0.0), (47, 0.014), (48, 0.108), (49, 0.122)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94583738 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data.

2 0.77244318 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

Author: Sean Gerrish, David M. Blei

Abstract: We develop a probabilistic model of legislative data that uses the text of the bills to uncover lawmakers’ positions on specific political issues. Our model can be used to explore how a lawmaker’s voting patterns deviate from what is expected and how that deviation depends on what is being voted on. We derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we demonstrate both improvement in heldout predictive performance and the model’s utility in interpreting an inherently multi-dimensional space.

3 0.73496461 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.

4 0.68291223 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

Author: Kosuke Fukumasu, Koji Eguchi, Eric P. Xing

Abstract: Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.

5 0.67469335 12 nips-2012-A Neural Autoregressive Topic Model

Author: Hugo Larochelle, Stanislas Lauly

Abstract: We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-field recursive equations of the Replicated Softmax in order to define a neural network architecture that estimates the probability of observing a new word in a given document given the previously observed words. This paradigm also allows us to replace the expensive softmax distribution over words with a hierarchical distribution over paths in a binary tree of words. The end result is a model whose training complexity scales logarithmically with the vocabulary size instead of linearly as in the Replicated Softmax. Our experiments show that our model is competitive both as a generative model of documents and as a document representation learning algorithm.

6 0.65988076 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

7 0.63520163 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

8 0.63029456 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

9 0.59369326 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

10 0.58631456 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

11 0.57155603 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

12 0.52912885 192 nips-2012-Learning the Dependency Structure of Latent Factors

13 0.51564682 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

14 0.51460057 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

15 0.47475958 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

16 0.47280905 22 nips-2012-A latent factor model for highly multi-relational data

17 0.4504123 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

18 0.41741025 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

19 0.38312677 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

20 0.38203961 59 nips-2012-Bayesian nonparametric models for bipartite graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.101), (9, 0.013), (21, 0.018), (38, 0.118), (39, 0.349), (42, 0.016), (53, 0.011), (54, 0.023), (55, 0.028), (74, 0.03), (76, 0.089), (80, 0.067), (92, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77940071 351 nips-2012-Transelliptical Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose a high dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions like the multivariate Gaussian, t and logistic, and it is extended to the meta-elliptical by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.

same-paper 2 0.7735182 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data.

3 0.76776719 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

Author: Gal Elidan, Cobi Cario

Abstract: The empirical success of the belief propagation approximate inference algorithm has inspired numerous theoretical and algorithmic advances. Yet, for continuous non-Gaussian domains performing belief propagation remains a challenging task: recent innovations such as nonparametric or kernel belief propagation, while useful, come with a substantial computational cost and offer few theoretical guarantees, even for tree structured models. In this work we present Nonparanormal BP for performing efficient inference on distributions parameterized by a Gaussian copula network and any univariate marginals. For tree structured networks, our approach is guaranteed to be exact for this powerful class of non-Gaussian models. Importantly, the method is as efficient as standard Gaussian BP, and its convergence properties do not depend on the complexity of the univariate marginals, even when a nonparametric representation is used.

4 0.76025021 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

Author: Xiaolong Wang, Liang Lin

Abstract: This paper studies a novel discriminative part-based model to represent and recognize object shapes with an “And-Or graph”. We define this model consisting of three layers: the leaf-nodes with collaborative edges for localizing local parts, the or-nodes specifying the switch of leaf-nodes, and the root-node encoding the global verification. A discriminative learning algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is automatically determined with optimizing the multi-layer parameters during the iteration. The advantages of our method are two-fold. (i) The And-Or graph model enables us to handle well large intra-class variance and background clutters for object shape detection from images. (ii) The proposed learning algorithm is able to obtain the And-Or graph representation without requiring elaborate supervision and initialization. We validate the proposed method on several challenging databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms the state-of-the-art approaches.

5 0.67994142 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

Author: Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou

Abstract: Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning. In this work, we investigate the fundamental difference between these two approaches, and how the difference could affect their generalization performances. Unlike approaches based on random Fourier features, where the basis functions (i.e., cosine and sine functions) are sampled from a distribution independent of the training data, basis functions used by the Nyström method are randomly sampled from the training examples and are therefore data dependent. By exploring this difference, we show that when there is a large gap in the eigen-spectrum of the kernel matrix, approaches based on the Nyström method can yield an impressively better generalization error bound than the random Fourier features based approach. We empirically verify our theoretical findings on a wide range of large data sets.

6 0.64808083 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

7 0.64735281 352 nips-2012-Transelliptical Graphical Models

8 0.5489912 310 nips-2012-Semiparametric Principal Component Analysis

9 0.53904313 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

10 0.53203279 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

11 0.51031357 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

12 0.50585246 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

13 0.49507833 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

14 0.49116984 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

15 0.49013868 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

16 0.48952579 192 nips-2012-Learning the Dependency Structure of Latent Factors

17 0.4856185 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

18 0.48550048 163 nips-2012-Isotropic Hashing

19 0.4838706 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

20 0.48372221 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination