nips nips2011 nips2011-140 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. [sent-9, score-0.409]
2 However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. [sent-10, score-0.749]
3 We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. [sent-11, score-0.816]
4 Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. [sent-12, score-0.625]
5 One way to compactly model these statistical structures is to use probabilistic graphical models that relate the observed features to a set of latent or hidden variables. [sent-15, score-0.548]
6 By defining a joint probabilistic model over observed and latent variables, the marginal distribution of the observed variables is obtained by integrating out the latent ones. [sent-16, score-0.744]
7 Probabilistic models with latent variables have been deployed successfully to a diverse range of problems such as in document analysis [3], social network modeling [10], speech recognition [18] and bioinformatics [5]. [sent-22, score-0.354]
8 In this paper, we will focus on latent variable models where the latent structures are trees (we call such a model a “latent tree” for short). [sent-23, score-0.737]
9 , distinct ancestral species, objects in an image, latent semantics). [sent-28, score-0.277]
10 In particular, we will study the problems of estimating the latent tree structures, learning the model parameters and performing inference on these models for continuous and non-Gaussian variables where it is not easy to specify a parametric family. [sent-34, score-0.679]
11 In previous works, the challenging problem of estimating the structure of latent trees has largely been tackled by heuristics since the search space of structures is intractable. [sent-35, score-0.518]
12 [28] proposed a search heuristic for hierarchical latent class models by defining a series of local search operations and using EM to compute the likelihood of candidate structures. [sent-37, score-0.307]
13 Harmeling and Williams [8] proposed a greedy algorithm to learn binary trees by joining two nodes with a high mutual information and iteratively performing EM to compute the mutual information among newly added hidden nodes. [sent-38, score-0.281]
14 Given the structures of the latent trees, learning the model parameters has predominantly relied on likelihood maximization and local search heuristics such as expectation maximization (EM) [6]. [sent-42, score-0.387]
15 In this paper, we propose a method for latent tree models with continuous and non-Gaussian observations based on the concept of kernel embedding of distributions [23]. [sent-46, score-0.877]
16 The problems we try to address are: how to estimate the structures of latent trees with provable guarantees, and how to perform local-minimum-free parameter learning and efficient inference given the tree structures, all in a nonparametric fashion. [sent-47, score-0.774]
17 The main flavor of our method is to exploit the spectral properties of the joint embedding (or covariance operators) in both the structure recovery and learning/inference stage. [sent-48, score-0.345]
18 For the former, we define a distance measure between variables based on the singular value decomposition of covariance operators. [sent-49, score-0.247]
19 This allows us to generalize some of the distance based latent tree learning procedures such as neighbor joining [20] and the recursive grouping methods [4] to the nonparametric setting. [sent-50, score-0.715]
20 After the structure is recovered, we further use the covariance operator and its principal singular vectors to design surrogates for parameters of the latent variables (called a “spectral algorithm”). [sent-52, score-0.594]
21 2 Latent Tree Graphical Models. We will focus on latent variable models where the observed variables are continuous and non-Gaussian and the conditional independence structures are specified by trees. [sent-55, score-0.613]
22 A latent tree model defines a joint distribution over a set of observed variables, O = {X1, ..., XO}, and a set of hidden variables, H. [sent-61, score-0.581]
23 For simplicity, we will assume that all observed variables have the same domain XO , and all hidden variables take values from XH and have finite dimension d. [sent-69, score-0.291]
24 The joint distribution of X in a latent tree model is fully characterized by a set of conditional distributions (CD). [sent-70, score-0.639]
25 More specifically, we can select an arbitrary latent node in the tree as the root, and reorient all edges away from the root. [sent-71, score-0.604]
26 Then the set of CDs between nodes and their parents, P(Xi | Xπi), is sufficient to characterize the joint distribution (for the root node Xr we set P(Xr | Xπr) = P(Xr), and we use P to refer to the density in the continuous case): P(X) = ∏_{i=1}^{O+H} P(Xi | Xπi). [sent-72, score-0.29]
27 Compared to tree models, which are defined solely on observed variables, latent tree models encompass a much larger class of models, allowing more flexibility in modeling observed variables. [sent-73, score-0.905]
28 This is evident if we sum out the latent variables in the joint distribution: P(O) = ∑_H ∏_{i=1}^{O+H} P(Xi | Xπi). (1) [sent-74, score-0.385]
29 This expression leads to complicated conditional independence structures between observed variables depending on the tree topology. [sent-75, score-0.524]
30 In other words, latent tree models allow complex distributions over the observed variables. [sent-76, score-0.668]
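To make the factorization and marginalization above concrete, here is a minimal discrete sketch (our own toy illustration, not the paper's nonparametric construction): a single hidden root with three observed leaves, where the observed marginal P(O) is obtained by summing the product of parent-child conditionals over the hidden states. All dimensions and table values below are placeholders.

```python
import numpy as np

# Toy latent tree: hidden root H with d states and three observed leaves
# X1, X2, X3 that are conditionally independent given H (hypothetical sizes).
rng = np.random.default_rng(0)
d, k = 3, 4                                   # hidden states, observed states

def random_cpd(rows, cols):
    """Random conditional probability table P(column | row)."""
    t = rng.random((rows, cols))
    return t / t.sum(axis=1, keepdims=True)

p_h = np.full(d, 1.0 / d)                     # P(H)
cpds = [random_cpd(d, k) for _ in range(3)]   # P(Xi | H), one table per leaf

# Marginal of the observed variables: P(x1,x2,x3) = sum_h P(h) prod_i P(xi | h).
p_obs = np.einsum('h,ha,hb,hc->abc', p_h, *cpds)
assert np.isclose(p_obs.sum(), 1.0)

# Although each leaf depends only on H, marginalizing H couples all three
# leaves: the joint is generally not the product of its single-leaf marginals.
p1, p2, p3 = p_obs.sum((1, 2)), p_obs.sum((0, 2)), p_obs.sum((0, 1))
print(np.abs(p_obs - np.einsum('a,b,c->abc', p1, p2, p3)).max())
```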
31 For simplicity of explanation, we will focus on latent tree structures where each internal node has exactly 3 neighbors. [sent-80, score-0.731]
32 We can reroot the tree and redirect all the edges away from the root. [sent-81, score-0.273]
33 All the observed variables are leaves in the tree, and we will use ι*_s, ρ*_s, π*_s to denote the observed variable found by tracing in the direction from node s to its left child ιs, its right child ρs, and its parent πs, respectively. [sent-83, score-0.38]
34 3 Kernel Density Estimator and Hilbert Space Embedding. Kernel density estimation (KDE) is a nonparametric way of fitting the density of continuous random variables with non-Gaussian statistical features such as multi-modality and skewness [22]. [sent-85, score-0.355]
35 However, traditional KDE cannot model the latent tree structure. [sent-86, score-0.55]
36 In this paper, we will show that the kernel density estimator can be augmented to deal with latent tree structures using a recent concept called Hilbert space embedding of distributions [23]. [sent-87, score-0.958]
37 Here ⊗_{j=1}^{O} denotes the tensor product of O feature vectors, which results in a rank-1 tensor of order O. [sent-108, score-0.252]
38 C_O := E_O[⊗_{j=1}^{O} φ(Xj)] is called the Hilbert space embedding of the distribution P(O) with tensor features ⊗_{j=1}^{O} φ(Xj). [sent-111, score-0.281]
39 The essence of Hilbert space embedding is to represent distributions as elements in Hilbert spaces, and then subsequent manipulation of the distributions can be carried out via Hilbert space operations such as inner product and distance. [sent-113, score-0.233]
40 Since E_O[∏_{j=1}^{O} k(xj, Xj)] = ⟨E_O[⊗_{j=1}^{O} φ(Xj)], ⊗_{j=1}^{O} φ(xj)⟩_{F^O}, (3) we see that this expected value is the inner product between the embedding C_O and the tensor features ⊗_{j=1}^{O} φ(xj). [sent-118, score-0.313]
41 While traditional KDE cannot make use of the fact that the embedding C_O originates from a distribution with latent tree structure, the embedding view allows us to exploit this special structure and further decompose C_O into simpler tensors of much lower order. [sent-124, score-0.994]
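As a finite-sample illustration of the inner-product view in (3), the sketch below estimates E_O[∏_j k(xj, Xj)] from samples; it is simply a product-kernel, density-style estimate written as the inner product between an empirical embedding and the tensor feature of the query point, evaluated without ever forming the order-O tensor. The Gaussian RBF kernel, its bandwidth, and the data are placeholders, not taken from the paper.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y), applied elementwise."""
    return np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
N, O = 500, 3                         # samples and number of observed variables
X = rng.normal(size=(N, O))           # placeholder data

def expected_kernel_value(query, X, sigma=1.0):
    """Empirical estimate of E_O[prod_j k(x_j, X_j)] at a query point.

    Equivalently, the inner product between the empirical embedding
    (1/N) sum_n (tensor_j phi(X_j^{(n)})) and the tensor feature tensor_j phi(x_j).
    """
    k_vals = rbf(query[None, :], X, sigma)    # k_vals[n, j] = k(query_j, X_j^{(n)})
    return k_vals.prod(axis=1).mean()

print(expected_kernel_value(np.zeros(O), X))
```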
42 4 Kernel Embedding of Latent Tree Graphical Models. In this section, we assume that the structures of the latent tree graphical models are given, and we will deal with structure learning in the next section. [sent-125, score-0.713]
43 The challenge is that the message passing algorithm becomes nontrivial to represent and implement in continuous and nonparametric settings. [sent-128, score-0.371]
44 In contrast, the distribution embedding view allows us to represent and implement the message passing algorithm efficiently without resorting to approximations. [sent-130, score-0.416]
45 Furthermore, it also allows us to develop a local-minimum-free algorithm for learning the parameters of latent tree graphical models. [sent-131, score-0.609]
46 1 Covariance Operator and Conditional Embedding Operator We will first explain the concept of conditional embedding operators which are the nonparametric counterparts for conditional probability tables in the discrete case. [sent-133, score-0.469]
47 Conditional embedding operators will be the key building blocks of a nonparametric message passing algorithm, just as conditional probability tables are for the ordinary message passing algorithm. [sent-134, score-0.862]
48 In other words, the covariance operator is also the embedding of the joint distribution P(Xs , Xt ). [sent-142, score-0.332]
49 Then the conditional embedding operator can be defined via covariance operators according to Song et al. [sent-143, score-0.428]
50 A conditional embedding operator allows us to compute conditional expectations E_{Xt|xs}[f(Xt)] as linear operations in the RKHS. [sent-145, score-0.388]
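The following is our own finite-sample sketch (not code from the paper) of computing a conditional expectation through a conditional embedding operator in its usual kernel-matrix form, with weights β = (K_ss + λNI)^{-1} k_s(x_s); the regularizer λ, the kernel, and the toy data are assumptions.

```python
import numpy as np

def gram(a, b, sigma=1.0):
    """Gaussian RBF Gram matrix between two 1-d sample vectors."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
N = 1000
xs = rng.normal(size=N)                          # samples of X_s
xt = np.sin(xs) + 0.1 * rng.normal(size=N)       # samples of X_t, dependent on X_s

lam = 1e-3                                       # regularizer (tuned in practice)
K_ss = gram(xs, xs)

def cond_expect(f, x_query):
    """Estimate E[f(X_t) | X_s = x_query] via the conditional embedding operator.

    beta = (K_ss + lam*N*I)^{-1} k_s(x_query); the estimate is sum_n beta_n f(x_t^{(n)}).
    """
    k_query = gram(xs, np.array([x_query]))[:, 0]
    beta = np.linalg.solve(K_ss + lam * N * np.eye(N), k_query)
    return beta @ f(xt)

# For f the identity this should be close to sin(0.5).
print(cond_expect(lambda t: t, 0.5), np.sin(0.5))
```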
51 2 Representation for Message Passing Algorithm For simplicity, we will focus on latent trees where all latent variables have degree 3 (but our method can be generalized to higher degrees). [sent-152, score-0.706]
52 ** An internal latent variable aggregates incoming messages from its two children and then sends an outgoing message to its own parent: m_s(Xπs) = E_{Xs|Xπs}[m_{ιs}(Xs) m_{ρs}(Xs)]. [sent-154, score-0.696]
53 *** Finally, at the root node, all incoming messages are multiplied together and the root variable is integrated out: b_r := E_O[∏_{j=1}^{O} k(xj, Xj)] = E_{Xr}[m_{ιr}(Xr) m_{ρr}(Xr) m_{ωr}(Xr)]. [sent-155, score-0.258]
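To make the three update rules concrete, here is a finite-dimensional (discrete) analogue of the recursion on a small toy tree, with conditional probability tables standing in for the conditional embedding operators and evidence vectors P(Xi = xi | parent) standing in for the leaf kernels; the tree shape, state sizes, and tables are placeholders, and the RKHS version in the paper replaces these tables with operators.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 4                                    # hidden states, observed states

def cpd(rows, cols):
    t = rng.random((rows, cols))
    return t / t.sum(axis=1, keepdims=True)    # rows index the parent's value

# Toy structure: root R -> {H1, H2, X5}; H1 -> {X1, X2}; H2 -> {X3, X4}.
p_root = np.full(d, 1.0 / d)
P_H1, P_H2 = cpd(d, d), cpd(d, d)              # P(H1 | R), P(H2 | R)
P_X = {i: cpd(d, k) for i in range(1, 6)}      # P(Xi | its parent)

def leaf_msg(i, x_obs):
    # Message from observed leaf i to its parent: m_i(h) = P(Xi = x_obs | h).
    return P_X[i][:, x_obs]

def internal_msg(P_child_given_parent, m_left, m_right):
    # m_s(h_parent) = sum_{h_s} P(h_s | h_parent) * m_left(h_s) * m_right(h_s)
    return P_child_given_parent @ (m_left * m_right)

def evidence(x):
    m1 = internal_msg(P_H1, leaf_msg(1, x[0]), leaf_msg(2, x[1]))
    m2 = internal_msg(P_H2, leaf_msg(3, x[2]), leaf_msg(4, x[3]))
    m5 = leaf_msg(5, x[4])
    # Root: multiply all incoming messages and integrate out the root variable.
    return p_root @ (m1 * m2 * m5)

print(evidence([0, 1, 2, 3, 0]))               # P(X1, ..., X5 = the given values)
```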
54 The challenge is that message passing becomes nontrivial to represent and implement in continuous and nonparametric settings. [sent-156, score-0.371]
55 [24] show that the above 3 message update operations can be expressed using Hilbert space embeddings [26], and no further approximation is needed in the message computation. [sent-159, score-0.407]
56 Basically, the embedding approach assumes that messages are functions in the reproducing kernel Hilbert space, and a message update is an operator that takes several functions as inputs and outputs another function in the reproducing kernel Hilbert space. [sent-160, score-0.754]
57 By making use of the conditional independence structure of latent tree models, we only need to estimate tensors of much smaller orders. [sent-164, score-0.71]
58 Particularly, we only need to estimate tensors involving two variables (for each parent-child pair), and then the density can be estimated via message passing algorithms using these tensors of much smaller order. [sent-165, score-0.526]
59 The drawback of the representations in (7) and (8) is that they require exact knowledge of conditional embedding operators associated with latent variables, but none of these are available in training. [sent-166, score-0.575]
60 Next we will show that we can still make use of the tensor decomposition representation without the need for recovering the latent variables explicitly. [sent-167, score-0.464]
61 3 Spectral Algorithm for Learning Latent Tree Parameters Our observation from (7) and (8) is that if we can recover the conditional embedding operators associated with latent variables up to some invertible transformations, we will still be able to compute latent tree density correctly. [sent-169, score-1.29]
62 These transformations provide us with an additional degree of freedom for algorithm design: we can choose the invertible transforms cleverly, such that the transformed representation can be recovered from observed quantities without the need to access the latent variables. [sent-172, score-0.469]
63 More specifically, these transformations T can be constructed from cross-covariance operators of certain pairs of observed variables and their singular vectors U. [sent-175, score-0.367]
64 The general pattern is that we can relate the transformed latent quantity to observed quantities in two different ways, such that we can solve for the transformed latent quantity. [sent-180, score-0.729]
65 The above results give us an efficient algorithm for computing the expected kernel density b_r which can take into account the latent tree structure while at the same time avoiding the local-minimum problems associated with explicitly recovering latent parameters. [sent-186, score-1.111]
66 After we obtain the transformed quantities, we can then use them in the message passing algorithm to obtain the final belief br . [sent-188, score-0.345]
67 Given i.i.d. samples of the observed variables X1, ..., XO drawn from P(O), the spectral algorithm for latent trees proceeds by replacing all population quantities by their empirical counterparts. [sent-195, score-0.435]
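A rough sketch of that plug-in step, under our own simplifying assumptions (an explicit finite-dimensional feature map standing in for the RKHS, and three observed views of a d-dimensional hidden variable): empirical cross-covariance operators are formed from samples, their top-d singular vectors U supply the invertible transformations, and transformed quantities are built purely from observable moments. The precise combinations used by the spectral algorithm in the paper are more involved; this only illustrates the pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 2000, 2                                 # sample size, assumed hidden dimension

# Placeholder generative model: three observed views of a d-state hidden variable.
h = rng.integers(0, d, size=N)
def emit(h, shift):                            # continuous, non-Gaussian emissions
    return rng.gamma(shape=2.0 + shift * h, scale=1.0, size=h.shape)
x1, x2, x3 = emit(h, 1.0), emit(h, 2.0), emit(h, 3.0)

def features(x, centers, sigma=1.0):
    """Finite-dimensional stand-in for the RKHS feature map phi(x)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))

centers = np.linspace(0.0, 10.0, 20)
F1, F2, F3 = (features(x, centers) for x in (x1, x2, x3))

# Empirical (uncentered) cross-covariance operators in feature space.
C12 = F1.T @ F2 / N
C13 = F1.T @ F3 / N

# Top-d left singular vectors of a pairwise operator supply the transformation U.
U1, s, _ = np.linalg.svd(C12)
U1 = U1[:, :d]

# Transformed, purely observable quantities that act as surrogates for
# latent-variable parameters (the paper's exact constructions differ).
T12, T13 = U1.T @ C12, U1.T @ C13
print(T12.shape, T13.shape, s[:d])
```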
68 5 Structure Learning of Latent Tree Graphical Models. The last section focused on density estimation where the structure of the latent tree is known. [sent-204, score-0.637]
69 In this section, we focus on learning the structure of the latent tree. [sent-205, score-0.306]
70 Structure learning of latent trees is a challenging problem that has largely been tackled by heuristics since the search space of structures is intractable. [sent-206, score-0.489]
71 Structure learning algorithm. We develop a distance-based method for constructing latent trees of continuous, non-Gaussian variables. [sent-208, score-0.382]
72 The idea is that if we have a tree metric (distance) between distributions on observed nodes, we can use the property of the tree metric to reconstruct the latent tree structure using algorithms such as neighbor joining [20] and the recursive grouping algorithm [4]. [sent-209, score-1.357]
73 These methods take a distance matrix among all pairs of observed variables as input and output a tree by iteratively adding hidden nodes. [sent-210, score-0.517]
74 While these methods are iterative, they have strong theoretical guarantees on structure recovery when the true distance matrix forms an additive tree metric. [sent-211, score-0.332]
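For intuition, below is a compact sketch of the standard Saitou-Nei neighbor-joining procedure referenced above: it takes an (ideally additive) pairwise distance matrix and returns an unrooted tree by repeatedly joining the pair that minimizes the Q-criterion and introducing a new hidden node. This is a textbook version, not the paper's implementation.

```python
import numpy as np

def neighbor_joining(D, labels):
    """Return a list of (node, node, branch_length) edges of an unrooted tree
    built from the distance matrix D by neighbor joining."""
    D = np.asarray(D, dtype=float).copy()
    nodes = list(labels)
    edges, next_id = [], len(labels)

    while len(nodes) > 2:
        n = len(nodes)
        row_sums = D.sum(axis=1)
        # Q-criterion; mask the diagonal so we never pick a node with itself.
        Q = (n - 2) * D - row_sums[:, None] - row_sums[None, :]
        np.fill_diagonal(Q, np.inf)
        i, j = np.unravel_index(np.argmin(Q), Q.shape)

        # Branch lengths from the joined pair to the new hidden node.
        li = 0.5 * D[i, j] + (row_sums[i] - row_sums[j]) / (2.0 * (n - 2))
        lj = D[i, j] - li
        new = f"h{next_id}"
        next_id += 1
        edges += [(nodes[i], new, li), (nodes[j], new, lj)]

        # Distances from the new node to every remaining node.
        d_new = 0.5 * (D[i] + D[j] - D[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        D = np.vstack([D[np.ix_(keep, keep)], d_new[keep]])
        D = np.hstack([D, np.append(d_new[keep], 0.0)[:, None]])
        nodes = [nodes[k] for k in keep] + [new]

    edges.append((nodes[0], nodes[1], D[0, 1]))
    return edges

# Tiny additive example: two cherries (A,B) and (C,D) joined by an internal edge.
D = np.array([[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]], dtype=float)
print(neighbor_joining(D, ["A", "B", "C", "D"]))
```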
75 However, most previously known tree metrics are defined for discrete and Gaussian variables. [sent-212, score-0.312]
76 We propose a tree metric below which works for continuous non-Gaussian cases. [sent-214, score-0.39]
77 Tree metric and pseudo-determinant. We will first explain some basic concepts of a tree metric. [sent-215, score-0.338]
78 If the joint probability distribution P(X) has a latent tree structure, then a distance measure d_{st} between an arbitrary pair of variables Xs and Xt is called a tree metric if it satisfies the following path-additive condition: d_{st} = ∑_{(u,v)∈Path(s,t)} d_{uv}. [sent-216, score-1.204]
79 However, this definition of a tree metric is restricted in the sense that it requires all discrete variables to have the same number of states and all Gaussian variables to have the same dimension. [sent-218, score-0.563]
80 For our more general scenario, where the observed variables are continuous and non-Gaussian but the hidden variables have dimension d, we will define a tree metric based on the pseudo-determinant, which works for our operators. [sent-220, score-0.681]
81 Nonparametric tree metric. The pseudo-determinant of an operator is defined as the product of its non-zero singular values, |C| = ∏_{i=1}^{d} σi(C). [sent-221, score-0.522]
82 In our case, since we assume that the dimension of the hidden variables is d, the pseudo-determinant is simply the product of top d singular values. [sent-222, score-0.286]
83 Then we define the distance metric between two continuous non-Gaussian variables Xs and Xt as d_{st} = −(1/2) log |C_{st} C_{st}^⊤| + (1/4) log |C_{ss} C_{ss}^⊤| + (1/4) log |C_{tt} C_{tt}^⊤|. (12) [sent-223, score-0.313]
84 One can prove that (12) defines a tree metric by induction on the path length. [sent-224, score-0.338]
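A sketch of how the distance in (12) could be estimated from data, under our own simplifying assumption of an explicit finite-dimensional feature map in place of the RKHS (the paper carries out the computation with kernel matrices): form empirical covariance operators and take the product of their top d singular values as the pseudo-determinant. The feature map, bandwidth, and hidden dimension d below are placeholders.

```python
import numpy as np

def features(x, centers, sigma=1.0):
    """Finite-dimensional stand-in for the RKHS feature map (an assumption)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * sigma ** 2))

def log_pdet(C, d):
    """Log pseudo-determinant: sum of the logs of the top-d singular values."""
    s = np.linalg.svd(C, compute_uv=False)
    return np.log(s[:d]).sum()

def tree_distance(xs, xt, centers, d, sigma=1.0):
    """Empirical analogue of d_st in Eq. (12) (a sketch, not the paper's estimator)."""
    Fs, Ft = features(xs, centers, sigma), features(xt, centers, sigma)
    N = len(xs)
    Cst, Css, Ctt = Fs.T @ Ft / N, Fs.T @ Fs / N, Ft.T @ Ft / N
    return (-0.5 * log_pdet(Cst @ Cst.T, d)
            + 0.25 * log_pdet(Css @ Css.T, d)
            + 0.25 * log_pdet(Ctt @ Ctt.T, d))

rng = np.random.default_rng(0)
xs = rng.gamma(2.0, size=1000)                  # non-Gaussian observed variable
xt = xs + 0.5 * rng.normal(size=1000)           # strongly dependent partner
centers = np.linspace(0.0, 8.0, 15)
print(tree_distance(xs, xt, centers, d=2))
```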
85 Figure 1: Comparison of our kernel structure learning method to the Gaussian and Nonparanormal methods on different tree structures; panels: (a) balanced binary tree, (b) skewed HMM-like tree, (c) random trees; x-axis: training sample size (×10³). [sent-237, score-0.818]
87 For the Gaussian baseline, the data are assumed to be multivariate Gaussian and we use the tree metric defined in [4] (which is essentially a function of the correlation coefficient). [sent-239, score-0.338]
88 If the data comes from a Nonparanormal distribution, then the transformed data are assumed to be multivariate Gaussians and the same tree metric as the Gaussian case can be used on the transformed data. [sent-247, score-0.444]
89 To perform learning and inference in our approach, we use the spectral algorithm and message passing algorithm described earlier in the paper. [sent-249, score-0.3]
90 We experiment with 3 different tree types (each with 64 leaves or observed variables): a balanced binary tree, a completely skewed binary tree (like an HMM), and randomly generated binary trees. [sent-254, score-0.726]
91 For all trees, we use the following generative process to generate the n-th sample from a node s (denoted x_s^{(n)}): if s is the root, sample from a mixture of 2 Gaussians. [sent-255, score-0.468]
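One way the described generative process could look in code is sketched below; the means, variances, and the way a child's mixture weight depends on its parent's value are our own placeholders, since those details are not reproduced in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_node(parent_value=None):
    """Draw one sample for a node given its parent's sampled value.

    Placeholder dynamics: the root draws from an even two-component Gaussian
    mixture; a child's mixture weight depends on its parent through a sigmoid.
    """
    w = 0.5 if parent_value is None else 1.0 / (1.0 + np.exp(-parent_value))
    mean = 2.0 if rng.random() < w else -2.0
    return rng.normal(mean, 1.0)

def sample_balanced_tree(depth):
    """Sample one draw from a balanced binary latent tree; return the leaf values."""
    def recurse(level, parent_value):
        value = sample_node(parent_value)
        if level == depth:
            return [value]                      # leaf = observed variable
        return recurse(level + 1, value) + recurse(level + 1, value)
    return recurse(0, None)

leaves = sample_balanced_tree(depth=6)          # 2^6 = 64 observed leaves
print(len(leaves), leaves[:4])
```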
92 Once we have computed the empirical tree distance matrix for each algorithm, we use the neighbor joining algorithm [20] to learn the trees. [sent-258, score-0.364]
93 For evaluation we compare the number of hops between each pair of leaves in the true tree to the number in the estimated tree. [sent-259, score-0.45]
94 For a pair of leaves i, j the error is defined as error(i, j) = |hops*(i,j) − hops(i,j)| / hops*(i,j) + |hops*(i,j) − hops(i,j)| / hops(i,j), where hops* is the true number of hops and hops is the estimated number of hops. [sent-260, score-0.312]
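This evaluation measure can be computed directly from the two trees' edge lists; below is a small sketch (with hypothetical helper names and a toy example) that counts hops by breadth-first search and averages the per-pair error defined above.

```python
from collections import defaultdict, deque

def hop_counts(edges, leaves):
    """Hops between every ordered pair of leaves in a tree given as an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    hops = {}
    for src in leaves:
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst in leaves:
            hops[(src, dst)] = dist[dst]
    return hops

def tree_error(true_edges, est_edges, leaves):
    """Average of the per-pair error defined in the text."""
    h_true, h_est = hop_counts(true_edges, leaves), hop_counts(est_edges, leaves)
    errs = []
    for i in leaves:
        for j in leaves:
            if i != j:
                diff = abs(h_true[(i, j)] - h_est[(i, j)])
                errs.append(diff / h_true[(i, j)] + diff / h_est[(i, j)])
    return sum(errs) / len(errs)

# Toy example: two 4-leaf trees that differ in one grouping.
true_edges = [("A", "u"), ("B", "u"), ("C", "v"), ("D", "v"), ("u", "v")]
est_edges = [("A", "u"), ("C", "u"), ("B", "v"), ("D", "v"), ("u", "v")]
print(tree_error(true_edges, est_edges, ["A", "B", "C", "D"]))
```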
95 Furthermore, we choose the bandwidth σ for the Gaussian RBF kernel needed for the covariance operators using the median distance between pairs of training points. [sent-263, score-0.262]
96 We also note that balanced binary trees are the easiest to learn while the skewed trees are the hardest (Figure 1). [sent-266, score-0.247]
97 For this experiment we use a balanced binary tree with 16 leaves (total of 31 nodes) and 100000 samples. [sent-269, score-0.348]
98 Figure 3: (a) visualization of the kernel latent tree learned from crime data (with groupings such as divorce/crime/poverty highlighted); (b) comparison of our method to Gaussian and NPN on a predictive task, as a function of query size. [sent-273, score-0.732]
99 We pick the first 50 of these attributes, plus the violent crime variable, and construct a latent tree using our tree metric and the neighbor joining algorithm [20]. [sent-289, score-0.273]
100 We depict the tree in Figure 3 and highlight a few coherent groupings. [sent-290, score-0.273]
wordName wordTfidf (topN-words)
[('xs', 0.414), ('latent', 0.277), ('tree', 0.273), ('cuu', 0.267), ('cst', 0.172), ('embedding', 0.171), ('message', 0.163), ('cs', 0.153), ('nonparanormal', 0.142), ('hops', 0.135), ('xr', 0.127), ('exs', 0.125), ('css', 0.125), ('hilbert', 0.113), ('tensor', 0.11), ('xo', 0.108), ('xt', 0.107), ('ctt', 0.107), ('eo', 0.107), ('npn', 0.107), ('kernel', 0.104), ('kde', 0.098), ('hidden', 0.096), ('cxs', 0.089), ('dst', 0.089), ('co', 0.083), ('passing', 0.082), ('singular', 0.081), ('song', 0.081), ('crime', 0.078), ('xj', 0.078), ('variables', 0.077), ('trees', 0.075), ('ct', 0.075), ('structures', 0.075), ('nonparametric', 0.074), ('tensors', 0.073), ('csu', 0.071), ('ctu', 0.071), ('operator', 0.071), ('operators', 0.069), ('metric', 0.065), ('skewed', 0.064), ('diff', 0.062), ('violent', 0.062), ('joining', 0.061), ('graphical', 0.059), ('covariance', 0.059), ('density', 0.058), ('conditional', 0.058), ('messages', 0.057), ('ms', 0.055), ('spectral', 0.055), ('node', 0.054), ('gaussian', 0.054), ('ext', 0.054), ('crimes', 0.053), ('transformed', 0.053), ('internal', 0.052), ('continuous', 0.052), ('embeddings', 0.051), ('nodes', 0.049), ('blowup', 0.047), ('br', 0.047), ('leaf', 0.046), ('root', 0.046), ('reproducing', 0.042), ('leaves', 0.042), ('observed', 0.041), ('transformations', 0.04), ('discrete', 0.039), ('abbreviation', 0.036), ('divorce', 0.036), ('elderly', 0.036), ('exr', 0.036), ('skewness', 0.036), ('heuristics', 0.035), ('em', 0.034), ('balanced', 0.033), ('variable', 0.033), ('product', 0.032), ('states', 0.032), ('harmeling', 0.031), ('poverty', 0.031), ('attributes', 0.031), ('child', 0.031), ('rbf', 0.031), ('joint', 0.031), ('operations', 0.03), ('invertible', 0.03), ('determinant', 0.03), ('parent', 0.03), ('distance', 0.03), ('incoming', 0.029), ('structure', 0.029), ('yz', 0.029), ('cts', 0.029), ('quantities', 0.028), ('largely', 0.027), ('gaussians', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
2 0.24215803 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
3 0.20975918 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming 1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
4 0.16914086 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
5 0.15109302 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang
Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics. 1
6 0.1440551 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
7 0.14203475 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
8 0.12624249 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
9 0.11890607 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
10 0.11659336 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
11 0.11626693 226 nips-2011-Projection onto A Nonnegative Max-Heap
12 0.11531059 301 nips-2011-Variational Gaussian Process Dynamical Systems
13 0.10860802 102 nips-2011-Generalised Coupled Tensor Factorisation
14 0.10828675 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
15 0.10556523 157 nips-2011-Learning to Search Efficiently in High Dimensions
16 0.1031254 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
17 0.10093529 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
18 0.095990732 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
19 0.093179666 186 nips-2011-Noise Thresholds for Spectral Clustering
20 0.092219204 274 nips-2011-Structure Learning for Optimization
topicId topicWeight
[(0, 0.258), (1, 0.034), (2, -0.047), (3, -0.102), (4, -0.06), (5, -0.184), (6, 0.009), (7, -0.182), (8, 0.221), (9, -0.236), (10, -0.017), (11, -0.025), (12, -0.132), (13, 0.165), (14, -0.062), (15, -0.066), (16, -0.077), (17, 0.007), (18, -0.049), (19, -0.104), (20, -0.121), (21, -0.151), (22, -0.085), (23, 0.001), (24, -0.021), (25, -0.018), (26, 0.046), (27, 0.071), (28, -0.005), (29, 0.108), (30, 0.006), (31, 0.02), (32, 0.085), (33, -0.126), (34, 0.009), (35, 0.061), (36, 0.014), (37, -0.077), (38, 0.056), (39, -0.081), (40, 0.016), (41, 0.04), (42, -0.061), (43, 0.078), (44, -0.011), (45, -0.01), (46, 0.051), (47, 0.105), (48, 0.027), (49, 0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.95708615 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
2 0.74826437 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
3 0.65077835 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang
Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics. 1
4 0.62560403 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg
Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1
5 0.57607692 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression
Author: Ali Tofigh, Erik Sj̦lund, Mattias H̦glund, Jens Lagergren
Abstract: Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein’s path model of colon cancer was a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising learning algorithms, and applying these to cross-sectional data. Beerenwinkel et al. provided, what they coined, EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow’s more cost-efficient technologies. 1
6 0.53984684 226 nips-2011-Projection onto A Nonnegative Max-Heap
7 0.52875048 265 nips-2011-Sparse recovery by thresholded non-negative least squares
8 0.52654928 274 nips-2011-Structure Learning for Optimization
9 0.52264607 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
10 0.50077665 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
11 0.4834742 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
12 0.48144877 301 nips-2011-Variational Gaussian Process Dynamical Systems
13 0.46074826 288 nips-2011-Thinning Measurement Models and Questionnaire Design
14 0.45991278 60 nips-2011-Confidence Sets for Network Structure
15 0.45962262 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
16 0.42900372 102 nips-2011-Generalised Coupled Tensor Factorisation
17 0.42753884 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
18 0.42450327 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
19 0.41220534 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
20 0.41104475 75 nips-2011-Dynamical segmentation of single trials from population neural data
topicId topicWeight
[(0, 0.032), (4, 0.056), (20, 0.03), (26, 0.02), (27, 0.01), (31, 0.115), (33, 0.032), (34, 0.165), (43, 0.087), (45, 0.09), (57, 0.125), (66, 0.017), (74, 0.034), (83, 0.026), (99, 0.059)]
simIndex simValue paperId paperTitle
1 0.88938183 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation
Author: Mehdi Keramati, Boris S. Gutkin
Abstract: Reinforcement learning models address animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that animals’ behavioral objective is to minimize deviations of some key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological-stability prove to be identical. The proposed theory shows behavioral adaptation to both internal and external states in a disciplined way. We further show that the proposed framework allows for a unified explanation of some behavioral pattern like motivational sensitivity of different associative learning mechanism, anticipatory responses, interaction among competing motivational systems, and risk aversion.
same-paper 2 0.85469526 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
3 0.84652191 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
4 0.80747157 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
5 0.78667694 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk
Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1
6 0.77081025 100 nips-2011-Gaussian Process Training with Input Noise
7 0.76063609 75 nips-2011-Dynamical segmentation of single trials from population neural data
8 0.75056219 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
9 0.7493729 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
10 0.74689072 219 nips-2011-Predicting response time and error rates in visual search
11 0.74614531 111 nips-2011-Hashing Algorithms for Large-Scale Learning
12 0.74573559 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
13 0.73597533 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
14 0.73539656 66 nips-2011-Crowdclustering
15 0.73525512 301 nips-2011-Variational Gaussian Process Dynamical Systems
16 0.73520178 171 nips-2011-Metric Learning with Multiple Kernels
17 0.73435187 24 nips-2011-Active learning of neural response functions with Gaussian processes
18 0.73392582 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations
19 0.73123717 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
20 0.7302655 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure