nips nips2013 nips2013-92 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yacine Jernite, Yonatan Halpern, David Sontag
Abstract: We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable. For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. We give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. [sent-3, score-0.286]
2 For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. [sent-6, score-1.243]
3 1 Introduction We study the problem of discovering the presence of latent variables in data and learning models involving them. [sent-8, score-0.356]
4 We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable, meaning that every latent variable has a singly-coupled quartet (i.e. [sent-14, score-1.016]
5 four children of a latent variable for which there is no other latent variable that is shared by at least two of the children). [sent-16, score-0.782]
6 We show that the existence of such a quartet allows us to uniquely identify each latent variable and to learn all parameters involving that latent variable. [sent-17, score-1.293]
7 Furthermore, using a technique introduced by Halpern & Sontag (2013), we show how to subtract already learned latent variables to create new singly-coupled quartets, substantially expanding the class of structures that we can learn. [sent-18, score-0.408]
8 Importantly, even if we cannot discover every latent variable, our algorithm guarantees the correctness of any latent variable that was discovered. [sent-19, score-0.54]
9 We show in Section 4 that our algorithm can learn nearly all of the structure of the QMR-DT network for medical diagnosis (i. [sent-21, score-0.258]
10 First, we introduce a quartet test to determine whether a set of binary variables is singly-coupled. [sent-25, score-0.735]
11 When singly-coupled variables are found, we use previous results in mixture model learning to identify the coupling latent variable. [sent-26, score-0.406]
12 Second, we develop a conditional point-wise mutual information test to learn parameters of other children of identified latent variables. [sent-27, score-0.539]
13 For this network, the order (X, Y, Z) satisfies the definition: {a, b, c, d} is singly coupled by X, {c, e, f, g} is singly coupled by Y given X, and {d, g, h, i} is singly coupled by Z given X, Y. [sent-30, score-1.275]
14 Martin & VanLehn (1995) study structure learning for noisy-or Bayesian networks, observing that any two observed variables that share a hidden parent must be correlated. [sent-52, score-0.334]
15 Their algorithm incrementally constructs the network, in each step adding a new observed variable, introducing edges from the existing latent variables to the observed variable, and then seeing if new latent variables should be created. [sent-55, score-0.754]
16 This approach requires strong assumptions, such as identical priors for the hidden variables and all incoming edges for an observed variable having the same failure probabilities. [sent-56, score-0.321]
17 (2006) study structure learning in linear models with continuous latent variables, giving an algorithm for discovering disjoint subsets of observed variables that have a single hidden variable as their parent. [sent-58, score-0.542]
18 Recent work has used tensor methods and sparse recovery to learn linear latent variable models with graph expansion (Anandkumar et al. [sent-59, score-0.369]
19 In Halpern & Sontag (2013) the parameters of singly-coupled variables in bipartite networks of known structure are learned using mixture model learning. [sent-74, score-0.375]
20 Quartet tests have been previously used for learning latent tree models (Anandkumar et al. [sent-75, score-0.286]
21 We consider bipartite noisy-or Bayesian networks (G, Θ) with n binary latent variables U, which we denote with capital letters (e. [sent-80, score-0.463]
22 The edges in the model are directed from the latent variables to the observed variables, as shown in Fig. [sent-85, score-0.389]
23 These parameters consist of prior probabilities on the latent variables, pX for X ∈ U; failure probabilities between latent and observed variables, fX (a vector of size m); and noise or leak probabilities ν = {ν1, …, νm}. [sent-89, score-0.804]
24 An equivalent formulation includes the noise in the model by introducing a single ‘noise’ latent variable, X0, which is present with probability p0 = 1 and has failure probabilities f0 = 1 − ν. [sent-93, score-0.372]
25 The Bayesian network only has an edge between latent variable X and observed variable a if fX,a < 1. [sent-94, score-0.462]
26 The generative process for the model is then: • The states of the latent variables are drawn independently: X ∼ Bernoulli(pX) for X ∈ U. [sent-95, score-0.323]
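The extraction omits the second step of the generative process (sampling the observed variables given the latent states). The sketch below fills it in with the standard noisy-or rule, P(a off | U) = (1 − νa) · ∏ over active parents X of fX,a, consistent with the notation above; the function name, the array layout, and the convention that an absent edge has failure probability 1 are assumptions for illustration, not the authors' code.

import numpy as np

def sample_noisy_or(priors, fail, leak, rng):
    """priors: (n,) latent priors pX; fail: (n, m) failure probabilities fX,a
    (fX,a = 1 means no edge); leak: (m,) leak probabilities nu_a."""
    n, m = fail.shape
    # Latent states are drawn independently: X ~ Bernoulli(pX).
    U = rng.random(n) < priors
    # An observed variable stays off only if the leak and every active parent "fail".
    p_off = (1.0 - leak) * np.prod(np.where(U[:, None], fail, 1.0), axis=0)
    O = rng.random(m) >= p_off
    return U.astype(int), O.astype(int)

# Toy example: 2 latent parents, 4 observed children.
rng = np.random.default_rng(0)
priors = np.array([0.3, 0.4])
fail = np.array([[0.2, 0.2, 1.0, 1.0],
                 [1.0, 1.0, 0.1, 0.1]])
leak = np.full(4, 0.05)
U, O = sample_noisy_or(priors, fail, leak, rng)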
27 We denote by NG the set of negative moments of the observed variables under (G, Θ). [sent-108, score-0.272]
28 We say that a set O of observed variables is singly-coupled by a parent X if X is a parent of every member of O and there is no other parent Y that is shared by at least two members of O. [sent-111, score-0.403]
29 A singly coupled set of observations is a binary mixture model, which gives rise to the next result based on a rank-2 tensor decomposition of the joint distribution. [sent-112, score-0.479]
30 Because of the factored form of Equation 1, we can remove the influence of a latent variable from the negative moments. [sent-117, score-0.295]
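The factored form of Equation 1 is not reproduced cleanly in this extraction; reconstructed from the noisy-or semantics above, the negative moment of a set S of observed variables should read

    N_{G,S} = P\Big(\bigwedge_{o \in S} \bar{o}\Big)
            = \prod_{o \in S} (1 - \nu_o)\; \prod_{X \in U} \Big(1 - p_X + p_X \prod_{o \in S} f_{X,o}\Big),

so each latent variable contributes exactly one multiplicative factor, which is what allows its influence to be divided out.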
31 If we know NG,S, the prior of X, and the failure probabilities fX,S, we can obtain the negative moments of S under (G \ {X}, Θ). [sent-120, score-0.279]
32 Equation 3: the negative moments of S under (G \ {X}, Θ) are obtained by dividing NG,S by X's factor, 1 − pX + pX · ∏ over oi in S of fX,oi. Our paper focuses on learning the structure of these bipartite networks, including the number of latent variables. [sent-122, score-0.366]
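A small numerical companion to Equations 1 and 3 as reconstructed above; the function names and array conventions are illustrative and follow the sampler sketched earlier, not the authors' code.

import numpy as np

def negative_moment(S, priors, fail, leak):
    """P(all observed variables in S are off), using the factored noisy-or form (Eq. 1)."""
    nm = np.prod(1.0 - leak[S])                      # leak/noise factor
    for pX, fX in zip(priors, fail):
        nm *= (1.0 - pX) + pX * np.prod(fX[S])       # one factor per latent variable
    return nm

def subtract_latent(nm_S, S, pX, fX):
    """Divide out latent X's factor to get the negative moment with X removed (Eq. 3)."""
    return nm_S / ((1.0 - pX) + pX * np.prod(fX[S]))

# Example with the toy model from the sampler sketch:
# S = np.array([0, 1]); negative_moment(S, priors, fail, leak)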
33 We give two variants of an algorithm based on quartet tests, and prove its correctness in Section 3. [sent-130, score-0.657]
34 Our approach is based on decomposing the structure learning problem into two tasks: (1) identifying the latent variables, and (2) determining to which observed variables they are connected. [sent-131, score-0.41]
35 Finding singly-coupled quartets: Since triplets are not sufficient to identify a latent variable (Figure 1), we propose a new approach based on identifying singly-coupled quartets. [sent-133, score-0.998]
36 Algorithm 1 (STRUCTURE-LEARN) takes observations S and thresholds as input, maintains a set Latent (initialized to {}), and outputs the latent structure; Algorithm 2 (EXTEND) takes a latent variable L with singly-coupled children (a, b, c, d), the currently known latent structure, and a threshold τ, and outputs all the children of L. [sent-135, score-0.794]
37 JOINT gives the joint distribution and ADJUST subtracts off the influence of the latent variables (Eq. 3). [sent-140, score-0.354]
38 PRETEST filters the set of candidate quartets by determining whether every triplet in a quartet has a shared parent, using Lemma 2. [sent-142, score-0.875]
39 4TEST refers to either of the quartet tests described in Section 2. [sent-143, score-0.698]
40 Right: Algorithm to identify all of the children of a latent variable. [sent-149, score-0.462]
41 The first is based on a rank test on a matrix formed from the fourth-order moments, and the second uses the variance of parameters learned from third-order moments. [sent-153, score-0.265]
42 We then present a method that uses the point-wise mutual information of a triplet to identify all the other children of the new latent variable. [sent-154, score-0.539]
43 A noisy-or network is quartet-learnable if there exists an ordering of its latent variables such that each one has a quartet of children which are singly coupled once the previous latent variables are removed from the model. [sent-158, score-1.995]
44 A noisy-or network is strongly quartet-learnable if all of its latent variables have a singly coupled quartet of children. [sent-159, score-1.48]
45 A candidate quartet for the rank test is a quartet where all nodes have at least one common parent. [sent-162, score-1.343]
46 One way to find whether a candidate quartet is singly coupled is by looking directly at the rank of its fourth-order moments matrix. [sent-163, score-1.263]
47 We have three ways to unfold the 2 × 2 × 2 × 2 tensor defined by these moments into a 4 × 4 matrix: we can consider the joint probability matrix of the aggregated variables (a, b) and (c, d), of (a, c) and (b, d), or of (a, d) and (b, c). [sent-164, score-0.256]
48 This will allow us to determine whether a candidate quartet is singly coupled by looking at the third eigenvalues of the three unfoldings of its joint distribution tensor. [sent-175, score-1.113]
49 We can then determine whether a quartet is singly coupled by testing whether the third eigenvalues of all three unfoldings of the joint distribution are below a threshold, τq. [sent-180, score-1.113]
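A minimal sketch of the rank-based quartet test just described: form the three 4 × 4 unfoldings of the 2 × 2 × 2 × 2 joint and threshold their third singular values. The function names, the value of tau_q, and the use of singular values as a numerically robust stand-in for the third eigenvalues are assumptions for illustration, not the authors' implementation.

import numpy as np

def third_singular_values(T):
    """T: 2x2x2x2 joint probability tensor of the quartet (a, b, c, d)."""
    unfoldings = [
        T.reshape(4, 4),                          # rows indexed by (a, b), columns by (c, d)
        T.transpose(0, 2, 1, 3).reshape(4, 4),    # rows (a, c), columns (b, d)
        T.transpose(0, 3, 1, 2).reshape(4, 4),    # rows (a, d), columns (b, c)
    ]
    return [np.linalg.svd(M, compute_uv=False)[2] for M in unfoldings]

def rank_quartet_test(T, tau_q):
    return all(s < tau_q for s in third_singular_values(T))

# Toy check: a two-component mixture of product distributions (one coupling parent)
# has rank-2 unfoldings, so all third singular values are (numerically) zero.
rng = np.random.default_rng(0)
def product_dist(q):                              # q: P(variable = 1) for the four variables
    p = np.stack([1 - q, q])                      # p[v, i] = P(variable i takes value v)
    return np.einsum('i,j,k,l->ijkl', p[:, 0], p[:, 1], p[:, 2], p[:, 3])

T = 0.4 * product_dist(rng.random(4)) + 0.6 * product_dist(rng.random(4))
print(rank_quartet_test(T, tau_q=1e-8))           # True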
50 Let {a, b, c, d} be a quartet of observed variables. [sent-184, score-0.699]
51 To determine whether it is singly coupled, we can also apply Eq. [sent-185, score-0.329]
52 2 to learn the parameters of triplets (a, b, c), (a, b, d), (a, c, d) and (b, c, d) as if they were singly coupled. [sent-186, score-0.481]
53 If the variance of parameter estimates exceeds a threshold we know that the quartet is not singly coupled. [sent-188, score-1.011]
54 Note that agreement between the parameters learned is necessary but not sufficient to determine that (a, b, c, d) are singly coupled. [sent-189, score-0.413]
55 For example, in the case of a fully connected graph with two parents, four children and identical failure probabilities, the third-order moments of any triplet are identical, hence the parameters learned will be the same. [sent-190, score-0.552]
56 Lemma 1, however, states that the moments generated from the estimated parameters can only be equal to the true moments if the quartet is actually singly coupled. [sent-191, score-1.313]
57 If the model is ε-rank-testable and (a, b, c, d) are not singly coupled, then with MR the reconstructed moments and M the true moments, the reconstruction error ||MR − M|| is lower bounded by a quantity of order ε^4. [sent-193, score-0.481]
58 These two properties lead to the following algorithm: First try to learn the parameters as if the quartet were singly coupled. [sent-195, score-1.057]
59 Accept the quartet as singly-coupled if the reconstruction error is below a second threshold. [sent-198, score-0.657]
60 2 Extending Latent Variables Once we have found a singly coupled quartet (a, b, c, d), the second step is to find all other children of the coupling parent A. [sent-200, score-1.388]
61 Let (a, b) be two observed variables that we know only share one parent A, and let x be any other observed variable. [sent-205, score-0.282]
62 Since the only latent variable that has both a and b as children is A, this is equivalent to saying that x is a child of A. [sent-209, score-0.487]
63 Once we find a singly-coupled quartet (a, b, c, d) with common parent A, Lemma 2 allows us to determine whether a new variable x is also a child of A. [sent-222, score-0.791]
64 Algorithm 2 combines these two steps to find the parameters of all the children of A after a singly-coupled quartet has been found. [sent-226, score-0.872]
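The exact CPMI statistic is not reproduced in this extraction, so the sketch below only illustrates the underlying idea: if (a, b) share only the parent A, then conditioning on a candidate x being off shifts the pointwise mutual information between the events "a off" and "b off" exactly when x is also a child of A (parents of only one of a, b cancel in the factored form). The helper names, the tolerance tol, and the toy example are hypothetical, and the paper's actual CPMI test additionally recovers the failure probabilities of the new children.

import numpy as np

def pmi_off(a_col, b_col):
    """Empirical pointwise mutual information of the events {a off} and {b off}."""
    pa, pb = np.mean(a_col == 0), np.mean(b_col == 0)
    pab = np.mean((a_col == 0) & (b_col == 0))
    return np.log(pab / (pa * pb))

def looks_like_sibling(samples, a, b, x, tol=0.05):
    """Flag x as a likely child of a and b's shared parent if conditioning on x being
    off noticeably shifts PMI(a off, b off)."""
    cond = samples[samples[:, x] == 0]
    return abs(pmi_off(cond[:, a], cond[:, b]) - pmi_off(samples[:, a], samples[:, b])) > tol

# Toy check: one parent A with children at columns 0, 1, 2; column 3 is an unrelated variable.
rng = np.random.default_rng(0)
N = 200_000
A = rng.random(N) < 0.4
def child(f):   # on if A is on and does not "fail", or through a 5% leak
    return ((A & (rng.random(N) > f)) | (rng.random(N) < 0.05)).astype(int)
samples = np.column_stack([child(0.2), child(0.3), child(0.25),
                           (rng.random(N) < 0.3).astype(int)])
print(looks_like_sibling(samples, 0, 1, 2), looks_like_sibling(samples, 0, 1, 3))  # True False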
65 When the structure of the network is known, singly-coupled triplets are sufficient for identifiability without resorting to the quartet tests in Section 2. [sent-228, score-0.93]
66 That setting was previously studied in Halpern & Sontag (2013), which required every edge to be part of a singly coupled triplet or pair for its parameters to be learnable (possibly after subtracting off latent variables). [sent-230, score-0.821]
67 Our new CPMI technique improves this result by enabling us to learn all failure probabilities for a latent variable’s children even if the variable has only one singly coupled triplet. [sent-231, score-1.087]
68 All priors are in [pmin, 1/2], all failure probabilities are in [fmin, fmax], and the marginal probability of an observed variable x being off is lower bounded by nmin: P(x̄) ≥ nmin. [sent-234, score-0.314]
69 If a network with m observed variables is strongly quartet-learnable and ζ-rank-testable, then its structure can be learned in polynomial time with probability (1 − δ), using a number of samples that scales as the eighth power of the inverse of the model's separation parameters times ln(2m/δ). [sent-237, score-0.369]
70 Next, we show how to extend the analysis to quartetlearnable networks as defined in Section 2 by subtracting off latent variables that we have previously learned. [sent-241, score-0.483]
71 If some of the removed latent variables were coupling for an otherwise singly coupled quartet, we then discover new latent variables, and repeat the operation. [sent-242, score-1.023]
72 If a network is quartetlearnable, we can find all of the latent variables in a finite number of subtracting off steps, which we call the depth of the network (thus, a strongly quartet-learnable network has depth 0). [sent-243, score-0.699]
73 If the additive error on the estimated negative moments of an observed quartet C and on the parameters for W latent variables X1, …, XW. [sent-246, score-1.197]
74 We define the width of the network to be the maximum number of parents that need to be subtracted off to be able to learn the parameters for a new singly-coupled quartet (this is typically a small constant). [sent-250, score-0.918]
75 If a network with m observed variables is quartet-learnable at depth d, is ζ-rank-testable, and has width W, then its structure can be learned with probability (1 − δ) from NS samples, where NS is the bound for the strongly quartet-learnable case multiplied by a factor raised to the power 2d whose base depends on the width W. [sent-252, score-0.352]
76 The first factor of this bound, whose constants involve nmin, pmin, fmin, and fmax, has to do with the error introduced in the estimate of the parameters each time we do a subtracting off step, which by definition occurs at most d times, hence the exponent. [sent-253, score-0.336]
77 We notice that the bounds do not depend directly on the number of latent variables, indicating that we can learn networks with many latent variables, as long as the number of subtraction steps is small. [sent-254, score-0.665]
78 Here we assume that the quartet tests are perfect (i. [sent-261, score-0.698]
79 These two diseases are discussed in Halpern & Sontag (2013) and share all of their children except for one symptom each, resulting in a situation where no singly-coupled triplets can be found. [sent-266, score-0.374]
80 The additional two diseases that cannot be learned share all but two children with each other. [sent-267, score-0.354]
81 Thus, for these two latent variables, singly-coupled triplets exist but singly-coupled quartets do not. [sent-268, score-0.498]
82 The Bayesian network consists of 8 latent variables and 64 observed variables, arranged in an 8x8 grid of pixels. [sent-271, score-0.44]
83 Each of the latent variables connects to a subset of the observed pixels (see Figure 3). [sent-272, score-0.365]
84 We generate samples from the network and use them to test the ability of our algorithm to discover the latent variables and network structure from the samples. [sent-277, score-0.518]
85 The network is quartet learnable, but the first and last of the ground truth sources shown in Figure 3 can only be learned after a subtraction step. [sent-278, score-0.898]
86 Latent variables (i.e., diseases) are discovered and parameters learned in the aQMR-DT network for medical diagnosis (Shwe et al. [sent-283, score-0.249]
87 For our algorithm, we use the rank-based quartet test, which has the advantage of requiring only one threshold, τq, compared to the two needed by the coherence test. [sent-289, score-0.685]
88 In our algorithm, the thresholds determine the number of discovered latent variables (sources). [sent-290, score-0.323]
89 For each quartet in sorted order we check if it overlaps with a latent variable previously learned in this round. [sent-296, score-1.013]
90 If it does not, we create a new latent variable and use the EXTEND step to find all of its children. [sent-297, score-0.295]
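A toy sketch of this greedy pass over candidate quartets; the made-up scores, letter-valued variable names, and the convention of sorting ascending by the rank test's third-eigenvalue statistic are assumptions for illustration.

scored = [(0.001, ("a", "b", "c", "d")), (0.002, ("a", "b", "c", "e")),
          (0.004, ("f", "g", "h", "i"))]     # (test statistic, quartet), smaller = more singly-coupled
claimed = set()                              # children already assigned to a latent in this round
new_latents = []
for score, quartet in sorted(scored):        # most singly-coupled-looking quartets first
    if claimed.isdisjoint(quartet):
        new_latents.append(quartet)          # EXTEND would then add (and claim) further children
        claimed.update(quartet)
print(new_latents)                           # [('a', 'b', 'c', 'd'), ('f', 'g', 'h', 'i')]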
91 The running time of the quartet algorithm is under 6 minutes for 10,000 samples using a parallel implementation with 16 cores. [sent-304, score-0.657]
92 The variational run-time scales linearly with sample size while the quartet algorithm is independent of sample size once the quartet marginals are computed. [sent-306, score-1.352]
93 Figure 3 (rows: Variational EM, Quartet Structure Learning at d=0 and d=1, and the ground truth sources; columns: sample sizes 100, 500, 1000, 2000, 10000, 10000*): A comparison between the variational algorithm of Šingliar & Hauskrecht (2006) and the quartet algorithm as the number of samples increases. [sent-307, score-0.737]
94 The true network structure is shown on the right, with one image for each of the eight latent variables (sources). [sent-308, score-0.443]
95 For each edge from a latent variable to an observed variable, the corresponding pixel intensity specifies 1 − fX,a (black means no edge). [sent-309, score-0.337]
96 The results of the quartet algorithm are divided by depth. [sent-310, score-0.657]
97 Column d=0 shows the sources learned without any subtraction and d=1 shows the sources learned after a single subtraction step. [sent-311, score-0.332]
98 The sample size of 10,000* refers to 10,000 samples using an optimized value for the threshold of the rank-based quartet test (τq = 0. [sent-313, score-0.682]
99 5 Conclusion We presented a novel algorithm for learning the structure and parameters of bipartite noisy-or Bayesian networks where the top layer consists completely of latent variables. [sent-315, score-0.453]
100 Structural Expectation Propagation: Bayesian structure learning for networks with latent variables. [sent-383, score-0.354]
wordName wordTfidf (topN-words)
[('quartet', 0.657), ('singly', 0.329), ('latent', 0.245), ('children', 0.192), ('quartets', 0.172), ('pu', 0.157), ('moments', 0.152), ('halpern', 0.124), ('cpmi', 0.11), ('coupled', 0.096), ('hauskrecht', 0.094), ('singliar', 0.094), ('pa', 0.088), ('parents', 0.086), ('parent', 0.084), ('anandkumar', 0.084), ('triplets', 0.081), ('variables', 0.078), ('failure', 0.078), ('sontag', 0.076), ('bipartite', 0.076), ('network', 0.075), ('px', 0.073), ('fmax', 0.069), ('diseases', 0.065), ('networks', 0.064), ('subtraction', 0.063), ('learned', 0.061), ('diagnosis', 0.057), ('elidan', 0.055), ('nmin', 0.055), ('depth', 0.051), ('variable', 0.05), ('probabilities', 0.049), ('hidden', 0.049), ('subtracting', 0.049), ('learn', 0.048), ('cooper', 0.048), ('fmin', 0.047), ('lazarsfeld', 0.047), ('pmi', 0.047), ('pretest', 0.047), ('quartetlearnable', 0.047), ('shwe', 0.047), ('tarsi', 0.047), ('triplet', 0.046), ('structure', 0.045), ('sources', 0.042), ('observed', 0.042), ('vanlehn', 0.041), ('tests', 0.041), ('unfolding', 0.039), ('variational', 0.038), ('sham', 0.038), ('bayesian', 0.036), ('share', 0.036), ('friedman', 0.035), ('pearl', 0.034), ('polynomial', 0.034), ('medical', 0.033), ('discovering', 0.033), ('learnable', 0.033), ('anima', 0.031), ('elsner', 0.031), ('jernite', 0.031), ('lazic', 0.031), ('roch', 0.031), ('saund', 0.031), ('singlycoupled', 0.031), ('subtracts', 0.031), ('unfoldings', 0.031), ('yoni', 0.031), ('ax', 0.031), ('mutual', 0.031), ('coupling', 0.03), ('roots', 0.029), ('hsu', 0.029), ('rank', 0.029), ('subtracted', 0.029), ('mixture', 0.028), ('kakade', 0.028), ('coherence', 0.028), ('gal', 0.028), ('ishteva', 0.028), ('mossel', 0.028), ('tensor', 0.026), ('eriksson', 0.025), ('animashree', 0.025), ('threshold', 0.025), ('identify', 0.025), ('unsupervised', 0.024), ('leak', 0.024), ('pmin', 0.024), ('subtract', 0.024), ('symptoms', 0.024), ('causal', 0.024), ('edges', 0.024), ('parameters', 0.023), ('em', 0.023), ('qs', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
Author: Yacine Jernite, Yonatan Halpern, David Sontag
Abstract: We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable. For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. We give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM. 1
2 0.13161258 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
3 0.11844852 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
Author: Tzu-Kuo Huang, Jeff Schneider
Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such Alzheimer’s, or certain biological processes. Existing methods for learning dynamic model from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover firstorder Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings. 1
Author: Anima Anandkumar, Daniel Hsu, Majid Janzamin, Sham M. Kakade
Abstract: Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of “higher order” expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition. Keywords: Overcomplete representation, admixture models, generic identifiability, tensor decomposition.
5 0.075308599 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
6 0.074158154 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
7 0.073947378 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
8 0.069840945 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
9 0.064859249 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
10 0.059769623 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
11 0.059551325 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
12 0.058117397 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
13 0.05463269 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
14 0.052884903 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
15 0.05142336 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
16 0.051289935 36 nips-2013-Annealing between distributions by averaging moments
17 0.048354957 9 nips-2013-A Kernel Test for Three-Variable Interactions
18 0.048328638 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
19 0.048216432 251 nips-2013-Predicting Parameters in Deep Learning
20 0.046731401 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
topicId topicWeight
[(0, 0.138), (1, 0.069), (2, -0.032), (3, 0.039), (4, 0.023), (5, -0.037), (6, 0.084), (7, -0.026), (8, 0.076), (9, -0.054), (10, 0.036), (11, -0.034), (12, 0.009), (13, -0.008), (14, 0.001), (15, 0.016), (16, 0.037), (17, 0.005), (18, -0.013), (19, -0.069), (20, 0.032), (21, 0.03), (22, 0.077), (23, 0.018), (24, 0.072), (25, -0.08), (26, -0.029), (27, 0.052), (28, -0.021), (29, -0.035), (30, 0.011), (31, -0.061), (32, 0.085), (33, -0.023), (34, 0.018), (35, 0.022), (36, -0.061), (37, 0.014), (38, 0.061), (39, 0.032), (40, -0.044), (41, 0.042), (42, -0.055), (43, -0.132), (44, 0.024), (45, -0.022), (46, 0.021), (47, -0.017), (48, -0.1), (49, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.93328649 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
Author: Yacine Jernite, Yonatan Halpern, David Sontag
Abstract: We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable. For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. We give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM. 1
2 0.63537675 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
3 0.61141378 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
4 0.58894438 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
5 0.56824106 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
6 0.5578748 294 nips-2013-Similarity Component Analysis
7 0.55215079 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
9 0.52110195 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
10 0.51576632 134 nips-2013-Graphical Models for Inference with Missing Data
11 0.51522064 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
12 0.4985427 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
13 0.49672467 70 nips-2013-Contrastive Learning Using Spectral Methods
14 0.48444816 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
15 0.47883177 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
16 0.47733578 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
17 0.47450376 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
18 0.46431318 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
19 0.45305121 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
20 0.45065922 332 nips-2013-Tracking Time-varying Graphical Structure
topicId topicWeight
[(16, 0.031), (19, 0.015), (33, 0.113), (34, 0.117), (41, 0.043), (49, 0.031), (56, 0.155), (68, 0.275), (70, 0.052), (85, 0.027), (89, 0.023), (93, 0.017), (95, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.76136243 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
Author: Yacine Jernite, Yonatan Halpern, David Sontag
Abstract: We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable. For each latent variable, the existence of a singly-coupled quartet allows us to uniquely identify and learn all parameters involving that latent variable. We give a proof of the polynomial sample complexity of our learning algorithm, and experimentally compare it to variational EM. 1
2 0.74155331 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
Author: Raman Arora, Andy Cotter, Nati Srebro
Abstract: We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as “Matrix Stochastic Gradient” (MSG), as well as a practical variant, Capped MSG. We study the method both theoretically and empirically. 1
3 0.64902031 249 nips-2013-Polar Operators for Structured Sparse Estimation
Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans
Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1
4 0.64143103 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
5 0.6390872 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
Author: Adhiraj Somani, Nan Ye, David Hsu, Wee Sun Lee
Abstract: POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the “curse of dimensionality” and the “curse of history”. This paper presents an online POMDP algorithm that alleviates these difficulties by focusing the search on a set of randomly sampled scenarios. A Determinized Sparse Partially Observable Tree (DESPOT) compactly captures the execution of all policies on these scenarios. Our Regularized DESPOT (R-DESPOT) algorithm searches the DESPOT for a policy, while optimally balancing the size of the policy and its estimated value obtained under the sampled scenarios. We give an output-sensitive performance bound for all policies derived from a DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime algorithm that approximates R-DESPOT. Experiments show strong results, compared with two of the fastest online POMDP algorithms. Source code along with experimental settings are available at http://bigbird.comp. nus.edu.sg/pmwiki/farm/appl/. 1
6 0.63673538 63 nips-2013-Cluster Trees on Manifolds
7 0.63637662 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
8 0.63563496 55 nips-2013-Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
9 0.63509053 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
10 0.6343444 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
11 0.63425356 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning
12 0.63405943 224 nips-2013-On the Sample Complexity of Subspace Learning
13 0.63299727 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
14 0.63286185 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
15 0.63271046 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
16 0.63261926 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
17 0.63224691 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
18 0.63145852 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
19 0.63056487 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
20 0.6305632 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models