nips nips2013 nips2013-180 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ryosuke Matsushita, Toshiyuki Tanaka
Abstract: We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. Numerical experiments show that the proposed algorithm outperforms Lloyd’s K-means algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Low-rank matrix reconstruction and clustering via approximate message passing Ryosuke Matsushita NTT DATA Mathematical Systems Inc. [sent-1, score-0.492]
2 We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. [sent-8, score-0.352]
3 We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. [sent-9, score-0.382]
4 1 Introduction. Low-rankedness of matrices has frequently been exploited when one reconstructs a matrix from its noisy observations. [sent-11, score-0.087]
5 In this paper, we consider the case where a matrix A0 ∈ Rm×N to be reconstructed is factored as A0 = U0 V0⊤ , U0 ∈ Rm×r , V0 ∈ RN ×r (r ≪ m, N ), and where one knows structural properties of the factors U0 and V0 a priori. [sent-13, score-0.103]
6 Since the properties of the factors to be exploited vary according to the problem, it is desirable that a reconstruction method has enough flexibility to incorporate a wide variety of properties. [sent-15, score-0.087]
7 The Bayesian approach achieves such flexibility by allowing us to select prior distributions of U0 and V0 reflecting a priori knowledge on the structural properties. [sent-16, score-0.081]
8 Monte Carlo sampling methods and variational Bayes methods have been proposed for low-rank matrix reconstruction to meet this requirement [3–5]. [sent-18, score-0.213]
9 We present in this paper an approximate message passing (AMP) based algorithm for Bayesian lowrank matrix reconstruction. [sent-19, score-0.303]
10 Developed in the context of compressed sensing, the AMP algorithm reconstructs sparse vectors from their linear measurements with low computational cost, and achieves a certain theoretical limit [6]. [sent-20, score-0.131]
11 The IterFac algorithm for the rank-one case [8] has been derived as an AMP algorithm. [sent-23, score-0.071]
12 An AMP algorithm for the general-rank case is proposed in [9], which, however, can only treat estimation of posterior means. [sent-24, score-0.105]
13 As the second contribution, we apply the derived AMP algorithm to K-means type clustering to obtain a novel efficient clustering algorithm. [sent-27, score-0.365]
14 It is based on the observation that our formulation of the low-rank matrix reconstruction problem includes the clustering problem as a special case. [sent-28, score-0.289]
15 We present results of numerical experiments, which show that the proposed algorithm outperforms Lloyd’s K-means algorithm [12] when data are high-dimensional. [sent-30, score-0.09]
16 Recently, AMP algorithms for dictionary learning and blind calibration [13] and for matrix reconstruction with a generalized observation model [14] were proposed. [sent-31, score-0.191]
17 1 Problem setting: low-rank matrix reconstruction. We consider the following problem setting. [sent-36, score-0.142]
18 We restrict p_U and p_V to distributions of the form p_U(U_0) = ∏_i p̂_u(u_{0,i}) and p_V(V_0) = ∏_j p̂_v(v_{0,j}), respectively, which allows us to construct computationally efficient algorithms. [sent-57, score-0.596]
19 The p.d.f.s p̂_u and p̂_v can be improper, that is, they can integrate to infinity, as long as the posterior p.d.f. is proper. [sent-62, score-0.623]
20 The first problem, which we call the marginalization problem, is to calculate the marginal posterior distributions given A: p_{i,j}(u_i, v_j | A) := ∫ p(U, V | A) ∏_{k≠i} du_k ∏_{l≠j} dv_l. (3) [sent-69, score-0.461]
21 These are used to calculate the posterior mean E[U V^⊤ | A] and the marginal MAP estimates u_i^MMAP := arg max_u ∫ p_{i,j}(u, v | A) dv and v_j^MMAP := arg max_v ∫ p_{i,j}(u, v | A) du. [sent-70, score-0.519]
22 Because calculation of p_{i,j}(u_i, v_j | A) typically involves high-dimensional integrations requiring high computational cost, approximation methods are needed. [sent-71, score-0.309]
23 The second problem, which we call the MAP problem, is to calculate the MAP estimate arg max_{U,V} p(U, V | A). [sent-72, score-0.091]
24 It is formulated as the following optimization problem: min_{U,V} C^MAP(U, V), (4) where C^MAP(U, V) is the negative logarithm of (2): C^MAP(U, V) := (1/(2τ)) ∥A − U V^⊤∥_F^2 − ∑_{i=1}^m log p̂_u(u_i) − ∑_{j=1}^N log p̂_v(v_j). [sent-73, score-0.563]
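As a rough illustration of this objective, the following Python sketch evaluates C^MAP(U, V) under a Gaussian observation model; the 1/(2τ) weighting of the squared-error term and the names map_objective, log_pu, and log_pv are assumptions introduced here, not notation from the paper, and the log-priors are supplied as callables acting on single rows of U and V.

import numpy as np

def map_objective(A, U, V, tau, log_pu, log_pv):
    # Sketch of C_MAP(U, V): squared-error data term plus negative log-priors.
    # A: (m, N) observed matrix; U: (m, r); V: (N, r).
    # The 1/(2*tau) weighting is an assumption about the observation model.
    fit = np.linalg.norm(A - U @ V.T, "fro") ** 2 / (2.0 * tau)
    log_prior_u = sum(log_pu(u_i) for u_i in U)   # rows u_i of U
    log_prior_v = sum(log_pv(v_j) for v_j in V)   # rows v_j of V
    return fit - log_prior_u - log_prior_v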
25 2 Clustering as low-rank matrix reconstruction. A clustering problem can be formulated as a problem of low-rank matrix reconstruction [11]. [sent-76, score-0.289]
26 , N , where el ∈ {0, 1}r is the vector whose lth component is 1 and the others are 0. [sent-83, score-0.149]
27 When V_0 and U_0 are fixed, ã_j follows one of the r Gaussian distributions N(ũ_{0,l}, mτ I), l = 1, …, r. [sent-84, score-0.142]
28 We regard each Gaussian distribution as defining a cluster, ũ_{0,l} being the center of cluster l and v_{0,j} representing the cluster assignment of the datum ã_j. [sent-88, score-0.403]
29 One can then perform clustering on the dataset {ã_1, …, ã_N}. [sent-89, score-0.147]
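To make this mapping concrete, the following sketch generates a synthetic dataset from the model just described: one-hot assignments v_{0,j} and data whose columns follow N(ũ_{0,l}, mτ I). Drawing the centers from a standard normal and the exact scaling convention behind the tilde are illustrative assumptions, not the paper's setup.

import numpy as np

def generate_clustering_data(m, N, r, tau, seed=0):
    # Sketch: centers U0 (m x r), one-hot V0 (N x r), and data columns
    # a~_j ~ N(u~_{0,l}, m*tau*I) when v_{0,j} = e_l.
    rng = np.random.default_rng(seed)
    U0 = rng.normal(size=(m, r))              # columns u~_{0,l}: cluster centers (assumed standard normal)
    labels = rng.integers(r, size=N)          # true cluster index of each datum
    V0 = np.eye(r)[labels]                    # one-hot rows v_{0,j}
    noise = rng.normal(scale=np.sqrt(m * tau), size=(m, N))
    A_tilde = U0 @ V0.T + noise               # columns a~_j to be clustered
    return A_tilde, U0, V0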
30 Let us consider maximum likelihood estimation arg max_{U,V} p(A | U, V), or equivalently, MAP estimation with the (improper) uniform prior distributions p̂_u(u) = 1 and p̂_v(v) = r̂^{-1} ∑_{l=1}^{r̂} δ(v − e_l). [sent-99, score-0.652]
31 The corresponding MAP problem is min_{U ∈ R^{m×r̂}, V ∈ {0,1}^{N×r̂}} ∥A − U V^⊤∥_F^2 subject to v_j ∈ {e_1, …, e_{r̂}}, j = 1, …, N. (6) [sent-100, score-0.236]
32 When V satisfies the constraints, the objective function ∥A − U V^⊤∥_F^2 = ∑_{j=1}^N ∑_{l=1}^{r̂} ∥ã_j − ũ_l∥_2^2 I(v_j = e_l) is the sum of squared distances, each of which is between a datum and the center of the cluster that the datum is assigned to. [sent-104, score-0.472]
33 The optimization problem (6), its objective function, and clustering based on it are called in this paper the K-means problem, the K-means loss function, and the K-means clustering, respectively. [sent-105, score-0.172]
34 If U_0 and V_0 follow p_U and p_V, respectively, the marginal MAP estimation is optimal in the sense that it maximizes the expected accuracy with respect to p(V_0 | A). [sent-107, score-0.089]
35 Here, accuracy is defined as the fraction of correctly assigned data among all data. [sent-108, score-0.096]
36 We call the clustering using approximate marginal MAP estimation the maximum accuracy clustering, even when incorrect prior distributions are used. [sent-109, score-0.269]
37 A popular deterministic method is to use the variational Bayesian formalism. [sent-111, score-0.071]
38 The variational Bayes matrix factorization [4, 5] approximates the posterior distribution p(U, V | A) as the product of two functions p_U^VB(U) and p_V^VB(V), which are determined so that the Kullback-Leibler (KL) divergence from p_U^VB(U) p_V^VB(V) to p(U, V | A) is minimized. [sent-112, score-0.256]
39 Applying the variational Bayes matrix factorization to the MAP problem, one obtains the iterated conditional modes (ICM) algorithm, which alternates minimization of C^MAP(U, V) over U for fixed V and minimization over V for fixed U. [sent-114, score-0.277]
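As a small, hedged illustration of this alternating scheme, the sketch below performs one ICM sweep for C^MAP in the special case of zero-mean Gaussian priors with variance sigma2 on the rows of U and V, where each minimization has a closed ridge-regression-like form; the general algorithm admits arbitrary priors, and the 1/(2τ) data-term weighting is the same assumption as above.

import numpy as np

def icm_sweep(A, U, V, tau, sigma2):
    # One ICM sweep for C_MAP under zero-mean Gaussian priors (illustrative special case).
    # Minimizing over U with V fixed (and vice versa) is a ridge-regression update.
    r = U.shape[1]
    ridge = (tau / sigma2) * np.eye(r)
    U = A @ V @ np.linalg.inv(V.T @ V + ridge)    # minimize C_MAP over U, V fixed
    V = A.T @ U @ np.linalg.inv(U.T @ U + ridge)  # minimize C_MAP over V, U fixed
    return U, V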
40 The representative algorithm for approximately solving the K-means problem is Lloyd’s K-means algorithm [12]. [sent-115, score-0.09]
41 Lloyd’s K-means algorithm can be regarded as an ICM algorithm: it iteratively alternates minimization of the K-means loss function over U for fixed V and minimization over V for fixed U. [sent-116, score-0.151]
42 n_l^t = ∑_{j=1}^N I(v_j^t = e_l), ũ_l^t = (1/n_l^t) ∑_{j=1}^N ã_j I(v_j^t = e_l), (7a) [sent-118, score-0.292]
43 l_j^{t+1} = arg min_{l ∈ {1, …, r̂}} ∥ã_j − ũ_l^t∥_2^2, v_j^{t+1} = e_{l_j^{t+1}}. (7b) [sent-121, score-0.554]
44 Throughout this paper, we represent an algorithm by a set of equations as in the above. [sent-122, score-0.072]
45 This representation means that the algorithm begins with a set of initial values and repeats the update of the variables using the equations presented until it satisfies some stopping criteria. [sent-123, score-0.127]
46 Lloyd’s K-means algorithm begins with a set of initial assignments V^0 whose rows v_j^0 lie in {e_1, …, e_{r̂}}. [sent-124, score-0.133]
47 This algorithm easily gets stuck in local minima, and its performance depends heavily on the initial values. [sent-128, score-0.118]
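A minimal sketch of one sweep of updates (7a)-(7b) follows; the names A_tilde (data columns) and labels (current assignments) are placeholders introduced here, and empty clusters are simply left with their previous zero-initialized centers rather than handled in any paper-specific way.

import numpy as np

def lloyd_step(A_tilde, labels, r):
    # One iteration of Lloyd's K-means, i.e. updates (7a)-(7b).
    # A_tilde: (m, N) data matrix, one datum per column; labels: (N,) current cluster indices.
    m, N = A_tilde.shape
    centers = np.zeros((m, r))
    for l in range(r):
        members = labels == l
        if members.sum() > 0:                                   # n_l^t in (7a)
            centers[:, l] = A_tilde[:, members].mean(axis=1)    # center u~_l^t
    # (7b): reassign each datum to its nearest center
    dists = ((A_tilde[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)  # (N, r)
    new_labels = dists.argmin(axis=1)
    return centers, new_labels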
48 Maximum accuracy clustering can be solved approximately by using the variational Bayes matrix factorization, since it gives an approximation to the marginal posterior distribution of vj given A. [sent-130, score-0.658]
49 1 Proposed algorithm: approximate message passing algorithm for low-rank matrix reconstruction. We first discuss the general idea of the AMP algorithm and its advantages compared with the variational Bayes matrix factorization. [sent-132, score-0.651]
50 The AMP algorithm is derived by approximating the belief propagation message passing algorithm in a way thought to be asymptotically exact for large-scale problems with appropriate randomness. [sent-133, score-0.387]
51 Fixed points of the belief propagation message passing algorithm correspond to local minima of the KL divergence between a kind of trial function and the posterior distribution [17]. [sent-134, score-0.4]
52 Therefore, the belief propagation message passing algorithm can be regarded as an iterative algorithm based on an approximation of the posterior distribution, which is called the Bethe approximation. [sent-135, score-0.421]
53 One can therefore intuitively expect that the performance of the AMP algorithm is better than that of the variational Bayes matrix factorization, which treats U and V as if they were independent in p(U, V | A). [sent-137, score-0.171]
54 An important property of the AMP algorithm, aside from its efficiency and effectiveness, is that one can predict the performance of the algorithm accurately for large-scale problems by using a set of equations called the state evolution [6]. [sent-138, score-0.079]
55 Although we can present the state evolution for the algorithm proposed in this paper and give a proof of its validity as in [8, 18], we do not discuss it here due to the limited space available. [sent-140, score-0.113]
56 An algorithm for the marginalization problem on p(U, V | A; β) is particularized to the algorithms for the marginalization problem and for the MAP problem for the original posterior distribution p(U, V | A) by letting β = 1 and β → ∞, respectively. [sent-145, score-0.241]
57 The AMP algorithm for the marginalization problem on p(U, V | A; β) is derived in a way similar to that described in [9], as detailed in the Supplementary Material. [sent-146, score-0.127]
58 The algorithm requires an initial value V^0 and begins with T_j^0 = O. [sent-171, score-0.1]
59 The marginal posterior distribution is then approximated as p_{i,j}(u_i, v_j | A) ≈ q̂(u_i; b_{u,i}^∞, Λ_u^∞, p̂_u) q̂(v_j; b_{v,j}^∞, Λ_v^∞, p̂_v). (12) [sent-184, score-0.9]
60 Since u_i^∞ and v_j^∞ are the means of q̂(u; b_{u,i}^∞, Λ_u^∞, p̂_u) and q̂(v; b_{v,j}^∞, Λ_v^∞, p̂_v), respectively, the posterior mean E[U V^⊤ | A] = ∫ U V^⊤ p(U, V | A) dU dV is approximated as E[U V^⊤ | A] ≈ U^∞ (V^∞)^⊤. (13) [sent-185, score-0.859]
61 The marginal MAP estimates u_i^MMAP and v_j^MMAP are approximated as u_i^MMAP ≈ arg max_u q̂(u; b_{u,i}^∞, Λ_u^∞, p̂_u) and v_j^MMAP ≈ arg max_v q̂(v; b_{v,j}^∞, Λ_v^∞, p̂_v). (14) [sent-186, score-0.855]
62 Taking the limit β → ∞ in Algorithm 2 yields an algorithm for the MAP problem (4). [sent-187, score-0.401]
63 The computational cost per iteration is O(mN ), which is linear in the number of components of the matrix A. [sent-195, score-0.078]
64 Second, Algorithm 2 has a form similar to that of an algorithm based on the variational Bayesian matrix factorization. [sent-201, score-0.171]
65 In fact, if the last terms on the right-hand sides of the four equations in (9a) and (9c) are removed, the resulting algorithm is the same as an algorithm based on the variational Bayesian matrix factorization proposed in [4] and, in particular, the same as the ICM algorithm when β → ∞. [sent-202, score-0.358]
66 (Note, however, that [4] only treats the case where the priors p̂_u and p̂_v are multivariate Gaussian distributions.) [sent-203, score-0.563]
67 It has two implications: (i) Execution of the ICM algorithm initialized with the converged values of the AMP algorithm does not improve C^MAP(U^t, V^t). [sent-221, score-0.09]
68 The second implication may help the AMP algorithm avoid getting stuck in bad local minima. [sent-223, score-0.07]
69 3 Clustering via AMP algorithm. One can use the AMP algorithm for the MAP problem to perform the K-means clustering by letting p̂_u(u) = 1 and p̂_v(v) = r̂^{-1} ∑_{l=1}^{r̂} δ(v − e_l). [sent-225, score-0.941]
70 Noting that f_∞(b, Λ; p̂_v) is piecewise constant with respect to b and hence G_∞(b, Λ; p̂_v) is O almost everywhere, we obtain the following algorithm: Algorithm 3 (AMP algorithm for the K-means clustering). [sent-226, score-0.617]
71 Algorithm 3 is rewritten as follows: n_l^t = ∑_{j=1}^N I(v_j^t = e_l), ũ_l^t = (1/n_l^t) ∑_{j=1}^N ã_j I(v_j^t = e_l), (18a) l_j^{t+1} = arg min_{l ∈ {1, …, r̂}} [ ∥ã_j − ũ_l^t∥_2^2 + (2m/n_l^t) I(v_j^t = e_l) − m/n_l^t ], v_j^{t+1} = e_{l_j^{t+1}}. (18b) [sent-235, score-0.779]
72 The parameter τ appearing in the algorithm does not exist in the K-means clustering problem. [sent-239, score-0.192]
73 While the AMP algorithm for the K-means clustering updates the value of U in the same way as Lloyd’s K-means algorithm, it performs assignments of data to clusters in a different way. [sent-242, score-0.252]
74 In the AMP algorithm, in addition to the distances from data to cluster centers, the current assignment is taken into consideration in two ways: (i) a datum is less likely to be assigned to the cluster that it is currently assigned to. [sent-243, score-0.334]
75 (ii) Data are more likely to be assigned to a cluster whose size at present is smaller. [sent-244, score-0.127]
76 The former can intuitively be understood by observing that if v_j^t = e_l, one should take account of the fact that the cluster center ũ_l^t is biased toward ã_j. [sent-245, score-0.569]
77 The term 2m(n_l^t)^{-1} I(v_j^t = e_l) in (18b) corrects this bias, which, as it should be, is inversely proportional to the cluster size. [sent-246, score-0.196]
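The corrected assignment rule can be sketched in code as follows. The constants mirror the reconstruction of (18b) above (a penalty of roughly 2m/n_l for staying in the currently assigned cluster and a bonus of roughly m/n_l favoring small clusters); the exact scaling of these correction terms, including any dependence on τ, is an assumption here and should be checked against the paper.

import numpy as np

def amp_kmeans_assign(A_tilde, centers, labels):
    # Sketch of the AMP-corrected assignment step (cf. (18b)); correction constants are assumptions.
    m, N = A_tilde.shape
    r = centers.shape[1]
    n = np.array([(labels == l).sum() for l in range(r)], dtype=float)
    n = np.maximum(n, 1.0)                         # guard against empty clusters
    dists = ((A_tilde[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)  # (N, r) squared distances
    current = np.eye(r)[labels]                    # indicator I(v_j^t = e_l) for the current assignment
    scores = dists + (2.0 * m / n) * current - m / n
    return scores.argmin(axis=1)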
78 The AMP algorithm for maximum accuracy clustering is obtained by letting β = 1 and p̂_v(v) be a discrete distribution on {e_1, …, e_{r̂}}. [sent-247, score-0.55]
79 After the algorithm converges, arg max_v q̂(v; b_{v,j}^∞, Λ_v^∞, p̂_v) gives the final cluster assignment of the jth datum and U^∞ gives the estimate of the cluster centers. [sent-251, score-0.952]
80 For the other algorithms, initial values v_j^0, j = 1, …, N, were used. [sent-268, score-0.26]
81 We used the true prior distributions of U and V for maximum accuracy clustering. [sent-272, score-0.081]
82 We ran the AMP algorithm for the K-means clustering until either V^t = V^{t−1} or V^t = V^{t−2} was satisfied. [sent-274, score-0.192]
83 We then evaluated the following performance measures for the obtained solution (U*, V*): • Normalized K-means loss ∥A − U*(V*)^⊤∥_F^2 / (∑_{j=1}^N ∥a_j − ā∥_2^2), where ā := (1/N) ∑_{j=1}^N a_j. [sent-277, score-0.134]
84 • Accuracy max_P N^{-1} ∑_{j=1}^N I(P v_j* = v_{0,j}), where the maximization is taken over all r-by-r permutation matrices. [sent-278, score-0.236]
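The accuracy measure requires a maximization over label permutations; one standard way to compute it (an assumption about implementation, not necessarily what the authors used) is the Hungarian algorithm applied to the label co-occurrence matrix, for example via scipy.optimize.linear_sum_assignment.

import numpy as np
from scipy.optimize import linear_sum_assignment

def permutation_accuracy(pred_labels, true_labels, r):
    # Sketch: max_P (1/N) sum_j I(P v_j* = v_{0,j}) via optimal one-to-one label matching.
    N = len(true_labels)
    counts = np.zeros((r, r))
    for p, t in zip(pred_labels, true_labels):
        counts[p, t] += 1                          # co-occurrence of predicted and true labels
    row, col = linear_sum_assignment(-counts)      # negate to maximize matched counts
    return counts[row, col].sum() / N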
85 The AMP algorithm for the K-means clustering achieves the smallest K-means loss among the five algorithms, while Lloyd’s K-means algorithm and K-means++ show large K-means losses for r ≥ 5. [sent-284, score-0.262]
86 The AMP algorithm for maximum accuracy clustering achieves the highest accuracy among the five algorithms. [sent-286, score-0.288]
87 In particular, the convergence speed of the AMP algorithm for maximum accuracy clustering is comparable to that of the AMP algorithm for the K-means clustering when the two algorithms show similar accuracy (r < 9). [sent-288, score-0.48]
88 This is in contrast to the common observation that the variational Bayes method often shows slower convergence than the ICM algorithm. [sent-289, score-0.071]
89 We divided N = 400 images into r̂ = 40 clusters with the K-means++ and the AMP algorithm for the K-means clustering. [sent-331, score-0.072]
90 We adopted the initialization method of the K-means++ also for the AMP algorithm, because random initialization often yielded empty clusters, with almost all data assigned to only one cluster. [sent-332, score-0.075]
91 We ran 50 trials with different initial values, and Figure 2 summarizes the results. [sent-335, score-0.079]
92 The AMP algorithm for the K-means clustering outperformed the standard K-means++ algorithm in 48 out of the 50 trials in terms of the K-means loss and in 47 trials in terms of the accuracy. [sent-336, score-0.372]
93 The AMP algorithm yielded just one cluster with all data assigned to it in two trials. [sent-337, score-0.172]
94 Hoyer, “Non-negative matrix factorization with sparseness constraints,” The Journal of Machine Learning Research, vol. [sent-355, score-0.151]
95 Mnih, “Bayesian probabilistic matrix factorization using Markov chain Monte Carlo,” in Proceedings of the 25th International Conference on Machine Learning, New York, NY, Jul. [sent-361, score-0.125]
96 Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proceedings of 2011 IEEE International Symposium on Information Theory, St. [sent-397, score-0.203]
97 Tanaka, “Approximate message passing algorithm for low-rank matrix reconstruction,” in Proceedings of the 35th Symposium on Information Theory and its Applications, Oita, Japan, Dec. [sent-410, score-0.303]
98 Gong, “Document clustering based on non-negative matrix factorization,” in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, Toronto, Canada, Jul. [sent-416, score-0.202]
99 Zdeborov´ , “Phase diagram and approximate message passing for blind e a calibration and dictionary learning,” preprint, Jan. [sent-438, score-0.252]
100 Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Transactions on Information Theory, vol. [sent-473, score-0.234]
wordName wordTfidf (topN-words)
[('amp', 0.649), ('pv', 0.286), ('pu', 0.277), ('vj', 0.236), ('lloyd', 0.159), ('clustering', 0.147), ('icm', 0.134), ('el', 0.117), ('map', 0.114), ('aj', 0.109), ('rr', 0.106), ('message', 0.102), ('passing', 0.101), ('datum', 0.1), ('pvb', 0.087), ('reconstruction', 0.087), ('cluster', 0.079), ('variational', 0.071), ('factorization', 0.07), ('bv', 0.067), ('mmap', 0.065), ('tjt', 0.065), ('ummap', 0.065), ('bu', 0.064), ('posterior', 0.06), ('bt', 0.06), ('marginalization', 0.056), ('arg', 0.056), ('trials', 0.055), ('matrix', 0.055), ('tj', 0.053), ('si', 0.052), ('bayes', 0.049), ('assigned', 0.048), ('accuracy', 0.048), ('structural', 0.048), ('bayesian', 0.047), ('ut', 0.047), ('semide', 0.047), ('nt', 0.045), ('algorithm', 0.045), ('elt', 0.044), ('integrations', 0.044), ('matsushita', 0.044), ('shinanomachi', 0.044), ('marginal', 0.041), ('rm', 0.041), ('japan', 0.04), ('ui', 0.04), ('er', 0.04), ('assignment', 0.036), ('propagation', 0.036), ('reconstructing', 0.036), ('rangan', 0.035), ('tanaka', 0.035), ('maxv', 0.035), ('calculate', 0.035), ('evolution', 0.034), ('assignments', 0.033), ('distributions', 0.033), ('belief', 0.032), ('av', 0.032), ('reconstructs', 0.032), ('hungarian', 0.032), ('lth', 0.032), ('begins', 0.031), ('compressed', 0.031), ('bethe', 0.03), ('improper', 0.03), ('calculation', 0.029), ('minimization', 0.029), ('ul', 0.028), ('sensing', 0.028), ('mn', 0.028), ('lj', 0.027), ('equations', 0.027), ('normalized', 0.027), ('clusters', 0.027), ('montanari', 0.026), ('sparseness', 0.026), ('nl', 0.026), ('calibration', 0.026), ('derived', 0.026), ('symposium', 0.026), ('stuck', 0.025), ('sm', 0.025), ('loss', 0.025), ('du', 0.025), ('letting', 0.024), ('dv', 0.024), ('minima', 0.024), ('initial', 0.024), ('iterations', 0.024), ('blind', 0.023), ('deals', 0.023), ('iteration', 0.023), ('limit', 0.023), ('tn', 0.023), ('alternates', 0.023), ('centers', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
Author: Ryosuke Matsushita, Toshiyuki Tanaka
Abstract: We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. Numerical experiments show that the proposed algorithm outperforms Lloyd’s K-means algorithm. 1
2 0.26582372 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1
3 0.25627074 196 nips-2013-Modeling Overlapping Communities with Node Popularities
Author: Prem Gopalan, Chong Wang, David Blei
Abstract: We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB). Our model integrates two basic properties of nodes in social networks: homophily and preferential connection to popular nodes. We develop a scalable algorithm for posterior inference, based on a novel nonconjugate variant of stochastic variational inference. We evaluate the link prediction accuracy of our algorithm on nine real-world networks with up to 60,000 nodes, and on simulated networks with degree distributions that follow a power law. We demonstrate that the AMP predicts significantly better than the MMSB. 1
4 0.19072981 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
Author: Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborova
Abstract: Compressed sensing (CS) is a concept that allows to acquire compressible signals with a small number of measurements. As such it is very attractive for hardware implementations. Therefore, correct calibration of the hardware is a central issue. In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. We extend the approximate message passing (AMP) algorithm used in CS to the case of blind calibration. In the calibration-AMP, both the gains on the sensors and the elements of the signals are treated as unknowns. Our algorithm is also applicable to settings in which the sensors distort the measurements in other ways than multiplication by a gain, unlike previously suggested blind calibration algorithms based on convex relaxations. We study numerically the phase diagram of the blind calibration problem, and show that even in cases where convex relaxation is possible, our algorithm requires a smaller number of measurements and/or signals in order to perform well. 1
5 0.089938872 158 nips-2013-Learning Multiple Models via Regularized Weighting
Author: Daniel Vainsencher, Shie Mannor, Huan Xu
Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1
6 0.080699869 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
7 0.080457784 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
8 0.069454372 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
9 0.069446206 168 nips-2013-Learning to Pass Expectation Propagation Messages
10 0.062620915 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
11 0.060526647 186 nips-2013-Matrix factorization with binary components
12 0.059551325 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
13 0.05936278 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
14 0.058738872 75 nips-2013-Convex Two-Layer Modeling
15 0.057144422 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
16 0.054732077 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
17 0.053901199 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
18 0.053193431 133 nips-2013-Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering
19 0.052372027 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
20 0.051802542 148 nips-2013-Latent Maximum Margin Clustering
topicId topicWeight
[(0, 0.171), (1, 0.059), (2, 0.014), (3, 0.04), (4, 0.01), (5, 0.099), (6, 0.058), (7, -0.007), (8, -0.018), (9, -0.019), (10, 0.081), (11, 0.054), (12, 0.057), (13, -0.056), (14, -0.097), (15, -0.095), (16, -0.002), (17, -0.048), (18, -0.011), (19, -0.047), (20, 0.058), (21, -0.033), (22, -0.004), (23, -0.053), (24, 0.006), (25, 0.079), (26, -0.083), (27, 0.058), (28, 0.218), (29, -0.195), (30, -0.075), (31, 0.027), (32, 0.182), (33, 0.128), (34, 0.124), (35, -0.226), (36, 0.024), (37, 0.025), (38, 0.036), (39, 0.145), (40, 0.167), (41, 0.056), (42, -0.056), (43, -0.047), (44, 0.053), (45, 0.041), (46, 0.021), (47, 0.038), (48, -0.013), (49, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.89786625 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
Author: Ryosuke Matsushita, Toshiyuki Tanaka
Abstract: We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. Numerical experiments show that the proposed algorithm outperforms Lloyd’s K-means algorithm. 1
2 0.78988326 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
Author: Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborova
Abstract: Compressed sensing (CS) is a concept that allows to acquire compressible signals with a small number of measurements. As such it is very attractive for hardware implementations. Therefore, correct calibration of the hardware is a central issue. In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. We extend the approximate message passing (AMP) algorithm used in CS to the case of blind calibration. In the calibration-AMP, both the gains on the sensors and the elements of the signals are treated as unknowns. Our algorithm is also applicable to settings in which the sensors distort the measurements in other ways than multiplication by a gain, unlike previously suggested blind calibration algorithms based on convex relaxations. We study numerically the phase diagram of the blind calibration problem, and show that even in cases where convex relaxation is possible, our algorithm requires a smaller number of measurements and/or signals in order to perform well. 1
3 0.58521831 196 nips-2013-Modeling Overlapping Communities with Node Popularities
Author: Prem Gopalan, Chong Wang, David Blei
Abstract: We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB). Our model integrates two basic properties of nodes in social networks: homophily and preferential connection to popular nodes. We develop a scalable algorithm for posterior inference, based on a novel nonconjugate variant of stochastic variational inference. We evaluate the link prediction accuracy of our algorithm on nine real-world networks with up to 60,000 nodes, and on simulated networks with degree distributions that follow a power law. We demonstrate that the AMP predicts significantly better than the MMSB. 1
4 0.50073504 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1
5 0.43753436 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
Author: Junming Yin, Qirong Ho, Eric Xing
Abstract: We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. 1
6 0.42610791 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
7 0.41723043 168 nips-2013-Learning to Pass Expectation Propagation Messages
8 0.41129959 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
9 0.40973288 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
10 0.39125741 247 nips-2013-Phase Retrieval using Alternating Minimization
11 0.38939697 284 nips-2013-Robust Spatial Filtering with Beta Divergence
12 0.38803679 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
13 0.35146344 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
14 0.34791976 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
15 0.34374595 186 nips-2013-Matrix factorization with binary components
16 0.34042117 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
17 0.33986351 63 nips-2013-Cluster Trees on Manifolds
18 0.33919251 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
19 0.32948807 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses
20 0.32740897 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
topicId topicWeight
[(2, 0.011), (16, 0.026), (33, 0.117), (34, 0.117), (41, 0.354), (49, 0.019), (56, 0.096), (70, 0.029), (85, 0.04), (89, 0.041), (93, 0.045), (95, 0.014)]
simIndex simValue paperId paperTitle
1 0.94414121 257 nips-2013-Projected Natural Actor-Critic
Author: Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan
Abstract: Natural actor-critics form a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability—their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural ActorCritics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent. 1
2 0.91132581 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
Author: Nils E. Napp, Ryan P. Adams
Abstract: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
3 0.86005777 193 nips-2013-Mixed Optimization for Smooth Functions
Author: Mehrdad Mahdavi, Lijun Zhang, Rong Jin
Abstract: It is well known that the optimal convergence rate for stochastic optimization of √ smooth functions is O(1/ T ), which is same as stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T 2 ). In this work, we consider a new setup for optimizing smooth functions, termed as Mixed Optimization, which allows to access both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by having an additional small number of accesses to the full gradient oracle. We show that, with an O(ln T ) calls to the full gradient oracle and an O(T ) calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of O(1/T ). 1
4 0.84214795 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
Author: Justin Domke, Xianghang Liu
Abstract: Inference in general Ising models is difficult, due to high treewidth making treebased algorithms intractable. Moreover, when interactions are strong, Gibbs sampling may take exponential time to converge to the stationary distribution. We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. We find that Gibbs sampling using the projected parameters is more accurate than with the original parameters when interaction strengths are strong and when limited time is available for sampling.
5 0.82031333 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf
Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided. 1
same-paper 6 0.78571981 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
7 0.7766 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
8 0.74808472 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
9 0.72824574 11 nips-2013-A New Convex Relaxation for Tensor Completion
10 0.63697809 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
11 0.63184363 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
12 0.62872338 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
13 0.62058789 184 nips-2013-Marginals-to-Models Reducibility
14 0.61989564 54 nips-2013-Bayesian optimization explains human active search
15 0.59712875 318 nips-2013-Structured Learning via Logistic Regression
16 0.59407771 301 nips-2013-Sparse Additive Text Models with Low Rank Background
17 0.59333032 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
18 0.59295356 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
19 0.5925476 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
20 0.58890271 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning