nips nips2010 nips2010-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. [sent-11, score-0.109]
2 In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. [sent-12, score-0.295]
3 This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. [sent-13, score-0.137]
4 We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. [sent-14, score-0.165]
5 Singular value decomposition (SVD) is a classical method for MF, which gives the optimal low-rank approximation to the target matrix in terms of the squared error. [sent-17, score-0.043]
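As a minimal illustration of this baseline (the matrix sizes, rank, and variable names below are ours, not taken from the paper), truncating the SVD to the top k singular components gives the best rank-k approximation in squared (Frobenius) error:

```python
import numpy as np

# Hypothetical target matrix and rank; the paper's V is L x M with L <= M.
rng = np.random.default_rng(0)
V = rng.standard_normal((30, 100))
k = 10

# Thin SVD: V = U_sv @ diag(s) @ Vt, singular values s in decreasing order.
U_sv, s, Vt = np.linalg.svd(V, full_matrices=False)

# Keep only the k largest singular components -> optimal rank-k approximation.
V_k = (U_sv[:, :k] * s[:k]) @ Vt[:k, :]

print("rank:", np.linalg.matrix_rank(V_k), "squared error:", np.sum((V - V_k) ** 2))
```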
6 One common choice is to regularize the singular values by the ℓ2-penalty [17]. [sent-20, score-0.067]
7 Another is the trace-norm penalty, i.e., the singular values are regularized by the ℓ1-penalty [23]. [sent-22, score-0.051]
8 In contrast, the trace-norm penalty tends to produce sparse solutions, so a low-rank solution can be obtained without explicit rank constraints. [sent-24, score-0.081]
9 This implies that the optimization problem of trace-norm MF is still convex, and thus the global optimal solution can be obtained. [sent-25, score-0.127]
10 Maximum a posteriori (MAP) estimation, which computes the mode of the posterior distribution, was shown [23] to correspond to the ℓ1-MF when Gaussian priors are imposed on the factorized matrices [22]. [sent-28, score-0.054]
11 The variational Bayesian (VB) method [3, 5], which approximates the posterior distributions by factorized distributions, has also been applied to MF [13, 18]. [sent-29, score-0.075]
12 Figure 1: Matrix factorization model ($U = BA^\top$; A is an M × H matrix, B is an L × H matrix, and U is L × M). [sent-31, score-0.035]
13 In practice, the VBMF solution is computed by the iterated conditional modes (ICM) [4, 5], where the mean and the covariance of the posterior distributions are iteratively updated until convergence [13, 18]. [sent-41, score-0.119]
14 One may obtain a local optimal solution by the ICM algorithm, but many restarts would be necessary to find a good local optimum. [sent-42, score-0.122]
15 In this paper, we first show that, although the optimization problem is non-convex, the global optimal solution of VBMF can be computed analytically by solving a quartic equation. [sent-43, score-0.255]
16 This is highly advantageous over the standard ICM algorithm since the global optimum can be found without any iterations or restarts. [sent-44, score-0.098]
17 Again, the optimization problem of EVBMF is non-convex, but we still show that the global optimal solution of EVBMF can be computed analytically. [sent-46, score-0.127]
18 Recently, the global optimal solution of VBMF when the target matrix is square has been obtained in [15]. [sent-48, score-0.158]
19 On the other hand, for EVBMF, this is the first paper that gives the analytic global solution, to the best of our knowledge. [sent-50, score-0.118]
20 The global analytic solution for EVBMF is shown to be highly useful in experiments. [sent-51, score-0.18]
21 2 Bayesian Matrix Factorization In this section, we formulate the MF problem and review a variational Bayesian MF algorithm. [sent-52, score-0.04]
22 Without loss of generality, we assume that the product $c_{a_h} c_{b_h}$ is non-increasing with respect to h. [sent-73, score-0.28]
23 Let $r(A, B|V^n)$ be a trial distribution for A and B, and let $F_{\mathrm{VB}}$ be the variational Bayes (VB) free energy with respect to $r(A, B|V^n)$: $F_{\mathrm{VB}}(r|V^n) = \left\langle \log \frac{r(A, B|V^n)}{p(V^n, A, B)} \right\rangle_{r(A, B|V^n)}$, where $\langle \cdot \rangle_p$ denotes the expectation over p. [sent-74, score-0.213]
24 The VB approach minimizes the VB free energy FVB (r|V n ) with respect to the trial distribution r(A, B|V n ), by restricting the search space of r(A, B|V n ) so that the minimization is computationally tractable. [sent-75, score-0.173]
25 The VB solution $U^{\mathrm{VB}}$ is given by the VB posterior mean: $U^{\mathrm{VB}} = \langle BA^\top \rangle_{r(A,B|V^n)}$. [sent-78, score-0.071]
26 By applying the variational method to the VB free energy, we see that the VB posterior can be expressed as follows: $r(A, B|V^n) = \prod_{h=1}^{H} N_M(a_h; \mu_{a_h}, \Sigma_{a_h})\, N_L(b_h; \mu_{b_h}, \Sigma_{b_h})$, where $N_d(\cdot; \mu, \Sigma)$ denotes the d-dimensional Gaussian density with mean µ and covariance matrix Σ. [sent-79, score-0.17]
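Because this posterior factorizes over $a_h$ and $b_h$, the posterior mean $\langle BA^\top \rangle$ reduces to a sum of outer products of the mean vectors. A minimal sketch (array shapes and names are ours) of assembling $U^{\mathrm{VB}}$ from given posterior means:

```python
import numpy as np

def vb_posterior_mean(mu_a, mu_b):
    """Assemble U^VB = <B A^T> = sum_h mu_{b_h} mu_{a_h}^T.

    mu_a: (H, M) array whose h-th row is the posterior mean of a_h.
    mu_b: (H, L) array whose h-th row is the posterior mean of b_h.
    Since the VB posterior factorizes over a_h and b_h, the covariance
    matrices do not enter the posterior mean of B A^T.
    """
    return mu_b.T @ mu_a  # (L, M) matrix
```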
27 The iterated conditional modes (ICM) algorithm [4, 5] for VBMF (VB-ICM) iteratively updates $\mu_{a_h}$, $\mu_{b_h}$, $\Sigma_{a_h}$, and $\Sigma_{b_h}$ by Eq. (2) until convergence. [sent-81, score-0.048]
28 2.3 Empirical Variational Bayesian Matrix Factorization In the VB framework, hyperparameters ($c_{a_h}^2$ and $c_{b_h}^2$ in the current setup) can also be learned from data by minimizing the VB free energy, which is called the empirical VB (EVB) method [5]. [sent-89, score-0.1]
29 By setting the derivatives of the VB free energy with respect to $c_{a_h}^2$ and $c_{b_h}^2$ to zero, the following optimality condition can be obtained (Eq. (3)). [sent-90, score-0.163]
30 Again, one may obtain a local optimal solution by this algorithm. [sent-95, score-0.078]
31 3 Analytic-form Expression of Global Optimal Solution of VBMF In this section, we derive an analytic-form expression of the VBMF global solution. [sent-96, score-0.063]
32 The VB free energy can be explicitly expressed as follows. [sent-97, score-0.164]
33 This is a non-convex optimization problem, but still we show that the global optimal solution can be analytically obtained. [sent-111, score-0.165]
34 Let $\gamma_h$ ($\geq 0$) be the h-th largest singular value of V, and let $\omega_{a_h}$ and $\omega_{b_h}$ be the associated right and left singular vectors², so that $V = \sum_{h=1}^{L} \gamma_h \omega_{b_h} \omega_{a_h}^\top$. [sent-112, score-1.432]
35 Let $\underline{\gamma}_h = \sqrt{ \frac{(L+M)\sigma^2}{2n} + \frac{\sigma^4}{2n^2 c_{a_h}^2 c_{b_h}^2} + \sqrt{ \left( \frac{(L+M)\sigma^2}{2n} + \frac{\sigma^4}{2n^2 c_{a_h}^2 c_{b_h}^2} \right)^2 - \frac{LM\sigma^4}{n^2} } }$. (6) [sent-114, score-0.54]
36 Then we can analytically express the VBMF solution $U^{\mathrm{VB}}$ as in the following theorem. [sent-115, score-0.09]
37 2 In our analysis, we assume that V has no missing entry, and its singular value decomposition (SVD) is easily obtained. [sent-116, score-0.064]
38 Therefore, our results cannot be directly applied to missing entry prediction. [sent-117, score-0.042]
39 Theorem 1 The global VB solution can be expressed as $U^{\mathrm{VB}} = \sum_{h=1}^{H} \hat{\gamma}_h^{\mathrm{VB}} \omega_{b_h} \omega_{a_h}^\top$, where $\hat{\gamma}_h^{\mathrm{VB}} = \hat{\gamma}_h$ (the second largest real solution of the quartic equation (5)) if $\gamma_h > \underline{\gamma}_h$, and $\hat{\gamma}_h^{\mathrm{VB}} = 0$ otherwise. [sent-118, score-0.616]
40 Then, by analyzing the stationary condition (2), we obtain an equation with respect to γh as a necessary and sufficient condition to be a stationary point (note that its quadratic approximation gives bounds of the solution [15]). [sent-120, score-0.107]
41 Its rigorous evaluation results in the quartic equation (5). [sent-121, score-0.103]
42 Finally, we show that only the second largest solution of the quartic equation (5) lies within the bounds, which completes the proof. [sent-122, score-0.169]
43 The coefficients of the quartic equation (5) are analytic, so $\hat{\gamma}_h$ can also be obtained analytically³. [sent-123, score-0.103]
44 Therefore, the global VB solution can be analytically computed. [sent-126, score-0.153]
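A minimal NumPy sketch of this procedure (variable names are ours): the threshold follows Eq. (6) as reconstructed above, while the per-component amplitude of Theorem 1 requires the second largest real root of the quartic equation (5), whose coefficients are not reproduced in this excerpt; it is therefore passed in as a hypothetical user-supplied hook.

```python
import numpy as np

def vb_threshold(L, M, n, sigma2, ca2, cb2):
    """Threshold underline-gamma_h from Eq. (6)."""
    s = (L + M) * sigma2 / (2 * n) + sigma2 ** 2 / (2 * n ** 2 * ca2 * cb2)
    return np.sqrt(s + np.sqrt(s ** 2 - L * M * sigma2 ** 2 / n ** 2))

def global_vb_solution(V, n, sigma2, ca2, cb2, solve_quartic_amplitude):
    """Theorem 1: threshold and shrink the singular components of V.

    `solve_quartic_amplitude(gamma_h, L, M, n, sigma2, ca2_h, cb2_h)` is a
    hypothetical hook that must return the second largest real solution of
    the quartic equation (5); `ca2`, `cb2` are length-H arrays of c_{a_h}^2,
    c_{b_h}^2.
    """
    L, M = V.shape
    # Columns of U_sv are the left vectors omega_{b_h}; rows of Vt are omega_{a_h}^T.
    U_sv, gammas, Vt = np.linalg.svd(V, full_matrices=False)
    amps = np.zeros_like(gammas)
    for h, g in enumerate(gammas):
        if g > vb_threshold(L, M, n, sigma2, ca2[h], cb2[h]):
            amps[h] = solve_quartic_amplitude(g, L, M, n, sigma2, ca2[h], cb2[h])
    return (U_sv * amps) @ Vt  # U^VB = sum_h amp_h * omega_{b_h} omega_{a_h}^T
```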
45 This is a strong advantage over the standard ICM algorithm since many iterations and restarts would be necessary to find a good solution by ICM. [sent-127, score-0.095]
46 Based on the above result, the complete VB posterior can also be obtained analytically as follows. [sent-128, score-0.057]
47 When the noise variance σ 2 is unknown, one may use the minimizer of the VB free energy with respect to σ 2 as its estimate. [sent-130, score-0.203]
48 4 Analytic-form Expression of Global Optimal Solution of Empirical VBMF In this section, we solve the following problem to obtain the EVBMF global solution: Given $\sigma^2 \in \mathbb{R}_{++}$, minimize $F_{\mathrm{VB}}(\{\mu_{a_h}, \mu_{b_h}, \Sigma_{a_h}, \Sigma_{b_h}, c_{a_h}^2, c_{b_h}^2;\, h = 1, \ldots, H\}|V^n)$. [sent-133, score-0.063]
49 We show that, although this is again a non-convex optimization problem, the global optimal solution can be obtained analytically. [sent-142, score-0.127]
50 We can observe the invariance of the VB free energy (4) under the transform $(\mu_{a_h}, \mu_{b_h}, \Sigma_{a_h}, \Sigma_{b_h}, c_{a_h}^2, c_{b_h}^2) \to (s_h \mu_{a_h}, s_h^{-1} \mu_{b_h}, s_h^2 \Sigma_{a_h}, s_h^{-2} \Sigma_{b_h}, s_h^2 c_{a_h}^2, s_h^{-2} c_{b_h}^2)$. ³In practice, one may also solve the quartic equation numerically. [sent-143, score-0.256]
Figure 2: Profiles of the VB free energy (4) as a function of $c_h$ when L = M = H = 1, n = 1, and $\sigma^2 = 1$ for several observed values of V; each panel marks the global solution. [sent-153, score-0.674]
When V is below the threshold ($V < 2 = \underline{\gamma}_h$), the VB free energy is monotone increasing and thus the global solution is given by $c_h \to 0$. [sent-158, score-0.553]
When V exceeds the threshold ($V > 2 = \underline{\gamma}_h$), a local minimum exists at $c_h = \breve{c}_h$. [sent-160, score-0.584]
55 Accordingly, we fix the ratio $c_{a_h}/c_{b_h} = S > 0$, and refer to $c_h := c_{a_h} c_{b_h}$ also as a hyperparameter. [sent-171, score-0.69]
56 Let $\breve{c}_h^2 = \frac{1}{2LM}\left( \gamma_h^2 - \frac{(L+M)\sigma^2}{n} + \sqrt{\left( \gamma_h^2 - \frac{(L+M)\sigma^2}{n} \right)^2 - \frac{4LM\sigma^4}{n^2}} \right)$. (7) Then, we have the following lemma: Lemma 3 If $\gamma_h \ge \underline{\gamma}_h$, the VB free energy function (4) can have two local minima, namely, $c_h \to 0$ and $c_h = \breve{c}_h$. [sent-173, score-1.022]
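A small sketch of Eq. (7) in code (the $1/(2LM)$ normalization reflects our reading of "the larger solution of the quadratic" below; treat it as an assumption):

```python
import numpy as np

def c_breve_squared(gamma_h, L, M, n, sigma2):
    """Larger root of the quadratic stationary condition: c_breve_h^2, Eq. (7).

    Returns 0.0 when the discriminant is negative, i.e. when gamma_h is below
    the threshold and c_h -> 0 is the only local minimum (Lemma 3)."""
    d = gamma_h ** 2 - (L + M) * sigma2 / n
    disc = d ** 2 - 4.0 * L * M * sigma2 ** 2 / n ** 2
    if d < 0.0 or disc < 0.0:
        return 0.0
    return (d + np.sqrt(disc)) / (2.0 * L * M)

# Illustration with L = M = H = 1, n = 1, sigma^2 = 1 (threshold 2, as in Figure 2);
# the observed values 1.5 and 2.5 are arbitrary examples below/above the threshold.
print(c_breve_squared(1.5, 1, 1, 1, 1.0))  # 0.0 -> only c_h -> 0 is a local minimum
print(c_breve_squared(2.5, 1, 1, 1, 1.0))  # positive -> a local minimum at c_breve_h exists
```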
57 Otherwise, ch → 0 is the only local minimum of the VB free energy. [sent-174, score-0.379]
58 Sketch of proof: Analyzing the region where $c_h$ is so small that the VB solution given $c_h$ is $\hat{\gamma}_h^{\mathrm{VB}} = 0$, we find a local minimum $c_h \to 0$. [sent-175, score-0.921]
59 Combining the stationary conditions (2) and (3), we derive a quadratic equation with respect to $c_h^2$ whose larger solution is given by Eq. (7). [sent-176, score-0.091]
60 Showing that the smaller solution corresponds to saddle points completes the proof. [sent-178, score-0.066]
61 Figure 2 shows the profiles of the VB free energy (4) when L = M = H = 1, n = 1, and $\sigma^2 = 1$ for several observed values of V. [sent-179, score-0.153]
62 As illustrated, depending on the value of V, either $c_h \to 0$ or $c_h = \breve{c}_h$ is the global solution. [sent-183, score-0.918]
63 Let $\Delta_h := M \log\!\left(\frac{n\gamma_h}{M\sigma^2}\breve{\gamma}_h^{\mathrm{VB}} + 1\right) + L \log\!\left(\frac{n\gamma_h}{L\sigma^2}\breve{\gamma}_h^{\mathrm{VB}} + 1\right) + \frac{n}{\sigma^2}\left(-2\gamma_h\breve{\gamma}_h^{\mathrm{VB}} + LM\breve{c}_h^2\right)$, (8) where $\breve{\gamma}_h^{\mathrm{VB}}$ is the VB solution for $c_h = \breve{c}_h$. [sent-184, score-0.622]
64 We can show that the sign of $\Delta_h$ corresponds to that of the difference of the VB free energy at $c_h = \breve{c}_h$ and $c_h \to 0$. [sent-185, score-1.008]
65 Theorem 4 The hyperparameter $c_h$ that globally minimizes the VB free energy function (4) is given by $c_h = \breve{c}_h$ if $\gamma_h > \underline{\gamma}_h$ and $\Delta_h \le 0$. [sent-187, score-1.054]
66 Corollary 5 The global EVB solution can be expressed as $U^{\mathrm{EVB}} = \sum_{h=1}^{H} \hat{\gamma}_h^{\mathrm{EVB}} \omega_{b_h} \omega_{a_h}^\top$, where $\hat{\gamma}_h^{\mathrm{EVB}} := \breve{\gamma}_h^{\mathrm{VB}}$ if $\gamma_h > \underline{\gamma}_h$ and $\Delta_h \le 0$, and $\hat{\gamma}_h^{\mathrm{EVB}} := 0$ otherwise. [sent-189, score-0.616]
67 Since the optimal hyperparameter value $c_h$ can be expressed in closed form, the global EVB solution can also be computed analytically using the result given in Section 3. [sent-190, score-0.507]
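A sketch of the per-component EVB decision in code, assuming the form of $\Delta_h$ given in Eq. (8) above; the VB amplitude $\breve{\gamma}_h^{\mathrm{VB}}$ at $c_h = \breve{c}_h$ comes from Theorem 1 (via the quartic (5)) and is supplied here as an input rather than computed:

```python
import numpy as np

def delta_h(gamma_h, gamma_breve_vb, c_breve2, L, M, n, sigma2):
    """Delta_h as in Eq. (8): the sign decides between c_h = c_breve_h and c_h -> 0."""
    return (M * np.log(n * gamma_h / (M * sigma2) * gamma_breve_vb + 1.0)
            + L * np.log(n * gamma_h / (L * sigma2) * gamma_breve_vb + 1.0)
            + n / sigma2 * (-2.0 * gamma_h * gamma_breve_vb + L * M * c_breve2))

def evb_amplitude(gamma_h, gamma_breve_vb, c_breve2, gamma_underline, L, M, n, sigma2):
    """Corollary 5: keep the h-th component only if gamma_h exceeds the threshold
    and Delta_h <= 0; otherwise prune it (amplitude 0).

    gamma_breve_vb : VB amplitude at c_h = c_breve_h (Theorem 1 with Eq. (5)).
    c_breve2       : c_breve_h^2 from Eq. (7).
    gamma_underline: the threshold for the given c_breve_h.
    """
    if gamma_h > gamma_underline and \
       delta_h(gamma_h, gamma_breve_vb, c_breve2, L, M, n, sigma2) <= 0.0:
        return gamma_breve_vb
    return 0.0
```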
68 This is again a strong advantage over the standard ICM algorithm since ICM would require many iterations and restarts to find a good solution. [sent-191, score-0.043]
69 5.1 Artificial Dataset We randomly created a true matrix $V^* = \sum_{h=1}^{H^*} b_h^* a_h^{*\top}$ with L = 30, M = 100, and $H^* = 10$, where every element of $\{a_h^*, b_h^*\}$ was drawn independently from the standard Gaussian distribution. [sent-195, score-0.525]
70 We set n = 1, and an observation matrix V was created by adding independent Gaussian noise with variance σ 2 = 1 to each element. [sent-196, score-0.056]
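A minimal NumPy sketch reproducing this data-generation step (the random seed and variable names are ours):

```python
import numpy as np

# Synthetic data as described: V* = sum_{h=1}^{H*} b*_h a*_h^T with
# L = 30, M = 100, H* = 10, plus i.i.d. Gaussian noise of variance sigma^2 = 1.
rng = np.random.default_rng(0)
L, M, H_star, n, sigma2 = 30, 100, 10, 1, 1.0

A_star = rng.standard_normal((M, H_star))   # columns a*_h
B_star = rng.standard_normal((L, H_star))   # columns b*_h
V_star = B_star @ A_star.T                  # true rank-H* matrix (L x M)
V = V_star + np.sqrt(sigma2) * rng.standard_normal((L, M))  # observed matrix
```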
71 We first investigate the learning curve of the VB free energy over EVB-ICM iterations. [sent-202, score-0.153]
72 Ten learning curves of the VB free energy are plotted in Figure 3(a). [sent-205, score-0.153]
73 The value of the VB free energy of the global solution computed by our analytic-form solution was also plotted in the graph by the dashed line. [sent-206, score-0.32]
74 The graph shows that the EVB-ICM algorithm reduces the VB free energy reasonably well over iterations. [sent-207, score-0.153]
75 Figure 3(b) shows the computation time of EVB-ICM over iterations and that of our analytic-form solution. [sent-210, score-0.068]
76 On the other hand, the computation of our analytic-form solution took only 0. [sent-213, score-0.052]
77 Thus, our method reduces the computation time by four orders of magnitude, while achieving better accuracy as a minimizer of the VB free energy. [sent-215, score-0.099]
78 Next, we investigate the generalization error of the global analytic solutions of VB and EVB, measured by $G = \|U - V^*\|_{\mathrm{Fro}}^2 /(LM)$. [sent-216, score-0.118]
79 Figure 3(c) shows the mean and error bars (min and max) over 10 runs for VB with various hyperparameter values and EVB. [sent-217, score-0.046]
80 A common hyperparameter value was used for all components (i.e., $c_1 = \cdots = c_H$) in VB, while each hyperparameter $c_h$ was separately optimized in EVB. [sent-220, score-0.331]
81 Thus, automatic hyperparameter selection of EVB works quite well. [sent-222, score-0.046]
82 Figure 3(d) shows the hyperparameter values chosen in EVB, sorted in decreasing order. [sent-223, score-0.046]
83 This shows that, for all 10 runs, ch is positive for h ≤ H ∗ (= 10) and zero for h > H ∗ . [sent-224, score-0.285]
84 Here, we solve these tasks by VBMF and evaluate the performance using the concrete slump test dataset [28] available from the UCI repository [2]. [sent-228, score-0.066]
85 Overall, the proposed global analytic solution is shown to be a useful alternative to the popular ICM algorithm. [sent-231, score-0.17]
86 In this paper, we focused on the MF problem with no missing entry, and showed that this weakness could be overcome by computing the global optimal solution analytically. [sent-233, score-0.15]
87 We further derived the global optimal solution analytically for the EVBMF method. [sent-234, score-0.18]
88 Figure 3: Experimental results for the artificial dataset: (a) VB free energy, (b) computation time, (c) generalization error, (d) hyperparameter value. [sent-256, score-0.438]
89 Figure 4: Experimental results of CCA for the concrete slump test dataset: (a) VB free energy, (b) computation time, (c) generalization error, (d) hyperparameter value. [sent-273, score-0.492]
90 Since no hand-tuning parameter remains in EVBMF, our analytic-form solution is practically useful and computationally highly efficient. [sent-275, score-0.062]
91 When $c_{a_h} c_{b_h} \to \infty$, the priors get (almost) flat and the quartic equation (5) is factorized as $\lim_{c_{a_h} c_{b_h} \to \infty} f_h(t) = \left(t + \frac{M}{L}\left(1 - \frac{L\sigma^2}{n\gamma_h^2}\right)\right)\left(t + \left(1 - \frac{M\sigma^2}{n\gamma_h^2}\right)\right)\left(t - \left(1 - \frac{M\sigma^2}{n\gamma_h^2}\right)\right)\left(t - \frac{M}{L}\left(1 - \frac{L\sigma^2}{n\gamma_h^2}\right)\right) = 0$. [sent-277, score-0.57]
92 Theorem 1 states that its second largest solution gives the VB estimator for $\gamma_h > \lim_{c_{a_h} c_{b_h} \to \infty} \underline{\gamma}_h = \sqrt{M\sigma^2/n}$, in which case $\lim_{c_{a_h} c_{b_h} \to \infty} \hat{\gamma}_h^{\mathrm{VB}} = \max\!\left(0,\ 1 - \frac{M\sigma^2}{n\gamma_h^2}\right)\gamma_h$. [sent-278, score-0.197]
93 This is the positive-part James-Stein (PJS) shrinkage estimator [10], operated on each singular component separately, and this coincides with the upper-bound derived in [15] for arbitrary $c_{a_h} c_{b_h} > 0$. [sent-280, score-0.591]
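This flat-prior limit is fully specified by the expression above, so it can be written down directly; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def vb_flat_prior_limit(V, n, sigma2):
    """Flat-prior limit c_{a_h} c_{b_h} -> infinity of the VB solution:
    positive-part James-Stein shrinkage applied per singular component,
    gamma_h^VB = max(0, 1 - M*sigma2/(n*gamma_h^2)) * gamma_h."""
    L, M = V.shape  # convention L <= M, as in the paper
    U_sv, gammas, Vt = np.linalg.svd(V, full_matrices=False)
    shrunk = np.maximum(0.0, 1.0 - M * sigma2 / (n * gammas ** 2)) * gammas
    return (U_sv * shrunk) @ Vt
```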
94 The MIR effect is observed in its analytic solution when A is integrated out and B is estimated to be the maximizer of the marginal likelihood. [sent-290, score-0.107]
95 Our results fully made use of the assumptions that the likelihood and priors are both spherical Gaussian, the VB posterior is column-wise independent, and there exists no missing entry. [sent-291, score-0.073]
96 They were necessary to solve the free energy minimization problem as a reweighted SVD. [sent-292, score-0.153]
97 An important future work is to obtain the analytic global solution under milder assumptions. [sent-293, score-0.17]
98 This will enable us to handle more challenging problems such as missing entry prediction [23, 20, 6, 13, 18, 22, 12, 25]. [sent-294, score-0.042]
99 Inferring parameters and structure of latent variable models by variational Bayes. [sent-312, score-0.04]
100 Modeling slump flow of concrete using second-order regressions and artificial neural networks. [sent-467, score-0.054]
wordName wordTfidf (topN-words)
[('vb', 0.578), ('bh', 0.49), ('ah', 0.37), ('ch', 0.285), ('vbmf', 0.169), ('mf', 0.145), ('cah', 0.135), ('cbh', 0.135), ('evb', 0.124), ('icm', 0.109), ('evbmf', 0.09), ('quartic', 0.09), ('free', 0.08), ('energy', 0.073), ('lm', 0.073), ('fvb', 0.068), ('global', 0.063), ('analytic', 0.055), ('solution', 0.052), ('fro', 0.049), ('hyperparameter', 0.046), ('singular', 0.041), ('variational', 0.04), ('analytically', 0.038), ('tokyo', 0.037), ('factorization', 0.035), ('mir', 0.034), ('slump', 0.034), ('tomioka', 0.03), ('restarts', 0.03), ('rl', 0.028), ('bayesian', 0.025), ('svd', 0.025), ('cca', 0.024), ('ba', 0.024), ('missing', 0.023), ('sec', 0.023), ('japan', 0.022), ('concrete', 0.02), ('hyperparameters', 0.02), ('matrix', 0.02), ('rrr', 0.02), ('nakajima', 0.02), ('iterated', 0.019), ('entry', 0.019), ('minimizer', 0.019), ('posterior', 0.019), ('priors', 0.019), ('modes', 0.018), ('nm', 0.017), ('volume', 0.017), ('cup', 0.016), ('rennie', 0.016), ('stationary', 0.016), ('penalty', 0.016), ('factorized', 0.016), ('sl', 0.015), ('sugiyama', 0.015), ('fh', 0.015), ('created', 0.015), ('usefulness', 0.015), ('completes', 0.014), ('rb', 0.014), ('im', 0.014), ('springer', 0.014), ('local', 0.014), ('canonical', 0.014), ('ra', 0.013), ('rank', 0.013), ('iterations', 0.013), ('equation', 0.013), ('actively', 0.013), ('sh', 0.013), ('sm', 0.013), ('srebro', 0.013), ('cial', 0.013), ('lim', 0.012), ('optimal', 0.012), ('repository', 0.012), ('advantageous', 0.012), ('corollary', 0.012), ('les', 0.012), ('benchmark', 0.012), ('arti', 0.011), ('matlab', 0.011), ('il', 0.011), ('expressed', 0.011), ('variance', 0.011), ('principal', 0.011), ('target', 0.011), ('iteratively', 0.011), ('collaborative', 0.01), ('highly', 0.01), ('regularized', 0.01), ('numerically', 0.01), ('estimator', 0.01), ('noise', 0.01), ('trial', 0.01), ('respect', 0.01), ('rm', 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
2 0.19785902 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
3 0.10994416 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
4 0.065957621 188 nips-2010-On Herding and the Perceptron Cycling Theorem
Author: Andrew Gelfand, Yutian Chen, Laurens Maaten, Max Welling
Abstract: The paper develops a connection between traditional perceptron algorithms and recently introduced herding algorithms. It is shown that both algorithms can be viewed as an application of the perceptron cycling theorem. This connection strengthens some herding results and suggests new (supervised) herding algorithms that, like CRFs or discriminative RBMs, make predictions by conditioning on the input attributes. We develop and investigate variants of conditional herding, and show that conditional herding leads to practical algorithms that perform better than or on par with related classifiers such as the voted perceptron and the discriminative RBM. 1
5 0.050587732 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
Author: Dirk Husmeier, Frank Dondelinger, Sophie Lebre
Abstract: Conventional dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption, which is too restrictive in many practical applications. Various approaches to relax the homogeneity assumption have recently been proposed, allowing the network structure to change with time. However, unless time series are very long, this flexibility leads to the risk of overfitting and inflated inference uncertainty. In the present paper we investigate three regularization schemes based on inter-segment information sharing, choosing different prior distributions and different coupling schemes between nodes. We apply our method to gene expression time series obtained during the Drosophila life cycle, and compare the predicted segmentation with other state-of-the-art techniques. We conclude our evaluation with an application to synthetic biology, where the objective is to predict a known in vivo regulatory network of five genes in yeast. 1
6 0.049535692 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
7 0.038773444 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
8 0.036136672 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
9 0.033676337 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
10 0.032814451 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates
11 0.027378077 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
12 0.027194357 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
13 0.026718743 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
14 0.026278829 162 nips-2010-Link Discovery using Graph Feature Tracking
15 0.025896616 94 nips-2010-Feature Set Embedding for Incomplete Data
16 0.024653696 283 nips-2010-Variational Inference over Combinatorial Spaces
17 0.024354 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
18 0.023919938 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
19 0.023587452 101 nips-2010-Gaussian sampling by local perturbations
20 0.023223976 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
topicId topicWeight
[(0, 0.075), (1, 0.013), (2, 0.013), (3, 0.026), (4, -0.087), (5, 0.028), (6, 0.082), (7, 0.038), (8, -0.068), (9, -0.003), (10, 0.057), (11, 0.005), (12, 0.068), (13, 0.011), (14, -0.051), (15, 0.011), (16, -0.007), (17, 0.039), (18, 0.027), (19, 0.044), (20, 0.069), (21, -0.022), (22, 0.0), (23, -0.011), (24, -0.002), (25, -0.029), (26, -0.013), (27, -0.024), (28, -0.068), (29, -0.057), (30, 0.041), (31, -0.077), (32, 0.049), (33, -0.025), (34, 0.021), (35, 0.044), (36, -0.017), (37, 0.057), (38, 0.118), (39, 0.0), (40, 0.033), (41, -0.03), (42, 0.133), (43, 0.064), (44, 0.012), (45, -0.081), (46, -0.026), (47, -0.037), (48, -0.095), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.90502244 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
2 0.70630664 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
3 0.63463831 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
4 0.48776248 283 nips-2010-Variational Inference over Combinatorial Spaces
Author: Alexandre Bouchard-côté, Michael I. Jordan
Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1
5 0.40833294 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
Author: Bo Thiesson, Chong Wang
Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1
6 0.36336565 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
7 0.36294833 287 nips-2010-Worst-Case Linear Discriminant Analysis
8 0.35311598 286 nips-2010-Word Features for Latent Dirichlet Allocation
9 0.34835061 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
10 0.33843547 188 nips-2010-On Herding and the Perceptron Cycling Theorem
11 0.33683103 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates
12 0.33453232 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
13 0.32324445 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
14 0.32265341 284 nips-2010-Variational bounds for mixed-data factor analysis
15 0.31748053 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
16 0.31149331 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
17 0.30917385 221 nips-2010-Random Projections for $k$-means Clustering
18 0.30548859 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
19 0.30250272 226 nips-2010-Repeated Games against Budgeted Adversaries
20 0.30179766 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
topicId topicWeight
[(13, 0.037), (17, 0.014), (27, 0.05), (30, 0.046), (35, 0.018), (45, 0.124), (50, 0.065), (52, 0.035), (60, 0.016), (74, 0.395), (77, 0.039), (90, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.71642381 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
2 0.58969742 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
3 0.52890068 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focus on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while place positive instances and negative instances at opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulted problem. Empirical results demonstrate that our approach offers improved generalization performance.
4 0.39000037 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
5 0.38929328 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
Author: Dahua Lin, Eric Grimson, John W. Fisher
Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models. 1
6 0.38705647 117 nips-2010-Identifying graph-structured activation patterns in networks
7 0.38274565 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
8 0.37888929 148 nips-2010-Learning Networks of Stochastic Differential Equations
9 0.37761673 96 nips-2010-Fractionally Predictive Spiking Neurons
10 0.37667114 155 nips-2010-Learning the context of a category
11 0.37666538 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
12 0.37594166 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
13 0.37576002 265 nips-2010-The LASSO risk: asymptotic results and real world examples
14 0.37554455 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices
15 0.37535876 158 nips-2010-Learning via Gaussian Herding
16 0.37488177 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes
17 0.374749 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
18 0.37473005 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
19 0.37469348 63 nips-2010-Distributed Dual Averaging In Networks
20 0.37412494 7 nips-2010-A Family of Penalty Functions for Structured Sparsity