nips nips2013 nips2013-214 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for the multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fixed-point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. [sent-2, score-0.664]
2 In this work, we describe a new simple and efficient algorithm for the multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. [sent-3, score-0.339]
3 Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. [sent-4, score-0.209]
4 Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fixed-point iterations. [sent-5, score-0.142]
5 1 Introduction The goal of nonnegative matrix factorization (NMF) is to approximate a nonnegative matrix V with the product of two nonnegative matrices, as V ≈ W1 W2 . [sent-7, score-0.845]
6 Since the seminal work of [1] that introduces simple and efficient multiplicative update algorithms for solving the NMF problem, it has become a popular data analysis tool for applications where nonnegativity constraints are natural [2]. [sent-8, score-0.108]
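For reference, the following is a minimal numpy sketch of the classic two-factor multiplicative updates of [1] for the generalized KL objective d(V, W1 W2); the random initialization, iteration count, and the small constant eps are illustrative assumptions rather than details taken from [1].

```python
import numpy as np

def nmf_kl(V, r, n_iter=200, eps=1e-12, seed=0):
    """Two-factor NMF, V ~= W @ H, via the multiplicative updates of Lee & Seung [1]
    for the generalized KL divergence; r is the inner dimension (rank)."""
    m, n = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    ones = np.ones((m, n))
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # H_aj <- H_aj * sum_i W_ia V_ij/(WH)_ij / sum_i W_ia
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # W_ia <- W_ia * sum_j (V_ij/(WH)_ij) H_aj / sum_j H_aj
    return W, H
```

The mfNMF and SMS algorithms described below follow the same multiplicative-update-plus-normalization pattern.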
7 In this work, we address the multi-factor NMF (mfNMF) problem, where a nonnegative matrix V is approximated with the product of K ≥ 2 nonnegative matrices, V ≈ ∏_{k=1}^K W_k. [sent-9, score-0.521]
8 It has been argued that using more factors in NMF can improve the algorithm’s stability, especially for ill-conditioned and badly scaled data [2]. [sent-10, score-0.079]
9 Introducing multiple factors into the NMF formulation can also find practical applications, for instance, extracting hierarchies of topics representing different levels of abstract concepts in document analysis or image representations [2]. [sent-11, score-0.096]
10 Many practical applications also require the obtained nonnegative factors to be sparse, i.e., to contain many zero components. [sent-12, score-0.298]
11 Most early works focus on the matrix ℓ1 norm [6], but it has been pointed out that the ℓ1 norm becomes completely toothless for factors that have constant ℓ1 norms, as in the case of stochastic matrices [7, 8]. [sent-15, score-0.326]
12 Our solution to mfNMF seeks optimal factors that minimize the total difference between V and ∏_{k=1}^K W_k, and it is based on the solution of a special matrix optimization problem that we term as the stochastic matrix sandwich (SMS) problem. [sent-19, score-0.372]
13 We show that the SMS problem affords a simple and efficient algorithm consisting of only multiplicative update and normalization (Lemma 2). [sent-20, score-0.139]
14 The second contribution of this work is a new algorithm to incorporate the Dirichlet sparsity regularizer in mfNMF. [sent-21, score-0.191]
15 Our formulation of the sparse mfNMF problem leads to a new closed-form solution, and the resulting algorithm naturally embeds sparsity control into the mfNMF algorithm without iteration (Lemma 3). [sent-22, score-0.149]
16 We further show that the update steps of our sparse mfNMF algorithm afford a simple and intuitive interpretation. [sent-23, score-0.148]
17 We demonstrate the effectiveness and efficiency of our sparse mfNMF algorithm on both synthetic and real data. [sent-24, score-0.088]
18 2 Related Works Most existing works generalizing the original NMF problem to more than two factors are based on the multi-layer NMF formulation, in which we solve a sequence of two-factor NMF problems, as V ≈ W1 H1, H1 ≈ W2 H2, · · · , and H_{K−1} ≈ W_{K−1} W_K [3–5]. [sent-25, score-0.12]
19 Though simple and efficient, such greedy algorithms are not associated with a consistent objective function involving all factors simultaneously. [sent-26, score-0.132]
20 Because of this, the obtained factors may be suboptimal as measured by the difference between the target matrix V and their product. [sent-27, score-0.157]
21 On the other hand, there exist far fewer works directly solving the mfNMF problem; one example is a multiplicative algorithm based on the general Bregman divergence [9]. [sent-28, score-0.173]
22 In this work, we focus on the generalized Kullback-Leibler divergence (a special case of the Bregman divergence), and use its decomposing property to simplify the mfNMF objective and remove the undesirable multitude of equivalent solutions in the general formulation. [sent-29, score-0.142]
23 As a common means of encouraging sparsity in machine learning, the ℓ1 norm has been incorporated into the NMF objective function [6] as a sparsity regularizer. [sent-31, score-0.178]
24 However, the ℓ1 norm may be less effective for nonnegative matrices, for which it reduces to the sum of all elements and can be decreased trivially by scaling down all factors without affecting the number of zero components. [sent-32, score-0.36]
25 Furthermore, the ℓ1 norm becomes completely toothless in cases when the nonnegative factors are constrained to have constant column or row sums (as in the case of stochastic matrices). [sent-33, score-0.454]
26 An alternative solution is to use the Shannon entropy of each column of the matrix factor as a sparsity regularizer [7], since a vector with unit ℓ1 norm and low entropy implies that only a few of its components are significant. [sent-34, score-0.348]
27 However, the entropic prior based regularizer does not afford a closed form solution, and an iterative fixed-point algorithm based on Lambert's W-function is described in [7]. [sent-35, score-0.199]
28 Another regularizer is based on the symmetric Dirichlet distribution with concentration parameter α < 1, as such a model allocates more probability weights to sparse vectors on a probability simplex [8, 10, 11]. [sent-36, score-0.14]
29 However, using the Dirichlet regularizer has a practical problem, as it can become unbounded when there is a zero element in the factor. [sent-37, score-0.129]
30 The other uses the pseudo-Dirichlet regularizer [8], which is a bounded perturbation of the original Dirichlet model. [sent-41, score-0.097]
31 Furthermore, the effects of the updating steps on the sparsity of the resulting factors are obscured by the iterative steps. [sent-43, score-0.203]
32 In contrast, our sparse mfNMF algorithm uses the original Dirichlet model and does not require extra fixed-point iterations. [sent-44, score-0.073]
33 More interestingly, the update steps of our sparse mfNMF algorithm afford a simple and intuitive interpretation. [sent-45, score-0.148]
34 3 Basic Definitions We denote 1m as the all-one column vector of dimension m and Im as the m-dimensional identity matrix, and use V ≥ 0 or v ≥ 0 to indicate that all elements of a matrix V or a vector v are nonnegative. [sent-46, score-0.135]
35 An m × n nonnegative matrix V is stochastic if V^T 1_m = 1_n, i.e., if each of its columns sums to one. [sent-48, score-0.328]
36 Also, for stochastic matrices W1 and W2 , their product W1 W2 is also stochastic. [sent-51, score-0.119]
37 Furthermore, an m × n nonnegative matrix V can be uniquely represented as V = SD, with an n × n nonnegative diagonal matrix D = diag(V T 1m ) and an m × n stochastic matrix S = V D−1 . [sent-52, score-0.691]
38 For nonnegative matrices V and W, their generalized Kullback-Leibler (KL) divergence [1] is defined as d(V, W) = ∑_{i=1}^m ∑_{j=1}^n ( V_ij log(V_ij / W_ij) − V_ij + W_ij ). [sent-53, score-0.397]
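A small numpy helper for this divergence is sketched below, using the 0 log 0 = 0 convention; the function name and the numerical guards are assumptions.

```python
import numpy as np

def gen_kl(V, W):
    """Generalized KL divergence d(V, W) = sum_ij V_ij log(V_ij/W_ij) - V_ij + W_ij,
    with the convention 0 * log 0 = 0."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    tiny = np.finfo(float).tiny
    log_term = np.where(V > 0, V * np.log(np.maximum(V, tiny) / np.maximum(W, tiny)), 0.0)
    return float(np.sum(log_term - V + W))
```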
39 Specifically, given an m × n nonnegative matrix V, we seek K ≥ 2 matrices W_k of dimensions l_{k−1} × l_k for k = 1, · · · , K, with l_0 = m and l_K = n, that minimize d(V, ∏_{k=1}^K W_k) subject to the nonnegativity of each W_k. [sent-59, score-0.347]
40 Using the property of nonnegative matrices, we represent the last nonnegative factor W_K as the product of a stochastic matrix X_K and a nonnegative diagonal matrix D^(W). [sent-65, score-0.899]
41 As such, we represent the nonnegative matrix ∏_{k=1}^K W_k as the product of a stochastic matrix S^(W) = ∏_{k=1}^K X_k and a diagonal matrix D^(W). [sent-66, score-0.496]
42 Similarly, we also decompose the target nonnegative matrix V as the product of a stochastic matrix S^(V) and a nonnegative diagonal matrix D^(V). [sent-67, score-0.734]
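A minimal sketch of this stochastic/diagonal split, following V = S D with D = diag(V^T 1_m) and S = V D^(−1); variable names are assumptions.

```python
import numpy as np

def stochastic_diag_split(V):
    """Split a nonnegative matrix V into V = S @ D, where D holds the column sums
    of V on its diagonal and S is column-stochastic (each column sums to one)."""
    col_sums = V.sum(axis=0)                                  # V^T 1_m
    S = V / np.maximum(col_sums, np.finfo(float).tiny)        # equals V @ D^{-1}
    return S, np.diag(col_sums)
```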
43 It is not difficult to see that any solution from this more constrained formulation leads to a solution to the original problem and vice versa. [sent-68, score-0.087]
44 The first sub-problem solves for the diagonal matrix D^(W), as min_{D^(W)} d(D^(V), D^(W)) over nonnegative diagonal matrices D^(W). [sent-74, score-0.085]
45 The second sub-problem optimizes the K stochastic factors X_1, · · · , X_K, which, after dropping irrelevant constants and rearranging terms, becomes max_{X_1,··· ,X_K} ∑_{i=1}^m ∑_{j=1}^n V_ij log(∏_{k=1}^K X_k)_ij, s.t. X_k^T 1_{l_{k−1}} = 1_{l_k}, X_k ≥ 0, for k = 1, · · · , K. [sent-78, score-0.193]
46 (3) is essentially the maximization of the similarity between the stochastic part of V, S^(V), and the stochastic matrix formed as the product of the K stochastic matrices X_1, · · · , X_K, weighted by D^(V). [sent-82, score-0.278]
47 4.1 Stochastic Matrix Sandwich Problem Before describing the algorithm solving (3), we first derive the solution to another related problem that we term as the stochastic matrix sandwich (SMS) problem, from which we can construct a solution to (3). [sent-84, score-0.23]
48 Specifically, in an SMS problem one minimizes the following objective function with regard to an m′ × n′ stochastic matrix X (in computing the generalized KL divergence, we define 0 log 0 = 0 and 0/0 = 0): [sent-85, score-0.206]
49 d(C, AXB), s.t. X^T 1_{m′} = 1_{n′}, X ≥ 0, (4) where A and B are two known stochastic matrices of dimensions m × m′ and n′ × n, respectively, and C is an m × n nonnegative matrix. [sent-88, score-0.314]
50 However, we present here a new simple solution based on multiplicative updates and normalization, which completely obviates the need for control parameters such as step sizes. [sent-90, score-0.102]
51 Let us define F_ij(X; X̃) = ∑_{k=1}^{m′} ∑_{l=1}^{n′} (A_ik X̃_kl B_lj / (AX̃B)_ij) log((AX̃B)_ij X_kl / X̃_kl); then we have F_ij(X; X̃) ≤ log(AXB)_ij and F_ij(X̃; X̃) = log(AX̃B)_ij. [sent-93, score-0.203]
52 The global optimal solution to the following optimization problem, max_X ∑_{k=1}^{m′} ∑_{l=1}^{n′} M_kl log X_kl, s.t. X^T 1_{m′} = 1_{n′}, X ≥ 0, is given by X_kl = M_kl / ∑_{k′=1}^{m′} M_{k′l}. [sent-101, score-0.083]
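Combining the minorization of Lemma 1 with the closed form of Lemma 2 gives one multiplicative-update-plus-normalization step for the SMS problem: the weights M_kl = X_kl [Aᵀ(C ⊘ (AXB))Bᵀ]_kl are formed and each column is normalized. The numpy sketch below is an illustrative reading of the two lemmas, not code from the paper.

```python
import numpy as np

def sms_update(X, A, B, C, eps=1e-12):
    """One multiplicative update plus column normalization for the SMS problem
    over a column-stochastic X (Lemmas 1 and 2).
    Shapes: A (m x m'), X (m' x n'), B (n' x n), C (m x n)."""
    AXB = A @ X @ B + eps
    M = X * (A.T @ (C / AXB) @ B.T)                           # minorization weights M_kl
    return M / np.maximum(M.sum(axis=0, keepdims=True), eps)  # Lemma 2: normalize each column
```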
53 Next, we can construct a coordinate-wise optimization solution to the mfNMF problem (3) that iteratively optimizes each X_k while fixing the others, based on the solution to the SMS problem given in Lemma 2. [sent-105, score-0.089]
54 In practice, we do not need to run each SMS optimization to convergence, and the algorithm can be implemented with a few fixed update steps for each factor in order. [sent-107, score-0.076]
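A sketch of this coordinate-wise mfNMF loop, built on the sms_update sketch above: each column-stochastic factor is refreshed with a few SMS steps while the partial products to its left and right are held fixed, with C = V as in objective (3). The initialization and step counts are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from functools import reduce

def mfnmf(V, dims, n_outer=50, n_inner=3, seed=0, eps=1e-12):
    """Coordinate-wise mfNMF sketch: a nonnegative V (m x n) is approximated by the
    product of K column-stochastic factors X_k of shape dims[k] x dims[k+1], where
    dims = [m, l_1, ..., l_{K-1}, n], times diag(column sums of V)."""
    rng = np.random.default_rng(seed)
    X = []
    for a, b in zip(dims[:-1], dims[1:]):
        F = rng.random((a, b)) + eps
        X.append(F / F.sum(axis=0, keepdims=True))                 # random column-stochastic init
    for _ in range(n_outer):
        for k in range(len(X)):
            A = reduce(np.matmul, X[:k], np.eye(dims[0]))          # left partial product
            B = reduce(np.matmul, X[k + 1:], np.eye(dims[k + 1]))  # right partial product
            for _ in range(n_inner):                               # a few SMS steps per factor
                X[k] = sms_update(X[k], A, B, V)                   # C = V in objective (3)
    return X
```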
55 5 Sparse Multi-Factor NMF Next, we describe how to incorporate sparsity regularization into the mfNMF formulation. [sent-109, score-0.076]
56 We assume that the sparsity requirement is applied to each individual factor in the mfNMF objective function (3), as max_{X_1,··· ,X_K} ∑_{i,j} V_ij log(∏_{k=1}^K X_k)_ij + ∑_{k=1}^K ℓ(X_k), s.t. [sent-110, score-0.208]
57 X_k^T 1_{l_{k−1}} = 1_{l_k}, X_k ≥ 0, (9) where ℓ(X) is the sparsity regularizer, which is larger for a stochastic matrix X with more zero entries. [sent-112, score-0.281]
58 As the overall mfNMF can be solved by optimizing each individual factor in an SMS problem, we focus here on the case where the sparsity regularizer of each factor is introduced in (4), to solve max_X ∑_{i,j} C_ij log(AXB)_ij + ℓ(X), s.t. X^T 1_{m′} = 1_{n′}, X ≥ 0. (10) [sent-113, score-0.204]
59 5.1 Dirichlet Sparsity Regularizer As we have mentioned, the typical choice of ℓ(·) as the matrix ℓ1 norm is problematic in (10), as ∥X∥_1 = ∑_{ij} X_ij = n′ is a constant. [sent-117, score-0.155]
60 On the other hand, if we treat each column of X as a point on a probability simplex, as its elements are nonnegative and sum to one, then we can induce a sparse regularizer from the Dirichlet distribution. [sent-118, score-0.435]
61 The parameter α ∈ [0, 1] controls the sparsity of samples – a smaller α corresponds to a higher likelihood of sparse v under Dir(v; α). [sent-120, score-0.102]
62 Incorporating a Dirichlet regularizer with parameter α_l for each column of X and dropping irrelevant constant terms, (10) reduces to max_X ∑_{i=1}^m ∑_{j=1}^n C_ij log(AXB)_ij + ∑_{k=1}^{m′} ∑_{l=1}^{n′} (α_l − 1) log X_kl, s.t. [sent-121, score-0.24]
63 X^T 1_{m′} = 1_{n′}, X ≥ 0. (12) 5.2 Solution to SMS with Dirichlet Sparse Regularizer However, a direct optimization of (12) is problematic when α_l < 1: if there exists M_kl < 1 − α_l, the objective function of (12) becomes non-convex and unbounded – the term (M_kl + α_l − 1) log X_kl approaches ∞ as X_kl → 0. [sent-130, score-0.118]
64 This problem is addressed in [8] by modifying the definition of the Dirichlet regularizer in (11) to (α_l − 1) log(X_kl + ε), where ε > 0 is a predefined parameter. [sent-131, score-0.097]
65 In addition, the fixed-point algorithm is difficult to interpret, as its effect on the sparsity of the obtained factors is obscured by the iterative steps. [sent-133, score-0.2]
66 Therefore, we can simply modify (12) so that the objective function cannot become infinite, by imposing a small lower bound ε on the entries of X, as: max_X ∑_{k=1}^{m′} ∑_{l=1}^{n′} (M_kl + α_l − 1) log X_kl, s.t. X^T 1_{m′} = 1_{n′}, X ≥ ε. [sent-135, score-0.086]
67 (13) The following result shows that with a sufficiently small ε, the constrained optimization problem in (13) has a unique global optimal solution that affords a closed form and an intuitive interpretation. [sent-138, score-0.121]
68 If we choose a sufficiently small constant ε (bounded above by a ratio of the minimum to the maximum of {|M_kl + α_l − 1|}), and for each column l define N_l = {k | M_kl < 1 − α_l} as the set of elements with M_kl + α_l − 1 < 0, then the following is the global optimal solution to (13): • case 1. [sent-144, score-0.111]
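Since the case analysis of Lemma 3 is only partially reproduced above, the following numpy sketch gives one plausible reading of the sparse SMS step rather than the paper's verbatim closed form: the minorization weights are shifted by α_l − 1, entries in N_l are pinned at the floor ε, and the remaining mass of each column is allocated in proportion to the positive shifted weights.

```python
import numpy as np

def sparse_sms_update(X, A, B, C, alpha, eps=1e-6, tiny=1e-12):
    """One Dirichlet-regularized SMS update (an assumed reading of Lemma 3).
    alpha is a length-n' vector with one concentration parameter per column of X."""
    AXB = A @ X @ B + tiny
    M = X * (A.T @ (C / AXB) @ B.T)                 # unregularized weights, as in Lemma 2
    W = M + (np.asarray(alpha) - 1.0)               # shift by the Dirichlet term, columnwise
    X_new = np.full_like(X, eps)
    for l in range(X.shape[1]):
        pos = W[:, l] > 0                           # complement of N_l = {k : M_kl < 1 - alpha_l}
        if not np.any(pos):                         # degenerate column: fall back to uniform
            X_new[:, l] = 1.0 / X.shape[0]
            continue
        mass = 1.0 - eps * np.count_nonzero(~pos)   # probability mass left after flooring N_l
        X_new[pos, l] = mass * W[pos, l] / W[pos, l].sum()
    return X_new
```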
69 In the first set of experiments, we study empirically the convergence of the multiplicative algorithm for the SMS problem (Lemma 2). [sent-177, score-0.082]
70 So we can technically ignore such elements for each column index l. [sent-181, score-0.076]
71 (Figure: objective function value vs. iteration number, on log scales, comparing the projected gradient (pgd) and multiplicative (mult) algorithms on SMS problems, e.g., m=90, n=50, m′=20, n′=5.) [sent-186, score-0.197]
72 For each problem size (m, n, m′, n′), we randomly generate stochastic matrices A (m × m′) and B (n′ × n), and a nonnegative matrix C (m × n). [sent-191, score-0.373]
73 We compare our algorithm with a projected gradient ascent optimization of the SMS problem, which updates X using the gradient of the SMS objective function and chooses a step size to satisfy the nonnegativity and normalization constraints. [sent-193, score-0.291]
74 We do not consider methods that use the Hessian matrix of the objective function, as constructing a general Hessian matrix in this case has prohibitive memory requirements even for a medium-sized problem. [sent-194, score-0.156]
75 In the second set of experiments, we evaluate the performance of the coordinate-update mfNMF algorithm based on the multiplicative updating algorithm of the SMS problem (Section 4. [sent-200, score-0.115]
76 Specifically, we consider the mfNMF problem that approximates a randomly generated target nonnegative matrix V of dimension m × n with the product of three stochastic factors, W1 (m × m′), W2 (m′ × n′), and W3 (n′ × n). [sent-202, score-0.371]
77 The performance of the algorithm is evaluated by the logarithm of the generalized KL divergence between V and W1 W2 W3, where lower numerical values indicate better performance. [sent-203, score-0.142]
78 As a comparison, we also implemented a multi-layer NMF algorithm [5], which solves two NMF problems in sequence, as V ≈ W1 Ṽ and Ṽ ≈ W2 W3, and the multiplicative update algorithm of mfNMF of [9], both of which are based on the generalized KL divergence. [sent-204, score-0.145]
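For comparison, a sketch of the multi-layer NMF baseline described here, reusing the two-factor nmf_kl sketch above: V is factored as V ≈ W1 Ṽ and the inner factor Ṽ is then factored as Ṽ ≈ W2 W3, so no joint objective over all three factors is optimized; the inner ranks m1 and n1 are assumptions.

```python
def multilayer_nmf(V, m1, n1, n_iter=200):
    """Greedy multi-layer NMF baseline: two sequential two-factor KL-NMF problems."""
    W1, V_tilde = nmf_kl(V, m1, n_iter=n_iter)    # V       ~= W1 @ V_tilde
    W2, W3 = nmf_kl(V_tilde, n1, n_iter=n_iter)   # V_tilde ~= W2 @ W3
    return W1, W2, W3
```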
79 The values correspond to the logarithm of generalized KL divergence, log d(V, W1 W2 W3 ). [sent-219, score-0.082]
80 The results of several runs of these algorithms for different problem sizes are summarized in Table 1, which show that in general, mfNMF algorithms lead to better solutions corresponding to lower generalized KL divergences between the target matrix and the product of the three estimated factors. [sent-221, score-0.149]
81 This is likely due to the fact that these algorithms optimize the generalized KL divergence directly, while multi-layer NMF is a greedy algorithm with sub-optimal solutions. [sent-222, score-0.119]
82 We think the improved performance and running efficiency are due to our formulation of the mfNMF problem based on stochastic matrices, which reduces the solution space and encourages convergence to a better local minimum of the objective function. [sent-224, score-0.158]
83 We apply the sparse mfNMF algorithm to data converted from grayscale images from the MNIST Handwritten Digits data set [17], which are vectorized into column vectors and normalized to have a total sum of one. [sent-225, score-0.156]
84 All vectorized and normalized images are collected to form the target stochastic matrix V, which is decomposed into the product of three factors W1 W2 W3. [sent-226, score-0.277]
85 We also incorporate the Dirichlet sparsity regularizers with different configurations. [sent-227, score-0.095]
86 Shown in Fig. 3 are the decomposition results corresponding to 500 vectorized 20 × 20 images of handwritten digit 3, which are decomposed into three factors, the first of size 400 × 196. (Figure 3: Sparse mfNMF algorithm on the handwritten digit images.) [sent-231, score-0.203]
87 The columns of the factors are reshaped to be shown as images, where the brightness of each pixel in the figure is proportional to the nonnegative values in the corresponding factors. [sent-238, score-0.298]
88 All three factorization results can reconstruct the target matrix (last column), but they put different constraints on the obtained factors. [sent-240, score-0.124]
89 The factors are also visually meaningful: factor W1 contains low-level components of the images that, when combined with factor W2, form more complex structures. [sent-241, score-0.146]
90 The two rows below correspond to the cases when the Dirichlet sparsity regularizer is applied to the third factor and to the second and third factor simultaneously. [sent-243, score-0.204]
91 Compared with the corresponding results in the non-sparse case, the obtained factors contain more zeros. [sent-244, score-0.079]
92 As a comparison, we also implemented the mfNMF algorithm using a pseudo-Dirichlet sparse regularizer [8]. [sent-245, score-0.155]
93 7 Conclusion We describe in this work a simple and efficient algorithm for the sparse multi-factor nonnegative matrix factorization (mfNMF) problem, involving only multiplicative update and normalization. [sent-247, score-0.482]
94 Our solution for incorporating the Dirichlet sparse regularizer leads to a closed-form solution, and the resulting algorithm is more efficient than previous works based on fixed-point iterations. [sent-248, score-0.262]
95 First, we are studying whether a similar multiplicative update algorithm also exists for mfNMF with more general similarity measures such as Csiszar's divergence [18], the Itakura-Saito divergence [19], the α-β divergence [20], or the Bregman divergence [9]. [sent-251, score-0.322]
96 Second, we would like to incorporate other types of constraints (e.g., value ranges) over the factors into the mfNMF algorithm. [sent-254, score-0.079]
97 Last, we would like to further study applications of mfNMF in problems such as co-clustering or hierarchical document topic analysis, exploiting its ability to recover hierarchical decomposition of nonnegative matrices. [sent-255, score-0.256]
98 Sparse topic modeling with concave-convex procedure: EMish algorithm for latent dirichlet allocation. [sent-294, score-0.122]
99 Csiszar’s divergences for non-negative matrix factorization: Family of new algorithms. [sent-330, score-0.076]
100 Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis. [sent-334, score-0.105]
wordName wordTfidf (topN-words)
[('mfnmf', 0.587), ('sms', 0.333), ('mkl', 0.322), ('xkl', 0.294), ('nmf', 0.246), ('nonnegative', 0.219), ('xk', 0.176), ('axb', 0.175), ('regularizer', 0.097), ('dirichlet', 0.091), ('cij', 0.085), ('vij', 0.084), ('pgd', 0.079), ('factors', 0.079), ('wk', 0.079), ('nl', 0.077), ('fij', 0.074), ('divergence', 0.074), ('multiplicative', 0.067), ('itr', 0.063), ('matrix', 0.059), ('sparsity', 0.059), ('ij', 0.058), ('fun', 0.056), ('mult', 0.055), ('kl', 0.052), ('column', 0.052), ('stochastic', 0.05), ('cichocki', 0.049), ('bregmann', 0.048), ('factorization', 0.046), ('matrices', 0.045), ('lemma', 0.045), ('afford', 0.044), ('sparse', 0.043), ('zdunek', 0.042), ('mk', 0.041), ('affords', 0.039), ('objective', 0.038), ('sandwich', 0.036), ('solution', 0.035), ('albany', 0.032), ('seungjin', 0.032), ('toothless', 0.032), ('generalized', 0.03), ('log', 0.029), ('intuitive', 0.028), ('obscured', 0.028), ('rafal', 0.028), ('cients', 0.027), ('vectorized', 0.027), ('andrzej', 0.026), ('diagonal', 0.026), ('elements', 0.024), ('entropic', 0.024), ('product', 0.024), ('factor', 0.024), ('lk', 0.024), ('xt', 0.023), ('nonnegativity', 0.023), ('rearranging', 0.023), ('logarithm', 0.023), ('argmax', 0.023), ('coef', 0.022), ('norm', 0.022), ('decomposition', 0.021), ('handwritten', 0.021), ('axt', 0.021), ('sij', 0.021), ('incorporate', 0.02), ('target', 0.019), ('images', 0.019), ('optimization', 0.019), ('iterative', 0.019), ('choi', 0.019), ('updating', 0.018), ('signs', 0.018), ('update', 0.018), ('encourage', 0.018), ('incorporating', 0.017), ('dir', 0.017), ('blind', 0.017), ('works', 0.017), ('irrelevant', 0.017), ('divergences', 0.017), ('formulation', 0.017), ('regularizers', 0.016), ('im', 0.016), ('problematic', 0.016), ('unbounded', 0.016), ('topic', 0.016), ('zero', 0.016), ('dropping', 0.016), ('involving', 0.015), ('ciency', 0.015), ('algorithm', 0.015), ('synthetic', 0.015), ('effectiveness', 0.015), ('extra', 0.015), ('hessian', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fix point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets. 1
2 0.1160751 318 nips-2013-Structured Learning via Logistic Regression
Author: Justin Domke
Abstract: A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an “oracle” exists to minimize a logistic loss.
3 0.075448789 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
Author: Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani
Abstract: Biological tissue is often composed of cells with similar morphologies replicated throughout large volumes and many biological applications rely on the accurate identification of these cells and their locations from image data. Here we develop a generative model that captures the regularities present in images composed of repeating elements of a few different types. Formally, the model can be described as convolutional sparse block coding. For inference we use a variant of convolutional matching pursuit adapted to block-based representations. We extend the KSVD learning algorithm to subspaces by retaining several principal vectors from the SVD decomposition instead of just one. Good models with little cross-talk between subspaces can be obtained by learning the blocks incrementally. We perform extensive experiments on simulated images and the inference algorithm consistently recovers a large proportion of the cells with a small number of false positives. We fit the convolutional model to noisy GCaMP6 two-photon images of spiking neurons and to Nissl-stained slices of cortical tissue and show that it recovers cell body locations without supervision. The flexibility of the block-based representation is reflected in the variability of the recovered cell shapes. 1
4 0.072489068 186 nips-2013-Matrix factorization with binary components
Author: Martin Slawski, Matthias Hein, Pavlo Lutsik
Abstract: Motivated by an application in computational biology, we consider low-rank matrix factorization with {0, 1}-constraints on one of the factors and optionally convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size 2m·r , where m is the dimension of the data points and r the rank of the factorization. Despite apparent intractability, we provide − in the line of recent work on non-negative matrix factorization by Arora et al. (2012)− an algorithm that provably recovers the underlying factorization in the exact case with O(mr2r + mnr + r2 n) operations for n datapoints. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.
5 0.069549605 193 nips-2013-Mixed Optimization for Smooth Functions
Author: Mehrdad Mahdavi, Lijun Zhang, Rong Jin
Abstract: It is well known that the optimal convergence rate for stochastic optimization of smooth functions is O(1/√T), which is same as stochastic optimization of Lipschitz continuous convex functions. This is in contrast to optimizing smooth functions using full gradients, which yields a convergence rate of O(1/T^2). In this work, we consider a new setup for optimizing smooth functions, termed as Mixed Optimization, which allows to access both a stochastic oracle and a full gradient oracle. Our goal is to significantly improve the convergence rate of stochastic optimization of smooth functions by having an additional small number of accesses to the full gradient oracle. We show that, with an O(ln T ) calls to the full gradient oracle and an O(T ) calls to the stochastic oracle, the proposed mixed optimization algorithm is able to achieve an optimization error of O(1/T ). 1
6 0.067979693 156 nips-2013-Learning Kernels Using Local Rademacher Complexity
7 0.065469846 202 nips-2013-Multiclass Total Variation Clustering
8 0.064036652 257 nips-2013-Projected Natural Actor-Critic
9 0.063787639 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
10 0.062483173 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
11 0.061513808 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
12 0.051481623 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
13 0.047800116 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
14 0.047387674 157 nips-2013-Learning Multi-level Sparse Representations
15 0.04643704 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
16 0.044843782 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
17 0.044783249 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
18 0.043427598 11 nips-2013-A New Convex Relaxation for Tensor Completion
19 0.042483345 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
20 0.040179234 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
topicId topicWeight
[(0, 0.125), (1, 0.038), (2, 0.023), (3, 0.042), (4, -0.015), (5, 0.017), (6, -0.039), (7, 0.013), (8, 0.048), (9, -0.01), (10, -0.004), (11, 0.056), (12, 0.009), (13, -0.035), (14, -0.025), (15, 0.015), (16, -0.024), (17, -0.033), (18, -0.015), (19, 0.025), (20, -0.036), (21, 0.019), (22, 0.011), (23, -0.011), (24, -0.029), (25, -0.061), (26, 0.019), (27, -0.031), (28, 0.022), (29, 0.001), (30, -0.08), (31, 0.106), (32, 0.018), (33, 0.097), (34, -0.026), (35, -0.019), (36, -0.005), (37, -0.027), (38, 0.059), (39, 0.015), (40, -0.015), (41, 0.032), (42, 0.014), (43, 0.021), (44, -0.008), (45, -0.031), (46, -0.03), (47, 0.013), (48, 0.06), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.85836816 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fix point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets. 1
2 0.64426714 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of taskspecific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications. 1
3 0.63564509 202 nips-2013-Multiclass Total Variation Clustering
Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht
Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1
4 0.60833299 157 nips-2013-Learning Multi-level Sparse Representations
Author: Ferran Diego Andilla, Fred A. Hamprecht
Abstract: Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows to optimize the decomposition over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world Calcium imaging data. 1
5 0.5869137 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
Author: Haichao Zhang, David Wipf
Abstract: Typical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatiallyvarying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a simple non-uniform blind deblurring algorithm with a spatially-adaptive image penalty. Through an implicit normalization process, this penalty automatically adjust its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate on average without explicitly incorporating structureselection heuristics. The algorithm can be implemented using an optimization strategy that is virtually tuning-parameter free and simpler than existing methods, and likely can be applied in other settings such as dictionary learning. Detailed theoretical analysis and empirical comparisons on real images serve as validation.
6 0.58024418 186 nips-2013-Matrix factorization with binary components
7 0.56991619 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
8 0.56990254 318 nips-2013-Structured Learning via Logistic Regression
9 0.56826514 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
10 0.56721753 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
11 0.56338245 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
12 0.55865425 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
13 0.55456758 350 nips-2013-Wavelets on Graphs via Deep Learning
14 0.53784096 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
15 0.53183639 249 nips-2013-Polar Operators for Structured Sparse Estimation
16 0.53106672 73 nips-2013-Convex Relaxations for Permutation Problems
17 0.50756407 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
18 0.50405359 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
19 0.49323341 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
20 0.49162093 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
topicId topicWeight
[(2, 0.014), (16, 0.036), (33, 0.122), (34, 0.096), (41, 0.055), (49, 0.025), (56, 0.087), (65, 0.304), (70, 0.039), (85, 0.056), (89, 0.026), (93, 0.041), (95, 0.01)]
simIndex simValue paperId paperTitle
1 0.77745193 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations
Author: Botond Cseke, Manfred Opper, Guido Sanguinetti
Abstract: We propose an approximate inference algorithm for continuous time Gaussian Markov process models with both discrete and continuous time likelihoods. We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for discrete time terms and (2) variational updates for the continuous time term. We introduce postinference corrections methods that improve on the marginals of the approximation. This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. Experimental results on real and simulated data demonstrate high distributional accuracy and significant computational savings compared to discrete-time approaches in a neural application. 1
2 0.71929115 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
same-paper 3 0.6961118 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fix point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets. 1
4 0.60174102 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
Author: Nitish Srivastava, Ruslan Salakhutdinov
Abstract: High capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier’s parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, making deep neural networks work well on infrequent classes as well. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set. 1
5 0.5448432 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
Author: Cristina Savin, Peter Dayan, Mate Lengyel
Abstract: It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate. 1
6 0.54223967 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
7 0.54157633 350 nips-2013-Wavelets on Graphs via Deep Learning
8 0.5412178 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
9 0.54069203 173 nips-2013-Least Informative Dimensions
10 0.54018289 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
11 0.54017264 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
12 0.54013449 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
13 0.5394485 201 nips-2013-Multi-Task Bayesian Optimization
14 0.53908885 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
15 0.53882986 11 nips-2013-A New Convex Relaxation for Tensor Completion
16 0.53807729 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
17 0.53726506 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
18 0.53634566 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
19 0.53560072 5 nips-2013-A Deep Architecture for Matching Short Texts
20 0.53552145 318 nips-2013-Structured Learning via Logistic Regression