nips nips2012 nips2012-125 knowledge-graph by maker-knowledge-mining

125 nips-2012-Factoring nonnegative matrices with linear programs


Source: pdf

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Factoring nonnegative matrices with linear programs Victor Bittorf bittorf@cs. [sent-1, score-0.224]

2 Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). [sent-10, score-0.355]

3 The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. [sent-11, score-0.275]

4 More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. [sent-12, score-0.134]

5 The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. [sent-13, score-0.087]

6 1 Introduction. Nonnegative matrix factorization (NMF) is a popular approach for selecting features in data [16–18, 23]. [sent-19, score-0.301]

7 As a consequence, we must instate additional assumptions on the data if we hope to compute nonnegative matrix factorizations in practice. [sent-24, score-0.355]

8 Our first contribution is a new formulation of the nonnegative feature selection problem that only requires the solution of a single linear program. [sent-31, score-0.224]

9 Our formulation of NMF uses a data-driven modeling approach to simplify the factorization problem. [sent-43, score-0.214]

10 More precisely, we search for a small collection of rows from the data matrix that can be used to express the other rows. [sent-44, score-0.199]

11 This type of approach appears in a number of other factorization problems, including rank-revealing QR [15], interpolative decomposition [20], subspace clustering [10, 24], dictionary learning [11], and others. [sent-45, score-0.272]

12 For a matrix M and indices i and j, we write Mi· for the ith row of M and M·j for the jth column of M . [sent-48, score-0.091]

13 Let Y be a nonnegative f × n data matrix with columns indexing examples and rows indexing features. [sent-50, score-0.451]

14 Exact NMF seeks a factorization Y = F W where the feature matrix F is f × r, where the weight matrix W is r × n, and both factors are nonnegative. [sent-51, score-0.348]

15 Unless stated otherwise, we assume that each row of the data matrix Y is normalized so it sums to one. [sent-53, score-0.091]
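
As a concrete illustration of this normalization (a minimal numpy sketch on toy data; the variable names are ours, not the paper's):

    import numpy as np

    Y = np.abs(np.random.default_rng(0).normal(size=(6, 4)))  # toy nonnegative data matrix
    Y = Y / Y.sum(axis=1, keepdims=True)                      # each row now sums to one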

16 Vavasis showed that it is NP-complete to decide whether a matrix admits a rank-r nonnegative factorization [27]. [sent-56, score-0.525]

17 AGKM exhibited an algorithm that can produce a nonnegative matrix factorization under a weaker sufficient condition. [sent-61, score-0.505]

18 A set of vectors {v1 , . . . , vr } ⊂ Rd is simplicial if no vector vi lies in the convex hull of {vj : j ≠ i}. [sent-67, score-0.227]

19 The set of vectors is α-robust simplicial if, for each i, the ℓ1 distance from vi to the convex hull of {vj : j ≠ i} is at least α. [sent-68, score-0.227]

20 Indeed, we can find an NMF of Y efficiently if Y contains a set of r rows that is simplicial and whose convex hull contains the remaining rows. [sent-71, score-0.356]

21 An NMF Y = F W is called separable if the rows of W are simplicial and there is a permutation matrix Π such that ΠF = [Ir ; M] for some nonnegative matrix M (after the permutation, the top r × r block of F is the identity). [sent-73, score-0.456]

22 (Algorithm 1, AGKM) For k = 1, . . . , f do: find the set Nk of rows that are at least 5ε/α + 2ε away from Xk· in ℓ1 distance. [sent-79, score-0.11]

23 End for; then cluster the rows in R as follows: j and k are in the same cluster if Djk ≤ 10ε/α + 6ε. [sent-82, score-0.11]

24 Set F = arg minZ∈Rf×r ‖X − ZW‖∞,1. Figure 1: Numbered circles are hott topics. [sent-84, score-0.608]

25 Their convex hull (orange) contains the other topics (small circles), so the data admits a separable NMF. [sent-85, score-0.319]

26 The arrow d1 marks the ℓ1 distance from hott topic (1) to the convex hull of the other two hott topics; definitions of d2 and d3 are similar. [sent-86, score-1.288]

27 The hott topics are α-robustly simplicial when each di ≥ α. [sent-87, score-0.866]
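
Each margin di in Figure 1 is the optimal value of a small linear program: the ℓ1 distance from one hott topic to the convex hull of the others. The sketch below sets that LP up with scipy.optimize.linprog; the helper name l1_dist_to_hull and the encoding are our own illustration, not code from the paper.

    import numpy as np
    from scipy.optimize import linprog

    def l1_dist_to_hull(v, others):
        """l1 distance from vector v (length n) to the convex hull of the rows of `others`."""
        m, n = others.shape
        # Variables x = [lambda (m), t (n)]; minimize sum(t)
        # subject to -t <= v - others.T @ lambda <= t, sum(lambda) = 1, lambda >= 0.
        c = np.concatenate([np.zeros(m), np.ones(n)])
        A_ub = np.block([[others.T, -np.eye(n)],      #  others.T @ lam - t <= v
                         [-others.T, -np.eye(n)]])    # -others.T @ lam - t <= -v
        b_ub = np.concatenate([v, -v])
        A_eq = np.concatenate([np.ones(m), np.zeros(n)])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (m + n))
        return res.fun

    # d_i for each topic: distance to the hull of the remaining topics
    # d = [l1_dist_to_hull(topics[i], np.delete(topics, i, axis=0)) for i in range(len(topics))]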

28 To compute a separable factorization of Y , we must first identify a simplicial set of rows from Y . [sent-88, score-0.603]

29 Afterward, we compute weights that express the remaining rows as convex combinations of this distinguished set. [sent-89, score-0.175]

30 We call the simplicial rows hott and the corresponding features hott topics. [sent-90, score-1.501]

31 This model allows us to express all the features for a particular instance if we know the values of the instance at the simplicial rows. [sent-91, score-0.197]

32 While a nonnegative matrix one encounters in practice might not admit a separable factorization, it may be well-approximated by a nonnegative matrix that admits a separable factorization. [sent-96, score-0.606]

33 AGKM derived an algorithm for nonnegative matrix factorization of a matrix that is well-approximated by a separable factorization. [sent-97, score-0.696]

34 Theorem (AGKM [4]): Let ε and α be nonnegative constants satisfying ε ≤ α²/(20 + 13α). [sent-100, score-0.224]

35 Assume X = Y + ∆ where Y is a nonnegative matrix whose rows have unit ℓ1 norm, where Y = F W is a rank-r separable factorization in which the rows of W are α-robust simplicial, and where ‖∆‖∞,1 ≤ ε. [sent-102, score-0.849]

36 Then Algorithm 1 finds a rank-r nonnegative factorization F̂ Ŵ that satisfies the error bound ‖X − F̂ Ŵ‖∞,1 ≤ 10ε/α + 7ε. [sent-103, score-0.463]

37 In particular, the AGKM algorithm computes the factorization exactly when ε = 0. [sent-104, score-0.214]

38 It may be possible to calculate ε, but we can only estimate α if we know which rows are hott. [sent-107, score-0.11]

39 Second, the algorithm computes all ℓ1 distances between rows at a cost of O(f²n). [sent-108, score-0.11]

40 Third, for every row in the matrix, we must determine its distance to the convex hull of the rows that lie at a sufficient distance; this step requires us to solve a linear program for each row of the matrix at a cost of Ω(f n). [sent-109, score-0.297]
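
For reference, the all-pairs ℓ1 distance computation from the second step can be written in a couple of lines (an illustrative scipy call on a stand-in matrix, not the authors' implementation), which makes the O(f²n) cost concrete:

    import numpy as np
    from scipy.spatial.distance import cdist

    X = np.abs(np.random.default_rng(0).normal(size=(100, 50)))  # stand-in f x n data matrix
    D = cdist(X, X, metric='cityblock')                          # D[j, k] = l1 distance between rows j and k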

41 3 Main Theoretical Results: NMF by Linear Programming. This paper shows that we can factor an approximately separable nonnegative matrix by solving a linear program. [sent-113, score-0.415]

42 Algorithm 2 (Separable Nonnegative Matrix Factorization by Linear Programming). Require: An f × n nonnegative matrix Y with a rank-r separable NMF. [sent-115, score-0.415]

43 Ensure: An f × r matrix F and r × n matrix W with F ≥ 0, W ≥ 0, and Y = F W . [sent-116, score-0.134]

44 Here is the key observation: Suppose that Y is any f × n nonnegative matrix that admits a rank-r separable factorization Y = F W . [sent-119, score-0.649]

45 If we pad F with zeros to form an f × f matrix, we have Y = ΠT [Ir 0; M 0] Π Y =: CY. We call the matrix C factorization localizing. [sent-120, score-0.281]

46 Note that any factorization localizing matrix C is an element of the polyhedral set Φ(Y ) := {C ≥ 0 : CY = Y , Tr(C) = r, Cjj ≤ 1 ∀j, Cij ≤ Cjj ∀i, j}. [sent-121, score-0.358]

47 Once we have such a C, we construct W by extracting the rows of X that correspond to the indices i where Cii = 1. [sent-124, score-0.13]

48 We construct the feature matrix F by extracting the nonzero columns of C. [sent-125, score-0.087]

49 Theorem 3.1: Suppose Y is a nonnegative matrix with a rank-r separable factorization Y = F W . [sent-129, score-0.629]

50 Then Algorithm 2 constructs a rank-r nonnegative matrix factorization of Y . [sent-130, score-0.505]

51 As the theorem suggests, we can isolate the rows of Y that yield a simplicial factorization by solving a single linear program. [sent-131, score-0.479]
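
To make the construction concrete, the following is a hedged sketch of the noiseless LP and of how W and F are read off from the localizing matrix C. It uses cvxpy for the LP and a nonnegative least-squares fit for F; the function name separable_nmf_lp, the random cost vector, and the solver choices are our illustrative assumptions, not the authors' implementation.

    import numpy as np
    import cvxpy as cp
    from scipy.optimize import nnls

    def separable_nmf_lp(Y, r, seed=0):
        """Sketch of Algorithm 2: find a factorization localizing C in Phi(Y), then read off W and F."""
        f, n = Y.shape
        p = np.random.default_rng(seed).permutation(f).astype(float) + 1.0   # cost with distinct entries
        C = cp.Variable((f, f), nonneg=True)
        d = cp.diag(C)
        constraints = [C @ Y == Y,                                    # CY = Y
                       cp.trace(C) == r,                              # Tr(C) = r
                       d <= 1,                                        # C_jj <= 1
                       C <= np.ones((f, 1)) @ cp.reshape(d, (1, f))]  # C_ij <= C_jj
        cp.Problem(cp.Minimize(p @ d), constraints).solve()
        diag = np.diag(C.value)
        hott = np.sort(np.argsort(diag)[-r:])                         # rows with the largest C_ii (= 1 in exact arithmetic)
        W = Y[hott, :]
        F = np.vstack([nnls(W.T, Y[i, :])[0] for i in range(f)])      # nonnegative fit of each row of Y
        return F, W, hott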

52 3.1 Robustness to Noise. Suppose we observe a nonnegative matrix X whose rows sum to one. [sent-134, score-0.401]

53 Assume that X = Y + ∆ where Y is a nonnegative matrix whose rows sum to one, which has a rank-r separable factorization Y = F W such that the rows of W are α-robust simplicial, and where ‖∆‖∞,1 ≤ ε. [sent-135, score-0.849]

54 Define the polyhedral set Φτ(X) := {C ≥ 0 : ‖CX − X‖∞,1 ≤ τ, Tr(C) = r, Cjj ≤ 1 ∀j, Cij ≤ Cjj ∀i, j}. The set Φτ(X) consists of matrices C that approximately locate a factorization of X. [sent-136, score-0.237]

55 Furthermore, assume that for every row Yj,· that is not hott, we have the margin constraint ‖Yj,· − Yi,·‖1 ≥ d0 for all hott rows i. [sent-140, score-0.766]

56 Then we can find a nonnegative factorization F̂ Ŵ satisfying ‖X − F̂ Ŵ‖∞,1 ≤ 2ε, provided that ε < min{αd0, α²}/(9(r + 1)). [sent-141, score-0.438]

57 Furthermore, this factorization correctly identifies the hott topics appearing in the separable factorization of Y . [sent-142, score-1.263]

58 Our robustness result requires a margin-type constraint, assuming that the original configuration consists either of duplicate hott topics, or topics that are reasonably far away from the hott topics. [sent-147, score-1.384]

59 Algorithm 3 (Approximably Separable Nonnegative Matrix Factorization by Linear Programming). Require: An f × n nonnegative matrix X that satisfies the hypotheses of Theorem 3. [sent-150, score-0.291]

60 Ensure: An f × r matrix F and r × n matrix W with F ≥ 0, W ≥ 0, and ‖X − F W‖∞,1 ≤ 2ε. [sent-152, score-0.134]
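
Only the data-fidelity constraint changes in the noisy case. Below is a sketch of the Φ2ε(X) membership constraints in the same cvxpy notation as the earlier noiseless sketch; this is our illustration of the feasible set Φτ(X) with τ = 2ε, not a complete restatement of Algorithm 3.

    # Assuming X, C, d, f, r as in the previous sketch and a noise level eps:
    tau = 2 * eps
    robust_constraints = [cp.norm(C @ X - X, 1, axis=1) <= tau,      # every row of CX - X has l1 norm <= tau
                          cp.trace(C) == r,                          # Tr(C) = r
                          d <= 1,                                    # C_jj <= 1
                          C <= np.ones((f, 1)) @ cp.reshape(d, (1, f))]  # C_ij <= C_jj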

61 The main idea is to show that we can only represent a hott topic efficiently using the hott topic itself. [sent-158, score-1.216]

62 Off-the-shelf LP solvers may suffice for moderate-size problems, but for large-scale matrix factorization problems, their running time is prohibitive, as we show in Section 5. [sent-162, score-0.281]

63 3.2 Related Work. Localizing factorizations via column or row subset selection is a popular alternative to direct factorization methods such as the SVD. [sent-165, score-0.302]

64 have proposed a factorization localization solution to nonnegative matrix factorization using group sparsity techniques [9, 11]. [sent-170, score-0.719]

65 Elhamifar shows exact representative recovery in the noiseless setting assuming no hott topics are duplicated. [sent-173, score-0.711]

66 4 Incremental Gradient Algorithms for NMF. The rudiments of our fast implementation rely on two standard optimization techniques: dual decomposition and incremental gradient descent. [sent-175, score-0.111]

67 We minimize the Lagrangian using projected incremental gradient descent. [sent-185, score-0.08]

68 Algorithm 4 (H OTTOPIXX: Approximate Separable NMF by Incremental Gradient Descent). Require: An f × n nonnegative matrix X. [sent-187, score-0.291]

69 Ensure: An f × r matrix F and r × n matrix W with F ≥ 0, W ≥ 0, and ‖X − F W‖∞,1 ≤ 2ε. Step 1: Pick a cost p with distinct entries. [sent-189, score-0.134]

70 The incremental gradient method chooses one of the n summands at random and follows its subgradient. [sent-202, score-0.08]

71 Just as before, once we have identified the hott rows, we can form W by selecting these rows of X. [sent-211, score-0.718]

72 Note that this minimization can also be computed by incremental subgradient descent. [sent-213, score-0.078]
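
As one way to picture these steps, here is a heavily simplified stochastic-gradient sketch: it samples one column of X per update, descends a squared-residual surrogate plus the linear cost on diag(C) and a soft trace penalty, and uses a crude clipping step in place of the exact projection. The surrogate objective, penalty, step sizes, and projection are our own simplifications rather than the paper's Lagrangian; once the hott rows are selected, F can be fit to them exactly as in the earlier LP sketch.

    import numpy as np

    def hottopixx_sgd_sketch(X, r, mu=1e-2, lam=1.0, step=1e-1, epochs=50, seed=0):
        """Illustrative stochastic-gradient sketch; NOT the paper's exact Lagrangian or projection."""
        f, n = X.shape
        rng = np.random.default_rng(seed)
        p = rng.permutation(f).astype(float) + 1.0          # cost with distinct entries
        C = np.zeros((f, f))
        I = np.eye(f)
        for _ in range(epochs):
            for k in rng.permutation(n):                    # one column ("summand") at a time
                x = X[:, k]
                grad = np.outer(C @ x - x, x)               # gradient of 0.5*||x - Cx||^2 wrt C
                grad += (mu / n) * np.diag(p)               # linear cost on diag(C)
                grad += (lam / n) * (np.trace(C) - r) * I   # soft penalty keeping Tr(C) near r
                C -= step * grad
                np.clip(C, 0.0, 1.0, out=C)                 # 0 <= C_ij, C_jj <= 1
                C = np.minimum(C, np.diag(C)[None, :])      # C_ij <= C_jj
        hott = np.sort(np.argsort(np.diag(C))[-r:])         # r rows with the largest diagonal entries
        return C, X[hott, :], hott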

73 We use standard techniques: in-memory clustering to increase prefetching opportunities, padded data structures for better cache alignment, and compiler directives to allow the Intel compiler to apply vectorization. [sent-220, score-0.114]

74 Note that the incremental gradient step (step 6 in Algorithm 4) only modifies the entries of C where X·k is nonzero. [sent-221, score-0.08]

75 Thus, we can parallelize the algorithm with respect to updating either the rows or the columns of C. [sent-222, score-0.11]

76 In contrast, we choose a dense representation of our localizing matrix C; this choice trades space for runtime performance. [sent-224, score-0.121]

77 Each worker thread is assigned a number of rows of C so that all rows fit in the shared L3 cache. [sent-225, score-0.268]

78 Then, each worker thread repeatedly scans X while marking updates to multiple rows of C. [sent-226, score-0.158]

79 We repeat this process until all rows of C are scanned, similar to the classical block-nested loop join in relational databases [22]. [sent-227, score-0.131]

80 We ran H OTTOPIXX for 50 epochs with primal stepsize 1e-1 and dual stepsize 1e-2. [sent-264, score-0.128]

81 Once the hott topics were identified, we fit F using two cleaning epochs of incremental gradient descent for all three algorithms. [sent-265, score-0.824]

82 To generate our instances, we sampled r hott topics uniformly from the unit simplex in Rn . [sent-266, score-0.711]

83 We generated the remaining f − r(d + 1) rows to be random convex combinations of the hott topics, with the combinations selected uniformly at random. [sent-268, score-0.761]
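
The synthetic instances described above can be reproduced, up to our simplifications, with a short generator; this sketch omits the d duplicated copies of each hott topic and only approximately enforces the row-wise ℓ1 noise bound (names and defaults are ours):

    import numpy as np

    def synthetic_separable(f, n, r, eps=0.0, seed=0):
        """Sketch of the synthetic setup: r hott topics on the unit simplex in R^n,
        remaining rows are random convex combinations of the topics, plus optional noise."""
        rng = np.random.default_rng(seed)
        topics = rng.exponential(size=(r, n))
        topics /= topics.sum(axis=1, keepdims=True)          # uniform (Dirichlet(1,...,1)) points on the simplex
        mix = rng.exponential(size=(f - r, r))
        mix /= mix.sum(axis=1, keepdims=True)                # convex combination weights, uniform at random
        Y = np.vstack([topics, mix @ topics])
        if eps > 0:
            noise = rng.uniform(-1.0, 1.0, size=Y.shape)
            noise *= eps / np.abs(noise).sum(axis=1, keepdims=True)  # each row has l1 norm eps
            Y = np.maximum(Y + noise, 0.0)                   # clipping makes the l1 bound only approximate
        return Y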

84 For very high levels of noise, however, it achieves a lower reconstruction error than the AGKM algorithm. Table (data set sizes): jumbo: 1600 features, 64000 documents; clueweb: 44739 features, 351849 documents; RCV1: 47153 features, 781265 documents. [sent-292, score-0.227]

85 Time is to find 100 hott topics on the 12 core machines. [sent-299, score-0.711]

86 (right) The test error on RCV1 CCAT class versus the number of hott topics. [sent-303, score-0.633]

87 We also provide performance profiles for the root-mean-square error of the nonnegative matrix factorizations (Figure 2 (d) and (e)). [sent-306, score-0.38]

88 In contrast to other parallel methods that exhibit memory contention [21], we see superlinear speed-ups for up to 20 threads due to hardware prefetching and cache effects. [sent-315, score-0.106]

89 Our algorithm is able to correctly identify the hott topics on the jumbo set. [sent-317, score-0.791]

90 For RCV1, we trained an SVM on the set of features extracted by H OTTOPIXX and plot the misclassification error versus the number of topics in Figure 3 (right). [sent-320, score-0.148]

91 With 1500 hott topics, we achieve 7% misclassification error as compared to 5. [sent-321, score-0.633]

92 6 Discussion This paper provides an algorithmic and theoretical framework for analyzing and deploying any factorization problem that can be posed as a linear (or convex) factorization localizing program. [sent-323, score-0.502]

93 Future work should investigate the applicability of H OTTOPIXX to other factorization localizing algorithms, such as subspace clustering, and should revisit earlier theoretical bounds on such prior art. [sent-324, score-0.294]

94 BR is generously supported by ONR award N00014-11-1-0723, NSF award CCF-1139953, and a Sloan Research Fellowship. [sent-326, score-0.113]

95 CR is generously supported by NSF CAREER award under IIS-1054009, ONR award N000141210041, and gifts or research awards from American Family Insurance, Google, Greenplum, and Oracle. [sent-327, score-0.113]

96 JAT is generously supported by ONR award N00014-11-1002, AFOSR award FA9550-09-1-0643, and a Sloan Research Fellowship. [sent-328, score-0.113]

97 When does non-negative matrix factorization give a correct decomposition into parts? [sent-378, score-0.281]

98 A convex model for non-negative matrix factorization o and dimensionality reduction on physical space. [sent-397, score-0.305]

99 NMF: A flexible R package for nonnegative matrix factorization. [sent-406, score-0.291]

100 Robustness analysis of hotttopixx, a linear programming model for factoring nonnegative matrices. [sent-412, score-0.293]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hott', 0.608), ('agkm', 0.432), ('nmf', 0.247), ('ottopixx', 0.224), ('nonnegative', 0.224), ('factorization', 0.214), ('simplicial', 0.155), ('cjj', 0.128), ('separable', 0.124), ('rows', 0.11), ('topics', 0.103), ('clueweb', 0.08), ('jumbo', 0.08), ('matrix', 0.067), ('factorizations', 0.064), ('pro', 0.062), ('incremental', 0.058), ('localizing', 0.054), ('cij', 0.054), ('zw', 0.052), ('cii', 0.052), ('rmse', 0.052), ('elhamifar', 0.049), ('hull', 0.048), ('bittorf', 0.048), ('errormin', 0.048), ('cx', 0.047), ('arora', 0.045), ('qa', 0.045), ('les', 0.04), ('award', 0.039), ('minz', 0.039), ('esser', 0.039), ('lagrangian', 0.035), ('factoring', 0.035), ('generously', 0.035), ('speedup', 0.034), ('programming', 0.034), ('qr', 0.033), ('epochs', 0.033), ('pr', 0.033), ('diag', 0.032), ('approximably', 0.032), ('ccat', 0.032), ('interpolative', 0.032), ('rmsemin', 0.032), ('stodden', 0.032), ('vavasis', 0.032), ('dual', 0.031), ('cache', 0.03), ('recht', 0.029), ('donoho', 0.028), ('rf', 0.028), ('afterward', 0.028), ('compiler', 0.028), ('prefetching', 0.028), ('ge', 0.027), ('subspace', 0.026), ('cur', 0.026), ('superlinear', 0.026), ('thread', 0.026), ('tropp', 0.026), ('sp', 0.026), ('find', 0.026), ('indexing', 0.025), ('matlab', 0.025), ('error', 0.025), ('kannan', 0.024), ('moitra', 0.024), ('constraint', 0.024), ('convex', 0.024), ('row', 0.024), ('tr', 0.024), ('ran', 0.024), ('noise', 0.023), ('polyhedral', 0.023), ('synthetic', 0.023), ('express', 0.022), ('gradient', 0.022), ('worker', 0.022), ('cvx', 0.022), ('threads', 0.022), ('nonzeros', 0.022), ('onr', 0.022), ('pt', 0.022), ('lp', 0.021), ('join', 0.021), ('duplicate', 0.021), ('sapiro', 0.021), ('cy', 0.021), ('admits', 0.02), ('features', 0.02), ('robustness', 0.02), ('algorithmic', 0.02), ('stepsize', 0.02), ('sgd', 0.02), ('subgradient', 0.02), ('extracting', 0.02), ('serial', 0.02), ('remaining', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

2 0.21385989 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

3 0.078746051 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

4 0.062927969 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

5 0.060355097 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

6 0.059379902 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

7 0.057601705 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

8 0.054974984 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

9 0.052605726 208 nips-2012-Matrix reconstruction with the local max norm

10 0.049940806 192 nips-2012-Learning the Dependency Structure of Latent Factors

11 0.048999757 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

12 0.04758491 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

13 0.047114946 27 nips-2012-A quasi-Newton proximal splitting method

14 0.046640821 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

15 0.043299019 194 nips-2012-Learning to Discover Social Circles in Ego Networks

16 0.042896591 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

17 0.042742368 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

18 0.042565249 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

19 0.042514529 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

20 0.041905072 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.122), (1, 0.044), (2, 0.034), (3, -0.071), (4, 0.004), (5, 0.03), (6, -0.018), (7, -0.022), (8, 0.046), (9, 0.024), (10, 0.032), (11, -0.003), (12, -0.026), (13, 0.01), (14, 0.069), (15, 0.04), (16, -0.004), (17, 0.038), (18, 0.017), (19, -0.038), (20, -0.093), (21, -0.026), (22, -0.054), (23, -0.059), (24, -0.082), (25, -0.028), (26, 0.04), (27, 0.009), (28, 0.075), (29, 0.028), (30, 0.009), (31, 0.063), (32, 0.013), (33, 0.03), (34, -0.031), (35, -0.017), (36, 0.008), (37, 0.037), (38, -0.029), (39, 0.091), (40, -0.004), (41, -0.007), (42, 0.12), (43, -0.08), (44, 0.063), (45, 0.018), (46, -0.004), (47, -0.034), (48, 0.063), (49, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89372748 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

2 0.66134238 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

3 0.62966508 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

4 0.59678894 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

Author: Dijun Luo, Heng Huang, Feiping Nie, Chris H. Ding

Abstract: In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semisupervised learning and unsupervised learning, the evaluation of the quality of a graph is often expensive and sometimes even impossible, due the cost or the unavailability of ground truth. In this paper, we proposed a robust approach with convex optimization to “forge” a graph: with an input of a graph, to learn a graph with higher quality. Our major concern is that an ideal graph shall satisfy all the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain global optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications machine learning and data mining.

5 0.58360666 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

Author: S. D. Babacan, Shinichi Nakajima, Minh Do

Abstract: In this paper, we consider the problem of clustering data points into lowdimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1

6 0.56194836 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

7 0.55783188 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

8 0.52233148 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

9 0.52021432 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

10 0.51000565 128 nips-2012-Fast Resampling Weighted v-Statistics

11 0.50428277 208 nips-2012-Matrix reconstruction with the local max norm

12 0.50415814 86 nips-2012-Convex Multi-view Subspace Learning

13 0.48930532 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

14 0.48418114 69 nips-2012-Clustering Sparse Graphs

15 0.47232097 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

16 0.45496026 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

17 0.43649268 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

18 0.43299031 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

19 0.43225035 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

20 0.42659611 22 nips-2012-A latent factor model for highly multi-relational data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.051), (13, 0.246), (14, 0.013), (17, 0.013), (21, 0.031), (38, 0.143), (39, 0.015), (42, 0.025), (54, 0.059), (55, 0.024), (68, 0.018), (74, 0.049), (76, 0.122), (80, 0.062), (92, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76716095 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

2 0.70798707 361 nips-2012-Volume Regularization for Binary Classification

Author: Koby Crammer, Tal Wagner

Abstract: We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, and specifically axis-aligned boxes. Our learning algorithm seeks for a box of large volume that contains “simple” weight vectors which most of are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets. 1

3 0.65687054 200 nips-2012-Local Supervised Learning through Space Partitioning

Author: Joseph Wang, Venkatesh Saligrama

Abstract: We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.

4 0.65112454 38 nips-2012-Algorithms for Learning Markov Field Policies

Author: Abdeslam Boularias, Jan R. Peters, Oliver B. Kroemer

Abstract: We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are supposed to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e. smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects. 1

5 0.65080047 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

6 0.64755982 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

7 0.64700675 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

8 0.64618576 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

9 0.64578784 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

10 0.64522707 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

11 0.64402366 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

12 0.64375055 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

13 0.64338875 160 nips-2012-Imitation Learning by Coaching

14 0.6430524 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

15 0.64242542 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

16 0.64149594 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

17 0.64112025 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

18 0.64099532 69 nips-2012-Clustering Sparse Graphs

19 0.64096928 344 nips-2012-Timely Object Recognition

20 0.64085484 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress