nips nips2010 nips2010-136 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. [sent-6, score-0.473]
2 Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. [sent-8, score-0.299]
3 We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems. [sent-9, score-0.188]
4 1 Introduction Many computer vision problems such as SfM [26], non-rigid SfM [3] and photometric stereo [11] can be formulated as a matrix factorization problem. [sent-10, score-0.489]
5 In all these problems, the measured data are observations of the elements of an m × n measurement matrix M of known rank r. [sent-11, score-0.292]
6 The objective is to factorize this measurement matrix M into factors A and B, of dimensions m × r and n × r respectively, such that the error ||M − AB^T|| is minimized. [sent-12, score-0.162]
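To make the objective concrete, here is a minimal sketch that evaluates the factorization error restricted to the observed entries; the binary mask W and all function names are our own illustration, not the authors' code.

```python
import numpy as np

def masked_factorization_error(M, W, A, B):
    """Frobenius error of M - A B^T over the observed entries only.

    M : m x n measurement matrix (unobserved entries may hold anything)
    W : m x n binary mask, 1 where M_ij is observed
    A : m x r factor, B : n x r factor
    """
    residual = W * (M - A @ B.T)   # zero out the unobserved residuals
    return np.linalg.norm(residual, "fro")

# tiny usage example: rank-2 ground truth with ~40% missing entries
rng = np.random.default_rng(0)
A0, B0 = rng.standard_normal((6, 2)), rng.standard_normal((5, 2))
M = A0 @ B0.T
W = (rng.random(M.shape) > 0.4).astype(float)
print(masked_factorization_error(M, W, A0, B0))   # ~0 at the ground truth
```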
7 Matrix factorization with missing data is a difficult non-convex problem with no known globally convergent algorithm. [sent-17, score-0.364]
8 The damped Newton algorithm [4], a variant of Newton’s method, is one of the most popular algorithms for solving this problem. [sent-18, score-0.284]
9 We formulate the matrix factorization with missing data problem as an LRSDP [6], essentially a rank-constrained semidefinite program (SDP) that was proposed to solve large SDPs efficiently. [sent-20, score-0.623]
10 The advantages of formulating the matrix factorization problem as an LRSDP problem are the following: 1) it inherits the efficiency of the LRSDP algorithm. [sent-21, score-0.335]
11 2) Many additional constraints, such as the ortho-normality constraints for the orthographic SfM, can be easily incorporated into the LRSDP-based factorization formulation; this is possible because of the flexible framework of the LRSDP (see section 2). [sent-23, score-0.431]
12 Prior Work Algorithms for matrix factorization in the presence of missing data can be broadly divided into two main categories: initialization algorithms and iterative algorithms. [sent-24, score-0.506]
13 Alternation algorithms [23, 28, 12, 1, 2, 14], damped Newton algorithm [4] and our approach fall under this category. [sent-27, score-0.25]
14 To solve this problem, damped Newton and hybrid algorithms combining damped Newton and alternation were proposed in [4]. [sent-30, score-0.759]
15 The matrix factorization with missing data problem is closely related to the matrix completion problem [9]. [sent-33, score-0.722]
16 The goal of matrix completion is to find a low-rank matrix which agrees with the observed entries of the matrix M . [sent-34, score-0.539]
17 Some of them [16, 15, 20] are formulated as matrix factorization problems. [sent-36, score-0.335]
18 Matrix factorization also arises while solving the collaborative filtering problem. [sent-38, score-0.288]
19 In [24], collaborative filtering is formulated as a matrix completion problem and solved using a semidefinite program. [sent-40, score-0.277]
20 The standard SDP has the form: min C • X subject to A_i • X = b_i, i = 1, …, k, X ⪰ 0 (2), where C and the A_i are n × n real symmetric matrices, b is a k-dimensional vector, and X is an n × n matrix variable, which is required to be symmetric and positive semidefinite, as indicated by the constraint X ⪰ 0. [sent-48, score-0.137]
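For reference, a minimal sketch of the standard-form SDP (2) using CVXPY (our choice of tooling, not the paper's); the data are random but generated so the constraints are feasible, and the cost matrix is kept positive semidefinite so the toy problem stays bounded.

```python
import numpy as np
import cvxpy as cp   # needs an SDP-capable solver such as SCS

n, k = 4, 3
rng = np.random.default_rng(1)
G = rng.standard_normal((n, n))
C = G @ G.T                                        # PSD cost keeps the problem bounded
As = [(S + S.T) / 2 for S in rng.standard_normal((k, n, n))]
X0 = rng.standard_normal((n, n)); X0 = X0 @ X0.T   # a feasible PSD point
b = np.array([np.trace(A @ X0) for A in As])       # equality constraints hold at X0

X = cp.Variable((n, n), PSD=True)                  # encodes the constraint X >= 0
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                  [cp.trace(As[l] @ X) == b[l] for l in range(k)])
prob.solve()
print(prob.status, prob.value)
```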
21 However, these are second-order methods, which need to store and factorize a large (and often dense) matrix and hence are not suitable for solving large-scale problems. [sent-51, score-0.15]
22 In LRSDP a change of variables is introduced as X = RR^T, where R is a real n × r matrix with r ≤ n. [sent-52, score-0.116]
23 3 Matrix factorization using LRSDP (MF-LRSDP). In this section, we formulate the matrix factorization with missing data problem as an LRSDP problem. [sent-65, score-0.732]
24 In section 3.1, we look at the noiseless case, that is, where the measurement matrix M is not corrupted with noise, followed by the noisy measurement case in section 3.2. [sent-67, score-0.342]
25 In section 3.3, we look at how additional constraints can be incorporated in the LRSDP formulation. [sent-69, score-0.129]
26 Define the (m + n) × r matrix R = [A; B] (A stacked above B). Then RR^T = [AA^T, AB^T; BA^T, BB^T] (5). We observe that the cost function ||A||_F^2 + ||B||_F^2 can be expressed as trace(RR^T) and the constraints as (RR^T)_{i,j+m} = M_{i,j}. [sent-73, score-0.173]
27 Thus, (4) is equivalent to: min_R trace(RR^T) subject to (RR^T)_{i,j+m} = M_{i,j} for (i, j) ∈ Ω (6). This is already in the LRSDP form, since we can express the above equation as min_R C • RR^T subject to A_l • RR^T = b_l, l = 1, …, |Ω| (7), with C the identity matrix (so that C • RR^T = trace(RR^T)). [sent-74, score-0.129]
28 The A_l are sparse matrices with non-zero entries at indices (i, j+m) and (j+m, i) equal to 1/2, and b_l = M_{i,j}. [sent-81, score-0.169]
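A small sketch of this constraint construction, assuming scipy.sparse; the 1/2 entries make A_l • RR^T pick out exactly (RR^T)_{i,j+m} because RR^T is symmetric. The helper name is our own.

```python
import numpy as np
from scipy.sparse import coo_matrix

def constraint_matrix(i, j, m, n):
    """Sparse A_l for observation M_ij: value 1/2 at (i, j+m) and (j+m, i)."""
    N = m + n
    return coo_matrix(([0.5, 0.5], ([i, j + m], [j + m, i])), shape=(N, N))

# sanity check: A_l . (R R^T) equals the (i, j+m) entry, i.e. A_i . B_j
m, n, r = 3, 4, 2
R = np.random.randn(m + n, r)
Al = constraint_matrix(1, 2, m, n)
inner = (Al.toarray() * (R @ R.T)).sum()
print(np.isclose(inner, R[1] @ R[m + 2]))   # True
```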
29 This completes the formulation of the matrix factorization problem as an LRSDP problem for the noiseless case. [sent-82, score-0.429]
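To illustrate how such an LRSDP can be attacked numerically, here is a toy augmented-Lagrangian solver in the spirit of Burer and Monteiro's method [6], specialized to the noiseless constraints (6); the penalty schedule, multiplier update, and use of SciPy's L-BFGS with numerical gradients are our own simplifications, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mf_lrsdp_sketch(M, obs, m, n, r, outer=15, sigma=10.0, seed=0):
    """Toy solver for min ||R||_F^2 s.t. (R R^T)_{i,j+m} = M_ij, (i,j) in obs."""
    rng = np.random.default_rng(seed)
    I, J = obs[:, 0], obs[:, 1]
    b = M[I, J]
    y = np.zeros(len(b))                    # Lagrange multipliers
    R = 0.1 * rng.standard_normal((m + n, r))

    def c(Rmat):                            # constraint residuals A_i . B_j - M_ij
        A, B = Rmat[:m], Rmat[m:]
        return np.einsum("ik,ik->i", A[I], B[J]) - b

    def aug_lagrangian(rvec):
        Rmat = rvec.reshape(m + n, r)
        res = c(Rmat)
        return (Rmat ** 2).sum() - y @ res + 0.5 * sigma * (res ** 2).sum()

    for _ in range(outer):
        R = minimize(aug_lagrangian, R.ravel(), method="L-BFGS-B").x.reshape(m + n, r)
        y -= sigma * c(R)                   # first-order multiplier update
        sigma *= 2.0                        # tighten the penalty
    return R[:m], R[m:]

# toy run: recover a 20 x 15 rank-2 matrix from ~60% of its entries
rng = np.random.default_rng(1)
m, n, r = 20, 15, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T
obs = np.argwhere(rng.random((m, n)) < 0.6)
A, B = mf_lrsdp_sketch(M, obs, m, n, r)
print(np.linalg.norm(A @ B.T - M) / np.linalg.norm(M))   # small if recovered
```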
30 With this definition, RR^T is a block-diagonal matrix given by RR^T = [AA^T, AB^T, 0; BA^T, BB^T, 0; 0, 0, EE^T] (11). We can now express (8) in the following LRSDP form: min_R C • RR^T subject to A_l • RR^T = b_l, l = 1, …, |Ω|+1, with C a block-diagonal matrix whose upper-left block is λI_{(m+n)×(m+n)}. [sent-95, score-0.213]
31 This is because the last constraint is used to set E_{|Ω|+1} = 1, which is done by choosing A_{|Ω|+1} to be a sparse matrix with the non-zero entry at index (|Ω|+1+m+n, |Ω|+1+m+n) equal to 1 and b_{|Ω|+1} = 1. [sent-99, score-0.137]
32 For the remaining values of l, the A_l are sparse matrices with non-zero entries at indices (i, j+m), (j+m, i), (|Ω|+1+m+n, l+m+n) and (l+m+n, |Ω|+1+m+n) equal to 1/2, and b_l = M_l. [sent-100, score-0.169]
33 Suppose that m/2 cameras are looking at n 3-D points; then, under the affine camera model, the 2-D imaged points can be arranged as an m × n measurement matrix M with columns corresponding to the n 3-D points and rows corresponding to the m/2 cameras (2 consecutive rows per camera) [26]. [sent-108, score-0.277]
34 Under this arrangement, M can be factorized as M = AB^T, where A is an m × 4 camera matrix and B is an n × 4 structure matrix whose last column is an all-one vector. [sent-109, score-0.329]
35 Thus, M is a rank-4 matrix with a special structure for the last column of B. [sent-110, score-0.254]
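The arrangement of tracks into M is mechanical; the track-array layout below is our reading of the convention in [26] and is only an illustrative sketch.

```python
import numpy as np

def tracks_to_measurement_matrix(tracks):
    """Stack 2-D tracks into the m x n affine-SfM measurement matrix.

    tracks : array of shape (n_cameras, n_points, 2); row 2c of M holds the
    x-coordinates seen by camera c and row 2c+1 the y-coordinates (use NaN
    for points not visible in a camera, i.e. the missing entries).
    """
    F, P, _ = tracks.shape
    M = np.empty((2 * F, P))
    M[0::2] = tracks[:, :, 0]
    M[1::2] = tracks[:, :, 1]
    return M
```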
36 Further, under the orthographic camera model, A has more structure (constraints): each pair of rows that corresponds to the same camera is orthonormal. [sent-111, score-0.243]
37 To state these constraints precisely, we decompose the matrix A as A = [P t], where P is an m × 3 sub-matrix consisting of the first three columns and t is the last column vector. [sent-112, score-0.201]
38 We can now express the camera orthonormality constraint through the PP^T matrix, whose diagonal elements should be 1 (normality constraint) and appropriate off-diagonal elements should be 0 (orthogonality constraint). [sent-113, score-0.156]
39 This completes our illustration on the incorporation of the ortho-normality constraints for the orthographic SfM case. [sent-123, score-0.191]
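A compact sketch of these orthonormality constraints as residuals on PP^T follows; in the actual LRSDP each residual would become one more A_l • RR^T = b_l constraint. The helper names are ours.

```python
import numpy as np
from scipy.stats import special_ortho_group

def orthonormality_residuals(P):
    """Residuals of the orthographic constraints on P (m x 3, m even):
    for each camera c, rows 2c and 2c+1 of P must be unit-norm (diagonal of
    P P^T equal to 1) and mutually orthogonal (off-diagonal equal to 0)."""
    G = P @ P.T
    res = []
    for c in range(P.shape[0] // 2):
        i = 2 * c
        res += [G[i, i] - 1.0, G[i + 1, i + 1] - 1.0, G[i, i + 1]]
    return np.array(res)

# the first two rows of any rotation matrix satisfy the constraints exactly
P = np.vstack([special_ortho_group.rvs(3, random_state=s)[:2] for s in range(4)])
print(np.allclose(orthonormality_residuals(P), 0))   # True
```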
40 4 Matrix Completion, Uniqueness and Convergence of MF-LRSDP. In this section, we state the main result of matrix completion theory and discuss its implications for the matrix factorization problem. [sent-125, score-0.577]
41 This follows from Theorem 1.3 of [9], which states that if 1) the matrix M that we want to recover has row and column spaces incoherent with the standard basis, and 2) we are given enough entries (≥ O(r d^{6/5} log d), where d = max(m, n)), then there exists a unique low-rank solution to (16). [sent-130, score-0.181]
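As a rough order-of-magnitude check for the experiments below (constants in the bound are ignored), a back-of-the-envelope computation:

```python
import numpy as np

d, r = 500, 5                      # matches the 500 x 500 rank-5 experiments
bound = r * d ** (6 / 5) * np.log(d)
print(bound, d * d)                # ~5.4e4 required entries vs 2.5e5 total
```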
42 For example, in our LRSDP formulation, we have imposed this rank constraint by fixing the number of columns of the factors A and B to r. [sent-134, score-0.131]
43 However, though the matrix completion and factorization problems are defined differently, they are closely related as revealed by their very similar Lagrangian formulations. [sent-135, score-0.495]
44 This fact has been used in solving the matrix completion problem via matrix factorization with an appropriate rank [16, 15, 20]. [sent-136, score-0.721]
45 We should also note that matrix completion theory helps us answer the question raised in [4]: when is missing data matrix factorization unique (up to a gauge)? [sent-137, score-0.722]
46 And from the discussion in the previous section, it should be clear that the conditions of matrix completion theory are sufficient for guaranteeing the required uniqueness. [sent-138, score-0.242]
47 5.1 Evaluation with Synthetic Data. The important parameters in the matrix factorization with missing data problem are: the size of the matrix M, characterized by m and n; the rank r; the fraction of missing data; and the variance σ² of the observation noise. [sent-142, score-0.879]
48 We evaluate the factorization algorithms by varying these parameters. [sent-143, score-0.245]
49 For synthetic data without noise, we generate n × n matrices M of rank r by M = AB^T, where A and B are n × r random matrices with each entry sampled independently from a standard Gaussian distribution N(0, 1). [sent-145, score-0.268]
50 Each entry is then revealed randomly according to the missing data fraction. [sent-146, score-0.179]
51 For synthetic data with noise, we add independent Gaussian noise N(0, σ²) to the observed entries generated as above. [sent-147, score-0.127]
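A sketch of this synthetic protocol (our own helper; the observed fraction is 1 minus the missing-data fraction):

```python
import numpy as np

def synthetic_problem(n, r, missing_frac, sigma=0.0, seed=0):
    """M = A B^T with i.i.d. N(0,1) factors; entries revealed uniformly at
    random; optional additive N(0, sigma^2) noise on the observed entries."""
    rng = np.random.default_rng(seed)
    A, B = rng.standard_normal((n, r)), rng.standard_normal((n, r))
    M = A @ B.T
    W = rng.random((n, n)) > missing_frac        # True where M_ij is revealed
    M_obs = np.where(W, M + sigma * rng.standard_normal((n, n)), 0.0)
    return M, M_obs, W

M, M_obs, W = synthetic_problem(n=500, r=5, missing_frac=0.7, sigma=0.01)
print(W.mean())   # fraction of revealed entries, ~0.3 here
```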
52 We study the reconstruction rate of different algorithms by varying the fraction of revealed entries per column (|Ω|/n) for noiseless 500 × 500 matrices of rank 5. [sent-149, score-0.522]
53 We declare a matrix to be reconstructed if ||M̂ − M||_F / ||M||_F ≤ 10^{−4}, where M̂ = ÂB̂^T is the reconstructed matrix and ||·||_F denotes the Frobenius norm. [sent-150, score-0.38]
54 Reconstruction rate is defined as the fraction of trials for which the matrix was successfully reconstructed. [sent-152, score-0.19]
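These two definitions translate directly into code; a minimal sketch (our naming):

```python
import numpy as np

def is_reconstructed(M, M_hat, tol=1e-4):
    """Relative Frobenius error test ||M_hat - M||_F / ||M||_F <= 1e-4."""
    return np.linalg.norm(M_hat - M, "fro") / np.linalg.norm(M, "fro") <= tol

def reconstruction_rate(trial_pairs):
    """trial_pairs: iterable of (M, M_hat); returns fraction of successes."""
    return float(np.mean([is_reconstructed(M, Mh) for M, Mh in trial_pairs]))
```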
55 Figure 1(a) shows the reconstruction rate by MF-LRSDP, alternation and OptSpace. [sent-154, score-0.413]
56 MF-LRSDP gives the best reconstruction results as it needs fewer observations for matrix reconstruction than the other algorithms. [sent-155, score-0.353]
57 For a similar comparison with other matrix completion algorithms such as ADMiRA [16], SVT [8] and FPCA [17], the interested reader can look at [15], where OptSpace was shown to be consistently better than these algorithms. [sent-158, score-0.29]
58 Note that we have not included the damped Newton algorithm in this comparison because it is very slow for matrices of this size. [sent-160, score-0.289]
59 Reconstruction rate vs. fraction of revealed entries per column |Ω|/n for 500 × 500 matrices of rank 5 by MF-LRSDP, alternation and OptSpace. [sent-166, score-0.615]
60 The proposed algorithm MF-LRSDP gives the best reconstruction results since it can reconstruct matrices from fewer observed entries. [sent-167, score-0.2]
61 Reconstruction rate vs. fraction of revealed entries per column |Ω|/n for different sizes n of rank-5 square matrices by MF-LRSDP and OptSpace. [sent-172, score-0.33]
62 Figure 2(a) shows that MF-LRSDP reconstructs matrices from fewer observed entries than OptSpace. [sent-173, score-0.184]
63 Reconstruction rate vs. |Ω|/n as we vary the rank r of 500 × 500 matrices. [sent-176, score-0.13]
64 Reconstruction rate vs. fraction of revealed entries per column |Ω|/n for rank-5 square matrices of different sizes n by MF-LRSDP and OptSpace. [sent-191, score-0.33]
65 MF-LRSDP reconstructs matrices from fewer observed entries than OptSpace. [sent-192, score-0.184]
66 RMSE vs. noise standard deviation for rank-5, 200 × 200 matrices by MF-LRSDP, OptSpace, alternation and damped Newton. [sent-197, score-0.718]
67 We vary the standard deviation σ of the additive noise for rank-5, 200 × 200 matrices and study the performance of MF-LRSDP, OptSpace, alternation and damped Newton. [sent-201, score-0.738]
68 As discussed in section 3.3, for affine SfM, the m × n measurement matrix M is a rank-4 matrix with the last column of matrix B an all-one vector. [sent-208, score-0.532]
69 M is generally an incomplete matrix because not all the points are visible in all the cameras. [sent-209, score-0.139]
70 We evaluate the performance of MF-LRSDP on the ‘Dinosaur’ sequence used in [4, 7], for which M is a 72 × 319 matrix with 72% missing entries. [sent-210, score-0.261]
71 We perform 25 trials, and at each trial we provide the same random initializations to MF-LRSDP, alternation and damped Newton (OptSpace has its own initialization technique). [sent-211, score-0.529]
72 MF-LRSDP gives the best performance, followed by damped Newton, alternation and OptSpace. [sent-214, score-0.533]
73 In this case, the m × n measurement matrix M can be expressed as M = AB^T, where A is an m × 3b matrix and B is an n × 3b matrix, b being the number of shape basis elements [3]. [sent-218, score-0.394]
74 We test the performance of the algorithms on the ‘Giraffe’ sequence [4, 7] for which M is a 240 × 167 matrix with 30% missing entries. [sent-220, score-0.287]
75 Figure 3 shows the cumulative histogram of 25 trials from which we conclude that MF-LRSDP, alternation and damped Newton give good results. [sent-222, score-0.612]
76 Suppose we have n images of the object under different lighting conditions, with each image consisting of m pixels (m surface normals), and we arrange them as an m × n measurement matrix M. [sent-225, score-0.162]
77 Then, under Lambertian assumptions, we can express M as M = AB^T, where A is an m × 3 matrix representing the surface normals and reflectance, and B is an n × 3 matrix representing the light-source directions and intensities [11]. [sent-226, score-0.292]
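For the fully observed Lambertian case, this rank-3 factorization can be obtained directly from a truncated SVD, a useful baseline for the missing-data setting studied here; recovery is only up to an invertible 3 × 3 ambiguity, and this sketch is not the paper's algorithm.

```python
import numpy as np

def lambertian_factors(M):
    """Best rank-3 factorization M ~ A B^T of a fully observed intensity
    matrix: A relates to scaled surface normals, B to scaled light directions
    (each determined only up to a common invertible 3 x 3 transform)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :3] * np.sqrt(s[:3])
    B = Vt[:3].T * np.sqrt(s[:3])
    return A, B
```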
78 We test the algorithms on the ‘Face’ sequence [4, 7] for which M is a 2944 × 20 matrix with 42% missing entries. [sent-230, score-0.287]
79 The cumulative histogram in figure 3 shows that MF-LRSDP and damped Newton give the best results, followed by alternation and OptSpace. [sent-231, score-0.616]
80 [Figure 3: cumulative histograms of RMS pixel error over 25 trials for MF-LRSDP, alternation, damped Newton and OptSpace on (a) the Dinosaur sequence, (b) the Giraffe sequence and (c) the Face sequence.] [sent-232, score-0.123] [sent-236, score-0.224]
82 Orthographic SfM is a special case of affine SfM, where the camera matrix A satisfies the additional constraint of ortho-normality; see section 3.3. [sent-244, score-0.226]
83 Figure 4 shows the input point tracks, reconstructed point tracks without the constraints and reconstructed point tracks with the constraints for the Dinosaur turntable sequence. [sent-247, score-0.578]
84 Without the constraints many tracks fail to be circular, whereas with the constraints all of them are circular (the dinosaur sequence is a turntable sequence and the tracks are supposed to be circular). [sent-248, score-0.623]
86 6 Conclusion and Discussion. We have formulated the matrix factorization with missing data problem as a low-rank semidefinite programming problem, MF-LRSDP. [sent-252, score-0.48]
87 MF-LRSDP is an efficient algorithm that can be used for solving large-scale factorization problems. [sent-253, score-0.253]
88 It is also flexible for handling many additional constraints such as the ortho-normality constraints of orthographic SfM. [sent-254, score-0.239]
89 Our empirical evaluations on synthetic data show that it needs fewer observations for matrix factorization as compared to other algorithms and it gives very good results on the real problems of SfM, non-rigid SfM and photometric stereo. [sent-255, score-0.555]
90 We note that though MF-LRSDP is a non-convex problem, it finds the global minimum under the conditions of matrix completion theory. [sent-256, score-0.242]
91 Damped Newton algorithms for matrix factorization with missing data. [sent-284, score-0.646]
92 Optimization algorithms on subspaces: Revisiting missing data problem in low-rank matrix. [sent-299, score-0.171]
93 Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. [sent-340, score-0.364]
94 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-359, score-0.226]
95 3D reconstruction by fitting low-rank matrices with missing data. [sent-364, score-0.312]
96 On the Wiberg algorithm for matrix factorization in the presence of missing components. [sent-386, score-0.48]
97 Principal component analysis with missing data and its application to polyhedral object modeling. [sent-399, score-0.145]
98 Algorithms for batch matrix factorization with application to structure-from-motion. [sent-416, score-0.335]
99 Shape and motion from image streams under orthography: a factorization method. [sent-421, score-0.219]
100 Motion segmentation with missing data using PowerFactorization and GPCA. [sent-432, score-0.145]
wordName wordTfidf (topN-words)
[('lrsdp', 0.616), ('alternation', 0.285), ('sfm', 0.264), ('damped', 0.224), ('factorization', 0.219), ('optspace', 0.198), ('rrt', 0.171), ('missing', 0.145), ('newton', 0.14), ('ab', 0.138), ('tracks', 0.128), ('completion', 0.126), ('matrix', 0.116), ('rank', 0.11), ('photometric', 0.109), ('dinosaur', 0.105), ('orthographic', 0.105), ('reconstruction', 0.102), ('mf', 0.091), ('reconstructed', 0.074), ('camera', 0.069), ('af', 0.066), ('entries', 0.065), ('matrices', 0.065), ('circular', 0.065), ('turntable', 0.06), ('sdp', 0.059), ('constraints', 0.057), ('semide', 0.056), ('measurement', 0.046), ('giraffe', 0.045), ('stereo', 0.045), ('cumulative', 0.042), ('histogram', 0.041), ('orthonormality', 0.04), ('bl', 0.039), ('noiseless', 0.038), ('el', 0.038), ('collaborative', 0.035), ('rmse', 0.035), ('solving', 0.034), ('revealed', 0.034), ('normals', 0.034), ('noise', 0.034), ('fewer', 0.033), ('formulate', 0.033), ('subject', 0.032), ('ijcv', 0.031), ('incorporated', 0.03), ('bartoli', 0.03), ('bat', 0.03), ('guilbert', 0.03), ('kaushik', 0.03), ('mflrsdp', 0.03), ('mitra', 0.03), ('rama', 0.03), ('sheorey', 0.03), ('umiacs', 0.03), ('cvpr', 0.029), ('corrupted', 0.029), ('completes', 0.029), ('column', 0.028), ('synthetic', 0.028), ('fraction', 0.028), ('corr', 0.028), ('rms', 0.027), ('formulation', 0.027), ('admira', 0.026), ('lambertian', 0.026), ('rate', 0.026), ('algorithms', 0.026), ('express', 0.026), ('augmented', 0.026), ('al', 0.025), ('sameer', 0.024), ('aat', 0.024), ('hadamard', 0.024), ('followed', 0.024), ('evaluations', 0.024), ('incomplete', 0.023), ('cameras', 0.023), ('burer', 0.023), ('supposed', 0.023), ('memory', 0.023), ('singular', 0.022), ('look', 0.022), ('maryland', 0.022), ('rennie', 0.022), ('ltering', 0.021), ('lagrangian', 0.021), ('constraint', 0.021), ('noisy', 0.021), ('tpami', 0.021), ('bb', 0.021), ('ranks', 0.021), ('reconstructs', 0.021), ('additional', 0.02), ('trials', 0.02), ('elements', 0.02), ('vary', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
2 0.11605684 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
3 0.10927834 162 nips-2010-Link Discovery using Graph Feature Tracking
Author: Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, Nicolas Vayatis
Abstract: We consider the problem of discovering links of an evolving undirected graph given a series of past snapshots of that graph. The graph is observed through the time sequence of its adjacency matrix and only the presence of edges is observed. The absence of an edge on a certain snapshot cannot be distinguished from a missing entry in the adjacency matrix. Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which reveal the interest of our methodology. 1
4 0.10070478 94 nips-2010-Feature Set Embedding for Incomplete Data
Author: David Grangier, Iain Melvin
Abstract: We present a new learning strategy for classification problems in which train and/or test data suffer from missing features. In previous work, instances are represented as vectors from some feature space and one is forced to impute missing values or to consider an instance-specific subspace. In contrast, our method considers instances as sets of (feature,value) pairs which naturally handle the missing value case. Building onto this framework, we propose a classification strategy for sets. Our proposal maps (feature,value) pairs into an embedding space and then nonlinearly combines the set of embedded vectors. The embedding and the combination parameters are learned jointly on the final classification objective. This simple strategy allows great flexibility in encoding prior knowledge about the features in the embedding step and yields advantageous results compared to alternative solutions over several datasets. 1
5 0.095075846 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
6 0.089332059 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
7 0.081860892 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
8 0.075054809 231 nips-2010-Robust PCA via Outlier Pursuit
9 0.072185963 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
10 0.060118675 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.051240303 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
12 0.048836723 283 nips-2010-Variational Inference over Combinatorial Spaces
13 0.046988875 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
14 0.045728017 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
15 0.045041334 265 nips-2010-The LASSO risk: asymptotic results and real world examples
16 0.043269422 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
17 0.041455019 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
18 0.040542051 284 nips-2010-Variational bounds for mixed-data factor analysis
19 0.039241347 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
20 0.037590705 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
topicId topicWeight
[(0, 0.113), (1, 0.036), (2, 0.026), (3, 0.019), (4, 0.03), (5, -0.076), (6, 0.024), (7, 0.044), (8, -0.08), (9, -0.005), (10, 0.055), (11, 0.002), (12, 0.14), (13, 0.063), (14, 0.063), (15, 0.02), (16, 0.005), (17, -0.011), (18, 0.031), (19, 0.122), (20, 0.071), (21, 0.063), (22, -0.048), (23, -0.166), (24, 0.1), (25, -0.011), (26, -0.015), (27, -0.018), (28, -0.038), (29, -0.046), (30, -0.062), (31, -0.001), (32, -0.062), (33, 0.063), (34, -0.022), (35, -0.019), (36, -0.008), (37, 0.007), (38, 0.047), (39, -0.005), (40, 0.012), (41, -0.054), (42, -0.013), (43, 0.125), (44, -0.013), (45, -0.014), (46, -0.058), (47, -0.016), (48, 0.032), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.94198883 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
2 0.82483584 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
3 0.6963473 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
4 0.68937737 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
Author: Nathan Srebro, Ruslan Salakhutdinov
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
5 0.66448528 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
6 0.63227504 231 nips-2010-Robust PCA via Outlier Pursuit
7 0.60625535 162 nips-2010-Link Discovery using Graph Feature Tracking
8 0.55167145 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
9 0.50843036 45 nips-2010-CUR from a Sparse Optimization Viewpoint
10 0.4640246 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.45161575 94 nips-2010-Feature Set Embedding for Incomplete Data
12 0.44512668 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
13 0.42407382 221 nips-2010-Random Projections for $k$-means Clustering
14 0.4191705 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
15 0.38815421 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
16 0.38309515 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
17 0.37127423 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
18 0.35716212 287 nips-2010-Worst-Case Linear Discriminant Analysis
19 0.35185143 284 nips-2010-Variational bounds for mixed-data factor analysis
20 0.34690627 265 nips-2010-The LASSO risk: asymptotic results and real world examples
topicId topicWeight
[(13, 0.097), (16, 0.335), (17, 0.011), (27, 0.057), (30, 0.052), (35, 0.015), (45, 0.125), (50, 0.062), (52, 0.043), (60, 0.018), (77, 0.044), (90, 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.70001346 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
2 0.6127035 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan
Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1
3 0.58862567 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions
Author: Silvia Chiappa, Jan R. Peters
Abstract: Many time-series such as human movement data consist of a sequence of basic actions, e.g., forehands and backhands in tennis. Automatically extracting and characterizing such actions is an important problem for a variety of different applications. In this paper, we present a probabilistic segmentation approach in which an observed time-series is modeled as a concatenation of segments corresponding to different basic actions. Each segment is generated through a noisy transformation of one of a few hidden trajectories representing different types of movement, with possible time re-scaling. We analyze three different approximation methods for dealing with model intractability, and demonstrate how the proposed approach can successfully segment table tennis movements recorded using a robot arm as haptic input device. 1
4 0.53914958 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2 . Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by MetropolisHastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2 . (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
5 0.52180791 212 nips-2010-Predictive State Temporal Difference Learning
Author: Byron Boots, Geoffrey J. Gordon
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 1
6 0.48102662 117 nips-2010-Identifying graph-structured activation patterns in networks
7 0.47724584 221 nips-2010-Random Projections for $k$-means Clustering
8 0.47530526 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.47395492 45 nips-2010-CUR from a Sparse Optimization Viewpoint
10 0.47296867 261 nips-2010-Supervised Clustering
11 0.47046089 284 nips-2010-Variational bounds for mixed-data factor analysis
12 0.46751571 262 nips-2010-Switched Latent Force Models for Movement Segmentation
13 0.46721187 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
14 0.46641275 265 nips-2010-The LASSO risk: asymptotic results and real world examples
16 0.46469659 192 nips-2010-Online Classification with Specificity Constraints
17 0.46391732 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
18 0.46319911 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
19 0.46228427 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
20 0.46210596 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication