nips nips2013 nips2013-307 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. [sent-6, score-0.87]
2 In many real tasks, side information in addition to the observed entries is often available. [sent-7, score-0.321]
3 In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries. [sent-8, score-0.7]
4 We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). [sent-9, score-0.612]
5 We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. [sent-10, score-0.849]
6 1 Introduction Matrix completion concerns the problem of recovering a low-rank matrix from a limited number of observed entries. [sent-11, score-0.566]
7 Recent studies show that, with a high probability, we can efficiently recover a matrix M ∈ Rn×m of rank r from O(r(n + m) ln²(n + m)) observed entries when the observed entries are uniformly sampled from M [11, 12, 34]. [sent-13, score-0.565]
8 Moreover, current techniques for matrix completion require solving an optimization problem that can be computationally prohibitive when the size of the matrix is very large. [sent-17, score-0.657]
9 On the other hand, in several applications of matrix completion, besides the observed entries, side information is often available that can potentially benefit the process of matrix completion. [sent-20, score-0.546]
10 Below we list a few examples: • Collaborative filtering aims to predict ratings of individual users based on the ratings from other users [35]. [sent-21, score-0.186]
11 Besides the ratings provided by users, side information, such as the textual description of items and the demographic information of users, is often available and can be used to facilitate the prediction of missing ratings. [sent-22, score-0.238]
12 • Link prediction, aiming to predict missing links between users in a social network based on the existing ones, can be viewed as a matrix completion problem [20], where side information, such as attributes of users, is often available. [sent-23, score-0.793]
13 Although several studies exploit side information for matrix recovery [1, 2, 3, 16, 29, 32, 33], most of them focus on matrix factorization techniques, which usually result in non-convex optimization problems without guarantee of perfectly recovering the target matrix. [sent-26, score-0.779]
14 In contrast, matrix completion deals with convex optimization problems and perfect recovery is guaranteed under appropriate conditions. [sent-27, score-0.645]
15 In this work, we focus on exploiting side information to improve the sample complexity and scalability of matrix completion. [sent-28, score-0.321]
16 We assume that besides the observed entries in the matrix M , there exist two side information matrices A ∈ Rn×ra and B ∈ Rm×rb , where r ≤ ra ≤ n and r ≤ rb ≤ m. [sent-29, score-1.208]
17 We further assume the target matrix and the side information matrices share the same latent information; that is, the column and row vectors in M lie in the subspaces spanned by the column vectors in A and B, respectively. [sent-30, score-0.572]
18 We show that, with the assistance of side information matrices, with a high probability, we can perfectly recover M with O(r(ra + rb ) ln(ra + rb ) ln(n + m)) observed entries, a sample complexity that is sublinear in n and m. [sent-32, score-1.064]
19 We demonstrate the effectiveness of matrix completion with side information in transductive incomplete multi-label learning [17], which aims to assign multiple labels to individual instances in a transductive learning setting. [sent-33, score-1.31]
20 We formulate transductive incomplete multi-label learning as a matrix completion problem, i.e., [sent-34, score-0.826]
21 completing the instance-label matrix based on the observed entries that correspond to the given label assignments. [sent-36, score-0.43]
22 Both the feature vectors of instances and the class correlation matrix can be used as side information. [sent-37, score-0.417]
23 Our empirical study shows that the proposed approach is particularly effective when the number of given label assignments is small, verifying our theoretical result, i.e., [sent-38, score-0.186]
24 that side information can be used to reduce the sample complexity. [sent-40, score-0.171]
25 2 Related work Matrix Completion The objective of matrix completion is to fill out the missing entries of a matrix based on the observed ones. [sent-45, score-0.812]
26 Early work on matrix completion, also referred to as maximum margin matrix factorization [37], was developed for collaborative filtering. [sent-46, score-0.39]
27 Theoretical studies show that it is sufficient to perfectly recover a matrix M ∈ Rn×m of rank r when the number of observed entries is O(r(n + m) ln²(n + m)) [11, 12, 34]. [sent-47, score-0.472]
28 A more general matrix recovery problem, referred to as matrix regression, was examined in [30, 36]. [sent-48, score-0.365]
29 Unlike these studies, our proposed approach reduces the sample complexity with the help of side information matrices. [sent-49, score-0.194]
30 Several computational algorithms [10, 22, 23, 25, 27, 28, 39] have been developed to efficiently solve the optimization problem of matrix completion. [sent-50, score-0.174]
31 The main problem with these algorithms lies in the fact that they have to explicitly update the full matrix of size n × m, which is expensive both computationally and storage-wise for large matrices. [sent-51, score-0.224]
32 This issue has been addressed in several recent studies [5, 19], where the key idea is to store and update the low rank factorization of the target matrix. [sent-52, score-0.164]
33 A preliminary convergence analysis is given in [19]; however, none of these approaches guarantees perfect recovery of the target matrix, even with a significantly larger number of observed entries. [sent-53, score-0.218]
34 In contrast, our proposed approach reduces the computational cost by explicitly exploring the side information matrices and still delivers the promise of perfect recovery. [sent-54, score-0.292]
35 Several recent studies involve matrix recovery with side information. [sent-55, score-0.432]
36 [2, 3, 29, 33] are based on graphical models by assuming special distribution of latent factors; these algorithms, as well as [16] and [32], consider side information in matrix factorization. [sent-56, score-0.321]
37 The main limitation lies in the fact that they have to solve non-convex optimization problems, and do not have theoretical guarantees on matrix recovery. [sent-57, score-0.196]
38 Matrix completion with infinite-dimensional side information was exploited in [1], yet it lacks a guarantee of perfect recovery. [sent-58, score-0.546]
39 In contrast, our work is based on matrix completion theory that deals with a general convex optimization problem and is guaranteed to make a perfect recovery of the target matrix. [sent-59, score-0.687]
40 In this work, we evaluate our proposed approach on transductive incomplete multi-label learning [17]. [sent-63, score-0.366]
41 Let X = (x1, . . . , xn)⊤ ∈ Rn×d be the feature matrix with xi ∈ Rd, where n is the number of instances and d is the dimension. [sent-67, score-0.217]
42 Let C1, . . . , Cm denote the m labels, and let T ∈ {−1, +1}n×m be the instance-label matrix, where Ti,j = +1 when xi is associated with the label Cj, and Ti,j = −1 when xi is not associated with the label Cj. [sent-71, score-0.21]
43 Let Ω denote the subset of the observed entries in T that corresponds to the given label assignments of instances. [sent-72, score-0.313]
44 The objective of transductive incomplete multi-label learning is to predict the missing entries in T based on the feature matrix X and the given label assignments in Ω. [sent-73, score-0.809]
45 The main challenge lies in the fact that only a partial label assignment is given for each training instance. [sent-74, score-0.159]
46 This is in contrast to many studies on common semi-supervised or transductive multi-label learning [18, 24, 26, 43] where each labeled instance receives a complete set of label assignments. [sent-75, score-0.371]
47 In [17], a matrix completion based approach was proposed for transductive incomplete multi-label learning. [sent-78, score-0.849]
48 To effectively exploit the information in the feature matrix X, the authors proposed to complete the matrix T ′ = [X, T ] that combines the input features with label assignments into a single matrix. [sent-79, score-0.506]
49 Unlike MC-1 and MC-b, our proposed approach does not need to deal with the big matrix T ′ , and is computationally more efficient. [sent-82, score-0.173]
50 Besides the computational advantage, we show that our proposed approach significantly improves the sample complexity of matrix completion by exploiting side information matrices. [sent-83, score-0.677]
51 3 Speedup Matrix Completion with Side Information We first describe the framework of matrix completion with side information, and then present its theoretical guarantee and its application to multi-label learning. [sent-84, score-0.654]
52 3.1 Matrix Completion using Side Information Let M ∈ Rn×m be the target matrix of rank r to be recovered. [sent-85, score-0.231]
53 Let Ω ⊆ {1, . . . , n} × {1, . . . , m} be the subset of indices of observed entries sampled uniformly from all entries in M. [sent-108, score-0.254]
54 Given Ω, we define a linear operator RΩ(M) : Rn×m → Rn×m as [RΩ(M)]i,j = Mi,j if (i, j) ∈ Ω and [RΩ(M)]i,j = 0 if (i, j) ∉ Ω. Using RΩ(·), the standard matrix completion problem is: min over M̃ ∈ Rn×m of ∥M̃∥tr, subject to RΩ(M̃) = RΩ(M). (1) [sent-109, score-0.483]
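A minimal NumPy sketch of the sampling operator RΩ, assuming Ω is represented as a boolean mask; the function name and the mask representation are illustrative, not from the paper.

import numpy as np

def R_omega(M, mask):
    # Keep the observed entries (mask True) and zero out the rest:
    # [R_Omega(M)]_{i,j} = M_{i,j} if (i, j) in Omega, else 0.
    return np.where(mask, M, 0.0)

# Example: a 4 x 3 matrix with 5 uniformly sampled observed entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))
mask = np.zeros(M.shape, dtype=bool)
mask.flat[rng.choice(M.size, size=5, replace=False)] = True
print(R_omega(M, mask))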
55 Let A = (a1, . . . , ara) ∈ Rn×ra and B = (b1, . . . , brb) ∈ Rm×rb be the side information matrices, where r ≤ ra ≤ n and r ≤ rb ≤ m. [sent-118, score-0.823]
56 Without loss of generality, we assume that ra ≥ rb and that both A and B are orthonormal matrices, i.e., A⊤A = I and B⊤B = I. [sent-119, score-0.652]
57 When the side information is not available, A and B are set to the identity matrix. [sent-122, score-0.171]
58 The objective is to complete a matrix M of rank r with the side information matrices A and B. [sent-123, score-0.416]
59 We make the following assumption in order to fully exploit the side information: Assumption A: the column vectors in M lie in the subspace spanned by the column vectors in A, and the row vectors in M lie in the subspace spanned by the column vectors in B. [sent-124, score-0.542]
60 Assumption A essentially implies that all the label assignments can be accurately predicted by a linear combination of feature vectors of instances. [sent-126, score-0.212]
61 Following the standard theory for matrix completion [11, 12, 34], we can cast the matrix completion task into the following optimization problem: min over Z ∈ Rra×rb of ∥Z∥tr, subject to RΩ(AZB⊤) = RΩ(M). (2) [sent-128, score-0.99]
62 Unlike the standard algorithm for matrix completion, which requires solving an optimization problem over a matrix of size n × m, the problem in (2) only deals with a matrix Z of size ra × rb, and can therefore be solved significantly more efficiently when ra ≪ n and rb ≪ m. [sent-131, score-2.166]
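To make the computational saving concrete, here is a minimal proximal-gradient sketch for the penalized form of (2), i.e., min over Z of ½∥RΩ(AZB⊤ − M)∥F² + λ∥Z∥tr. It is a simple sketch under the orthonormality assumption on A and B, not the authors' accelerated Maxide implementation; λ, the step size, and the iteration count are illustrative.

import numpy as np

def svt(Z, tau):
    # Proximal operator of tau * trace norm: singular value soft-thresholding.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_with_side_info(M_obs, mask, A, B, lam=1e-3, step=1.0, iters=500):
    # Only the small ra x rb matrix Z is ever updated; the full n x m product
    # A @ Z @ B.T is formed here only to evaluate the residual on Omega.
    Z = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        residual = np.where(mask, A @ Z @ B.T - M_obs, 0.0)
        grad = A.T @ residual @ B          # gradient of the smooth term
        Z = svt(Z - step * grad, step * lam)
    return Z                               # recovered matrix is A @ Z @ B.T

With A and B orthonormal, the gradient of the smooth term is 1-Lipschitz, so step = 1.0 is a safe choice.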
63 We also define the coherence measure for matrices A and B as µAB = max( max over 1≤i≤n of n∥Ai,∗∥²/ra, max over 1≤j≤m of m∥Bj,∗∥²/rb ), where Ai,∗ and Bj,∗ stand for the ith row of A and the jth row of B, respectively. [sent-134, score-0.787]
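The coherence µAB follows directly from the row norms of A and B; a small sketch, assuming orthonormal A and B as above.

import numpy as np

def coherence(A, B):
    n, ra = A.shape
    m, rb = B.shape
    mu_a = n * np.max(np.sum(A ** 2, axis=1)) / ra  # max_i n * ||A_{i,*}||^2 / ra
    mu_b = m * np.max(np.sum(B ** 2, axis=1)) / rb  # max_j m * ||B_{j,*}||^2 / rb
    return max(mu_a, mu_b)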
64 Define q0 = (1/2)(1 + log2 ra − log2 r), Ω0 = (128β/3) µ max(µ1, µ) r(ra + rb) ln n, and Ω1 = (8β/3) µ²(ra rb + r²) ln n. [sent-137, score-1.125]
65 With a probability at least 1 − 4(q0 + 1)n^(−β+1) − 2q0 n^(−β+2), Z0 is the unique optimizer to the problem in (2) provided |Ω| ≥ (64β/3) µ max(µ1, µ)(1 + log2 ra − log2 r) r(ra + rb) ln n. [sent-139, score-0.707]
66 Compared to the standard matrix completion theory [34], the side information matrices reduce the sample complexity from O(r(n + m) ln²(n + m)) to O(r(ra + rb) ln(ra + rb) ln n). [sent-140, score-1.491]
67 When ra ≪ n and rb ≪ m, the side information allows us to significantly reduce the number of observed entries required for perfectly recovering the matrix M. [sent-141, score-1.217]
68 For transductive incomplete multi-label learning, we abuse our notation by defining n as the number of instances, m as the number of labels, and d as the dimensionality of input patterns. [sent-151, score-0.343]
69 Our goal is to complete the instance-label matrix M ∈ Rn×m by using (i) the feature matrix X ∈ Rn×d and (ii) the observed entries Ω in M (i.e., the given label assignments). [sent-152, score-0.47]
70 We thus set the side information matrix A to include the top left singular vectors of X, and B = I to indicate that no side information is available for the dependence among labels. [sent-155, score-0.576]
71 We note that the low-rank assumption of the instance-label matrix M implies a linear dependence among the label prediction functions. [sent-156, score-0.294]
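One way to realize this construction is sketched below: A is taken as the top left singular vectors of the feature matrix X, and the label-side information is the identity. The truncation parameter ra and the function name are hypothetical choices for illustration, not values from the paper.

import numpy as np

def side_info_from_features(X, ra, m):
    # A: top-ra left singular vectors of the n x d feature matrix X
    # (requires ra <= min(n, d)).
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    A = U[:, :ra]
    B = np.eye(m)  # identity: no side information on the label side
    return A, B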
72 4 Experiments We evaluate the proposed algorithm for matrix completion with side information on both synthetic and real data sets. [sent-158, score-0.677]
73 4.1 Experiments on Synthetic Data To create the side information matrices A and B, we first generate a random matrix F ∈ Rn×m, with each entry Fi,j drawn independently from N(0, 1). [sent-163, score-0.4]
74 Side information matrix A includes the first ra left singular vectors of F , and B includes the first rb right singular vectors. [sent-164, score-0.941]
75 We create a diagonal matrix Σ ∈ Rr×r, whose diagonal entries are drawn independently from N(0, 10⁴). [sent-167, score-0.277]
76 Finally, the target matrix M is given by M = AZ0B⊤. [sent-169, score-0.192]
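A sketch of this generation process in NumPy. The extracted text omits how Z0 is built from Σ; the random orthonormal factors U and V below are an assumed reading of that step, chosen so that M = AZ0B⊤ has rank r.

import numpy as np

rng = np.random.default_rng(0)
n, m, r = 1000, 1000, 10
ra, rb = 2 * r, 2 * r                      # settings used in the experiments

F = rng.standard_normal((n, m))
Uf, _, Vft = np.linalg.svd(F, full_matrices=False)
A = Uf[:, :ra]                             # first ra left singular vectors of F
B = Vft[:rb].T                             # first rb right singular vectors of F

Sigma = np.diag(rng.normal(0.0, 100.0, size=r))    # N(0, 10^4) diagonal entries
U = np.linalg.qr(rng.standard_normal((ra, r)))[0]  # assumed orthonormal factor
V = np.linalg.qr(rng.standard_normal((rb, r)))[0]  # assumed orthonormal factor
Z0 = U @ Sigma @ V.T
M = A @ Z0 @ B.T                           # rank-r target matrix

num_obs = r * (2 * n - r)                  # |Omega| used in the experiments
mask = np.zeros((n, m), dtype=bool)
mask.flat[rng.choice(n * m, size=num_obs, replace=False)] = True

# Recovery quality of an estimate Z is measured by the relative error
# np.linalg.norm(A @ Z @ B.T - M, 'fro') / np.linalg.norm(M, 'fro').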
77 Settings and Baselines Our goal is to show that the proposed algorithm is able to accurately recover the target matrix with a significantly smaller number of entries and less computational time. [sent-170, score-0.349]
78 Both ra and rb of the side information matrices are set to 2r, and |Ω|, the number of observed entries, is set to r(2n − r), which is significantly smaller than the number of observed entries used in previous studies [10, 25, 27]. [sent-174, score-1.121]
79 We compare the proposed Maxide algorithm with three state-of-the-art matrix completion algorithms: Singular Value Thresholding (SVT) [10], the Fixed Point Bregman Iterative Method (FPCA) [27], and the Augmented Lagrangian Method (ALM) [25]. [sent-176, score-0.506]
80 In addition to these matrix completion methods, we also compare with a trace norm minimization method (TraceMin) [6]. [sent-177, score-0.539]
81 Results We measure the performance of matrix completion by the relative error ∥AZB⊤ − M∥F / ∥M∥F and report both the relative error and the running time in Table 1. [sent-179, score-0.527]
82 n is the size of a square matrix and r is its rank. [sent-185, score-0.15]
83 Rate is the number of observed entries divided by the size of the matrix, that is, |Ω|/(nm). [sent-186, score-0.15]
84 The relative error achieved by the baseline methods is Ω(1), implying that none of them is able to make an accurate recovery of the target matrix given the small number of observed entries. [sent-264, score-0.354]
85 In contrast, our proposed algorithm is able to recover the target matrix with small relative error. [sent-265, score-0.267]
86 4.2 Application to Transductive Incomplete Multi-Label Learning We evaluate the proposed algorithm for transductive incomplete multi-label learning on thirteen benchmark data sets, including eleven data sets for web page classification from “yahoo.com”. [sent-269, score-0.454]
87 To create partial label assignments for the training data, for each label Cj we expose the label assignment of Cj for ω% of randomly sampled positive and negative training instances, and keep the label assignment of Cj unknown for the rest of the training instances. [sent-278, score-0.612]
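A sketch of this masking protocol, assuming T is the n x m instance-label matrix in {−1, +1}; the function name and argument convention are illustrative.

import numpy as np

def expose_labels(T, omega, rng):
    # For each label C_j, reveal the assignments of omega% of the positive
    # and omega% of the negative training instances; the rest stay unobserved.
    n, m = T.shape
    mask = np.zeros((n, m), dtype=bool)
    for j in range(m):
        for sign in (+1, -1):
            rows = np.flatnonzero(T[:, j] == sign)
            k = int(round(len(rows) * omega / 100.0))
            picked = rng.choice(rows, size=k, replace=False)
            mask[picked, j] = True
    return mask

The resulting mask plays the role of the observed set Ω in the completion problem above.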
88 We compare the proposed Maxide method with MC-1 and MC-b, the state-of-the-art methods for transductive incomplete multi-label learning developed in [17]. [sent-287, score-0.366]
89 Results Table 2 summarizes the results on transductive incomplete multi-label learning. [sent-293, score-0.343]
90 For the NUS-WIDE data set, neither MC-1 nor MC-b, the two existing matrix completion based algorithms for transductive incomplete multi-label learning, is able to finish within one week. [sent-298, score-0.849]
91 5 Conclusion In this paper, we develop the theory of matrix completion with side information. [sent-301, score-0.654]
92 We present the Maxide algorithm that can efficiently solve the optimization problem for matrix completion with side information. [sent-303, score-0.678]
93 Empirical studies on synthetic data sets and on transductive incomplete multi-label learning show the promising performance of the proposed algorithm. [sent-304, score-0.461]
94 Incorporating side information in probabilistic matrix factorization with Gaussian processes. [sent-318, score-0.358]
95 High-rank matrix completion and subspace clustering with missing data. [sent-404, score-0.555]
96 Table 2: Results on transductive incomplete multi-label learning. [sent-406, score-0.343]
97 ω% represents the percentage of training instances with observed label assignment for each label. [sent-411, score-0.23]
98 Matrix co-factorization for recommendation with rich side information and implicit feedback. [sent-771, score-0.171]
99 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-842, score-0.209]
100 Bayesian matrix factorization with side information and Dirichlet process mixtures. [sent-880, score-0.358]
wordName wordTfidf (topN-words)
[('maxide', 0.57), ('rb', 0.363), ('completion', 0.333), ('ra', 0.289), ('transductive', 0.22), ('alm', 0.179), ('side', 0.171), ('matrix', 0.15), ('incomplete', 0.123), ('label', 0.105), ('entries', 0.104), ('fpca', 0.095), ('svt', 0.093), ('flickr', 0.077), ('eleven', 0.067), ('recovery', 0.065), ('assignments', 0.058), ('rra', 0.057), ('perfectly', 0.057), ('matrices', 0.056), ('rn', 0.056), ('users', 0.055), ('singular', 0.055), ('ln', 0.055), ('collaborative', 0.053), ('azb', 0.05), ('ap', 0.05), ('cj', 0.049), ('instances', 0.047), ('studies', 0.046), ('observed', 0.046), ('labels', 0.046), ('subspace', 0.043), ('perfect', 0.042), ('target', 0.042), ('rank', 0.039), ('ratings', 0.038), ('aza', 0.038), ('bzb', 0.038), ('nanjing', 0.038), ('runnable', 0.038), ('tracemin', 0.038), ('recovering', 0.037), ('factorization', 0.037), ('spanned', 0.034), ('trace', 0.034), ('recreation', 0.034), ('assistance', 0.034), ('ltering', 0.033), ('kdd', 0.033), ('assignment', 0.032), ('deals', 0.031), ('jin', 0.03), ('recover', 0.03), ('missing', 0.029), ('besides', 0.029), ('tit', 0.029), ('tkde', 0.029), ('vectors', 0.029), ('synthesized', 0.028), ('wise', 0.028), ('baseline', 0.028), ('aaai', 0.027), ('zb', 0.026), ('cand', 0.026), ('completing', 0.025), ('rennie', 0.025), ('luo', 0.025), ('pu', 0.025), ('za', 0.025), ('classi', 0.025), ('jmlr', 0.024), ('optimization', 0.024), ('pv', 0.024), ('storage', 0.024), ('none', 0.023), ('evgeniou', 0.023), ('proposed', 0.023), ('create', 0.023), ('tr', 0.023), ('libsvm', 0.022), ('coherence', 0.022), ('norm', 0.022), ('lies', 0.022), ('relative', 0.022), ('tang', 0.021), ('sets', 0.021), ('business', 0.021), ('thresholding', 0.021), ('column', 0.021), ('image', 0.021), ('nish', 0.02), ('bregman', 0.02), ('feature', 0.02), ('jain', 0.019), ('fixed', 0.019), ('ab', 0.019), ('lie', 0.019), ('max', 0.019), ('ensemble', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
2 0.24114418 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover an n × n matrix of rank r from merely Ω(nr^(3/2) log(r)) matrix entries. We also show that one can recover an order-T tensor using Ω(nr^(T−1/2) T² log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using Ω(nr^(3/2) polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
3 0.16728061 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
4 0.14111066 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu
Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log²(n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1
5 0.12780783 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
Author: Franz Kiraly, Louis Theran
Abstract: We propose a general framework for reconstructing and denoising single entries of incomplete and noisy entries. We describe: effective algorithms for deciding if and entry can be reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error of each entry, individually. In the noiseless case our algorithm is exact. For rank-one matrices, the new algorithm is fast, admits a highly-parallel implementation, and produces an error minimizing estimate that is qualitatively close to our theoretical and the state-of-the-art Nuclear Norm and OptSpace methods. 1
6 0.099066079 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
7 0.098859861 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
8 0.098655738 11 nips-2013-A New Convex Relaxation for Tensor Completion
9 0.093377315 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
10 0.091731444 75 nips-2013-Convex Two-Layer Modeling
11 0.077951662 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
12 0.074466348 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
13 0.072479382 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
14 0.066347711 91 nips-2013-Dirty Statistical Models
15 0.065536246 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
16 0.058266193 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks
17 0.05789075 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
18 0.05744385 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
19 0.054339141 186 nips-2013-Matrix factorization with binary components
20 0.053394444 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
topicId topicWeight
[(0, 0.162), (1, 0.083), (2, 0.081), (3, 0.147), (4, 0.027), (5, -0.1), (6, -0.036), (7, -0.02), (8, -0.045), (9, 0.032), (10, 0.008), (11, 0.031), (12, 0.041), (13, 0.001), (14, -0.035), (15, -0.06), (16, -0.114), (17, 0.033), (18, 0.0), (19, -0.016), (20, -0.05), (21, -0.112), (22, 0.016), (23, -0.089), (24, -0.015), (25, -0.001), (26, -0.009), (27, -0.203), (28, -0.024), (29, -0.021), (30, -0.016), (31, -0.067), (32, -0.026), (33, -0.058), (34, 0.052), (35, 0.016), (36, 0.137), (37, -0.031), (38, -0.065), (39, 0.028), (40, -0.017), (41, 0.004), (42, 0.044), (43, 0.004), (44, 0.043), (45, 0.011), (46, -0.02), (47, -0.055), (48, -0.007), (49, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.95003659 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
2 0.87589878 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
3 0.85309529 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Author: Adrien Todeschini, François Caron, Marie Chavent
Abstract: We propose a novel class of algorithms for low rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables us to derive an Expectation-Maximization algorithm to obtain a Maximum A Posteriori estimate of the completed low rank matrix. The resulting algorithm is an iterative soft-thresholding algorithm which adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the benefit of the proposed approach for low rank matrix completion. 1
4 0.79576725 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover an n × n matrix of rank r from merely Ω(nr^(3/2) log(r)) matrix entries. We also show that one can recover an order-T tensor using Ω(nr^(T−1/2) T² log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using Ω(nr^(3/2) polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
5 0.78617907 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ∥A − B∥₂. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
6 0.77493209 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
7 0.7593056 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
8 0.64472502 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
9 0.61615366 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
10 0.59426039 352 nips-2013-What do row and column marginals reveal about your dataset?
11 0.59387207 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
12 0.58178812 73 nips-2013-Convex Relaxations for Permutation Problems
13 0.57767898 186 nips-2013-Matrix factorization with binary components
14 0.52122694 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
15 0.51355582 11 nips-2013-A New Convex Relaxation for Tensor Completion
16 0.49499962 12 nips-2013-A Novel Two-Step Method for Cross Language Representation Learning
17 0.49296421 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
18 0.48074132 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
19 0.46459222 247 nips-2013-Phase Retrieval using Alternating Minimization
20 0.46117267 91 nips-2013-Dirty Statistical Models
topicId topicWeight
[(5, 0.223), (16, 0.034), (33, 0.19), (34, 0.086), (41, 0.036), (49, 0.043), (56, 0.114), (70, 0.028), (85, 0.048), (89, 0.036), (93, 0.048), (95, 0.018)]
simIndex simValue paperId paperTitle
1 0.90550643 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
same-paper 2 0.82816839 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explores the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
3 0.74389201 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
Author: Tianbao Yang
Abstract: We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performance than stochastic gradient descent methods in optimizing regularized loss minimization problems. However, there have been few efforts to study them in a distributed framework. We make progress along this line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances. 1
4 0.74337626 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
Author: Eftychios A. Pnevmatikakis, Liam Paninski
Abstract: We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations. We develop scalable nonnegative deconvolution methods for extracting the neuronal spike time series from such observations. We also address the problem of demixing the spatial locations of the neurons using rank-penalized matrix factorization methods. By exploiting the sparsity of neural spiking we demonstrate that the number of measurements needed per timestep is significantly smaller than the total number of neurons, a result that can potentially enable imaging of larger populations at considerably faster rates compared to traditional raster-scanning techniques. Unlike traditional CS setups, our problem involves a block-diagonal sensing matrix and a non-orthogonal sparse basis that spans multiple timesteps. We provide tight approximations to the number of measurements needed for perfect deconvolution for certain classes of spiking processes, and show that this number undergoes a “phase transition,” which we characterize using modern tools relating conic geometry to compressed sensing. 1
5 0.74289018 335 nips-2013-Transfer Learning in a Transductive Setting
Author: Marcus Rohrbach, Sandra Ebert, Bernt Schiele
Abstract: Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach Propagated Semantic Transfer combines three techniques. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g., by a mid-level layer of semantic attributes. Second, we exploit the manifold structure of novel classes. More specifically we adapt a graph-based learning algorithm – so far only used for semi-supervised learning – to zero-shot and few-shot learning. Third, we improve the local neighborhood in such graph structures by replacing the raw feature-based representation with a mid-level object- or attribute-based representation. We evaluate our approach on three challenging datasets in two different applications, namely on Animals with Attributes and ImageNet for image classification and on MPII Composites for activity recognition. Our approach consistently outperforms state-of-the-art transfer and semi-supervised approaches on all datasets. 1
6 0.7416802 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
7 0.74153548 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
8 0.74144995 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
9 0.74138039 331 nips-2013-Top-Down Regularization of Deep Belief Networks
10 0.74103224 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
11 0.74051476 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
12 0.74014574 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
13 0.73991036 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
14 0.73933536 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
15 0.73912966 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
16 0.73869044 301 nips-2013-Sparse Additive Text Models with Low Rank Background
17 0.73819953 233 nips-2013-Online Robust PCA via Stochastic Optimization
18 0.73819721 149 nips-2013-Latent Structured Active Learning
19 0.73814458 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
20 0.73750907 75 nips-2013-Convex Two-Layer Modeling