nips nips2004 nips2004-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexandre D'aspremont, Laurent E. Ghaoui, Michael I. Jordan, Gert R. Lanckriet
Abstract: We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 A direct formulation for sparse PCA using semidefinite programming Alexandre d’Aspremont EECS Dept. [sent-1, score-0.198]
2 Abstract We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. [sent-23, score-0.346]
3 The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. [sent-24, score-0.326]
4 We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. [sent-25, score-0.565]
5 1 Introduction Principal component analysis (PCA) is a popular tool for data analysis and dimensionality reduction. [sent-26, score-0.032]
6 In essence, PCA finds linear combinations of the variables (the so-called principal components) that correspond to directions of maximal variance in the data. [sent-28, score-0.499]
7 It can be performed via a singular value decomposition (SVD) of the data matrix A, or via an eigenvalue decomposition if A is a covariance matrix. [sent-29, score-0.283]
8 First, by capturing directions of maximum variance in the data, the principal components offer a way to compress the data with minimum information loss. [sent-31, score-0.596]
9 Second, the principal components are uncorrelated, which can aid with interpretation or subsequent statistical analysis. [sent-32, score-0.523]
10 A particular disadvantage that is our focus here is the fact that the principal components are usually linear combinations of all variables. [sent-34, score-0.485]
11 In these cases, the interpretation of the principal components would be facilitated if these components involve very few non-zero loadings (coordinates). [sent-37, score-1.005]
12 In applications such as financial asset trading strategies based on principal component techniques, the sparsity of the loadings has important consequences, since fewer non-zero loadings imply fewer fixed transaction costs. [sent-40, score-1.276]
13 It would thus be of interest to be able to discover “sparse principal components”, i.e., [sent-41, score-0.341]
14 sets of sparse vectors spanning a low-dimensional space that explain most of the variance present in the data. [sent-43, score-0.272]
15 To achieve this, it is necessary to sacrifice some of the explained variance and the orthogonality of the principal components, albeit hopefully not too much. [sent-44, score-0.514]
16 Rotation techniques are often used to improve interpretation of the standard principal components [1]. [sent-45, score-0.523]
17 [2] considered simple principal components by restricting the loadings to take values from a small set of allowable integers, such as 0, 1, and −1. [sent-46, score-0.823]
18 [3] propose an ad hoc way to deal with the problem, where the loadings with small absolute value are thresholded to zero. [sent-47, score-0.384]
19 Later, a method called SCoTLASS was introduced by [4] to find modified principal components with possible zero loadings. [sent-49, score-0.458]
20 In [5] a new approach, called sparse PCA (SPCA), was proposed to find modified components with zero loadings, based on the fact that PCA can be written as a regression-type optimization problem. [sent-50, score-0.248]
21 In particular, we obtain a convex semidefinite programming (SDP) formulation. [sent-53, score-0.07]
22 It turns out that our problem can be expressed as a special type of saddle-point problem that is well suited to recent specialized algorithms, such as those described in [8, 9]. [sent-57, score-0.065]
23 In Section 2, we show how to efficiently derive a sparse rank-one approximation of a given matrix using a semidefinite relaxation of the sparse PCA problem. [sent-61, score-0.476]
24 In Section 3, we derive an interesting robustness interpretation of our technique, and in Section 4 we describe how to use this interpretation in order to decompose a matrix into sparse factors. [sent-62, score-0.391]
25 Notation Here, Sn is the set of symmetric matrices of size n. [sent-64, score-0.055]
26 We denote by 1 a vector of ones, while Card(x) is the cardinality (number of non-zero elements) of a vector x. [sent-65, score-0.194]
27 The Frobenius norm of X is ‖X‖F = √Tr(X 2 ), λmax (X) denotes the maximum eigenvalue of X, and |X| is the matrix whose elements are the absolute values of the elements of X. [sent-68, score-0.112]
28 2 Sparse eigenvectors In this section, we derive a semidefinite programming (SDP) relaxation for the problem of approximating a symmetric matrix by a rank one matrix with an upper bound on the cardinality of its eigenvector. [sent-69, score-0.672]
29 We first reformulate this as a variational problem; we then obtain a lower bound on its optimal value via an SDP relaxation (we refer the reader to [10] for an overview of semidefinite programming). [sent-70, score-0.24]
30 Let A ∈ Sn be a given n × n positive semidefinite, symmetric matrix and k be an integer with 1 ≤ k ≤ n. [sent-71, score-0.127]
31 We consider the problem: Φk (A) := min ‖A − xxT‖F subject to Card(x) ≤ k, (1) in the variable x ∈ Rn . [sent-72, score-0.108]
32 We can solve instead the following equivalent problem: Φk (A)² = min ‖A − λxxT‖F² subject to ‖x‖2 = 1, λ ≥ 0, Card(x) ≤ k, in the variables x ∈ Rn and λ ∈ R. [sent-73, score-0.138]
33 Minimizing over λ, we obtain: Φk (A)² = ‖A‖F² − νk (A), where νk (A) := max xT Ax subject to ‖x‖2 = 1, Card(x) ≤ k. (2) [sent-74, score-0.1]
34 To compute a semidefinite relaxation of this program (see [10], for example), we rewrite (2) as: νk (A) := max Tr(AX) subject to Tr(X) = 1, Card(X) ≤ k², X ⪰ 0, Rank(X) = 1, (3) in the symmetric matrix variable X ∈ Sn . [sent-75, score-0.405]
35 Naturally, problem (3) is still non-convex and very difficult to solve, due to the cardinality and rank constraints. [sent-78, score-0.289]
36 If we also drop the rank constraint, we can form a relaxation of (3) and (2) as: ν̄k (A) := max Tr(AX) subject to Tr(X) = 1, 1T |X|1 ≤ k, X ⪰ 0, (4) which is a semidefinite program (SDP) in the variable X ∈ Sn , where k is an integer parameter controlling the sparsity of the solution. [sent-80, score-0.603]
37 The optimal value of this program will be an upper bound on the optimal value νk (A) of the variational program in (2), and hence it gives a lower bound on the optimal value Φk (A) of the original problem (1). [sent-81, score-0.243]
38 Finally, the optimal solution X will not always be of rank one but we can truncate it and keep only its dominant eigenvector x as an approximate solution to the original problem (1). [sent-82, score-0.272]
39 In Section 6 we show that in practice the solution X to (4) tends to have a rank very close to one, and that its dominant eigenvector is indeed sparse. [sent-83, score-0.188]
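The relaxation (4) and the truncation step above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the choice of CVXPY as the SDP solver, the function names and the truncation tolerance are all assumptions.

```python
# Minimal sketch of relaxation (4): max Tr(AX) s.t. Tr(X) = 1, 1'|X|1 <= k, X PSD.
# CVXPY is an assumed solver choice; names and tolerance are illustrative.
import cvxpy as cp
import numpy as np

def dspca_relaxation(A, k):
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0,                      # X positive semidefinite
                   cp.trace(X) == 1,            # Tr(X) = 1
                   cp.sum(cp.abs(X)) <= k]      # 1'|X|1 <= k
    prob = cp.Problem(cp.Maximize(cp.trace(A @ X)), constraints)
    prob.solve()
    return X.value, prob.value                  # upper bound on nu_k(A)

def dominant_sparse_eigenvector(X, tol=1e-3):
    """Truncate X to its dominant eigenvector and zero out negligible loadings."""
    w, V = np.linalg.eigh(X)       # eigenvalues in ascending order
    x = V[:, -1].copy()            # dominant eigenvector
    x[np.abs(x) < tol] = 0.0       # read off the sparsity pattern
    return x / np.linalg.norm(x)
```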
40 3 A robustness interpretation In this section, we show that problem (4) can be interpreted as a robust formulation of the maximum eigenvalue problem, with additive, component-wise uncertainty in the matrix A. [sent-84, score-0.264]
41 We again assume A to be symmetric and positive semidefinite. [sent-85, score-0.055]
42 In the previous section, we considered in (2) a cardinality-constrained variational formulation of the maximum eigenvalue problem. [sent-86, score-0.143]
43 Here we look at a small variation where we penalize the cardinality and solve: max xT Ax − ρ Card(x)² subject to ‖x‖2 = 1, in the variable x ∈ Rn , where the parameter ρ > 0 controls the size of the penalty. [sent-87, score-0.343]
44 As in the last section, we can form the equivalent program: max Tr(AX) − ρ Card(X) subject to Tr(X) = 1, X ⪰ 0, Rank(X) = 1, in the variable X ∈ Sn . [sent-90, score-0.128]
45 Again, we get a relaxation of this program by forming: max Tr(AX) − ρ1T |X|1 subject to Tr(X) = 1, X ⪰ 0, (5) which is a semidefinite program in the variable X ∈ Sn , where ρ > 0 controls the penalty size. [sent-91, score-0.41]
46 We can rewrite this last problem as: max_{X ⪰ 0, Tr(X)=1} min_{|Uij| ≤ ρ} Tr(X(A + U )), (6) and we get a dual to (5) as: min λmax (A + U ) subject to |Uij | ≤ ρ, i, j = 1, . . . , n, (7) [sent-92, score-0.186]
47 which is a maximum eigenvalue problem in the variable U ∈ Rn×n . [sent-95, score-0.111]
48 This gives a natural robustness interpretation to the relaxation in (5): it corresponds to a worst-case maximum eigenvalue computation, with component-wise bounded noise of intensity ρ on the matrix coefficients. [sent-96, score-0.357]
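A sketch of the penalized relaxation (5) and of its eigenvalue dual (7) follows, again under the assumption that CVXPY is used; restricting U to be symmetric in the dual is a further simplifying assumption.

```python
# Sketch of the penalized relaxation (5) and its dual (7); CVXPY is assumed.
import cvxpy as cp

def dspca_penalized(A, rho):
    """max Tr(AX) - rho * 1'|X|1  s.t.  Tr(X) = 1, X PSD."""
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    obj = cp.Maximize(cp.trace(A @ X) - rho * cp.sum(cp.abs(X)))
    prob = cp.Problem(obj, [X >> 0, cp.trace(X) == 1])
    prob.solve()
    return X.value, prob.value

def worst_case_eigenvalue(A, rho):
    """Dual (7): min lambda_max(A + U)  s.t.  |Uij| <= rho (U taken symmetric)."""
    n = A.shape[0]
    U = cp.Variable((n, n), symmetric=True)
    prob = cp.Problem(cp.Minimize(cp.lambda_max(A + U)), [cp.abs(U) <= rho])
    prob.solve()
    return prob.value
```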
49 4 Sparse decomposition Here, we use the results obtained in the previous two sections to describe a sparse equivalent to the PCA decomposition technique. [sent-97, score-0.269]
50 Suppose that we start with a matrix A1 ∈ Sn ; our objective is to decompose it into factors with target sparsity k. [sent-98, score-0.263]
51 We solve the relaxed problem in (4): max Tr(A1 X) subject to Tr(X) = 1, 1T |X|1 ≤ k, X ⪰ 0, to get a solution X1 , and truncate it to keep only the dominant (sparse) eigenvector x1 . [sent-99, score-0.301]
52 In the PCA case, the decomposition stops naturally after Rank(A) factors have been found, since ARank(A)+1 is then equal to zero. [sent-102, score-0.108]
53 In the case of the sparse decomposition, we have no guarantee that this will happen. [sent-103, score-0.131]
54 However, the robustness interpretation gives us a natural stopping criterion: if all the coefficients in |Ai | are smaller than the noise level ρ (computed in the last section) then we must stop since the matrix is essentially indistinguishable from zero. [sent-104, score-0.177]
55 So, even though we have no guarantee that the algorithm will terminate with a zero matrix, the decomposition will in practice terminate as soon as the coefficients in A become indistinguishable from the noise. [sent-105, score-0.121]
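The decomposition procedure above can be sketched as a loop around relaxation (4), reusing the dspca_relaxation and dominant_sparse_eigenvector helpers sketched in Section 2. The deflation update A_{i+1} = A_i − (x_iᵀ A_i x_i) x_i x_iᵀ is one natural choice and is an assumption here, since the text above does not spell out the update; the stopping rule follows the robustness interpretation.

```python
# Sketch of the sparse decomposition loop; the deflation update is an assumption.
import numpy as np

def sparse_decomposition(A, k, rho, max_factors=10):
    factors = []
    Ai = A.copy()
    for _ in range(max_factors):
        if np.all(np.abs(Ai) <= rho):      # coefficients indistinguishable from noise
            break
        Xi, _ = dspca_relaxation(Ai, k)    # relaxation (4)
        xi = dominant_sparse_eigenvector(Xi)
        factors.append(xi)
        Ai = Ai - (xi @ Ai @ xi) * np.outer(xi, xi)   # remove the extracted component
    return factors
```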
56 The results show that our approach can achieve more sparsity in the principal components than SPCA does, while explaining as much variance. [sent-116, score-0.653]
57 We begin with a simple example illustrating the link between k and the cardinality of the solution. [sent-117, score-0.194]
58 6.1 Controlling sparsity with k Here, we illustrate on a simple example how the sparsity of the solution to our relaxation evolves as k varies from 1 to n. [sent-119, score-0.474]
59 We generate a 10 × 10 matrix U with uniformly distributed coefficients in [0, 1]. [sent-120, score-0.05]
60 We let v be a sparse vector with: v = (1, 0, 1, 0, 1, 0, 1, 0, 1, 0). [sent-121, score-0.131]
61 We then form a test matrix A = U T U + σvv T , where σ is a signal-to-noise ratio equal to 15 in our case. [sent-122, score-0.05]
62 We then extract the first eigenvector of the solution X and record its cardinality. [sent-125, score-0.074]
63 In Figure 1, we show the mean cardinality (and standard deviation) as a function of k. [sent-126, score-0.194]
64 We observe that k + 1 is actually a good predictor of the cardinality, especially when k + 1 is close to the actual cardinality (5 in this case). [sent-127, score-0.194]
65 Figure 1: Cardinality versus k. [sent-128, score-0.244]
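The experiment of Section 6.1 can be reproduced along the following lines; the number of random repetitions and the truncation tolerance are not stated above and are assumptions.

```python
# Sketch of the k-sweep experiment: mean cardinality of the recovered eigenvector.
import numpy as np

def sparsity_vs_k(n=10, sigma=15.0, n_trials=20, tol=1e-3):
    v = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
    cards = {k: [] for k in range(1, n + 1)}
    for _ in range(n_trials):
        U = np.random.rand(n, n)                  # uniform coefficients in [0, 1]
        A = U.T @ U + sigma * np.outer(v, v)      # test matrix A = U'U + sigma*vv'
        for k in range(1, n + 1):
            X, _ = dspca_relaxation(A, k)         # helper sketched in Section 2
            x = dominant_sparse_eigenvector(X, tol)
            cards[k].append(np.count_nonzero(x))
    return {k: (np.mean(c), np.std(c)) for k, c in cards.items()}
```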
66 In this example, three hidden factors are created: V1 ∼ N (0, 290), V2 ∼ N (0, 300), V3 = −0.3V1 + 0.925V2 + ε, ε ∼ N (0, 1). [sent-131, score-0.039]
67 Afterwards, 10 observed variables are generated as follows: Xi = Vj + εij , εij ∼ N (0, 1), with j = 1 for i = 1, 2, 3, 4, j = 2 for i = 5, 6, 7, 8 and j = 3 for i = 9, 10, and the εij independent for j = 1, 2, 3, i = 1, . . . , 10. [sent-134, score-0.036]
68 Instead of sampling data from this model and computing an empirical covariance matrix of (X1 , . . . , X10 ), [sent-138, score-0.083]
69 we use the exact covariance matrix to compute principal components using the different approaches. [sent-141, score-0.541]
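For reference, the exact covariance matrix of (X1, . . . , X10) can be assembled directly from the factor model. The V3 coefficients used below (−0.3 and 0.925) follow the usual construction of this example and are an assumption, since the formula is truncated in the excerpt above.

```python
# Sketch: exact covariance of (X1,...,X10) under the three-factor model above.
import numpy as np

def exact_covariance():
    # Factor covariances: V1 ~ N(0,290), V2 ~ N(0,300), V3 = -0.3*V1 + 0.925*V2 + e
    C = np.zeros((3, 3))
    C[0, 0], C[1, 1] = 290.0, 300.0
    C[0, 2] = C[2, 0] = -0.3 * 290.0
    C[1, 2] = C[2, 1] = 0.925 * 300.0
    C[2, 2] = 0.3**2 * 290.0 + 0.925**2 * 300.0 + 1.0
    factor_of = [0]*4 + [1]*4 + [2]*2        # X1-X4 -> V1, X5-X8 -> V2, X9-X10 -> V3
    Sigma = np.array([[C[factor_of[i], factor_of[j]] for j in range(10)]
                      for i in range(10)], dtype=float)
    Sigma += np.eye(10)                       # add Var(eps_ij) = 1 on the diagonal
    return Sigma
```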
70 Since the three underlying factors have about the same variance, and the first two are associated with 4 variables while the last one is only associated with 2 variables, V1 and V2 are almost equally important, and they are both significantly more important than V3 . [sent-142, score-0.075]
71 This, together with the fact that the first 2 principal components explain more than 99% of the total variance, suggests that considering two sparse linear combinations of the original variables should be sufficient to explain most of the variance in data sampled from this model. [sent-143, score-0.839]
72 The ideal solution would thus be to only use the variables (X1 , X2 , X3 , X4 ) for the first sparse principal component, to recover the factor V1 , and only (X5 , X6 , X7 , X8 ) for the second sparse principal component to recover V2 . [sent-145, score-1.039]
73 Using the true covariance matrix and the oracle knowledge that the ideal sparsity is 4, [5] performed SPCA (with λ = 0). [sent-146, score-0.235]
74 The results are reported in Table 1, together with results for PCA, simple thresholding and SCoTLASS (t = 2). [sent-148, score-0.126]
75 Notice that SPCA, DSPCA and SCoTLASS all find the correct sparse principal components, while simple thresholding yields inferior performance. [sent-149, score-0.598]
76 The latter wrongly includes the variables X9 and X10 to explain most variance (probably it is misled by the high correlation between V2 and V3 ); moreover, it assigns higher loadings to X9 and X10 than to one of the variables (X5 , X6 , X7 , X8 ) that are clearly more important. [sent-150, score-0.62]
77 Simple thresholding correctly identifies the second sparse principal component, probably because V1 has a lower correlation with V3 . [sent-151, score-0.619]
78 Simple thresholding also explains a bit less variance than the other methods. [sent-152, score-0.221]
79 ’ST’ is the simple thresholding method, ’other’ is all the other methods: SPCA, DSPCA and SCoTLASS. [sent-155, score-0.126]
80 Table 1 header: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10, explained variance; first row: PCA, PC1. [sent-156, score-0.173]
81 [4] applied SCoTLASS to this problem and [5] used their SPCA approach, both with the goal of obtaining sparse principal components that can better be interpreted than those of PCA. [sent-199, score-0.61]
82 SPCA performs better than SCoTLASS: it identifies principal components with respectively 7, 4, 4, 1, 1, and 1 non-zero loadings, as shown in Table 2. [sent-200, score-0.458]
83 As shown in [5], this is much sparser than the modified principal components obtained by SCoTLASS, while explaining nearly the same variance (75. [sent-201, score-0.628]
84 Also, simple thresholding of PCA, with a number of non-zero loadings that matches the result of SPCA, does worse than SPCA in terms of explained variance. [sent-204, score-0.569]
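A sketch of the simple-thresholding baseline discussed above: keep only the q largest PCA loadings in absolute value (the renormalization step is my choice).

```python
# Sketch of simple thresholding of a PCA loading vector.
import numpy as np

def thresholded_pc(Sigma, q):
    w, V = np.linalg.eigh(Sigma)
    x = V[:, -1]                          # leading PCA loading vector
    keep = np.argsort(np.abs(x))[-q:]     # indices of the q largest |loadings|
    x_thr = np.zeros_like(x)
    x_thr[keep] = x[keep]
    return x_thr / np.linalg.norm(x_thr)
```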
85 Following this previous work, we also consider the first 6 principal components. [sent-205, score-0.341]
86 We try to identify principal components that are sparser than the best result of this previous work. [sent-206, score-0.49]
87 Figure 2 shows the cumulative number of non-zero loadings and the cumulative explained variance (measuring the variance in the subspace spanned by the first i eigenvectors). [sent-210, score-0.801]
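One way to compute the cumulative explained variance for possibly non-orthogonal sparse loading vectors, as described above, is to project the covariance onto their span; the following is a sketch under that interpretation.

```python
# Sketch: variance captured by the subspace spanned by the first i loading vectors.
import numpy as np

def cumulative_explained_variance(Sigma, loadings):
    out = []
    for i in range(1, len(loadings) + 1):
        B = np.column_stack(loadings[:i])
        Q, _ = np.linalg.qr(B)              # orthonormal basis of span(x1,...,xi)
        out.append(np.trace(Q.T @ Sigma @ Q))
    return out
```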
88 The results for DSPCA are plotted with a red line and those for SPCA with a blue line. [sent-211, score-0.05]
89 The cumulative explained variance for normal PCA is depicted with a black line. [sent-212, score-0.257]
90 It can be seen that our approach is able to explain nearly the same variance as the SPCA method, while clearly reducing the number of non-zero loadings for the first 6 principal components. [sent-213, score-0.847]
91 Adjusting the first k from 5 to 6 (relaxing the sparsity), we obtain the results plotted with a red dash-dot line: still better in sparsity, but with a cumulative explained variance that is fully competitive with SPCA. [sent-214, score-0.284]
92 Moreover, as in the SPCA approach, the important variables associated with the 6 principal components do not overlap, which leads to a clearer interpretation. [sent-215, score-0.494]
93 Table 2 shows the first three corresponding principal components for the different approaches (DSPCAw5 for k1 = 5 and DSPCAw6 for k1 = 6). [sent-216, score-0.458]
94 Table 2: Loadings for the first three principal components, for the real-life example. [sent-217, score-0.341]
95 Figure 2: Cumulative cardinality and cumulative explained variance for SPCA and DSPCA as a function of the number of principal components: black line for normal PCA, blue for SPCA and red for DSPCA (full for k1 = 5 and dash-dot for k1 = 6). [sent-266, score-1.3]
96 In the examples detailed above, our approach achieves sparser principal components while explaining as much variance as previously proposed methods. [sent-267, score-0.095]
97 Loadings and correlations in the interpretation of principal components. [sent-285, score-0.406]
98 A modified principal component technique based on the lasso. [sent-291, score-0.373]
99 Prox-method with rate of convergence o(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle-point problems. [sent-314, score-0.071]
100 Two case studies in the application of principal components. [sent-323, score-0.341]
wordName wordTfidf (topN-words)
[('spca', 0.462), ('loadings', 0.365), ('principal', 0.341), ('dspca', 0.219), ('cardinality', 0.194), ('pca', 0.183), ('tr', 0.168), ('semide', 0.164), ('card', 0.162), ('sparsity', 0.152), ('scotlass', 0.146), ('relaxation', 0.143), ('sdp', 0.136), ('sparse', 0.131), ('thresholding', 0.126), ('components', 0.117), ('xxt', 0.106), ('sn', 0.101), ('variance', 0.095), ('ax', 0.086), ('eecs', 0.085), ('cumulative', 0.084), ('berkeley', 0.078), ('explained', 0.078), ('rank', 0.074), ('decomposition', 0.069), ('interpretation', 0.065), ('eigenvalue', 0.062), ('subject', 0.06), ('program', 0.059), ('symmetric', 0.055), ('variational', 0.052), ('matrix', 0.05), ('props', 0.049), ('eigenvector', 0.047), ('explain', 0.046), ('nite', 0.046), ('explaining', 0.043), ('pit', 0.042), ('sedumi', 0.042), ('xt', 0.041), ('dominant', 0.04), ('max', 0.04), ('factors', 0.039), ('programming', 0.038), ('robustness', 0.037), ('rn', 0.037), ('variables', 0.036), ('gert', 0.036), ('truncate', 0.036), ('uij', 0.036), ('relaxing', 0.034), ('covariance', 0.033), ('convex', 0.032), ('moderate', 0.032), ('sparser', 0.032), ('component', 0.032), ('solve', 0.03), ('formulation', 0.029), ('st', 0.029), ('statistics', 0.028), ('variable', 0.028), ('solution', 0.027), ('combinations', 0.027), ('red', 0.027), ('bound', 0.026), ('modi', 0.026), ('terminate', 0.026), ('stop', 0.025), ('controlling', 0.025), ('rewrite', 0.025), ('offer', 0.024), ('specialized', 0.023), ('coef', 0.023), ('blue', 0.023), ('decompose', 0.022), ('integer', 0.022), ('biology', 0.022), ('disadvantages', 0.021), ('knots', 0.021), ('misled', 0.021), ('sac', 0.021), ('trading', 0.021), ('wrongly', 0.021), ('zou', 0.021), ('controls', 0.021), ('probably', 0.021), ('problem', 0.021), ('derive', 0.021), ('table', 0.021), ('rotation', 0.02), ('min', 0.02), ('cients', 0.019), ('compress', 0.019), ('interpreting', 0.019), ('lasso', 0.019), ('monotone', 0.019), ('nancial', 0.019), ('reformulate', 0.019), ('thresholded', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
Author: Alexandre D'aspremont, Laurent E. Ghaoui, Michael I. Jordan, Gert R. Lanckriet
Abstract: We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. 1
2 0.11893507 163 nips-2004-Semi-parametric Exponential Family PCA
Author: Sajama Sajama, Alon Orlitsky
Abstract: We present a semi-parametric latent variable model based technique for density modelling, dimensionality reduction and visualization. Unlike previous methods, we estimate the latent distribution non-parametrically which enables us to model data generated by an underlying low dimensional, multimodal distribution. In addition, we allow the components of latent variable models to be drawn from the exponential family which makes the method suitable for special data types, for example binary or count data. Simulations on real valued, binary and count data show favorable comparison to other related schemes both in terms of separating different populations and generalization to unseen samples. 1
3 0.10853978 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
4 0.1009865 113 nips-2004-Maximum-Margin Matrix Factorization
Author: Nathan Srebro, Jason Rennie, Tommi S. Jaakkola
Abstract: We present a novel approach to collaborative prediction, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semi-definite program, and discuss generalization error bounds for them. 1
5 0.082599536 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units
Author: Eizaburo Doi, Michael S. Lewicki
Abstract: It has been suggested that the primary goal of the sensory system is to represent input in such a way as to reduce the high degree of redundancy. Given a noisy neural representation, however, solely reducing redundancy is not desirable, since redundancy is the only clue to reduce the effects of noise. Here we propose a model that best balances redundancy reduction and redundant representation. Like previous models, our model accounts for the localized and oriented structure of simple cells, but it also predicts a different organization for the population. With noisy, limited-capacity units, the optimal representation becomes an overcomplete, multi-scale representation, which, compared to previous models, is in closer agreement with physiological data. These results offer a new perspective on the expansion of the number of neurons from retina to V1 and provide a theoretical model of incorporating useful redundancy into efficient neural representations. 1
6 0.081427053 115 nips-2004-Maximum Margin Clustering
7 0.075052857 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling
8 0.074698329 171 nips-2004-Solitaire: Man Versus Machine
9 0.066412732 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
10 0.062982716 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
11 0.061480004 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.059373666 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
13 0.056360006 131 nips-2004-Non-Local Manifold Tangent Learning
14 0.054985717 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
15 0.052471053 62 nips-2004-Euclidean Embedding of Co-Occurrence Data
16 0.051414475 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
17 0.051330861 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
18 0.051004421 68 nips-2004-Face Detection --- Efficient and Rank Deficient
19 0.050629187 168 nips-2004-Semigroup Kernels on Finite Sets
20 0.050555427 103 nips-2004-Limits of Spectral Clustering
topicId topicWeight
[(0, -0.164), (1, 0.05), (2, 0.01), (3, 0.007), (4, -0.052), (5, -0.081), (6, -0.059), (7, -0.016), (8, 0.07), (9, 0.038), (10, 0.015), (11, -0.097), (12, -0.001), (13, 0.005), (14, 0.041), (15, 0.031), (16, -0.037), (17, 0.085), (18, -0.063), (19, -0.007), (20, 0.054), (21, 0.009), (22, 0.059), (23, 0.056), (24, -0.086), (25, 0.064), (26, 0.023), (27, -0.042), (28, 0.042), (29, 0.02), (30, 0.015), (31, 0.111), (32, 0.072), (33, 0.15), (34, -0.009), (35, 0.053), (36, -0.08), (37, 0.083), (38, 0.178), (39, 0.193), (40, -0.053), (41, -0.112), (42, -0.244), (43, 0.059), (44, 0.088), (45, -0.031), (46, 0.065), (47, 0.052), (48, -0.06), (49, -0.157)]
simIndex simValue paperId paperTitle
same-paper 1 0.96454233 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
Author: Alexandre D'aspremont, Laurent E. Ghaoui, Michael I. Jordan, Gert R. Lanckriet
Abstract: We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. 1
2 0.62291807 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units
Author: Eizaburo Doi, Michael S. Lewicki
Abstract: It has been suggested that the primary goal of the sensory system is to represent input in such a way as to reduce the high degree of redundancy. Given a noisy neural representation, however, solely reducing redundancy is not desirable, since redundancy is the only clue to reduce the effects of noise. Here we propose a model that best balances redundancy reduction and redundant representation. Like previous models, our model accounts for the localized and oriented structure of simple cells, but it also predicts a different organization for the population. With noisy, limited-capacity units, the optimal representation becomes an overcomplete, multi-scale representation, which, compared to previous models, is in closer agreement with physiological data. These results offer a new perspective on the expansion of the number of neurons from retina to V1 and provide a theoretical model of incorporating useful redundancy into efficient neural representations. 1
3 0.5968951 171 nips-2004-Solitaire: Man Versus Machine
Author: Xiang Yan, Persi Diaconis, Paat Rusmevichientong, Benjamin V. Roy
Abstract: In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player, but then follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does. 1
4 0.49809897 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
Author: David P. Wipf, Bhaskar D. Rao
Abstract: Finding the sparsest, or minimum ℓ0 -norm, representation of a signal given an overcomplete dictionary of basis vectors is an important problem in many application domains. Unfortunately, the required optimization problem is often intractable because there is a combinatorial increase in the number of local minima as the number of candidate basis vectors increases. This deficiency has prompted most researchers to instead minimize surrogate measures, such as the ℓ1 -norm, that lead to more tractable computational methods. The downside of this procedure is that we have now introduced a mismatch between our ultimate goal and our objective function. In this paper, we demonstrate a sparse Bayesian learning-based method of minimizing the ℓ0 -norm while reducing the number of troublesome local minima. Moreover, we derive necessary conditions for local minima to occur via this approach and empirically demonstrate that there are typically many fewer for general problems of interest. 1
5 0.43580216 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem
Author: Suvrit Sra, Joel Tropp, Inderjit S. Dhillon
Abstract: Various problems in machine learning, databases, and statistics involve pairwise distances among a set of objects. It is often desirable for these distances to satisfy the properties of a metric, especially the triangle inequality. Applications where metric data is useful include clustering, classification, metric-based indexing, and approximation algorithms for various graph problems. This paper presents the Metric Nearness Problem: Given a dissimilarity matrix, find the “nearest” matrix of distances that satisfy the triangle inequalities. For p nearness measures, this paper develops efficient triangle fixing algorithms that compute globally optimal solutions by exploiting the inherent structure of the problem. Empirically, the algorithms have time and storage costs that are linear in the number of triangle constraints. The methods can also be easily parallelized for additional speed. 1
6 0.42617574 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
7 0.39958453 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling
8 0.39465943 158 nips-2004-Sampling Methods for Unsupervised Learning
9 0.39361796 113 nips-2004-Maximum-Margin Matrix Factorization
10 0.38917258 163 nips-2004-Semi-parametric Exponential Family PCA
11 0.33477709 127 nips-2004-Neighbourhood Components Analysis
12 0.32460052 57 nips-2004-Economic Properties of Social Networks
13 0.32174003 96 nips-2004-Learning, Regularization and Ill-Posed Inverse Problems
14 0.31691632 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
15 0.31067708 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
16 0.30379051 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
17 0.30095461 115 nips-2004-Maximum Margin Clustering
18 0.29177353 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
19 0.28179523 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
20 0.28090131 132 nips-2004-Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis
topicId topicWeight
[(11, 0.252), (13, 0.088), (15, 0.137), (26, 0.047), (31, 0.029), (33, 0.217), (35, 0.02), (39, 0.077), (50, 0.028), (76, 0.011)]
simIndex simValue paperId paperTitle
1 0.89729363 195 nips-2004-Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM
Author: Juan Coz, Gustavo F. Bayón, Jorge Díez, Oscar Luaces, Antonio Bahamonde, Carlos Sañudo
Abstract: In this paper we show that it is possible to model sensory impressions of consumers about beef meat. This is not a straightforward task; the reason is that when we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers’ ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special purpose SVM polynomial kernel. The training data set used collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods. Additionally, to gain insight into consumer preferences, we used feature subset selection tools. The result is that aging is the most important trait for improving consumers’ appreciation of beef meat. 1
same-paper 2 0.83187598 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
Author: Alexandre D'aspremont, Laurent E. Ghaoui, Michael I. Jordan, Gert R. Lanckriet
Abstract: We examine the problem of approximating, in the Frobenius-norm sense, a positive, semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. 1
3 0.74547845 163 nips-2004-Semi-parametric Exponential Family PCA
Author: Sajama Sajama, Alon Orlitsky
Abstract: We present a semi-parametric latent variable model based technique for density modelling, dimensionality reduction and visualization. Unlike previous methods, we estimate the latent distribution non-parametrically which enables us to model data generated by an underlying low dimensional, multimodal distribution. In addition, we allow the components of latent variable models to be drawn from the exponential family which makes the method suitable for special data types, for example binary or count data. Simulations on real valued, binary and count data show favorable comparison to other related schemes both in terms of separating different populations and generalization to unseen samples. 1
4 0.74109548 42 nips-2004-Computing regularization paths for learning multiple kernels
Author: Francis R. Bach, Romain Thibaux, Michael I. Jordan
Abstract: The problem of learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be achieved via the regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained by using numerical continuation techniques, and involves a running time complexity that is a constant times the complexity of solving the problem for one value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of the block 1-norm regularization differs notably from the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case. 1
5 0.73742282 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate the problem of speech separation as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately-recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [1]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures. 1
6 0.73718667 70 nips-2004-Following Curved Regularized Optimization Solution Paths
7 0.7369591 124 nips-2004-Multiple Alignment of Continuous Time Series
8 0.73589063 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
9 0.73477656 44 nips-2004-Conditional Random Fields for Object Recognition
10 0.73409933 125 nips-2004-Multiple Relational Embedding
11 0.73356354 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
12 0.73293054 178 nips-2004-Support Vector Classification with Input Data Uncertainty
13 0.73244578 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
14 0.73234546 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
15 0.73099607 131 nips-2004-Non-Local Manifold Tangent Learning
16 0.73089057 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
17 0.73036081 161 nips-2004-Self-Tuning Spectral Clustering
18 0.73022586 102 nips-2004-Learning first-order Markov models for control
19 0.72977263 127 nips-2004-Neighbourhood Components Analysis
20 0.72906131 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models