nips nips2009 nips2009-223 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yiming Ying, Kaizhu Huang, Colin Campbell
Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. [sent-2, score-0.408]
2 The sparse representation involves a mixed-norm regularization which is non-convex. [sent-3, score-0.191]
3 We then show that it can be equivalently formulated as a convex saddle (min-max) problem. [sent-4, score-0.233]
4 From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. [sent-5, score-0.686]
5 Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets. [sent-6, score-0.338]
6 1 Introduction For many machine learning algorithms, the choice of a distance metric has a direct impact on their success. [sent-7, score-0.36]
7 Hence, choosing a good distance metric remains a challenging problem. [sent-8, score-0.36]
8 There has been much work attempting to exploit a distance metric in many learning settings, e.g. [sent-9, score-0.36]
9 These methods have shown that a good distance metric can significantly improve the performance of k-nearest neighbor classification and k-means clustering, for example. [sent-12, score-0.399]
10 A good choice of a distance metric generally preserves the distance structure of the data: the distance between examples exhibiting similarity should be relatively smaller, in the transformed space, than between examples exhibiting dissimilarity. [sent-13, score-0.686]
11 Since it is very common that the presented data is contaminated by noise, especially for high-dimensional datasets, a good distance metric should also be minimally influenced by noise. [sent-16, score-0.36]
12 In this case, a low-rank distance matrix would produce better generalization performance than its non-sparse counterparts and provide a much faster and more efficient distance calculation for test samples. [sent-17, score-0.27]
13 Hence, a good distance metric should also pursue dimension reduction during the learning process. [sent-18, score-0.408]
14 In this paper we present a novel approach to learn a low-rank (sparse) distance matrix. [sent-19, score-0.121]
15 We first propose in Section 2 a novel metric learning model for estimating the linear transformation (equivalently distance matrix) that combines and retains the advantages of existing methods [8, 9, 12, 20, 22, 23, 25]. [sent-20, score-0.393]
16 Our method can simultaneously conduct dimension reduction and learn a low-rank distance matrix. [sent-21, score-0.169]
17 The sparse representation is realized by a mixed-norm regularization used in various learning settings [1, 18, 21]. [sent-22, score-0.211]
18 We then show that this non-convex mixed-norm regularization framework is equivalent to a convex saddle (min-max) problem. [sent-23, score-0.281]
19 Based on this equivalent representation, we develop, in Section 3, Nesterov’s smooth optimization approach [16, 17] for sparse metric learning using smoothing approximation techniques, although the learning model is based on a non-differentiable loss function. [sent-24, score-0.593]
20 In Section 4, we demonstrate the effectiveness and efficiency of our sparse metric learning model with experiments on various datasets. [sent-25, score-0.338]
21 Here each input xi = (xi^1, . . . , xi^d) ∈ R^d has class label yi (not necessarily binary), and we let xij = xi − xj. [sent-39, score-0.433]
22 Denote by x̂i = P xi for any i ∈ Nn and by x̂ = {x̂i : i ∈ Nn} the transformed data matrix. [sent-41, score-0.158]
23 The linear transformation matrix P induces a distance matrix M = P^T P which defines a distance between xi and xj given by dM(xi, xj) = (xi − xj)^T M (xi − xj). [sent-42, score-0.827]
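A minimal NumPy sketch (illustrative, not the authors' code) of the relation above: the squared distance under M = P^T P equals the squared Euclidean distance between the transformed points P xi and P xj. The helper name `mahalanobis_sq` and the dimensions are assumptions.

```python
import numpy as np

def mahalanobis_sq(xi, xj, P):
    """Squared distance under the induced matrix M = P.T @ P (illustrative helper)."""
    diff = xi - xj
    M = P.T @ P                       # induced positive semi-definite distance matrix
    return float(diff @ M @ diff)

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=5), rng.normal(size=5)
P = rng.normal(size=(3, 5))           # maps R^5 into a 3-dimensional space
assert np.isclose(mahalanobis_sq(xi, xj, P),
                  np.sum((P @ xi - P @ xj) ** 2))   # same quantity, two views
```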
24 Our sparse metric learning model is based on two principal hypotheses: 1) a good choice of distance matrix M should preserve the distance structure, i.e. [sent-43, score-0.628]
25 the distance between similar examples should be relatively smaller than between dissimilar examples; 2) a good distance matrix should also be able to effectively remove noise leading to dimension reduction. [sent-45, score-0.336]
26 Equivalently, ‖x̂j − x̂k‖^2 ≥ ‖x̂i − x̂j‖^2 + 1, ∀(xi, xj) ∈ S and (xj, xk) ∈ D. (1) [sent-47, score-0.504]
27 For the second hypothesis, we use a sparse regularization to give a sparse solution. [sent-48, score-0.255]
28 This regularization ranges from element-sparsity for variable selection to a low-rank matrix for dimension reduction [1, 2, 3, 13, 21]. [sent-49, score-0.133]
29 Then x̂i^ℓ = Pℓ xi = 0, which means that Pℓ = 0 (a zero ℓ-th row of P) has the effect of eliminating the ℓ-th variable. [sent-53, score-0.128]
30 Hence, we introduce an extra orthonormal transformation U ∈ O^d and let x̂i = P U xi. [sent-74, score-0.181]
31 Denote a set of triplets T by T = {τ = (i, j, k) : i, j, k ∈ Nn, (xi, xj) ∈ S and (xj, xk) ∈ D}. [sent-75, score-0.193]
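A hypothetical sketch of how such a triplet set could be assembled from a similar set S and a dissimilar set D keyed by the shared point xj; the function name `build_triplets` and the pair representation are assumptions, not the authors' code.

```python
def build_triplets(S, D):
    """S, D: iterables of index pairs; returns triplets (i, j, k) with (i, j) in S, (j, k) in D."""
    by_j = {}
    for j, k in D:                     # group dissimilar pairs by their anchor j
        by_j.setdefault(j, []).append(k)
    triplets = []
    for i, j in S:                     # every similar pair (x_i, x_j) ...
        for k in by_j.get(j, []):      # ... combined with every dissimilar pair (x_j, x_k)
            triplets.append((i, j, k))
    return triplets

S = [(0, 1), (2, 1)]                   # index pairs into the training set
D = [(1, 3), (1, 4)]
print(build_triplets(S, D))            # [(0, 1, 3), (0, 1, 4), (2, 1, 3), (2, 1, 4)]
```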
32 (2) By introducing slack variables ξ in constraints (1), we propose the following sparse (low-rank) distance matrix learning formulation: min_{U∈O^d} min_{W∈S^d_+} Σ_{τ∈T} ξτ + γ‖W‖^2_{(2,1)} s.t. [sent-76, score-0.314]
33 1 + xij^T U W U^T xij ≤ xkj^T U W U^T xkj + ξτ, ξτ ≥ 0, ∀τ = (i, j, k) ∈ T, and W ∈ S^d_+. (3) [sent-78, score-1.08]
34 A similar mixed (2, 1)-norm regularization was used in [1, 18] for multi-task learning and multi-class classification to learn the sparse representation shared across different tasks or classes. [sent-80, score-0.191]
35 1 Equivalent Saddle Representation We now turn our attention to an equivalent saddle (min-max) representation for sparse metric learning (3) which is essential for developing optimization algorithms in the next section. [sent-82, score-0.606]
36 To this end, we need the following lemma which develops and extends a similar version in multi-task learning [1, 2] to the case of learning a positive semi-definite distance matrix. [sent-83, score-0.149]
37 Problem (3) is equivalent to the following convex optimization problem: min_{M⪰0} Σ_{τ=(i,j,k)∈T} (1 + xij^T M xij − xkj^T M xkj)+ + γ(Tr(M))^2. (4) Proof. [sent-85, score-1.257]
38 1 + xij^T M xij ≤ xkj^T M xkj + ξτ, ξτ ≥ 0 ∀τ = (i, j, k) ∈ T, and M ∈ S^d_+. [sent-89, score-1.08]
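As a concrete reading of problem (4), here is a hedged NumPy sketch of its objective: hinge terms over the triplets plus the γ(Tr(M))^2 penalty. The function name `objective_4` is hypothetical and this is not the authors' implementation.

```python
import numpy as np

def objective_4(M, X, triplets, gamma):
    """X: (n, d) data matrix; triplets: list of (i, j, k); M: (d, d) PSD matrix."""
    total = 0.0
    for i, j, k in triplets:
        x_ij = X[i] - X[j]
        x_kj = X[k] - X[j]
        margin = 1.0 + x_ij @ M @ x_ij - x_kj @ M @ x_kj
        total += max(margin, 0.0)        # hinge (.)_+ on the violated constraint
    return total + gamma * np.trace(M) ** 2
```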
39 From the above lemma, we are ready to present an equivalent saddle (min-max) representation of problem (3). [sent-104, score-0.204]
40 Problem (4) is equivalent to the following saddle representation: min_{u∈Q1} max_{M∈Q2} ⟨Σ_{τ=(i,j,k)∈T} uτ (xjk xjk^T − xij xij^T), M⟩ − γ(Tr(M))^2 − Σ_{τ∈T} uτ. (7) Proof. [sent-109, score-0.958]
41 By its definition, there holds γ(Tr(M*))^2 ≤ Σ_{τ∈T} (1 + xij^T M xij − xkj^T M xkj)+ + γ(Tr(M))^2 for any M ⪰ 0. [sent-111, score-0.867]
42 Hence, problem (4) is identical to minimizing Σ_{τ∈T} (1 + xij^T M xij − xkj^T M xkj)+ + γ(Tr(M))^2. [sent-113, score-1.08]
43 Consequently, the above equation can be written as min_{M∈Q2} max_{0≤u≤1} Σ_{τ∈T} uτ (1 + xij^T M xij − xkj^T M xkj) + γ(Tr(M))^2. [sent-115, score-0.831]
44 Combining this with the fact that xjk^T M xjk − xij^T M xij = ⟨xjk xjk^T − xij xij^T, M⟩ completes the proof of the theorem. [sent-119, score-2.349]
45 2 Related Work There is a considerable amount of work on metric learning. [sent-121, score-0.239]
46 In [9], an information-theoretic approach to metric learning (ITML) is developed which equivalently transforms the metric learning problem to that of learning an optimal Gaussian distribution with respect to a relative entropy. [sent-122, score-0.512]
47 The method of Relevant Component Analysis (RCA) [7] attempts to find a distance metric which minimizes the covariance matrix imposed by the equivalence constraints. [sent-123, score-0.388]
48 In [25], a distance metric for k-means clustering is learned to shrink the average distance within the similar set while simultaneously enlarging the average distance within the dissimilar set. [sent-124, score-0.647]
49 All the above methods generally do not yield sparse solutions and only work within their special settings. [sent-125, score-0.12]
50 There are many other metric learning approaches in either the unsupervised or the supervised learning setting; see [26] for a detailed review. [sent-127, score-0.239]
51 We particularly mention the following work which is more related to our sparse metric learning model (3). [sent-128, score-0.338]
52 • Large Margin Nearest Neighbor (LMNN) [23, 24]: LMNN aims to explore a large margin nearest neighbor classifier by exploiting nearest neighbor samples as side information in the training set. [sent-129, score-0.178]
53 Specifically, let Nk(x) denote the k-nearest neighbors of sample x and define the similar set S = {(xi, xj) : xi ∈ Nk(xj), yi = yj} and the dissimilar set D = {(xj, xk) : xk ∈ Nk(xj), yk ≠ yj}. [sent-130, score-0.35]
54 Then, recalling that the triplet set T is given by equation (2), the LMNN framework can be rewritten as follows: min_{M⪰0} Σ_{τ=(i,j,k)∈T} (1 + xij^T M xij − xkj^T M xkj)+ + γ Tr(CM), (9) where the covariance matrix C over the similar set S is defined by C = Σ_{(xi,xj)∈S} (xi − xj)(xi − xj)^T. [sent-131, score-1.393]
55 From the above reformulation, we see that LMNN also involves a sparse regularization term Tr(CM ). [sent-132, score-0.156]
56 However, LMCA controls the sparsity by directly specifying the dimensionality of the transformation matrix and it is an extended version of LMNN. [sent-135, score-0.091]
57 The learned sparse model would not generate an appropriate low-rank principal matrix M for metric learning. [sent-138, score-0.386]
58 Such a restriction would only result in a sub-optimal solution, although the final optimization is an efficient linear programming problem. [sent-140, score-0.097]
59 3 Smooth Optimization Algorithms Nesterov [17, 16] developed an efficient smooth optimization method for solving convex programming problems of the form min_{x∈Q} f(x), where Q is a bounded closed convex set in a finite-dimensional real vector space E. [sent-141, score-0.347]
60 This smooth optimization usually requires f to be differentiable with a Lipschitz continuous gradient, and it has an optimal convergence rate of O(1/t^2) for smooth problems, where t is the iteration number. [sent-142, score-0.396]
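For concreteness, here is a generic, unconstrained accelerated-gradient sketch with the O(1/t^2) rate for smooth convex objectives. It is a simplified illustration, not the constrained scheme of [17] used in the paper, which additionally handles the set Q and a prox-function; the example objective below is an assumption.

```python
import numpy as np

def nesterov_agd(grad, x0, lipschitz, iters=100):
    """grad: gradient oracle of a smooth convex f; lipschitz: Lipschitz constant of grad."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y) / lipschitz                    # gradient step from the lookahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)    # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: minimize 0.5*||Ax - b||^2, whose gradient is A^T(Ax - b) with constant ||A||_2^2.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
x_star = nesterov_agd(lambda x: A.T @ (A @ x - b), np.zeros(2),
                      lipschitz=np.linalg.norm(A, 2) ** 2)
```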
61 Unfortunately, we cannot directly apply the smooth optimization method to problem (4) since the hinge loss there is not continuously differentiable. [sent-143, score-0.224]
62 Below we show that the smoothing approximation method [17] can be applied through the saddle representation (7). [sent-144, score-0.319]
63 Let E2* be the dual space of E2 with standard norm defined, for any s ∈ E2*, by ‖s‖2* = max{⟨s, x⟩2 : ‖x‖2 = 1}, where the scalar product ⟨·,·⟩2 denotes the value of s at x. [sent-152, score-0.09]
64 (11) Here, φ(u) is assumed to be continuously differentiable and convex with Lipschitz continuous gradient, and f̂(x) is convex and differentiable. [sent-164, score-0.139]
65 The above min-max problem is usually not smooth, and Nesterov [17] proposed a smoothing approximation approach to solve it: min_{u∈Q1} φµ(u) = φ(u) + max{⟨Au, x⟩2 − f̂(x) − µ d2(x) : x ∈ Q2}. [sent-165, score-0.199]
66 (12) Here, d2(·) is a continuous proxy-function, strongly convex on Q2 with some convexity parameter σ2 > 0, and µ > 0 is a small smoothing parameter. [sent-166, score-0.102]
67 Hence, the proxy-function d2 can be regarded as a generalized Moreau–Yosida regularization term to smooth out the objective function. [sent-175, score-0.197]
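A one-dimensional illustration of this smoothing mechanism (not the exact construction in (12), which penalizes the matrix variable M with a prox-function): maximizing u*z − (µ/2)u^2 over u ∈ [0, 1] gives a smooth, Huber-like surrogate of the hinge (z)_+ whose maximum deviation from the hinge is µ/2.

```python
import numpy as np

def smoothed_hinge(z, mu):
    u = np.clip(z / mu, 0.0, 1.0)        # closed-form maximizer over [0, 1]
    return u * z - 0.5 * mu * u ** 2

z = np.linspace(-2, 2, 5)
print(smoothed_hinge(z, mu=0.1))         # close to np.maximum(z, 0) for small mu
print(np.maximum(z, 0.0))
```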
68 Hence, we can apply the optimal smooth optimization scheme [17, Section 3] to the smooth approximate problem (12). [sent-177, score-0.369]
69 Below, we will apply this general scheme to solve the min-max representation (7) of the sparse metric learning problem (3), and hence solve the original problem (4). [sent-183, score-0.425]
70 2 Smooth Optimization Approach for Sparse Metric Learning We now turn our attention to developing a smooth optimization approach for problem (4). [sent-185, score-0.204]
71 Our main idea is to connect the saddle representation (7) in Theorem 1 with the special formulation (11). [sent-186, score-0.2]
72 To this end, firstly let E1 = R^T with the standard Euclidean norm ‖·‖1 = ‖·‖ and E2 = S^d with the Frobenius norm defined, for any S ∈ S^d, by ‖S‖2^2 = Σ_{i,j∈Nd} Sij^2. [sent-187, score-0.094]
73 Consequently, the proxy-function d2 (·) is strongly convex on Q2 with convexity parameter σ2 = 2. [sent-190, score-0.102]
74 Finally, for any τ = (i, j, k) ∈ T, let Xτ = xjk xjk^T − xij xij^T. [sent-191, score-0.87]
75 The linear operator A in (14) is given by Au = Σ_{τ∈T} uτ Xτ. With the above preparations, the saddle representation (7) exactly matches the special structure (11), which can be approximated by problem (12) with µ sufficiently small. [sent-194, score-0.2]
76 Let the linear operator A be defined as above; then ‖A‖1,2 ≤ (2 Σ_{τ∈T} ‖Xτ‖^2)^{1/2}, where, for any M ∈ S^d, ‖M‖ denotes the Frobenius norm of M. [sent-197, score-0.101]
77 We now can adapt the smooth optimization [17, Section 3 and Theorem 3] to solve the smooth approximation formulation (12) for metric learning. [sent-200, score-0.583]
78 The smooth optimization pseudo-code for problem (7) (equivalently problem (4)) is outlined in Table 1. [sent-204, score-0.204]
79 The efficiency of Nesterov’s smooth optimization largely depends on Steps 2, 3, and 4 in Table 1. [sent-206, score-0.204]
80 Problem (15) is equivalent to the following: s* = arg max{Σ_{i∈Nd} λi si − γ(Σ_{i∈Nd} si)^2 − µ Σ_{i∈Nd} si^2 : Σ_{i∈Nd} si ≤ T/γ, and si ≥ 0 ∀i ∈ Nd}, (16) where λ = (λ1, … [sent-211, score-0.174]
81 Moreover, if we denote the eigendecomposition of Σ_{τ∈T} uτ Xτ by Σ_{τ∈T} uτ Xτ = U diag(λ) U^T with some U ∈ O^d, then the optimal solution of problem (15) is given by Mµ(u) = U diag(s*) U^T. [sent-215, score-0.129]
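A hedged sketch of the structure exploited here: the inner maximizer Mµ(u) shares eigenvectors with Σ_{τ∈T} uτ Xτ, so only the eigenvalue vector s has to be optimized. The paper solves (16) directly; below, that step is delegated to a generic SciPy solver, and the names `inner_solution` and `trace_cap` (standing in for the bound in (16)) are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def inner_solution(u, X_list, gamma, mu, trace_cap):
    """u: weights per triplet; X_list: the matrices X_tau; returns an approximation of M_mu(u)."""
    A_u = sum(u_t * X_t for u_t, X_t in zip(u, X_list))
    lam, U = np.linalg.eigh(A_u)                      # eigendecomposition U diag(lam) U^T
    d = lam.shape[0]

    def neg_obj(s):                                   # maximize lam.s - gamma*(sum s)^2 - mu*||s||^2
        return -(lam @ s - gamma * s.sum() ** 2 - mu * s @ s)

    cons = [{'type': 'ineq', 'fun': lambda s: trace_cap - s.sum()}]
    res = minimize(neg_obj, np.zeros(d), bounds=[(0.0, None)] * d, constraints=cons)
    return U @ np.diag(res.x) @ U.T                   # M_mu(u) = U diag(s*) U^T
```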
82 As listed in Table 1, the complexity of each iteration mainly depends on the eigendecomposition of Σ_{τ∈T} uτ Xτ and the quadratic programming used to solve problem (15), which has complexity O(d^3). [sent-222, score-0.086]
83 Hence, the overall iteration complexity of the smooth optimization approach for sparse metric learning is of the order O(d^3/ε) for finding an ε-optimal solution. [sent-223, score-0.542]
84 We also implemented the iterative sub-gradient descent algorithm [24] to solve the proposed framework (4) (called SMLgd) in order to evaluate the efficiency of the proposed smooth optimization algorithm SMLsm. [sent-228, score-0.204]
85 We use all these methods to learn a good distance metric, and a KNN classifier is then used to examine the performance of the different learned metrics. [sent-229, score-0.36]
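An illustrative evaluation protocol of this kind (not the authors' scripts): factor a learned PSD matrix M through its eigendecomposition, project the data, and score a KNN classifier on the projected features, where plain Euclidean distance then coincides with dM. The helper name and the choice of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy_with_metric(M, X_train, y_train, X_test, y_test, k=3):
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)                     # guard against tiny negative eigenvalues
    L = V * np.sqrt(w)                            # columns scaled so that M = L L^T
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train @ L, y_train)                 # Euclidean distance here equals d_M
    return knn.score(X_test @ L, y_test)
```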
86 All the approaches except the Euclidean distance need to define a triplet set T before training. [sent-235, score-0.157]
87 It is observed that our model outputs the most sparse metric. [sent-245, score-0.099]
88 That is, our method directly learns both an accurate and sparse distance metric simultaneously. [sent-247, score-0.459]
89 ITML and Euc do not generate a sparse metric at all. [sent-249, score-0.338]
90 Finally, in order to examine the efficiency of the proposed smooth optimization algorithm, we plot the convergence graphs of SMLsm versus those of SMLgd in Figure 1(i)-(l). [sent-250, score-0.227]
91 5 Conclusion In this paper we proposed a novel regularization framework for learning a sparse (low-rank) distance matrix. [sent-254, score-0.277]
92 This model was realized by a mixed-norm regularization term over a distance matrix which is non-convex. [sent-255, score-0.226]
93 Using its special structure, it was shown to be equivalent to a convex min-max (saddle) representation involving a trace norm regularization. [sent-256, score-0.205]
94 Starting from the saddle representation, we successfully developed an efficient Nesterov first-order optimization approach [16, 17] for our metric learning model. [sent-257, score-0.447]
95 Experimental results on various datasets show that our sparse metric learning framework outperforms other state-of-the-art methods with higher accuracy and significantly smaller dimensionality. [sent-258, score-0.338]
96 Subfigures (a)-(d) present the average error rates; (e)-(h) plots the average dimensionality used in different methods; (i)-(l) give the convergence graph for the sub-gradient algorithm and the proposed smooth optimization algorithm. [sent-334, score-0.227]
97 Learning a similarity metric discriminatively with application to face verification. [sent-386, score-0.239]
98 Distance metric learning for large margin nearest neighbour classification. [sent-479, score-0.308]
99 Fast solvers and efficient implementations for distance metric learning. [sent-486, score-0.36]
100 Distance metric learning with application to clustering with side information. [sent-493, score-0.239]
wordName wordTfidf (topN-words)
[('smlsm', 0.398), ('smlgd', 0.378), ('xkj', 0.279), ('xij', 0.261), ('smllp', 0.259), ('metric', 0.239), ('euc', 0.199), ('lmnn', 0.194), ('itml', 0.176), ('xjk', 0.174), ('saddle', 0.144), ('smooth', 0.14), ('vki', 0.139), ('tr', 0.133), ('distance', 0.121), ('xj', 0.108), ('nesterov', 0.1), ('sparse', 0.099), ('au', 0.09), ('minu', 0.08), ('iris', 0.065), ('optimization', 0.064), ('xi', 0.064), ('vkj', 0.06), ('xk', 0.058), ('regularization', 0.057), ('wine', 0.057), ('dim', 0.057), ('epoch', 0.057), ('convex', 0.055), ('ut', 0.053), ('minv', 0.052), ('nn', 0.051), ('lipschitz', 0.048), ('norm', 0.047), ('dissimilar', 0.045), ('lmca', 0.04), ('neighbor', 0.039), ('margin', 0.038), ('diag', 0.037), ('rt', 0.037), ('triplet', 0.036), ('ionosphere', 0.036), ('representation', 0.035), ('bristol', 0.035), ('iono', 0.035), ('equivalently', 0.034), ('vk', 0.033), ('min', 0.033), ('transformation', 0.033), ('programming', 0.033), ('wd', 0.032), ('bal', 0.032), ('cm', 0.031), ('operator', 0.031), ('nearest', 0.031), ('sparsity', 0.03), ('xik', 0.03), ('transformed', 0.03), ('differentiable', 0.029), ('argyriou', 0.028), ('matrix', 0.028), ('lemma', 0.028), ('convexity', 0.028), ('triplets', 0.027), ('rennie', 0.027), ('exhibiting', 0.027), ('hence', 0.027), ('nips', 0.027), ('reduction', 0.027), ('ciency', 0.027), ('euclidean', 0.026), ('si', 0.026), ('smoothing', 0.026), ('enforce', 0.026), ('knn', 0.026), ('collapsing', 0.026), ('balance', 0.026), ('max', 0.025), ('equivalent', 0.025), ('scheme', 0.025), ('minx', 0.024), ('xt', 0.024), ('denotes', 0.023), ('convergence', 0.023), ('accelerate', 0.022), ('pd', 0.022), ('trace', 0.022), ('weinberger', 0.021), ('dimension', 0.021), ('theorem', 0.021), ('special', 0.021), ('realized', 0.02), ('arg', 0.02), ('principal', 0.02), ('dual', 0.02), ('orthonormal', 0.02), ('putting', 0.02), ('hinge', 0.02), ('strongly', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 223 nips-2009-Sparse Metric Learning via Smooth Optimization
Author: Yiming Ying, Kaizhu Huang, Colin Campbell
Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
2 0.30749479 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
Author: Rong Jin, Shijun Wang, Yang Zhou
Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.
3 0.12988304 33 nips-2009-Analysis of SVM with Indefinite Kernels
Author: Yiming Ying, Colin Campbell, Mark Girolami
Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
4 0.11940141 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton Hengel
Abstract: The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. B OOST M ETRIC is instead based on a key observation that any positive semidefinite matrix can be decomposed into a linear positive combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting method is easy to implement, does not require tuning, and can accommodate various types of constraints. Experiments on various datasets show that the proposed algorithm compares favorably to those state-of-the-art methods in terms of classification accuracy and running time. 1
5 0.11697632 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu
Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1
6 0.096920088 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
7 0.086930014 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
8 0.083794534 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
9 0.077791899 90 nips-2009-Factor Modeling for Advertisement Targeting
10 0.072977982 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
11 0.070787936 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
12 0.069478944 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
13 0.069146372 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
14 0.064214379 147 nips-2009-Matrix Completion from Noisy Entries
15 0.062542491 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
16 0.061576787 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
17 0.059266675 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
18 0.056326285 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
19 0.055784341 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
20 0.055718143 87 nips-2009-Exponential Family Graph Matching and Ranking
topicId topicWeight
[(0, -0.183), (1, 0.156), (2, -0.015), (3, 0.074), (4, 0.014), (5, 0.01), (6, 0.0), (7, 0.018), (8, 0.046), (9, 0.084), (10, 0.079), (11, 0.003), (12, 0.115), (13, 0.077), (14, 0.005), (15, -0.209), (16, 0.119), (17, -0.07), (18, -0.054), (19, 0.004), (20, -0.165), (21, -0.114), (22, -0.071), (23, -0.009), (24, 0.04), (25, 0.094), (26, -0.032), (27, -0.178), (28, -0.025), (29, 0.024), (30, -0.005), (31, 0.085), (32, 0.137), (33, 0.005), (34, -0.147), (35, 0.055), (36, -0.009), (37, 0.021), (38, -0.119), (39, 0.014), (40, -0.133), (41, -0.055), (42, -0.105), (43, 0.083), (44, -0.094), (45, -0.003), (46, 0.02), (47, 0.039), (48, -0.055), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.94386762 223 nips-2009-Sparse Metric Learning via Smooth Optimization
Author: Yiming Ying, Kaizhu Huang, Colin Campbell
Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
2 0.84649521 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
Author: Rong Jin, Shijun Wang, Yang Zhou
Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.
3 0.83175874 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton Hengel
Abstract: The learning of appropriate distance metrics is a critical problem in image classification and retrieval. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a Mahalanobis distance metric. One of the primary difficulties in learning such a metric is to ensure that the Mahalanobis matrix remains positive semidefinite. Semidefinite programming is sometimes used to enforce this constraint, but does not scale well. B OOST M ETRIC is instead based on a key observation that any positive semidefinite matrix can be decomposed into a linear positive combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting method is easy to implement, does not require tuning, and can accommodate various types of constraints. Experiments on various datasets show that the proposed algorithm compares favorably to those state-of-the-art methods in terms of classification accuracy and running time. 1
4 0.59910089 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu
Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1
5 0.52949351 33 nips-2009-Analysis of SVM with Indefinite Kernels
Author: Yiming Ying, Colin Campbell, Mark Girolami
Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
6 0.52527976 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
7 0.4485096 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
8 0.43513963 90 nips-2009-Factor Modeling for Advertisement Targeting
9 0.41208962 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
10 0.41176978 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
11 0.3987751 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
12 0.39568308 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
13 0.38724363 180 nips-2009-On the Convergence of the Concave-Convex Procedure
14 0.3864367 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness
15 0.3712683 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
16 0.36204994 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
17 0.3547588 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
19 0.34037012 72 nips-2009-Distribution Matching for Transduction
20 0.33995593 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence
topicId topicWeight
[(24, 0.034), (25, 0.03), (35, 0.044), (36, 0.075), (39, 0.036), (44, 0.027), (58, 0.522), (61, 0.046), (71, 0.027), (86, 0.068)]
simIndex simValue paperId paperTitle
1 0.97852051 43 nips-2009-Bayesian estimation of orientation preference maps
Author: Sebastian Gerwinn, Leonard White, Matthias Kaschube, Matthias Bethge, Jakob H. Macke
Abstract: Imaging techniques such as optical imaging of intrinsic signals, 2-photon calcium imaging and voltage sensitive dye imaging can be used to measure the functional organization of visual cortex across different spatial and temporal scales. Here, we present Bayesian methods based on Gaussian processes for extracting topographic maps from functional imaging data. In particular, we focus on the estimation of orientation preference maps (OPMs) from intrinsic signal imaging data. We model the underlying map as a bivariate Gaussian process, with a prior covariance function that reflects known properties of OPMs, and a noise covariance adjusted to the data. The posterior mean can be interpreted as an optimally smoothed estimate of the map, and can be used for model based interpolations of the map from sparse measurements. By sampling from the posterior distribution, we can get error bars on statistical properties such as preferred orientations, pinwheel locations or pinwheel counts. Finally, the use of an explicit probabilistic model facilitates interpretation of parameters and quantitative model comparisons. We demonstrate our model both on simulated data and on intrinsic signaling data from ferret visual cortex. 1
Author: John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma
Abstract: Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized “robust principal component analysis” problem of recovering a low rank matrix A from corrupted observations D = A + E. Here, the corrupted entries E are unknown and the errors can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns by solving a simple convex program, for which we give a fast and provably convergent algorithm. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction of its entries. Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision. 1
3 0.96307343 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems
Author: Marco Cuturi, Jean-philippe Vert, Alexandre D'aspremont
Abstract: We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. Deviations from this behavior are used to flag anomalies. The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. We provide experimental results on simulated datasets which show that these techniques compare favorably with other algorithms.
same-paper 4 0.95765501 223 nips-2009-Sparse Metric Learning via Smooth Optimization
Author: Yiming Ying, Kaizhu Huang, Colin Campbell
Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
5 0.95467615 67 nips-2009-Directed Regression
Author: Yi-hao Kao, Benjamin V. Roy, Xiang Yan
Abstract: When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative. We also develop a theory that motivates the algorithm. 1
6 0.94087088 81 nips-2009-Ensemble Nystrom Method
7 0.77480578 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
8 0.77177984 147 nips-2009-Matrix Completion from Noisy Entries
9 0.75893903 177 nips-2009-On Learning Rotations
10 0.72643924 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering
11 0.72445142 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
12 0.70316368 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
13 0.70249629 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies
14 0.7000562 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
15 0.69880712 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
16 0.69852567 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
17 0.69685727 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
18 0.69455576 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints
19 0.69239587 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
20 0.69074374 124 nips-2009-Lattice Regression