jmlr jmlr2012 jmlr2012-33 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
Reference: text
sentIndex sentText sentNum sentScore
1 Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). [sent-6, score-0.534]
2 Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. [sent-8, score-0.372]
3 This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. [sent-9, score-0.404]
4 Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification. [sent-14, score-0.485]
5 Introduction: Distance metrics are fundamental concepts in machine learning, since a proper choice of metric has crucial effects on the performance of both supervised and unsupervised learning algorithms. [sent-15, score-0.202]
6 The k-means algorithm depends on the pairwise distance measurements between examples for clustering, and most information retrieval methods rely on a distance metric to identify the data points that are most similar to a given query. [sent-17, score-0.372]
7 Recently, learning a distance metric from data has been actively studied in machine learning (Bar-Hillel et al. [sent-18, score-0.266]
8 Most metric learning methods attempt to learn a distance metric from side information which is often available in the form of pairwise constraints, that is, pairs of similar data points and pairs of dissimilar data points. [sent-32, score-0.771]
9 A common theme in metric learning is to learn a distance metric such that the distance between similar examples should be relatively smaller than that between dissimilar examples. [sent-37, score-0.671]
10 Although the distance metric can be a general function, the most prevalent one is the Mahalanobis metric defined by d_M(x_i, x_j) = sqrt((x_i − x_j)⊤ M (x_i − x_j)), where M is a positive semi-definite (p.s.d.) matrix. [sent-38, score-0.468]
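To make the definition concrete, here is a minimal NumPy sketch of the Mahalanobis distance for a given p.s.d. matrix M; the function name and the round-off clipping are illustrative choices of ours, not from the paper.

```python
import numpy as np

def mahalanobis_distance(xi, xj, M):
    """d_M(xi, xj) = sqrt((xi - xj)^T M (xi - xj)) for a p.s.d. matrix M."""
    d = xi - xj
    sq = float(d @ M @ d)          # squared Mahalanobis distance
    return np.sqrt(max(sq, 0.0))   # clip tiny negatives caused by round-off

# example: M = I recovers the Euclidean distance
x1, x2 = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(mahalanobis_distance(x1, x2, np.eye(2)))   # 5.0
```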
11 In this work we restrict our attention to learning a Mahalanobis metric for k-nearest neighbor (k-NN) classification. [sent-42, score-0.202]
12 However, the proposed methods below can easily be adapted to metric learning for semi-supervised k-means clustering. [sent-43, score-0.202]
13 In particular, we can show our approach is equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Lewis and Overton, 1996; Overton, 1988). [sent-48, score-0.332]
14 Secondly, in contrast to the full eigen-decomposition used in many existing approaches to metric learning, we will develop novel approximate semi-definite programming (SDP) algorithms for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. [sent-52, score-0.284]
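The per-iteration step referred to here reduces to a linear maximization over the spectrahedron {S ⪰ 0, Tr S = 1}, whose maximizer is the outer product of the largest eigenvector. A minimal sketch under that reading (the function name is ours; a dense eigendecomposition is used for clarity, whereas in practice one would compute only the top eigenvector with a Lanczos-type solver):

```python
import numpy as np

def spectrahedron_lmo(G):
    """argmax over {S : S >= 0, tr(S) = 1} of <G, S>, attained at v v^T
    where v is the eigenvector of G with the largest eigenvalue."""
    w, V = np.linalg.eigh(G)      # eigenvalues in ascending order
    v = V[:, -1]
    return np.outer(v, v), w[-1]  # maximizer and the optimal value lambda_max(G)
```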
15 In Section 2, we propose our new approach (DML-eig) for distance metric learning and show its equivalence to the well-known eigenvalue optimization problem. [sent-58, score-0.474]
16 In Section 3, based on eigenvalue optimization formulations of DML-eig and LMNN, we develop novel first-order algorithms. [sent-61, score-0.17]
17 Let S index the similar pairs and D index the dissimilar pairs. [sent-83, score-0.2]
18 Given a set of pairwise distance constraints, the target of metric learning is to find a distance matrix M such that the distance between the dissimilar pairs is large and the distance between the similar pairs is small. [sent-85, score-0.793]
19 In this paper, we propose to maximize the minimal squared distance between dissimilar pairs while maintaining an upper bound on the sum of squared distances between similar pairs, that is, max_{M∈S^d_+} min_{(i,j)∈D} d^2_M(x_i, x_j) s.t. ∑_{(i,j)∈S} d^2_M(x_i, x_j) ≤ 1. [sent-95, score-0.2]
20 As shown in the next subsection, this simple but important property1 plays a critical role in formulating problem (2) as an eigenvalue optimization problem. [sent-108, score-0.17]
21 If labels are known then the learning setting is often referred to as supervised metric learning, which can [sent-111, score-0.202]
22 further be divided into two categories: global metric learning and local metric learning. [sent-118, score-0.404]
23 The global approach learns the distance metric in a global sense, that is, to satisfy all the pairwise constraints simultaneously. [sent-119, score-0.342]
24 (2002) is a global method which used all the similar pairs (same labels) and dissimilar pairs (distinct labels). [sent-121, score-0.261]
25 The local approach is to learn a distance metric only using local pairwise constraints which usually outperforms the global methods as observed in many previous studies. [sent-122, score-0.342]
26 This is reasonable in the case of learning a metric for the k-NN classifiers since k-NN classifiers are influenced most by the data items that are close to the test/query examples. [sent-123, score-0.202]
27 Since we are mainly concerned with learning a metric for k-NN classifier, the pairwise constraints for DML-eig are generated locally, that is, the similar/dissimilar pairs are k-nearest neighbors. [sent-124, score-0.339]
28 Let D be the number of dissimilar pairs and denote the simplex by △ = {u ∈ R^D_+ : ∑_τ u_τ = 1}. [sent-129, score-0.2]
29 Now we can show problem (3) is indeed an eigenvalue optimization problem. [sent-131, score-0.17]
30 Then, problem (4) can further be written as an eigenvalue optimization problem: min_{u∈△} max_{S∈P} ⟨∑_{τ∈D} u_τ X_τ, S⟩ = min_{u∈△} λ_max(∑_{τ∈D} u_τ X_τ). [sent-134, score-0.308]
31 By the min-max theorem, problem (4) can further be written as a well-known eigenvalue optimization problem: min_{u∈△} max_{M∈P} ⟨∑_{τ∈D} u_τ X_τ, M⟩ = min_{u∈△} λ_max(∑_{τ∈D} u_τ X_τ). [sent-142, score-0.308]
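As a sanity check of this formulation, the inner quantity can be evaluated directly: given simplex weights u over the dissimilar pairs, the objective is λ_max(∑_τ u_τ X_τ) with X_τ = (x_i − x_j)(x_i − x_j)⊤. A small illustrative sketch (function names are ours):

```python
import numpy as np

def rank_one(xi, xj):
    """X_tau = (x_i - x_j)(x_i - x_j)^T for a dissimilar pair (x_i, x_j)."""
    d = xi - xj
    return np.outer(d, d)

def dml_eig_objective(u, X_list):
    """lambda_max(sum_tau u_tau X_tau): the inner objective of the DML-eig min-max."""
    G = sum(ut * Xt for ut, Xt in zip(u, X_list))
    return np.linalg.eigvalsh(G)[-1]
```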
32 The problem of minimizing the maximal eigenvalue of a symmetric matrix is well known and has important applications in engineering design; see Overton (1988) and Lewis and Overton (1996). [sent-144, score-0.162]
33 Hereafter, we refer to metric learning formulation (3) (equivalently (4) or (5)) as DML-eig. [sent-145, score-0.259]
34 Weinberger et al. (2005) proposed large margin nearest neighbor classification (LMNN), which is one of the state-of-the-art metric learning methods. [sent-161, score-0.228]
35 In analogy to the above argument for DML-eig, we can also formulate LMNN as a generalized eigenvalue optimization problem. [sent-162, score-0.17]
36 In contrast, LMNN aims to learn a metric using the relative distance constraints which are presented in the form of triplets. [sent-164, score-0.3]
37 With a little abuse of notation, we denote a triplet by τ = (i, j, k), which means that x_i is similar to x_j and x_j is dissimilar to x_k. [sent-165, score-0.212]
38 Given a set S of similar pairs and a set T of triplets, the target of LMNN is to learn a distance metric such that k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. [sent-169, score-0.367]
39 max_{M,ξ} (1 − γ) ∑_{τ∈T} ξ_τ + γ Tr(X_S M)  (13). Using exactly the same argument as in proving the equivalence between (11) and (12), one can show that the above problem (13) is equivalent to max_{M,ξ} min_{τ=(i,j,k)∈T} ⟨C_τ, M⟩ + ξ_τ subject to (1 − γ) ∑_τ ξ_τ + γ Tr(X_S M) = 1 and (M, ξ) ∈ Ω. [sent-202, score-0.176]
40 Using the above min-max representation for LMNN, it is now easy to reformulate LMNN as a generalized eigenvalue optimization problem, as we will do below. [sent-209, score-0.17]
41 Moreover, it can further be written as a generalized eigenvalue optimization problem: min_{u∈△} max{ (1/(1−γ)) u_max, (1/γ) λ_max(∑_{τ∈T} u_τ C_τ) }  (17), where u_max is the maximum element of the vector (u_τ : τ ∈ T). [sent-214, score-0.352]
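Reading (17) as reconstructed above, the inner objective for fixed simplex weights u is the larger of u_max/(1 − γ) and λ_max(∑_τ u_τ C_τ)/γ. A hedged sketch under that assumption (the function name is ours):

```python
import numpy as np

def lmnn_eig_objective(u, C_list, gamma):
    """max{ u_max / (1 - gamma), lambda_max(sum_tau u_tau C_tau) / gamma },
    following the reconstruction of formulation (17) above (an assumption)."""
    G = sum(ut * Ct for ut, Ct in zip(u, C_list))
    return max(u.max() / (1.0 - gamma),
               np.linalg.eigvalsh(G)[-1] / gamma)
```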
42 Since we have formulated LMNN as an eigenvalue optimization problem in the above theorem, hereafter we refer to formulation (16) (equivalently (17)) as LMNN-eig. [sent-225, score-0.227]
43 The above eigenvalue optimization formulation is not restricted to metric learning problems. [sent-226, score-0.429]
44 Its eigenvalue optimization formulation can be found in Appendix A. [sent-230, score-0.227]
45 We can directly employ the entropy smoothing techniques (Nesterov, 2007; Baes and Bürgisser, 2009) for eigenvalue optimization, which, however, need the computation of the full eigen-decomposition per iteration. [sent-233, score-0.218]
46 To this end, for a smoothing parameter µ > 0, define f_µ(S) = min_{u∈△} ∑_{τ∈D} u_τ ⟨X_τ, S⟩ + µ ∑_{τ∈D} u_τ ln u_τ. [sent-237, score-0.166]
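The minimization over the simplex in f_µ has a closed form: with a_τ = ⟨X_τ, S⟩, one gets f_µ(S) = −µ ln ∑_τ exp(−a_τ/µ), attained at u_τ ∝ exp(−a_τ/µ). A small sketch using a numerically stable log-sum-exp (the function name is ours):

```python
import numpy as np

def smoothed_min(a, mu):
    """min over the simplex of sum_t u_t a_t + mu * sum_t u_t ln u_t
    = -mu * log(sum_t exp(-a_t / mu)); also returns the minimizing weights."""
    z = -np.asarray(a, dtype=float) / mu
    zmax = z.max()
    log_sum = zmax + np.log(np.exp(z - zmax).sum())
    u = np.exp(z - log_sum)        # softmin weights, sum to 1
    return -mu * log_sum, u
```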
47 Furthermore, max_{S∈P} f(S) − f(S^t) ≤ 2µ ln D + 8 max_{τ∈D} ‖X_τ‖² / (µ t) + 8 ln D / t. [sent-279, score-0.19]
48 It is easy to see, for any S ∈ P, that |f(S) − f_µ(S)| ≤ µ max_{u∈△} ∑_{τ∈D} (−u_τ ln u_τ) ≤ µ ln D. [sent-281, score-0.19]
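Putting the previous two pieces together, a minimal approximate Frank-Wolfe loop for maximizing f_µ over the spectrahedron might look as follows. This is a sketch under our own assumptions (uniform step size 2/(t+2), dense eigendecomposition instead of a single-eigenvector solver), not the exact algorithm in the paper.

```python
import numpy as np

def dml_eig_frank_wolfe(X_list, mu=1e-2, iters=200):
    """Approximate Frank-Wolfe for max over the spectrahedron of f_mu(S).
    Each iteration: softmin weights u over <X_tau, S>, gradient sum_tau u_tau X_tau,
    and a rank-one update built from its largest eigenvector."""
    d = X_list[0].shape[0]
    S = np.eye(d) / d                                       # feasible starting point
    for t in range(iters):
        a = np.array([np.tensordot(Xt, S) for Xt in X_list])  # <X_tau, S>
        z = -a / mu
        u = np.exp(z - z.max())
        u /= u.sum()                                        # minimizing simplex weights
        G = sum(ut * Xt for ut, Xt in zip(u, X_list))       # gradient of f_mu at S
        _, V = np.linalg.eigh(G)
        v = V[:, -1]                                        # top eigenvector of the gradient
        step = 2.0 / (t + 2.0)
        S = (1.0 - step) * S + step * np.outer(v, v)
    return S
```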
49 Approximate Frank-Wolfe Algorithm for LMNN-eig: We can easily extend the above approximate Frank-Wolfe algorithm to solve the eigenvalue optimization formulation of LMNN-eig (formulation (16) or (17)). [sent-292, score-0.227]
50 If the eigenvalue term λ_max(A)/γ is at least the largest ξ-coefficient divided by 1 − γ, then Z* = v*(v*)⊤/γ, where v* is the largest eigenvector of matrix A and ξ* = 0. [sent-312, score-0.174]
51 Related Work and Discussion: There is a large amount of work on metric learning, including distance metric learning for k-means clustering (Xing et al. [sent-316, score-0.202]
52 , 2005), maximally collapsing metric learning (MCML) (Goldberger et al. [sent-318, score-0.202]
53 , 2004) and an information-theoretic approach to metric learning (ITML) (Davis et al. [sent-320, score-0.202]
54 We refer the readers to Yang and Jin (2007) for a nice survey on metric learning. [sent-322, score-0.202]
55 Below we discuss some specific metric learning models which are closely related to our work. [sent-323, score-0.202]
56 Xing et al. (2002) developed the metric learning model (2) to learn a Mahalanobis metric for k-means clustering. [sent-325, score-0.404]
57 The main idea is to maximize the distance between points in the dissimilarity set under the constraint that the distance between points in the similarity set is upper-bounded. [sent-326, score-0.163]
58 It is worth mentioning that the metric learning model proposed in Xing et al. [sent-337, score-0.202]
59 In contrast to Xing et al. (2002), DML-eig aims to maximize the minimal distance between dissimilar pairs instead of maximizing the summation of their distances. [sent-340, score-0.264]
60 Weinberger et al. (2005) developed a large margin framework to learn a Mahalanobis distance metric for k-nearest neighbor (k-NN) classification (LMNN). [sent-344, score-0.266]
61 Our method DML-eig is a local method which only uses the similar pairs and dissimilar pairs from k-nearest neighbors. [sent-348, score-0.261]
62 Rosales and Fung (2006) proposed the following element-sparse metric learning for high-dimensional data sets: min_{M∈S^d_+} ∑_{τ=(i,j,k)∈T} (1 + x_{ij}⊤ M x_{ij} − x_{kj}⊤ M x_{kj})_+ + γ ∑_{ℓ,k} |M_{ℓk}|, where x_{ij} = x_i − x_j. [sent-366, score-0.248]
63 However, since the pairs of similarly labeled and differently labeled examples are usually of order O(n²), the online learning procedure takes many rank-one matrix updates. [sent-375, score-0.153]
64 (2009) established generalization bounds for large margin metric learning and proposed an adaptive way to adjust the step sizes of the online metric learning method in order to guarantee the output matrix in each step is positive semi-definite. [sent-377, score-0.436]
65 (2009) recently employed the exponential loss for metric learning, which can be written as min_{M∈S^d_+} ∑_{τ=(i,j,k)∈T} e^{⟨C_τ, M⟩} + Tr(M), where T is the triplet set and C_τ = (x_i − x_j)(x_i − x_j)⊤ − (x_j − x_k)(x_j − x_k)⊤ for any τ = (i, j, k) ∈ T. [sent-380, score-0.283]
66 Guillaumin et al. (2009) proposed a metric learning model with logistic regression loss, which is referred to as LDML. [sent-392, score-0.202]
67 Briefly speaking, for each training point x_i, k nearest neighbors that have the same label as y_i (targets) as well as k nearest neighbors that have different labels from y_i (impostors) are found. [sent-414, score-0.17]
68 From x_i and its corresponding targets and impostors, we then construct the set of similar pairs S (same labels), the set of dissimilar pairs D (distinct labels), and the set of triplets T, as sketched below. [sent-415, score-0.351]
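A minimal sketch of this construction in the Euclidean metric; the choice of k and the brute-force distance computation are assumptions of ours, since the paper only specifies that targets and impostors are taken among k-nearest neighbors.

```python
import numpy as np

def build_pairs_and_triplets(X, y, k=3):
    """Similar pairs S (anchor, target), dissimilar pairs D (anchor, impostor) and
    triplets T = (anchor, target, impostor), with targets/impostors taken from the
    k nearest same-label / different-label neighbors of each point."""
    n = X.shape[0]
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)
    S, D, T = [], [], []
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(y != y[i])
        targets = same[np.argsort(dist2[i, same])[:k]]
        impostors = diff[np.argsort(dist2[i, diff])[:k]]
        S += [(i, j) for j in targets]
        D += [(i, l) for l in impostors]
        T += [(i, j, l) for j in targets for l in impostors]
    return S, D, T
```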
69 Furthermore, the data is challenging and difficult due to face variations in scale, pose, lighting, background, expression, hairstyle, and glasses, as the faces are detected in images in the wild, taken from Yahoo! [sent-421, score-0.226]
70 Table 4: Average test error (%) of different metric learning methods (standard deviations are in parentheses). [sent-555, score-0.202]
71 Hence, learning a Mahalanobis metric from training data does lead to improvements in k-NN classification. [sent-624, score-0.202]
72 The task of face verification is to determine whether two face images are from the same identity or not. [sent-650, score-0.3]
73 of the face images poses great challenges to the face verification algorithms. [sent-653, score-0.3]
74 Metric learning provides a viable solution by comparing the image pairs based on the metric learnt from the face data. [sent-655, score-0.384]
75 Here we evaluate our new metric learning method using a large scale face database—Labeled Faces in the Wild (LFW) (Huang et al. [sent-656, score-0.323]
76 These face images are automatically captured from news articles on the web. [sent-659, score-0.179]
77 In each fold, there are 300 pairs of images from the same identity and another 300 pairs of images from different identities. [sent-671, score-0.238]
78 We investigated several descriptors (features) from face images in this experiment. [sent-675, score-0.25]
79 For each fold of the ten-fold cross-validation test, we use the data from 2700 pairs of images from the same identities and another 2700 pairs of images from different identities to learn a metric. [sent-707, score-0.238]
80 It shows that the DML-eig metric is less prone to overfitting than both LDML and ITML. [sent-724, score-0.202]
81 “Square Root” means the features are preprocessed by taking the square root before being fed into the metric learning method. [sent-767, score-0.202]
82 Figure 2: Performance of the DML-eig, ITML and LDML metrics by varying the dimension of the principal components using the SIFT descriptor. [sent-797, score-0.202]
83 Finally, the performance of the DML-eig metric may be further improved by exploring different numbers of nearest neighbors and different types of descriptors such as those used in Pinto et al. [sent-816, score-0.339]
84 Figure legend: High-Throughput Brain-Inspired Features, aligned; DML-eig + combined; DML-eig + SIFT, funneled; LDML + combined, funneled; ITML + SIFT, funneled. [sent-826, score-0.214]
85 Conclusion: The main theme of this paper is to develop a new eigenvalue-optimization framework for metric learning. [sent-833, score-0.202]
86 Within this context, we first proposed a novel metric learning model which was shown to be equivalent to a well-known eigenvalue optimization problem (Overton, 1988; Lewis and Overton, 1996). [sent-834, score-0.372]
87 Then, we developed efficient first-order algorithms for metric learning which only involve the computation of the largest eigenvector of a matrix. [sent-838, score-0.252]
88 Finally, experiments on various data sets have shown that our proposed approach is competitive with state-of-the-art metric learning methods. [sent-840, score-0.202]
89 In future work we will explore the extension of the above eigenvalue optimization framework to other machine learning tasks such as spectral graph cuts and semi-definite embedding (Weinberger et al. [sent-842, score-0.17]
90 A similar eigenvalue optimization formulation can be developed for maximum-margin matrix factorization (MMMF) for collaborative filtering (Srebro et al. [sent-862, score-0.291]
91 Given partially observed labels Y_{ia} ∈ {±1} with ia ∈ S, the target of MMMF is to learn a matrix X ∈ R^{m×n} where each entry X_{ia} indicates the preference of customer i for product a. [sent-864, score-0.17]
92 Using exact arguments for proving Theorem 3, we can formulate MMMF as an eigenvalue optimization problem. [sent-872, score-0.17]
93 MMMF formulation (27) is equivalent to max_{M,ξ} min_{u∈△} ∑_{ia∈S} u_{ia} ( ξ_{ia} + ⟨Y_{ia} C_{ia}, M⟩ ) subject to ξ⊤1 + γ Tr(M) = 1, M ∈ S^{(m+n)}_+, ξ ≥ 0. [sent-874, score-0.149]
94 In particular, it is equivalent to the following eigenvalue optimization problem: min_{u∈△} max{ u_max, (1/γ) λ_max(∑_{ia∈S} u_{ia} Y_{ia} C_{ia}) }. [sent-875, score-0.307]
95 Since the paper mainly focuses on metric learning, we leave its empirical implementation for future study. [sent-883, score-0.202]
96 Learning a similarity metric discriminatively with application to face verification. [sent-952, score-0.323]
97 Distance metric learning for large margin nearest neighbour classification. [sent-1130, score-0.228]
98 Fast solvers and efficient implementations for distance metric learning. [sent-1137, score-0.266]
99 Distance metric learning with application to clustering with side information. [sent-1164, score-0.202]
100 A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval. [sent-1180, score-0.266]
wordName wordTfidf (topN-words)
[('lmnn', 0.382), ('sd', 0.299), ('itml', 0.257), ('xs', 0.24), ('ldml', 0.223), ('st', 0.214), ('metric', 0.202), ('guillaumin', 0.171), ('igenvalue', 0.157), ('tr', 0.154), ('dissimilar', 0.139), ('eigenvalue', 0.13), ('boostmetric', 0.123), ('overton', 0.123), ('face', 0.121), ('istance', 0.121), ('weinberger', 0.113), ('etric', 0.111), ('maxs', 0.111), ('ia', 0.108), ('xing', 0.101), ('ptimization', 0.089), ('wolf', 0.082), ('pinto', 0.08), ('eig', 0.078), ('mlmnn', 0.078), ('mahalanobis', 0.078), ('sift', 0.074), ('ln', 0.072), ('descriptors', 0.071), ('lfw', 0.07), ('veri', 0.07), ('pca', 0.07), ('ing', 0.067), ('minu', 0.067), ('ying', 0.065), ('dml', 0.065), ('mmmf', 0.065), ('distance', 0.064), ('pairs', 0.061), ('maxm', 0.06), ('funneled', 0.06), ('sdp', 0.059), ('srebro', 0.059), ('images', 0.058), ('formulation', 0.057), ('minm', 0.056), ('zt', 0.054), ('wild', 0.054), ('yia', 0.052), ('triplets', 0.052), ('earning', 0.051), ('eigenvector', 0.05), ('dm', 0.048), ('smoothing', 0.048), ('faces', 0.047), ('max', 0.046), ('min', 0.046), ('umax', 0.045), ('righthand', 0.045), ('taigman', 0.045), ('tol', 0.045), ('guration', 0.043), ('pairwise', 0.042), ('optimization', 0.04), ('neighbors', 0.04), ('baes', 0.039), ('burer', 0.039), ('exeter', 0.039), ('davis', 0.039), ('xi', 0.038), ('equivalence', 0.038), ('lewis', 0.038), ('xia', 0.037), ('descriptor', 0.037), ('rt', 0.035), ('triplet', 0.035), ('dissimilarity', 0.035), ('constraints', 0.034), ('aligned', 0.034), ('customers', 0.034), ('goldberger', 0.033), ('factorization', 0.032), ('matrix', 0.032), ('huang', 0.031), ('labeled', 0.03), ('shen', 0.03), ('jin', 0.028), ('hoi', 0.028), ('rosales', 0.028), ('people', 0.027), ('nesterov', 0.027), ('kato', 0.026), ('monteiro', 0.026), ('mxs', 0.026), ('rgisser', 0.026), ('spectrahedron', 0.026), ('torresani', 0.026), ('nearest', 0.026), ('tk', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
2 0.31421939 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
3 0.17423247 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
4 0.13423945 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
Author: Daniel J. Lizotte, Michael Bowling, Susan A. Murphy
Abstract: We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a realworld decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support. Keywords: reinforcement learning, dynamic programming, decision making, linear regression, preference elicitation
5 0.091522247 84 jmlr-2012-Online Submodular Minimization
Author: Elad Hazan, Satyen Kale
Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and partial feedback settings. Keywords: submodular optimization, online learning, regret minimization
6 0.082008779 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
7 0.076596051 73 jmlr-2012-Multi-task Regression using Minimal Penalties
8 0.066379517 97 jmlr-2012-Regularization Techniques for Learning with Matrices
9 0.06472297 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
10 0.053418897 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
11 0.053179864 82 jmlr-2012-On the Necessity of Irrelevant Variables
12 0.050252274 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
13 0.047838796 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
14 0.045717701 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
15 0.044950988 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally
16 0.043986719 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
17 0.043250415 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
18 0.043081094 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
19 0.042241141 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
20 0.040246062 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
topicId topicWeight
[(0, -0.25), (1, -0.088), (2, 0.2), (3, 0.123), (4, -0.387), (5, -0.029), (6, -0.241), (7, 0.13), (8, -0.032), (9, -0.055), (10, 0.057), (11, 0.042), (12, -0.082), (13, -0.081), (14, 0.048), (15, -0.019), (16, -0.065), (17, 0.138), (18, 0.132), (19, -0.07), (20, -0.033), (21, 0.082), (22, -0.013), (23, -0.232), (24, 0.014), (25, -0.073), (26, -0.05), (27, -0.161), (28, -0.041), (29, -0.144), (30, 0.001), (31, 0.048), (32, 0.008), (33, 0.026), (34, -0.041), (35, 0.048), (36, -0.117), (37, -0.076), (38, -0.062), (39, -0.022), (40, 0.072), (41, 0.02), (42, 0.01), (43, -0.031), (44, 0.005), (45, 0.058), (46, 0.063), (47, 0.06), (48, 0.102), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.95226622 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
2 0.67954475 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
3 0.67059988 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
4 0.51862615 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
Author: Daniel J. Lizotte, Michael Bowling, Susan A. Murphy
Abstract: We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a realworld decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support. Keywords: reinforcement learning, dynamic programming, decision making, linear regression, preference elicitation
5 0.35579741 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
Author: Carl Brunner, Andreas Fischer, Klaus Luig, Thorsten Thies
Abstract: Pairwise classification is the task to predict whether the examples a, b of a pair (a, b) belong to the same class or to different classes. In particular, interclass generalization problems can be treated in this way. In pairwise classification, the order of the two input examples should not affect the classification result. To achieve this, particular kernels as well as the use of symmetric training sets in the framework of support vector machines were suggested. The paper discusses both approaches in a general way and establishes a strong connection between them. In addition, an efficient implementation is discussed which allows the training of several millions of pairs. The value of these contributions is confirmed by excellent results on the labeled faces in the wild benchmark. Keywords: pairwise support vector machines, interclass generalization, pairwise kernels, large scale problems
6 0.29421461 73 jmlr-2012-Multi-task Regression using Minimal Penalties
7 0.28283924 84 jmlr-2012-Online Submodular Minimization
8 0.27382821 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
10 0.23944266 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
11 0.235245 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
12 0.23076503 12 jmlr-2012-Active Clustering of Biological Sequences
13 0.22802302 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
14 0.2234834 60 jmlr-2012-Local and Global Scaling Reduce Hubs in Space
15 0.22283359 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
16 0.20652789 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
17 0.20613897 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
18 0.20133473 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
19 0.20095505 97 jmlr-2012-Regularization Techniques for Learning with Matrices
topicId topicWeight
[(7, 0.013), (21, 0.032), (26, 0.088), (29, 0.018), (35, 0.016), (49, 0.011), (56, 0.016), (59, 0.013), (64, 0.431), (69, 0.031), (75, 0.056), (77, 0.014), (81, 0.01), (92, 0.058), (96, 0.105)]
simIndex simValue paperId paperTitle
1 0.95711184 107 jmlr-2012-Smoothing Multivariate Performance Measures
Author: Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan
Abstract: Optimizing multivariate performance measure is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs 1 converge to an ε accurate solution in O λε iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPM on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counter examples. However, surprisingly, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy, which in conjunction with Nesterov’s accelerated gradient method, can find an ε accuiterations. Computationally, our smoothing technique is also rate solution in O∗ min 1 , √1 ε λε particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability. Keywords: non-smooth optimization, max-margin methods, multivariate performance measures, Support Vector Machines, smoothing
2 0.78867948 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
same-paper 3 0.7498607 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
4 0.74137819 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
5 0.46315354 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
6 0.37981817 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
7 0.37527257 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
8 0.37183517 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
9 0.3714188 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
10 0.37052545 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
11 0.36531293 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
12 0.36477798 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
13 0.3629207 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
14 0.36280364 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
15 0.35999894 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
16 0.35724357 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
17 0.35317141 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
18 0.35250708 84 jmlr-2012-Online Submodular Minimization
19 0.35028142 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
20 0.34808514 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss