jmlr jmlr2012 jmlr2012-92 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that the weak learner is a positive semidefinite matrix with trace and rank equal to one, rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
Reference: text
sentIndex sentText sentNum sentScore
1 AU University of Wollongong Wollongong, NSW 2522, Australia Anton van den Hengel ANTON . [sent-9, score-0.135]
2 AU The University of Adelaide Adelaide, SA 5005, Australia Editors: Sören Sonnenburg, Francis Bach, Cheng Soon Ong Abstract The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. [sent-12, score-0.167]
3 Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. [sent-15, score-0.213]
4 We extend traditional boosting algorithms in that the weak learner is a positive semidefinite matrix with trace and rank equal to one, rather than a classifier or regressor. [sent-20, score-0.124]
5 Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor 1. [sent-22, score-0.094]
6 Introduction The identification of an effective metric by which to measure distances between data points is an essential component of many machine learning algorithms including k-nearest neighbor (kNN), k-means clustering, and kernel regression. [sent-23, score-0.113]
7 , 2008; Jian and © 2012 Chunhua Shen, Junae Kim, Lei Wang and Anton van den Hengel. [sent-25, score-0.135]
8 (2008), for instance, showed that in generic object recognition with local features, kNN with a Euclidean metric can achieve comparable or better accuracy than more sophisticated classifiers such as support vector machines (SVMs). [sent-33, score-0.14]
9 The Mahalanobis distance represents a generalization of the Euclidean distance, and offers the opportunity to learn a distance metric directly from the data. [sent-34, score-0.186]
10 If we let a_i, i = 1, 2, · · · , represent a set of points in R^D, then the Mahalanobis distance, or Gaussian quadratic distance, between two points is ∥a_i − a_j∥_X = √((a_i − a_j)⊤ X (a_i − a_j)), where X ⪰ 0 is a positive semidefinite (p.s.d.) matrix. [sent-39, score-0.112]
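For illustration, here is a minimal NumPy sketch of the quantity defined above; the function name mahalanobis_distance and the random test points are our own, not part of the paper.

```python
# Sketch (not the authors' code): the Mahalanobis distance
# dist_X(a_i, a_j) = sqrt((a_i - a_j)^T X (a_i - a_j)) for a given p.s.d. matrix X.
import numpy as np

def mahalanobis_distance(a_i, a_j, X):
    """Distance between two points under the p.s.d. matrix X."""
    d = a_i - a_j
    return float(np.sqrt(d @ X @ d))

# With X = I this reduces to the ordinary Euclidean distance.
rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)
assert np.isclose(mahalanobis_distance(a, b, np.eye(5)), np.linalg.norm(a - b))
```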
11 We are interested here in the case where the training data consist of a set of constraints upon the relative distances between data points, I = {(a_i, a_j, a_k) | dist_ij < dist_ik}, (1) where dist_ij measures the distance between a_i and a_j. [sent-50, score-0.145]
12 Each such constraint implies that “a_i is closer to a_j than a_i is to a_k”. [sent-51, score-0.092]
13 Constraints such as these often arise when it is known that a_i and a_j belong to the same class of data points while a_i, a_k belong to different classes. [sent-52, score-0.148]
14 Recently, there has been significant research interest in supervised distance metric learning using side information that is typically presented in a set of pairwise constraints. [sent-130, score-0.133]
15 Most of these methods, although appearing in different formats, share a similar essential idea: to learn an optimal distance metric by keeping training examples in equivalence constraints close, and at the same time, examples in in-equivalence constraints well separated. [sent-131, score-0.133]
16 (2002) first proposed the idea of learning a Mahalanobis metric for clustering using convex optimization. [sent-151, score-0.114]
17 The algorithm maximizes the distance between points in the dis-similarity set under the constraint that the distance between points in the similarity set is upper-bounded. [sent-153, score-0.106]
18 (2005) that LMNN delivers the state-of-the-art performance among most distance metric learning algorithms. [sent-158, score-0.133]
19 Information theoretic metric learning (ITML) learns a suitable metric based on information-theoretic principles (Davis et al. [sent-159, score-0.16]
20 Instead of using the hinge loss, however, we use the exponential loss and logistic loss functions in order to derive an AdaBoost-like (or LogitBoost-like) optimization procedure. [sent-188, score-0.174]
21 The dual of the restricted master problem is solved by the simplex method, and the optimal dual solution is used to find the new column to be included in the restricted master problem. [sent-194, score-0.093]
22 Significantly, LPBoost showed that in an LP framework, unknown weak hypotheses can be learned from the dual although the space of all weak hypotheses is infinitely large. [sent-197, score-0.097]
23 Shen and Li (2010) applied column generation to boosting with general loss functions. [sent-198, score-0.161]
24 Given a set of points {x_1, . . . , x_m} in a real vector or matrix space Sp, the convex hull of Sp spanned by m elements in Sp is defined as: Conv_m(Sp) = { ∑_{i=1}^{m} w_i x_i | w_i ≥ 0, ∑_{i=1}^{m} w_i = 1, x_i ∈ Sp }. [sent-222, score-0.092]
25 Define the linear convex span of Sp as: Conv(Sp) = { ∑_{i=1}^{m} w_i x_i | w_i ≥ 0, ∑_{i=1}^{m} w_i = 1, x_i ∈ Sp, m ∈ Z+ }. [sent-223, score-0.092]
26 If λ′ is not an extreme point of Ψ2, then it must be possible to express it as a convex combination of a set of other points in Ψ2: λ′ = ∑_{i=1}^{m} w_i λ_i, w_i > 0, ∑_{i=1}^{m} w_i = 1 and λ_i ≠ λ′. [sent-230, score-0.124]
27 Proof It is easy to check that any convex combination ∑_i w_i Z_i, such that Z_i ∈ Ψ1, resides in Γ1, with the following two facts: 1) a convex combination of p.s.d. [sent-240, score-0.097]
28 The Krein-Milman Theorem (Krein and Milman, 1940) tells us that a convex and compact set is equal to the convex hull of its extreme points. [sent-257, score-0.1]
29 Typically a boosting algorithm (Schapire, 1999) creates a single strong learner by incrementally adding base (weak) learners to the final strong learner. [sent-271, score-0.153]
30 In general, a boosting algorithm builds on a user-specified base learning procedure and runs it repeatedly on modified data that are outputs from the previous iterations. [sent-273, score-0.093]
31 As shown next, our first case, which learns a metric using the hinge loss, greatly resembles this idea. [sent-292, score-0.106]
32 To simplify notation, we rewrite the difference between dist²_ij and dist²_ik as dist²_ik − dist²_ij = ⟨A_r, X⟩, where A_r = (a_i − a_k)(a_i − a_k)⊤ − (a_i − a_j)(a_i − a_j)⊤, (4) for r = 1, · · · , |I|, and |I| is the size of the set of constraints I defined in Equation (1). [sent-310, score-0.125]
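A small sketch of how the A_r matrices of Equation (4) could be assembled from index triplets; constraint_matrices is a hypothetical helper, and the row-vector point layout is our assumption.

```python
# Sketch: build A_r = (a_i - a_k)(a_i - a_k)^T - (a_i - a_j)(a_i - a_j)^T
# for every triplet (i, j, k), so that <A_r, X> = dist^2_ik - dist^2_ij.
import numpy as np

def constraint_matrices(points, triplets):
    """points: (n, D) array of row vectors; triplets: iterable of (i, j, k) index triples."""
    A = []
    for i, j, k in triplets:
        d_ik = (points[i] - points[k])[:, None]        # column vector
        d_ij = (points[i] - points[j])[:, None]
        A.append(d_ik @ d_ik.T - d_ij @ d_ij.T)        # symmetric, rank at most two
    return A
```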
33 We mainly investigate the cases using the hinge loss, exponential loss and logistic loss functions. [sent-313, score-0.174]
34 We can then use column generation to solve the original problem iteratively by looking at both the primal and dual problems. [sent-351, score-0.13]
35 In this work we are more interested in smooth loss functions such as the exponential loss and logistic loss, as presented in the sequel. [sent-354, score-0.148]
36 In order to derive its dual, we write its Lagrangian: L(w, ρ, u) = log(∑_{r=1}^{|I|} exp(−ρ_r)) + v·1⊤w + ∑_{r=1}^{|I|} u_r(ρ_r − H_{r:}w) − p⊤w, with p ≥ 0. [sent-373, score-0.154]
37 inf_{w,ρ} L = inf_ρ [ log(∑_{r=1}^{|I|} exp(−ρ_r)) + u⊤ρ ] + inf_w [ (v·1⊤ − ∑_{r=1}^{|I|} u_r H_{r:} − p⊤) w ] = −∑_{r=1}^{|I|} u_r log u_r, (10) where the first and second bracketed terms are denoted L1 and L2, respectively. [sent-375, score-0.462]
38 The infimum of L1 is found by setting its first derivative to zero, and we have: inf_ρ L1 = −∑_r u_r log u_r if u ≥ 0 and 1⊤u = 1, and −∞ otherwise. [sent-376, score-0.308]
39 (11) The Lagrange dual problem of (9) is an entropy maximization problem, which writes: max_u − ∑_{r=1}^{|I|} u_r log u_r, s.t. [sent-380, score-0.341]
40 Let us start from some basic knowledge of column generation because our coordinate descent strategy is inspired by column generation. [sent-389, score-0.088]
41 Then either the primal (9) or the dual (12) can be trivially solved (at least in theory) because both are convex optimization problems. [sent-394, score-0.103]
42 In convex optimization, column generation is a technique designed to overcome this difficulty. [sent-400, score-0.095]
43 For a general convex problem, we can use column generation to obtain an approximate solution. [sent-404, score-0.095]
44 For this purpose, we need to solve Ẑ = argmax_Z ∑_{r=1}^{|I|} u_r ⟨A_r, Z⟩, s.t. [sent-411, score-0.154]
45 At iteration j, we have u_r^j ∝ exp(−ρ_r^j) ∝ u_r^{j−1} exp(−H_{rj} w_j), and ∑_{r=1}^{|I|} u_r^j = 1, derived from (13). [sent-457, score-0.112]
46 So once w_j is calculated, we can update u as u_r^j = u_r^{j−1} exp(−H_{rj} w_j) / z, r = 1, . [sent-458, score-0.21]
47 . . , |I|, (16) where z is a normalization factor so that ∑_{r=1}^{|I|} u_r^j = 1. [sent-461, score-0.154]
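A sketch of the AdaBoost-like weight update in (16), assuming u_prev holds the previous weights and H_col_j the column H_{rj}; the names are ours.

```python
# Sketch of Equation (16): u_r^j ∝ u_r^{j-1} exp(-H_{rj} w_j), renormalised so the
# weights sum to one (z is the normalisation factor).
import numpy as np

def update_weights_exponential(u_prev, H_col_j, w_j):
    u = u_prev * np.exp(-H_col_j * w_j)
    return u / u.sum()
```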
48 We have ∑_{r=1}^{|I|} u_r ⟨A_r, Z⟩ = v⊤ (∑_{r=1}^{|I|} u_r A_r) v. [sent-466, score-0.308]
49 By denoting Â = ∑_{r=1}^{|I|} u_r A_r, (17) the base learning optimization equals: max_v v⊤ Â v, s.t. [sent-467, score-0.183]
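The base learner is therefore the leading eigenvector of the weighted sum Â; below is a sketch under the assumption that A_list holds the A_r matrices and u the current weights.

```python
# Sketch of the base learning step: form A_hat = sum_r u_r A_r (Equation (17)) and
# return the trace-one rank-one weak learner Z = v v^T built from its top eigenvector.
import numpy as np

def base_learner(u, A_list):
    A_hat = sum(u_r * A_r for u_r, A_r in zip(u, A_list))
    eigvals, eigvecs = np.linalg.eigh(A_hat)       # symmetric eigen-decomposition
    v = eigvecs[:, -1]                             # unit eigenvector of the largest eigenvalue
    return np.outer(v, v), eigvals[-1]
```

Because v has unit norm, Z = vv⊤ automatically has trace one and rank one.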
50 Input: • Training set triplets (a_i, a_j, a_k) ∈ I; Compute A_r, r = 1, 2, · · · , using (4). [sent-475, score-0.164]
51 : ∑_{r=1}^{|I|} u_r H_{r:} ≤ v·1⊤, where logit*(·) is the Fenchel conjugate function of logit(·), defined as logit*(−u) = u log(u) + (1 − u) log(1 − u), when 0 ≤ u ≤ 1, and ∞ otherwise. [sent-501, score-0.154]
52 First, let us find the relationship between u_r^j and u_r^{j−1}. [sent-507, score-0.308]
53 u_r^j = 1 / ((1/u_r^{j−1} − 1) exp(H_{rj} w_j) + 1). (21) The optimization of w_j can be solved by looking for the root of ∑_{r=1}^{|I|} H_{rj} u_r^j − v = 0, (22) where u_r^j is a function of w_j as defined in (21). [sent-510, score-0.21]
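A sketch of solving (22) for the step size w_j under the logistic loss; scipy.optimize.brentq is one possible off-the-shelf root finder (the paper mentions a bisection-style search), and the bracket [0, w_max] is our assumption and must contain a sign change.

```python
# Sketch: find w_j such that sum_r H_{rj} u_r^j - v = 0 (Equation (22)), with u_r^j
# given by Equation (21) as a function of w_j.
import numpy as np
from scipy.optimize import brentq

def solve_w_logistic(u_prev, H_col_j, v, w_max=100.0):
    def residual(w):
        u = 1.0 / ((1.0 / u_prev - 1.0) * np.exp(H_col_j * w) + 1.0)   # Eq. (21)
        return float(np.dot(H_col_j, u) - v)                            # Eq. (22)
    # brentq needs residual(0.0) and residual(w_max) to have opposite signs.
    return brentq(residual, 0.0, w_max)
```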
54 As discussed, our learning procedure is able to employ various loss functions such as the hinge loss, exponential loss or logistic loss. [sent-529, score-0.174]
55 To devise a totally-corrective optimization procedure for solving our problem efficiently, we need to ensure the objective function. Algorithm 4: Positive semidefinite matrix learning with totally corrective boosting. [sent-530, score-0.101]
56 Input: • Training set triplets (a_i, a_j, a_k) ∈ I; Compute A_r, r = 1, 2, · · · , using (4). [sent-531, score-0.164]
57 Here, we use the exponential loss function and the logistic loss function. [sent-541, score-0.148]
58 It is possible to use sub-gradient descent methods when a non-smooth loss function like the hinge loss is used. [sent-542, score-0.098]
59 It is clear that solving for w is a typical convex optimization problem, since the objective function, (9) when the exponential loss is used or (19) when the logistic loss is used, is differentiable and convex. [sent-543, score-0.216]
60 To calculate u_r, we use (13) (exponential loss) or (20) (logistic loss) instead of (16) or (21), respectively. [sent-553, score-0.154]
61 The proof follows the same discussion for the logistic loss, or any other smooth convex loss function. [sent-562, score-0.109]
62 If, after this weak learner Ẑ is added into the primal problem, the primal solution remains unchanged, that is, the corresponding w = 0, then from the optimality condition that L2 in (10) must be zero, we know that p = v − ∑_{r=1}^{|I|} u_r ⟨A_r, Ẑ⟩ < 0. [sent-570, score-0.286]
63 We can conclude that after the base learner Ẑ is added into the primal problem, its corresponding w must admit a positive value. [sent-572, score-0.093]
64 Our B OOST M ETRIC uses training set triplets (a_i, a_j, a_k) ∈ I as input for training. [sent-581, score-0.164]
65 The Mahalanobis distance metric X can be viewed as a linear transformation in the Euclidean space by projecting the data using matrix L (X = LL⊤ ). [sent-582, score-0.133]
66 That is, nearest neighbors of samples using Mahalanobis distance metric X are the same as nearest neighbors using Euclidean distance in the transformed space. [sent-583, score-0.31]
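A sketch of recovering such a projection L from a learned X; the eigen-decomposition route below is our choice (it also handles rank-deficient X), not necessarily the authors'.

```python
# Sketch: factor X ≈ L L^T and check that the Mahalanobis distance under X equals
# the Euclidean distance between the projected points L^T a and L^T b.
import numpy as np

def projection_from_metric(X):
    eigvals, eigvecs = np.linalg.eigh(X)
    eigvals = np.clip(eigvals, 0.0, None)           # guard against tiny negative eigenvalues
    return eigvecs @ np.diag(np.sqrt(eigvals))      # L with X ≈ L L^T

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
X = B @ B.T                                         # a random p.s.d. metric
L = projection_from_metric(X)
a, b = rng.normal(size=4), rng.normal(size=4)
assert np.isclose((a - b) @ X @ (a - b), np.linalg.norm(L.T @ (a - b)) ** 2)
```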
67 B OOST M ETRIC assumes that the triplets of the input training set approximately represent the actual nearest neighbors of samples in the transformed space defined by the Mahalanobis metric. [sent-584, score-0.19]
68 However, even though the triplets of B OOST M ETRIC consist of nearest neighbors of the original training samples, the generated triplets are not exactly the same as the actual nearest neighbors of the training samples in the space transformed by L. [sent-585, score-0.38]
69 We can refine the results of B OOST M ETRIC iteratively, as in multiple-pass LMNN (Weinberger and Saul, 2009): under a multiple-pass procedure, B OOST M ETRIC can estimate triplets in the transformed space that are as close to the actual triplets as possible. [sent-586, score-0.256]
70 At each pass p (p = 1, 2, · · · ), we decompose the learned Mahalanobis distance metric X_{p−1} of the previous pass into a transformation matrix L_p. [sent-588, score-0.133]
71 Then we generate the training set triplets from the set of points {L⊤ a_1, . [sent-590, score-0.128]
72 The final Mahalanobis distance metric X becomes LL⊤ in Multi-pass B OOSTM ETRIC. [sent-594, score-0.133]
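A sketch of this multi-pass refinement; train_boostmetric and generate_triplets are hypothetical stand-ins for the metric learner and the triplet sampler, and the cumulative-projection bookkeeping below is our reading of the procedure, not code from the paper.

```python
# Sketch: repeatedly learn a metric, decompose it into a projection, re-project the
# data, and regenerate triplets in the new space; the final metric is L L^T.
import numpy as np

def multi_pass_boostmetric(points, labels, train_boostmetric, generate_triplets,
                           num_passes=3):
    L_total = np.eye(points.shape[1])
    for _ in range(num_passes):
        transformed = points @ L_total                      # rows are L_total^T a_i
        triplets = generate_triplets(transformed, labels)
        X = train_boostmetric(transformed, triplets)        # metric in the current space
        eigvals, eigvecs = np.linalg.eigh(X)
        L = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0.0, None)))
        L_total = L_total @ L                               # compose the projections
    return L_total @ L_total.T                              # final Mahalanobis metric X
```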
73 BoostMetric-E indicates B OOSTM ETRIC with the exponential loss and BoostMetric-L is B OOST M ETRIC with the logistic loss; both use stage-wise optimization. [sent-759, score-0.112]
74 In this experiment, 9000 triplets are generated for training. [sent-781, score-0.128]
75 We have used the same mechanism to generate training triplets as described in Weinberger et al. [sent-799, score-0.128]
76 Briefly, for each training point a_i, k nearest neighbors that have the same label as y_i (targets), as well as k nearest neighbors that have different labels from y_i (impostors), are found. [sent-801, score-0.18]
77 We then construct triplets from a_i and its corresponding targets and impostors. [sent-802, score-0.184]
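A sketch of this triplet construction with a brute-force nearest-neighbour search; generate_triplets and the k=3 default are our own choices for illustration.

```python
# Sketch: for each point, pair its k same-class nearest neighbours (targets) with its
# k nearest differently-labelled neighbours (impostors) to form triplets (a_i, a_j, a_k).
import numpy as np

def generate_triplets(points, labels, k=3):
    labels = np.asarray(labels)
    triplets = []
    for i in range(len(points)):
        dists = np.linalg.norm(points - points[i], axis=1)
        dists[i] = np.inf                                   # exclude the point itself
        same = np.where(labels == labels[i])[0]
        diff = np.where(labels != labels[i])[0]
        targets = same[np.argsort(dists[same])[:k]]
        impostors = diff[np.argsort(dists[diff])[:k]]
        triplets.extend((i, j, l) for j in targets for l in impostors)
    return triplets
```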
78 We vary the input dimensions from 50 to 1000 and keep the number of triplets fixed to 250. [sent-904, score-0.128]
79 The proposed B OOST M ETRIC and the LMNN are further compared on visual object categorization tasks. [sent-919, score-0.1]
80 This experiment involves both object categorization (Motorbikes versus Airplanes) and object retrieval (Faces versus Background-Google) problems. [sent-923, score-0.123]
81 Figure 4: Test error (3-nearest neighbor) of B OOST M ETRIC on the Motorbikes versus Airplanes data sets (legend: 100D, 200D; x-axis: # of triplets, from 1000 to 9000). [sent-951, score-0.128]
82 The second plot shows the test error against the number of training triplets with a 100-word codebook. [sent-952, score-0.128]
83 Also, Figure 4 (right) plots the test error of the B OOST M ETRIC against the number of triplets for training. [sent-971, score-0.128]
84 The general trend is that more triplets lead to smaller errors. [sent-972, score-0.128]
85 Each face image in a test subset is used as a query, and its distances from other test images are calculated by the proposed BoostMetric, LMNN and the Euclidean distance, respectively. [sent-979, score-0.087]
86 Figure 5: Retrieval accuracy of distance metric learning algorithms on the Faces versus Background-Google data set (x-axis: rank, from 3 to 20). [sent-993, score-0.133]
87 5 By setting k to 2000, a visual codebook of 2000 visual words is obtained. [sent-1011, score-0.101]
88 With 10-nearest-neighbor information, about 20,000 triplets are constructed and used to train B OOST M ETRIC. [sent-1016, score-0.19]
89 Figure 6: Four generated triplets based on the pairwise information provided by the LFW data set. [sent-1060, score-0.128]
90 number of triplets 3,000 6,000 9,000 12,000 15,000 18,000 100D [sent-1076, score-0.128]
91 Table 4: Comparison of the face recognition accuracy (%) of our proposed B OOST M ETRIC on the LFW data set by varying the PCA dimensionality and the number of triplets for each fold. [sent-1124, score-0.205]
92 The face recognition task here is pair matching: given two face images, determine whether the two images are of the same individual. [sent-1126, score-0.164]
93 Increasing the number of training triplets gives a slight improvement in recognition accuracy. [sent-1133, score-0.162]
94 This comparison also confirms the importance of learning an appropriate metric for vision problems. [sent-1144, score-0.116]
95 Conclusion We have presented a new algorithm, B OOST M ETRIC, to learn a positive semidefinite metric using boosting techniques. [sent-1146, score-0.144]
96 Learning globally-consistent local distance functions for shape-based image retrieval and classification. [sent-1311, score-0.088]
97 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-1361, score-0.109]
98 Totally corrective boosting algorithms that maximize the margin. [sent-1563, score-0.112]
99 Distance metric learning for large margin nearest neighbor classification. [sent-1579, score-0.147]
100 Distance metric learning for large margin nearest neighbor classification. [sent-1588, score-0.147]
wordName wordTfidf (topN-words)
[('etric', 0.554), ('oost', 0.545), ('lmnn', 0.248), ('mahalanobis', 0.155), ('ur', 0.154), ('hr', 0.144), ('triplets', 0.128), ('engel', 0.103), ('den', 0.103), ('weinberger', 0.1), ('oosting', 0.096), ('int', 0.096), ('semide', 0.091), ('sing', 0.087), ('hen', 0.086), ('metric', 0.08), ('ar', 0.07), ('lgorithms', 0.067), ('boosting', 0.064), ('im', 0.06), ('ai', 0.056), ('urj', 0.056), ('distance', 0.053), ('nca', 0.05), ('adelaide', 0.048), ('corrective', 0.048), ('oostm', 0.048), ('wang', 0.046), ('lda', 0.046), ('shen', 0.045), ('images', 0.044), ('adaboost', 0.043), ('face', 0.043), ('sp', 0.043), ('boostmetric', 0.041), ('itml', 0.041), ('logit', 0.04), ('evd', 0.04), ('merl', 0.04), ('tr', 0.039), ('logistic', 0.039), ('visual', 0.038), ('exponential', 0.037), ('earning', 0.037), ('descriptor', 0.036), ('primal', 0.036), ('categorization', 0.036), ('ak', 0.036), ('vision', 0.036), ('loss', 0.036), ('pca', 0.035), ('retrieval', 0.035), ('generation', 0.034), ('nearest', 0.034), ('convex', 0.034), ('recognition', 0.034), ('neighbor', 0.033), ('dual', 0.033), ('extreme', 0.032), ('faces', 0.032), ('airplanes', 0.032), ('boiman', 0.032), ('lpp', 0.032), ('motorbikes', 0.032), ('bisection', 0.032), ('weak', 0.032), ('learners', 0.032), ('van', 0.032), ('tc', 0.031), ('guillaumin', 0.031), ('lfw', 0.031), ('codebooks', 0.031), ('eigenvector', 0.031), ('goldberger', 0.03), ('knn', 0.03), ('base', 0.029), ('saul', 0.029), ('nowak', 0.029), ('wi', 0.029), ('neighbors', 0.028), ('learner', 0.028), ('msrc', 0.028), ('ldml', 0.028), ('column', 0.027), ('totally', 0.027), ('lagrange', 0.027), ('object', 0.026), ('hinge', 0.026), ('lter', 0.026), ('euclidean', 0.026), ('wolf', 0.025), ('wl', 0.025), ('mp', 0.025), ('classi', 0.025), ('winn', 0.025), ('rca', 0.025), ('jian', 0.025), ('codebook', 0.025), ('chunhua', 0.024), ('concentric', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
2 0.38996172 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl
Abstract: The M ULTI B OOST package provides a fast C++ implementation of multi-class/multi-label/multitask boosting algorithms. It is based on A DA B OOST.MH but it also implements popular cascade classifiers and F ILTER B OOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, A DA B OOST.MH, F ILTER B OOST, cascade classifier
3 0.31421939 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
4 0.1751707 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
5 0.093084119 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
6 0.04765375 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
7 0.044200655 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
8 0.043577343 60 jmlr-2012-Local and Global Scaling Reduce Hubs in Space
9 0.038773607 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
10 0.038023997 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
11 0.035516616 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
12 0.034983981 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
13 0.03358487 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
14 0.033335388 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
15 0.032763034 97 jmlr-2012-Regularization Techniques for Learning with Matrices
16 0.032560669 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
17 0.032388981 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
18 0.032174584 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
19 0.032154169 106 jmlr-2012-Sign Language Recognition using Sub-Units
20 0.031856772 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
topicId topicWeight
[(0, -0.201), (1, -0.013), (2, 0.295), (3, 0.219), (4, -0.645), (5, 0.02), (6, -0.182), (7, 0.165), (8, -0.054), (9, 0.035), (10, 0.067), (11, 0.15), (12, 0.008), (13, 0.007), (14, -0.018), (15, -0.033), (16, -0.047), (17, -0.011), (18, -0.103), (19, 0.065), (20, -0.005), (21, -0.048), (22, 0.058), (23, 0.072), (24, -0.106), (25, -0.002), (26, -0.014), (27, 0.035), (28, 0.003), (29, 0.007), (30, -0.04), (31, -0.028), (32, 0.053), (33, -0.0), (34, 0.026), (35, 0.013), (36, 0.019), (37, 0.02), (38, 0.038), (39, 0.022), (40, -0.019), (41, 0.04), (42, 0.016), (43, -0.014), (44, 0.005), (45, 0.03), (46, -0.019), (47, 0.009), (48, -0.025), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.93844771 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
2 0.79446054 62 jmlr-2012-MULTIBOOST: A Multi-purpose Boosting Package
Author: Djalel Benbouzid, Róbert Busa-Fekete, Norman Casagrande, François-David Collin, Balázs Kégl
Abstract: The M ULTI B OOST package provides a fast C++ implementation of multi-class/multi-label/multitask boosting algorithms. It is based on A DA B OOST.MH but it also implements popular cascade classifiers and F ILTER B OOST. The package contains common multi-class base learners (stumps, trees, products, Haar filters). Further base learners and strong learners following the boosting paradigm can be easily implemented in a flexible framework. Keywords: boosting, A DA B OOST.MH, F ILTER B OOST, cascade classifier
3 0.55570871 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
4 0.41863528 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
5 0.25150156 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
6 0.18222131 60 jmlr-2012-Local and Global Scaling Reduce Hubs in Space
7 0.16066958 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
8 0.15433495 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
9 0.15011342 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
10 0.14767268 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
11 0.14213857 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
12 0.13507773 106 jmlr-2012-Sign Language Recognition using Sub-Units
13 0.13289215 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
14 0.1328515 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
15 0.12907952 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
16 0.12798302 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
17 0.12672624 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
18 0.12309464 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
19 0.12295564 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
20 0.12165482 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
topicId topicWeight
[(5, 0.245), (7, 0.024), (14, 0.016), (21, 0.031), (26, 0.097), (27, 0.013), (29, 0.023), (35, 0.021), (49, 0.022), (56, 0.022), (59, 0.021), (64, 0.093), (69, 0.023), (75, 0.064), (77, 0.017), (81, 0.012), (92, 0.062), (96, 0.104)]
simIndex simValue paperId paperTitle
same-paper 1 0.73848444 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
2 0.60852677 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
3 0.58276379 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially-many queries in the dimension of the space and in the level of approximation. Our results demonstrate that nearoptimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓ p costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
4 0.56839228 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
5 0.54336208 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
6 0.53569865 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
7 0.52502483 107 jmlr-2012-Smoothing Multivariate Performance Measures
8 0.52339274 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
9 0.52261698 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
10 0.51811594 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
11 0.51644003 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
12 0.51468283 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics
13 0.51384801 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
14 0.50878584 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
15 0.50736845 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
16 0.50711584 55 jmlr-2012-Learning Algorithms for the Classification Restricted Boltzmann Machine
17 0.50622797 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
18 0.50430465 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
19 0.49758652 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
20 0.49753499 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems