nips nips2009 nips2009-126 knowledge-graph by maker-knowledge-mining

126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering


Source: pdf

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 China Abstract Learning distance functions with side information plays a key role in many machine learning and data mining applications. [sent-5, score-0.326]

2 In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. [sent-8, score-0.366]

3 The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. [sent-9, score-0.776]

4 We also present an efficient learning algorithm for the proposed scheme for distance function learning. [sent-10, score-0.295]

5 The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. [sent-11, score-0.468]

6 1 Introduction. An effective distance function plays an important role in many machine learning and data mining techniques. [sent-12, score-0.273]

7 In general, learning effective distance functions is a fundamental problem in both data mining and machine learning. [sent-14, score-0.294]

8 Recently, learning distance functions from data has been actively studied in machine learning. [sent-15, score-0.279]

9 , Euclidean distance), researchers have attempted to learn distance functions from side information that is often provided in the form of pairwise constraints, i. [sent-18, score-0.398]

10 , must-link constraints for pairs of similar data points and cannot-link constraints for pairs of dissimilar data points. [sent-20, score-0.129]

11 Given two data points x and x′, the distance between x and x′ is calculated by d(x, x′) = (x − x′)ᵀ A (x − x′), where A is the distance metric that needs to be learned from the side information. [sent-23, score-0.717]
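As a concrete illustration of this form, here is a minimal NumPy sketch (toy values, not data from the paper) that computes d(x, x′) = (x − x′)ᵀA(x − x′) for a given metric A.

```python
import numpy as np

def mahalanobis_sq(x, x_prime, A):
    """Squared Mahalanobis distance (x - x')^T A (x - x') for a PSD metric A."""
    diff = x - x_prime
    return float(diff @ A @ diff)

# Toy check: A = identity recovers the squared Euclidean distance.
x, x_prime = np.array([1.0, 2.0]), np.array([0.0, 0.0])
print(mahalanobis_sq(x, x_prime, np.eye(2)))  # 5.0
```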

12 [16] learns a global distance metric (GDM) by minimizing the distance between similar data points while keeping dissimilar data points far apart. [sent-24, score-0.725]

13 This problem was addressed by Discriminative Component Analysis (DCA) in [8], which learns a distance metric by minimizing the distance between similar data points while maximizing the distance between dissimilar data points. [sent-28, score-0.983]

14 The authors in [4] proposed an information-theoretic based metric learning approach (ITML) that learns the Mahalanobis distance by minimizing the differential relative entropy between two multivariate Gaussians. [sent-29, score-0.457]

15 Neighborhood Component Analysis (NCA) [5] learns a distance metric by extending the nearest neighbor classifier. [sent-30, score-0.476]

16 The large margin nearest neighbor (LMNN) classifier [14] extends NCA through a maximum-margin framework. [sent-31, score-0.057]

17 [7] propose a semi-supervised distance metric learning approach that explores the unlabeled data for metric learning. [sent-35, score-0.552]

18 In addition to learning a distance metric, several studies [12, 6] are devoted to learning a distance function, mostly non-metric, from the side information. [sent-36, score-0.548]

19 Despite the success, the existing approaches for distance metric learning are limited in two aspects. [sent-37, score-0.405]

20 First, most existing methods assume a fixed distance metric for the entire input space, which makes it difficult for them to handle heterogeneous data. [sent-38, score-0.427]

21 This issue was already demonstrated in [17] when learning distance metrics from multi-modal data distributions. [sent-39, score-0.277]

22 Second, the existing methods aim to learn a full matrix for the target distance metric, whose size is quadratic in the dimensionality, making them computationally unattractive for high dimensional data. [sent-40, score-0.498]

23 Although the computation can be reduced significantly by assuming certain forms of the distance metric (e. [sent-41, score-0.405]

24 To address these two limitations, we propose a novel scheme that learns Bregman distance functions from the given side information. [sent-44, score-0.366]

25 Bregman distance, or Bregman divergence [3], has several salient properties as a distance measure. [sent-45, score-0.516]

26 Bregman distance generalizes the class of Mahalanobis distances by deriving a distance function from a given convex function ϕ(x). [sent-46, score-0.819]

27 Since the local distance metric can be derived from the local Hessian matrix of ϕ(x), the Bregman distance function avoids the assumption of a fixed distance metric. [sent-47, score-0.952]

28 For example, Kullback-Leibler divergence is a special Bregman distance when choosing the negative entropy function for the convex function ϕ(x). [sent-49, score-0.303]
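To make the connection concrete, here is a small sketch (my own illustration, not code from the paper) of the generic Bregman divergence dϕ(x, y) = ϕ(x) − ϕ(y) − ∇ϕ(y)ᵀ(x − y): with ϕ(x) = ‖x‖²/2 it reduces to the squared Euclidean distance, and with the negative entropy it yields the (generalized) KL divergence.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence d_phi(x, y) = phi(x) - phi(y) - grad_phi(y)^T (x - y)."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# phi(x) = 0.5 * ||x||^2  ->  squared Euclidean distance.
sq, sq_grad = lambda x: 0.5 * np.dot(x, x), lambda x: x

# phi(x) = sum_i x_i log x_i (negative entropy)  ->  generalized KL divergence.
negent, negent_grad = lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0

p, q = np.array([0.2, 0.3, 0.5]), np.array([0.3, 0.3, 0.4])
print(bregman(sq, sq_grad, p, q))          # 0.5 * ||p - q||^2
print(bregman(negent, negent_grad, p, q))  # equals KL(p || q) since p and q sum to 1
```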

29 The objective of this work is to design an efficient and effective algorithm that learns a Bregman distance function from pairwise constraints. [sent-50, score-0.38]

30 Although Bregman distance or Bregman divergence has been explored in [1], all these studies assume a predefined Bregman distance function. [sent-51, score-0.516]

31 To the best of our knowledge, this is the first work that addresses the problem of learning Bregman distances from the pairwise constraints. [sent-52, score-0.087]

32 We present a non-parametric framework for Bregman distance learning, together with an efficient learning algorithm. [sent-53, score-0.258]

33 Our empirical study with semi-supervised clustering shows that the proposed approach (i) outperforms the state-of-the-art algorithms for distance metric learning, and (ii) is computationally efficient for high dimensional data. [sent-54, score-0.615]

34 Section 2 presents the proposed framework of learning Bregman distance functions from the pairwise constraints, together with an efficient learning algorithm. [sent-56, score-0.383]

35 Section 3 presents the experimental results with semi-supervised clustering by comparing the proposed algorithms with a number of state-of-the-art algorithms for distance metric learning. [sent-57, score-0.537]

36 Learning Bregman Distance Functions (Bregman Distance Function). A Bregman distance function is defined based on a given convex function. [sent-60, score-0.303]

37 Let ϕ(x) : Rd → R be a strictly convex function that is twice differentiable. [sent-61, score-0.06]

38 As indicated by the above expression, the Bregman distance function can be viewed as a general Mahalanobis distance that introduces a local distance metric A = ∇²ϕ(x̃). [sent-65, score-0.921]

39 Unlike the conventional Mahalanobis distance, where the metric A is a constant matrix throughout the entire space, the local distance metric A = ∇²ϕ(x̃) is introduced via the Hessian matrix of the convex function ϕ(x) and therefore depends on the location of x1 and x2. [sent-66, score-0.906]

40 Although the Bregman distance function defined in (1) does not satisfy the triangle inequality, the following proposition shows that the degree of violation can be bounded if the Hessian matrix of ϕ(x) is bounded. [sent-67, score-0.384]

41 If ∃ m, M ∈ R, M > m > 0, such that mI ⪯ ∇²ϕ(x) ⪯ MI for all x ∈ Ω, where I is the identity matrix, we have the following inequality: √d(xa, xb) ≤ √d(xa, xc) + √d(xc, xb) + (√M − √m)[d(xa, xc) d(xc, xb)]^{1/4} (2). The proof of this proposition can be found in Appendix A. [sent-70, score-2.24]

42 As indicated by Proposition 2, the degree of violation of the triangle inequality is essentially controlled by √M − √m. [sent-71, score-0.055]

43 Given a smooth convex function with an almost constant Hessian matrix, we would expect the Bregman distance to satisfy the triangle inequality to a large degree. [sent-72, score-0.336]
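As an illustrative check of these two statements (my own sketch, using the symmetric form d(x, y) = (∇ϕ(x) − ∇ϕ(y))ᵀ(x − y) that appears later in Eq. (8)): for a quadratic ϕ the square root of d is a true metric, while for a ϕ with varying Hessian, such as the negative entropy, the triangle inequality on √d may be violated.

```python
import numpy as np

# Symmetric Bregman-style distance d(x, y) = (grad_phi(x) - grad_phi(y))^T (x - y);
# an illustrative check, not code from the paper.
def d(grad_phi, x, y):
    return (grad_phi(x) - grad_phi(y)) @ (x - y)

quad_grad   = lambda x: x                 # phi(x) = ||x||^2 / 2, constant Hessian
negent_grad = lambda x: np.log(x) + 1.0   # phi(x) = sum_i x_i log x_i, varying Hessian

rng = np.random.default_rng(0)
for grad, name in [(quad_grad, "quadratic"), (negent_grad, "neg-entropy")]:
    worst = -np.inf
    for _ in range(1000):
        xa, xb, xc = rng.uniform(0.1, 5.0, size=(3, 4))
        # Triangle-inequality violation on sqrt(d); a value <= 0 means the inequality holds.
        viol = np.sqrt(d(grad, xa, xb)) - np.sqrt(d(grad, xa, xc)) - np.sqrt(d(grad, xc, xb))
        worst = max(worst, viol)
    print(name, "worst violation:", worst)
```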

44 2.2 Problem Formulation. To learn a Bregman distance function, the key is to find an appropriate convex function ϕ(x) that is consistent with the given pairwise constraints. [sent-75, score-0.39]

45 Given a kernel function κ(x, x′) : Rd × Rd → R, our goal is to search for a convex function ϕ(x) ∈ Hκ such that the induced Bregman distance function, denoted by dϕ(x, x′), minimizes the overall training error with respect to the given pairwise constraints. [sent-77, score-0.418]

46 , n} the collection of pairwise constraints for training. [sent-81, score-0.115]

47 Each pairwise constraint consists of a pair of instances x_i^1 and x_i^2, and a label yi that is +1 if x_i^1 and x_i^2 are similar and −1 if they are dissimilar. [sent-82, score-0.2]

48 Following the maximum margin framework for classification, we cast the problem of learning a Bregman distance function from pairwise constraints into the following optimization problem, i. [sent-87, score-0.394]

49 The main challenge with solving the variational problem in (3) is that it is difficult to derive a representer theorem for ϕ(x), because it is the gradient ∇ϕ(x) that is used in the definition of the distance function, not ϕ(x) itself. [sent-90, score-0.28]

50 To resolve this problem, we consider a special family of kernel functions κ(x, x′) that has the form κ(x1, x2) = h(x1ᵀx2), where h : R → R is a strictly convex function. [sent-92, score-0.109]

51 , xN) (5). We define H and H⊥ as follows: H = span{κ(x, ·), ∀x ∈ A}, H⊥ = span{κ(x, ·), ∀x ∈ Ā} (6). The following proposition summarizes an important property of the reproducing kernel Hilbert space Hκ when the kernel function κ(·, ·) is restricted to the form in Eq. [sent-105, score-0.112]

52 This results in ϕ(x) = Σ_{i=1}^N αi h(xiᵀx), and consequently d(xa, xb) as follows: d(xa, xb) = Σ_{i=1}^N αi (h′(xaᵀxi) − h′(xbᵀxi)) xiᵀ(xa − xb) (8). By defining h(xa) = (h′(xaᵀx1), . [sent-124, score-1.514]

53 , h′(xaᵀxN))ᵀ, we can express d(xa, xb) as follows: d(xa, xb) = (xa − xb)ᵀ X(α ◦ [h(xa) − h(xb)]) (9). Notice that when h(z) = z²/2, we have d(xa, xb) expressed as d(xa, xb) = (xa − xb)ᵀ X diag(α) Xᵀ (xa − xb). [sent-127, score-3.444]

54 (10) This is a Mahalanobis distance with metric A = X diag(α) Xᵀ = Σ_{i=1}^N αi xi xiᵀ. [sent-128, score-0.443]

55 , exp(xᵀxN)), and the resulting distance function is no longer stationary due to the non-linear function exp(z). [sent-132, score-0.258]
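For illustration, the sketch below (toy data; the helper name learned_distance is mine, not from the paper) evaluates the distance in Eq. (9) and checks the special case of Eq. (10), where h(z) = z²/2 reduces it to a Mahalanobis distance with metric X diag(α) Xᵀ.

```python
import numpy as np

def learned_distance(xa, xb, X, alpha, h_prime):
    """d(xa, xb) = (xa - xb)^T X (alpha o [h'(X^T xa) - h'(X^T xb)]), as in Eq. (9).
    X holds the training points as columns; alpha are the nonnegative Bregman coefficients."""
    h_a = h_prime(X.T @ xa)   # vector (h'(xa^T x_1), ..., h'(xa^T x_N))
    h_b = h_prime(X.T @ xb)
    return float((xa - xb) @ (X @ (alpha * (h_a - h_b))))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))              # 10 toy training points in R^5
alpha = rng.uniform(0.0, 1.0, size=10)
xa, xb = rng.normal(size=5), rng.normal(size=5)

# With h(z) = z^2 / 2 we have h'(z) = z, and the distance reduces to the Mahalanobis
# distance with metric A = X diag(alpha) X^T of Eq. (10).
d_kernel = learned_distance(xa, xb, X, alpha, lambda z: z)
A = X @ np.diag(alpha) @ X.T
print(np.isclose(d_kernel, (xa - xb) @ A @ (xa - xb)))  # True

# A non-quadratic choice such as h(z) = exp(z), i.e. h'(z) = exp(z), gives a distance
# that is no longer stationary.
d_exp = learned_distance(xa, xb, X, alpha, np.exp)
```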

56 Σ_{i=1}^N αi δ(y − xi), we have (3) simplified as: min (1/2) αᵀKα + C Σ_{i=1}^n εi (11), s.t. yi [(x_i^1 − x_i^2)ᵀ X(α ◦ [h(x_i^1) − h(x_i^2)]) − b] ≥ 1 − εi, εi ≥ 0, i = 1, . [sent-135, score-0.086]

57 By defining zi = [h(x_i^1) − h(x_i^2)] ◦ [Xᵀ(x_i^1 − x_i^2)] (12), and noting that Σ_{k=1}^N αk h(xᵀxk) is a convex function of x, we simplify the problem in (11) as follows: min_{α∈R_+^N, b} L = (1/2) αᵀKα + C Σ_{i=1}^n ℓ(yi[ziᵀα − b]), where ℓ(z) = max(0, 1 − z). [sent-142, score-0.14]

58 In particular, at iteration t, given the current solution αt and bt, we compute the gradients as ∇α L = Kαt + C Σ_{i=1}^n ∂ℓ(yi[ziᵀαt − bt]) yi zi and ∇b L = −C Σ_{i=1}^n ∂ℓ(yi[ziᵀαt − bt]) yi (14), where ∂ℓ(z) stands for the subgradient of ℓ(z). [sent-144, score-0.297]

59 Algorithm-box excerpt: over the active set St of margin-violating constraints, the gradients reduce to ∇α L = Kα − C Σ_{zi∈St} yi zi and ∇b L = C Σ_{zi∈St} yi, followed by (4) an update of the Bregman coefficients α = (α1, . [sent-154, score-0.419]
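A minimal sketch of one such subgradient step, assuming the hinge loss ℓ and the gradient expressions above (the function and variable names are mine; the authors' actual solver may differ in step-size schedule and active-set handling):

```python
import numpy as np

def subgradient_step(alpha, b, K, Z, y, C, eta):
    """One projected-subgradient step on L = 0.5 a^T K a + C sum_i max(0, 1 - y_i (z_i^T a - b)).
    Z stacks the constraint vectors z_i as rows; a sketch, not the authors' exact algorithm."""
    margins = y * (Z @ alpha - b)
    active = margins < 1.0                              # constraints violating the margin
    grad_alpha = K @ alpha - C * (Z[active].T @ y[active])
    grad_b = C * np.sum(y[active])
    alpha = np.maximum(alpha - eta * grad_alpha, 0.0)   # keep Bregman coefficients nonnegative
    b = b - eta * grad_b
    return alpha, b
```

A full solver would iterate this step with a decreasing step size and construct the vectors zi from Eq. (12).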

60 3 Experiments. We evaluate the proposed distance learning technique by semi-supervised clustering. [sent-166, score-0.275]

61 In particular, we first learn a distance function from the given pairwise constraints and then apply the learned distance function to data clustering. [sent-167, score-0.653]

62 We verify the efficacy and efficiency of the proposed technique by comparing it with a number of state-of-the-art algorithms for distance metric learning. [sent-168, score-0.422]

63 3.1 Experimental Testbed and Settings. We adopt six well-known datasets from the UCI machine learning repository and six popular text benchmark datasets in our experiments. [sent-170, score-0.206]

64 These datasets are chosen for clustering because they vary significantly. (Footnote 1: the Reuter dataset is available at: http://renatocorrea. [sent-171, score-0.155]

65 The diversity of datasets allows us to examine the effectiveness of the proposed learning technique more comprehensively. [sent-174, score-0.057]

66 More specifically, we randomly sample a subset of pairs from the pool of all possible pairs (any two instances form a pair). [sent-177, score-0.076]

67 , yi = +1) if they share the same class label, and form a cannot-link constraint (i. [sent-180, score-0.083]

68 , yi = −1) if they are assigned to different classes. [sent-182, score-0.067]
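A plain sketch of this constraint-generation protocol (illustrative only; the paper's exact sampling script may differ):

```python
import numpy as np

def sample_pairwise_constraints(labels, n_pos, n_neg, rng):
    """Randomly sample must-link (y = +1) and cannot-link (y = -1) index pairs from class labels."""
    n, pos, neg = len(labels), [], []
    while len(pos) < n_pos or len(neg) < n_neg:
        i, j = rng.integers(0, n, size=2)
        if i == j:
            continue
        if labels[i] == labels[j] and len(pos) < n_pos:
            pos.append((i, j, +1))
        elif labels[i] != labels[j] and len(neg) < n_neg:
            neg.append((i, j, -1))
    return pos + neg

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=1000)   # toy class labels for 1000 instances
constraints = sample_pairwise_constraints(labels, n_pos=500, n_neg=500, rng=rng)
```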

69 To perform data clustering, we run the k-means algorithm using the distance function learned from 500 randomly sampled positive constraints and 500 randomly sampled negative constraints. [sent-187, score-0.308]
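And a sketch of how a learned distance can be plugged into k-means style clustering (an assumed, simplified variant; the paper's clustering step may treat centers differently):

```python
import numpy as np

def kmeans_with_distance(data, k, dist, n_iter=50, rng=None):
    """k-means style clustering where point-to-center assignment uses a supplied distance
    function dist(x, c); centers are updated as cluster means. A sketch only."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        D = np.array([[dist(x, c) for c in centers] for x in data])
        assign = D.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = data[assign == j].mean(axis=0)
    return assign, centers

# e.g., with the learned_distance sketch above:
# assign, centers = kmeans_with_distance(data, k=3,
#     dist=lambda x, c: learned_distance(x, c, X, alpha, np.exp))
```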

70 We repeat each clustering experiment 20 times and report the final results by averaging over the 20 runs. [sent-191, score-0.115]

71 To evaluate the clustering performance, we use some standard performance metrics, including pairwise Precision, pairwise Recall, and pairwise F1 measures [9], which are evaluated based on the pairwise results. [sent-193, score-0.463]
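For reference, a sketch of the standard pairwise precision/recall/F1 computation (my own implementation of the usual definition, assumed to match the measures cited above):

```python
from itertools import combinations

def pairwise_prf(true_labels, cluster_ids):
    """Pairwise precision, recall, F1: a pair is predicted positive if the two points share a
    cluster, and actually positive if they share a true class label."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = cluster_ids[i] == cluster_ids[j]
        tp += same_true and same_pred
        fp += (not same_true) and same_pred
        fn += same_true and (not same_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```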

72 3.2 Performance Evaluation on Low-dimensional Datasets. The first set of experiments evaluates the clustering performance on the six UCI datasets. [sent-196, score-0.178]

73 The two highest average F1 scores on each dataset are highlighted in bold font. [sent-198, score-0.056]

74 3.3 Performance Evaluation on High-dimensional Text Data. We evaluate the clustering performance on the six text datasets. [sent-201, score-0.218]

75 Since some of the methods are infeasible for text clustering due to the high dimensionality, we only include the results for the methods which are feasible for this experiment (i. [sent-202, score-0.187]

76 The results show that the learned Bregman distance function is applicable to high dimensional data, and that it outperforms the other commonly used text clustering methods on four out of six datasets. [sent-208, score-0.552]

77 (Table 2 residue: the methods compared are baseline, Ck-means, ITML, Xing, RCA, DCA, DistBoost, and Bk-means, reported by precision, recall, and F1.) [sent-209, score-0.051]

78 Table 2: Evaluation of clustering performance (average precision, recall, and F1) on the six UCI datasets. [sent-491, score-0.178]

79 The top two F1 scores are highlighted in bold font for each dataset. [sent-492, score-0.056]

80 Table 3: Evaluation of clustering F1 performance on the high dimensional text data. [sent-571, score-0.209]

81 The computational complexity of a conventional clustering algorithm such as k-means is determined by both the distance calculation and the clustering scheme. [sent-576, score-0.509]

82 For a semi-supervised clustering algorithm based on distance learning, the overall computational time includes both the time for training an appropriate distance function and the time for clustering the data points. [sent-577, score-0.746]

83 The average running times of semi-supervised clustering over the six UCI datasets are listed in Table 4. [sent-578, score-0.218]

84 It is clear that the Bregman distance based clustering is comparable in efficiency to simple methods like RCA and DCA on low dimensional data, and runs much faster than Xing, ITML, and DistBoost. [sent-579, score-0.41]

85 On the high dimensional text data, it is much faster than other applicable DML methods. [sent-580, score-0.094]

86 Table 4: Comparison of average running time over the six UCI datasets and subsets of the six text datasets (10% sampling from the datasets in Table 1). [sent-597, score-0.286]

87 4 Conclusions. In this paper, we propose to learn a Bregman distance function for clustering algorithms using a non-parametric approach. [sent-598, score-0.373]

88 The proposed scheme explicitly addresses two shortcomings of the existing approaches for distance function/metric learning, i. [sent-599, score-0.295]

89 , assuming a fixed distance metric for the entire input space and high computational cost for high dimensional data. [sent-601, score-0.476]

90 We incorporate the Bregman distance function into the k-means clustering algorithm for semi-supervised data clustering. [sent-602, score-0.373]

91 Experiments on semi-supervised clustering with six UCI datasets and six high dimensional text datasets show that the Bregman distance function outperforms other distance metric learning algorithms in F1 measure. [sent-603, score-1.078]

92 The experiments also verify that the proposed distance learning algorithm is computationally efficient and capable of handling high dimensional data. [sent-604, score-0.353]

93 First, let us define f as follows: f = (√M − √m)[d(xa, xc) d(xc, xb)]^{1/4}. The square of the right side of Eq. [sent-609, score-0.878]

94 (2) is (√d(xa, xc) + √d(xc, xb) + f)² = d(xa, xb) − η(xa, xb, xc) + δ(xa, xb, xc), where δ(xa, xb, xc) = f² + 2f√d(xa, xc) + 2f√d(xc, xb) + 2√(d(xa, xc) d(xc, xb)) and η(xa, xb, xc) = (∇ϕ(xa) − ∇ϕ(xc))ᵀ(xc − xb) + (∇ϕ(xc) − ∇ϕ(xb))ᵀ(xa − xc). [sent-610, score-7.26]

95 From the above equation, the proposition holds if and only if δ(xa, xb, xc) − η(xa, xb, xc) ≥ 0. [sent-611, score-1.748]

96 The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. [sent-633, score-0.09]

97 Learning distance metrics with contextual constraints for image retrieval. [sent-678, score-0.305]

98 Distance metric learning for large margin nearest neighbor classification. [sent-724, score-0.204]

99 Distance metric learning from uncertain side information with application to automated photo tagging. [sent-734, score-0.179]

100 Distance metric learning with application to clustering with side-information. [sent-745, score-0.262]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xb', 0.492), ('xa', 0.438), ('bregman', 0.383), ('xc', 0.354), ('distance', 0.258), ('metric', 0.147), ('clustering', 0.115), ('dca', 0.112), ('oom', 0.112), ('itml', 0.103), ('rca', 0.103), ('oot', 0.096), ('zi', 0.095), ('pairwise', 0.087), ('distboost', 0.08), ('yi', 0.067), ('mahalanobis', 0.064), ('six', 0.063), ('hoi', 0.063), ('bt', 0.059), ('proposition', 0.056), ('xing', 0.053), ('precision', 0.051), ('bl', 0.048), ('dyq', 0.048), ('reuter', 0.048), ('uci', 0.047), ('convex', 0.045), ('st', 0.043), ('webkb', 0.042), ('text', 0.04), ('jin', 0.04), ('datasets', 0.04), ('xn', 0.039), ('hessian', 0.038), ('dimensional', 0.037), ('learns', 0.035), ('recall', 0.033), ('triangle', 0.033), ('side', 0.032), ('xdiag', 0.032), ('instances', 0.03), ('constraints', 0.028), ('kernel', 0.028), ('dissimilar', 0.027), ('diabetes', 0.026), ('nanyang', 0.026), ('nca', 0.026), ('placed', 0.026), ('china', 0.025), ('subgradient', 0.025), ('testbed', 0.024), ('lei', 0.024), ('computationally', 0.024), ('appendix', 0.023), ('pairs', 0.023), ('technological', 0.023), ('bold', 0.023), ('learned', 0.022), ('wu', 0.022), ('heterogeneous', 0.022), ('newsgroup', 0.022), ('violation', 0.022), ('pegasos', 0.022), ('representer', 0.022), ('hertz', 0.022), ('sonar', 0.022), ('span', 0.021), ('functions', 0.021), ('cluster', 0.021), ('conventional', 0.021), ('margin', 0.021), ('eu', 0.02), ('multimedia', 0.02), ('nearest', 0.02), ('scheme', 0.02), ('ionosphere', 0.019), ('xi', 0.019), ('table', 0.019), ('metrics', 0.019), ('boosting', 0.018), ('highlighted', 0.018), ('rd', 0.018), ('liu', 0.017), ('cosine', 0.017), ('proposed', 0.017), ('high', 0.017), ('avoids', 0.016), ('constraint', 0.016), ('exp', 0.016), ('neighbor', 0.016), ('calculate', 0.016), ('rn', 0.016), ('matrix', 0.015), ('strictly', 0.015), ('scores', 0.015), ('collaborative', 0.015), ('infeasible', 0.015), ('mining', 0.015), ('dimensionality', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

2 0.27054253 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

Author: Rong Jin, Shijun Wang, Yang Zhou

Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.

3 0.26782373 74 nips-2009-Efficient Bregman Range Search

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

4 0.24156162 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

5 0.13571766 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

Author: Peter Carbonetto, Matthew King, Firas Hamze

Abstract: We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean field and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artificial sequence of distributions. Significantly, our framework offers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a difficult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can offer improvements in stability and accuracy over existing methods, and at a comparable cost. 1

6 0.12430906 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

7 0.11697632 223 nips-2009-Sparse Metric Learning via Smooth Optimization

8 0.11208607 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

9 0.10295461 234 nips-2009-Streaming k-means approximation

10 0.071975984 129 nips-2009-Learning a Small Mixture of Trees

11 0.069803424 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

12 0.060753986 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

13 0.055195607 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

14 0.05497165 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

15 0.052631181 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

16 0.051757187 72 nips-2009-Distribution Matching for Transduction

17 0.049833823 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

18 0.047017545 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

19 0.046740759 147 nips-2009-Matrix Completion from Noisy Entries

20 0.045939766 182 nips-2009-Optimal Scoring for Unsupervised Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.162), (1, 0.101), (2, 0.006), (3, 0.039), (4, 0.06), (5, 0.015), (6, -0.092), (7, 0.103), (8, -0.029), (9, 0.074), (10, 0.037), (11, -0.059), (12, 0.212), (13, 0.077), (14, 0.059), (15, -0.255), (16, 0.18), (17, -0.102), (18, 0.039), (19, -0.017), (20, -0.347), (21, -0.152), (22, -0.089), (23, -0.155), (24, 0.103), (25, 0.091), (26, -0.103), (27, 0.074), (28, -0.051), (29, -0.173), (30, -0.012), (31, 0.055), (32, -0.151), (33, -0.099), (34, 0.084), (35, 0.085), (36, 0.098), (37, -0.026), (38, 0.003), (39, -0.059), (40, 0.057), (41, 0.018), (42, 0.119), (43, -0.022), (44, 0.085), (45, 0.002), (46, 0.063), (47, 0.027), (48, -0.063), (49, -0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95900834 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

2 0.70491421 74 nips-2009-Efficient Bregman Range Search

Author: Lawrence Cayton

Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences. 1

3 0.70339727 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

4 0.61089182 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

Author: Rong Jin, Shijun Wang, Yang Zhou

Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.

5 0.47765338 223 nips-2009-Sparse Metric Learning via Smooth Optimization

Author: Yiming Ying, Kaizhu Huang, Colin Campbell

Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.

6 0.45895883 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

7 0.44236204 234 nips-2009-Streaming k-means approximation

8 0.41999719 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

9 0.3465994 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

10 0.31463182 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

11 0.30641252 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

12 0.28322652 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

13 0.2752625 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

14 0.27115256 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

15 0.25799423 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

16 0.24560331 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

17 0.24188969 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming

18 0.23265257 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem

19 0.23233268 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test

20 0.22910492 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.041), (25, 0.05), (35, 0.046), (36, 0.102), (37, 0.047), (38, 0.204), (39, 0.069), (44, 0.061), (48, 0.011), (58, 0.104), (61, 0.021), (71, 0.037), (86, 0.072), (91, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83306432 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Author: Lei Wu, Rong Jin, Steven C. Hoi, Jianke Zhu, Nenghai Yu

Abstract: Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data. 1

2 0.72629803 122 nips-2009-Label Selection on Graphs

Author: Andrew Guillory, Jeff A. Bilmes

Abstract: We investigate methods for selecting sets of labeled vertices for use in predicting the labels of vertices on a graph. We specifically study methods which choose a single batch of labeled vertices (i.e. offline, non sequential methods). In this setting, we find common graph smoothness assumptions directly motivate simple label selection methods with interesting theoretical guarantees. These methods bound prediction error in terms of the smoothness of the true labels with respect to the graph. Some of these bounds give new motivations for previously proposed algorithms, and some suggest new algorithms which we evaluate. We show improved performance over baseline methods on several real world data sets. 1

3 0.67666924 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

Author: Jean-pascal Pfister, Peter Dayan, Máté Lengyel

Abstract: Synapses exhibit an extraordinary degree of short-term malleability, with release probabilities and effective synaptic strengths changing markedly over multiple timescales. From the perspective of a fixed computational operation in a network, this seems like a most unacceptable degree of added variability. We suggest an alternative theory according to which short-term synaptic plasticity plays a normatively-justifiable role. This theory starts from the commonplace observation that the spiking of a neuron is an incomplete, digital, report of the analog quantity that contains all the critical information, namely its membrane potential. We suggest that a synapse solves the inverse problem of estimating the pre-synaptic membrane potential from the spikes it receives, acting as a recursive filter. We show that the dynamics of short-term synaptic depression closely resemble those required for optimal filtering, and that they indeed support high quality estimation. Under this account, the local postsynaptic potential and the level of synaptic resources track the (scaled) mean and variance of the estimated presynaptic membrane potential. We make experimentally testable predictions for how the statistics of subthreshold membrane potential fluctuations and the form of spiking non-linearity should be related to the properties of short-term plasticity in any particular cell type. 1

4 0.67540216 86 nips-2009-Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis

Author: Barry Chai, Dirk Walther, Diane Beck, Li Fei-fei

Abstract: In this study, we present a new method for establishing fMRI pattern-based functional connectivity between brain regions by estimating their multivariate mutual information. Recent advances in the numerical approximation of highdimensional probability distributions allow us to successfully estimate mutual information from scarce fMRI data. We also show that selecting voxels based on the multivariate mutual information of local activity patterns with respect to ground truth labels leads to higher decoding accuracy than established voxel selection methods. We validate our approach with a 6-way scene categorization fMRI experiment. Multivariate information analysis is able to find strong information sharing between PPA and RSC, consistent with existing neuroscience studies on scenes. Furthermore, an exploratory whole-brain analysis uncovered other brain regions that share information with the PPA-RSC scene network.

5 0.6686644 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

Author: Rong Jin, Shijun Wang, Yang Zhou

Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.

6 0.66159767 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

7 0.66092032 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

8 0.64931405 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

9 0.64916933 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

10 0.64727759 70 nips-2009-Discriminative Network Models of Schizophrenia

11 0.64623523 191 nips-2009-Positive Semidefinite Metric Learning with Boosting

12 0.64583808 104 nips-2009-Group Sparse Coding

13 0.64175028 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

14 0.64171201 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

15 0.64127713 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

16 0.64047933 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

17 0.63916248 234 nips-2009-Streaming k-means approximation

18 0.6388815 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

19 0.6362825 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

20 0.63603586 148 nips-2009-Matrix Completion from Power-Law Distributed Samples