nips nips2006 nips2006-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. [sent-7, score-0.301]
2 In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. [sent-8, score-0.334]
3 We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. [sent-9, score-0.29]
4 This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. [sent-10, score-0.147]
5 The distance metric defining the neighbors of a query point plays a fundamental role in the accuracy of kNN classification. [sent-16, score-0.242]
6 In most cases Euclidean distance is used as a similarity measure. [sent-17, score-0.082]
7 However, the data is often distributed so that distance analysis along some specific directions of the feature space is more informative than along others. [sent-19, score-0.121]
8 In such cases and when training data is available in advance, distance metric learning [5, 10, 4, 1, 9] has been shown to yield significant improvement in kNN classification. [sent-20, score-0.272]
9 Euclidean distance computation in the transformed space is then equivalent to a non-uniform metric analysis in the original input space. [sent-22, score-0.268]
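For concreteness, a minimal NumPy check of this equivalence (the variable names and dimensions below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 5, 2
L = rng.normal(size=(d, D))                 # illustrative d x D linear transform
xi, xj = rng.normal(size=D), rng.normal(size=D)

# Squared Euclidean distance in the transformed space ...
dist_transformed = np.sum((L @ xi - L @ xj) ** 2)

# ... equals the Mahalanobis distance in input space under M = L^T L.
M = L.T @ L
diff = xi - xj
dist_mahalanobis = diff @ M @ diff

assert np.isclose(dist_transformed, dist_mahalanobis)
```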
10 The traditional solution is to apply dimensionality reduction methods to the data and then learn a suitable metric in the resulting low-dimensional subspace. [sent-27, score-0.301]
11 However, dimensionality reduction methods generally optimize objectives unrelated to classification and, as a consequence, might generate representations that are significantly less discriminative than the original data. [sent-29, score-0.163]
12 Thus, metric learning within the subspace might lead to suboptimal similarity measures. [sent-30, score-0.239]
13 In this paper we show that better performance can be achieved by directly solving for a low-dimensional embedding that optimizes a measure of kNN classification performance. [sent-31, score-0.064]
14 Their technique learns a metric that attempts to shrink distances of neighboring similarly-labeled points and to separate points in different classes by a large margin. [sent-34, score-0.234]
15 We describe the Large Margin Component Analysis (LMCA) algorithm, a technique that solves directly for a low-dimensional embedding of the data such that Euclidean distance in this space minimizes the large margin metric objective described in [9]. [sent-36, score-0.404]
16 Our approach solves for only D · d unknowns, where D is the dimensionality of the inputs and d is the dimensionality of the target space. [sent-37, score-0.205]
17 [9] learns a Mahalanobis distance of the inputs, which requires solving for a D × D matrix, using iterative semidefinite programming methods. [sent-39, score-0.087]
18 We propose a technique that learns Mahalanobis distance metrics in nonlinear feature spaces. [sent-42, score-0.23]
19 Our approach combines the goal of dimensionality reduction with a novel "kernelized" version of the metric learning objective of Weinberger et al. [sent-43, score-0.343]
20 We describe an algorithm that optimizes this combined objective directly. [sent-45, score-0.083]
21 We demonstrate that, even when data is low-dimensional and dimensionality reduction is not needed, this technique can be used to learn nonlinear metrics leading to significant improvement in kNN classification accuracy over [9]. [sent-46, score-0.275]
22 2 Linear Dimensionality Reduction for Large Margin kNN Classification In this section we briefly review the algorithm presented in [9] for metric learning in the context of kNN classification. [sent-47, score-0.171]
23 [9] exploits this property by learning a linear transformation of the input space that aims at creating consistently labeled k-nearest neighborhoods, i.e., neighborhoods in which the k closest points to any example share its label. [sent-51, score-0.111]
24 The second term encourages distancing each example xi from differently labeled points by an amount equal to 1 plus the distance from xi to any of its k similarly-labeled closest points. [sent-63, score-0.154]
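Equation 1 itself is not reproduced in the extracted sentences; the sketch below reconstructs the two-term objective from this description and from the gradient in Equation 3. The function names, the binary target-neighbor matrix `eta`, and the use of a plain hinge in place of the paper's smooth hinge h are assumptions made for illustration.

```python
import numpy as np

def hinge(s):
    # Plain hinge h(s) = max(s, 0); the paper uses a smooth approximation of h.
    return np.maximum(s, 0.0)

def lmca_objective(L, X, y, eta, c=1.0):
    """Illustrative reconstruction of the large-margin objective eps(L).

    X   : (n, D) inputs, y : (n,) labels
    eta : (n, n) binary matrix, eta[i, j] = 1 iff x_j is one of the k
          similarly-labeled nearest neighbors ("target neighbors") of x_i
    """
    Z = X @ L.T                                  # projected points, shape (n, d)
    n = X.shape[0]
    pull, push = 0.0, 0.0
    for i in range(n):
        for j in np.flatnonzero(eta[i]):
            d_ij = np.sum((Z[i] - Z[j]) ** 2)
            pull += d_ij                         # first term: shrink target-neighbor distances
            for l in range(n):
                if y[l] != y[i]:                 # (1 - y_il): differently labeled points
                    d_il = np.sum((Z[i] - Z[l]) ** 2)
                    push += hinge(d_ij - d_il + 1.0)   # margin of 1 plus target distance
    return pull + c * push
```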
25 Upon optimization of ǫ(L), test example x_q is classified according to the kNN rule applied to its projection x′_q = Lx_q, using Euclidean distance as the metric. [sent-66, score-0.117]
26 Equivalently, such classification can be interpreted as kNN classification in the original input space under the Mahalanobis distance metric induced by the matrix M = L^T L. [sent-67, score-0.268]
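A minimal sketch of this classification rule, assuming a learned d × D matrix L (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(L, X_train, y_train, x_query, k=3):
    """kNN rule applied to the learned projection x'_q = L x_q (sketch)."""
    Z_train = X_train @ L.T                      # project training points
    z_query = L @ x_query                        # project the query
    dists = np.sum((Z_train - z_query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]
```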
27 Although Equation 1 is non-convex in L, it can be rewritten as a semidefinite program ǫ(M) in terms of the metric M [9]. [sent-68, score-0.171]
28 In such cases, [9] proposes applying dimensionality reduction methods, such as PCA, followed by metric learning within the resulting low-dimensional subspace. [sent-71, score-0.318]
29 As outlined above, this procedure leads to suboptimal metric learning. [sent-72, score-0.192]
30 In this paper we propose an alternative approach that solves jointly for dimensionality reduction and metric learning. [sent-73, score-0.333]
31 The key idea is to choose the transformation L in Equation 1 to be a non-square matrix of size d × D, with d ≪ D. [sent-74, score-0.074]
32 Euclidean distance in this low-dimensional embedding is equivalent to Mahalanobis distance in the original input space under the rank-deficient metric M = L^T L (M now has rank at most d). [sent-76, score-0.341]
33 Although the objective optimized by our method is also not convex, we experimentally demonstrate that our solution converges consistently to better metrics than those computed via the application of PCA followed by subspace distance learning (see Section 4). [sent-83, score-0.213]
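A sketch of a gradient-descent procedure on the rectangular d × D matrix L, using the linear analogue of the feature-space gradient in Equation 3 with a hinge subgradient. The extracted text does not spell out the linear-case optimizer, and the paper initializes L with PCA (see the experiments below), so the random initialization, learning rate, and helper names here are assumptions for illustration.

```python
import numpy as np

def lmca_gradient(L, X, y, eta, c=1.0):
    """Gradient of eps(L) w.r.t. the rectangular d x D matrix L (sketch).

    Mirrors Equation 3 with raw inputs in place of features, and uses the
    hinge subgradient h'(s) = 1 if s > 0 else 0.
    """
    G = np.zeros_like(L)
    n = X.shape[0]
    for i in range(n):
        for j in np.flatnonzero(eta[i]):
            dij = X[i] - X[j]
            Cij = np.outer(dij, dij)
            G += 2.0 * L @ Cij
            d_ij = np.sum((L @ dij) ** 2)
            for l in range(n):
                if y[l] != y[i]:
                    dil = X[i] - X[l]
                    if d_ij - np.sum((L @ dil) ** 2) + 1.0 > 0:   # active margin violation
                        G += 2.0 * c * (L @ Cij - L @ np.outer(dil, dil))
    return G

def fit_lmca(X, y, eta, d, steps=100, lr=1e-3, c=1.0, seed=0):
    """Plain gradient descent on the d x D projection (illustrative settings)."""
    L = np.random.default_rng(seed).normal(scale=0.1, size=(d, X.shape[1]))
    for _ in range(steps):
        L = L - lr * lmca_gradient(L, X, y, eta, c)
    return L
```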
34 3 Nonlinear Feature Extraction for Large Margin kNN Classification In the previous section we have described an algorithm that jointly solves for linear dimensionality reduction and metric learning. [sent-86, score-0.333]
35 We now describe how to "kernelize" this method in order to compute non-linear features of the inputs that optimize our distance learning objective. [sent-87, score-0.121]
36 Our approach learns a low-rank Mahalanobis distance metric in a high-dimensional feature space F, related to the inputs by a nonlinear map φ : ℜ^D → F. [sent-88, score-0.379]
37 We restrict our analysis to nonlinear maps φ for which there exist kernel functions k that can be used to compute the feature inner products without carrying out the map, i.e., k(x_i, x_j) = φ(x_i)^T φ(x_j). [sent-89, score-0.064]
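For example, with the Gaussian (RBF) kernel the inner products in F are computed directly from the inputs; a small NumPy sketch (the kernel choice and gamma value are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.1):
    """k(x, x') = exp(-gamma * ||x - x'||^2): an inner product in F computed
    without ever forming the map phi explicitly (illustrative kernel choice)."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))
```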
38 We modify our objective ǫ(L) by substituting the features φ(x_i) for the inputs x_i in Equation 1. [sent-92, score-0.154]
39 L is now a transformation from the space F into a low-dimensional space ℜ^d. [sent-93, score-0.086]
40 We seek the transformation L minimizing the modified objective function ǫ(L). [sent-94, score-0.088]
41 The gradient in feature space can now be written as:
$$\frac{\partial \epsilon(L)}{\partial L} = 2\sum_{ij}\eta_{ij}\,L(\phi_i-\phi_j)(\phi_i-\phi_j)^T + 2c\sum_{ijl}\eta_{ij}(1-y_{il})\,h'(s_{ijl})\,L\Big[(\phi_i-\phi_j)(\phi_i-\phi_j)^T-(\phi_i-\phi_l)(\phi_i-\phi_l)^T\Big] \qquad (3)$$
where s_ijl = ‖L(φ_i − φ_j)‖² − ‖L(φ_i − φ_l)‖² + 1. [sent-95, score-0.39]
42 This form of nonlinear map is analogous to that used in kernel-PCA and it allows us to parameterize the transformation L in terms of only d · n parameters, the entries of the matrix Ω. [sent-101, score-0.107]
43 Lemma 1. The gradient in feature space can be computed as ∂ǫ(L)/∂L = ΓΦ, where Γ depends on the features φ_i solely through the dot products φ_i^T φ_j. [sent-104, score-0.106]
44–45 Proof. Defining k_i = Φφ_i = [k(x_1, x_i), ..., k(x_n, x_i)]^T, non-linear feature projections can be computed as Lφ_i = ΩΦφ_i = Ωk_i. [sent-105, score-0.156] [sent-108, score-0.102]
46 Setting
$$\Gamma = 2\Omega\sum_{ij}\eta_{ij}\Big[E_i^{(k_i-k_j)} - E_j^{(k_i-k_j)}\Big] + 2c\,\Omega\sum_{ijl}\eta_{ij}(1-y_{il})\,h'(s_{ijl})\Big[E_i^{(k_i-k_j)} - E_j^{(k_i-k_j)} - E_i^{(k_i-k_l)} + E_l^{(k_i-k_l)}\Big] \qquad (4)$$
proves the Lemma. [sent-115, score-0.266]
47 This result allows us to implicitly solve for the transformation without ever computing the features in the high-dimensional space F : the key idea is to iteratively update Ω rather than L. [sent-116, score-0.1]
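One way to realize this, sketched below, rests on the identity ‖L(φ_i − φ_j)‖² = ‖Ω(k_i − k_j)‖²: descending the objective in Ω amounts to running the linear routine on the rows of the (symmetric) kernel matrix, which are exactly the vectors k_i. This is an equivalent reformulation for illustration, reusing the hypothetical `fit_lmca` and `rbf_kernel` from the earlier sketches; it is not the paper's exact Γ-based update of Ω.

```python
import numpy as np

def fit_klmca(X, y, eta, d, kernel, steps=100, lr=1e-3, c=1.0):
    """Sketch of the kernelized variant: learn Omega (d x n) with L = Omega Phi.

    Since ||L(phi_i - phi_j)||^2 = ||Omega(k_i - k_j)||^2, gradient descent on
    Omega is the linear routine applied to the kernel columns k_i.
    Assumes fit_lmca from the earlier sketch is in scope.
    """
    K = kernel(X, X)                             # K[a, i] = k(x_a, x_i); row i of the
    Omega = fit_lmca(K, y, eta, d,               # symmetric K is k_i
                     steps=steps, lr=lr, c=c)
    return Omega

def klmca_project(Omega, kernel, X_train, X_new):
    """Non-linear features of new points: L phi(x) = Omega [k(x_1,x), ..., k(x_n,x)]^T."""
    return kernel(X_new, X_train) @ Omega.T      # shape (n_new, d)
```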
48 4 Experimental results We compared our methods to the metric learning algorithm of Weinberger et al. [sent-120, score-0.171]
49 In all of the experiments reported here, LMCA was initialized using PCA, while KLMCA used the transformation computed by kernel-PCA as initial guess. [sent-123, score-0.064]
50 • Isolet is a dataset of speech features from the UC Irvine repository, consisting of 6238 training examples and 1559 testing examples with 617 attributes. [sent-131, score-0.128]
51 • The AT&T Faces database contains 10 grayscale face images of each of 40 distinct subjects. [sent-133, score-0.064]
52 The images were taken at different times, with varying illumination, facial expressions and poses. [sent-134, score-0.082]
53 Except for Isolet, for which a separate testing set is specified, we computed all of the experimental results by averaging over 100 runs of random splitting of the examples into training and testing sets. [sent-142, score-0.172]
54 For AT&T Faces, training sets were selected by sampling 7 images at random for each person. [sent-144, score-0.062]
55 Unlike LMCA and KLMCA, which directly solve for low-dimensional embeddings of the input data, LMNN cannot be run on datasets of dimensionalities such as those considered here and must be trained on lower-dimensional representations of the inputs. [sent-146, score-0.075]
56 Figure 1 summarizes the training and testing performances of kNN classification using the metrics learned by the three algorithms for different subspace dimensions. [sent-148, score-0.179]
57 LMCA and KLMCA give considerably better classification accuracy than LMNN on all datasets, with the kernelized version of our algorithm always outperforming the linear version. [sent-149, score-0.076]
58 The difference in accuracy between our algorithms and LMNN is particularly dramatic when a small number of projection dimensions is used. [sent-150, score-0.096]
59 In such cases, LMNN is unable to find good metrics in the low-dimensional subspace computed by PCA. [sent-151, score-0.103]
60 By contrast, LMCA and KLMCA solve for the low-dimensional subspace that optimizes the classification-related objective. [sent-152, score-0.115]
61 (c) Absolute difference between original images and reconstructions from features for PCA (left) and LMCA (right). [sent-169, score-0.099]
62 LMCA learns invariance to effects that are irrelevant for classification: non-uniform illumination, facial expressions, and glasses (the training data contains images with and without glasses for the same individuals). [sent-171, score-0.165]
63 In our experiments we found that all three classification algorithms (LMNN, LMCA+kNN, and KLMCA+kNN) performed considerably better than kNN using the Euclidean metric in the PCA and KPCA subspaces. [sent-173, score-0.187]
64–65 [...]9% testing error rate when used on the PCA features, and a 9.7% testing error rate when applied to the nonlinear features computed by KPCA. [sent-175, score-0.08] [sent-176, score-0.177]
66 While LMNN is applied to features in a low-dimensional space, LMCA and KLMCA learn a low-rank metric directly from the high-dimensional inputs. [sent-177, score-0.205]
67 As a reference, using d = 10 and K = 3 on the AT&T dataset, LMNN learns a metric in about 5 seconds, while LMCA and KLMCA converge to a minimum in 21 and 24 seconds, respectively. [sent-180, score-0.208]
68 Figure 2 shows comparative reconstructions of images obtained from PCA and LMCA features by inverting their linear mappings. [sent-182, score-0.099]
69 The PCA and LMCA subspaces in this experiment were computed from cropped face images of size 50 × 50 pixels, taken from a set of consumer photographs. [sent-183, score-0.082]
70 The dataset contains 2459 face images corresponding to 152 distinct individuals. [sent-184, score-0.064]
71 For a given target dimensionality, PCA has the property of computing the linear transformation minimizing the reconstruction error under the L2 norm. [sent-187, score-0.066]
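A small sketch of this PCA projection/reconstruction property (SVD-based; names are illustrative):

```python
import numpy as np

def pca_reconstruct(X, d):
    """Project onto the top-d principal components and map back (sketch)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Top-d right singular vectors give the d-dimensional PCA basis.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:d]                                   # (d, D) orthonormal basis
    return (Xc @ P.T) @ P + mean                 # L2-optimal rank-d reconstruction
```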
72 Unsurprisingly, the PCA face reconstructions are extremely faithful reproductions of the original images. [sent-188, score-0.073]
73 By contrast, LMCA seeks a subspace where neighboring examples belong to the same class and points differently labeled are separated by a large margin. [sent-190, score-0.071]
74 For the case of face verification, LMCA de-emphasizes changes in illumination, presence or absence of glasses and smiling expressions (Figure 2). [sent-192, score-0.101]
75 When the input data does not require dimensionality reduction, LMNN and LMCA solve the same optimization problem, but LMNN should be preferred over LMCA in light of its guarantees of convergence to the global minimum of the objective. [sent-193, score-0.117]
76 However, even in such cases, KLMCA can be used in lieu of LMNN in order to extract nonlinear features from the inputs. [sent-194, score-0.079]
77 The dimensionality of the data in these sets ranges from 4 to 34. [sent-197, score-0.068]
78–80 [Figure: training and testing error rates (%) on the IRIS, IONO, BAL, and WINE datasets, comparing kNN with KLMCA features, LMNN, and kNN with Euclidean distance.]
81 Algorithms are kNN using Euclidean distance, LMNN [9], kNN in the nonlinear feature space computed by our KLMCA algorithm, and multiclass SVM. [sent-224, score-0.102]
82 To compare LMNN with KLMCA under identical conditions, KLMCA was restricted to compute a number of features equal to the input dimensionality, although in our experience using additional nonlinear features often results in better classification performance. [sent-225, score-0.14]
83 On all datasets except Wine, for which the mapping to the high-dimensional space seems to hurt performance (note also the high error rate of SVM), KLMCA gives better classification accuracy than LMNN. [sent-228, score-0.108]
84 Note also that the error rates of KLMCA are consistently lower than those reported in [9] for SVM under identical training and testing conditions. [sent-229, score-0.132]
85 The novelty of our method lies in an optimization that solves for data reduction and metric learning simultaneously. [sent-233, score-0.287]
86 Additionally, while [9] is limited to learning a global linear transformation of the inputs, we describe a kernelized version of our method that extracts non-linear features of the inputs. [sent-234, score-0.119]
87 In our work the metric is parameterized by arbitrary nonlinear maps for which kernel functions exist. [sent-245, score-0.216]
88 However, this approach computes dimensionality reductions through a two-step solution which involves first solving for a possibly full-rank metric and then estimating the low-rank approximation via spectral decomposition. [sent-248, score-0.239]
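A sketch of that two-step baseline's second stage, assuming a full-rank positive semidefinite metric M has already been learned (the function name is illustrative):

```python
import numpy as np

def low_rank_metric(M, d):
    """Truncate the spectrum of a learned PSD metric M to its top-d components
    and return a d x D map L with L^T L equal to the rank-d approximation of M."""
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]             # indices of the d largest eigenvalues
    S = np.clip(vals[idx], 0.0, None)            # guard against tiny negative values
    V = vecs[:, idx]
    return np.sqrt(S)[:, None] * V.T             # L = S^(1/2) V^T
```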
89 Furthermore, the metric is trained with the aim of collapsing all examples in the same class to a single point. [sent-250, score-0.19]
90 SVDM optimizes an objective that is a combination of dimensionality reduction and classification. [sent-253, score-0.213]
91 Specifically, a linear mapping from input to feature space and a linear classifier applied to that feature space are trained simultaneously. [sent-254, score-0.085]
92 6 Discussion We have presented a novel algorithm that simultaneously optimizes the objectives of dimensionality reduction and metric learning. [sent-257, score-0.375]
93 Our algorithm seeks, among all possible low-dimensional projections, the one that best satisfies a large margin metric objective. [sent-258, score-0.211]
94 Our approach contrasts with techniques that are unable to learn metrics in high dimensions and must therefore first apply dimensionality reduction methods to the data. [sent-259, score-0.183]
95 Although our optimization is not convex, we have experimentally demonstrated that the metrics learned by our solution are consistently superior to those computed by globally-optimal methods forced to search in a low-dimensional subspace. [sent-260, score-0.111]
96 The nonlinear version of our technique requires us to compute the kernel distance of a query point to all training examples. [sent-261, score-0.155]
97 In addition, we will investigate methods to further reduce overfitting when learning dimensionality reduction from very high dimensions. [sent-263, score-0.13]
98 Learning a similarity metric discriminatively, with application to face verification. [sent-270, score-0.222]
99 Distance metric learning for large margin nearest neighbor classification. [sent-326, score-0.255]
100 Distance metric learning, with application to clustering with side-information. [sent-339, score-0.171]
wordName wordTfidf (topN-words)
[('knn', 0.491), ('klmca', 0.451), ('lmca', 0.395), ('lmnn', 0.395), ('metric', 0.171), ('ki', 0.115), ('eucl', 0.113), ('yil', 0.099), ('kj', 0.098), ('pca', 0.095), ('ijl', 0.085), ('ij', 0.082), ('mahalanobis', 0.073), ('weinberger', 0.073), ('classi', 0.073), ('sijl', 0.07), ('dimensionality', 0.068), ('reduction', 0.062), ('testing', 0.06), ('fmri', 0.058), ('isolet', 0.056), ('metrics', 0.053), ('distance', 0.05), ('bal', 0.049), ('wine', 0.047), ('transformation', 0.046), ('nonlinear', 0.045), ('projection', 0.045), ('nca', 0.042), ('starplus', 0.042), ('svdm', 0.042), ('objective', 0.042), ('iris', 0.042), ('optimizes', 0.041), ('xi', 0.041), ('margin', 0.04), ('ei', 0.039), ('kernelized', 0.039), ('ej', 0.037), ('reconstructions', 0.037), ('learns', 0.037), ('euclidean', 0.037), ('glasses', 0.037), ('inputs', 0.037), ('face', 0.036), ('cation', 0.035), ('training', 0.034), ('features', 0.034), ('objectives', 0.033), ('faces', 0.032), ('solves', 0.032), ('subspace', 0.032), ('xl', 0.032), ('dimensions', 0.03), ('unknowns', 0.029), ('kl', 0.029), ('xj', 0.029), ('images', 0.028), ('iono', 0.028), ('lold', 0.028), ('nonsquare', 0.028), ('illumination', 0.028), ('datasets', 0.028), ('expressions', 0.028), ('input', 0.027), ('neighbor', 0.026), ('technique', 0.026), ('facial', 0.026), ('goldberger', 0.025), ('lorenzo', 0.025), ('riya', 0.025), ('projections', 0.024), ('veri', 0.023), ('embedding', 0.023), ('optimization', 0.022), ('differently', 0.022), ('accuracy', 0.021), ('globerson', 0.021), ('suboptimal', 0.021), ('semide', 0.021), ('space', 0.02), ('weiss', 0.02), ('error', 0.02), ('embeddings', 0.02), ('feature', 0.019), ('collapsing', 0.019), ('repository', 0.019), ('highdimensional', 0.019), ('svm', 0.019), ('consistently', 0.018), ('nearest', 0.018), ('lda', 0.018), ('computed', 0.018), ('seeks', 0.017), ('cases', 0.017), ('tting', 0.016), ('parameterize', 0.016), ('considerably', 0.016), ('similarity', 0.015), ('gradient', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
2 0.11335473 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
3 0.092736997 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
4 0.085930079 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
5 0.078012988 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.064536825 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
7 0.062287956 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
8 0.055561081 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
9 0.053751487 186 nips-2006-Support Vector Machines on a Budget
10 0.052826144 120 nips-2006-Learning to Traverse Image Manifolds
11 0.052639611 7 nips-2006-A Local Learning Approach for Clustering
12 0.052207708 149 nips-2006-Nonnegative Sparse PCA
13 0.050555266 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
14 0.049491044 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
15 0.04904566 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
16 0.047530286 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
17 0.046204358 109 nips-2006-Learnability and the doubling dimension
18 0.04560107 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
19 0.045394078 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
20 0.043002866 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
topicId topicWeight
[(0, -0.155), (1, 0.049), (2, 0.049), (3, 0.047), (4, -0.021), (5, 0.018), (6, -0.045), (7, 0.001), (8, 0.013), (9, -0.004), (10, -0.057), (11, 0.023), (12, -0.009), (13, -0.076), (14, 0.059), (15, -0.079), (16, -0.001), (17, -0.004), (18, -0.024), (19, 0.023), (20, 0.127), (21, 0.038), (22, 0.079), (23, -0.026), (24, 0.032), (25, -0.05), (26, -0.092), (27, 0.013), (28, -0.121), (29, -0.188), (30, 0.015), (31, 0.158), (32, -0.057), (33, -0.028), (34, 0.021), (35, -0.062), (36, -0.053), (37, 0.151), (38, 0.012), (39, 0.037), (40, 0.01), (41, -0.105), (42, 0.103), (43, -0.097), (44, -0.006), (45, 0.04), (46, -0.054), (47, 0.003), (48, 0.083), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.90259951 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
2 0.66654092 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
3 0.61746329 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
4 0.53590357 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
5 0.47656488 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
6 0.46946323 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
7 0.45186678 120 nips-2006-Learning to Traverse Image Manifolds
8 0.42771432 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
9 0.41502625 82 nips-2006-Gaussian and Wishart Hyperkernels
10 0.37565905 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
11 0.35690036 129 nips-2006-Map-Reduce for Machine Learning on Multicore
12 0.35339364 149 nips-2006-Nonnegative Sparse PCA
13 0.34162924 65 nips-2006-Denoising and Dimension Reduction in Feature Space
14 0.33833006 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
15 0.33499715 53 nips-2006-Combining causal and similarity-based reasoning
16 0.32814407 24 nips-2006-Aggregating Classification Accuracy across Time: Application to Single Trial EEG
17 0.32489803 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
18 0.32130459 110 nips-2006-Learning Dense 3D Correspondence
19 0.31700554 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
20 0.31450853 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
topicId topicWeight
[(1, 0.114), (3, 0.04), (7, 0.058), (9, 0.057), (12, 0.011), (22, 0.054), (34, 0.011), (44, 0.042), (52, 0.346), (57, 0.067), (65, 0.051), (69, 0.042), (90, 0.014)]
simIndex simValue paperId paperTitle
1 0.81375271 13 nips-2006-A Scalable Machine Learning Approach to Go
Author: Lin Wu, Pierre F. Baldi
Abstract: Go is an ancient board game that poses unique opportunities and challenges for AI and machine learning. Here we develop a machine learning approach to Go, and related board games, focusing primarily on the problem of learning a good evaluation function in a scalable way. Scalability is essential at multiple levels, from the library of local tactical patterns, to the integration of patterns across the board, to the size of the board itself. The system we propose is capable of automatically learning the propensity of local patterns from a library of games. Propensity and other local tactical information are fed into a recursive neural network, derived from a Bayesian network architecture. The network integrates local information across the board and produces local outputs that represent local territory ownership probabilities. The aggregation of these probabilities provides an effective strategic evaluation function that is an estimate of the expected area at the end (or at other stages) of the game. Local area targets for training can be derived from datasets of human games. A system trained using only 9 × 9 amateur game data performs surprisingly well on a test set derived from 19 × 19 professional game data. Possible directions for further improvements are briefly discussed. 1
same-paper 2 0.74196941 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
3 0.71983683 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
Author: Mark Girolami, Mingjun Zhong
Abstract: By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination. 1
4 0.46082827 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
5 0.456945 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.45254567 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
7 0.45199496 167 nips-2006-Recursive ICA
8 0.44946751 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
9 0.44892749 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
10 0.44850564 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
11 0.44832137 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
12 0.44829389 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
13 0.44685775 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
14 0.44664341 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
15 0.44643748 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
16 0.44535372 158 nips-2006-PG-means: learning the number of clusters in data
17 0.44517216 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
18 0.44480649 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
19 0.44464573 20 nips-2006-Active learning for misspecified generalized linear models
20 0.4442203 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints