nips nips2006 nips2006-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. [sent-7, score-0.301]
2 In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. [sent-8, score-0.334]
3 We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. [sent-9, score-0.29]
4 This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. [sent-10, score-0.147]
5 The distance metric defining the neighbors of a query point plays a fundamental role in the accuracy of kNN classification. [sent-16, score-0.242]
6 In most cases Euclidean distance is used as a similarity measure. [sent-17, score-0.082]
7 However, the data is often distributed so that distance analysis along some specific directions of the feature space is more informative than along others. [sent-19, score-0.121]
8 In such cases and when training data is available in advance, distance metric learning [5, 10, 4, 1, 9] has been shown to yield significant improvement in kNN classification. [sent-20, score-0.272]
9 Euclidean distance computation in the transformed space is then equivalent to a non-uniform metric analysis in the original input space. [sent-22, score-0.268]
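For concreteness, a minimal NumPy check of this equivalence (the variable names and dimensions below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 5, 2
L = rng.normal(size=(d, D))                 # illustrative d x D linear transform
xi, xj = rng.normal(size=D), rng.normal(size=D)

# Squared Euclidean distance in the transformed space ...
dist_transformed = np.sum((L @ xi - L @ xj) ** 2)

# ... equals the Mahalanobis distance in input space under M = L^T L.
M = L.T @ L
diff = xi - xj
dist_mahalanobis = diff @ M @ diff

assert np.isclose(dist_transformed, dist_mahalanobis)
```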
10 The traditional solution is to apply dimensionality reduction methods to the data and then learn a suitable metric in the resulting low-dimensional subspace. [sent-27, score-0.301]
11 However, dimensionality reduction methods generally optimize objectives unrelated to classification and, as a consequence, might generate representations that are significantly less discriminative than the original data. [sent-29, score-0.163]
12 Thus, metric learning within the subspace might lead to suboptimal similarity measures. [sent-30, score-0.239]
13 In this paper we show that better performance can be achieved by directly solving for a low-dimensional embedding that optimizes a measure of kNN classification performance. [sent-31, score-0.064]
14 Their technique learns a metric that attempts to shrink distances of neighboring similarly-labeled points and to separate points in different classes by a large margin. [sent-34, score-0.234]
15 We describe the Large Margin Component Analysis (LMCA) algorithm, a technique that solves directly for a low-dimensional embedding of the data such that Euclidean distance in this space minimizes the large margin metric objective described in [9]. [sent-36, score-0.404]
16 Our approach solves for only D · d unknowns, where D is the dimensionality of the inputs and d is the dimensionality of the target space. [sent-37, score-0.205]
17 [9] learns a Mahalanobis distance of the inputs, which requires solving for a D × D matrix, using iterative semidefinite programming methods. [sent-39, score-0.087]
18 We propose a technique that learns Mahalanobis distance metrics in nonlinear feature spaces. [sent-42, score-0.23]
19 Our approach combines the goal of dimensionality reduction with a novel "kernelized" version of the metric learning objective of Weinberger et al. [sent-43, score-0.343]
20 We describe an algorithm that optimizes this combined objective directly. [sent-45, score-0.083]
21 We demonstrate that, even when data is low-dimensional and dimensionality reduction is not needed, this technique can be used to learn nonlinear metrics leading to significant improvement in kNN classification accuracy over [9]. [sent-46, score-0.275]
22 2 Linear Dimensionality Reduction for Large Margin kNN Classification In this section we briefly review the algorithm presented in [9] for metric learning in the context of kNN classification. [sent-47, score-0.171]
23 [9] exploits this property by learning a linear transformation of the input space that aims at creating consistently labeled k-nearest neighborhoods, i.e., neighborhoods in which the k closest points to any example share its label. [sent-51, score-0.111]
24 The second term encourages distancing each example xi from differently labeled points by an amount equal to 1 plus the distance from xi to any of its k similarly-labeled closest points. [sent-63, score-0.154]
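Equation 1 itself is not reproduced in the extracted sentences; the sketch below reconstructs the two-term objective from this description and from the gradient in Equation 3. The function names, the binary target-neighbor matrix `eta`, and the use of a plain hinge in place of the paper's smooth hinge h are assumptions made for illustration.

```python
import numpy as np

def hinge(s):
    # Plain hinge h(s) = max(s, 0); the paper uses a smooth approximation of h.
    return np.maximum(s, 0.0)

def lmca_objective(L, X, y, eta, c=1.0):
    """Illustrative reconstruction of the large-margin objective eps(L).

    X   : (n, D) inputs, y : (n,) labels
    eta : (n, n) binary matrix, eta[i, j] = 1 iff x_j is one of the k
          similarly-labeled nearest neighbors ("target neighbors") of x_i
    """
    Z = X @ L.T                                  # projected points, shape (n, d)
    n = X.shape[0]
    pull, push = 0.0, 0.0
    for i in range(n):
        for j in np.flatnonzero(eta[i]):
            d_ij = np.sum((Z[i] - Z[j]) ** 2)
            pull += d_ij                         # first term: shrink target-neighbor distances
            for l in range(n):
                if y[l] != y[i]:                 # (1 - y_il): differently labeled points
                    d_il = np.sum((Z[i] - Z[l]) ** 2)
                    push += hinge(d_ij - d_il + 1.0)   # margin of 1 plus target distance
    return pull + c * push
```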
25 Upon optimization of ǫ(L), test example x_q is classified according to the kNN rule applied to its projection x′_q = Lx_q, using Euclidean distance as the metric. [sent-66, score-0.117]
26 Equivalently, such classification can be interpreted as kNN classification in the original input space under the Mahalanobis distance metric induced by the matrix M = L^T L. [sent-67, score-0.268]
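A minimal sketch of this classification rule, assuming a learned d × D matrix L (names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(L, X_train, y_train, x_query, k=3):
    """kNN rule applied to the learned projection x'_q = L x_q (sketch)."""
    Z_train = X_train @ L.T                      # project training points
    z_query = L @ x_query                        # project the query
    dists = np.sum((Z_train - z_query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]
```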
27 Although Equation 1 is non-convex in L, it can be rewritten as a semidefinite program ǫ(M) in terms of the metric M [9]. [sent-68, score-0.171]
28 In such cases, [9] proposes applying dimensionality reduction methods, such as PCA, followed by metric learning within the resulting low-dimensional subspace. [sent-71, score-0.318]
29 As outlined above, this procedure leads to suboptimal metric learning. [sent-72, score-0.192]
30 In this paper we propose an alternative approach that solves jointly for dimensionality reduction and metric learning. [sent-73, score-0.333]
31 The key idea is to choose the transformation L in Equation 1 to be a non-square matrix of size d × D, with d ≪ D. [sent-74, score-0.074]
32 Euclidean distance in this low-dimensional embedding is equivalent to Mahalanobis distance in the original input space under the rank-deficient metric M = L^T L (M now has rank at most d). [sent-76, score-0.341]
33 Although the objective optimized by our method is also not convex, we experimentally demonstrate that our solution converges consistently to better metrics than those computed via the application of PCA followed by subspace distance learning (see Section 4). [sent-83, score-0.213]
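A sketch of a gradient-descent procedure on the rectangular d × D matrix L, using the linear analogue of the feature-space gradient in Equation 3 with a hinge subgradient. The extracted text does not spell out the linear-case optimizer, and the paper initializes L with PCA (see the experiments below), so the random initialization, learning rate, and helper names here are assumptions for illustration.

```python
import numpy as np

def lmca_gradient(L, X, y, eta, c=1.0):
    """Gradient of eps(L) w.r.t. the rectangular d x D matrix L (sketch).

    Mirrors Equation 3 with raw inputs in place of features, and uses the
    hinge subgradient h'(s) = 1 if s > 0 else 0.
    """
    G = np.zeros_like(L)
    n = X.shape[0]
    for i in range(n):
        for j in np.flatnonzero(eta[i]):
            dij = X[i] - X[j]
            Cij = np.outer(dij, dij)
            G += 2.0 * L @ Cij
            d_ij = np.sum((L @ dij) ** 2)
            for l in range(n):
                if y[l] != y[i]:
                    dil = X[i] - X[l]
                    if d_ij - np.sum((L @ dil) ** 2) + 1.0 > 0:   # active margin violation
                        G += 2.0 * c * (L @ Cij - L @ np.outer(dil, dil))
    return G

def fit_lmca(X, y, eta, d, steps=100, lr=1e-3, c=1.0, seed=0):
    """Plain gradient descent on the d x D projection (illustrative settings)."""
    L = np.random.default_rng(seed).normal(scale=0.1, size=(d, X.shape[1]))
    for _ in range(steps):
        L = L - lr * lmca_gradient(L, X, y, eta, c)
    return L
```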
34 3 Nonlinear Feature Extraction for Large Margin kNN Classification In the previous section we have described an algorithm that jointly solves for linear dimensionality reduction and metric learning. [sent-86, score-0.333]
35 We now describe how to "kernelize" this method in order to compute non-linear features of the inputs that optimize our distance learning objective. [sent-87, score-0.121]
36 Our approach learns a low-rank Mahalanobis distance metric in a high-dimensional feature space F, related to the inputs by a nonlinear map φ : ℜ^D → F. [sent-88, score-0.379]
37 We restrict our analysis to nonlinear maps φ for which there exist kernel functions k that can be used to compute the feature inner products without carrying out the map, i.e., k(x_i, x_j) = φ(x_i)^T φ(x_j). [sent-89, score-0.064]
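For example, with the Gaussian (RBF) kernel the inner products in F are computed directly from the inputs; a small NumPy sketch (the kernel choice and gamma value are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.1):
    """k(x, x') = exp(-gamma * ||x - x'||^2): an inner product in F computed
    without ever forming the map phi explicitly (illustrative kernel choice)."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))
```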
38 We modify our objective ǫ(L) by substituting the features φ(x_i) for the inputs x_i in Equation 1. [sent-92, score-0.154]
39 L is now a transformation from the space F into a low-dimensional space ℜ^d. [sent-93, score-0.086]
40 We seek the transformation L minimizing the modified objective function ǫ(L). [sent-94, score-0.088]
41 The gradient in feature space can now be written as:
$$\frac{\partial \epsilon(L)}{\partial L} = 2\sum_{ij}\eta_{ij}\,L(\phi_i-\phi_j)(\phi_i-\phi_j)^T + 2c\sum_{ijl}\eta_{ij}(1-y_{il})\,h'(s_{ijl})\,L\Big[(\phi_i-\phi_j)(\phi_i-\phi_j)^T-(\phi_i-\phi_l)(\phi_i-\phi_l)^T\Big] \qquad (3)$$
where s_ijl = ‖L(φ_i − φ_j)‖² − ‖L(φ_i − φ_l)‖² + 1. [sent-95, score-0.39]
42 This form of nonlinear map is analogous to that used in kernel-PCA and it allows us to parameterize the transformation L in terms of only d · n parameters, the entries of the matrix Ω. [sent-101, score-0.107]
43 Lemma 1. The gradient in feature space can be computed as ∂ǫ(L)/∂L = ΓΦ, where Γ depends on the features φ_i solely through the dot products φ_i^T φ_j. [sent-104, score-0.106]
44–45 Proof. Defining k_i = Φφ_i = [k(x_1, x_i), ..., k(x_n, x_i)]^T, non-linear feature projections can be computed as Lφ_i = ΩΦφ_i = Ωk_i. [sent-105, score-0.156] [sent-108, score-0.102]
46 Setting
$$\Gamma = 2\Omega\sum_{ij}\eta_{ij}\Big[E_i^{(k_i-k_j)} - E_j^{(k_i-k_j)}\Big] + 2c\,\Omega\sum_{ijl}\eta_{ij}(1-y_{il})\,h'(s_{ijl})\Big[E_i^{(k_i-k_j)} - E_j^{(k_i-k_j)} - E_i^{(k_i-k_l)} + E_l^{(k_i-k_l)}\Big] \qquad (4)$$
proves the Lemma. [sent-115, score-0.266]
47 This result allows us to implicitly solve for the transformation without ever computing the features in the high-dimensional space F : the key idea is to iteratively update Ω rather than L. [sent-116, score-0.1]
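One way to realize this, sketched below, rests on the identity ‖L(φ_i − φ_j)‖² = ‖Ω(k_i − k_j)‖²: descending the objective in Ω amounts to running the linear routine on the rows of the (symmetric) kernel matrix, which are exactly the vectors k_i. This is an equivalent reformulation for illustration, reusing the hypothetical `fit_lmca` and `rbf_kernel` from the earlier sketches; it is not the paper's exact Γ-based update of Ω.

```python
import numpy as np

def fit_klmca(X, y, eta, d, kernel, steps=100, lr=1e-3, c=1.0):
    """Sketch of the kernelized variant: learn Omega (d x n) with L = Omega Phi.

    Since ||L(phi_i - phi_j)||^2 = ||Omega(k_i - k_j)||^2, gradient descent on
    Omega is the linear routine applied to the kernel columns k_i.
    Assumes fit_lmca from the earlier sketch is in scope.
    """
    K = kernel(X, X)                             # K[a, i] = k(x_a, x_i); row i of the
    Omega = fit_lmca(K, y, eta, d,               # symmetric K is k_i
                     steps=steps, lr=lr, c=c)
    return Omega

def klmca_project(Omega, kernel, X_train, X_new):
    """Non-linear features of new points: L phi(x) = Omega [k(x_1,x), ..., k(x_n,x)]^T."""
    return kernel(X_new, X_train) @ Omega.T      # shape (n_new, d)
```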
48 4 Experimental results We compared our methods to the metric learning algorithm of Weinberger et al. [sent-120, score-0.171]
49 In all of the experiments reported here, LMCA was initialized using PCA, while KLMCA used the transformation computed by kernel-PCA as initial guess. [sent-123, score-0.064]
50 • Isolet is a dataset of speech features from the UC Irvine repository, consisting of 6238 training examples and 1559 testing examples with 617 attributes. [sent-131, score-0.128]
51 • The AT&T Faces database contains 10 grayscale face images of each of 40 distinct subjects. [sent-133, score-0.064]
52 The images were taken at different times, with varying illumination, facial expressions and poses. [sent-134, score-0.082]
53 Except for Isolet, for which a separate testing set is specified, we computed all of the experimental results by averaging over 100 runs of random splitting of the examples into training and testing sets. [sent-142, score-0.172]
54 For AT&T Faces, training sets were selected by sampling 7 images at random for each person. [sent-144, score-0.062]
55 Unlike LMCA and KLMCA, which directly solve for low-dimensional embeddings of the input data, LMNN cannot be run on datasets of dimensionalities such as those considered here and must be trained on lower-dimensional representations of the inputs. [sent-146, score-0.075]
56 Figure 1 summarizes the training and testing performances of kNN classification using the metrics learned by the three algorithms for different subspace dimensions. [sent-148, score-0.179]
57 LMCA and KLMCA give considerably better classification accuracy than LMNN on all datasets, with the kernelized version of our algorithm always outperforming the linear version. [sent-149, score-0.076]
58 The difference in accuracy between our algorithms and LMNN is particularly dramatic when a small number of projection dimensions is used. [sent-150, score-0.096]
59 In such cases, LMNN is unable to find good metrics in the low-dimensional subspace computed by PCA. [sent-151, score-0.103]
60 By contrast, LMCA and KLMCA solve for the low-dimensional subspace that optimizes the classification-related objective. [sent-152, score-0.115]
61 (c) Absolute difference between original images and reconstructions from features for PCA (left) and LMCA (right). [sent-169, score-0.099]
62 LMCA learns invariance to effects that are irrelevant for classification: non-uniform illumination, facial expressions, and glasses (the training data contains images with and without glasses for the same individuals). [sent-171, score-0.165]
63 In our experiments we found that all three classification algorithms (LMNN, LMCA+kNN, and KLMCA+kNN) performed considerably better than kNN using the Euclidean metric in the PCA and KPCA subspaces. [sent-173, score-0.187]
64–65 [...]9% testing error rate when used on the PCA features, and a 9.7% testing error rate when applied to the nonlinear features computed by KPCA. [sent-175, score-0.08] [sent-176, score-0.177]
66 While LMNN is applied to features in a low-dimensional space, LMCA and KLMCA learn a low-rank metric directly from the high-dimensional inputs. [sent-177, score-0.205]
67 As a reference, using d = 10 and K = 3 on the AT&T dataset, LMNN learns a metric in about 5 seconds, while LMCA and KLMCA converge to a minimum in 21 and 24 seconds, respectively. [sent-180, score-0.208]
68 Figure 2 shows comparative reconstructions of images obtained from PCA and LMCA features by inverting their linear mappings. [sent-182, score-0.099]
69 The PCA and LMCA subspaces in this experiment were computed from cropped face images of size 50 × 50 pixels, taken from a set of consumer photographs. [sent-183, score-0.082]
70 The dataset contains 2459 face images corresponding to 152 distinct individuals. [sent-184, score-0.064]
71 For a given target dimensionality, PCA has the property of computing the linear transformation minimizing the reconstruction error under the L2 norm. [sent-187, score-0.066]
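A small sketch of this PCA projection/reconstruction property (SVD-based; names are illustrative):

```python
import numpy as np

def pca_reconstruct(X, d):
    """Project onto the top-d principal components and map back (sketch)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Top-d right singular vectors give the d-dimensional PCA basis.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:d]                                   # (d, D) orthonormal basis
    return (Xc @ P.T) @ P + mean                 # L2-optimal rank-d reconstruction
```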
72 Unsurprisingly, the PCA face reconstructions are extremely faithful reproductions of the original images. [sent-188, score-0.073]
73 By contrast, LMCA seeks a subspace where neighboring examples belong to the same class and points differently labeled are separated by a large margin. [sent-190, score-0.071]
74 For the case of face verification, LMCA de-emphasizes changes in illumination, presence or absence of glasses and smiling expressions (Figure 2). [sent-192, score-0.101]
75 When the input data does not require dimensionality reduction, LMNN and LMCA solve the same optimization problem, but LMNN should be preferred over LMCA in light of its guarantees of convergence to the global minimum of the objective. [sent-193, score-0.117]
76 However, even in such cases, KLMCA can be used in lieu of LMNN in order to extract nonlinear features from the inputs. [sent-194, score-0.079]
77 The dimensionality of the data in these sets ranges from 4 to 34. [sent-197, score-0.068]
78–80 [Figure: training and testing error rates (%) on the IRIS, IONO, BAL, and WINE datasets, comparing kNN with KLMCA features, LMNN, and kNN with Euclidean distance.]
81 Algorithms are kNN using Euclidean distance, LMNN [9], kNN in the nonlinear feature space computed by our KLMCA algorithm, and multiclass SVM. [sent-224, score-0.102]
82 To compare LMNN with KLMCA under identical conditions, KLMCA was restricted to compute a number of features equal to the input dimensionality, although in our experience using additional nonlinear features often results in better classification performance. [sent-225, score-0.14]
83 On all datasets except Wine, for which the mapping to the high-dimensional space seems to hurt performance (note also the high error rate of SVM), KLMCA gives better classification accuracy than LMNN. [sent-228, score-0.108]
84 Note also that the error rates of KLMCA are consistently lower than those reported in [9] for SVM under identical training and testing conditions. [sent-229, score-0.132]
85 The novelty of our method lies in an optimization that solves for data reduction and metric learning simultaneously. [sent-233, score-0.287]
86 Additionally, while [9] is limited to learning a global linear transformation of the inputs, we describe a kernelized version of our method that extracts non-linear features of the inputs. [sent-234, score-0.119]
87 In our work the metric is parameterized by arbitrary nonlinear maps for which kernel functions exist. [sent-245, score-0.216]
88 However, this approach computes dimensionality reductions through a two-step solution which involves first solving for a possibly full-rank metric and then estimating the low-rank approximation via spectral decomposition. [sent-248, score-0.239]
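A sketch of that two-step baseline's second stage, assuming a full-rank positive semidefinite metric M has already been learned (the function name is illustrative):

```python
import numpy as np

def low_rank_metric(M, d):
    """Truncate the spectrum of a learned PSD metric M to its top-d components
    and return a d x D map L with L^T L equal to the rank-d approximation of M."""
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]             # indices of the d largest eigenvalues
    S = np.clip(vals[idx], 0.0, None)            # guard against tiny negative values
    V = vecs[:, idx]
    return np.sqrt(S)[:, None] * V.T             # L = S^(1/2) V^T
```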
89 Furthermore, the metric is trained with the aim of collapsing all examples in the same class to a single point. [sent-250, score-0.19]
90 SVDM optimizes an objective that is a combination of dimensionality reduction and classification. [sent-253, score-0.213]
91 Specifically, a linear mapping from input to feature space and a linear classifier applied to that feature space are trained simultaneously. [sent-254, score-0.085]
92 6 Discussion We have presented a novel algorithm that simultaneously optimizes the objectives of dimensionality reduction and metric learning. [sent-257, score-0.375]
93 Our algorithm seeks, among all possible low-dimensional projections, the one that best satisfies a large margin metric objective. [sent-258, score-0.211]
94 Our approach contrasts with techniques that are unable to learn metrics in high dimensions and must therefore first apply dimensionality reduction methods to the data. [sent-259, score-0.183]
95 Although our optimization is not convex, we have experimentally demonstrated that the metrics learned by our solution are consistently superior to those computed by globally-optimal methods forced to search in a low-dimensional subspace. [sent-260, score-0.111]
96 The nonlinear version of our technique requires us to compute the kernel distance of a query point to all training examples. [sent-261, score-0.155]
97 In addition, we will investigate methods to further reduce overfitting when learning dimensionality reduction from very high dimensions. [sent-263, score-0.13]
98 Learning a similarity metric discriminatively, with application to face verification. [sent-270, score-0.222]
99 Distance metric learning for large margin nearest neighbor classification. [sent-326, score-0.255]
100 Distance metric learning, with application to clustering with side-information. [sent-339, score-0.171]
wordName wordTfidf (topN-words)
[('knn', 0.491), ('klmca', 0.451), ('lmca', 0.395), ('lmnn', 0.395), ('metric', 0.171), ('ki', 0.115), ('eucl', 0.113), ('yil', 0.099), ('kj', 0.098), ('pca', 0.095), ('ijl', 0.085), ('ij', 0.082), ('mahalanobis', 0.073), ('weinberger', 0.073), ('classi', 0.073), ('sijl', 0.07), ('dimensionality', 0.068), ('reduction', 0.062), ('testing', 0.06), ('fmri', 0.058), ('isolet', 0.056), ('metrics', 0.053), ('distance', 0.05), ('bal', 0.049), ('wine', 0.047), ('transformation', 0.046), ('nonlinear', 0.045), ('projection', 0.045), ('nca', 0.042), ('starplus', 0.042), ('svdm', 0.042), ('objective', 0.042), ('iris', 0.042), ('optimizes', 0.041), ('xi', 0.041), ('margin', 0.04), ('ei', 0.039), ('kernelized', 0.039), ('ej', 0.037), ('reconstructions', 0.037), ('learns', 0.037), ('euclidean', 0.037), ('glasses', 0.037), ('inputs', 0.037), ('face', 0.036), ('cation', 0.035), ('training', 0.034), ('features', 0.034), ('objectives', 0.033), ('faces', 0.032), ('solves', 0.032), ('subspace', 0.032), ('xl', 0.032), ('dimensions', 0.03), ('unknowns', 0.029), ('kl', 0.029), ('xj', 0.029), ('images', 0.028), ('iono', 0.028), ('lold', 0.028), ('nonsquare', 0.028), ('illumination', 0.028), ('datasets', 0.028), ('expressions', 0.028), ('input', 0.027), ('neighbor', 0.026), ('technique', 0.026), ('facial', 0.026), ('goldberger', 0.025), ('lorenzo', 0.025), ('riya', 0.025), ('projections', 0.024), ('veri', 0.023), ('embedding', 0.023), ('optimization', 0.022), ('differently', 0.022), ('accuracy', 0.021), ('globerson', 0.021), ('suboptimal', 0.021), ('semide', 0.021), ('space', 0.02), ('weiss', 0.02), ('error', 0.02), ('embeddings', 0.02), ('feature', 0.019), ('collapsing', 0.019), ('repository', 0.019), ('highdimensional', 0.019), ('svm', 0.019), ('consistently', 0.018), ('nearest', 0.018), ('lda', 0.018), ('computed', 0.018), ('seeks', 0.017), ('cases', 0.017), ('tting', 0.016), ('parameterize', 0.016), ('considerably', 0.016), ('similarity', 0.015), ('gradient', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
2 0.11335473 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
3 0.092736997 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
4 0.085930079 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
5 0.078012988 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.064536825 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
7 0.062287956 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
8 0.055561081 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
9 0.053751487 186 nips-2006-Support Vector Machines on a Budget
10 0.052826144 120 nips-2006-Learning to Traverse Image Manifolds
11 0.052639611 7 nips-2006-A Local Learning Approach for Clustering
12 0.052207708 149 nips-2006-Nonnegative Sparse PCA
13 0.050555266 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
14 0.049491044 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
15 0.04904566 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
16 0.047530286 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
17 0.046204358 109 nips-2006-Learnability and the doubling dimension
18 0.04560107 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
19 0.045394078 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
20 0.043002866 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
topicId topicWeight
[(0, -0.155), (1, 0.049), (2, 0.049), (3, 0.047), (4, -0.021), (5, 0.018), (6, -0.045), (7, 0.001), (8, 0.013), (9, -0.004), (10, -0.057), (11, 0.023), (12, -0.009), (13, -0.076), (14, 0.059), (15, -0.079), (16, -0.001), (17, -0.004), (18, -0.024), (19, 0.023), (20, 0.127), (21, 0.038), (22, 0.079), (23, -0.026), (24, 0.032), (25, -0.05), (26, -0.092), (27, 0.013), (28, -0.121), (29, -0.188), (30, 0.015), (31, 0.158), (32, -0.057), (33, -0.028), (34, 0.021), (35, -0.062), (36, -0.053), (37, 0.151), (38, 0.012), (39, 0.037), (40, 0.01), (41, -0.105), (42, 0.103), (43, -0.097), (44, -0.006), (45, 0.04), (46, -0.054), (47, 0.003), (48, 0.083), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.90259951 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
2 0.66654092 97 nips-2006-Inducing Metric Violations in Human Similarity Judgements
Author: Julian Laub, Klaus-Robert Müller, Felix A. Wichmann, Jakob H. Macke
Abstract: Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.
3 0.61746329 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
4 0.53590357 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
5 0.47656488 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
6 0.46946323 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
7 0.45186678 120 nips-2006-Learning to Traverse Image Manifolds
8 0.42771432 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
9 0.41502625 82 nips-2006-Gaussian and Wishart Hyperkernels
10 0.37565905 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
11 0.35690036 129 nips-2006-Map-Reduce for Machine Learning on Multicore
12 0.35339364 149 nips-2006-Nonnegative Sparse PCA
13 0.34162924 65 nips-2006-Denoising and Dimension Reduction in Feature Space
14 0.33833006 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
15 0.33499715 53 nips-2006-Combining causal and similarity-based reasoning
16 0.32814407 24 nips-2006-Aggregating Classification Accuracy across Time: Application to Single Trial EEG
17 0.32489803 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
18 0.32130459 110 nips-2006-Learning Dense 3D Correspondence
19 0.31700554 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
20 0.31450853 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
topicId topicWeight
[(1, 0.114), (3, 0.04), (7, 0.058), (9, 0.057), (12, 0.011), (22, 0.054), (34, 0.011), (44, 0.042), (52, 0.346), (57, 0.067), (65, 0.051), (69, 0.042), (90, 0.014)]
simIndex simValue paperId paperTitle
1 0.81375271 13 nips-2006-A Scalable Machine Learning Approach to Go
Author: Lin Wu, Pierre F. Baldi
Abstract: Go is an ancient board game that poses unique opportunities and challenges for AI and machine learning. Here we develop a machine learning approach to Go, and related board games, focusing primarily on the problem of learning a good evaluation function in a scalable way. Scalability is essential at multiple levels, from the library of local tactical patterns, to the integration of patterns across the board, to the size of the board itself. The system we propose is capable of automatically learning the propensity of local patterns from a library of games. Propensity and other local tactical information are fed into a recursive neural network, derived from a Bayesian network architecture. The network integrates local information across the board and produces local outputs that represent local territory ownership probabilities. The aggregation of these probabilities provides an effective strategic evaluation function that is an estimate of the expected area at the end (or at other stages) of the game. Local area targets for training can be derived from datasets of human games. A system trained using only 9 × 9 amateur game data performs surprisingly well on a test set derived from 19 × 19 professional game data. Possible directions for further improvements are briefly discussed. 1
same-paper 2 0.74196941 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
3 0.71983683 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
Author: Mark Girolami, Mingjun Zhong
Abstract: By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination. 1
4 0.46082827 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
5 0.456945 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.45254567 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
7 0.45199496 167 nips-2006-Recursive ICA
8 0.44946751 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
9 0.44892749 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
10 0.44850564 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
11 0.44832137 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
12 0.44829389 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
13 0.44685775 127 nips-2006-MLLE: Modified Locally Linear Embedding Using Multiple Weights
14 0.44664341 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
15 0.44643748 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
16 0.44535372 158 nips-2006-PG-means: learning the number of clusters in data
17 0.44517216 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
18 0.44480649 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
19 0.44464573 20 nips-2006-Active learning for misspecified generalized linear models
20 0.4442203 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints