nips nips2002 nips2002-188 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann
Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is cross-validation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
Reference: text
sentIndex sentText sentNum sentScore
1 For supervised learning, the standard practical technique is cross-validation, which is not applicable for semi-supervised and unsupervised settings. [sent-7, score-0.326]
2 In this paper, a new model assessment scheme is introduced which is based on a notion of stability. [sent-8, score-0.198]
3 The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. [sent-9, score-1.063]
4 In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area. [sent-10, score-0.891]
5 1 Introduction One of the fundamental problems of learning theory is model assessment: Given a specific data set, how can one practically measure the generalization performance of a model trained to the data. [sent-11, score-0.173]
6 For semi-supervised and unsupervised learning, there exist no standard techniques for estimating the generalization of an algorithm, since there is no expected risk. [sent-14, score-0.193]
7 Furthermore, in unsupervised learning, the problem of model order selection arises, i.e., the problem of determining the number of clusters. [sent-15, score-0.237]
8 This number is part of the input data for supervised and semi-supervised problems, but it is not available for unsupervised problems. [sent-18, score-0.274]
9 We present a common point of view, which provides a unified framework for model assessment in these seemingly unrelated areas of machine learning. [sent-19, score-0.198]
10 The main idea is that an algorithm generalizes well, if the solution on one data set has small disagreement with the solution on another data set. [sent-20, score-0.212]
11 The main emphasis lies on developing model assessment procedures for semi-supervised and unsupervised clustering, because a definitive answer to the question of model assessment has not been given in these areas. [sent-24, score-0.568]
12 In section 3, we derive a stability measure for solutions to learning problems, which allows us to characterize the generalization in terms of the stability of solutions on different sets. [sent-25, score-1.607]
13 For supervised learning, this stability measure is an upper bound to the 2-fold cross-validation error, and can thus be understood as a natural extension of cross-validation to semi-supervised and unsupervised problems. [sent-26, score-1.067]
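As an illustration of this bound, a minimal sketch (all names and the choice of a nearest-centroid classifier are our own; the paper leaves the predictor generic): two predictors are trained on independent halves of the labeled data, and their disagreement on the pooled points is the stability cost.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Stand-in classifier: one centroid per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    """Assign each point to the class of its nearest centroid."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def supervised_instability(X1, y1, X2, y2):
    """Fraction of pooled points on which the two trained predictors disagree."""
    f1 = nearest_centroid_fit(X1, y1)
    f2 = nearest_centroid_fit(X2, y2)
    X = np.vstack([X1, X2])
    return float(np.mean(nearest_centroid_predict(f1, X)
                         != nearest_centroid_predict(f2, X)))
```

On well-separated data the disagreement is near zero; as class overlap grows, so does the measured instability.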
14 For experiments (section 4), we have chosen the model order selection problem in the unsupervised setting, which is one of the relevant areas of application as argued above. [sent-27, score-0.237]
15 We compare the stability measure to other techniques from the literature. [sent-28, score-0.821]
16 2 Related Work For supervised learning problems, several notions of stability have been introduced ([10], [3]). [sent-29, score-0.834]
17 In contrast, this work aims at developing practical procedures for model assessment, which are also applicable in semi- and unsupervised settings. [sent-31, score-0.282]
18 Furthermore, the definition of stability developed in this paper does not build upon the cited works. [sent-32, score-0.735]
19 Several procedures have been proposed for inferring the number of clusters of which we name a few here. [sent-33, score-0.227]
20 Given a clustering solution, the total sum of within-cluster dissimilarities is computed. [sent-36, score-0.192]
21 Recently, resampling-based approaches for model order selection have been proposed that perform model assessment in the spirit of cross validation. [sent-39, score-0.292]
22 The methods exploit the idea that a clustering solution can be used to construct a predictor, in order to compute a solution for a second data set and to compare the computed and predicted class memberships for the second data set. [sent-41, score-0.42]
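Comparing computed and predicted memberships as described above requires matching cluster labels, since the labels are arbitrary up to permutation. A minimal sketch (brute force over permutations, feasible only for small numbers of clusters; the names are illustrative):

```python
from itertools import permutations

def min_disagreement(labels_a, labels_b, k):
    """Smallest fraction of mismatches between two labelings over all
    relabelings of the k cluster labels (brute force, small k only)."""
    n = len(labels_a)
    best = 1.0
    for perm in permutations(range(k)):
        mismatches = sum(perm[a] != b for a, b in zip(labels_a, labels_b))
        best = min(best, mismatches / n)
    return best
```

For larger k, the same minimization can be done in polynomial time with the Hungarian method, which the paper uses in its experiments.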
23 In an early study, Breckenridge [4] investigated the usefulness of this approach (called replication analysis there) for the purpose of cluster validation. [sent-42, score-0.163]
24 Free parameters of their method are the predictor, the measure of agreement between a computed and a predicted solution and a baseline distribution similar to the Gap Statistic. [sent-47, score-0.174]
25 In particular, the predictor can be chosen independent of the clustering algorithm which can lead to unreliable results (see section 3). [sent-49, score-0.309]
26 [13] formulated a similar method (Prediction Strength) for inferring the number of clusters which is based on using nearest centroid predictors. [sent-54, score-0.323]
27 Roughly, their measure of agreement quantifies the similarity of two clusters in the computed and in the predicted solution. [sent-55, score-0.295]
28 For inferring the number of clusters, the least similar pair of clusters is taken into consideration. [sent-56, score-0.198]
29 3 The Stability Measure We begin by introducing a stability measure for supervised learning. [sent-59, score-0.896]
30 Then, the stability measure is generalized to semi-supervised and unsupervised settings. [sent-60, score-0.94]
31 Finally, a scheme for practical estimation of the stability is proposed. [sent-62, score-0.769]
32 A measure of the stability of the labeling function learned is derived as follows. [sent-72, score-0.916]
33 Now let and be two data sets drawn independently from the same source, and denote the predictor trained on by . [sent-74, score-0.177]
34 By (3), the stability defined in (2) yields an upper bound on the generalization error. [sent-84, score-0.786]
35 Note that the stability measures the disagreement between labels on training data and test data, both assigned by the learned predictor. [sent-86, score-0.922]
36 Furthermore, the stability can be interpreted as the expected empirical risk of the predictor with respect to the labels computed by itself (compare (1) and (2)). [sent-88, score-0.87]
37 Practical evaluation of the stability amounts to 2-fold cross-validation. [sent-91, score-0.735]
38 However, unlike cross-validation, stability can also be defined in settings where no label information is available. [sent-93, score-0.779]
39 In the first alternative, the solution is a labeling function defined on the whole object space as in supervised learning. [sent-101, score-0.299]
40 The second alternative is that the solution is not given by a labeling function on the whole object space, but only by a labeling function on the training set . [sent-104, score-0.319]
41 As mentioned above, the stability compares labels given to the training data with predicted labels. [sent-107, score-0.878]
42 One possibility to obtain predicted labels is to introduce a predictor, which is trained on the first set to predict labels on the new set. [sent-109, score-0.297]
43 Leaving the predictor as a free parameter, we define the stability for semi-supervised learning as in equation (4). [sent-110, score-0.735]
44 Analogously to supervised learning, the minimal attainable stability measures the extent to which classes overlap, or how consistent the labels are. [sent-115, score-1.022]
45 Note that (4) measures the mismatch between the label generator and the predictor. [sent-123, score-0.223]
46 Intuitively, the predictor can lead to good stability only if its strategy and that of the label generator are similar. [sent-124, score-0.735]
47 For example, k-means clustering suggests using nearest-centroid classification. [sent-126, score-0.317]
48 Minimum spanning tree type clustering algorithms suggest nearest neighbor classifiers, and finally, clustering algorithms which fit a parametric density model should use the class posteriors computed by the Bayes rule for prediction. [sent-127, score-0.497]
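Hedged sketches of the first two of these pairings (illustrative names; the nearest-centroid rule matches k-means, a nearest-neighbor rule matches linkage-type clustering):

```python
import numpy as np

def predict_nearest_centroid(centroids, X_new):
    """Predictor matching k-means: assign each new point to the
    cluster with the nearest centroid."""
    d = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def predict_nearest_neighbor(X_train, labels_train, X_new):
    """Predictor matching minimum-spanning-tree type clustering:
    copy the label of the nearest clustered training point."""
    d = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return labels_train[d.argmin(axis=1)]
```

The point of the pairing is that the predictor should extrapolate the same grouping principle the clustering algorithm used; a mismatched pair can report instability that reflects the mismatch rather than the data.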
49 The solution is again a labeling function defined only on a finite data set. [sent-129, score-0.188]
50 Model Order Selection The problem of model order selection consists of determining the number of clusters to be estimated, and exists only in unsupervised learning. [sent-141, score-0.419]
51 The range of the stability depends on the number of clusters; therefore, stability values cannot be compared across different numbers of clusters. [sent-142, score-1.47]
52 For unsupervised learning, the stability minimized over label permutations is bounded from above by the stability of a random predictor, since for any larger instability there exists a relabeling which has smaller stability costs. [sent-143, score-1.637]
53 This stability value is asymptotically achieved by the random predictor which assigns uniformly drawn labels to objects. [sent-144, score-0.949]
54 Normalizing by the stability of the random predictor yields values independent of the number of clusters. [sent-145, score-0.876]
55 We thus define the re-normalized stability as in equation (6). [sent-146, score-0.906]
56 The stability is defined in terms of an expectation, which has to be estimated for practical applications. [sent-149, score-0.796]
57 Note that it is necessary to split the data into disjoint subsets, because common points potentially increase the stability artificially. [sent-153, score-0.735]
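The estimation scheme, with the disjoint-split caveat just noted, can be sketched as follows (the names are ours; `instability_fn` stands in for whichever comparison the setting requires):

```python
import random

def estimate_instability(data, instability_fn, n_resamples=20, seed=0):
    """Average a pairwise comparison over repeated random splits of the
    data into two disjoint halves (overlapping points would artificially
    inflate the measured stability)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        half = len(idx) // 2
        first = [data[i] for i in idx[:half]]
        second = [data[i] for i in idx[half:2 * half]]  # disjoint by construction
        values.append(instability_fn(first, second))
    return sum(values) / len(values)
```

The distribution of the per-resample values, not just their mean, carries information; the experiments below use exactly this spread as an additional diagnostic.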
58 For semi-supervised and unsupervised learning, the comparison might entail predicting labels on a new set, and for the latter also minimizing over permutations of the labels. [sent-155, score-0.252]
59 4 Stability for Model Order Selection in Clustering: Experimental Results We now provide experimental evidence for the usefulness of our approach to model order selection, which is one of the hardest model assessment problems. [sent-156, score-0.26]
60 First, the algorithms are compared for toy data, in order to study the performance of the stability measure under well-controlled conditions. [sent-157, score-0.86]
61 Therefore, in a second experiment, the stability measure is compared to the other methods for the problem of clustering gene expression data. [sent-159, score-1.104]
62 We compare the proposed stability index of section 3 with the Gap Statistic, Clest and with Tibshirani’s Prediction Strength method using two toy data sets and a microarray data set taken from [7]. [sent-166, score-0.992]
63 Table 1 summarizes the estimated number of clusters of each method. [sent-167, score-0.185]
64 Note that for some numbers of clusters, for example in figure 1(a), the variance in the stability over different resamples is quite high. [sent-172, score-0.803]
65 This effect is due to the model mismatch: for such values, the clustering of the three classes depends highly on the subset selected in the resampling. [sent-173, score-0.242]
66 This means that, besides the absolute value of the stability costs, additional information about the fit can be obtained from the distribution of the stability costs over the resampled subsets (see section 2 for a brief overview of these techniques). [sent-174, score-0.735] [sent-177, score-0.803]
67 Table 1: The estimated model orders for the two toy data sets and the microarray data set. [sent-176, score-0.227]
69 For this data set, all methods under comparison are able to infer the “true” number of clusters. [sent-178, score-0.226]
70 Figures 1(d) and 1(a) show the clustered data set and the proposed stability index. [sent-179, score-0.767]
71 For a smaller number of clusters, the stability is also relatively high, due to the hierarchical structure of the data set, which enables stable merging of the two smaller sub-clusters. [sent-180, score-0.767]
72 In the ring data set (depicted in figures 1(e) and 1(f)), one can naturally distinguish three ring-shaped clusters, which violate the modeling assumptions of k-means since the clusters are not spherically distributed. [sent-181, score-0.393]
73 Thus, the stability for this number of clusters is highest (figure 1(b)). [sent-183, score-0.924]
74 Applying the proposed stability estimator with Path-Based Clustering on the same data set yields the highest stability at the “correct” number of clusters (figures 1(f) and 1(c)). [sent-185, score-1.715]
75 In all these cases, the basic requirement for a validation scheme is violated, namely that it must not incorporate additional assumptions about the group structure in a data set that go beyond the ones of the clustering principle employed. [sent-189, score-0.252]
76 Apart from that, it is noteworthy that the stability with k-means is significantly worse than that achieved with Path-Based Clustering, which indicates that the latter is the better choice for this data set. [sent-190, score-0.767]
77 Application to Microarray Data Recently, several authors have investigated the possibility of identifying novel tumor classes based solely on gene expression data [7, 2, 1]. [sent-191, score-0.171]
78 In their analysis, the authors of [7] studied the problem of classifying and clustering acute leukemias. [sent-193, score-0.27]
79 Acute leukemias can be roughly divided into two groups, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), where the latter can furthermore be subdivided into B-cell ALL and T-cell ALL. [sent-196, score-0.366]
80 For the purpose of cluster analysis, the feature set was additionally reduced by only retaining the 100 genes with highest variance across samples. [sent-202, score-0.163]
81 We have performed cluster analysis using k-means and the nearest-centroid rule. [sent-205, score-0.188]
82 (a) The stability index for the Gaussians data set with k-means (horizontal axis: number of clusters, 2–10). [sent-231, score-0.984]
83 (b) The stability index for the three-ring data set with k-means clustering. [sent-232, score-0.984]
84 (c) The stability index for the three-ring data set with Path-Based Clustering. [sent-233, score-0.984]
85 Figure 1: Results of the stability index on the toy data (see section 4). [sent-237, score-0.889]
86 For this number of clusters, the clustering separates the AML, B-cell ALL and T-cell ALL samples from each other. [sent-240, score-0.232]
87 With respect to the known ground-truth labels, 66 of the samples are correctly classified (the Hungarian method is used to map the clusters to the ground-truth). [sent-241, score-0.198]
88 We cluster the data set again and compare the result with the ALL – AML labeling of the data. [sent-245, score-0.238]
89 Hence, our re-analysis demonstrates that we could have recovered a biologically meaningful grouping in a completely unsupervised manner. [sent-252, score-0.165]
90 5 Conclusion The problem of model assessment was addressed in this paper. [sent-253, score-0.198]
91 The goal was to derive a common framework for practical assessment of learning models. [sent-254, score-0.206]
92 Starting from a stability measure defined in the context of supervised learning, the measure was generalized to semi-supervised and unsupervised learning. [sent-255, score-1.101]
93 Figure 2: Resampled stability for the leukemia dataset vs. the number of clusters (2–10). [sent-263, score-0.983]
94 The emphasis was put on model order selection for unsupervised learning, because this is the area where the need for widely applicable model assessment strategies is highest. [sent-266, score-0.459]
95 On toy data, the stability measure outperforms other techniques, when their respective modeling assumptions are violated. [sent-267, score-0.86]
96 On real-world data, the stability measure compares favorably to the best of the competitors. [sent-268, score-0.797]
97 Path-based pairwise data clustering with application to texture segmentation. [sent-294, score-0.249]
98 Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. [sent-300, score-0.397]
99 Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. [sent-305, score-0.181]
100 Estimating the number of clusters via the gap statistic. [sent-351, score-0.272]
wordName wordTfidf (topN-words)
[('stability', 0.735), ('clustering', 0.192), ('assessment', 0.172), ('clest', 0.158), ('clusters', 0.158), ('unsupervised', 0.143), ('labeling', 0.119), ('predictor', 0.117), ('gap', 0.114), ('ac', 0.11), ('supervised', 0.099), ('aml', 0.09), ('leukemia', 0.09), ('golub', 0.089), ('acute', 0.078), ('gene', 0.074), ('statistic', 0.074), ('tibshirani', 0.072), ('labels', 0.069), ('selection', 0.068), ('resamples', 0.068), ('centroid', 0.067), ('strength', 0.066), ('risk', 0.066), ('prediction', 0.066), ('toy', 0.063), ('cluster', 0.063), ('measure', 0.062), ('index', 0.059), ('hungarian', 0.059), ('semi', 0.059), ('nearest', 0.058), ('un', 0.057), ('jh', 0.054), ('disagreement', 0.05), ('applicable', 0.05), ('resampling', 0.047), ('microarray', 0.047), ('buh', 0.045), ('fridlyand', 0.045), ('spherically', 0.045), ('walther', 0.045), ('et', 0.044), ('label', 0.044), ('classi', 0.043), ('predicted', 0.042), ('genes', 0.041), ('expression', 0.041), ('inferring', 0.04), ('samples', 0.04), ('permutation', 0.04), ('breckenridge', 0.039), ('bonn', 0.039), ('rings', 0.039), ('competitors', 0.039), ('solution', 0.037), ('measures', 0.036), ('infer', 0.036), ('usefulness', 0.036), ('replication', 0.036), ('resampled', 0.036), ('partitioning', 0.035), ('practical', 0.034), ('agreement', 0.033), ('costs', 0.032), ('data', 0.032), ('highest', 0.031), ('aq', 0.031), ('september', 0.031), ('px', 0.031), ('furthermore', 0.03), ('procedures', 0.029), ('spanning', 0.029), ('validation', 0.028), ('purpose', 0.028), ('drawn', 0.028), ('de', 0.028), ('generalization', 0.027), ('estimated', 0.027), ('stress', 0.026), ('ned', 0.026), ('model', 0.026), ('molecular', 0.026), ('mismatch', 0.026), ('annealing', 0.026), ('pairwise', 0.025), ('deterministic', 0.024), ('practice', 0.024), ('classes', 0.024), ('compare', 0.024), ('idea', 0.024), ('yields', 0.024), ('exists', 0.024), ('solutions', 0.024), ('estimating', 0.023), ('whole', 0.022), ('biologically', 0.022), ('gp', 0.022), ('pro', 
0.022), ('object', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 188 nips-2002-Stability-Based Model Selection
Author: Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann
Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is crossvalidation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
2 0.1551163 27 nips-2002-An Impossibility Theorem for Clustering
Author: Jon M. Kleinberg
Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1
3 0.13447809 90 nips-2002-Feature Selection in Mixture-Based Clustering
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
4 0.11481888 98 nips-2002-Going Metric: Denoising Pairwise Data
Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1
5 0.10896085 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
Author: Tom Heskes
Abstract: We extend recent work on the connection between loopy belief propagation and the Bethe free energy. Constrained minimization of the Bethe free energy can be turned into an unconstrained saddle-point problem. Both converging double-loop algorithms and standard loopy belief propagation can be interpreted as attempts to solve this saddle-point problem. Stability analysis then leads us to conclude that stable fixed points of loopy belief propagation must be (local) minima of the Bethe free energy. Perhaps surprisingly, the converse need not be the case: minima can be unstable fixed points. We illustrate this with an example and discuss implications. 1
6 0.10681292 53 nips-2002-Clustering with the Fisher Score
7 0.10559861 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
8 0.10020187 83 nips-2002-Extracting Relevant Structures with Side Information
9 0.099602222 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
10 0.090727985 135 nips-2002-Learning with Multiple Labels
11 0.088857703 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
12 0.088842861 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
13 0.08545433 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
14 0.083560705 99 nips-2002-Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA
15 0.080169424 40 nips-2002-Bayesian Models of Inductive Generalization
16 0.072543822 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
17 0.066450536 77 nips-2002-Effective Dimension and Generalization of Kernel Learning
18 0.0627174 56 nips-2002-Concentration Inequalities for the Missing Mass and for Histogram Rule Error
19 0.062017057 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
20 0.061956502 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
topicId topicWeight
[(0, -0.193), (1, -0.087), (2, 0.028), (3, -0.005), (4, -0.021), (5, 0.07), (6, -0.091), (7, -0.128), (8, -0.145), (9, 0.135), (10, 0.04), (11, 0.057), (12, -0.091), (13, 0.084), (14, -0.059), (15, -0.102), (16, 0.005), (17, -0.155), (18, -0.008), (19, 0.084), (20, -0.043), (21, -0.04), (22, 0.094), (23, 0.053), (24, 0.093), (25, -0.01), (26, 0.037), (27, -0.052), (28, 0.01), (29, -0.013), (30, -0.018), (31, -0.007), (32, 0.076), (33, -0.045), (34, 0.043), (35, 0.089), (36, 0.014), (37, -0.018), (38, -0.072), (39, -0.102), (40, 0.009), (41, -0.029), (42, -0.099), (43, 0.071), (44, 0.1), (45, 0.157), (46, -0.016), (47, -0.002), (48, 0.003), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.9577781 188 nips-2002-Stability-Based Model Selection
Author: Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann
Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is crossvalidation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
2 0.74423552 27 nips-2002-An Impossibility Theorem for Clustering
Author: Jon M. Kleinberg
Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1
3 0.61922097 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
Author: Eric P. Xing, Michael I. Jordan, Stuart Russell, Andrew Y. Ng
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many “plausible” ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider “similar.” For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, , learns a distance metric over if desired, dissimilar) pairs of points in that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance. £ ¤¢ £ ¥¢
4 0.5790512 53 nips-2002-Clustering with the Fisher Score
Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
5 0.57605672 90 nips-2002-Feature Selection in Mixture-Based Clustering
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
6 0.56012678 98 nips-2002-Going Metric: Denoising Pairwise Data
7 0.51668668 83 nips-2002-Extracting Relevant Structures with Side Information
8 0.47655761 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
9 0.46884137 40 nips-2002-Bayesian Models of Inductive Generalization
10 0.46693319 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm
11 0.42087051 135 nips-2002-Learning with Multiple Labels
12 0.40821749 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
13 0.38996565 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
14 0.3835544 142 nips-2002-Maximum Likelihood and the Information Bottleneck
15 0.37967119 107 nips-2002-Identity Uncertainty and Citation Matching
16 0.37571037 56 nips-2002-Concentration Inequalities for the Missing Mass and for Histogram Rule Error
17 0.37091625 89 nips-2002-Feature Selection by Maximum Marginal Diversity
18 0.36357158 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
19 0.35837868 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
20 0.35082015 87 nips-2002-Fast Transformation-Invariant Factor Analysis
topicId topicWeight
[(11, 0.019), (23, 0.027), (38, 0.019), (42, 0.071), (54, 0.124), (55, 0.053), (63, 0.24), (67, 0.016), (68, 0.014), (74, 0.126), (87, 0.025), (92, 0.047), (98, 0.127)]
simIndex simValue paperId paperTitle
1 0.92011225 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model
Author: Alexandre R. Romariz, Kelvin Wagner
Abstract: An optoelectronic implementation of a spiking neuron model based on the FitzHugh-Nagumo equations is presented. A tunable semiconductor laser source and a spectral filter provide a nonlinear mapping from driver voltage to detected signal. Linear electronic feedback completes the implementation, which allows either electronic or optical input signals. Experimental results for a single system and numerical results of model interaction confirm that important features of spiking neural models can be implemented through this approach.
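The dynamics being implemented can be checked numerically. A forward-Euler sketch of the FitzHugh-Nagumo equations, using standard textbook parameter values rather than the paper's hardware settings:

```python
def fitzhugh_nagumo(i_ext=0.5, a=0.7, b=0.8, eps=0.08, dt=0.1, steps=4000):
    """Forward-Euler integration of the FitzHugh-Nagumo equations:
        dv/dt = v - v**3/3 - w + I
        dw/dt = eps * (v + a - b*w)
    Returns the membrane-variable trace v(t)."""
    v, w = -1.0, -0.5
    trace = []
    for _ in range(steps):
        dv = v - v ** 3 / 3.0 - w + i_ext
        dw = eps * (v + a - b * w)
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return trace
```

With this input current the resting state is unstable and the system settles onto a spiking limit cycle, which is the regime the optoelectronic circuit reproduces.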
same-paper 2 0.82793283 188 nips-2002-Stability-Based Model Selection
Author: Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann
Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is crossvalidation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
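A toy sketch of the split-and-compare idea behind such a stability score, assuming a k-means base algorithm; the paper's actual prediction step and normalization against random labelings are more careful than this:

```python
import numpy as np
from itertools import permutations

def kmeans(x, k, rng, n_iter=50):
    # k-means++-style seeding, then plain Lloyd iterations
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d2 = ((x[:, None, :] - np.array(centers)) ** 2).sum(-1).min(1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([x[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return centers, labels

def instability(x, k, n_splits=10, seed=0):
    """Cluster half A, transfer its solution to half B by nearest centroid,
    cluster B directly, and count disagreements under the best label
    permutation.  Lower values indicate a more stable model order k."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        a, b = x[idx[: len(x) // 2]], x[idx[len(x) // 2:]]
        centers_a, _ = kmeans(a, k, rng)
        _, labels_b = kmeans(b, k, rng)
        transferred = ((b[:, None, :] - centers_a) ** 2).sum(-1).argmin(1)
        scores.append(min(np.mean(transferred != np.array(p)[labels_b])
                          for p in permutations(range(k))))
    return float(np.mean(scores))
```

On data with three well-separated clusters, k = 3 yields near-zero instability while k = 4 splits one cluster arbitrarily across resamples, so the score rises.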
3 0.69222724 89 nips-2002-Feature Selection by Maximum Marginal Diversity
Author: Nuno Vasconcelos
Abstract: We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computational simplicity. The relationships between infomax and the maximization of marginal diversity are identified, uncovering the existence of a family of classification procedures for which near optimal (in the Bayes error sense) feature selection does not require combinatorial search. Examination of this family in light of recent studies on the statistics of natural images suggests that visual recognition problems are a subset of it.
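The marginal-diversity ranking itself is simple to compute from class-conditional histograms. A hedged sketch, where the binning scheme and smoothing constant are arbitrary choices rather than the paper's:

```python
import numpy as np

def marginal_diversity(x, y, n_bins=16):
    """Marginal diversity of each feature: the prior-weighted KL divergence
    between class-conditional marginal histograms and the overall marginal.
    Under the histogram model this equals the mutual information I(X_i; Y)."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    md = np.zeros(x.shape[1])
    for i in range(x.shape[1]):
        edges = np.histogram_bin_edges(x[:, i], bins=n_bins)
        p_all = np.histogram(x[:, i], bins=edges)[0] + 1e-9  # smoothed marginal
        p_all = p_all / p_all.sum()
        for c, pc in zip(classes, priors):
            p_c = np.histogram(x[y == c, i], bins=edges)[0] + 1e-9
            p_c = p_c / p_c.sum()
            md[i] += pc * np.sum(p_c * np.log(p_c / p_all))
    return md
```

Features are then ranked by this score and the top ones retained, which is the combinatorial-search-free selection the abstract refers to.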
4 0.6883716 2 nips-2002-A Bilinear Model for Sparse Coding
Author: David B. Grimes, Rajesh P. Rao
Abstract: Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. As a result, they produce image codes that are redundant because the same feature is learned at multiple locations. We describe an algorithm for sparse coding based on a bilinear generative model of images. By explicitly modeling the interaction between image features and their transformations, the bilinear approach helps reduce redundancy in the image code and provides a basis for transformation-invariant vision. We present results demonstrating bilinear sparse coding of natural images. We also explore an extension of the model that can capture spatial relationships between the independent features of an object, thereby providing a new framework for parts-based object recognition.
5 0.687675 10 nips-2002-A Model for Learning Variance Components of Natural Images
Author: Yan Karklin, Michael S. Lewicki
Abstract: We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higher-order image structure and provides a way to learn coarse-coded, sparse-distributed representations of abstract image properties such as object location, scale, and texture.
6 0.68451256 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
7 0.68445694 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
8 0.68232602 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
9 0.68161917 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
10 0.68150616 53 nips-2002-Clustering with the Fisher Score
11 0.67944801 3 nips-2002-A Convergent Form of Approximate Policy Iteration
12 0.67743927 27 nips-2002-An Impossibility Theorem for Clustering
13 0.67741627 124 nips-2002-Learning Graphical Models with Mercer Kernels
14 0.67698056 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
15 0.67679298 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
16 0.67603266 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
18 0.67562091 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
19 0.67530113 74 nips-2002-Dynamic Structure Super-Resolution
20 0.67360234 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs