nips nips2001 nips2001-185 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Horn, Assaf Gottlieb
Abstract: We propose a novel clustering method that is an extension of ideas inherent to scale-space clustering and support-vector clustering. Like the latter, it associates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scale-space probability function. The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrödinger equation of which the probability function is a solution. This Schrödinger equation contains a potential function that can be derived analytically from the probability function. We associate minima of the potential with cluster centers. The method has one variable parameter, the scale of its Gaussian kernel. We demonstrate its applicability on known data sets. By limiting the evaluation of the Schrödinger potential to the locations of data points, we can apply this method to problems in high dimensions.
Reference: text
sentIndex sentText sentNum sentScore
1 Like the latter, it associates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scale-space probability function. [sent-2, score-0.117]
2 The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrödinger equation of which the probability function is a solution. [sent-3, score-0.445]
3 This Schrödinger equation contains a potential function that can be derived analytically from the probability function. [sent-4, score-0.589]
4 We associate minima of the potential with cluster centers. [sent-5, score-0.496]
5 By limiting the evaluation of the Schrödinger potential to the locations of data points, we can apply this method to problems in high dimensions. [sent-8, score-0.696]
6 1 Introduction Methods of data clustering are usually based on geometric or probabilistic considerations [1, 2, 3]. [sent-9, score-0.344]
7 The problem of unsupervised learning of clusters based on the locations of points in data space is in general ill-defined. [sent-10, score-0.201]
8 Hence intuition based on other fields of study may be useful in formulating new heuristic procedures. [sent-11, score-0.028]
9 The example of [4] shows how intuition derived from statistical mechanics leads to successful results. [sent-12, score-0.088]
10 Here we propose a model based on tools that are borrowed from quantum mechanics. [sent-13, score-0.335]
11 Using a Gaussian kernel, one generates from the data points $x_i$ in a Euclidean space of dimension $d$ a probability distribution given, up to an overall normalization, by the expression $\psi(x) = \sum_i e^{-(x - x_i)^2 / 2\sigma^2}$ (1). [sent-15, score-0.129]
12 Here the $x_i$ are the data points. [sent-17, score-0.044]
13 It seems quite natural [5] to associate maxima of this function with cluster centers. [sent-18, score-0.22]
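As a concrete illustration of Eq. (1), here is a minimal sketch of the wave-function evaluation (this code is ours, not the authors'; the function name, the numpy interface, and the choice of Python are assumptions):

```python
import numpy as np

def psi(x, data, sigma):
    """Gaussian-kernel (Parzen-style) wave function of Eq. (1), up to normalization."""
    # x: query point of shape (d,); data: array of data points of shape (n, d)
    sq_dists = np.sum((data - x) ** 2, axis=1)          # ||x - x_i||^2 for every data point
    return np.sum(np.exp(-sq_dists / (2.0 * sigma ** 2)))
```

Maxima of this function over x are the cluster-center candidates of the scale-space approach [5].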
14 Here we also consider a Hilbert space but, in contradistinction to kernel methods where the Hilbert space is implicit, we work with a Schrödinger equation that serves as the basic framework of the Hilbert space. [sent-20, score-0.510]
15 Its main emphasis is on the Schrödinger potential, whose minima will determine the cluster centers. [sent-22, score-0.764]
16 This potential is part of the Schrödinger equation of which $\psi$ is a solution. [sent-23, score-0.589]
17 2 The Schrödinger Potential We define [7] the Schrödinger equation $H\psi \equiv \left(-\frac{\sigma^2}{2}\nabla^2 + V(x)\right)\psi = E\psi$ (2), for which $\psi$ is a solution, or eigenstate. [sent-24, score-0.854]
18 This quadratic function, whose center lies at the center of the Gaussian, is known as the harmonic potential in quantum mechanics (see, e.g. [sent-27, score-0.512]
19 Its eigenvalue is the lowest possible eigenvalue of $H$, hence the Gaussian function is said to describe the ground state of $H$. [sent-30, score-0.084]
20 Conventionally, in quantum mechanics, one is given $V(x)$ and one searches for its solutions, or eigenfunctions, $\psi$. [sent-32, score-0.308]
21 Here we already have $\psi$, as determined by the data points; we therefore ask for the potential $V(x)$ whose solution is the given $\psi$. [sent-33, score-0.044]
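A hedged sketch of this inversion is given below (our own code; it assumes the standard quantum-clustering form V(x) = E + (sigma^2/2) * Laplacian(psi)/psi, which for the Gaussian sum above reduces to a weighted average of squared distances):

```python
import numpy as np

def potential(x, data, sigma):
    """Schrodinger potential V(x), up to the additive constant E."""
    d = data.shape[1]
    sq_dists = np.sum((data - x) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))    # contribution of each data point to psi
    # (sigma^2/2) * laplacian(psi)/psi = -d/2 + weighted mean of ||x - x_i||^2 over 2*sigma^2
    return -d / 2.0 + np.sum(sq_dists * weights) / (2.0 * sigma ** 2 * np.sum(weights))
```

The constant E is then fixed by requiring that the minimum of V be zero, so in practice one subtracts the smallest value found.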
22 1 Crab Data To show the power of our new method we discuss the crab data set taken from Ripley’s book [9]. [sent-43, score-0.18]
23 This data set is defined over a five-dimensional parameter space. [sent-44, score-0.044]
24 When analyzed in terms of the 2nd and 3rd principal components of the correlation matrix one observes a nice separation of the 200 instances into their four classes. [sent-45, score-0.301]
25 In Fig. 1 we show the data as well as the Parzen probability distribution for our chosen width parameter $\sigma$. [sent-48, score-0.106]
26 It is quite obvious that this width is not small enough to deduce the correct clustering according to the approach of [5]. [sent-49, score-0.362]
27 Fig. 2 shows the required four minima for the same width parameter. [sent-51, score-0.3]
28 One needs, however, the quantum clustering approach to bring it out. [sent-53, score-0.608]
29 1 $H$ (the Hamiltonian) and $V(x)$ (the potential energy) are conventional quantum mechanical operators, rescaled so that $H$ depends on one parameter, $\sigma$. [sent-54, score-0.383]
30 Figure 1: A plot of Roberts’ probability distribution for Ripley’s crab data [9] as defined over the 2nd and 3rd principal components of the correlation matrix (axes PC2, PC3). [sent-56, score-0.336]
31 Using this Gaussian width we observe only one maximum. [sent-57, score-0.062]
32 Figure 2: A plot of the Schrödinger potential for the same problem as Fig. 1 (axes PC2, PC3). [sent-64, score-0.553]
33 One can see in Fig. 2 that the potential grows quadratically outside the domain over which the data are located. [sent-69, score-0.188]
34 If the width is decreased, more structure is to be expected. [sent-73, score-0.118]
35 Thus, for a smaller value of $\sigma$, two more minima appear, as seen in Fig. 3. [sent-74, score-0.194]
36 Nonetheless, they lie high on the potential surface and contain only a few data points. [sent-76, score-0.133]
37 2 Iris Data Our second example consists of the iris data set [10], which is a standard benchmark obtainable from the UCI repository [11]. [sent-80, score-0.308]
38 Here we use the first two principal components to define the two dimensions in which we apply our method. [sent-81, score-0.163]
39 Fig. 4, which shows the case for our chosen $\sigma$, provides an almost perfect separation of the 150 instances into the three classes to which they belong. [sent-83, score-0.133]
40 4 Application of Quantum Clustering The examples displayed in the previous section show that, if the spatial representation of the data allows for meaningful clustering using geometric information, quantum clustering (QC) will do the job. [sent-84, score-1.012]
41 1 Varying $\sigma$ In the crab data we find that, as $\sigma$ is decreased, the previous minima of $V$ get deeper and two new minima are formed. [sent-90, score-0.471]
42 However, the latter are insignificant, in the sense that they lie at high values of the potential, as shown in Fig. 3. [sent-91, score-0.089]
43 Thus, if we classify data points to clusters according to their topographic location on the surface of $V$, roughly the same clustering assignment is expected for both values of $\sigma$. [sent-93, score-0.408]
44 Incidentally, the wave function $\psi$ acquires only one additional maximum at the lower value of $\sigma$. [sent-94, score-0.06]
45 As $\sigma$ is decreased further, more and more maxima are expected in $\psi$, and an ever increasing number of minima (limited by the number of data points) in $V$. [sent-95, score-0.256]
46 The one parameter of our problem, $\sigma$, signifies the distance scale that we probe. [sent-96, score-0.037]
47 Accordingly we expect to find clusters relevant to proximity information of the same order of magnitude. [sent-97, score-0.077]
48 One may therefore vary $\sigma$ continuously and look for stability of the cluster solutions, or limit oneself to relatively high values of $\sigma$ and stop the search once a few clusters have been uncovered. [sent-98, score-0.225]
49 2 Higher Dimensions In the iris problem we obtained excellent clustering results using the first two principal components, whereas in the crab problem, clustering that correctly depicts the classification requires components 2 and 3. [sent-100, score-0.933]
50 However, once this is realized, it does no harm to add the 1st component. [sent-101, score-0.03]
51 This requires working in a 3-dimensional space, spanned by the three leading PCs. [sent-102, score-0.034]
52 Calculating $V$ on a fine computational grid becomes a heavy task in high dimensions. [sent-103, score-0.03]
53 To cut down complexity, we propose using the analytic expression of Eq. [sent-104, score-0.027]
54 3 and evaluating the potential on data points only. [sent-105, score-0.243]
55 This should be good enough to give a close estimate of where the minima lie, and it reduces the complexity to order $N^2$, irrespective of dimension. [sent-106, score-0.194]
56 In the gradient-descent algorithm described below, we will require further computations, also restricted to well-defined locations in space. [sent-107, score-0.069]
57 Figure 3: The potential for the crab data at a decreased width displays two additional, but insignificant, minima (axes PC2, PC3). [sent-113, score-0.324]
58 The four deep minima are roughly at the same locations as in Fig. 2. [sent-114, score-0.307]
59 Figure 4: Quantum clustering of the iris data, for the chosen value of $\sigma$, in a space spanned by the first two principal components (PC1, PC2). [sent-128, score-0.679]
60 Equipotential lines are drawn at a sequence of values of the potential. When restricted to the locations of data points, we evaluate $V$ on the discrete set of points $x_i$. [sent-130, score-0.168]
61 We can then express $V$ at the data points in terms of the distance matrix $D_{ij} = \|x_i - x_j\|$ as $V(x_i) = E - \frac{d}{2} + \frac{1}{2\sigma^2} \frac{\sum_j D_{ij}^2 e^{-D_{ij}^2/2\sigma^2}}{\sum_j e^{-D_{ij}^2/2\sigma^2}}$ (6), with $E$ chosen appropriately so that $\min V = 0$. [sent-131, score-0.037]
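A minimal sketch of this evaluation, restricted to the data points and using only the pairwise distance matrix, might look as follows (our code; the function name and the numpy interface are assumptions):

```python
import numpy as np

def potential_on_data(dist_matrix, sigma):
    """Potential of Eq. (6) at the data points, from D_ij = ||x_i - x_j|| alone.

    The cost is of order N^2, irrespective of the dimension of the original space.
    """
    D2 = dist_matrix ** 2
    W = np.exp(-D2 / (2.0 * sigma ** 2))               # Gaussian weights, shape (N, N)
    V = np.sum(D2 * W, axis=1) / (2.0 * sigma ** 2 * np.sum(W, axis=1))
    return V - V.min()                                 # absorb the constants into E so that min V = 0
```

Because only distances enter, the same routine applies when the data are given purely as a distance matrix.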
62 All problems that we have used as examples were such that data were given in some space, and we have exercised our freedom to define a metric, using the PCA approach, as the basis for distance calculations. [sent-132, score-0.081]
63 The previous analysis tells us that QC can also be applied to data for which only the distance information is known. [sent-133, score-0.124]
64 3 Principal Component Metrics The QC algorithm starts from distance information. [sent-135, score-0.037]
65 The question of how the distances are calculated is another, very important, piece of the clustering procedure. [sent-136, score-0.3]
66 The PCA approach defines a metric that is intrinsic to the data, determined by their second order statistics. [sent-137, score-0.053]
67 Principal component decomposition can be applied both to the correlation matrix and to the covariance matrix. [sent-139, score-0.042]
68 The PCA approach that we have used is based on a whitened correlation matrix. [sent-141, score-0.042]
69 This turns out to lead to the good separation of the crab data in PC2-PC3 and of the iris data in PC1-PC2. [sent-142, score-0.041]
70 Since our aim was to convince the reader that once a good metric is found, QC conveys the correct information, we have used the best preprocessing before testing QC. [sent-143, score-0.083]
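One possible reading of this preprocessing step is sketched below (our own code; "whitened correlation matrix" is interpreted here as standardizing the variables, projecting on the leading eigenvectors of their correlation matrix, and rescaling each component to unit variance):

```python
import numpy as np

def whitened_pca(data, n_components):
    """Project onto leading principal components of the correlation matrix and whiten them."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)        # standardized variables
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]         # leading components first
    return (z @ eigvecs[:, order]) / np.sqrt(eigvals[order])  # unit variance per component
```

For the crab data one would then keep components 2 and 3 of this projection, and for the iris data components 1 and 2.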
71 5 The Gradient Descent Algorithm After discovering the cluster centers we are faced with the problem of allocating the data points to the different clusters. [sent-144, score-0.244]
72 We propose using a gradient descent algorithm for this purpose. [sent-145, score-0.057]
73 Defining $y_i(0) = x_i$, we define the gradient-descent process $y_i(t + \Delta t) = y_i(t) - \eta(t)\nabla V(y_i(t))$ (7), letting the points reach an asymptotic fixed value coinciding with a cluster center. [sent-146, score-0.203]
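A minimal sketch of such a descent is shown below (our code; the learning rate, the number of steps, and the numerical central-difference gradient are illustrative choices, not taken from the paper):

```python
import numpy as np

def descend(data, sigma, eta=0.1, steps=200, h=1e-4):
    """Let replica points y_i, initialized at the data points, slide down V as in Eq. (7)."""
    def V(x):
        sq = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-sq / (2.0 * sigma ** 2))
        return np.sum(sq * w) / (2.0 * sigma ** 2 * np.sum(w))

    y = data.astype(float).copy()
    d = data.shape[1]
    for _ in range(steps):
        for i in range(len(y)):
            grad = np.zeros(d)
            for k in range(d):                         # central-difference estimate of grad V
                e = np.zeros(d)
                e[k] = h
                grad[k] = (V(y[i] + e) - V(y[i] - e)) / (2.0 * h)
            y[i] = y[i] - eta * grad
    return y
```

After the descent, points whose replicas have converged to the same minimum (for instance, points lying within a small distance of one another) are assigned to the same cluster.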
74 To demonstrate the results of this algorithm, as well as the application of QC to higher dimensions, we analyze the iris data in 4 dimensions. [sent-148, score-0.233]
75 We use the original data space with only one modification: all axes are normalized to lie within a unified range of variation. [sent-149, score-0.195]
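A short sketch of this normalization (our code; the [0, 1] target range and the optional scikit-learn iris loader are our own choices, not the paper's):

```python
import numpy as np

def normalize_axes(data):
    """Rescale every axis to a unified range of variation, here [0, 1]."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo)

# Example: the four-dimensional iris data, e.g. loaded with scikit-learn as
#   from sklearn.datasets import load_iris
#   X = normalize_axes(load_iris().data)
# can then be passed to the potential and gradient-descent routines sketched above.
```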
76 Shown here are different windows for the four different axes, within which we display the values of the points after descending the potential surface and reaching its minima, whose values (as $V/E$) are shown in the fifth window. [sent-152, score-0.274]
77 Applying QC to the data space without normalization of the different axes leads to misclassifications of the order of 15 instances, similar to the clustering quality of [4]. [sent-154, score-0.41]
78 6 Discussion In the literature of image analysis one often looks for the curve on which the Laplacian of the Gaussian filter of an image vanishes [13]. [sent-155, score-0.08]
79 This is known as zero-crossing and serves as an indicator of edges, or boundaries, in the image. [sent-156, score-0.143]
80 Figure 5 contains one window for each of the four axes (dim 1 to dim 4) and a fifth window for $V/E$, all plotted against the serial number of the instances. [sent-158, score-0.324]
81 Figure 5: The fixed points of the four-dimensional iris problem following the gradient-descent algorithm. [sent-160, score-0.244]
82 The results show almost perfect clustering into the three families of 50 instances each, for the chosen value of $\sigma$. [sent-161, score-0.392]
83 Clearly each such contour can also be viewed as surrounding maxima of the probability function, and therefore representing some kind of cluster boundary, although different from the conventional one [5]. [sent-164, score-0.24]
84 Clearly they surround the data but do not give a satisfactory indication of where the clusters are. [sent-170, score-0.166]
85 One may therefore speculate that equipotential levels of $V$ may serve as alternatives to zero-crossing curves in future applications to image analysis. [sent-172, score-0.142]
86 Since the Schrödinger potential, the function that plays the major role in our analysis, has minima that lie in the neighborhood of data points, we find that it suffices to evaluate it at these points. [sent-176, score-0.736]
87 This enables us to deal with clustering in high-dimensional spaces. [sent-177, score-0.33]
88 They show that the basic idea, as well as the gradient-descent algorithm for allocating data points to clusters, works well. [sent-180, score-0.044]
89 Quantum clustering does not presume any particular shape or any specific number of clusters. [sent-181, score-0.33]
90 It can be used in conjunction with other clustering methods. [sent-182, score-0.3]
91 This would be one example where not all points are given the same weight in the construction of the Parzen probability distribution. [sent-184, score-0.055]
92 It may seem strange to see the Schrödinger equation in the context of machine learning. [sent-185, score-0.445]
93 The potential represents the attractive force that tries to concentrate the distribution around its minima. [sent-188, score-0.144]
94 The Laplacian has the opposite effect of spreading the wave-function. [sent-189, score-0.058]
95 In a clustering analysis we implicitly assume that two such effects exist. [sent-190, score-0.3]
96 Its success proves that this equation can serve as the basic tool of a clustering method. [sent-192, score-0.37]
wordName wordTfidf (topN-words)
[('dinger', 0.409), ('schr', 0.409), ('quantum', 0.308), ('clustering', 0.3), ('qc', 0.213), ('minima', 0.194), ('iris', 0.189), ('potential', 0.144), ('crab', 0.136), ('cluster', 0.118), ('hilbert', 0.112), ('dim', 0.108), ('horn', 0.102), ('ripley', 0.089), ('principal', 0.082), ('clusters', 0.077), ('locations', 0.069), ('assaf', 0.068), ('aviv', 0.068), ('equipotential', 0.068), ('svc', 0.068), ('tel', 0.068), ('axes', 0.062), ('maxima', 0.062), ('width', 0.062), ('instances', 0.06), ('displayed', 0.06), ('mechanics', 0.06), ('insigni', 0.059), ('lie', 0.059), ('decreased', 0.056), ('de', 0.055), ('points', 0.055), ('parzen', 0.054), ('metric', 0.053), ('ne', 0.051), ('dimensions', 0.049), ('laplacian', 0.047), ('uci', 0.045), ('rescaled', 0.045), ('satisfactory', 0.045), ('repository', 0.045), ('data', 0.044), ('four', 0.044), ('misclassi', 0.043), ('tells', 0.043), ('emphasis', 0.043), ('pca', 0.042), ('correlation', 0.042), ('eigenvalue', 0.042), ('contours', 0.041), ('nonetheless', 0.041), ('separation', 0.041), ('image', 0.04), ('associate', 0.04), ('distance', 0.037), ('normalization', 0.036), ('equation', 0.036), ('pattern', 0.036), ('serves', 0.035), ('serve', 0.034), ('spanned', 0.034), ('perfect', 0.032), ('components', 0.032), ('surface', 0.031), ('opposite', 0.031), ('major', 0.03), ('space', 0.03), ('high', 0.03), ('presume', 0.03), ('convince', 0.03), ('hamiltonian', 0.03), ('crabs', 0.03), ('wave', 0.03), ('astronomy', 0.03), ('surrounding', 0.03), ('fth', 0.03), ('harm', 0.03), ('jain', 0.03), ('puts', 0.03), ('acquires', 0.03), ('obtainable', 0.03), ('coinciding', 0.03), ('conventional', 0.03), ('descent', 0.03), ('gaussian', 0.028), ('symbols', 0.028), ('intuition', 0.028), ('propose', 0.027), ('allocating', 0.027), ('answered', 0.027), ('spreading', 0.027), ('cliffs', 0.027), ('englewood', 0.027), ('siegelmann', 0.027), ('conventionally', 0.027), ('operators', 0.027), ('roberts', 0.027), ('deeper', 0.027), ('recipes', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 185 nips-2001-The Method of Quantum Clustering
Author: David Horn, Assaf Gottlieb
Abstract: We propose a novel clustering method that is an extension of ideas inherent to scale-space clustering and support-vector clustering. Like the latter, it associates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scale-space probability function. The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrödinger equation of which the probability function is a solution. This Schrödinger equation contains a potential function that can be derived analytically from the probability function. We associate minima of the potential with cluster centers. The method has one variable parameter, the scale of its Gaussian kernel. We demonstrate its applicability on known data sets. By limiting the evaluation of the Schrödinger potential to the locations of data points, we can apply this method to problems in high dimensions.
2 0.14452419 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
Author: Takashi Morie, Tomohiro Matsuura, Makoto Nagata, Atsushi Iwata
Abstract: This paper describes a clustering algorithm for vector quantizers using a “stochastic association model”. It offers a new simple and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the “neural gas” algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
4 0.13205059 171 nips-2001-Spectral Relaxation for K-means Clustering
Author: Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, Horst D. Simon
Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vectors can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function. 1
5 0.086552076 170 nips-2001-Spectral Kernel Methods for Clustering
Author: Nello Cristianini, John Shawe-Taylor, Jaz S. Kandola
Abstract: In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first one is discussed in a companion paper [3], the second one is based on graph theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels. We state the problem of unsupervised learning as assigning labels so as to optimize these cost functions. We show how the optimal solution can be approximated by slightly relaxing the corresponding optimization problem, and how this corresponds to using eigenvector information. The resulting simple algorithms are tested on real world data with positive results. 1
6 0.083104074 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
7 0.078595817 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
8 0.072113454 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
9 0.072059259 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
10 0.070836499 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
11 0.066851899 21 nips-2001-A Variational Approach to Learning Curves
12 0.065808393 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
13 0.065669976 24 nips-2001-Active Information Retrieval
14 0.064737387 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
15 0.059759893 136 nips-2001-On the Concentration of Spectral Properties
16 0.059641536 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
17 0.057960179 164 nips-2001-Sampling Techniques for Kernel Methods
18 0.057879988 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
19 0.057725094 58 nips-2001-Covariance Kernels from Bayesian Generative Models
20 0.053638931 149 nips-2001-Probabilistic Abstraction Hierarchies
topicId topicWeight
[(0, -0.18), (1, 0.046), (2, -0.027), (3, -0.134), (4, 0.076), (5, -0.05), (6, -0.077), (7, 0.008), (8, 0.012), (9, -0.066), (10, 0.111), (11, -0.063), (12, -0.044), (13, 0.074), (14, -0.056), (15, 0.045), (16, 0.005), (17, -0.066), (18, 0.02), (19, -0.009), (20, 0.073), (21, -0.052), (22, 0.133), (23, -0.011), (24, 0.204), (25, 0.12), (26, 0.015), (27, 0.101), (28, -0.046), (29, 0.028), (30, -0.059), (31, -0.086), (32, -0.054), (33, 0.018), (34, -0.069), (35, -0.117), (36, 0.001), (37, 0.022), (38, 0.02), (39, 0.092), (40, -0.147), (41, 0.042), (42, 0.14), (43, 0.008), (44, 0.128), (45, 0.011), (46, 0.194), (47, 0.027), (48, -0.079), (49, -0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.94682276 185 nips-2001-The Method of Quantum Clustering
Author: David Horn, Assaf Gottlieb
Abstract: We propose a novel clustering method that is an extension of ideas inherent to scale-space clustering and support-vector clustering. Like the latter, it associates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scale-space probability function. The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrödinger equation of which the probability function is a solution. This Schrödinger equation contains a potential function that can be derived analytically from the probability function. We associate minima of the potential with cluster centers. The method has one variable parameter, the scale of its Gaussian kernel. We demonstrate its applicability on known data sets. By limiting the evaluation of the Schrödinger potential to the locations of data points, we can apply this method to problems in high dimensions.
Author: Takashi Morie, Tomohiro Matsuura, Makoto Nagata, Atsushi Iwata
Abstract: This paper describes a clustering algorithm for vector quantizers using a “stochastic association model”. It offers a new simple and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the “neural gas” algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
3 0.63966238 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
Author: Ran El-Yaniv, Oren Souroujon
Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1
4 0.63523585 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
Author: Wheeler Ruml
Abstract: If the promise of computational modeling is to be fully realized in higherlevel cognitive domains such as language processing, principled methods must be developed to construct the semantic representations used in such models. In this paper, we propose the use of an established formalism from mathematical psychology, additive clustering, as a means of automatically constructing binary representations for objects using only pairwise similarity data. However, existing methods for the unsupervised learning of additive clustering models do not scale well to large problems. We present a new algorithm for additive clustering, based on a novel heuristic technique for combinatorial optimization. The algorithm is simpler than previous formulations and makes fewer independence assumptions. Extensive empirical tests on both human and synthetic data suggest that it is more effective than previous methods and that it also scales better to larger problems. By making additive clustering practical, we take a significant step toward scaling connectionist models beyond hand-coded examples.
5 0.59396732 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
6 0.58115399 171 nips-2001-Spectral Relaxation for K-means Clustering
7 0.42863998 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA
8 0.3998239 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
9 0.37996033 149 nips-2001-Probabilistic Abstraction Hierarchies
10 0.3442142 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
11 0.33302623 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
12 0.33130035 101 nips-2001-K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
13 0.32108662 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
14 0.31005168 76 nips-2001-Fast Parameter Estimation Using Green's Functions
15 0.30972201 24 nips-2001-Active Information Retrieval
16 0.30717549 30 nips-2001-Agglomerative Multivariate Information Bottleneck
17 0.30537978 122 nips-2001-Model Based Population Tracking and Automatic Detection of Distribution Changes
18 0.30169669 21 nips-2001-A Variational Approach to Learning Curves
19 0.29582271 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
20 0.29231349 13 nips-2001-A Natural Policy Gradient
topicId topicWeight
[(14, 0.043), (17, 0.034), (19, 0.037), (27, 0.116), (30, 0.097), (36, 0.012), (38, 0.036), (59, 0.03), (72, 0.091), (79, 0.027), (83, 0.02), (91, 0.127), (99, 0.25)]
simIndex simValue paperId paperTitle
same-paper 1 0.82139015 185 nips-2001-The Method of Quantum Clustering
Author: David Horn, Assaf Gottlieb
Abstract: We propose a novel clustering method that is an extension of ideas inherent to scale-space clustering and support-vector clustering. Like the latter, it associates every data point with a vector in Hilbert space, and like the former it puts emphasis on their total sum, that is equal to the scale-space probability function. The novelty of our approach is the study of an operator in Hilbert space, represented by the Schrödinger equation of which the probability function is a solution. This Schrödinger equation contains a potential function that can be derived analytically from the probability function. We associate minima of the potential with cluster centers. The method has one variable parameter, the scale of its Gaussian kernel. We demonstrate its applicability on known data sets. By limiting the evaluation of the Schrödinger potential to the locations of data points, we can apply this method to problems in high dimensions.
2 0.75624382 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
Author: Ran El-Yaniv, Oren Souroujon
Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1
3 0.66815466 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
Author: Mário Figueiredo
Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.
Author: Gregory Z. Grudic, Lyle H. Ungar
Abstract: We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action value function, . Theory is presented showing that linear function approximation representations of can degrade the rate of convergence of performance gradient estimates by a factor of relative to when no function approximation of is used, where is the number of possible actions and is the number of basis functions in the function approximation representation. The second concerns the use of a bias term in estimating the state action value function. Theory is presented showing that a non-zero bias term can improve the rate of convergence of performance gradient estimates by , where is the number of possible actions. Experimental evidence is presented showing that these theoretical results lead to significant improvement in the convergence properties of Policy Gradient Reinforcement Learning algorithms.
5 0.66424125 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
6 0.66227412 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
7 0.65969789 46 nips-2001-Categorization by Learning and Combining Object Parts
8 0.65807384 60 nips-2001-Discriminative Direction for Kernel Classifiers
9 0.65800941 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
10 0.65461814 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
11 0.65422869 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
12 0.65292639 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
13 0.65247184 121 nips-2001-Model-Free Least-Squares Policy Iteration
14 0.65198106 56 nips-2001-Convolution Kernels for Natural Language
15 0.65145618 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
16 0.6508373 22 nips-2001-A kernel method for multi-labelled classification
17 0.65029651 1 nips-2001-(Not) Bounding the True Error
18 0.65012151 8 nips-2001-A General Greedy Approximation Algorithm with Applications
19 0.64990807 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
20 0.64932573 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines