nips nips2008 nips2008-218 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ling Huang, Donghui Yan, Nina Taft, Michael I. Jordan
Abstract: Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. [sent-10, score-0.387]
2 However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. [sent-11, score-0.433]
3 In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. [sent-12, score-1.169]
4 We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. [sent-13, score-1.53]
5 From this result we derive approximate upper bounds on the clustering error. [sent-14, score-0.453]
6 We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance. [sent-15, score-0.592]
7 One general tool for approaching large-scale problems is that of clustering or partitioning, in essence an appeal to the principle of divide-and-conquer. [sent-17, score-0.313]
8 Such preprocessing steps also arise in the distributed sensing and distributed computing setting, where communication and storage limitations may preclude transmitting the original data to centralized processors. [sent-19, score-0.172]
9 A number of recent works have begun to tackle the issue of determining the tradeoffs that arise under various “perturbations” of data, including quantization and downsampling [2, 3, 4]. [sent-20, score-0.431]
10 In this paper we focus on spectral clustering, a class of clustering methods that are based on eigendecompositions of affinity, dissimilarity or kernel matrices [5, 6, 7, 8]. [sent-23, score-0.559]
11 These algorithms often outperform traditional clustering algorithms such as the K-means algorithm or hierarchical clustering. [sent-24, score-0.313]
12 To date, however, their impact on real-world, large-scale problems has been limited; in particular, a distributed or “in-network” version of spectral clustering has not yet appeared. [sent-25, score-0.6]
13 Moreover, there has been little work on the statistical analysis of spectral clustering, and thus there is little theory to guide the design of distributed algorithms. [sent-26, score-0.329]
14 Figure 1: A spectral bipartitioning algorithm. [sent-42, score-0.399]
15 Figure 2: Perturbation analysis: from clustering error to data perturbation error. [sent-43, score-0.843]
16 There is prior work on scaling spectral clustering (including downsampling [9, 10] and the relaxation of precision requirements for the eigenvector computation [7]), but this literature does not provide end-to-end, practical bounds on error rates as a function of data perturbations. [sent-44, score-0.913]
17 In this paper we present the first end-to-end analysis of the effect of data perturbations on spectral clustering. [sent-45, score-0.347]
18 Indeed, given that our approach is based on treating perturbations as random variables, we believe that our methods will also prove useful in developing statistical analyses of spectral clustering (although that is not our focus in this paper). [sent-47, score-0.655]
19 In Section 2, we provide a brief introduction to spectral clustering. [sent-49, score-0.246]
20 Section 3 contains the main results of the paper; specifically we introduce the mis-clustering rate η, and present upper bounds on η due to data perturbations. [sent-50, score-0.242]
21 The point of departure of a spectral clustering algorithm is a weighted similarity graph G(V, E), where the vertices correspond to data points and the weights correspond to the pairwise similarities. [sent-55, score-0.65]
22 Based on this weighted graph, spectral clustering algorithms form the graph Laplacian and compute an eigendecomposition of this Laplacian [5, 6, 7]. [sent-56, score-0.595]
23 While some algorithms use multiple eigenvectors and find a k-way clustering directly, the most widely studied algorithms form a bipartitioning of the data by thresholding the second eigenvector of the Laplacian (the eigenvector with the second smallest eigenvalue). [sent-57, score-0.694]
24 Larger numbers of clusters are found by applying the bipartitioning algorithm recursively. [sent-58, score-0.128]
25 We present a specific example of a spectral bipartitioning algorithm in Fig. 1. [sent-59, score-0.349]
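To make the pipeline concrete, here is a minimal Python/NumPy sketch of a spectral bipartitioning step. The Gaussian-kernel similarity, the normalized Laplacian L = I − D^{-1/2} K D^{-1/2}, and thresholding the second eigenvector at zero are assumptions of this sketch; the specific algorithm in Fig. 1 may differ in those details.

import numpy as np

def spectral_bipartition(X, sigma_k=1.0):
    # Gaussian-kernel similarity matrix on the rows of X
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma_k ** 2))
    # Normalized Laplacian L = I - D^{-1/2} K D^{-1/2}
    d = K.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(X.shape[0]) - D_inv_sqrt @ K @ D_inv_sqrt
    # Second-smallest eigenvector of the Laplacian, thresholded at zero
    eigvals, eigvecs = np.linalg.eigh(L)
    v2 = eigvecs[:, 1]
    return (v2 > 0).astype(int), v2

# Example on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(3.0, 0.5, (50, 2))])
labels, v2 = spectral_bipartition(X)

Larger numbers of clusters can then be obtained by applying the same routine recursively to each side of the split, as described above.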
26 2.2 Input data perturbation: Let the data matrix X ∈ R^{n×d} be formed by stacking n data samples in rows. [sent-62, score-0.615]
27 To this data matrix we assume that a perturbation W is applied, such that we obtain a perturbed version X̃ of the original data X. [sent-63, score-0.671]
28 We assume that a spectral clustering algorithm is applied to X̃, and we wish to compare the results of this clustering with respect to the spectral clustering of X. [sent-64, score-1.431]
29 This analysis captures a number of data perturbation methods, including data filtering, quantization, lossy compression and synopsis-based data approximation [11]. [sent-65, score-0.575]
30 The multi-scale clustering algorithms that use “representative” samples to approximate the original data can be treated using our analysis as well [12]. [sent-66, score-0.371]
31 3 Mis-clustering rate and effects of data perturbation: Let K and L be the similarity and Laplacian matrices on the original data X, and let K̃ and L̃ be those on the perturbed data. [sent-67, score-0.79]
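The extract refers to the mis-clustering rate η without spelling out how it is computed; a natural reading, assumed in the sketch below, is the fraction of points assigned to different clusters by the clusterings of X and X̃, taken up to swapping the two cluster labels of a bipartition.

import numpy as np

def misclustering_rate(labels_orig, labels_pert):
    # Fraction of points whose assignment changes, minimized over the
    # two possible label matchings of a two-way clustering.
    labels_orig = np.asarray(labels_orig)
    labels_pert = np.asarray(labels_pert)
    disagree = float(np.mean(labels_orig != labels_pert))
    return min(disagree, 1.0 - disagree)

# Example: one point out of ten changes sides, so eta = 0.1
eta = misclustering_rate([0] * 10, [0] * 9 + [1])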
32 We wish to bound η in terms of the “magnitude” of the error matrix W = X̃ − X, which we now define. [sent-69, score-0.222]
33 We make the following general stochastic assumption, Assumption A, on the error matrix W. [sent-70, score-0.111]
34 (ii) This assumption is distribution free, and captures a wide variety of practical data collection and quantization schemes. [sent-80, score-0.372]
35 (iii) Certain data perturbation schemes may not satisfy the independence assumption. [sent-81, score-0.48]
36 We have not yet conducted an analysis of the robustness of our bounds to lack of independence, but in our empirical work we have found that the bounds are robust to relatively small amounts of correlation. [sent-82, score-0.1]
37 We aim to produce practically useful bounds on η in terms of σ and the data matrix X. [sent-83, score-0.172]
38 The bounds should be reasonably tight so that in practice they could be used to determine the degree of perturbation σ given a desired level of clustering performance, or to provide a clustering error guarantee on the original data even though we have access only to its approximate version. [sent-84, score-1.306]
39 When we approximate the data (e.g., by filtering, quantization or compression), we introduce a perturbation W to the data, which is quantified by σ². [sent-89, score-0.773]
40 This induces an error dK := K̃ − K in the similarity matrix, and in turn an error dL := L̃ − L in the Laplacian matrix. [sent-90, score-0.154]
41 This further yields an error in the second eigenvector of the Laplacian matrix, which results in mis-clustering error. [sent-91, score-0.153]
42 Overall, we establish an analytical relationship between the mis-clustering rate η and the data perturbation error σ², where η is usually monotonically increasing with σ². [sent-92, score-0.627]
43 Our goal is to allow practitioners to specify a mis-clustering rate η*, and by inverting this relationship, to determine the right magnitude of the perturbation σ* allowed. [sent-93, score-0.609]
44 That is, our work can provide a practical method to determine the tradeoff between data perturbation and the loss of clustering accuracy due to the use of X̃ instead of X. [sent-94, score-0.859]
45 When the data perturbation can be related to computational or communications savings, then our analysis yields a practical characterization of the overall resource/accuracy tradeoff. [sent-95, score-0.522]
46 Practical Applications: Consider in particular a clustering task in a distributed networking system that allows an application to specify a desired clustering error C* on the distributed data (which is not available to the coordinator). [sent-96, score-0.795]
47 A coordinator (e.g., a network operation center) gets access to the perturbed data X̃ for spectral clustering. [sent-99, score-0.378]
48 The coordinator can compute a clustering error bound C using our method. [sent-100, score-0.515]
49 By setting C ≤ C*, it determines the tolerable data perturbation error σ* and instructs distributed devices to use appropriate numbers of bits to quantize their data. [sent-101, score-0.7]
50 Thus we can provide guarantees on the achieved error, C ≤ C*, with respect to the original distributed data even with access only to the perturbed data. [sent-102, score-0.194]
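A hedged sketch of the coordinator logic described above: given an error budget C*, find the largest tolerable perturbation σ* under a bound C(σ) and translate it into a number of quantization bits. Here bound_from_sigma is a hypothetical placeholder for the paper's bound computation, and the σ² ≈ Δ²/12 model for uniform quantization noise is an assumption, not something stated in this extract.

import numpy as np

def bits_for_error_budget(C_star, data_range, bound_from_sigma,
                          sigmas=np.logspace(-4, 0, 200)):
    # Largest perturbation level whose predicted error bound stays within budget.
    feasible = [s for s in sigmas if bound_from_sigma(s) <= C_star]
    if not feasible:
        raise ValueError("no perturbation level meets the error budget")
    sigma_star = max(feasible)
    # Uniform b-bit quantization over a range R has step R / 2**b and
    # noise variance of roughly step**2 / 12 (assumed noise model).
    step = np.sqrt(12.0) * sigma_star
    bits = int(np.ceil(np.log2(data_range / step)))
    return sigma_star, max(bits, 1)

# Example with a made-up, monotone bound C(sigma) = 5 * sigma**2
sigma_star, bits = bits_for_error_budget(0.05, data_range=10.0,
                                         bound_from_sigma=lambda s: 5.0 * s ** 2)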
51 3.1 Upper bounding the mis-clustering rate: Little is currently known about the connection between clustering error and perturbations to the Laplacian matrix in the spectral clustering setting. [sent-104, score-1.112]
52 [5] presented an upper bound for the clustering error; however, this bound is usually quite loose and is not viable for practical applications. [sent-105, score-0.667]
53 If v2 is changed to ṽ2 due to the perturbation, an incorrect clustering happens whenever a component of v2 in set a jumps to set b′, denoted as a → b′, or a component in set b jumps to set a′, denoted as b → a′. [sent-112, score-0.363]
54 Figure 3: The second eigenvector v2 and its perturbed counterpart ṽ2 (denoted by dashed lines). [sent-121, score-0.103]
55 Figure 4: An example of the tightness of the upper bound for η in Eq. [sent-128, score-0.201]
56 The number of such jumps (i.e., missed clusterings) is therefore constrained, and this translates into an upper bound for η. [sent-133, score-0.201]
57 B1. The components of v2 form two clusters (with respect to the spectral bipartitioning algorithm in Fig. 1). [sent-135, score-0.4]
58 B2. The perturbation is small, with the total number of mis-clusterings m < min(k1, k2), and the components of ṽ2 form two clusters. [sent-139, score-0.469]
59 B3. The perturbations of individual components of v2 in each of the sets a → a′, a → b′, b → a′ and b → b′ have identical (not necessarily independent) distributions with bounded second moments, respectively, and they are uncorrelated with the components of v2. [sent-142, score-0.518]
60 Our perturbation bound can now be stated as follows: Proposition 1. [sent-143, score-0.554]
61 Under assumptions B1, B2 and B3, the mis-clustering rate η of the spectral bipartitioning algorithm under the perturbation satisfies η ≤ δ² = ||ṽ2 − v2||². [sent-144, score-0.754]
62 (iii) Our bound has a different flavor than that obtained in [5]. [sent-152, score-0.111]
63 While the bound in [5] works for k-way clustering, it assumes a block-diagonal Laplacian matrix and requires the gap between the k-th and (k+1)-th eigenvalues to be greater than 1/2, which is unrealistic in many data sets. [sent-154, score-0.098]
64 In the setting of 2-way spectral clustering and a small perturbation, our bound is much tighter than that derived in [5]; see Fig. 4. [sent-155, score-0.67]
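Proposition 1 can be checked numerically along the following lines. One detail assumed here (and not stated in the extract) is that the sign of an eigenvector is arbitrary, so the perturbed eigenvector is sign-aligned with the original before forming δ² = ||ṽ2 − v2||².

import numpy as np

def proposition1_bound(v2, v2_tilde):
    # Resolve the arbitrary eigenvector sign, then return delta^2.
    v2 = np.asarray(v2, dtype=float)
    v2_tilde = np.asarray(v2_tilde, dtype=float)
    if np.dot(v2, v2_tilde) < 0:
        v2_tilde = -v2_tilde
    return float(np.sum((v2_tilde - v2) ** 2))

# Under B1-B3, the empirical mis-clustering rate eta should satisfy
# eta <= proposition1_bound(v2, v2_tilde).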
65 3.2 Perturbation on the second eigenvector of the Laplacian matrix: We now turn to the relationship between the perturbation of the eigenvectors and that of the matrix itself. [sent-158, score-0.674]
66 One approach is to simply draw on the classical domain of matrix perturbation theory; in particular, applying Theorem V.2.8 from [13], we have the following bound on the (small) perturbation of the second eigenvector: [sent-159, score-0.504]
67 ||ṽ2 − v2|| ≤ 4||dL||_F / (ν − √2·||dL||_F),  (2)  where ν is the gap between the second and the third eigenvalue. [sent-161, score-0.554]
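A sketch evaluating the classical bound (2) as reconstructed above. The gap ν is taken as the difference between the third- and second-smallest eigenvalues of L, and the bound is reported as infinite when the denominator is non-positive, which is one way this bound can fail to be useful in practice.

import numpy as np

def classical_eigvec_bound(L, L_tilde):
    # RHS of (2): 4*||dL||_F / (nu - sqrt(2)*||dL||_F)
    dL_F = np.linalg.norm(L_tilde - L, "fro")
    eigvals = np.linalg.eigvalsh(L)          # ascending eigenvalues of L
    nu = eigvals[2] - eigvals[1]             # gap between 2nd and 3rd eigenvalue
    denom = nu - np.sqrt(2.0) * dL_F
    return 4.0 * dL_F / denom if denom > 0 else np.inf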
68 We add zero-mean Gaussian noise to the input data with different σ, and we see that the right-hand side (RHS) of (5) approximately upper bounds the left-hand side (LHS). [sent-204, score-0.202]
69 Thus the bound given by (2) is often not useful in practical applications. [sent-206, score-0.153]
70 Combining (5) and (6), we can bound the squared norm of the perturbation on the second eigenvector in expectation, which in turn bounds the mis-clustering rate. [sent-220, score-0.707]
71 3.3 Perturbation on the Laplacian matrix: Let D be the diagonal matrix with D_i = Σ_j K_ij. [sent-223, score-0.122]
72 If the perturbation dK is small compared to K, then dL = (1 + o(1)) ∆D^{-2}K − D^{-1}dK. [sent-226, score-0.443]
73 The quantities needed to estimate E(dL) and E(dL²) can be obtained from moments and correlations among the elements of the perturbed similarity matrix K̃_ij. [sent-228, score-0.211]
74 For simplicity, we could set ρ^k_{ijq} to 1 in Eq. [sent-231, score-0.165]
75 This bound could optionally be tightened by using a simulation method to estimate the values of ρ^k_{ijq}. [sent-234, score-0.276]
76 However, in our experimental work we have found that our results are insensitive to the values of ρ^k_{ijq}, and setting ρ^k_{ijq} = 0. [sent-235, score-0.33]
77 This allows us to estimate (i.e., to upper bound) the first two moments of dL using those of dK, which are computed using Eq. [sent-241, score-0.186]
78 3.4 Perturbation on the similarity matrix: The similarity matrix K̃ on the perturbed data X̃ is K̃_ij = exp(−||x_i − x_j + ε_i − ε_j||² / (2σ_k²)),  (14)  where σ_k is the kernel bandwidth. [sent-245, score-0.361]
79 Then, given data X, the first two moments of dK_ij = K̃_ij − K_ij, the error in the similarity matrix, can be determined by one of the following lemmas. [sent-246, score-0.237]
80 Figure 6: Synthetic data sets illustrated in two dimensions: (a) Gaussian data, (b) Sin-sep data, (c) Concentric data. [sent-252, score-0.148]
81 Under Assumption A, given X and for large values of the dimension d, the first two moments of K̃_ij can be computed approximately as follows: E[K̃_ij] = M_ij(−1/(2σ_k²)), E[K̃_ij²] = M_ij(−1/σ_k²),  (16)  where M_ij(t) = exp((λ_ij + 2dσ²)t + (dµ4 + dσ⁴ + 4σ²λ_ij)t²) and λ_ij = ||x_i − x_j||². [sent-254, score-0.118]
82 (i) Given data perturbation error σ, kernel bandwidth σ_k and data X, the first two moments of dK_ij can be estimated directly using (15) or (16). [sent-256, score-0.685]
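A Monte Carlo sketch for the first two moments of a single perturbed kernel entry K̃_ij, assuming i.i.d. Gaussian noise with standard deviation σ on every coordinate (one concrete instance of Assumption A). It can be used to sanity-check the closed-form approximation (16).

import numpy as np

def kernel_entry_moments_mc(xi, xj, sigma, sigma_k, n_samples=100000, seed=0):
    # Sample noise for both points, form the perturbed distance, and average.
    rng = np.random.default_rng(seed)
    d = xi.shape[0]
    eps_i = rng.normal(0.0, sigma, size=(n_samples, d))
    eps_j = rng.normal(0.0, sigma, size=(n_samples, d))
    diffs = (xi - xj) + eps_i - eps_j
    k = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * sigma_k ** 2))
    return float(k.mean()), float((k ** 2).mean())

# Example with two points in d = 20 dimensions
m1, m2 = kernel_entry_moments_mc(np.zeros(20), np.ones(20), sigma=0.1, sigma_k=2.0)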
83 Combining Eqs. (1)–(16), we have established a relationship between the mis-clustering rate η and the data perturbation magnitude σ. [sent-258, score-0.599]
84 We evaluate on both synthetic and real data sets, and the different data sets impose different difficulty levels on the underlying spectral clustering algorithm, demonstrating the wide applicability of our analysis. [sent-265, score-0.596]
85 In the experiments, we use data quantization as the perturbation scheme to evaluate the upper bound provided by our analysis on the clustering error. [sent-266, score-1.287]
86 Figure 7 plots the mis-clustering rate and the upper bound for data sets subject to varying degrees of quantization. [sent-268, score-0.303]
87 As expected, the mis-clustering rate increases as one decreases the number of quantization bits. [sent-269, score-0.358]
88 We find that the error bounds are remarkably tight, which validates the assumptions we make in the analysis. [sent-270, score-0.1]
89 It is also interesting to note that even when using as few as 3–4 bits, the clustering degrades very little, both in actual error and as assessed by our bound. [sent-271, score-0.384]
90 The effectiveness of our bound should allow the practitioner to determine the right amount of quantization given a permitted loss in clustering performance. [sent-272, score-0.774]
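A sketch of the experimental loop behind Figure 7, under the assumption that the perturbation is uniform per-feature quantization to b bits and that its noise level is summarized as σ ≈ Δ/√12 for step size Δ; the paper's exact quantizer is not described in this extract.

import numpy as np

def quantize(X, bits):
    # Uniform per-feature quantization of X to 2**bits levels.
    lo, hi = X.min(axis=0), X.max(axis=0)
    step = (hi - lo) / (2 ** bits - 1)
    step = np.where(step > 0, step, 1.0)            # guard constant features
    Xq = lo + np.round((X - lo) / step) * step
    sigma = float(np.mean(step)) / np.sqrt(12.0)    # nominal quantization-noise std
    return Xq, sigma

# Usage, reusing spectral_bipartition and misclustering_rate from the earlier sketches:
# for b in range(3, 9):
#     Xq, sigma = quantize(X, b)
#     eta = misclustering_rate(spectral_bipartition(X)[0], spectral_bipartition(Xq)[0])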
91 5 Conclusion: In this paper, we presented a theoretical analysis of the clustering error for spectral clustering in the face of stochastic perturbations. [sent-273, score-0.922]
92 Our experimental evaluation has provided support for the assumptions made in the analysis, showing that the bound is tight under conditions of practical interest. [sent-274, score-0.185]
95 Figure 7: Upper bounds of clustering error on approximate data obtained from quantization as a function of the number of bits. [sent-406, score-0.743]
96 (h) Wisconsin breast cancer data (569 sample size, 30 features); (i) Waveform data (5000 sample size, 21 features). [sent-408, score-0.182]
97 The x-axis shows the number of quantization bits and (above the axis) the corresponding data perturbation error σ. [sent-409, score-0.952]
98 A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems (NIPS), 2002. [sent-433, score-0.246]
99 U. von Luxburg, M. Belkin, and O. Bousquet, “Consistency of spectral clustering,” Annals of Statistics, vol. 36, no. 2, pp. 555–586, 2008. [sent-443, score-0.246]
100 D. Kushnir, M. Galun, and A. Brandt, “Fast multiscale clustering and manifold identification,” Pattern Recognition, vol. 39, 2006. [sent-466, score-0.313]
wordName wordTfidf (topN-words)
[('perturbation', 0.443), ('kij', 0.336), ('clustering', 0.313), ('quantization', 0.293), ('spectral', 0.246), ('mis', 0.207), ('ekij', 0.165), ('ijq', 0.165), ('laplacian', 0.158), ('dl', 0.155), ('dk', 0.145), ('bits', 0.129), ('bound', 0.111), ('bipartitioning', 0.103), ('dlpq', 0.103), ('edi', 0.103), ('eigenvector', 0.103), ('moments', 0.096), ('upper', 0.09), ('mij', 0.088), ('dkij', 0.083), ('di', 0.076), ('downsampling', 0.072), ('perturbed', 0.072), ('lhs', 0.066), ('rate', 0.065), ('perturbations', 0.064), ('ij', 0.061), ('matrix', 0.061), ('breast', 0.056), ('wisconsin', 0.056), ('similarity', 0.054), ('rhs', 0.054), ('cancer', 0.052), ('bounds', 0.05), ('error', 0.05), ('waveform', 0.044), ('tradeoffs', 0.044), ('practical', 0.042), ('coordinator', 0.041), ('delity', 0.041), ('dlpi', 0.041), ('ekiq', 0.041), ('garofalakis', 0.041), ('kiq', 0.041), ('misclustering', 0.041), ('taft', 0.041), ('vpj', 0.041), ('distributed', 0.041), ('vj', 0.04), ('data', 0.037), ('concentric', 0.036), ('eigendecomposition', 0.036), ('iris', 0.036), ('wine', 0.036), ('eigenvectors', 0.035), ('perturb', 0.033), ('permitted', 0.033), ('pen', 0.033), ('vldb', 0.033), ('communication', 0.032), ('tight', 0.032), ('analyses', 0.032), ('jordan', 0.032), ('relationship', 0.032), ('iq', 0.031), ('digits', 0.031), ('kannan', 0.029), ('etc', 0.029), ('sep', 0.029), ('inverting', 0.028), ('nystr', 0.028), ('segmentation', 0.027), ('practitioners', 0.027), ('clusterings', 0.027), ('malik', 0.027), ('nguyen', 0.027), ('lemma', 0.027), ('components', 0.026), ('noise', 0.025), ('clusters', 0.025), ('cluster', 0.025), ('proposition', 0.025), ('jumps', 0.025), ('practically', 0.024), ('intel', 0.024), ('determine', 0.024), ('bousquet', 0.023), ('access', 0.023), ('uj', 0.023), ('uncorrelated', 0.023), ('xi', 0.022), ('magnitude', 0.022), ('xj', 0.022), ('bandwidth', 0.022), ('pj', 0.022), ('tackle', 0.022), ('original', 0.021), ('little', 0.021), ('compression', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 218 nips-2008-Spectral Clustering with Perturbed Data
Author: Ling Huang, Donghui Yan, Nina Taft, Michael I. Jordan
Abstract: Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance. 1
2 0.16292191 219 nips-2008-Spectral Hashing
Author: Yair Weiss, Antonio Torralba, Rob Fergus
Abstract: Semantic hashing[1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the art. 1
3 0.14992157 117 nips-2008-Learning Taxonomies by Dependence Maximization
Author: Matthew Blaschko, Arthur Gretton
Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1
4 0.13451444 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering
Author: Shai Ben-David, Margareta Ackerman
Abstract: Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time. 1
5 0.1242158 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing
Author: Benjamin Schrauwen, Lars Buesing, Robert A. Legenstein
Abstract: Randomly connected recurrent neural circuits have proven to be very powerful models for online computations when a trained memoryless readout function is appended. Such Reservoir Computing (RC) systems are commonly used in two flavors: with analog or binary (spiking) neurons in the recurrent circuits. Previous work showed a fundamental difference between these two incarnations of the RC idea. The performance of a RC system built from binary neurons seems to depend strongly on the network connectivity structure. In networks of analog neurons such dependency has not been observed. In this article we investigate this apparent dichotomy in terms of the in-degree of the circuit nodes. Our analyses based amongst others on the Lyapunov exponent reveal that the phase transition between ordered and chaotic network behavior of binary circuits qualitatively differs from the one in analog circuits. This explains the observed decreased computational performance of binary circuits of high node in-degree. Furthermore, a novel mean-field predictor for computational performance is introduced and shown to accurately predict the numerically obtained results. 1
6 0.11032368 48 nips-2008-Clustering via LP-based Stabilities
7 0.10046745 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
8 0.098282665 107 nips-2008-Influence of graph construction on graph-based clustering measures
9 0.084405668 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
10 0.083911307 113 nips-2008-Kernelized Sorting
11 0.078997366 193 nips-2008-Regularized Co-Clustering with Dual Supervision
12 0.078723386 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
13 0.07571099 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph
14 0.06923987 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
15 0.062344514 238 nips-2008-Theory of matching pursuit
16 0.062129091 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
17 0.060887475 194 nips-2008-Regularized Learning with Networks of Features
18 0.060536902 40 nips-2008-Bounds on marginal probability distributions
19 0.058582224 112 nips-2008-Kernel Measures of Independence for non-iid Data
20 0.056472387 239 nips-2008-Tighter Bounds for Structured Estimation
topicId topicWeight
[(0, -0.175), (1, -0.04), (2, -0.04), (3, 0.073), (4, 0.022), (5, -0.045), (6, 0.054), (7, -0.059), (8, 0.017), (9, -0.166), (10, 0.05), (11, -0.002), (12, -0.041), (13, 0.112), (14, -0.148), (15, 0.104), (16, -0.145), (17, 0.058), (18, 0.139), (19, -0.123), (20, -0.042), (21, 0.101), (22, 0.056), (23, 0.131), (24, 0.033), (25, 0.03), (26, -0.038), (27, -0.025), (28, -0.011), (29, 0.084), (30, -0.029), (31, -0.042), (32, 0.012), (33, 0.073), (34, -0.009), (35, -0.064), (36, 0.091), (37, -0.203), (38, 0.008), (39, -0.059), (40, -0.038), (41, 0.114), (42, 0.068), (43, 0.044), (44, 0.134), (45, -0.109), (46, 0.004), (47, 0.029), (48, -0.04), (49, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.95427328 218 nips-2008-Spectral Clustering with Perturbed Data
Author: Ling Huang, Donghui Yan, Nina Taft, Michael I. Jordan
Abstract: Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance. 1
2 0.70949596 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering
Author: Shai Ben-David, Margareta Ackerman
Abstract: Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kleinberg, ([1]) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1]. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg’s axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg’s axioms while retaining consistency. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time. 1
3 0.69590491 219 nips-2008-Spectral Hashing
Author: Yair Weiss, Antonio Torralba, Rob Fergus
Abstract: Semantic hashing[1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state-of-the art. 1
4 0.69471347 117 nips-2008-Learning Taxonomies by Dependence Maximization
Author: Matthew Blaschko, Arthur Gretton
Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1
5 0.53508443 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes. 1
6 0.50697517 48 nips-2008-Clustering via LP-based Stabilities
7 0.49981183 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph
8 0.4639574 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing
9 0.46198526 193 nips-2008-Regularized Co-Clustering with Dual Supervision
10 0.42299819 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
11 0.40328848 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
12 0.37927461 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
13 0.37707925 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
14 0.33993202 29 nips-2008-Automatic online tuning for fast Gaussian summation
15 0.33802426 41 nips-2008-Breaking Audio CAPTCHAs
16 0.32446969 4 nips-2008-A Scalable Hierarchical Distributed Language Model
17 0.31334421 236 nips-2008-The Mondrian Process
18 0.30941328 200 nips-2008-Robust Kernel Principal Component Analysis
19 0.30087885 113 nips-2008-Kernelized Sorting
20 0.294983 168 nips-2008-Online Metric Learning and Fast Similarity Search
topicId topicWeight
[(6, 0.071), (7, 0.079), (10, 0.075), (12, 0.046), (28, 0.184), (57, 0.044), (59, 0.029), (63, 0.048), (66, 0.011), (71, 0.026), (77, 0.036), (83, 0.037), (90, 0.201)]
simIndex simValue paperId paperTitle
same-paper 1 0.81549001 218 nips-2008-Spectral Clustering with Perturbed Data
Author: Ling Huang, Donghui Yan, Nina Taft, Michael I. Jordan
Abstract: Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance. 1
2 0.75903231 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
3 0.74598926 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
Author: Hannes Nickisch, Rolf Pohmann, Bernhard Schölkopf, Matthias Seeger
Abstract: We show how improved sequences for magnetic resonance imaging can be found through optimization of Bayesian design scores. Combining approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires large-scale approximate inference for dense, non-Gaussian models. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on raw data from a 3T MR scanner. 1
Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: We consider the problem of estimating the graph structure associated with a Gaussian Markov random field (GMRF) from i.i.d. samples. We study the performance of study the performance of the ℓ1 -regularized maximum likelihood estimator in the high-dimensional setting, where the number of nodes in the graph p, the number of edges in the graph s and the maximum node degree d, are allowed to grow as a function of the number of samples n. Our main result provides sufficient conditions on (n, p, d) for the ℓ1 -regularized MLE estimator to recover all the edges of the graph with high probability. Under some conditions on the model covariance, we show that model selection can be achieved for sample sizes n = Ω(d2 log(p)), with the error decaying as O(exp(−c log(p))) for some constant c. We illustrate our theoretical results via simulations and show good correspondences between the theoretical predictions and behavior in simulations.
5 0.71519721 196 nips-2008-Relative Margin Machines
Author: Tony Jebara, Pannagadatta K. Shivaswamy
Abstract: In classification problems, Support Vector Machines maximize the margin of separation between two classes. While the paradigm has been successful, the solution obtained by SVMs is dominated by the directions with large data spread and biased to separate the classes by cutting along large spread directions. This article proposes a novel formulation to overcome such sensitivity and maximizes the margin relative to the spread of the data. The proposed formulation can be efficiently solved and experiments on digit datasets show drastic performance improvements over SVMs. 1
6 0.71415257 62 nips-2008-Differentiable Sparse Coding
7 0.71310467 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
8 0.71208328 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
9 0.71157938 4 nips-2008-A Scalable Hierarchical Distributed Language Model
10 0.71133167 179 nips-2008-Phase transitions for high-dimensional joint support recovery
11 0.71108031 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
12 0.71085811 202 nips-2008-Robust Regression and Lasso
13 0.70992625 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
14 0.70931506 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
15 0.70867503 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
16 0.70852286 106 nips-2008-Inferring rankings under constrained sensing
17 0.70823431 85 nips-2008-Fast Rates for Regularized Objectives
18 0.70789665 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF
19 0.70768255 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
20 0.70746124 195 nips-2008-Regularized Policy Iteration