nips nips2008 nips2008-107 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Markus Maier, Ulrike von Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed on toy and real data already for rather small sample sizes.
Reference: text
sentIndex sentText sentNum sentScore
1 In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. [sent-3, score-0.922]
2 In this paper we investigate the influence of the construction of the similarity graph on the clustering results. [sent-4, score-0.525]
3 We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. [sent-5, score-0.795]
4 We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. [sent-6, score-0.519]
5 In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! [sent-7, score-0.38]
6 This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. [sent-8, score-0.888]
7 The easiest and most popular neighborhood graphs are the r-neighborhood graph, in which every point is connected to all other points within a distance of r, and the k-nearest neighbor (kNN) graph, in which every point is connected to the k closest neighboring points. [sent-11, score-0.2]
8 When applying graph-based machine learning methods to given sets of data points, there are several choices to be made: the type of graph to construct (e.g. [sent-12, score-0.7]
9 , the r-neighborhood graph or the kNN graph), and the connectivity parameter (r or k, respectively). [sent-14, score-0.35]
10 While different researchers use different heuristics and their “gut feeling” to set these parameters, no systematic empirical studies have been conducted (for example: how sensitive are the results to the graph parameters? [sent-18, score-0.385]
11 Our goal in this paper is to address the theoretical side of this question in the context of graph-based clustering. [sent-20, score-0.385]
12 We assume that the quality of a clustering is defined by some clustering objective function. [sent-23, score-0.306]
13 To this end, we first want to study the convergence of the clustering criterion (Ncut) on different kinds of graphs (kNN graph and r-neighborhood graph), as the sample size tends to infinity. [sent-25, score-0.709]
14 To our own surprise, when studying this convergence it turned out that, depending on the type of graph, the normalized cut converges to different limit values! [sent-26, score-0.355]
15 That is, the (suitably normalized) values of Ncut tend to a different limit functional, depending on whether we use the r-neighborhood graph or the kNN graph on the finite sample. [sent-27, score-0.821]
16 But of course, given a fixed partition of a sample of points, this Ncut value is different for different graphs constructed on the sample (different graph constructions put different numbers of edges between points, which leads to different Ncut values). [sent-29, score-0.532]
17 This can lead to the effect that the minimizer of Ncut on the kNN graph is different from the minimizer of Ncut on the r-graph. [sent-32, score-0.376]
18 This means that ultimately, the question about the “best Ncut” clustering, given an infinite amount of data, has different answers depending on which underlying graph we use! [sent-33, score-0.374]
19 This observation opens Pandora’s box on clustering criteria: the “meaning” of a clustering criterion does not only depend on the exact definition of the criterion itself, but also on how the graph on the finite sample is constructed. [sent-34, score-0.76]
20 More sloppily: Ncut on a kNN graph does something different from Ncut on an r-neighborhood graph! [sent-36, score-0.35]
21 We investigate how and under which conditions the Ncut criterion converges on the different graphs, and what the corresponding limit expressions are. [sent-38, score-0.196]
22 The second part of our paper shows that these findings are not only of theoretical interest, but that they also influence concrete algorithms such as spectral clustering in practice. [sent-39, score-0.228]
23 We give examples of well-clustered distributions (mixtures of Gaussians), where the optimal limit cut on the kNN graph is different from the one on the r-neighborhood graph. [sent-40, score-0.598]
24 That is, given a finite sample from some well-clustered distribution, normalized spectral clustering on the kNN graph produces systematically different results from spectral clustering on the r-neighborhood graph. [sent-42, score-0.89]
25 2 Definitions and assumptions. Given a graph G = (V, E) with weights w : E → R and a partition of the nodes V into (C, V \ C), we define $\mathrm{cut}(C, V \setminus C) = \sum_{u \in C,\, v \in V \setminus C} \big( w(u,v) + w(v,u) \big)$, $\mathrm{vol}(C) = \sum_{u \in C,\, v \in V} w(u,v)$, and $\mathrm{Ncut}(C, V \setminus C) = \mathrm{cut}(C, V \setminus C) \left( \frac{1}{\mathrm{vol}(C)} + \frac{1}{\mathrm{vol}(V \setminus C)} \right)$. [sent-43, score-0.373]
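These definitions translate directly into code. The following NumPy sketch (an illustration written for this text, not code from the paper) assumes the graph is given as a dense weight matrix W with W[i, j] the weight of the edge i → j:

```python
import numpy as np

def ncut(W, mask):
    """Ncut(C, V \\ C) as defined above, for a weighted (possibly directed) graph.

    W[i, j] is the weight of edge i -> j; `mask` is a boolean array selecting
    the nodes of C. The cut counts edges in both directions; vol(C) sums the
    weights of all edges originating in C.
    """
    C, Cbar = mask, ~mask
    cut = W[np.ix_(C, Cbar)].sum() + W[np.ix_(Cbar, C)].sum()
    return cut * (1.0 / W[C, :].sum() + 1.0 / W[Cbar, :].sum())
```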
26 , xn we consider two main types of neighborhood graphs: • the r-neighborhood graph Gn,r : there is an edge from point xi to point xj if dist(xi , xj ) ≤ r, for all 1 ≤ i, j ≤ n, i ≠ j. [sent-47, score-0.426]
27 • the directed k-nearest neighbor graph Gn,k : there is a directed edge from xi to xj if xj is one of the k nearest neighbors of xi , for 1 ≤ i, j ≤ n, i ≠ j. [sent-48, score-0.536]
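A small construction sketch for both graph types (again our illustration, using scipy.spatial.cKDTree; the graphs are returned as dense 0/1 adjacency matrices):

```python
import numpy as np
from scipy.spatial import cKDTree

def r_graph(X, r):
    """Unweighted r-neighborhood graph: edge between xi and xj iff dist(xi, xj) <= r, i != j."""
    tree = cKDTree(X)
    n = len(X)
    W = np.zeros((n, n))
    for i, nbrs in enumerate(tree.query_ball_point(X, r)):
        for j in nbrs:
            if j != i:
                W[i, j] = 1.0
    return W

def directed_knn_graph(X, k):
    """Directed kNN graph: edge i -> j iff xj is among the k nearest neighbors of xi."""
    tree = cKDTree(X)
    n = len(X)
    # ask for k+1 neighbors: the nearest neighbor of every point is the point itself
    _, idx = tree.query(X, k=k + 1)
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx[:, 1:].ravel()] = 1.0
    return W
```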
28 Given a surface S which separates the data points in two non-empty parts C + and C − , we denote by cutn,r (S) the number of edges in Gn,r that go from a sample point on one side of the surface to a sample point on the other side of the surface. [sent-52, score-0.293]
29 The corresponding quantity for the directed k-nearest neighbor graph is denoted by cutn,k (S). [sent-53, score-0.439]
30 , xn } ∩ A in the graph Gn,r is denoted by voln,r (A), and correspondingly voln,k (A) in the graph Gn,k . [sent-57, score-0.7]
31 For the cut surface S, we assume that S ∩ ∂C is a set of measure 0 with respect to the (d − 1)-dimensional measure on S. [sent-67, score-0.238]
32 Moreover, in the case of the kNN-graph we consider the directed graph for simplicity. [sent-73, score-0.405]
33 Some statements can be carried over by simple arguments from the directed graph to the symmetric graph, but not all of them. [sent-74, score-0.405]
34 3 Limits of quality measures. In this section we study the asymptotic behavior of the quantities introduced above for both the unweighted directed kNN graph and the unweighted r-graph. [sent-79, score-0.512]
35 , xn from the underlying distribution, we will construct the graph Gn,kn and study the convergence of Ncutn,kn (S), that is the Ncut value induced by S, evaluated on the graph Gn,kn . [sent-86, score-0.779]
36 Similarly, given a sequence (rn )n∈N of radii, we consider the convergence of Ncutn,rn induced by S on the graph Gn,rn . [sent-87, score-0.408]
37 In case d = 1, assume that kn / √n → ∞; in case d ≥ 2 assume kn / log n → ∞. [sent-92, score-0.466]
38 Then, $\left(\frac{n}{k_n}\right)^{1/d} \mathrm{Ncut}_{n,k_n}(S) \;\longrightarrow\; \frac{2\eta_{d-1}}{(d+1)\,\eta_d^{1+1/d}} \int_S p^{1-1/d}(s)\,ds \left( \Big(\int_{C^+} p(x)\,dx\Big)^{-1} + \Big(\int_{C^-} p(x)\,dx\Big)^{-1} \right)$.
39 For the r-neighborhood graph, assume rn > 0, rn → 0 and $n r_n^{d+1} \to \infty$ for n → ∞. [sent-96, score-0.434]
40 Then, $\frac{1}{r_n}\, \mathrm{Ncut}_{n,r_n}(S) \;\longrightarrow\; \frac{2\eta_{d-1}}{(d+1)\,\eta_d} \int_S p^{2}(s)\,ds \left( \Big(\int_{C^+} p^2(x)\,dx\Big)^{-1} + \Big(\int_{C^-} p^2(x)\,dx\Big)^{-1} \right)$. [sent-99, score-0.2]
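To make the limit statement concrete, here is a small numerical sketch (written for this text, not from the paper) that evaluates the two limit functionals for a one-dimensional density p and cut points t. In d = 1 the surface S = {t} is a single point, η₁ = 2 and η₀ = 1; the constant prefactors follow the reconstruction of the theorem above and do not affect the location of the minimizer, so treat them as illustrative:

```python
import numpy as np
from scipy import integrate, stats

# 1-d mixture of two Gaussians with unequal weights; in d = 1: eta_1 = 2, eta_0 = 1
p = lambda x: 0.8 * stats.norm.pdf(x, -1.0, 0.3) + 0.2 * stats.norm.pdf(x, 1.0, 0.3)
d, eta_d, eta_dm1 = 1, 2.0, 1.0

def limit_ncut_knn(t):
    # cut term: note p(t)**(1 - 1/d) == 1 in d = 1
    cut = 2 * eta_dm1 / ((d + 1) * eta_d ** (1 + 1 / d)) * p(t) ** (1 - 1 / d)
    m_minus, _ = integrate.quad(p, -np.inf, t)          # mass of C^-
    m_plus, _ = integrate.quad(p, t, np.inf)            # mass of C^+
    return cut * (1.0 / m_minus + 1.0 / m_plus)

def limit_ncut_r(t):
    cut = 2 * eta_dm1 / ((d + 1) * eta_d) * p(t) ** 2
    m_minus, _ = integrate.quad(lambda x: p(x) ** 2, -np.inf, t)
    m_plus, _ = integrate.quad(lambda x: p(x) ** 2, t, np.inf)
    return cut * (1.0 / m_minus + 1.0 / m_plus)

ts = np.linspace(-1.5, 1.5, 121)
print("kNN-graph limit minimizer:", ts[np.argmin([limit_ncut_knn(t) for t in ts])])
print("r-graph   limit minimizer:", ts[np.argmin([limit_ncut_r(t) for t in ts])])
```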
41 Proof (sketch for the case of the kNN graph; the case of the r-graph is similar). [sent-100, score-0.35]
42 Define the scaling factors $c_{\mathrm{cut}}(n, k_n) = n^{-1+1/d} k_n^{-1-1/d}$ and $c_{\mathrm{vol}}(n, k_n) = (n k_n)^{-1}$. [sent-104, score-0.583]
43 Then $(n/k_n)^{1/d}\, \mathrm{Ncut}(S)$ can be decomposed into a cut term and a volume term: $c_{\mathrm{cut}}(n, k_n)\, \mathrm{cut}_{n,k_n}(S) \cdot \Big( \big(c_{\mathrm{vol}}(n, k_n)\, \mathrm{vol}_{n,k_n}(C^+)\big)^{-1} + \big(c_{\mathrm{vol}}(n, k_n)\, \mathrm{vol}_{n,k_n}(C^-)\big)^{-1} \Big)$. [sent-105, score-0.936]
44 $c_{\mathrm{vol}}(n, k_n)\, \mathrm{vol}_{n,k_n}(C^+) \longrightarrow \int_{C^+} p(x)\,dx$, and the corresponding expression holds for C − . [sent-108, score-0.3]
45 $c_{\mathrm{cut}}(n, k_n)\, \mathrm{cut}_{n,k_n}(S) \longrightarrow \frac{2\eta_{d-1}}{(d+1)\,\eta_d^{1+1/d}} \int_S p^{1-1/d}(s)\,ds$. [sent-111, score-0.283]
46 For the kNN graph, if kn /n → 0 and kn / log n → ∞ for n → ∞, then $\mathbb{E}\left[ \frac{1}{n k_n} \Big(\frac{n}{k_n}\Big)^{1/d} \mathrm{cut}_{n,k_n}(S) \right] \longrightarrow \frac{2\eta_{d-1}}{d+1}\, \eta_d^{-1-1/d} \int_S p^{1-1/d}(s)\,ds$. [sent-114, score-0.783]
47 For the r-neighborhood graph, if rn > 0 and rn → 0 for n → ∞, then $\mathbb{E}\left[ \frac{\mathrm{cut}_{n,r_n}(S)}{n^2 r_n^{d+1}} \right] \longrightarrow \frac{2\eta_{d-1}}{d+1} \int_S p^{2}(s)\,ds$. [sent-115, score-0.6]
48 , n) denote the number of edges in the graph that start in point xi and end in some point on the other side of the cut surface S. [sent-122, score-0.629]
49 We consider a ball B(x, rn ) of radius rn around x (where rn is the current parameter of the r-neighborhood graph). [sent-128, score-0.714]
50 That is, setting $g(x, r_n) = \begin{cases} \mu\big(B(x, r_n) \cap C^+\big) & \text{if } x \in C^- \\ \mu\big(B(x, r_n) \cap C^-\big) & \text{if } x \in C^+ \end{cases}$ we have E(N1 |X1 = x) = (n − 1) g(x, rn ), since the number of points in the intersection of B(x, rn ) with the other side of the hyperplane is binomially distributed with parameters n − 1 and g(x, rn ). [sent-130, score-1.385]
51 Integrating this conditional expectation over all positions of the point x in Rd gives $\mathbb{E}\, \mathrm{cut}_{n,r_n}(S) = n(n-1) \int_{\mathbb{R}^d} g(x, r_n)\, p(x)\,dx$. [sent-131, score-0.2]
52 This leads to $n(n-1) \int_{\mathbb{R}^d} g(x, r_n)\, p(x)\,dx = n(n-1) \int_S \int_{-\infty}^{\infty} g(s + t\vec{n}, r_n)\, p(s + t\vec{n})\,dt\,ds$, where $\vec{n}$ denotes the unit normal of S at s. [sent-133, score-0.433]
53 First, if x is far enough from S (that is, dist(x, s) > rn for all s ∈ S), then g(x, rn ) = 0 and the corresponding terms in the integral vanish. [sent-135, score-0.4]
54 Second, if x is close to s ∈ S and the radius rn is small, then the density on the ball B(x, rn ) can be considered approximately homogeneous, that is p(y) ≈ p(s) for all y ∈ B(x, rn ). [sent-136, score-0.747]
55 Thus, $\int_{-\infty}^{\infty} g(s + t\vec{n}, r_n)\, p(s + t\vec{n})\,dt = \int_{-r_n}^{r_n} g(s + t\vec{n}, r_n)\, p(s + t\vec{n})\,dt \;\approx\; 2 \int_0^{r_n} p(s)\, \mathrm{vol}\big(B(s + t\vec{n}, r_n) \cap C^-\big)\, p(s)\,dt$. [sent-137, score-1.161]
56 It is not hard to see that $\mathrm{vol}\big(B(s + t\vec{n}, r_n) \cap C^-\big) = r_n^d\, A(t/r_n)$, where A(t/rn ) denotes the volume of the cap of the unit ball capped at distance t/rn . [sent-138, score-0.598]
57 Solving the integrals leads to $\int_0^{r_n} \mathrm{vol}\big(B(s + t\vec{n}, r_n) \cap C^-\big)\,dt = r_n^{d+1} \int_0^1 A(t)\,dt = r_n^{d+1}\, \frac{\eta_{d-1}}{d+1}$. Combining the steps above we obtain the result for the r-neighborhood graph. [sent-139, score-0.928]
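As a side note, the value of the cap integral used in the last step can be made explicit; the following short derivation (our reconstruction, writing the cap volume as a stack of (d−1)-dimensional balls) justifies the constant η_{d−1}/(d+1):

```latex
A(t) = \int_t^1 \eta_{d-1}\,(1-h^2)^{\frac{d-1}{2}}\,dh,
\qquad
\int_0^1 A(t)\,dt
 = \int_0^1\!\!\int_t^1 \eta_{d-1}(1-h^2)^{\frac{d-1}{2}}\,dh\,dt
 = \int_0^1 h\,\eta_{d-1}(1-h^2)^{\frac{d-1}{2}}\,dh
 = \frac{\eta_{d-1}}{d+1},
```

where the second equality swaps the order of integration (Fubini) and the last one uses the substitution u = 1 − h².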
58 We have to replace the radius rn by the k-nearest neighbor radius, that is, the distance of a data point to its kth nearest neighbor. [sent-142, score-0.333]
59 By a technical lemma one can show that for large n, under the condition kn / log n → ∞ we can replace the integration over the possible values of the kNN radius by its expectation. [sent-144, score-0.306]
60 Then we observe that as kn /n → 0, the expected kNN radius converges to 0, that is, for large n we only have to integrate over balls of homogeneous density. [sent-145, score-0.378]
61 Proposition 1 already shows one of the most important differences between the limits of the expected cut for the different graphs: for the r-graph we integrate over p² , while we integrate over p^{1−1/d} for the kNN graph. [sent-148, score-0.272]
62 This difference comes from the fact that the kNN-radius is a random quantity, which is not the case for the deterministically chosen radius rn in the r-graph. [sent-149, score-0.273]
63 For the kNN graph, if the dimension d = 1 then assume kn / √n → ∞; for d ≥ 2 assume kn / log n → ∞. [sent-151, score-0.496]
64 Then $\mathrm{vol}_{n,k_n}(C^+)/(n k_n) \to \int_{C^+} p(x)\,dx$ and $\mathrm{vol}_{n,k_n}(C^-)/(n k_n) \to \int_{C^-} p(x)\,dx$. For the r-neighborhood graph, let rn > 0, rn → 0 such that $n r_n^{d+1} \to \infty$ for n → ∞. [sent-156, score-1.068]
65 In the graph Gn,kn there are exactly kn outgoing edges from each node. [sent-172, score-0.412]
66 For the graph Gn,rn we decompose the volume into the contributions of all the points, and for a single point we condition on its location. [sent-174, score-0.386]
67 The number of outgoing edges, provided the point is at position x, is the number of other points in B(x, rn ), which is binomially distributed with parameters (n − 1) and µ(B(x, rn )). [sent-175, score-0.505]
68 If rn is sufficiently small we can approximate µ(B(x, rn )) by $\eta_d\, r_n^d\, p(x)$ under our conditions on the density. [sent-176, score-0.5]
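A quick Monte Carlo sanity check of this approximation (an illustration with an assumed standard Gaussian density in d = 2, evaluating the expected number of neighbors at the origin):

```python
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(1)
d, n, r = 2, 200_000, 0.05
X = rng.standard_normal((n, d))           # sample from a standard Gaussian

# empirical number of sample points in B(0, r) ...
in_ball = int(np.sum(np.linalg.norm(X, axis=1) <= r))
# ... versus the approximation (n - 1) * eta_d * r^d * p(0)
eta_d = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the d-dimensional unit ball
p0 = (2 * pi) ** (-d / 2)                 # Gaussian density at the origin
print(in_ball, "vs", (n - 1) * eta_d * r ** d * p0)
```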
69 In the literature, we only know of one other limit result for graph cuts, proved by Narayanan et al. [sent-179, score-0.489]
70 Here the authors study the case of a fully connected graph with Gaussian weights $w_t(x_i, x_j) = (4\pi t)^{-d/2} \exp\!\big(-\mathrm{dist}(x_i, x_j)^2 / (4t)\big)$. [sent-181, score-0.413]
71 Denoting the corresponding cut value by cutn,t , the authors show that if tn → 0 such that tn > 1/n^{1/(2d+2)} , then $\frac{\sqrt{\pi}}{n \sqrt{t_n}}\, \mathrm{cut}_{n,t_n} \to \int_S p(s)\,ds$ a.s. [sent-182, score-0.441]
72 Comparing this result to ours, we can see that it corroborates our finding: yet another graph leads to yet another limit result (for cut, as the authors did not study the Ncut criterion). [sent-184, score-0.637]
73 4 Examples where different limits of Ncut lead to different optimal cuts. In Theorem 1 we have proved that the kNN graph leads to a different limit functional for Ncut(S) than the r-neighborhood graph. [sent-185, score-0.575]
74 We will see that if we select an optimal cut based on the limit criterion for the kNN graph we can obtain a different result than if we use the limit criterion based on the r-neighborhood graph. [sent-187, score-0.777]
75 This shows that on finite data sets, different constructions of the graph can lead to systematic differences in the clustering results. [sent-189, score-0.63]
76 We first investigate the theoretical limit Ncut values for hyperplanes which cut perpendicular to the first dimension (which is the “informative” dimension of the data). [sent-211, score-0.423]
77 We sampled n = 2000 points from the given distributions and constructed the (unweighted) kNN graph (we tried a range of parameters k and r; our results are stable with respect to this choice). [sent-216, score-0.385]
78 Then we evaluated the empirical Ncut values for all hyperplanes which cut perpendicular to the informative dimension, as in the last paragraph. [sent-217, score-0.29]
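A sketch of this experiment (our own reconstruction, reusing the ncut and graph-construction helpers from the sketches above; the mixture parameters are placeholders, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# one informative dimension: a 1-d Gaussian mixture with unequal weights
X = np.where(rng.random((n, 1)) < 0.8,
             rng.normal(-1.0, 0.3, (n, 1)),
             rng.normal(1.0, 0.3, (n, 1)))

W_knn = directed_knn_graph(X, k=30)   # helpers sketched earlier
W_r = r_graph(X, r=0.3)

def empirical_ncut(W, t):
    mask = X[:, 0] > t                # hyperplane perpendicular to dimension 1
    if mask.all() or not mask.any():
        return np.inf
    return ncut(W, mask)

ts = np.linspace(-1.5, 1.5, 61)
print("kNN-graph minimizer:", ts[np.argmin([empirical_ncut(W_knn, t) for t in ts])])
print("r-graph   minimizer:", ts[np.argmin([empirical_ncut(W_r, t) for t in ts])])
```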
79 Instead of the directed kNN graph we used the undirected one, as standard spectral clustering is not defined for directed graphs. [sent-223, score-0.688]
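In code, the symmetrization plus off-the-shelf spectral clustering might look as follows (a sketch using scikit-learn, reusing W_knn from the previous sketch; this is not the authors' implementation):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# standard spectral clustering expects an undirected graph, so symmetrize
W_sym = np.maximum(W_knn, W_knn.T)
labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(W_sym)
```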
80 In our spectral clustering experiments we observed that the obtained clusterings are usually very close to the optimal hyperplane splits predicted by theory (the minimal matching distances to the optimal hyperplane splits were always on the order of 0. [sent-227, score-0.83]
81 As predicted by theory, both kinds of graph give different cuts in the data. [sent-229, score-0.423]
82 045 We can see that for the same graph, the clustering results are very stable (differences in the order of 10−3 ) whereas the differences between the kNN graph and the r-neighborhood graph are substantial (0. [sent-243, score-0.894]
83 The dashed blue vertical line depicts the optimal limit cut of the r-graph, the solid red vertical line the optimal limit cut of the kNN graph. [sent-252, score-0.496]
84 [Figure: empirical (emp) and predicted (pred) Ncut values of hyperplanes, for the kNN graph (d=1, n=2000, k=30; d=2, n=2000, k=100) and for the r-graph (d=1, n=2000, r=0.] [sent-253, score-0.252]
85 Then we ran spectral clustering on different subsamples of the data sets (n = 200 for breast cancer, n = 170 for heart). [sent-272, score-0.296]
86 To evaluate whether our clusterings were of any use at all, we computed the minimal matching distance between the clusterings and the true class labels and obtained distances of 0. [sent-273, score-0.322]
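The minimal matching distance — the smallest fraction of differing labels over all one-to-one matchings of the cluster labels — can be computed with the Hungarian algorithm; a small sketch (ours, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def minimal_matching_distance(a, b):
    """Smallest error rate between label vectors a and b over all label matchings."""
    la, lb = np.unique(a), np.unique(b)
    # agree[i, j] = number of points with label la[i] in a and lb[j] in b
    agree = np.array([[np.sum((a == u) & (b == v)) for v in lb] for u in la])
    rows, cols = linear_sum_assignment(-agree)   # maximize total agreement
    return 1.0 - agree[rows, cols].sum() / len(a)
```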
87 This shows that the clustering results differ considerably between the two kinds of graph (and compared to the expected random effects, this difference does not look random at all). [sent-311, score-0.523]
88 This experiment shows that for some data sets a systematic difference exists between the clusterings based on the different graph types. [sent-313, score-0.483]
89 [Figure 3: Results of spectral clustering in two dimensions, for the r-graph (left) and the kNN graph (right).] Different limit results might just be one potential reason, and other reasons might exist. [sent-324, score-0.675]
90 But whatever the reason is, it is interesting to observe these systematic differences between graph types in real data. [sent-325, score-0.426]
91 5 Discussion. In this paper we have investigated the influence of the graph construction on graph-based clustering measures such as the normalized cut. [sent-326, score-0.557]
92 In our paper, we computed the exact limit expressions for the r-neighborhood graph and the kNN graph. [sent-328, score-0.485]
93 2, yet a different limit result for a complete graph using Gaussian weights exists in the literature (Narayanan et al. [sent-329, score-0.489]
94 The fact that all these different graphs lead to different clustering criteria shows that these criteria cannot be studied in isolation from the graph they will be applied to. [sent-331, score-0.67]
95 It would be interesting to use these to determine an optimal choice of the connectivity parameter k or r of the graphs (we have already proved such results in a completely different graph clustering setting, cf. [sent-336, score-0.595]
96 For practice, it will be important to study how the different limit results influence clustering results. [sent-343, score-0.271]
97 The examples we provided above already show that different graphs can indeed lead to systematically different clusterings in practice. [sent-345, score-0.225]
98 Gaining more understanding of this effect will be an important direction of research if one wants to understand the nature of different graph clustering criteria. [sent-346, score-0.524]
99 Influence of graph construction on graph-based quality measures - technical supplement. [sent-365, score-0.372]
100 On the relation between low density separation, spectral clustering and graph cuts. [sent-372, score-0.611]
wordName wordTfidf (topN-words)
[('ncut', 0.499), ('knn', 0.453), ('graph', 0.35), ('kn', 0.233), ('rn', 0.2), ('dknn', 0.185), ('clustering', 0.153), ('cut', 0.151), ('maier', 0.132), ('tn', 0.127), ('clusterings', 0.098), ('limit', 0.097), ('hyperplanes', 0.095), ('vol', 0.095), ('rd', 0.086), ('nkn', 0.084), ('spectral', 0.075), ('radius', 0.073), ('graphs', 0.071), ('von', 0.069), ('breast', 0.068), ('luxburg', 0.067), ('cvol', 0.067), ('pred', 0.067), ('hyperplane', 0.065), ('emp', 0.059), ('dr', 0.057), ('directed', 0.055), ('cuts', 0.053), ('cancer', 0.052), ('surface', 0.051), ('binomially', 0.05), ('ccut', 0.05), ('minimal', 0.048), ('unweighted', 0.043), ('heart', 0.043), ('edges', 0.042), ('dx', 0.041), ('differences', 0.041), ('ball', 0.041), ('criterion', 0.041), ('dist', 0.04), ('narayanan', 0.04), ('expressions', 0.038), ('dim', 0.038), ('volume', 0.036), ('ds', 0.036), ('originating', 0.036), ('points', 0.035), ('criteria', 0.035), ('side', 0.035), ('systematic', 0.035), ('neighborhood', 0.034), ('proposition', 0.034), ('neighbor', 0.034), ('markus', 0.034), ('nrn', 0.034), ('ulrike', 0.034), ('dt', 0.033), ('density', 0.033), ('normalized', 0.032), ('convergence', 0.031), ('systematically', 0.03), ('dimension', 0.03), ('distances', 0.029), ('hypersurface', 0.029), ('uence', 0.029), ('limits', 0.028), ('induced', 0.027), ('supplement', 0.027), ('lead', 0.026), ('integrate', 0.026), ('homogeneous', 0.026), ('distance', 0.026), ('hein', 0.025), ('constructions', 0.025), ('mcdiarmid', 0.025), ('depending', 0.024), ('informative', 0.024), ('splits', 0.023), ('matching', 0.023), ('assumptions', 0.023), ('suitably', 0.023), ('sample', 0.022), ('construction', 0.022), ('sketch', 0.022), ('nite', 0.022), ('proved', 0.021), ('et', 0.021), ('xj', 0.021), ('yet', 0.021), ('wants', 0.021), ('clusters', 0.021), ('study', 0.021), ('converges', 0.02), ('kinds', 0.02), ('shi', 0.02), ('surfaces', 0.02), ('outgoing', 0.02), ('perpendicular', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed on toy and real data already for rather small sample sizes.
2 0.12589261 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
3 0.11488116 171 nips-2008-Online Prediction on Large Diameter Graphs
Author: Mark Herbster, Guy Lever, Massimiliano Pontil
Abstract: We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs. 1
4 0.10796262 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
5 0.10745534 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.10705766 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
8 0.098403879 102 nips-2008-ICA based on a Smooth Estimation of the Differential Entropy
9 0.098282665 218 nips-2008-Spectral Clustering with Perturbed Data
10 0.085407197 194 nips-2008-Regularized Learning with Networks of Features
11 0.082912594 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering
12 0.080791764 117 nips-2008-Learning Taxonomies by Dependence Maximization
13 0.07987643 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
14 0.075905874 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
15 0.074216686 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
16 0.069609478 44 nips-2008-Characteristic Kernels on Groups and Semigroups
17 0.068900354 219 nips-2008-Spectral Hashing
18 0.067787431 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph
19 0.063866258 40 nips-2008-Bounds on marginal probability distributions
20 0.062387444 193 nips-2008-Regularized Co-Clustering with Dual Supervision
topicId topicWeight
[(0, -0.155), (1, -0.03), (2, -0.043), (3, 0.068), (4, 0.073), (5, -0.141), (6, 0.088), (7, -0.051), (8, 0.082), (9, -0.21), (10, -0.015), (11, -0.015), (12, -0.046), (13, -0.045), (14, -0.105), (15, 0.02), (16, -0.088), (17, 0.083), (18, -0.05), (19, -0.015), (20, -0.06), (21, 0.028), (22, 0.144), (23, 0.104), (24, 0.034), (25, -0.012), (26, 0.029), (27, 0.013), (28, 0.05), (29, -0.073), (30, 0.008), (31, -0.084), (32, 0.001), (33, 0.056), (34, 0.043), (35, -0.006), (36, -0.007), (37, -0.099), (38, -0.062), (39, -0.016), (40, 0.025), (41, -0.097), (42, 0.111), (43, 0.014), (44, 0.101), (45, 0.039), (46, -0.082), (47, 0.126), (48, 0.078), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.96617866 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed on toy and real data already for rather small sample sizes.
2 0.66329598 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
3 0.62629175 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
Author: Benjamin Yackley, Eduardo Corona, Terran Lane
Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1
4 0.56705701 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph
Author: Deli Zhao, Xiaoou Tang
Abstract: Detecting underlying clusters from large-scale data plays a central role in machine learning research. In this paper, we tackle the problem of clustering complex data of multiple distributions and multiple scales. To this end, we develop an algorithm named Zeta l-links (Zell) which consists of two parts: Zeta merging with a similarity graph and an initial set of small clusters derived from local l-links of samples. More specifically, we propose to structurize a cluster using cycles in the associated subgraph. A new mathematical tool, Zeta function of a graph, is introduced for the integration of all cycles, leading to a structural descriptor of a cluster in determinantal form. The popularity character of a cluster is conceptualized as the global fusion of variations of such a structural descriptor by means of the leave-one-out strategy in the cluster. Zeta merging proceeds, in the hierarchical agglomerative fashion, according to the maximum incremental popularity among all pairwise clusters. Experiments on toy data clustering, imagery pattern clustering, and image segmentation show the competitive performance of Zell. The 98.1% accuracy, in the sense of the normalized mutual information (NMI), is obtained on the FRGC face data of 16028 samples and 466 facial clusters. 1
5 0.5613665 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
6 0.54784131 171 nips-2008-Online Prediction on Large Diameter Graphs
7 0.54326457 218 nips-2008-Spectral Clustering with Perturbed Data
8 0.54308408 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering
9 0.51513869 212 nips-2008-Skill Characterization Based on Betweenness
10 0.50661778 117 nips-2008-Learning Taxonomies by Dependence Maximization
11 0.49988469 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
12 0.48476151 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
13 0.43512774 219 nips-2008-Spectral Hashing
14 0.39474633 225 nips-2008-Supervised Bipartite Graph Inference
15 0.39246041 48 nips-2008-Clustering via LP-based Stabilities
16 0.38687739 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
17 0.38660571 236 nips-2008-The Mondrian Process
18 0.36517799 194 nips-2008-Regularized Learning with Networks of Features
19 0.34946582 193 nips-2008-Regularized Co-Clustering with Dual Supervision
20 0.33495641 104 nips-2008-Improved Moves for Truncated Convex Models
topicId topicWeight
[(6, 0.049), (7, 0.066), (12, 0.041), (28, 0.227), (33, 0.249), (57, 0.083), (59, 0.04), (63, 0.021), (64, 0.01), (71, 0.036), (77, 0.047), (83, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.831025 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed on toy and real data already for rather small sample sizes.
2 0.77659136 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
Author: Gabriele Schweikert, Gunnar Rätsch, Christian Widmer, Bernhard Schölkopf
Abstract: We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.
3 0.72106808 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
4 0.71709126 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
5 0.71684492 104 nips-2008-Improved Moves for Truncated Convex Models
Author: Philip Torr, M. P. Kumar
Abstract: We consider the problem of obtaining the approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose an improved st-MINCUT based move making algorithm. Unlike previous move making approaches, which either provide a loose bound or no bound on the quality of the solution (in terms of the corresponding Gibbs energy), our algorithm achieves the same guarantees as the standard linear programming (LP) relaxation. Compared to previous approaches based on the LP relaxation, e.g. interior-point algorithms or treereweighted message passing (TRW), our method is faster as it uses only the efficient st-MINCUT algorithm in its design. Furthermore, it directly provides us with a primal solution (unlike TRW and other related methods which solve the dual of the LP). We demonstrate the effectiveness of the proposed approach on both synthetic and standard real data problems. Our analysis also opens up an interesting question regarding the relationship between move making algorithms (such as α-expansion and the algorithms presented in this paper) and the randomized rounding schemes used with convex relaxations. We believe that further explorations in this direction would help design efficient algorithms for more complex relaxations.
6 0.7164858 231 nips-2008-Temporal Dynamics of Cognitive Control
7 0.71506006 40 nips-2008-Bounds on marginal probability distributions
8 0.7148065 138 nips-2008-Modeling human function learning with Gaussian processes
10 0.71350366 4 nips-2008-A Scalable Hierarchical Distributed Language Model
11 0.71327257 152 nips-2008-Non-stationary dynamic Bayesian networks
12 0.71285826 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
13 0.71262622 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
14 0.71159101 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
15 0.71154439 211 nips-2008-Simple Local Models for Complex Dynamical Systems
16 0.71118253 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
17 0.71021575 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
18 0.70933264 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
19 0.70932907 113 nips-2008-Kernelized Sorting
20 0.70904714 31 nips-2008-Bayesian Exponential Family PCA