nips nips2005 nips2005-178 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a simple clustering framework on graphs encoding pairwise data similarities. [sent-8, score-0.34]
2 Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. [sent-9, score-0.193]
3 More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. [sent-10, score-0.685]
4 A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i. [sent-11, score-0.401]
5 , a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. [sent-13, score-0.444]
6 , normalized cut [5]) requires no assumption on data densities but simply a similarity function, and usually partitions data exclusively into clusters. [sent-19, score-0.22]
7 In contrast, model-based methods apply mixture models to fit data distributions and assign data to clusters (i. [sent-20, score-0.228]
8 This soft clustering is often desired, as it encodes uncertainties on data-to-cluster assignments. [sent-23, score-0.368]
9 clusters have to be Gaussian-like in Gaussian mixture models (GMMs). [sent-26, score-0.228]
10 In contrast to flat clustering, hierarchical clustering makes intuitive sense by forming a tree of clusters. [sent-27, score-0.427]
11 This paper suggests a novel graph-factorization clustering (GFC) framework that employs data affinities and, at the same time, partitions data probabilistically. [sent-32, score-0.383]
12 A hierarchical clustering algorithm (HGFC) is further derived by merging lower-level clusters into higher-level ones. [sent-33, score-0.62]
13 Analysis based on graph random walks suggests that our clustering method models data affinities as empirical transitions generated by a mixture of latent factors. [sent-34, score-0.759]
14 This view significantly differs from conventional model-based clustering since here the mixture model is not directly for data objects but for their relations. [sent-35, score-0.366]
15 Interestingly, we prove that the higher-level clusters are associated with longer-term diffusive transitions on the graph, amounting to smoother and more global similarity functions on the data manifold. [sent-37, score-0.447]
16 To the best of our knowledge, this property has never been considered by other agglomerative hierarchical clustering algorithms (e. [sent-39, score-0.479]
17 In the following section we describe a clustering algorithm based on similarity graphs. [sent-43, score-0.404]
18 2 Graph-factorization clustering (GFC) Data similarity relations can be conveniently encoded by a graph, where vertices denote data objects and adjacency weights represent data similarities. [sent-50, score-0.764]
19 This section introduces graph factorization clustering, which is a probabilistic partition of graph vertices. [sent-51, score-0.429]
20 Formally, let G(V, E) be a weighted undirected graph with vertices V = {vi}, i = 1, ..., n, and edges E ⊆ {(vi, vj)}. [sent-52, score-0.55]
21 Let W = {wij } be the adjacency matrix, where wij = wji , wij > 0 if (vi , vj ) ∈ E and wij = 0 otherwise. [sent-53, score-1.081]
22 For instance, wij can be computed by the RBF similarity function based on the features of objects i and j, or by a binary indicator (0 or 1) of the k-nearest-neighbor affinity. [sent-54, score-0.351]
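As a concrete illustration of this step, the following Python sketch (not part of the paper; the function names, the numpy dependency, and the parameters sigma and k are our own illustrative choices) builds a dense RBF similarity matrix and a symmetric binary k-nearest-neighbor adjacency of the kind described above.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Dense RBF adjacency: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(W_dense, k=6):
    """Symmetric binary k-nearest-neighbor adjacency derived from a dense similarity matrix."""
    A = np.zeros_like(W_dense)
    for i in range(W_dense.shape[0]):
        A[i, np.argsort(-W_dense[i])[:k]] = 1.0   # keep the k largest similarities of row i
    return np.maximum(A, A.T)                     # symmetrize so that wij = wji
```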
23 1 Bipartite graphs Before presenting the main idea, it is necessary to introduce bipartite graphs. [sent-56, score-0.21]
24 Let B = {bip } denote the n × m adjacency matrix with bip ≥ 0 being the weight for edge [vi , up ]. [sent-61, score-0.386]
25 The bipartite graph K induces a similarity between vi and vj [6]: wij = Σ_{p=1}^{m} bip bjp / λp = (B Λ^{-1} B^T)_{ij}, Λ = diag(λ1, ..., λm), (1) [sent-62, score-1.184]
26 where λp = Σ_{i=1}^{n} bip denotes the degree of vertex up ∈ U. [sent-65, score-0.212]
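Eq. (1) amounts to a single matrix product. A minimal numpy sketch, assuming B is stored as a dense n × m array (the function name is ours):

```python
import numpy as np

def induced_similarity(B):
    """W = B Lambda^{-1} B^T with lambda_p = sum_i b_ip, i.e. w_ij = sum_p b_ip b_jp / lambda_p (Eq. 1)."""
    lam = B.sum(axis=0)          # degrees lambda_p of the vertices u_p in U
    return (B / lam) @ B.T       # dividing columns by lam implements B @ diag(1/lam)
```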
27 wij is essentially a quantity proportional to the stationary probability of direct transitions between vi and vj , denoted by p(vi , vj ). [sent-68, score-1.15]
28 Without loss of generality, we normalize W to ensure Σ_{ij} wij = 1 and wij = p(vi, vj). [sent-69, score-0.843]
29 For a bipartite graph K(V, U, F), there are no direct links between vertices in V, and all the paths from vi to vj must go through vertices in U. [sent-70, score-1.119]
30 This indicates p(vi, vj) = p(vi) p(vj|vi) = di Σ_p p(up|vi) p(vj|up) = Σ_p p(vi, up) p(up, vj) / λp, where p(vj|vi) is the conditional transition probability from vi to vj, and di = p(vi) is the degree of vi. [sent-71, score-1.406]
31 2 Graph factorization by bipartite graph construction For a bipartite graph K, p(up|vi) = bip/di gives the conditional probability of transitions from vi to up. [sent-75, score-1.356]
32 This property suggests that one can construct a bipartite graph K(V, U, F) to approximate a given G(V, E), and then obtain a soft clustering structure, where U corresponds to clusters (see Fig. 1). [sent-77, score-0.947]
33 Figure 1: (a) The original graph representing data affinities; (b) the bipartite graph representing data-to-cluster relations; (c) the induced cluster affinities. [sent-79, score-0.672]
34 Eq. (1) suggests that this approximation can be done by minimizing ℓ(W, BΛ^{-1}B^T), given a distance ℓ(·, ·) between two adjacency matrices. [sent-81, score-0.255]
35 To make the problem easy to solve, we remove the coupling between B and Λ via H = BΛ^{-1}, and then have min_{H,Λ} ℓ(W, HΛH^T) s.t. Σ_i hip = 1, H ∈ R_+^{n×m}, Λ ∈ D_+^{m×m}. (2) [sent-82, score-0.19]
36 For the divergence distance ℓ(X, Y) = Σ_ij (xij log(xij/yij) − xij + yij), the cost function in Eq. [sent-89, score-0.295]
37 (2) is non-increasing under the update rules (˜ denotes updated quantities): h̃ip ∝ hip Σ_j wij λp hjp / (HΛH^T)_ij, normalized s.t. Σ_i h̃ip = 1, (3) and λ̃p ∝ λp Σ_ij wij hip hjp / (HΛH^T)_ij. (4) [sent-90, score-1.493]
38 The distance is invariant under the update if and only if H and Λ are at a stationary point. [sent-95, score-0.187]
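The update rules translate directly into a few lines of numpy. The sketch below is our own reading of Eqs. (2)-(4), not the authors' code; the random initialization, iteration count, and eps safeguard are illustrative choices, and W is assumed symmetric, non-negative, and normalized to sum to one so that Σ_p λp = 1 during the iterations.

```python
import numpy as np

def gfc(W, m, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates minimizing the divergence between W and H Lambda H^T (Eqs. 2-4).

    Returns H (n x m, columns summing to 1) and the cluster degrees lam (summing to 1).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    H = rng.random((n, m)) + eps
    H /= H.sum(axis=0, keepdims=True)             # enforce sum_i h_ip = 1
    lam = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        HL = H * lam                              # H Lambda (columns scaled by lam_p)
        R = W / np.maximum(HL @ H.T, eps)         # ratios w_ij / (H Lambda H^T)_ij
        H_new = H * (R @ HL)                      # Eq. (3): h_ip * sum_j r_ij lam_p h_jp
        H_new /= np.maximum(H_new.sum(axis=0, keepdims=True), eps)
        lam = lam * np.einsum('ij,ip,jp->p', R, H, H)   # Eq. (4): lam_p * sum_ij r_ij h_ip h_jp
        lam /= lam.sum()
        H = H_new
    return H, lam
```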
39 Similar to GMM, p(up|vi) = bip / Σ_q biq is the soft probabilistic assignment of vertex vi to cluster up. [sent-97, score-0.732]
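A hypothetical helper for this posterior, given the H and Λ returned by the sketch above (B = HΛ, so the row sums of B are the degrees di):

```python
def soft_assignments(H, lam):
    """Posterior p(u_p | v_i) = b_ip / sum_q b_iq, with B = H Lambda; rows sum to 1."""
    B = H * lam
    return B / B.sum(axis=1, keepdims=True)
```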
40 , for a k-nearest-neighbor graph the complexity O(m^2 nk) scales linearly with the sample size n). [sent-102, score-0.192]
41 3 Hierarchical graph-factorization clustering (HGFC) As a nice property of the proposed graph factorization, a natural affinity between two clusters up and uq can be computed as p(up, uq) = Σ_{i=1}^{n} bip biq / di = (B^T D^{-1} B)_{pq}, D = diag(d1, ..., dn). (5) [sent-103, score-0.965]
42 Note that the similarity between clusters p and q takes into account a weighted average of contributions from all the data (see Fig. 1(c)). [sent-108, score-0.297]
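Eq. (5) is again one matrix product; the sketch below (with an invented function name) takes the n × m bipartite adjacency B of the current level and returns the m × m adjacency among clusters:

```python
import numpy as np

def cluster_affinity(B):
    """W_next = B^T D^{-1} B with d_i = sum_p b_ip (Eq. 5): pairwise affinities among clusters."""
    d = B.sum(axis=1)                 # degrees d_i of the data vertices
    return B.T @ (B / d[:, None])     # (p, q) entry: sum_i b_ip b_iq / d_i
```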
43 Let G0(V0, E0) be the initial graph describing the similarities of all m0 = n data points, with adjacency matrix W0. [sent-110, score-0.538]
44 Based on G0 we can build a bipartite graph K1 (V0 , V1 , F1 ), with m1 < m0 vertices in V1 . [sent-111, score-0.487]
45 A hierarchical clustering method can be motivated from the observation that the cluster similarity in Eq. [sent-112, score-0.62]
46 (5) suggests a new adjacency matrix W1 for graph G1 (V1 , E1 ), where V1 is formed by clusters, and E1 contains edges connecting these clusters. [sent-113, score-0.527]
47 Then we can group those clusters by constructing another bipartite graph K2 (V1 , V2 , F2 ) with m2 < m1 vertices in V2 , such that W1 is again factorized as in Eq. [sent-114, score-0.657]
48 Algorithm 1 Hierarchical Graph-Factorization Clustering (HGFC) Require: n data objects and a similarity measure 1: build the similarity graph G0(V0, E0) with adjacency matrix W0, and let m0 = n 2: for l = 1, 2, ..., factorize Wl−1 as in Eq. (2) to obtain the bipartite graph Kl with adjacency Bl = Hl Λl and ml < ml−1 clusters, and form the cluster-level adjacency Wl via Eq. (5). [sent-118, score-0.702]
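A possible reading of Algorithm 1, reusing the gfc and cluster_affinity sketches above (the list of cluster counts per level is supplied by the user, as in the paper's experiments; the re-normalization step is our own numerical safeguard):

```python
def hgfc(W0, level_sizes, n_iter=200):
    """Hierarchical GFC: factorize W_{l-1}, build B_l = H_l Lambda_l, move to the cluster graph W_l."""
    W = W0 / W0.sum()                     # normalization convention: entries of W sum to 1
    Bs = []
    for m in level_sizes:                 # e.g. level_sizes = [100, 50, 20, 10, 4]
        H, lam = gfc(W, m, n_iter=n_iter)
        B = H * lam                       # bipartite adjacency between V_{l-1} and V_l
        Bs.append(B)
        W = cluster_affinity(B)           # adjacency among the new clusters (Eq. 5)
        W = W / W.sum()                   # re-normalize for numerical safety
    return Bs                             # one bipartite adjacency per level
```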
49 For level l, we can assign data to the obtained ml clusters via a propagation from the bottom level of clusters. [sent-122, score-0.336]
50 The soft (i.e., probabilistic) assignment of vi ∈ V0 to cluster vp^(l) ∈ Vl is given by p(vp^(l)|vi) = Σ_{v^(l−1) ∈ V_{l−1}} ··· Σ_{v^(1) ∈ V_1} p(vp^(l)|v^(l−1)) ··· p(v^(1)|vi) = (D1^{-1} B̄l)_{ip}, (6) where B̄l = B1 D2^{-1} B2 D3^{-1} B3 ··· Dl^{-1} Bl. [sent-125, score-0.588]
51 One can interpret this by deriving an equivalent bipartite graph K̄l(V0, Vl, F̄l), and treating B̄l as the equivalent adjacency matrix attached to the equivalent edges F̄l connecting data V0 and clusters Vl. [sent-129, score-0.866]
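A sketch of Eq. (6): chaining the row-normalized bipartite adjacencies Dl^{-1} Bl yields the soft assignment of every original vertex to the clusters at level l (the list Bs is assumed to come from the hierarchical procedure sketched above):

```python
import numpy as np

def cumulative_assignment(Bs):
    """P[i, p] = p(level-l cluster p | vertex i) = (D1^{-1} B1 D2^{-1} B2 ... Dl^{-1} Bl)_{ip} (Eq. 6)."""
    P = np.eye(Bs[0].shape[0])
    for B in Bs:
        P = P @ (B / B.sum(axis=1, keepdims=True))   # multiply by the row-stochastic D_l^{-1} B_l
    return P                                         # rows sum to 1
```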
52 1 Analysis of the proposed algorithms Flat clustering: statistical modeling of single-hop transitions In this section we provide some insights into the suggested clustering algorithm, mainly from the perspective of random walks on graphs. [sent-131, score-0.508]
53 Suppose that from a stationary stage of random walks on G(V, E), one observes πij single-hop transitions between vi and vj in a unitary time frame. [sent-132, score-0.786]
54 Thus we connect the observed similarities to the frequency of transitions via wij ∝ πij . [sent-134, score-0.416]
55 If the observed transitions are sampled from a true distribution p(vi, vj) = (HΛH^T)_{ij} where a bipartite graph is behind, then the log likelihood with respect to the observed transitions is L(H, Λ) = log Π_{ij} p(vi, vj)^{πij} ∝ Σ_{ij} wij log(HΛH^T)_{ij}. (7) [sent-138, score-1.356]
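For reference, this likelihood can be monitored during the multiplicative updates as a convergence check; a minimal sketch (our own helper, with an eps guard against log 0):

```python
import numpy as np

def log_likelihood(W, H, lam, eps=1e-12):
    """L(H, Lambda) = sum_ij w_ij log (H Lambda H^T)_ij, cf. Eq. (7)."""
    V = (H * lam) @ H.T
    return float(np.sum(W * np.log(np.maximum(V, eps))))
```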
56 For a weighted undirected graph G(V, E) and the log likelihood defined in Eq. [sent-141, score-0.221]
57 Figure 2: The similarities of vertices to a fixed vertex (marked in the left panel) on a 6-nearest-neighbor graph, respectively induced by clustering level l = 2 (the middle panel) and l = 6 (the right panel). [sent-145, score-0.636]
58 2 Hierarchical clustering: statistical modeling of multi-hop transitions The adjacency matrix W0 of G0 (V0 , E0 ) only models one-hop transitions that follow direct links from vertices to their neighbors. [sent-148, score-0.602]
59 Obviously, multihop transitions induce a slowly decaying similarity function on the graph. [sent-151, score-0.272]
60 Based on the chain rule of the Markov process, the equivalent adjacency matrix for t-hop transitions is At = W0 (D0^{-1} W0)^{t−1} = A_{t−1} D0^{-1} W0. (8) [sent-152, score-0.374]
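Eq. (8) as a short numpy sketch (function name ours); note that multiplying by D0^{-1} on the right divides the columns of the running product by the vertex degrees:

```python
import numpy as np

def multihop_adjacency(W0, t):
    """Equivalent adjacency for t-hop transitions: A_t = W0 (D0^{-1} W0)^{t-1} (Eq. 8)."""
    d = W0.sum(axis=1)                    # degrees under W0
    A = W0.copy()
    for _ in range(t - 1):
        A = (A / d[None, :]) @ W0         # A_{s+1} = A_s D0^{-1} W0
    return A
```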
61 Generally speaking, a slowly decaying similarity function on the similarity graph captures a global affinity structure of data manifolds, while a rapidly decaying similarity function only tells the local affinity structure. [sent-153, score-0.612]
62 The following proposition states that in the suggested HGFC, a higher-level clustering implicitly employs a more global similarity measure caused by multi-hop Markov random walks: Proposition 4. [sent-154, score-0.501]
63 For a given hierarchical clustering structure that starts from a bottom graph G0(V0, E0) to a higher level Gk(Vk, Ek), the vertices Vl at level 0 < l ≤ k induce an equivalent adjacency matrix of V0, which is At with t = 2^{l−1} as defined in Eq. [sent-156, score-1.083]
64 Therefore the presented hierarchical clustering algorithm HGFC applies different sizes of time windows to examine random walks, and derives different scales of similarity measures to expose the local and global clustering structures of data manifolds. [sent-158, score-0.855]
65 Fig. 2 illustrates the employed similarities of vertices to a fixed vertex in clustering levels l = 2 and 6, which correspond to time periods t = 2 and 32. [sent-160, score-0.55]
66 It can be seen that for a short period t = 2, the similarity is very local and helps to uncover low-level clusters, while in a longer period t = 32 the similarity function is rather global. [sent-161, score-0.28]
67 We compare HGFC with two popular agglomerative hierarchical clustering algorithms, single link and complete link (e. [sent-169, score-0.625]
68 Both methods merge the two closest clusters at each step. [sent-172, score-0.235]
69 Figure 4: Comparison of clustering methods on USPS (left) and Newsgroup (right), evaluated by normalized mutual information (NMI). [sent-175, score-0.364]
70 Single link defines the cluster distance to be the smallest point-wise distance between two clusters, while complete link uses the largest one. [sent-177, score-0.291]
71 Before showing the comparison, we visualize a part of the clustering results for the USPS data in Fig. 3. [sent-181, score-0.3]
72 At the top of the left figure, we show the top three levels of the hierarchy with 4, 10 and 20 clusters, respectively, where each cluster is represented by its mean image, computed as an average over all the images weighted by their posterior probabilities of belonging to this cluster. [sent-183, score-0.228]
73 Then 10 randomly sampled digits with soft cluster assignments to the top 3rd level clusters are illustrated with a Hinton graph. [sent-184, score-0.535]
74 3 show the assignments between clusters across the hierarchy. [sent-186, score-0.242]
75 “1” “2” “3” “4” 635 2 2 10 Normalized cut 630 1 3 4 744 179 1 817 4 6 1 835 1254 1 1 4 HGFC 3 8 886 33 4 816 8 2 4 9 3 838 1265 17 10 58 K-means 1 0 720 95 9 796 20 0 3 97 9 774 Table 1: Confusion matrices of clustering results, 4 clusters, USPS data. [sent-188, score-0.387]
76 baseball hockey Normalized cut 858 98 30 79 893 16 44 33 875 11 8 893 2 5 40 85 772 42 15 7 HGFC 182 13 934 5 33 843 21 11 21 12 101 958 977 985 39 16 K-means 7 4 3 5 835 114 4 900 0 0 4 77 Table 2: Confusion matrices of clustering results, 4 clusters, Newsgroup data. [sent-191, score-0.427]
77 We compare the clustering methods by evaluating the normalized mutual information (NMI) in Fig. [sent-193, score-0.364]
78 It is defined to be the mutual information between clusters and true classes, normalized by the maximum of marginal entropies. [sent-195, score-0.257]
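A self-contained sketch of this score, implementing the stated definition (our own code, not the authors' evaluation script; natural logarithms are an arbitrary but harmless choice since they cancel in the ratio):

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """NMI = I(clusters; classes) / max(H(classes), H(clusters))."""
    y = np.unique(np.asarray(labels_true), return_inverse=True)[1]
    c = np.unique(np.asarray(labels_pred), return_inverse=True)[1]
    n = y.size
    J = np.zeros((y.max() + 1, c.max() + 1))
    for yi, ci in zip(y, c):
        J[yi, ci] += 1                    # joint counts of (class, cluster)
    P = J / n
    py, pc = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / (py[:, None] * pc[None, :])[nz]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    hc = -np.sum(pc[pc > 0] * np.log(pc[pc > 0]))
    return mi / max(hy, hc)
```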
79 Moreover, in order to more directly assess the clustering quality, we also illustrate the confusion matrices in Table 1 and Table 2, in the case of producing 4 clusters. [sent-196, score-0.433]
80 We omit the confusion matrices of single link and complete link from the tables, to save space and also because of their clearly poor performance compared with the others. [sent-197, score-0.279]
81 For the Newsgroup data it even gets stuck at the 3601st merge because all the similarities between clusters are 0. [sent-200, score-0.309]
82 Top-down hierarchical normalized cut obtains reasonable results, but sometimes cannot split one big cluster (see the tables). [sent-201, score-0.305]
83 It also produces confusion matrices with clear diagonal structures (see tables 1 and 2), which indicates a very good clustering quality. [sent-205, score-0.519]
84 6 Conclusion and Future Work In this paper we have proposed a probabilistic graph partition method for clustering data objects based on their pairwise similarities. [sent-206, score-0.523]
85 A novel hierarchical clustering algorithm HGFC has been derived, where a higher level in HGFC corresponds to a statistical model of random walk transitions in a longer period, giving rise to a more global clustering structure. [sent-207, score-0.974]
86 In this paper we have empirically specified the number of clusters in each level. [sent-209, score-0.193]
87 Another direction is hierarchical clustering on directed graphs, as well as its applications in web mining. [sent-211, score-0.427]
88 We first notice that Σ_p λp = Σ_ij wij under the constraints Σ_i hip = 1. [sent-214, score-0.54]
89 Therefore we can normalize W by Σ_ij wij and, after convergence, multiply all λp by this quantity to get the solution. [sent-215, score-0.397]
90 Under this assumption we are maximizing L(H, Λ) = Σ_ij wij log(HΛH^T)_ij with an extra constraint Σ_p λp = 1. [sent-216, score-0.35]
91 Define f(H, H*) = Σ_ij wij Σ_p [h*_ip λp h*_jp / (Σ_l h*_il λl h*_jl)] [log(hip λp hjp) − log(h*_ip λp h*_jp / (Σ_l h*_il λl h*_jl))]. [sent-221, score-0.958]
92 (3) by setting the derivative of f with respect to hip to be zero. [sent-224, score-0.19]
93 In the E-step we estimate the a posteriori probability of taking up for the pair (vi, vj) using Bayes’ rule: p̂(up|vi, vj) ∝ p(vi|up) p(vj|up) p(up). [sent-233, score-0.46]
94 And then in the M-step we maximize the “complete” data likelihood L̂(G) = Σ_ij wij Σ_p p̂(up|vi, vj) log[p(vi|up) p(vj|up) p(up)] with respect to the model parameters hip = p(vi|up) and λp = p(up), with constraints Σ_i hip = 1 and Σ_p λp = 1. [sent-234, score-0.989]
95 By setting the corresponding derivatives to zero we obtain hip ∝ Σ_j wij p̂(up|vi, vj) and λp ∝ Σ_ij wij p̂(up|vi, vj). [sent-235, score-1.216]
96 (6)) with adjacency matrix B̄l, degrees D0 for V0, and degrees Λ̄l for Vl. [sent-243, score-0.292]
97 In this case the induced adjacency matrix of V0 is W̄l = B̄l Λ̄l^{-1} B̄l^T, and the adjacency matrix of Vl is Wl = B̄l^T D0^{-1} B̄l. [sent-244, score-0.277]
98 Let Kl+1(Vl, Vl+1, Fl+1) be the bipartite graph connecting Vl and Vl+1, with the adjacency Bl+1 and degrees Λl+1 for Vl+1. [sent-245, score-0.872]
99 Then the adjacency matrix of V0 induced by level l+1 is W̄l+1 = B̄l Λ̄l^{-1} Bl+1 Λl+1^{-1} Bl+1^T Λ̄l^{-1} B̄l^T = W̄l D0^{-1} W̄l, where Wl = B̄l^T D0^{-1} B̄l and W̄l = B̄l Λ̄l^{-1} B̄l^T are applied. [sent-246, score-0.334]
100 Interpreting and extending classical agglomerative clustering algorithms using a model-based approach. [sent-269, score-0.352]
wordName wordTfidf (topN-words)
[('vi', 0.323), ('clustering', 0.3), ('bl', 0.29), ('hgfc', 0.278), ('vj', 0.23), ('wij', 0.216), ('vl', 0.208), ('adjacency', 0.203), ('clusters', 0.193), ('graph', 0.192), ('hip', 0.19), ('bipartite', 0.17), ('bip', 0.138), ('ij', 0.134), ('hierarchical', 0.127), ('transitions', 0.126), ('wl', 0.105), ('similarity', 0.104), ('vertices', 0.102), ('confusion', 0.097), ('ht', 0.09), ('cluster', 0.089), ('usps', 0.088), ('walks', 0.082), ('fl', 0.079), ('similarities', 0.074), ('vertex', 0.074), ('link', 0.073), ('af', 0.072), ('newsgroup', 0.069), ('soft', 0.068), ('hjp', 0.06), ('level', 0.057), ('nmi', 0.052), ('agglomerative', 0.052), ('cut', 0.051), ('hinton', 0.049), ('assignments', 0.049), ('vp', 0.047), ('nities', 0.047), ('normalize', 0.047), ('matrix', 0.045), ('factorization', 0.045), ('decaying', 0.042), ('merge', 0.042), ('proposition', 0.041), ('digits', 0.04), ('walk', 0.04), ('autos', 0.04), ('baseball', 0.04), ('biq', 0.04), ('hil', 0.04), ('hjl', 0.04), ('uq', 0.04), ('graphs', 0.04), ('top', 0.039), ('nity', 0.039), ('normalized', 0.038), ('connecting', 0.037), ('period', 0.036), ('matrices', 0.036), ('kl', 0.036), ('mixture', 0.035), ('hierarchy', 0.035), ('ip', 0.035), ('gfc', 0.035), ('exposes', 0.035), ('jp', 0.035), ('employs', 0.032), ('tables', 0.032), ('objects', 0.031), ('induced', 0.029), ('log', 0.029), ('ml', 0.029), ('diagonal', 0.028), ('gl', 0.028), ('resolutions', 0.028), ('distance', 0.028), ('partitions', 0.027), ('edges', 0.026), ('xij', 0.026), ('yij', 0.026), ('mutual', 0.026), ('indicates', 0.026), ('images', 0.026), ('divergence', 0.026), ('stationary', 0.025), ('global', 0.024), ('suggests', 0.024), ('dm', 0.024), ('totally', 0.024), ('relations', 0.024), ('gradually', 0.023), ('diffusion', 0.023), ('correspond', 0.023), ('build', 0.023), ('panel', 0.023), ('auxiliary', 0.023), ('di', 0.022), ('degrees', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
2 0.22431719 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
3 0.18213373 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
4 0.17809971 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
5 0.16882835 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.15478024 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
7 0.15218173 159 nips-2005-Q-Clustering
8 0.14732553 127 nips-2005-Mixture Modeling by Affinity Propagation
9 0.1429251 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
10 0.13798426 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks
11 0.1375024 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
12 0.10473528 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
13 0.098247655 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
14 0.092769556 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
15 0.090003915 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
16 0.087995686 84 nips-2005-Generalization in Clustering with Unobserved Features
17 0.086056724 105 nips-2005-Large-Scale Multiclass Transduction
18 0.085960306 75 nips-2005-Fixing two weaknesses of the Spectral Method
19 0.082807414 50 nips-2005-Convex Neural Networks
20 0.078840546 167 nips-2005-Robust design of biological experiments
topicId topicWeight
[(0, 0.227), (1, 0.128), (2, -0.111), (3, -0.091), (4, -0.402), (5, -0.101), (6, -0.011), (7, -0.014), (8, -0.207), (9, -0.016), (10, -0.085), (11, 0.057), (12, -0.038), (13, 0.039), (14, 0.106), (15, 0.002), (16, 0.098), (17, 0.044), (18, 0.061), (19, 0.077), (20, -0.091), (21, 0.12), (22, -0.095), (23, -0.012), (24, 0.001), (25, -0.057), (26, -0.005), (27, 0.026), (28, -0.045), (29, -0.009), (30, -0.032), (31, 0.004), (32, -0.071), (33, 0.024), (34, 0.075), (35, 0.016), (36, -0.059), (37, -0.079), (38, 0.095), (39, 0.098), (40, 0.081), (41, -0.019), (42, 0.077), (43, -0.045), (44, -0.034), (45, -0.082), (46, -0.097), (47, -0.038), (48, 0.051), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.97929341 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
2 0.6695019 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
3 0.64201945 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
4 0.62220258 159 nips-2005-Q-Clustering
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters for any given k in polynomial time for this criterion can be computed. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and approximate partitioning (guaranteed to be within a factor of 2 of the the optimal) for more clusters can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application specific criterion for which efficient algorithm are not known.
5 0.61829883 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.59678078 177 nips-2005-Size Regularized Cut for Data Clustering
7 0.54381913 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
8 0.52338713 84 nips-2005-Generalization in Clustering with Unobserved Features
9 0.51021391 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
10 0.50017464 127 nips-2005-Mixture Modeling by Affinity Propagation
11 0.43088377 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
12 0.39895383 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
13 0.38775721 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks
14 0.37258062 75 nips-2005-Fixing two weaknesses of the Spectral Method
15 0.36296821 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
16 0.35705248 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
17 0.34414619 46 nips-2005-Consensus Propagation
18 0.32847732 167 nips-2005-Robust design of biological experiments
19 0.31833711 105 nips-2005-Large-Scale Multiclass Transduction
20 0.30817801 71 nips-2005-Fast Krylov Methods for N-Body Learning
topicId topicWeight
[(3, 0.081), (10, 0.034), (15, 0.238), (27, 0.028), (31, 0.074), (34, 0.11), (41, 0.046), (55, 0.044), (65, 0.014), (69, 0.041), (73, 0.067), (88, 0.065), (91, 0.055)]
simIndex simValue paperId paperTitle
1 0.8705762 127 nips-2005-Mixture Modeling by Affinity Propagation
Author: Brendan J. Frey, Delbert Dueck
Abstract: Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches include iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pair-wise clustering algorithms need not compute sufficient statistics and avoid poor solutions by directly placing similar examples in the same cluster. However, many applications require that each cluster of data be accurately described by a prototype or model, so affinity-based clustering – and its benefits – cannot be directly realized. We describe a technique called “affinity propagation”, which combines the advantages of both approaches. The method learns a mixture model of the data by recursively propagating affinity messages. We demonstrate affinity propagation on the problems of clustering image patches for image segmentation and learning mixtures of gene expression models from microarray data. We find that affinity propagation obtains better solutions than mixtures of Gaussians, the K-medoids algorithm, spectral clustering and hierarchical clustering, and is both able to find a pre-specified number of clusters and is able to automatically determine the number of clusters. Interestingly, affinity propagation can be viewed as belief propagation in a graphical model that accounts for pairwise training case likelihood functions and the identification of cluster centers. 1
same-paper 2 0.85449219 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
3 0.61729813 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
4 0.59880066 184 nips-2005-Structured Prediction via the Extragradient Method
Author: Ben Taskar, Simon Lacoste-Julian, Michael I. Jordan
Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1
5 0.59731507 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.59640849 78 nips-2005-From Weighted Classification to Policy Search
7 0.59343928 177 nips-2005-Size Regularized Cut for Data Clustering
8 0.59245151 47 nips-2005-Consistency of one-class SVM and related algorithms
9 0.59043616 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
10 0.59005278 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
11 0.58854663 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
12 0.58785242 58 nips-2005-Divergences, surrogate loss functions and experimental design
13 0.58769935 102 nips-2005-Kernelized Infomax Clustering
14 0.58622152 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
15 0.58569515 112 nips-2005-Learning Minimum Volume Sets
16 0.58341449 46 nips-2005-Consensus Propagation
17 0.58324325 144 nips-2005-Off-policy Learning with Options and Recognizers
18 0.58273262 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
19 0.58248502 105 nips-2005-Large-Scale Multiclass Transduction
20 0.58103079 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs