nips nips2005 nips2005-178 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a simple clustering framework on graphs encoding pairwise data similarities. [sent-8, score-0.34]
2 Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. [sent-9, score-0.193]
3 More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. [sent-10, score-0.685]
4 A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i. [sent-11, score-0.401]
5 , a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. [sent-13, score-0.444]
6 , normalized cut [5]) requires no assumption on data densities but simply a similarity function, and usually partitions data exclusively into clusters. [sent-19, score-0.22]
7 In contrast, model-based methods apply mixture models to fit data distributions and assign data to clusters (i. [sent-20, score-0.228]
8 This soft clustering is often desired, as it encodes uncertainties on data-to-cluster assignments. [sent-23, score-0.368]
9 clusters have to be Gaussian-like in Gaussian mixture models (GMMs). [sent-26, score-0.228]
10 In contrast to flat clustering, hierarchical clustering makes intuitive sense by forming a tree of clusters. [sent-27, score-0.427]
11 This paper suggests a novel graph-factorization clustering (GFC) framework that employs data affinities and, at the same time, partitions data probabilistically. [sent-32, score-0.383]
12 A hierarchical clustering algorithm (HGFC) is further derived by merging lower-level clusters into higher-level ones. [sent-33, score-0.62]
13 Analysis based on graph random walks suggests that our clustering method models data affinities as empirical transitions generated by a mixture of latent factors. [sent-34, score-0.759]
14 This view significantly differs from conventional model-based clustering since here the mixture model is not directly for data objects but for their relations. [sent-35, score-0.366]
15 Interestingly, we prove that the higher-level clusters are associated with longer-term diffusive transitions on the graph, amounting to smoother and more global similarity functions on the data manifold. [sent-37, score-0.447]
16 To the best of our knowledge, this property has never been considered by other agglomerative hierarchical clustering algorithms (e. [sent-39, score-0.479]
17 In the following section we describe a clustering algorithm based on similarity graphs. [sent-43, score-0.404]
18 2 Graph-factorization clustering (GFC) Data similarity relations can be conveniently encoded by a graph, where vertices denote data objects and adjacency weights represent data similarities. [sent-50, score-0.764]
19 This section introduces graph factorization clustering, which is a probabilistic partition of graph vertices. [sent-51, score-0.429]
20 Formally, let G(V, E) be a weighted undirected graph with vertices V = {vi}, i = 1, ..., n, and edges E ⊆ {(vi, vj)}. [sent-52, score-0.55]
21 Let W = {wij } be the adjacency matrix, where wij = wji , wij > 0 if (vi , vj ) ∈ E and wij = 0 otherwise. [sent-53, score-1.081]
22 For instance, wij can be computed by the RBF similarity function based on the features of objects i and j, or by a binary indicator (0 or 1) of the k-nearest-neighbor affinity. [sent-54, score-0.351]
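As a concrete illustration of this step, the following Python sketch (not part of the paper; the function names, the numpy dependency, and the parameters sigma and k are our own illustrative choices) builds a dense RBF similarity matrix and a symmetric binary k-nearest-neighbor adjacency of the kind described above.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Dense RBF adjacency: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(W_dense, k=6):
    """Symmetric binary k-nearest-neighbor adjacency derived from a dense similarity matrix."""
    A = np.zeros_like(W_dense)
    for i in range(W_dense.shape[0]):
        A[i, np.argsort(-W_dense[i])[:k]] = 1.0   # keep the k largest similarities of row i
    return np.maximum(A, A.T)                     # symmetrize so that wij = wji
```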
23 1 Bipartite graphs Before presenting the main idea, it is necessary to introduce bipartite graphs. [sent-56, score-0.21]
24 Let B = {bip } denote the n × m adjacency matrix with bip ≥ 0 being the weight for edge [vi , up ]. [sent-61, score-0.386]
25 The bipartite graph K induces a similarity between vi and vj [6]: wij = Σ_{p=1}^{m} bip bjp / λp = (B Λ^{-1} B^T)_{ij}, Λ = diag(λ1, ..., λm), (1) [sent-62, score-1.184]
26 where λp = Σ_{i=1}^{n} bip denotes the degree of vertex up ∈ U. [sent-65, score-0.212]
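Eq. (1) amounts to a single matrix product. A minimal numpy sketch, assuming B is stored as a dense n × m array (the function name is ours):

```python
import numpy as np

def induced_similarity(B):
    """W = B Lambda^{-1} B^T with lambda_p = sum_i b_ip, i.e. w_ij = sum_p b_ip b_jp / lambda_p (Eq. 1)."""
    lam = B.sum(axis=0)          # degrees lambda_p of the vertices u_p in U
    return (B / lam) @ B.T       # dividing columns by lam implements B @ diag(1/lam)
```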
27 wij is essentially a quantity proportional to the stationary probability of direct transitions between vi and vj , denoted by p(vi , vj ). [sent-68, score-1.15]
28 Without loss of generality, we normalize W to ensure Σ_{ij} wij = 1 and wij = p(vi, vj). [sent-69, score-0.843]
29 For a bipartite graph K(V, U, F), there are no direct links between vertices in V, and all the paths from vi to vj must go through vertices in U. [sent-70, score-1.119]
30 This indicates p(vi, vj) = p(vi) p(vj|vi) = di Σ_p p(up|vi) p(vj|up) = Σ_p p(vi, up) p(up, vj) / λp, where p(vj|vi) is the conditional transition probability from vi to vj, and di = p(vi) is the degree of vi. [sent-71, score-1.406]
31 2 Graph factorization by bipartite graph construction For a bipartite graph K, p(up|vi) = bip/di gives the conditional probability of transitions from vi to up. [sent-75, score-1.356]
32 This property suggests that one can construct a bipartite graph K(V, U, F) to approximate a given G(V, E), and then obtain a soft clustering structure, where U corresponds to clusters (see Fig. 1). [sent-77, score-0.947]
33 Figure 1: (a) The original graph representing data affinities; (b) the bipartite graph representing data-to-cluster relations; (c) the induced cluster affinities. [sent-79, score-0.672]
34 Eq. (1) suggests that this approximation can be done by minimizing ℓ(W, BΛ^{-1}B^T), given a distance ℓ(·, ·) between two adjacency matrices. [sent-81, score-0.255]
35 To make the problem easy to solve, we remove the coupling between B and Λ via H = BΛ^{-1}, and then have min_{H,Λ} ℓ(W, HΛH^T) s.t. Σ_i hip = 1, H ∈ R_+^{n×m}, Λ ∈ D_+^{m×m}. (2) [sent-82, score-0.19]
36 For the divergence distance ℓ(X, Y) = Σ_ij (xij log(xij/yij) − xij + yij), the cost function in Eq. [sent-89, score-0.295]
37 (2) is non-increasing under the update rules (˜ denotes updated quantities): h̃ip ∝ hip Σ_j wij λp hjp / (HΛH^T)_ij, normalized s.t. Σ_i h̃ip = 1, (3) and λ̃p ∝ λp Σ_ij wij hip hjp / (HΛH^T)_ij. (4) [sent-90, score-1.493]
38 The distance is invariant under the update if and only if H and Λ are at a stationary point. [sent-95, score-0.187]
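The update rules translate directly into a few lines of numpy. The sketch below is our own reading of Eqs. (2)-(4), not the authors' code; the random initialization, iteration count, and eps safeguard are illustrative choices, and W is assumed symmetric, non-negative, and normalized to sum to one so that Σ_p λp = 1 during the iterations.

```python
import numpy as np

def gfc(W, m, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates minimizing the divergence between W and H Lambda H^T (Eqs. 2-4).

    Returns H (n x m, columns summing to 1) and the cluster degrees lam (summing to 1).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    H = rng.random((n, m)) + eps
    H /= H.sum(axis=0, keepdims=True)             # enforce sum_i h_ip = 1
    lam = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        HL = H * lam                              # H Lambda (columns scaled by lam_p)
        R = W / np.maximum(HL @ H.T, eps)         # ratios w_ij / (H Lambda H^T)_ij
        H_new = H * (R @ HL)                      # Eq. (3): h_ip * sum_j r_ij lam_p h_jp
        H_new /= np.maximum(H_new.sum(axis=0, keepdims=True), eps)
        lam = lam * np.einsum('ij,ip,jp->p', R, H, H)   # Eq. (4): lam_p * sum_ij r_ij h_ip h_jp
        lam /= lam.sum()
        H = H_new
    return H, lam
```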
39 Similar to GMM, p(up|vi) = bip / Σ_q biq is the soft probabilistic assignment of vertex vi to cluster up. [sent-97, score-0.732]
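A hypothetical helper for this posterior, given the H and Λ returned by the sketch above (B = HΛ, so the row sums of B are the degrees di):

```python
def soft_assignments(H, lam):
    """Posterior p(u_p | v_i) = b_ip / sum_q b_iq, with B = H Lambda; rows sum to 1."""
    B = H * lam
    return B / B.sum(axis=1, keepdims=True)
```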
40 , for a k-nearest-neighbor graph the complexity O(m^2 nk) scales linearly with the sample size n). [sent-102, score-0.192]
41 3 Hierarchical graph-factorization clustering (HGFC) As a nice property of the proposed graph factorization, a natural affinity between two clusters up and uq can be computed as p(up, uq) = Σ_{i=1}^{n} bip biq / di = (B^T D^{-1} B)_{pq}, D = diag(d1, ..., dn). (5) [sent-103, score-0.965]
42 Note that the similarity between clusters p and q takes into account a weighted average of contributions from all the data (see Fig. 1(c)). [sent-108, score-0.297]
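Eq. (5) is again one matrix product; the sketch below (with an invented function name) takes the n × m bipartite adjacency B of the current level and returns the m × m adjacency among clusters:

```python
import numpy as np

def cluster_affinity(B):
    """W_next = B^T D^{-1} B with d_i = sum_p b_ip (Eq. 5): pairwise affinities among clusters."""
    d = B.sum(axis=1)                 # degrees d_i of the data vertices
    return B.T @ (B / d[:, None])     # (p, q) entry: sum_i b_ip b_iq / d_i
```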
43 Let G0(V0, E0) be the initial graph describing the similarities of all m0 = n data points, with adjacency matrix W0. [sent-110, score-0.538]
44 Based on G0 we can build a bipartite graph K1 (V0 , V1 , F1 ), with m1 < m0 vertices in V1 . [sent-111, score-0.487]
45 A hierarchical clustering method can be motivated from the observation that the cluster similarity in Eq. [sent-112, score-0.62]
46 (5) suggests a new adjacency matrix W1 for graph G1 (V1 , E1 ), where V1 is formed by clusters, and E1 contains edges connecting these clusters. [sent-113, score-0.527]
47 Then we can group those clusters by constructing another bipartite graph K2 (V1 , V2 , F2 ) with m2 < m1 vertices in V2 , such that W1 is again factorized as in Eq. [sent-114, score-0.657]
48 Algorithm 1 Hierarchical Graph-Factorization Clustering (HGFC) Require: n data objects and a similarity measure 1: build the similarity graph G0(V0, E0) with adjacency matrix W0, and let m0 = n 2: for l = 1, 2, ..., factorize Wl−1 as in Eq. (2) to obtain the bipartite graph Kl with adjacency Bl = Hl Λl and ml < ml−1 clusters, and form the cluster-level adjacency Wl via Eq. (5). [sent-118, score-0.702]
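A possible reading of Algorithm 1, reusing the gfc and cluster_affinity sketches above (the list of cluster counts per level is supplied by the user, as in the paper's experiments; the re-normalization step is our own numerical safeguard):

```python
def hgfc(W0, level_sizes, n_iter=200):
    """Hierarchical GFC: factorize W_{l-1}, build B_l = H_l Lambda_l, move to the cluster graph W_l."""
    W = W0 / W0.sum()                     # normalization convention: entries of W sum to 1
    Bs = []
    for m in level_sizes:                 # e.g. level_sizes = [100, 50, 20, 10, 4]
        H, lam = gfc(W, m, n_iter=n_iter)
        B = H * lam                       # bipartite adjacency between V_{l-1} and V_l
        Bs.append(B)
        W = cluster_affinity(B)           # adjacency among the new clusters (Eq. 5)
        W = W / W.sum()                   # re-normalize for numerical safety
    return Bs                             # one bipartite adjacency per level
```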
49 For level l, we can assign data to the obtained ml clusters via a propagation from the bottom level of clusters. [sent-122, score-0.336]
50 The soft (i.e., probabilistic) assignment of vi ∈ V0 to cluster vp^(l) ∈ Vl is given by p(vp^(l)|vi) = Σ_{v^(l−1) ∈ V_{l−1}} ··· Σ_{v^(1) ∈ V_1} p(vp^(l)|v^(l−1)) ··· p(v^(1)|vi) = (D1^{-1} B̄l)_{ip}, (6) where B̄l = B1 D2^{-1} B2 D3^{-1} B3 ··· Dl^{-1} Bl. [sent-125, score-0.588]
51 One can interpret this by deriving an equivalent bipartite graph K̄l(V0, Vl, F̄l), and treating B̄l as the equivalent adjacency matrix attached to the equivalent edges F̄l connecting data V0 and clusters Vl. [sent-129, score-0.866]
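A sketch of Eq. (6): chaining the row-normalized bipartite adjacencies Dl^{-1} Bl yields the soft assignment of every original vertex to the clusters at level l (the list Bs is assumed to come from the hierarchical procedure sketched above):

```python
import numpy as np

def cumulative_assignment(Bs):
    """P[i, p] = p(level-l cluster p | vertex i) = (D1^{-1} B1 D2^{-1} B2 ... Dl^{-1} Bl)_{ip} (Eq. 6)."""
    P = np.eye(Bs[0].shape[0])
    for B in Bs:
        P = P @ (B / B.sum(axis=1, keepdims=True))   # multiply by the row-stochastic D_l^{-1} B_l
    return P                                         # rows sum to 1
```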
52 1 Analysis of the proposed algorithms Flat clustering: statistical modeling of single-hop transitions In this section we provide some insights into the suggested clustering algorithm, mainly from the perspective of random walks on graphs. [sent-131, score-0.508]
53 Suppose that from a stationary stage of random walks on G(V, E), one observes πij single-hop transitions between vi and vj in a unitary time frame. [sent-132, score-0.786]
54 Thus we connect the observed similarities to the frequency of transitions via wij ∝ πij . [sent-134, score-0.416]
55 If the observed transitions are sampled from a true distribution p(vi, vj) = (HΛH^T)_{ij} where a bipartite graph is behind, then the log likelihood with respect to the observed transitions is L(H, Λ) = log Π_{ij} p(vi, vj)^{πij} ∝ Σ_{ij} wij log(HΛH^T)_{ij}. (7) [sent-138, score-1.356]
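For reference, this likelihood can be monitored during the multiplicative updates as a convergence check; a minimal sketch (our own helper, with an eps guard against log 0):

```python
import numpy as np

def log_likelihood(W, H, lam, eps=1e-12):
    """L(H, Lambda) = sum_ij w_ij log (H Lambda H^T)_ij, cf. Eq. (7)."""
    V = (H * lam) @ H.T
    return float(np.sum(W * np.log(np.maximum(V, eps))))
```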
56 For a weighted undirected graph G(V, E) and the log likelihood defined in Eq. [sent-141, score-0.221]
57 Figure 2: The similarities of vertices to a fixed vertex (marked in the left panel) on a 6-nearest-neighbor graph, respectively induced by clustering level l = 2 (the middle panel) and l = 6 (the right panel). [sent-145, score-0.636]
58 2 Hierarchical clustering: statistical modeling of multi-hop transitions The adjacency matrix W0 of G0 (V0 , E0 ) only models one-hop transitions that follow direct links from vertices to their neighbors. [sent-148, score-0.602]
59 Obviously, multihop transitions induce a slowly decaying similarity function on the graph. [sent-151, score-0.272]
60 Based on the chain rule of the Markov process, the equivalent adjacency matrix for t-hop transitions is At = W0 (D0^{-1} W0)^{t−1} = A_{t−1} D0^{-1} W0. (8) [sent-152, score-0.374]
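Eq. (8) as a short numpy sketch (function name ours); note that multiplying by D0^{-1} on the right divides the columns of the running product by the vertex degrees:

```python
import numpy as np

def multihop_adjacency(W0, t):
    """Equivalent adjacency for t-hop transitions: A_t = W0 (D0^{-1} W0)^{t-1} (Eq. 8)."""
    d = W0.sum(axis=1)                    # degrees under W0
    A = W0.copy()
    for _ in range(t - 1):
        A = (A / d[None, :]) @ W0         # A_{s+1} = A_s D0^{-1} W0
    return A
```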
61 Generally speaking, a slowly decaying similarity function on the similarity graph captures a global affinity structure of data manifolds, while a rapidly decaying similarity function only tells the local affinity structure. [sent-153, score-0.612]
62 The following proposition states that in the suggested HGFC, a higher-level clustering implicitly employs a more global similarity measure caused by multi-hop Markov random walks: Proposition 4. [sent-154, score-0.501]
63 For a given hierarchical clustering structure that starts from a bottom graph G0(V0, E0) to a higher level Gk(Vk, Ek), the vertices Vl at level 0 < l ≤ k induce an equivalent adjacency matrix of V0, which is At with t = 2^{l−1} as defined in Eq. [sent-156, score-1.083]
64 Therefore the presented hierarchical clustering algorithm HGFC applies different sizes of time windows to examine random walks, and derives different scales of similarity measures to expose the local and global clustering structures of data manifolds. [sent-158, score-0.855]
65 Fig. 2 illustrates the employed similarities of vertices to a fixed vertex in clustering levels l = 2 and 6, which correspond to time periods t = 2 and 32. [sent-160, score-0.55]
66 It can be seen that for a short period t = 2, the similarity is very local and helps to uncover low-level clusters, while in a longer period t = 32 the similarity function is rather global. [sent-161, score-0.28]
67 We compare HGFC with two popular agglomerative hierarchical clustering algorithms, single link and complete link (e. [sent-169, score-0.625]
68 Both methods merge the two closest clusters at each step. [sent-172, score-0.235]
69 Figure 4: Comparison of clustering methods on USPS (left) and Newsgroup (right), evaluated by normalized mutual information (NMI). [sent-175, score-0.364]
70 Single link defines the cluster distance to be the smallest point-wise distance between two clusters, while complete link uses the largest one. [sent-177, score-0.291]
71 Before showing the comparison, we visualize a part of the clustering results for the USPS data in Fig. 3. [sent-181, score-0.3]
72 At the top of the left figure, we show the top three levels of the hierarchy with 4, 10 and 20 clusters, respectively, where each cluster is represented by its mean image, computed as an average over all the images weighted by their posterior probabilities of belonging to this cluster. [sent-183, score-0.228]
73 Then 10 randomly sampled digits with soft cluster assignments to the top 3rd level clusters are illustrated with a Hinton graph. [sent-184, score-0.535]
74 3 show the assignments between clusters across the hierarchy. [sent-186, score-0.242]
75 “1” “2” “3” “4” 635 2 2 10 Normalized cut 630 1 3 4 744 179 1 817 4 6 1 835 1254 1 1 4 HGFC 3 8 886 33 4 816 8 2 4 9 3 838 1265 17 10 58 K-means 1 0 720 95 9 796 20 0 3 97 9 774 Table 1: Confusion matrices of clustering results, 4 clusters, USPS data. [sent-188, score-0.387]
76 baseball hockey Normalized cut 858 98 30 79 893 16 44 33 875 11 8 893 2 5 40 85 772 42 15 7 HGFC 182 13 934 5 33 843 21 11 21 12 101 958 977 985 39 16 K-means 7 4 3 5 835 114 4 900 0 0 4 77 Table 2: Confusion matrices of clustering results, 4 clusters, Newsgroup data. [sent-191, score-0.427]
77 We compare the clustering methods by evaluating the normalized mutual information (NMI) in Fig. [sent-193, score-0.364]
78 It is defined to be the mutual information between clusters and true classes, normalized by the maximum of marginal entropies. [sent-195, score-0.257]
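A self-contained sketch of this score, implementing the stated definition (our own code, not the authors' evaluation script; natural logarithms are an arbitrary but harmless choice since they cancel in the ratio):

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """NMI = I(clusters; classes) / max(H(classes), H(clusters))."""
    y = np.unique(np.asarray(labels_true), return_inverse=True)[1]
    c = np.unique(np.asarray(labels_pred), return_inverse=True)[1]
    n = y.size
    J = np.zeros((y.max() + 1, c.max() + 1))
    for yi, ci in zip(y, c):
        J[yi, ci] += 1                    # joint counts of (class, cluster)
    P = J / n
    py, pc = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / (py[:, None] * pc[None, :])[nz]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    hc = -np.sum(pc[pc > 0] * np.log(pc[pc > 0]))
    return mi / max(hy, hc)
```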
79 Moreover, in order to more directly assess the clustering quality, we also illustrate the confusion matrices in Table 1 and Table 2, in the case of producing 4 clusters. [sent-196, score-0.433]
80 We omit the confusion matrices of single link and complete link from the tables, to save space and also because of their clearly poor performance compared with the others. [sent-197, score-0.279]
81 For the Newsgroup data it even gets stuck at the 3601st merge because all the similarities between clusters are 0. [sent-200, score-0.309]
82 Top-down hierarchical normalized cut obtains reasonable results, but sometimes cannot split one big cluster (see the tables). [sent-201, score-0.305]
83 It also produces confusion matrices with clear diagonal structures (see tables 1 and 2), which indicates a very good clustering quality. [sent-205, score-0.519]
84 6 Conclusion and Future Work In this paper we have proposed a probabilistic graph partition method for clustering data objects based on their pairwise similarities. [sent-206, score-0.523]
85 A novel hierarchical clustering algorithm HGFC has been derived, where a higher level in HGFC corresponds to a statistical model of random walk transitions in a longer period, giving rise to a more global clustering structure. [sent-207, score-0.974]
86 In this paper we have empirically specified the number of clusters in each level. [sent-209, score-0.193]
87 Another direction is hierarchical clustering on directed graphs, as well as its applications in web mining. [sent-211, score-0.427]
88 We first notice that Σ_p λp = Σ_ij wij under the constraints Σ_i hip = 1. [sent-214, score-0.54]
89 Therefore we can normalize W by Σ_ij wij and, after convergence, multiply all λp by this quantity to get the solution. [sent-215, score-0.397]
90 Under this assumption we are maximizing L(H, Λ) = Σ_ij wij log(HΛH^T)_ij with an extra constraint Σ_p λp = 1. [sent-216, score-0.35]
91 Define f(H, H*) = Σ_ij wij Σ_p [h*_ip λp h*_jp / (Σ_l h*_il λl h*_jl)] [log(hip λp hjp) − log(h*_ip λp h*_jp / (Σ_l h*_il λl h*_jl))]. [sent-221, score-0.958]
92 (3) by setting the derivative of f with respect to hip to be zero. [sent-224, score-0.19]
93 In the E-step we estimate the a posteriori probability of taking up for the pair (vi, vj) using Bayes’ rule: p̂(up|vi, vj) ∝ p(vi|up) p(vj|up) p(up). [sent-233, score-0.46]
94 And then in the M-step we maximize the “complete” data likelihood L̂(G) = Σ_ij wij Σ_p p̂(up|vi, vj) log[p(vi|up) p(vj|up) p(up)] with respect to the model parameters hip = p(vi|up) and λp = p(up), with constraints Σ_i hip = 1 and Σ_p λp = 1. [sent-234, score-0.989]
95 By setting the corresponding derivatives to zero we obtain hip ∝ Σ_j wij p̂(up|vi, vj) and λp ∝ Σ_ij wij p̂(up|vi, vj). [sent-235, score-1.216]
96 (6)) with adjacency matrix B̄l, degrees D0 for V0, and degrees Λ̄l for Vl. [sent-243, score-0.292]
97 In this case the induced adjacency matrix of V0 is W̄l = B̄l Λ̄l^{-1} B̄l^T, and the adjacency matrix of Vl is Wl = B̄l^T D0^{-1} B̄l. [sent-244, score-0.277]
98 Let Kl+1(Vl, Vl+1, Fl+1) be the bipartite graph connecting Vl and Vl+1, with the adjacency Bl+1 and degrees Λl+1 for Vl+1. [sent-245, score-0.872]
99 Then the adjacency matrix of V0 induced by level l+1 is W̄l+1 = B̄l Λ̄l^{-1} Bl+1 Λl+1^{-1} Bl+1^T Λ̄l^{-1} B̄l^T = W̄l D0^{-1} W̄l, where Wl = B̄l^T D0^{-1} B̄l and W̄l = B̄l Λ̄l^{-1} B̄l^T are applied. [sent-246, score-0.334]
100 Interpreting and extending classical agglomerative clustering algorithms using a model-based approach. [sent-269, score-0.352]
wordName wordTfidf (topN-words)
[('vi', 0.323), ('clustering', 0.3), ('bl', 0.29), ('hgfc', 0.278), ('vj', 0.23), ('wij', 0.216), ('vl', 0.208), ('adjacency', 0.203), ('clusters', 0.193), ('graph', 0.192), ('hip', 0.19), ('bipartite', 0.17), ('bip', 0.138), ('ij', 0.134), ('hierarchical', 0.127), ('transitions', 0.126), ('wl', 0.105), ('similarity', 0.104), ('vertices', 0.102), ('confusion', 0.097), ('ht', 0.09), ('cluster', 0.089), ('usps', 0.088), ('walks', 0.082), ('fl', 0.079), ('similarities', 0.074), ('vertex', 0.074), ('link', 0.073), ('af', 0.072), ('newsgroup', 0.069), ('soft', 0.068), ('hjp', 0.06), ('level', 0.057), ('nmi', 0.052), ('agglomerative', 0.052), ('cut', 0.051), ('hinton', 0.049), ('assignments', 0.049), ('vp', 0.047), ('nities', 0.047), ('normalize', 0.047), ('matrix', 0.045), ('factorization', 0.045), ('decaying', 0.042), ('merge', 0.042), ('proposition', 0.041), ('digits', 0.04), ('walk', 0.04), ('autos', 0.04), ('baseball', 0.04), ('biq', 0.04), ('hil', 0.04), ('hjl', 0.04), ('uq', 0.04), ('graphs', 0.04), ('top', 0.039), ('nity', 0.039), ('normalized', 0.038), ('connecting', 0.037), ('period', 0.036), ('matrices', 0.036), ('kl', 0.036), ('mixture', 0.035), ('hierarchy', 0.035), ('ip', 0.035), ('gfc', 0.035), ('exposes', 0.035), ('jp', 0.035), ('employs', 0.032), ('tables', 0.032), ('objects', 0.031), ('induced', 0.029), ('log', 0.029), ('ml', 0.029), ('diagonal', 0.028), ('gl', 0.028), ('resolutions', 0.028), ('distance', 0.028), ('partitions', 0.027), ('edges', 0.026), ('xij', 0.026), ('yij', 0.026), ('mutual', 0.026), ('indicates', 0.026), ('images', 0.026), ('divergence', 0.026), ('stationary', 0.025), ('global', 0.024), ('suggests', 0.024), ('dm', 0.024), ('totally', 0.024), ('relations', 0.024), ('gradually', 0.023), ('diffusion', 0.023), ('correspond', 0.023), ('build', 0.023), ('panel', 0.023), ('auxiliary', 0.023), ('di', 0.022), ('degrees', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
2 0.22431719 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
3 0.18213373 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
4 0.17809971 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
5 0.16882835 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.15478024 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
7 0.15218173 159 nips-2005-Q-Clustering
8 0.14732553 127 nips-2005-Mixture Modeling by Affinity Propagation
9 0.1429251 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
10 0.13798426 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks
11 0.1375024 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
12 0.10473528 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
13 0.098247655 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
14 0.092769556 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
15 0.090003915 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
16 0.087995686 84 nips-2005-Generalization in Clustering with Unobserved Features
17 0.086056724 105 nips-2005-Large-Scale Multiclass Transduction
18 0.085960306 75 nips-2005-Fixing two weaknesses of the Spectral Method
19 0.082807414 50 nips-2005-Convex Neural Networks
20 0.078840546 167 nips-2005-Robust design of biological experiments
topicId topicWeight
[(0, 0.227), (1, 0.128), (2, -0.111), (3, -0.091), (4, -0.402), (5, -0.101), (6, -0.011), (7, -0.014), (8, -0.207), (9, -0.016), (10, -0.085), (11, 0.057), (12, -0.038), (13, 0.039), (14, 0.106), (15, 0.002), (16, 0.098), (17, 0.044), (18, 0.061), (19, 0.077), (20, -0.091), (21, 0.12), (22, -0.095), (23, -0.012), (24, 0.001), (25, -0.057), (26, -0.005), (27, 0.026), (28, -0.045), (29, -0.009), (30, -0.032), (31, 0.004), (32, -0.071), (33, 0.024), (34, 0.075), (35, 0.016), (36, -0.059), (37, -0.079), (38, 0.095), (39, 0.098), (40, 0.081), (41, -0.019), (42, 0.077), (43, -0.045), (44, -0.034), (45, -0.082), (46, -0.097), (47, -0.038), (48, 0.051), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.97929341 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
2 0.6695019 102 nips-2005-Kernelized Infomax Clustering
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets. 1
3 0.64201945 13 nips-2005-A Probabilistic Approach for Optimizing Spectral Clustering
Author: Rong Jin, Feng Kang, Chris H. Ding
Abstract: Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named “Soft Cut”. It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm. 1
4 0.62220258 159 nips-2005-Q-Clustering
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters for any given k in polynomial time for this criterion can be computed. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and approximate partitioning (guaranteed to be within a factor of 2 of the the optimal) for more clusters can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application specific criterion for which efficient algorithm are not known.
5 0.61829883 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.59678078 177 nips-2005-Size Regularized Cut for Data Clustering
7 0.54381913 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition
8 0.52338713 84 nips-2005-Generalization in Clustering with Unobserved Features
9 0.51021391 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
10 0.50017464 127 nips-2005-Mixture Modeling by Affinity Propagation
11 0.43088377 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
12 0.39895383 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
13 0.38775721 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks
14 0.37258062 75 nips-2005-Fixing two weaknesses of the Spectral Method
15 0.36296821 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
16 0.35705248 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
17 0.34414619 46 nips-2005-Consensus Propagation
18 0.32847732 167 nips-2005-Robust design of biological experiments
19 0.31833711 105 nips-2005-Large-Scale Multiclass Transduction
20 0.30817801 71 nips-2005-Fast Krylov Methods for N-Body Learning
topicId topicWeight
[(3, 0.081), (10, 0.034), (15, 0.238), (27, 0.028), (31, 0.074), (34, 0.11), (41, 0.046), (55, 0.044), (65, 0.014), (69, 0.041), (73, 0.067), (88, 0.065), (91, 0.055)]
simIndex simValue paperId paperTitle
1 0.8705762 127 nips-2005-Mixture Modeling by Affinity Propagation
Author: Brendan J. Frey, Delbert Dueck
Abstract: Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches include iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pair-wise clustering algorithms need not compute sufficient statistics and avoid poor solutions by directly placing similar examples in the same cluster. However, many applications require that each cluster of data be accurately described by a prototype or model, so affinity-based clustering – and its benefits – cannot be directly realized. We describe a technique called “affinity propagation”, which combines the advantages of both approaches. The method learns a mixture model of the data by recursively propagating affinity messages. We demonstrate affinity propagation on the problems of clustering image patches for image segmentation and learning mixtures of gene expression models from microarray data. We find that affinity propagation obtains better solutions than mixtures of Gaussians, the K-medoids algorithm, spectral clustering and hierarchical clustering, and is both able to find a pre-specified number of clusters and is able to automatically determine the number of clusters. Interestingly, affinity propagation can be viewed as belief propagation in a graphical model that accounts for pairwise training case likelihood functions and the identification of cluster centers. 1
same-paper 2 0.85449219 178 nips-2005-Soft Clustering on Graphs
Author: Kai Yu, Shipeng Yu, Volker Tresp
Abstract: We propose a simple clustering framework on graphs encoding pairwise data similarities. Unlike usual similarity-based methods, the approach softly assigns data to clusters in a probabilistic way. More importantly, a hierarchical clustering is naturally derived in this framework to gradually merge lower-level clusters into higher-level ones. A random walk analysis indicates that the algorithm exposes clustering structures in various resolutions, i.e., a higher level statistically models a longer-term diffusion on graphs and thus discovers a more global clustering structure. Finally we provide very encouraging experimental results. 1
3 0.61729813 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
4 0.59880066 184 nips-2005-Structured Prediction via the Extragradient Method
Author: Ben Taskar, Simon Lacoste-Julian, Michael I. Jordan
Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1
5 0.59731507 79 nips-2005-Fusion of Similarity Data in Clustering
Author: Tilman Lange, Joachim M. Buhmann
Abstract: Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets. 1
6 0.59640849 78 nips-2005-From Weighted Classification to Policy Search
7 0.59343928 177 nips-2005-Size Regularized Cut for Data Clustering
8 0.59245151 47 nips-2005-Consistency of one-class SVM and related algorithms
9 0.59043616 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
10 0.59005278 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
11 0.58854663 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
12 0.58785242 58 nips-2005-Divergences, surrogate loss functions and experimental design
13 0.58769935 102 nips-2005-Kernelized Infomax Clustering
14 0.58622152 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
15 0.58569515 112 nips-2005-Learning Minimum Volume Sets
16 0.58341449 46 nips-2005-Consensus Propagation
17 0.58324325 144 nips-2005-Off-policy Learning with Options and Recognizers
18 0.58273262 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations
19 0.58248502 105 nips-2005-Large-Scale Multiclass Transduction
20 0.58103079 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs