nips nips2003 nips2003-152 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss
Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. [sent-17, score-0.373]
2 In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. [sent-18, score-0.306]
3 In this paper we use the previously proposed typical cut framework for pairwise clustering. [sent-20, score-0.714]
4 We show an equivalence between calculating the typical cut and inference in an undirected graphical model. [sent-21, score-0.833]
5 We show that for clustering problems with hundreds of datapoints exact inference may still be possible. [sent-22, score-0.511]
6 For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. [sent-23, score-0.5]
7 We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. [sent-24, score-0.159]
8 Datasets of this type, where the two clusters are not easily described by a parametric model, can be successfully clustered using pairwise clustering algorithms [4, 6, 3]. [sent-26, score-0.468]
9 These algorithms start by building a graph whose vertices correspond to datapoints and edges exist between nearby points with a weight that decreases with distance. [sent-27, score-0.251]
10 Figure 1: Clustering as graph partitioning (following [8]). [sent-29, score-0.192]
11 Vertices correspond to datapoints and edges between adjacent pixels are weighted by the distance. [sent-30, score-0.17]
12 The minimal cut criterion finds clusterings that minimize cut(A, B). [sent-34, score-0.521]
13 The advantage of using the minimal cut criterion is that the optimal segmentation can be computed in polynomial time. [sent-35, score-0.599]
14 Since the cut value grows linearly with the number of edges cut, a single datapoint cut from its neighbors will often have a lower cut value than the desired clustering. [sent-37, score-1.501]
15 E.g., the minimal cut solution separates the full dot in figure 1, instead of the desired ‘N’ and ‘I’ clusters. [sent-38, score-0.475]
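This failure mode is easy to reproduce numerically. A minimal sketch, assuming cut(A, B) is the total affinity on edges crossing between the two groups; the toy affinity matrix and all names below are ours, not from the paper:

    import numpy as np

    def cut_value(W, A, B):
        # cut(A, B): sum of affinities on edges crossing between the two groups
        A, B = list(A), list(B)
        return W[np.ix_(A, B)].sum()

    # Toy symmetric affinity matrix: nodes 0-2 and 3-5 form two tight clusters
    # joined by one weak bridge (2-3); node 6 hangs off node 5 even more weakly.
    W = np.zeros((7, 7))
    for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
                    (2, 3, 0.5), (5, 6, 0.2)]:
        W[i, j] = W[j, i] = w

    print(cut_value(W, {0, 1, 2}, {3, 4, 5, 6}))  # desired split: cut = 0.5
    print(cut_value(W, {6}, {0, 1, 2, 3, 4, 5}))  # isolated point: cut = 0.2, so min-cut prefers it

Here the minimal cut prefers separating the single point 6 over the intended two-cluster split, which is exactly the trivial clustering described above.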
16 In order to avoid these trivial clusterings, several graph partitioning criteria have been proposed. [sent-39, score-0.223]
17 Shi and Malik suggested the normalized cut criterion which directly penalizes partitions where one of the groups is small, hence a separation of a single isolated datapoint is not favored. [sent-40, score-0.634]
18 Minimization of the normalized cut criterion is NP-Complete but it can be approximated using spectral methods. [sent-41, score-0.484]
19 Despite the success of spectral methods in a wide range of clustering problems, several problems remain. [sent-42, score-0.211]
20 The typical cut criterion, suggested by Blatt et al [1] and later by Gdalyahu et al [2], is based on a simple probabilistic model. [sent-45, score-0.701]
21 Thus performing MAP inference under this probability distribution will still lead to trivial segmentations. [sent-48, score-0.165]
22 For example, consider the pairwise correlation p(i, j), defined for any two neighboring nodes in the graph as the probability that they belong to the same segment: p(i, j) = Σ_{A,B} Pr(A, B) SAME(i, j; A, B)   (3), with SAME(i, j; A, B) defined as 1 iff (i ∈ A and j ∈ A) or (i ∈ B and j ∈ B). [sent-50, score-0.405]
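For a graph small enough to enumerate, p(i, j) can be computed directly from this definition, assuming (as in the typical cut framework) that a partition is drawn with probability Pr(A, B) ∝ exp(-cut(A, B)/T). A brute-force sketch with names of our own choosing:

    import numpy as np
    from itertools import product

    def pairwise_correlations_bruteforce(W, T):
        # p(i, j) = sum over all partitions of Pr(A, B) * SAME(i, j; A, B)
        n = W.shape[0]
        labelings, weights = [], []
        for x in product([0, 1], repeat=n):          # x[i] = 0 if i in A, 1 if i in B
            x = np.array(x)
            cut = sum(W[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])
            labelings.append(x)
            weights.append(np.exp(-cut / T))
        weights = np.array(weights) / np.sum(weights)     # normalize by Z
        p = np.zeros((n, n))
        for w, x in zip(weights, labelings):
            p += w * (x[:, None] == x[None, :])
        return p                                          # p[i, j] = Pr(i and j in the same segment)

This enumeration is only feasible for a handful of nodes; the point of the paper is to obtain p(i, j) without it.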
23 Referring again to the single isolated datapoint in figure 1: while that datapoint and its neighbors do not appear in the same cluster in the most probable partition, they do appear in the same cluster in the vast majority of partitions. [sent-51, score-0.286]
24 Hence the typical cut algorithm of Blatt et al consists of three stages: • Preprocessing: Construct the affinity matrix W so that each node will be connected to at most K neighbors. [sent-53, score-0.641]
25 • Estimating pairwise correlations: Use a Markov chain Monte-Carlo (MCMC) sampling method to estimate p(i, j) at each temperature T . [sent-55, score-0.318]
26 • Postprocessing: Define the typical cut partition as the connected components of the graph after removing any links for which p(i, j) < 1/2. [sent-56, score-0.704]
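The postprocessing stage is just a threshold followed by connected components; a minimal sketch (our own function and variable names, not code from the paper):

    import numpy as np

    def typical_cut_partition(edges, p, threshold=0.5):
        # edges: list of (i, j) pairs; p: dict mapping (i, j) -> estimated p(i, j).
        # Returns a cluster label per node: connected components of the graph
        # after removing every edge with p(i, j) < threshold.
        nodes = {v for e in edges for v in e}
        parent = {v: v for v in nodes}          # union-find over the kept edges

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        for (i, j) in edges:
            if p[(i, j)] >= threshold:
                parent[find(i)] = find(j)
        return {v: find(v) for v in nodes}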
27 At zero temperature all the datapoints reside in one cluster (this trivially minimizes the cut value), and at high temperatures every datapoint forms a separate cluster. [sent-60, score-0.785]
28 In this paper we show that calculating the typical cut is equivalent to performing inference in an undirected graphical model. [sent-61, score-0.797]
29 We use this equivalence to show that in problems with hundreds of datapoints, the typical cut may be calculated exactly. [sent-62, score-0.613]
30 We show that when exact inference is impossible, loopy belief propagation (BP) and generalized belief propagation (GBP) may give an excellent approximation with very little computational cost. [sent-63, score-0.537]
31 Finally, we use the standard algorithm for ML estimation in graphical models to derive a learning algorithm for affinity matrices based on labeled data. [sent-64, score-0.159]
32 To connect this to typical cuts we first define for every partition (A, B) a binary vector x such that x(i) = 0 if i ∈ A and x(i) = 1 if i ∈ B. [sent-66, score-0.273]
33 So far we have focused on partitioning the graph into two segments, but the equivalence holds for any number of segments q. [sent-68, score-0.293]
34 Let (A1 , A2 , · · · , Aq ) be a partitioning of the graph into q segments (note that these segments need not be connected in G). [sent-69, score-0.356]
35 Define cut(A_1, A_2, ..., A_q) in direct analogy to equation 1, and: Pr(A_1, A_2, ..., A_q) = (1/Z) e^{-(1/T) cut(A_1, A_2, ..., A_q)}   (6). The implication of observation 1 is that we can use the powerful tools of graphical models in the context of pairwise clustering. [sent-70, score-0.373]
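In graphical-model terms, this says the distribution over partitions is a Potts-like pairwise Markov random field: each edge (i, j) carries a potential that depends only on whether the two endpoint labels agree. A sketch of one such edge potential under that reading (the function name is ours):

    import numpy as np

    def edge_potential(w_ij, T, q):
        # Psi_ij(x_i, x_j) = exp(-w_ij / T) if x_i != x_j, and 1 otherwise,
        # so the product of potentials over all edges equals exp(-cut(A_1, ..., A_q) / T)
        psi = np.full((q, q), np.exp(-w_ij / T))
        np.fill_diagonal(psi, 1.0)
        return psi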
36 In subsequent sections we provide examples of the benefits of using graphical models to compute typical cuts. [sent-71, score-0.222]
37 3 Computing typical cuts using inference in a graphical model: Typical cuts have been successfully used for clustering of datapoints in R^n [1], using an expensive MCMC to calculate pairwise correlations, p(i, j). [sent-73, score-1.093]
38 Using inference algorithms we provide a deterministic and more efficient estimate of p(i, j). [sent-74, score-0.134]
39 More specifically, we use inference algorithms to compute the pairwise beliefs over neighboring nodes, b_ij(x_i, x_j), and calculate the pairwise correlation as p(i, j) = Σ_{t=1}^{q} b_ij(t, t). [sent-75, score-0.898]
40 In all other cases we must resort to approximate inference using the BP and the GBP algorithms. [sent-77, score-0.171]
41 The following subsections discuss exact and approximate inference for computing typical cuts. [sent-78, score-0.36]
42 3.1 Exact inference for typical cut clustering: The nature of real-life clustering problems seems to suggest that exact inference would be intractable due to the clique size of the junction tree. [sent-80, score-1.3]
43 As shown previously by Gdalyahu et al the typical cut criterion does sensible things: it does not favor segmentation of individual datapoints (as in minimal cut), nor is it fooled by narrow bridges between clusters (as in simple connected components). [sent-84, score-1.039]
44 However, while previous typical cut algorithms approximate p(i, j) using MCMC, in some cases, using the graphical model framework, we can calculate p(i, j) exactly and efficiently. [sent-85, score-0.687]
45 In example (a) the pairwise correlations were calculated exactly, while in example (b) we used BP. [sent-89, score-0.396]
46 3.2 Approximate inference for typical cut clustering: Although exact inference was shown to be possible, in the more common case it is infeasible, and p(i, j) can only be estimated using approximate inference algorithms. [sent-103, score-1.199]
47 In this section we discuss approximate inference using the BP and the GBP algorithms. [sent-104, score-0.171]
48 Approximate inference using Belief Propagation: In BP the pairwise beliefs over neighboring nodes, b_ij, are defined using the messages as: b_ij(x_i, x_j) = α Ψ_ij(x_i, x_j) Π_{x_k ∈ N(x_i)\x_j} m_ki(x_i) Π_{x_k ∈ N(x_j)\x_i} m_kj(x_j)   (7). Can this be used as an approximation for pairwise clustering? [sent-105, score-1.033]
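As a concrete illustration, a generic synchronous BP loop over these symmetric potentials might look as follows. This is our own sketch, not the authors' implementation; psi is assumed to hold a q x q table for each directed edge, with psi[(j, i)] equal to the transpose of psi[(i, j)]:

    import numpy as np

    def bp_pairwise_correlations(psi, n, q, iters=50):
        # Runs synchronous BP and returns p[(i, j)] = sum_t b_ij(t, t) per undirected edge
        neighbors = {i: [] for i in range(n)}
        for (i, j) in psi:
            neighbors[i].append(j)
        msg = {e: np.ones(q) / q for e in psi}            # uniform initialization

        def incoming(i, exclude, messages):
            prod = np.ones(q)
            for k in neighbors[i]:
                if k != exclude:
                    prod = prod * messages[(k, i)]
            return prod

        for _ in range(iters):
            new = {}
            for (i, j) in psi:                            # message i -> j
                m = psi[(i, j)].T @ incoming(i, j, msg)   # sum over x_i of Psi(x_i, x_j) * prod(x_i)
                new[(i, j)] = m / m.sum()
            msg = new

        p = {}
        for (i, j) in psi:
            if i < j:
                b = psi[(i, j)] * np.outer(incoming(i, j, msg), incoming(j, i, msg))
                b = b / b.sum()
                p[(i, j)] = float(np.trace(b))            # p(i, j) = sum_t b_ij(t, t)
        return p

With the all-uniform initialization the messages in this loop never change (this is observation 2, stated next), so some symmetry breaking, such as the clamping discussed below, is needed before the beliefs become informative.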
49 Observation 2: In the case where the messages are initialized uniformly, the pairwise beliefs calculated by BP are only a function of the local potentials. [sent-106, score-0.36]
50 Proof: Due to the symmetry of the potentials and since the messages are initialized uniformly, all the messages in BP remain uniform. [sent-108, score-0.264]
51 Due to the symmetry of the potentials, if exact inference is used then conditioning on a single node xc = 1 and calculating conditional correlations P (xi = xj |xc = 1) should give exactly the same answer as the unconditional correlations p(i, j) = P (xi = xj ). [sent-112, score-0.832]
52 However, when BP inference is used, clamping the value of xc causes its outgoing messages to be nonuniform, and as these messages propagate through the graph they break the symmetry used in the proof of observation 2. [sent-113, score-0.546]
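One simple way to implement this conditioning is to zero out every disallowed state of the clamped node in the potentials before running BP; a sketch under the same assumptions as the BP sketch above (the helper name is ours, not from the paper):

    import numpy as np

    def clamp_node(psi, c, value, q):
        # Fix node c to `value` by zeroing all its other states in every potential
        # that involves c; running BP on the clamped potentials then gives the
        # conditional correlations P(x_i = x_j | x_c = value).
        mask = np.zeros(q)
        mask[value] = 1.0
        clamped = {}
        for (i, j), table in psi.items():
            table = table.copy()
            if i == c:
                table = table * mask[:, None]   # rows index x_i
            if j == c:
                table = table * mask[None, :]   # columns index x_j
            clamped[(i, j)] = table
        return clamped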
53 In some cases (e.g., when the graph is disconnected), conditioning on a single point does not break the symmetry throughout the graph, and additional points need to be clamped. [sent-117, score-0.346]
54 In order to evaluate the quality of the approximation provided by BP, we compared BP using conditioning and exact inference over the dataset shown in fig 2a. [sent-118, score-0.243]
55 Each row presents the clustering solution of exact inference and BP, and a scatter plot of the correlations over all of the edges using the two methods. [sent-120, score-0.658]
56 At the “low” temperature the approximation almost coincides with the exact values, but at the “high” temperature BP overestimates the correlation values. [sent-121, score-0.309]
57 Figure 3: Clustering results at a “low” temperature (upper row) and a “high” temperature (lower row). [sent-206, score-0.392]
58 The left and middle columns present clustering results of exact inference and of BP, respectively. [sent-207, score-0.381]
59 The right column compares the values of the correlations provided by the two methods. [sent-208, score-0.158]
60 At “low” temperature most of the correlations are close to 1, hence many edges appear as a single dot. [sent-210, score-0.336]
61 Approximate inference using Generalized Belief Propagation: Generalized Belief Propagation (GBP) algorithms [10] extend the BP algorithm by sending messages that are functions of clusters of variables, and have been shown to provide a better approximation than BP in many applications. [sent-211, score-0.314]
62 Can GBP improve the approximation of pairwise correlations in typical cuts? [sent-212, score-0.473]
63 Our empirical studies show that the performance and convergence of GBP over a general graph, obtained from arbitrary points in R^n, strongly depend on the initial choice of clusters (regions). [sent-213, score-0.209]
64 As also observed by Minka et al [5] a specific choice of clusters may yield worse results than BP, or may even cause GBP not to converge. [sent-214, score-0.189]
65 The clique size in a junction tree is of order 2^30, hence exact inference is infeasible. [sent-221, score-0.336]
66 We compare the correlations p(i, j) calculated using an extensive MCMC sampling procedure [9] to those calculated using GBP with the clusters being four neighboring pixels in the graph. [sent-222, score-0.415]
67 Figure 4c presents a comparison of the MCMC correlations with those calculated by GBP on a real 120x80 image shown in fig 4b with affinity based on color similarity. [sent-224, score-0.382]
68 Figure 4d presents the clustering results, which provide a segmentation of the image. [sent-225, score-0.327]
69 Figure 4: (a) Scatter plot of pairwise correlations in a 30x30 grid, using MCMC (SW-MC) [9] and GBP. [sent-252, score-0.359]
70 Each dot corresponds to the pairwise correlation of one edge at a specific temperature. [sent-253, score-0.272]
71 Notice the excellent correspondence between GBP and MCMC. (c) The same comparison performed over the image in (b). [sent-254, score-0.134]
72 4 Learning Affinity Matrices from Labeled Datasets: As noted in the introduction, using graphical models to compute typical cuts can also be advantageous for other aspects of the clustering problem, apart from computing p(i, j). [sent-256, score-0.394]
73 For example, in image segmentation where the nodes are pixels, one can define affinity based on color similarity, texture similarity or some combination of the two. [sent-259, score-0.289]
74 Hence the affinity between neighboring points i and j is defined by: W(i, j) = Σ_{k=1}^{K} α_k f_k(i, j). [sent-262, score-0.159]
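In code, the combined affinity is just a weighted sum over the per-feature affinity matrices; a one-line sketch (the feature matrices would be supplied by the user, e.g. color and texture similarities, and the function name is ours):

    import numpy as np

    def combined_affinity(features, alpha):
        # features: (K, n, n) array, one affinity matrix f_k per feature; alpha: (K,) weights
        # returns W with W[i, j] = sum_k alpha[k] * features[k, i, j]
        return np.tensordot(alpha, features, axes=1)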
75 In addition assume we are given a labeled training sample, which consists of the following: (i) A graph in which neighboring nodes are connected by edges. [sent-263, score-0.289]
76 This problem can be solved using the graphical model defined by the typical cut probability distribution (Equation 6). [sent-267, score-0.621]
77 Recall that the probability of a partition x is defined as P(x) = (1/Z) e^{-cut(x)} = (1/Z) e^{-Σ_{ij} (1 - δ(x_i - x_j)) W(i, j)} = (1/Z(α)) e^{-Σ_{k=1}^{K} α_k fcut_k(x)}   (8), where we have defined: fcut_k(x) = Σ_{ij} (1 - δ(x_i - x_j)) f_k(i, j). [sent-268, score-0.487]
78 fcutk (x) is the cut value defined by x when only taking into account the affinity function fk , hence it can be computed using the training sample. [sent-269, score-0.707]
79 Differentiating the log likelihood with respect to α_k gives the exponential family equation: ∂ ln P(x)/∂α_k = -fcut_k(x) + <fcut_k>_α   (9). Equation 9 gives an intuitive definition for the optimal α: the optimal α is the one for which <fcut_k>_α = fcut_k(x), i.e., [sent-270, score-0.531]
80 for optimal α the expected values of the cuts for each feature separately match exactly the values of these cuts in the training set. [sent-271, score-0.232]
81 To calculate the gradient explicitly, we use the linearity of expectation: <fcut_k>_α = Σ_{ij} <1 - δ(x_i - x_j)>_α f_k(i, j) = Σ_{ij} (1 - p_α(i, j)) f_k(i, j), where p_α(i, j) are the pairwise correlations for given values of α. [sent-273, score-0.669]
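Equation 9 together with this linearity-of-expectation step gives a straightforward gradient-ascent learning loop. The sketch below assumes some routine correlations_fn(alpha) (exact inference, BP/GBP, or MCMC) that returns p_α(i, j) for the current weights; all names and the learning rate are our own:

    import numpy as np

    def feature_cuts(x, edges, features):
        # fcut_k(x) = sum over edges (i, j) with x_i != x_j of f_k(i, j)
        fcut = np.zeros(features.shape[0])
        for (i, j) in edges:
            if x[i] != x[j]:
                fcut += features[:, i, j]
        return fcut

    def learn_alpha(x_train, edges, features, correlations_fn, steps=100, lr=0.01):
        # Gradient ascent on the log likelihood of the labeled partition x_train;
        # correlations_fn(alpha) must return a dict (i, j) -> p_alpha(i, j)
        K = features.shape[0]
        alpha = np.zeros(K)
        target = feature_cuts(x_train, edges, features)       # fcut_k(x) from the training data
        for _ in range(steps):
            p = correlations_fn(alpha)
            expected = np.zeros(K)
            for (i, j) in edges:                              # <fcut_k>_alpha via linearity
                expected += (1.0 - p[(i, j)]) * features[:, i, j]
            grad = -target + expected                         # equation 9
            alpha += lr * grad
        return alpha

At convergence the gradient vanishes, i.e. <fcut_k>_α = fcut_k(x) for every feature, which is exactly the optimality condition stated above.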
82 4.1 Combining learning and GBP approximate inference: We experimented with the learning algorithm on images, with the pixel grid as the graph and using GBP for approximating p_α(i, j). [sent-276, score-0.347]
83 There is one training image (fig 5a) but two different manual segmentations (fig 5b,c). [sent-280, score-0.138]
84 Figure 5d shows a novel image and figures 5e,f show two different pairwise correlations of this image using the learned α. [sent-283, score-0.519]
85 Nevertheless, the segmentation based on the learned affinity (figure 5e) ignores the shadows and segments the facial features from each other. [sent-290, score-0.243]
86 In contrast, a typical cut segmentation which uses a naive affinity function (combining the three color channels with uniform weights) segments mostly based on shadows (figure 5f). [sent-291, score-0.819]
87 5 Discussion: Pairwise clustering algorithms have a wide range of applicability due to their ability to find clusters with arbitrary shapes. [sent-292, score-0.267]
88 In this paper we have shown how pairwise clustering can be ... Figure 5: Left pane: A synthetic example for learning the affinity function. [sent-293, score-0.373]
89 The top row presents the training set: The input image (a), the clusters of the first (b) and second (c) experiments. [sent-294, score-0.26]
90 The bottom row presents the result of the learning algorithm: The input image (d), the marginal probabilities p(i, j) (Eqn. [sent-295, score-0.165]
91 The top row shows the learning data set: the input image (a), the pre-processed image (b), and the manual segmentation (invariant to shadows) (c). [sent-298, score-0.232]
92 The bottom row presents, from left to right, the pre-processed test image (d), an edge map produced by learning the shadow-invariant affinity (e) and an edge map produced by a naive affinity function, combining the 3 color channels with uniform weights (f). [sent-299, score-0.26]
93 The edge maps were computed by thresholding the pairwise correlations p(i,j) (Eqn. [sent-300, score-0.397]
94 mapped to an inference problem in a graphical model. [sent-304, score-0.242]
95 This equivalence allowed us to use the standard tools of graphical models: exact and approximate inference and ML learning. [sent-305, score-0.422]
96 We showed how to combine approximate inference and ML learning in the challenging problem of learning affinities for images from labeled data. [sent-306, score-0.251]
97 We have only begun to use the many tools of graphical models. [sent-307, score-0.14]
98 We are currently working on learning from unlabeled sets and on other approximate inference algorithms. [sent-308, score-0.171]
99 Self organization in vision: Stochastic clustering for image segmentation, perceptual grouping, and image database organization. [sent-319, score-0.332]
100 Learning and inferring image segmentations using the GBP typical cut. [sent-349, score-0.595]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 152 nips-2003-Pairwise Clustering and Graphical Models
Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss
Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1
2 0.26664162 189 nips-2003-Tree-structured Approximations by Expectation Propagation
Author: Yuan Qi, Tom Minka
Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1
3 0.21521248 107 nips-2003-Learning Spectral Clustering
Author: Francis R. Bach, Michael I. Jordan
Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors. 1
4 0.20360959 117 nips-2003-Linear Response for Approximate Inference
Author: Max Welling, Yee W. Teh
Abstract: Belief propagation on cyclic graphs is an efficient algorithm for computing approximate marginal probability distributions over single nodes and neighboring nodes in the graph. In this paper we propose two new algorithms for approximating joint probabilities of arbitrary pairs of nodes and prove a number of desirable properties that these estimates fulfill. The first algorithm is a propagation algorithm which is shown to converge if belief propagation converges to a stable fixed point. The second algorithm is based on matrix inversion. Experiments compare a number of competing methods.
5 0.15221249 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation
Author: Chen Yanover, Yair Weiss
Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1
6 0.15104213 46 nips-2003-Clustering with the Connectivity Kernel
7 0.13962758 113 nips-2003-Learning with Local and Global Consistency
8 0.13063437 111 nips-2003-Learning the k in k-means
9 0.12804486 73 nips-2003-Feature Selection in Clustering Problems
10 0.11653785 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
11 0.11542852 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method
12 0.11499294 87 nips-2003-Identifying Structure across Pre-partitioned Data
13 0.11392259 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
14 0.10141076 169 nips-2003-Sample Propagation
15 0.093512021 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
16 0.092909522 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
17 0.090009116 32 nips-2003-Approximate Expectation Maximization
18 0.08677955 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
19 0.086690322 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
20 0.086270064 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors