nips nips2003 nips2003-152 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss
Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. [sent-17, score-0.373]
2 In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. [sent-18, score-0.306]
3 In this paper we use the previously proposed typical cut framework for pairwise clustering. [sent-20, score-0.714]
4 We show an equivalence between calculating the typical cut and inference in an undirected graphical model. [sent-21, score-0.833]
5 We show that for clustering problems with hundreds of datapoints exact inference may still be possible. [sent-22, score-0.511]
6 For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. [sent-23, score-0.5]
7 We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. [sent-24, score-0.159]
8 Datasets of this type, where the two clusters are not easily described by a parametric model, can be successfully clustered using pairwise clustering algorithms [4, 6, 3]. [sent-26, score-0.468]
9 These algorithms start by building a graph whose vertices correspond to datapoints and edges exist between nearby points with a weight that decreases with distance. [sent-27, score-0.251]
10 Figure 1: Clustering as graph partitioning (following [8]). [sent-29, score-0.192]
11 Vertices correspond to datapoints and edges between adjacent pixels are weighted by the distance. [sent-30, score-0.17]
12 The minimal cut criterion finds clusterings that minimize cut(A, B). [sent-34, score-0.521]
13 The advantage of using the minimal cut criterion is that the optimal segmentation can be computed in polynomial time. [sent-35, score-0.599]
14 Since the cut value grows linearly with the number of edges cut, a single datapoint cut from its neighbors will often have a lower cut value than the desired clustering. [sent-37, score-1.501]
15 E.g., the minimal cut solution separates the full dot in figure 1, instead of the desired ‘N’ and ‘I’ clusters. [sent-38, score-0.475]
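This failure mode is easy to reproduce numerically. A minimal sketch, assuming cut(A, B) is the total affinity on edges crossing between the two groups; the toy affinity matrix and all names below are ours, not from the paper:

    import numpy as np

    def cut_value(W, A, B):
        # cut(A, B): sum of affinities on edges crossing between the two groups
        A, B = list(A), list(B)
        return W[np.ix_(A, B)].sum()

    # Toy symmetric affinity matrix: nodes 0-2 and 3-5 form two tight clusters
    # joined by one weak bridge (2-3); node 6 hangs off node 5 even more weakly.
    W = np.zeros((7, 7))
    for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
                    (2, 3, 0.5), (5, 6, 0.2)]:
        W[i, j] = W[j, i] = w

    print(cut_value(W, {0, 1, 2}, {3, 4, 5, 6}))  # desired split: cut = 0.5
    print(cut_value(W, {6}, {0, 1, 2, 3, 4, 5}))  # isolated point: cut = 0.2, so min-cut prefers it

Here the minimal cut prefers separating the single point 6 over the intended two-cluster split, which is exactly the trivial clustering described above.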
16 In order to avoid these trivial clusterings, several graph partitioning criteria have been proposed. [sent-39, score-0.223]
17 Shi and Malik suggested the normalized cut criterion which directly penalizes partitions where one of the groups is small, hence a separation of a single isolated datapoint is not favored. [sent-40, score-0.634]
18 Minimization of the normalized cut criterion is NP-Complete but it can be approximated using spectral methods. [sent-41, score-0.484]
19 Despite the success of spectral methods in a wide range of clustering problems, several problems remain. [sent-42, score-0.211]
20 The typical cut criterion, suggested by Blatt et al [1] and later by Gdalyahu et al [2], is based on a simple probabilistic model. [sent-45, score-0.701]
21 Thus performing MAP inference under this probability distribution will still lead to trivial segmentations. [sent-48, score-0.165]
22 For example, consider the pairwise correlation p(i, j), defined for any two neighboring nodes in the graph as the probability that they belong to the same segment: p(i, j) = Σ_{A,B} Pr(A, B) SAME(i, j; A, B)   (3), with SAME(i, j; A, B) defined as 1 iff (i ∈ A and j ∈ A) or (i ∈ B and j ∈ B). [sent-50, score-0.405]
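For a graph small enough to enumerate, p(i, j) can be computed directly from this definition, assuming (as in the typical cut framework) that a partition is drawn with probability Pr(A, B) ∝ exp(-cut(A, B)/T). A brute-force sketch with names of our own choosing:

    import numpy as np
    from itertools import product

    def pairwise_correlations_bruteforce(W, T):
        # p(i, j) = sum over all partitions of Pr(A, B) * SAME(i, j; A, B)
        n = W.shape[0]
        labelings, weights = [], []
        for x in product([0, 1], repeat=n):          # x[i] = 0 if i in A, 1 if i in B
            x = np.array(x)
            cut = sum(W[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])
            labelings.append(x)
            weights.append(np.exp(-cut / T))
        weights = np.array(weights) / np.sum(weights)     # normalize by Z
        p = np.zeros((n, n))
        for w, x in zip(weights, labelings):
            p += w * (x[:, None] == x[None, :])
        return p                                          # p[i, j] = Pr(i and j in the same segment)

This enumeration is only feasible for a handful of nodes; the point of the paper is to obtain p(i, j) without it.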
23 Referring again to the single isolated datapoint in figure 1: while that datapoint and its neighbors do not appear in the same cluster in the most probable partition, they do appear in the same cluster in the vast majority of partitions. [sent-51, score-0.286]
24 Hence the typical cut algorithm of Blatt et al consists of three stages: • Preprocessing: Construct the affinity matrix W so that each node will be connected to at most K neighbors. [sent-53, score-0.641]
25 • Estimating pairwise correlations: Use a Markov chain Monte-Carlo (MCMC) sampling method to estimate p(i, j) at each temperature T . [sent-55, score-0.318]
26 • Postprocessing: Define the typical cut partition as the connected components of the graph after removing any links for which p(i, j) < 1/2. [sent-56, score-0.704]
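The postprocessing stage is just a threshold followed by connected components; a minimal sketch (our own function and variable names, not code from the paper):

    import numpy as np

    def typical_cut_partition(edges, p, threshold=0.5):
        # edges: list of (i, j) pairs; p: dict mapping (i, j) -> estimated p(i, j).
        # Returns a cluster label per node: connected components of the graph
        # after removing every edge with p(i, j) < threshold.
        nodes = {v for e in edges for v in e}
        parent = {v: v for v in nodes}          # union-find over the kept edges

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        for (i, j) in edges:
            if p[(i, j)] >= threshold:
                parent[find(i)] = find(j)
        return {v: find(v) for v in nodes}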
27 At zero temperature all the datapoints reside in one cluster (this trivially minimizes the cut value), and at high temperatures every datapoint forms a separate cluster. [sent-60, score-0.785]
28 In this paper we show that calculating the typical cut is equivalent to performing inference in an undirected graphical model. [sent-61, score-0.797]
29 We use this equivalence to show that in problems with hundreds of datapoints, the typical cut may be calculated exactly. [sent-62, score-0.613]
30 We show that when exact inference is impossible, loopy belief propagation (BP) and generalized belief propagation (GBP) may give an excellent approximation with very little computational cost. [sent-63, score-0.537]
31 Finally, we use the standard algorithm for ML estimation in graphical models to derive a learning algorithm for affinity matrices based on labeled data. [sent-64, score-0.159]
32 To connect this to typical cuts we first define for every partition (A, B) a binary vector x such that x(i) = 0 if i ∈ A and x(i) = 1 if i ∈ B. [sent-66, score-0.273]
33 So far we have focused on partitioning the graph into two segments, but the equivalence holds for any number of segments q. [sent-68, score-0.293]
34 Let (A1 , A2 , · · · , Aq ) be a partitioning of the graph into q segments (note that these segments need not be connected in G). [sent-69, score-0.356]
35 Define cut(A_1, A_2, ..., A_q) in direct analogy to equation 1, and: Pr(A_1, A_2, ..., A_q) = (1/Z) e^{-(1/T) cut(A_1, A_2, ..., A_q)}   (6). The implication of observation 1 is that we can use the powerful tools of graphical models in the context of pairwise clustering. [sent-70, score-0.373]
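In graphical-model terms, this says the distribution over partitions is a Potts-like pairwise Markov random field: each edge (i, j) carries a potential that depends only on whether the two endpoint labels agree. A sketch of one such edge potential under that reading (the function name is ours):

    import numpy as np

    def edge_potential(w_ij, T, q):
        # Psi_ij(x_i, x_j) = exp(-w_ij / T) if x_i != x_j, and 1 otherwise,
        # so the product of potentials over all edges equals exp(-cut(A_1, ..., A_q) / T)
        psi = np.full((q, q), np.exp(-w_ij / T))
        np.fill_diagonal(psi, 1.0)
        return psi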
36 In subsequent sections we provide examples of the benefits of using graphical models to compute typical cuts. [sent-71, score-0.222]
37 3 Computing typical cuts using inference in a graphical model: Typical cuts have been successfully used for clustering of datapoints in R^n [1], using an expensive MCMC to calculate pairwise correlations, p(i, j). [sent-73, score-1.093]
38 Using inference algorithms we provide a deterministic and more efficient estimate of p(i, j). [sent-74, score-0.134]
39 More specifically, we use inference algorithms to compute the pairwise beliefs over neighboring nodes, b_ij(x_i, x_j), and calculate the pairwise correlation as p(i, j) = Σ_{t=1}^{q} b_ij(t, t). [sent-75, score-0.898]
40 In all other cases we must resort to approximate inference using the BP and the GBP algorithms. [sent-77, score-0.171]
41 The following subsections discuss exact and approximate inference for computing typical cuts. [sent-78, score-0.36]
42 3.1 Exact inference for typical cut clustering: The nature of real-life clustering problems seems to suggest that exact inference would be intractable due to the clique size of the junction tree. [sent-80, score-1.3]
43 As shown previously by Gdalyahu et al the typical cut criterion does sensible things: it does not favor segmentation of individual datapoints (as in minimal cut), nor is it fooled by narrow bridges between clusters (as in simple connected components). [sent-84, score-1.039]
44 However, while previous typical cut algorithms approximate p(i, j) using MCMC, in some cases, using the graphical model framework, we can calculate p(i, j) exactly and efficiently. [sent-85, score-0.687]
45 In example (a) the pairwise correlations were calculated exactly, while in example (b) we used BP. [sent-89, score-0.396]
46 3.2 Approximate inference for typical cut clustering: Although exact inference was shown to be possible, in the more common case it is infeasible, and p(i, j) can only be estimated using approximate inference algorithms. [sent-103, score-1.199]
47 In this section we discuss approximate inference using the BP and the GBP algorithms. [sent-104, score-0.171]
48 Approximate inference using Belief Propagation: In BP the pairwise beliefs over neighboring nodes, b_ij, are defined using the messages as: b_ij(x_i, x_j) = α Ψ_ij(x_i, x_j) Π_{x_k ∈ N(x_i)\x_j} m_ki(x_i) Π_{x_k ∈ N(x_j)\x_i} m_kj(x_j)   (7). Can this be used as an approximation for pairwise clustering? [sent-105, score-1.033]
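As a concrete illustration, a generic synchronous BP loop over these symmetric potentials might look as follows. This is our own sketch, not the authors' implementation; psi is assumed to hold a q x q table for each directed edge, with psi[(j, i)] equal to the transpose of psi[(i, j)]:

    import numpy as np

    def bp_pairwise_correlations(psi, n, q, iters=50):
        # Runs synchronous BP and returns p[(i, j)] = sum_t b_ij(t, t) per undirected edge
        neighbors = {i: [] for i in range(n)}
        for (i, j) in psi:
            neighbors[i].append(j)
        msg = {e: np.ones(q) / q for e in psi}            # uniform initialization

        def incoming(i, exclude, messages):
            prod = np.ones(q)
            for k in neighbors[i]:
                if k != exclude:
                    prod = prod * messages[(k, i)]
            return prod

        for _ in range(iters):
            new = {}
            for (i, j) in psi:                            # message i -> j
                m = psi[(i, j)].T @ incoming(i, j, msg)   # sum over x_i of Psi(x_i, x_j) * prod(x_i)
                new[(i, j)] = m / m.sum()
            msg = new

        p = {}
        for (i, j) in psi:
            if i < j:
                b = psi[(i, j)] * np.outer(incoming(i, j, msg), incoming(j, i, msg))
                b = b / b.sum()
                p[(i, j)] = float(np.trace(b))            # p(i, j) = sum_t b_ij(t, t)
        return p

With the all-uniform initialization the messages in this loop never change (this is observation 2, stated next), so some symmetry breaking, such as the clamping discussed below, is needed before the beliefs become informative.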
49 Observation 2: In the case where the messages are initialized uniformly, the pairwise beliefs calculated by BP are only a function of the local potentials. [sent-106, score-0.36]
50 Proof: Due to the symmetry of the potentials and since the messages are initialized uniformly, all the messages in BP remain uniform. [sent-108, score-0.264]
51 Due to the symmetry of the potentials, if exact inference is used then conditioning on a single node xc = 1 and calculating conditional correlations P (xi = xj |xc = 1) should give exactly the same answer as the unconditional correlations p(i, j) = P (xi = xj ). [sent-112, score-0.832]
52 However, when BP inference is used, clamping the value of xc causes its outgoing messages to be nonuniform, and as these messages propagate through the graph they break the symmetry used in the proof of observation 2. [sent-113, score-0.546]
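One simple way to implement this conditioning is to zero out every disallowed state of the clamped node in the potentials before running BP; a sketch under the same assumptions as the BP sketch above (the helper name is ours, not from the paper):

    import numpy as np

    def clamp_node(psi, c, value, q):
        # Fix node c to `value` by zeroing all its other states in every potential
        # that involves c; running BP on the clamped potentials then gives the
        # conditional correlations P(x_i = x_j | x_c = value).
        mask = np.zeros(q)
        mask[value] = 1.0
        clamped = {}
        for (i, j), table in psi.items():
            table = table.copy()
            if i == c:
                table = table * mask[:, None]   # rows index x_i
            if j == c:
                table = table * mask[None, :]   # columns index x_j
            clamped[(i, j)] = table
        return clamped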
53 In some cases (e.g., when the graph is disconnected), conditioning on a single point does not break the symmetry throughout the graph, and additional points need to be clamped. [sent-117, score-0.346]
54 In order to evaluate the quality of the approximation provided by BP, we compared BP using conditioning and exact inference over the dataset shown in fig 2a. [sent-118, score-0.243]
55 Each row presents the clustering solution of exact inference and BP, and a scatter plot of the correlations over all of the edges using the two methods. [sent-120, score-0.658]
56 At the “low” temperature the approximation almost coincides with the exact values, but at the “high” temperature BP overestimates the correlation values. [sent-121, score-0.309]
57 Figure 3: Clustering results at a “low” temperature (upper row) and a “high” temperature (lower row). [sent-206, score-0.392]
58 The left and middle columns present clustering results of exact inference and of BP, respectively. [sent-207, score-0.381]
59 The right column compares the values of the correlations provided by the two methods. [sent-208, score-0.158]
60 At “low” temperature most of the correlations are close to 1, hence many edges appear as a single dot. [sent-210, score-0.336]
61 Approximate inference using Generalized Belief Propagation: Generalized Belief Propagation (GBP) algorithms [10] extend the BP algorithm by sending messages that are functions of clusters of variables, and have been shown to provide a better approximation than BP in many applications. [sent-211, score-0.314]
62 Can GBP improve the approximation of pairwise correlations in typical cuts? [sent-212, score-0.473]
63 Our empirical studies show that the performance and convergence of GBP over a general graph, obtained from arbitrary points in R^n, strongly depend on the initial choice of clusters (regions). [sent-213, score-0.209]
64 As also observed by Minka et al [5] a specific choice of clusters may yield worse results than BP, or may even cause GBP not to converge. [sent-214, score-0.189]
65 The clique size in a junction tree is of order 2^30, hence exact inference is infeasible. [sent-221, score-0.336]
66 We compare the correlations p(i, j) calculated using an extensive MCMC sampling procedure [9] to those calculated using GBP with the clusters being four neighboring pixels in the graph. [sent-222, score-0.415]
67 Figure 4c presents a comparison of the MCMC correlations with those calculated by GBP on a real 120x80 image shown in fig 4b with affinity based on color similarity. [sent-224, score-0.382]
68 Figure 4d presents the clustering results, which provide a segmentation of the image. [sent-225, score-0.327]
69 Figure 4: (a) Scatter plot of pairwise correlations in a 30x30 grid, using MCMC (SW-MC) [9] and GBP. [sent-252, score-0.359]
70 Each dot corresponds to the pairwise correlation of one edge at a specific temperature. [sent-253, score-0.272]
71 Notice the excellent correspondence between GBP and MCMC. (c) The same comparison performed over the image in (b). [sent-254, score-0.134]
72 4 Learning Affinity Matrices from Labeled Datasets: As noted in the introduction, using graphical models to compute typical cuts can also be advantageous for other aspects of the clustering problem, apart from computing p(i, j). [sent-256, score-0.394]
73 For example, in image segmentation where the nodes are pixels, one can define affinity based on color similarity, texture similarity or some combination of the two. [sent-259, score-0.289]
74 Hence the affinity between neighboring points i and j is defined by: W(i, j) = Σ_{k=1}^{K} α_k f_k(i, j). [sent-262, score-0.159]
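In code, the combined affinity is just a weighted sum over the per-feature affinity matrices; a one-line sketch (the feature matrices would be supplied by the user, e.g. color and texture similarities, and the function name is ours):

    import numpy as np

    def combined_affinity(features, alpha):
        # features: (K, n, n) array, one affinity matrix f_k per feature; alpha: (K,) weights
        # returns W with W[i, j] = sum_k alpha[k] * features[k, i, j]
        return np.tensordot(alpha, features, axes=1)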
75 In addition assume we are given a labeled training sample, which consists of the following: (i) A graph in which neighboring nodes are connected by edges. [sent-263, score-0.289]
76 This problem can be solved using the graphical model defined by the typical cut probability distribution (Equation 6). [sent-267, score-0.621]
77 Recall that the probability of a partition x is defined as P(x) = (1/Z) e^{-cut(x)} = (1/Z) e^{-Σ_{ij} (1 - δ(x_i - x_j)) W(i, j)} = (1/Z(α)) e^{-Σ_{k=1}^{K} α_k fcut_k(x)}   (8), where we have defined: fcut_k(x) = Σ_{ij} (1 - δ(x_i - x_j)) f_k(i, j). [sent-268, score-0.487]
78 fcutk (x) is the cut value defined by x when only taking into account the affinity function fk , hence it can be computed using the training sample. [sent-269, score-0.707]
79 Differentiating the log likelihood with respect to α_k gives the exponential family equation: ∂ ln P(x)/∂α_k = -fcut_k(x) + <fcut_k>_α   (9). Equation 9 gives an intuitive definition for the optimal α: the optimal α is the one for which <fcut_k>_α = fcut_k(x), i.e., [sent-270, score-0.531]
80 for optimal α the expected values of the cuts for each feature separately match exactly the values of these cuts in the training set. [sent-271, score-0.232]
81 To calculate the gradient explicitly, we use the linearity of expectation: <fcut_k>_α = Σ_{ij} <1 - δ(x_i - x_j)>_α f_k(i, j) = Σ_{ij} (1 - p_α(i, j)) f_k(i, j), where p_α(i, j) are the pairwise correlations for given values of α. [sent-273, score-0.669]
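Equation 9 together with this linearity-of-expectation step gives a straightforward gradient-ascent learning loop. The sketch below assumes some routine correlations_fn(alpha) (exact inference, BP/GBP, or MCMC) that returns p_α(i, j) for the current weights; all names and the learning rate are our own:

    import numpy as np

    def feature_cuts(x, edges, features):
        # fcut_k(x) = sum over edges (i, j) with x_i != x_j of f_k(i, j)
        fcut = np.zeros(features.shape[0])
        for (i, j) in edges:
            if x[i] != x[j]:
                fcut += features[:, i, j]
        return fcut

    def learn_alpha(x_train, edges, features, correlations_fn, steps=100, lr=0.01):
        # Gradient ascent on the log likelihood of the labeled partition x_train;
        # correlations_fn(alpha) must return a dict (i, j) -> p_alpha(i, j)
        K = features.shape[0]
        alpha = np.zeros(K)
        target = feature_cuts(x_train, edges, features)       # fcut_k(x) from the training data
        for _ in range(steps):
            p = correlations_fn(alpha)
            expected = np.zeros(K)
            for (i, j) in edges:                              # <fcut_k>_alpha via linearity
                expected += (1.0 - p[(i, j)]) * features[:, i, j]
            grad = -target + expected                         # equation 9
            alpha += lr * grad
        return alpha

At convergence the gradient vanishes, i.e. <fcut_k>_α = fcut_k(x) for every feature, which is exactly the optimality condition stated above.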
82 4.1 Combining learning and GBP approximate inference: We experimented with the learning algorithm on images, with the pixel grid as the graph and using GBP for approximating p_α(i, j). [sent-276, score-0.347]
83 There is one training image (fig 5a) but two different manual segmentations (fig 5b,c). [sent-280, score-0.138]
84 Figure 5d shows a novel image and figures 5e,f show two different pairwise correlations of this image using the learned α. [sent-283, score-0.519]
85 Nevertheless, the segmentation based on the learned affinity (figure 5e) ignores the shadows and segments the facial features from each other. [sent-290, score-0.243]
86 In contrast, a typical cut segmentation which uses a naive affinity function (combining the three color channels with uniform weights) segments mostly based on shadows (figure 5f). [sent-291, score-0.819]
87 5 Discussion: Pairwise clustering algorithms have a wide range of applicability due to their ability to find clusters with arbitrary shapes. [sent-292, score-0.267]
88 In this paper we have shown how pairwise clustering can be ... Figure 5: Left pane: A synthetic example for learning the affinity function. [sent-293, score-0.373]
89 The top row presents the training set: The input image (a), the clusters of the first (b) and second (c) experiments. [sent-294, score-0.26]
90 The bottom row presents the result of the learning algorithm: The input image (d), the marginal probabilities p(i, j) (Eqn. [sent-295, score-0.165]
91 The top row shows the learning data set: the input image (a), the pre-processed image (b), and the manual segmentation (invariant to shadows) (c). [sent-298, score-0.232]
92 The bottom row presents, from left to right, the pre-processed test image (d), an edge map produced by learning the shadow-invariant affinity (e) and an edge map produced by a naive affinity function, combining the 3 color channels with uniform weights (f). [sent-299, score-0.26]
93 The edge maps were computed by thresholding the pairwise correlations p(i,j) (Eqn. [sent-300, score-0.397]
94 mapped to an inference problem in a graphical model. [sent-304, score-0.242]
95 This equivalence allowed us to use the standard tools of graphical models: exact and approximate inference and ML learning. [sent-305, score-0.422]
96 We showed how to combine approximate inference and ML learning in the challenging problem of learning affinities for images from labeled data. [sent-306, score-0.251]
97 We have only begun to use the many tools of graphical models. [sent-307, score-0.14]
98 We are currently working on learning from unlabeled sets and on other approximate inference algorithms. [sent-308, score-0.171]
99 Self organization in vision: Stochastic clustering for image segmentation, perceptual grouping, and image database organization. [sent-319, score-0.332]
100 Learning and inferring image segmentations using the GBP typical cut. [sent-349, score-0.595]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 152 nips-2003-Pairwise Clustering and Graphical Models
Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss
Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1
2 0.26664162 189 nips-2003-Tree-structured Approximations by Expectation Propagation
Author: Yuan Qi, Tom Minka
Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1
3 0.21521248 107 nips-2003-Learning Spectral Clustering
Author: Francis R. Bach, Michael I. Jordan
Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors. 1
4 0.20360959 117 nips-2003-Linear Response for Approximate Inference
Author: Max Welling, Yee W. Teh
Abstract: Belief propagation on cyclic graphs is an efficient algorithm for computing approximate marginal probability distributions over single nodes and neighboring nodes in the graph. In this paper we propose two new algorithms for approximating joint probabilities of arbitrary pairs of nodes and prove a number of desirable properties that these estimates fulfill. The first algorithm is a propagation algorithm which is shown to converge if belief propagation converges to a stable fixed point. The second algorithm is based on matrix inversion. Experiments compare a number of competing methods.
5 0.15221249 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation
Author: Chen Yanover, Yair Weiss
Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1
6 0.15104213 46 nips-2003-Clustering with the Connectivity Kernel
7 0.13962758 113 nips-2003-Learning with Local and Global Consistency
8 0.13063437 111 nips-2003-Learning the k in k-means
9 0.12804486 73 nips-2003-Feature Selection in Clustering Problems
10 0.11653785 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
11 0.11542852 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method
12 0.11499294 87 nips-2003-Identifying Structure across Pre-partitioned Data
13 0.11392259 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering
14 0.10141076 169 nips-2003-Sample Propagation
15 0.093512021 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
16 0.092909522 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
17 0.090009116 32 nips-2003-Approximate Expectation Maximization
18 0.08677955 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
19 0.086690322 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
20 0.086270064 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors