nips nips2000 nips2000-79 knowledge-graph by maker-knowledge-mining

79 nips-2000-Learning Segmentation by Random Walks


Source: pdf

Author: Marina Meila, Jianbo Shi

Abstract: We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a new view of image segmentation by pairwise similarities. [sent-5, score-0.391]

2 We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. [sent-6, score-0.68]

3 This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. [sent-7, score-0.608]

4 Finally, the framework provides a principled method for learning the similarity function as a combination of features. [sent-9, score-0.16]

5 1 Introduction This paper focuses on pairwise (or similarity-based) clustering and image segmentation. [sent-10, score-0.288]

6 In contrast to statistical clustering methods, which assume a probabilistic model that generates the observed data points (or pixels), pairwise clustering defines a similarity function between pairs of points and then formulates a criterion (e.g. [sent-11, score-0.544]

7 maximum total intracluster similarity) that the clustering must optimize. [sent-13, score-0.119]

8 The optimality criteria quantify the intuitive notion that points in a cluster (or pixels in a segment) are similar, whereas points in different clusters are dissimilar. [sent-14, score-0.227]

9 An increasingly popular approach to similarity based clustering and segmentation is by spectral methods. [sent-15, score-0.628]

10 These methods use eigenvalues and eigenvectors of a matrix constructed from the pairwise similarity function. [sent-16, score-0.433]

11 Spectral methods are sometimes regarded as continuous approximations of previously formulated discrete graph theoretical criteria, as in the image segmentation method of [9], or as in the web clustering method of [4, 2]. [sent-17, score-0.576]

12 In spite of their practical successes, spectral methods are still incompletely understood. [sent-19, score-0.204]

13 The main achievement of this work is to show that there is a simple probabilistic interpretation that can offer insights and serve as an analysis tool for all the spectral methods cited above. [sent-20, score-0.29]

14 We view the pairwise similarities as edge flows in a Markov random walk and study the properties of the eigenvectors and values of the resulting transition matrix. [sent-21, score-0.667]

15 Using this view, we were able to show that several of the above methods are subsumed by the Normalized Cut (NCut) segmentation algorithm of [9] in a sense that will be described. [sent-22, score-0.289]

16 Therefore, in the following, we will focus on the NCut algorithm and will adopt the terminology of image segmentation (i.e. [sent-23, score-0.343]

17 the data points are pixels and the set of all pixels is the image), keeping in mind that all the results are also valid for similarity based clustering. [sent-25, score-0.314]

18 A probabilistic interpretation of NCut as a Markov random walk not only sheds new light on why and how spectral methods work in segmentation, but also offers a principled way of learning the similarity function. [sent-26, score-0.593]

19 A segmented image can provide a "target" transition matrix, to which a learning algorithm matches the "learned" transition probabilities in KL divergence. [sent-27, score-0.394]

20 The latter are output by a model as a function of a set of features measured from the training image. [sent-28, score-0.039]

21 Experimental results on learning to segment objects with smooth and rounded shapes are described in section 6. [sent-30, score-0.146]

22 2 The Normalized Cut criterion and algorithm Here and in the following, an image will be represented by a set of pixels I. [sent-31, score-0.288]

23 A segmentation is a partitioning of I into mutually disjoint subsets. [sent-32, score-0.273]

24 For each pair of pixels i, j ∈ I a similarity S_ij = S_ji ≥ 0 is given. [sent-33, score-0.198]

25 In the NCut framework the similarities Sij are viewed as weights on the edges ij of a graph G over I. [sent-34, score-0.163]

26 The matrix S = [Sij] plays the role of a "real-valued" adjacency matrix for G. [sent-35, score-0.14]

27 Let d_i = Σ_{j∈I} S_ij, called the degree of node i, and let the volume of a set A ⊂ I be vol(A) = Σ_{i∈A} d_i. The set of edges between A and its complement Ā is an edge cut, or shortly a cut. [sent-36, score-0.335]

28 The normalized cut (NCut) criterion of [9] is a graph theoretical criterion for segmenting an image into two parts by minimizing (1) over all cuts A, Ā. [sent-37, score-0.489]

29 Minimizing NCut means finding a cut of relatively small weight between two subsets with strong internal connections. [sent-38, score-0.119]

30 The NCut algorithm was introduced in [9] as a continuous approximation for solving the discrete minimum NCut problem by way of eigenvalues and eigenvectors. [sent-40, score-0.109]

31 It uses the Laplacian matrix L = D - S where D is a diagonal matrix formed with the degrees of the nodes. [sent-41, score-0.14]

32 The algorithm consists of solving the generalized eigenvalue/eigenvector problem Lx = λDx (2). The NCut algorithm focuses on the second smallest eigenvalue of (2) and its corresponding eigenvector, call them λ^L and x^L respectively. [sent-42, score-0.164]

33 In [9] it is shown that when there is a partitioning A, Ā of I such that x^L_i = α for i ∈ A and x^L_i = β for i ∈ Ā (3), then A, Ā is the optimal NCut and the value of the cut itself is NCut(A, Ā) = λ^L. This result represents the basis of spectral segmentation by normalized cuts. [sent-43, score-0.604]

34 One solves the generalized spectral problem (2), then finds a partitioning of the elements of x^L into two sets containing roughly equal values. [sent-44, score-0.276]

35 The partitioning can be done by thresholding the elements. [sent-45, score-0.066]

36 The partitioning of the eigenvector induces a partition on I which is the desired segmentation. [sent-46, score-0.203]
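To make the algorithm above concrete, here is a minimal numerical sketch (not the authors' code; the function name and the median threshold are illustrative assumptions) of two-way NCut segmentation: form D and L = D - S, solve Lx = λDx, take the second smallest generalized eigenvector x^L, and threshold its elements.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(S):
    """Two-way spectral segmentation in the NCut style (illustrative sketch).

    S : (n, n) symmetric, non-negative similarity matrix over the pixel set I.
    Returns a boolean mask selecting one of the two segments.
    """
    d = S.sum(axis=1)           # degrees d_i = sum_j S_ij
    D = np.diag(d)
    L = D - S                   # Laplacian L = D - S
    lam, X = eigh(L, D)         # generalized problem L x = lambda D x, eigenvalues ascending
    x2 = X[:, 1]                # second smallest generalized eigenvector x^L
    return x2 > np.median(x2)   # threshold the elements to induce a partition of I
```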

37 As presented above, the NCut algorithm lacks a satisfactory intuitive explanation. [sent-47, score-0.061]

38 In particular, the NCut algorithm and criterion offer little intuition about (1) what causes x^L to be piecewise constant? [sent-48, score-0.249]

39 (2) what happens when there are more than two segments, and (3) how does the algorithm's performance degrade when x^L is not piecewise constant? [sent-49, score-0.144]

40 The random walk interpretation that we describe now will answer the first two questions as well as give a better understanding of what spectral clustering is achieving. [sent-50, score-0.537]

41 We shall not approach the third issue here: instead, we point to the results of [2] that apply to the NCut algorithm as well. [sent-51, score-0.037]

42 3 Markov walks and normalized cuts By "normalizing" the similarity matrix S one obtains the stochastic matrix P = D^{-1}S (4), whose row sums are all 1. [sent-52, score-0.456]

43 As it is known from the theory of Markov random walks, Pij represents the probability of moving from node i to j in one step, given that we are in i. [sent-53, score-0.077]

44 The first eigenvector of P is x_1 = 1, the vector whose elements are all 1s. [sent-60, score-0.094]
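As a small illustration of (4) (a sketch in code, not taken from the paper), the row normalization and the constant eigenvector can be checked directly:

```python
import numpy as np

def transition_matrix(S):
    """P = D^{-1} S: divide each row i of the symmetric similarity matrix S by its degree d_i."""
    return S / S.sum(axis=1, keepdims=True)

# Every row of P sums to 1, so P @ np.ones(n) == np.ones(n):
# the constant vector x_1 = 1 is a right eigenvector of P with eigenvalue 1.
```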

45 Let us now examine the spectral problem for the matrix P, namely the solutions of the equation Px = λx (5). [sent-65, score-0.293]

46 Proposition 1: If λ, x are solutions of (5) and P = D^{-1}S, then (1 - λ), x are solutions of (2). In other words, the NCut algorithm and the matrix P have the same eigenvectors; the eigenvalues of P are identical to 1 minus the generalized eigenvalues in (2). [sent-66, score-0.304]

47 Proposition 1 shows the equivalence between the spectral problem formulated by the NCut algorithm and the eigenvalues/vectors of the stochastic matrix P. [sent-67, score-0.347]

48 This also helps explain why the NCut algorithm uses the second smallest generalized eigenvector: the smallest eigenvector of (2) corresponds to the largest eigenvector of P, which in most cases of interest is equal to 1, thus containing no information. [sent-68, score-0.338]
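Proposition 1 is easy to sanity-check numerically; the snippet below (a throwaway check on a random symmetric S, not from the paper) compares the eigenvalues of P with one minus the generalized eigenvalues of (2):

```python
import numpy as np
from scipy.linalg import eig, eigh

rng = np.random.default_rng(0)
A = rng.random((6, 6))
S = (A + A.T) / 2.0                              # random symmetric similarity matrix
d = S.sum(axis=1)

lam_gen, _ = eigh(np.diag(d) - S, np.diag(d))    # generalized eigenvalues of (2)
lam_P = np.real(eig(S / d[:, None])[0])          # eigenvalues of P = D^{-1} S (real, since P
                                                 # is similar to a symmetric matrix)

# Eigenvalues of P are exactly 1 - lambda for the generalized eigenvalues of (2).
print(np.allclose(np.sort(lam_P), np.sort(1.0 - lam_gen)))
```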

49 The NCut criterion can also be understood in this framework. [sent-69, score-0.077]

50 If the chain is ergodic, which happens under mild conditions [1], then [sent-77, score-0.035]

51 π^∞, the stationary distribution of the walk, is the only distribution over I with this property. [sent-78, score-0.092]

52 Note also that the Markov chain is reversible, since S is symmetric and therefore π^∞_i P_ij = π^∞_j P_ji for all i, j ∈ I. [sent-79, score-0.035]

53 Define P_AB = Pr[A → B | A] as the probability of the random walk transitioning from set A ⊂ I to set B ⊂ I in one step, if the current state is in A and the random walk is started in its stationary distribution. [sent-82, score-0.406]

54 P_AB = (Σ_{i∈A, j∈B} S_ij) / vol(A) (6). From this it follows that NCut(A, Ā) = P_AĀ + P_ĀA (7). If the NCut is small for a certain partition A, Ā then it means that the probabilities of evading set A, once the walk is in it, and of evading its complement Ā, are both small. [sent-85, score-0.355]
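Equations (6) and (7) translate directly into code; this sketch (illustrative only, with the two sets passed as integer index arrays) computes P_AĀ, P_ĀA and the resulting NCut value:

```python
import numpy as np

def p_set_transition(S, A, B):
    """P_AB of eq. (6): probability that the stationary walk, currently in A, moves into B in one step."""
    return S[np.ix_(A, B)].sum() / S[A].sum()   # sum_{i in A, j in B} S_ij / vol(A)

def ncut_value(S, A, A_bar):
    """NCut(A, A_bar) = P_{A A_bar} + P_{A_bar A}, as in eq. (7)."""
    return p_set_transition(S, A, A_bar) + p_set_transition(S, A_bar, A)
```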

55 Intuitively, we have partitioned the set I into two parts such that the random walk, once in one of the parts, tends to remain in it. [sent-86, score-0.049]

56 The NCut is strongly related to the concept of low conductivity sets in a Markov random walk. [sent-87, score-0.118]

57 A low conductivity set A is a subset of I such that h(A) = max(P_AĀ, P_ĀA) is small. [sent-88, score-0.069]

58 They have been studied in spectral graph theory in connection with the mixing time of Markov random walks [1]. [sent-89, score-0.381]

59 More recently, [2] uses them to define a new criterion for clustering. [sent-90, score-0.077]

60 4 Stochastic matrices with piecewise constant eigenvectors In the following we will use the transition matrix P to achieve a better understanding of the NCut algorithm. [sent-92, score-0.377]

61 Recall that the NCut algorithm looks at the second "largest" eigenvector of P, denoted by x_2 and equal to x^L, in order to obtain a partitioning of I. [sent-93, score-0.177]

62 We define a vector x to be piecewise constant relative to a partition Δ = (A_1, A_2, ..., [sent-94, score-0.176]

63 A_k) of I iff x_i = x_j for i, j pixels in the same set A_s, s = 1, ..., k. [sent-95, score-0.095]

64 Since having piecewise constant eigenvectors is the ideal case for spectral segmentation, it is important to understand when the matrix P has this desired property. [sent-99, score-0.525]

65 We study when the first k out of n eigenvectors are piecewise constant. [sent-100, score-0.208]

66 Proposition 2 Let P be a matrix with rows and columns indexed by I that has independent eigenvectors. [sent-101, score-0.07]

67 Then, P has k eigenvectors that are piecewise constant w.r.t. [sent-104, score-0.07]

68 Δ and correspond to non-zero eigenvalues if and only if the sums P_is = Σ_{j∈A_s} P_ij are constant over i within each set A_{s'}, for all s, s' = 1, ..., k. [sent-107, score-0.094]

69 Lemma 3 If the matrix P of dimension n is of the form P = D^{-1}S with S symmetric and D non-singular then P has n independent eigenvectors. [sent-113, score-0.07]

70 We call a stochastic matrix P satisfying the conditions of Proposition 2 a block-stochastic matrix. [sent-114, score-0.108]

71 Intuitively, Proposition 2 says that a stochastic matrix has piecewise constant eigenvectors if the underlying Markov chain can be aggregated into a Markov chain with state space Δ = {A_1, ..., A_k}. [sent-115, score-0.432]

72 This opens interesting connections between the field of spectral segmentation and the body of work on aggregability (or lumpability) [3] of Markov chains. [sent-119, score-0.406]

73 Proposition 2 shows that a much broader condition exists for the NCut algorithm to produce an exact segmentation/clustering solution. [sent-121, score-0.037]

74 This condition shows that in fact spectral clustering is able to group pixels by the similarity of their transition probabilities to subsets of I. [sent-122, score-0.623]
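The toy example below (hypothetical numbers, not from the paper) builds a similarity matrix with three blocks whose transition matrix satisfies the condition of Proposition 2, and prints its three leading eigenvectors, which come out piecewise constant over the blocks:

```python
import numpy as np

blocks = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]
S = np.full((15, 15), 0.05)               # weak similarity across blocks
for b in blocks:
    S[np.ix_(b, b)] = 1.0                 # strong similarity within each block
P = S / S.sum(axis=1, keepdims=True)      # block-stochastic transition matrix

lam, X = np.linalg.eig(P)
order = np.argsort(-lam.real)
# The three leading eigenvectors take a single value on each block, i.e. they are
# piecewise constant with respect to the partition {A_1, A_2, A_3}.
print(np.round(X[:, order[:3]].real, 3))
```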

75 Experiments [9] show that NCut works well on many graphs that have a sparse complex connection structure supporting this result with practical evidence. [sent-123, score-0.036]

76 The NCut algorithm and criterion are among the recently proposed spectral segmentation methods. [sent-125, score-0.52]

77 In image segmentation, there are algorithms of Perona and Freeman (PF) [7] and Scott and Longuet-Higgins (SLH) [8]. [sent-126, score-0.079]

78 In web clustering, there are the algorithms of Kleinberg [4] (K), the long-known latent semantic analysis (LSA), and the variant proposed by Kannan, Vempala and Vetta (KVV) [2]. [sent-127, score-0.03]

79 It is easy to show that each of the above ideal situations implies that the resulting stochastic matrix P satisfies the conditions of Proposition 2 and thus the NCut algorithm will also work exactly in these situations. [sent-128, score-0.187]

80 Another important aspect of a spectral clustering algorithm is robustness. [sent-131, score-0.335]

81 5 The framework for learning image segmentation The previous section stressed the connection between NCut as a criterion for image segmentation and searching for low conductivity sets in a random walk. [sent-133, score-0.875]

82 Here we will exploit this connection to develop a framework for supervised learning of image segmentation. [sent-134, score-0.147]

83 Our goal is to obtain an algorithm that starts with a training set of segmented images and with a set of features and learns a function of the features that produces correct segmentations, as shown in figure 1. [sent-135, score-0.146]

84 Figure 1: The general framework for learning image segmentation (the model's transition probabilities P_ij are matched to a human-labeled segmentation). [sent-142, score-0.111]

85 For simplicity, assume the training set consists of one image only and its correct segmentation. [sent-143, score-0.079]

86 From the latter it is easy to obtain "ideal" or target transition probabilities P*_ij. [sent-144, score-0.126]

87 P*_ij = 1/|A| for i, j in the same segment A with |A| elements, and 0 otherwise (8). We also have a predefined set of features f^q, q = 1, ..., [sent-148, score-0.061]

88 Q, which measure similarity between two pixels according to different criteria, and their values for I. [sent-151, score-0.237]

89 The model is the part of the framework that is subject to learning. [sent-152, score-0.032]

90 It takes the features f^q_ij as inputs and outputs the global similarity measure S_ij. [sent-153, score-0.142]

91 For the present experiments we use the simple model S_ij = exp(Σ_q λ_q f^q_ij). Intuitively, it represents a set of independent "experts", the factors exp(λ_q f^q_ij), voting on the probability of a transition i → j. [sent-154, score-0.073]
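A direct transcription of this model (a sketch; the array layout and function name are assumptions, not the authors' code) maps a stack of Q pairwise feature maps f^q_ij and a weight vector λ to S and P:

```python
import numpy as np

def similarity_and_transition(f, lam):
    """S_ij = exp(sum_q lam_q * f^q_ij) and P = D^{-1} S.

    f   : (Q, n, n) array of pairwise feature values f^q_ij.
    lam : (Q,) weight vector, one weight per independent "expert".
    """
    S = np.exp(np.tensordot(lam, f, axes=1))   # product of the experts' votes exp(lam_q * f^q_ij)
    P = S / S.sum(axis=1, keepdims=True)       # row-normalize to transition probabilities
    return S, P
```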

92 In our framework, based on the fact that a segmentation is equivalent to a random walk, optimality is defined as the minimization of the conditional Kullback-Leibler (KL) divergence between the target probabilities P*_ij and the transition probabilities P_ij obtained by normalizing S_ij. [sent-155, score-0.517]

93 max J, where J = (1/|I|) Σ_{i∈I} Σ_{j∈I} P*_ij log P_ij (9). If we interpret the factor 1/|I| as a uniform distribution π^0 over states, then the criterion in (9) is equivalent to the KL divergence between two distributions over transitions, KL(P*_{i→j} || P_{i→j}), with the state i drawn from π^0. [sent-158, score-0.155]

94 We obtain ∂J/∂λ_q = (1/|I|) (Σ_{ij} P*_ij f^q_ij − Σ_{ij} P_ij f^q_ij) (10). One can further note that the optimum of J corresponds to the solution of the following maximum entropy problem: max H(j|i) s.t. [sent-160, score-0.152]
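Combining (9) and (10) gives a simple gradient-ascent learner for the weights λ_q; the sketch below assumes dense arrays, a row-stochastic target P*, and a fixed step size (all illustrative choices, not specified above):

```python
import numpy as np

def learn_lambda(f, P_target, steps=500, lr=0.5):
    """Maximize J of eq. (9) over lambda under the model S_ij = exp(sum_q lam_q * f^q_ij)."""
    Q, n, _ = f.shape
    lam = np.zeros(Q)
    for _ in range(steps):
        S = np.exp(np.tensordot(lam, f, axes=1))
        P = S / S.sum(axis=1, keepdims=True)
        # Eq. (10): dJ/dlam_q = (1/|I|) * sum_ij (P*_ij - P_ij) * f^q_ij
        grad = np.tensordot(f, P_target - P, axes=([1, 2], [0, 1])) / n
        lam += lr * grad
    return lam
```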

95 subject to matching the expected feature values, Σ_{ij} P_ij f^q_ij = Σ_{ij} P*_ij f^q_ij for all q (11). 6 Segmentation with shape and region information In this section, we exemplify our approach on a set of synthetic and real images and we use features carrying contour and shape information. [sent-172, score-0.113]

96 First we use a set of local filter banks as edge detectors. [sent-173, score-0.116]

97 Figure 2: Features for segmenting objects with smooth, rounded shape. [sent-178, score-0.121]

98 (a) The edge strength provides a cue of region boundary. [sent-179, score-0.146]

99 It biases against random walks in a direction orthogonal to an edge. [sent-180, score-0.132]

100 The induced edge flow is used to bias the random walk along the edge, and transitions between co-circular edge flows are encouraged. [sent-182, score-0.533]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ncut', 0.759), ('segmentation', 0.227), ('spectral', 0.179), ('walk', 0.154), ('sij', 0.139), ('proposition', 0.133), ('clustering', 0.119), ('edge', 0.116), ('piecewise', 0.107), ('similarity', 0.103), ('eigenvectors', 0.101), ('cut', 0.098), ('pixels', 0.095), ('eigenvector', 0.094), ('roo', 0.092), ('walks', 0.083), ('image', 0.079), ('pij', 0.078), ('criterion', 0.077), ('transition', 0.073), ('eigenvalues', 0.072), ('matrix', 0.07), ('conductivity', 0.069), ('markov', 0.067), ('partitioning', 0.066), ('pairwise', 0.062), ('aa', 0.056), ('segmenting', 0.054), ('flows', 0.05), ('random', 0.049), ('evading', 0.046), ('partioning', 0.046), ('pss', 0.046), ('rounded', 0.046), ('slh', 0.046), ('kl', 0.045), ('partition', 0.043), ('ideal', 0.042), ('vol', 0.04), ('features', 0.039), ('similarities', 0.039), ('pf', 0.039), ('criteria', 0.039), ('stochastic', 0.038), ('ij', 0.038), ('algorithm', 0.037), ('interpretation', 0.036), ('connection', 0.036), ('cuts', 0.036), ('iei', 0.036), ('chain', 0.035), ('graph', 0.034), ('normalized', 0.034), ('complement', 0.033), ('probabilities', 0.033), ('framework', 0.032), ('op', 0.031), ('segmented', 0.031), ('smallest', 0.031), ('divergence', 0.031), ('ri', 0.031), ('generalized', 0.031), ('cue', 0.03), ('web', 0.03), ('intuitively', 0.029), ('node', 0.028), ('offer', 0.028), ('focuses', 0.028), ('optimality', 0.027), ('flow', 0.027), ('constant', 0.026), ('interpret', 0.026), ('methods', 0.025), ('xl', 0.025), ('principled', 0.025), ('shape', 0.025), ('intuitive', 0.024), ('contour', 0.024), ('normalizing', 0.024), ('pt', 0.024), ('formulated', 0.023), ('fi', 0.023), ('view', 0.023), ('segment', 0.022), ('solutions', 0.022), ('sums', 0.022), ('probabilistic', 0.022), ('smooth', 0.021), ('subsets', 0.021), ('transitions', 0.021), ('points', 0.021), ('largest', 0.02), ('edges', 0.02), ('target', 0.02), ('kannan', 0.02), ('ptj', 0.02), ('pare', 0.02), ('aggregated', 0.02), ('del', 0.02), ('pji', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 79 nips-2000-Learning Segmentation by Random Walks

Author: Marina Meila, Jianbo Shi

Abstract: We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features. 1

2 0.13016097 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Author: Naftali Tishby, Noam Slonim

Abstract: We introduce a new, non-parametric and principled, distance based clustering method. This method combines a pairwise based approach with a vector-quantization method which provide a meaningful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation, and then are extracted using the information bottleneck method. These clusters capture the information about the initial point of the relaxation in the most effective way. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution. 1

3 0.065547608 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.063450933 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

Author: James M. Coughlan, Alan L. Yuille

Abstract: Preliminary work by the authors made use of the so-called

5 0.06071604 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

6 0.056942705 134 nips-2000-The Kernel Trick for Distances

7 0.054930497 12 nips-2000-A Support Vector Method for Clustering

8 0.053333063 48 nips-2000-Exact Solutions to Time-Dependent MDPs

9 0.052539244 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

10 0.051577508 117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition

11 0.049875438 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

12 0.049179122 103 nips-2000-Probabilistic Semantic Video Indexing

13 0.048373863 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

14 0.048277996 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

15 0.047871314 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

16 0.047018398 30 nips-2000-Bayesian Video Shot Segmentation

17 0.046197969 121 nips-2000-Sparse Kernel Principal Component Analysis

18 0.044668134 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

19 0.044435844 136 nips-2000-The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

20 0.043342348 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.149), (1, -0.022), (2, 0.073), (3, 0.042), (4, -0.015), (5, 0.066), (6, 0.051), (7, -0.016), (8, -0.083), (9, 0.137), (10, -0.025), (11, 0.07), (12, 0.054), (13, 0.007), (14, -0.052), (15, 0.05), (16, -0.038), (17, 0.047), (18, 0.037), (19, 0.045), (20, -0.049), (21, 0.084), (22, -0.119), (23, 0.104), (24, 0.06), (25, 0.187), (26, -0.066), (27, -0.104), (28, 0.003), (29, -0.007), (30, 0.221), (31, 0.063), (32, 0.188), (33, 0.052), (34, 0.123), (35, 0.187), (36, 0.04), (37, -0.021), (38, -0.097), (39, 0.129), (40, -0.016), (41, 0.052), (42, 0.065), (43, 0.138), (44, -0.122), (45, 0.064), (46, 0.024), (47, 0.021), (48, 0.004), (49, 0.214)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95564705 79 nips-2000-Learning Segmentation by Random Walks

Author: Marina Meila, Jianbo Shi

Abstract: We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features. 1

2 0.62639213 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Author: Naftali Tishby, Noam Slonim

Abstract: We introduce a new, non-parametric and principled, distance based clustering method. This method combines a pairwise based approach with a vector-quantization method which provide a meaningful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation, and then are extracted using the information bottleneck method. These clusters capture the information about the initial point of the relaxation in the most effective way. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution. 1

3 0.46296695 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

Author: James M. Coughlan, Alan L. Yuille

Abstract: Preliminary work by the authors made use of the so-called

4 0.37603161 30 nips-2000-Bayesian Video Shot Segmentation

Author: Nuno Vasconcelos, Andrew Lippman

Abstract: Prior knowledge about video structure can be used both as a means to improve the peiformance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulations is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.

5 0.35209101 12 nips-2000-A Support Vector Method for Clustering

Author: Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik

Abstract: We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to define a sphere enclosing them. The boundary of the sphere forms in data space a set of closed contours containing the data. Data points enclosed by each contour are defined as a cluster. As the width parameter of the Gaussian kernel is decreased, these contours fit the data more tightly and splitting of contours occurs. The algorithm works by separating clusters according to valleys in the underlying probability distribution, and thus clusters can take on arbitrary geometrical shapes. As in other SV algorithms, outliers can be dealt with by introducing a soft margin constant leading to smoother cluster boundaries. The structure of the data is explored by varying the two parameters. We investigate the dependence of our method on these parameters and apply it to several data sets.

6 0.29472408 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

7 0.26612073 103 nips-2000-Probabilistic Semantic Video Indexing

8 0.26229215 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

9 0.26226106 116 nips-2000-Sex with Support Vector Machines

10 0.25508407 136 nips-2000-The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

11 0.24468754 32 nips-2000-Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes

12 0.23968297 48 nips-2000-Exact Solutions to Time-Dependent MDPs

13 0.2358411 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

14 0.23427364 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

15 0.22658472 117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition

16 0.22598876 44 nips-2000-Efficient Learning of Linear Perceptrons

17 0.22077447 112 nips-2000-Reinforcement Learning with Function Approximation Converges to a Region

18 0.21706238 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

19 0.21127477 22 nips-2000-Algorithms for Non-negative Matrix Factorization

20 0.20974414 18 nips-2000-Active Support Vector Machine Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.027), (17, 0.149), (32, 0.034), (33, 0.038), (36, 0.011), (55, 0.024), (62, 0.047), (65, 0.02), (67, 0.094), (75, 0.091), (76, 0.045), (79, 0.019), (81, 0.021), (90, 0.037), (97, 0.019), (98, 0.226)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84561056 79 nips-2000-Learning Segmentation by Random Walks

Author: Marina Meila, Jianbo Shi

Abstract: We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features. 1

2 0.78742814 133 nips-2000-The Kernel Gibbs Sampler

Author: Thore Graepel, Ralf Herbrich

Abstract: We present an algorithm that samples the hypothesis space of kernel classifiers. Given a uniform prior over normalised weight vectors and a likelihood based on a model of label noise leads to a piecewise constant posterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the line chosen. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated with label noise. For a simple toy example we demonstrate experimentally how a Bayes point machine based on the KGS outperforms an SVM that is incapable of taking into account label noise. 1

3 0.67304945 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

Author: Naftali Tishby, Noam Slonim

Abstract: We introduce a new, non-parametric and principled, distance based clustering method. This method combines a pairwise based approach with a vector-quantization method which provide a meaningful interpretation to the resulting clusters. The idea is based on turning the distance matrix into a Markov process and then examine the decay of mutual-information during the relaxation of this process. The clusters emerge as quasi-stable structures during this relaxation, and then are extracted using the information bottleneck method. These clusters capture the information about the initial point of the relaxation in the most effective way. The method can cluster data with no geometric or other bias and makes no assumption about the underlying distribution. 1

4 0.64456779 30 nips-2000-Bayesian Video Shot Segmentation

Author: Nuno Vasconcelos, Andrew Lippman

Abstract: Prior knowledge about video structure can be used both as a means to improve the peiformance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulations is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.

5 0.619196 74 nips-2000-Kernel Expansions with Unlabeled Examples

Author: Martin Szummer, Tommi Jaakkola

Abstract: Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is both discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.

6 0.61358052 70 nips-2000-Incremental and Decremental Support Vector Machine Learning

7 0.605919 122 nips-2000-Sparse Representation for Gaussian Process Models

8 0.60490882 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

9 0.60224742 134 nips-2000-The Kernel Trick for Distances

10 0.59837204 146 nips-2000-What Can a Single Neuron Compute?

11 0.59768832 4 nips-2000-A Linear Programming Approach to Novelty Detection

12 0.59295541 130 nips-2000-Text Classification using String Kernels

13 0.59293056 37 nips-2000-Convergence of Large Margin Separable Linear Classification

14 0.59209776 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

15 0.59114456 60 nips-2000-Gaussianization

16 0.59006691 22 nips-2000-Algorithms for Non-negative Matrix Factorization

17 0.58781123 111 nips-2000-Regularized Winnow Methods

18 0.58611256 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

19 0.58590961 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

20 0.58577168 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning