nips nips2004 nips2004-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. [sent-4, score-0.358]
2 We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. [sent-5, score-0.899]
3 Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. [sent-6, score-0.278]
4 We express clustering preferences in the prior distribution over assignments of data points to clusters. [sent-7, score-0.649]
5 This prior penalizes cluster assignments according to the degree with which they violate the preferences. [sent-8, score-0.291]
6 Experiments on a variety of data sets show that PPC can consistently improve clustering results. [sent-10, score-0.307]
7 1 Introduction While clustering is usually carried out completely unsupervised, there are circumstances in which we have a prior belief that pairs of samples should (or should not) be assigned to the same cluster. [sent-11, score-0.476]
8 Such pairwise relations may arise from a perceived similarity (or dissimilarity) between samples, or from a desire that the algorithmically generated clusters match the geometric cluster structure perceived by the experimenter in the original data. [sent-12, score-0.743]
9 Continuity, which suggests that neighboring pairs of samples in a time series or in an image are likely to belong to the same class of object, is also a source of clustering preferences. [sent-13, score-0.388]
10 We would like these preferences to be incorporated into the cluster structure so that the assignment of out-of-sample data to clusters captures the concept(s) that give rise to the preferences expressed in the training data. [sent-14, score-0.823]
11 Some work [1, 2, 3] has been done on adapting traditional clustering methods, such as K-means, to incorporate pairwise relations. [sent-15, score-0.48]
12 These models are based on hard clustering and the clustering preferences are expressed as hard pairwise constraints that must be satisfied. [sent-16, score-1.111]
13 Closest to our approach is the work of [4], who propose a Gaussian mixture model (GMM) for clustering that incorporates hard pairwise constraints. [sent-18, score-0.589]
14 In this paper, we propose a soft clustering algorithm based on GMM that expresses clustering preferences (in the form of pairwise relations) in the prior probability on assignments of data points to clusters. [sent-19, score-0.967]
15 This framework naturally accommodates both hard constraints and soft preferences, with the preferences expressed as a Bayesian prior probability that pairs of points should (or should not) be assigned to the same cluster. [sent-20, score-0.781]
16 Experiments on several datasets demonstrate that PPC can consistently improve the clustering result by incorporating reliable prior knowledge. [sent-22, score-0.353]
17 We consider data X = {x_i}, i = 1, . . . , N with (latent) cluster assignments Z = {z(x_i)}, i = 1, . . . , N. [sent-33, score-0.22]
18 The complete data likelihood is P (X, Z|Θ) = P (X|Z, Θ)P (Z|Θ). [sent-37, score-0.099]
19 2.1 Prior distribution in latent space We incorporate our clustering preferences by manipulating the prior probability P(Z|Θ). [sent-39, score-0.526]
20 In the standard Gaussian mixture model, the prior distribution is trivial: P(Z|Θ) = ∏_i π_{z_i}. [sent-40, score-0.109]
21 We incorporate prior knowledge (our clustering preferences) through a weighting function g(Z) that has large values when the assignment of data points to clusters Z conforms to our preferences, and low values when Z conflicts with our preferences. [sent-41, score-0.582]
22 Hence we write P(Z|Θ, G) = ∏_i π_{z_i} g(Z) / Σ_Z ∏_j π_{z_j} g(Z) ≡ (1/K) ∏_i π_{z_i} g(Z) (2), where the sum in the denominator is over all possible assignments of the data to clusters. [sent-42, score-0.126]
23 The likelihood of the data, given a specific cluster assignment, is independent of the cluster assignment preferences, and so the complete data likelihood is P(X, Z|Θ, G) = P(X|Z, Θ) (1/K) ∏_i π_{z_i} g(Z) = (1/K) P(X, Z|Θ) g(Z) (3), where P(X, Z|Θ) is the complete data likelihood for a standard GMM. [sent-43, score-0.592]
24 The data likelihood is the sum of the complete data likelihood over all possible Z, that is, L(X|Θ) = P(X|Θ, G) = Σ_Z P(X, Z|Θ, G), which can be maximized with the EM algorithm. [sent-44, score-0.161]
25 Once the model parameters are fit, we do soft clustering according to the posterior probabilities for new data p(α|x, Θ). [sent-45, score-0.423]
26 (Note that cluster assignment preferences are not expressed for the new data, only for the training data.) [sent-46, score-0.515]
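To make the out-of-sample step concrete, here is a minimal sketch (ours, not the paper's; the function and variable names are illustrative) of computing the posterior p(α|x, Θ) for new points under an already-fitted GMM; no pairwise weights enter here, matching the remark that preferences are expressed only for the training data.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_responsibilities(X, weights, means, covs):
    """Posterior p(alpha | x, Theta) for each row of X under a fitted GMM.

    X: (N, d) array of new data points.
    weights: (M,) mixture weights; means: (M, d); covs: (M, d, d).
    """
    N, M = X.shape[0], len(weights)
    log_post = np.empty((N, M))
    for k in range(M):
        log_post[:, k] = np.log(weights[k]) + \
            multivariate_normal.logpdf(X, mean=means[k], cov=covs[k])
    # normalize in log space for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return post
```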
27 2.2 Pairwise relations Pairwise relations provide a special case of the framework discussed above. [sent-48, score-0.508]
28 We specify two types of pairwise relations: • link: two samples should be assigned to the same cluster; • do-not-link: two samples should be assigned to different clusters. [sent-49, score-0.585]
29 The weighting factor given to the cluster assignment configuration Z is simple: g(Z) = ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)), where δ is the Kronecker δ-function and W^p_{ij} is the weight associated with the sample pair (x_i, x_j). [sent-50, score-0.541]
30 The weight W^p_{ij} reflects our preference for, and confidence in, assigning x_i and x_j to the same cluster. [sent-52, score-0.147]
31 We use a positive W^p_{ij} when we prefer to assign x_i and x_j to the same cluster (link), and a negative W^p_{ij} when we prefer to assign them to different clusters (do-not-link). [sent-53, score-0.363]
32 If W^p_{ij} = 0, we have no prior knowledge about the assignment relevancy of x_i and x_j. [sent-55, score-0.415]
33 In the extreme case where |W^p_{ij}| → ∞, assignments Z that violate the pairwise relation between x_i and x_j have zero prior probability, since for those assignments P(Z|Θ, G) = ∏_n π_{z_n} ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) / Σ_Z ∏_n π_{z_n} ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) → 0. [sent-56, score-1.124]
34 Such relations become hard constraints, while relations with |W^p_{ij}| < ∞ are called soft preferences. [sent-57, score-0.662]
35 In the remainder of this paper, we will use W^p to denote the prior knowledge on pairwise relations, that is, P(X, Z|Θ, W^p) = (1/K) P(X, Z|Θ) ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) (4). [sent-58, score-0.518]
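As an illustration of equations (2)-(4), the following sketch (ours; the dense weight matrix, its zero diagonal, and the helper names are assumptions made for the example) evaluates log g(Z) and the unnormalized complete-data term for a given assignment:

```python
import numpy as np

def log_weight(Z, W):
    """log g(Z) = sum over (i, j) of W[i, j] * delta(z_i, z_j).

    Z: (N,) integer cluster assignments.
    W: (N, N) pairwise weights W^p_ij (positive = link, negative = do-not-link,
       zero = no preference); assumed to have a zero diagonal, with any double
       counting of (i, j) and (j, i) absorbed into the weights.
    """
    same = (Z[:, None] == Z[None, :]).astype(float)
    return float(np.sum(W * same))

def log_complete_term(log_pxz, Z, W):
    """log of P(X, Z | Theta) g(Z), i.e. equation (4) up to the constant -log K.

    log_pxz: log P(X, Z | Theta) for the standard GMM at this assignment Z.
    """
    return log_pxz + log_weight(Z, W)
```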
36 However, the update of the prior probability of each component is more difficult than for the standard GMM: we need to find arg max_π Σ_{l=1}^{M} Σ_{i=1}^{N} log π_l P(l|x_i, Θ^(t−1), G) − log K(π). [sent-61, score-0.071]
37 2.4 Posterior inference and Gibbs sampling The M-step requires the cluster membership posterior. [sent-67, score-0.153]
38 Computing this posterior is simple for the standard GMM, since each data point x_i can be assigned to a cluster independently of the other data points and we have the familiar cluster origin posterior p(z_i = k|x_i, Θ). [sent-68, score-0.65]
39 If two sample points x_i and x_j participate in a pairwise relation, equation (4) tells us P(z_i, z_j|X, Θ, W^p) ≠ P(z_i|X, Θ, W^p) P(z_j|X, Θ, W^p), [sent-70, score-0.613]
40 and the posterior probabilities of x_i and x_j cannot be computed separately. [sent-71, score-0.249]
41 For pairwise relations, the joint posterior distribution must be calculated over the entire transitive closure of the “link” or “do-not-link” relations. [sent-72, score-0.361]
42 The posterior for a clique T is obtained by normalizing P(Z_T, X_T|Θ, W^p) by Σ_{Z_T} P(Z_T, X_T|Θ, W^p). Computing the posterior probability of a sample in clique T therefore requires time complexity O(M^{|T|}), where |T| is the size of clique T and M is the number of components in the mixture model. [sent-77, score-0.414]
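The cost O(M^{|T|}) comes from enumerating all joint assignments of the clique; the sketch below (our illustration, with assumed array layouts) does exactly that:

```python
import itertools
import numpy as np

def clique_posterior(log_px_given_k, W):
    """Exact posterior over joint assignments of a clique T by enumeration.

    log_px_given_k: (|T|, M) array of log [pi_k * N(x_i | mu_k, Sigma_k)]
                    for each sample i in the clique and each component k.
    W: (|T|, |T|) pairwise weights restricted to the clique (zero diagonal).
    Returns a dict mapping each joint assignment (a tuple of length |T|) to
    its posterior probability; the loop visits M^{|T|} assignments.
    """
    T, M = log_px_given_k.shape
    log_scores = {}
    for Z in itertools.product(range(M), repeat=T):
        Z = np.array(Z)
        pairwise = np.sum(W * (Z[:, None] == Z[None, :]))
        log_scores[tuple(Z.tolist())] = log_px_given_k[np.arange(T), Z].sum() + pairwise
    vals = np.array(list(log_scores.values()))
    vals = np.exp(vals - vals.max())
    vals /= vals.sum()
    return dict(zip(log_scores.keys(), vals))
```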
43 In some circumstances it is natural to limit ourselves to the special case of pairwise relations with |T| ≤ 2, called non-overlapping relations. [sent-80, score-0.281]
44 More generally, we can avoid the expensive computation in posterior inference by breaking large cliques into many small ones. [sent-83, score-0.221]
45 For some choices of g(Z), the posterior probability can be given in a simple form even when the clique is big. [sent-87, score-0.221]
46 This case is useful when we are sure that a group of samples all come from one source. [sent-89, score-0.099]
47 For more general cases, where exact inference is computationally prohibitive, we propose to use Gibbs sampling [6] to estimate the posterior probability. [sent-90, score-0.136]
48 Figure 2: (a) Overlapping pairwise relations; (b) Non-overlapping pairwise relations. [sent-91, score-0.135]
49 In Gibbs sampling, we estimate P(z_i|X, Θ, G) as a sample mean: P(z_i = k|X, Θ, G) = E(δ(z_i, k)|X, Θ, G) ≈ (1/S) Σ_{t=1}^{S} δ(z_i^(t), k), where the sum is over a sequence of S samples from P(Z|X, Θ, G) generated by the Gibbs MCMC. [sent-92, score-0.135]
50 Each Gibbs sweep draws each z_i in turn from its conditional given all the other assignments, e.g. z_N from P(z_N|z_1, . . . , z_{N−1}, X, G, Θ). For pairwise relations it is helpful to introduce some notation. [sent-102, score-0.491]
51 Let Z−i denote an assignment of data points to clusters that leaves out the assignment of xi . [sent-103, score-0.397]
52 Let U(i) be the indices of the set of samples that participate in a pairwise relation with sample x_i, U(i) = {j : W^p_{ij} ≠ 0}. [sent-104, score-0.468]
53 P(z_i|Z_{−i}, X, Θ, W^p) ∝ P(x_i, z_i|Θ) ∏_{j∈U(i)} exp(W^p_{ij} δ(z_i, z_j)) (5). When W^p is sparse, the size of U(i) is small, so calculating P(z_i|Z_{−i}, X, Θ, W^p) is very cheap and Gibbs sampling can effectively estimate the posterior probability. [sent-106, score-0.416]
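A runnable sketch of this Gibbs estimator (ours; the initialization, number of sweeps, burn-in, and array layout are choices made for the example, not the paper's):

```python
import numpy as np

def gibbs_posterior(log_px_given_k, W, num_sweeps=200, burn_in=50, rng=None):
    """Estimate P(z_i = k | X, Theta, W^p) by Gibbs sampling using equation (5).

    log_px_given_k: (N, M) array of log [pi_k * N(x_i | mu_k, Sigma_k)].
    W: (N, N) pairwise weight matrix with zero diagonal (sparse in practice).
    Returns an (N, M) array of posterior estimates (sample means of delta(z_i, k)).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, M = log_px_given_k.shape
    neighbors = [np.flatnonzero(W[i]) for i in range(N)]   # U(i) = {j : W_ij != 0}
    Z = log_px_given_k.argmax(axis=1)                      # simple initialization
    counts = np.zeros((N, M))
    for sweep in range(num_sweeps):
        for i in range(N):
            logits = log_px_given_k[i].copy()
            for j in neighbors[i]:
                logits += W[i, j] * (np.arange(M) == Z[j])  # adds W_ij if z_i == z_j
            p = np.exp(logits - logits.max())
            p /= p.sum()
            Z[i] = rng.choice(M, p=p)
        if sweep >= burn_in:
            counts[np.arange(N), Z] += 1
    return counts / (num_sweeps - burn_in)
```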
54 3.1 Clustering with different numbers of hard pairwise constraints In this experiment, we demonstrate how the number of pairwise relations affects the performance of clustering. [sent-108, score-0.852]
55 The Iris data set has 150 samples and three classes, 50 samples in each class; the Waveform data set has 5000 samples and three classes, with 33% of the samples in each class; the Pendigits data set includes four classes (digits 0, 6, 8, 9), each with 750 samples. [sent-110, score-0.511]
56 All data sets have labels for all samples, which are used to generate the relations and to evaluate performance. [sent-111, score-0.279]
57 We try PPC (with the number of components equal to the number of classes) with various numbers of pairwise relations. [sent-112, score-0.237]
58 For each number of relations, we conduct 100 runs and calculate the averaged classification accuracy. [sent-113, score-0.28]
59 The pairwise relations are generated as follows: we randomly pick two samples from the training set without replacement and check their labels. [sent-115, score-0.691]
60 Note that the generated pairwise relations are non-overlapping, as described in section 2. [sent-117, score-0.491]
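A sketch of this relation-generation step (ours; beyond "pick pairs without replacement and check their labels" the exact routine is not spelled out, so details such as the permutation are assumptions):

```python
import numpy as np

def sample_pairwise_relations(labels, num_relations, rng=None):
    """Draw non-overlapping pairwise relations from labeled training data.

    Samples are drawn without replacement, so each one appears in at most one
    relation; a pair with equal labels yields a 'link', otherwise 'do-not-link'.
    Requires 2 * num_relations <= len(labels).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    assert 2 * num_relations <= len(labels)
    idx = rng.permutation(len(labels))
    relations = []
    for r in range(num_relations):
        i, j = int(idx[2 * r]), int(idx[2 * r + 1])
        kind = 'link' if labels[i] == labels[j] else 'do-not-link'
        relations.append((i, j, kind))
    return relations
```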
61 As Fig. 3 indicates, PPC consistently improves its clustering accuracy on the training set when more pairwise constraints are added; the effect brought by the constraints also generalizes to the test set. [sent-123, score-0.684]
62 7 0 200 400 600 800 1000 Number of relations 1200 (c) on Pendigits data Figure 3: The performance of PPC with various number of relations 3. [sent-141, score-0.533]
63 3.2 Hard pairwise constraints for encoding partial labels The experiment in this subsection shows the application of pairwise constraints to partially labeled data. [sent-142, score-0.638]
64 The samples are partially labeled in the sense that we are told which class-set a sample is from, but not which specific class it is from. [sent-148, score-0.181]
65 We can logically derive a do-not-link constraint between any pair of samples known to belong to different class-sets, while no link constraint can be derived if each class-set has more than one class in it. [sent-149, score-0.162]
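A small sketch (ours) of deriving these do-not-link constraints from class-set labels; the list-of-pairs representation is just one convenient choice:

```python
def do_not_link_from_class_sets(class_set_of):
    """Derive do-not-link constraints from partial labels.

    class_set_of: sequence mapping each sample index to the class-set it is
    known to belong to (e.g. 'snow' / 'non-snow').  Every pair of samples from
    different class-sets gets a do-not-link constraint; no link constraints
    can be derived when a class-set contains more than one class.
    """
    pairs = []
    n = len(class_set_of)
    for i in range(n):
        for j in range(i + 1, n):
            if class_set_of[i] != class_set_of[j]:
                pairs.append((i, j))
    return pairs
```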
66 Fig. 4 (a) is a 120x400 region of the Greenland ice sheet from NASA Langley DAAC. [sent-151, score-0.073]
67 This region is partially labeled into a snow area and a non-snow area, as indicated in Fig. 4 (b). [sent-152, score-0.273]
68 The snow area can be ice, melting snow or dry snow, while the non-snow area can be bare land, water or cloud. [sent-154, score-0.498]
69 To segment the image, we first divide the image into 5x5x7 blocks (175-dim vectors). [sent-156, score-0.139]
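A sketch of this block-extraction step (ours; the helper name and the decision to discard incomplete border blocks are assumptions):

```python
import numpy as np

def image_to_block_vectors(image, block=5):
    """Divide a multi-channel image into non-overlapping block x block patches
    and flatten each patch into a feature vector.

    image: (H, W, C) array, e.g. C = 7 spectral channels, so a 5x5x7 block
    becomes a 175-dimensional vector.  Border rows/columns that do not fill a
    complete block are discarded.
    """
    H, W, C = image.shape
    H, W = H - H % block, W - W % block
    img = image[:H, :W, :]
    vecs = (img.reshape(H // block, block, W // block, block, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, block * block * C))
    return vecs
```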
70 For PPC, we use half of the data samples as the training set and the rest for testing. [sent-158, score-0.154]
71 Hard do-not-link constraints (on the training set only) are generated as follows: for each block in the non-snow area, we randomly choose (without replacement) six blocks from the snow area to build do-not-link constraints. [sent-159, score-0.467]
72 By doing this, we obtain cliques of size seven (1 non-snow block + 6 snow blocks). [sent-160, score-0.325]
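A sketch of this clique construction (ours; whether snow blocks may be reused across different cliques is not stated, so this is one plausible reading):

```python
import numpy as np

def build_do_not_link_cliques(non_snow_idx, snow_idx, per_block=6, rng=None):
    """For each non-snow training block, draw `per_block` snow blocks without
    replacement and create do-not-link constraints, giving cliques of size
    per_block + 1 (here 7: one non-snow block plus six snow blocks)."""
    rng = np.random.default_rng() if rng is None else rng
    cliques = []
    for i in non_snow_idx:
        chosen = rng.choice(snow_idx, size=per_block, replace=False)
        constraints = [(int(i), int(j), 'do-not-link') for j in chosen]
        cliques.append({'members': [int(i)] + [int(j) for j in chosen],
                        'constraints': constraints})
    return cliques
```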
73 As in section 3.1, we apply the model fitted with PPC to the test set and combine the clustering results on both data sets into a complete picture. [sent-162, score-0.277]
74 Typical clustering results of the 3-component standard GMM and the 3-component PPC are shown in Fig. 4 (c) and (d). [sent-163, score-0.215]
75 As Fig. 4 shows, the standard GMM gives a clustering that is clearly in disagreement with the human labeling in Fig. 4 (b). [sent-166, score-0.215]
76 The PPC segmentation makes far fewer mis-assignments of snow areas (tagged white and gray) to non-snow (black) than does the GMM. [sent-168, score-0.312]
77 The PPC segmentation properly labels almost all of the non-snow regions as non-snow. [sent-169, score-0.081]
78 Furthermore, the segmentation of the snow areas into the two classes (not labeled) tagged white and gray in Fig. [sent-170, score-0.39]
79 4 (d) reflects subtle differences in the snow regions captured by the gray-scale image from spectral channel 2, as shown in Fig. [sent-171, score-0.304]
80 Figure 4: (a) Gray-scale image from the first spectral channel. [sent-173, score-0.1]
81 (b) Partial label given by an expert: black pixels denote the non-snow area and white pixels denote the snow area. [sent-174, score-0.276]
82 (c) and (d) are colored according to the image blocks' cluster assignments. [sent-176, score-0.074]
83 3.3 Soft pairwise preferences for texture image segmentation In this subsection, we propose an unsupervised texture image segmentation algorithm as an application of the PPC model. [sent-178, score-0.893]
84 As in section 3.2, the image is divided into blocks and rearranged into feature vectors. [sent-180, score-0.19]
85 However, the standard GMM often fails to give a good segmentation because it cannot make use of the spatial continuity of the image, which is essential in many image segmentation models, such as random field models [7]. [sent-182, score-0.281]
86 In our algorithm, spatial continuity is incorporated as soft link preferences with a uniform weight w between each block and its neighbors. [sent-183, score-0.466]
87 The complete data likelihood is P(X, Z|Θ, W^p) = (1/K) P(X, Z|Θ) ∏_i ∏_{j∈U(i)} exp(w δ(z_i, z_j)) (6), where U(i) denotes the neighbors of the ith block. [sent-184, score-0.256]
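A sketch (ours) of building such neighbor-based soft links on the block grid; the 4-connected neighborhood is an assumption, since the text only says "its neighbors":

```python
import numpy as np

def spatial_link_weights(n_rows, n_cols, w=2.0):
    """Soft link weights encoding spatial continuity: each block gets a uniform
    weight w to its 4-connected neighbors on the block grid.

    Returns an (N, N) matrix with N = n_rows * n_cols, usable as W^p in
    equation (6); U(i) is then {j : W[i, j] != 0}.
    """
    N = n_rows * n_cols
    W = np.zeros((N, N))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = w
    return W
```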
88 The EM algorithm can be roughly interpreted as iterating two steps: 1) estimating the texture description (the parameters of the mixture model) based on the current segmentation, and 2) segmenting the image based on the texture description given by step 1. [sent-185, score-0.246]
89 Gibbs sampling is used to estimate the posterior probability in each EM iteration. [sent-186, score-0.136]
90 Equation (5) reduces to P(z_i|Z_{−i}, X, Θ, W^p) ∝ P(x_i, z_i|Θ) ∏_{j∈U(i)} exp(2w δ(z_i, z_j)). [sent-187, score-0.411]
91 This image is divided into 7x7 blocks and then rearranged to 49-dim vectors. [sent-190, score-0.19]
92 For the PPC model, the soft links are placed between each block and its neighbors. (Footnote 1: downloaded from http://sipi. ) [sent-192, score-0.129]
93 Typical clustering results of the 4-component standard GMM and the 4-component PPC with w = 2 are shown in Fig. [sent-197, score-0.215]
94 Obviously, PPC achieves a better segmentation after incorporating spatial continuity. [sent-200, score-0.081]
95 (c) and (d) are shaded according to the blocks' assignments to clusters. [sent-204, score-0.166]
96 4 Conclusion and Discussion We have proposed a probabilistic clustering model that incorporates prior knowledge in the form of pairwise relations between samples. [sent-205, score-0.83]
97 Unlike previous work in semi-supervised clustering, this work formulates clustering preferences as a Bayesian prior over the assignment of data points to clusters, and so naturally accommodates both hard constraints and soft preferences. [sent-206, score-0.939]
98 Experiments on different data sets show that pairwise relations can consistently improve the performance of the clustering process. [sent-208, score-0.798]
99 From instance-level to space-level constraints: making the most of prior knowledge in data clustering. [sent-228, score-0.123]
100 A multiscale random field model for Bayesian image segmentation. [sent-250, score-0.074]
wordName wordTfidf (topN-words)
[('ppc', 0.513), ('gmm', 0.265), ('relations', 0.254), ('zi', 0.254), ('pairwise', 0.237), ('clustering', 0.215), ('preferences', 0.212), ('snow', 0.204), ('wij', 0.185), ('zj', 0.157), ('clique', 0.119), ('cluster', 0.119), ('assignment', 0.119), ('zn', 0.102), ('posterior', 0.102), ('assignments', 0.101), ('samples', 0.099), ('xj', 0.087), ('gibbs', 0.084), ('soft', 0.081), ('segmentation', 0.081), ('zt', 0.08), ('image', 0.074), ('hard', 0.073), ('prior', 0.071), ('texture', 0.067), ('blocks', 0.065), ('link', 0.063), ('xi', 0.06), ('ice', 0.051), ('pendigits', 0.051), ('rearranged', 0.051), ('relevancy', 0.051), ('xt', 0.051), ('cliques', 0.051), ('constraints', 0.051), ('clusters', 0.049), ('classification', 0.049), ('links', 0.048), ('assigned', 0.047), ('em', 0.046), ('continuity', 0.045), ('area', 0.045), ('nasa', 0.045), ('accommodates', 0.045), ('circumstances', 0.044), ('block', 0.043), ('nineteenth', 0.041), ('consistently', 0.041), ('classes', 0.04), ('pick', 0.04), ('mixture', 0.038), ('tagged', 0.038), ('subsection', 0.038), ('shental', 0.038), ('likelihood', 0.037), ('complete', 0.037), ('sample', 0.036), ('waveform', 0.036), ('participate', 0.036), ('iris', 0.036), ('expressed', 0.035), ('sampling', 0.034), ('brought', 0.033), ('exp', 0.032), ('perceived', 0.031), ('replacement', 0.031), ('training', 0.03), ('six', 0.029), ('incorporate', 0.028), ('seven', 0.027), ('knowledge', 0.027), ('white', 0.027), ('ects', 0.026), ('channel', 0.026), ('incorporates', 0.026), ('calculating', 0.026), ('familiar', 0.026), ('items', 0.026), ('remainder', 0.026), ('improve', 0.026), ('averaged', 0.026), ('points', 0.025), ('data', 0.025), ('tted', 0.025), ('penalized', 0.024), ('prefer', 0.024), ('labeled', 0.024), ('arg', 0.023), ('weighting', 0.023), ('kmeans', 0.022), ('algorithmically', 0.022), ('sheet', 0.022), ('brodatz', 0.022), ('rogers', 0.022), ('closure', 0.022), ('hoping', 0.022), ('formulates', 0.022), ('told', 0.022), ('incorporated', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.18514213 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
3 0.14699851 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
4 0.12300111 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self– occlusions created by the imaging process lead to complex interpendencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
5 0.12078442 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional featurevector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
6 0.12001093 77 nips-2004-Hierarchical Clustering of a Mixture Model
7 0.1093149 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
8 0.10830852 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
9 0.10203286 161 nips-2004-Self-Tuning Spectral Clustering
10 0.084933594 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
11 0.084255628 103 nips-2004-Limits of Spectral Clustering
12 0.075550213 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
13 0.075304829 145 nips-2004-Parametric Embedding for Class Visualization
14 0.071271442 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
15 0.069504127 164 nips-2004-Semi-supervised Learning by Entropy Minimization
16 0.068907 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
17 0.067378655 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
18 0.063302457 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
19 0.062713325 87 nips-2004-Integrating Topics and Syntax
20 0.062462859 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
topicId topicWeight
[(0, -0.216), (1, 0.043), (2, -0.055), (3, -0.11), (4, 0.054), (5, -0.032), (6, -0.147), (7, 0.129), (8, -0.151), (9, 0.002), (10, -0.05), (11, 0.209), (12, -0.082), (13, -0.073), (14, 0.097), (15, 0.004), (16, -0.071), (17, 0.047), (18, -0.045), (19, -0.043), (20, -0.033), (21, -0.01), (22, 0.074), (23, -0.005), (24, 0.058), (25, -0.123), (26, 0.04), (27, 0.062), (28, 0.033), (29, -0.036), (30, 0.084), (31, 0.134), (32, -0.012), (33, -0.092), (34, 0.059), (35, 0.016), (36, -0.111), (37, -0.078), (38, 0.083), (39, 0.021), (40, -0.026), (41, 0.07), (42, 0.117), (43, -0.078), (44, 0.038), (45, 0.168), (46, 0.018), (47, 0.063), (48, -0.087), (49, -0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.96185517 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.66508609 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
3 0.57328969 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
4 0.57269764 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
Author: Massimiliano Pavan, Marcello Pelillo
Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1
5 0.55712473 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional featurevector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
6 0.52943444 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
7 0.52941573 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
8 0.50122541 158 nips-2004-Sampling Methods for Unsupervised Learning
9 0.48393387 77 nips-2004-Hierarchical Clustering of a Mixture Model
10 0.44869533 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
11 0.44435772 161 nips-2004-Self-Tuning Spectral Clustering
12 0.39432153 103 nips-2004-Limits of Spectral Clustering
13 0.38055557 43 nips-2004-Conditional Models of Identity Uncertainty with Application to Noun Coreference
14 0.37777901 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
15 0.37270841 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
16 0.36827323 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
17 0.36638466 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection
18 0.35783339 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
19 0.34206274 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
20 0.32991791 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization
topicId topicWeight
[(13, 0.083), (15, 0.2), (26, 0.053), (27, 0.108), (31, 0.037), (32, 0.033), (33, 0.214), (35, 0.042), (39, 0.014), (50, 0.033), (68, 0.06), (93, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.9470908 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.93707353 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
Author: Chiranjib Bhattacharyya, Pannagadatta K. Shivaswamy, Alex J. Smola
Abstract: We propose a convex optimization based strategy to deal with uncertainty in the observations of a classification problem. We assume that instead of a sample (xi , yi ) a distribution over (xi , yi ) is specified. In particular, we derive a robust formulation when the distribution is given by a normal distribution. It leads to Second Order Cone Programming formulation. Our method is applied to the problem of missing data, where it outperforms direct imputation. 1
3 0.92618191 44 nips-2004-Conditional Random Fields for Object Recognition
Author: Ariadna Quattoni, Michael Collins, Trevor Darrell
Abstract: We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes.
4 0.91258764 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
5 0.91162282 178 nips-2004-Support Vector Classification with Input Data Uncertainty
Author: Jinbo Bi, Tong Zhang
Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
6 0.91078365 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.90851295 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
8 0.90844208 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
9 0.90536141 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
10 0.90437955 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
11 0.9042933 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
12 0.90333074 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
13 0.90117061 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
14 0.90009385 131 nips-2004-Non-Local Manifold Tangent Learning
15 0.89967096 168 nips-2004-Semigroup Kernels on Finite Sets
16 0.89882308 127 nips-2004-Neighbourhood Components Analysis
17 0.89879251 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
18 0.89738643 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
19 0.89615119 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
20 0.89613771 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images