nips nips2004 nips2004-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. [sent-4, score-0.358]
2 We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. [sent-5, score-0.899]
3 Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. [sent-6, score-0.278]
4 We express clustering preferences in the prior distribution over assignments of data points to clusters. [sent-7, score-0.649]
5 This prior penalizes cluster assignments according to the degree with which they violate the preferences. [sent-8, score-0.291]
6 Experiments on a variety of data sets show that PPC can consistently improve clustering results. [sent-10, score-0.307]
7 1 Introduction While clustering is usually carried out completely unsupervised, there are circumstances in which we have a prior belief that pairs of samples should (or should not) be assigned to the same cluster. [sent-11, score-0.476]
8 Such pairwise relations may arise from a perceived similarity (or dissimilarity) between samples, or from a desire that the algorithmically generated clusters match the geometric cluster structure perceived by the experimenter in the original data. [sent-12, score-0.743]
9 Continuity, which suggests that neighboring pairs of samples in a time series or in an image are likely to belong to the same class of object, is also a source of clustering preferences. [sent-13, score-0.388]
10 We would like these preferences to be incorporated into the cluster structure so that the assignment of out-of-sample data to clusters captures the concept(s) that give rise to the preferences expressed in the training data. [sent-14, score-0.823]
11 Some work [1, 2, 3] has been done on adapting traditional clustering methods, such as K-means, to incorporate pairwise relations. [sent-15, score-0.48]
12 These models are based on hard clustering and the clustering preferences are expressed as hard pairwise constraints that must be satisfied. [sent-16, score-1.111]
13 Closest to our approach is the work of [4], who propose a Gaussian mixture model (GMM) for clustering that incorporates hard pairwise constraints. [sent-18, score-0.589]
14 In this paper, we propose a soft clustering algorithm based on GMM that expresses clustering preferences (in the form of pairwise relations) in the prior probability on assignments of data points to clusters. [sent-19, score-0.967]
15 This framework naturally accommodates both hard constraints and soft preferences, with the preferences expressed as a Bayesian prior probability that pairs of points should (or should not) be assigned to the same cluster. [sent-20, score-0.781]
16 Experiments on several datasets demonstrate that PPC can consistently improve the clustering result by incorporating reliable prior knowledge. [sent-22, score-0.353]
17 We consider data X = {x_i}, i = 1, . . . , N with (latent) cluster assignments Z = {z(x_i)}, i = 1, . . . , N. [sent-33, score-0.22]
18 The complete data likelihood is P (X, Z|Θ) = P (X|Z, Θ)P (Z|Θ). [sent-37, score-0.099]
19 2.1 Prior distribution in latent space We incorporate our clustering preferences by manipulating the prior probability P(Z|Θ). [sent-39, score-0.526]
20 In the standard Gaussian mixture model, the prior distribution is trivial: P(Z|Θ) = ∏_i π_{z_i}. [sent-40, score-0.109]
21 We incorporate prior knowledge (our clustering preferences) through a weighting function g(Z) that has large values when the assignment of data points to clusters Z conforms to our preferences, and low values when Z conflicts with our preferences. [sent-41, score-0.582]
22 Hence we write P(Z|Θ, G) = ∏_i π_{z_i} g(Z) / Σ_Z ∏_j π_{z_j} g(Z) ≡ (1/K) ∏_i π_{z_i} g(Z) (2), where the sum in the denominator is over all possible assignments of the data to clusters. [sent-42, score-0.126]
23 The likelihood of the data, given a specific cluster assignment, is independent of the cluster assignment preferences, and so the complete data likelihood is P(X, Z|Θ, G) = P(X|Z, Θ) (1/K) ∏_i π_{z_i} g(Z) = (1/K) P(X, Z|Θ) g(Z) (3), where P(X, Z|Θ) is the complete data likelihood for a standard GMM. [sent-43, score-0.592]
24 The data likelihood is the sum of the complete data likelihood over all possible Z, that is, L(X|Θ) = P(X|Θ, G) = Σ_Z P(X, Z|Θ, G), which can be maximized with the EM algorithm. [sent-44, score-0.161]
25 Once the model parameters are fit, we do soft clustering according to the posterior probabilities for new data p(α|x, Θ). [sent-45, score-0.423]
26 (Note that cluster assignment preferences are not expressed for the new data, only for the training data.) [sent-46, score-0.515]
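To make the out-of-sample step concrete, here is a minimal sketch (ours, not the paper's; the function and variable names are illustrative) of computing the posterior p(α|x, Θ) for new points under an already-fitted GMM; no pairwise weights enter here, matching the remark that preferences are expressed only for the training data.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_responsibilities(X, weights, means, covs):
    """Posterior p(alpha | x, Theta) for each row of X under a fitted GMM.

    X: (N, d) array of new data points.
    weights: (M,) mixture weights; means: (M, d); covs: (M, d, d).
    """
    N, M = X.shape[0], len(weights)
    log_post = np.empty((N, M))
    for k in range(M):
        log_post[:, k] = np.log(weights[k]) + \
            multivariate_normal.logpdf(X, mean=means[k], cov=covs[k])
    # normalize in log space for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return post
```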
27 2.2 Pairwise relations Pairwise relations provide a special case of the framework discussed above. [sent-48, score-0.508]
28 We specify two types of pairwise relations: • link: two samples should be assigned to the same cluster; • do-not-link: two samples should be assigned to different clusters. [sent-49, score-0.585]
29 The weighting factor given to the cluster assignment configuration Z is simple: g(Z) = ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)), where δ is the Kronecker δ-function and W^p_{ij} is the weight associated with the sample pair (x_i, x_j). [sent-50, score-0.541]
30 The weight W^p_{ij} reflects our preference for, and confidence in, assigning x_i and x_j to the same cluster. [sent-52, score-0.147]
31 We use a positive W^p_{ij} when we prefer to assign x_i and x_j to the same cluster (link), and a negative W^p_{ij} when we prefer to assign them to different clusters (do-not-link). [sent-53, score-0.363]
32 If W^p_{ij} = 0, we have no prior knowledge about the assignment relevancy of x_i and x_j. [sent-55, score-0.415]
33 In the extreme case where |W^p_{ij}| → ∞, assignments Z that violate the pairwise relation between x_i and x_j have zero prior probability, since for those assignments P(Z|Θ, G) = ∏_n π_{z_n} ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) / Σ_Z ∏_n π_{z_n} ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) → 0. [sent-56, score-1.124]
34 Such relations become hard constraints, while relations with |W^p_{ij}| < ∞ are called soft preferences. [sent-57, score-0.662]
35 In the remainder of this paper, we will use W^p to denote the prior knowledge on pairwise relations, that is, P(X, Z|Θ, W^p) = (1/K) P(X, Z|Θ) ∏_{i,j} exp(W^p_{ij} δ(z_i, z_j)) (4). [sent-58, score-0.518]
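As an illustration of equations (2)-(4), the following sketch (ours; the dense weight matrix, its zero diagonal, and the helper names are assumptions made for the example) evaluates log g(Z) and the unnormalized complete-data term for a given assignment:

```python
import numpy as np

def log_weight(Z, W):
    """log g(Z) = sum over (i, j) of W[i, j] * delta(z_i, z_j).

    Z: (N,) integer cluster assignments.
    W: (N, N) pairwise weights W^p_ij (positive = link, negative = do-not-link,
       zero = no preference); assumed to have a zero diagonal, with any double
       counting of (i, j) and (j, i) absorbed into the weights.
    """
    same = (Z[:, None] == Z[None, :]).astype(float)
    return float(np.sum(W * same))

def log_complete_term(log_pxz, Z, W):
    """log of P(X, Z | Theta) g(Z), i.e. equation (4) up to the constant -log K.

    log_pxz: log P(X, Z | Theta) for the standard GMM at this assignment Z.
    """
    return log_pxz + log_weight(Z, W)
```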
36 However, the update of the prior probability of each component is more difficult than for the standard GMM: we need to find arg max_π Σ_{l=1}^{M} Σ_{i=1}^{N} log π_l P(l|x_i, Θ^(t−1), G) − log K(π). [sent-61, score-0.071]
37 2.4 Posterior inference and Gibbs sampling The M-step requires the cluster membership posterior. [sent-67, score-0.153]
38 Computing this posterior is simple for the standard GMM, since each data point x_i can be assigned to a cluster independently of the other data points and we have the familiar cluster origin posterior p(z_i = k|x_i, Θ). [sent-68, score-0.65]
39 If two sample points x_i and x_j participate in a pairwise relation, equation (4) tells us P(z_i, z_j|X, Θ, W^p) ≠ P(z_i|X, Θ, W^p) P(z_j|X, Θ, W^p), [sent-70, score-0.613]
40 and the posterior probabilities of x_i and x_j cannot be computed separately. [sent-71, score-0.249]
41 For pairwise relations, the joint posterior distribution must be calculated over the entire transitive closure of the “link” or “do-not-link” relations. [sent-72, score-0.361]
42 The posterior for a clique T is obtained by normalizing P(Z_T, X_T|Θ, W^p) by Σ_{Z_T} P(Z_T, X_T|Θ, W^p). Computing the posterior probability of a sample in clique T therefore requires time complexity O(M^{|T|}), where |T| is the size of clique T and M is the number of components in the mixture model. [sent-77, score-0.414]
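The cost O(M^{|T|}) comes from enumerating all joint assignments of the clique; the sketch below (our illustration, with assumed array layouts) does exactly that:

```python
import itertools
import numpy as np

def clique_posterior(log_px_given_k, W):
    """Exact posterior over joint assignments of a clique T by enumeration.

    log_px_given_k: (|T|, M) array of log [pi_k * N(x_i | mu_k, Sigma_k)]
                    for each sample i in the clique and each component k.
    W: (|T|, |T|) pairwise weights restricted to the clique (zero diagonal).
    Returns a dict mapping each joint assignment (a tuple of length |T|) to
    its posterior probability; the loop visits M^{|T|} assignments.
    """
    T, M = log_px_given_k.shape
    log_scores = {}
    for Z in itertools.product(range(M), repeat=T):
        Z = np.array(Z)
        pairwise = np.sum(W * (Z[:, None] == Z[None, :]))
        log_scores[tuple(Z.tolist())] = log_px_given_k[np.arange(T), Z].sum() + pairwise
    vals = np.array(list(log_scores.values()))
    vals = np.exp(vals - vals.max())
    vals /= vals.sum()
    return dict(zip(log_scores.keys(), vals))
```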
43 In some circumstances it is natural to limit ourselves to the special case of pairwise relations with |T| ≤ 2, called non-overlapping relations. [sent-80, score-0.281]
44 More generally, we can avoid the expensive computation in posterior inference by breaking large cliques into many small ones. [sent-83, score-0.221]
45 For some choices of g(Z), the posterior probability can be given in a simple form even when the clique is big. [sent-87, score-0.221]
46 This case is useful when we are sure that a group of samples all come from one source. [sent-89, score-0.099]
47 For more general cases, where exact inference is computationally prohibitive, we propose to use Gibbs sampling [6] to estimate the posterior probability. [sent-90, score-0.136]
48 Figure 2: (a) Overlapping pairwise relations; (b) Non-overlapping pairwise relations. [sent-91, score-0.135]
49 In Gibbs sampling, we estimate P(z_i|X, Θ, G) as a sample mean: P(z_i = k|X, Θ, G) = E(δ(z_i, k)|X, Θ, G) ≈ (1/S) Σ_{t=1}^{S} δ(z_i^(t), k), where the sum is over a sequence of S samples from P(Z|X, Θ, G) generated by the Gibbs MCMC. [sent-92, score-0.135]
50 Each Gibbs sweep draws each z_i in turn from its conditional given all the other assignments, e.g. z_N from P(z_N|z_1, . . . , z_{N−1}, X, G, Θ). For pairwise relations it is helpful to introduce some notation. [sent-102, score-0.491]
51 Let Z−i denote an assignment of data points to clusters that leaves out the assignment of xi . [sent-103, score-0.397]
52 Let U(i) be the indices of the set of samples that participate in a pairwise relation with sample x_i, U(i) = {j : W^p_{ij} ≠ 0}. [sent-104, score-0.468]
53 P(z_i|Z_{−i}, X, Θ, W^p) ∝ P(x_i, z_i|Θ) ∏_{j∈U(i)} exp(W^p_{ij} δ(z_i, z_j)) (5). When W^p is sparse, the size of U(i) is small, so calculating P(z_i|Z_{−i}, X, Θ, W^p) is very cheap and Gibbs sampling can effectively estimate the posterior probability. [sent-106, score-0.416]
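A runnable sketch of this Gibbs estimator (ours; the initialization, number of sweeps, burn-in, and array layout are choices made for the example, not the paper's):

```python
import numpy as np

def gibbs_posterior(log_px_given_k, W, num_sweeps=200, burn_in=50, rng=None):
    """Estimate P(z_i = k | X, Theta, W^p) by Gibbs sampling using equation (5).

    log_px_given_k: (N, M) array of log [pi_k * N(x_i | mu_k, Sigma_k)].
    W: (N, N) pairwise weight matrix with zero diagonal (sparse in practice).
    Returns an (N, M) array of posterior estimates (sample means of delta(z_i, k)).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, M = log_px_given_k.shape
    neighbors = [np.flatnonzero(W[i]) for i in range(N)]   # U(i) = {j : W_ij != 0}
    Z = log_px_given_k.argmax(axis=1)                      # simple initialization
    counts = np.zeros((N, M))
    for sweep in range(num_sweeps):
        for i in range(N):
            logits = log_px_given_k[i].copy()
            for j in neighbors[i]:
                logits += W[i, j] * (np.arange(M) == Z[j])  # adds W_ij if z_i == z_j
            p = np.exp(logits - logits.max())
            p /= p.sum()
            Z[i] = rng.choice(M, p=p)
        if sweep >= burn_in:
            counts[np.arange(N), Z] += 1
    return counts / (num_sweeps - burn_in)
```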
54 3.1 Clustering with different numbers of hard pairwise constraints In this experiment, we demonstrate how the number of pairwise relations affects the performance of clustering. [sent-108, score-0.852]
55 The Iris data set has 150 samples and three classes, 50 samples in each class; the Waveform data set has 5000 samples and three classes, with 33% of the samples in each class; the Pendigits data set includes four classes (digits 0, 6, 8, 9), each with 750 samples. [sent-110, score-0.511]
56 All data sets have labels for all samples, which are used to generate the relations and to evaluate performance. [sent-111, score-0.279]
57 We try PPC (with the number of components equal to the number of classes) with various numbers of pairwise relations. [sent-112, score-0.237]
58 For each number of relations, we conduct 100 runs and calculate the averaged classification accuracy. [sent-113, score-0.28]
59 The pairwise relations are generated as follows: we randomly pick two samples from the training set without replacement and check their labels. [sent-115, score-0.691]
60 Note that the generated pairwise relations are non-overlapping, as described in section 2. [sent-117, score-0.491]
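A sketch of this relation-generation step (ours; beyond "pick pairs without replacement and check their labels" the exact routine is not spelled out, so details such as the permutation are assumptions):

```python
import numpy as np

def sample_pairwise_relations(labels, num_relations, rng=None):
    """Draw non-overlapping pairwise relations from labeled training data.

    Samples are drawn without replacement, so each one appears in at most one
    relation; a pair with equal labels yields a 'link', otherwise 'do-not-link'.
    Requires 2 * num_relations <= len(labels).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    assert 2 * num_relations <= len(labels)
    idx = rng.permutation(len(labels))
    relations = []
    for r in range(num_relations):
        i, j = int(idx[2 * r]), int(idx[2 * r + 1])
        kind = 'link' if labels[i] == labels[j] else 'do-not-link'
        relations.append((i, j, kind))
    return relations
```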
61 As Fig. 3 indicates, PPC consistently improves its clustering accuracy on the training set when more pairwise constraints are added; the effect brought by the constraints also generalizes to the test set. [sent-123, score-0.684]
62 7 0 200 400 600 800 1000 Number of relations 1200 (c) on Pendigits data Figure 3: The performance of PPC with various number of relations 3. [sent-141, score-0.533]
63 3.2 Hard pairwise constraints for encoding partial labels The experiment in this subsection shows the application of pairwise constraints to partially labeled data. [sent-142, score-0.638]
64 The samples are partially labeled in the sense that we are told which class-set a sample is from, but not which specific class it is from. [sent-148, score-0.181]
65 We can logically derive a do-not-link constraint between any pair of samples known to belong to different class-sets, while no link constraint can be derived if each class-set has more than one class in it. [sent-149, score-0.162]
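A small sketch (ours) of deriving these do-not-link constraints from class-set labels; the list-of-pairs representation is just one convenient choice:

```python
def do_not_link_from_class_sets(class_set_of):
    """Derive do-not-link constraints from partial labels.

    class_set_of: sequence mapping each sample index to the class-set it is
    known to belong to (e.g. 'snow' / 'non-snow').  Every pair of samples from
    different class-sets gets a do-not-link constraint; no link constraints
    can be derived when a class-set contains more than one class.
    """
    pairs = []
    n = len(class_set_of)
    for i in range(n):
        for j in range(i + 1, n):
            if class_set_of[i] != class_set_of[j]:
                pairs.append((i, j))
    return pairs
```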
66 Fig. 4 (a) is a 120x400 region of the Greenland ice sheet from NASA Langley DAAC. [sent-151, score-0.073]
67 This region is partially labeled into a snow area and a non-snow area, as indicated in Fig. 4 (b). [sent-152, score-0.273]
68 The snow area can be ice, melting snow or dry snow, while the non-snow area can be bare land, water or cloud. [sent-154, score-0.498]
69 To segment the image, we first divide the image into 5x5x7 blocks (175-dim vectors). [sent-156, score-0.139]
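A sketch of this block-extraction step (ours; the helper name and the decision to discard incomplete border blocks are assumptions):

```python
import numpy as np

def image_to_block_vectors(image, block=5):
    """Divide a multi-channel image into non-overlapping block x block patches
    and flatten each patch into a feature vector.

    image: (H, W, C) array, e.g. C = 7 spectral channels, so a 5x5x7 block
    becomes a 175-dimensional vector.  Border rows/columns that do not fill a
    complete block are discarded.
    """
    H, W, C = image.shape
    H, W = H - H % block, W - W % block
    img = image[:H, :W, :]
    vecs = (img.reshape(H // block, block, W // block, block, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, block * block * C))
    return vecs
```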
70 For PPC, we use half of the data samples as the training set and the rest for testing. [sent-158, score-0.154]
71 Hard do-not-link constraints (on the training set only) are generated as follows: for each block in the non-snow area, we randomly choose (without replacement) six blocks from the snow area to build do-not-link constraints. [sent-159, score-0.467]
72 By doing this, we obtain cliques of size seven (1 non-snow block + 6 snow blocks). [sent-160, score-0.325]
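A sketch of this clique construction (ours; whether snow blocks may be reused across different cliques is not stated, so this is one plausible reading):

```python
import numpy as np

def build_do_not_link_cliques(non_snow_idx, snow_idx, per_block=6, rng=None):
    """For each non-snow training block, draw `per_block` snow blocks without
    replacement and create do-not-link constraints, giving cliques of size
    per_block + 1 (here 7: one non-snow block plus six snow blocks)."""
    rng = np.random.default_rng() if rng is None else rng
    cliques = []
    for i in non_snow_idx:
        chosen = rng.choice(snow_idx, size=per_block, replace=False)
        constraints = [(int(i), int(j), 'do-not-link') for j in chosen]
        cliques.append({'members': [int(i)] + [int(j) for j in chosen],
                        'constraints': constraints})
    return cliques
```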
73 As in section 3.1, we apply the model fitted with PPC to the test set and combine the clustering results on both data sets into a complete picture. [sent-162, score-0.277]
74 Typical clustering results of the 3-component standard GMM and the 3-component PPC are shown in Fig. 4 (c) and (d). [sent-163, score-0.215]
75 As Fig. 4 shows, the standard GMM gives a clustering that is clearly in disagreement with the human labeling in Fig. 4 (b). [sent-166, score-0.215]
76 The PPC segmentation makes far fewer mis-assignments of snow areas (tagged white and gray) to non-snow (black) than does the GMM. [sent-168, score-0.312]
77 The PPC segmentation properly labels almost all of the non-snow regions as non-snow. [sent-169, score-0.081]
78 Furthermore, the segmentation of the snow areas into the two classes (not labeled) tagged white and gray in Fig. [sent-170, score-0.39]
79 4 (d) reflects subtle differences in the snow regions captured by the gray-scale image from spectral channel 2, as shown in Fig. [sent-171, score-0.304]
80 Figure 4: (a) Gray-scale image from the first spectral channel. [sent-173, score-0.1]
81 (b) Partial label given by an expert: black pixels denote the non-snow area and white pixels denote the snow area. [sent-174, score-0.276]
82 (c) and (d) are colored according to the image blocks' cluster assignments. [sent-176, score-0.074]
83 3.3 Soft pairwise preferences for texture image segmentation In this subsection, we propose an unsupervised texture image segmentation algorithm as an application of the PPC model. [sent-178, score-0.893]
84 As in section 3.2, the image is divided into blocks and rearranged into feature vectors. [sent-180, score-0.19]
85 However, the standard GMM often fails to give a good segmentation because it cannot make use of the spatial continuity of the image, which is essential in many image segmentation models, such as random field models [7]. [sent-182, score-0.281]
86 In our algorithm, spatial continuity is incorporated as soft link preferences with a uniform weight w between each block and its neighbors. [sent-183, score-0.466]
87 The complete data likelihood is P(X, Z|Θ, W^p) = (1/K) P(X, Z|Θ) ∏_i ∏_{j∈U(i)} exp(w δ(z_i, z_j)) (6), where U(i) denotes the neighbors of the ith block. [sent-184, score-0.256]
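A sketch (ours) of building such neighbor-based soft links on the block grid; the 4-connected neighborhood is an assumption, since the text only says "its neighbors":

```python
import numpy as np

def spatial_link_weights(n_rows, n_cols, w=2.0):
    """Soft link weights encoding spatial continuity: each block gets a uniform
    weight w to its 4-connected neighbors on the block grid.

    Returns an (N, N) matrix with N = n_rows * n_cols, usable as W^p in
    equation (6); U(i) is then {j : W[i, j] != 0}.
    """
    N = n_rows * n_cols
    W = np.zeros((N, N))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = w
    return W
```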
88 The EM algorithm can be roughly interpreted as iterating two steps: 1) estimating the texture description (the parameters of the mixture model) based on the current segmentation, and 2) segmenting the image based on the texture description given by step 1. [sent-185, score-0.246]
89 Gibbs sampling is used to estimate the posterior probability in each EM iteration. [sent-186, score-0.136]
90 Equation (5) reduces to P(z_i|Z_{−i}, X, Θ, W^p) ∝ P(x_i, z_i|Θ) ∏_{j∈U(i)} exp(2w δ(z_i, z_j)). [sent-187, score-0.411]
91 This image is divided into 7x7 blocks and then rearranged to 49-dim vectors. [sent-190, score-0.19]
92 For the PPC model, the soft links are placed between each block and its neighbors. (Footnote 1: downloaded from http://sipi. ) [sent-192, score-0.129]
93 Typical clustering results of the 4-component standard GMM and the 4-component PPC with w = 2 are shown in Fig. [sent-197, score-0.215]
94 Obviously, PPC achieves a better segmentation after incorporating spatial continuity. [sent-200, score-0.081]
95 (c) and (d) are shaded according to the blocks' assignments to clusters. [sent-204, score-0.166]
96 4 Conclusion and Discussion We have proposed a probabilistic clustering model that incorporates prior knowledge in the form of pairwise relations between samples. [sent-205, score-0.83]
97 Unlike previous work in semi-supervised clustering, this work formulates clustering preferences as a Bayesian prior over the assignment of data points to clusters, and so naturally accommodates both hard constraints and soft preferences. [sent-206, score-0.939]
98 Experiments on different data sets show that pairwise relations can consistently improve the performance of the clustering process. [sent-208, score-0.798]
99 From instance-level to space-level constraints: making the most of prior knowledge in data clustering. [sent-228, score-0.123]
100 A multiscale random field model for Bayesian image segmentation. [sent-250, score-0.074]
wordName wordTfidf (topN-words)
[('ppc', 0.513), ('gmm', 0.265), ('relations', 0.254), ('zi', 0.254), ('pairwise', 0.237), ('clustering', 0.215), ('preferences', 0.212), ('snow', 0.204), ('wij', 0.185), ('zj', 0.157), ('clique', 0.119), ('cluster', 0.119), ('assignment', 0.119), ('zn', 0.102), ('posterior', 0.102), ('assignments', 0.101), ('samples', 0.099), ('xj', 0.087), ('gibbs', 0.084), ('soft', 0.081), ('segmentation', 0.081), ('zt', 0.08), ('image', 0.074), ('hard', 0.073), ('prior', 0.071), ('texture', 0.067), ('blocks', 0.065), ('link', 0.063), ('xi', 0.06), ('ice', 0.051), ('pendigits', 0.051), ('rearranged', 0.051), ('relevancy', 0.051), ('xt', 0.051), ('cliques', 0.051), ('constraints', 0.051), ('clusters', 0.049), ('classification', 0.049), ('links', 0.048), ('assigned', 0.047), ('em', 0.046), ('continuity', 0.045), ('area', 0.045), ('nasa', 0.045), ('accommodates', 0.045), ('circumstances', 0.044), ('block', 0.043), ('nineteenth', 0.041), ('consistently', 0.041), ('classes', 0.04), ('pick', 0.04), ('mixture', 0.038), ('tagged', 0.038), ('subsection', 0.038), ('shental', 0.038), ('likelihood', 0.037), ('complete', 0.037), ('sample', 0.036), ('waveform', 0.036), ('participate', 0.036), ('iris', 0.036), ('expressed', 0.035), ('sampling', 0.034), ('brought', 0.033), ('exp', 0.032), ('perceived', 0.031), ('replacement', 0.031), ('training', 0.03), ('six', 0.029), ('incorporate', 0.028), ('seven', 0.027), ('knowledge', 0.027), ('white', 0.027), ('ects', 0.026), ('channel', 0.026), ('incorporates', 0.026), ('calculating', 0.026), ('familiar', 0.026), ('items', 0.026), ('remainder', 0.026), ('improve', 0.026), ('averaged', 0.026), ('points', 0.025), ('data', 0.025), ('tted', 0.025), ('penalized', 0.024), ('prefer', 0.024), ('labeled', 0.024), ('arg', 0.023), ('weighting', 0.023), ('kmeans', 0.022), ('algorithmically', 0.022), ('sheet', 0.022), ('brodatz', 0.022), ('rogers', 0.022), ('closure', 0.022), ('hoping', 0.022), ('formulates', 0.022), ('told', 0.022), ('incorporated', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.18514213 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
3 0.14699851 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
4 0.12300111 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self– occlusions created by the imaging process lead to complex interpendencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
5 0.12078442 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional featurevector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
6 0.12001093 77 nips-2004-Hierarchical Clustering of a Mixture Model
7 0.1093149 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
8 0.10830852 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
9 0.10203286 161 nips-2004-Self-Tuning Spectral Clustering
10 0.084933594 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
11 0.084255628 103 nips-2004-Limits of Spectral Clustering
12 0.075550213 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
13 0.075304829 145 nips-2004-Parametric Embedding for Class Visualization
14 0.071271442 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
15 0.069504127 164 nips-2004-Semi-supervised Learning by Entropy Minimization
16 0.068907 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
17 0.067378655 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
18 0.063302457 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
19 0.062713325 87 nips-2004-Integrating Topics and Syntax
20 0.062462859 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
topicId topicWeight
[(0, -0.216), (1, 0.043), (2, -0.055), (3, -0.11), (4, 0.054), (5, -0.032), (6, -0.147), (7, 0.129), (8, -0.151), (9, 0.002), (10, -0.05), (11, 0.209), (12, -0.082), (13, -0.073), (14, 0.097), (15, 0.004), (16, -0.071), (17, 0.047), (18, -0.045), (19, -0.043), (20, -0.033), (21, -0.01), (22, 0.074), (23, -0.005), (24, 0.058), (25, -0.123), (26, 0.04), (27, 0.062), (28, 0.033), (29, -0.036), (30, 0.084), (31, 0.134), (32, -0.012), (33, -0.092), (34, 0.059), (35, 0.016), (36, -0.111), (37, -0.078), (38, 0.083), (39, 0.021), (40, -0.026), (41, 0.07), (42, 0.117), (43, -0.078), (44, 0.038), (45, 0.168), (46, 0.018), (47, 0.063), (48, -0.087), (49, -0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.96185517 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.66508609 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
3 0.57328969 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
4 0.57269764 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
Author: Massimiliano Pavan, Marcello Pelillo
Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1
5 0.55712473 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional featurevector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
6 0.52943444 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
7 0.52941573 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
8 0.50122541 158 nips-2004-Sampling Methods for Unsupervised Learning
9 0.48393387 77 nips-2004-Hierarchical Clustering of a Mixture Model
10 0.44869533 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
11 0.44435772 161 nips-2004-Self-Tuning Spectral Clustering
12 0.39432153 103 nips-2004-Limits of Spectral Clustering
13 0.38055557 43 nips-2004-Conditional Models of Identity Uncertainty with Application to Noun Coreference
14 0.37777901 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
15 0.37270841 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
16 0.36827323 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
17 0.36638466 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection
18 0.35783339 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
19 0.34206274 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
20 0.32991791 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization
topicId topicWeight
[(13, 0.083), (15, 0.2), (26, 0.053), (27, 0.108), (31, 0.037), (32, 0.033), (33, 0.214), (35, 0.042), (39, 0.014), (50, 0.033), (68, 0.06), (93, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.9470908 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
2 0.93707353 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
Author: Chiranjib Bhattacharyya, Pannagadatta K. Shivaswamy, Alex J. Smola
Abstract: We propose a convex optimization based strategy to deal with uncertainty in the observations of a classification problem. We assume that instead of a sample (xi , yi ) a distribution over (xi , yi ) is specified. In particular, we derive a robust formulation when the distribution is given by a normal distribution. It leads to Second Order Cone Programming formulation. Our method is applied to the problem of missing data, where it outperforms direct imputation. 1
3 0.92618191 44 nips-2004-Conditional Random Fields for Object Recognition
Author: Ariadna Quattoni, Michael Collins, Trevor Darrell
Abstract: We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes.
4 0.91258764 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
5 0.91162282 178 nips-2004-Support Vector Classification with Input Data Uncertainty
Author: Jinbo Bi, Tong Zhang
Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
6 0.91078365 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.90851295 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
8 0.90844208 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
9 0.90536141 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
10 0.90437955 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
11 0.9042933 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
12 0.90333074 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
13 0.90117061 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
14 0.90009385 131 nips-2004-Non-Local Manifold Tangent Learning
15 0.89967096 168 nips-2004-Semigroup Kernels on Finite Sets
16 0.89882308 127 nips-2004-Neighbourhood Components Analysis
17 0.89879251 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
18 0.89738643 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
19 0.89615119 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
20 0.89613771 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images