nips nips2004 nips2004-115 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Maximum Margin Clustering. Linli Xu (University of Waterloo and University of Alberta), James Neufeld, Bryce Larson, and Dale Schuurmans (University of Alberta). Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. [sent-1, score-0.887]
2 By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. [sent-2, score-0.387]
3 Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. [sent-3, score-0.113]
4 Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. [sent-4, score-0.739]
5 Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. [sent-5, score-1.35]
6 The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. [sent-6, score-0.079]
7 By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. [sent-7, score-0.659]
8 Nevertheless, clustering has received a significant amount of renewed attention with the advent of nonlinear clustering methods based on kernels. [sent-9, score-0.381]
9 Kernel based clustering methods continue to have a significant impact on recent work in machine learning [14, 13], computer vision [16], and bioinformatics [9]. [sent-10, score-0.381]
10 In this paper, our primary focus will be on the final partitioning step where the actual clustering occurs. [sent-12, score-0.409]
11 Once the data has been preprocessed and a kernel matrix has been constructed (and its rank possibly reduced), many variants have been suggested in the literature for determining the final partitioning of the data. [sent-13, score-0.207]
12 The predominant strategies include using k-means clustering [14], minimizing various forms of graph cut cost [13] (relaxations of which amount to clustering based on eigenvectors [17]), and finding strongly connected components in a Markov chain defined by the normalized kernel [4]. [sent-14, score-0.953]
13 Some other recent alternatives are correlation clustering [12] and support vector clustering [1]. [sent-15, score-0.789]
14 What we believe is missing from this previous work, however, is a simple connection to other types of machine learning, such as semisupervised and supervised learning. [sent-16, score-0.376]
15 For example, a useful goal for any clustering technique would be to find a way to integrate it seamlessly with a supervised learning technique, to obtain a principled form of semisupervised learning. [sent-18, score-0.829]
16 A good example of this is [18], which proposes a general random field model based on a given kernel matrix. [sent-19, score-0.085]
17 They then find a soft cluster assignment on unlabeled data that minimizes a joint loss with observed labels on supervised training data. [sent-20, score-0.577]
18 Unfortunately, this technique actually requires labeled data to cluster the unlabeled data. [sent-21, score-0.405]
19 Our goal in this paper is to investigate another standard machine learning principle— maximum margin classification—and modify it for clustering, with the goal of achieving a simple, unified way of solving a variety of problems, including clustering and semisupervised learning. [sent-23, score-1.184]
20 Although one might be skeptical that clustering based on large margin discriminants can perform well, we will see below that, combined with kernels, this strategy can often be more effective than conventional spectral clustering. [sent-24, score-0.88]
21 Perhaps more significantly, it also immediately suggests a simple semisupervised training technique for support vector machines (SVMs) that appears to improve the state of the art. [sent-25, score-0.485]
22 After establishing the preliminary ideas and notation in Section 2, we tackle the problem of computing a maximum margin clustering for a given kernel matrix in Section 3. [sent-27, score-1.043]
23 Although it is not obvious that this problem can be solved efficiently, we show that the optimal clustering problem can in fact be formulated as a convex integer program. [sent-28, score-0.597]
24 We then propose a relaxation of this problem which yields a semidefinite program that can be used to efficiently compute a soft clustering. [sent-29, score-0.254]
25 Then, in Section 5 we extend our approach to semisupervised learning by incorporating additional labeled training data in a seamless way. [sent-31, score-0.493]
26 We then present experimental results for semisupervised learning in Section 6 and conclude. [sent-32, score-0.334]
27 2 Preliminaries. Since our main clustering idea is based on finding maximum margin separating hyperplanes, we first establish the necessary background from SVMs as well as the notation we will use. [sent-33, score-0.994]
28 For SVM training, we assume we are given labeled training examples $(x^1, y^1), \dots, (x^N, y^N)$. [sent-34, score-0.131]
29 It is easy to show that this same $w^*, b^*$ is a solution to the quadratic program $\gamma^{*-2} = \min_{w,b} \|w\|^2$ subject to $y^i (w^\top \phi(x^i) + b) \ge 1$ for all $i = 1, \dots, N$ (2). Importantly, the minimum value of this quadratic program, $\gamma^{*-2}$, is just the inverse square of the optimal solution value $\gamma^*$ to (1) [10]. [sent-39, score-0.313]
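To make the primal problem (2) concrete, here is a minimal sketch that assumes a linear kernel (so that $\phi$ is the identity map) and uses the cvxpy modelling library; the function and variable names are illustrative, not taken from the paper, and feasibility requires linearly separable data.

```python
# Minimal sketch of the primal problem (2) with phi taken to be the identity map.
import numpy as np
import cvxpy as cp

def inverse_square_margin(X, y):
    """Solve min ||w||^2 s.t. y_i (w'x_i + b) >= 1; the optimum is gamma^{*-2}."""
    N, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
    problem.solve()
    return w.value, b.value, problem.value
```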
30 Note that (3) is derived from the standard dual SVM by using the fact that $\lambda^\top (K \circ yy^\top) \lambda = \langle K \circ yy^\top, \lambda\lambda^\top \rangle = \langle K \circ \lambda\lambda^\top, yy^\top \rangle$. [sent-48, score-1.158]
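This rewriting is just a rearrangement of a quadratic form; the following toy numpy check (an illustration, not from the paper) confirms that the three expressions coincide.

```python
# Toy check: lambda'(K o yy')lambda = <K o yy', lambda lambda'> = <K o lambda lambda', yy'>.
import numpy as np

rng = np.random.default_rng(0)
N = 6
X = rng.normal(size=(N, 3))
K = X @ X.T                                   # any positive semidefinite kernel matrix
y = rng.choice([-1.0, 1.0], size=N)
lam = rng.uniform(0.0, 1.0, size=N)

lhs = lam @ (K * np.outer(y, y)) @ lam
mid = np.sum((K * np.outer(y, y)) * np.outer(lam, lam))
rhs = np.sum((K * np.outer(lam, lam)) * np.outer(y, y))
assert np.allclose([lhs, mid, rhs], lhs)      # all three quantities agree
```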
31 To summarize: for supervised maximum margin training, one takes a given set of labeled training data $(x^1, y^1), \dots, (x^N, y^N)$, [sent-49, score-0.67]
32 forms the kernel matrix $K$ on the data inputs, forms the kernel matrix $yy^\top$ on the target outputs, sets the slack parameter $C$, and solves the quadratic program (3) to obtain the dual solution $\lambda^*$ and the inverse square maximum margin value $\gamma^{*-2}$. [sent-52, score-1.497]
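A minimal sketch of a dual of this form is given below, with the offset b omitted (as the paper later does for clustering), so that only the box constraint $0 \le \lambda \le C$ remains; the factorization of $K \circ yy^\top$ is only there to express the quadratic term in a form cvxpy accepts, and the code is an illustrative reconstruction rather than the paper's implementation.

```python
# Hedged sketch of the dual soft-margin SVM without offset:
#   max_lambda  2 lambda'e - lambda'(K o yy')lambda   s.t.  0 <= lambda <= C.
import numpy as np
import cvxpy as cp

def dual_svm_margin(K, y, C=100.0):
    N = len(y)
    # Factor K = F F' so that lambda'(K o yy')lambda = ||F' diag(y) lambda||^2.
    eigvals, V = np.linalg.eigh(K)
    F = V * np.sqrt(np.clip(eigvals, 0.0, None))
    lam = cp.Variable(N)
    objective = 2 * cp.sum(lam) - cp.sum_squares(F.T @ cp.multiply(y, lam))
    problem = cp.Problem(cp.Maximize(objective), [lam >= 0, lam <= C])
    problem.solve()
    # In the separable, hard-margin limit the optimal value equals gamma^{*-2}.
    return lam.value, problem.value
```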
33 Of course, our main interest initially is not to find a large margin classifier given labels on the data, but instead to find a labeling that results in a large margin classifier. [sent-54, score-0.856]
34 3 Maximum margin clustering. The clustering principle we investigate is to find a labeling such that, if one were to subsequently run an SVM, the margin obtained would be maximal over all possible labelings. [sent-55, score-1.567]
35 Given data points $x^1, \dots, x^N$, we wish to assign them to two classes $y^i \in \{-1, +1\}$ so that the separation between the two classes is as wide as possible. [sent-58, score-0.116]
36 However, with some reformulation we can express it as a convex integer program, which suggests that there might be some hope of obtaining practical solutions. [sent-60, score-0.216]
37 More usefully, however, we can relax the integer constraint to obtain a semidefinite program that yields soft cluster assignments which approximately maximize the margin. [sent-61, score-0.51]
38 Therefore, one can obtain soft clusterings efficiently using widely available software. [sent-62, score-0.176]
39 However, before proceeding with the main development, there are some preliminary issues we need to address. [sent-63, score-0.07]
40 First, we clearly need to impose some sort of constraint on the class balance, since otherwise one could simply assign all the data points to the same class and obtain an unbounded margin. [sent-64, score-0.251]
41 Thus, to mitigate these effects we will impose a constraint that the difference in class sizes be bounded. [sent-66, score-0.178]
42 This will turn out to be a natural constraint for semisupervised learning and is very easy to enforce. [sent-67, score-0.429]
43 Second, we would like the clustering to behave gracefully on noisy data where the classes may in fact overlap, so we adopt the soft margin formulation of the maximum margin criterion. [sent-68, score-1.471]
44 Third, although it is indeed possible to extend our approach to the multiclass case [5], the extension is not simple and for ease of presentation we focus on simple two class clustering in this paper. [sent-69, score-0.485]
45 Finally, there is a small technical complication with one of the SVM parameters: an unfortunate nonconvexity problem arises when we include the offset b in the underlying large margin classifier. [sent-70, score-0.45]
46 The consequence of this restriction is that the constraint $\lambda^\top y = 0$ is removed from the dual SVM quadratic program (3). [sent-72, score-0.3]
47 We wish to solve for a labeling $y \in \{-1, +1\}^N$ that leads to a maximum (soft) margin. [sent-76, score-0.164]
48 In fact, to obtain an efficient technique for solving this problem we need two key insights. [sent-78, score-0.072]
49 The first key step is to re-express this optimization, not directly in terms of the cluster labels y, but instead in terms of the label kernel matrix $M = yy^\top$. [sent-79, score-0.604]
50 Unfortunately, even though we can pose a convex objective, it does not allow us to immediately solve our problem because we still have to relate M to y, and $M = yy^\top$ is not a convex constraint. [sent-82, score-0.623]
51 Thus, the main challenge is to find a way to constrain M to ensure $M = yy^\top$ while respecting the class balance constraint $-\ell \le e^\top y \le \ell$. [sent-83, score-0.555]
52 One obvious way to enforce $M = yy^\top$ would be to impose the constraint $\mathrm{rank}(M) = 1$, since combined with $M \in \{-1, +1\}^{N \times N}$ this forces M to have a decomposition $yy^\top$ for some $y \in \{-1, +1\}^N$. [sent-84, score-0.971]
53 Unfortunately, rank(M ) = 1 is not a convex constraint on M [7]. [sent-85, score-0.181]
54 Our second key idea is to realize that one can indirectly enforce the desired relationship $M = yy^\top$ by imposing a different set of linear constraints on M. [sent-86, score-0.611]
55 To do so, notice that any such M must encode an equivalence relation over the training points. [sent-87, score-0.223]
56 Finally, we can enforce the class balance constraint $-\ell \le e^\top y \le \ell$ by imposing an additional set of linear constraints on M. (Footnote 1: Interestingly, for $M \in \{-1, +1\}^{N \times N}$ the first two sets of linear constraints can be replaced by the compact set of convex constraints $\mathrm{diag}(M) = e$, $M \succeq 0$ [7, 11].) [sent-90, score-0.567]
57 However, when we relax the integer constraint below, this equivalence is no longer true and we realize some benefit in keeping the linear equivalence relation constraints. [sent-91, score-0.553]
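As a toy illustration (not from the paper) of why $M = yy^\top$ behaves like an equivalence relation: reflexivity, symmetry, transitivity, $\mathrm{diag}(M) = e$, and positive semidefiniteness can all be checked directly for any hard labeling.

```python
# Toy check: the label kernel M = yy' of a hard labeling y in {-1,+1}^N
# encodes an equivalence relation ("m_ij = 1" meaning "same cluster") and
# satisfies the relaxed constraints diag(M) = e, M positive semidefinite.
import numpy as np
from itertools import product

y = np.array([1, 1, -1, 1, -1])
M = np.outer(y, y)

assert np.all(np.diag(M) == 1)                        # reflexive
assert np.array_equal(M, M.T)                         # symmetric
assert np.min(np.linalg.eigvalsh(M)) >= -1e-9         # PSD (here rank one)
for i, j, k in product(range(len(y)), repeat=3):      # transitive
    if M[i, j] == 1 and M[j, k] == 1:
        assert M[i, k] == 1
```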
58 The combination of these two steps leads to our first main result: One can solve for a hard clustering y that maximizes the soft margin by solving a convex integer program. [sent-93, score-1.226]
59 To accomplish this, one first solves for the equivalence relation matrix M in $\min_{M \in \{-1,+1\}^{N \times N}} \max_{\lambda} \; 2\lambda^\top e - \langle K \circ \lambda\lambda^\top, M \rangle$ subject to $0 \le \lambda \le C$ and the constraints L1, L2, L4 (5). Then, from the solution $M^*$, recover the optimal cluster assignment $y^*$ simply by setting $y^*$ to any column vector of $M^*$. [sent-94, score-0.519]
60 Unfortunately, the formulation (5) is still not practical because convex integer programming is still a hard computational problem. [sent-95, score-0.339]
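Since the relaxed semidefinite program is what makes the approach practical, the following sketch reconstructs its general structure: the integer constraint on M is relaxed to $\mathrm{diag}(M) = e$, $M \succeq 0$, and the inner maximization over $\lambda$ is dualized via a generalized Schur complement. The multiplier scaling, the class-balance bound $\ell$, and other details are assumptions that may differ from the authors' formulation (7), and the cvxpy code is a sketch under those assumptions, not the SeDuMi implementation used in the paper.

```python
# Hedged sketch of a max-margin clustering relaxation in the spirit of (7).
import numpy as np
import cvxpy as cp

def max_margin_cluster_sdp(K, C=100.0, ell=2.0):
    N = K.shape[0]
    e = np.ones(N)
    M = cp.Variable((N, N), symmetric=True)          # relaxed label kernel M ~ yy'
    mu = cp.Variable(N, nonneg=True)                 # multipliers for lambda >= 0
    nu = cp.Variable(N, nonneg=True)                 # multipliers for lambda <= C
    delta = cp.Variable()                            # epigraph variable (objective)
    # Bordered PSD matrix encoding, via a generalized Schur complement,
    # delta - 2C e'nu >= (e + mu - nu)' (K o M)^+ (e + mu - nu).
    Z = cp.Variable((N + 1, N + 1), PSD=True)
    constraints = [
        Z[:N, :N] == cp.multiply(K, M),
        Z[:N, N] == e + mu - nu,
        Z[N, N] == delta - 2.0 * C * cp.sum(nu),
        cp.diag(M) == 1, M >> 0,                     # relaxed equivalence constraints
        cp.abs(M @ e) <= ell,                        # (relaxed) class balance
    ]
    cp.Problem(cp.Minimize(delta), constraints).solve()
    # Recover a soft labeling from the dominant eigenvector of the solved M.
    w, V = np.linalg.eigh(M.value)
    return np.sign(V[:, -1]), M.value
```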
61 4 Experimental results. We implemented the maximum margin clustering algorithm based on the semidefinite programming formulation (7), using the SeDuMi library, and tested it on various data sets. [sent-97, score-1.005]
62 In these experiments we compared the performance of our maximum margin clustering technique to the spectral clustering method of [14] as well as straightforward k-means clustering. [sent-98, score-1.434]
63 Both maximum margin clustering and spectral clustering were run with the same radial basis function kernel and matching width parameters. [sent-99, score-1.447]
64 In fact, in each case, we chose the best width parameter for spectral clustering by searching over a small set of five widths related to the scale of the problem. [sent-100, score-0.512]
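A small sketch of this kernel setup is given below, assuming an RBF kernel and a handful of candidate widths tied to a data-dependent scale; the specific widths and the stand-in data are illustrative, not the ones used in the paper.

```python
# Illustrative RBF kernel construction with a few candidate widths.
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X, width):
    sq_dists = cdist(X, X, "sqeuclidean")
    return np.exp(-sq_dists / (2.0 * width ** 2))

X = np.random.default_rng(1).normal(size=(40, 2))     # stand-in data
scale = np.median(cdist(X, X))                        # a data-dependent scale
candidate_widths = scale * np.array([0.25, 0.5, 1.0, 2.0, 4.0])
kernels = [rbf_kernel(X, w) for w in candidate_widths]
```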
65 In addition, the slack parameter for maximum margin clustering was simply set to an arbitrary value. [sent-101, score-0.906]
66 Table 1 shows that for the first three data sets (Gaussians, Circles, AI), maximum margin and spectral clustering obtained identical small error rates, which were in turn significantly lower than those obtained by k-means clustering. (Footnote 2: One could also employ randomized rounding to choose a hard class assignment y.) [sent-104, score-1.161]
67 It turns out that the slack parameter C did not have a significant effect in any of our preliminary investigations, so we simply set C = 100 for all of the experiments reported here. [sent-105, score-0.088]
68 However, maximum margin clustering demonstrates a substantial advantage on the fourth data set (Joined Circles) over both spectral and k-means clustering. [sent-107, score-1.009]
69 We also conducted clustering experiments on the real data sets, two of which are depicted in Figures 2 and 3: a database of images of handwritten digits of twos and threes (Figure 2), and a database of face images of two people (Figure 3). [sent-108, score-0.606]
70 The last two columns of Table 1 show that maximum margin clustering obtains a slight advantage on the handwritten digits data, and a significant advantage on the faces data. [sent-109, score-0.996]
71 5 Semi-supervised learning. Although the clustering results are reasonable, we have an additional goal of adapting the maximum margin approach to semisupervised learning. [sent-110, score-1.184]
72 In this case, we assume we are given a labeled training set $(x^1, y^1), \dots, (x^n, y^n)$ [sent-111, score-0.131]
73 as well as an unlabeled training set $x^{n+1}, \dots, x^N$. [sent-114, score-0.212]
74 In the context of large margin classifiers, many techniques have been proposed for incorporating unlabeled data in an SVM, most of which are intuitively based on ensuring that large margins are also preserved on the unlabeled training data [8, 2], just as in our case. [sent-118, score-0.825]
75 However, none of these previous proposals has formulated a convex optimization procedure that is guaranteed to directly maximize the margin, as we propose in Section 3. [sent-119, score-0.113]
76 For our procedure, extending the maximum margin clustering approach of Section 3 to semisupervised training is easy: We simply add constraints on the matrix M to force it to respect the observed equivalence relations among the labeled training data. [sent-120, score-1.595]
77 In addition, we impose the constraint that each unlabeled example belongs to the same class as at least one labeled training example. [sent-121, score-0.469]
78 These conditions can be enforced with the simple set of additional linear constraints S1: $m_{ij} = y^i y^j$ for labeled examples $i, j \in \{1, \dots, n\}$, [sent-122, score-0.338]
79 and S2: $\sum_{i=1}^{n} m_{ij} \ge 2 - n$ for unlabeled examples $j \in \{n+1, \dots, N\}$. [sent-125, score-0.318]
80 Note that the observed training labels $y^i$ for $i \in \{1, \dots, n\}$ enter the formulation only through the constraints S1. [sent-128, score-0.1]
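The constraints S1 and S2 attach naturally to a relaxed SDP variable; the sketch below assumes M is the relaxed label-kernel variable from a cvxpy model like the clustering sketch earlier, and the function name and argument layout are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: additional constraints S1 and S2 for semi-supervised training.
import numpy as np
import cvxpy as cp

def semisupervised_constraints(M, y_labeled):
    """M: (N, N) cvxpy variable for the relaxed label kernel;
    y_labeled: length-n array of +/-1 labels for the first n points."""
    n = len(y_labeled)
    N = M.shape[0]
    constraints = [
        # S1: m_ij = y_i y_j for all pairs of labeled examples
        M[:n, :n] == np.outer(y_labeled, y_labeled),
    ]
    # S2: sum_{i=1..n} m_ij >= 2 - n for every unlabeled example j
    for j in range(n, N):
        constraints.append(cp.sum(M[:n, j]) >= 2 - n)
    return constraints
```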
81 The resulting training procedure is similar to that of [6], with the addition of the constraints L1–L4 and S2, which enforce two classes and allow clustering of the unlabeled examples. [sent-132, score-0.805]
82 6 Experimental results. We tested our approach to semisupervised learning on various two-class data sets from the UCI repository. [sent-133, score-0.471]
83 We compared the performance of our technique to the semisupervised SVM technique of [8]. [sent-134, score-0.478]
84 That is, we split the data into a labeled and unlabeled part, held out the labels of the unlabeled portion, trained the semisupervised techniques, reclassified the unlabeled examples using the learned results, and measured the misclassification error on the held out labels. [sent-136, score-1.069]
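This transductive protocol can be written down in a few lines; the semisup_predict argument below is a hypothetical stand-in for any of the semisupervised methods compared, and the code is a sketch of the evaluation loop rather than the paper's scripts.

```python
# Sketch of the transductive evaluation protocol: hold out the labels of the
# unlabeled portion, train, re-classify, and score against the held-out labels.
import numpy as np

def transductive_error(X, y, n_labeled, semisup_predict, seed=0):
    """semisup_predict(X, y_labeled, n_labeled) -> +/-1 labels for X[n_labeled:]."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    X, y = X[order], y[order]                     # random labeled/unlabeled split
    y_pred = semisup_predict(X, y[:n_labeled], n_labeled)
    return float(np.mean(y_pred != y[n_labeled:]))
```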
85 Here we see that the maximum margin approach based on semidefinite programming can often outperform the approach of [8]. [sent-137, score-0.513]
86 Table 2 shows that our maximum margin method is effective at exploiting unlabeled data to improve the prediction of held out labels. [sent-138, score-0.707]
87 In every case, it significantly reduces the error of a plain SVM and obtains the best overall performance among the semisupervised learning techniques we have investigated. [sent-139, score-0.41]
88 Figure 1: Four artificial data sets used in the clustering experiments. [sent-185, score-0.437]
89 The points and stars show the two classes discovered by maximum margin clustering. [sent-187, score-0.552]
90 Figure 2: A sampling of the handwritten digits (twos and threes). [sent-188, score-0.099]
91 Each row shows a random sampling of images from a cluster discovered by maximum margin clustering. [sent-189, score-0.574]
92 Maximum margin made very few misclassifications on this data set, as shown in Table 1. [sent-190, score-0.396]
93 Each row shows a random sampling of images from a cluster discovered by maximum margin clustering. [sent-192, score-0.574]
94 Maximum margin made no misclassifications on this data set, as shown in Table 1. [sent-193, score-0.396]
95 Table 1: Percentage misclassification errors of the various clustering algorithms on the various data sets. [sent-199, score-0.481]
96 Table 2: Percentage misclassification errors of the various semisupervised learning algorithms on the various data sets. [sent-222, score-0.434]
97 7 Conclusion. We have proposed a simple, unified principle for clustering and semisupervised learning based on the maximum margin principle popularized by supervised SVMs. [sent-225, score-1.296]
98 The results on both clustering and semisupervised learning are competitive with, and sometimes exceed, the state of the art. [sent-227, score-0.715]
99 Overall, margin maximization appears to be an effective way to achieve a unified approach to these different learning problems. [sent-228, score-0.368]
100 For future work we plan to address the limitations of the current method, including the omission of an offset b and the restriction to two-class problems. [sent-229, score-0.127]
wordName wordTfidf (topN-words)
[('clustering', 0.381), ('margin', 0.368), ('yy', 0.368), ('semisupervised', 0.334), ('unlabeled', 0.16), ('mij', 0.158), ('semide', 0.146), ('soft', 0.134), ('spectral', 0.131), ('equivalence', 0.125), ('convex', 0.113), ('integer', 0.103), ('enforce', 0.102), ('maximum', 0.101), ('svm', 0.101), ('misclassi', 0.1), ('program', 0.087), ('kernel', 0.085), ('labeled', 0.079), ('technique', 0.072), ('subject', 0.069), ('constraint', 0.068), ('constraints', 0.066), ('cluster', 0.066), ('uci', 0.065), ('impose', 0.065), ('xn', 0.06), ('multiclass', 0.059), ('hwd', 0.056), ('nonconvexity', 0.056), ('slack', 0.056), ('restriction', 0.056), ('digits', 0.056), ('dual', 0.054), ('training', 0.052), ('circles', 0.052), ('relax', 0.052), ('held', 0.05), ('mik', 0.049), ('threes', 0.049), ('twos', 0.049), ('labels', 0.048), ('assignment', 0.047), ('formulation', 0.047), ('obtains', 0.047), ('relation', 0.046), ('uni', 0.045), ('tsvm', 0.045), ('joined', 0.045), ('ijk', 0.045), ('mjk', 0.045), ('class', 0.045), ('programming', 0.044), ('unfortunately', 0.044), ('nite', 0.044), ('classes', 0.044), ('handwritten', 0.043), ('supervised', 0.042), ('alberta', 0.042), ('clusterings', 0.042), ('imposing', 0.041), ('cations', 0.04), ('tackle', 0.039), ('discovered', 0.039), ('balance', 0.038), ('separating', 0.038), ('main', 0.038), ('hyperplanes', 0.037), ('forms', 0.037), ('matrix', 0.037), ('various', 0.036), ('quadratic', 0.035), ('yj', 0.035), ('principle', 0.035), ('realize', 0.034), ('establish', 0.034), ('min', 0.034), ('recover', 0.034), ('labeling', 0.034), ('nevertheless', 0.034), ('table', 0.033), ('relaxation', 0.033), ('cut', 0.033), ('max', 0.033), ('preliminary', 0.032), ('hard', 0.032), ('classi', 0.03), ('cristianini', 0.029), ('solve', 0.029), ('techniques', 0.029), ('rank', 0.029), ('maximizes', 0.028), ('data', 0.028), ('solves', 0.028), ('partitioning', 0.028), ('sets', 0.028), ('support', 0.027), ('easy', 0.027), ('offset', 0.026), ('inverse', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
2 0.20933507 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
Author: Saharon Rosset, Robert Tibshirani, Ji Zhu, Trevor J. Hastie
Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. 1
3 0.20421039 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
4 0.17047203 161 nips-2004-Self-Tuning Spectral Clustering
Author: Lihi Zelnik-manor, Pietro Perona
Abstract: We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated. 1
5 0.16345897 103 nips-2004-Limits of Spectral Clustering
Author: Ulrike V. Luxburg, Olivier Bousquet, Mikhail Belkin
Abstract: An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spectral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Even though recently some first results on the convergence of normalized spectral clustering have been obtained, for the unnormalized case we have to develop a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while in the normalized case, spectral clustering usually converges to a nice partition of the data space, in the unnormalized case the same only holds under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods. 1
6 0.15740164 164 nips-2004-Semi-supervised Learning by Entropy Minimization
7 0.14699851 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
8 0.13367914 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
9 0.13224824 77 nips-2004-Hierarchical Clustering of a Mixture Model
10 0.11374725 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
11 0.11349239 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
12 0.11229253 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
13 0.11046686 34 nips-2004-Breaking SVM Complexity with Cross-Training
14 0.10965475 92 nips-2004-Kernel Methods for Implicit Surface Modeling
15 0.10326746 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
16 0.10198853 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
17 0.1005161 178 nips-2004-Support Vector Classification with Input Data Uncertainty
18 0.099710651 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
19 0.09826912 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
20 0.096394725 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization
topicId topicWeight
[(0, -0.293), (1, 0.168), (2, -0.099), (3, 0.085), (4, -0.075), (5, -0.05), (6, -0.142), (7, 0.043), (8, -0.166), (9, 0.019), (10, -0.105), (11, 0.182), (12, -0.151), (13, -0.03), (14, 0.036), (15, -0.043), (16, -0.06), (17, -0.048), (18, -0.139), (19, -0.007), (20, 0.063), (21, 0.026), (22, 0.048), (23, 0.033), (24, 0.003), (25, -0.048), (26, -0.013), (27, 0.11), (28, 0.062), (29, -0.085), (30, 0.164), (31, 0.066), (32, 0.067), (33, -0.072), (34, -0.015), (35, -0.037), (36, 0.032), (37, 0.026), (38, 0.098), (39, 0.034), (40, -0.003), (41, -0.03), (42, -0.007), (43, 0.028), (44, 0.018), (45, -0.068), (46, -0.014), (47, -0.044), (48, -0.058), (49, -0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.97713548 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
2 0.63896942 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
3 0.6195274 161 nips-2004-Self-Tuning Spectral Clustering
Author: Lihi Zelnik-manor, Pietro Perona
Abstract: We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated. 1
4 0.61738336 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
Author: Massimiliano Pavan, Marcello Pelillo
Abstract: Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image segmentation. They generalize the notion of a maximal clique to edgeweighted graphs and have intriguing, non-trivial connections to continuous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the computational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set offers a simple and efficient way of doing this. Numerical experiments on various grouping problems show the effectiveness of the approach. 1
5 0.61148512 103 nips-2004-Limits of Spectral Clustering
Author: Ulrike V. Luxburg, Olivier Bousquet, Mikhail Belkin
Abstract: An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spectral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Even though recently some first results on the convergence of normalized spectral clustering have been obtained, for the unnormalized case we have to develop a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while in the normalized case, spectral clustering usually converges to a nice partition of the data space, in the unnormalized case the same only holds under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods. 1
6 0.54470676 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.54439384 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
8 0.53133214 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
9 0.52049971 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization
10 0.49309772 34 nips-2004-Breaking SVM Complexity with Cross-Training
11 0.49172843 164 nips-2004-Semi-supervised Learning by Entropy Minimization
12 0.48434117 77 nips-2004-Hierarchical Clustering of a Mixture Model
13 0.4815543 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
14 0.48054817 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations
15 0.48010996 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
16 0.47473085 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
17 0.47229162 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM
18 0.47226596 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
19 0.44522235 166 nips-2004-Semi-supervised Learning via Gaussian Processes
20 0.43782938 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
topicId topicWeight
[(13, 0.058), (15, 0.114), (26, 0.042), (31, 0.012), (33, 0.649), (35, 0.019), (39, 0.014), (50, 0.025)]
simIndex simValue paperId paperTitle
1 0.99835014 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
Author: Tanzeem Choudhury, Sumit Basu
Abstract: In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners ' cross-transitions. The mixture parameters in this model describe how much each person 's individual behavior contributes to the joint turn-taking behavior of the pair. By estimating these parameters, we thus estimate how much influence each participant has in determining the joint turntaking behavior. We show how this measure correlates significantly with betweenness centrality [2], an independent measure of an individual's importance in a social network. This result suggests that our estimate of conversational influence is predictive of social influence. 1
2 0.99831867 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
Author: Vladimir Koltchinskii, Manel Martínez-ramón, Stefan Posse
Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of optimal aggregation problem is sparse. We use a boosting type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a ”boosting map” of the brain needed to identify the regions with most significant impact on classification. 1
3 0.99823582 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal statevalue function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain. 1
4 0.99802732 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
Author: Manfred Opper, Ole Winther
Abstract: We propose a novel a framework for deriving approximations for intractable probabilistic models. This framework is based on a free energy (negative log marginal likelihood) and can be seen as a generalization of adaptive TAP [1, 2, 3] and expectation propagation (EP) [4, 5]. The free energy is constructed from two approximating distributions which encode different aspects of the intractable model such a single node constraints and couplings and are by construction consistent on a chosen set of moments. We test the framework on a difficult benchmark problem with binary variables on fully connected graphs and 2D grid graphs. We find good performance using sets of moments which either specify factorized nodes or a spanning tree on the nodes (structured approximation). Surprisingly, the Bethe approximation gives very inferior results even on grids. 1
same-paper 5 0.99682373 115 nips-2004-Maximum Margin Clustering
Author: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans
Abstract: We propose a new method for clustering based on finding maximum margin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult computational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidefinite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods. The real benefit of our approach, however, is that it leads naturally to a semi-supervised training method for support vector machines. By maximizing the margin simultaneously on labeled and unlabeled training data, we achieve state of the art performance by using a single, integrated learning principle. 1
6 0.9965173 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
7 0.99651545 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
8 0.99552095 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
9 0.92391104 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces
10 0.92300731 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data
11 0.91989458 44 nips-2004-Conditional Random Fields for Object Recognition
12 0.91911334 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
13 0.91764605 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers
14 0.91695911 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
15 0.91450167 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
16 0.91161323 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
17 0.90705949 166 nips-2004-Semi-supervised Learning via Gaussian Processes
18 0.90570635 179 nips-2004-Surface Reconstruction using Learned Shape Models
19 0.90423489 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms
20 0.90246403 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data