jmlr jmlr2007 jmlr2007-5 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
Reference: text
sentIndex sentText sentNum sentScore
1 Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. [sent-15, score-0.862]
2 Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density 1. [sent-25, score-0.8]
3 The merit function reflects the general belief about good clustering, that is, objects in the same cluster should be similar to each other while those in different clusters should be as distinct as possible. [sent-45, score-0.657]
4 The clustering procedure involves first fitting a mixture model, usually by the EM algorithm, and then computing the posterior probability of each mixture component given a data point. [sent-51, score-0.579]
5 Furthermore, data are allowed to reveal a nonparametric distribution for each cluster as part of the clustering procedure. [sent-78, score-0.606]
6 Our new clustering algorithm groups data points into one cluster if they are associated with the same hilltop. [sent-84, score-0.571]
7 Specifically, every cluster is ensured to be associated with a hill, and every sample point in the cluster can be moved to the corresponding hilltop along an ascending path without crossing any “valley” that separates two hills. [sent-89, score-0.685]
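For a concrete picture of this mode-association step, here is a minimal Python sketch (not the authors' C package referenced in the abstract): it assumes a kernel density built from Gaussian kernels with a common spherical bandwidth, for which the Modal EM update alternates a posterior (E) step and a weighted-mean (M) step. All function names below are hypothetical.

```python
import numpy as np

def modal_em_ascent(x0, data, sigma, tol=1e-6, max_iter=500):
    """Ascend from x0 to a local mode of the kernel density
    f(x) = (1/n) * sum_i N(x | data[i], sigma^2 I) via EM-style updates."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # E-step: posterior weight of each kernel component at the current point.
        d2 = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w /= w.sum()
        # M-step: for Gaussian kernels with a common covariance, the weighted
        # log-likelihood is maximized by the weighted mean of the kernel centers.
        x_new = w @ data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def modal_clustering(data, sigma, decimals=4):
    """Assign points that ascend to (numerically) the same mode to one cluster."""
    data = np.asarray(data, dtype=float)
    modes = np.array([modal_em_ascent(p, data, sigma) for p in data])
    keys = [tuple(np.round(m, decimals)) for m in modes]
    label_of = {k: i for i, k in enumerate(dict.fromkeys(keys))}
    return np.array([label_of[k] for k in keys]), modes
```

Grouping by rounded mode coordinates is only a crude stand-in for the numerical mode matching a production implementation would use.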
8 Measures for the pairwise separability between clusters are proposed using ridgelines. [sent-102, score-0.586]
9 A cluster merging algorithm to enhance modal clustering is developed. [sent-103, score-0.927]
10 Comparisons are made with several other clustering algorithms such as linkage clustering, k-means, and Gaussian mixture modeling. [sent-105, score-0.587]
11 The MEM and REM algorithms, the cluster diagnosis tool, and cluster merging procedure built upon REM are unique to our work. [sent-113, score-0.782]
12 In addition, several measures of pairwise separability between clusters are defined, which lead to the derivation of a new cluster merging algorithm. [sent-148, score-1.077]
13 These density functions facilitate soft clustering as well as cluster assignment of samples outside the data set. [sent-234, score-0.658]
14 It is known in the literature of mixture modeling that if the density of a cluster is estimated using only points assigned to this cluster, the variance tends to be underestimated, although the effect on clustering may be small (Celeux and Govaert, 1993). [sent-238, score-0.821]
15 With gk(x) in (2) as the initial cluster density, compute the posterior of cluster k given each xi by pi,k ∝ (|Ck|/n) gk(xi), k = 1, . . . , |G|. [sent-244, score-0.692]
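The proportionality above only becomes a posterior after normalizing over k; a hedged sketch of that step, with hypothetical names and the cluster densities passed in as callables, is:

```python
import numpy as np

def soft_posteriors(points, cluster_sizes, cluster_densities):
    """p[i, k] proportional to (|C_k| / n) * g_k(x_i), normalized over k.
    cluster_densities[k] is a callable returning g_k evaluated at an array of points."""
    n = float(sum(cluster_sizes))
    G = np.column_stack([g(points) for g in cluster_densities])   # shape (m, K)
    p = G * (np.asarray(cluster_sizes, dtype=float) / n)          # prior-weighted densities
    return p / p.sum(axis=1, keepdims=True)
```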
16 On the other hand, if the maximum a posteriori clustering based on the final g̃k(x) differs significantly from the result of modal clustering, this procedure may have defeated the very purpose of modal clustering and turned it into merely an initialization scheme. [sent-252, score-0.957]
17 At any bandwidth σl, the cluster representatives in Gl−1 obtained from the preceding bandwidth are input into MAC using the density f(x | S, σl²). [sent-265, score-0.571]
18 That is, the cluster of xi at level l is determined by its cluster representative in Gl−1 . [sent-294, score-0.68]
19 In linkage clustering, at every level, only the two clusters with the minimum pairwise distance are merged. [sent-296, score-0.584]
20 In HMAC, however, at any level, the merging of clusters is conducted globally and the effect of every original data point on clustering is retained through the density f(x | S, σ²). [sent-299, score-0.933]
21 Among the 20 different σl's, only 6 result in a clustering different from that at σl−1, reflecting the fact that the bandwidth needs to increase by a sufficient amount to drive the merging of some existing cluster representatives. [sent-311, score-0.835]
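A sketch of the hierarchical loop described above (HMAC), reusing the hypothetical modal_em_ascent helper from the earlier sketch: at each bandwidth the representatives from the previous level are ascended under the kernel density built from all of the data, and representatives that reach the same mode are merged.

```python
import numpy as np

def hmac(data, bandwidths, decimals=4):
    """Hierarchical mode association clustering sketch."""
    data = np.asarray(data, dtype=float)
    reps = data.copy()                      # G_0: every point is its own representative
    labels = np.arange(len(data))           # cluster label of each original point
    hierarchy = []
    for sigma in sorted(bandwidths):
        # Ascend each representative under the full-data density f(x | S, sigma^2).
        modes = np.array([modal_em_ascent(r, data, sigma) for r in reps])
        keys = [tuple(np.round(m, decimals)) for m in modes]
        remap = {k: i for i, k in enumerate(dict.fromkeys(keys))}
        rep_labels = np.array([remap[k] for k in keys])
        labels = rep_labels[labels]         # a point's cluster follows its representative
        reps = np.array([modes[np.flatnonzero(rep_labels == c)[0]] for c in range(len(remap))])
        hierarchy.append((sigma, labels.copy(), reps.copy()))
    return hierarchy
```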
22 Analysis of Cluster Separability via Ridgelines: A measure for the separability between clusters is useful for gaining a deeper understanding of clustering structures in data. [sent-345, score-0.818]
23 Although the separability measure is a diagnostic tool and the cluster merging method can adjust the number of clusters, in this paper, we do not pursue the problem of choosing the number of clusters fully automatically. [sent-352, score-1.049]
24 The separability measure we define here exploits the geometry of the density functions of two clusters in a comprehensive manner. [sent-354, score-0.645]
25 (b) The MEM ascending paths from the modes at level 2 (crosses) to the modes at level 3 (squares), and the contours of the density estimate at level 3. [sent-373, score-0.591]
26 1 Ridgeline: The ridgeline between two clusters with densities g1(x) and g2(x) is L = {x(α) : (1 − α)∇ log g1(x) + α∇ log g2(x) = 0, 0 ≤ α ≤ 1}. [sent-381, score-0.69]
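For the special case of two Gaussian bumps the ridgeline has a closed form, x(α) = [(1−α)Σ1⁻¹ + αΣ2⁻¹]⁻¹[(1−α)Σ1⁻¹μ1 + αΣ2⁻¹μ2]; for the kernel-mixture cluster densities used in the paper the REM algorithm is needed and is not reproduced here. The sketch below traces the Gaussian ridgeline and evaluates a ratio-based separation score along it; that score is only an illustrative proxy, not necessarily the paper's exact separability measure, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_ridgeline(mu1, S1, mu2, S2, n_alpha=101):
    """Closed-form ridgeline for two Gaussian bumps: x(a) solves
    (1-a) * grad log g1(x) + a * grad log g2(x) = 0 for a in [0, 1]."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    alphas = np.linspace(0.0, 1.0, n_alpha)
    pts = [np.linalg.solve((1 - a) * P1 + a * P2,
                           (1 - a) * P1 @ mu1 + a * P2 @ mu2) for a in alphas]
    return alphas, np.array(pts)

def ridgeline_separation_proxy(w1, mu1, S1, w2, mu2, S2):
    """Illustrative score: 1 minus the lowest mixture density along the ridgeline
    divided by the lower of the two bump heights (values near 1 = well separated)."""
    def mix(x):
        return (w1 * multivariate_normal.pdf(x, mean=mu1, cov=S1)
                + w2 * multivariate_normal.pdf(x, mean=mu2, cov=S2))
    _, pts = gaussian_ridgeline(mu1, S1, mu2, S2)
    profile = np.array([mix(p) for p in pts])
    return 1.0 - profile.min() / min(mix(mu1), mix(mu2))
```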
27 The task of identifying insignificant clusters in Figure 1(c) is not particularly challenging because the two smallest clusters are “singletons” (containing only one sample). [sent-446, score-0.732]
28 The low separability of cluster 9 is caused by its proximity to cluster 1, the largest cluster which accounts for 60% of the data. [sent-455, score-1.065]
29 However, an enlarged bandwidth may cause prominent clusters to be clumped while leaving undesirable small “noisy” clusters unchanged. [sent-473, score-0.872]
30 Merging clusters according to the separability measure is one possible approach to eliminate “noisy” clusters without losing important clustering structures found at a small bandwidth. [sent-474, score-1.184]
31 We denote the pairwise separability between clusters zi and zj by Si,j for short, where Si,j = S(zi, zj). [sent-480, score-0.581]
32 The main idea of the merging algorithm is to have clusters with a higher significance index absorb other clusters that are not well separated from them and are less dominant (lower significance index). [sent-487, score-0.983]
33 First, a cluster zi may be weakly separated from several other clusters with higher significance indices. [sent-489, score-0.682]
34 Second, two weakly separated clusters zi and zj may have the same significance indices, that is, δ(zi) = δ(zj); and hence it is ambiguous which cluster should be treated as the dominant one. [sent-491, score-0.752]
35 Clique: Clusters zi and zj are in the same clique if (a) zi and zj are tied, or (b) there is a cluster zk such that zi and zk are in the same clique, and zj and zk are in the same clique. [sent-501, score-0.585]
36 Since all the clusters in ci have equal significance indices, we let δ(ci) = δ(zi), where zi is any cluster included in ci. [sent-516, score-0.841]
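The tied relation and its transitive closure in the clique definition above map naturally onto a union-find structure; a small sketch, with the tied(i, j) predicate left as a hypothetical caller-supplied function:

```python
def build_cliques(clusters, tied):
    """Group clusters into cliques: z_i and z_j share a clique if they are tied,
    directly or through a chain of tied pairs (transitive closure)."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if tied(i, j):
                union(i, j)

    cliques = {}
    for i in range(len(clusters)):
        cliques.setdefault(find(i), []).append(i)
    return list(cliques.values())
```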
37 We call the merging procedure based on the separability measure stage I merging, and the procedure based on the coverage rate stage II merging. [sent-555, score-0.711]
38 (a) The clustering result after merging clusters in the same cliques. [sent-634, score-0.826]
39 In a nutshell, linkage clustering forms clusters by progressively merging a pair of current clusters. [sent-640, score-1.016]
40 Our merging algorithm is a particular kind of linkage clustering where the elements to be clustered are cliques and the distance between the cliques is the separability. [sent-647, score-0.832]
41 We may call this linkage clustering algorithm directional single linkage for reasons that will be self-evident shortly. [sent-650, score-0.667]
42 An alternative is to apply the directional single linkage clustering and stop merging when a desired number of clusters is achieved. [sent-656, score-1.043]
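A hedged sketch of this directional single linkage loop: at each step the least-separated pair is found, the side with the lower significance index is absorbed into the more dominant one, and the loop stops either when all pairwise separabilities exceed a threshold or when a desired number of clusters is reached. The separability and significance callables are assumptions standing in for the quantities defined in the text.

```python
def directional_single_linkage(cliques, separability, significance,
                               threshold=None, target_k=None):
    """Merge the least-separated pair at each step; the clique with the lower
    significance index is absorbed by the more dominant one ("directional").
    Stops when all pairwise separabilities exceed `threshold`, or when the
    number of clusters drops to `target_k`."""
    groups = [list(c) for c in cliques]
    while len(groups) > 1:
        if target_k is not None and len(groups) <= target_k:
            break
        # Find the currently least-separated pair of groups.
        sep, i, j = min(((separability(a, b), i, j)
                         for i, a in enumerate(groups)
                         for j, b in enumerate(groups) if i < j),
                        key=lambda t: t[0])
        if threshold is not None and sep >= threshold:
            break
        # Absorb the less dominant group into the more dominant one.
        if significance(groups[i]) >= significance(groups[j]):
            keep, drop = i, j
        else:
            keep, drop = j, i
        groups[keep].extend(groups[drop])
        del groups[drop]
    return groups
```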
43 Figure 2(a) shows the clustering after merging clusters in the same clique. [sent-666, score-0.826]
44 To highlight the relationship between the merged clusters and the original clusters, the list of updated symbols for each original cluster is given in every scatter plot. [sent-668, score-0.717]
45 At ρ = 95%, only the cluster of size 1 is marked as an outlier cluster, and is merged with the cluster of size 6. [sent-674, score-0.674]
46 Because clusters with low separability are apt to be grouped together when the kernel bandwidth increases, it is not surprising for the clustering result obtained by the merging algorithm to agree with clustering at a higher level of HMAC. [sent-675, score-1.497]
47 On the other hand, examining separability and identifying outlier clusters enhance the robustness of clustering results, a valuable trait especially in high dimensions. [sent-676, score-0.85]
48 Instead, we can apply the merging algorithm to a relatively large number of clusters obtained at an early level and reduce the number of clusters to the desired value. [sent-679, score-1.005]
49 We start by comparing linkage clustering with mixture modeling using two data sets. [sent-797, score-0.653]
50 In average linkage, if clusters z2 and z3 are merged into z4, the distance between z1 and z4 is calculated as d(z1, z4) = n2/(n2 + n3) · d(z1, z2) + n3/(n2 + n3) · d(z1, z3), where n2 and n3 are the cluster sizes of z2 and z3 respectively. [sent-803, score-0.642]
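The reconstructed update can be written as a one-line helper; a sketch with an arithmetic check:

```python
def average_linkage_update(d12, d13, n2, n3):
    """d(z1, z4) for the merge z4 = z2 U z3 under average linkage:
    d(z1, z4) = n2/(n2+n3) * d(z1, z2) + n3/(n2+n3) * d(z1, z3)."""
    return (n2 * d12 + n3 * d13) / (n2 + n3)

# Example: d(z1,z2)=2.0 with |z2|=4, d(z1,z3)=6.0 with |z3|=1  ->  (4*2 + 1*6)/5 = 2.8
assert abs(average_linkage_update(2.0, 6.0, 4, 1) - 2.8) < 1e-12
```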
51 In this example, the mixture model is initialized by the clustering obtained from k-means; and the covariance matrices of the two clusters are not restricted. [sent-818, score-0.763]
52 We apply HMAC to both data sets and show the clustering results obtained at the level where two clusters are formed. [sent-830, score-0.719]
53 The two clusters obtained by average linkage clustering and HMAC are identical to the original ones. [sent-834, score-0.816]
54 The above two data sets exemplify situations in which either the average linkage clustering or mixture modeling (or k-means) performs well but not both. [sent-840, score-0.653]
55 Mixture modeling favors elliptical clusters because of the Gaussian assumption, and k-means favors spherical clusters due to extra restrictions on Gaussian components. [sent-844, score-0.778]
56 The greedy pairwise merging in average linkage becomes rather arbitrary when clusters are not well separated. [sent-847, score-0.784]
57 Chipman and Tibshirani (2006) noted that bottom-up agglomerative clustering methods, such as average linkage, tend to generate good small clusters but suffer at extracting a few large clusters. [sent-866, score-0.626]
58 A hybrid approach is proposed in that paper to combine the advantages of bottom-up and top-down clustering, which first seeks mutual clusters by bottom-up linkage clustering and then applies top-down clustering to the mutual clusters. [sent-868, score-1.076]
59 For HMAC, the level of the dendrogram yielding two clusters is chosen to create the partition of the data. [sent-877, score-0.621]
60 In Mclust, the mixture model is initialized using the suggested default option, that is, to initialize the partition of data by an agglomerative hierarchical clustering approach, an extension of linkage clustering based on Gaussian mixtures (Fraley and Raftery, 2006). [sent-881, score-0.896]
61 This initialization may be especially advantageous for the data in this study because, as shown previously, linkage clustering generates a perfect clustering for the noisy curve data set in the first example. [sent-882, score-0.771]
62 The dendrogram and the table show that 3 clusters containing more than 5 points are formed at the first level. [sent-958, score-0.575]
63 At level 4, two of the 3 prominent clusters are merged, leaving 2 prominent clusters which are further merged at level 7. [sent-960, score-1.05]
64 One remedy is to apply the cluster merging algorithm to a larger number of clusters formed at a lower level. [sent-969, score-0.884]
65 In Figure 6(d), three clusters are formed by applying stage II merging based on coverage rate. [sent-977, score-0.679]
66 Merging based on separability is not conducted because we attempt to get 3 clusters and the 3 largest clusters at this level already account for close to 95% of the data. [sent-979, score-0.997]
67 This clustering result is much preferable to the 3 clusters directly generated by HMAC at a single level.
68 (d), (e) Three (two) clusters obtained by applying the merging algorithm to clusters generated by HMAC at level 3. [sent-1034, score-1.005]
69 With a separability threshold of 0.4 (stage I) and merging based on coverage rate with ρ = 95% (stage II), we obtain the two clusters shown in Figure 6(e). [sent-1038, score-0.619]
70 The ridgeline between the two major clusters at level 2 is computed, and the density function along this ridgeline is plotted in Figure 8(c). [sent-1059, score-0.935]
71 (c) Density function along the ridgeline between the two major clusters at level 2. [sent-1076, score-0.659]
72 (d) Two clusters obtained by applying the merging algorithm to clusters generated by HMAC. [sent-1077, score-0.932]
73 (e) The modal curves of the two main clusters obtained by HMAC at level 2. [sent-1078, score-0.615]
74 Just as in the clustering by HMAC, one cluster had a longer attention time to the initial stimulus on both occasions. [sent-1101, score-0.578]
75 Instead, we provide a summarized version of the dendrogram emphasizing prominent clusters in Figure 9(a). [sent-1122, score-0.604]
76 (b) The density function along the ridgeline between the two major clusters at level 12. [sent-1151, score-0.746]
77 This dendrogram strongly suggests there are two major clusters in this data set because before level 8, the clusters formed are too small and the percentage of data not covered by the clusters is too high. [sent-1162, score-1.451]
78 To examine the separation of the two clusters at this level, we calculate the ridgeline between them and plot the density function along the ridgeline in Figure 9(b). [sent-1165, score-0.831]
79 The mode heights of the two clusters are close, and the cluster separation is strong. [sent-1166, score-0.744]
80 If we specify the range for the number of clusters as 2 to 10 and let Mclust choose the best number and the best covariance structure using BIC, the number of clusters chosen is 3. [sent-1207, score-0.732]
81 On the other hand, if we fix the number of clusters at 2 and run Mclust, the clustering accuracy becomes 94%. [sent-1213, score-0.626]
82 Under this principle, we declare a level of the dendrogram too low (small bandwidth) if all the clusters are small and a level too high (large bandwidth) if a very large portion of the data (e. [sent-1220, score-0.714]
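The rule of thumb above (the sentence is truncated in the extraction) can be sketched as a small helper; the numeric cutoffs and the reading that "too high" means one cluster swallows almost all of the data are assumptions, not the paper's stated values.

```python
def label_dendrogram_levels(level_cluster_sizes, small=5, frac=0.95):
    """Rule-of-thumb verdicts for dendrogram levels.
    `level_cluster_sizes` maps level -> list of cluster sizes at that level.
    A level is 'too low' if every cluster is small (<= `small` points) and
    'too high' if a single cluster holds at least `frac` of the data; both
    cutoffs are assumed for illustration only."""
    verdicts = {}
    for level, sizes in level_cluster_sizes.items():
        n = sum(sizes)
        if all(s <= small for s in sizes):
            verdicts[level] = "too low"
        elif max(sizes) >= frac * n:
            verdicts[level] = "too high"
        else:
            verdicts[level] = "acceptable"
    return verdicts
```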
83 A more profound effect on the clustering result comes from the merging procedure based on separability which involves choosing a separability threshold. [sent-1226, score-0.844]
84 2, the merging process based on separability is essentially the directional single linkage clustering which stops when all the between-cluster separability measures are above the threshold. [sent-1228, score-1.061]
85 Hence in situations where we have a targeted number of clusters to create, we can avoid choosing the threshold and simply stop the directional single linkage when the given number is reached. [sent-1230, score-0.603]
86 Of course, if, at a certain level of the dendrogram, there exists precisely the targeted number of valid clusters, we can use that level directly rather than applying merging to clusters at a lower level. [sent-1231, score-0.732]
87 Whether the HMAC dendrogram clearly suggests the right number of clusters can be highly data dependent. [sent-1232, score-0.568]
88 For instance, in the document clustering example, the dendrogram in Figure 9(a) strongly suggests two clusters because at all the acceptable levels there are two reasonably large clusters. [sent-1233, score-0.808]
89 If precisely two clusters merge at every increased level of a dendrogram, that is, the number of clusters decreases exactly by one, the dendrogram will have n levels, where n is the data size. [sent-1239, score-1.041]
90 In practice, since the targeted number of clusters is normally much smaller than the data size, it is easy to find a relatively low level at which the number of clusters exceeds the target. [sent-1247, score-0.845]
91 We can then apply the separability based directional single linkage clustering at that level, and achieve any number of clusters smaller than the starting value. [sent-1248, score-1.035]
92 A basic approach to image segmentation is to cluster the pixel color components and label pixels in the same cluster as one region (Li and Gray, 2000). [sent-1251, score-0.665]
93 (d) Starting from the first level of the dendrogram formed by HMAC, apply the cluster merging algorithm described in Section 4. [sent-1267, score-0.773]
94 If the number of clusters after merging is smaller than or equal to the given targeted number of segments, stop and output the clustering results at this level. [sent-1269, score-0.846]
95 K-means clusters the pixels and computes the centroid vector for each cluster according to the criterion of minimizing the mean squared distance between the original vectors and the centroid vectors. [sent-1292, score-0.657]
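A minimal sketch of this segmentation-by-color-clustering baseline (plain k-means on pixel color vectors, not the HMAC-based segmentation whose per-cluster mode colors are described below); the function name is hypothetical.

```python
import numpy as np

def kmeans_segment(image, k, n_iter=50, seed=0):
    """Segment an image by clustering its pixel color vectors with k-means and
    labeling pixels in the same cluster as one region. `image` is an (H, W, 3)
    array; returns per-pixel labels and the centroid colors."""
    h, w, c = image.shape
    X = image.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to the nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid as the mean color of its cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels.reshape(h, w), centroids
```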
96 For HMAC, the mode color vector of each cluster is shown as a color bar; and for K-means, the mean vector of each cluster is shown. [sent-1301, score-0.669]
97 A hierarchical clustering algorithm, HMAC, is developed by gradually increasing the bandwidth of the kernel functions and by recursively treating modes acquired at a smaller bandwidth as points to be clustered when a larger bandwidth is used. [sent-1310, score-0.718]
98 A separability measure between two clusters is defined based on the ridgeline, which takes comprehensive consideration of the exact densities of the clusters. [sent-1312, score-0.587]
99 A cluster merging method based on pairwise separability is developed, which addresses the competing factors of using a small bandwidth to retain major clustering structures and using a large one to achieve a low number of clusters. [sent-1313, score-1.086]
100 The HMAC clustering algorithm and its combination with the cluster merging algorithm are tested using both simulated and real data sets. [sent-1314, score-0.793]
wordName wordTfidf (topN-words)
[('hmac', 0.43), ('clusters', 0.366), ('cluster', 0.291), ('clustering', 0.26), ('merging', 0.2), ('separability', 0.192), ('linkage', 0.19), ('ridgeline', 0.189), ('dendrogram', 0.182), ('modal', 0.176), ('mem', 0.138), ('mixture', 0.137), ('mclust', 0.132), ('indsay', 0.117), ('odal', 0.117), ('modes', 0.104), ('onparametric', 0.099), ('mode', 0.087), ('density', 0.087), ('gk', 0.085), ('clique', 0.084), ('bandwidth', 0.084), ('ay', 0.081), ('ascending', 0.077), ('lustering', 0.075), ('level', 0.073), ('rem', 0.072), ('infants', 0.072), ('zi', 0.07), ('cliques', 0.068), ('head', 0.067), ('segmentation', 0.06), ('merged', 0.06), ('infant', 0.059), ('ci', 0.057), ('prominent', 0.056), ('sc', 0.055), ('coverage', 0.053), ('em', 0.047), ('modeling', 0.046), ('pk', 0.046), ('clustered', 0.046), ('mac', 0.044), ('visualization', 0.043), ('glass', 0.042), ('ridgelines', 0.039), ('discriminant', 0.038), ('pl', 0.038), ('li', 0.037), ('grouped', 0.035), ('parametric', 0.035), ('nonparametric', 0.035), ('fk', 0.035), ('hi', 0.035), ('bandwidths', 0.035), ('ck', 0.034), ('merge', 0.034), ('gl', 0.034), ('leung', 0.033), ('occasion', 0.033), ('stage', 0.033), ('lindsay', 0.033), ('outlier', 0.032), ('circle', 0.031), ('major', 0.031), ('gaussian', 0.03), ('colors', 0.03), ('xii', 0.03), ('directed', 0.029), ('densities', 0.029), ('hierarchical', 0.029), ('pairwise', 0.028), ('fraley', 0.028), ('directional', 0.027), ('stimulus', 0.027), ('kernel', 0.027), ('formed', 0.027), ('absorb', 0.026), ('hilltop', 0.026), ('mink', 0.026), ('minnotte', 0.026), ('posterior', 0.025), ('xi', 0.025), ('separated', 0.025), ('representatives', 0.025), ('loop', 0.024), ('log', 0.024), ('ray', 0.024), ('image', 0.023), ('cance', 0.023), ('projection', 0.022), ('simulated', 0.022), ('directions', 0.022), ('noisy', 0.021), ('jain', 0.02), ('tied', 0.02), ('zm', 0.02), ('data', 0.02), ('raftery', 0.02), ('targeted', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
2 0.20398107 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
Author: Wei Pan, Xiaotong Shen
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage
3 0.13822326 6 jmlr-2007-A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians
Author: Sanjoy Dasgupta, Leonard Schulman
Abstract: We show that, given data from a mixture of k well-separated spherical Gaussians in R d , a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to nearoptimal precision, if the dimension is high (d ln k). We relate this to previous theoretical and empirical work on the EM algorithm. Keywords: expectation maximization, mixtures of Gaussians, clustering, unsupervised learning, probabilistic analysis
4 0.12118074 52 jmlr-2007-Margin Trees for High-dimensional Classification
Author: Robert Tibshirani, Trevor Hastie
Abstract: We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performance to less greedy procedures based on clustering of the matrix of pairwise margins. We compare the performance of the “margin tree” to the closely related “all-pairs” (one versus one) support vector machine, and nearest centroids on a number of cancer microarray data sets. We also develop a simple method for feature selection. We find that the margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes. Keywords: maximum margin classifier, support vector machine, decision tree, CART
5 0.11185756 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
Author: Philippe Rigollet
Abstract: We consider semi-supervised classification when part of the available data is unlabeled. These unlabeled data can be useful for the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level sets estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and the number of labeled examples. Keywords: semi-supervised learning, statistical learning theory, classification, cluster assumption, generalization bounds
6 0.10828663 8 jmlr-2007-A Unified Continuous Optimization Framework for Center-Based Clustering Methods
7 0.080860965 3 jmlr-2007-A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
8 0.048297901 56 jmlr-2007-Multi-Task Learning for Classification with Dirichlet Process Priors
9 0.044045281 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
10 0.043863602 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers (Special Topic on Model Selection)
11 0.0421785 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
12 0.038062461 30 jmlr-2007-Dynamics and Generalization Ability of LVQ Algorithms
13 0.037145335 42 jmlr-2007-Infinitely Imbalanced Logistic Regression
14 0.036925156 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
15 0.033442993 50 jmlr-2007-Local Discriminant Wavelet Packet Coordinates for Face Recognition
16 0.032696996 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
17 0.031844076 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features
18 0.031142378 1 jmlr-2007-"Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks
19 0.030868549 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
20 0.029377092 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs
topicId topicWeight
[(0, 0.237), (1, 0.136), (2, 0.014), (3, 0.132), (4, 0.16), (5, -0.476), (6, 0.141), (7, 0.143), (8, 0.047), (9, -0.002), (10, -0.025), (11, -0.054), (12, 0.052), (13, 0.112), (14, -0.064), (15, -0.078), (16, 0.003), (17, -0.071), (18, -0.008), (19, 0.021), (20, -0.078), (21, 0.11), (22, 0.061), (23, -0.037), (24, 0.051), (25, -0.015), (26, 0.001), (27, 0.045), (28, 0.238), (29, 0.048), (30, -0.161), (31, -0.02), (32, -0.021), (33, 0.048), (34, 0.06), (35, 0.033), (36, -0.083), (37, 0.001), (38, 0.028), (39, 0.011), (40, 0.023), (41, 0.018), (42, 0.033), (43, -0.038), (44, -0.068), (45, -0.032), (46, 0.004), (47, -0.037), (48, 0.084), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.97965866 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
2 0.85120165 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
Author: Wei Pan, Xiaotong Shen
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage
3 0.50220454 8 jmlr-2007-A Unified Continuous Optimization Framework for Center-Based Clustering Methods
Author: Marc Teboulle
Abstract: Center-based partitioning clustering algorithms rely on minimizing an appropriately formulated objective function, and different formulations suggest different possible algorithms. In this paper, we start with the standard nonconvex and nonsmooth formulation of the partitioning clustering problem. We demonstrate that within this elementary formulation, convex analysis tools and optimization theory provide a unifying language and framework to design, analyze and extend hard and soft center-based clustering algorithms, through a generic algorithm which retains the computational simplicity of the popular k-means scheme. We show that several well known and more recent center-based clustering algorithms, which have been derived either heuristically, or/and have emerged from intuitive analogies in physics, statistical techniques and information theoretic perspectives can be recovered as special cases of the proposed analysis and we streamline their relationships. Keywords: clustering, k-means algorithm, convex analysis, support and asymptotic functions, distance-like functions, Bregman and Csiszar divergences, nonlinear means, nonsmooth optimization, smoothing algorithms, fixed point methods, deterministic annealing, expectation maximization, information theory and entropy methods
4 0.47346708 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
Author: Philippe Rigollet
Abstract: We consider semi-supervised classification when part of the available data is unlabeled. These unlabeled data can be useful for the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level sets estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and the number of labeled examples. Keywords: semi-supervised learning, statistical learning theory, classification, cluster assumption, generalization bounds
5 0.46886599 6 jmlr-2007-A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians
Author: Sanjoy Dasgupta, Leonard Schulman
Abstract: We show that, given data from a mixture of k well-separated spherical Gaussians in R d , a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to nearoptimal precision, if the dimension is high (d ln k). We relate this to previous theoretical and empirical work on the EM algorithm. Keywords: expectation maximization, mixtures of Gaussians, clustering, unsupervised learning, probabilistic analysis
6 0.37363437 52 jmlr-2007-Margin Trees for High-dimensional Classification
7 0.33861825 3 jmlr-2007-A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
8 0.21310411 56 jmlr-2007-Multi-Task Learning for Classification with Dirichlet Process Priors
9 0.20878568 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features
10 0.20768578 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
11 0.19502525 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
12 0.18536125 30 jmlr-2007-Dynamics and Generalization Ability of LVQ Algorithms
13 0.18300515 50 jmlr-2007-Local Discriminant Wavelet Packet Coordinates for Face Recognition
14 0.17358135 26 jmlr-2007-Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis
15 0.15441437 15 jmlr-2007-Bilinear Discriminant Component Analysis
16 0.15381956 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers (Special Topic on Model Selection)
17 0.1530063 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
18 0.14596853 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
19 0.13873351 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs
20 0.13187692 47 jmlr-2007-Learning Horn Expressions with LOGAN-H
topicId topicWeight
[(4, 0.017), (7, 0.011), (8, 0.092), (10, 0.047), (12, 0.03), (15, 0.042), (22, 0.044), (28, 0.056), (39, 0.271), (40, 0.031), (45, 0.015), (48, 0.052), (60, 0.034), (80, 0.021), (85, 0.053), (98, 0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.73809195 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
2 0.48367858 18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models
Author: Natesh S. Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee, Robert L. Wolpert
Abstract: Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator defined as the convolution of a kernel with a signed measure. Priors on the random signed measures correspond to prior distributions on the functions mapped by the integral operator. We study several classes of signed measures and their image mapped by the integral operator. In particular, we identify a general class of measures whose image is dense in the reproducing kernel Hilbert space (RKHS) induced by the kernel. A consequence of this result is a function theoretic foundation for using non-parametric prior specifications in Bayesian modeling, such as Gaussian process and Dirichlet process prior distributions. We discuss the construction of priors on spaces of signed measures using Gaussian and L´ vy processes, e with the Dirichlet processes being a special case the latter. Computational issues involved with sampling from the posterior distribution are outlined for a univariate regression and a high dimensional classification problem. Keywords: reproducing kernel Hilbert space, non-parametric Bayesian methods, L´ vy processes, e Dirichlet processes, integral operator, Gaussian processes c 2007 Natesh S. Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee and Robert L. Wolpert. P ILLAI , W U , L IANG , M UKHERJEE AND W OLPERT
3 0.4750458 6 jmlr-2007-A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians
Author: Sanjoy Dasgupta, Leonard Schulman
Abstract: We show that, given data from a mixture of k well-separated spherical Gaussians in R d , a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to nearoptimal precision, if the dimension is high (d ln k). We relate this to previous theoretical and empirical work on the EM algorithm. Keywords: expectation maximization, mixtures of Gaussians, clustering, unsupervised learning, probabilistic analysis
4 0.44996285 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
Author: Wei Pan, Xiaotong Shen
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage
5 0.44884136 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Abstract: Embedding algorithms search for a low dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of the embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text data sets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling, IsoMap and correspondence analysis. Keywords: embedding algorithms, manifold learning, exponential families, multidimensional scaling, matrix factorization, semidefinite programming
6 0.4449873 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
7 0.44489977 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
9 0.43546617 26 jmlr-2007-Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis
10 0.43502757 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
11 0.42505509 39 jmlr-2007-Handling Missing Values when Applying Classification Models
12 0.4228152 46 jmlr-2007-Learning Equivariant Functions with Matrix Valued Kernels
13 0.4218016 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
14 0.42124364 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
15 0.41876778 13 jmlr-2007-Bayesian Quadratic Discriminant Analysis
16 0.41725791 60 jmlr-2007-Nonlinear Estimators and Tail Bounds for Dimension Reduction inl1Using Cauchy Random Projections
17 0.41692716 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
18 0.41593832 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
19 0.41541362 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
20 0.4147917 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR