jmlr jmlr2011 jmlr2011-15 knowledge-graph by maker-knowledge-mining

15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms


Source: pdf

Author: Volodymyr Melnykov, Ranjan Maitra

Abstract: This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Department of Statistics and Statistical Laboratory, Iowa State University, Ames, IA 50011, USA. Editor: Cheng Soon Ong. Abstract: This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. [sent-4, score-0.37]

2 Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. [sent-5, score-0.67]

3 This paper briefly describes the software and its capabilities. [sent-6, score-0.042]

4 Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap. [sent-7, score-0.298]

5 Introduction: There are many clustering algorithms available, but no uniformly clear winner. [sent-8, score-0.232]

6 Thus, calibrating the performance of each algorithm in different situations is important. [sent-9, score-0.039]

7 CARP provides software to evaluate the performance of any user-provided clustering technique on simulated datasets of specified clustering complexity. [sent-10, score-0.888]

8 At its heart is software that implements Maitra and Melnykov’s (2010) algorithm to simulate datasets from Gaussian mixtures of different clustering difficulty. [sent-11, score-0.795]

9 CARP provides an integrated software tool which generates datasets using the above, runs the clustering algorithm being evaluated on each dataset, and compares the derived grouping with the true one via indices such as the Adjusted Rand (R) index (Hubert and Arabie, 1985). [sent-12, score-0.739]
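
As a point of reference for this default criterion, the following minimal Python sketch computes the Adjusted Rand index from two label vectors using the standard Hubert and Arabie (1985) formula; it is an illustration only, not CARP's own C implementation.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index of Hubert and Arabie (1985): 1 means perfect
    agreement with the true grouping; ~0 is expected under random labelling."""
    n = len(true_labels)
    assert n == len(pred_labels)
    cont = Counter(zip(true_labels, pred_labels))  # contingency table counts
    a = Counter(true_labels)                       # row marginals
    b = Counter(pred_labels)                       # column marginals
    sum_ij = sum(comb(c, 2) for c in cont.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Example: near-perfect recovery of three true groups.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
found = [0, 0, 1, 1, 1, 1, 2, 2, 2]
print(round(adjusted_rand_index(truth, found), 3))  # 0.643
```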

10 Beyond the algorithmic and other limitations (Maitra and Melnykov, 2010) underlying both of these packages, neither provides an integrated tool to evaluate clustering algorithms. [sent-16, score-0.293]

11 CARP addresses this shortcoming by seamlessly integrating three stages. [sent-17, score-0.061]

12 The first stage generates datasets given user-specified measures for the desired clustering complexity. [sent-18, score-0.749]

13 Figure 1: (a) Sample 10-component bivariate mixture distribution. [sent-25, score-0.126]

14 (b) Corresponding sample dataset generated at the first stage of CARP. [sent-26, score-0.215]

15 The next stage clusters each dataset using the user-provided algorithm in executable form. [sent-56, score-0.347]

16 The final phase evaluates the performance of this clustering algorithm, in terms of the default R or some other index supplied by the user in executable form, by comparing its derived partitioning with the true grouping. [sent-57, score-0.466]

17 The package is specifically designed to be flexible enough to allow the user to provide clustering algorithms and evaluation measures in his/her preferred programming language. [sent-58, score-0.378]

18 For any two components (say i and j) of a Gaussian mixture density g(x) = ∑_{k=1}^{K} π_k φ(x; µ_k, Σ_k), the overlap is defined as ω_ij = ω_{i|j} + ω_{j|i}, where ω_{j|i} is the probability that an observation from the ith component is misclassified as coming from the jth one, with ω_{i|j} defined similarly. [sent-61, score-0.133]

19 …1 in Maitra and Melnykov (2010) to calculate them numerically and efficiently. [sent-63, score-0.017]

20 For a K-component mixture there are K(K−1)/2 such ω_ij values, so clustering complexity is characterized using the average (ω̄) or maximum (ω̌) pairwise overlap measures. [sent-64, score-0.36]
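
CARP computes these quantities numerically via the algorithm of Maitra and Melnykov (2010); purely to make the definitions concrete, the Python sketch below instead approximates each ω_{j|i} by Monte Carlo and then reports ω̄ and ω̌. All parameter values in the example are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pairwise_overlap_mc(pi, mu, sigma, n_draws=100_000, seed=1):
    """Monte Carlo estimate of the pairwise overlaps: omega_{j|i} is the
    fraction of draws from component i for which component j has the larger
    weighted density. CARP itself computes these numerically, so this is
    only an illustrative approximation."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    omega = np.zeros((K, K))
    for i in range(K):
        x = rng.multivariate_normal(mu[i], sigma[i], size=n_draws)
        # Weighted density of every component at the draws from component i.
        dens = np.stack([p * multivariate_normal.pdf(x, m, s)
                         for p, m, s in zip(pi, mu, sigma)])
        for j in range(K):
            if j != i:
                omega[j, i] = np.mean(dens[j] > dens[i])  # omega_{j|i}
    pair = omega + omega.T            # omega_ij = omega_{i|j} + omega_{j|i}
    vals = pair[np.triu_indices(K, k=1)]
    return vals.mean(), vals.max()    # (average, maximum) pairwise overlap

# Two close components and one well-separated bivariate component.
pi = [1/3, 1/3, 1/3]
mu = [np.zeros(2), np.array([0.5, 0.0]), np.array([5.0, 5.0])]
sigma = [np.eye(2)] * 3
print(pairwise_overlap_mc(pi, mu, sigma))
```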

21 …1 of Maitra and Melnykov (2010) to simulate Gaussian mixtures corresponding to a given ω̄ or ω̌, as in the sample ten-component bivariate mixture distribution satisfying ω̌ = 0. [sent-67, score-0.326]

22 For a more comprehensive summary of clustering complexity, CARP uses their Algorithm 2.2 to simulate mixtures according to the pair (ω̄, ω̌), as illustrated in Figure 2. [sent-69, score-0.232] [sent-71, score-0.176]

24 Note that Figures 2a-b have the same ω̄, but in the first case this value is driven by the overlap between a few pairs of components. [sent-72, score-0.066]

25 Figure 2: Sample 6-component bivariate mixture distributions for different (ω̄, ω̌)s. [sent-78, score-0.126]

26 Table 1: Median time (secs) to simulate 25 K-component p-dimensional Gaussian mixtures. [sent-95, score-0.108]

27 CARP can simulate mixture models of many components (K) and dimensions (p): Table 1 summarizes the median time over 25 samples to obtain realizations of Gaussian mixtures at each setting on a desktop workstation with dual quad-core Intel® Xeon® X5482 processors @ 3.2 GHz. [sent-97, score-0.355]

28 The first phase of CARP thus involves simulating mixture distributions corresponding to user-specified K, p and clustering complexity in the form of ω̄ and/or ω̌. [sent-107, score-0.423]

29 Given the desired sample size (n) of the dataset, this stage then obtains n simulated observations from each realized mixture model, as in the 500-observation sample displayed in Figure 1b. [sent-108, score-0.296]
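
A minimal sketch of this sampling step, assuming the realized mixture is given by its weights, means and covariance matrices (illustrative Python, not CARP's internal code):

```python
import numpy as np

def sample_mixture(n, pi, mu, sigma, seed=0):
    """Draw n observations and their true component labels from a Gaussian
    mixture: a sketch of what Stage I produces, not CARP's own code."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(pi), size=n, p=pi)          # component of each point
    data = np.array([rng.multivariate_normal(mu[k], sigma[k]) for k in labels])
    return data, labels

# e.g., 500 observations from a realized 2-component bivariate mixture
X, z = sample_mixture(500, [0.4, 0.6],
                      [np.zeros(2), np.full(2, 3.0)],
                      [np.eye(2), np.eye(2)])
```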

30 Beyond Gaussian mixtures, datasets with more general-shaped clusters of desired approximate clustering complexity can be generated by applying the inverse multivariate Box-Cox transform to the simulated Gaussian mixtures. [sent-109, score-0.686]
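
To make the idea concrete, the sketch below applies a coordinate-wise inverse Box-Cox transform to Gaussian draws; the λ values and the positivity shift are assumptions made for this example, and the exact construction CARP uses is described in Maitra and Melnykov (2010) and the package manual.

```python
import numpy as np

def inverse_box_cox(y, lam):
    """Coordinate-wise inverse Box-Cox transform, mapping Gaussian draws to
    non-Gaussian, general-shaped clusters. `lam` holds one lambda per
    coordinate; lambda = 0 reduces to exp. Assumes lam * y + 1 > 0
    (e.g., after shifting y), which is an assumption of this sketch."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for d, l in enumerate(lam):
        if l == 0.0:
            out[:, d] = np.exp(y[:, d])
        else:
            out[:, d] = np.power(l * y[:, d] + 1.0, 1.0 / l)
    return out

# Skew 500 bivariate Gaussian draws with lambda = 0.5 in each coordinate.
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(2), np.eye(2), size=500)
X_skewed = inverse_box_cox(y - y.min(axis=0) + 1.0, lam=[0.5, 0.5])
```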

31 This stage is also designed to allow for generating datasets with noisy variables and/or containing scatter/outliers. [sent-110, score-0.491]

32 CARP can be used standalone, without calls to the next two stages, to generate only these distributions and datasets if so desired; the R package MixSim provides an additional interface to this stage of CARP. [sent-111, score-0.592]

33 Stage II: Clustering Simulated Datasets: The second phase of CARP uses the clustering algorithm(s) being evaluated to partition the datasets simulated in the first stage. [sent-112, score-0.63]

34 Code for these algorithm(s) is submitted by the user in executable form, with no requirement that it be written in a specific programming language. [sent-113, score-0.143]

35 This stage is designed to interface easily with other clustering tools, as illustrated in the manual. [sent-114, score-0.423]
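
CARP's actual file formats and calling conventions are defined in its manual; the hypothetical script below only illustrates the general shape of such a user-provided executable (data file and cluster count in, one label per observation out), here wrapping scikit-learn's k-means.

```python
#!/usr/bin/env python3
"""Hypothetical user-provided clustering executable. The real input/output
contract expected by CARP is documented in its manual; this sketch only
illustrates the pattern: read a dataset, cluster it, print the labels."""
import sys
import numpy as np
from sklearn.cluster import KMeans

def main():
    data_path, k = sys.argv[1], int(sys.argv[2])
    X = np.loadtxt(data_path)                  # one observation per row
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    for lab in labels:
        print(lab)                             # one label per line on stdout

if __name__ == "__main__":
    main()
```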

36 Stage III: Evaluating the Performance of Clustering Algorithms: The third stage compares the partitionings provided by each clustering algorithm under investigation to the true grouping. [sent-115, score-0.413]

37 The default evaluation metric is R, but other measures, including those with user-supplied code in executable form, can be used: see the manual for examples. [sent-116, score-0.19]

38 In summary, CARP provides distributions of the desired performance measure for the clustering method(s) being evaluated for given n, K, p and the desired ω̄, ω̌ or (ω̄, ω̌). [sent-117, score-0.304]

39 Demonstrating the Utility of CARP: For illustration, we consider a simple example where the goal is to evaluate the performance of different algorithms in partitioning datasets with (n, K, p) = (100, 7, 5) and varying clustering complexity characterized solely in terms of ω̌. [sent-119, score-0.64]

40 (The exact CARP commands used to conduct this study are in Section 5 of the CARP software manual.) [sent-120, score-0.063]

41 Table 2 summarizes the performance of each algorithm on 25 datasets, each with a different ω̌ and obtained using Stage I of CARP. [sent-121, score-0.016]

42 Clustering algorithms are used here via external calls to the corresponding software. [sent-122, score-0.023]

43 Evaluations are in terms of the default R provided with CARP. [sent-123, score-0.041]

44 Clearly, performance degrades across the board with higher ω̌. [sent-124, score-0.018]

45 Hierarchical clustering with Ward’s and single linkages are, respectively, the best and worst performers in many cases, with k-means being second-best. [sent-125, score-0.28]

46 Although a small-scale experiment, it highlights CARP’s utility in summarizing and evaluating performance of different clustering algorithms. [sent-126, score-0.304]

47 Table 2: Performance of hierarchical algorithms with Ward's (W), complete (C), average (A), and single (S) linkages, as well as the k-means (K) and partitioning around medoids (P) algorithms, on clustering 5-dimensional datasets of size 100 with 7 groups. [sent-263, score-0.58]

48 R.5 and IR represent the median and inter-quartile range of R over 100 replications. [sent-265, score-0.034]
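
A small sketch of how such summaries can be computed from per-replication index values (illustrative only; the values below are made up):

```python
import numpy as np

def summarize_scores(scores):
    """Median and inter-quartile range of an index (e.g., the adjusted Rand
    values computed earlier) over replications, as reported in Table 2."""
    q25, med, q75 = np.percentile(scores, [25, 50, 75])
    return med, q75 - q25

scores = [0.91, 0.84, 0.88, 0.79, 0.95]  # hypothetical per-replication values
med, iqr = summarize_scores(scores)
```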

49 Complete details on usage, parameters and examples are provided in the package’s manual and README file. [sent-270, score-0.031]

50 CARP can be used standalone simply to simulate Gaussian mixtures and the corresponding datasets. [sent-271, score-0.247]

51 This standalone functionality is also provided by the R package MixSim, which is publicly available from http://www. [sent-272, score-0.214]

52 Conclusions: CARP is a powerful and user-friendly open source software package for evaluating clustering algorithms. [sent-276, score-0.412]

53 It realizes mixture models of pre-specified clustering complexity, and from there, datasets of desired sample sizes that are then partitioned using the clustering algorithms being appraised. [sent-277, score-0.897]

54 Performance is summarized by comparing the obtained groupings with the true grouping. [sent-278, score-0.021]

55 CARP can also be used to evaluate semi-supervised clustering algorithms, or on simulated datasets with noisy variables or containing scatter/outliers. [sent-279, score-0.614]

56 Clusters should each ideally be Gaussian-distributed, though the package also uses transformations such as the multivariate Box-Cox to simulate grouped datasets from more general distributions. [sent-280, score-0.512]

57 CARP can also calculate the pairwise overlap between identified groups in clustered or classified datasets. [sent-281, score-0.103]

58 It is only designed for cases involving continuous variables and not, in general, able to evaluate algorithms that cluster on manifolds and the like. [sent-282, score-0.06]

59 Simulating data to study performance of finite mixture modeling and clustering algorithms. [sent-292, score-0.299]

60 Generation of random clusters with specified degree of separation. [sent-301, score-0.037]

61 OCLUS: An analytic method for generating clusters with known overlap. [sent-306, score-0.055]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('carp', 0.726), ('datasets', 0.309), ('maitra', 0.251), ('melnykov', 0.251), ('clustering', 0.232), ('stage', 0.138), ('simulate', 0.108), ('package', 0.095), ('executable', 0.095), ('ir', 0.085), ('volodymyr', 0.084), ('dataset', 0.077), ('simulating', 0.073), ('standalone', 0.071), ('mixtures', 0.068), ('mixture', 0.067), ('overlap', 0.066), ('bivariate', 0.059), ('lustering', 0.059), ('aitra', 0.056), ('elnykov', 0.056), ('hubert', 0.056), ('ishing', 0.056), ('oclus', 0.056), ('oftware', 0.056), ('ood', 0.056), ('ranjan', 0.056), ('steinley', 0.056), ('simulated', 0.055), ('linkages', 0.048), ('qiu', 0.048), ('evaluating', 0.043), ('software', 0.042), ('ix', 0.041), ('default', 0.041), ('calibrating', 0.039), ('partitioning', 0.039), ('im', 0.038), ('clusters', 0.037), ('implements', 0.036), ('desired', 0.036), ('lgorithms', 0.035), ('ward', 0.034), ('phase', 0.034), ('median', 0.034), ('stages', 0.033), ('manual', 0.031), ('utility', 0.029), ('publicly', 0.027), ('interface', 0.027), ('usage', 0.026), ('designed', 0.026), ('characterized', 0.025), ('user', 0.025), ('arabie', 0.024), ('corre', 0.024), ('formerly', 0.024), ('gcc', 0.024), ('iowa', 0.024), ('joe', 0.024), ('partitionings', 0.024), ('seamlessly', 0.024), ('sponding', 0.024), ('summaries', 0.024), ('workstation', 0.024), ('gaussian', 0.023), ('ut', 0.023), ('calls', 0.023), ('code', 0.023), ('tool', 0.022), ('integrated', 0.021), ('eneration', 0.021), ('commands', 0.021), ('functionality', 0.021), ('groupings', 0.021), ('ia', 0.021), ('realizes', 0.021), ('pairwise', 0.02), ('desktop', 0.02), ('gnu', 0.02), ('secs', 0.02), ('shortcoming', 0.02), ('compares', 0.019), ('generating', 0.018), ('board', 0.018), ('career', 0.018), ('realizations', 0.018), ('rand', 0.018), ('suite', 0.018), ('unavailable', 0.018), ('xeon', 0.018), ('figures', 0.018), ('evaluate', 0.018), ('complexity', 0.017), ('addresses', 0.017), ('generates', 0.017), ('packages', 0.017), ('calculate', 0.017), ('summarizes', 0.016), ('manifolds', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

Author: Volodymyr Melnykov, Ranjan Maitra

Abstract: This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap

2 0.05454858 16 jmlr-2011-Clustering Algorithms for Chains

Author: Antti Ukkonen

Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing

3 0.03032276 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

Author: Bruno Pelletier, Pierre Pudlo

Abstract: Following Hartigan (1975), a cluster is defined as a connected component of the t-level set of the underlying density, that is, the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the data set into the feature space, which establishes the strong consistency of our proposed algorithm. Keywords: spectral clustering, graph, unsupervised classification, level sets, connected components

4 0.027053766 41 jmlr-2011-Improved Moves for Truncated Convex Models

Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr

Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation

5 0.022262089 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Author: Vianney Perchet

Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^{−1/3}), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams

6 0.021557789 60 jmlr-2011-Locally Defined Principal Curves and Surfaces

7 0.021393806 35 jmlr-2011-Forest Density Estimation

8 0.020319933 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

9 0.020149117 61 jmlr-2011-Logistic Stick-Breaking Process

10 0.019851089 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets

11 0.019806305 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

12 0.019627608 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

13 0.019520093 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data

14 0.019158266 102 jmlr-2011-Waffles: A Machine Learning Toolkit

15 0.017647404 50 jmlr-2011-LPmade: Link Prediction Made Easy

16 0.017248943 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes

17 0.016852183 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

18 0.016508786 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

19 0.016246015 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

20 0.015985252 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.065), (1, -0.035), (2, -0.015), (3, -0.014), (4, -0.023), (5, -0.002), (6, 0.002), (7, 0.036), (8, -0.051), (9, 0.015), (10, -0.038), (11, 0.075), (12, 0.128), (13, -0.007), (14, -0.117), (15, -0.098), (16, 0.027), (17, 0.069), (18, -0.031), (19, -0.038), (20, 0.049), (21, -0.053), (22, -0.043), (23, -0.158), (24, 0.027), (25, -0.133), (26, -0.098), (27, 0.268), (28, -0.068), (29, -0.249), (30, 0.192), (31, 0.255), (32, 0.051), (33, 0.008), (34, 0.142), (35, -0.061), (36, -0.189), (37, 0.019), (38, 0.172), (39, -0.04), (40, -0.282), (41, -0.009), (42, 0.115), (43, 0.069), (44, -0.069), (45, 0.096), (46, 0.031), (47, 0.014), (48, -0.031), (49, -0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98514676 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

Author: Volodymyr Melnykov, Ranjan Maitra

Abstract: This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap

2 0.56358218 16 jmlr-2011-Clustering Algorithms for Chains

Author: Antti Ukkonen

Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing

3 0.31489608 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

Author: Bruno Pelletier, Pierre Pudlo

Abstract: Following Hartigan (1975), a cluster is defined as a connected component of the t-level set of the underlying density, that is, the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the data set into the feature space, which establishes the strong consistency of our proposed algorithm. Keywords: spectral clustering, graph, unsupervised classification, level sets, connected components

4 0.23862755 102 jmlr-2011-Waffles: A Machine Learning Toolkit

Author: Michael Gashler

Abstract: We present a breadth-oriented collection of cross-platform command-line tools for researchers in machine learning called Waffles. The Waffles tools are designed to offer a broad spectrum of functionality in a manner that is friendly for scripted automation. All functionality is also available in a C++ class library. Waffles is available under the GNU Lesser General Public License. Keywords: machine learning, toolkits, data mining, C++, open source

5 0.1861922 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

Author: Fabien Lauer, Yann Guermeur

Abstract: This paper describes MSVMpack, an open source software package dedicated to our generic model of multi-class support vector machine. All four multi-class support vector machines (M-SVMs) proposed so far in the literature appear as instances of this model. MSVMpack provides for them the first unified implementation and offers a convenient basis to develop other instances. This is also the first parallel implementation for M-SVMs. The package consists in a set of command-line tools with a callable library. The documentation includes a tutorial, a user’s guide and a developer’s guide. Keywords: multi-class support vector machines, open source, C

6 0.17292053 41 jmlr-2011-Improved Moves for Truncated Convex Models

7 0.1615265 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

8 0.15057543 60 jmlr-2011-Locally Defined Principal Curves and Surfaces

9 0.14875218 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

10 0.14635649 35 jmlr-2011-Forest Density Estimation

11 0.14488789 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data

12 0.14387713 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

13 0.13746971 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes

14 0.1371337 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets

15 0.13299641 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

16 0.12577826 61 jmlr-2011-Logistic Stick-Breaking Process

17 0.11617911 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

18 0.11003105 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

19 0.10219155 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach

20 0.090702675 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.02), (9, 0.023), (10, 0.048), (24, 0.022), (31, 0.1), (32, 0.023), (36, 0.011), (41, 0.026), (60, 0.017), (73, 0.019), (78, 0.027), (87, 0.524), (90, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78153849 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

Author: Volodymyr Melnykov, Ranjan Maitra

Abstract: This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap

2 0.70965838 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

Author: Trine Julie Abrahamsen, Lars Kai Hansen

Abstract: Small sample high-dimensional principal component analysis (PCA) suffers from variance inflation and lack of generalizability. It has earlier been pointed out that a simple leave-one-out variance renormalization scheme can cure the problem. In this paper we generalize the cure in two directions: First, we propose a computationally less intensive approximate leave-one-out estimator, secondly, we show that variance inflation is also present in kernel principal component analysis (kPCA) and we provide a non-parametric renormalization scheme which can quite efficiently restore generalizability in kPCA. As for PCA our analysis also suggests a simplified approximate expression. Keywords: PCA, kernel PCA, generalizability, variance renormalization

3 0.22307934 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis

Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture

4 0.2206028 95 jmlr-2011-Training SVMs Without Offset

Author: Ingo Steinwart, Don Hush, Clint Scovel

Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the performed iterations of the new training algorithm. It turns out, that the new algorithm needs significantly less iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms

5 0.21613786 101 jmlr-2011-Variable Sparsity Kernel Learning

Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman

Abstract: This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls, s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels; hence we name the formulations Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over a product of simplices and has a computational complexity of O(m² n_tot log(n_max)/ε²), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels respectively, and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability (Footnote 1: All authors contributed equally. The author names appear in alphabetical order.)

6 0.21389201 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes

7 0.20677453 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

8 0.20312527 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

9 0.20026666 12 jmlr-2011-Bayesian Co-Training

10 0.19950627 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

11 0.19913299 48 jmlr-2011-Kernel Analysis of Deep Networks

12 0.19890617 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal

13 0.19886318 16 jmlr-2011-Clustering Algorithms for Chains

14 0.19808312 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

15 0.19650073 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

16 0.19593072 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

17 0.19494677 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets

18 0.19220471 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling

19 0.19200106 66 jmlr-2011-Multiple Kernel Learning Algorithms

20 0.19192909 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets