jmlr jmlr2009 jmlr2009-43 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas Abeel, Yves Van de Peer, Yvan Saeys
Abstract: Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license. Keywords: open source, machine learning, data mining, java library, clustering, feature selection, classification
Reference: text
sentIndex sentText sentNum sentScore
1 The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. [sent-11, score-0.248]
2 Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. [sent-12, score-0.378]
3 The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. [sent-13, score-0.096]
4 The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license. [sent-14, score-0.254]
5 Keywords: open source, machine learning, data mining, java library, clustering, feature selection, classification [sent-17, score-0.156]
6 1. Introduction. Machine learning techniques are increasingly popular in research fields like bio- and chemoinformatics, text and web mining, as well as in many other areas of research and industry. [sent-18, score-0.035]
7 In this paper we present Java-ML: a cross-platform, open source machine learning library written in Java. [sent-19, score-0.386]
8 Several well-known data mining libraries already exist, including, for example, Weka (Witten and Frank, 2005) and Yale/RapidMiner (Mierswa et al., 2006). [sent-20, score-0.106]
9 These programs provide a user-friendly interface and are geared towards interactive use by the user. [sent-22, score-0.217]
10 In contrast to these programs, Java-ML is oriented towards developers who want to use machine learning in their own programs. [sent-23, score-0.145]
11 To this end, Java-ML interfaces are restricted to the essentials, and are very easy to understand. [sent-24, score-0.248]
12 As a result, Java-ML facilitates a broad exploration of different models, is straightforward to integrate into your own source code, and can be easily extended. [sent-25, score-0.311]
13 Java-ML contains an extensive set of similarity-based techniques, and offers state-of-the-art feature selection techniques. [sent-27, score-0.176]
14 The large number of similarity functions allows for a broad set of clustering and instance-based learning techniques, while the feature selection techniques are well suited to high-dimensional domains, such as those often encountered in bioinformatics and biomedical applications. [sent-28, score-0.676]
15 2. Description of the Library. In this section we first describe the software design of Java-ML, then discuss how to integrate it into your own program, and finally cover the documentation. [sent-33, score-0.121]
16 2.1 Structure of the Library. The library is built around two core interfaces: Dataset and Instance. [sent-35, score-0.254]
17 These two interfaces have several implementations for different types of samples. [sent-36, score-0.29]
18 The machine learning algorithms implement one of the following interfaces: Clusterer, Classifier, FeatureScoring, FeatureRanking or FeatureSubsetSelection. [sent-37, score-0.049]
19 Distance, correlation and similarity measures implement the interface DistanceMeasure. [sent-38, score-0.352]
20 These distance measures can be used in many algorithms to modify their behavior. [sent-39, score-0.157]
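A rough illustration of this (hedged: the DenseInstance and EuclideanDistance class names, the package paths and the measure() signature are assumptions about the Java-ML API, not code taken from this paper):

    import net.sf.javaml.core.DenseInstance;
    import net.sf.javaml.core.Instance;
    import net.sf.javaml.distance.DistanceMeasure;
    import net.sf.javaml.distance.EuclideanDistance;

    // Two small, hand-made instances.
    Instance a = new DenseInstance(new double[] { 1.0, 2.0, 3.0 });
    Instance b = new DenseInstance(new double[] { 2.0, 4.0, 6.0 });
    // Every DistanceMeasure implementation exposes the same measure() call,
    // so it can be swapped into any algorithm that accepts a DistanceMeasure.
    DistanceMeasure dm = new EuclideanDistance();
    double d = dm.measure(a, b);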
21 Cluster evaluation measures are defined by the ClusterEvaluation interface. [sent-40, score-0.145]
22 Manipulation filters implement either InstanceFilter or DatasetFilter, depending on the level at which they operate. [sent-41, score-0.049]
23 All implementing classes for each of these interfaces are listed in the API documentation on the Java-ML website. [sent-42, score-0.343]
24 Each of these interfaces provides one or two methods that are required to execute the algorithm on a particular data set. [sent-43, score-0.287]
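As an illustrative sketch (not the verbatim Java-ML declaration), the clustering interface described here could look roughly like:

    // Sketch only; the actual interface in the library may differ in detail.
    public interface Clusterer {
        // Run the algorithm on a data set and return the resulting clusters,
        // each represented as a data set of its own.
        Dataset[] cluster(Dataset data);
    }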
25 Several utility classes make it easy to load data from tab- or comma-separated files and from ARFF-formatted files. [sent-44, score-0.175]
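A hedged sketch of such loading calls (the FileHandler and ARFFHandler names, their package paths and the argument order are assumptions based on the description above):

    import java.io.File;
    import net.sf.javaml.core.Dataset;
    import net.sf.javaml.tools.data.ARFFHandler;
    import net.sf.javaml.tools.data.FileHandler;

    // Comma-separated file, class label in column 4 (zero-based index).
    Dataset csv = FileHandler.loadDataset(new File("iris.data"), 4, ",");
    // Tab-separated file, class label in the first column.
    Dataset tsv = FileHandler.loadDataset(new File("data.tsv"), 0, "\t");
    // ARFF-formatted file, class label in column 4.
    Dataset arff = ARFFHandler.loadARFF(new File("iris.arff"), 4);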
26 An overview of the main algorithms included in Java-ML can be found in Table 1. [sent-45, score-0.043]
27 The library provides several algorithms that have not been made available before in a bundled form. [sent-46, score-0.254]
28 In particular, clustering algorithms and the accompanying cluster evaluation measures are extensively represented. [sent-47, score-0.617]
29 This includes the adaptive quality-based clustering algorithm, density-based methods, self-organizing maps (both as a clustering and a classification algorithm) and numerous other well-known clustering algorithms. [sent-48, score-1.001]
30 A large number of distance, similarity and correlation measures are included. [sent-49, score-0.24]
31 Feature selection algorithms include traditional methods such as symmetrical uncertainty, gain ratio, RELIEF and stepwise addition/removal, as well as a number of more recent methods (SVM-RFE and random forest attribute evaluation). [sent-50, score-0.26]
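A hypothetical usage sketch for one of these scorers (the GainRatio class, its package, the build()/score() methods and Dataset.noAttributes() are all assumptions about the API):

    import net.sf.javaml.core.Dataset;
    import net.sf.javaml.featureselection.scoring.GainRatio;

    // Score every attribute of a previously loaded data set.
    GainRatio gainRatio = new GainRatio();
    gainRatio.build(data);
    for (int i = 0; i < data.noAttributes(); i++) {
        System.out.println("attribute " + i + ": " + gainRatio.score(i));
    }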
32 The recently introduced concept of ensemble feature selection techniques (Saeys et al., 2008) is also included. [sent-51, score-0.162]
33 We have also implemented a fast and simple random tree algorithm to cope with high-dimensional, sparse and ambiguous data. [sent-53, score-0.131]
34 Finally, we provide bridges for classification and clustering in Weka and libsvm (Fan et al., 2005). [sent-54, score-0.476]
35 2.2 Easy Integration in Your Own Source Code. Including Java-ML algorithms in your own source code is very simple. [sent-57, score-0.222]
36 To illustrate this, we present two short code fragments that demonstrate how easy it is to integrate the library. [sent-58, score-0.257]
37 The following lines of code integrate a K-Means clustering algorithm into your own program. [sent-59, score-0.533]
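The code itself is not preserved in this extract; a minimal reconstruction based on the description in the following sentences might look like this (package paths and the loadDataset argument order are assumptions):

    import java.io.File;
    import net.sf.javaml.clustering.Clusterer;
    import net.sf.javaml.clustering.KMeans;
    import net.sf.javaml.core.Dataset;
    import net.sf.javaml.tools.data.FileHandler;

    // Load the iris data: class label in column 4 (zero-based), comma-separated fields.
    Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
    // K-means clustering with default settings (k = 4 by default).
    Clusterer km = new KMeans();
    // Cluster the data; the result is returned as an array of data sets.
    Dataset[] clusters = km.cluster(data);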
38 The first line uses the FileHandler utility to load data from the iris.data file. [sent-63, score-0.124]
39 In this file, the class label is on the fourth position and the fields are separated by a comma. [sent-65, score-0.051]
40 The second line constructs a new instance of the KMeans clustering algorithm with default values, in this case k=4. [sent-66, score-0.365]
41 The third line uses the KMeans instance to cluster the data that we loaded in the first line. [sent-67, score-0.193]
42 The resulting clusters will be returned as an array of data sets. [sent-68, score-0.082]
43 The following example illustrates how to perform a cross-validation experiment for a specific dataset and classifier. [sent-69, score-0.163]
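The corresponding code is cut off in this extract; a sketch of how such an experiment might be written follows (the KNearestNeighbors, CrossValidation and PerformanceMeasure names, their packages and signatures, and the choice of a 5-nearest-neighbour classifier are assumptions):

    import java.io.File;
    import java.util.Map;
    import net.sf.javaml.classification.Classifier;
    import net.sf.javaml.classification.KNearestNeighbors;
    import net.sf.javaml.classification.evaluation.CrossValidation;
    import net.sf.javaml.classification.evaluation.PerformanceMeasure;
    import net.sf.javaml.core.Dataset;
    import net.sf.javaml.tools.data.FileHandler;

    // Load the data as in the previous example.
    Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
    // A k-nearest-neighbours classifier with k = 5.
    Classifier knn = new KNearestNeighbors(5);
    // Run a cross-validation experiment and collect one performance measure per class value.
    CrossValidation cv = new CrossValidation(knn);
    Map<Object, PerformanceMeasure> results = cv.crossValidation(data);
    System.out.println(results);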
wordName wordTfidf (topN-words)
[('clustering', 0.322), ('library', 0.254), ('interfaces', 0.248), ('abeel', 0.213), ('psb', 0.213), ('yvan', 0.213), ('yves', 0.213), ('saeys', 0.181), ('dataset', 0.163), ('kmeans', 0.162), ('weka', 0.162), ('classifier', 0.142), ('clusterer', 0.142), ('file', 0.142), ('peer', 0.142), ('source', 0.132), ('integrate', 0.121), ('arff', 0.121), ('java', 0.111), ('bridges', 0.108), ('api', 0.108), ('forests', 0.108), ('stepwise', 0.108), ('measures', 0.105), ('developers', 0.099), ('cluster', 0.096), ('knn', 0.092), ('code', 0.09), ('manipulation', 0.087), ('lters', 0.087), ('similarity', 0.085), ('crossvalidation', 0.082), ('thomas', 0.082), ('ensemble', 0.071), ('interface', 0.063), ('utility', 0.063), ('programs', 0.061), ('load', 0.061), ('eer', 0.06), ('gent', 0.06), ('rfe', 0.06), ('chemoinformatics', 0.06), ('symmetrical', 0.06), ('mining', 0.06), ('broad', 0.058), ('implementing', 0.056), ('loaded', 0.054), ('accompanying', 0.054), ('extensible', 0.054), ('geared', 0.054), ('gnu', 0.054), ('plant', 0.054), ('ghent', 0.054), ('relief', 0.054), ('usable', 0.054), ('documented', 0.054), ('distance', 0.052), ('separated', 0.051), ('van', 0.05), ('correlation', 0.05), ('utilities', 0.05), ('gpl', 0.05), ('ambiguous', 0.05), ('achine', 0.05), ('implement', 0.049), ('clusters', 0.048), ('selection', 0.046), ('oriented', 0.046), ('forest', 0.046), ('fragments', 0.046), ('libraries', 0.046), ('organizing', 0.046), ('libsvm', 0.046), ('feature', 0.045), ('cope', 0.043), ('bagging', 0.043), ('biomedical', 0.043), ('ren', 0.043), ('self', 0.043), ('witten', 0.043), ('belgium', 0.043), ('instance', 0.043), ('overview', 0.043), ('implementations', 0.042), ('loading', 0.041), ('elds', 0.04), ('evaluation', 0.04), ('interactive', 0.039), ('documentation', 0.039), ('execute', 0.039), ('tree', 0.038), ('km', 0.037), ('sonnenburg', 0.037), ('discretization', 0.035), ('integration', 0.035), ('fan', 0.035), ('increasingly', 0.035), ('maps', 0.035), ('encountered', 0.034), ('array', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
Author: Thomas Abeel, Yves Van de Peer, Yvan Saeys
Abstract: Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license. Keywords: open source, machine learning, data mining, java library, clustering, feature selection, classification
2 0.18849146 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
Author: Davis E. King
Abstract: There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools. Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks
Author: Troy Raeder, Nitesh V. Chawla
Abstract: This paper presents Model Monitor (M²), a Java toolkit for robustly evaluating machine learning algorithms in the presence of changing data distributions. M² provides a simple and intuitive framework in which users can evaluate classifiers under hypothesized shifts in distribution and therefore determine the best model (or models) for their data under a number of potential scenarios. Additionally, M² is fully integrated with the WEKA machine learning environment, so that a variety of commodity classifiers can be used if desired. Keywords: machine learning, open-source software, distribution shift, scenario analysis
4 0.096136995 59 jmlr-2009-Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions
Author: Sébastien Bubeck, Ulrike von Luxburg
Abstract: Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal, and instead can lead to inconsistency. We construct examples which provably have this behavior. As in the case of supervised learning, the cure is to restrict the size of the function classes under consideration. For appropriate “small” function classes we can prove very general consistency theorems for clustering optimization schemes. As one particular algorithm for clustering with a restricted function space we introduce “nearest neighbor clustering”. Similar to the k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general baseline algorithm to minimize arbitrary clustering objective functions. We prove that it is statistically consistent for all commonly used clustering objective functions. Keywords: clustering, minimizing objective functions, consistency
5 0.086717896 34 jmlr-2009-Fast ApproximatekNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
Author: Jie Chen, Haw-ren Fang, Yousef Saad
Abstract: Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high dimensional data (large d). The exponent t ∈ (1, 2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs. Keywords: nearest neighbors graph, high dimensional data, divide and conquer, Lanczos algorithm, spectral method
6 0.081702597 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
8 0.059194516 24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification
9 0.054964595 96 jmlr-2009-Transfer Learning for Reinforcement Learning Domains: A Survey
10 0.051679641 90 jmlr-2009-Structure Spaces
12 0.047823407 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
13 0.043769479 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory
15 0.040240329 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
16 0.034795702 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
17 0.034464356 38 jmlr-2009-Hash Kernels for Structured Data
18 0.034001671 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
19 0.033939503 60 jmlr-2009-Nieme: Large-Scale Energy-Based Models (Machine Learning Open Source Software Paper)
20 0.033895116 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training
topicId topicWeight
[(0, 0.14), (1, -0.142), (2, 0.075), (3, -0.103), (4, 0.082), (5, -0.261), (6, 0.275), (7, 0.061), (8, -0.002), (9, 0.186), (10, -0.129), (11, 0.252), (12, -0.11), (13, 0.298), (14, 0.086), (15, -0.026), (16, 0.082), (17, -0.058), (18, -0.114), (19, 0.002), (20, 0.016), (21, 0.036), (22, -0.137), (23, 0.049), (24, 0.005), (25, 0.02), (26, -0.041), (27, 0.053), (28, -0.039), (29, -0.086), (30, 0.098), (31, -0.076), (32, 0.088), (33, -0.049), (34, 0.012), (35, -0.014), (36, -0.003), (37, -0.057), (38, 0.065), (39, 0.044), (40, -0.065), (41, 0.064), (42, 0.023), (43, -0.034), (44, -0.065), (45, 0.048), (46, -0.007), (47, -0.095), (48, 0.019), (49, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.98280835 43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
Author: Thomas Abeel, Yves Van de Peer, Yvan Saeys
Abstract: Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license. Keywords: open source, machine learning, data mining, java library, clustering, feature selection, classification
2 0.74851376 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
Author: Davis E. King
Abstract: There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools. Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks
Author: Troy Raeder, Nitesh V. Chawla
Abstract: This paper presents Model Monitor (M²), a Java toolkit for robustly evaluating machine learning algorithms in the presence of changing data distributions. M² provides a simple and intuitive framework in which users can evaluate classifiers under hypothesized shifts in distribution and therefore determine the best model (or models) for their data under a number of potential scenarios. Additionally, M² is fully integrated with the WEKA machine learning environment, so that a variety of commodity classifiers can be used if desired. Keywords: machine learning, open-source software, distribution shift, scenario analysis
Author: Sébastien Bubeck, Ulrike von Luxburg
Abstract: Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal, and instead can lead to inconsistency. We construct examples which provably have this behavior. As in the case of supervised learning, the cure is to restrict the size of the function classes under consideration. For appropriate “small” function classes we can prove very general consistency theorems for clustering optimization schemes. As one particular algorithm for clustering with a restricted function space we introduce “nearest neighbor clustering”. Similar to the k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general baseline algorithm to minimize arbitrary clustering objective functions. We prove that it is statistically consistent for all commonly used clustering objective functions. Keywords: clustering, minimizing objective functions, consistency
5 0.37213901 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
Author: Jens Lehmann
Abstract: In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for different OWL formats, reasoner interfaces, and learning problems. It is a cross-platform framework implemented in Java. The framework allows easy programmatic access and provides a command line interface, a graphical interface as well as a WSDL-based web service. Keywords: concept learning, description logics, OWL, classification, open-source
8 0.2520242 24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification
9 0.25126639 34 jmlr-2009-Fast ApproximatekNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
11 0.21765172 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
12 0.20753177 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory
13 0.20672631 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
14 0.1855083 90 jmlr-2009-Structure Spaces
15 0.17557263 45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
16 0.17215022 96 jmlr-2009-Transfer Learning for Reinforcement Learning Domains: A Survey
17 0.16802342 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training
18 0.15835808 50 jmlr-2009-Learning When Concepts Abound
19 0.1564313 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
20 0.14433429 4 jmlr-2009-A Survey of Accuracy Evaluation Metrics of Recommendation Tasks
topicId topicWeight
[(8, 0.053), (26, 0.038), (38, 0.038), (52, 0.029), (55, 0.012), (58, 0.026), (66, 0.053), (90, 0.042), (91, 0.601)]
simIndex simValue paperId paperTitle
same-paper 1 0.78745914 43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
Author: Thomas Abeel, Yves Van de Peer, Yvan Saeys
Abstract: Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written, properly documented and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL license. Keywords: open source, machine learning, data mining, java library, clustering, feature selection, classification
2 0.27462727 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
Author: Davis E. King
Abstract: There are many excellent toolkits which provide support for developing machine learning software in Python, R, Matlab, and similar environments. Dlib-ml is an open source library, targeted at both engineers and research scientists, which aims to provide a similarly rich environment for developing machine learning software in the C++ language. Towards this end, dlib-ml contains an extensible linear algebra toolkit with built in BLAS support. It also houses implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking. To enable easy use of these tools, the entire library has been developed with contract programming, which provides complete and precise documentation as well as powerful debugging tools. Keywords: kernel-methods, svm, rvm, kernel clustering, C++, Bayesian networks
Author: Abhik Shah, Peter Woolf
Abstract: In this paper, we introduce PEBL, a Python library and application for learning Bayesian network structure from data and prior knowledge that provides features unmatched by alternative software packages: the ability to use interventional data, flexible specification of structural priors, modeling with hidden variables and exploitation of parallel processing. PEBL is released under the MIT open-source license, can be installed from the Python Package Index and is available at http://pebl-project.googlecode.com. Keywords: Bayesian networks, python, open source software
4 0.16865599 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
Author: Jens Lehmann
Abstract: In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for different OWL formats, reasoner interfaces, and learning problems. It is a cross-platform framework implemented in Java. The framework allows easy programmatic access and provides a command line interface, a graphical interface as well as a WSDL-based web service. Keywords: concept learning, description logics, OWL, classification, open-source
5 0.15400271 21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
Author: Sylvain Arlot, Pascal Massart
Abstract: Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristics for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general. Keywords: data-driven calibration, non-parametric regression, model selection by penalization, heteroscedastic data, regressogram
6 0.1448791 60 jmlr-2009-Nieme: Large-Scale Energy-Based Models (Machine Learning Open Source Software Paper)
7 0.14040893 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
8 0.13875192 29 jmlr-2009-Estimating Labels from Label Proportions
9 0.13650073 70 jmlr-2009-Particle Swarm Model Selection (Special Topic on Model Selection)
10 0.13635197 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning
11 0.13550481 48 jmlr-2009-Learning Nondeterministic Classifiers
13 0.13270676 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
14 0.13147403 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization
15 0.13143618 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
16 0.13102798 38 jmlr-2009-Hash Kernels for Structured Data
17 0.13079953 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM
18 0.1305085 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
20 0.12936778 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning