jmlr jmlr2012 jmlr2012-19 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adrian Barbu, Nathan Lay
Abstract: Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants’ budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI data sets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume. Keywords: online learning, ensemble methods, supervised learning, random forest, implicit online learning
Reference: text
sentIndex sentText sentNum sentScore
1 The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. [sent-5, score-0.975]
2 The market can be trained online by updating the participants’ budgets using training examples. [sent-6, score-0.834]
3 The obtained artificial prediction market is shown to be a maximum likelihood estimator. [sent-9, score-0.69]
4 Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. [sent-11, score-0.763]
5 Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume. [sent-13, score-0.802]
6 Prediction markets are capable of fusing the information that the market participants possess through the contract price. [sent-23, score-1.186]
7 An important part of the prediction market is the contract price, which will be shown to be an estimator of the class-conditional probability given the evidence presented through a feature vector x. [sent-29, score-0.706]
8 It turns out that to obtain linear aggregation, each market participant purchases contracts for the class it predicts, regardless of the market price for that contract. [sent-35, score-1.677]
9 In Section 3, special betting functions will be presented that make the prediction market equivalent to a logistic regression and a kernel-based classifier, respectively. [sent-38, score-1.091]
10 Experiments on synthetic data (including a very difficult case) as well as on real UCI data show that the prediction market using the specialized classifiers outperforms the random forest in prediction and in estimating the true underlying probability. [sent-45, score-0.877]
11 2. The Artificial Prediction Market for Classification: This work simulates the Iowa Electronic Market (Wolfers and Zitzewitz, 2004), which is a real prediction market that can be found online at http://www. [sent-48, score-1.376]
12 2.1 The Iowa Electronic Market: The Iowa Electronic Market (Wolfers and Zitzewitz, 2004) is a forum where contracts for future outcomes of interest (e.g., presidential elections) are bought and sold. [sent-53, score-0.808]
13 Our market will simulate this behavior, with contracts for all the possible outcomes, paying 1 if that outcome is realized. [sent-60, score-0.78]
14 The market consists of a number of market participants (β_m, φ_m(x, c)), m = 1, ..., M. [sent-70, score-1.4]
15 A market participant is a pair (β, φ(x, c)) of a budget β and a betting function φ(x, c) : Ω × ∆ → [0, 1]^K, φ(x, c) = (φ^1(x, c), ..., φ^K(x, c)). [sent-74, score-1.296]
16 The betting function tells what percentage of its budget this participant will allocate to purchase contracts for each class, based on the instance x ∈ Ω and the market price c. [sent-79, score-1.564]
17 As the market price c is not known in advance, the betting function describes what the participant plans to do for each possible price c. [sent-80, score-1.453]
18 We will show that logistic regression and kernel methods can also be represented using the artificial prediction market and specific types of betting functions. [sent-85, score-1.091]
19 In order to bet at most the budget β, the betting functions must satisfy ∑_{k=1}^K φ^k(x, c) ≤ 1. [sent-86, score-0.717]
20 Examples of betting functions include the following, also shown in Figure 1: • Constant betting functions φ^k(x, c) = φ^k(x), for example based on trained classifiers, φ^k(x, c) = η h^k(x), where η ∈ (0, 1] is constant. [sent-143, score-0.828]
21 • Aggressive betting functions: φ^k(x, c) = h^k(x) if c^k ≤ h^k(x); φ^k(x, c) = h^k(x)(h^k(x) + ε − c^k)/ε if h^k(x) < c^k ≤ h^k(x) + ε; and φ^k(x, c) = 0 if c^k > h^k(x) + ε. (2) [sent-145, score-0.938]
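As a concrete illustration of the two betting functions above, here is a minimal Python sketch (not from the authors' code); the classifier `h`, the fraction `eta`, and the width `eps` are illustrative choices, and `h(x)` is assumed to return a vector of class probabilities h^k(x).

```python
import numpy as np

def constant_betting(h, eta=0.5):
    """Constant betting: allocate a fixed fraction eta of the budget to each class
    in proportion to the classifier output h^k(x), independently of the price c."""
    def phi(x, c):
        return eta * h(x)                      # shape (K,); the market price c is ignored
    return phi

def aggressive_betting(h, eps=0.01):
    """Aggressive betting (sketch of the reconstructed Eq. (2)): bet the full classifier
    output while c^k <= h^k(x), taper linearly over a width eps, and bet nothing
    once c^k exceeds h^k(x) + eps."""
    def phi(x, c):
        hk = h(x)                              # shape (K,)
        taper = np.clip((hk + eps - c) / eps, 0.0, 1.0)
        return hk * taper
    return phi

# Tiny usage example with a dummy 3-class "classifier" returning fixed probabilities.
h = lambda x: np.array([0.6, 0.3, 0.1])
c = np.array([0.5, 0.3, 0.2])
print(constant_betting(h)(None, c))            # [0.3, 0.15, 0.05]
print(aggressive_betting(h)(None, c))          # [0.6, 0.3, 0.0]
```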
22 Given a feature vector x, a set of market participants will establish the market equilibrium price c, whose component c^k is an estimator of P(Y = k|x). [sent-160, score-1.578]
23 2.3 Training the Artificial Prediction Market: Training the market involves initializing all participants with the same budget β_0 and presenting to the market a set of training examples (x_i, y_i), i = 1, ..., N. [sent-164, score-1.554]
24 For each example (xi , yi ) the participants purchase contracts for the different classes based on the market price c (which is not known yet) and their budgets βm are updated based on the contracts purchased and the true outcome yi . [sent-168, score-1.365]
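The excerpt does not spell out the budget update formula, so the following sketch implements one natural reading of the mechanism described here: each participant spends β_m φ_m^k(x, c) on contracts for class k priced at c^k, and each contract for the true class pays 1. The function and variable names, and the `price_solver` argument (any routine returning the equilibrium price, such as the fixed-point iteration sketched later in this section), are assumptions for illustration; the exact update rule should be checked against the paper.

```python
import numpy as np

def budget_update(betas, phis, x, c, y):
    """One training step: participants buy contracts at the equilibrium price c
    (already computed for this instance x), then budgets are settled against the
    true label y. A contract for class k costs c[k] and pays 1 if y == k."""
    new_betas = betas.copy()
    for m, (beta, phi) in enumerate(zip(betas, phis)):
        bets = beta * phi(x, c)                  # amount bet on each class, shape (K,)
        contracts = bets / np.maximum(c, 1e-12)  # number of contracts purchased per class
        new_betas[m] = beta - bets.sum() + contracts[y]  # winning contracts pay 1 each
    return new_betas

def train_market(phis, data, price_solver, beta0=1.0):
    """Online training: initialize all budgets to beta0 and present the examples once."""
    betas = np.full(len(phis), float(beta0))
    for x, y in data:
        c = price_solver(betas, phis, x)         # equilibrium price for this instance
        betas = budget_update(betas, phis, x, c, y)
    return betas
```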
25 This condition transforms into a set of equations that constrain the market price, which we call the price equations. [sent-187, score-0.759]
26 2.5 Price Uniqueness: The price equations together with the equation ∑_{k=1}^K c^k = 1 are enough to uniquely determine the market price c, under mild assumptions on the betting functions φ^k(x, c). [sent-200, score-1.519]
27 This suggests a class of betting functions φ^k(x, c^k), depending only on the price c^k, that are continuous and monotonically non-increasing in c^k. [sent-203, score-1.161]
28 If the betting functions φ_m^k(x, c^k), m = 1, ..., M, are continuous and monotonically non-increasing in c^k with φ_m^k(x, 0) > 0, then f_k(c^k) = (1/c^k) ∑_{m=1}^M β_m φ_m^k(x, c^k) is continuous and strictly decreasing in c^k as long as f_k(c^k) > 0. [sent-207, score-0.742]
29 Remark 2: If all f_k(c^k) are continuous and strictly decreasing in c^k as long as f_k(c^k) > 0, then for every n > 0 with n ≥ n_k = f_k(1) there is a unique c^k = c^k(n) that satisfies f_k(c^k) = n. [sent-212, score-1.087]
30 To guarantee price uniqueness, we need at least one market participant to satisfy the following Assumption 2: The total bet of participant (β_m, φ_m(x, c)) is positive inside the simplex ∆, that is, ∑_{j=1}^K φ_m^j(x, c_j) > 0 for all c ∈ (0, 1)^K with ∑_{j=1}^K c_j = 1. [sent-214, score-1.292]
31 If the betting function φ_m(x, c) of at least one participant with β_m > 0 satisfies Assumption 2, then for the Budget Update(x, y, c) there is a unique price c = (c^1, ..., c^K). [sent-223, score-0.694]
32 2.6 Solving the Market Price Equations: In practice, a double bisection algorithm could be used to find the equilibrium price, computing each c^k(n) by the bisection method, and employing another bisection algorithm to find n such that the price condition ∑_{k=1}^K c^k(n) = 1 holds. [sent-230, score-0.697]
33 Repeat: for k = 1, ..., K compute f_k = ∑_m β_m φ_m^k(x, c); set n = ∑_k f_k; if n ≠ 0 then set f_k ← f_k/n, r_k = f_k − c^k, and c^k ← ((i − 1)c^k + f_k)/i; end if; i ← i + 1; until ∑_k |r_k| ≤ ε or n = 0 or i > i_max. [sent-246, score-0.7]
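A runnable sketch of the fixed-point (Mann) iteration reconstructed above; starting from the uniform price and the particular tolerance are illustrative choices, not taken from the paper.

```python
import numpy as np

def mann_price(betas, phis, x, K, eps=1e-6, imax=1000):
    """Approximate the equilibrium price by repeatedly averaging the normalized
    total bets (a Mann-type fixed-point iteration)."""
    c = np.full(K, 1.0 / K)                      # start from the uniform price
    for i in range(1, imax + 1):
        f = sum(beta * phi(x, c) for beta, phi in zip(betas, phis))  # total bet per class
        n = f.sum()
        if n == 0:                               # nobody bets: stop at the current price
            break
        f = f / n                                # normalize so candidate prices sum to 1
        r = f - c                                # residual used for the stopping test
        c = ((i - 1) * c + f) / i                # Mann averaging step
        if np.abs(r).sum() <= eps:
            break
    return c
```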
34 2.7 Two-class Formulation: For the two-class problem, that is, K = 2, the budget equation can be simplified by writing c = (1 − c, c) and obtaining the two-class market price equation (1 − c) ∑_{m=1}^M β_m φ_m^2(x, c) − c ∑_{m=1}^M β_m φ_m^1(x, 1 − c) = 0. [sent-247, score-0.912]
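Under Assumption 2 with betting functions that are non-increasing in the price, the left-hand side of the two-class equation above is monotone in c, so a single bisection suffices. A small illustrative sketch follows; the convention that `phi(x, price)` takes the full price vector (1 − c, c) and returns the pair of bets is an assumption.

```python
import numpy as np

def two_class_price(betas, phis, x, tol=1e-8):
    """Solve (1-c) * sum_m beta_m phi_m^2 - c * sum_m beta_m phi_m^1 = 0 for c in (0, 1)
    by bisection; c is the price of class 2 and the full price vector is (1 - c, c)."""
    def g(c):
        price = np.array([1.0 - c, c])
        bets = sum(b * phi(x, price) for b, phi in zip(betas, phis))  # (class-1 bet, class-2 bet)
        return (1.0 - c) * bets[1] - c * bets[0]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:          # g is >= 0 near c = 0 and <= 0 near c = 1
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)      # class-2 price; the class-1 price is 1 - c
```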
35 Different betting functions give different ways to fuse the market participants. [sent-256, score-1.018]
36 In what follows we prove that by choosing specific betting functions, the artificial prediction market behaves like a linear aggregator or logistic regressor, or that it can be used as a kernel-based classifier. [sent-257, score-1.091]
37 3.1 Constant Betting and Linear Aggregation: For markets with constant betting functions, φ_m^k(x, c) = φ_m^k(x), the market price has a simple analytic formula, proved in the Appendix. [sent-259, score-1.455]
38 The equilibrium price is given by c^k = ∑_{m=1}^M β_m φ_m^k(x) / ∑_{m=1}^M ∑_{k=1}^K β_m φ_m^k(x). (7) Furthermore, if the betting functions are based on classifiers, φ_m^k(x, c) = η h_m^k(x), then the equilibrium price is obtained by linear aggregation: c = ∑_{m=1}^M β_m h_m(x) / ∑_{m=1}^M β_m = ∑_m α_m h_m(x). [sent-262, score-0.689]
39 This way the artificial prediction market can model linear aggregation of classifiers. [sent-263, score-0.749]
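A short sketch of the closed-form price of Equation (7) and of how classifier-based constant betting reduces to budget-weighted linear aggregation; the array layout is an illustrative choice.

```python
import numpy as np

def constant_market_price(betas, bets):
    """Equation (7) for constant betting: bets is an (M, K) array with
    bets[m, k] = phi_m^k(x); the price of class k is the budget-weighted bet
    on k divided by the total amount bet."""
    weighted = betas[:, None] * bets             # (M, K)
    return weighted.sum(axis=0) / weighted.sum()

# With classifier-based betting phi_m^k(x) = eta * h_m^k(x), eta cancels and the
# price equals the linear aggregation sum_m beta_m h_m(x) / sum_m beta_m.
betas = np.array([1.0, 2.0, 1.0])
h = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])   # per-participant class probabilities
print(constant_market_price(betas, 0.5 * h))          # same as (betas @ h) / betas.sum()
```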
40 In particular, the random forest (Breiman, 2001) can be viewed as an artificial prediction market with constant betting (linear aggregation) where all participants are random trees with the same budget β_m = 1, m = 1, ..., M. [sent-267, score-1.457]
41 3.3 Relation to Kernel Methods: Here we construct a market participant from each training example (x_n, y_n), n = 1, ..., N. [sent-286, score-0.807]
42 We construct a participant from training example (x_m, y_m) by defining the following betting functions in terms of u_m(x) = x_m^T x / (‖x_m‖ ‖x‖): φ_m^{y_m}(x) = u_m(x)_+ = u_m(x) if u_m(x) ≥ 0 and 0 otherwise, and φ_m^{2−y_m}(x) = u_m(x)_− = −u_m(x) if u_m(x) < 0 and 0 otherwise. (10) [sent-290, score-0.953]
43 Observe that these betting functions do not depend on the contract price c, so it is a constant betting market, but not one based on classifiers. [sent-291, score-1.189]
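A minimal sketch of such a kernel-style participant for the two-class case; classes are indexed 0 and 1 here (the excerpt uses labels y_m and 2 − y_m), and taking u_m(x) to be the normalized inner product between x_m and x is one reading of the garbled formula above (an RBF kernel value could be substituted, as in the experiment mentioned next).

```python
import numpy as np

def kernel_participant(x_m, y_m, K=2):
    """Build a participant from a training example (x_m, y_m) in the spirit of Eq. (10):
    bet u_m(x)_+ on class y_m and u_m(x)_- on the other class."""
    def phi(x, c):                        # constant betting: the price c is ignored
        u = float(x_m @ x) / (np.linalg.norm(x_m) * np.linalg.norm(x) + 1e-12)
        bets = np.zeros(K)
        bets[y_m] = max(u, 0.0)           # u_m(x)_+
        bets[1 - y_m] = max(-u, 0.0)      # u_m(x)_-
        return bets
    return phi
```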
44 In Figure 3, left, is shown an example of the decision boundary of a market trained online with an RBF kernel with σ = 0. [sent-298, score-0.717]
45 This example shows that the artificial prediction market is an online method with enough modeling power to represent complex decision boundaries such as those given by RBF kernels through the betting functions of the participants. [sent-320, score-1.122]
46 We will again use the total amount bet B(x, c) = ∑_{m=1}^M ∑_{k=1}^K β_m φ_m^k(x, c) for observation x at market price c. [sent-342, score-0.966]
47 Equation (12) can be viewed as presenting all observations (x_i, y_i) to the market simultaneously instead of sequentially. [sent-348, score-0.7]
48 The prediction market update (15) finds an approximate maximum of the likelihood (11) subject to the constraint ∑_{m=1}^M γ_m^2 = 1 by an approximate constrained stochastic gradient ascent. [sent-358, score-0.731]
49 An important issue for the real prediction markets is the efficient market hypothesis, which states that the market price fuses in an optimal way the information available to the market participants (Fama, 1970; Basu, 1977; Malkiel, 2003). [sent-367, score-2.508]
50 In general, an untrained market (in which the budgets have not been updated based on training data) will not satisfy the efficient market hypothesis. [sent-369, score-1.363]
51 The market trained with a large amount of representative training data and small η satisfies the efficient market hypothesis. [sent-371, score-1.305]
52 Specialized Classifiers: The prediction market is capable of fusing the information available to the market participants, which can be trained classifiers. [sent-373, score-1.357]
53 The artificial prediction market can aggregate such classifiers, transformed into participants that do not bet anything outside their domain of expertise D_i ⊂ Ω. [sent-379, score-1.032]
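A specialized participant can be represented simply as a betting function that returns zero bets outside its domain of expertise; the wrapper below is an illustrative sketch, with the domain encoded as a predicate on x (an assumption about how D_i might be represented).

```python
import numpy as np

def specialize(phi, in_domain):
    """Wrap a betting function so the participant bets nothing on instances
    outside its domain of expertise D_i."""
    def phi_specialized(x, c):
        if not in_domain(x):
            return np.zeros(len(c))       # no bets outside the domain
        return phi(x, c)
    return phi_specialized

# Example: a tree leaf that only bets on instances reaching that leaf.
leaf_phi = specialize(lambda x, c: np.array([0.2, 0.8]), in_domain=lambda x: x[0] > 0.5)
```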
54 Evaluating this market on 1000 positives and 1000 negatives showed that the market obtained a prediction accuracy of 100%. [sent-393, score-1.294]
55 It can be verified using Equation (7) that constant specialized betting is the linear aggregation of the participants that are currently betting. [sent-404, score-0.689]
56 Related Work: This work borrows prediction market ideas from Economics and brings them to Machine Learning for supervised aggregation of classifiers or features in general. [sent-407, score-0.749]
57 In parimutuel betting, contracts are sold for all possible outcomes (classes), and the entire budget (minus fees) is divided between the participants that purchased contracts for the winning outcome. [sent-416, score-1.024]
58 First, our work uses the Iowa electronic market instead of parimutuel betting with odds-updating. [sent-423, score-1.082]
59 Using the Iowa model allowed us to obtain a closed form equation for the market price in some important cases. [sent-424, score-0.778]
60 Third, the analytical market price formulation allowed us to prove that the constant market performs maximum likelihood learning. [sent-428, score-1.401]
61 In this regard, the prediction market also solves an implicit equation at each step for finding the new model, but does not balance two criteria like the implicit online learning method. [sent-433, score-0.852]
62 In experiments, we observed that the prediction market obtains significantly smaller misclassification errors on many data sets compared to implicit online learning. [sent-435, score-0.799]
63 However, instead of having a reject rule for the aggregated classifier, each market participant has its own reject rule to decide which observations to contribute to the aggregation. [sent-437, score-0.846]
64 ROC-based reject rules (Tortorella, 2004) could be found for each market participant and used for defining its domain of specialization. [sent-438, score-0.816]
65 If the overall reject option is not desired, one could avoid having instances for which no classifiers bet by including in the market a set of participants that are all the leaves of a number of random trees. [sent-441, score-1.032]
66 This approach can be seen as a market with two participants that are not overlapping. [sent-446, score-0.777]
67 Each participant is paid by an entity called the market maker according to a predefined scoring rule. [sent-458, score-0.786]
68 The second prediction market mechanism is the machine learning market (Storkey, 2011; Storkey et al. [sent-459, score-1.294]
69 Each market participant purchases contracts for the possible outcomes to maximize its own utility function. [sent-461, score-0.945]
70 These markets have the same classifiers, namely the leaves of the trained random trees, but differ either in the betting functions or in the way the budgets are trained as follows: 1. [sent-467, score-0.886]
71 The first market has constant betting and equal budgets for all participants. [sent-468, score-1.114]
72 The second market has constant betting based on specialized classifiers (the leaves of the random trees), with the budgets initialized with the same values as in market 1 above, but trained using the update equation (13). [sent-472, score-1.915]
73 The third market has linear betting functions (1), for which the market price can be computed analytically only for binary classification. [sent-475, score-1.777]
74 The market is initialized with equal budgets and trained using Equation (15). [sent-476, score-0.757]
75 The fourth market has aggressive betting (2) with ε = 0.01. [sent-478, score-1.092]
76 The market price is computed using the Mann iteration (Algorithm 3). [sent-479, score-0.759]
77 The market is initialized with equal budgets and trained using Equation (15). [sent-480, score-0.757]
78 The markets investigated are the constant market with both incremental and batch updates, given in Equations (13) and (12) respectively, and the linear and aggressive markets with incremental updates, given in (15). [sent-495, score-1.395]
79 Figure 5: Experiments on the satimage data set for the incremental and batch market updates (curves plotted against the number of epochs). [sent-525, score-0.713]
80 The aggressive and constant markets achieve similar values of the negative log likelihood and similar training errors, but the aggressive market seems to overfit more, since its test error is larger than that of the constant incremental market (p-value < 0. [sent-537, score-1.147]
81 As one could see, the aggressive and constant betting markets obtain significantly better (p-value < 0. [sent-630, score-0.77]
82 On the other hand, the linear betting market obtains probability estimators significantly better (p-value < 0. [sent-633, score-1.018]
83 The markets evaluated are our implementation of random forest (RF), and markets with Constant (CB), Linear (LB), and Aggressive (AB) betting. [sent-805, score-0.698]
84 For our problem we use ℓ(β) = −log(c^y(β)), where c^y(β) is the constant market equilibrium price for the ground truth label y. [sent-816, score-0.837]
85 The comparisons are done with paired t-tests and shown with ∗ and ‡ when the constant betting market is significantly (α < 0. [sent-827, score-1.018]
86 Comparison with Adaboost for Lymph Node Detection: Finally, we compared the linear aggregation capability of the artificial prediction market with adaboost for a lymph node detection problem. [sent-977, score-0.925]
87 The constant betting market of the 2048 participants is initialized with these budgets and trained with the same training examples that were used to train the adaboost classifier. [sent-994, score-1.398]
88 Right: ROC curves for adaboost and the constant betting market with participants as the 2048 adaboost weak classifier bins. [sent-1030, score-1.314]
89 The adaboost classifier and the constant market were evaluated for a lymph node detection application on a data set containing 54 CT scans of the pelvic and abdominal region, with a total of 569 lymph nodes, with six-fold cross-validation. [sent-1032, score-0.858]
90 Figure 8 (right) shows the training and test ROC curves of adaboost and the constant market trained with 7 epochs. [sent-1038, score-0.753]
91 The artificial prediction market is a novel online learning algorithm that can be easily implemented for two-class and multi-class applications. [sent-1046, score-0.727]
92 Experimental comparisons on real and synthetic data show that the prediction market usually outperforms random forest, adaboost and implicit online learning in prediction accuracy. [sent-1049, score-0.899]
93 It can obtain meaningful probability estimates when only a subset of the market participants are involved for a particular instance x ∈ X. [sent-1057, score-0.777]
94 This feature is useful for learning on manifolds (Belkin and Niyogi, 2004; Elgammal and Lee, 2004; Saul and Roweis, 2003), where the location on the manifold decides which market participants should be involved. [sent-1058, score-0.777]
95 Because of their betting functions, the specialized market participants can decide for which instances they bet and how much. [sent-1061, score-1.441]
96 These extensions involve contracts for uncountably many outcomes, but the update and the market price equations extend naturally. [sent-1064, score-0.959]
97 , logistic regression, or betting for a single class) can be used as specialized market participants for that region. [sent-1069, score-1.259]
98 Either way, since ∑_{k=1}^K c^k(n) is continuous and strictly decreasing, and since ∑_{k=1}^K c^k(n*) ≥ 1 and lim_{n→∞} ∑_{k=1}^K c^k(n) = 0, there exists a unique n > 0 such that ∑_{k=1}^K c^k(n) = 1. [sent-1104, score-0.84]
99 For budgets β_m = γ_m^2 and an observation (x_i, y_i), we have the market price for label y_i: c^{y_i}(x_i) = ∑_{m=1}^M γ_m^2 φ_m^{y_i}(x_i) / (∑_{m=1}^M ∑_{k=1}^K γ_m^2 φ_m^k(x_i)). [sent-1125, score-0.854]
100 Parimutuel betting markets as information aggregation devices: Experimental results. [sent-1329, score-0.774]
wordName wordTfidf (topN-words)
[('market', 0.623), ('betting', 0.395), ('markets', 0.301), ('ck', 0.21), ('bet', 0.207), ('participant', 0.163), ('participants', 0.154), ('price', 0.136), ('contracts', 0.132), ('barbu', 0.125), ('budget', 0.115), ('budgets', 0.096), ('forest', 0.096), ('arkets', 0.081), ('rtificial', 0.081), ('aggregation', 0.078), ('aggressive', 0.074), ('um', 0.071), ('adaboost', 0.071), ('rediction', 0.063), ('ay', 0.062), ('specialized', 0.062), ('cyi', 0.059), ('lymph', 0.059), ('fk', 0.056), ('online', 0.056), ('rf', 0.055), ('implicit', 0.053), ('epochs', 0.048), ('prediction', 0.048), ('ers', 0.047), ('cb', 0.045), ('rfb', 0.044), ('equilibrium', 0.042), ('update', 0.041), ('hk', 0.041), ('xm', 0.038), ('trained', 0.038), ('conserved', 0.038), ('parimutuel', 0.038), ('iowa', 0.037), ('mann', 0.037), ('classi', 0.036), ('cy', 0.036), ('incremental', 0.035), ('contract', 0.035), ('bisection', 0.033), ('xi', 0.033), ('perols', 0.031), ('purchased', 0.031), ('wolfers', 0.031), ('reject', 0.03), ('satimage', 0.029), ('outcomes', 0.027), ('detection', 0.026), ('electronic', 0.026), ('batch', 0.026), ('trees', 0.026), ('logistic', 0.025), ('fusing', 0.025), ('misclassification', 0.025), ('outcome', 0.025), ('bayes', 0.024), ('nk', 0.023), ('misclassi', 0.022), ('fusion', 0.022), ('specialization', 0.022), ('won', 0.022), ('breiman', 0.022), ('arti', 0.021), ('kulis', 0.021), ('training', 0.021), ('ine', 0.02), ('node', 0.02), ('cial', 0.02), ('equation', 0.019), ('likelihood', 0.019), ('ln', 0.019), ('limck', 0.019), ('plott', 0.019), ('presidential', 0.019), ('zipcode', 0.019), ('zitzewitz', 0.019), ('errors', 0.019), ('hm', 0.019), ('ym', 0.019), ('uci', 0.018), ('uniqueness', 0.018), ('leaves', 0.018), ('yi', 0.018), ('hy', 0.018), ('lb', 0.018), ('percent', 0.018), ('er', 0.017), ('ak', 0.017), ('limn', 0.017), ('ct', 0.016), ('storkey', 0.016), ('classifier', 0.016), ('commerce', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
Author: Adrian Barbu, Nathan Lay
Abstract: Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants’ budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI data sets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume. Keywords: online learning, ensemble methods, supervised learning, random forest, implicit online learning
2 0.056152776 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
Author: Zhuang Wang, Koby Crammer, Slobodan Vucetic
Abstract: Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and it is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is susceptible to the curse of kernelization that causes unbounded linear growth in model size and update time with data size. This may render SGD inapplicable to large data sets. We address this issue by presenting a class of Budgeted SGD (BSGD) algorithms for large-scale kernel SVM training which have constant space and constant time complexity per update. Specifically, BSGD keeps the number of support vectors bounded during training through several budget maintenance strategies. We treat the budget maintenance as a source of the gradient error, and show that the gap between the BSGD and the optimal SVM solutions depends on the model degradation due to budget maintenance. To minimize the gap, we study greedy budget maintenance methods based on removal, projection, and merging of support vectors. We propose budgeted versions of several popular online SVM algorithms that belong to the SGD family. We further derive BSGD algorithms for multi-class SVM training. Comprehensive empirical results show that BSGD achieves higher accuracy than the state-of-the-art budgeted online algorithms and comparable to non-budget algorithms, while achieving impressive computational efficiency both in time and space during training and prediction. Keywords: SVM, large-scale learning, online learning, stochastic gradient descent, kernel methods
3 0.042626947 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=
4 0.036479656 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
Author: Koby Crammer, Mark Dredze, Fernando Pereira
Abstract: Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of textcategorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing. Keywords: online learning, confidence prediction, text categorization
5 0.035837658 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles
6 0.031626783 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
7 0.028617071 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
8 0.027988814 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
9 0.027334169 20 jmlr-2012-Analysis of a Random Forests Model
10 0.026980197 91 jmlr-2012-Plug-in Approach to Active Learning
11 0.026880845 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity
12 0.025914581 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
13 0.025756227 97 jmlr-2012-Regularization Techniques for Learning with Matrices
14 0.02552199 82 jmlr-2012-On the Necessity of Irrelevant Variables
15 0.025462668 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
16 0.022898059 84 jmlr-2012-Online Submodular Minimization
17 0.021873949 106 jmlr-2012-Sign Language Recognition using Sub-Units
18 0.021504221 13 jmlr-2012-Active Learning via Perfect Selective Classification
19 0.021272123 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
20 0.020987343 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
topicId topicWeight
[(0, -0.106), (1, -0.004), (2, 0.018), (3, -0.005), (4, 0.016), (5, 0.019), (6, 0.055), (7, 0.04), (8, 0.031), (9, 0.007), (10, 0.023), (11, 0.032), (12, -0.011), (13, -0.072), (14, 0.005), (15, 0.043), (16, 0.049), (17, -0.041), (18, -0.1), (19, 0.1), (20, -0.011), (21, 0.103), (22, 0.014), (23, 0.1), (24, 0.105), (25, 0.051), (26, -0.078), (27, 0.128), (28, -0.001), (29, -0.114), (30, 0.01), (31, -0.229), (32, 0.053), (33, 0.013), (34, -0.02), (35, -0.123), (36, 0.058), (37, 0.131), (38, -0.357), (39, -0.286), (40, 0.343), (41, -0.116), (42, -0.063), (43, -0.04), (44, -0.017), (45, -0.009), (46, -0.049), (47, 0.043), (48, -0.025), (49, 0.271)]
simIndex simValue paperId paperTitle
same-paper 1 0.95639485 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
Author: Adrian Barbu, Nathan Lay
Abstract: Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants’ budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI data sets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume. Keywords: online learning, ensemble methods, supervised learning, random forest, implicit online learning
2 0.39749771 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
Author: Zhuang Wang, Koby Crammer, Slobodan Vucetic
Abstract: Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and it is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is susceptible to the curse of kernelization that causes unbounded linear growth in model size and update time with data size. This may render SGD inapplicable to large data sets. We address this issue by presenting a class of Budgeted SGD (BSGD) algorithms for large-scale kernel SVM training which have constant space and constant time complexity per update. Specifically, BSGD keeps the number of support vectors bounded during training through several budget maintenance strategies. We treat the budget maintenance as a source of the gradient error, and show that the gap between the BSGD and the optimal SVM solutions depends on the model degradation due to budget maintenance. To minimize the gap, we study greedy budget maintenance methods based on removal, projection, and merging of support vectors. We propose budgeted versions of several popular online SVM algorithms that belong to the SGD family. We further derive BSGD algorithms for multi-class SVM training. Comprehensive empirical results show that BSGD achieves higher accuracy than the state-of-the-art budgeted online algorithms and comparable to non-budget algorithms, while achieving impressive computational efficiency both in time and space during training and prediction. Keywords: SVM, large-scale learning, online learning, stochastic gradient descent, kernel methods
3 0.33732671 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles
4 0.23770218 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
Author: Gil Tahan, Lior Rokach, Yuval Shahar
Abstract: This paper proposes several novel methods, based on machine learning, to detect malware in executable files without any need for preprocessing, such as unpacking or disassembling. The basic method (Mal-ID) is a new static (form-based) analysis methodology that uses common segment analysis in order to detect malware files. By using common segment analysis, Mal-ID is able to discard malware parts that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed meta-feature, to better capture the properties of the analyzed segments. Rather than using the entire file, as is usually the case with machine learning based techniques, the new approach detects malware on the segment level. This study also introduces two Mal-ID extensions that improve the Mal-ID basic method in various aspects. We rigorously evaluated Mal-ID and its two extensions with more than ten performance measures, and compared them to the highly rated boosted decision tree method under identical settings. The evaluation demonstrated that Mal-ID and the two Mal-ID extensions outperformed the boosted decision tree method in almost all respects. In addition, the results indicated that by extracting meaningful features, it is sufficient to employ one simple detection rule for classifying executable files. Keywords: computer security, malware detection, common segment analysis, supervised learning
5 0.23547919 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
Author: Timo Aho, Bernard Ženko, Sašo Džeroski, Tapio Elomaa
Abstract: Methods for learning decision rules are being successfully applied to many problem domains, in particular when understanding and interpretation of the learned model is necessary. In many real life problems, we would like to predict multiple related (nominal or numeric) target attributes simultaneously. While several methods for learning rules that predict multiple targets at once exist, they are all based on the covering algorithm, which does not work well for regression problems. A better solution for regression is the rule ensemble approach that transcribes an ensemble of decision trees into a large collection of rules. An optimization procedure is then used to select the best (and much smaller) subset of these rules and to determine their respective weights. We introduce the F IRE algorithm for solving multi-target regression problems, which employs the rule ensembles approach. We improve the accuracy of the algorithm by adding simple linear functions to the ensemble. We also extensively evaluate the algorithm with and without linear functions. The results show that the accuracy of multi-target regression rule ensembles is high. They are more accurate than, for instance, multi-target regression trees, but not quite as accurate as multi-target random forests. The rule ensembles are significantly more concise than random forests, and it is also possible to create compact rule sets that are smaller than a single regression tree but still comparable in accuracy. Keywords: multi-target prediction, rule learning, rule ensembles, regression ∗. Also in Microtask, Tampere, Finland. †. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia. ‡. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia and the Joˇ ef Stefan International Postgraduate School, Ljubljana, Slovenia. z ˇ c 2012 Timo Aho, Bernard Zenko, Saˇo Dˇ eroski and Tapio Elomaa. s z ˇ ˇ
6 0.23081198 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
7 0.20754649 60 jmlr-2012-Local and Global Scaling Reduce Hubs in Space
8 0.19749525 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
9 0.18014529 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
10 0.17865565 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
11 0.16308598 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
12 0.16074555 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
13 0.15899697 20 jmlr-2012-Analysis of a Random Forests Model
14 0.1468159 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory
15 0.14437328 82 jmlr-2012-On the Necessity of Irrelevant Variables
16 0.1432721 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
17 0.14216974 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
18 0.13582596 106 jmlr-2012-Sign Language Recognition using Sub-Units
19 0.12172877 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
20 0.12123588 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity
topicId topicWeight
[(7, 0.013), (19, 0.448), (21, 0.029), (26, 0.022), (27, 0.019), (29, 0.04), (35, 0.01), (49, 0.023), (57, 0.015), (64, 0.014), (69, 0.011), (75, 0.078), (77, 0.016), (79, 0.014), (81, 0.019), (92, 0.039), (96, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.70860803 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
Author: Adrian Barbu, Nathan Lay
Abstract: Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants’ budgets using training examples. Inspired by the real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The obtained artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, existent in boosting and random forest, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only on specific instances. Experimental comparisons show that the artificial prediction markets often outperform random forest and implicit online learning on synthetic data and real UCI data sets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves adaboost’s detection rate from 79.6% to 81.2% at 3 false positives/volume. Keywords: online learning, ensemble methods, supervised learning, random forest, implicit online learning
2 0.28349996 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
Author: Fei Yan, Josef Kittler, Krystian Mikolajczyk, Atif Tahir
Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓ p norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓ p MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓ p MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓ p MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines. Keywords: multiple kernel learning, kernel fisher discriminant analysis, regularised least squares, support vector machines
3 0.27906272 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan
Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing
4 0.27570036 44 jmlr-2012-Feature Selection via Dependence Maximization
Author: Le Song, Alex Smola, Arthur Gretton, Justin Bedo, Karsten Borgwardt
Abstract: We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice. Keywords: kernel methods, feature selection, independence measure, Hilbert-Schmidt independence criterion, Hilbert space embedding of distribution
5 0.27289513 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
Author: Francesco Orabona, Luo Jie, Barbara Caputo
Abstract: In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present a MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By having a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims. Keywords: multiple kernel learning, learning kernels, online optimization, stochastic subgradient descent, convergence bounds, large scale
6 0.27218831 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
7 0.27097875 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
8 0.27059847 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
10 0.26931688 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity
11 0.26847461 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
12 0.26751518 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
13 0.26749915 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
14 0.26477021 106 jmlr-2012-Sign Language Recognition using Sub-Units
15 0.26469737 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
16 0.26307392 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
17 0.26215693 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
18 0.26209337 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
19 0.26153123 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
20 0.26129812 100 jmlr-2012-Robust Kernel Density Estimation