fast_ml-2013-39 knowledge-graph by maker-knowledge-mining

39 fast ml-2013-09-19-What you wanted to know about AUC


meta info for this blog

Source: html

Introduction: AUC, or Area Under Curve, is a metric for binary classification. It’s probably the second most popular one, after accuracy. Unfortunately, it’s nowhere near as intuitive. That is, until you have read this article. Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. But many classifiers are able to quantify their uncertainty about the answer by outputting a probability value. To compute accuracy from probabilities you need a threshold to decide when zero turns into one. The most natural threshold is of course 0.5. Let’s suppose you have a quirky classifier. It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. Clearly, a threshold of 0.5 won’t get you far here. But 0.8 would be just perfect. That’s the whole point of using AUC - it considers all possible thresholds. Various thresholds result in different true positive/false positive rates. As you decrease the threshold, you get more true positives, but also more false positives.
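
The quirky-classifier example can be checked in a few lines. This is a minimal Octave/Matlab sketch, not taken from the original post; the labels and scores are made up to match the description (0.7 for every negative, 0.9 for every positive):

y = [0 0 0 0 0 1 1 1 1 1];              % true labels: five negatives, five positives
p = [0.7*ones(1,5), 0.9*ones(1,5)];     % quirky classifier scores

acc_05 = mean( (p > 0.5) == y )         % 0.5  - threshold 0.5 calls everything positive
acc_08 = mean( (p > 0.8) == y )         % 1.0  - threshold 0.8 separates the classes perfectly
% AUC sweeps over all thresholds, so this classifier gets a perfect score of 1
% (in Matlab with the Statistics Toolbox: [~,~,~,AUC] = perfcurve(y, p, 1)).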


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. [sent-5, score-0.053]

2 To compute accuracy from probabilities you need a threshold to decide when zero turns into one. [sent-7, score-0.543]

3 It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. [sent-11, score-0.12]

4 Various thresholds result in different true positive/false positive rates. [sent-19, score-0.244]

5 As you decrease the threshold, you get more true positives, but also more false positives. [sent-20, score-0.264]

6 The relation between them can be plotted (image credit: Wikipedia). From a random classifier you can expect as many true positives as false positives. [sent-21, score-0.657]

7 Computing AUC: you compute AUC from a vector of predictions and a vector of true labels (a worked sketch appears after this list). [sent-27, score-0.392]

8 In Matlab (but not Octave), you have perfcurve, which can even return the best threshold, or optimal operating point. In R, one of the packages that provide ROC AUC is caTools. Finer points: let’s get more precise with naming. [sent-31, score-0.852]

9 ROC stands for Receiver Operating Characteristic, a term from signal theory. [sent-33, score-0.053]

10 Sometimes you may encounter references to ROC or ROC curve - think AUC then. [sent-34, score-0.168]

11 But wait - Gael Varoquaux points out that AUC is not always the area under a ROC curve. [sent-35, score-0.441]

12 In the situation where you have imbalanced classes, it is often more useful to report AUC for a precision-recall curve (see the sketch after this list). [sent-36, score-0.173]

13 ROC AUC is insensitive to imbalanced classes, however. [sent-37, score-0.127]

14 y = real( rand(1000,1) > 0.9 ); % mostly negatives p = zeros(1000,1); % always predicting zero [X,Y,T,AUC] = perfcurve(y,p,1) % 0. [sent-39, score-0.198]

15 In case your class labels are mostly negative or mostly positive, a classifier that always outputs 0 or 1, respectively, will achieve high accuracy. [sent-41, score-0.651]

16 The same goes for random predictions: y = real( rand(1000,1) > 0.1 ); % mostly positives [sent-44, score-0.056]

17 p = rand(1000,1); % random predictions [X,Y,T,AUC] = perfcurve(y,p,1) % 0. [sent-45, score-0.17]

18 y = real( rand(1000,1) > 0.9 ); % mostly negatives p = rand(1000,1); % random predictions [X,Y,T,AUC] = perfcurve(y,p,1) % 0.4883 [sent-47, score-0.17]

19 plot(X,Y) And zero zero seven, there’s one more thing you need to know about AUC: it doesn’t care about absolute values, it only cares about ranking. [sent-48, score-0.166]

20 This means that your quirky classifier might as well output 700 for negatives and 900 for positives and it would be perfectly OK for AUC, even when you’re supposed to provide probabilities. [sent-49, score-0.603]
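
Sentence 7 says you compute AUC from a vector of predictions and a vector of true labels, and sentences 19-20 add that only the ranking of the predictions matters. Here is a minimal Octave/Matlab sketch of that computation via the rank-sum (Mann-Whitney) formula; it is not the blog’s own code, it assumes tiedrank from the Statistics toolbox/package, and the labels and scores are invented:

y = [0 1 0 1 1 0 0 1];                       % made-up true labels
p = [0.2 0.8 0.3 0.6 0.9 0.1 0.7 0.75];      % made-up predicted scores

r = tiedrank(p);                             % ranks of the scores; ties get averaged ranks
n_pos = sum(y == 1);
n_neg = sum(y == 0);
AUC = (sum(r(y == 1)) - n_pos*(n_pos + 1)/2) / (n_pos*n_neg)         % 0.9375

% Only the ranking matters: rescaling the scores (say, 700 and 900 instead of
% 0.7 and 0.9) leaves the ranks, and therefore the AUC, unchanged.
r2 = tiedrank(1000 * p);
AUC_scaled = (sum(r2(y == 1)) - n_pos*(n_pos + 1)/2) / (n_pos*n_neg) % still 0.9375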
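
Sentences 14-15 contrast a degenerate classifier’s high accuracy with its AUC. A small sketch of that contrast, again in Octave/Matlab and with made-up data rather than the blog’s own code:

y = real( rand(1000,1) > 0.9 );      % made-up labels, roughly 90% negatives
p = zeros(1000,1);                    % a classifier that always predicts zero
acc = mean( (p > 0.5) == y )          % around 0.9 - looks great...
% ...but the AUC of a constant prediction is 0.5, no better than random guessing
% (that is what the perfcurve call in sentence 14 measures).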
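
Sentences 12-13 suggest reporting the area under a precision-recall curve when classes are imbalanced, because ROC AUC is insensitive to the imbalance. A hedged Matlab sketch of the comparison; it assumes the Statistics and Machine Learning Toolbox’s perfcurve, and the labels and scoring rule are invented for illustration:

y = real( rand(10000,1) > 0.95 );            % roughly 5% positives
p = rand(10000,1) + 0.3*y;                   % noisy scores, shifted up for positives

% ROC curve: x = false positive rate, y = true positive rate (perfcurve's default)
[Xroc, Yroc, ~, AUCroc] = perfcurve(y, p, 1);

% Precision-recall curve: x = recall (tpr), y = precision (ppv)
[Xpr, Ypr, ~, AUCpr] = perfcurve(y, p, 1, 'XCrit', 'tpr', 'YCrit', 'ppv');

% With so few positives, AUCpr usually comes out well below AUCroc (around 0.75
% for this scoring rule), because precision is dragged down by false positives.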


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('roc', 0.443), ('auc', 0.403), ('rand', 0.317), ('perfcurve', 0.253), ('threshold', 0.232), ('positives', 0.211), ('curve', 0.168), ('area', 0.139), ('negatives', 0.139), ('imbalanced', 0.127), ('quirky', 0.127), ('positive', 0.126), ('classifier', 0.126), ('true', 0.118), ('mostly', 0.114), ('operating', 0.093), ('false', 0.093), ('negative', 0.093), ('always', 0.084), ('zero', 0.083), ('matlab', 0.08), ('predictions', 0.073), ('real', 0.072), ('compute', 0.067), ('probabilities', 0.067), ('vector', 0.067), ('outputs', 0.067), ('zeros', 0.063), ('classes', 0.056), ('random', 0.056), ('class', 0.053), ('hamner', 0.053), ('perfect', 0.053), ('octave', 0.053), ('stands', 0.053), ('gael', 0.053), ('varoquaux', 0.053), ('answers', 0.053), ('decrease', 0.053), ('precise', 0.053), ('relation', 0.053), ('points', 0.05), ('accuracy', 0.048), ('ben', 0.046), ('respectively', 0.046), ('versions', 0.046), ('decide', 0.046), ('classifiers', 0.046), ('favourite', 0.046), ('report', 0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 39 fast ml-2013-09-19-What you wanted to know about AUC

Introduction: AUC, or Area Under Curve, is a metric for binary classification. It’s probably the second most popular one, after accuracy. Unfortunately, it’s nowhere near as intuitive. That is, until you have read this article. Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. But many classifiers are able to quantify their uncertainty about the answer by outputting a probability value. To compute accuracy from probabilities you need a threshold to decide when zero turns into one. The most natural threshold is of course 0.5. Let’s suppose you have a quirky classifier. It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. Clearly, a threshold of 0.5 won’t get you far here. But 0.8 would be just perfect. That’s the whole point of using AUC - it considers all possible thresholds. Various thresholds result in different true positive/false positive rates. As you decrease the threshold, you get

2 0.13120295 17 fast ml-2013-01-14-Feature selection in practice

Introduction: Lately we’ve been working with the Madelon dataset. It was originally prepared for a feature selection challenge, so while we’re at it, let’s select some features. Madelon has 500 attributes, 20 of which are real, the rest being noise. Hence the ideal scenario would be to select just those 20 features. Fortunately we know just the right software for this task. It’s called mRMR , for minimum Redundancy Maximum Relevance , and is available in C and Matlab versions for various platforms. mRMR expects a CSV file with labels in the first column and feature names in the first row. So the game plan is: combine training and validation sets into a format expected by mRMR run selection filter the original datasets, discarding all features but the selected ones evaluate the results on the validation set if all goes well, prepare and submit files for the competition We’ll use R scripts for all the steps but feature selection. Now a few words about mRMR. It will show you p

3 0.10197529 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better than default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely previously mentioned Madelon from NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of a train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate final results of parameter tuning. As an error measure we use Area Under Curve , or AUC, which was

4 0.084561668 35 fast ml-2013-08-12-Accelerometer Biometric Competition

Introduction: Can you recognize users of mobile devices from accelerometer data? It’s a rather non-standard problem (we’re dealing with time series here) and an interesting one. So we wrote some code and ran EC2 and then wrote more and ran EC2 again. After much computation we had our predictions, submitted them and achieved AUC = 0.83. Yay! But there’s a twist to this story. An interesting thing happened when we mistyped a command. Instead of computing AUC from a validation test set and raw predictions, we computed it using a training set: auc.py train_v.vw r.txt And got 0.86. Now, the training file has a very different number of lines from the test file, but our script only cares about how many lines are in a predictions file, so as long as the other file has at least that many, it’s OK. Turns out you can use this set or that set and get pretty much the same good result with each. That got us thinking. We computed a mean of predictions for each device. Here’s the plot: And here’

5 0.081505194 30 fast ml-2013-06-01-Amazon aspires to automate access control

Introduction: This is about Amazon access control challenge at Kaggle. Either we’re getting smarter, or the competition is easy. Or maybe both. You can beat the benchmark quite easily and with AUC of 0.875 you’d be comfortably in the top twenty percent at the moment. We scored fourth in our first attempt - the model was quick to develop and back then there were fewer competitors. Traditionally we use Vowpal Wabbit . Just simple binary classification with the logistic loss function and 10 passes over the data. It seems to work pretty well even though the classes are very unbalanced: there’s only a handful of negatives when compared to positives. Apparently Amazon employees usually get the access they request, even though sometimes they are refused. Let’s look at the data. First a label and then a bunch of IDs. 1,39353,85475,117961,118300,123472,117905,117906,290919,117908 1,17183,1540,117961,118343,123125,118536,118536,308574,118539 1,36724,14457,118219,118220,117884,117879,267952

6 0.07156451 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

7 0.06466151 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

8 0.06176896 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

9 0.061322924 25 fast ml-2013-04-10-Gender discrimination

10 0.05198383 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

11 0.047745276 19 fast ml-2013-02-07-The secret of the big guys

12 0.046155926 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

13 0.045985762 36 fast ml-2013-08-23-A bag of words and a nice little network

14 0.045956124 40 fast ml-2013-10-06-Pylearn2 in practice

15 0.045535237 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

16 0.045360383 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

17 0.044696063 42 fast ml-2013-10-28-How much data is enough?

18 0.043995775 26 fast ml-2013-04-17-Regression as classification

19 0.042849842 10 fast ml-2012-11-17-The Facebook challenge HOWTO

20 0.041991819 43 fast ml-2013-11-02-Maxing out the digits


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.187), (1, -0.023), (2, -0.088), (3, -0.014), (4, -0.043), (5, 0.04), (6, -0.141), (7, 0.018), (8, 0.017), (9, 0.274), (10, 0.198), (11, -0.26), (12, -0.32), (13, -0.026), (14, 0.325), (15, 0.053), (16, -0.017), (17, -0.165), (18, -0.118), (19, 0.201), (20, 0.01), (21, -0.032), (22, 0.087), (23, -0.022), (24, -0.076), (25, 0.036), (26, 0.093), (27, -0.141), (28, -0.015), (29, 0.069), (30, -0.151), (31, -0.059), (32, 0.236), (33, -0.144), (34, 0.082), (35, -0.054), (36, -0.164), (37, 0.027), (38, 0.158), (39, 0.18), (40, -0.053), (41, 0.159), (42, 0.159), (43, -0.028), (44, 0.232), (45, -0.006), (46, -0.088), (47, -0.134), (48, 0.243), (49, -0.051)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99123549 39 fast ml-2013-09-19-What you wanted to know about AUC

Introduction: AUC, or Area Under Curve, is a metric for binary classification. It’s probably the second most popular one, after accuracy. Unfortunately, it’s nowhere near as intuitive. That is, until you have read this article. Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. But many classifiers are able to quantify their uncertainty about the answer by outputting a probability value. To compute accuracy from probabilities you need a threshold to decide when zero turns into one. The most natural threshold is of course 0.5. Let’s suppose you have a quirky classifier. It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. Clearly, a threshold of 0.5 won’t get you far here. But 0.8 would be just perfect. That’s the whole point of using AUC - it considers all possible thresholds. Various thresholds result in different true positive/false positive rates. As you decrease the threshold, you get

2 0.17331423 30 fast ml-2013-06-01-Amazon aspires to automate access control

Introduction: This is about Amazon access control challenge at Kaggle. Either we’re getting smarter, or the competition is easy. Or maybe both. You can beat the benchmark quite easily and with AUC of 0.875 you’d be comfortably in the top twenty percent at the moment. We scored fourth in our first attempt - the model was quick to develop and back then there were fewer competitors. Traditionally we use Vowpal Wabbit . Just simple binary classification with the logistic loss function and 10 passes over the data. It seems to work pretty well even though the classes are very unbalanced: there’s only a handful of negatives when compared to positives. Apparently Amazon employees usually get the access they request, even though sometimes they are refused. Let’s look at the data. First a label and then a bunch of IDs. 1,39353,85475,117961,118300,123472,117905,117906,290919,117908 1,17183,1540,117961,118343,123125,118536,118536,308574,118539 1,36724,14457,118219,118220,117884,117879,267952

3 0.14748996 17 fast ml-2013-01-14-Feature selection in practice

Introduction: Lately we’ve been working with the Madelon dataset. It was originally prepared for a feature selection challenge, so while we’re at it, let’s select some features. Madelon has 500 attributes, 20 of which are real, the rest being noise. Hence the ideal scenario would be to select just those 20 features. Fortunately we know just the right software for this task. It’s called mRMR , for minimum Redundancy Maximum Relevance , and is available in C and Matlab versions for various platforms. mRMR expects a CSV file with labels in the first column and feature names in the first row. So the game plan is: combine training and validation sets into a format expected by mRMR run selection filter the original datasets, discarding all features but the selected ones evaluate the results on the validation set if all goes well, prepare and submit files for the competition We’ll use R scripts for all the steps but feature selection. Now a few words about mRMR. It will show you p

4 0.14557488 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better than default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely previously mentioned Madelon from NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of a train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate final results of parameter tuning. As an error measure we use Area Under Curve , or AUC, which was

5 0.13785237 35 fast ml-2013-08-12-Accelerometer Biometric Competition

Introduction: Can you recognize users of mobile devices from accelerometer data? It’s a rather non-standard problem (we’re dealing with time series here) and an interesting one. So we wrote some code and ran EC2 and then wrote more and ran EC2 again. After much computation we had our predictions, submitted them and achieved AUC = 0.83. Yay! But there’s a twist to this story. An interesting thing happened when we mistyped a command. Instead of computing AUC from a validation test set and raw predictions, we computed it using a training set: auc.py train_v.vw r.txt And got 0.86. Now, the training file has a very different number of lines from the test file, but our script only cares about how many lines are in a predictions file, so as long as the other file has at least that many, it’s OK. Turns out you can use this set or that set and get pretty much the same good result with each. That got us thinking. We computed a mean of predictions for each device. Here’s the plot: And here’

6 0.13463776 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

7 0.12532665 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

8 0.10338134 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

9 0.098546386 36 fast ml-2013-08-23-A bag of words and a nice little network

10 0.093072981 26 fast ml-2013-04-17-Regression as classification

11 0.089953713 25 fast ml-2013-04-10-Gender discrimination

12 0.08891207 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

13 0.084289029 61 fast ml-2014-05-08-Impute missing values with Amelia

14 0.082069062 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

15 0.077929474 19 fast ml-2013-02-07-The secret of the big guys

16 0.075230166 9 fast ml-2012-10-25-So you want to work for Facebook

17 0.074876457 32 fast ml-2013-07-05-Processing large files, line by line

18 0.074138798 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

19 0.073812261 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

20 0.071301401 42 fast ml-2013-10-28-How much data is enough?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.502), (26, 0.072), (35, 0.057), (58, 0.031), (69, 0.132), (71, 0.028), (79, 0.032), (99, 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.88196558 39 fast ml-2013-09-19-What you wanted to know about AUC

Introduction: AUC, or Area Under Curve, is a metric for binary classification. It’s probably the second most popular one, after accuracy. Unfortunately, it’s nowhere near as intuitive. That is, until you have read this article. Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. But many classifiers are able to quantify their uncertainty about the answer by outputting a probability value. To compute accuracy from probabilities you need a threshold to decide when zero turns into one. The most natural threshold is of course 0.5. Let’s suppose you have a quirky classifier. It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. Clearly, a threshold of 0.5 won’t get you far here. But 0.8 would be just perfect. That’s the whole point of using AUC - it considers all possible thresholds. Various thresholds result in different true positive/false positive rates. As you decrease the threshold, you get

2 0.27542335 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

Introduction: The promise What’s attractive in machine learning? That a machine is learning, instead of a human. But an operator still has a lot of work to do. First, he has to learn how to teach a machine, in general. Then, when it comes to a concrete task, there are two main areas where a human needs to do the work (and remember, laziness is a virtue, at least for a programmer, so we’d like to minimize amount of work done by a human): data preparation model tuning This story is about model tuning. Typically, to achieve satisfactory results, first we need to convert raw data into format accepted by the model we would like to use, and then tune a few hyperparameters of the model. For example, some hyperparams to tune for a random forest may be a number of trees to grow and a number of candidate features at each split ( mtry in R randomForest). For a neural network, there are quite a lot of hyperparams: number of layers, number of neurons in each layer (specifically, in each hid

3 0.26796654 13 fast ml-2012-12-27-Spearmint with a random forest

Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: a number of trees ( ntrees ) and a number of candidate features at each split ( mtry ). Here’s some code . We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in (0,10) range. We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there’s eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green : RMSE < 0.57 blue : RMSE < 0.58 black : RMSE >= 0.58 Turns out that some diffe

4 0.26203516 27 fast ml-2013-05-01-Deep learning made easy

Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple benchmarks for this competition and the best one is unusually hard to beat 1 - only less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github , and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers 2 . Geoff Hinton from Toronto talks about two ends

5 0.26174173 17 fast ml-2013-01-14-Feature selection in practice

Introduction: Lately we’ve been working with the Madelon dataset. It was originally prepared for a feature selection challenge, so while we’re at it, let’s select some features. Madelon has 500 attributes, 20 of which are real, the rest being noise. Hence the ideal scenario would be to select just those 20 features. Fortunately we know just the right software for this task. It’s called mRMR , for minimum Redundancy Maximum Relevance , and is available in C and Matlab versions for various platforms. mRMR expects a CSV file with labels in the first column and feature names in the first row. So the game plan is: combine training and validation sets into a format expected by mRMR run selection filter the original datasets, discarding all features but the selected ones evaluate the results on the validation set if all goes well, prepare and submit files for the competition We’ll use R scripts for all the steps but feature selection. Now a few words about mRMR. It will show you p

6 0.26072547 9 fast ml-2012-10-25-So you want to work for Facebook

7 0.25969476 18 fast ml-2013-01-17-A very fast denoising autoencoder

8 0.25794336 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

9 0.25598252 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision

10 0.25145164 43 fast ml-2013-11-02-Maxing out the digits

11 0.24958587 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

12 0.24826083 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

13 0.24395141 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

14 0.24208701 40 fast ml-2013-10-06-Pylearn2 in practice

15 0.24105015 20 fast ml-2013-02-18-Predicting advertised salaries

16 0.23859069 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

17 0.23613816 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

18 0.23606087 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

19 0.23447482 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

20 0.23323508 35 fast ml-2013-08-12-Accelerometer Biometric Competition