fast_ml fast_ml-2013 fast_ml-2013-35 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Can you recognize users of mobile devices from accelerometer data? It’s a rather non-standard problem (we’re dealing with time series here) and an interesting one. So we wrote some code and ran EC2 and then wrote more and ran EC2 again. After much computation we had our predictions, submitted them and achieved AUC = 0.83. Yay! But there’s a twist to this story. An interesting thing happened when we mistyped a command. Instead of computing AUC from a validation test set and raw predictions, we computed it using a training set: auc.py train_v.vw r.txt And got 0.86. Now, the training file has a very different number of lines from the test file, but our script only cares about how many lines are in a predictions file, so as long as the other file has at least that many, it’s OK. Turns out you can use this set or that set and get pretty much the same good result with each. That got us thinking. We computed a mean of predictions for each device. Here’s the plot: And here’
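A minimal sketch of an AUC script that behaves like the one described above may help make the accident concrete. This is an assumed reconstruction, not the post's actual auc.py: it takes a labels file (VW format, label first on each line - an assumption about the format) and a predictions file, and reads only as many labels as there are predictions, which is exactly why passing the training file instead of the validation file still produces a number.

```python
# Sketch of an AUC script in the spirit of auc.py (assumed behaviour, not the
# original code). It only consumes as many labels as there are predictions,
# so any labels file with at least that many lines will "work".
import sys
from sklearn.metrics import roc_auc_score

def load_predictions(path):
    with open(path) as f:
        return [float(line.split()[0]) for line in f if line.strip()]

def load_labels(path, n):
    labels = []
    with open(path) as f:
        for line in f:
            if len(labels) == n:
                break
            labels.append(float(line.split()[0]))  # VW format: label comes first
    return labels

if __name__ == '__main__':
    predictions = load_predictions(sys.argv[2])
    labels = load_labels(sys.argv[1], len(predictions))
    print(roc_auc_score(labels, predictions))
```

Run as python auc.py train_v.vw r.txt, it silently pairs the test predictions with however many training labels it needs - the mistyped command from the post.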
sentIndex sentText sentNum sentScore
1 Can you recognize users of mobile devices from accelerometer data? [sent-1, score-0.587]
2 It’s a rather non-standard problem (we’re dealing with time series here) and an interesting one. [sent-2, score-0.334]
3 So we wrote some code and ran EC2 and then wrote more and ran EC2 again. [sent-3, score-0.923]
4 After much computation we had our predictions, submitted them and achieved AUC = 0. [sent-4, score-0.369]
5 An interesting thing happened when we mistyped a command. [sent-8, score-0.184]
6 Instead of computing AUC from a validation test set and raw predictions, we computed it using a training set: auc.py train_v.vw r.txt And got 0.86. [sent-9, score-0.592]
7 Now, the training file has a very different number of lines from the test file, but our script only cares about how many lines are in a predictions file, so as long as the other file has at least that many, it’s OK. [sent-14, score-1.145]
8 Turns out you can use this set or that set and get pretty much the same good result with each. [sent-15, score-0.299]
9 We computed a mean of predictions for each device. [sent-17, score-0.575]
10 Here’s the plot: And here’s a histogram of mean predictions: It looks much like a histogram of number of samples per device (truncated at 100k samples for clarity): Catching our drift? [sent-18, score-2.138]
11 csv And when asked about a device in the test set, we’ll output a probability proportional to the number of samples, so that devices with more samples get bigger outputs: predict_constants. [sent-22, score-1.423]
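Sentence 11 describes the whole model: ignore the accelerometer signal and score each device by how much training data it has. Below is a minimal sketch of that idea; the column names and file layout are assumptions for illustration, not the post's predict_constants script.

```python
# Constant-per-device baseline (sketch): the prediction for a device is just its
# share of training samples, scaled into (0, 1]. Column names are assumptions.
import pandas as pd

train = pd.read_csv('train.csv')        # one row per accelerometer sample
test = pd.read_csv('questions.csv')     # one row per question about a device

counts = train['Device'].value_counts()     # number of samples per device
scores = counts / counts.max()              # bigger devices get bigger outputs

test['IsTrue'] = test['Device'].map(scores).fillna(0.0)
test[['QuestionId', 'IsTrue']].to_csv('constant_predictions.csv', index=False)
```

No signal processing and no time series - just a count per device, which is the drift the histograms above are hinting at.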
wordName wordTfidf (topN-words)
[('samples', 0.516), ('computed', 0.281), ('device', 0.281), ('devices', 0.281), ('histogram', 0.234), ('ran', 0.234), ('wrote', 0.207), ('predictions', 0.202), ('file', 0.135), ('lines', 0.132), ('got', 0.118), ('submitted', 0.117), ('recognize', 0.117), ('computation', 0.117), ('auc', 0.112), ('twist', 0.103), ('mobile', 0.103), ('happened', 0.103), ('set', 0.097), ('cares', 0.093), ('count', 0.093), ('mean', 0.092), ('test', 0.089), ('users', 0.086), ('interesting', 0.081), ('asked', 0.08), ('long', 0.08), ('github', 0.08), ('per', 0.08), ('series', 0.08), ('dealing', 0.074), ('outputs', 0.074), ('achieved', 0.07), ('raw', 0.066), ('much', 0.065), ('bigger', 0.062), ('looks', 0.062), ('computing', 0.059), ('turns', 0.059), ('number', 0.058), ('probability', 0.056), ('beat', 0.053), ('problem', 0.051), ('plot', 0.051), ('rather', 0.048), ('many', 0.047), ('different', 0.042), ('code', 0.041), ('result', 0.04), ('go', 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 35 fast ml-2013-08-12-Accelerometer Biometric Competition
2 0.093895867 33 fast ml-2013-07-09-Introducing phraug
Introduction: Recently we proposed to pre-process large files line by line. Now it’s time to introduce phraug*, a set of Python scripts based on this idea. The scripts mostly deal with format conversion (CSV, libsvm, VW) and with a few other tasks common in machine learning. With phraug you currently can convert from one format to another: csv to libsvm csv to Vowpal Wabbit libsvm to csv libsvm to Vowpal Wabbit tsv to csv And perform some other file operations: count lines in a file sample lines from a file split a file into two randomly split a file into a number of similarly sized chunks save a continuous subset of lines from a file (for example, first 100) delete specified columns from a csv file normalize (shift and scale) columns in a csv file Basically, there’s always at least one input file and usually one or more output files. An input file always stays unchanged. If you’re familiar with Unix, you may notice that some of these tasks are easily ach
3 0.084561668 39 fast ml-2013-09-19-What you wanted to know about AUC
Introduction: AUC, or Area Under Curve, is a metric for binary classification. It’s probably the second most popular one, after accuracy. Unfortunately, it’s nowhere near as intuitive. That is, until you have read this article. Accuracy deals with ones and zeros, meaning you either got the class label right or you didn’t. But many classifiers are able to quantify their uncertainty about the answer by outputting a probability value. To compute accuracy from probabilities you need a threshold to decide when zero turns into one. The most natural threshold is of course 0.5. Let’s suppose you have a quirky classifier. It is able to get all the answers right, but it outputs 0.7 for negative examples and 0.9 for positive examples. Clearly, a threshold of 0.5 won’t get you far here. But 0.8 would be just perfect. That’s the whole point of using AUC - it considers all possible thresholds. Various thresholds result in different true positive/false positive rates. As you decrease the threshold, you get
4 0.082436576 40 fast ml-2013-10-06-Pylearn2 in practice
Introduction: What do you get when you mix one part brilliant and one part daft? You get Pylearn2, a cutting-edge neural networks library from Montreal that’s rather hard to use. Here we’ll show how to get through the daft part with your mental health relatively intact. Pylearn2 comes from the Lisa Lab in Montreal, led by Yoshua Bengio. Those are pretty smart guys and they concern themselves with deep learning. Recently they published a paper entitled Pylearn2: a machine learning research library [arxiv]. Here’s a quote: Pylearn2 is a machine learning research library - its users are researchers. This means (…) it is acceptable to assume that the user has some technical sophistication and knowledge of machine learning. The word research is possibly the most common word in the paper. There’s a reason for that: the library is certainly not production-ready. OK, it’s not that bad. There are only two difficult things: getting your data in getting predictions out What’
5 0.075222701 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
Introduction: Little Spearmint couldn’t sleep that night. I was so close… - he was thinking. It seemed that he had found a better-than-default value for one of the random forest hyperparams, but it turned out to be false. He made a decision as he fell asleep: Next time, I will show them! The way to do this is to use a dataset that is known to produce lower error with high mtry values, namely the previously mentioned Madelon from the NIPS 2003 Feature Selection Challenge. Among 500 attributes, only 20 are informative, the rest are noise. That’s the reason why high mtry is good here: you have to consider a lot of features to find a meaningful one. The dataset consists of train, validation and test parts, with labels being available for train and validation. We will further split the training set into our train and validation sets, and use the original validation set as a test set to evaluate the final results of parameter tuning. As an error measure we use Area Under Curve, or AUC, which was
6 0.071494848 17 fast ml-2013-01-14-Feature selection in practice
7 0.070252083 20 fast ml-2013-02-18-Predicting advertised salaries
8 0.057031993 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
9 0.056851972 2 fast ml-2012-08-27-Kaggle job recommendation challenge
10 0.054755423 32 fast ml-2013-07-05-Processing large files, line by line
11 0.052094795 5 fast ml-2012-09-19-Best Buy mobile contest - big data
12 0.050086789 25 fast ml-2013-04-10-Gender discrimination
13 0.049040295 26 fast ml-2013-04-17-Regression as classification
14 0.048031535 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
15 0.046222329 10 fast ml-2012-11-17-The Facebook challenge HOWTO
16 0.043847669 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
17 0.043352865 30 fast ml-2013-06-01-Amazon aspires to automate access control
18 0.0431889 43 fast ml-2013-11-02-Maxing out the digits
19 0.041898057 27 fast ml-2013-05-01-Deep learning made easy
20 0.039295968 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
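The AUC blurb at position 3 above contains a small worked example - a classifier that outputs 0.7 for negatives and 0.9 for positives - that is easy to verify numerically. A quick sketch with made-up labels:

```python
# The "quirky classifier" from the AUC post: 0.7 for negatives, 0.9 for positives.
# Accuracy at threshold 0.5 is poor, at 0.8 it is perfect, and AUC - which sweeps
# over all thresholds - is 1.0.
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.9 if y == 1 else 0.7 for y in y_true]

print(accuracy_score(y_true, [int(s > 0.5) for s in y_score]))  # 0.5
print(accuracy_score(y_true, [int(s > 0.8) for s in y_score]))  # 1.0
print(roc_auc_score(y_true, y_score))                           # 1.0
```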
topicId topicWeight
[(0, 0.176), (1, -0.07), (2, -0.061), (3, -0.073), (4, -0.107), (5, 0.021), (6, -0.07), (7, 0.135), (8, -0.112), (9, 0.084), (10, 0.03), (11, -0.104), (12, -0.219), (13, 0.157), (14, 0.331), (15, -0.27), (16, -0.17), (17, -0.137), (18, -0.056), (19, 0.017), (20, 0.158), (21, -0.024), (22, -0.124), (23, -0.06), (24, -0.035), (25, 0.253), (26, -0.317), (27, 0.127), (28, 0.163), (29, -0.032), (30, 0.172), (31, 0.073), (32, 0.042), (33, 0.147), (34, -0.112), (35, -0.152), (36, 0.222), (37, -0.142), (38, -0.06), (39, -0.067), (40, -0.177), (41, -0.128), (42, -0.248), (43, -0.082), (44, -0.062), (45, 0.051), (46, 0.078), (47, -0.031), (48, 0.007), (49, 0.054)]
simIndex simValue blogId blogTitle
same-blog 1 0.97774333 35 fast ml-2013-08-12-Accelerometer Biometric Competition
2 0.19951364 20 fast ml-2013-02-18-Predicting advertised salaries
Introduction: We’re back to Kaggle competitions. This time we will attempt to predict advertised salaries from job ads and of course beat the benchmark. The benchmark is, as usual, a random forest result. For starters, we’ll use a linear model without much preprocessing. Will it be enough? Congratulations! You have spotted the ceiling cat. A linear model better than a random forest - how so? Well, to train a random forest on data this big, the benchmark code extracts only the 100 most common words as features, and we will use them all. This approach is similar to the one we applied in the Merck challenge. More data beats a cleverer algorithm, especially when a cleverer algorithm is unable to handle all of the data (on your machine, anyway). The competition is about predicting salaries from job adverts. Of course the figures usually appear in the text, so they were removed. The error metric is mean absolute error (MAE) - how refreshing to see such an intuitive one. The data for Job salary prediction con
3 0.19555368 33 fast ml-2013-07-09-Introducing phraug
4 0.19136053 40 fast ml-2013-10-06-Pylearn2 in practice
5 0.15060896 39 fast ml-2013-09-19-What you wanted to know about AUC
6 0.15031859 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
7 0.14142592 5 fast ml-2012-09-19-Best Buy mobile contest - big data
8 0.13161337 17 fast ml-2013-01-14-Feature selection in practice
9 0.12676348 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
10 0.1240705 25 fast ml-2013-04-10-Gender discrimination
11 0.11265662 43 fast ml-2013-11-02-Maxing out the digits
12 0.11080676 19 fast ml-2013-02-07-The secret of the big guys
13 0.10537835 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
14 0.10378911 26 fast ml-2013-04-17-Regression as classification
15 0.10319795 27 fast ml-2013-05-01-Deep learning made easy
16 0.098939963 13 fast ml-2012-12-27-Spearmint with a random forest
17 0.097483307 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python
18 0.096281067 36 fast ml-2013-08-23-A bag of words and a nice little network
19 0.093914382 10 fast ml-2012-11-17-The Facebook challenge HOWTO
20 0.086535737 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
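The salary-prediction blurb at position 2 above turns on one contrast: the random forest benchmark uses only the 100 most common words, while a linear model can use them all. A rough sketch of that contrast, scored with MAE; the column names and the use of scikit-learn are assumptions for illustration, not the post's actual setup.

```python
# "100 most common words vs. all words" contrast from the salary post (sketch).
# Column names (FullDescription, SalaryNormalized) are assumptions about the data.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

data = pd.read_csv('train.csv')
X_text, y = data['FullDescription'], data['SalaryNormalized']
X_tr, X_val, y_tr, y_val = train_test_split(X_text, y, test_size=0.2, random_state=0)

for max_features in (100, None):          # 100 most common words vs. all words
    vectorizer = CountVectorizer(max_features=max_features, binary=True)
    model = Ridge().fit(vectorizer.fit_transform(X_tr), y_tr)
    preds = model.predict(vectorizer.transform(X_val))
    print(max_features, mean_absolute_error(y_val, preds))
```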
topicId topicWeight
[(69, 0.165), (86, 0.633), (99, 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 0.73031998 35 fast ml-2013-08-12-Accelerometer Biometric Competition
2 0.25671664 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
Introduction: Let’s say that there are some users and some items, like movies, songs or jobs. Each user might be interested in some items. The client asks us to recommend a few items (the number is x) for each user. They will evaluate the results using the mean average precision, or MAP, metric. Specifically MAP@x - this means they ask us to recommend x items for each user. So what is this MAP? First, we will get M out of the way. MAP is just an average of APs, or average precision, for all users. In other words, we take the mean for Average Precision, hence Mean Average Precision. If we have 1000 users, we sum APs for each user and divide the sum by 1000. This is MAP. So now, what is AP, or average precision? It may be that we don’t really need to know. But we probably need to know this: we can recommend at most x items for each user; it pays to submit all x recommendations, because we are not penalized for bad guesses; order matters, so it’s better to submit more certain recommendations fi
3 0.25539508 27 fast ml-2013-05-01-Deep learning made easy
Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to the ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple of benchmarks for this competition and the best one is unusually hard to beat - less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available at Github, and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers. Geoff Hinton from Toronto talks about two ends
4 0.25394893 14 fast ml-2013-01-04-Madelon: Spearmint's revenge
5 0.24453861 13 fast ml-2012-12-27-Spearmint with a random forest
Introduction: Now that we have Spearmint basics nailed, we’ll try tuning a random forest, and specifically two hyperparams: the number of trees (ntrees) and the number of candidate features at each split (mtry). Here’s some code. We’re going to use a red wine quality dataset. It has about 1600 examples and our goal will be to predict a rating for a wine given all the other properties. This is a regression* task, as ratings are in the (0,10) range. We will split the data 80/10/10 into train, validation and test set, and use the first two to establish optimal hyperparams and then predict on the test set. As an error measure we will use RMSE. At first, we will try ntrees between 10 and 200 and mtry between 3 and 11 (there’s eleven features total, so that’s the upper bound). Here are the results of two Spearmint runs with 71 and 95 tries respectively. Colors denote a validation error value: green: RMSE < 0.57, blue: RMSE < 0.58, black: RMSE >= 0.58. Turns out that some diffe
6 0.229132 43 fast ml-2013-11-02-Maxing out the digits
7 0.22404635 17 fast ml-2013-01-14-Feature selection in practice
8 0.22163777 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
9 0.21903622 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
10 0.21688992 18 fast ml-2013-01-17-A very fast denoising autoencoder
11 0.21177596 9 fast ml-2012-10-25-So you want to work for Facebook
12 0.21133897 16 fast ml-2013-01-12-Intro to random forests
13 0.20705055 20 fast ml-2013-02-18-Predicting advertised salaries
14 0.20035425 4 fast ml-2012-09-17-Best Buy mobile contest
15 0.19952422 19 fast ml-2013-02-07-The secret of the big guys
16 0.19452141 61 fast ml-2014-05-08-Impute missing values with Amelia
17 0.19442865 40 fast ml-2013-10-06-Pylearn2 in practice
18 0.19402279 25 fast ml-2013-04-10-Gender discrimination
19 0.18878779 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
20 0.18785745 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
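Finally, the MAP blurb at position 2 above stops just short of defining average precision. Here is one common formulation of AP@x and MAP@x - the one typically used in Kaggle recommendation contests, given as an assumption since the excerpt does not spell it out:

```python
# A common formulation of AP@x / MAP@x (assumed, not quoted from the post):
# for each user, average the precision at every position where a recommended
# item is a hit, then average that over all users.
def apk(actual, predicted, k):
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for i, item in enumerate(predicted):
        if item in actual and item not in predicted[:i]:
            hits += 1
            score += hits / (i + 1.0)        # precision at this hit position
    return score / min(len(actual), k) if actual else 0.0

def mapk(actual_lists, predicted_lists, k):
    return sum(apk(a, p, k) for a, p in zip(actual_lists, predicted_lists)) / len(actual_lists)

# Toy example: two users, x = 3 recommendations each.
print(mapk([[1, 2], [3]], [[1, 4, 2], [5, 6, 3]], k=3))   # ~0.583
```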