fast_ml fast_ml-2012 fast_ml-2012-2 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is an introduction to the Kaggle job recommendation challenge. It looks a lot like a typical collaborative filtering thing (with a lot of extra information), but not quite. Spot these two big differences: There are no explicit ratings. Instead, there’s info about which jobs a user applied to. This is known as one-class collaborative filtering (OCCF), or learning from positive-only feedback. If you want to dig deeper into the subject, there have already been contests with positive-only feedback, for example track two of the Yahoo KDD Cup or the Million Song Dataset Challenge at Kaggle (both about songs). The second difference is less apparent. When you look at test users (that is, the users that we are asked to recommend jobs for), only about half of them made at least one application. For the other half there is no data, and so no collaborative filtering. For the users we have applications data for, it’s very sparse, so we would like to use CF, because it does well in similar se
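The two-part setup described above (positive-only feedback for some users, no data at all for others) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the post: the co-occurrence scoring is a simple stand-in for a real OCCF algorithm, the popularity fallback for users without applications is an assumption, and the name `recommend` and the toy IDs are hypothetical.

```python
from collections import Counter, defaultdict


def recommend(applications, target_users, top_n=3):
    """Toy recommender for positive-only (user, job) pairs.

    applications: iterable of (user_id, job_id) pairs, one per application.
    target_users: users we need recommendations for.
    Hypothetical helper: co-occurrence counts stand in for a real OCCF model.
    """
    jobs_by_user = defaultdict(set)
    users_by_job = defaultdict(set)
    for user, job in applications:
        jobs_by_user[user].add(job)
        users_by_job[job].add(user)

    # Jobs ordered by number of applicants: the assumed fallback
    # for users who made no applications at all.
    popularity = Counter({job: len(users) for job, users in users_by_job.items()})
    popular = [job for job, _ in popularity.most_common()]

    recs = {}
    for user in target_users:
        seen = jobs_by_user.get(user, set())
        scores = Counter()
        # Score unseen jobs by how many co-applicants applied to them.
        for job in seen:
            for other in users_by_job[job]:
                if other == user:
                    continue
                for candidate in jobs_by_user[other]:
                    if candidate not in seen:
                        scores[candidate] += 1
        ranked = [job for job, _ in scores.most_common()]
        # Pad with popular jobs; for users with no data this is the whole list.
        for job in popular:
            if len(ranked) >= top_n:
                break
            if job not in seen and job not in ranked:
                ranked.append(job)
        recs[user] = ranked[:top_n]
    return recs
```

Calling `recommend(pairs, ["u1", "u4"], top_n=2)` on a handful of toy pairs returns co-occurrence-based jobs for a user with applications and the most popular jobs for a user without any. A real solution would swap the scoring loop for a proper implicit-feedback model and would probably use the extra user and job information for the second group.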
sentIndex sentText sentNum sentScore
1 This is an introduction to the Kaggle job recommendation challenge. [sent-1, score-0.478]
2 It looks a lot like a typical collaborative filtering thing (with a lot of extra information), but not quite. [sent-2, score-0.786]
3 Spot these two big differences: There are no explicit ratings. [sent-3, score-0.087]
4 Instead, there’s info about which jobs a user applied to. [sent-4, score-0.371]
5 This is known as one-class collaborative filtering (OCCF), or learning from positive-only feedback. [sent-5, score-0.466]
6 If you want to dig deeper into the subject, there have already been contests with positive-only feedback, for example track two of the Yahoo KDD Cup or the Million Song Dataset Challenge at Kaggle (both about songs). [sent-6, score-0.389]
7 When you look at test users (that is, the users that we are asked to recommend jobs for), only about half of them made at least one application. [sent-8, score-1.34]
8 For the other half there is no data, and so no collaborative filtering. [sent-9, score-0.313]
9 For the users we have applications data for, it’s very sparse, so we would like to use CF, because it does well in similar settings. [sent-10, score-0.701]
10 To address the issues above, we need software for OCCF and a way to handle the remaining users. [sent-11, score-0.351]
11 It has an item recommendation tool that fits our needs exactly, has some nice algorithms and is very convenient: it takes a list of user/item IDs and produces recommendations for users we are interested in. [sent-13, score-0.903]
12 Some alternatives exist; for example, GraphLab’s pmf can handle implicit feedback. [sent-14, score-0.205]
13 Recommendations for users who didn’t make any applications? [sent-15, score-0.446]
14 We will treat them separately, because each user and each job is assigned to exactly one window, so there is no overlap between windows. [sent-22, score-0.381]
15 There are about 50k unique users and 50k unique jobs, and only 200k examples, that is user/job pairs. [sent-24, score-0.752]
16 That means the data is very sparse: its density is roughly 0.01%. [sent-25, score-0.174]
17 Compare this with the Netflix challenge, which had more users but also far more ratings (100M), so its density was about 1%, roughly 100 times higher than here. [sent-27, score-0.251]
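The density figures in sentences 15 to 17 follow from simple arithmetic: observed pairs divided by all possible user/item pairs. A short Python check is sketched below. The job-challenge numbers come from the text; the Netflix user and movie counts (roughly 480k and 17,770) are not in the text and are filled in here as an assumption, and `density` is a hypothetical helper name.

```python
def density(n_users, n_items, n_interactions):
    """Fraction of the user/item matrix that is actually observed."""
    return n_interactions / (n_users * n_items)


# Figures from the text: about 50k users, 50k jobs, 200k user/job pairs.
job_density = density(50_000, 50_000, 200_000)

# Assumed Netflix Prize sizes (not stated in the text): ~480k users, 17,770 movies.
netflix_density = density(480_000, 17_770, 100_000_000)

print(f"jobs:    {job_density:.4%}")
print(f"netflix: {netflix_density:.2%}")
print(f"ratio:   {netflix_density / job_density:.0f}x")
```

With these inputs the job data comes out at 0.008%, which rounds to the 0.01% quoted above, and the Netflix matrix at a little over 1%, so the gap is on the order of 100 times.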
wordName wordTfidf (topN-words)
[('users', 0.446), ('collaborative', 0.313), ('jobs', 0.261), ('occf', 0.209), ('applications', 0.191), ('songs', 0.174), ('density', 0.174), ('recommendation', 0.174), ('filtering', 0.153), ('unique', 0.153), ('window', 0.153), ('recommendations', 0.127), ('half', 0.118), ('handle', 0.118), ('challenge', 0.113), ('user', 0.11), ('job', 0.104), ('exactly', 0.098), ('lot', 0.087), ('address', 0.087), ('contests', 0.087), ('differences', 0.087), ('dig', 0.087), ('explicit', 0.087), ('fits', 0.087), ('graphlab', 0.087), ('implicit', 0.087), ('introduction', 0.087), ('netflix', 0.087), ('spot', 0.087), ('take', 0.086), ('sparse', 0.079), ('deeper', 0.077), ('exists', 0.077), ('extra', 0.077), ('issues', 0.077), ('kdd', 0.077), ('ratings', 0.077), ('yahoo', 0.077), ('item', 0.069), ('recommend', 0.069), ('inspiration', 0.069), ('positive', 0.069), ('remaining', 0.069), ('separately', 0.069), ('track', 0.069), ('treat', 0.069), ('typical', 0.069), ('ids', 0.064), ('similar', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 2 fast ml-2012-08-27-Kaggle job recommendation challenge
2 0.11024731 4 fast ml-2012-09-17-Best Buy mobile contest
Introduction: There’s a contest on Kaggle called ACM Hackaton. Actually, there are two, one based on small data and one on big data. Here we will be talking about the small data contest - specifically, about beating the benchmark - but the ideas are equally applicable to both. The deal is, we have a training set from Best Buy with search queries and items which users clicked after the query, plus some other data. Items are in this case Xbox games like “Batman” or “Rocksmith” or “Call of Duty”. We are asked to predict the item a user clicked given the query. The metric is MAP@5 (see an explanation of MAP). The problem isn’t typical for Kaggle, because it doesn’t really require using machine learning in the traditional sense. To beat the benchmark, it’s enough to write a short script. Concretely ;), we’re gonna build a mapping from queries to items, using the training set. It will be just a Python dictionary looking like this: 'forzasteeringwheel': {'2078113': 1}, 'finalfantasy13': {'9461183': 3, '351
3 0.099320546 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
Introduction: Let’s say that there are some users and some items, like movies, songs or jobs. Each user might be interested in some items. The client asks us to recommend a few items (the number is x) for each user. They will evaluate the results using the mean average precision, or MAP, metric. Specifically MAP@x - this means they ask us to recommend x items for each user. So what is this MAP? First, we will get M out of the way. MAP is just an average of APs, or average precision, for all users. In other words, we take the mean of Average Precision, hence Mean Average Precision. If we have 1000 users, we sum APs for each user and divide the sum by 1000. This is MAP. So now, what is AP, or average precision? It may be that we don’t really need to know. But we probably need to know this: we can recommend at most x items for each user; it pays to submit all x recommendations, because we are not penalized for bad guesses; order matters, so it’s better to submit more certain recommendations fi
4 0.090772793 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
Introduction: The Black Box challenge has just ended. We were thoroughly thrilled to learn that the winner, doubleshot, used sparse filtering, apparently following our cue. His score in terms of accuracy is 0.702, ours 0.645, and the best benchmark 0.525. We ranked 15th out of 217, a few places ahead of the Toronto team consisting of Charlie Tang and Nitish Srivastava. To their credit, Charlie has won the two remaining Challenges in Representation Learning. Not-so-deep learning: The difference to our previous, beating-the-benchmark attempt is twofold: one layer instead of two for supervised learning, VW instead of a random forest. Somewhat surprisingly, one layer works better than two. Even more surprisingly, with enough units you can get 0.634 using a linear model (Vowpal Wabbit, of course, One-Against-All). In our understanding, that’s the point of overcomplete representations*, which Stanford people seem to care much about. Recall The secret of the big guys and the pape
5 0.089077637 27 fast ml-2013-05-01-Deep learning made easy
Introduction: As usual, there’s an interesting competition at Kaggle: The Black Box. It’s connected to the ICML 2013 Workshop on Challenges in Representation Learning, held by the deep learning guys from Montreal. There are a couple of benchmarks for this competition and the best one is unusually hard to beat - less than a fourth of those taking part managed to do so. We’re among them. Here’s how. The key ingredient in our success is a recently developed secret Stanford technology for deep unsupervised learning: sparse filtering by Jiquan Ngiam et al. Actually, it’s not secret. It’s available on GitHub, and has one or two very appealing properties. Let us explain. The main idea of deep unsupervised learning, as we understand it, is feature extraction. One of the most common applications is in multimedia. The reason for that is that multimedia tasks, for example object recognition, are easy for humans, but difficult for computers. Geoff Hinton from Toronto talks about two ends
6 0.077085175 5 fast ml-2012-09-19-Best Buy mobile contest - big data
7 0.065746747 58 fast ml-2014-04-12-Deep learning these days
8 0.060306076 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
9 0.059725452 20 fast ml-2013-02-18-Predicting advertised salaries
10 0.056851972 35 fast ml-2013-08-12-Accelerometer Biometric Competition
11 0.055240538 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
12 0.051085606 40 fast ml-2013-10-06-Pylearn2 in practice
13 0.049897201 15 fast ml-2013-01-07-Machine learning courses online
14 0.049779098 41 fast ml-2013-10-09-Big data made easy
15 0.045779444 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
16 0.045197103 25 fast ml-2013-04-10-Gender discrimination
17 0.043741629 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively
18 0.043003753 43 fast ml-2013-11-02-Maxing out the digits
19 0.042809524 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow
20 0.041676655 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
topicId topicWeight
[(0, 0.18), (1, -0.017), (2, 0.117), (3, -0.179), (4, 0.061), (5, -0.058), (6, 0.094), (7, 0.006), (8, 0.031), (9, -0.119), (10, 0.242), (11, 0.021), (12, -0.263), (13, 0.335), (14, 0.016), (15, -0.139), (16, -0.098), (17, 0.334), (18, 0.118), (19, -0.052), (20, 0.033), (21, 0.175), (22, -0.123), (23, -0.111), (24, 0.064), (25, 0.016), (26, -0.006), (27, 0.067), (28, -0.001), (29, 0.048), (30, 0.012), (31, 0.077), (32, -0.039), (33, 0.188), (34, -0.226), (35, -0.057), (36, -0.157), (37, 0.112), (38, 0.153), (39, 0.187), (40, 0.251), (41, 0.232), (42, 0.125), (43, 0.025), (44, -0.032), (45, -0.054), (46, 0.179), (47, -0.078), (48, -0.139), (49, -0.124)]
simIndex simValue blogId blogTitle
same-blog 1 0.9891507 2 fast ml-2012-08-27-Kaggle job recommendation challenge
2 0.21515815 4 fast ml-2012-09-17-Best Buy mobile contest
3 0.1328887 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
4 0.12971185 58 fast ml-2014-04-12-Deep learning these days
Introduction: It seems that quite a few people with an interest in deep learning think of it in terms of unsupervised pre-training, autoencoders, stacked RBMs and deep belief networks. It’s easy to get into this groove by watching one of Geoff Hinton’s videos from a few years ago, where he bashes backpropagation in favour of unsupervised methods that are able to discover the structure in data by themselves, the same way as the human brain does. Those videos, papers and tutorials linger. They were state of the art once, but things have changed since then. These days supervised learning is the king again. This has to do with the fact that you can look at data from many different angles and usually you’d prefer a representation that is useful for the discriminative task at hand. Unsupervised learning will find some angle, but will it be the one you want? In the case of the MNIST digits, sure. Otherwise probably not. Or maybe it will find a lot of angles while you only need one. Ladies and gentlemen, pleas
5 0.1260221 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
6 0.11617004 41 fast ml-2013-10-09-Big data made easy
7 0.11507791 20 fast ml-2013-02-18-Predicting advertised salaries
8 0.11116809 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
9 0.10945866 5 fast ml-2012-09-19-Best Buy mobile contest - big data
10 0.10226715 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
11 0.10091089 27 fast ml-2013-05-01-Deep learning made easy
12 0.099749297 15 fast ml-2013-01-07-Machine learning courses online
13 0.085894123 25 fast ml-2013-04-10-Gender discrimination
14 0.082064807 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition
15 0.081670143 32 fast ml-2013-07-05-Processing large files, line by line
16 0.080403656 40 fast ml-2013-10-06-Pylearn2 in practice
17 0.075374052 35 fast ml-2013-08-12-Accelerometer Biometric Competition
18 0.073962338 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
19 0.072490551 22 fast ml-2013-03-07-Choosing a machine learning algorithm
20 0.07220272 10 fast ml-2012-11-17-The Facebook challenge HOWTO
topicId topicWeight
[(26, 0.015), (31, 0.046), (35, 0.023), (51, 0.554), (58, 0.015), (69, 0.123), (71, 0.033), (99, 0.091)]
simIndex simValue blogId blogTitle
1 0.95262676 44 fast ml-2013-11-18-CUDA on a Linux laptop
Introduction: After testing CUDA on a desktop, we now switch to a Linux laptop with 64-bit Xubuntu. Getting CUDA to work is harder here. Will the effort be worth the results? If you have a laptop with an Nvidia card, the thing probably uses it for 3D graphics and Intel’s built-in unit for everything else. This technology is known as Optimus and it happens to make things anything but easy for running CUDA on Linux. The problem is with GPU drivers, specifically between Linux being open-source and Nvidia drivers being not. This strained relation at one time prompted Linus Torvalds to give Nvidia a finger with great passion. Installing GPU drivers: Here’s a solution for the driver problem. You need a package called bumblebee. It makes an Nvidia card accessible to your apps. To install drivers and bumblebee, try something along these lines: sudo apt-get install nvidia-current-updates sudo apt-get install bumblebee Note that you don’t need drivers that come with a specific CUDA rel
same-blog 2 0.94544744 2 fast ml-2012-08-27-Kaggle job recommendation challenge
3 0.64631575 34 fast ml-2013-07-14-Running things on a GPU
Introduction: You’ve heard about running things on a graphics card, but have you tried it? All you need to taste the speed is an Nvidia card and some software. We run experiments using Cudamat and Theano in Python. GPUs differ from CPUs in that they are optimized for throughput instead of latency. Here’s a metaphor: when you play an online game, you want fast response times, or low latency. Otherwise you get lag. However when you’re downloading a movie, you don’t care about response times, you care about bandwidth - that’s throughput. Massive computations are similar to downloading a movie in this respect. The setup: We’ll be testing things on a platform with an Intel Dual Core CPU @ 3 GHz and either GeForce 9600 GT, an old card, or GeForce GTX 550 Ti, a more recent card. See the appendix for more info on GPU hardware. Software we’ll be using is Python. On CPU it employs one core*. That is OK in everyday use because while one core is working hard, you can comfortably do something else becau
4 0.31858897 9 fast ml-2012-10-25-So you want to work for Facebook
Introduction: Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent. They won’t pay, but just might interview. This post is in a way a bonus for active readers because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work. To honor this, we won’t publish any code. But own work doesn’t mean original work, and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there are no money prizes, as mentioned above; it’s not a real world problem, but rather an assignment to screen job candidates (this has important consequences, described below); it’s not a typical machine learning project, but rather a broader AI exercise. You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a
5 0.30568099 4 fast ml-2012-09-17-Best Buy mobile contest
6 0.29697648 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
7 0.29408962 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
8 0.29322591 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
9 0.27641654 43 fast ml-2013-11-02-Maxing out the digits
10 0.26102489 27 fast ml-2013-05-01-Deep learning made easy
11 0.25877962 16 fast ml-2013-01-12-Intro to random forests
12 0.25044912 25 fast ml-2013-04-10-Gender discrimination
13 0.24885313 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision
14 0.24518722 20 fast ml-2013-02-18-Predicting advertised salaries
15 0.24476802 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
16 0.24424778 61 fast ml-2014-05-08-Impute missing values with Amelia
17 0.23620021 26 fast ml-2013-04-17-Regression as classification
18 0.23393092 58 fast ml-2014-04-12-Deep learning these days
19 0.23293488 18 fast ml-2013-01-17-A very fast denoising autoencoder
20 0.23233458 13 fast ml-2012-12-27-Spearmint with a random forest