fast ml 2014-01-25 - Why IPy: reasons for using IPython interactively
Source: html (knowledge graph by maker-knowledge-mining)
Introduction: IPython is known for the notebooks. But the first thing they list on their homepage is a “powerful interactive shell”. And that’s true - if you use Python interactively, you’ll dig IPython. Here are a few features that make a difference for us:

Command line history and autocomplete. These are basic shell features you come to expect and rely upon after a first contact with a Unix shell. The standard Python interpreter feels somewhat impoverished without them.

%paste. Normally, when you paste indented code into the interpreter, it won’t run. This magic command fixes that.

Other %magic commands. Actually, you can skip the %: paste, run some_script.py, pwd, cd some_dir.

Shell commands with ! For example, !ls -las. Alas, aliases from your system shell don’t work.

Looks good. Red and green add a nice touch, especially in a window with a dark background. If you have a light background, use %colors LightBG.

Share your favourite features in the comments. For more, see Using IPython for interactive work.
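To make this concrete, here is a minimal sketch of a terminal IPython session exercising the features above. some_script.py and some_dir are just the placeholder names from the examples; everything else is a standard IPython magic.

    In [1]: %paste              # runs clipboard contents with indentation intact
    In [2]: run some_script.py  # %run without the % - automagic is on by default
    In [3]: pwd                 # magic version of the shell built-in
    In [4]: cd some_dir
    In [5]: !ls -las            # ! passes the line to the system shell
    In [6]: %colors LightBG     # switch the color scheme for a light background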
Similar posts from the same blog, ranked by content similarity:
1. If you use R, you may want RStudio (2014-03-31)
Introduction: RStudio is an IDE for R. It gives the language a bit of a slickness factor it so badly needs. The nice thing about the software, besides good looks, is that it integrates the console, help pages, plots and an editor (if you want it) in one place. For example, instead of switching to a web browser for help, you have it right next to the console. The same goes for plots: in place of the usual overlapping windows, all plots go to one pane where you can navigate back and forth between them with arrows. You can save a plot as an image or as a PDF. While saving as an image you are presented with a preview and you get to choose the image format and size. That’s the kind of detail that shows how RStudio makes working with R better. You can have up to four panes on the screen: console; source / data; plots / help / files; history and workspace. Arrange them in any way you like. The source pane has a run button, so you can execute the current line of code in the console.
2. Running Unix apps on Windows (2012-09-01)
Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R, plus native apps, that is, compiled C/C++. These are fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was that if something went wrong, your computer wouldn’t boot. Now it’s easier: you just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish.
3. Converting categorical data into numbers with Pandas and Scikit-learn (2014-04-30)
Introduction: Many machine learning tools will only accept numbers as input. This may be a problem if you want to use such a tool but your data includes categorical features. To represent them as numbers, one typically converts each categorical feature using “one-hot encoding”, that is, from a value like “BMW” or “Mercedes” to a vector of zeros and one 1. This functionality is available in some software libraries. We load data using Pandas, then convert categorical columns with DictVectorizer from scikit-learn. Pandas is a popular Python library inspired by data frames in R. It allows easier manipulation of tabular numeric and non-numeric data. Downsides: not very intuitive, somewhat steep learning curve. For any questions you may have, the Google + StackOverflow combo works well as a source of answers. UPDATE: Turns out that Pandas has a get_dummies() function which does what we’re after (a minimal sketch appears after this list). More on this in a while. We’ll use Pandas to load the data, do some cleaning and send it to scikit-learn.
4. CUDA on a Linux laptop (2013-11-18)
Introduction: After testing CUDA on a desktop, we now switch to a Linux laptop with 64-bit Xubuntu. Getting CUDA to work is harder here. Will the effort be worth the results? If you have a laptop with an Nvidia card, the machine probably uses it for 3D graphics and Intel’s built-in unit for everything else. This technology is known as Optimus, and it happens to make things anything but easy for running CUDA on Linux. The problem is with GPU drivers, specifically between Linux being open-source and Nvidia drivers being not. This strained relation at one time prompted Linus Torvalds to give Nvidia the finger with great passion. Installing GPU drivers: here’s a solution for the driver problem. You need a package called bumblebee. It makes the Nvidia card accessible to your apps. To install the drivers and bumblebee, try something along these lines:

    sudo apt-get install nvidia-current-updates
    sudo apt-get install bumblebee

Note that you don’t need the drivers that come with a specific CUDA release.
5. Kaggle job recommendation challenge (2012-08-27)
6. Feature selection in practice (2013-01-14)
7. Madelon: Spearmint's revenge (2013-01-04)
8. A very fast denoising autoencoder (2013-01-17)
9. Go non-linear with Vowpal Wabbit (2013-06-19)
10. Introducing phraug (2013-07-09)
11. Large scale L1 feature selection with Vowpal Wabbit (2013-03-18)
12. Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition (2014-02-02)
13. Regression as classification (2013-04-17)
14. Amazon aspires to automate access control (2013-06-01)
15. Object recognition in images with cuda-convnet (2013-11-27)
16. Spearmint with a random forest (2012-12-27)
17. Processing large files, line by line (2013-07-05)
18. A-B testing with bayesian bandits in Google Analytics (2013-12-15)
19. Predicting advertised salaries (2013-02-18)
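As promised in the third entry above, here is a minimal sketch of the get_dummies() approach that post mentions. The column names (make, price) and the toy data are invented for illustration; get_dummies() itself is a real pandas function.

    import pandas as pd

    # A toy frame with one categorical column; the values are made up.
    df = pd.DataFrame({'make': ['BMW', 'Mercedes', 'BMW'],
                       'price': [40, 55, 42]})

    # get_dummies() expands each categorical column into 0/1 indicator
    # columns (one-hot encoding); numeric columns pass through unchanged.
    encoded = pd.get_dummies(df, columns=['make'])
    print(encoded)  # columns: price, make_BMW, make_Mercedes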