fast_ml fast_ml-2012 fast_ml-2012-3 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are the fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish
sentIndex sentText sentNum sentScore
1 When it comes to machine learning, most software seems to be in either Python, Matlab or R. [sent-1, score-0.073]
2 So how do you run them on your computer if you have Windows installed? [sent-5, score-0.171]
3 Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. [sent-6, score-0.134]
4 The added thrill was, if something went wrong, your computer wouldn’t boot. [sent-7, score-0.19]
5 You just run Linux inside Windows, using what’s called a virtual machine. [sent-9, score-0.489]
6 You need virtualization software and a machine image to do so. [sent-10, score-0.131]
7 There is also VirtualBox - it is able to run VMware images. [sent-12, score-0.144]
8 We have experience with VMware mostly, so this is what we’ll refer to. [sent-13, score-0.161]
9 There are also many images available, of various flavours of Linux and other operating systems. [sent-15, score-0.202]
10 In fact, you can run Windows inside Linux if you wish to. [sent-16, score-0.395]
11 But we want to run Linux inside Windows, because we are used to Windows and only want to run Unix apps. [sent-17, score-0.53]
12 When you run a virtual machine, you want to be able to copy files between the host (Windows) and guest (Linux), and also use copy-and-paste between them. [sent-18, score-0.908]
13 Otherwise, you install them in a guest environment (Linux in this case). [sent-21, score-0.345]
14 Play the image in a player and you have Linux on your computer. [sent-23, score-0.26]
15 The apps are mostly command-line, so you just open a terminal, download or copy some files, and proceed. [sent-24, score-0.495]
16 If you don’t know the Unix command line, it’s relatively easy to learn and much more powerful than the Windows command line. [sent-25, score-0.185]
17 You’ll need just basic commands like cd, ls, mv, cp etc. [sent-26, score-0.054]
18 VMware has a feature called Unity - it allows you to have guest apps directly on host desktop, out of the player’s window. [sent-28, score-0.702]
19 It’s an amazing experience to have a Unix terminal on a Windows taskbar. [sent-29, score-0.346]
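The basic commands mentioned in sentence 17 can be tried in a guest terminal roughly as follows. This is only a minimal sketch: the scratch directory and the file names (data.csv, backup.csv, models) are made up for illustration and do not come from the original post.

```shell
# Hypothetical scratch directory; all names below are illustrative only.
work="$(mktemp -d)"
cd "$work"                       # cd: change the current directory
printf 'a,b\n1,2\n' > data.csv   # create a small file to work with
ls                               # ls: list the directory contents
cp data.csv backup.csv           # cp: copy a file
mkdir models                     # make a subdirectory to move things into
mv data.csv models/              # mv: move (or rename) a file
ls models                        # the moved file now shows up here
```

Working in a throwaway directory keeps the experiment away from any real files in the guest.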
wordName wordTfidf (topN-words)
[('linux', 0.483), ('vmware', 0.404), ('windows', 0.286), ('guest', 0.242), ('inside', 0.242), ('player', 0.202), ('apps', 0.178), ('download', 0.161), ('host', 0.161), ('terminal', 0.161), ('virtual', 0.161), ('unix', 0.16), ('installed', 0.134), ('experience', 0.118), ('plus', 0.107), ('copy', 0.098), ('run', 0.086), ('computer', 0.085), ('images', 0.076), ('software', 0.073), ('command', 0.068), ('wrong', 0.067), ('amazing', 0.067), ('compiled', 0.067), ('directly', 0.067), ('flavours', 0.067), ('ls', 0.067), ('native', 0.067), ('wish', 0.067), ('added', 0.059), ('desktop', 0.059), ('downloading', 0.059), ('operating', 0.059), ('play', 0.059), ('able', 0.058), ('image', 0.058), ('mostly', 0.058), ('want', 0.058), ('allows', 0.054), ('cd', 0.054), ('commands', 0.054), ('environment', 0.054), ('wouldn', 0.049), ('install', 0.049), ('powerful', 0.049), ('day', 0.046), ('free', 0.046), ('went', 0.046), ('files', 0.044), ('refer', 0.043)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 3 fast ml-2012-09-01-Running Unix apps on Windows
Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are the fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish
2 0.082322001 56 fast ml-2014-03-31-If you use R, you may want RStudio
Introduction: RStudio is an IDE for R. It gives the language a bit of a slickness factor it so badly needs. The nice thing about the software, besides good looks, is that it integrates the console, help pages, plots and an editor (if you want it) in one place. For example, instead of switching for help to a web browser, you have it right next to the console. The same thing with plots: in place of the usual overlapping windows, all plots go to one pane where you can navigate back and forth between them with arrows. You can save a plot as an image or as a PDF. While saving as an image you are presented with a preview and you get to choose the image format and size. That’s the kind of detail that shows how RStudio makes working with R better. You can have up to four panes on the screen: console; source / data; plots / help / files; history and workspace. Arrange them in any way you like. The source pane has a run button, so you can execute the current line of code in the console, or selec
3 0.073311172 33 fast ml-2013-07-09-Introducing phraug
Introduction: Recently we proposed to pre-process large files line by line. Now it’s time to introduce phraug *, a set of Python scripts based on this idea. The scripts mostly deal with format conversion (CSV, libsvm, VW) and with a few other tasks common in machine learning. With phraug you currently can convert from one format to another: csv to libsvm; csv to Vowpal Wabbit; libsvm to csv; libsvm to Vowpal Wabbit; tsv to csv. And perform some other file operations: count lines in a file; sample lines from a file; split a file into two randomly; split a file into a number of similarly sized chunks; save a continuous subset of lines from a file (for example, the first 100); delete specified columns from a csv file; normalize (shift and scale) columns in a csv file. Basically, there’s always at least one input file and usually one or more output files. An input file always stays unchanged. If you’re familiar with Unix, you may notice that some of these tasks are easily ach
4 0.071655646 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition
Introduction: Out of 215 contestants, we placed 8th in the Cats and Dogs competition at Kaggle. The top ten finish gave us the master badge. The competition was about discerning the animals in images and here’s how we did it. We extracted the features using pre-trained deep convolutional networks, specifically decaf and OverFeat . Then we trained some classifiers on these features. The whole thing was inspired by Kyle Kastner’s decaf + pylearn2 combo and we expanded this idea. The classifiers were linear models from scikit-learn and a neural network from Pylearn2 . At the end we created a voting ensemble of the individual models. OverFeat features We touched on OverFeat in Classifying images with a pre-trained deep network . A better way to use it in this competition’s context is to extract the features from the layer before the classifier, as Pierre Sermanet suggested in the comments. Concretely, in the larger OverFeat model ( -l ) layer 24 is the softmax, at least in the
5 0.069694147 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
Introduction: Object recognition in images is where deep learning, and specifically convolutional neural networks, are often applied and benchmarked these days. To get a piece of the action, we’ll be using Alex Krizhevsky’s cuda-convnet , a shining diamond of machine learning software, in a Kaggle competition. Continuing to run things on a GPU, we turn to applying convolutional neural networks for object recognition. This kind of network was developed by Yann LeCun and it’s powerful, but a bit complicated: Image credit: EBLearn tutorial A typical convolutional network has two parts. The first is responsible for feature extraction and consists of one or more pairs of convolution and subsampling/max-pooling layers, as you can see above. The second part is just a classic fully-connected multilayer perceptron taking extracted features as input. For a detailed explanation of all this see unit 9 in Hugo LaRochelle’s neural networks course . Daniel Nouri has an interesting story about
6 0.065045699 44 fast ml-2013-11-18-CUDA on a Linux laptop
7 0.054361328 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively
8 0.041637242 32 fast ml-2013-07-05-Processing large files, line by line
9 0.0414006 34 fast ml-2013-07-14-Running things on a GPU
10 0.041016258 37 fast ml-2013-09-03-Our followers and who else they follow
11 0.040937897 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
12 0.038518257 20 fast ml-2013-02-18-Predicting advertised salaries
13 0.038355298 40 fast ml-2013-10-06-Pylearn2 in practice
14 0.036096536 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
15 0.027905341 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn
16 0.026999788 25 fast ml-2013-04-10-Gender discrimination
17 0.026151389 39 fast ml-2013-09-19-What you wanted to know about AUC
18 0.022591468 54 fast ml-2014-03-06-PyBrain - a simple neural networks library in Python
19 0.022587558 17 fast ml-2013-01-14-Feature selection in practice
20 0.021897454 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
topicId topicWeight
[(0, 0.109), (1, -0.019), (2, 0.069), (3, -0.006), (4, -0.224), (5, 0.029), (6, -0.16), (7, 0.076), (8, 0.18), (9, -0.067), (10, 0.118), (11, -0.078), (12, 0.21), (13, 0.285), (14, -0.078), (15, 0.081), (16, 0.126), (17, -0.139), (18, 0.114), (19, -0.004), (20, -0.043), (21, 0.112), (22, 0.249), (23, 0.001), (24, -0.061), (25, 0.102), (26, 0.196), (27, -0.116), (28, -0.099), (29, -0.328), (30, -0.344), (31, 0.08), (32, 0.21), (33, 0.212), (34, -0.298), (35, -0.093), (36, -0.014), (37, -0.062), (38, 0.01), (39, -0.038), (40, -0.171), (41, -0.024), (42, -0.024), (43, -0.034), (44, -0.087), (45, 0.131), (46, 0.045), (47, 0.071), (48, -0.019), (49, 0.084)]
simIndex simValue blogId blogTitle
same-blog 1 0.99231148 3 fast ml-2012-09-01-Running Unix apps on Windows
Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are the fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish
2 0.13942011 56 fast ml-2014-03-31-If you use R, you may want RStudio
Introduction: RStudio is an IDE for R. It gives the language a bit of a slickness factor it so badly needs. The nice thing about the software, besides good looks, is that it integrates the console, help pages, plots and an editor (if you want it) in one place. For example, instead of switching for help to a web browser, you have it right next to the console. The same thing with plots: in place of the usual overlapping windows, all plots go to one pane where you can navigate back and forth between them with arrows. You can save a plot as an image or as a PDF. While saving as an image you are presented with a preview and you get to choose the image format and size. That’s the kind of detail that shows how RStudio makes working with R better. You can have up to four panes on the screen: console; source / data; plots / help / files; history and workspace. Arrange them in any way you like. The source pane has a run button, so you can execute the current line of code in the console, or selec
3 0.1333973 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition
Introduction: Out of 215 contestants, we placed 8th in the Cats and Dogs competition at Kaggle. The top ten finish gave us the master badge. The competition was about discerning the animals in images and here’s how we did it. We extracted the features using pre-trained deep convolutional networks, specifically decaf and OverFeat . Then we trained some classifiers on these features. The whole thing was inspired by Kyle Kastner’s decaf + pylearn2 combo and we expanded this idea. The classifiers were linear models from scikit-learn and a neural network from Pylearn2 . At the end we created a voting ensemble of the individual models. OverFeat features We touched on OverFeat in Classifying images with a pre-trained deep network . A better way to use it in this competition’s context is to extract the features from the layer before the classifier, as Pierre Sermanet suggested in the comments. Concretely, in the larger OverFeat model ( -l ) layer 24 is the softmax, at least in the
4 0.11582585 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
Introduction: Object recognition in images is where deep learning, and specifically convolutional neural networks, are often applied and benchmarked these days. To get a piece of the action, we’ll be using Alex Krizhevsky’s cuda-convnet , a shining diamond of machine learning software, in a Kaggle competition. Continuing to run things on a GPU, we turn to applying convolutional neural networks for object recognition. This kind of network was developed by Yann LeCun and it’s powerful, but a bit complicated: Image credit: EBLearn tutorial A typical convolutional network has two parts. The first is responsible for feature extraction and consists of one or more pairs of convolution and subsampling/max-pooling layers, as you can see above. The second part is just a classic fully-connected multilayer perceptron taking extracted features as input. For a detailed explanation of all this see unit 9 in Hugo LaRochelle’s neural networks course . Daniel Nouri has an interesting story about
5 0.10732578 33 fast ml-2013-07-09-Introducing phraug
Introduction: Recently we proposed to pre-process large files line by line. Now it’s time to introduce phraug *, a set of Python scripts based on this idea. The scripts mostly deal with format conversion (CSV, libsvm, VW) and with a few other tasks common in machine learning. With phraug you currently can convert from one format to another: csv to libsvm; csv to Vowpal Wabbit; libsvm to csv; libsvm to Vowpal Wabbit; tsv to csv. And perform some other file operations: count lines in a file; sample lines from a file; split a file into two randomly; split a file into a number of similarly sized chunks; save a continuous subset of lines from a file (for example, the first 100); delete specified columns from a csv file; normalize (shift and scale) columns in a csv file. Basically, there’s always at least one input file and usually one or more output files. An input file always stays unchanged. If you’re familiar with Unix, you may notice that some of these tasks are easily ach
6 0.10489228 44 fast ml-2013-11-18-CUDA on a Linux laptop
7 0.079793237 37 fast ml-2013-09-03-Our followers and who else they follow
8 0.079430796 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively
9 0.078079551 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
10 0.073044091 34 fast ml-2013-07-14-Running things on a GPU
11 0.071423888 20 fast ml-2013-02-18-Predicting advertised salaries
12 0.07120043 40 fast ml-2013-10-06-Pylearn2 in practice
13 0.056688305 25 fast ml-2013-04-10-Gender discrimination
14 0.05651015 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
15 0.051610403 28 fast ml-2013-05-12-And deliver us from Weka
16 0.050028764 8 fast ml-2012-10-15-Merck challenge
17 0.049515016 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn
18 0.047308281 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
19 0.046694502 17 fast ml-2013-01-14-Feature selection in practice
20 0.046245277 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
topicId topicWeight
[(8, 0.612), (26, 0.03), (31, 0.014), (35, 0.038), (55, 0.06), (69, 0.064), (71, 0.015), (84, 0.028), (99, 0.011)]
simIndex simValue blogId blogTitle
same-blog 1 0.93795377 3 fast ml-2012-09-01-Running Unix apps on Windows
Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are the fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish
2 0.12918322 32 fast ml-2013-07-05-Processing large files, line by line
Introduction: Perhaps the most common format of data for machine learning is text files. Often data is too large to fit in memory; this is sometimes referred to as big data. But do you need to load the whole data into memory? Maybe you could at least pre-process it line by line. We show how to do this with Python. Prepare to read and possibly write some code. The most common format for text files is probably CSV. For sparse data, libsvm format is popular. Both can be processed using csv module in Python. import csv i_f = open( input_file, 'r' ) reader = csv.reader( i_f ) For libsvm you just set the delimiter to space: reader = csv.reader( i_f, delimiter = ' ' ) Then you go over the file contents. Each line is a list of strings: for line in reader: # do something with the line, for example: label = float( line[0] ) # .... writer.writerow( line ) If you need to do a second pass, you just rewind the input file: i_f.seek( 0 ) for line in re
3 0.12626931 9 fast ml-2012-10-25-So you want to work for Facebook
Introduction: Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent. They won’t pay, but just might interview. This post is in a way a bonus for active readers because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work. To honor this, we won’t publish any code. But own work doesn’t mean original work, and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there are no money prizes, as mentioned above; it’s not a real world problem, but rather an assignment to screen job candidates (this has important consequences, described below); it’s not a typical machine learning project, but rather a broader AI exercise. You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a
4 0.12335549 20 fast ml-2013-02-18-Predicting advertised salaries
Introduction: We’re back to Kaggle competitions. This time we will attempt to predict advertised salaries from job ads and of course beat the benchmark. The benchmark is, as usual, a random forest result. For starters, we’ll use a linear model without much preprocessing. Will it be enough? Congratulations! You have spotted the ceiling cat. A linear model better than a random forest - how so? Well, to train a random forest on data this big, the benchmark code extracts only the 100 most common words as features, and we will use all of them. This approach is similar to the one we applied in the Merck challenge. More data beats a cleverer algorithm, especially when a cleverer algorithm is unable to handle all of the data (on your machine, anyway). The competition is about predicting salaries from job adverts. Of course the figures usually appear in the text, so they were removed. The error metric is mean absolute error (MAE) - how refreshing to see such an intuitive one. The data for Job salary prediction con
5 0.12227575 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition
Introduction: Out of 215 contestants, we placed 8th in the Cats and Dogs competition at Kaggle. The top ten finish gave us the master badge. The competition was about discerning the animals in images and here’s how we did it. We extracted the features using pre-trained deep convolutional networks, specifically decaf and OverFeat . Then we trained some classifiers on these features. The whole thing was inspired by Kyle Kastner’s decaf + pylearn2 combo and we expanded this idea. The classifiers were linear models from scikit-learn and a neural network from Pylearn2 . At the end we created a voting ensemble of the individual models. OverFeat features We touched on OverFeat in Classifying images with a pre-trained deep network . A better way to use it in this competition’s context is to extract the features from the layer before the classifier, as Pierre Sermanet suggested in the comments. Concretely, in the larger OverFeat model ( -l ) layer 24 is the softmax, at least in the
6 0.12152379 33 fast ml-2013-07-09-Introducing phraug
7 0.12051222 40 fast ml-2013-10-06-Pylearn2 in practice
8 0.12018503 56 fast ml-2014-03-31-If you use R, you may want RStudio
9 0.11993629 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
10 0.11904623 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
11 0.11884025 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
12 0.11764571 17 fast ml-2013-01-14-Feature selection in practice
13 0.11736337 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
14 0.11733343 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
15 0.11716962 13 fast ml-2012-12-27-Spearmint with a random forest
16 0.11581259 25 fast ml-2013-04-10-Gender discrimination
17 0.1135931 27 fast ml-2013-05-01-Deep learning made easy
18 0.11337547 11 fast ml-2012-12-07-Predicting wine quality
19 0.11312464 18 fast ml-2013-01-17-A very fast denoising autoencoder
20 0.11212978 19 fast ml-2013-02-07-The secret of the big guys