fast_ml fast_ml-2014 fast_ml-2014-56 knowledge-graph by maker-knowledge-mining

56 fast ml-2014-03-31-If you use R, you may want RStudio


meta info for this blog

Source: html

Introduction: RStudio is an IDE for R. It gives the language a bit of a slickness factor it so badly needs. The nice thing about the software, besides good looks, is that it integrates the console, help pages, plots and editor (if you want it) in one place. For example, instead of switching to a web browser for help, you have it right next to the console. The same goes for plots: in place of the usual overlapping windows, all plots go to one pane where you can navigate back and forth between them with arrows. You can save a plot as an image or as a PDF. While saving as an image you are presented with a preview and you get to choose the image format and size. That’s the kind of detail that shows how RStudio makes working with R better. You can have up to four panes on the screen: console; source / data; plots / help / files; history and workspace. Arrange them in any way you like. The source pane has a run button, so you can execute the current line of code in the console, or select a few lines and execute the whole region.
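To make the plot export concrete, here is a minimal base R sketch of roughly what the Plots pane's export dialog does for you; the plot, file names and sizes are made up for illustration:

# Draw a plot; in RStudio it shows up in the Plots pane.
x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "sin(x)")

# Roughly what "Save as PDF" amounts to: redraw the plot into a PDF device.
pdf("sine.pdf", width = 7, height = 5)   # size in inches
plot(x, sin(x), type = "l", main = "sin(x)")
dev.off()

# Roughly what "Save as Image" amounts to, with format and size chosen explicitly.
png("sine.png", width = 800, height = 600)   # size in pixels
plot(x, sin(x), type = "l", main = "sin(x)")
dev.off()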


Summary: the most important sentences, generated by a tfidf model (a sketch of how such scoring might work follows the list below)

sentIndex sentText sentNum sentScore

1 It gives the language a bit of a slickness factor it so badly needs. [sent-2, score-0.202]

2 The nice thing about the software, besides good looks, is that it integrates the console, help pages, plots and editor (if you want it) in one place. [sent-3, score-0.715]

3 For example, instead of switching to a web browser for help, you have it right next to the console. [sent-4, score-0.328]

4 The same thing with plots: in place of the usual overlapping windows, all plots go to one pane where you can navigate back and forth between them with arrows. [sent-5, score-0.844]

5 While saving as an image you are presented with a preview and you get to choose the image format and size. [sent-7, score-0.438]

6 That’s the kind of detail that shows how RStudio makes working with R better. [sent-8, score-0.124]

7 You can have up to four panes on the screen: console; source / data; plots / help / files; history and workspace. Arrange them in any way you like. [sent-9, score-1.507]

8 The source pane has a run button, so you can execute the current line of code in the console, or select a few lines and execute the whole region. [sent-10, score-1.098]

9 There’s an R script for each unit and Trevor walks you through it in a video. [sent-12, score-0.06]

10 Instead of typing each command (which has its advantages) or copy-and-pasting, you just click once. [sent-13, score-0.072]

11 The workspace shows data, variable values and functions. [sent-14, score-0.272]

12 The history pane is somewhat underpowered - for example, one can’t select a few lines (or even just one) and copy them to the clipboard. [sent-16, score-0.867]

13 One can send them to the console and to the source pane, though. [sent-17, score-0.607]

14 RStudio also has a few nice color schemes, particularly with dark backgrounds. [sent-18, score-0.371]

15 One shortcoming is that the dark ones don’t mix that well with the white background of the history, help and plots panes. [sent-19, score-0.557]

16 The software is available for Linux, Mac and Windows. [sent-20, score-0.059]
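As mentioned above, here is a guess at how such sentence scores could be produced: a minimal base R sketch, assuming each sentence is scored by the summed tf-idf weight of its words. The toy sentences are stand-ins for the post's text; this is not the actual pipeline behind this page.

# Toy "corpus": each element stands in for one sentence of the post.
sentences <- c("rstudio integrates console help pages plots and editor",
               "all plots go to one pane",
               "the software is available for linux mac and windows")

tokens <- strsplit(tolower(sentences), "\\s+")
vocab  <- unique(unlist(tokens))

# Document frequency of each word over the sentences, and its idf.
df  <- sapply(vocab, function(w) sum(sapply(tokens, function(s) w %in% s)))
idf <- log(length(sentences) / df)

# Score a sentence by the sum of term frequency * idf of its words.
scores <- sapply(tokens, function(s) { tf <- table(s); sum(tf * idf[names(tf)]) })

# Highest-scoring sentences first, as in the summary above.
sentences[order(scores, decreasing = TRUE)]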


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('console', 0.394), ('pane', 0.394), ('plots', 0.394), ('rstudio', 0.295), ('history', 0.217), ('workspace', 0.197), ('help', 0.174), ('execute', 0.164), ('dark', 0.164), ('source', 0.131), ('image', 0.106), ('select', 0.104), ('lines', 0.092), ('windows', 0.087), ('nice', 0.082), ('switching', 0.082), ('hastie', 0.082), ('trevor', 0.082), ('advantages', 0.082), ('presented', 0.082), ('button', 0.082), ('background', 0.082), ('mac', 0.082), ('send', 0.082), ('view', 0.082), ('shows', 0.075), ('click', 0.072), ('mix', 0.072), ('preview', 0.072), ('pages', 0.072), ('badly', 0.072), ('saving', 0.072), ('browser', 0.072), ('web', 0.072), ('linux', 0.065), ('screen', 0.065), ('gives', 0.065), ('factor', 0.065), ('white', 0.065), ('statistical', 0.065), ('color', 0.065), ('editor', 0.065), ('copy', 0.06), ('particularly', 0.06), ('unit', 0.06), ('software', 0.059), ('place', 0.056), ('save', 0.056), ('current', 0.049), ('kind', 0.049)]
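The simValue numbers in the lists below are presumably similarities between this post's tfidf vector (above) and the other posts' vectors. A sketch of cosine similarity over two such vectors, with made-up weights:

# Two hypothetical tfidf vectors over a shared vocabulary.
a <- c(console = 0.394, pane = 0.394, plots = 0.394, rstudio = 0.295, linux = 0.065)
b <- c(console = 0.000, pane = 0.000, plots = 0.000, rstudio = 0.000, linux = 0.210)

cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
cosine(a, b)   # small value: the posts share little vocabulary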

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 56 fast ml-2014-03-31-If you use R, you may want RStudio


2 0.082322001 3 fast ml-2012-09-01-Running Unix apps on Windows

Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish

3 0.072947845 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively

Introduction: IPython is known for the notebooks. But the first thing they list on their homepage is a “powerful interactive shell”. And that’s true - if you use Python interactively, you’ll dig IPython. Here are a few features that make a difference for us: command line history and autocomplete These are basic shell features you come to expect and rely upon after a first contact with a Unix shell. The standard Python interpreter feels somewhat impoverished without them. %paste Normally when you paste indented code, it won’t work. You can fix it with this magic command. other %magic commands Actually, you can skip the % : paste , run some_script.py , pwd , cd some_dir shell commands with ! For example !ls -las . Alas, aliases from your system shell don’t work. looks good Red and green add a nice touch, especially in a window with dark background. If you have a light background, use %colors LightBG Share your favourite features

4 0.052914634 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

Introduction: Object recognition in images is where deep learning, and specifically convolutional neural networks, are often applied and benchmarked these days. To get a piece of the action, we’ll be using Alex Krizhevsky’s cuda-convnet , a shining diamond of machine learning software, in a Kaggle competition. Continuing to run things on a GPU, we turn to applying convolutional neural networks for object recognition. This kind of network was developed by Yann LeCun and it’s powerful, but a bit complicated: Image credit: EBLearn tutorial A typical convolutional network has two parts. The first is responsible for feature extraction and consists of one or more pairs of convolution and subsampling/max-pooling layers, as you can see above. The second part is just a classic fully-connected multilayer perceptron taking extracted features as input. For a detailed explanation of all this see unit 9 in Hugo LaRochelle’s neural networks course . Daniel Nouri has an interesting story about

5 0.049751848 33 fast ml-2013-07-09-Introducing phraug

Introduction: Recently we proposed to pre-process large files line by line. Now it’s time to introduce phraug *, a set of Python scripts based on this idea. The scripts mostly deal with format conversion (CSV, libsvm, VW) and with a few other tasks common in machine learning. With phraug you currently can convert from one format to another: csv to libsvm, csv to Vowpal Wabbit, libsvm to csv, libsvm to Vowpal Wabbit, tsv to csv. And perform some other file operations: count lines in a file; sample lines from a file; split a file into two randomly; split a file into a number of similarly sized chunks; save a continuous subset of lines from a file (for example, the first 100); delete specified columns from a csv file; normalize (shift and scale) columns in a csv file. Basically, there’s always at least one input file and usually one or more output files. An input file always stays unchanged. If you’re familiar with Unix, you may notice that some of these tasks are easily ach

6 0.045181118 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

7 0.04390309 17 fast ml-2013-01-14-Feature selection in practice

8 0.041029673 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

9 0.037388168 47 fast ml-2013-12-15-A-B testing with bayesian bandits in Google Analytics

10 0.036580253 15 fast ml-2013-01-07-Machine learning courses online

11 0.034224302 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

12 0.033995584 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

13 0.032985132 20 fast ml-2013-02-18-Predicting advertised salaries

14 0.031933486 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

15 0.031901091 44 fast ml-2013-11-18-CUDA on a Linux laptop

16 0.030259442 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial

17 0.02964719 41 fast ml-2013-10-09-Big data made easy

18 0.028547259 25 fast ml-2013-04-10-Gender discrimination

19 0.028281126 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

20 0.028034175 27 fast ml-2013-05-01-Deep learning made easy


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.112), (1, -0.044), (2, 0.044), (3, -0.008), (4, -0.114), (5, 0.066), (6, -0.128), (7, 0.104), (8, 0.208), (9, 0.003), (10, 0.252), (11, -0.042), (12, 0.164), (13, 0.294), (14, -0.217), (15, 0.089), (16, 0.119), (17, -0.229), (18, 0.087), (19, -0.002), (20, -0.015), (21, -0.021), (22, 0.078), (23, 0.034), (24, 0.043), (25, 0.071), (26, -0.133), (27, 0.063), (28, 0.319), (29, -0.156), (30, 0.335), (31, -0.232), (32, -0.014), (33, 0.083), (34, 0.329), (35, -0.118), (36, -0.079), (37, 0.206), (38, -0.077), (39, 0.145), (40, 0.122), (41, -0.02), (42, -0.012), (43, 0.122), (44, 0.131), (45, -0.042), (46, -0.011), (47, -0.014), (48, 0.002), (49, 0.02)]
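A common way to get LSI topic weights like the fifty above is a truncated SVD of the term-document matrix. A minimal base R sketch under that assumption, using a random stand-in matrix rather than the real corpus:

set.seed(1)
tdm <- matrix(rpois(200 * 60, lambda = 0.3), nrow = 200)   # 200 terms x 60 posts, stand-in counts

k   <- 50                      # number of LSI dimensions, matching the list above
dec <- svd(tdm, nu = k, nv = k)

# Columns of dec$v are posts; scale by the singular values to get topic weights.
post_topics <- t(dec$v) * dec$d[1:k]
round(post_topics[, 1], 3)     # the k topic weights for the first post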

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98879981 56 fast ml-2014-03-31-If you use R, you may want RStudio


2 0.1257717 3 fast ml-2012-09-01-Running Unix apps on Windows

Introduction: When it comes to machine learning, most software seems to be in either Python, Matlab or R. Plus native apps, that is, compiled C/C++. These are fastest. Most of them are written for Unix environments, for example Linux or MacOS. So how do you run them on your computer if you have Windows installed? Back in the day, you re-partitioned your hard drive and installed Linux alongside Windows. The added thrill was, if something went wrong, your computer wouldn’t boot. Now it’s easier. You just run Linux inside Windows, using what’s called a virtual machine. You need virtualization software and a machine image to do so. The most popular software seems to be VMware. There is also VirtualBox - it is able to run VMware images. We have experience with VMware mostly, so this is what we’ll refer to. VMware Player is free to download and use. There are also many images available, of various flavours of Linux and other operating systems. In fact, you can run Windows inside Linux if you wish

3 0.089845784 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

Introduction: Much of data in machine learning is sparse, that is mostly zeros, and often binary. The phenomenon may result from converting categorical variables to one-hot vectors, and from converting text to bag-of-words representation. If each feature is binary - either zero or one - then it holds exactly one bit of information. Surely we could somehow compress such data to fewer real numbers. To do this, we turn to topic models, an area of research with roots in natural language processing. In NLP, a training set is called a corpus, and each document is like a row in the set. A document might be three pages of text, or just a few words, as in a tweet. The idea of topic modelling is that you can group words in your corpus into relatively few topics and represent each document as a mixture of these topics. It’s attractive because you can interpret the model by looking at words that form the topics. Sometimes they seem meaningful, sometimes not. A meaningful topic might be, for example: “cric

4 0.089018978 51 fast ml-2014-01-25-Why IPy: reasons for using IPython interactively

Introduction: IPython is known for the notebooks. But the first thing they list on their homepage is a “powerful interactive shell”. And that’s true - if you use Python interactively, you’ll dig IPython. Here are a few features that make a difference for us: command line history and autocomplete These are basic shell features you come to expect and rely upon after a first contact with a Unix shell. The standard Python interpreter feels somewhat impoverished without them. %paste Normally when you paste indented code, it won’t work. You can fix it with this magic command. other %magic commands Actually, you can skip the % : paste , run some_script.py , pwd , cd some_dir shell commands with ! For example !ls -las . Alas, aliases from your system shell don’t work. looks good Red and green add a nice touch, especially in a window with dark background. If you have a light background, use %colors LightBG Share your favourite features

5 0.088289857 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

Introduction: Object recognition in images is where deep learning, and specifically convolutional neural networks, are often applied and benchmarked these days. To get a piece of the action, we’ll be using Alex Krizhevsky’s cuda-convnet , a shining diamond of machine learning software, in a Kaggle competition. Continuing to run things on a GPU, we turn to applying convolutional neural networks for object recognition. This kind of network was developed by Yann LeCun and it’s powerful, but a bit complicated: Image credit: EBLearn tutorial A typical convolutional network has two parts. The first is responsible for feature extraction and consists of one or more pairs of convolution and subsampling/max-pooling layers, as you can see above. The second part is just a classic fully-connected multilayer perceptron taking extracted features as input. For a detailed explanation of all this see unit 9 in Hugo LaRochelle’s neural networks course . Daniel Nouri has an interesting story about

6 0.086830363 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

7 0.078390047 27 fast ml-2013-05-01-Deep learning made easy

8 0.077164859 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

9 0.072616145 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

10 0.067310251 17 fast ml-2013-01-14-Feature selection in practice

11 0.067151234 33 fast ml-2013-07-09-Introducing phraug

12 0.067142427 15 fast ml-2013-01-07-Machine learning courses online

13 0.062215682 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial

14 0.059290744 25 fast ml-2013-04-10-Gender discrimination

15 0.058722299 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

16 0.058029357 20 fast ml-2013-02-18-Predicting advertised salaries

17 0.057500184 52 fast ml-2014-02-02-Yesterday a kaggler, today a Kaggle master: a wrap-up of the cats and dogs competition

18 0.056356169 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

19 0.054689832 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

20 0.053663597 2 fast ml-2012-08-27-Kaggle job recommendation challenge


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.025), (26, 0.035), (55, 0.057), (69, 0.064), (71, 0.016), (91, 0.649), (99, 0.02)]
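The lda weights above are per-document topic proportions. A sketch with the topicmodels package (an assumption about the tooling, not necessarily what generated this page), again on a synthetic document-term matrix:

library(topicmodels)   # assumed installed; not part of base R

set.seed(1)
dtm <- matrix(rpois(60 * 200, lambda = 0.3), nrow = 60)   # 60 posts x 200 terms, integer counts

fit <- LDA(dtm, k = 10, control = list(seed = 1))   # k chosen arbitrarily for the example

# Topic proportions for the first post, analogous to the (topicId, topicWeight) pairs above.
round(posterior(fit)$topics[1, ], 3)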

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91031152 56 fast ml-2014-03-31-If you use R, you may want RStudio


2 0.12366946 32 fast ml-2013-07-05-Processing large files, line by line

Introduction: Perhaps the most common format of data for machine learning is text files. Often data is too large to fit in memory; this is sometimes referred to as big data. But do you need to load the whole data into memory? Maybe you could at least pre-process it line by line. We show how to do this with Python. Prepare to read and possibly write some code. The most common format for text files is probably CSV. For sparse data, libsvm format is popular. Both can be processed using csv module in Python. import csv i_f = open( input_file, 'r' ) reader = csv.reader( i_f ) For libsvm you just set the delimiter to space: reader = csv.reader( i_f, delimiter = ' ' ) Then you go over the file contents. Each line is a list of strings: for line in reader: # do something with the line, for example: label = float( line[0] ) # .... writer.writerow( line ) If you need to do a second pass, you just rewind the input file: i_f.seek( 0 ) for line in re

3 0.12217782 7 fast ml-2012-10-05-Predicting closed questions on Stack Overflow

Introduction: This time we enter the Stack Overflow challenge , which is about predicting the status of a given question on SO. There are five possible statuses, so it’s a multi-class classification problem. We would prefer a tool able to perform multiclass classification by itself. It can be done by hand by constructing five datasets, each with binary labels (one class against all others), and then combining predictions, but it might be a bit tricky to get right - we tried. Fortunately, nice people at Yahoo, excuse us, Microsoft, recently released a new version of Vowpal Wabbit , and this new version supports multiclass classification. In case you’re wondering, Vowpal Wabbit is a fast linear learner. We like the “fast” part and “linear” is OK for dealing with lots of words, as in this contest. In any case, with more than three million data points it wouldn’t be that easy to train a kernel SVM, a neural net or what have you. VW, being a well-polished tool, has a few very convenient features.

4 0.1175337 20 fast ml-2013-02-18-Predicting advertised salaries

Introduction: We’re back to Kaggle competitions. This time we will attempt to predict advertised salaries from job ads and of course beat the benchmark. The benchmark is, as usual, a random forest result. For starters, we’ll use a linear model without much preprocessing. Will it be enough? Congratulations! You have spotted the ceiling cat. A linear model better than a random forest - how so? Well, to train a random forest on data this big, the benchmark code extracts only the 100 most common words as features, and we will use all of them. This approach is similar to the one we applied in the Merck challenge . More data beats a cleverer algorithm, especially when the cleverer algorithm is unable to handle all of the data (on your machine, anyway). The competition is about predicting salaries from job adverts. Of course the figures usually appear in the text, so they were removed. The error metric is mean absolute error (MAE) - how refreshing to see such an intuitive one. The data for Job salary prediction con

5 0.10982095 25 fast ml-2013-04-10-Gender discrimination

Introduction: There’s a contest at Kaggle held by Qatar University. They want to be able to discriminate men from women based on handwriting. For a thousand bucks, well, why not? Congratulations! You have spotted the ceiling cat. As Sashi noticed on the forums , it’s not difficult to improve on the benchmarks a little bit. In particular, he mentioned feature selection, normalizing the data and using a regularized linear model. Here’s our version of the story. Let’s start with normalizing. There’s a nice function for that in R, scale() . The dataset is small, 1128 examples, so we can go ahead and use R. Turns out that in its raw form, the data won’t scale. That’s because apparently there are some columns with zeros only. That makes it difficult to divide. Fortunately, we know just the right tool for the task. We learned about it on Kaggle forums too. It’s a function in caret package called nearZeroVar() . It will give you indexes of all the columns which have near zero var

6 0.10808226 33 fast ml-2013-07-09-Introducing phraug

7 0.10746779 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

8 0.10655983 17 fast ml-2013-01-14-Feature selection in practice

9 0.10623839 35 fast ml-2013-08-12-Accelerometer Biometric Competition

10 0.10617024 40 fast ml-2013-10-06-Pylearn2 in practice

11 0.10495853 13 fast ml-2012-12-27-Spearmint with a random forest

12 0.10465599 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

13 0.1041629 27 fast ml-2013-05-01-Deep learning made easy

14 0.10307463 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data

15 0.10235558 43 fast ml-2013-11-02-Maxing out the digits

16 0.10155788 1 fast ml-2012-08-09-What you wanted to know about Mean Average Precision

17 0.10115141 9 fast ml-2012-10-25-So you want to work for Facebook

18 0.10094336 14 fast ml-2013-01-04-Madelon: Spearmint's revenge

19 0.10082833 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

20 0.10069567 16 fast ml-2013-01-12-Intro to random forests