fast_ml fast_ml-2013 fast_ml-2013-37 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
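The post doesn’t publish the code behind these tables, but the computation it describes is straightforward. Here is a minimal sketch of the counting step, assuming the follower profiles and each follower’s friend IDs have already been fetched from the Twitter API; every name in it (followers, friends_of) is illustrative, not taken from the original post.

from collections import Counter

# Assumed inputs, fetched from the Twitter API beforehand:
# followers  - one profile dict per follower of the blog
# friends_of - screen_name -> set of IDs of accounts that follower follows
followers = [
    {"screen_name": "pankaj", "name": "Pankaj Gupta", "followers_count": 8685},
    {"screen_name": "ogrisel", "name": "Olivier Grisel", "followers_count": 5070},
    # ... one entry per follower ...
]
friends_of = {
    "pankaj": {12, 34, 56},
    "ogrisel": {34, 56, 78},
    # ... one entry per follower ...
}

# Table 1: the most popular followers, ranked by their own follower count.
top_followers = sorted(followers, key=lambda f: f["followers_count"], reverse=True)[:20]

# Table 2: the most common friends of the followers. The percentage in the
# post reads as the share of followers who follow a given account.
counts = Counter(fid for ids in friends_of.values() for fid in ids)
n = len(friends_of)
for friend_id, c in counts.most_common(20):
    print(f"{100 * c / n:.0f}% of followers follow user {friend_id}")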
sentIndex sentText sentNum sentScore
1 To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. [sent-2, score-0.578]
2 Our followers: This table shows our 20 most popular followers as measured by their follower count. [sent-5, score-0.673]
3 Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. [sent-8, score-0.099]
4 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! [sent-10, score-0.247]
5 4582 thuske thuske & 4442 ram Ram Ravichandran Software Research Engineer at Twitter. [sent-11, score-0.28]
6 Data crunchist @opendns 2651 jtoy jtoy I study intelligence, ponder consciousness, build machine learning systems, program in clojure/R/ruby/python, and start companies. [sent-15, score-0.281]
7 co/iU3TYN84O8 2231 bscofield Ben Scofield RailsConf Program Chair; Director of Ruby Central; mostly a nice guy 1863 johncandido John Candido Developing predictive algorithms for financial markets and sports analytics. [sent-20, score-0.18]
8 ESPN Insider contributor 1849 arnicas Lynn Cherny Data Consultant (Data Analysis/Mining/Graphical Data); Infovis, Python, R, Stats, Javascript, Gender, Science Fiction, TV, ex-researcher with Stanford PhD. [sent-22, score-0.148]
9 #b2b #saas #marketing #stanford 1492 medicalphysics medicalphysics Serious Medical Physics and Fun! [sent-25, score-0.198]
10 1329 nikolaypavlov Nikolay Pavlov Dreamer, entrepreneur and simply curious man. [sent-27, score-0.099]
11 1196 GaelVaroquaux Gael Varoquaux Researcher and geek: science + Python + data = brain imaging + scikit-learn + machine learning + Mayavi + physics (& traveling… sailing… hiking…) 1085 treycausey Trey Causey Data scientist. [sent-29, score-0.582]
12 1026 statisticsblog Statistics Blog I blog about probability & statistics with a focus on Monte Carlo simulations & epistemology: http://t. [sent-33, score-0.284]
13 Who else they follow: This table shows the most common friends of our followers, that is, whom they follow. [sent-41, score-0.198]
14 41% 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! [sent-50, score-0.247]
15 34% 374533 googleresearch Google Research At Google, research is performed company wide; we conduct and leverage research to build large-scale systems that are used in the real world. [sent-52, score-0.393]
16 27% 13938 peteskomoroch Peter Skomoroch My mission is to create intelligent systems that help people make better decisions. [sent-59, score-0.131]
17 27% 73453 CompSciFact Computer Science One fact per day from computer science M-F from @JohnDCook. [sent-62, score-0.629]
18 One tip per day M-F plus occasional other tweets. [sent-64, score-0.267]
19 26% 108989 coursera Coursera Learning without limits. [sent-66, score-0.123]
20 Sometimes I blog about research and playing with machine learning 20% 27247 ProbFact Probability Fact One probability fact per day M-F from @JohnDCook. [sent-74, score-0.568]
wordName wordTfidf (topN-words)
[('science', 0.331), ('followers', 0.287), ('scientist', 0.205), ('mining', 0.205), ('http', 0.18), ('contributor', 0.148), ('systems', 0.131), ('research', 0.131), ('computer', 0.13), ('coursera', 0.123), ('ben', 0.108), ('blog', 0.108), ('python', 0.099), ('braaains', 0.099), ('clojure', 0.099), ('datageek', 0.099), ('director', 0.099), ('entrepreneur', 0.099), ('follow', 0.099), ('geek', 0.099), ('graham', 0.099), ('jtoy', 0.099), ('kdnuggets', 0.099), ('lorica', 0.099), ('medicalphysics', 0.099), ('nlproc', 0.099), ('occasional', 0.099), ('ogrisel', 0.099), ('olivier', 0.099), ('pankaj', 0.099), ('pig', 0.099), ('rstats', 0.099), ('stats', 0.099), ('table', 0.099), ('thuske', 0.099), ('stanford', 0.098), ('statistics', 0.098), ('making', 0.098), ('predictive', 0.098), ('data', 0.086), ('day', 0.084), ('per', 0.084), ('machine', 0.083), ('sports', 0.082), ('grisel', 0.082), ('engineer', 0.082), ('linked', 0.082), ('physics', 0.082), ('ram', 0.082), ('probability', 0.078)]
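The (topN-words) list above pairs each word with a TF-IDF-style weight, and every sentence in the sentIndex block carries a sentScore. How maker-knowledge-mining actually computes these numbers is not documented here; the sketch below is one plausible reconstruction with scikit-learn, where a sentence is scored by the weights of the words it contains. Treat all of it as an assumption, not the pipeline’s real code.

from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed input: the plain text of every post in the corpus.
posts = ["Recently we hit the 400-follower mark on Twitter. ...",
         "How do you learn machine learning? ..."]  # toy stand-in for the full corpus

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(posts)      # rows: posts, columns: vocabulary
vocab = vec.get_feature_names_out()

doc = X[0].toarray().ravel()      # TF-IDF weights for this post
top_words = sorted(zip(vocab, doc), key=lambda t: -t[1])[:50]

# One plausible sentScore: the summed weights of a sentence's words.
weights = dict(zip(vocab, doc))
def sent_score(sentence):
    return sum(weights.get(w, 0.0) for w in sentence.lower().split())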
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
2 0.19533667 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are a great many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world-class authority on m
3 0.13928397 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various kinds of unsupervised learning, including denoising auto-encoders and contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v
4 0.13892294 41 fast ml-2013-10-09-Big data made easy
Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype: Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or
5 0.10335237 22 fast ml-2013-03-07-Choosing a machine learning algorithm
Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like
6 0.056763109 28 fast ml-2013-05-12-And deliver us from Weka
7 0.050540872 47 fast ml-2013-12-15-A-B testing with bayesian bandits in Google Analytics
8 0.05006652 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview
9 0.047261126 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn
10 0.045499906 27 fast ml-2013-05-01-Deep learning made easy
11 0.044519115 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
12 0.043692231 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
13 0.041672055 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
14 0.041509125 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure
15 0.041016258 3 fast ml-2012-09-01-Running Unix apps on Windows
16 0.04037514 40 fast ml-2013-10-06-Pylearn2 in practice
17 0.040317737 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial
18 0.040180355 5 fast ml-2012-09-19-Best Buy mobile contest - big data
19 0.040002149 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
20 0.038266771 10 fast ml-2012-11-17-The Facebook challenge HOWTO
topicId topicWeight
[(0, 0.174), (1, 0.1), (2, 0.233), (3, -0.098), (4, 0.122), (5, 0.18), (6, -0.053), (7, 0.187), (8, 0.417), (9, 0.179), (10, -0.215), (11, 0.094), (12, 0.027), (13, 0.057), (14, 0.124), (15, -0.019), (16, 0.043), (17, -0.09), (18, -0.056), (19, -0.009), (20, -0.06), (21, -0.045), (22, -0.154), (23, -0.021), (24, 0.157), (25, -0.003), (26, 0.192), (27, 0.083), (28, -0.099), (29, 0.077), (30, -0.046), (31, 0.112), (32, -0.121), (33, 0.008), (34, -0.062), (35, -0.018), (36, -0.039), (37, -0.014), (38, -0.201), (39, 0.029), (40, -0.162), (41, 0.108), (42, -0.177), (43, 0.181), (44, 0.117), (45, -0.071), (46, -0.053), (47, -0.157), (48, -0.102), (49, -0.014)]
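The (topicId, topicWeight) pairs above form a dense topic-space vector for the post, and the simValue column in the surrounding lists behaves like a similarity between such vectors (the self-match scoring at roughly 1.0 supports that reading). A standard way to produce both is latent semantic indexing: project the TF-IDF matrix into a low-dimensional topic space with truncated SVD and rank posts by cosine similarity. The sketch below reuses X from the previous sketch and is again an assumed reconstruction, not the pipeline’s actual method.

from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# X: the TF-IDF matrix from the sketch above (full corpus assumed).
n_topics = min(100, min(X.shape) - 1)   # the true topic count is unknown
svd = TruncatedSVD(n_components=n_topics)
topics = svd.fit_transform(X)           # one topic-weight vector per post

# Sparse (topicId, topicWeight) pairs for post 0, like the list above.
pairs = [(i, round(w, 3)) for i, w in enumerate(topics[0]) if abs(w) > 0.01]

# simValue: cosine similarity between this post and every post;
# sorted descending, the post itself comes first at ~1.0.
sims = cosine_similarity(topics[:1], topics).ravel()
ranking = sims.argsort()[::-1]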
simIndex simValue blogId blogTitle
same-blog 1 0.9742195 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
2 0.43786439 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are a great many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world-class authority on m
3 0.32187665 41 fast ml-2013-10-09-Big data made easy
Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype: Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or
4 0.30539009 22 fast ml-2013-03-07-Choosing a machine learning algorithm
Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose a ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting; however, we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like
5 0.27515599 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York: All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various kinds of unsupervised learning, including denoising auto-encoders and contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v
6 0.14912881 28 fast ml-2013-05-12-And deliver us from Weka
7 0.11983273 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn
8 0.11446805 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure
9 0.11380965 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit
10 0.11347861 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial
11 0.11297888 25 fast ml-2013-04-10-Gender discrimination
12 0.10339084 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers
13 0.10198293 43 fast ml-2013-11-02-Maxing out the digits
14 0.1005353 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview
15 0.098162882 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
16 0.096944436 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition
17 0.093103275 3 fast ml-2012-09-01-Running Unix apps on Windows
18 0.091894083 32 fast ml-2013-07-05-Processing large files, line by line
19 0.09031567 27 fast ml-2013-05-01-Deep learning made easy
20 0.081626445 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data
topicId topicWeight
[(15, 0.015), (30, 0.046), (31, 0.023), (69, 0.045), (71, 0.039), (78, 0.01), (79, 0.018), (84, 0.645), (92, 0.026), (99, 0.044)]
simIndex simValue blogId blogTitle
same-blog 1 0.96784127 37 fast ml-2013-09-03-Our followers and who else they follow
Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers: This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So
2 0.82811838 11 fast ml-2012-12-07-Predicting wine quality
Introduction: This post is as much about wine as it is about machine learning, so if you enjoy wine, like we do, you may find it especially interesting. Here’s some R and Matlab code, and if you want to get right to the point, skip to the charts. There’s a book by Philipp Janert called Data Analysis with Open Source Tools, which, by the way, we would recommend. From this book we found out about the wine quality datasets. There are two, one for red wine and one for white wine, and they are interesting because they contain quality ratings (1-10) for a few thousand wines, along with their physical and chemical properties. We could probably use these properties to predict a rating for a wine. We’ll be looking at white and red wine separately for the reasons you will see shortly. Principal component analysis for white wine: Janert performs a principal component analysis (PCA) and shows a resulting plot for white wine. What’s interesting about this plot is that, judging by the first tw
3 0.26331872 15 fast ml-2013-01-07-Machine learning courses online
Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are a great many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world-class authority on m
4 0.18451078 41 fast ml-2013-10-09-Big data made easy
Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype: Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or
5 0.17356807 9 fast ml-2012-10-25-So you want to work for Facebook
Introduction: Good news, everyone! There’s a new contest on Kaggle: Facebook is looking for talent. They won’t pay, but just might interview. This post is in a way a bonus for active readers, because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work. To honor this, we won’t publish any code. But own work doesn’t mean original work, and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there are no money prizes, as mentioned above; it’s not a real-world problem, but rather an assignment to screen job candidates (this has important consequences, described below); it’s not a typical machine learning project, but rather a broader AI exercise. You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a
6 0.16813244 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA
7 0.14399756 53 fast ml-2014-02-20-Are stocks predictable?
8 0.13571264 46 fast ml-2013-12-07-13 NIPS papers that caught our eye
9 0.13033766 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network
10 0.12954229 34 fast ml-2013-07-14-Running things on a GPU
11 0.12803948 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet
12 0.12235162 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect
13 0.12198979 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint
14 0.12033345 3 fast ml-2012-09-01-Running Unix apps on Windows
15 0.11879335 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction
16 0.11756808 36 fast ml-2013-08-23-A bag of words and a nice little network
17 0.11724636 13 fast ml-2012-12-27-Spearmint with a random forest
18 0.11404555 16 fast ml-2013-01-12-Intro to random forests
19 0.111911 35 fast ml-2013-08-12-Accelerometer Biometric Competition
20 0.094170026 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview