fast_ml-2013-37: knowledge-graph by maker-knowledge-mining

37 fast ml-2013-09-03-Our followers and who else they follow


meta info for this blog

Source: html

Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So


Summary: the most important sentences generated by the tfidf model (a minimal scoring sketch follows the table below)

sentIndex sentText sentNum sentScore

1 To celebrate we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. [sent-2, score-0.578]

2 Our followers This table shows our 20 most popular followers as measured by their follower count. [sent-5, score-0.673]

3 Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. [sent-8, score-0.099]

4 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! [sent-10, score-0.247]

5 4582 thuske thuske & 4442 ram Ram Ravichandran Software Research Engineer at Twitter. [sent-11, score-0.28]

6 Data crunchist @opendns   2651 jtoy jtoy I study intelligence, ponder consciousness, build machine learning systems, program in clojure/R/ruby/python, and start companies. [sent-15, score-0.281]

7 co/iU3TYN84O8   2231 bscofield Ben Scofield RailsConf Program Chair; Director of Ruby Central; mostly a nice guy   1863 johncandido John Candido Developing predictive algorithms for financial markets and sports analytics. [sent-20, score-0.18]

8 ESPN Insider contributor   1849 arnicas Lynn Cherny Data Consultant (Data Analysis/Mining/Graphical Data); Infovis, Python, R, Stats, Javascript, Gender, Science Fiction, TV, ex-researcher with Stanford PhD. [sent-22, score-0.148]

9 #b2b #saas #marketing #stanford   1492 medicalphysics medicalphysics Serious Medical Physics and Fun ! [sent-25, score-0.198]

10 1329 nikolaypavlov Nikolay Pavlov Dreamer, entrepreneur and simply curious man. [sent-27, score-0.099]

11 1196 GaelVaroquaux Gael Varoquaux Researcher and geek: science + Python + data = brain imaging + scikit-learn + machine learning + Mayavi + physics (& traveling… sailing… hiking…)   1085 treycausey Trey Causey Data scientist. [sent-29, score-0.582]

12 1026 statisticsblog Statistics Blog I blog about probability & statistics with a focus on Monte Carlo simulations & epistemology: http://t. [sent-33, score-0.284]

13 Who else they follow This table shows the most common friends of our followers, that is who they follow. [sent-41, score-0.198]

14 41% 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! [sent-50, score-0.247]

15 34% 374533 googleresearch Google Research At Google, research is performed company wide; we conduct and leverage research to build large-scale systems that are used in the real world. [sent-52, score-0.393]

16 27% 13938 peteskomoroch Peter Skomoroch My mission is to create intelligent systems that help people make better decisions. [sent-59, score-0.131]

17 27% 73453 CompSciFact Computer Science One fact per day from computer science M-F from @JohnDCook. [sent-62, score-0.629]

18 One tip per day M-F plus occasional other tweets. [sent-64, score-0.267]

19 26% 108989 coursera Coursera Learning without limits. [sent-66, score-0.123]

20 Sometimes I blog about research and playing with machine learning     20% 27247 ProbFact Probability Fact One probability fact per day M-F from @JohnDCook. [sent-74, score-0.568]
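The scores above come from a tfidf model: each sentence is scored by the tfidf weights of its terms, and the top-scoring sentences form the summary. Here is a minimal sketch of that kind of scoring, assuming scikit-learn and a post already split into sentences; the actual pipeline behind this page is not shown, so treat this as an illustration, not the original implementation.

    # A sketch: score sentences by the mean tfidf weight of their terms,
    # then rank descending (higher-scoring sentences make the summary).
    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [  # hypothetical stand-ins for the post's sentences
        "To celebrate we decided to do some data mining on you.",
        "This table shows our 20 most popular followers by follower count.",
        "Each link opens a new window.",
    ]

    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(sentences)  # one row of weights per sentence

    # Mean weight of the non-zero terms in each row.
    scores = tfidf.sum(axis=1).A1 / (tfidf > 0).sum(axis=1).A1
    for i in scores.argsort()[::-1]:
        print(round(scores[i], 3), sentences[i])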


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('science', 0.331), ('followers', 0.287), ('scientist', 0.205), ('mining', 0.205), ('http', 0.18), ('contributor', 0.148), ('systems', 0.131), ('research', 0.131), ('computer', 0.13), ('coursera', 0.123), ('ben', 0.108), ('blog', 0.108), ('python', 0.099), ('braaains', 0.099), ('clojure', 0.099), ('datageek', 0.099), ('director', 0.099), ('entrepreneur', 0.099), ('follow', 0.099), ('geek', 0.099), ('graham', 0.099), ('jtoy', 0.099), ('kdnuggets', 0.099), ('lorica', 0.099), ('medicalphysics', 0.099), ('nlproc', 0.099), ('occasional', 0.099), ('ogrisel', 0.099), ('olivier', 0.099), ('pankaj', 0.099), ('pig', 0.099), ('rstats', 0.099), ('stats', 0.099), ('table', 0.099), ('thuske', 0.099), ('stanford', 0.098), ('statistics', 0.098), ('making', 0.098), ('predictive', 0.098), ('data', 0.086), ('day', 0.084), ('per', 0.084), ('machine', 0.083), ('sports', 0.082), ('grisel', 0.082), ('engineer', 0.082), ('linked', 0.082), ('physics', 0.082), ('ram', 0.082), ('probability', 0.078)]
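Each (wordName, wordTfidf) pair above is a coordinate of this post's tfidf vector, where tfidf(t, d) = tf(t, d) * log(N / df(t)) up to normalization. The simValue scores in the list below are then, plausibly, cosine similarities between such vectors. A minimal sketch, assuming posts are available as plain-text strings (the snippets and ids here are hypothetical stand-ins):

    # A sketch of computing simValue-style scores: cosine similarity
    # between the tfidf vectors of whole posts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    posts = {  # hypothetical stand-in texts
        "37-our-followers": "Recently we hit the 400-follower mark on Twitter ...",
        "15-ml-courses": "How do you learn machine learning? Take an online course ...",
        "41-big-data": "An overview of key points about big data ...",
    }

    ids = list(posts)
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(posts.values())

    sims = cosine_similarity(tfidf[0], tfidf).ravel()  # row 0 is this post
    for i in sims.argsort()[::-1]:  # the post itself ranks first, at ~1.0
        print(ids[i], round(sims[i], 4))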

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 37 fast ml-2013-09-03-Our followers and who else they follow

Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So

2 0.19533667 15 fast ml-2013-01-07-Machine learning courses online

Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m

3 0.13928397 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

4 0.13892294 41 fast ml-2013-10-09-Big data made easy

Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: Big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or

5 0.10335237 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose an ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting, however we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like

6 0.056763109 28 fast ml-2013-05-12-And deliver us from Weka

7 0.050540872 47 fast ml-2013-12-15-A-B testing with bayesian bandits in Google Analytics

8 0.05006652 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview

9 0.047261126 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

10 0.045499906 27 fast ml-2013-05-01-Deep learning made easy

11 0.044519115 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

12 0.043692231 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

13 0.041672055 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

14 0.041509125 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure

15 0.041016258 3 fast ml-2012-09-01-Running Unix apps on Windows

16 0.04037514 40 fast ml-2013-10-06-Pylearn2 in practice

17 0.040317737 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial

18 0.040180355 5 fast ml-2012-09-19-Best Buy mobile contest - big data

19 0.040002149 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

20 0.038266771 10 fast ml-2012-11-17-The Facebook challenge HOWTO


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.174), (1, 0.1), (2, 0.233), (3, -0.098), (4, 0.122), (5, 0.18), (6, -0.053), (7, 0.187), (8, 0.417), (9, 0.179), (10, -0.215), (11, 0.094), (12, 0.027), (13, 0.057), (14, 0.124), (15, -0.019), (16, 0.043), (17, -0.09), (18, -0.056), (19, -0.009), (20, -0.06), (21, -0.045), (22, -0.154), (23, -0.021), (24, 0.157), (25, -0.003), (26, 0.192), (27, 0.083), (28, -0.099), (29, 0.077), (30, -0.046), (31, 0.112), (32, -0.121), (33, 0.008), (34, -0.062), (35, -0.018), (36, -0.039), (37, -0.014), (38, -0.201), (39, 0.029), (40, -0.162), (41, 0.108), (42, -0.177), (43, 0.181), (44, 0.117), (45, -0.071), (46, -0.053), (47, -0.157), (48, -0.102), (49, -0.014)]
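The (topicId, topicWeight) pairs above are an LSI (latent semantic indexing) projection: the post's term vector mapped onto latent topics via a truncated SVD, which is why the weights can be negative. The similarities in the list below would then be computed in this reduced topic space. A minimal sketch with gensim, assuming a corpus of tokenized posts (the three toy documents are hypothetical stand-ins):

    # A sketch of producing (topicId, topicWeight) pairs like the list
    # above with gensim's LSI.
    from gensim import corpora, models

    texts = [  # hypothetical toy corpus
        "followers data mining twitter science".split(),
        "machine learning course online stanford coursera".split(),
        "big data hype cloud computing".split(),
    ]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
    print(lsi[corpus[0]])  # e.g. [(0, 1.2), (1, -0.3)] - weights may be negative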

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9742195 37 fast ml-2013-09-03-Our followers and who else they follow

Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So

2 0.43786439 15 fast ml-2013-01-07-Machine learning courses online

Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m

3 0.32187665 41 fast ml-2013-10-09-Big data made easy

Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: Big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or

4 0.30539009 22 fast ml-2013-03-07-Choosing a machine learning algorithm

Introduction: To celebrate the first 100 followers on Twitter, we asked them what they would like to read about here. One of the responders, Itamar Berger, suggested a topic: how to choose an ML algorithm for a task at hand. Well, what do we know? Three things come to mind: We’d try fast things first. In terms of speed, here’s how we imagine the order: linear models; trees, that is bagged or boosted trees; everything else*. We’d use something we are comfortable with. Learning new things is very exciting, however we’d ask ourselves a question: do we want to learn a new technique, or do we want a result? We’d prefer something with fewer hyperparameters to set. More params means more tuning, that is training and re-training over and over, even if automatically. Random forests are hard to beat in this department. Linear models are pretty good too. A random forest scene, credit: Jonathan MacGregor *By “everything else” we mean “everything popular”, mostly things like

5 0.27515599 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

Introduction: On May 15th Yann LeCun answered “ask me anything” questions on Reddit. We hand-picked some of his thoughts and grouped them by topic for your enjoyment. Toronto, Montreal and New York All three groups are strong and complementary. Geoff (who spends more time at Google than in Toronto now) and Russ Salakhutdinov like RBMs and deep Boltzmann machines. I like the idea of Boltzmann machines (it’s a beautifully simple concept) but it doesn’t scale well. Also, I totally hate sampling. Yoshua and his colleagues have focused a lot on various unsupervised learning, including denoising auto-encoders, contracting auto-encoders. They are not allergic to sampling like I am. On the application side, they have worked on text, not so much on images. In our lab at NYU (Rob Fergus, David Sontag, me and our students and postdocs), we have been focusing on sparse auto-encoders for unsupervised learning. They have the advantage of scaling well. We have also worked on applications, mostly to v

6 0.14912881 28 fast ml-2013-05-12-And deliver us from Weka

7 0.11983273 60 fast ml-2014-04-30-Converting categorical data into numbers with Pandas and Scikit-learn

8 0.11446805 6 fast ml-2012-09-25-Best Buy mobile contest - full disclosure

9 0.11380965 23 fast ml-2013-03-18-Large scale L1 feature selection with Vowpal Wabbit

10 0.11347861 38 fast ml-2013-09-09-Predicting solar energy from weather forecasts plus a NetCDF4 tutorial

11 0.11297888 25 fast ml-2013-04-10-Gender discrimination

12 0.10339084 59 fast ml-2014-04-21-Predicting happiness from demographics and poll answers

13 0.10198293 43 fast ml-2013-11-02-Maxing out the digits

14 0.1005353 24 fast ml-2013-03-25-Dimensionality reduction for sparse binary data - an overview

15 0.098162882 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

16 0.096944436 29 fast ml-2013-05-25-More on sparse filtering and the Black Box competition

17 0.093103275 3 fast ml-2012-09-01-Running Unix apps on Windows

18 0.091894083 32 fast ml-2013-07-05-Processing large files, line by line

19 0.09031567 27 fast ml-2013-05-01-Deep learning made easy

20 0.081626445 21 fast ml-2013-02-27-Dimensionality reduction for sparse binary data


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(15, 0.015), (30, 0.046), (31, 0.023), (69, 0.045), (71, 0.039), (78, 0.01), (79, 0.018), (84, 0.645), (92, 0.026), (99, 0.044)]
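Unlike the LSI weights, the LDA (latent Dirichlet allocation) weights above are topic probabilities: non-negative and summing to roughly one, with near-zero topics dropped, which is why the list is sparse. A minimal gensim sketch under the same toy-corpus assumption as the LSI example:

    # A sketch of producing sparse (topicId, probability) pairs like the
    # list above with gensim's LDA.
    from gensim import corpora, models

    texts = [  # hypothetical toy corpus, as in the LSI sketch
        "followers data mining twitter science".split(),
        "machine learning course online stanford coursera".split(),
        "big data hype cloud computing".split(),
    ]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
    print(lda[corpus[0]])  # e.g. [(0, 0.91), (1, 0.09)] - probabilities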

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96784127 37 fast ml-2013-09-03-Our followers and who else they follow

Introduction: Recently we hit the 400-follower mark on Twitter. To celebrate, we decided to do some data mining on you, specifically to discover who our followers are and who else they follow. For your viewing pleasure we packaged the results nicely with Bootstrap. Here’s some data science in action. Our followers This table shows our 20 most popular followers as measured by their follower count. The occasional question marks stand for non-ASCII characters. Each link opens a new window. Followers Screen name Name Description 8685 pankaj Pankaj Gupta I lead the Personalization and Recommender Systems group at Twitter. Founded two startups in the past. 5070 ogrisel Olivier Grisel Datageek, contributor to scikit-learn, works with Python / Java / Clojure / Pig, interested in Machine Learning, NLProc, {Big|Linked|Open} Data and braaains! 4582 thuske thuske & 4442 ram Ram Ravichandran So

2 0.82811838 11 fast ml-2012-12-07-Predicting wine quality

Introduction: This post is as much about wine as it is about machine learning, so if you enjoy wine, like we do, you may find it especially interesting. Here’s some R and Matlab code, and if you want to get right to the point, skip to the charts. There’s a book by Philipp Janert called Data Analysis with Open Source Tools, which, by the way, we would recommend. From this book we found out about the wine quality datasets. There are two, one for red wine and one for white wine, and they are interesting because they contain quality ratings (1 - 10) for a few thousand wines, along with their physical and chemical properties. We could probably use these properties to predict a rating for a wine. We’ll be looking at white and red wine separately for the reasons you will see shortly. Principal component analysis for white wine Janert performs a principal component analysis (PCA) and shows a resulting plot for white wine. What’s interesting about this plot is that judging by the first tw

3 0.26331872 15 fast ml-2013-01-07-Machine learning courses online

Introduction: How do you learn machine learning? A good way to begin is to take an online course. These courses started appearing towards the end of 2011, first from Stanford University, now from Coursera, Udacity, edX and other institutions. There are very many of them, including a few about machine learning. Here’s a list: Introduction to Artificial Intelligence by Sebastian Thrun and Peter Norvig. That was the first online class, and it contains two units on machine learning (units five and six). Both instructors work at Google. Sebastian Thrun is best known for building a self-driving car and Peter Norvig is a leading authority on AI, so they know what they are talking about. After the success of the class Sebastian Thrun quit Stanford to found Udacity, his online learning startup. Machine Learning by Andrew Ng. Again, one of the first classes, by the Stanford professor who started Coursera, the best known online learning provider today. Andrew Ng is a world class authority on m

4 0.18451078 41 fast ml-2013-10-09-Big data made easy

Introduction: An overview of key points about big data. This post was inspired by a very good article about big data by Chris Stucchio (linked below). The article is about hype and technology. We hate the hype. Big data is hype Everybody talks about big data; nobody knows exactly what it is. That’s pretty much the definition of hype. Google Trends suggests that the term took off at the beginning of 2011 (and the searches are coming mainly from Asia, curiously). Now, to put things in context: Big data is right there (or maybe not quite yet?) with other slogans like web 2.0, cloud computing and social media. In effect, big data is a generic term for: data science, machine learning, data mining, predictive analytics and so on. Don’t believe us? What about James Goodnight, the CEO of SAS: The term big data is being used today because computer analysts and journalists got tired of writing about cloud computing. Before cloud computing it was data warehousing or

5 0.17356807 9 fast ml-2012-10-25-So you want to work for Facebook

Introduction: Good news, everyone! There’s a new contest on Kaggle - Facebook is looking for talent. They won’t pay, but just might interview. This post is in a way a bonus for active readers because most visitors of fastml.com originally come from Kaggle forums. For this competition the forums are disabled to encourage own work. To honor this, we won’t publish any code. But own work doesn’t mean original work, and we wouldn’t want to reinvent the wheel, would we? The contest differs substantially from a Kaggle stereotype, if there is such a thing, in three major ways: there are no money prizes, as mentioned above; it’s not a real world problem, but rather an assignment to screen job candidates (this has important consequences, described below); it’s not a typical machine learning project, but rather a broader AI exercise. You are given a graph of the internet, actually a snapshot of the graph for each of 15 time steps. You are also given a bunch of paths in this graph, which a

6 0.16813244 62 fast ml-2014-05-26-Yann LeCun's answers from the Reddit AMA

7 0.14399756 53 fast ml-2014-02-20-Are stocks predictable?

8 0.13571264 46 fast ml-2013-12-07-13 NIPS papers that caught our eye

9 0.13033766 49 fast ml-2014-01-10-Classifying images with a pre-trained deep network

10 0.12954229 34 fast ml-2013-07-14-Running things on a GPU

11 0.12803948 45 fast ml-2013-11-27-Object recognition in images with cuda-convnet

12 0.12235162 48 fast ml-2013-12-28-Regularizing neural networks with dropout and with DropConnect

13 0.12198979 12 fast ml-2012-12-21-Tuning hyperparams automatically with Spearmint

14 0.12033345 3 fast ml-2012-09-01-Running Unix apps on Windows

15 0.11879335 55 fast ml-2014-03-20-Good representations, distance, metric learning and supervised dimensionality reduction

16 0.11756808 36 fast ml-2013-08-23-A bag of words and a nice little network

17 0.11724636 13 fast ml-2012-12-27-Spearmint with a random forest

18 0.11404555 16 fast ml-2013-01-12-Intro to random forests

19 0.111911 35 fast ml-2013-08-12-Accelerometer Biometric Competition

20 0.094170026 57 fast ml-2014-04-01-Exclusive Geoff Hinton interview