hilary_mason_data hilary_mason_data-2012 hilary_mason_data-2012-80 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Getting Started with Data Science. Posted: December 28, 2012 | Author: Hilary Mason | Filed under: blog | Tags: advice, datascience, hacking, learning. I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them: The best way to get started in data science is to DO data science! First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities. Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner
sentIndex sentText sentNum sentScore
1 Here’s what I usually tell them: The best way to get started in data science is to DO data science! [sent-2, score-1.239]
2 First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. [sent-3, score-0.792]
3 Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. [sent-4, score-0.116]
4 Then figure out which one of these you’re best at, and pick a project which shows off your abilities. [sent-5, score-0.607]
5 If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. [sent-7, score-0.296]
6 Look for groups, like DataKind, that need data skills put to work for good. [sent-8, score-0.538]
7 No matter how much of a beginner you might be, your enthusiasm will be appreciated, you’ll learn things, and you’ll meet great people. [sent-9, score-0.306]
8 And if you can’t find a physical meetup close to you, start one, or join the twitter discussion. [sent-10, score-0.55]
9 Explain why you thought the question was interesting, where you got the data (and good data is everywhere), and how you came to a conclusion. [sent-13, score-0.615]
10 A couple examples of data projects motivated by nothing more than the author’s curiosity are Yvo’s TechCrunch analysis and Drew and John’s Ranking the Popularity of Programming Languages. [sent-15, score-0.71]
wordName wordTfidf (topN-words)
[('started', 0.26), ('data', 0.258), ('science', 0.207), ('advice', 0.197), ('scientists', 0.197), ('figure', 0.184), ('put', 0.173), ('start', 0.15), ('get', 0.147), ('projects', 0.146), ('re', 0.135), ('share', 0.12), ('meetup', 0.119), ('groups', 0.119), ('lately', 0.119), ('stay', 0.119), ('appreciated', 0.119), ('engineer', 0.119), ('enthusiasm', 0.119), ('fundamentally', 0.119), ('project', 0.116), ('best', 0.109), ('examples', 0.107), ('skills', 0.107), ('popularity', 0.107), ('december', 0.107), ('techcrunch', 0.107), ('john', 0.107), ('nothing', 0.107), ('smart', 0.107), ('physical', 0.099), ('shows', 0.099), ('pick', 0.099), ('messages', 0.099), ('programming', 0.099), ('came', 0.099), ('math', 0.099), ('matter', 0.099), ('drew', 0.099), ('datascience', 0.099), ('things', 0.095), ('couple', 0.092), ('join', 0.092), ('github', 0.092), ('find', 0.09), ('great', 0.088), ('quite', 0.087), ('events', 0.087), ('third', 0.087), ('hacking', 0.087)]
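The page does not say how the word weights above were computed. As a rough illustration only, the following sketch shows one common way to get such a ranked `(word, weight)` list: term frequency times a smoothed inverse document frequency, L2-normalised per document. The function name `tfidf_top_words`, the whitespace tokenisation, and the three toy documents are all assumptions made for this example, so the numbers will not match the listing.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Hypothetical helper: weights are tf * smoothed idf, L2-normalised.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    tf = Counter(tokenised[doc_index])
    # Smoothed idf keeps the weight positive even for words found everywhere.
    weights = {
        word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
        for word, count in tf.items()
    }
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    # Sort by weight (descending), breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, round(value / norm, 3)) for word, value in ranked[:top_n]]


# Toy corpus built from three post titles; not the real blog text.
docs = [
    "getting started with data science",
    "interview questions for data scientists",
    "need data start here",
]
top = tfidf_top_words(docs, 0, top_n=3)
print(top)
```

In this toy corpus "data" occurs in every document, so it ranks last for the first title, while words unique to that title rank first.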
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 80 hilary mason data-2012-12-28-Getting Started with Data Science
2 0.2021427 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
Introduction: Interview Questions for Data Scientists. Posted: January 3, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datascience, hiring, startups. Great data scientists come from such diverse backgrounds that it can be difficult to get a sense of whether someone is up to the job in just a short interview. In addition to the technical questions, I find it useful to have a few questions that draw out the more creative and less discrete elements of a candidate’s personality. Here are a few of my favorite questions. What was the last thing that you made for fun? This is my favorite question by far: I want to work with the kind of people who don’t turn their brains off when they go home. It’s also a great way to learn what gets people excited. What’s your favorite algorithm? Can you explain it to me? I don’t know any data scientists who haven’t fallen in love with an algorithm, and I want to see both that enthusiasm and that the ca
3 0.2011269 84 hilary mason data-2013-01-17-Need Data? Start Here
Introduction: Need Data? Start Here. Posted: January 17, 2013 | Author: Hilary Mason | Filed under: projects | Tags: data, dataset. Data scientists need data, and good data is hard to find. I put together this bitly bundle of research quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone. Have one to add? Let me know! (I’ve shared the bundle before, but this post can act as unofficial homepage for it.)
4 0.17375736 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
Introduction: DataGotham: The Empire State of Data. Posted: August 22, 2012 | Author: Hilary Mason | Filed under: blog, projects. I’m extremely excited about DataGotham, a conference that I’m co-hosting with friends and fellow New York data nerds Drew, John, and Mike. DataGotham is a celebration of the NYC data community, and will bring together professionals from all industries in New York that are built around data, from finance to fashion and from startups to the Fortune 500 and government. The event is September 13th – 14th at NYU, with tutorials and The Great Data Extravaganza Show (with cocktails!) at the Tribeca Rooftop Thursday evening, and a single track conference Friday. Our speakers and sponsors are all amazing. You can register now. While DataGotham is definitely a labor of love, there are numerous reasons to do it. I believe that New York has a distinct data philosophy (the study of human behavior) that is unique and should be cel
5 0.15372322 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
Introduction: A (short) List of Data Science Blogs. Posted: February 25, 2013 | Author: Hilary Mason | Filed under: blog | Tags: blogs, data science. I’m gathering a bundle of data science blogs to share. I’m looking to include blogs that update regularly and aren’t either personal opinion and project blogs (like this one) or primarily about marketing any particular company. Let me know if you have a favorite that I’ve forgotten. If you’re just looking for one place to start, hop on over to Simply Statistics.
6 0.14781605 78 hilary mason data-2012-09-21-Help, I’m the first data scientist at my company!
7 0.14694783 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
8 0.14600739 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
9 0.13399224 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
10 0.12857842 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!
11 0.12817833 108 hilary mason data-2013-09-26-Learn to Code, Learn to Think
12 0.12292115 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
13 0.12159208 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!
14 0.10964123 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
15 0.10935876 42 hilary mason data-2010-04-18-Stop talking, start coding
16 0.10569495 99 hilary mason data-2013-04-01-Data Engineering
17 0.10508359 105 hilary mason data-2013-07-05-Speaking: Spend at least 1/3 of the time practicing the talk
18 0.096387304 102 hilary mason data-2013-05-03-Speaking: Explaining Technical Information to a Mixed Audience
19 0.094091237 111 hilary mason data-2013-10-22-The DataGotham 2013 Videos are up!
20 0.093243696 112 hilary mason data-2013-11-01-Books Recommendations for Programming Excellence
topicId topicWeight
[(0, -0.394), (1, -0.175), (2, 0.031), (3, -0.021), (4, 0.015), (5, 0.225), (6, 0.089), (7, -0.023), (8, 0.018), (9, 0.032), (10, -0.127), (11, 0.019), (12, 0.063), (13, 0.007), (14, 0.154), (15, -0.078), (16, 0.128), (17, 0.08), (18, -0.206), (19, 0.146), (20, 0.15), (21, 0.037), (22, -0.009), (23, 0.046), (24, -0.1), (25, -0.117), (26, -0.146), (27, 0.033), (28, 0.088), (29, -0.053), (30, -0.144), (31, -0.009), (32, 0.022), (33, -0.027), (34, 0.032), (35, 0.067), (36, -0.062), (37, 0.102), (38, -0.058), (39, -0.007), (40, 0.025), (41, -0.025), (42, 0.037), (43, -0.031), (44, -0.083), (45, 0.001), (46, -0.073), (47, -0.025), (48, -0.056), (49, 0.017)]
simIndex simValue blogId blogTitle
same-blog 1 0.97674865 80 hilary mason data-2012-12-28-Getting Started with Data Science
2 0.68546277 84 hilary mason data-2013-01-17-Need Data? Start Here
3 0.63041013 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
4 0.61592251 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
5 0.52555734 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
6 0.52464348 78 hilary mason data-2012-09-21-Help, I’m the first data scientist at my company!
7 0.49624181 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
8 0.45030913 99 hilary mason data-2013-04-01-Data Engineering
9 0.44047627 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
10 0.42805624 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!
11 0.42628804 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
12 0.40202957 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!
13 0.39942318 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious
14 0.39344853 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
15 0.39093789 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
16 0.38353503 108 hilary mason data-2013-09-26-Learn to Code, Learn to Think
17 0.36350703 30 hilary mason data-2009-06-01-My Barcamp Presentation: Have Data? What Now?!
18 0.36205792 115 hilary mason data-2014-02-14-Play with your food!
19 0.34534043 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
20 0.32585061 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
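The page does not state how `simValue` is derived from the `topicId`/`topicWeight` vectors. A common choice for comparing such vectors is cosine similarity, and the sketch below assumes that choice purely for illustration. The function name `cosine_similarity` and the vector `other_post` are hypothetical; `this_post` reuses only three of the weights from the listing, so the result is not expected to reproduce any value in the table.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    a, b = dict(vec_a), dict(vec_b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for empty or all-zero vectors.
    return dot / (norm_a * norm_b)


# Three weights copied from the topic listing above (the rest are omitted).
this_post = [(0, -0.394), (1, -0.175), (5, 0.225)]
# Made-up vector standing in for another post.
other_post = [(0, -0.3), (5, 0.2), (7, 0.1)]

similarity = cosine_similarity(this_post, other_post)
print(round(similarity, 3))
```

Storing vectors as `(topicId, weight)` pairs mirrors the listing format and handles sparse vectors, where most topics are absent, without padding them with zeros.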
topicId topicWeight
[(2, 0.144), (11, 0.041), (17, 0.485), (56, 0.183), (63, 0.042)]
simIndex simValue blogId blogTitle
same-blog 1 0.89763707 80 hilary mason data-2012-12-28-Getting Started with Data Science
2 0.40900186 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers. Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email, hack, twitter. It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 (Hilary Mason, @hmason, December 12, 2013) I created it pretty easily: First, go to ads.twitter.com, log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
3 0.38829738 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
Introduction: Startups: Why to Share Data with Academics. Posted: January 28, 2013 | Author: Hilary Mason | Filed under: blog. Last week I wrote a bit about how to share data with academics. This is the complementary piece, on why you should invest the time and energy in sharing your data with the academic community. As I was talking to people about this topic it became clear that there are really two different questions people ask. First, why do this at all? And second, what do I tell my boss? Let’s start with the second one. This is what you should tell your boss: Academic research based on our work is a great press opportunity and demonstrates that credible people outside of our company find our work interesting. Having researchers work on our data is an easy way to access highly educated brainpower, for free, that in no way competes with us. Who knows what interesting stuff they’ll come up with? Personal relationships with university faculty ar
4 0.38492537 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
Introduction: Need actual random numbers? Meet the NIST randomness beacon. Posted: September 30, 2013 | Author: Hilary Mason | Filed under: projects | Tags: beacon, python, random, randomness, randomnumbers. I wrote a python module that wraps the NIST Randomness Beacon, making it simple to get truly random numbers in python. It’s easy to use: b = Beacon() print b.last_record() print b.previous_record() #and so on There’s also a handy generator for getting a set of n random numbers. (One of the best gifts I ever got was a copy of 1,000,000 Random Numbers, and I’ve been intrigued ever since.) Please note that the randomness beacon is not intended to be a source of cryptographic keys; indeed, it’s a public set of numbers, so I wouldn’t recommend doing anything that could be compromised by someone else having access to the exact same set of numbers. Rather, this is interesting precisely for the scientific opportunities that
5 0.37249869 58 hilary mason data-2011-06-22-My Head is Open Source!
Introduction: My Head is Open Source! Posted: June 22, 2011 | Author: Hilary Mason | Filed under: blog | Tags: 3d, makerbot, opensource. Last night I visited friends at Makerbot, where artist-in-residence Jonathan Monaghan scanned my head with a high-resolution laser scanner. The model is available on Thingiverse and can be printed on your friendly neighborhood makerbot or other 3d printer. There are lots of other awesome models of people and things to play with, including Stephen Colbert’s head. I look forward to the emergence of plastic clone head armies! Edit: Please note: thanks for asking, but brains are not included.
6 0.36150211 105 hilary mason data-2013-07-05-Speaking: Spend at least 1/3 of the time practicing the talk
7 0.35675287 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
8 0.35392871 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas
9 0.34004703 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
10 0.32902905 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
11 0.32658491 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
12 0.32078063 82 hilary mason data-2013-01-08-Bitly Social Data APIs
13 0.3154875 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
14 0.31263074 83 hilary mason data-2013-01-10-Book Book — Goose!
15 0.31109393 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
16 0.31051618 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
17 0.30939639 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
18 0.29913861 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!
19 0.29731238 90 hilary mason data-2013-02-18-One Random Tweet, please.
20 0.29069245 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!