hilary_mason_data hilary_mason_data-2013 hilary_mason_data-2013-84 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Need Data? Start Here. Posted: January 17, 2013 | Author: Hilary Mason | Filed under: projects | Tags: data, dataset. Data scientists need data, and good data is hard to find. I put together this bitly bundle of research quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone. Have one to add? Let me know! (I’ve shared the bundle before, but this post can act as unofficial homepage for it.)
sentIndex sentText sentNum sentScore
1 Data scientists need data, and good data is hard to find. [sent-2, score-1.376]
2 I put together this bitly bundle of research quality data sets to collect as many useful data sets as possible in one place. [sent-3, score-2.791]
3 The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone. [sent-4, score-1.2]
4 (I’ve shared the bundle before, but this post can act as unofficial homepage for it.) [sent-7, score-0.995]
wordName wordTfidf (topN-words)
[('bundle', 0.357), ('data', 0.302), ('sets', 0.274), ('buttons', 0.198), ('homepage', 0.198), ('act', 0.198), ('quality', 0.198), ('includes', 0.198), ('collect', 0.198), ('item', 0.198), ('spam', 0.179), ('diverse', 0.179), ('scientists', 0.165), ('need', 0.163), ('shared', 0.154), ('dataset', 0.154), ('put', 0.145), ('exciting', 0.145), ('research', 0.125), ('hard', 0.125), ('start', 0.125), ('add', 0.12), ('bitly', 0.12), ('know', 0.114), ('useful', 0.111), ('together', 0.107), ('social', 0.107), ('possible', 0.103), ('media', 0.103), ('something', 0.094), ('many', 0.091), ('post', 0.088), ('list', 0.086), ('january', 0.086), ('let', 0.086), ('one', 0.084), ('good', 0.084), ('projects', 0.081), ('things', 0.079), ('ve', 0.059), ('mason', 0.023), ('comments', 0.015), ('tags', 0.009)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 84 hilary mason data-2013-01-17-Need Data? Start Here
2 0.2011269 80 hilary mason data-2012-12-28-Getting Started with Data Science
Introduction: Getting Started with Data Science. Posted: December 28, 2012 | Author: Hilary Mason | Filed under: blog | Tags: advice, datascience, hacking, learning. I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them: The best way to get started in data science is to DO data science! First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities. Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner
3 0.19978884 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
Introduction: Startups: How to Share Data with Academics. Posted: January 19, 2013 | Author: Hilary Mason | Filed under: blog | Tags: academics, data, research. This post assumes that you want to share data. If you’re not convinced, don’t worry: that’s next on my list. You and your academic colleagues will benefit from having at least a quick chat about the research questions they want to address. I’ve read every paper I’ve been able to find that uses bitly data and all of the ones that acquired the data without our assistance had serious flaws, generally based on incorrect assumptions about the data they had acquired (this, unfortunately, makes me question the validity of most research done on commercial social data without cooperation from the subject company). The easiest way to share data is through your own API. Set generous rate limits where possible. Most projects are not realtime and they can gather the data (or, more likely, have a grad
4 0.14169377 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
Introduction: Interview Questions for Data Scientists. Posted: January 3, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datascience, hiring, startups. Great data scientists come from such diverse backgrounds that it can be difficult to get a sense of whether someone is up to the job in just a short interview. In addition to the technical questions, I find it useful to have a few questions that draw out the more creative and less discrete elements of a candidate’s personality. Here are a few of my favorite questions. What was the last thing that you made for fun? This is my favorite question by far; I want to work with the kind of people who don’t turn their brains off when they go home. It’s also a great way to learn what gets people excited. What’s your favorite algorithm? Can you explain it to me? I don’t know any data scientists who haven’t fallen in love with an algorithm, and I want to see both that enthusiasm and that the ca
5 0.1364807 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
Introduction: A (short) List of Data Science Blogs. Posted: February 25, 2013 | Author: Hilary Mason | Filed under: blog | Tags: blogs, data science. I’m gathering a bundle of data science blogs to share. I’m looking to include blogs that update regularly and aren’t either personal opinion and project blogs (like this one) or primarily about marketing any particular company. Let me know if you have a favorite that I’ve forgotten. If you’re just looking for one place to start, hop on over to Simply Statistics.
6 0.13155487 112 hilary mason data-2013-11-01-Books Recommendations for Programming Excellence
7 0.11836015 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
8 0.10658088 82 hilary mason data-2013-01-08-Bitly Social Data APIs
9 0.10550065 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
10 0.10395957 99 hilary mason data-2013-04-01-Data Engineering
11 0.087835528 115 hilary mason data-2014-02-14-Play with your food!
12 0.083849125 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
13 0.081057847 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
14 0.080658413 76 hilary mason data-2012-08-28-How do you prioritize research?
15 0.076588735 78 hilary mason data-2012-09-21-Help, I’m the first data scientist at my company!
16 0.075281195 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
17 0.070220143 33 hilary mason data-2009-10-03-Hadoop World NYC
18 0.068079948 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
19 0.067861132 42 hilary mason data-2010-04-18-Stop talking, start coding
20 0.06694527 32 hilary mason data-2009-08-29-Do you do human subject research?
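The page does not state how the simValue column is computed. One common choice for comparing tf-idf vectors is cosine similarity, and the same-blog score of roughly 1.0 in the list above is consistent with that, so the sketch below assumes it. The function name and the weights for the second post are hypothetical and chosen only for illustration; the weights for this post are a few of the top tf-idf terms listed earlier.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    # An empty vector has no direction; treat its similarity as zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# A few of the top tf-idf terms listed for this post.
this_post = {"bundle": 0.357, "data": 0.302, "sets": 0.274, "spam": 0.179}

# Hypothetical weights for another post, invented purely for illustration.
other_post = {"data": 0.25, "bundle": 0.10, "blogs": 0.40}

self_sim = cosine_similarity(this_post, this_post)
cross_sim = cosine_similarity(this_post, other_post)
print(round(self_sim, 6), round(cross_sim, 6))
```

Under this assumption a post compared with itself scores about 1, and posts that share few weighted terms score closer to 0, which matches the general shape of the ranked list.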
topicId topicWeight
[(0, -0.27), (1, -0.13), (2, -0.064), (3, -0.102), (4, -0.08), (5, 0.278), (6, 0.104), (7, -0.162), (8, 0.091), (9, -0.013), (10, -0.015), (11, 0.035), (12, -0.051), (13, 0.059), (14, 0.016), (15, -0.043), (16, 0.128), (17, -0.087), (18, -0.067), (19, 0.06), (20, 0.188), (21, -0.009), (22, 0.16), (23, -0.161), (24, 0.038), (25, -0.122), (26, -0.094), (27, 0.012), (28, 0.139), (29, -0.029), (30, -0.058), (31, -0.077), (32, -0.064), (33, -0.021), (34, -0.025), (35, 0.044), (36, -0.066), (37, 0.066), (38, 0.016), (39, -0.102), (40, -0.13), (41, -0.098), (42, -0.051), (43, 0.001), (44, -0.108), (45, 0.013), (46, 0.054), (47, 0.138), (48, 0.075), (49, -0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.96821135 84 hilary mason data-2013-01-17-Need Data? Start Here
2 0.67104381 80 hilary mason data-2012-12-28-Getting Started with Data Science
3 0.66675049 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
4 0.57653767 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
5 0.51163411 99 hilary mason data-2013-04-01-Data Engineering
Introduction: Data Engineering. Posted: April 1, 2013 | Author: Hilary Mason | Filed under: blog | Tags: bitly, data, engineering, infrastructure. Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system. It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it. Developing these types of systems requires an initial research phase, where you do the necessary work to understand the characteristics of the data, before you design the system (and perhaps even requiring an active experimental process where you try multiple infrastructure options in the wild before making a final decision). I’ve seen numerous people run straight into walls when
6 0.46444684 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
7 0.44959342 82 hilary mason data-2013-01-08-Bitly Social Data APIs
8 0.42133307 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
9 0.40780634 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
10 0.40675604 115 hilary mason data-2014-02-14-Play with your food!
11 0.36415344 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
12 0.3579427 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
13 0.34945405 78 hilary mason data-2012-09-21-Help, I’m the first data scientist at my company!
14 0.33797944 112 hilary mason data-2013-11-01-Books Recommendations for Programming Excellence
15 0.29770845 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious
16 0.28242078 29 hilary mason data-2009-05-07-I’m on Jon Udell’s Interviews with Innovators!
17 0.27391419 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
18 0.26870167 30 hilary mason data-2009-06-01-My Barcamp Presentation: Have Data? What Now?!
19 0.26860502 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
20 0.26364425 42 hilary mason data-2010-04-18-Stop talking, start coding
topicId topicWeight
[(2, 0.123), (56, 0.147), (96, 0.589)]
simIndex simValue blogId blogTitle
same-blog 1 0.93456453 84 hilary mason data-2013-01-17-Need Data? Start Here
2 0.29126635 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers. Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email, hack, twitter. It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: "Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19" (Hilary Mason, @hmason, December 12, 2013). I created it pretty easily: First, go to ads.twitter.com, log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
3 0.27530587 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
Introduction: Startups: Why to Share Data with Academics. Posted: January 28, 2013 | Author: Hilary Mason | Filed under: blog. Last week I wrote a bit about how to share data with academics. This is the complementary piece, on why you should invest the time and energy in sharing your data with the academic community. As I was talking to people about this topic it became clear that there are really two different questions people ask. First, why do this at all? And second, what do I tell my boss? Let’s start with the second one. This is what you should tell your boss: Academic research based on our work is a great press opportunity and demonstrates that credible people outside of our company find our work interesting. Having researchers work on our data is an easy way to access highly educated brainpower, for free, that in no way competes with us. Who knows what interesting stuff they’ll come up with? Personal relationships with university faculty ar
4 0.27301925 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
Introduction: Need actual random numbers? Meet the NIST randomness beacon. Posted: September 30, 2013 | Author: Hilary Mason | Filed under: projects | Tags: beacon, python, random, randomness, randomnumbers. I wrote a python module that wraps the NIST Randomness Beacon, making it simple to get truly random numbers in python. It’s easy to use: b = Beacon() print b.last_record() print b.previous_record() #and so on There’s also a handy generator for getting a set of n random numbers. (One of the best gifts I ever got was a copy of 1,000,000 Random Numbers, and I’ve been intrigued ever since.) Please note that the randomness beacon is not intended to be a source of cryptographic keys; indeed, it’s a public set of numbers, so I wouldn’t recommend doing anything that could be compromised by someone else having access to the exact same set of numbers. Rather, this is interesting precisely for the scientific opportunities that
5 0.26364914 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
6 0.26332751 58 hilary mason data-2011-06-22-My Head is Open Source!
7 0.25657365 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
8 0.24937218 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas
9 0.24886839 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
10 0.24430764 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
11 0.23758927 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
12 0.23746578 80 hilary mason data-2012-12-28-Getting Started with Data Science
13 0.23058629 33 hilary mason data-2009-10-03-Hadoop World NYC
14 0.22971609 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
15 0.22833848 82 hilary mason data-2013-01-08-Bitly Social Data APIs
16 0.22458844 83 hilary mason data-2013-01-10-Book Book — Goose!
17 0.22417018 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
18 0.22143707 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
19 0.22128549 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
20 0.22069961 13 hilary mason data-2008-01-22-Create a group Twitter account