hilary_mason_data hilary_mason_data-2013 hilary_mason_data-2013-89 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Experimenting With Physical Graphs Posted: February 4, 2013 | Author: Hilary Mason | Filed under: blog | 4 Comments » I ended up at NYC Resistor on Sunday and decided to experiment with physical visualization of some data. I grabbed the clicks per second on keyphrases including my name (“hilary mason”) over the last six months, aggregated them by day, and made this graph: [graph image] This is easy enough to construct for any phrase using the clickrate data that we’re calculating at bitly. I exported it from matplotlib as SVG, added a label, and used the laser cutter to create this out of plywood: [image: laser-cut time series] …which will shortly be adorning my desk at work. This is very simple, but there’s a lot of fun to be had with the physical manifestation of patterns we see in large amounts of ephemeral data.
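The post describes the pipeline but does not include its code. The steps are per-second click rates, daily aggregation, an SVG export from matplotlib, and laser cutting. The bitly clickrate data is not reproduced here either.

What follows is a minimal stand-in under stated assumptions:
- The input is a hypothetical list of (unix_timestamp, clicks_per_second) pairs.
- Days are bucketed in UTC.
- Instead of exporting from matplotlib as the post did, it writes one closed SVG outline with the standard library, so a vector cutter has a single shape to trace.

The function names daily_means and to_svg, the millimetre sizing, and the red hairline stroke are illustrative choices only. Cut settings vary by machine.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical helpers for illustration; not bitly's code or API.


def daily_means(samples):
    """Average (unix_timestamp, clicks_per_second) samples by UTC calendar day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, rate in samples:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        sums[day] += rate
        counts[day] += 1
    # One (date, mean clicks per second) entry per day, in date order.
    return [(day, sums[day] / counts[day]) for day in sorted(sums)]


def to_svg(series, width=200.0, height=60.0, margin=5.0):
    """Draw a daily series as one closed outline for a vector cutter.

    Values are scaled from zero up to the series maximum. The shape is closed
    along the bottom edge, so the piece keeps at least `margin` of material
    beneath the lowest day.
    """
    if not series:
        raise ValueError("series must not be empty")
    values = [v for _, v in series]
    top = max(values)
    scale = top if top > 0 else 1.0  # avoid dividing by zero on an all-zero series
    step = (width - 2 * margin) / max(len(values) - 1, 1)
    points = []
    for i, v in enumerate(values):
        x = margin + i * step
        # SVG y grows downward, so larger values sit closer to the top edge.
        y = height - margin - (v / scale) * (height - 2 * margin)
        points.append((x, y))
    # Close the outline along the bottom edge of the board.
    points.append((points[-1][0], height))
    points.append((points[0][0], height))
    coords = " ".join(f"{x:.1f},{y:.1f}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}mm" height="{height}mm" '
        f'viewBox="0 0 {width} {height}">'
        f'<polygon points="{coords}" fill="none" stroke="red" stroke-width="0.1"/>'
        "</svg>"
    )


if __name__ == "__main__":
    # Toy data for illustration only: three samples spread over two UTC days.
    toy = [(0, 1.0), (3600, 3.0), (86400, 4.0)]
    print(to_svg(daily_means(toy)))
```

The y axis is scaled from zero rather than from the series minimum, so relative heights stay honest. The margin is an assumed allowance that leaves a strip of plywood under the quietest day, so the piece is not cut to nothing at its lowest point.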
sentIndex sentText sentNum sentScore
1 Experimenting With Physical Graphs Posted: February 4, 2013 | Author: Hilary Mason | Filed under: blog | 4 Comments » I ended up at NYC Resistor on Sunday, and decided to experiment with physical visualization of some data. [sent-1, score-1.174]
2 I grabbed the clicks per second on keyphrases including my name (“hilary mason”) over the last six months, aggregated them by day, and made this graph: This is easy enough to construct for any phrase using the clickrate data that we’re calculating at bitly. [sent-2, score-2.012]
3 I exported it from matplotlib as SVG, added a label, and used the laser cutter to create this out of plywood: [image: laser-cut time series] …which will shortly be adorning my desk at work. [sent-3, score-0.753]
4 This is very simple, but there’s a lot of fun to be had with the physical manifestation of patterns we see in large amounts of ephemeral data. [sent-4, score-1.194]
wordName wordTfidf (topN-words)
[('physical', 0.483), ('six', 0.194), ('visualization', 0.194), ('construct', 0.194), ('added', 0.194), ('aggregated', 0.194), ('graph', 0.175), ('ended', 0.175), ('label', 0.175), ('shortly', 0.175), ('resistor', 0.175), ('experiment', 0.175), ('patterns', 0.175), ('months', 0.15), ('amount', 0.15), ('phrase', 0.142), ('large', 0.142), ('clicks', 0.142), ('enough', 0.134), ('decided', 0.134), ('day', 0.128), ('used', 0.122), ('per', 0.122), ('including', 0.117), ('bitly', 0.117), ('nyc', 0.113), ('simple', 0.108), ('series', 0.108), ('made', 0.108), ('lot', 0.108), ('name', 0.105), ('february', 0.098), ('easy', 0.098), ('last', 0.095), ('create', 0.092), ('data', 0.084), ('using', 0.084), ('second', 0.082), ('fun', 0.079), ('re', 0.073), ('time', 0.062), ('see', 0.057), ('mason', 0.046), ('comments', 0.015), ('blog', 0.013)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
2 0.089141242 82 hilary mason data-2013-01-08-Bitly Social Data APIs
Introduction: Bitly Social Data APIs Posted: January 8, 2013 | Author: Hilary Mason | Filed under: blog | Tags: api, bitly, data, dataset | 1 Comment » We just released a bunch of social data analysis APIs over at bitly. I’m really excited about this, as it’s offering developers the power to use social data in a way that hasn’t been available before. There are three types of endpoints and each one is awesome for a different reason. First, we share the analysis that we do at the link level. Every developer using data from the web has the same set of problems — what are the topics of those URLs? What are their keywords? Why should you rebuild this infrastructure when we’ve done it already? We’ve also added in a few bits of bitly magic — for example, you can use the /v3/link/location endpoint to see where in the world people are consuming that information from. Second, we’ve opened up access to a realtime search engine. That’s an actual search engine that retu
3 0.089041576 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious
Introduction: Web 2.0 Summit: The Secrets of our Data Subconscious Posted: October 21, 2011 | Author: Hilary Mason | Filed under: Presentations | Tags: conference, data, web2summit | 1 Comment » I just got home from the Web 2.0 Summit, a three-day conference that was packed with announcements, interesting ideas, and good conversations. My short talk, The Secrets of our Data Subconscious, touches on how the data we generate online interacts with the physical world spatially and through time, and on the relationships between the things we consume (in private) and the things we broadcast (in public).
4 0.084531009 80 hilary mason data-2012-12-28-Getting Started with Data Science
Introduction: Getting Started with Data Science Posted: December 28, 2012 | Author: Hilary Mason | Filed under: blog | Tags: advice, datascience, hacking, learning | 18 Comments » I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them: The best way to get started in data science is to DO data science! First, data scientists do three fundamentally different things: math, code (and engineer systems), and communicate. Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities. Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind, that need data skills put to work for good. No matter how much of a beginner
5 0.083295777 27 hilary mason data-2009-04-02-From the ACM: Learning More About Active Learning
Introduction: From the ACM: Learning More About Active Learning Posted: April 2, 2009 | Author: hilary | Filed under: blog | Tags: acm, active learning, machine learning | 2 Comments » The April edition of Communications of the ACM has an interesting article on recent advances in active learning by Graeme Stemp-Morlock. In passive learning (a more traditional approach), you build a large training set of classified data by (often) manually assigning labels. This data is used as the basis of your analysis. In the real world, we find that generating these large sets of labeled data is often expensive and time-consuming. With active learning, you identify the most ambiguous data to label, resulting in a much higher payoff for each label defined (and fewer headaches for your labelers). The article goes on to mention that active learning is being used in practice with excellent results (for example in music identification, text classification, and even bioinfo
6 0.073837884 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails
7 0.071188211 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
8 0.063635379 33 hilary mason data-2009-10-03-Hadoop World NYC
9 0.06326966 9 hilary mason data-2007-08-27-Second Life Community Convention
10 0.062276363 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
11 0.060001876 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
12 0.059164029 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
13 0.058653608 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
14 0.053844448 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
15 0.052348375 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
16 0.052177075 36 hilary mason data-2009-11-10-My code is on TV (and so am I)!
17 0.04869578 3 hilary mason data-2007-06-08-The Best Time to Search for Academic Jobs
18 0.044218078 53 hilary mason data-2011-03-11-Conference: PyCon 2011 Keynote!
19 0.043371879 5 hilary mason data-2007-07-17-Where the Sun Rises… in Second Life
20 0.043281045 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
topicId topicWeight
[(0, -0.172), (1, -0.032), (2, -0.02), (3, -0.002), (4, -0.063), (5, 0.016), (6, 0.014), (7, -0.073), (8, -0.013), (9, -0.125), (10, 0.076), (11, -0.078), (12, 0.011), (13, 0.137), (14, -0.081), (15, -0.023), (16, 0.042), (17, 0.029), (18, -0.088), (19, 0.107), (20, 0.126), (21, 0.023), (22, -0.143), (23, -0.082), (24, -0.283), (25, 0.05), (26, 0.098), (27, 0.037), (28, -0.025), (29, -0.244), (30, 0.138), (31, 0.02), (32, 0.078), (33, -0.214), (34, -0.161), (35, 0.097), (36, -0.03), (37, -0.112), (38, -0.096), (39, 0.113), (40, 0.014), (41, 0.223), (42, -0.065), (43, -0.115), (44, -0.057), (45, 0.239), (46, 0.016), (47, 0.045), (48, -0.062), (49, 0.027)]
simIndex simValue blogId blogTitle
same-blog 1 0.98398137 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
2 0.4223083 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious
3 0.27724412 27 hilary mason data-2009-04-02-From the ACM: Learning More About Active Learning
4 0.2557992 82 hilary mason data-2013-01-08-Bitly Social Data APIs
5 0.25127661 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails
Introduction: Twitter Succeeds Because it Fails Posted: September 4, 2010 | Author: hilary | Filed under: blog | Tags: failure, twitter | 9 Comments » How can twitter be so popular and successful if it’s down all the time? We base statements like this on the assumption that the quality of a web application maps linearly to the application’s stability. This is obviously true for most sites most of the time, but things get interesting at the edge, where rare, unpredictable failure actually enables more complex human interactions around the service. Unlike e-mail, twitter etiquette doesn’t demand that you read or reply to every message from every person you follow (or who follows you). Combine that lightweight social touch with occasional technical issues and human communication patterns, and we start to see some interesting behavior. Twitter’s lack of reliability as a platform allows us to use the technical failings to mask our own social imperfections. How often have
6 0.24902566 5 hilary mason data-2007-07-17-Where the Sun Rises… in Second Life
7 0.24259381 9 hilary mason data-2007-08-27-Second Life Community Convention
8 0.23353849 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
9 0.22978479 80 hilary mason data-2012-12-28-Getting Started with Data Science
10 0.22893262 67 hilary mason data-2011-10-31-Happy Halloween
11 0.22394784 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
12 0.21951327 36 hilary mason data-2009-11-10-My code is on TV (and so am I)!
13 0.21941859 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
14 0.21092831 107 hilary mason data-2013-08-31-In Search of the Optimal … Cheeseburger
15 0.20424412 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
16 0.19110455 100 hilary mason data-2013-04-05-Speaking: 1 Kitten per Equation
17 0.18858455 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
18 0.18492141 2 hilary mason data-2006-05-04-Intro to the Linux Command Line
19 0.16970247 62 hilary mason data-2011-09-25-Conference: Strata NY 2011
20 0.1696652 102 hilary mason data-2013-05-03-Speaking: Explaining Technical Information to a Mixed Audience
topicId topicWeight
[(2, 0.088), (4, 0.624), (56, 0.148)]
simIndex simValue blogId blogTitle
same-blog 1 0.97112346 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
2 0.83891445 13 hilary mason data-2008-01-22-Create a group Twitter account
Introduction: Create a group Twitter account Posted: January 22, 2008 | Author: hilary | Filed under: blog | Tags: hack, social networking, twitter, web apps, web dev | 13 Comments » Twitter rocks. It’s useful for all kinds of things, but especially for chronicling a live event as it happens, including the pre-event discussion and post-conference wrapup. We’re very excited to be hosting NewB Camp here in Providence, RI on February 23rd. In preparation for the event, Sara created a NewBCamp Twitter account and I coded up this quick script to pull in all tweets related to the conference. It examines all of your followers’ tweets for a particular phrase or tag, and then reposts those tweets containing the tag to its own timeline with the author’s name prepended. I’m running this as a cron job on my hosting account. You can see it in action here. This is a quick hack. It has a couple of issues that I’m aware of: Someone has to log in and manually add
3 0.26147202 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email, hack, twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com, log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
4 0.25305024 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
Introduction: Startups: Why to Share Data with Academics Posted: January 28, 2013 | Author: Hilary Mason | Filed under: blog | 5 Comments » Last week I wrote a bit about how to share data with academics. This is the complementary piece, on why you should invest the time and energy in sharing your data with the academic community. As I was talking to people about this topic, it became clear that there are really two different questions people ask. First, why do this at all? And second, what do I tell my boss? Let’s start with the second one. This is what you should tell your boss: Academic research based on our work is a great press opportunity and demonstrates that credible people outside of our company find our work interesting. Having researchers work on our data is an easy way to access highly educated brainpower, for free, that in no way competes with us. Who knows what interesting stuff they’ll come up with? Personal relationships with university faculty ar
5 0.2504403 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
Introduction: Need actual random numbers? Meet the NIST randomness beacon. Posted: September 30, 2013 | Author: Hilary Mason | Filed under: projects | Tags: beacon, python, random, randomness, randomnumbers | 5 Comments » I wrote a python module that wraps the NIST Randomness Beacon, making it simple to get truly random numbers in python. It’s easy to use: b = Beacon() print b.last_record() print b.previous_record() #and so on There’s also a handy generator for getting a set of n random numbers. (One of the best gifts I ever got was a copy of 1,000,000 Random Numbers, and I’ve been intrigued ever since.) Please note that the randomness beacon is not intended to be a source of cryptographic keys — indeed, it’s a public set of numbers, so I wouldn’t recommend doing anything that could be compromised by someone else having access to the exact same set of numbers. Rather, this is interesting precisely for the scientific opportunities that
6 0.24852189 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
7 0.24583048 58 hilary mason data-2011-06-22-My Head is Open Source!
8 0.23711991 82 hilary mason data-2013-01-08-Bitly Social Data APIs
9 0.23685288 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas
10 0.23588136 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
11 0.22376302 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
12 0.22244042 33 hilary mason data-2009-10-03-Hadoop World NYC
13 0.20889008 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
14 0.20754817 35 hilary mason data-2009-10-17-Yahoo OpenHackNYC: The Del.icio.us Cake
15 0.2036038 105 hilary mason data-2013-07-05-Speaking: Spend at least 1/3 of the time practicing the talk
16 0.19982868 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails
17 0.19859791 80 hilary mason data-2012-12-28-Getting Started with Data Science
18 0.1911847 28 hilary mason data-2009-04-28-LSL: AOL IM Status Indicator
19 0.1898115 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
20 0.18937722 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.