hilary_mason_data hilary_mason_data-2009 hilary_mason_data-2009-38 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: IgniteNYC: The video! Posted: December 24, 2009 | Author: hilary | Filed under: academics , blog | Tags: presentation , python | 15 Comments » The video of my IgniteNYC presentation is up, and has gotten a great response! I’m working on removing the me-specific bits from the code and I’ll be posting it as open-source very soon!
sentIndex sentText sentNum sentScore
1 Posted: December 24, 2009 | Author: hilary | Filed under: academics , blog | Tags: presentation , python | 15 Comments » The video of my IgniteNYC presentation is up, and has gotten a great response! [sent-2, score-1.732]
2 I’m working on removing the me-specific bits from the code and I’ll be posting it as open-source very soon! [sent-3, score-0.766]
wordName wordTfidf (topN-words)
[('ignitenyc', 0.528), ('presentation', 0.34), ('video', 0.34), ('bits', 0.264), ('december', 0.264), ('soon', 0.264), ('posting', 0.243), ('response', 0.227), ('gotten', 0.214), ('python', 0.185), ('academics', 0.185), ('working', 0.148), ('code', 0.111), ('great', 0.108), ('ll', 0.091), ('comments', 0.022), ('blog', 0.02), ('tags', 0.014)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 38 hilary mason data-2009-12-24-IgniteNYC: The video!
2 0.22579703 37 hilary mason data-2009-11-25-IgniteNYC: How to Replace Yourself with a Very Small Shell Script
Introduction: IgniteNYC: How to Replace Yourself with a Very Small Shell Script Posted: November 25, 2009 | Author: hilary | Filed under: blog , Presentations | Tags: email , ignitenyc , presentations , scripts | 15 Comments » I recently gave a talk at IgniteNYC on How to Replace Yourself with a Very Small Shell Script. The Ignite events are a fun blend of performance, technology, and speaking skill. Each presenter gives a five-minute talk with twenty slides that auto-advance after 15 seconds. The title of my talk is a classic geek reference (you can get the t-shirt). I’m very interested in developing automated techniques for handling the massive and growing amounts of information that we all have to deal with. I started with e-mail and Twitter, both of which are easy to access programmatically (via IMAP and the Twitter API). In the talk, I went through several of the simple and successful e-mail management scripts that I’ve developed. I decided to
3 0.17987248 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
Introduction: My NYC Python Meetup Presentation: Practical Data Analysis in Python Posted: August 12, 2009 | Author: hilary | Filed under: blog | Tags: data , data analysis , nltk , presentations , python , spam , twitter | Leave a comment » I gave a talk at the NYC Python Meetup on July 29 on Practical Data Analysis in Python. I tend to use my slides for visual representations of the concepts I’m discussing, so there’s a lot of content that was in the presentation that you unfortunately won’t see here. The talk starts with the immense opportunities for knowledge derived from data. I spent some time showing data systems ‘in the wild’ along with the appropriate algorithmic vocabulary (for example, amazon.com’s ‘books you might like’ feature is a recommender system). Once we can describe the problems properly, we can look for tools, and Python has many! Finally, in the fun part of the presentation, I demoed working code that uses NLTK to build a Twitter sp
4 0.13071211 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
Introduction: E-mail automation, questions and answers Posted: May 27, 2010 | Author: hilary | Filed under: blog , projects | Tags: email , ignitenyc | 66 Comments » Welcome! I’ve gotten several hundred e-mails about my e-mail management code. I do want to share it as soon as possible. Here are the answers to the most common questions. Why separate scripts? My philosophy is based on the Unix command-line tool model: each script should be simple and useful alone, but when combined together they become extremely powerful. Why don’t we have the code yet?! I had no idea the talk would be shared beyond the couple hundred people in the audience or that it would be so popular! I started my position at bit.ly the same day I gave that IgniteNYC presentation, and I also have some other awesome projects that are competing for time. I have to admit that the trained classifiers are all based on my personal data and were also trained mostly through tweaking in IPython.
5 0.1294055 51 hilary mason data-2011-02-11-Interview on Silicon Angle TV
Introduction: Interview on Silicon Angle TV Posted: February 11, 2011 | Author: Hilary Mason | Filed under: Media | Tags: conference , press , strataconf | 2 Comments » You can catch an interview (or see a writeup) that I did live from the Strata Conference on Silicon Angle TV! We talk about bit.ly data, politics, and touch briefly on some of the interesting problems that we’re working on. Full video: [I removed the video embed because it was annoyingly auto-playing in some browsers. You can still see the video here.]
6 0.12155474 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
7 0.099833339 57 hilary mason data-2011-05-21-An Introduction to Machine Learning with Web Data is now available!
8 0.085897565 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
9 0.079273775 61 hilary mason data-2011-08-24-bash: get http response codes for a list of URLs
10 0.065407254 72 hilary mason data-2012-03-17-Short URLs, Big Fun: I spoke at dropbox!
11 0.052472476 52 hilary mason data-2011-02-13-Betaworks Builds a Makerbot
12 0.052242428 53 hilary mason data-2011-03-11-Conference: PyCon 2011 Keynote!
13 0.05020649 80 hilary mason data-2012-12-28-Getting Started with Data Science
14 0.049846224 36 hilary mason data-2009-11-10-My code is on TV (and so am I)!
15 0.049455248 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
16 0.048275061 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
17 0.045978129 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
18 0.044119965 68 hilary mason data-2011-11-06-80 Million Links a Day Don’t Lie: Fast Company interview!
19 0.041369807 39 hilary mason data-2010-01-03-SMS to e-mail gateway: The SMS doorbell
20 0.037229236 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
topicId topicWeight
[(0, -0.138), (1, -0.038), (2, 0.084), (3, -0.013), (4, -0.28), (5, -0.134), (6, -0.163), (7, 0.048), (8, -0.021), (9, 0.315), (10, 0.023), (11, -0.168), (12, 0.129), (13, -0.193), (14, 0.199), (15, 0.132), (16, 0.061), (17, -0.12), (18, -0.01), (19, 0.155), (20, -0.015), (21, -0.007), (22, 0.049), (23, 0.044), (24, -0.156), (25, 0.083), (26, -0.008), (27, 0.108), (28, -0.083), (29, 0.121), (30, -0.015), (31, 0.015), (32, -0.003), (33, 0.099), (34, -0.086), (35, -0.133), (36, -0.05), (37, 0.066), (38, 0.079), (39, 0.084), (40, -0.169), (41, -0.012), (42, 0.053), (43, -0.039), (44, -0.067), (45, -0.019), (46, 0.02), (47, -0.118), (48, -0.163), (49, 0.053)]
simIndex simValue blogId blogTitle
same-blog 1 0.99469638 38 hilary mason data-2009-12-24-IgniteNYC: The video!
2 0.49136361 37 hilary mason data-2009-11-25-IgniteNYC: How to Replace Yourself with a Very Small Shell Script
3 0.44727591 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
4 0.39823538 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
5 0.36124256 51 hilary mason data-2011-02-11-Interview on Silicon Angle TV
6 0.2918008 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
7 0.24474196 57 hilary mason data-2011-05-21-An Introduction to Machine Learning with Web Data is now available!
8 0.21903957 61 hilary mason data-2011-08-24-bash: get http response codes for a list of URLs
9 0.21114896 55 hilary mason data-2011-03-27-Gitmarks: a peer-to-peer bookmarking system
10 0.20916316 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
11 0.20547082 72 hilary mason data-2012-03-17-Short URLs, Big Fun: I spoke at dropbox!
12 0.18025681 36 hilary mason data-2009-11-10-My code is on TV (and so am I)!
13 0.1611083 12 hilary mason data-2007-10-24-Teen Second Life College Fair
14 0.15143679 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
15 0.13740142 52 hilary mason data-2011-02-13-Betaworks Builds a Makerbot
16 0.13559183 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
17 0.12353023 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
18 0.12259041 68 hilary mason data-2011-11-06-80 Million Links a Day Don’t Lie: Fast Company interview!
19 0.11538924 80 hilary mason data-2012-12-28-Getting Started with Data Science
20 0.11088702 39 hilary mason data-2010-01-03-SMS to e-mail gateway: The SMS doorbell
topicId topicWeight
[(2, 0.014), (87, 0.771)]
simIndex simValue blogId blogTitle
same-blog 1 0.96251214 38 hilary mason data-2009-12-24-IgniteNYC: The video!
2 0.90004802 99 hilary mason data-2013-04-01-Data Engineering
Introduction: Data Engineering Posted: April 1, 2013 | Author: Hilary Mason | Filed under: blog | Tags: bitly , data , engineering , infrastructure | 5 Comments » Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system. It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it. Developing these types of systems requires an initial research phase, where you do the necessary work to understand the characteristics of the data, before you design the system (and perhaps even requiring an active experimental process where you try multiple infrastructure options in the wild before making a final decision). I’ve seen numerous people run straight into walls when
3 0.15115674 8 hilary mason data-2007-08-19-Curriculum Design as Software Engineering
Introduction: Curriculum Design as Software Engineering Posted: August 19, 2007 | Author: hilary | Filed under: blog | Tags: education | 1 Comment » This summer, I’ve been involved in the process of creating a new undergraduate curriculum essentially from scratch. I was reflecting back on this process, and I realized the development of a robust and relevant curriculum shares many attributes with the process of developing robust and functional software. Modern software development is a largely modular process. Each component of a system interacts with every other component through a defined interface. I see this same behavior in a degree program – each course has certain incoming requirements and defined outcomes. Students navigate through a narrative of courses that must fit together to equal a bachelor’s degree. Unit testing is the practice of separating out each module in a software system and ensuring that it functions correctly. The final system will contain many
4 0.14608827 42 hilary mason data-2010-04-18-Stop talking, start coding
Introduction: Stop talking, start coding Posted: April 18, 2010 | Author: hilary | Filed under: blog | 65 Comments » I read Out of the Loop in Silicon Valley in the NYTimes today, which explores how and why women are under-represented in tech startups. From the number of retweets I saw and the clicks through bit.ly links (12,579 at the time of this posting), it’s been getting a lot of attention. There are some very strong, compelling themes in this article. Computer science and engineering do have an “image problem”; the way we teach math to elementary school students is horrible and turns way too many away. I don’t want to nitpick the article, but there are a few statements that reinforce the very damaging stereotypes that the article sets out to dispel. “When women take on the challenges of an engineering or computer science education in college, some studies suggest that they struggle against a distinct set of personal, psycho-social issues… Even women who soldier
5 0.10100639 76 hilary mason data-2012-08-28-How do you prioritize research?
Introduction: How do you prioritize research? Posted: August 28, 2012 | Author: Hilary Mason | Filed under: blog | Tags: datascience , startups | 14 Comments » One of the most fun and challenging parts of my job is setting bitly’s research agenda. We’re a startup, so this means prioritizing the set of questions we look into in the context of what will be most beneficial for the rest of the business, for the short and long-term, by creating opportunity and opening up potential futures. We work on a wide variety of projects, from pure research to press collaborations to infrastructure and experimental products. We always have a list of research questions way longer than we have time and resources to pursue, so we developed a process for evaluating whether a given question is worth pursuing at a particular time. This is the kind of process that I’ve only discussed with several people over whisky (thanks!), but not seen written up. I initially had a much longer list o
6 0.08909025 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
7 0.080080144 74 hilary mason data-2012-08-19-Why I love New York City
8 0.079420142 68 hilary mason data-2011-11-06-80 Million Links a Day Don’t Lie: Fast Company interview!
9 0.077925719 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
10 0.07602676 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
11 0.060844891 33 hilary mason data-2009-10-03-Hadoop World NYC
12 0.060023945 21 hilary mason data-2008-09-26-What am I like? How about you?
13 0.058653034 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
14 0.056709796 113 hilary mason data-2013-11-22-Speaking: Two Questions to Ask Before You Give a Talk
15 0.056249671 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
16 0.052585591 93 hilary mason data-2013-03-01-Speaking: Pick a Vague and Specific Title for Your Talk
17 0.047998369 2 hilary mason data-2006-05-04-Intro to the Linux Command Line
18 0.047353506 54 hilary mason data-2011-03-18-Be Ballsy.
19 0.045859311 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!
20 0.041572712 80 hilary mason data-2012-12-28-Getting Started with Data Science