hilary_mason_data hilary_mason_data-2010 knowledge-graph by maker-knowledge-mining
1 hilary mason data-2010-11-10-Machine Learning: A Love Story
Introduction: Machine Learning: A Love Story Posted: November 10, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: conferences , machinelearning , presentations , video | 27 Comments » The video from my keynote at Strange Loop 2010 is up! You can watch the video here: Machine Learning: A Love Story The original abstract: Machine learning has come a long way in recent years — from a long-marginalized field so old it still has the word “machine” in the name, to the last, best hope for making sense of our massive flows of data. The art of ‘data science’ is asking the right questions; the answers are generally trivial or impossible. This talk will focus more on questions than on answers. I’ll give a brief history of the field with a focus on the fundamental math and algorithmic tools that we use to address these kinds of problems, then walk through several descriptive and predictive scenarios. Finally, I’ll show one example syst
2 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails
Introduction: Twitter Succeeds Because it Fails Posted: September 4, 2010 | Author: hilary | Filed under: blog | Tags: failure , twitter | 9 Comments » How can twitter be so popular and successful if it’s down all the time? We base statements like this on the assumption that the quality of a web application maps linearly to the application’s stability. This is obviously true for most sites most of the time, but things get interesting at the edge, where rare, unpredictable failure actually enables more complex human interactions around the service. Unlike e-mail, twitter etiquette doesn’t demand that you read or reply to every message from every person you follow (or who follows you). Combine that lightweight social touch with occasional technical issues and human communication patterns, and we start to see some interesting behavior. Twitter’s lack of reliability as a platform allows us to use the technical failings to mask our own social imperfections. How often have
3 hilary mason data-2010-08-23-New York Times: Reinventing E-mail, One Message at a Time
Introduction: New York Times: Reinventing E-mail, One Message at a Time Posted: August 23, 2010 | Author: Hilary Mason | Filed under: Media | Tags: code , email , hacking , newyorktimes | Leave a comment » Nick Bilton did a writeup of my homegrown e-mail scripts in the New York Times!
4 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
Introduction: Should you attend Hadoop World? Yes. Posted: August 15, 2010 | Author: hilary | Filed under: blog | Tags: conferences , hadoop , hadoopworld , questions | 2 Comments » I received this e-mail via my contact form: I just discovered you via a Google search because I’m highly considering attending this year’s upcoming Hadoop World in NYC. I appreciate your page that you wrote up after attending last year’s event. I’m wondering if you feel that Hadoop has enough momentum and support to be a “here to stay” technology worth investing one’s time and education into, or is it possible it might fade and be deprecated by something else as the need for big data analysis continues to grow? … I’ve had a few similar conversations with people lately, and I thought posting my response might help others making similar decisions. The e-mail is referencing my post from last year’s Hadoop World NYC. Thanks for reaching out. There are several questions in your messa
5 hilary mason data-2010-07-26-A quick twitter bot, @bc_l
Introduction: A quick twitter bot, @bc_l Posted: July 26, 2010 | Author: hilary | Filed under: blog , projects | Tags: bc , bot , command line , hack , script , twitter , unix | 5 Comments » Several months ago, on a whim inspired by an off-hand comment from Chris, I created a bot to bring the wonders of the Unix bc language to twitter. bc is a command-line calculator that’s fast and has the capacity to do some fairly complex math. Try it out on the command line: echo '100 / 10' | bc -l …Or by sending a direct message to bc_l (if you follow bc_l it will follow you back within a few hours). I released the code under GPL, and it’s available on github: http://github.com/hmason/tweetbc . John Cook mentions the bot and makes some great observations in his post three surprises with bc.
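The core of such a bot is the same pipe shown above: hand the message text to `bc -l` on standard input and send back whatever it prints. The sketch below shows only that step, assuming `bc` is installed and on the `PATH`; the function name `evaluate` and the 5-second timeout are illustrative choices, not taken from the tweetbc code, and the Twitter side (reading direct messages, posting replies) is left out.

```python
import shutil
import subprocess


def evaluate(expression: str, timeout: float = 5.0) -> str:
    """Run `expression` through `bc -l`, like `echo '100 / 10' | bc -l`.

    Hypothetical helper for illustration; not the released tweetbc code.
    """
    if shutil.which("bc") is None:
        raise RuntimeError("bc is not installed or not on PATH")
    result = subprocess.run(
        ["bc", "-l"],            # -l loads the math library and sets scale=20
        input=expression + "\n", # bc reads the expression from stdin, exits at EOF
        capture_output=True,
        text=True,
        timeout=timeout,         # raises subprocess.TimeoutExpired on long-running input
    )
    # bc prints results on stdout and syntax errors on stderr.
    return (result.stdout or result.stderr).strip()
```

For example, `evaluate("100 / 10")` returns bc's answer to the command-line example above. The timeout matters for a public bot, since anyone can send an expression that keeps bc busy for a long time.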
6 hilary mason data-2010-06-24-Conference: Web2 Expo SF
Introduction: Conference: Web2 Expo SF Posted: June 24, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: bitly , conference , data , presentation , realtime , web2expo | 6 Comments » I gave a talk called A Data-driven Look at the Realtime Web Ecosystem at the Web2Expo SF conference in May in San Francisco. I attempted to highlight some of the interesting facets of the bit.ly data set, and it appeared to be well-received (showing up on TechCrunch, ZDNet, and a few other places). I attended the full conference, and it was great. The attendees were extremely international and I met a ton of fascinating people. I’m still getting a couple of e-mail requests per week for my slides and materials, so they’re posted below for posterity. The slides: A Data-driven Look at the Realtime Web. And the video: As always, I welcome your questions or comments.
7 hilary mason data-2010-05-27-E-mail automation, questions and answers
Introduction: E-mail automation, questions and answers Posted: May 27, 2010 | Author: hilary | Filed under: blog , projects | Tags: email , ignitenyc | 66 Comments » Welcome! I’ve gotten several hundred e-mails about my e-mail management code. I do want to share it as soon as possible. Here are the answers to the most common questions. Why separate scripts? My philosophy is based on the unix command-line tool model: each script should be simple and useful alone, but when combined together they become extremely powerful. Why don’t we have the code yet?! I had no idea the talk would be shared beyond the couple hundred people in the audience or that it would be so popular! I started my position at bit.ly the same day I gave that IgniteNYC presentation, and I also have some other awesome projects that are competing for time. I have to admit that the trained classifiers are all based on my personal data and were also trained mostly through tweaking in ipython.
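The unix-tool philosophy described above (small pieces that are each useful alone and powerful in combination) can be illustrated with Python generator filters chained like a shell pipeline. This is a hypothetical sketch of the composition idea only; the message format (dicts with `from`, `subject`, and `unread` keys) and the names `unread`, `from_domain`, and `pipe` are assumptions, not the actual e-mail scripts from the talk.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical message shape: {"from": str, "subject": str, "unread": bool}
Message = dict
Filter = Callable[[Iterable[Message]], Iterator[Message]]


def unread(messages: Iterable[Message]) -> Iterator[Message]:
    """Keep only messages that have not been read yet."""
    return (m for m in messages if m.get("unread"))


def from_domain(domain: str) -> Filter:
    """Build a filter that keeps messages sent from a given domain."""
    def _filter(messages: Iterable[Message]) -> Iterator[Message]:
        return (m for m in messages if m.get("from", "").endswith("@" + domain))
    return _filter


def pipe(messages: Iterable[Message], *filters: Filter) -> list:
    """Feed the output of each filter into the next, like `a | b | c`."""
    for f in filters:
        messages = f(messages)
    return list(messages)
```

Each filter stays trivially small and testable on its own, while `pipe(inbox, unread, from_domain("example.com"))` combines them without either one knowing about the other.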
8 hilary mason data-2010-04-18-Stop talking, start coding
Introduction: Stop talking, start coding Posted: April 18, 2010 | Author: hilary | Filed under: blog | 65 Comments » I read Out of the Loop in Silicon Valley in the NYTimes today, which explores how and why women are under-represented in tech startups. From the number of retweets I saw and the clicks through bit.ly links (12,579 at the time of this posting), it’s been getting a lot of attention. There are some very strong, compelling themes in this article. Computer science and engineering have an “image problem”; the way we teach math to elementary school students is horrible and turns way too many away. I don’t want to nitpick the article, but there are a few statements that reinforce the very damaging stereotypes that the article sets out to dispel. “When women take on the challenges of an engineering or computer science education in college, some studies suggest that they struggle against a distinct set of personal, psycho-social issues… Even women who soldier
9 hilary mason data-2010-03-14-Art and Technology: Seven on Seven
Introduction: Art and Technology: Seven on Seven Posted: March 14, 2010 | Author: hilary | Filed under: blog , Presentations | Tags: art , hack , museum | 1 Comment » I’m honored and excited to be participating in Rhizome’s new conference Seven on Seven, where technologists and artists are paired up to create a completely new project in 24 hours. The formal description: Seven on Seven will pair seven leading artists with seven game-changing technologists in teams of two, and challenge them to develop something new –be it an application, social media, artwork, product, or whatever they imagine– over the course of a single day. The seven teams will unveil their ideas at a one-day event at the New Museum on April 17th. I really love this idea because the time constraints and the inherent discomfort of the situation (working in an unfamiliar space with an unfamiliar person) make it likely that we’ll be able to accomplish something creative and unexpected. Or
10 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
Introduction: Conference: Search and Social Media 2010 Posted: February 16, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: algorithms , conference , research , search | 2 Comments » I recently attended the Third Annual Workshop on Search and Social Media, an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!). Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog, so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year. Social search consists of a set of problems including (but hardly limited to) search of social content like status updat
11 hilary mason data-2010-01-03-SMS to e-mail gateway: The SMS doorbell
Introduction: SMS to e-mail gateway: The SMS doorbell Posted: January 3, 2010 | Author: hilary | Filed under: blog , projects | Tags: code , nycresistor , python , textmarks | 5 Comments » Over at NYC Resistor, it was getting cold, and we needed a doorbell so visitors wouldn’t be stranded outside when the building was locked. A standard wireless model didn’t work reliably (the space is on the fifth floor, just out of range), so various members generally resorted to writing their phone numbers on a sign on the front door when they were expecting guests. Since almost everyone has a mobile phone already, an SMS-based solution seemed appropriate. In order to implement this, we need two things: an SMS shortcode, and a system that sends a notification when the shortcode is triggered. It’s irritating and expensive to acquire your own shortcode, but there are several services that will allow you to use one in exchange for a small fee or advertisements in your messages. TextMarks is my
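The second piece, the notification system, amounts to turning an incoming SMS into an e-mail. Below is a minimal sketch in Python using only the standard library, assuming the shortcode service calls your server with a query string; the parameter names `phone` and `text`, the function names, and the SMTP host are all hypothetical placeholders, not TextMarks' documented callback format or the original doorbell code.

```python
import smtplib
from email.message import EmailMessage
from urllib.parse import parse_qs


def doorbell_email(query_string: str, to_addr: str, from_addr: str) -> EmailMessage:
    """Build a notification e-mail from a (hypothetical) shortcode callback."""
    params = parse_qs(query_string)
    # "phone" and "text" are assumed parameter names for this sketch.
    phone = params.get("phone", ["unknown number"])[0]
    text = params.get("text", [""])[0].strip() or "(no message)"

    msg = EmailMessage()
    msg["Subject"] = f"Doorbell: {phone} is at the door"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(f"SMS from {phone}:\n\n{text}\n")
    return msg


def send(msg: EmailMessage, host: str = "localhost") -> None:
    """Deliver the message; assumes an SMTP server is reachable at `host`."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A web handler would call `doorbell_email` with the request's query string and pass the result to `send`. Sending to a members' mailing list rather than one person means whoever is in the space can go down and open the door.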