hilary_mason_data hilary_mason_data-2009 hilary_mason_data-2009-33 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Hadoop World NYC. Posted: October 3, 2009 | Author: hilary | Filed under: academics, blog | Tags: conference, data analysis, hadoop | 9 Comments. Yesterday, I attended the first Hadoop World NYC conference. Hadoop is a platform for scalable distributed computing. In essence, it makes analyzing large quantities of data much faster, and analyzing very large quantities of data possible. Cloudera did a great job organizing the conference, and managed to assemble a diverse set of speakers. The sessions covered everything from academic research to fraud detection to bioinformatics and even helping people fall in love (eHarmony uses Hadoop)! I’m not going to review every session, but I saw several themes emerging from the content and conversations. Hadoop is Getting Easier: New integrated UIs like Cloudera Desktop and Karmasphere mean that developers will no longer be required to use a command-line interface to configure and execute Hadoop job
sentIndex sentText sentNum sentScore
1 Hadoop World NYC. Posted: October 3, 2009 | Author: hilary | Filed under: academics, blog | Tags: conference, data analysis, hadoop | 9 Comments. Yesterday, I attended the first Hadoop World NYC conference. [sent-1, score-0.923]
2 In essence, it makes analyzing large quantities of data much faster, and analyzing very large quantities of data possible. [sent-3, score-0.906]
3 The sessions covered everything from academic research to fraud detection to bioinformatics and even helping people fall in love (eHarmony uses Hadoop)! [sent-5, score-0.237]
4 IBM’s M2 project hides Hadoop behind a spreadsheet metaphor, making the collection, analysis and visualization of data as easy as using Excel. [sent-8, score-0.213]
5 This doesn’t just speed up development time, it puts the tools for manipulating the data directly in the hands of the people who need the results, without requiring them to talk to a database programmer. [sent-9, score-0.486]
6 Hadoop is a Utility: The only organizations that talked about building their own Hadoop clusters were those that deal with very sensitive data (VISA) and those that deal with very, very large quantities of data (Yahoo, Facebook, eBay). [sent-6, score-0.805]
7 Organizations with more manageable data sets, such as eHarmony and the New York Times, use EC2 and Amazon’s Elastic Map-Reduce. [sent-11, score-0.096]
8 Amazon , Rackspace , and Softlayer have offerings in this area and were all event sponsors. [sent-12, score-0.077]
9 Hadoop Can Talk to Your Existing Systems Hadoop has an ecosystem of supporting products that allow organizations to adapt their existing infrastructure. [sent-14, score-0.349]
10 Cloudera’s Sqoop (which is just fun to say out loud) is a tool for importing data from SQL databases, HBase is a Hadoop database, and Pig lets you talk to the system in a SQL-like language. [sent-15, score-0.222]
11 I expect we’ll see more information available in the near future to clarify which systems are more appropriate for which kinds of users (an ecosystem decision tree? [sent-16, score-0.201]
12 Hadoop is Changing Things I heard the phrase “an order of magnitude improvement in speed” so many times that I lost count. [sent-18, score-0.069]
13 Speaking from personal experience, the difference in productivity between waiting minutes or hours for results and waiting days is immense. [sent-19, score-0.21]
14 When you can see the answer to a question shortly after you ask it you can preserve the context you need to act on that answer immediately without having to spend the time to figure out why you were asking that question in the first place. [sent-20, score-0.389]
15 Most of the projects were doing fairly simple analysis over data like web user sessions or transactions. [sent-21, score-0.302]
16 I was intrigued by Deepak Singh’s talk on bioinformatics and genome sequencing (slides) and Jake Hofman’s talk on social network analysis (slides). [sent-22, score-0.529]
17 More and more massive datasets are becoming available and will drive techniques for new analysis. [sent-23, score-0.116]
18 I do wish there had been a talk about Mahout , which is a very promising approach to developing machine learning algorithms on the Hadoop platform. [sent-24, score-0.082]
19 I left the event more excited about the technology and very enthusiastic about the community. [sent-25, score-0.077]
20 Settle in… Pete Skomoroch posted his slides and thoughts http://jakehofman. [sent-28, score-0.176]
wordName wordTfidf (topN-words)
[('hadoop', 0.71), ('cloudera', 0.156), ('organizations', 0.156), ('quantities', 0.156), ('session', 0.156), ('analysis', 0.117), ('analyzing', 0.104), ('bioinformatics', 0.104), ('deepak', 0.104), ('eharmony', 0.104), ('existing', 0.104), ('singh', 0.104), ('speed', 0.104), ('thoughts', 0.104), ('large', 0.097), ('data', 0.096), ('sessions', 0.089), ('answer', 0.089), ('ecosystem', 0.089), ('talk', 0.082), ('notes', 0.08), ('deal', 0.08), ('waiting', 0.08), ('event', 0.077), ('amazon', 0.074), ('slides', 0.072), ('times', 0.069), ('new', 0.065), ('database', 0.065), ('systems', 0.061), ('http', 0.058), ('question', 0.058), ('without', 0.051), ('available', 0.051), ('nyc', 0.051), ('results', 0.05), ('world', 0.048), ('manipulating', 0.044), ('fall', 0.044), ('faster', 0.044), ('lets', 0.044), ('utility', 0.044), ('development', 0.044), ('metaphor', 0.044), ('talked', 0.044), ('steve', 0.044), ('act', 0.044), ('assemble', 0.044), ('databases', 0.044), ('elastic', 0.044)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000006 33 hilary mason data-2009-10-03-Hadoop World NYC
2 0.53945202 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
Introduction: Should you attend Hadoop World? Yes. Posted: August 15, 2010 | Author: hilary | Filed under: blog | Tags: conferences, hadoop, hadoopworld, questions | 2 Comments. I received this e-mail via my contact form: I just discovered you via a Google search because I’m highly considering attending this year’s upcoming Hadoop World in NYC. I appreciate your page that you wrote up after attending last year’s event. I’m wondering if you feel that Hadoop has enough momentum and support to be a “here to stay” technology worth investing one’s time and education into, or is it possible it might fade and be deprecated by something else as the need for big data analysis continues to grow? … I’ve had a few similar conversations with people lately, and I thought posting my response might help others making similar decisions. The e-mail is referencing my post from last year’s Hadoop World NYC. Thanks for reaching out. There are several questions in your messa
3 0.097795956 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
Introduction: Speaking: Spend at least 1/3 of the time practicing the talk Posted: July 5, 2013 | Author: Hilary Mason | Filed under: speaking | 3 Comments » This week we welcome a guest contribution. Matthew Trentacoste is a recovering academic and a computer scientist at Adobe, where he writes software to make pretty pictures. He’s constantly curious, often about data, and cooks a lot. You can follow his exploits at @mattttrent . In Hilary’s last post, she made the point that your slides != your talk . In a well-crafted talk, your message — in the form of the words you say — needs to dominate while the slides need to play a supporting role. Speak the important parts, and use your slides as a backdrop for what you’re saying. Hilary has provided a valuable strategy in her post, but how should someone approach crafting such a clearly-organized presentation? If you’re just getting started speaking, it can be a real challenge to make a coherent talk and along with slid
4 0.093880415 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
Introduction: My NYC Python Meetup Presentation: Practical Data Analysis in Python. Posted: August 12, 2009 | Author: hilary | Filed under: blog | Tags: data, data analysis, nltk, presentations, python, spam, twitter | Leave a comment. I gave a talk at the NYC Python Meetup on July 29 on Practical Data Analysis in Python. I tend to use my slides for visual representations of the concepts I’m discussing, so there’s a lot of content that was in the presentation that you unfortunately won’t see here. The talk starts with the immense opportunities for knowledge derived from data. I spent some time showing data systems ‘in the wild’ along with the appropriate algorithmic vocabulary (for example, amazon.com’s ‘books you might like’ feature is a recommender system). Once we can describe the problems properly, we can look for tools, and Python has many! Finally, in the fun part of the presentation, I demoed working code that uses NLTK to build a Twitter sp
5 0.089395873 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
Introduction: Speaking: Your Slides != Your Talk. Posted: June 14, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: design, obama, slides | Leave a comment. Slides are the supporting structure for your talk, not the main event. Speak the meaty and informative portion of the presentation out loud and use slides as a backdrop to set either the emotional tone or reinforce the message that you are trying to convey. For example, I love using this image of Obama in Berlin as a backdrop when I talk about the growth of social data over the last several years. In this image every single person has a device and is generating their own data about their shared social experience. The content of the image supports what is otherwise a fairly abstract statement, and you can feel the excitement of the crowd, boosting the excitement that I want to share about the possibilities of social data. This particular style of slide design will fail for situations wher
6 0.080156997 113 hilary mason data-2013-11-22-Speaking: Two Questions to Ask Before You Give a Talk
7 0.077037022 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
8 0.07688988 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
9 0.075874485 80 hilary mason data-2012-12-28-Getting Started with Data Science
10 0.075748958 57 hilary mason data-2011-05-21-An Introduction to Machine Learning with Web Data is now available!
11 0.070220143 84 hilary mason data-2013-01-17-Need Data? Start Here
12 0.069543675 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
13 0.068970084 82 hilary mason data-2013-01-08-Bitly Social Data APIs
14 0.068366095 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
15 0.066825226 47 hilary mason data-2010-08-23-New York Times: Reinventing E-mail, One Message at a Time
16 0.064578518 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
17 0.063818306 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
18 0.063635379 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
19 0.063221879 99 hilary mason data-2013-04-01-Data Engineering
20 0.063153788 37 hilary mason data-2009-11-25-IgniteNYC: How to Replace Yourself with a Very Small Shell Script
topicId topicWeight
[(0, -0.264), (1, -0.074), (2, 0.059), (3, -0.073), (4, -0.11), (5, -0.01), (6, 0.115), (7, -0.131), (8, -0.669), (9, -0.115), (10, 0.064), (11, -0.158), (12, -0.209), (13, 0.041), (14, 0.039), (15, -0.092), (16, 0.03), (17, 0.035), (18, 0.122), (19, -0.093), (20, -0.085), (21, 0.091), (22, 0.016), (23, 0.041), (24, 0.005), (25, -0.074), (26, -0.012), (27, 0.008), (28, 0.052), (29, 0.064), (30, 0.003), (31, 0.036), (32, -0.026), (33, -0.017), (34, -0.045), (35, 0.098), (36, 0.025), (37, 0.07), (38, -0.033), (39, -0.029), (40, -0.07), (41, 0.068), (42, 0.047), (43, -0.007), (44, 0.031), (45, -0.03), (46, 0.045), (47, 0.001), (48, 0.031), (49, -0.081)]
simIndex simValue blogId blogTitle
same-blog 1 0.96345013 33 hilary mason data-2009-10-03-Hadoop World NYC
2 0.9257471 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
3 0.19660319 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data
Introduction: DataGotham: The Empire State of Data Posted: August 22, 2012 | Author: Hilary Mason | Filed under: blog , projects | 2 Comments » I’m extremely excited about DataGotham , a conference that I’m co-hosting with friends and fellow New York data nerds Drew , John , and Mike . DataGotham is a celebration of the NYC data community, and will bring together professionals from all industries in New York that are built around data, from finance to fashion and from startups to the Fortune 500 and government. The event is September 13th – 14th at NYU, with tutorials and The Great Data Extravaganza Show (with cocktails!) at the Tribeca Rooftop Thursday evening, and a single track conference Friday. Our speakers and sponsors are all amazing. You can register now . While DataGotham is definitely a labor of love, there are numerous reasons to do it. I believe that New York has a distinct data philosophy — the study of human behavior — that is unique and should be cel
4 0.18070397 82 hilary mason data-2013-01-08-Bitly Social Data APIs
Introduction: Bitly Social Data APIs Posted: January 8, 2013 | Author: Hilary Mason | Filed under: blog | Tags: api , bitly , data , dataset | 1 Comment » We just released a bunch of social data analysis APIs over at bitly . I’m really excited about this, as it’s offering developers the power to use social data in a way that hasn’t been available before. There are three types of endpoints and each one is awesome for a different reason. First, we share the analysis that we do at the link level. Every developer using data from the web has the same set of problems — what are the topics of those URLs? What are their keywords? Why should you rebuild this infrastructure when we’ve done it already? We’ve also added in a few bits of bitly magic — for example, you can use the /v3/link/location endpoint to see where in the world people are consuming that information from . Second, we’ve opened up access to a realtime search engine. That’s an actual search engine that retu
5 0.17307428 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
6 0.17168431 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!
7 0.17031902 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
8 0.16722572 30 hilary mason data-2009-06-01-My Barcamp Presentation: Have Data? What Now?!
9 0.16512942 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
10 0.16457041 49 hilary mason data-2010-11-10-Machine Learning: A Love Story
11 0.16451609 113 hilary mason data-2013-11-22-Speaking: Two Questions to Ask Before You Give a Talk
12 0.16373082 99 hilary mason data-2013-04-01-Data Engineering
13 0.161213 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
14 0.16075169 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
15 0.15560335 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
16 0.15499279 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious
17 0.14961097 84 hilary mason data-2013-01-17-Need Data? Start Here
18 0.14890188 80 hilary mason data-2012-12-28-Getting Started with Data Science
19 0.14853014 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
20 0.13815269 27 hilary mason data-2009-04-02-From the ACM: Learning More About Active Learning
topicId topicWeight
[(2, 0.079), (4, 0.028), (26, 0.025), (31, 0.055), (50, 0.024), (56, 0.105), (61, 0.027), (63, 0.033), (81, 0.018), (87, 0.019), (89, 0.017), (94, 0.461), (96, 0.013)]
simIndex simValue blogId blogTitle
1 0.91742909 12 hilary mason data-2007-10-24-Teen Second Life College Fair
Introduction: Teen Second Life College Fair Posted: October 24, 2007 | Author: hilary | Filed under: blog | Tags: education , second life | 1 Comment » I was immensely privileged to participate in the first ever Teen Second Life College Fair. The event was on the Eye4You Alliance TSL island. At least 18 institutions were represented (see some of the booths in the image to the left), and approximately 200 teens attended. I gave a short presentation on my own educational experiences and the incredible possibilities for careers in technology, but my favorite part of the college fair was the casual conversations that took place outside of the sessions and in the booth area. We talked about everything from education in Europe vs the US to tagging to SL building and scripting to politics… you get the idea! For educators and recruiters, this was a fantastic event for connecting with young people who are excited, passionate, and resourceful. The students were able to t
same-blog 2 0.89210397 33 hilary mason data-2009-10-03-Hadoop World NYC
3 0.86581671 106 hilary mason data-2013-08-12-DataGotham 2013 is coming!
Introduction: DataGotham 2013 is coming! Posted: August 12, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datagotham, nyc | Leave a comment. Registration is open for DataGotham 2013, our second annual New York data community conference, September 12th and 13th. The core of the conference is a series of brilliant data practitioners telling the stories about what they work on. The content is technically-oriented but not all deeply technical, and we really welcome anyone curious about how New York companies and institutions are pushing the boundaries on data to attend. We have two goals for the conference. The primary goal is to connect people in the greater New York data community who are working on interesting things. If our community is strong and supportive, we will all do better work. Our second goal is to highlight the amazing work happening here, so that people near and far will realize that New York is the best place in the world to do data science.
4 0.29888776 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
5 0.25839379 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email , hack , twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com , log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
6 0.24763274 42 hilary mason data-2010-04-18-Stop talking, start coding
7 0.2461139 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
8 0.2444887 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
9 0.24390803 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
10 0.23661168 58 hilary mason data-2011-06-22-My Head is Open Source!
11 0.23413295 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
12 0.22536099 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas
13 0.21799427 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
14 0.21351567 80 hilary mason data-2012-12-28-Getting Started with Data Science
15 0.20592223 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
16 0.20402743 13 hilary mason data-2008-01-22-Create a group Twitter account
17 0.20381443 71 hilary mason data-2012-01-26-Identity Slippage, and what’s the weirdest thing you’ve been e-mailed by accident?
18 0.20347664 82 hilary mason data-2013-01-08-Bitly Social Data APIs
19 0.2030137 76 hilary mason data-2012-08-28-How do you prioritize research?
20 0.20080756 6 hilary mason data-2007-07-27-Uninstall Programs … For Real.