hilary_mason_data hilary_mason_data-2013 hilary_mason_data-2013-82 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Bitly Social Data APIs Posted: January 8, 2013 | Author: Hilary Mason | Filed under: blog | Tags: api, bitly, data, dataset | 1 Comment » We just released a bunch of social data analysis APIs over at bitly. I’m really excited about this, as it’s offering developers the power to use social data in a way that hasn’t been available before. There are three types of endpoints and each one is awesome for a different reason. First, we share the analysis that we do at the link level. Every developer using data from the web has the same set of problems: what are the topics of those URLs? What are their keywords? Why should you rebuild this infrastructure when we’ve done it already? We’ve also added in a few bits of bitly magic; for example, you can use the /v3/link/location endpoint to see where in the world people are consuming that information from. Second, we’ve opened up access to a realtime search engine. That’s an actual search engine that retu
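As a rough illustration of the link-level endpoint named above, the sketch below only builds a request URL and makes no network call. The path /v3/link/location comes from the post; the host name, the access_token parameter and the link parameter are assumptions made for illustration, and the v3 API may no longer be served.

```python
from urllib.parse import urlencode

# Assumption: host for the v3 API. The post itself only gives endpoint paths.
API_HOST = "https://api-ssl.bitly.com"


def build_request_url(path, access_token, **params):
    """Return a GET URL for an endpoint path such as /v3/link/location.

    The parameter names passed in (for example `link`) are hypothetical.
    """
    query = {"access_token": access_token}
    query.update(params)
    return f"{API_HOST}{path}?{urlencode(query)}"


# Hypothetical token and short link, used only to show the URL shape.
location_url = build_request_url(
    "/v3/link/location", "YOUR_TOKEN", link="http://bit.ly/example"
)
print(location_url)
```

Keeping URL construction separate from the HTTP call makes it easy to check the query string before sending anything to a rate-limited service.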
sentIndex sentText sentNum sentScore
1 Bitly Social Data APIs Posted: January 8, 2013 | Author: Hilary Mason | Filed under: blog | Tags: api, bitly, data, dataset | 1 Comment » We just released a bunch of social data analysis APIs over at bitly. [sent-1, score-1.217]
2 I’m really excited about this, as it’s offering developers the power to use social data in a way that hasn’t been available before. [sent-2, score-0.552]
3 There are three types of endpoints and each one is awesome for a different reason. [sent-3, score-0.089]
4 First, we share the analysis that we do at the link level. [sent-4, score-0.117]
5 Every developer using data from the web has the same set of problems — what are the topics of those URLs? [sent-5, score-0.274]
6 Why should you rebuild this infrastructure when we’ve done it already? [sent-7, score-0.069]
7 We’ve also added in a few bits of bitly magic; for example, you can use the /v3/link/location endpoint to see where in the world people are consuming that information from. [sent-8, score-0.801]
8 Second, we’ve opened up access to a realtime search engine. [sent-9, score-0.482]
9 That’s an actual search engine that returns results ranked by current attention and popularity. [sent-10, score-0.727]
10 Links are only retained for 24 hours, so you know that anything you see is actively receiving attention. [sent-11, score-0.105]
11 You can test it out with a human-friendly interface at rt. [sent-13, score-0.154]
12 Finally, we asked the question — what is the world paying attention to right now? [sent-15, score-0.578]
13 We have a system that tracks the rate of clicks – a proxy for attention – on phrases contained within the URLs being clicked through bitly. [sent-16, score-0.908]
14 Then we can look and see which phrases are currently receiving a disproportionate amount of attention. [sent-17, score-0.417]
15 We call these “bursting phrases”, and you can access them with the /v3/realtime/bursting_phrases endpoint. [sent-18, score-0.186]
16 It’s analogous to Twitter’s trending topics, but based on attention (what people do), not shares (what they say), and across the entire social web. [sent-19, score-0.767]
17 I’m extremely excited to see what people build with these tools. [sent-20, score-0.275]
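The "bursting phrases" idea in sentences 13 to 15 (phrases currently receiving a disproportionate amount of attention) can be sketched as a comparison between a phrase's share of clicks now and its share over a baseline period. This is not bitly's actual method, which the post does not describe; the function name, the smoothing constant and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def bursting_phrases(baseline_clicks, current_clicks, min_ratio=3.0, smoothing=1.0):
    """Return (phrase, ratio) pairs whose current click share exceeds baseline.

    ratio = (smoothed share of current clicks) / (smoothed share of baseline clicks).
    Additive smoothing keeps phrases unseen in the baseline from dividing by zero.
    """
    vocabulary = set(baseline_clicks) | set(current_clicks)
    base_total = sum(baseline_clicks.values()) + smoothing * len(vocabulary)
    current_total = sum(current_clicks.values()) + smoothing * len(vocabulary)

    ratios = {}
    for phrase in vocabulary:
        base_share = (baseline_clicks.get(phrase, 0) + smoothing) / base_total
        current_share = (current_clicks.get(phrase, 0) + smoothing) / current_total
        ratios[phrase] = current_share / base_share

    bursting = [(p, r) for p, r in ratios.items() if r >= min_ratio]
    return sorted(bursting, key=lambda pair: pair[1], reverse=True)


# Made-up click counts: a long baseline window and a short recent window.
baseline = Counter({"election": 500, "weather": 480, "meteor": 20})
current = Counter({"election": 50, "weather": 45, "meteor": 55})
result = bursting_phrases(baseline, current)
print(result)
```

In this toy data only "meteor" clears the threshold, because its share of recent clicks is far larger than its share of baseline clicks, while the other two phrases are roughly steady.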
wordName wordTfidf (topN-words)
[('attention', 0.344), ('phrases', 0.312), ('bitly', 0.268), ('apis', 0.208), ('receiving', 0.208), ('stream', 0.208), ('social', 0.191), ('topics', 0.178), ('realtime', 0.147), ('links', 0.138), ('urls', 0.138), ('paying', 0.138), ('search', 0.134), ('analysis', 0.117), ('access', 0.112), ('api', 0.107), ('see', 0.105), ('data', 0.096), ('excited', 0.096), ('world', 0.096), ('meets', 0.089), ('shares', 0.089), ('returns', 0.089), ('proxy', 0.089), ('power', 0.089), ('favorites', 0.089), ('hasn', 0.089), ('opened', 0.089), ('clicked', 0.089), ('magic', 0.089), ('added', 0.089), ('offers', 0.089), ('types', 0.089), ('domain', 0.08), ('bits', 0.08), ('test', 0.08), ('filter', 0.08), ('developers', 0.08), ('actual', 0.08), ('engine', 0.08), ('ve', 0.079), ('ability', 0.074), ('interface', 0.074), ('released', 0.074), ('call', 0.074), ('within', 0.074), ('people', 0.074), ('pull', 0.069), ('entire', 0.069), ('infrastructure', 0.069)]
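For readers unfamiliar with how a word-weight list like the one above is produced, here is a minimal tf-idf sketch in plain Python. The exact weighting and normalization behind the scores on this page are not stated, so the formula below (term frequency times log inverse document frequency) is an assumption, and the toy documents are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of non-empty token lists. Returns one {word: weight} dict per doc."""
    n_docs = len(docs)
    # Number of documents each word appears in.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def top_words(weight_by_word, n=5):
    """Return the n highest-weighted (word, weight) pairs."""
    return sorted(weight_by_word.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus, invented for illustration.
docs = [
    "attention phrases bitly attention data".split(),
    "data conference search data".split(),
    "data slides talk".split(),
]
scores = tfidf(docs)
print(top_words(scores[0], n=3))
```

A word such as "data" that appears in every toy document gets weight zero, which is consistent with common words ranking low in the listing above even when they occur often.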
simIndex simValue blogId blogTitle
same-blog 1 1.0 82 hilary mason data-2013-01-08-Bitly Social Data APIs
2 0.18136415 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
Introduction: Conference: Search and Social Media 2010 Posted: February 16, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: algorithms , conference , research , search | 2 Comments » I recently attended the Third Annual Workshop on Search and Social Media , an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!). Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog , so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year. Social search consists of a set of problems including (but hardly limited to) search of social content like status updat
3 0.18071339 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
Introduction: Startups: How to Share Data with Academics Posted: January 19, 2013 | Author: Hilary Mason | Filed under: blog | Tags: academics , data , research | 8 Comments » This post assumes that you want to share data. If you’re not convinced, don’t worry — that’s next on my list. You and your academic colleagues will benefit from having at least a quick chat about the research questions they want to address. I’ve read every paper I’ve been able to find that uses bitly data and all of the ones that acquired the data without our assistance had serious flaws, generally based on incorrect assumptions about the data they had acquired (this, unfortunately, makes me question the validity of most research done on commercial social data without cooperation from the subject company). The easiest way to share data is through your own API . Set generous rate limits where possible. Most projects are not realtime and they can gather the data (or, more likely, have a grad
4 0.11649574 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
Introduction: Conference: Web2 Expo SF Posted: June 24, 2010 | Author: hilary | Filed under: academics, blog, Presentations | Tags: bitly, conference, data, presentation, realtime, web2expo | 6 Comments » I gave a talk called A Data-driven Look at the Realtime Web Ecosystem at the Web2Expo SF conference in May in San Francisco. I attempted to highlight some of the interesting facets of the bit.ly data set, and it appeared to be well-received (showing up on TechCrunch, ZDNet, and a few other places). I attended the full conference, and it was great. The attendees were extremely international and I met a ton of fascinating people. I’m still getting a couple of e-mail requests per week for my slides and materials, so they’re posted below for posterity. The slides: A Data-driven Look at the Realtime Web View more presentations from Hilary Mason. And the video: As always, I welcome your questions or comments.
5 0.10658088 84 hilary mason data-2013-01-17-Need Data? Start Here
Introduction: Need Data? Start Here Posted: January 17, 2013 | Author: Hilary Mason | Filed under: projects | Tags: data, dataset | 12 Comments » Data scientists need data, and good data is hard to find. I put together this bitly bundle of research quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone. Have one to add? Let me know! (I’ve shared the bundle before, but this post can act as unofficial homepage for it.)
6 0.10469759 94 hilary mason data-2013-03-08-Speaking: Title Slides + Twitter = You Win
7 0.098395586 62 hilary mason data-2011-09-25-Conference: Strata NY 2011
8 0.094295196 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
9 0.093055844 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
10 0.089141242 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
11 0.086017333 99 hilary mason data-2013-04-01-Data Engineering
12 0.083259836 76 hilary mason data-2012-08-28-How do you prioritize research?
13 0.081216544 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails
14 0.07936231 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
15 0.077807896 95 hilary mason data-2013-03-17-Speaking: Entertain, Don’t Teach
16 0.076685101 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
17 0.071310721 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
18 0.071181215 80 hilary mason data-2012-12-28-Getting Started with Data Science
19 0.070997991 3 hilary mason data-2007-06-08-The Best Time to Search for Academic Jobs
20 0.070054039 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
topicId topicWeight
[(0, -0.287), (1, -0.014), (2, -0.056), (3, -0.151), (4, -0.079), (5, 0.067), (6, -0.01), (7, -0.167), (8, 0.033), (9, -0.105), (10, 0.182), (11, 0.047), (12, -0.027), (13, 0.087), (14, -0.083), (15, 0.105), (16, -0.14), (17, -0.069), (18, -0.051), (19, 0.055), (20, -0.026), (21, -0.196), (22, 0.133), (23, 0.022), (24, -0.125), (25, 0.057), (26, 0.03), (27, 0.057), (28, -0.037), (29, 0.094), (30, 0.19), (31, 0.019), (32, 0.093), (33, 0.076), (34, 0.036), (35, 0.075), (36, -0.104), (37, 0.091), (38, 0.085), (39, 0.107), (40, -0.001), (41, -0.035), (42, 0.025), (43, -0.04), (44, -0.006), (45, 0.032), (46, 0.172), (47, 0.239), (48, 0.097), (49, -0.044)]
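The topic-weight list above is a sparse vector of (topicId, weight) pairs, and the similarity values in the table that follows are presumably computed by comparing such vectors between posts. The page does not say which measure is used; cosine similarity is a common choice and is assumed here. The second vector in the example is made up.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    a, b = dict(vec_a), dict(vec_b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


post = [(0, -0.287), (1, -0.014), (2, -0.056)]  # first three entries of the listing
other = [(0, -0.250), (2, 0.100), (5, 0.030)]   # made-up vector for illustration

same = cosine_similarity(post, post)
different = cosine_similarity(post, other)
print(same, different)
```

Comparing a vector with itself gives a value of 1 up to floating-point error; a post's similarity to itself in a table like this can fall slightly below 1 if the stored vectors are approximated or truncated.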
simIndex simValue blogId blogTitle
same-blog 1 0.97587806 82 hilary mason data-2013-01-08-Bitly Social Data APIs
2 0.58063918 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
Introduction: Conference: Search and Social Media 2010 Posted: February 16, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: algorithms , conference , research , search | 2 Comments » I recently attended the Third Annual Workshop on Search and Social Media , an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!). Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog , so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year. Social search consists of a set of problems including (but hardly limited to) search of social content like status updat
3 0.51364863 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
Introduction: Startups: How to Share Data with Academics Posted: January 19, 2013 | Author: Hilary Mason | Filed under: blog | Tags: academics , data , research | 8 Comments » This post assumes that you want to share data. If you’re not convinced, don’t worry — that’s next on my list. You and your academic colleagues will benefit from having at least a quick chat about the research questions they want to address. I’ve read every paper I’ve been able to find that uses bitly data and all of the ones that acquired the data without our assistance had serious flaws, generally based on incorrect assumptions about the data they had acquired (this, unfortunately, makes me question the validity of most research done on commercial social data without cooperation from the subject company). The easiest way to share data is through your own API . Set generous rate limits where possible. Most projects are not realtime and they can gather the data (or, more likely, have a grad
4 0.40459922 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
Introduction: Lucene Revolution Keynote: Search is Not a Solved Problem Posted: June 4, 2013 | Author: Hilary Mason | Filed under: Presentations | Tags: lucene, presentation, search, solr, talk | 3 Comments » The wonderful folks at LucidWorks have posted the video of my recent Lucene Revolution keynote. The brief idea behind this talk is that search is not a solved problem: there is still a big opportunity for building search (and finding?) capabilities for the kinds of questions that the current products fail to solve. For example, why do search engines just return a list of sorted URLs, but give me no information about the themes that are consistent across them? The audience was technical, specifically Lucene and Solr devs, so I spent some time talking about how we use those technologies at bitly.
5 0.37468022 84 hilary mason data-2013-01-17-Need Data? Start Here
Introduction: Need Data? Start Here Posted: January 17, 2013 | Author: Hilary Mason | Filed under: projects | Tags: data, dataset | 12 Comments » Data scientists need data, and good data is hard to find. I put together this bitly bundle of research quality data sets to collect as many useful data sets as possible in one place. The list includes such exciting and diverse things as spam, belly buttons, item pricing, social media, and face recognition, so you know there’s something that will intrigue anyone. Have one to add? Let me know! (I’ve shared the bundle before, but this post can act as unofficial homepage for it.)
6 0.35150123 99 hilary mason data-2013-04-01-Data Engineering
7 0.32387078 44 hilary mason data-2010-06-24-Conference: Web2 Expo SF
8 0.29917467 63 hilary mason data-2011-09-26-Hacking the Food System: The Ultimate Chocolate Chip Cookie
9 0.28890088 89 hilary mason data-2013-02-04-Experimenting With Physical Graphs
10 0.28822052 62 hilary mason data-2011-09-25-Conference: Strata NY 2011
11 0.28147888 31 hilary mason data-2009-08-12-My NYC Python Meetup Presentation: Practical Data Analysis in Python
12 0.27221322 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
13 0.25774795 104 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk
14 0.24842647 115 hilary mason data-2014-02-14-Play with your food!
15 0.24827026 112 hilary mason data-2013-11-01-Books Recommendations for Programming Excellence
16 0.24688224 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
17 0.2439172 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
18 0.23885232 77 hilary mason data-2012-09-18-Hey Yahoo, You’re Optimizing the Wrong Thing
19 0.2372206 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
20 0.23433876 33 hilary mason data-2009-10-03-Hadoop World NYC
topicId topicWeight
[(2, 0.153), (15, 0.038), (56, 0.157), (63, 0.012), (74, 0.539)]
simIndex simValue blogId blogTitle
same-blog 1 0.88867104 82 hilary mason data-2013-01-08-Bitly Social Data APIs
2 0.34453511 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email , hack , twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com , log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
3 0.33588177 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
Introduction: Startups: How to Share Data with Academics Posted: January 19, 2013 | Author: Hilary Mason | Filed under: blog | Tags: academics , data , research | 8 Comments » This post assumes that you want to share data. If you’re not convinced, don’t worry — that’s next on my list. You and your academic colleagues will benefit from having at least a quick chat about the research questions they want to address. I’ve read every paper I’ve been able to find that uses bitly data and all of the ones that acquired the data without our assistance had serious flaws, generally based on incorrect assumptions about the data they had acquired (this, unfortunately, makes me question the validity of most research done on commercial social data without cooperation from the subject company). The easiest way to share data is through your own API . Set generous rate limits where possible. Most projects are not realtime and they can gather the data (or, more likely, have a grad
4 0.32622501 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk
Introduction: Speaking: Spend at least 1/3 of the time practicing the talk Posted: July 5, 2013 | Author: Hilary Mason | Filed under: speaking | 3 Comments » This week we welcome a guest contribution. Matthew Trentacoste is a recovering academic and a computer scientist at Adobe, where he writes software to make pretty pictures. He’s constantly curious, often about data, and cooks a lot. You can follow his exploits at @mattttrent . In Hilary’s last post, she made the point that your slides != your talk . In a well-crafted talk, your message — in the form of the words you say — needs to dominate while the slides need to play a supporting role. Speak the important parts, and use your slides as a backdrop for what you’re saying. Hilary has provided a valuable strategy in her post, but how should someone approach crafting such a clearly-organized presentation? If you’re just getting started speaking, it can be a real challenge to make a coherent talk and along with slid
5 0.32169861 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
Introduction: Startups: Why to Share Data with Academics Posted: January 28, 2013 | Author: Hilary Mason | Filed under: blog | 5 Comments » Last week I wrote a bit about how to share data with academics. This is the complementary piece, on why you should invest the time and energy in sharing your data with the academic community. As I was talking to people about this topic it became clear that there are really two different questions people ask. First, why do this at all? And second, what do I tell my boss? Let’s start with the second one. This is what you should tell your boss: Academic research based on our work is a great press opportunity and demonstrates that credible people outside of our company find our work interesting. Having researchers work on our data is an easy way to access highly educated brainpower, for free, that in no way competes with us. Who knows what interesting stuff they’ll come up with? Personal relationships with university faculty ar
6 0.31936711 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
7 0.30516228 58 hilary mason data-2011-06-22-My Head is Open Source!
8 0.2995114 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
9 0.29736185 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
10 0.29256931 80 hilary mason data-2012-12-28-Getting Started with Data Science
11 0.28759032 90 hilary mason data-2013-02-18-One Random Tweet, please.
12 0.28627604 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas
13 0.28306782 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
14 0.28230298 60 hilary mason data-2011-08-21-What do you read that changes the way you think?
15 0.28067487 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
16 0.27963269 83 hilary mason data-2013-01-10-Book Book — Goose!
17 0.27742997 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
18 0.27132177 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs
19 0.27010563 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
20 0.26993474 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010