hilary_mason_data hilary_mason_data-2013 hilary_mason_data-2013-88 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I’m a Dead Celebrity! Posted: January 29, 2013 | Author: Hilary Mason | Filed under: blog | 4 Comments » Hilary Mason, Bing Celebrity I have a Google alert set up for my name, and over the weekend it sent me here . Update: Bing has removed the page and now redirects to a regular search. It’s a page on Bing Celebrities, merging my information with information about Hilary Mason, the (now deceased) British actress . According to this page, I have starred in movies before I was born and made videos after I died. It’s my photo and her filmography. It’s creepy, but it’s also intriguing. How does this happen? The data is credited to AMG and inbaseline , whose domain, though linked directly from Bing, does not resolve. Entity disambiguation is certainly a challenge, but I expect more from Microsoft, with so much data and so many brains. This kind of error makes it extremely clear that identity is not a solved problem . I’ve written a bit about iden
sentIndex sentText sentNum sentScore
1 Posted: January 29, 2013 | Author: Hilary Mason | Filed under: blog | 4 Comments » Hilary Mason, Bing Celebrity I have a Google alert set up for my name, and over the weekend it sent me here . [sent-2, score-0.223]
2 Update: Bing has removed the page and now redirects to a regular search. [sent-3, score-0.435]
3 It’s a page on Bing Celebrities, merging my information with information about Hilary Mason, the (now deceased) British actress . [sent-4, score-0.557]
4 According to this page, I have starred in movies before I was born and made videos after I died. [sent-5, score-0.183]
5 The data is credited to AMG and inbaseline , whose domain, though linked directly from Bing, does not resolve. [sent-9, score-0.383]
6 Entity disambiguation is certainly a challenge, but I expect more from Microsoft, with so much data and so many brains. [sent-10, score-0.294]
7 This kind of error makes it extremely clear that identity is not a solved problem . [sent-11, score-0.823]
8 I’ve written a bit about identity slippage before. [sent-12, score-0.47]
9 And that people are especially sensitive to errors about themselves. [sent-13, score-0.183]
10 This isn’t the first time a search engine has confused me with the other Hilary Mason, except the first time was cuil (remember that? [sent-14, score-0.656]
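The merged page described in sentences 3 and 4 (film credits dated before birth and videos dated after death) is the kind of error a simple date check could flag before two records are merged. The following is a minimal sketch under stated assumptions, not a description of how Bing or any other search engine works: the `Person` and `Credit` records, the `implausible_credits` helper, and the sample values are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record type; the field names are illustrative assumptions.
    name: str
    birth_year: int
    death_year: Optional[int] = None  # None means the person is still living


@dataclass
class Credit:
    # Hypothetical record for one filmography or video entry.
    title: str
    year: int


def implausible_credits(person: Person, credits: List[Credit]) -> List[Credit]:
    """Return the credits dated before the person's birth or after their death."""
    flagged = []
    for credit in credits:
        before_birth = credit.year < person.birth_year
        after_death = (
            person.death_year is not None and credit.year > person.death_year
        )
        if before_birth or after_death:
            flagged.append(credit)
    return flagged


# Made-up example data, not taken from any real biography.
candidate = Person(name="A. Example", birth_year=1980)
merged_credits = [Credit("Early Film", 1973), Credit("Recent Talk", 2012)]
suspicious = implausible_credits(candidate, merged_credits)
```

Any flagged credit suggests that the merged record mixes two different people. A check like this only catches date conflicts, so it is a sanity filter rather than a full disambiguation method.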
wordName wordTfidf (topN-words)
[('bing', 0.587), ('celebrity', 0.275), ('page', 0.212), ('identity', 0.211), ('photo', 0.171), ('regular', 0.117), ('weekend', 0.117), ('slippage', 0.117), ('according', 0.117), ('actress', 0.117), ('bio', 0.117), ('confused', 0.117), ('cuil', 0.117), ('deceased', 0.117), ('entity', 0.117), ('error', 0.117), ('videos', 0.117), ('information', 0.114), ('domain', 0.106), ('happen', 0.106), ('linked', 0.106), ('thank', 0.106), ('sensitive', 0.106), ('sent', 0.106), ('removed', 0.106), ('whose', 0.106), ('solved', 0.106), ('disambiguation', 0.106), ('engine', 0.106), ('directly', 0.097), ('certainly', 0.097), ('expect', 0.091), ('challenge', 0.091), ('remember', 0.091), ('clear', 0.086), ('first', 0.084), ('extremely', 0.081), ('problem', 0.077), ('especially', 0.077), ('time', 0.074), ('written', 0.074), ('isn', 0.074), ('though', 0.074), ('kind', 0.074), ('update', 0.071), ('makes', 0.071), ('google', 0.068), ('bit', 0.068), ('made', 0.066), ('name', 0.063)]
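The listing above gives (word, weight) pairs but does not state the weighting formula. One common way to get weights of this kind is term frequency times inverse document frequency, followed by L2 normalization. The sketch below assumes that scheme (raw counts, `log(N / df)`, unit-length vectors); the function names `tfidf` and `top_words` are hypothetical, and the pipeline that produced the numbers above may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute L2-normalized tf-idf vectors for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Assumed weighting: raw term count times log(N / df).
        weights = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        if norm > 0:
            weights = {w: v / norm for w, v in weights.items()}
        vectors.append(weights)
    return vectors


def top_words(vector: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-weighted words, largest weight first."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus for illustration only.
toy_docs = [["bing", "bing", "page"], ["google", "page"]]
toy_vectors = tfidf(toy_docs)
toy_top = top_words(toy_vectors[0], 1)
```

Under this scheme a word that appears in every document (here "page") gets weight zero, which is why corpus-specific words such as "bing" dominate a list like the one above.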
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 88 hilary mason data-2013-01-29-I’m a Dead Celebrity!
2 0.39383575 101 hilary mason data-2013-04-14-Et tu, Google?
Introduction: Et tu, Google? Posted: April 14, 2013 | Author: Hilary Mason | Filed under: blog | Tags: google , search | 8 Comments » In 2008, cuil , a search engine startup, displayed my bio alongside a photo of deceased actress Hilary Mason . In January 2013, Bing confused us , this time putting my photo next to her bio (they fixed it after a suitable amount of mocking on Twitter). Today, Google did the same thing . ( live search link ) Today I win the internet? If you zoom in on the bio section, you can clearly see that it’s her bio with a photo of me (originally from Crain’s New York 40 under Forty ). Further, if you go into her filmography, you continue to see my photo. I’m most proud of my starring role in the amazing film Robot Jox . (bottom right of the image below) I know that entity disambiguation is a hard problem. I’ve worked on it, though never with the kind of resources that I imagine Google can bring to it. And yet, this
3 0.10345793 71 hilary mason data-2012-01-26-Identity Slippage, and what’s the weirdest thing you’ve been e-mailed by accident?
Introduction: Identity Slippage, and what’s the weirdest thing you’ve been e-mailed by accident? Posted: January 26, 2012 | Author: Hilary Mason | Filed under: blog | 31 Comments » I have an old, short, and concise gmail address (my first initial and last name at gmail.com). There are many other hmasons in the world who have since signed up for gmail, with variations on the “hmason” theme. Every so often, they mistype the address, or someone mishears it. I now receive between four and ten pieces of e-mail per week meant for other hmasons . This was pretty amusing until someone opened an amazon account on that address (which I had to shut down). Poor Holly has never seen a single Citibank credit card statement (and Citibank won’t remove the e-mail address from the account when I call, since I’m not the account holder). Heidi hasn’t linked her Paypal account to her bank account, but I’m waiting for someone to send her money. This sort of unwitting misattribution results in an
4 0.083138958 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
Introduction: Where’s the API that can tell me that this photo contains a puppy and a can of Coke? Posted: November 5, 2012 | Author: Hilary Mason | Filed under: blog | Tags: api | 18 Comments » Photo by Ahmad van der Breggen on Flickr. We’ve gotten very good at extracting and disambiguating entities from text data. You can license a commodity system, and there are APIs and even open source tools that work fairly well. However, a large percentage of content that people share is not primarily text (a back-of-the-envelope guess says around 18%), and we currently have very little automated insight into that content. I know this is a very hard problem, but I’m continuously surprised by how few people seem to be working on it. Any ideas?
5 0.082356282 97 hilary mason data-2013-03-23-Why Google Now is Awesome
Introduction: Why Google Now is Awesome Posted: March 23, 2013 | Author: Hilary Mason | Filed under: blog | Tags: google | 11 Comments » Google Now is an extension to Google’s Android search app that uses all of the data that Google has about you along with what it can guess about your current context to present the information it thinks you need when it thinks you need it. It’ll tell you to leave a bit early to make your next calendar event because of heavy traffic, or that it’s a friend’s birthday, or that there’s a cool cafe nearby where you are. I think it’s amazing. It’s amazing because this is the first Google product that takes ALL OF THE DATA that they have about us and actually makes it useful for us . Not for advertisers. Finally.
6 0.06563659 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
7 0.063946404 111 hilary mason data-2013-10-22-The DataGotham 2013 Videos are up!
8 0.057742305 4 hilary mason data-2007-06-11-Teaching Search Techniques with Google Games
9 0.054280307 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
10 0.053369019 82 hilary mason data-2013-01-08-Bitly Social Data APIs
11 0.048724346 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
12 0.044236008 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010
13 0.044213388 95 hilary mason data-2013-03-17-Speaking: Entertain, Don’t Teach
14 0.043137215 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
15 0.042463876 76 hilary mason data-2012-08-28-How do you prioritize research?
16 0.042375654 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
17 0.039408062 62 hilary mason data-2011-09-25-Conference: Strata NY 2011
18 0.039162844 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
19 0.0384988 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
20 0.038398921 11 hilary mason data-2007-10-07-An Experience with Using a Wiki for a Collaborative Classroom Documentation Project
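The similarity list above gives one score per post but does not say how the score was computed. A frequent choice for comparing sparse word-weight vectors is cosine similarity, sketched below as an assumption rather than a statement about this pipeline. The names `cosine` and `rank_similar` are hypothetical, and the toy weights are invented (only the post IDs are borrowed from the list). A self-similarity a hair above 1, such as the 1.0000001 on the same-blog row, would be consistent with floating-point rounding in such a computation, though that is a guess.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as word -> weight."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(
    query_id: int, vectors: Dict[int, Dict[str, float]], top_n: int = 20
) -> List[Tuple[int, float]]:
    """Rank all documents (including the query itself) by similarity to the query."""
    query = vectors[query_id]
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors with made-up weights, keyed by post ID.
toy = {
    88: {"bing": 0.6, "identity": 0.2},
    101: {"bing": 0.3, "google": 0.5},
    97: {"google": 1.0},
}
ranking = rank_similar(88, toy)
```

The query post ranks first against itself, which matches the same-blog row at the top of each list; posts that share no weighted words with the query score zero.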
topicId topicWeight
[(0, -0.16), (1, -0.017), (2, -0.212), (3, -0.206), (4, 0.14), (5, -0.181), (6, -0.02), (7, -0.031), (8, -0.029), (9, 0.016), (10, -0.033), (11, -0.112), (12, 0.046), (13, -0.061), (14, -0.113), (15, 0.059), (16, -0.391), (17, -0.237), (18, -0.176), (19, -0.276), (20, 0.212), (21, 0.16), (22, 0.048), (23, 0.086), (24, 0.02), (25, -0.022), (26, -0.056), (27, 0.12), (28, -0.008), (29, -0.088), (30, -0.028), (31, -0.19), (32, -0.017), (33, -0.1), (34, -0.054), (35, 0.029), (36, 0.038), (37, 0.038), (38, -0.021), (39, 0.07), (40, -0.079), (41, -0.124), (42, -0.106), (43, 0.001), (44, 0.024), (45, -0.014), (46, -0.036), (47, -0.038), (48, 0.016), (49, 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.9857077 88 hilary mason data-2013-01-29-I’m a Dead Celebrity!
2 0.79920691 101 hilary mason data-2013-04-14-Et tu, Google?
Introduction: Et tu, Google? Posted: April 14, 2013 | Author: Hilary Mason | Filed under: blog | Tags: google , search | 8 Comments » In 2008, cuil , a search engine startup, displayed my bio alongside a photo of deceased actress Hilary Mason . In January 2013, Bing confused us , this time putting my photo next to her bio (they fixed it after a suitable amount of mocking on Twitter). Today, Google did the same thing . ( live search link ) Today I win the internet? If you zoom in on the bio section, you can clearly see that it’s her bio with a photo of me (originally from Crain’s New York 40 under Forty ). Further, if you go into her filmography, you continue to see my photo. I’m most proud of my starring role in the amazing film Robot Jox . (bottom right of the image below) I know that entity disambiguation is a hard problem. I’ve worked on it, though never with the kind of resources that I imagine Google can bring to it. And yet, this
3 0.21395326 79 hilary mason data-2012-11-05-Where’s the API that can tell me that this photo contains a puppy and a can of Coke?
Introduction: Where’s the API that can tell me that this photo contains a puppy and a can of Coke? Posted: November 5, 2012 | Author: Hilary Mason | Filed under: blog | Tags: api | 18 Comments » Photo by Ahmad van der Breggen on Flickr. We’ve gotten very good at extracting and disambiguating entities from text data. You can license a commodity system, and there are APIs and even open source tools that work fairly well. However, a large percentage of content that people share is not primarily text (a back-of-the-envelope guess says around 18%), and we currently have very little automated insight into that content. I know this is a very hard problem, but I’m continuously surprised by how few people seem to be working on it. Any ideas?
4 0.18871249 103 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem
Introduction: Lucene Revolution Keynote: Search is Not a Solved Problem Posted: June 4, 2013 | Author: Hilary Mason | Filed under: Presentations | Tags: lucene, presentation, search, solr, talk | 3 Comments » The wonderful folks at LucidWorks have posted the video of my recent Lucene Revolution keynote. The brief idea behind this talk is that search is not a solved problem: there is still a big opportunity for building search (and finding?) capabilities for the kinds of questions that the current products fail to solve. For example, why do search engines just return a list of sorted URLs, but give me no information about the themes that are consistent across them? The audience was technical, specifically Lucene and Solr devs, so I spent some time talking about how we use those technologies at bitly.
5 0.18373893 71 hilary mason data-2012-01-26-Identity Slippage, and what’s the weirdest thing you’ve been e-mailed by accident?
Introduction: Identity Slippage, and what’s the weirdest thing you’ve been e-mailed by accident? Posted: January 26, 2012 | Author: Hilary Mason | Filed under: blog | 31 Comments » I have an old, short, and concise gmail address (my first initial and last name at gmail.com). There are many other hmasons in the world who have since signed up for gmail, with variations on the “hmason” theme. Every so often, they mistype the address, or someone mishears it. I now receive between four and ten pieces of e-mail per week meant for other hmasons . This was pretty amusing until someone opened an amazon account on that address (which I had to shut down). Poor Holly has never seen a single Citibank credit card statement (and Citibank won’t remove the e-mail address from the account when I call, since I’m not the account holder). Heidi hasn’t linked her Paypal account to her bank account, but I’m waiting for someone to send her money. This sort of unwitting misattribution results in an
6 0.18169571 43 hilary mason data-2010-05-27-E-mail automation, questions and answers
7 0.1644133 97 hilary mason data-2013-03-23-Why Google Now is Awesome
8 0.14834383 30 hilary mason data-2009-06-01-My Barcamp Presentation: Have Data? What Now?!
9 0.14710459 76 hilary mason data-2012-08-28-How do you prioritize research?
10 0.13937519 111 hilary mason data-2013-10-22-The DataGotham 2013 Videos are up!
11 0.13905951 82 hilary mason data-2013-01-08-Bitly Social Data APIs
12 0.13187522 4 hilary mason data-2007-06-11-Teaching Search Techniques with Google Games
13 0.12707971 100 hilary mason data-2013-04-05-Speaking: 1 Kitten per Equation
14 0.12499326 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
15 0.11886907 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data
16 0.1181545 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.
17 0.11485874 78 hilary mason data-2012-09-21-Help, I’m the first data scientist at my company!
18 0.10972611 67 hilary mason data-2011-10-31-Happy Halloween
19 0.10551196 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
20 0.10129758 102 hilary mason data-2013-05-03-Speaking: Explaining Technical Information to a Mixed Audience
topicId topicWeight
[(2, 0.096), (23, 0.691), (56, 0.082)]
simIndex simValue blogId blogTitle
same-blog 1 0.95658416 88 hilary mason data-2013-01-29-I’m a Dead Celebrity!
2 0.90835446 21 hilary mason data-2008-09-26-What am I like? How about you?
Introduction: What am I like? How about you? Posted: September 26, 2008 | Author: hilary | Filed under: blog | Tags: me , path101 , personality | Leave a comment » My Path 101 Personality Quiz Traits Highest Scoring Traits Love of Thinking Relativism Compartmentalization Lowest Scoring Traits Concreteness Idealism Emotion Like-minded people work in: Biotechnology and Pharmaceuticals Medical Equipment Manufacturing Computer Hardware and Infrastructure Urban Planning Corporate Law See hmason’s full assessment and get your own . I’ve always been skeptical of and fascinated by personality tests. On the one hand, it’s your personality — who could possibly know more about you than you do? On the other, there’s something alluring about quantifying your characteristics, especially when you can compare them to others. These are my results from the Path101 person
3 0.16640458 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post
Introduction: WordPress tip: Move comments from one post to another post Posted: January 31, 2009 | Author: hilary | Filed under: blog | Tags: tips , wordpress | 3 Comments » I recently ended up with two posts on this site about the same project — one was a short summary, and the other a long, detailed article. I decided to consolidate them into the longer article, but I didn’t want to lose the six comments that had been posted to the short article. I couldn’t find a way in the WordPress UI to move comments from one post to another, so I jumped into the database. If you run WordPress on a host, they probably provide a MySQL management tool like PHPMyAdmin, or you can log in with a mysql client. First, find the table that contains the posts for your blog (the table name usually ends in _posts ). Find the ID that matches the post you want to move comments from , and the ID for the post that you want to move comments to . Note: An easy way to do this is to search by t
4 0.15732712 105 hilary mason data-2013-07-05-Speaking: Spend at least 1/3 of the time practicing the talk
Introduction: Speaking: Spend at least 1/3 of the time practicing the talk Posted: July 5, 2013 | Author: Hilary Mason | Filed under: speaking | 3 Comments » This week we welcome a guest contribution. Matthew Trentacoste is a recovering academic and a computer scientist at Adobe, where he writes software to make pretty pictures. He’s constantly curious, often about data, and cooks a lot. You can follow his exploits at @mattttrent . In Hilary’s last post, she made the point that your slides != your talk . In a well-crafted talk, your message — in the form of the words you say — needs to dominate while the slides need to play a supporting role. Speak the important parts, and use your slides as a backdrop for what you’re saying. Hilary has provided a valuable strategy in her post, but how should someone approach crafting such a clearly-organized presentation? If you’re just getting started speaking, it can be a real challenge to make a coherent talk and along with slid
5 0.15727714 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers
Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email , hack , twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com , log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to
6 0.15194485 34 hilary mason data-2009-10-16-Data: first and last names from the US Census
7 0.14742269 90 hilary mason data-2013-02-18-One Random Tweet, please.
8 0.14517388 60 hilary mason data-2011-08-21-What do you read that changes the way you think?
9 0.14450775 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics
10 0.14366466 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.
11 0.13924897 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics
12 0.13908856 94 hilary mason data-2013-03-08-Speaking: Title Slides + Twitter = You Win
13 0.13847148 80 hilary mason data-2012-12-28-Getting Started with Data Science
14 0.13802963 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking
15 0.13555631 58 hilary mason data-2011-06-22-My Head is Open Source!
16 0.13509953 83 hilary mason data-2013-01-10-Book Book — Goose!
17 0.13453877 82 hilary mason data-2013-01-08-Bitly Social Data APIs
18 0.13101941 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists
19 0.13092501 113 hilary mason data-2013-11-22-Speaking: Two Questions to Ask Before You Give a Talk
20 0.12809265 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs