
110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data


meta infos for this blog

Source: html

Introduction: What Mugshots Mean For Public Data Posted: October 6, 2013 | Author: Hilary Mason | Filed under: blog | Tags: data , mugshots , privacy | 20 Comments » The New York Times has a story this morning on the growing use of mugshot data for, essentially, extortion . These sites scrape mugshots off of public records databases, use SEO techniques to rank highly in Google searches for people’s names, and then charge those featured in the image to have the pages removed. Many of the people featured were never even convicted of a crime. What the mugshot story demonstrates but never says explicitly is that data is no longer just private or public, but often exists in an in-between state, where the public-ness of the data is a function of how much work is required to find it. Let’s say you’re actually doing a background check on someone you are going on a date with (one of the use cases the operators of these sites claim is common). Before online systems, you c


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 What Mugshots Mean For Public Data Posted: October 6, 2013 | Author: Hilary Mason | Filed under: blog | Tags: data , mugshots , privacy | 20 Comments » The New York Times has a story this morning on the growing use of mugshot data for, essentially, extortion . [sent-1, score-1.226]

2 These sites scrape mugshots off of public records databases, use SEO techniques to rank highly in Google searches for people’s names, and then charge those featured in the image to have the pages removed. [sent-2, score-1.58]

3 Many of the people featured were never even convicted of a crime. [sent-3, score-0.24]

4 What the mugshot story demonstrates but never says explicitly is that data is no longer just private or public, but often exists in an in-between state, where the public-ness of the data is a function of how much work is required to find it. [sent-4, score-1.201]

5 Let’s say you’re actually doing a background check on someone you are going on a date with (one of the use cases the operators of these sites claim is common). [sent-5, score-0.724]

6 Before online systems, you could physically go to the various records offices, sometimes in each town, to request information about them. [sent-6, score-0.496]

7 Given that there are ~20,000 municipalities in the United States, just doing a check would take the unreasonable investment of days. [sent-7, score-0.288]

8 Before mugshot sites, you had to actually visit each state’s database, figure out how to query it, and assemble the results. [sent-8, score-0.523]

9 Now we’re looking at an investment of hours, instead of days. [sent-9, score-0.294]

10 Now you just search, and this information is there. [sent-11, score-0.074]

11 It is just as public as it was before, but the cost to access has become a matter of seconds, not hours or days, and we could imagine that you might be googling your date to find something else about him and instead stumble on the mugshot image. [sent-12, score-1.411]

12 The cost for accessing the data is so trivial that it can come up as part of an adjacent task. [sent-13, score-0.489]

13 The debate around fixing this problem has focused on whether the data should be removed from the public entirely. [sent-14, score-0.666]
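
The page does not document how sentScore is computed. The listed values appear consistent with summing the tf-idf weights from the word list on this page over each whitespace-separated, case-sensitive token, so a token with attached punctuation such as "hours," does not match "hours". A minimal sketch under that assumption (the function name and the subset of weights are illustrative, not taken from the generator's code):

```python
# Subset of the (word, weight) pairs from the "tfidf for this blog" list.
WEIGHTS = {
    "public": 0.314, "records": 0.266, "cost": 0.205, "investment": 0.177,
    "featured": 0.152, "data": 0.132, "instead": 0.117, "check": 0.111,
    "hours": 0.111, "never": 0.088, "could": 0.088, "around": 0.076,
    "focused": 0.076, "information": 0.074, "removed": 0.068,
    "sometimes": 0.068, "days": 0.068,
}


def sentence_score(sentence, weights=WEIGHTS):
    """Sum the weights of whitespace-separated tokens that match a word exactly.

    Assumption: no lowercasing and no punctuation stripping, so "hours,"
    and "days." contribute nothing.
    """
    return round(sum(weights.get(token, 0.0) for token in sentence.split()), 3)


# "featured" and "never" match; "crime." does not.
print(sentence_score(
    "Many of the people featured were never even convicted of a crime."
))  # 0.24, the value listed for [sent-3]
```

Under the same assumption, sentence 9 scores 0.294 (only "investment" and "instead" match) and sentence 13 scores 0.666, which agrees with the values listed above.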


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mugshot', 0.355), ('public', 0.314), ('mugshots', 0.266), ('records', 0.266), ('sites', 0.251), ('cost', 0.205), ('extortion', 0.177), ('investment', 0.177), ('featured', 0.152), ('data', 0.132), ('date', 0.126), ('longer', 0.126), ('instead', 0.117), ('state', 0.117), ('highly', 0.111), ('check', 0.111), ('hours', 0.111), ('access', 0.095), ('actually', 0.092), ('never', 0.088), ('could', 0.088), ('story', 0.088), ('around', 0.076), ('cases', 0.076), ('accessing', 0.076), ('searches', 0.076), ('essentially', 0.076), ('united', 0.076), ('seconds', 0.076), ('assemble', 0.076), ('databases', 0.076), ('rank', 0.076), ('growing', 0.076), ('technically', 0.076), ('trivial', 0.076), ('feasible', 0.076), ('aggregated', 0.076), ('focused', 0.076), ('explicitly', 0.076), ('information', 0.074), ('function', 0.068), ('pages', 0.068), ('states', 0.068), ('days', 0.068), ('removed', 0.068), ('background', 0.068), ('says', 0.068), ('sometimes', 0.068), ('demonstrates', 0.068), ('ability', 0.063)]
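
How a list of top-N (word, weight) pairs like the one above might be produced can be sketched as follows. The weighting used here (raw term frequency times log(N / document frequency), then L2-normalised per document) is one common tf-idf variant and is an assumption; the page does not say which variant or tokeniser was used. The function name and the three-document corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumed weighting: tf * log(N / df), L2-normalised per document.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    tf = Counter(tokenised[doc_index])
    raw = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(raw.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, round(v / norm, 3)) for w, v in ranked[:top_n]]


# Hypothetical toy corpus, not the real blog texts.
docs = [
    "mugshot sites scrape public records",
    "public speaking at conferences",
    "share data with academics",
]
print(tfidf_top_words(docs, 0, top_n=3))
```

Words that occur in several documents (here "public") receive a lower weight than words unique to one document, which is why post-specific terms such as "mugshot" would rank first.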

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data


2 0.11423731 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking

Introduction: Why YOU (an introverted nerd) Should Try Public Speaking Posted: February 22, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: algorithms , hacks , public speaking | 27 Comments » You should be speaking at conferences. Not an extrovert? Great. Speaking is for introverts! We go to conferences to meet people (and learn things from people and find opportunities… from people). Meeting people at events takes a lot of energy, especially if you don’t look like the average dude at a conference. You have to explain your story to every single person you talk to, listen to theirs, and try to see if you have overlapping interests. It’s inefficient and takes a lot of time. By being a speaker, you can tell your story just once, to everyone, and the people who are excited about what you have to say will come find you. You will actually save energy if you get up on stage. It’s a great hack. Before you say, “fine, but I’m not good at speaking”, please tak

3 0.1015458 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010

Introduction: Conference: Search and Social Media 2010 Posted: February 16, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: algorithms , conference , research , search | 2 Comments » I recently attended the Third Annual Workshop on Search and Social Media , an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!). Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog , so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year. Social search consists of a set of problems including (but hardly limited to) search of social content like status updat

4 0.097920395 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics

Introduction: Startups: How to Share Data with Academics Posted: January 19, 2013 | Author: Hilary Mason | Filed under: blog | Tags: academics , data , research | 8 Comments » This post assumes that you want to share data. If you’re not convinced, don’t worry — that’s next on my list. You and your academic colleagues will benefit from having at least a quick chat about the research questions they want to address. I’ve read every paper I’ve been able to find that uses bitly data and all of the ones that acquired the data without our assistance had serious flaws, generally based on incorrect assumptions about the data they had acquired (this, unfortunately, makes me question the validity of most research done on commercial social data without cooperation from the subject company). The easiest way to share data is through your own API . Set generous rate limits where possible. Most projects are not realtime and they can gather the data (or, more likely, have a grad

5 0.082485266 80 hilary mason data-2012-12-28-Getting Started with Data Science

Introduction: Getting Started with Data Science Posted: December 28, 2012 | Author: Hilary Mason | Filed under: blog | Tags: advice , datascience , hacking , learning | 18 Comments » I get quite a few e-mail messages from very smart people who are looking to get started in data science. Here’s what I usually tell them: The best way to get started in data science is to DO data science! First, data scientists do three fundamentally different things: math , code (and engineer systems), and communicate . Figure out which one of these you’re weakest at, and do a project that enhances your capabilities. Then figure out which one of these you’re best at, and pick a project which shows off your abilities. Second, get to know other data scientists! If you’re in New York, try the DataGotham events list to find some meetups, and make sure to stay for the beers. Look for groups, like DataKind , that need data skills put to work for good. No matter how much of a beginner

6 0.080101788 28 hilary mason data-2009-04-28-LSL: AOL IM Status Indicator

7 0.078948453 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails

8 0.078875422 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious

9 0.074959278 43 hilary mason data-2010-05-27-E-mail automation, questions and answers

10 0.070802584 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics

11 0.069374517 99 hilary mason data-2013-04-01-Data Engineering

12 0.065304875 59 hilary mason data-2011-07-29-Uses This

13 0.064644836 50 hilary mason data-2011-02-07-NPR: Interview on Science Friday

14 0.064578518 33 hilary mason data-2009-10-03-Hadoop World NYC

15 0.06374786 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!

16 0.063388869 75 hilary mason data-2012-08-22-DataGotham: The Empire State of Data

17 0.062117342 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk

18 0.060007408 74 hilary mason data-2012-08-19-Why I love New York City

19 0.058925308 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.

20 0.058850579 1 hilary mason data-2006-02-20-JavaScript Rotating Images Tutorial


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.229), (1, -0.042), (2, -0.086), (3, -0.043), (4, 0.038), (5, 0.081), (6, -0.013), (7, -0.075), (8, -0.01), (9, -0.097), (10, -0.028), (11, 0.01), (12, -0.016), (13, -0.04), (14, 0.015), (15, -0.13), (16, -0.025), (17, -0.118), (18, 0.087), (19, 0.007), (20, -0.03), (21, -0.089), (22, -0.284), (23, 0.026), (24, -0.082), (25, 0.056), (26, -0.083), (27, -0.094), (28, -0.026), (29, -0.148), (30, -0.126), (31, -0.028), (32, 0.248), (33, 0.067), (34, -0.002), (35, -0.304), (36, -0.003), (37, 0.078), (38, 0.02), (39, 0.156), (40, -0.067), (41, -0.025), (42, 0.05), (43, 0.062), (44, -0.066), (45, -0.18), (46, 0.061), (47, -0.04), (48, 0.161), (49, 0.079)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96088791 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data


2 0.38821605 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking

Introduction: Why YOU (an introverted nerd) Should Try Public Speaking Posted: February 22, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: algorithms , hacks , public speaking | 27 Comments » You should be speaking at conferences. Not an extrovert? Great. Speaking is for introverts! We go to conferences to meet people (and learn things from people and find opportunities… from people). Meeting people at events takes a lot of energy, especially if you don’t look like the average dude at a conference. You have to explain your story to every single person you talk to, listen to theirs, and try to see if you have overlapping interests. It’s inefficient and takes a lot of time. By being a speaker, you can tell your story just once, to everyone, and the people who are excited about what you have to say will come find you. You will actually save energy if you get up on stage. It’s a great hack. Before you say, “fine, but I’m not good at speaking”, please tak

3 0.38500553 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious

Introduction: Web 2.0 Summit: The Secrets of our Data Subconscious Posted: October 21, 2011 | Author: Hilary Mason | Filed under: Presentations | Tags: conference , data , web2summit | 1 Comment » I just got home from the Web 2.0 Summit , a three-day conference that was packed with announcements, interesting ideas, and good conversations. My short talk, The Secrets of our Data Subconscious , touches on how the data we generate online interacts with the physical world spatially and through time, and on the relationships between the things we consume (in private) and the things we broadcast (in public).

4 0.35888258 50 hilary mason data-2011-02-07-NPR: Interview on Science Friday

Introduction: NPR: Interview on Science Friday Posted: February 7, 2011 | Author: hilary | Filed under: blog , Media | 3 Comments » On Friday, January 28th I hopped in a cab and went up to NPR’s Bryant Park recording studio for a fifteen minute chat with Ira Flatow , host of Science Friday . I’ve been a big fan of Ira and Science Friday since I discovered the show years ago, and it was a very exciting honor to be a guest. The image at the right is a snapshot I took with my phone while nervously waiting outside the studio. The title of the segment is the rather dramatic Privacy At Stake As Sites Track Online Preferences . Our conversation wound around the issues of tracking user data online, and the potential opportunities and dangers that all users of online services face. NPR has the full broadcast and transcript online . By far the most fun and unexpected aspect of this was the number of people who wrote to me to ask questions or say that they appreciated my

5 0.31528831 28 hilary mason data-2009-04-28-LSL: AOL IM Status Indicator

Introduction: LSL: AOL IM Status Indicator Posted: April 28, 2009 | Author: hilary | Filed under: blog | Tags: aim , lsl , second life | 3 Comments » I think this might be my very first LSL script, from back in 2005! This script indicates whether your AIM (AOL Instant Messenger) account is online by changing the color of an object. You can configure it to either share your AIM ID publicly, or keep it private. AIM Indicators in Second Life This script uses the AIM web services API to check your online status; you only need to give it your username, not your password! This is not a proxy service. You can’t send messages through this script, just show your online status in SL. To use this script, create an object in your favorite shape, create a new script inside of it, paste this code into it and save . key request_id; string aim_id; string av_name; key data_card; integer nLine = 0; integer public = TRUE; default { state_entry() { llSetText(

6 0.31452996 59 hilary mason data-2011-07-29-Uses This

7 0.30692777 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010

8 0.27823263 99 hilary mason data-2013-04-01-Data Engineering

9 0.26907936 1 hilary mason data-2006-02-20-JavaScript Rotating Images Tutorial

10 0.26581758 48 hilary mason data-2010-09-04-Twitter Succeeds Because it Fails

11 0.25416318 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics

12 0.24641311 80 hilary mason data-2012-12-28-Getting Started with Data Science

13 0.24388739 43 hilary mason data-2010-05-27-E-mail automation, questions and answers

14 0.24063827 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!

15 0.23829798 86 hilary mason data-2013-01-22-Introbot: A Script to Ease the Process of Writing Introductory E-mails

16 0.23416737 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.

17 0.21178056 82 hilary mason data-2013-01-08-Bitly Social Data APIs

18 0.21148656 102 hilary mason data-2013-05-03-Speaking: Explaining Technical Information to a Mixed Audience

19 0.21046513 34 hilary mason data-2009-10-16-Data: first and last names from the US Census

20 0.19973147 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.087), (18, 0.014), (56, 0.136), (57, 0.631), (63, 0.023)]
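
The simValue column in the similarity lists is not defined on this page. Cosine similarity between topic-weight vectors is a common choice for this kind of ranking, so the sketch below assumes it. The `cosine` helper and the `other` vector are hypothetical; `this_blog` is the LDA vector listed above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    # Define similarity with an all-zero vector as 0.0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


this_blog = [(2, 0.087), (18, 0.014), (56, 0.136), (57, 0.631), (63, 0.023)]
other = [(12, 0.5), (57, 0.5)]  # hypothetical post sharing only topic 57

print(cosine(this_blog, other))
```

Because the vectors are sparse, only topics present in both posts contribute to the dot product; two posts with no topic in common score 0.0 under this measure.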

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93370199 110 hilary mason data-2013-10-06-What Mugshots Mean For Public Data


2 0.3185102 109 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.

Introduction: Need actual random numbers? Meet the NIST randomness beacon. Posted: September 30, 2013 | Author: Hilary Mason | Filed under: projects | Tags: beacon , python , random , randomness , randomnumbers | 5 Comments » I wrote a python module that wraps the NIST Randomness Beacon , making it simple to get truly random numbers in python. It’s easy to use: b = Beacon() print b.last_record() print b.previous_record() #and so on There’s also a handy generator for getting a set of n random numbers. (One of the best gifts I ever got was a copy of 1,000,000 Random Numbers , and I’ve been intrigued ever since.) Please note that the randomness beacon is not intended to be a source of cryptographic keys; indeed, it’s a public set of numbers, so I wouldn’t recommend doing anything that could be compromised by someone else having access to the exact same set of numbers . Rather, this is interesting precisely for the scientific opportunities that

3 0.24154764 114 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers

Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email , hack , twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com , log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to

4 0.23786566 40 hilary mason data-2010-02-16-Conference: Search and Social Media 2010

Introduction: Conference: Search and Social Media 2010 Posted: February 16, 2010 | Author: hilary | Filed under: academics , blog , Presentations | Tags: algorithms , conference , research , search | 2 Comments » I recently attended the Third Annual Workshop on Search and Social Media , an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!). Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog , so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year. Social search consists of a set of problems including (but hardly limited to) search of social content like status updat

5 0.23258294 87 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics

Introduction: Startups: Why to Share Data with Academics Posted: January 28, 2013 | Author: Hilary Mason | Filed under: blog | 5 Comments » Last week I wrote a bit about how to share data with academics . This is the complementary piece, on why you should invest the time and energy in sharing your data with the academic community. As I was talking to people about this topic it became clear that there are really two different questions people ask. First, why do this at all? And second, what do I tell my boss? Let’s start with the second one. This is what you should tell your boss: Academic research based on our work is a great press opportunity and demonstrates that credible people outside of our company find our work interesting. Having researchers work on our data is an easy way to access highly educated brainpower, for free, that in no way competes with us. Who knows what interesting stuff they’ll come up with? Personal relationships with university faculty ar

6 0.22520462 58 hilary mason data-2011-06-22-My Head is Open Source!

7 0.21619901 7 hilary mason data-2007-07-30-Tip: How to Search Google for Ideas

8 0.20417565 24 hilary mason data-2009-01-31-WordPress tip: Move comments from one post to another post

9 0.19803619 105 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk

10 0.19569123 85 hilary mason data-2013-01-19-Startups: How to Share Data with Academics

11 0.18969941 80 hilary mason data-2012-12-28-Getting Started with Data Science

12 0.18290082 46 hilary mason data-2010-08-15-Should you attend Hadoop World? Yes.

13 0.17885357 82 hilary mason data-2013-01-08-Bitly Social Data APIs

14 0.17748836 92 hilary mason data-2013-02-25-A (short) List of Data Science Blogs

15 0.17696249 81 hilary mason data-2013-01-03-Interview Questions for Data Scientists

16 0.17652608 21 hilary mason data-2008-09-26-What am I like? How about you?

17 0.17569397 66 hilary mason data-2011-10-21-Web 2.0 Summit: The Secrets of our Data Subconscious

18 0.17390807 83 hilary mason data-2013-01-10-Book Book — Goose!

19 0.16784082 91 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking

20 0.16752239 116 hilary mason data-2014-04-09-Come speak at DataGotham 2014!