hilary_mason_data hilary_mason_data-2013 knowledge-graph by maker-knowledge-mining

hilary_mason_data 2013 knowledge graph


blogs list:

1 hilary mason data-2013-12-18-Using Twitter’s Lead-Gen Card to Recruit Beta Testers

Introduction: Using Twitter’s Lead-Gen Card to Recruit Beta Testers Posted: December 18, 2013 | Author: Hilary Mason | Filed under: blog | Tags: email , hack , twitter | 12 Comments » It turns out that it’s pretty easy to co-opt Twitter’s Lead Generation card for anything where you want to gather a bunch of e-mail addresses from your Twitter community. I was looking for people willing to alpha test a little side project of mine, and it worked great and didn’t cost anything. The tweet itself: Love tech discussion but looking for a better community? Help me beta test a side project! https://t.co/H3DYjbCy19 — Hilary Mason (@hmason) December 12, 2013 I created it pretty easily: First, go to ads.twitter.com , log in, and go to “creatives”, then “cards”. Click “Create Lead Generation Card”. It’s a big blue button. You can include a title and a short description. Curiously, you can also include a 600px by 150px image. This seems like an opportunity to

2 hilary mason data-2013-11-22-Speaking: Two Questions to Ask Before You Give a Talk

Introduction: Speaking: Two Questions to Ask Before You Give a Talk Posted: November 22, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: questions , speaking | 6 Comments » If you’ve had a talk proposal accepted or been invited to speak at an event, you’ll usually get a chance to chat with the organizers before you show up to give your talk. While you probably have a good idea of the topic of your talk (if you don’t, that’s a post for another day!), event organizers can be invaluable in helping you frame a talk that will succeed with their audience. They are on your side and they want you to do great, or they wouldn’t be hosting you at their event. These are two questions that I always ask the organizers before I speak. Question 1: Who will be in the audience? Knowing the basic demographics of the audience is necessary to make sure you’re speaking at the right level and tuning the cultural references and humor for the room. I often speak to audiences o

3 hilary mason data-2013-11-01-Books Recommendations for Programming Excellence

Introduction: Books Recommendations for Programming Excellence Posted: November 1, 2013 | Author: Hilary Mason | Filed under: blog | 15 Comments » Yesterday I asked people on twitter for recommendations for things to read to improve as a programmer. I’m looking mainly for things on the philosophy side of software engineering. I do realize that practice is the most important thing, but sometimes you run into a design question and it’s always helpful to realize that very smart people have, indeed, thought about these things before. I assembled the book recommendations into a bitly bundle . I’ve only read a few of these (generally the older books) and so I can’t recommend specifics, but if you’d care to take a look here they are ! If you see something that you think should be included, please do let me know in the comments and I’ll add it to the list.

4 hilary mason data-2013-10-22-The DataGotham 2013 Videos are up!

Introduction: The DataGotham 2013 Videos are up! Posted: October 22, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datagotham , youtube | Leave a comment » I’m happy to be able to share that the full set of videos from DataGotham 2013 is now on YouTube. The talks offer a wide perspective on the interesting work happening around data in New York, and I believe you’ll enjoy all of them!

5 hilary mason data-2013-10-06-What Mugshots Mean For Public Data

Introduction: What Mugshots Mean For Public Data Posted: October 6, 2013 | Author: Hilary Mason | Filed under: blog | Tags: data , mugshots , privacy | 20 Comments » The New York Times has a story this morning on the growing use of mugshot data for, essentially, extortion . These sites scrape mugshots off of public records databases, use SEO techniques to rank highly in Google searches for people’s names, and then charge those featured in the image to have the pages removed. Many of the people featured were never even convicted of a crime. What the mugshot story demonstrates but never says explicitly is that data is no longer just private or public, but often exists in an in-between state, where the public-ness of the data is a function of how much work is required to find it. Let’s say you’re actually doing a background check on someone you are going on a date with (one of the use cases the operators of these sites claim is common). Before online systems, you c

6 hilary mason data-2013-09-30-Need actual random numbers? Meet the NIST randomness beacon.

Introduction: Need actual random numbers? Meet the NIST randomness beacon. Posted: September 30, 2013 | Author: Hilary Mason | Filed under: projects | Tags: beacon , python , random , randomness , randomnumbers | 5 Comments » I wrote a python module that wraps the NIST Randomness Beacon , making it simple to get truly random numbers in python. It’s easy to use: b = Beacon() print b.last_record() print b.previous_record() #and so on There’s also a handy generator for getting a set of n random numbers. (One of the best gifts I ever got was a copy of 1,000,000 Random Numbers , and I’ve been intrigued ever since.) Please note that the randomness beacon is not intended to be a source of cryptographic keys; indeed, it’s a public set of numbers, so I wouldn’t recommend doing anything that could be compromised by someone else having access to the exact same set of numbers . Rather, this is interesting precisely for the scientific opportunities that
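A minimal sketch of the idea, not the module described in the post: it reads the most recent beacon record and splits its output value into integers, loosely mirroring the "generator for n random numbers" mentioned above. The endpoint URL, the JSON field names, and the helper names (fetch_last_output, words_from_hex, random_numbers) are assumptions made for illustration and should be checked against the beacon's current documentation.

```python
import json
from itertools import islice
from typing import Callable, Iterator, List
from urllib.request import urlopen

# Assumption: endpoint for the most recent record ("pulse") in the
# beacon's 2.0 REST interface. Verify before relying on it.
LAST_PULSE_URL = "https://beacon.nist.gov/beacon/2.0/pulse/last"


def fetch_last_output() -> str:
    """Return the latest record's output as a hex string (needs network).

    Assumption: the JSON body looks like {"pulse": {"outputValue": "<hex>"}}.
    """
    with urlopen(LAST_PULSE_URL, timeout=10) as response:
        return json.load(response)["pulse"]["outputValue"]


def words_from_hex(output_hex: str, hex_digits: int = 16) -> Iterator[int]:
    """Split one hex output value into fixed-width unsigned integers.

    Trailing digits that do not fill a whole word are ignored.
    """
    usable = len(output_hex) - len(output_hex) % hex_digits
    for start in range(0, usable, hex_digits):
        yield int(output_hex[start:start + hex_digits], 16)


def random_numbers(n: int, fetch: Callable[[], str] = fetch_last_output) -> List[int]:
    """Collect up to n integers from a single beacon record.

    The fetch argument is injectable so the logic can be tried offline.
    """
    return list(islice(words_from_hex(fetch()), n))
```

Calling random_numbers(4) would contact the beacon, while passing fetch=lambda: "00ff" * 8 exercises the same logic without network access. As the post stresses, these values are public, so they are unsuitable for keys or anything else that depends on secrecy.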

7 hilary mason data-2013-09-26-Learn to Code, Learn to Think

Introduction: Learn to Code, Learn to Think Posted: September 26, 2013 | Author: Hilary Mason | Filed under: blog | Tags: code , philosophy , teaching , thinking | 11 Comments » I recently had a tweet that’s caused a bit of comment , and I wanted to expand on the point. Everyone does realize that it's not about teaching people to CODE as much as it is about teaching people to THINK … right? — Hilary Mason (@hmason) September 17, 2013 I’m a huge fan of the movement to teach people, especially kids, to code. When you learn to code, you’re learning to think precisely and analytically about a quirky world. It doesn’t really matter which particular technology you learn, as long as you are learning to solve the underlying logical problems. If a student becomes a professional engineer, their programming ability will rise above the details of the language, anyway. And if they don’t, they will have learned to reason logically, a skill that’s invaluable no matt

8 hilary mason data-2013-08-31-In Search of the Optimal … Cheeseburger

Introduction: In Search of the Optimal … Cheeseburger Posted: August 31, 2013 | Author: Hilary Mason | Filed under: Presentations | Tags: cheeseburgers , ignite , talks | 8 Comments » My ignite talk from last year’s data-centric Ignite spectacular is finally up! This was about a fun, personal project, where I was playing with NYC menu data.

9 hilary mason data-2013-08-12-DataGotham 2013 is coming!

Introduction: DataGotham 2013 is coming! Posted: August 12, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datagotham , nyc | Leave a comment » Registration is open for DataGotham 2013 , our second annual New York data community conference, September 12th and 13th. The core of the conference is a series of brilliant data practitioners telling the stories about what they work on. The content is technically-oriented but not all deeply technical, and we really welcome anyone curious about how New York companies and institutions are pushing the boundaries on data to attend. We have two goals for the conference. The primary goal is to connect people in the greater New York data community who are working on interesting things. If our community is strong and supportive, we will all do better work. Our second goal is to highlight the amazing work happening here, so that people near and far will realize that New York is the best place in the world to do data science.

10 hilary mason data-2013-07-05-Speaking: Spend at least 1-3 of the time practicing the talk

Introduction: Speaking: Spend at least 1/3 of the time practicing the talk Posted: July 5, 2013 | Author: Hilary Mason | Filed under: speaking | 3 Comments » This week we welcome a guest contribution. Matthew Trentacoste is a recovering academic and a computer scientist at Adobe, where he writes software to make pretty pictures. He’s constantly curious, often about data, and cooks a lot. You can follow his exploits at @mattttrent . In Hilary’s last post, she made the point that your slides != your talk . In a well-crafted talk, your message — in the form of the words you say — needs to dominate while the slides need to play a supporting role. Speak the important parts, and use your slides as a backdrop for what you’re saying. Hilary has provided a valuable strategy in her post, but how should someone approach crafting such a clearly-organized presentation? If you’re just getting started speaking, it can be a real challenge to make a coherent talk and along with slid

11 hilary mason data-2013-06-14-Speaking: Your Slides != Your Talk

Introduction: Speaking: Your Slides != Your Talk Posted: June 14, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: design , obama , slides | Leave a comment » Slides are the supporting structure for your talk, not the main event . Speak the meaty and informative portion of the presentation out loud and use slides as a backdrop, either to set the emotional tone or to reinforce the message that you are trying to convey. For example, I love using this image of Obama in Berlin as a backdrop when I talk about the growth of social data over the last several years. In this image every single person has a device and is generating their own data about their shared social experience. The content of the image supports what is otherwise a fairly abstract statement, and you can feel the excitement of the crowd, boosting the excitement that I want to share about the possibilities of social data. This particular style of slide design will fail for situations wher

12 hilary mason data-2013-06-04-Lucene Revolution Keynote: Search is Not a Solved Problem

Introduction: Lucene Revolution Keynote: Search is Not a Solved Problem Posted: June 4, 2013 | Author: Hilary Mason | Filed under: Presentations | Tags: lucene , presentation , search , solr , talk | 3 Comments » The wonderful folks at LucidWorks have posted the video of my recent Lucene Revolution keynote. The brief idea behind this talk is that search is not a solved problem: there is still a big opportunity for building search (and finding?) capabilities for the kinds of questions that the current products fail to solve. For example, why do search engines just return a list of sorted URLs, but give me no information about the themes that are consistent across them? The audience was technical, specifically Lucene and Solr devs, so I spent some time talking about how we use those technologies at bitly.

13 hilary mason data-2013-05-03-Speaking: Explaining Technical Information to a Mixed Audience

Introduction: Speaking: Explaining Technical Information to a Mixed Audience Posted: May 3, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: puppies , speaking | 11 Comments » It’s a challenge to present deeply technical material to a room of people with varying expertise levels. If you leave it out, you’re abandoning the substance of your presentation. If you focus on it exclusively, you will lose most of the room. Instead, include the material, but plan to repeat it two (or even three!) times. The first time you explain it, explain it for the expert audience. The second time you explain it, walk through an example of what the system enables. If your audience is on Twitter, throw in a third version: the concise and tweetable one! Let’s say we were giving a talk about a machine learning system to classify puppies. Slide one would have a technical diagram of the architecture of the system, and you might explain it as: “We use a naive bayesian classi

14 hilary mason data-2013-04-14-Et tu, Google?

Introduction: Et tu, Google? Posted: April 14, 2013 | Author: Hilary Mason | Filed under: blog | Tags: google , search | 8 Comments » In 2008, cuil , a search engine startup, displayed my bio alongside a photo of deceased actress Hilary Mason . In January 2013, Bing confused us , this time putting my photo next to her bio (they fixed it after a suitable amount of mocking on Twitter). Today, Google did the same thing . ( live search link ) Today I win the internet? If you zoom in on the bio section, you can clearly see that it’s her bio with a photo of me (originally from Crain’s New York 40 under Forty ). Further, if you go into her filmography, you continue to see my photo. I’m most proud of my starring role in the amazing film Robot Jox . (bottom right of the image below) I know that entity disambiguation is a hard problem. I’ve worked on it, though never with the kind of resources that I imagine Google can bring to it. And yet, this

15 hilary mason data-2013-04-05-Speaking: 1 Kitten per Equation

Introduction: Speaking: 1 Kitten per Equation Posted: April 5, 2013 | Author: Hilary Mason | Filed under: speaking | 2 Comments » Use a ratio of one cute cat photo per equation in your talk. This is a concise way of saying that a ratio of one part heavy, technical content to one part light-hearted explanation is ideal. You may have to play with the ratio depending on the audience or the expectations, but people react best when they have the chance to learn something fundamentally hard and interesting while, at the same time, getting to smile. And yes, DO use photos of cute things in your talks! The hack here is that people naturally smile when they look at adorableness . If they are smiling in your talk they credit you for the positive feelings. It’s an easy way to boost people’s perceived enjoyment of your talk and to get your audience into the kind of mood where it’s easier to walk them through more complex, technical material.

16 hilary mason data-2013-04-01-Data Engineering

Introduction: Data Engineering Posted: April 1, 2013 | Author: Hilary Mason | Filed under: blog | Tags: bitly , data , engineering , infrastructure | 5 Comments » Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system . It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it. Developing these types of systems requires an initial research phase, where you do the necessary work to understand the characteristics of the data, before you design the system (and perhaps even requiring an active experimental process where you try multiple infrastructure options in the wild before making a final decision). I’ve seen numerous people run straight into walls when

17 hilary mason data-2013-03-29-Speaking: Use the Narrative Arc

Introduction: Speaking: Use the Narrative Arc Posted: March 29, 2013 | Author: Hilary Mason | Filed under: speaking | 2 Comments » If you took a college freshman literature class, you probably remember a diagram like this: …with the x-axis representing time, and the y-axis (which, for some infuriating reason, is never labeled) representing intensity. Last week’s speaking hack was to limit yourself to 15 minutes (or less!) per idea . The hack this week is to use this gradient of intensity within each segment you present. If you wrote it out as a linear outline, each idea in your talk might have: an introduction to the idea; a high-level overview of the idea; the technical details; an example that brings the technical details together (this is the most exciting part!); and a conclusion that wraps up why this is exciting, how it works, and what people learned. You can also use the narrative arc to structure the intensity of the talk as a whole. By ordering the ide

18 hilary mason data-2013-03-23-Why Google Now is Awesome

Introduction: Why Google Now is Awesome Posted: March 23, 2013 | Author: Hilary Mason | Filed under: blog | Tags: google | 11 Comments » Google Now is an extension to Google’s Android search app that uses all of the data that Google has about you along with what it can guess about your current context to present the information it thinks you need when it thinks you need it. It’ll tell you to leave a bit early to make your next calendar event because of heavy traffic, or that it’s a friend’s birthday, or that there’s a cool cafe nearby where you are. I think it’s amazing. It’s amazing because this is the first Google product that takes ALL OF THE DATA that they have about us and actually makes it useful for us . Not for advertisers. Finally.

19 hilary mason data-2013-03-23-Speaking: 15 Minutes Or Less Per Idea

Introduction: Speaking: 15 Minutes Or Less Per Idea Posted: March 23, 2013 | Author: Hilary Mason | Filed under: speaking | 3 Comments » Let’s just admit it: very few people can pay attention to anything for more than fifteen minutes straight. Take advantage of this by never spending more than fifteen minutes on one idea during a talk. That means that if your talk is 45 minutes long, you should break it down into at least three, perhaps four different ideas that you want to explore. I find it helpful to outline my talks this way on paper before I start putting slides together. The ideas that you choose to explore within a talk should flow naturally together; there shouldn’t be a jarring transition. And if you find yourself belaboring the same point for more than fifteen minutes, try to break it down further. This article is part of my series of speaking hacks for introverts and nerds. Read about the motivation here .

20 hilary mason data-2013-03-17-Speaking: Entertain, Don’t Teach

Introduction: Speaking: Entertain, Don’t Teach Posted: March 17, 2013 | Author: Hilary Mason | Filed under: speaking | Tags: teaching | 7 Comments » It’s tempting to think of a talk as the opportunity to take a body of knowledge and to educate your audience about that body of knowledge. You have something in your head and you want to get it into theirs. Making education your top priority leads to terrible talks, with an unhappy audience that won’t retain any of the information you wanted them to remember, anyway. Instead, think about how you can create a compelling narrative through your material, layering in the deep technical content so that the most attentive listeners will take away a deep understanding while the people who are only half paying attention will, at the very least, enjoy the experience. I can’t think of any talk that demonstrates this better than Gary Bernhardt’s WAT: Remember: you’re entertaining , not educating . This article is part of my

21 hilary mason data-2013-03-08-Speaking: Title Slides + Twitter = You Win

22 hilary mason data-2013-03-01-Speaking: Pick a Vague and Specific Title for Your Talk

23 hilary mason data-2013-02-25-A (short) List of Data Science Blogs

24 hilary mason data-2013-02-22-Why YOU (an introverted nerd) Should Try Public Speaking

25 hilary mason data-2013-02-18-One Random Tweet, please.

26 hilary mason data-2013-02-04-Experimenting With Physical Graphs

27 hilary mason data-2013-01-29-I’m a Dead Celebrity!

28 hilary mason data-2013-01-28-Startups: Why to Share Data with Academics

29 hilary mason data-2013-01-22-Introbot: A Script to Ease the Process of Writing Introductory E-mails

30 hilary mason data-2013-01-19-Startups: How to Share Data with Academics

31 hilary mason data-2013-01-17-Need Data? Start Here

32 hilary mason data-2013-01-10-Book Book — Goose!

33 hilary mason data-2013-01-08-Bitly Social Data APIs

34 hilary mason data-2013-01-03-Interview Questions for Data Scientists