nathan_marz_storm nathan_marz_storm-2010 knowledge-graph by maker-knowledge-mining

nathan_marz_storm 2010 knowledge graph


blogs list:

1 nathan marz storm-2010-12-09-How to reject a job candidate without being an asshole

Introduction: I used to send a job candidate an email like the following after the person failed a phone interview: Hi [Candidate], Thanks a lot for taking the time to interview with us. However, we've decided not to move forward in the process with you. I wish you the best of luck on your future projects. Best, Nathan Rejection emails like that are cold, impersonal, and hollow. It doesn't feel good to send an email like that, and it sure doesn't feel good to receive an email like that. You want a candidate to feel good about your company even after failing the interview process. You don't want the candidate to discourage their friends from interviewing with your company, and ideally you want the candidate to refer their friends to your company. Now I tell candidates on the spot whether they pass or fail at the end of the phone interview. I give them feedback on what they did well and what they did poorly. I'm very candid with them. The first time I tried this, I had butterflies i

2 nathan marz storm-2010-12-06-You Are a Product

Introduction: I had a revelation the other day. I realized that the terms "programmer" and "employee" are inadequate to describe what I am. What I am is a product, and you are one too. If you want to develop your career, you need to approach your career as a product development problem. You sell yourself for various things: money, status, the opportunity to work on interesting problems, good coworkers, etc. In this post I'll be referring to this as "getting paid", but please keep in mind that "getting paid" means more than just money. Supply and Demand Like any product, you have supply and demand. Your supply is what you can do for a company that hires you. It's your ability to make beautiful websites. It's your ability to scale a database. It's your ability to get the best work out of others. Your supply is the actual value you will provide to a company that hires you. Your demand is what companies think you can do for them. Your demand is your perceived value by others. At the end of

3 nathan marz storm-2010-11-03-The time I hacked my high school

Introduction: When I was in high school, I started the Chess Club. I needed money to buy chess sets and chess clocks to get the club going, but at first I had some difficulty raising cash. Then I hacked the system, and Chess Club became a cash generating machine. Before the hack Clubs made money by reselling burritos or pizza from nearby restaurants during lunch. Each of these lunch sales typically made about $100 in profit. Since I didn't want to charge dues for the club, I needed lunch sales to raise money for Chess Club. Unfortunately, the rules around lunch sales were restrictive. Only one club could sell per week, and other clubs like the Science Club had a much stronger precedent for needing lunch sales. Without a precedent for needing money, I was unable to acquire enough lunch sale dates. The hack I studied the rules for operating clubs on campus and found the loophole I needed: clubs were allowed to go into debt to the student government for $200. I figured that if I were in deb

4 nathan marz storm-2010-10-27-Fastest Viable Product: Investing in Speed at a Startup

Introduction: A startup is like a rat in a maze searching for a piece of cheese. The cheese in the startup's case is product-market fit, that pivotal point when the startup can scale and monetize the business. In the maze, the startup has a dazzling amount of choices of where to go. Should we build this new feature? Should we try this new idea we have for a product? Should we backtrack and completely change our idea? Lean startups use a strategy called "Minimum Viable Products" to help navigate the maze. The idea is that a startup formulates hypotheses about what users want or do not want; each of these hypotheses is a "turn" in the maze. A "Minimum Viable Product" is the smallest test that will let the startup know whether their "turn" was a good one. A startup wants to stop going the wrong direction as early as possible. A "Minimum Viable Product" can be anything from a working application to an SEO'd survey that will gauge interest in an idea. "Minimum Viable Products" have been wr

5 nathan marz storm-2010-10-05-How to get a job at a kick-ass startup (for programmers)

Introduction: When I finished college, I was incredibly naive when it came to finding a great job. I knew that I wanted to work at a small startup but didn't know how to find that great opportunity. I didn't know what questions to ask to evaluate a company, and I didn't know how I should present myself during the recruitment process. Now I'm a few years out of college and I have that kick-ass job I was looking for. My dual experiences of looking for a job and being on the other side recruiting programmers have taught me quite a bit about what it takes to get a great job at a kick-ass startup. Here are my tips, from preparing for the job search process to finding great startups to applying and getting the job. If you have any tips of your own, be sure to leave them in the comments! Preparing for the job search 1. Make a list of the qualities you're looking for in a job. Be explicit and specific. What are you looking for? Coworkers that are really smart that you can learn from? Coworkers

6 nathan marz storm-2010-08-20-5 Tips for Thinking Under Uncertainty

Introduction: Most people flee uncertainty. Yet being able to think well under uncertainty can be very rewarding, and I've come to realize it's a skill that can be learned. Here are some tips for thinking under uncertainty. 1. Think in terms of ranges of scenarios In many situations, you only have limited information. Our brains are quick to make conclusions and don't consider all the scenarios. Don't let this happen. You have to realize that given the information you have, there is a range of scenarios that could account for it. Talking to someone who seems cold and aloof? Maybe the person is unfriendly, or maybe the person is just shy. Is your boss micromanaging you lately? Maybe he doesn't trust you, maybe he's under increased pressure from his boss, or maybe it's just a random event. You need to avoid tunnel vision. It's easy to become focused on one scenario - because you're afraid of the scenario or really hopeful for it - and ignore information that points to other possibil

7 nathan marz storm-2010-07-30-You should blog even if you have no readers

Introduction: Spencer Fry wrote a great post on "Why entrepreneurs should write." I would further add that the benefits of writing are so extraordinary that you should write a blog even if you have no readers (and regardless of whether you're an entrepreneur). I have over 50 unfinished drafts. Some of them are just a few ideas scribbled down arguing with myself. Most of them will never be published, yet I got value out of writing all of them. Writing makes you a better reader Blogging has changed how I read other people's writing. In struggling to find the right ways to structure and present my posts, I am much more attuned to what makes a good argument and what makes a bad argument. I am better at seeing holes in other people's reasoning. At the same time, when reading I am less likely to fall into the trap of discrediting a post with weak counterclaims. In most any post, there are likely to be counterclaims that are based on exceptional cases. Internet commenters love to point these

8 nathan marz storm-2010-07-12-My experience as the first employee of a Y Combinator startup

Introduction: I'm the first employee of BackType, a Summer '08 YC company. My joining the company increased the company size by 50%. The experience has been awesome, but I will say up front that being the first employee of a startup is not for everyone. The best part of being the first employee of a startup is the total exposure to all parts of the company. I've learned a ton about product development, customer development, recruiting, and entrepreneurship. Additionally, I've met and connected with lots of other awesome people through the YC network. I've gotten all these benefits at relatively low risk for myself, as I still have a salary and a solid chunk of equity. No Rules There are a lot of rules working at most companies. You don't even realize that some of the rules are rules until you work at a company with no rules. I'm talking about the most basic things like what hours you work, what days of the week you work, what tools you use, and whether you come into the office or not. B

9 nathan marz storm-2010-06-16-Your company has a knowledge debt problem

Introduction: When your company lacks experience in tools and techniques that can make it more productive, your company has knowledge debt. Companies tend to operate in ways that exacerbate their knowledge debt problem. Consider this fairly typical job ad: Initech is seeking an experienced Software Engineer to join the engineering team. Responsibilities * Design core, back-end software components * Analyze and improve efficiency, scalability, and stability of various system resources Requirements * M.S. Computer Science or related field preferred * 2+ years of Java experience * Expert in relational data modeling and query optimization using MySQL I would posit a guess that this company uses Java for the majority of its work and uses MySQL on the back-end. Naturally, the company wants to recruit people who share that skill set and can "jump right in" and contribute. This mindset is fundamentally flawed. A company should be hiring for problem solving skills

10 nathan marz storm-2010-05-26-Why your company should have a very permissive open source policy

Introduction: Having a permissive open source policy is important if a company wants to recruit truly stellar programmers. Or put another way: great programmers will be less inclined to work for you if you have a restrictive open source policy because being involved in open source projects is one of the best ways for a programmer to increase his market value. Traditional methods for measuring programming ability are ineffective The job market for programmers, especially the top programmers, is notoriously inefficient. This inefficiency is due to employers lacking good methods for evaluating programmers. The standard techniques used to evaluate programmers -- resumes, on-the-spot coding questions, take-home projects -- are at best crude approximations of a programmer's ability, and none of them will be indicators of the truly visionary people. Sure, there are other indicators like being involved in successful companies or having past impressive titles, but those are still indirect indicators of p

11 nathan marz storm-2010-05-08-News Feed in 38 lines of code using Cascalog

Introduction: In this tutorial for Cascalog, we are going to create part of the back-end for a simplified version of a Facebook-like news feed. In doing so we are going to walk through an end-to-end example of running Cascalog on a production cluster. If you're new to Cascalog, you should first look at the introductory tutorials here and here. The code and sample data for the example presented in this tutorial can be found on GitHub. Problem description A news feed ranks events happening in your social network. Our program will take as input two sources of data. The first is "follows" relationships which are stored in text format: nathan bob chris mike mike chris michelle nathan Follows relationships are 2-tuples of (username, username) with fields separated by whitespace. Our second source of data is "action" data which is also stored in text format: nathan status=good 1273094927000 nathan birthday 1273026922000 david newjob 1273096922000 david travelling 1273094927000 bob st

12 nathan marz storm-2010-05-07-Cascalog Presentation at Bay Area Clojure User Group

Introduction: Here are the slides from my presentation about Cascalog at the Bay Area Clojure User Group last night:

13 nathan marz storm-2010-04-27-New Cascalog features: outer joins, combiners, sorting, and more

Introduction: In the first tutorial for Cascalog, I showed off many of Cascalog's powerful features: joins, aggregates, subqueries, custom operations, and more. Since Cascalog's release a couple weeks ago, I've added a number of new features to Cascalog that seriously increase the expressiveness and performance of the language without compromising its simplicity or flexibility. Like the first tutorial, go ahead and load up the playground by issuing the following commands: lein compile-java && lein compile lein repl user=> (use 'cascalog.playground) (bootstrap) Outer joins As we saw in the first tutorial, you can join together multiple sources of data in Cascalog by using the same variable name in multiple sources of data. For example, given "age" and "gender" sources, we can get the age and gender for each person by running: user=> (?<- (stdout) [?person ?age ?gender] (age ?person ?age) (gender ?person ?gender)) This is an inner join. We will only have results for peop

14 nathan marz storm-2010-04-14-Introducing Cascalog: a Clojure-based query language for Hadoop

Introduction: I'm very excited to be releasing Cascalog as open-source today. Cascalog is a Clojure-based query language for Hadoop inspired by Datalog. Highlights Simple - Functions, filters, and aggregators all use the same syntax. Joins are implicit and natural. Expressive - Logical composition is very powerful, and you can run arbitrary Clojure code in your query with little effort. Interactive - Run queries from the Clojure REPL. Scalable - Cascalog queries run as a series of MapReduce jobs. Query anything - Query HDFS data, database data, and/or local data by making use of Cascading's "Tap" abstraction Careful handling of null values - Null values can make life difficult. Cascalog has a feature called "non-nullable variables" that makes dealing with nulls painless. First class interoperability with Cascading - Operations defined for Cascalog can be used in a Cascading flow and vice-versa First class interoperability with Clojure - Can use regular Clojure

15 nathan marz storm-2010-04-10-Fun with equality in Clojure

Introduction: I ran into some very non-intuitive behavior from Clojure recently. See if you can guess what "foo" is in the following examples: Example 1: user=> foo 1 user=> (= foo 1) true user=> (= [foo 2] [1 2]) true user=> (= {foo 2} {1 2}) false Example 2: user=> foo false user=> (= foo false) true user=> (when foo (println "shouldn't print?")) shouldn't print? nil Yikes, huh? Here are the answers: Example 1: (def foo (Long. "1")) Example 2: (def foo (Boolean. false)) For example 1, the map equality breaks down because Long and Integer have different hashcodes for the same numeric value. In example 2, Clojure considers anything besides false or nil to be true in a conditional, so that means a false Boolean object will be true in a conditional even though it's equal to "false". I would definitely consider #1 a bug, as part of the contract of equality is that two equal objects have the same hashcode. #2 is more debatable, but it seems more intuitive that the Boolean object false

16 nathan marz storm-2010-03-23-Migrating data from a SQL database to Hadoop

Introduction: I wrote about the various options available for migrating data from a SQL database to Hadoop, the problems with existing solutions, and a new solution that we open-sourced on the BackType tech blog. The tool we open-sourced is on GitHub here.

17 nathan marz storm-2010-03-17-Proof that 1 = 0 using a common logical fallacy

Introduction: A while ago I read a post by Daniel Levine that shows a formal proof of x*0 = 0. Here's a reprint of the proof: y = y (identity axiom) y - y = 0 (arithmetic) x*(y - y) = 0 (substitution) x*y - x*y = 0 (distributive) x*y = x*y (arithmetic) The logic of this proof is that since we can reduce x*0 = 0 to the identity axiom, x*0 = 0 is true. Unfortunately, this is not logically sound. Now I don't mean to pick on Daniel Levine. He's a really smart guy. I've made this same mistake, and only when I lost points on problem sets a number of times did I really understand the fallacy of this logic. To show why this logic is unsound, here's a "proof" that 1 = 0: 1 = 0 (hypothesis) 0 * 1 = 0 * 0 (multiply each side by same amount maintains equality) 0 = 0 (arithmetic) According to the logic of the previous proof, we have reduced 1 = 0 to 0 = 0, a known true statement, so 1 = 0 is true. Obviously this is incorrect. What we have actually shown is that 1 = 0 implies

18 nathan marz storm-2010-03-10-Thrift + Graphs = Strong, flexible schemas on Hadoop

Introduction: There are a lot of misconceptions about what Hadoop is useful for and what kind of data you can put in it. A lot of people think that Hadoop is meant for unstructured data like log files. While Hadoop is great for log files, it's also fantastic for strongly typed, structured data. In this post I'll discuss how you can use a tool like Thrift to store strongly typed data in Hadoop while retaining the flexibility to evolve your schema. We'll look at graph-based schemas and see why they are an ideal fit for many Hadoop-based applications. OK, so what kind of "structured" data can you put in Hadoop? Anything! At BackType we put data about news, conversations, and people into Hadoop as structured objects. You can easily push structured information about social graphs, financial information, or anything you want into Hadoop. That sounds all well and good, but why not just use JSON as the data format? JSON doesn't give you a real schema and doesn't protect against data i

19 nathan marz storm-2010-03-08-Follow-up to "The mathematics behind Hadoop-based systems"

Introduction: In a previous post, I developed an equation modeling the stable runtime of an iterative, batch-oriented workflow. We saw how the equation explained a number of counter-intuitive behaviors of batch-oriented systems. In this post, we will learn how to measure the amount of overhead versus dynamic time in a workflow, which is the first step in applying the theory to optimize a workflow. Recall that we started with the equation for the runtime of a single iteration of a workflow: Runtime = Overhead + {Time to process one hour of data} * {Hours of Data} T = O + P * H We ended with the equation for the stable runtime of a workflow that runs repeatedly: {Stable Runtime} = Overhead / (1 - {Time to process one hour of data}) T = O / (1 - P) Measuring O and P The first step towards utilizing this theory for optimizing your workflow will be to measure the values of O and P for your workflow. This can be difficult if the cluster is shared with lots of other jobs, as the P for each ru

20 nathan marz storm-2010-03-04-Introducing "Nanny" - a really simple dependency management tool

Introduction: Dependency management in software projects is a pretty simple problem when you think about it. A tool to manage dependencies just needs to do three things: (1) provide a mechanism to specify the direct dependencies to a project, (2) download the transitive closure of dependencies to a project, and (3) publish packages that can be used as a dependency to other projects. Some languages have good dependency management systems - for example, rubygems. Others, like Java, have tools like Maven which I would call a complex solution to a simple problem. You shouldn't need to buy a book to understand the solution to such a simple problem. Plus, these dependency management systems are all language specific. I've seen companies do crazy things to manage their dependencies. One company, to manage their jar files, would put all the jars that any project might need in a special "jars" project. You would then need to set up a JARS_HOME environment variable and be sure to update the jars project if you n

21 nathan marz storm-2010-02-21-Why so many research papers are so hard to understand

22 nathan marz storm-2010-01-30-Stateless, fault-tolerant scheduling using randomness

23 nathan marz storm-2010-01-26-My conversation with the great John McCarthy

24 nathan marz storm-2010-01-13-Mimi Silbert: the greatest hacker in the world

25 nathan marz storm-2010-01-03-Tips for Optimizing Cascading Flows
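
Note on entry 19: the two equations quoted in that excerpt, T = O + P * H and T = O / (1 - P), can be connected with a small numeric sketch. It assumes (this is not stated in the excerpt itself) that each run processes exactly the data that accumulated while the previous run was executing, so H for one run equals T of the run before it. The function and parameter names are hypothetical and chosen only for illustration.

```python
def stable_runtime(overhead_hours, hours_per_data_hour):
    """Closed form T = O / (1 - P). Only meaningful when 0 <= P < 1."""
    if not 0 <= hours_per_data_hour < 1:
        raise ValueError("P must satisfy 0 <= P < 1 for the runtime to stabilize")
    return overhead_hours / (1.0 - hours_per_data_hour)


def simulate_runs(overhead_hours, hours_per_data_hour, first_backlog_hours, runs):
    """Apply T = O + P * H repeatedly.

    Assumption: the hours of data (H) fed to each run equal the runtime (T)
    of the previous run, i.e. the data that piled up while it was executing.
    """
    runtimes = []
    backlog = first_backlog_hours
    for _ in range(runs):
        runtime = overhead_hours + hours_per_data_hour * backlog
        runtimes.append(runtime)
        backlog = runtime  # the next run must process what arrived meanwhile
    return runtimes


if __name__ == "__main__":
    O, P = 0.5, 0.75  # illustrative values, not measurements
    history = simulate_runs(O, P, first_backlog_hours=10.0, runs=100)
    print("first run:", history[0])
    print("last run:", history[-1])
    print("closed form:", stable_runtime(O, P))
```

Under that assumption the simulated runtimes approach the closed-form value whenever P is below 1, which is the situation the stable-runtime equation describes. With P at or above 1 the backlog grows without bound, so the sketch rejects those inputs.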