32 nathan marz storm-2012-01-09-Early access edition of my book is available


meta info for this blog

Source: html

Introduction: The early access edition of my book Big Data: principles and best practices of scalable realtime data systems is now available from Manning! I've been working on this book for quite some time, and I'm excited to have it out there and start getting some feedback. The interest in the book has already been overwhelming, and I've been answering questions about it on Hacker News.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The early access edition of my book Big Data: principles and best practices of scalable realtime data systems is now available from Manning! [sent-1, score-2.13]

2 I've been working on this book for quite some time, and I'm excited to have it out there and start getting some feedback. [sent-2, score-1.207]

3 The interest in the book has already been overwhelming, and I've been answering questions about it on Hacker News. [sent-3, score-1.266]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('book', 0.453), ('answering', 0.294), ('scalable', 0.294), ('practices', 0.294), ('access', 0.254), ('interest', 0.254), ('principles', 0.254), ('hacker', 0.226), ('excited', 0.226), ('quite', 0.205), ('news', 0.187), ('realtime', 0.187), ('available', 0.172), ('questions', 0.137), ('data', 0.13), ('already', 0.128), ('getting', 0.12), ('working', 0.105), ('start', 0.098), ('best', 0.092), ('time', 0.048)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 32 nathan marz storm-2012-01-09-Early access edition of my book is available

Introduction: The early access edition of my book Big Data: principles and best practices of scalable realtime data systems is now available from Manning! I've been working on this book for quite some time, and I'm excited to have it out there and start getting some feedback. The interest in the book has already been overwhelming, and I've been answering questions about it on Hacker News.

2 0.12358567 39 nathan marz storm-2014-02-12-Interview with "Programmer Magazine"

Introduction: I was recently interviewed for "Programmer Magazine", a Chinese magazine. The interview was published in Chinese, but a lot of people told me they'd like to see the English version of the interview. Due to the Google translation being, ahem, a little iffy, I decided to just publish the original English version on my blog. Hope you enjoy! What drew you to programming and what was the first interesting program you wrote? I started programming when I was 10 years old on my TI-82 graphing calculator. Initially I started programming because I wanted to make games on my calculator – and also because I was bored in math class :D. The first interesting game I made on my calculator was an archery game where you'd shoot arrows at moving targets. You'd get points for hitting more targets or completing all the targets faster. A couple years later I graduated to programming the TI-89 which was a huge upgrade in power. I remember how the TI-82 only let you have 26 variables (for the character

3 0.098618306 31 nathan marz storm-2011-10-13-How to beat the CAP theorem

Introduction: The CAP theorem states a database cannot guarantee consistency, availability, and partition-tolerance at the same time. But you can't sacrifice partition-tolerance (see here and here), so you must make a tradeoff between availability and consistency. Managing this tradeoff is a central focus of the NoSQL movement. Consistency means that after you do a successful write, future reads will always take that write into account. Availability means that you can always read and write to the system. During a partition, you can only have one of these properties. Systems that choose consistency over availability have to deal with some awkward issues. What do you do when the database isn't available? You can try buffering writes for later, but you risk losing those writes if you lose the machine with the buffer. Also, buffering writes can be a form of inconsistency because a client thinks a write has succeeded but the write isn't in the database yet. Alternatively, you can return errors ba

4 0.088641502 22 nathan marz storm-2010-10-05-How to get a job at a kick-ass startup (for programmers)

Introduction: When I finished college, I was incredibly naive when it came to finding a great job. I knew that I wanted to work at a small startup but didn't know how to find that great opportunity. I didn't know what questions to ask to evaluate a company, and I didn't know how I should present myself during the recruitment process. Now I'm a few years out of college and I have that kick-ass job I was looking for. My dual experiences of looking for a job and being on the other side recruiting programmers have taught me quite a bit about what it takes to get a great job at a kick-ass startup. Here are my tips, from preparing for the job search process to finding great startups to applying and getting the job. If you have any tips of your own, be sure to leave them in the comments! Preparing for the job search 1. Make a list of the qualities you're looking for in a job. Be explicit and specific. What are you looking for? Coworkers that are really smart that you can learn from? Coworkers

5 0.079491496 30 nathan marz storm-2011-03-29-My talks at POSSCON

Introduction: Last week I went to POSSCON in Columbia, South Carolina. It was an interesting experience and a good reminder that not everyone in the world thinks like we do in Silicon Valley. I gave two talks at the conference. One was a technical talk about how to build realtime Big Data systems, and the other was a non-technical talk about the things we do at BackType to be a super-productive team. Both slide decks are embedded below. The Secrets of Building Realtime Big Data Systems Become Efficient or Die: The Story of BackType

6 0.078658685 35 nathan marz storm-2013-03-16-Leaving Twitter

7 0.069722474 28 nathan marz storm-2011-01-11-Cascalog workshop

8 0.067985252 38 nathan marz storm-2013-04-12-Break into Silicon Valley with a blog

9 0.052148812 11 nathan marz storm-2010-03-23-Migrating data from a SQL database to Hadoop

10 0.040796235 33 nathan marz storm-2012-02-06-Suffering-oriented programming

11 0.030838273 16 nathan marz storm-2010-05-08-News Feed in 38 lines of code using Cascalog

12 0.027763018 41 nathan marz storm-2014-05-10-Why we in tech must support Lawrence Lessig

13 0.02576858 4 nathan marz storm-2010-01-26-My conversation with the great John McCarthy

14 0.025317769 17 nathan marz storm-2010-05-26-Why your company should have a very permissive open source policy

15 0.023548281 19 nathan marz storm-2010-07-12-My experience as the first employee of a Y Combinator startup

16 0.023202021 25 nathan marz storm-2010-12-06-You Are a Product

17 0.022666542 1 nathan marz storm-2009-12-28-The mathematics behind Hadoop-based systems

18 0.021533001 18 nathan marz storm-2010-06-16-Your company has a knowledge debt problem

19 0.019898485 9 nathan marz storm-2010-03-10-Thrift + Graphs = Strong, flexible schemas on Hadoop

20 0.019791521 23 nathan marz storm-2010-10-27-Fastest Viable Product: Investing in Speed at a Startup


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.187), (1, 0.068), (2, -0.008), (3, 0.116), (4, 0.208), (5, -0.135), (6, -0.257), (7, 0.121), (8, 0.071), (9, -0.205), (10, 0.03), (11, -0.426), (12, 0.432), (13, 0.059), (14, -0.105), (15, 0.056), (16, 0.133), (17, -0.025), (18, 0.033), (19, -0.05), (20, 0.091), (21, -0.106), (22, -0.07), (23, 0.021), (24, -0.362), (25, -0.131), (26, -0.153), (27, 0.108), (28, 0.175), (29, -0.129), (30, -0.004), (31, 0.068), (32, -0.075), (33, 0.05), (34, 0.128), (35, 0.214), (36, 0.012), (37, 0.094), (38, 0.104), (39, 0.003), (40, 0.004)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99344683 32 nathan marz storm-2012-01-09-Early access edition of my book is available

Introduction: The early access edition of my book Big Data: principles and best practices of scalable realtime data systems is now available from Manning! I've been working on this book for quite some time, and I'm excited to have it out there and start getting some feedback. The interest in the book has already been overwhelming, and I've been answering questions about it on Hacker News.

2 0.17558844 31 nathan marz storm-2011-10-13-How to beat the CAP theorem

Introduction: The CAP theorem states a database cannot guarantee consistency, availability, and partition-tolerance at the same time. But you can't sacrifice partition-tolerance (see here and here), so you must make a tradeoff between availability and consistency. Managing this tradeoff is a central focus of the NoSQL movement. Consistency means that after you do a successful write, future reads will always take that write into account. Availability means that you can always read and write to the system. During a partition, you can only have one of these properties. Systems that choose consistency over availability have to deal with some awkward issues. What do you do when the database isn't available? You can try buffering writes for later, but you risk losing those writes if you lose the machine with the buffer. Also, buffering writes can be a form of inconsistency because a client thinks a write has succeeded but the write isn't in the database yet. Alternatively, you can return errors ba

3 0.13503747 39 nathan marz storm-2014-02-12-Interview with "Programmer Magazine"

Introduction: I was recently interviewed for "Programmer Magazine", a Chinese magazine. The interview was published in Chinese, but a lot of people told me they'd like to see the English version of the interview. Due to the Google translation being, ahem, a little iffy, I decided to just publish the original English version on my blog. Hope you enjoy! What drew you to programming and what was the first interesting program you wrote? I started programming when I was 10 years old on my TI-82 graphing calculator. Initially I started programming because I wanted to make games on my calculator – and also because I was bored in math class :D. The first interesting game I made on my calculator was an archery game where you'd shoot arrows at moving targets. You'd get points for hitting more targets or completing all the targets faster. A couple years later I graduated to programming the TI-89 which was a huge upgrade in power. I remember how the TI-82 only let you have 26 variables (for the character

4 0.10177695 30 nathan marz storm-2011-03-29-My talks at POSSCON

Introduction: Last week I went to POSSCON in Columbia, South Carolina. It was an interesting experience and a good reminder that not everyone in the world thinks like we do in Silicon Valley. I gave two talks at the conference. One was a technical talk about how to build realtime Big Data systems, and the other was a non-technical talk about the things we do at BackType to be a super-productive team. Both slide decks are embedded below. The Secrets of Building Realtime Big Data Systems Become Efficient or Die: The Story of BackType

5 0.099490561 22 nathan marz storm-2010-10-05-How to get a job at a kick-ass startup (for programmers)

Introduction: When I finished college, I was incredibly naive when it came to finding a great job. I knew that I wanted to work at a small startup but didn't know how to find that great opportunity. I didn't know what questions to ask to evaluate a company, and I didn't know how I should present myself during the recruitment process. Now I'm a few years out of college and I have that kick-ass job I was looking for. My dual experiences of looking for a job and being on the other side recruiting programmers have taught me quite a bit about what it takes to get a great job at a kick-ass startup. Here are my tips, from preparing for the job search process to finding great startups to applying and getting the job. If you have any tips of your own, be sure to leave them in the comments! Preparing for the job search 1. Make a list of the qualities you're looking for in a job. Be explicit and specific. What are you looking for? Coworkers that are really smart that you can learn from? Coworkers

6 0.095768705 35 nathan marz storm-2013-03-16-Leaving Twitter

7 0.081410795 28 nathan marz storm-2011-01-11-Cascalog workshop

8 0.080897674 11 nathan marz storm-2010-03-23-Migrating data from a SQL database to Hadoop

9 0.07791058 1 nathan marz storm-2009-12-28-The mathematics behind Hadoop-based systems

10 0.071639508 38 nathan marz storm-2013-04-12-Break into Silicon Valley with a blog

11 0.066936634 16 nathan marz storm-2010-05-08-News Feed in 38 lines of code using Cascalog

12 0.066884845 9 nathan marz storm-2010-03-10-Thrift + Graphs = Strong, flexible schemas on Hadoop

13 0.056520242 27 nathan marz storm-2011-01-07-Analysis of the #LessAmbitiousMovies Twitter Meme

14 0.05410023 19 nathan marz storm-2010-07-12-My experience as the first employee of a Y Combinator startup

15 0.053678017 33 nathan marz storm-2012-02-06-Suffering-oriented programming

16 0.050284404 8 nathan marz storm-2010-03-08-Follow-up to "The mathematics behind Hadoop-based systems"

17 0.046118639 23 nathan marz storm-2010-10-27-Fastest Viable Product: Investing in Speed at a Startup

18 0.041532554 25 nathan marz storm-2010-12-06-You Are a Product

19 0.036398157 4 nathan marz storm-2010-01-26-My conversation with the great John McCarthy

20 0.03587826 41 nathan marz storm-2014-05-10-Why we in tech must support Lawrence Lessig


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(76, 0.808)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 32 nathan marz storm-2012-01-09-Early access edition of my book is available

Introduction: The early access edition of my book Big Data: principles and best practices of scalable realtime data systems is now available from Manning! I've been working on this book for quite some time, and I'm excited to have it out there and start getting some feedback. The interest in the book has already been overwhelming, and I've been answering questions about it on Hacker News.

2 0.99451858 35 nathan marz storm-2013-03-16-Leaving Twitter

Introduction: Yesterday was my last day at Twitter. I left to start my own company. What I'll be working on is very exciting (though I'm keeping it secret for now). Leaving Twitter was a tough decision. I worked with a whole bunch of great people on fascinating problems with some of the most interesting data in the world. Ultimately though, I felt that if I didn't make this move, I would regret it for the rest of my life. So I put in my papers about a month ago and then spent a month transitioning my team for my departure. This ends an eventful three years that started with me joining BackType in January of 2010. So much has happened in these past three years. I open-sourced Cascalog, ElephantDB, and Storm, started writing a book, gave a lot of talks, and in July of 2011 experienced the thrill of being acquired. My projects spread beyond BackType and Twitter to be relied on by dozens and dozens of companies. Through all this, I learned an enormous amount about entrepreneurship, p

3 0.22950059 8 nathan marz storm-2010-03-08-Follow-up to "The mathematics behind Hadoop-based systems"

Introduction: In a previous post, I developed an equation modeling the stable runtime of an iterative, batch-oriented workflow. We saw how the equation explained a number of counter-intuitive behaviors of batch-oriented systems. In this post, we will learn how to measure the amount of overhead versus dynamic time in a workflow, which is the first step in applying the theory to optimize a workflow. Recall that we started with the equation for the runtime of a single iteration of a workflow: Runtime = Overhead + {Time to process one hour of data} * {Hours of Data} T = O + P * H We ended with the equation for the stable runtime of a workflow that runs repeatedly: {Stable Runtime} = Overhead / (1 - {Time to process one hour of data}) T = O / (1 - P) Measuring O and P The first step towards utilizing this theory for optimizing your workflow will be to measure the values of O and P for your workflow. This can be difficult if the cluster is shared with lots of other jobs, as the P for each ru

4 0.22239098 39 nathan marz storm-2014-02-12-Interview with "Programmer Magazine"

Introduction: I was recently interviewed for "Programmer Magazine", a Chinese magazine. The interview was published in Chinese, but a lot of people told me they'd like to see the English version of the interview. Due to the Google translation being, ahem, a little iffy, I decided to just publish the original English version on my blog. Hope you enjoy! What drew you to programming and what was the first interesting program you wrote? I started programming when I was 10 years old on my TI-82 graphing calculator. Initially I started programming because I wanted to make games on my calculator – and also because I was bored in math class :D. The first interesting game I made on my calculator was an archery game where you'd shoot arrows at moving targets. You'd get points for hitting more targets or completing all the targets faster. A couple years later I graduated to programming the TI-89 which was a huge upgrade in power. I remember how the TI-82 only let you have 26 variables (for the character

5 0.2129921 1 nathan marz storm-2009-12-28-The mathematics behind Hadoop-based systems

Introduction: I wish I had known this a year ago. Now, with some simple mathematics I can finally answer: Why doesn't the speed of my workflow double when I double the amount of processing power? Why does a 10% failure rate cause my runtime to go up by 300%? How does optimizing out 30% of my workflow runtime cause the runtime to decrease by 80%? How many machines should I have in my cluster to be adequately performant and fault-tolerant? All of these questions are neatly answered by one simple equation: Runtime = Overhead / (1 - {Time to process one hour of data}) We will derive this equation in a moment. First, let's briefly discuss what I mean by "Hadoop-based system." 1 A common use-case of Hadoop is running a workflow that processes a continuous stream of incoming data. The workflow runs in a "while(true)" loop, and each iteration of the workflow processes the data that accumulated since last iteration. The inspiration for the following analysis can be summarized in a simp

6 0.19696021 25 nathan marz storm-2010-12-06-You Are a Product

7 0.19221719 27 nathan marz storm-2011-01-07-Analysis of the #LessAmbitiousMovies Twitter Meme

8 0.18164726 18 nathan marz storm-2010-06-16-Your company has a knowledge debt problem

9 0.18114783 7 nathan marz storm-2010-03-04-Introducing "Nanny" - a really simple dependency management tool

10 0.1537147 9 nathan marz storm-2010-03-10-Thrift + Graphs = Strong, flexible schemas on Hadoop

11 0.14161751 38 nathan marz storm-2013-04-12-Break into Silicon Valley with a blog

12 0.13296266 23 nathan marz storm-2010-10-27-Fastest Viable Product: Investing in Speed at a Startup

13 0.12859987 19 nathan marz storm-2010-07-12-My experience as the first employee of a Y Combinator startup

14 0.12623705 40 nathan marz storm-2014-02-24-The inexplicable rise of open floor plans in tech companies

15 0.11269885 11 nathan marz storm-2010-03-23-Migrating data from a SQL database to Hadoop

16 0.10405201 17 nathan marz storm-2010-05-26-Why your company should have a very permissive open source policy

17 0.097833663 28 nathan marz storm-2011-01-11-Cascalog workshop

18 0.093630157 16 nathan marz storm-2010-05-08-News Feed in 38 lines of code using Cascalog

19 0.092748895 22 nathan marz storm-2010-10-05-How to get a job at a kick-ass startup (for programmers)

20 0.092399649 24 nathan marz storm-2010-11-03-The time I hacked my high school