
nathan_marz_storm 2011 knowledge graph




blogs list:

1 nathan marz storm-2011-10-13-How to beat the CAP theorem

Introduction: The CAP theorem states that a database cannot guarantee consistency, availability, and partition-tolerance at the same time. But you can't sacrifice partition-tolerance (see here and here), so you must make a tradeoff between availability and consistency. Managing this tradeoff is a central focus of the NoSQL movement. Consistency means that after you do a successful write, future reads will always take that write into account. Availability means that you can always read and write to the system. During a partition, you can only have one of these properties. Systems that choose consistency over availability have to deal with some awkward issues. What do you do when the database isn't available? You can try buffering writes for later, but you risk losing those writes if you lose the machine with the buffer. Also, buffering writes can be a form of inconsistency, because a client thinks a write has succeeded but the write isn't in the database yet. Alternatively, you can return errors back...
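
A minimal sketch of the write-buffering hazard described above (this is an added illustration, not code from the post; the db object and its write/is_available methods are assumptions):

```python
# Hypothetical sketch: a client that buffers writes while the database is
# unreachable. It shows the two problems the post points out: the buffer lives
# only in this process's memory, so losing the machine loses the writes, and an
# acknowledged write is not yet visible to anyone reading the database.

import queue


class BufferingClient:
    def __init__(self, db):
        self.db = db                  # assumed interface: write(key, value), is_available()
        self.buffer = queue.Queue()   # in-memory only: gone if this process dies

    def write(self, key, value):
        if self.db.is_available():
            self.db.write(key, value)
        else:
            # The caller is told the write succeeded, but it exists only in the
            # local buffer, so a reader of the database won't see it yet.
            self.buffer.put((key, value))
        return "ok"

    def flush(self):
        # Replay buffered writes once the partition heals.
        while not self.buffer.empty():
            key, value = self.buffer.get()
            self.db.write(key, value)
```

If the process holding the buffer dies before flush runs, every acknowledged-but-buffered write is lost, which is exactly the risk described in the introduction.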

2 nathan marz storm-2011-03-29-My talks at POSSCON

Introduction: Last week I went to POSSCON in Columbia, South Carolina. It was an interesting experience and a good reminder that not everyone in the world thinks like we do in Silicon Valley. I gave two talks at the conference. One was a technical talk about how to build realtime Big Data systems, and the other was a non-technical talk about the things we do at BackType to be a super-productive team. Both slide decks are embedded below: "The Secrets of Building Realtime Big Data Systems" and "Become Efficient or Die: The Story of BackType".

3 nathan marz storm-2011-01-19-Inglourious Software Patents

Introduction: Most articles arguing for the abolishment of software patents focus on how so many software patents don't meet the "non-obvious and non-trivial" guidelines for patents. The problem with this approach is that the same argument could be used to advocate for reform in how software patents are evaluated rather than the abolishment of software patents altogether. Software patents should be abolished though, and I'm going to show this with an economic analysis. We'll see that even non-obvious and non-trivial software patents should never be granted, as they can only cause economic loss. Why do patents exist in the first place? The patent system exists to provide an incentive for innovation where that incentive would not have existed otherwise. Imagine you're an individual living in the 19th century. Let's say the patent system does not exist and you have an idea to make a radically better kind of sewing machine. If you invested the time to develop your idea into a working invention,

4 nathan marz storm-2011-01-11-Cascalog workshop

Introduction: I'll be teaching a Cascalog workshop on February 19th at BackType HQ in Union Square. You can sign up at http://cascalog.eventbrite.com. Early bird tickets are available until January 31st. I'm very excited to be teaching this workshop. Cascalog's tight integration with Clojure opens up a world of techniques that no other data processing tool can offer. Even though I created Cascalog, I've been discovering many of these techniques as I've used Cascalog for more and more varied tasks. Along the way, I've tweaked Cascalog so that using these techniques is cleaner and more idiomatic. At this point, after nine months of iteration, Cascalog is a joy to use for even the most complex tasks. I'm excited to impart this knowledge to others in the workshop.

5 nathan marz storm-2011-01-07-Analysis of the #LessAmbitiousMovies Twitter Meme

Introduction: We did a fun post on the BackType blog today analyzing a meme that took off on Twitter this week. A person with about 500 followers started the meme, which eventually reached more than 27 million people. Check out our analysis here, and TechCrunch's coverage of it here. The analysis was relatively simple: we extracted an 80 MB dataset of the tweets involved in the meme from our 25 TB social dataset, downloaded that data to a local computer, and ran queries on it from a Clojure REPL using Cascalog. The whole analysis took only a couple of hours.
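
As a rough illustration of the kind of local query described (the post used Cascalog from a Clojure REPL; this Python stand-in is an added sketch, and the file name and tweet fields are assumptions):

```python
# Hypothetical sketch: once a small slice of meme tweets is on a local machine
# as one JSON object per line, estimate the meme's reach by summing the follower
# counts of the distinct users who tweeted it.

import json


def estimate_reach(path):
    """Sum follower counts over distinct users in a file of one-JSON-per-line tweets."""
    followers_by_user = {}
    with open(path) as f:
        for line in f:
            tweet = json.loads(line)
            user = tweet["user"]
            followers_by_user[user["screen_name"]] = user["followers_count"]
    return sum(followers_by_user.values())


if __name__ == "__main__":
    # Assumed local file extracted from the larger social dataset.
    print(estimate_reach("lessambitiousmovies_tweets.json"))
```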