nathan_marz_storm 2009 knowledge graph by maker-knowledge-mining




blogs list:

1. nathan-marz-storm-2009-12-28 - The mathematics behind Hadoop-based systems

Introduction: I wish I had known this a year ago. Now, with some simple mathematics, I can finally answer: Why doesn't the speed of my workflow double when I double the amount of processing power? Why does a 10% failure rate cause my runtime to go up by 300%? How does optimizing out 30% of my workflow runtime cause the runtime to decrease by 80%? How many machines should I have in my cluster to be adequately performant and fault-tolerant? All of these questions are neatly answered by one simple equation:

Runtime = Overhead / (1 - {Time to process one hour of data})

We will derive this equation in a moment. First, let's briefly discuss what I mean by "Hadoop-based system." A common use case of Hadoop is running a workflow that processes a continuous stream of incoming data. The workflow runs in a "while(true)" loop, and each iteration of the workflow processes the data that accumulated since the last iteration. The inspiration for the following analysis can be summarized in a simp
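To make the equation concrete, here is a minimal Python sketch (not from the original post; the function and variable names are illustrative). The intuition, presumably the basis of the derivation the post promises: in steady state an iteration takes the fixed Overhead plus T times the previous iteration's runtime, since it must process the data that accumulated while that iteration ran; solving R = Overhead + T * R gives R = Overhead / (1 - T).

def iteration_runtime(overhead: float, t: float) -> float:
    """Steady-state runtime (in hours) of one workflow iteration.

    overhead: fixed per-iteration cost in hours (job setup, etc.).
    t: hours needed to process one hour of incoming data.
    """
    if t >= 1.0:
        # Data arrives faster than it can be processed; the workflow
        # falls further behind every iteration and never stabilizes.
        raise ValueError("workflow never catches up when t >= 1")
    return overhead / (1.0 - t)

# Doubling the processing power halves t, but does not halve the runtime:
print(iteration_runtime(1.0, 0.5))   # 2.0 hours
print(iteration_runtime(1.0, 0.25))  # ~1.33 hours, only a 33% drop

# Near saturation the equation is extremely sensitive, which is why a
# modest optimization can cut runtime disproportionately:
print(iteration_runtime(1.0, 0.9))   # 10.0 hours
print(iteration_runtime(1.0, 0.6))   # 2.5 hours, a 75% drop

The divergence at t = 1 also suggests a capacity rule of thumb: the cluster must, at a minimum, process an hour of data in well under an hour, leaving slack for overhead and failures.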