
463 high scalability-2008-12-09-Rules of Thumb in Data Engineering


meta infos for this blog

Source: html

Introduction: This is an interesting and still relevant research paper by Jim Gray and Prashant Shenoy at Microsoft Research that examines rules of thumb for the design of data storage systems. It looks at storage, processing, and networking costs, ratios, and trends, with a particular focus on performance and price/performance. Jim Gray has an updated presentation on this topic: Long Term Storage Trends and You. Robin Harris has a great post reflecting on the Rules of Thumb whitepaper on his StorageMojo blog: Architecting the Internet Data Center - Parts I-IV.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This is an interesting and still relevant research paper by Jim Gray and Prashant Shenoy at Microsoft Research that examines the rules of thumb for the design of data storage systems. [sent-1, score-1.551]

2 It looks at storage, processing, and networking costs, ratios, and trends with a particular focus on performance and price/performance. [sent-2, score-0.585]

3 Jim Gray has an updated presentation on this interesting topic: Long Term Storage Trends and You . [sent-3, score-0.313]

4 Robin Harris has a great post that reflects on the Rules of Thumb whitepaper on his StorageMojo blog: Architecting the Internet Data Center - Parts I-IV . [sent-4, score-0.494]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('gray', 0.428), ('thumb', 0.377), ('prashant', 0.248), ('rules', 0.224), ('trends', 0.218), ('whitepaper', 0.214), ('examines', 0.207), ('ratios', 0.202), ('research', 0.192), ('harris', 0.189), ('reflects', 0.185), ('storagemojo', 0.182), ('robin', 0.158), ('jim', 0.153), ('architecting', 0.148), ('relevant', 0.124), ('storage', 0.122), ('topic', 0.115), ('updated', 0.115), ('term', 0.109), ('presentation', 0.102), ('interesting', 0.096), ('networking', 0.087), ('focus', 0.086), ('parts', 0.086), ('particular', 0.085), ('blog', 0.08), ('looks', 0.08), ('microsoft', 0.08), ('paper', 0.077), ('center', 0.076), ('internet', 0.068), ('processing', 0.063), ('costs', 0.063), ('long', 0.054), ('post', 0.05), ('design', 0.049), ('still', 0.046), ('great', 0.045), ('data', 0.037), ('performance', 0.029)]
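
The page does not say how the simValue scores in the following list are computed from term-weight vectors like the one above. A minimal sketch, assuming cosine similarity over sparse tf-idf vectors (a common choice, but an assumption here); the function name and the second vector's weights are hypothetical and only illustrate the idea:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    # Dot product over the terms the two vectors share.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    # An empty or all-zero vector has no direction; treat it as dissimilar.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# A few of the top terms from the vector above, with their listed weights.
this_post = {"gray": 0.428, "thumb": 0.377, "rules": 0.224, "storage": 0.122}
# Hypothetical weights for some other post; not taken from the page.
other_post = {"storage": 0.5, "google": 0.4, "harris": 0.2}

print(round(cosine_similarity(this_post, this_post), 3))
print(round(cosine_similarity(this_post, other_post), 3))
```

Under this assumption a post compared with itself scores 1 (up to rounding), and posts that share only a few low-weight terms score close to 0, which is at least consistent with the small values in the tfidf list.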

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 463 high scalability-2008-12-09-Rules of Thumb in Data Engineering

2 0.091475114 734 high scalability-2009-10-30-Hot Scalabilty Links for October 30 2009

Introduction: Life beyond Distributed Transactions: an Apostate’s Opinion by Pat Helland. In particular, we focus on the implications that fall out of assuming we cannot have large-scale distributed transactions. Tragedy of the Commons, and Cold Starts - Cold application starts on Google App Engine kill your application's responsiveness. Intel’s 1M IOPS desktop SSD setup by Kevin Burton. What do you get when you take 7 Intel SSDs and throw them in a desktop? 1M IOPS. Videos from NoSQL Berlin sessions. Nicely done talks on CAP, MongoDB, Redis, 4th generation object databases, CouchDB, and Riak. Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean of Google describing how they do their thing. Here are some glosses on the talk by Greg Linden and James Hamilton. You really can't do better than Greg and James. Advice from Google on Large Distributed Systems by Greg Linden. A nice summary of Jeff Dean's talk. A standard Google server

3 0.086970627 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System

Introduction: Update: Parascale’s CTO on what’s different about Parascale. Let's say you have gigglebytes of data to store and you aren't sure you want to use a CDN. Amazon's S3 doesn't excite you. And you aren't quite ready to join the grid nation. You want to keep it all in house. Wouldn't it be nice to have something like the Google File System you could use to create a unified file system out of all your disks sitting on all your nodes? According to Robin Harris, a.k.a. StorageMojo (a great blog BTW), you can now have your own GFS: Parascale launches Google-like storage software. Parascale calls their software a Virtual Storage Network (VSN). It "aggregates disks across commodity Linux x86 servers to deliver petabyte-scale file storage. With features such as automated, transparent file replication and file migration, Parascale eliminates storage hotspots and delivers massive read/write bandwidth." Why should you care? I don't know about you, but the "storage problem" is one

4 0.083457336 863 high scalability-2010-07-22-How can we spark the movement of research out of the Ivory Tower and into production?

Introduction: Over the years I've read a lot of research papers looking for better ways of doing things. Sometimes I find ideas I can use, but more often than not I come up empty. The problem is there are very few good papers. And by good I mean: can a reasonably intelligent person read a paper and turn it into something useful?  Now, clearly I'm not an academic and clearly I'm no genius, I'm just an everyday programmer searching for leverage, and as a common specimen of the species I've often thought how much better our industry would be if we could simply move research from academia into production with some sort of self-conscious professionalism. Currently the process is horribly hit or miss. And this problem extends equally to companies with research divisions that often do very little to help front-line developers succeed.  How many ideas break out of academia into industry in computer science? We have many brilliant examples: encryption, microprocessors, compression, transactions, distribu

5 0.081009313 652 high scalability-2009-07-08-Art of Parallelism presentation

Introduction: This presentation is about parallel computing and covers the following topics: What is parallelism? Why now? How does it work? What are the current options? Parallel Runtime Library. (For more information go there.) Note: All of my presentations are open source, so feel free to copy, use, and redistribute them. Download

6 0.073561646 1483 high scalability-2013-06-27-Paper: XORing Elephants: Novel Erasure Codes for Big Data

7 0.072681904 823 high scalability-2010-05-05-How will memristors change everything?

8 0.069503993 446 high scalability-2008-11-18-Scalability Perspectives #2: Van Jacobson – Content-Centric Networking

9 0.068792909 905 high scalability-2010-09-21-Sponsored Post: Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

10 0.062711775 949 high scalability-2010-11-29-Stuff the Internet Says on Scalability For November 29th, 2010

11 0.060264021 909 high scalability-2010-09-28-Sponsored Post: Wiredrive, Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

12 0.059167333 1157 high scalability-2011-12-14-Virtualization and Cloud Computing is Changing the Network to East-West Routing

13 0.059095271 242 high scalability-2008-02-07-Looking for good business examples of compaines using Hadoop

14 0.058827266 753 high scalability-2009-12-21-Hot Holiday Scalability Links for 2009

15 0.058652561 1313 high scalability-2012-08-28-Making Hadoop Run Faster

16 0.057283998 445 high scalability-2008-11-14-Useful Cloud Computing Blogs

17 0.05599333 915 high scalability-2010-10-05-Sponsored Post: Box.net, Wiredrive, Joyent, DeviantART, CloudSigma, ManageEngine, Site24x7

18 0.055728883 1035 high scalability-2011-05-05-Paper: A Study of Practical Deduplication

19 0.05485053 693 high scalability-2009-09-03-Storage Systems for High Scalable Systems presentation

20 0.054314911 650 high scalability-2009-07-02-Product: Hbase


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.067), (1, 0.027), (2, 0.023), (3, 0.046), (4, -0.014), (5, 0.014), (6, -0.006), (7, -0.01), (8, 0.004), (9, 0.043), (10, -0.012), (11, -0.044), (12, -0.013), (13, 0.007), (14, -0.009), (15, 0.034), (16, -0.0), (17, 0.001), (18, 0.004), (19, -0.011), (20, 0.019), (21, 0.022), (22, -0.027), (23, 0.005), (24, -0.021), (25, -0.029), (26, 0.008), (27, 0.014), (28, -0.05), (29, 0.019), (30, 0.005), (31, -0.003), (32, 0.014), (33, 0.064), (34, -0.007), (35, 0.022), (36, 0.002), (37, 0.023), (38, 0.022), (39, -0.009), (40, 0.018), (41, 0.001), (42, -0.007), (43, 0.031), (44, 0.02), (45, 0.006), (46, -0.036), (47, -0.046), (48, -0.049), (49, 0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95260197 463 high scalability-2008-12-09-Rules of Thumb in Data Engineering

2 0.66091317 852 high scalability-2010-07-07-Strategy: Recompute Instead of Remember Big Data

Introduction: Professor Lance Fortnow, in his blog post  Drowning in Data , says complexity has taught him this lesson: When storage is expensive, it is cheaper to recompute what you've already computed. And that's the world we now live in: Storage is pretty cheap but data acquisition and computation are even cheaper. Jouni, one of the commenters, thinks the opposite is true: storage is cheap, but computation is expensive. When you are dealing with massive data, the size of the data set is very often determined by the amount of computing power available for a certain price . With such data, a linear-time algorithm takes O(1) seconds to finish, while a quadratic-time algorithm requires O(n) seconds. But as computing power increases exponentially over time, the quadratic algorithm gets exponentially slower . For me it's not a matter of which is true, both positions can be true, but what's interesting is to think that storage and computation are in some cases fungible. Your architecture can dec

3 0.65887964 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview

Introduction: Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for a fraction of the cost of earlier approaches. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?

4 0.63223428 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops

Introduction: “ Data is everywhere, never be at a single location. Not scalable, not maintainable. ” –Alex Szalay While Galileo played life and death doctrinal games over the mysteries revealed by the telescope, another revolution went unnoticed, the microscope gave up mystery after mystery and nobody yet understood how subversive would be what it revealed. For the first time these new tools of perceptual augmentation allowed humans to peek behind the veil of appearance. A new new eye driving human invention and discovery for hundreds of years. Data is another material that hides, revealing itself only when we look at different scales and investigate its underlying patterns. If the universe is truly made of information , then we are looking into truly primal stuff. A new eye is needed for Data and an ambitious project called Data-scope aims to be the lens. A detailed paper on the Data-Scope tells more about what it is: The Data-Scope is a new scientific instrum

5 0.61707497 580 high scalability-2009-04-24-INFOSCALE 2009 in June in Hong Kong

Introduction: In case you are interested, here's the info: INFOSCALE 2009: The 4th International ICST Conference on Scalable Information Systems. 10-12 June 2009, Hong Kong, China. In the last few years, we have seen the proliferation of the use of heterogeneous distributed systems, ranging from simple Networks of Workstations to highly complex grid computing environments. Such computational paradigms have been preferred due to their reduced costs and inherent scalability, which pose many challenges to scalable systems and applications in terms of information access, storage and retrieval. Grid computing, P2P technology, data and knowledge bases, distributed information retrieval technology and networking technology should all converge to address the scalability concern. Furthermore, with the advent of emerging computing architectures (e.g. SMTs, GPUs, multicores), designing techniques explicitly targeting these systems is becoming more and more important. INFOSCA

6 0.60961658 635 high scalability-2009-06-22-Improving performance and scalability with DDD

7 0.60283852 992 high scalability-2011-02-18-Stuff The Internet Says On Scalability For February 18, 2011

8 0.60122395 693 high scalability-2009-09-03-Storage Systems for High Scalable Systems presentation

9 0.59920198 1035 high scalability-2011-05-05-Paper: A Study of Practical Deduplication

10 0.5979743 1007 high scalability-2011-03-18-Stuff The Internet Says On Scalability For March 18, 2011

11 0.58757025 101 high scalability-2007-09-27-Product: Ganglia Monitoring System

12 0.58608782 726 high scalability-2009-10-22-Paper: The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM

13 0.5686838 734 high scalability-2009-10-30-Hot Scalabilty Links for October 30 2009

14 0.55939752 503 high scalability-2009-01-27-Video: Storage in the Cloud at Joyent

15 0.55807257 901 high scalability-2010-09-16-How Can the Large Hadron Collider Withstand One Petabyte of Data a Second?

16 0.55563331 839 high scalability-2010-06-09-Paper: Propagation Networks: A Flexible and Expressive Substrate for Computation

17 0.55095285 284 high scalability-2008-03-19-RAD Lab is Creating a Datacenter Operating System

18 0.54624939 371 high scalability-2008-08-24-A Scalable, Commodity Data Center Network Architecture

19 0.54465449 1499 high scalability-2013-08-09-Stuff The Internet Says On Scalability For August 9, 2013

20 0.54419684 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.119), (2, 0.139), (10, 0.052), (65, 0.406), (79, 0.026), (94, 0.11)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.84538388 463 high scalability-2008-12-09-Rules of Thumb in Data Engineering

2 0.83546865 234 high scalability-2008-01-30-The AOL XMPP scalability challenge

Introduction: Large-scale distributed instant messaging and presence-based protocols are a real challenge. With big players adopting the standard, the XMPP (eXtensible Messaging and Presence Protocol) community is facing the need to validate the protocol and its implementations at even larger scale.

3 0.74975157 48 high scalability-2007-07-30-What is Mashery?

Introduction: In the Amazon Services architecture article the podcast mentions Mashery. I went to their site at http://www.mashery.com/, but I can't quite figure out what it is. They want to: Unleash and manage channels for your API responsibly with Mashery’s combination of security, usage, access management, tracking, metrics, commerce, performance optimization and developer community tools. An example would help, because I am not getting it.

4 0.71588731 158 high scalability-2007-11-17-Can How Bees Solve their Load Balancing Problems Help Build More Scalable Websites?

Introduction: Bees have a similar problem to website servers: how to do a lot of work with limited resources in an ever changing environment. Usually lessons from biology are hard to apply to computer problems. Nature throws hardware at problems. Billions and billions of cells cooperate at different levels of organizations to find food, fight lions, and make sure your DNA is passed on. Nature's software is "simple," but her hardware rocks. We do the opposite. For us hardware is in short supply so we use limited hardware and leverage "smart" software to work around our inability to throw hardware at problems. But we might be able to borrow some load balancing techniques from bees. What do bees do that we can learn from? Bees do a dance to indicate the quality and location of a nectar source. When a bee finds a better source they do a better dance and resources shift to the new location. This approach may seem inefficient, but it turns out to be "optimal for the unpredictable nectar world." Crai

5 0.57025254 1296 high scalability-2012-08-02-Strategy: Use Spare Region Capacity to Survive Availability Zone Failures

Introduction: In the wake of the recent Amazon problems Ryan Lackey  offers some practical first responder cloud survival advice: If you're a large site (particularly a PaaS) on AWS and care about availability, you need to have spare capacity in your region (using Reserve Instances, like Netflix does) to cover when a single AZ disappears, and your own external to AWS load balancing (not DNS based), with your own per-AZ subsidiary load balancers (nginx or whatever) running within EC2. You need a robust database layer, ideally multi-region or AWS + nonAWS, but that's more site specific.  Going multiregion is the next step, and the above is an essential part of getting to that point.

6 0.56801331 1581 high scalability-2014-01-17-Stuff The Internet Says On Scalability For January 17th, 2014

7 0.56716156 800 high scalability-2010-03-26-Strategy: Caching 404s Saved the Onion 66% on Server Time

8 0.56080323 489 high scalability-2009-01-11-17 Distributed Systems and Web Scalability Resources

9 0.539424 1365 high scalability-2012-11-30-Stuff The Internet Says On Scalability For November 30, 2012

10 0.48098484 728 high scalability-2009-10-26-Facebook's Memcached Multiget Hole: More machines != More Capacity

11 0.47320843 1499 high scalability-2013-08-09-Stuff The Internet Says On Scalability For August 9, 2013

12 0.46828154 1373 high scalability-2012-12-17-11 Uses For the Humble Presents Queue, er, Message Queue

13 0.46149892 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010

14 0.45681214 970 high scalability-2011-01-06-BankSimple Mini-Architecture - Using a Next Generation Toolchain

15 0.44981271 241 high scalability-2008-02-05-SLA monitoring

16 0.44939655 1084 high scalability-2011-07-22-Stuff The Internet Says On Scalability For July 22, 2011

17 0.44856635 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012

18 0.44597495 1516 high scalability-2013-09-13-Stuff The Internet Says On Scalability For September 13, 2013

19 0.44276381 1412 high scalability-2013-02-25-SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé

20 0.44217616 266 high scalability-2008-03-04-Manage Downtime Risk by Connecting Multiple Data Centers into a Secure Virtual LAN