860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010


meta infos for this blog

Source: html

Introduction: And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning! Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? Tweets of Gold: jamesurquhart: Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features. tjake: People who choose their datastore based on hearsay and not their own evaluation are doomed. b6n: No global lock ever goes unpunished. MichaelSurt


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. [sent-6, score-0.409]

2 Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. [sent-7, score-0.103]

3 How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. [sent-9, score-0.113]

4 is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? [sent-13, score-0.198]

5 Tweets of Gold: jamesurquhart : Key to applications is architecture. [sent-14, score-0.093]

6 agastiya : Focus on stability and features first, scalability and manageability second, per-unit performance last of all. [sent-22, score-0.093]

7 Others say scaling is a problem that's good to have, don't worry, be happy, but Jonathan thinks planning ahead has some value: "We’ve thought really far ahead but we’ve also punted on really critical things that we needed to do. [sent-26, score-0.33]

8 Now we’re under the gun rather than being able to do them on our own time. [sent-27, score-0.12]

9 First off is buzzwords – cloud, scalable and so on . [sent-34, score-0.112]

10 Pregel can be thought of as a generalized parallel graph transformation framework. [sent-42, score-0.211]

11 I think the Pregel model is general enough for a large portion of classical graph algorithms. [sent-43, score-0.218]

12 Royans Tharakan summarizes a talk by Theo Schlossnagle : Thoughts on scalable web operations . [sent-44, score-0.089]

13 Covers: optimization, tools, cookies, datastores, automation, revision control, networking, caching, people, systems, and moderation. [sent-45, score-0.095]

14 The folks at GigaOM have a summer reading list for you. [sent-51, score-0.152]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('crawler', 0.186), ('pregel', 0.176), ('jonathan', 0.157), ('quora', 0.157), ('summer', 0.152), ('silicon', 0.136), ('graph', 0.127), ('thatgoogle', 0.12), ('sizzling', 0.12), ('gun', 0.12), ('appointing', 0.12), ('czar', 0.12), ('configurability', 0.12), ('ravenous', 0.12), ('nosqls', 0.12), ('punted', 0.12), ('quotefrom', 0.12), ('feeds', 0.113), ('beach', 0.112), ('byjames', 0.112), ('fad', 0.112), ('atalk', 0.112), ('buzzwords', 0.112), ('listfor', 0.112), ('ahigh', 0.107), ('ahead', 0.105), ('macaskill', 0.103), ('pocket', 0.103), ('pounding', 0.103), ('unstable', 0.103), ('spent', 0.103), ('tharakan', 0.1), ('conditioning', 0.1), ('nosql', 0.098), ('ricky', 0.097), ('datastores', 0.097), ('revision', 0.095), ('smugmug', 0.095), ('gigaom', 0.095), ('extracting', 0.095), ('manageability', 0.093), ('jamesurquhart', 0.093), ('upside', 0.091), ('classical', 0.091), ('summarizes', 0.089), ('andy', 0.088), ('tom', 0.086), ('transformation', 0.084), ('statements', 0.083), ('movies', 0.083)]
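
As a rough illustration of how per-word weights like those above, and the similarity values in the list below, could be produced, here is a minimal sketch of tf-idf weighting with cosine similarity over a toy corpus. The whitespace tokenizer, the unsmoothed `log(N/df)` idf formula, and the function names `tfidf_vectors` and `cosine` are assumptions made for illustration; this page does not say which tf-idf variant or library generated its numbers.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalised."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times inverse document frequency (assumed variant).
        vec = {w: (c / len(tokens)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())
```

With vectors normalised this way, a document compared with itself scores 1 up to floating-point rounding, and documents that share no words score 0.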

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010

2 0.14578566 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?

Introduction: It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? How can I make all the tools work together? In the NoSQL space this kind of real-world data is still a bit vague. When asked, vendors tend to give very general answers like NoSQL is good for BigData or key-value access. What does that mean for the developer in the trenches faced with the task of solving a specific problem when there are a dozen confusing choices and no obvious winner? Not a lot. It's often hard to take that next step and imagine how their specific problems could be solved in a way that's worth taking the trouble and risk. Let's change that. What problems are you using NoSQL to sol

3 0.14425801 631 high scalability-2009-06-15-Large-scale Graph Computing at Google

Introduction: To continue the graph theme Google has got into the act and released information on Pregel . Pregel does not appear to be a new type of potato chip. Pregel is instead a scalable infrastructure... ...to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology. Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. Developers
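
The iteration model described in this excerpt (each vertex reads the messages sent to it in the previous iteration, updates its own state, and sends messages onward) can be sketched as a single-process loop. This is a toy illustration under stated assumptions, not Pregel's actual API: the function name `run_supersteps`, the `compute` callback signature, and the max-value propagation example are all hypothetical, and every vertex is assumed to appear as a key in `graph`.

```python
def run_supersteps(graph, values, compute, max_supersteps=50):
    """Toy vertex-centric loop (hypothetical API, not Pregel's).

    graph:   {vertex: [out-neighbours]}
    values:  {vertex: initial value}
    compute: (value, messages, superstep) -> (new_value, message_or_None)
    """
    values = dict(values)
    inbox = {v: [] for v in graph}
    for superstep in range(max_supersteps):
        outbox = {v: [] for v in graph}
        sent = False
        for v in graph:
            # After superstep 0 a vertex only runs if it received messages.
            if superstep > 0 and not inbox[v]:
                continue
            values[v], msg = compute(values[v], inbox[v], superstep)
            if msg is not None:
                for neighbour in graph[v]:
                    outbox[neighbour].append(msg)
                    sent = True
        if not sent:
            break  # no messages in flight: every vertex is idle
        inbox = outbox  # messages are delivered in the next superstep
    return values


def propagate_max(value, messages, superstep):
    """Adopt the largest value seen; re-broadcast only when it changed."""
    best = max([value] + messages)
    changed = superstep == 0 or best > value
    return best, (best if changed else None)
```

Calling `run_supersteps(graph, values, propagate_max)` on a strongly connected graph leaves every vertex holding the largest initial value. A real system would partition vertices across machines and exchange the messages over the network between iterations.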

4 0.12536594 1054 high scalability-2011-06-06-NoSQL Pain? Learn How to Read-write Scale Without a Complete Re-write

Introduction: Lately I've been reading more cases where different people have started to realize the limitations of the NoSQL promise of database scalability. Note the references below: Why does Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc? Why did Diaspora abandon MongoDB for MySQL? How scalable is CouchDB in practice, not just in theory? Take MongoDB for example. It's damn fast, but it doesn't really know how to save data reliably to disk. I've had it set up in a replica pair to mitigate that risk. Guess what - both servers in the pair failed and corrupted their data files on the same day. It appears that for many, the switch to NoSQL can be rather painful. IMO that doesn't necessarily mean that NoSQL is wrong in general, but it's a combination of 1) lack of maturity 2) not the right tool for the job. That raises the question: what's the alternative solution? In the following post I tried to summarize the lessons from

5 0.12043495 1507 high scalability-2013-08-26-Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month

Introduction: Jeremy Edberg , the first paid employee at reddit, teaches us a lot about how to create a successful social site in a really good talk he gave at the RAMP conference. Watch it here at  Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons . Jeremy uses a virtue and sin approach. Examples of the mistakes made in scaling reddit are shared and it turns out they did a lot of good stuff too. Somewhat of a shocker is that  Jeremy is now a Reliability Architect at Netflix, so we get a little Netflix perspective thrown in for free. Some of the lessons that stood out most for me:  Think of SSDs as cheap RAM, not expensive disk . When reddit moved from spinning disks to SSDs for the database the number of servers was reduced from 12 to 1 with a ton of headroom. SSDs are 4x more expensive but you get 16x the performance. Worth the cost.  Give users a little bit of power, see what they do with it, and turn the good stuff into features . One of the biggest revelations

6 0.1113774 1085 high scalability-2011-07-25-Is NoSQL a Premature Optimization that's Worse than Death? Or the Lady Gaga of the Database World?

7 0.11058173 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned

8 0.10824156 931 high scalability-2010-10-28-Notes from A NOSQL Evening in Palo Alto

9 0.1041574 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management

10 0.10066025 1439 high scalability-2013-04-12-Stuff The Internet Says On Scalability For April 12, 2013

11 0.10008044 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox

12 0.097389638 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010

13 0.096541137 1040 high scalability-2011-05-13-Stuff The Internet Says On Scalability For May 13, 2011

14 0.095971733 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud

15 0.095965922 1355 high scalability-2012-11-05-Gone Fishin': Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud

16 0.095035881 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database

17 0.094068997 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011

18 0.0933543 739 high scalability-2009-11-09-10 NoSQL Systems Reviewed

19 0.092952214 160 high scalability-2007-11-19-Tailrank Architecture - Learn How to Track Memes Across the Entire Blogosphere

20 0.092716172 797 high scalability-2010-03-19-Hot Scalability Links for March 19, 2010


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.183), (1, 0.079), (2, 0.026), (3, 0.041), (4, 0.054), (5, 0.034), (6, -0.115), (7, 0.005), (8, 0.058), (9, 0.011), (10, -0.01), (11, -0.005), (12, 0.015), (13, -0.006), (14, -0.031), (15, -0.033), (16, 0.067), (17, 0.046), (18, 0.002), (19, 0.047), (20, -0.046), (21, -0.03), (22, 0.015), (23, -0.046), (24, 0.017), (25, -0.007), (26, 0.03), (27, 0.063), (28, 0.016), (29, -0.018), (30, -0.037), (31, -0.047), (32, 0.004), (33, -0.004), (34, -0.016), (35, 0.017), (36, -0.023), (37, 0.007), (38, -0.02), (39, -0.004), (40, -0.011), (41, 0.011), (42, 0.022), (43, 0.033), (44, -0.026), (45, 0.014), (46, -0.015), (47, -0.025), (48, 0.009), (49, 0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95885497 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010

2 0.81232184 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010

Introduction: You're Doing it Wrong  by Poul-Henning Kamp. Don't look so guilty, he's not talking about you know what, he's talking about writing high-performance server programs:  Not just wrong as in not perfect, but wrong as in wasting half, or more, of your performance. What good is an  O(log2(n))  algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an  O(n)  or even an  O(n^2)  algorithm, which avoids page faults, will run circles around it.  A Microsoft Windows Azure primer: the basics by Peter Bright. Nice article explaining the basics of Azure and how it compares to Google and Amazon. A call to change the name from  NoSQL to Postmodern Databases . Interesting idea, but the problem is the same one I have for Postmodern Art, when is it? I always feel like I'm in the post-post modern period, yet for art it's really in the early 1900s. Let's save future developers from this existential time crisis. Constructions from Dots and Lines by M

3 0.7597664 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010

Introduction: Lots of good links this week... Membase, powering Farmville's 500k operations *per second*. Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit. Tweets of Gold: kbsingh : I dont understand why some developers think its ok to leave operations people out of scalability decisions karmazilla : I find it a little odd when a database claims to support "massive scalability" when it is not distributed. pcapr : OH: teenagers are eventually consistent tv : Verb suggestion for the act of mapreducing data: "marinating". "Then we marinade it to get the n-gram frequencies." Superfeedr makes The Case against Rate Limiting. Push, don't poll. Of course, the receiving systems may still need to rate limit. Multi-core, Threads & Message Passing by Ilya Grigorik. We need threads, we need events, and we need message passing - it is not a question of which is better. Doug Cutting gives

4 0.75288296 1129 high scalability-2011-09-30-Stuff The Internet Says On Scalability For September 30, 2011

Introduction: You deserve a  HighScalability today : Tumblr > Wikipedia Potent quotables: @tokutek : Yelp generates close to 400 GB of compressed logs per day according to @petersirota of Amazon #Strataconf #BigData. More at From Under the Desk to the Cloud @LHK_ITRG : Massive scalability: 80,000 users on a single AppSense server. I think that should do... @solarce : OH: "Automation is a great way to distribute failure across the system" #surgecon palominodb : #surgeconf - DataDog presenting on their "Data Mullet" All SQL in front, NoSQL party in the back. Classic. Ryan Dahl  : I hate almost all software Software Design Glossary . Apparently Kent Beck didn't get the memo, only algorithms matter now, software engineering is dead. In case you don't feel that way, Kent wrote a short glossary of important software design concepts. Also, Screaming Architecture  by Bob Martin. Improving Percona Server performance with Flashcach

5 0.74912757 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011

Introduction: Submitted for your scaling pleasure:  Achievements: Watson uses 10,000's of watts, the computer between the ears uses 20. With only 200 million pages and 2TB of data, Watson is BigInsights, not BigData. That Google is pretty big: 1 billion unique monthly visitors tweetimages : We peaked at 22m avatars yesterday. Bandwidth peaked at 9GB of @twitter avatars in a single hour. Foursquare Surpasses 10 Million Users  Reddit Hits 1.2B Monthly Pageviews, More Than Doubles Its Engineering Staff Twitter : 185 million tweets are posted daily;  1.6 billion search queries daily; indexing latency is less than 10 seconds.  Quotable quotes: skr : OH: "people wait their whole lives for a situation where they can use bloom filters" joeweinman : @Werner at #structureconf : as of Nov 10, 2010, all Amazon.com traffic was served from AWS. <-- The child surpasses the parent. bbatsov : A compiled language does not scalability mak

6 0.7415669 1024 high scalability-2011-04-15-Stuff The Internet Says On Scalability For April 15, 2011

7 0.74028057 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010

8 0.73626131 1147 high scalability-2011-11-25-Stuff The Internet Says On Scalability For November 25, 2011

9 0.72830206 935 high scalability-2010-11-05-Hot Scalability Links For November 5th, 2010

10 0.72731233 1040 high scalability-2011-05-13-Stuff The Internet Says On Scalability For May 13, 2011

11 0.7221992 783 high scalability-2010-02-24-Hot Scalability Links for February 24, 2010

12 0.72127008 797 high scalability-2010-03-19-Hot Scalability Links for March 19, 2010

13 0.72048783 1368 high scalability-2012-12-07-Stuff The Internet Says On Scalability For December 7, 2012

14 0.71836525 1026 high scalability-2011-04-18-6 Ways Not to Scale that Will Make You Hip, Popular and Loved By VCs

15 0.7181052 1607 high scalability-2014-03-07-Stuff The Internet Says On Scalability For March 7th, 2014

16 0.71689367 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?

17 0.71601003 1085 high scalability-2011-07-25-Is NoSQL a Premature Optimization that's Worse than Death? Or the Lady Gaga of the Database World?

18 0.7156179 1154 high scalability-2011-12-09-Stuff The Internet Says On Scalability For December 9, 2011

19 0.71471542 1036 high scalability-2011-05-06-Stuff The Internet Says On Scalability For May 6th, 2011

20 0.7093392 1097 high scalability-2011-08-12-Stuff The Internet Says On Scalability For August 12, 2011


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.091), (2, 0.106), (10, 0.021), (40, 0.414), (61, 0.12), (77, 0.018), (79, 0.09), (85, 0.022), (94, 0.044)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9170875 338 high scalability-2008-06-02-Total Cost of Ownership for different web development frameworks

Introduction: I would like to compile a comparison matrix on the total cost of ownership for .Net, Java, LAMP & Rails. Where should I start? Has anyone seen, or does anyone know of, a recent study on this subject?

same-blog 2 0.86330712 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010

3 0.83206767 402 high scalability-2008-10-05-Paper: Scalability Design Patterns

Introduction: I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability depends on a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to: Introduce Scalability, Optimize Algorithm, Add Hardware, Add Parallelism, Add Intra-Process Parallelism, Add Inter-Process Parallelism, Add Hybrid Parallelism, Optimize Decentralization, Control Shared Resources, Automate Scalability

4 0.83171999 27 high scalability-2007-07-25-Product: 3 PAR REMOTE COPY

Introduction: 3PAR Remote Copy is a uniquely simple and efficient replication technology that allows customers to protect and share any application data affordably. Built upon 3PAR Thin Copy technology, Remote Copy lowers the total cost of storage by addressing the cost and complexity of remote replication. Common Uses of 3PAR Remote Copy: Affordable Disaster Recovery: Mirror data cost-effectively across town or across the world. Centralized Archive: Replicate data from multiple 3PAR InServs located in multiple data centers to a centralized data archive location. Resilient Pod Architecture: Mutually replicate tier 1 or 2 data to tier 3 capacity between two InServs (application pods). Remote Data Access: Replicate data to a remote location for sharing of data with remote users.

5 0.82568395 1419 high scalability-2013-03-07-It's a VM Wasteland - A Near Optimal Packing of VMs to Machines Reduces TCO by 22%

Introduction: In Algorithm Design for Performance Aware VM Consolidation we learn some shocking facts (gambling in Casablanca?): Average server utilization in many data centers is low, estimated between 5% and 15%. This is wasteful because an idle server often consumes more than 50% of peak power. Surely that's just for old style datacenters? Nope. In Google data centers, workloads that are consolidated use only 50% of the processor cores. Every other processor core is left unused simply to ensure that performance does not degrade. It's a VM wasteland. The goal is to reduce waste by packing VMs onto machines without hurting performance or wasting resources. The idea is to select VMs that interfere the least with each other and place them together on the same server. It's an NP-complete problem, but this paper describes a practical method that performs provably close to the optimal. Interestingly they can optimize for performance or power efficiency, so you can use different algorithm

6 0.81637257 1466 high scalability-2013-05-29-Amazon: Creating a Customer Utopia One Culture Hack at a Time

7 0.77923566 330 high scalability-2008-05-27-Should Twitter be an All-You-Can-Eat Buffet or a Vending Machine?

8 0.77428848 482 high scalability-2009-01-04-Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index

9 0.76919597 778 high scalability-2010-02-15-The Amazing Collective Compute Power of the Ambient Cloud

10 0.74540234 300 high scalability-2008-04-07-Scalr - Open Source Auto-scaling Hosting on Amazon EC2

11 0.72773248 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010

12 0.72456115 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013

13 0.71796829 1471 high scalability-2013-06-06-Paper: Memory Barriers: a Hardware View for Software Hackers

14 0.6974259 280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

15 0.67952418 768 high scalability-2010-02-01-What Will Kill the Cloud?

16 0.67938566 879 high scalability-2010-08-12-Think of Latency as a Pseudo-permanent Network Partition

17 0.65274757 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010

18 0.61766839 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?

19 0.59392452 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed

20 0.55058306 97 high scalability-2007-09-18-Session management in highly scalable web sites