high_scalability high_scalability-2010 high_scalability-2010-883 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Lots of good links this week... Membase, powering Farmville's 500k operations *per second*. Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit. Tweets of Gold: kbsingh : I dont understand why some developers think its ok to leave operations people out of scalability decisions karmazilla : I find it a little odd when a database claims to support "massive scalability" when it is not distributed. pcapr : OH: teenagers are eventually consistent tv : Verb suggestion for the act of mapreducing data: "marinating". "Then we marinade it to get the n-gram frequencies." Superfeedr makes The Case against Rate Limiting. Push, don't poll. Of course, the receiving systems may still need to rate limit. Multi-core, Threads & Message Passing by Ilya Grigorik. We need threads, we need events, and we need message passing - it is not a question of which is better. Doug Cutting gives
sentIndex sentText sentNum sentScore
1 Membase, powering Farmville's 500k operations *per second*. [sent-4, score-0.094]
2 Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit. [sent-5, score-0.108]
3 Tweets of Gold: kbsingh : I dont understand why some developers think its ok to leave operations people out of scalability decisions karmazilla : I find it a little odd when a database claims to support "massive scalability" when it is not distributed. [sent-6, score-0.422]
4 pcapr : OH: teenagers are eventually consistent tv : Verb suggestion for the act of mapreducing data: "marinating". [sent-7, score-0.257]
5 Of course, the receiving systems may still need to rate limit. [sent-11, score-0.281]
6 We need threads, we need events, and we need message passing - it is not a question of which is better. [sent-13, score-0.374]
7 A schema is sent with the data which makes for smaller payloads, strong typing, easier schema evolution, and no need for code generation. [sent-15, score-0.357]
8 A conference for developers and users of open source software projects, focussing on the issues of scalable search, data-analysis in the cloud and NoSQL-databases. [sent-21, score-0.119]
9 Transformer has two layers: a common runtime system and a model-specific system. [sent-23, score-0.083]
10 Using Transformer, the authors show how to implement three programming models: Dryad-like data flow, MapReduce, and All-Pairs. [sent-24, score-0.188]
11 Covers lots of core issues like queueing, distribution, task models, blocking vs non-blocking, too many open file problems, and so on. [sent-26, score-0.084]
12 Todd Stavish began to wonder if it would be possible to extract data from Cassandra for analysis in a graph database. [sent-34, score-0.09]
13 In other-words, implement my own polyglot persistence application by fusing InfiniteGraph and Cassandra. [sent-35, score-0.353]
14 In order to understand what each data-store can give to the other. [sent-36, score-0.096]
wordName wordTfidf (topN-words)
[('infinitegraph', 0.268), ('cassandra', 0.152), ('persistence', 0.151), ('hashingby', 0.149), ('teenagers', 0.149), ('vigorous', 0.149), ('transformer', 0.14), ('payloads', 0.14), ('buzzwords', 0.14), ('thencontact', 0.14), ('verb', 0.14), ('schema', 0.135), ('avro', 0.134), ('farmville', 0.119), ('focussing', 0.119), ('models', 0.118), ('threads', 0.116), ('berlin', 0.116), ('message', 0.113), ('odd', 0.111), ('strangely', 0.111), ('dont', 0.109), ('todd', 0.108), ('suggestion', 0.108), ('contend', 0.108), ('tom', 0.108), ('claims', 0.106), ('polyglot', 0.105), ('typing', 0.102), ('course', 0.1), ('rate', 0.1), ('gold', 0.097), ('serialization', 0.097), ('ilya', 0.097), ('queueing', 0.097), ('implement', 0.097), ('understand', 0.096), ('integrating', 0.096), ('receiving', 0.094), ('powering', 0.094), ('authors', 0.091), ('schemas', 0.091), ('extract', 0.09), ('workers', 0.09), ('cutting', 0.088), ('need', 0.087), ('involve', 0.086), ('lots', 0.084), ('generating', 0.084), ('runtime', 0.083)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010
2 0.1728332 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
Introduction: It's a truism that we should choose the right tool for the job. Everyone says that. And who can disagree? The problem is this is not helpful advice without being able to answer more specific questions like: What jobs are the tools good at? Will they work on jobs like mine? Is it worth the risk to try something new when all my people know something else and we have a deadline to meet? How can I make all the tools work together? In the NoSQL space this kind of real-world data is still a bit vague. When asked, vendors tend to give very general answers like NoSQL is good for BigData or key-value access. What does that mean for the developer in the trenches faced with the task of solving a specific problem and there are a dozen confusing choices and no obvious winner? Not a lot. It's often hard to take that next step and imagine how their specific problems could be solved in a way that's worth taking the trouble and risk. Let's change that. What problems are you using NoSQL to sol
3 0.12439735 855 high scalability-2010-07-11-So, Why is Twitter Really Not Using Cassandra to Store Tweets?
Introduction: A firestorm of accusations circled around recently saying that Cassandra, the elected-by-major-adopters emperor of the NoSQL movement, has no clothes. It was said Twitter was dumping Cassandra; Reddit outages were linked to Cassandra; and even Facebook, Cassandra's cradle of birth, was said to have abandoned Cassandra. Shouts of NoSQL Fail! were heard in the streets. Much gloating followed. Is the emperor really naked? Casually dressed maybe, but not naked. (Note: after this point the article contains a flow chart that is NSFW. Some people are very sensitive about cussing, so if that's you, please go back, don't read on. Danger! There are no nude pictures or anything, just some strong language. But this is my most favorite flow chart of all time, so it's worth it :-) Is Twitter really abandoning Cassandra? Not according to Twitter, which came out with a post, Cassandra at Twitter Today , explaining that they are using Cassandra in production for geolocation and analytics. T
Introduction: On the surface nothing appears more different than soft data and hard raw materials like iron. Then isn’t it ironic , in the Alanis Morissette sense, that in this Age of Information, great wealth still lies hidden deep beneath piles of stuff? It's so strange how directly digging for dollars in data parallels the great wealth producing models of the Industrial Revolution. The piles of stuff is the Internet. It takes lots of prospecting to find the right stuff. Mighty web crawling machines tirelessly collect stuff, bringing it into their huge maws, then depositing load after load into rack after rack of distributed file system machines. Then armies of still other machines take this stuff and strip out the valuable raw materials, which in the Information Age, are endless bytes of raw data. Link clicks, likes, page views, content, head lines, searches, inbound links, outbound links, search clicks, hashtags, friends, purchases: anything and everything you do on the Internet is a valu
5 0.11615246 774 high scalability-2010-02-08-How FarmVille Scales to Harvest 75 Million Players a Month
Introduction: Several readers had follow-up questions in response to this article. Luke's responses can be found in How FarmVille Scales - The Follow-up. If real farming was as comforting as it is in Zynga's mega-hit Farmville then my family would have probably never left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time. How did FarmVille scale a web application to handle 75 million players a month? Fortunately FarmVille's Luke Rajlich has agreed to let us in on a few of their challenges and secrets. Here's what Luke has to say... The format of the interview was that I sent Luke a few general questions and he replied with this response: FarmVille has a unique set of sc
6 0.11592038 649 high scalability-2009-07-02-Product: Facebook's Cassandra - A Massive Distributed Store
7 0.11575185 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
8 0.11037163 931 high scalability-2010-10-28-Notes from A NOSQL Evening in Palo Alto
9 0.10978098 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
10 0.10973976 787 high scalability-2010-03-03-Hot Scalability Links for March 3, 2010
11 0.10843696 940 high scalability-2010-11-12-Stuff the Internet Says on Scalability For November 12th, 2010
12 0.10816049 1064 high scalability-2011-06-20-35+ Use Cases for Choosing Your Next NoSQL Database
13 0.10669497 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
14 0.10547686 880 high scalability-2010-08-13-Hot Scalability Links for Aug 13, 2010
15 0.10339152 1093 high scalability-2011-08-05-Stuff The Internet Says On Scalability For August 5, 2011
16 0.101409 554 high scalability-2009-04-04-Digg Architecture
17 0.1011841 833 high scalability-2010-06-01-Sponsored Post: Get Your High Scalability Fix at Digg
18 0.10074946 1591 high scalability-2014-02-05-Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
20 0.094542205 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
topicId topicWeight
[(0, 0.194), (1, 0.076), (2, 0.019), (3, 0.069), (4, 0.072), (5, 0.066), (6, -0.05), (7, 0.026), (8, 0.042), (9, 0.027), (10, 0.022), (11, 0.015), (12, 0.005), (13, -0.07), (14, -0.018), (15, -0.051), (16, 0.041), (17, 0.024), (18, -0.024), (19, 0.004), (20, 0.004), (21, -0.027), (22, 0.009), (23, -0.04), (24, 0.047), (25, -0.028), (26, 0.016), (27, 0.029), (28, 0.027), (29, -0.026), (30, 0.054), (31, 0.007), (32, -0.011), (33, 0.002), (34, 0.025), (35, 0.003), (36, 0.005), (37, 0.015), (38, 0.012), (39, -0.024), (40, 0.009), (41, 0.029), (42, -0.072), (43, -0.0), (44, -0.023), (45, 0.015), (46, -0.062), (47, 0.02), (48, -0.015), (49, 0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.94017595 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010
2 0.77795768 1067 high scalability-2011-06-24-Stuff The Internet Says On Scalability For June 24, 2011
Introduction: Submitted for your scaling pleasure: Achievements: Watson uses 10,000's of watts, the computer between the ears uses 20. With only 200 million pages and 2TB of data, Watson is BigInsights, not BigData. That Google is pretty big: 1 billion unique monthly visitors tweetimages : We peaked at 22m avatars yesterday. Bandwidth peaked at 9GB of @twitter avatars in a single hour. Foursquare Surpasses 10 Million Users Reddit Hits 1.2B Monthly Pageviews, More Than Doubles Its Engineering Staff Twitter : 185 million tweets are posted daily; 1.6 billion search queries daily; indexing latency is less than 10 seconds. Quotable quotes: skr : OH: "people wait their whole lives for a situation where they can use bloom filters" joeweinman : @Werner at #structureconf : as of Nov 10, 2010, all Amazon.com traffic was served from AWS. <-- The child surpasses the parent. bbatsov : A compiled language does not scalability mak
3 0.75396264 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010
Introduction: And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning! Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? Tweets of Gold: jamesurquhart : Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features . tjake : People who choose their datastore based oh hearsay and not their own evaluation are doomed . b6n : No global lock ever goes unpunished. MichaelSurt
4 0.73989898 1109 high scalability-2011-09-02-Stuff The Internet Says On Scalability For September 2, 2011
Introduction: Scale the modern way / No brush / No lather / No rub-in / Big tube 35 cents - Drug stores / HighScalability: 8868 Tweets per second during VMAs ; Facebook: 250 million photos uploaded each day ; Earth: 7 Billion People Strong Potent quotables: @kevinweil : Wow, 8868 Tweets per second last night during the #VMAs. And that's just the writes -- imagine how many reads we were doing! @tristanbergh : #NoSQL isn't cool, it's a working kludge of existing architectures, bowing to the current tech limits, not transcending them @krishnan : I would love to switch the backend infra to Amazon anytime but our top 20 customers will not allow us @ianozsvald : Learning about all the horrible things that happen when you don't plan (@socialtiesapp) for scalability. Trying to be creative now... After a particularly difficult Jeopardy match, Watson asked IBM to make him a new cognitive chip so he could conti
5 0.73231548 854 high scalability-2010-07-09-Hot Scalability Links for July 9, 2010
Introduction: Facebook serves 3 billion Like buttons a day says VentureBeat. CloudScaling reports: Rumor Mill: Google EC2 Competitor Coming in 2010? It looks like GAE for PaaS and an EC2 clone for IaaS. Tweets of gold: alandipert : scalability is a drug seldo : Scalability lesson #23: if any part of your system involves a list that gets bigger over time, eventually that list will become too big. obfuscurity : Her: "Go look at the pictures on the database." Me: "You mean our fileserver?" Her: "Whatever." luiscab : Ouch, I just read on an Info Mgmt rag that Hadoop could easily be an acronym for "Heck, Another Darn Obscure Open-source Project." sanity : Depressed about how much time I've had to spend searching for the right database solution for a new project. Each has it's flaws ioshints : You cannot take a car, grow it 10 times and expect to get a mining truck. A contentious thread on Hacker News: Mong
6 0.73171949 1327 high scalability-2012-09-21-Stuff The Internet Says On Scalability For September 21, 2012
7 0.73155224 1007 high scalability-2011-03-18-Stuff The Internet Says On Scalability For March 18, 2011
8 0.72742456 1040 high scalability-2011-05-13-Stuff The Internet Says On Scalability For May 13, 2011
9 0.72683972 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?
10 0.72626996 930 high scalability-2010-10-28-NoSQL Took Away the Relational Model and Gave Nothing Back
11 0.7165184 1097 high scalability-2011-08-12-Stuff The Internet Says On Scalability For August 12, 2011
12 0.71238333 1411 high scalability-2013-02-22-Stuff The Internet Says On Scalability For February 22, 2013
13 0.71237886 935 high scalability-2010-11-05-Hot Scalability Links For November 5th, 2010
14 0.7113654 1071 high scalability-2011-07-01-Stuff The Internet Says On Scalability For July 1, 2011
15 0.71124154 787 high scalability-2010-03-03-Hot Scalability Links for March 3, 2010
16 0.7101469 1141 high scalability-2011-11-11-Stuff The Internet Says On Scalability For November 11, 2011
17 0.70852035 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013
18 0.7056241 1024 high scalability-2011-04-15-Stuff The Internet Says On Scalability For April 15, 2011
19 0.70531344 842 high scalability-2010-06-16-Hot Scalability Links for June 16, 2010
20 0.70530611 1607 high scalability-2014-03-07-Stuff The Internet Says On Scalability For March 7th, 2014
topicId topicWeight
[(1, 0.145), (2, 0.225), (27, 0.296), (40, 0.024), (61, 0.063), (79, 0.118), (94, 0.054)]
simIndex simValue blogId blogTitle
1 0.96862668 555 high scalability-2009-04-04-Performance Anti-Pattern
Introduction: Want your apps to run faster? Here’s what not to do. By: Bart Smaalders, Sun Microsystems. Performance Anti-Patterns: - Fixing Performance at the End of the Project - Measuring and Comparing the Wrong Things - Algorithmic Antipathy - Reusing Software - Iterating Because That’s What Computers Do Well - Premature Optimization - Focusing on What You Can See Rather Than on the Problem - Software Layering - Excessive Numbers of Threads - Asymmetric Hardware Utilization - Not Optimizing for the Common Case - Needless Swapping of Cache Lines Between CPUs For more detail go there
2 0.87622416 28 high scalability-2007-07-25-Product: NetApp MetroCluster Software
Introduction: NetApp MetroCluster Software is a cost-effective, integrated high-availability storage cluster and site failover capability. NetApp MetroCluster is an integrated high-availability and disaster recovery solution that can reduce system complexity and simplify management while ensuring greater return on investment. MetroCluster uses clustered server technology to replicate data synchronously between sites located miles apart, eliminating data loss in case of a disruption. A simple and powerful recovery process minimizes downtime, with little or no user action required. One company I worked at used the NetApp SnapMirror feature to replicate data across long distances to multiple datacenters. They had a very fast backbone and it worked well. The issue with NetApp is always one of cost, but if you can afford it, it's a good option.
3 0.8676163 1483 high scalability-2013-06-27-Paper: XORing Elephants: Novel Erasure Codes for Big Data
Introduction: Erasure codes are one of those seemingly magical mathematical creations that, with the developments described in the paper XORing Elephants: Novel Erasure Codes for Big Data, are set to replace triple replication as the data storage protection mechanism of choice. The result, says Robin Harris (StorageMojo) in an excellent article, Facebook’s advanced erasure codes: "WebCos will be able to store massive amounts of data more efficiently than ever before. Bad news: so will anyone else." Robin says with cheap disks triple replication made sense and was economical. With ever bigger BigData the overhead has become costly. But erasure codes have always suffered from unacceptably long repair times. This paper describes new Locally Repairable Codes (LRCs) that are efficiently repairable in disk I/O and bandwidth requirements: These systems are now designed to survive the loss of up to four storage elements – disks, servers, nodes or even entire data centers – without losing
4 0.85741943 544 high scalability-2009-03-18-QCon London 2009: Upgrading Twitter without service disruptions
Introduction: Evan Weaver from Twitter presented a talk on Twitter software upgrades, titled Improving running components, as part of the Systems that never stop track at the QCon London 2009 conference last Friday. The talk focused on several upgrades performed since last May, while Twitter was experiencing serious performance problems.
5 0.85559636 1097 high scalability-2011-08-12-Stuff The Internet Says On Scalability For August 12, 2011
Introduction: Submitted for your scaling pleasure, you may not scale often, but when you scale, please drink us: Quotably quotable quotes: @mardix : There is no single point of truth in #NoSQL . #Consistency is no longer global, it's relative to the one accessing it. #Scalability @kekline : RT @CurtMonash: "...from industry figures, Basho/Riak is our third-biggest competitor." How often do you encounter them? "Never have" #nosql @dave_jacobs : Love being in a city where I can overhear a convo about Heroku scalability while doing deadlifts. #ahsanfrancisco @satheeshilu : Doctor at #hospital in india says #ge #healthcare software is slow to handle 100K X-rays an year.Scalability is critical 4 Indian #software @sufw : How can it be possible that Tagged has 80m users and I have *never* heard of it!?! @EventCloudPro : One of my vacation realizations? Whole #bigdata thing has turned into a lotta #bighype - many distinct issues & nothing to do w/ #bigdata No
6 0.8386811 1141 high scalability-2011-11-11-Stuff The Internet Says On Scalability For November 11, 2011
same-blog 7 0.83694285 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010
10 0.83015263 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
11 0.80574661 900 high scalability-2010-09-11-Google's Colossus Makes Search Real-time by Dumping MapReduce
12 0.80388319 835 high scalability-2010-06-03-Hot Scalability Links for June 3, 2010
13 0.77474624 666 high scalability-2009-07-30-Learn How to Think at Scale
14 0.77169096 250 high scalability-2008-02-17-Web Accelerators - snake oil or miracle remedy?
15 0.74805439 717 high scalability-2009-10-07-How to Avoid the Top 5 Scale-Out Pitfalls
16 0.74623996 1622 high scalability-2014-03-31-How WhatsApp Grew to Nearly 500 Million Users, 11,000 cores, and 70 Million Messages a Second
17 0.72607303 537 high scalability-2009-03-12-QCon London 2009: Database projects to watch closely
18 0.72129661 19 high scalability-2007-07-16-Paper: Replication Under Scalable Hashing
19 0.72065061 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011
20 0.71412784 122 high scalability-2007-10-14-Product: The Spread Toolkit