high_scalability high_scalability-2007 high_scalability-2007-122 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Complex applications coordinating work across a lot of machines often need a high-performance, fault-tolerant message layer. Though a blast to write, it's probably a better use of your time to use an off-the-shelf solution. And that's where Spread comes in. Flickr, for example, uses Spread to create real-time event feeds from their web server logs. What exactly is Spread? From the Spread website: Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees. Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The
sentIndex sentText sentNum sentScore
1 Complex applications coordinating work across a lot of machines often need a high-performance, fault-tolerant message layer. [sent-1, score-0.321]
2 Though a blast to write, it's probably a better use of your time to use an off-the-shelf solution. [sent-2, score-0.181]
3 From the Spread website: Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. [sent-6, score-1.002]
4 Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. [sent-7, score-0.713]
5 Spread services range from reliable messaging to fully ordered messages with delivery guarantees. [sent-8, score-0.463]
6 Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. [sent-9, score-0.454]
7 The toolkit is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of reliable and scalable distributed applications. [sent-10, score-1.021]
8 Some of the services and benefits provided by Spread: Reliable and scalable messaging and group communication. [sent-11, score-0.335]
9 A very powerful but simple API simplifies the construction of distributed architectures. [sent-12, score-0.367]
10 Highly scalable from one local area network to complex wide area networks. [sent-14, score-0.598]
11 Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges. [sent-16, score-0.505]
12 Provides a range of reliability, ordering and stability guarantees for messages. [sent-17, score-0.237]
13 Completely distributed algorithms with no central point of failure. [sent-19, score-0.177]
14 In Building Scalable Web Sites, Cal Henderson describes how Flickr uses Spread to create a log of real-time events, like photos uploaded and discussions started, as they happen. [sent-20, score-0.36]
15 As photos are uploaded, these web server events are messaged in real time to agents consuming the feed. [sent-22, score-0.579]
16 The advantage of this architecture is that it sheds load from the database. [sent-23, score-0.121]
17 Otherwise the database would have to be continuously polled for new events by each agent. [sent-24, score-0.25]
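To make the push-versus-poll point in sentences 15 to 17 concrete, here is a minimal sketch in Python. It is a toy, in-process stand-in for a group messaging layer and does not use the Spread API; the names `GroupBus`, `join`, `multicast`, and the group name `photo_events` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class GroupBus:
    """Toy in-process message bus (hypothetical; NOT the Spread API)."""

    def __init__(self) -> None:
        # Maps a group name to the handlers of the agents that joined it.
        self._members: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def join(self, group: str, handler: Callable[[dict], None]) -> None:
        """Register an agent's handler as a member of a group."""
        self._members[group].append(handler)

    def multicast(self, group: str, event: dict) -> int:
        """Push one event to every member of the group; return the member count."""
        handlers = list(self._members.get(group, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


# Two agents consume the feed; each simply collects the events it receives.
bus = GroupBus()
feed_agent: List[dict] = []
search_agent: List[dict] = []
bus.join("photo_events", feed_agent.append)
bus.join("photo_events", search_agent.append)

# The web server publishes the event once; no agent queries a database.
delivered = bus.multicast("photo_events", {"type": "photo_uploaded", "photo_id": 1})
```

In this sketch the web server publishes each event once and every agent receives it without issuing a query, which is the load-shedding effect the post describes. A real deployment would get delivery across machines, ordering, and failure handling from the messaging layer itself, none of which this toy attempts.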
wordName wordTfidf (topN-words)
[('spread', 0.521), ('toolkit', 0.275), ('construction', 0.176), ('area', 0.164), ('messaging', 0.158), ('message', 0.147), ('uploaded', 0.144), ('reliability', 0.142), ('photos', 0.137), ('reliable', 0.134), ('breakfrom', 0.129), ('events', 0.129), ('polled', 0.121), ('sheds', 0.121), ('henderson', 0.116), ('encapsulate', 0.112), ('subsets', 0.112), ('coordinating', 0.1), ('communication', 0.1), ('distributed', 0.098), ('blast', 0.098), ('wide', 0.094), ('scalable', 0.094), ('range', 0.093), ('simplifies', 0.093), ('multicast', 0.092), ('robustness', 0.092), ('agents', 0.088), ('shelf', 0.083), ('group', 0.083), ('local', 0.082), ('consuming', 0.081), ('crashes', 0.081), ('unified', 0.08), ('resilient', 0.079), ('discussions', 0.079), ('point', 0.079), ('faults', 0.078), ('ordered', 0.078), ('ordering', 0.076), ('bus', 0.075), ('tolerant', 0.074), ('high', 0.072), ('tuned', 0.072), ('presence', 0.068), ('flickr', 0.068), ('guarantees', 0.068), ('challenging', 0.067), ('partitions', 0.067), ('aspects', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 122 high scalability-2007-10-14-Product: The Spread Toolkit
Introduction: Complex applications coordinating work across a lot of machines often need a high-performance, fault-tolerant message layer. Though a blast to write, it's probably a better use of your time to use an off-the-shelf solution. And that's where Spread comes in. Flickr, for example, uses Spread to create real-time event feeds from their web server logs. What exactly is Spread? From the Spread website: Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees. Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The
2 0.113014 199 high scalability-2008-01-01-S3 for image storing
Introduction: Hi all, Has anyone got any experience with using Amazon S3 as an uploaded photo store? I'm writing a website that I need to keep as low budget as possible, and I'm investigating solutions for storing uploaded photos from users - not too many, probably in the low thousands. The site is commercial so I'm straying away from the Flickrs of the world. S3 seems to offer a solution but I'd like to hear from those who have used it before. Thanks Andy
3 0.10670281 1197 high scalability-2012-02-21-Pixable Architecture - Crawling, Analyzing, and Ranking 20 Million Photos a Day
Introduction: This is a guest post by Alberto Lopez Toledo, PHD, CTO of Pixable, and Julio Viera, VP of Engineering at Pixable. Pixable aggregates photos from across your different social networks and finds the best ones so you never miss an important moment. That means currently processing the metadata of more than 20 million new photos per day: crawling, analyzing, ranking, and sorting them along with the other 5+ billion that are already stored in our database. Making sense of all that data has challenges, but two in particular rise above the rest: How to access millions of photos per day from Facebook, Twitter, Instagram, and other services in the most efficient manner. How to process, organize, index, and store all the meta-data related to those photos. Sure, Pixable’s infrastructure is changing continuously, but there are some things that we have learned over the last year. As a result, we have been able to build a scalable infrastructure that takes advantage of today’s tools,
Introduction: This is a guest post by Patrick Eaton, Software Engineer and Distributed Systems Architect at Stackdriver. Stackdriver provides intelligent monitoring-as-a-service for cloud hosted applications. Behind this easy-to-use service is a large distributed system for collecting and storing metrics and events, monitoring and alerting on them, analyzing them, and serving up all the results in a web UI. Because we ourselves run in the cloud (mostly on AWS), we spend a lot of time thinking about how to deal with faults in the cloud. We have developed a framework for thinking about fault mitigation for large, cloud-hosted systems. We endearingly call this framework the “Four Hamiltons” because it is inspired by an article from James Hamilton, the Vice President and Distinguished Engineer at Amazon Web Services. The article that led to this framework is called “The Power Failure Seen Around the World”. Hamilton analyzes the causes of the power outage that affected Super Bowl XL
5 0.10118368 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest
Introduction: This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real-time. The process: Identify the minimum feedback the client (UI, API) needs to know an operation succeeded. It's enough, for example, to update a client's view when posting a message to a microblogging service. The client probably isn't aware of all the other steps that happen when a message is added and doesn't really care when they happen as long as the obvious cases happen in an appropriate
6 0.099244408 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
7 0.099142827 38 high scalability-2007-07-30-Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services
8 0.097493559 152 high scalability-2007-11-13-Flickr Architecture
9 0.096941561 677 high scalability-2009-08-09-NoSQL: If Only It Was That Easy
10 0.094143078 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
12 0.093379587 1577 high scalability-2014-01-13-NYTimes Architecture: No Head, No Master, No Single Point of Failure
14 0.088964976 1240 high scalability-2012-05-07-Startups are Creating a New System of the World for IT
16 0.086723037 114 high scalability-2007-10-07-Product: Wackamole
17 0.085971914 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
18 0.085932724 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
20 0.084464461 96 high scalability-2007-09-18-Amazon Architecture
topicId topicWeight
[(0, 0.154), (1, 0.039), (2, 0.005), (3, -0.001), (4, 0.0), (5, 0.011), (6, 0.032), (7, -0.03), (8, -0.047), (9, 0.049), (10, 0.006), (11, 0.065), (12, -0.007), (13, -0.057), (14, 0.013), (15, 0.037), (16, 0.009), (17, 0.003), (18, 0.027), (19, -0.008), (20, 0.017), (21, 0.031), (22, -0.053), (23, 0.04), (24, -0.015), (25, -0.031), (26, 0.02), (27, -0.0), (28, 0.007), (29, -0.024), (30, -0.012), (31, -0.033), (32, 0.036), (33, -0.047), (34, 0.006), (35, -0.059), (36, -0.013), (37, -0.048), (38, -0.03), (39, 0.004), (40, 0.016), (41, -0.003), (42, -0.025), (43, -0.052), (44, -0.044), (45, 0.004), (46, 0.008), (47, 0.007), (48, -0.052), (49, -0.053)]
simIndex simValue blogId blogTitle
same-blog 1 0.94755948 122 high scalability-2007-10-14-Product: The Spread Toolkit
Introduction: Complex applications coordinating work across a lot of machines often need a high-performance, fault-tolerant message layer. Though a blast to write, it's probably a better use of your time to use an off-the-shelf solution. And that's where Spread comes in. Flickr, for example, uses Spread to create real-time event feeds from their web server logs. What exactly is Spread? From the Spread website: Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees. Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The
2 0.71327317 1042 high scalability-2011-05-17-Facebook: An Example Canonical Architecture for Scaling Billions of Messages
Introduction: What should the architecture of your scalable, real-time, highly available service look like? There are as many options as there are developers, but if you are looking for a general template, this architecture as described by Prashant Malik, Facebook's lead for the Messages back end team, in Scaling the Messages Application Back End , is a very good example to consider. Although Messages is tasked with handling 135+ billion messages a month, from email, IM, SMS, text messages, and Facebook messages, you may think this is an example of BigArchitecture and doesn't apply to smaller sites. Not so. It's a good, well thought out example of a non-cloud architecture exhibiting many qualities any mom would be proud of: Layered - components are independent and isolated. Service/API Driven - each layer is connected via well defined interface that is the sole entry point for accessing that service. This prevents nasty complicated interdependencies. Clients hide behind an applicat
Introduction: When building a system on top of a set of wildly uncooperative and unruly computers you have knowledge problems: knowing when other nodes are dead; knowing when nodes become alive; getting information about other nodes so you can make local decisions, like knowing which node should handle a request based on a scheme for assigning nodes to a certain range of users; learning about new configuration data; agreeing on data values; and so on. How do you solve these problems? A common centralized approach is to use a database and all nodes query it for information. Obvious availability and performance issues for large distributed clusters. Another approach is to use Paxos, a protocol for solving consensus in a network to maintain strict consistency requirements for small groups of unreliable processes. Not practical when a larger number of nodes are involved. So what's the super cool decentralized way to bring order to large clusters? Gossip protocols, which maintain relaxed consi
4 0.69633716 1544 high scalability-2013-11-07-Paper: Tempest: Scalable Time-Critical Web Services Platform
Introduction: An interesting and different implementation approach: Tempest: Scalable Time-Critical Web Services Platform: Tempest is a new framework for developing time-critical web services. Tempest enables developers to build scalable, fault-tolerant services that can then be automatically replicated and deployed across clusters of computing nodes. The platform automatically adapts to load fluctuations, reacts when components fail, and ensures consistency between replicas by repairing when inconsistencies do occur. Tempest relies on a family of epidemic protocols and on Ricochet, a reliable time critical multicast protocol with probabilistic guarantees. Tempest is built around a novel storage abstraction called the TempestCollection in which application developers store the state of a service. Our platform handles the replication of this state across clones of the service, persistence, and failure handling. To minimize the need for specialized knowledge on the part of the application deve
Introduction: Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching
6 0.67010468 529 high scalability-2009-03-10-Paper: Consensus Protocols: Paxos
7 0.65632677 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
8 0.65361559 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop
9 0.64738274 1577 high scalability-2014-01-13-NYTimes Architecture: No Head, No Master, No Single Point of Failure
10 0.64350808 1435 high scalability-2013-04-04-Paper: A Web of Things Application Architecture - Integrating the Real-World into the Web
12 0.63939261 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
13 0.63858736 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
14 0.63835967 1197 high scalability-2012-02-21-Pixable Architecture - Crawling, Analyzing, and Ranking 20 Million Photos a Day
15 0.63410521 507 high scalability-2009-02-03-Paper: Optimistic Replication
16 0.63167405 53 high scalability-2007-08-01-Product: MogileFS
17 0.62851161 368 high scalability-2008-08-17-Wuala - P2P Online Storage Cloud
18 0.62351334 1087 high scalability-2011-07-26-Web 2.0 Killed the Middleware Star
19 0.62323487 705 high scalability-2009-09-16-Paper: A practical scalable distributed B-tree
topicId topicWeight
[(1, 0.192), (2, 0.242), (10, 0.013), (15, 0.264), (40, 0.028), (61, 0.053), (79, 0.093), (94, 0.013)]
simIndex simValue blogId blogTitle
1 0.90312445 85 high scalability-2007-09-08-Making the case for PHP at Yahoo! (Oct 2002)
Introduction: This presentation by Michael Radwin describes why Yahoo! standardized on PHP going forward. It describes how, after reviewing all the web technologies including their own internal ones, PHP was chosen. It shows that not only technical reasons, but also business and development processes were taken into account.
2 0.87275147 88 high scalability-2007-09-10-Blog: Scalable Web Architectures by Royans Tharakan
Introduction: Royans' scalability blog and his main blog are excellent sources of scalability information. Take a look. A Quick Hit of What's Inside: Sharding: Different from Partitioning and Federation?, Adventures of scaling eins.de, Session, state and scalability. Site: http://www.royans.net/
3 0.85813469 362 high scalability-2008-08-11-Distributed Computing & Google Infrastructure
Introduction: A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with: --MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware --GFS and the way it stores its data in 64 MB chunks --Bigtable, which is the simple implementation of a non-relational database at Google. Cluster Computing and MapReduce Lectures 1-5.
4 0.85804003 948 high scalability-2010-11-24-Great Introductory Video on Scalability from Harvard Computer Science
Introduction: Professor David Malan gives a very good lecture on scalability for dynamic websites. It's not highly technical, it's an extension course, but it's a great introduction to a wide variety of topics. I really like his teaching style. He continually asks questions, prompts for input, and gives accessible explanations. Some of the topics covered: vertical scaling; horizontal scaling; PHP acceleration; load balancing: DNS, L7, sticky sessions, load balancers; caching; MySQL: replication, load balancing, partitioning, high availability. Watch it on Academic Earth. This is one lecture in a series of 13 lectures on building dynamic websites. Students learn how to: build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP); set up domain names with DNS; structure pages with XHTML and CSS; program in JavaScript and PHP; configure Apache and MySQL; design and query databases with SQL; use Ajax with both XML and JSON;
5 0.84505427 1512 high scalability-2013-09-05-Paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale
Introduction: Ever wonder what powers Google's world spirit sensing Zeitgeist service? No, it's not a homunculus of Georg Wilhelm Friedrich Hegel sitting in each browser. It's actually a stream processing (think streaming MapReduce on steroids) system called MillWheel, described in this very well written paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. MillWheel isn't just used for Zeitgeist at Google, it's also used for streaming joins for a variety of Ads customers, generalized anomaly-detection service, and network switch and cluster health monitoring. Abstract: MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. This paper describes MillWheel’s programming model as well as it
same-blog 6 0.83350718 122 high scalability-2007-10-14-Product: The Spread Toolkit
7 0.83195925 1455 high scalability-2013-05-10-Stuff The Internet Says On Scalability For May 10, 2013
8 0.82610536 414 high scalability-2008-10-15-Hadoop - A Primer
9 0.82203621 682 high scalability-2009-08-16-ThePort Network Architecture
10 0.81465256 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012
11 0.80635458 923 high scalability-2010-10-21-Machine VM + Cloud API - Rewriting the Cloud from Scratch
12 0.80052507 812 high scalability-2010-04-19-Strategy: Order Two Mediums Instead of Two Smalls and the EC2 Buffet
13 0.78671128 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
14 0.78323847 1231 high scalability-2012-04-20-Stuff The Internet Says On Scalability For April 20, 2012
15 0.78083169 1237 high scalability-2012-05-02-12 Ways to Increase Throughput by 32X and Reduce Latency by 20X
16 0.7756722 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
17 0.77200311 96 high scalability-2007-09-18-Amazon Architecture
18 0.77023703 904 high scalability-2010-09-21-Playfish's Social Gaming Architecture - 50 Million Monthly Users and Growing
19 0.77005917 459 high scalability-2008-12-03-Java World Interview on Scalability and Other Java Scalability Secrets
20 0.76967388 373 high scalability-2008-08-29-Product: ScaleOut StateServer is Memcached on Steroids