high_scalability high_scalability-2008 high_scalability-2008-215 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Atif Ghaffar has a nice strategy to deal with virus checking uploads: Upload item into a safe area. If necessary, the uploader blocks waiting for a result. Queue a work order into a job system so all the work can be distributed throughout your cluster. A service in your cluster performs the virus scan and informs the uploader of the result. Move the vetted item into your system. This removes the CPU bottleneck from your web servers and distributes it through your cluster. Keep your web servers providing prompt service to users. Let your cluster do the heavy lifting. This minimizes response time and maximizes throughput. A similar system can be used for creating thumbnails, transcoding, copyright checks, updating indexes, event notification or any other kind of intensive work.
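The steps above can be sketched in a few dozen lines. This is a minimal, single-process illustration under stated assumptions, not Atif Ghaffar's implementation. Python's `queue.Queue` stands in for the cluster job system, and worker threads stand in for the scanning service. Two in-memory dicts stand in for the safe area and the main store. The "scanner" only looks for a hypothetical `SUSPICIOUS_MARKER` byte string, where a real deployment would call an actual anti-virus engine. `UploadPipeline`, `ScanJob` and their methods are illustrative names.

```python
from __future__ import annotations

import queue
import threading
import uuid
from dataclasses import dataclass, field

# Hypothetical stand-in for a real virus scanner: any payload containing this
# marker is treated as infected. A real service would call an AV engine instead.
SUSPICIOUS_MARKER = b"EICAR"


@dataclass
class ScanJob:
    job_id: str
    done: threading.Event = field(default_factory=threading.Event)
    clean: bool = False


class UploadPipeline:
    def __init__(self, workers: int = 2):
        self.quarantine: dict[str, bytes] = {}  # the "safe area" for unvetted uploads
        self.store: dict[str, bytes] = {}       # vetted items end up here
        self.jobs: queue.Queue = queue.Queue()  # stand-in for the cluster job system
        self.lock = threading.Lock()
        self.threads = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def upload(self, data: bytes, wait: bool = True,
               timeout: float | None = 5.0) -> ScanJob:
        job = ScanJob(job_id=str(uuid.uuid4()))
        with self.lock:
            self.quarantine[job.job_id] = data  # 1. upload into the safe area
        self.jobs.put(job)                      # 2. queue a work order
        if wait:
            job.done.wait(timeout)              # optionally block for the verdict
        return job

    def _worker(self) -> None:
        while True:
            job = self.jobs.get()
            if job is None:                     # sentinel: stop this worker
                self.jobs.task_done()
                return
            with self.lock:
                data = self.quarantine.pop(job.job_id)
            job.clean = SUSPICIOUS_MARKER not in data  # 3. scan
            if job.clean:
                with self.lock:
                    self.store[job.job_id] = data      # 4. move the vetted item in
            # Infected data is simply dropped in this sketch.
            job.done.set()                      # inform the uploader of the result
            self.jobs.task_done()

    def shutdown(self) -> None:
        for _ in self.threads:
            self.jobs.put(None)
        for t in self.threads:
            t.join()


if __name__ == "__main__":
    pipe = UploadPipeline(workers=2)
    ok = pipe.upload(b"holiday photo bytes")
    bad = pipe.upload(b"attachment containing EICAR marker")
    print(ok.clean, bad.clean)  # True False
    pipe.shutdown()
```

Calling `upload(..., wait=False)` returns the job handle immediately, which corresponds to the fully asynchronous case where the uploader learns the verdict later. `wait=True` models the case where the uploader blocks for the result.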
sentIndex sentText sentNum sentScore
1 Atif Ghaffar has a nice strategy to deal with virus checking uploads: Upload item into a safe area. [sent-1, score-1.03]
2 If necessary, the uploader blocks waiting for a result. [sent-2, score-0.628]
3 Queue a work order into a job system so all the work can be distributed throughout your cluster. [sent-3, score-0.437]
4 A service in your cluster performs the virus scan and informs the uploader of the result. [sent-4, score-1.393]
5 This removes the CPU bottleneck from your web servers and distributes it through your cluster. [sent-6, score-0.473]
6 Keep your web servers providing prompt service to users. [sent-7, score-0.453]
7 A similar system can be used for creating thumbnails, transcoding, copyright checks, updating indexes, event notification or any other kind of intensive work. [sent-10, score-0.832]
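The same queue-and-worker shape generalizes to the other jobs listed above. This minimal sketch assumes an in-process `concurrent.futures.ThreadPoolExecutor` as a stand-in for a cluster-wide job system; in practice, a job server such as Gearman would play that role. Each job type maps to a handler in a registry, and callers get a `Future` that they can wait on or poll. `JobSystem`, `HANDLERS` and the two placeholder handlers are hypothetical names. The handlers only mimic the shape of real thumbnailing and indexing work.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical registry: job type -> function that does the heavy lifting.
HANDLERS: Dict[str, Callable[[bytes], object]] = {}


def handler(kind: str):
    """Decorator that registers a function as the handler for one job type."""
    def register(fn):
        HANDLERS[kind] = fn
        return fn
    return register


@handler("thumbnail")
def make_thumbnail(data: bytes) -> bytes:
    # Placeholder: a real handler would decode and resize an image.
    return data[:16]


@handler("index_update")
def update_index(data: bytes) -> list:
    # Placeholder: the distinct lowercase terms a search index would store.
    return sorted(set(data.decode().lower().split()))


class JobSystem:
    """In-process stand-in for a cluster job queue; workers are threads here."""

    def __init__(self, workers: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, kind: str, payload: bytes) -> Future:
        # Reject unknown job types up front instead of failing inside a worker.
        if kind not in HANDLERS:
            raise ValueError(f"no handler registered for job type {kind!r}")
        return self.pool.submit(HANDLERS[kind], payload)

    def close(self) -> None:
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    jobs = JobSystem(workers=2)
    terms = jobs.submit("index_update", b"Queue the rest queue")
    thumb = jobs.submit("thumbnail", bytes(100))
    print(terms.result())       # ['queue', 'rest', 'the']
    print(len(thumb.result()))  # 16
    jobs.close()
```

Adding transcoding or copyright checks would mean registering one more handler. The web tier's code path stays the same: it submits a job and moves on.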
wordName wordTfidf (topN-words)
[('uploader', 0.427), ('virus', 0.347), ('item', 0.24), ('atif', 0.214), ('vetted', 0.214), ('informs', 0.201), ('copyright', 0.192), ('prompt', 0.192), ('maximizes', 0.184), ('thumbnails', 0.166), ('transcoding', 0.157), ('minimizes', 0.144), ('distributes', 0.144), ('uploads', 0.135), ('scan', 0.133), ('notification', 0.129), ('checking', 0.125), ('checks', 0.123), ('removes', 0.123), ('updating', 0.116), ('upload', 0.113), ('safe', 0.113), ('performs', 0.109), ('throughout', 0.107), ('blocks', 0.104), ('cluster', 0.103), ('intensive', 0.101), ('waiting', 0.097), ('bottleneck', 0.093), ('work', 0.088), ('indexes', 0.084), ('heavy', 0.084), ('necessary', 0.083), ('providing', 0.075), ('service', 0.073), ('deal', 0.072), ('nice', 0.068), ('similar', 0.067), ('strategy', 0.065), ('kind', 0.063), ('creating', 0.062), ('servers', 0.061), ('order', 0.06), ('event', 0.059), ('response', 0.059), ('web', 0.052), ('job', 0.051), ('let', 0.049), ('cpu', 0.049), ('system', 0.043)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 215 high scalability-2008-01-16-Strategy: Asynchronous Queued Virus Scanning
2 0.087906852 259 high scalability-2008-02-25-Any Suggestions for the Architecture Template?
Introduction: Here's my template for describing the architecture of a system. The idea is to have people fill out this template and that then becomes the basis for a profile. This is how the Friends for Sale post was created and I think that turned out well. People always want more detail, but realistically you can only expect so much. The template is definitely too long, but it's more just a series of questions to jog people's memories and then they can answer whatever they think is important. What I want to ask is whether you can think of anything to add/delete/change in the template. What do you want to know about the systems people are building? So if you have the time, please take a look and tell me what you think. Getting to Know You * What is the name of your system and where can we find out more about it? * What is your system for? * Why did you decide to build this system? * How is your project financed? * What is your revenue model? * How do you market you
3 0.087906852 260 high scalability-2008-02-25-Architecture Template Advice Needed
Introduction: Here's my template for describing the architecture of a system. The idea is to have people fill out this template and that then becomes the basis for a profile. This is how the Friends for Sale post was created and I think that turned out well. People always want more detail, but realistically you can only expect so much. The template is definitely too long, but it's more just a series of questions to jog people's memories and then they can answer whatever they think is important. What I want to ask is whether you can think of anything to add/delete/change in the template. What do you want to know about the systems people are building? So if you have the time, please take a look and tell me what you think. Getting to Know You * What is the name of your system and where can we find out more about it? * What is your system for? * Why did you decide to build this system? * How is your project financed? * What is your revenue model? * How do you market y
Introduction: This is a guest post by Ron Pressler, the founder and CEO of Parallel Universe, a Y Combinator company building advanced middleware for real-time applications. A little over a month ago, we open-sourced a new in-memory data grid called Galaxy. An in-memory data grid, or IMDG, is a clustered data storage and processing middleware that uses RAM as the authoritative and primary storage, and distributes data over a cluster for purposes of data and processing scalability and high-availability. A common feature of IMDGs is co-location of code and data, meaning that application code runs on all cluster nodes, each instance processing those data items residing in the local node's RAM. While quite a few commercial and open-source IMDGs are available (like Terracotta, Gigaspaces, Oracle Coherence, GemFire, Websphere eXtreme Scale, Infinispan and Hazelcast), Galaxy has adopted a completely different architecture from all other IMDGs, to service some usage scenarios ill-fitted to the othe
5 0.066988491 38 high scalability-2007-07-30-Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services
Introduction: Can you really create an infinitely scalable infrastructure for less than $100 using Amazon's storage, grid, and queuing services platform? It appears so, at least for the right application. Amazon beams a spotlight on the future battle of the roll-your-own versus the connect-the-dots approach to building next-generation websites using core external services. Their argument is strong. Using Amazon's platform you can quickly build an infrastructure that would otherwise take an eternity to make, a pile of money to create, and an unbounded mass of people to implement and maintain. Yet Amazon doesn't provide SLAs, so can you really trust them with your crown jewels? Facebook recently leapfrogged Amazon's vision with an even more comprehensive set of services. The battle for the future is on. Site: http://aws.amazon.com/ Information Sources Slides: Building Highly Scalable Web Applications Podcast: Technometria: Amazon Web Services Amazon Services Home. Platform
6 0.065065145 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
7 0.063750125 41 high scalability-2007-07-30-Product: Flickr
8 0.059972271 840 high scalability-2010-06-10-The Four Meta Secrets of Scaling at Facebook
9 0.059878577 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
10 0.059730873 118 high scalability-2007-10-09-High Load on production Webservers after Sourcecode sync
11 0.059596051 578 high scalability-2009-04-23-Which Key value pair database to be used
12 0.055595059 274 high scalability-2008-03-12-YouTube Architecture
13 0.054159008 796 high scalability-2010-03-16-Justin.tv's Live Video Broadcasting Architecture
14 0.054045796 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
15 0.053834949 1359 high scalability-2012-11-15-Gone Fishin': Justin.Tv's Live Video Broadcasting Architecture
16 0.053004693 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
17 0.052519087 836 high scalability-2010-06-04-Strategy: Cache Larger Chunks - Cache Hit Rate is a Bad Indicator
18 0.052186355 262 high scalability-2008-02-26-Architecture to Allow High Availability File Upload
19 0.051503282 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
20 0.051377907 1418 high scalability-2013-03-06-Low Level Scalability Solutions - The Aggregation Collection
topicId topicWeight
[(0, 0.072), (1, 0.035), (2, -0.007), (3, -0.035), (4, -0.004), (5, -0.011), (6, 0.033), (7, -0.013), (8, -0.017), (9, 0.009), (10, 0.006), (11, 0.01), (12, -0.008), (13, -0.033), (14, 0.022), (15, 0.002), (16, -0.021), (17, 0.004), (18, -0.006), (19, 0.003), (20, -0.01), (21, -0.017), (22, 0.016), (23, -0.016), (24, -0.006), (25, 0.007), (26, 0.048), (27, 0.01), (28, -0.001), (29, 0.005), (30, 0.013), (31, 0.021), (32, 0.029), (33, -0.013), (34, -0.002), (35, -0.021), (36, -0.011), (37, 0.005), (38, -0.003), (39, 0.001), (40, -0.002), (41, -0.028), (42, 0.008), (43, 0.003), (44, -0.015), (45, 0.011), (46, -0.026), (47, -0.02), (48, 0.038), (49, -0.004)]
simIndex simValue blogId blogTitle
same-blog 1 0.94927442 215 high scalability-2008-01-16-Strategy: Asynchronous Queued Virus Scanning
2 0.74920565 326 high scalability-2008-05-25-Product: Condor - Compute Intensive Workload Management
Introduction: From their website: Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard
3 0.72971493 491 high scalability-2009-01-13-Product: Gearman - Open Source Message Queuing System
Introduction: Update: New Gearman Server & Library in C, MySQL UDFs. Gearman is an open source message queuing system that makes it easy to do distributed job processing using multiple languages. With Gearman you: farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, to call functions between languages, spread CPU usage around your network. Gearman is used by companies like LiveJournal, Yahoo!, and Digg. Digg, for example, runs 300,000 jobs a day through Gearman without any issues. Most large sites use something similar. Why would anyone ever even need a message queuing system? Message queuing is a handy way to move work off your web servers (like image manipulation), to generate thousands of documents in the background, to run the multiple requests in parallel needed to build a web page, or to perform tasks that can comfortably be run in the background and not part
4 0.7166847 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
Introduction: ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates to key configuration information. ZooKeeper can be used for leader election, group membership, and configuration maintenance. In addition, ZooKeeper can be used for event notification, locking, and as a priority queue mechanism. It's a sort of central nervous system for distributed systems where the role of the brain is played by the coordination service, axons are the network, processes are the monitored and controlled body parts, and events are the hormones and neurotransmitters used for messaging. Every complex distributed application needs a coordination and orchestration system of some sort, so the ZooKeeper folks at Yahoo decided to build a good one and open source it for everyone to use. The target market for ZooKeeper is multi-host, multi-process C- and Java-based systems that operate in a data center. ZooKeeper works using distributed processes
5 0.71402317 406 high scalability-2008-10-08-Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest
Introduction: This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real time. The process: Identify the minimum feedback the client (UI, API) needs to know an operation succeeded. It's enough, for example, to update a client's view when posting a message to a microblogging service. The client probably isn't aware of all the other steps that happen when a message is added and doesn't really care when they happen as long as the obvious cases happen in an appropriate
6 0.69795823 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
7 0.69411761 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
8 0.68380249 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
9 0.68115538 160 high scalability-2007-11-19-Tailrank Architecture - Learn How to Track Memes Across the Entire Blogosphere
10 0.68018895 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
11 0.67306298 663 high scalability-2009-07-28-37signals Architecture
12 0.66579342 528 high scalability-2009-03-06-Product: Lightcloud - Key-Value Database
13 0.66015154 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second
14 0.65518111 259 high scalability-2008-02-25-Any Suggestions for the Architecture Template?
15 0.65518111 260 high scalability-2008-02-25-Architecture Template Advice Needed
17 0.65103334 1418 high scalability-2013-03-06-Low Level Scalability Solutions - The Aggregation Collection
18 0.65036684 275 high scalability-2008-03-14-Problem: Mobbing the Least Used Resource Error
19 0.64780605 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
20 0.64644259 1124 high scalability-2011-09-26-17 Techniques Used to Scale Turntable.fm and Labmeeting to Millions of Users
topicId topicWeight
[(1, 0.075), (2, 0.205), (30, 0.074), (79, 0.077), (84, 0.43)]
simIndex simValue blogId blogTitle
same-blog 1 0.83967417 215 high scalability-2008-01-16-Strategy: Asynchronous Queued Virus Scanning
2 0.74606764 592 high scalability-2009-05-06-DyradLINQ
Introduction: The goal of DryadLINQ is to make distributed computing on a large compute cluster simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
3 0.71475232 1384 high scalability-2013-01-09-The Story of How Turning Disk Into a Service Lead to a Deluge of Density
Introduction: We usually think of the wonderful advantages of service-oriented architectures as a software thing, but they also apply to hardware. In Security Now 385, that Doyen of Disk, Steve Gibson, tells the fascinating story (at about 41:30) of how moving to a service-oriented architecture in hard drives, modeling a drive as a linear stream of sectors, helped create the amazing high-density disk drives we enjoy today. When drives switched to the IDE (integrated drive electronics) interface, the controller function moved into the drive instead of the computer. No longer were low-level drive signals moved across cables and into the motherboard. Now we just ask the drive for the desired sector and the drive takes care of it. This allowed manufacturers to do anything they wanted to behind the IDE interface. The drive stopped being dumb; it became smart, providing a sort of sector service. Density skyrocketed because there was no dependency on the computer. All the internals could co
4 0.64736921 1625 high scalability-2014-04-03-Leslie Lamport to Programmers: You're Doing it Wrong
Introduction: Famous computer scientist Leslie Lamport is definitely not a worse-is-better kind of guy. In Computation and State Machines he wants to make the case that to get better programs we need to teach programmers to think better. And programmers will think better when they learn to think in terms of concepts firmly grounded in the language of mathematics. I was disappointed that there was so much English in the paper. Surely it would have been more convincing if it was written as a mathematical proof. Or would it? This whole topic has been argued extensively throughout thousands of years of philosophy. Mathematics has always been a strange attractor for those trying to escape a flawed human rationality. In the end, as alluring as the utopia of mathematics is, it lacks a coherent theory of meaning, and programming is not about rearranging ungrounded symbols; it's about manipulating and shaping meaning. For programmers I think Ludwig Wittgenstein has the right sense of things. Mean
Introduction: Todd had originally posted an entry on collectl here at Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems like buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp, all using one tool and in one consistent format. Since then a lot has happened. It's now part of both Fedora and Debian distros, not to mention several others. There has also been a pretty good summary written up by Joe Brockmeier. It's also pretty well documented (I like to think) on SourceForge. There have also been a few blog postings by Martin Bach on his blog. Anyhow, a while back I released a new version of collectl-utils and gave a complete face-lift to one of the utilities, colmux, which is a collectl multiplexor. This tool has the ability to run collectl on multiple systems, which in turn send all their output back to colmux. Colmux then sorts the output on a user-specified column
6 0.59590548 1165 high scalability-2011-12-28-Strategy: Guaranteed Availability Requires Reserving Instances in Specific Zones
8 0.55604434 730 high scalability-2009-10-28-GemFire: Solving the hardest problems in data management
9 0.52868849 343 high scalability-2008-06-09-Apple's iPhone to Use a Centralized Push Based Notification Architecture
10 0.52691633 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector
11 0.50426614 417 high scalability-2008-10-15-Outside.in Scales Up with Engine Yard and moving from PHP to Ruby on Rails
12 0.49390712 719 high scalability-2009-10-09-Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so
13 0.48197579 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
14 0.47965938 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
15 0.47313592 621 high scalability-2009-06-06-Graph server
16 0.47110897 739 high scalability-2009-11-09-10 NoSQL Systems Reviewed
17 0.47067955 252 high scalability-2008-02-18-limit on the number of databases open
18 0.4701584 673 high scalability-2009-08-07-Strategy: Break Up the Memcache Dog Pile
19 0.46715298 1077 high scalability-2011-07-11-ATMCash Exploits Virtualization for Security - Immutability and Reversion
20 0.46687853 5 high scalability-2007-07-10-mixi.jp Architecture