high_scalability high_scalability-2009 high_scalability-2009-564 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I'm looking for a design pattern, advice, or directions. I need to count views/downloads of a set of resources, each identified by its respective URL. This is not a big problem. I also need to keep a list of resources viewed/downloaded in the last X days. This list needs to be updated every now and then to reflect the real last X days of usage. So resources that were requested more than X days ago get evicted from it. So it's sort of a black box: you feed messages (download requests) in, and it gives you that list of URLs with counters on the other end. How would you go about designing it?
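One possible answer, sketched below, is not from the original thread: keep a per-day counter bucket for each URL and drop buckets older than X days whenever the list is read. The class name, the in-memory dicts, and the daily granularity are all assumptions made for illustration; a real deployment would more likely keep the buckets in Redis or a database and evict them via TTLs or a periodic job.

    from collections import defaultdict
    from datetime import date, timedelta

    class SlidingWindowCounter:
        """Illustrative sketch: per-URL, per-day counters, evicting
        any day that falls outside the last X days."""

        def __init__(self, window_days=30):
            self.window_days = window_days
            # buckets[day][url] -> number of requests for url on that day
            self.buckets = defaultdict(lambda: defaultdict(int))

        def record(self, url, day=None):
            """Feed one view/download message into the black box."""
            self.buckets[day or date.today()][url] += 1

        def counters(self, today=None):
            """Return (url, count) pairs for the last X days, most viewed first."""
            today = today or date.today()
            cutoff = today - timedelta(days=self.window_days)
            for old_day in [d for d in self.buckets if d < cutoff]:
                del self.buckets[old_day]          # evict data older than X days
            totals = defaultdict(int)
            for day_counts in self.buckets.values():
                for url, n in day_counts.items():
                    totals[url] += n
            return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    # Example usage
    c = SlidingWindowCounter(window_days=7)
    c.record("/files/report.pdf")
    c.record("/files/report.pdf")
    c.record("/images/logo.png")
    print(c.counters())   # [('/files/report.pdf', 2), ('/images/logo.png', 1)]

The same shape maps naturally onto Redis: one hash per day keyed by URL, incremented with HINCRBY and expired after X days, with the read side merging the last X hashes.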
sentIndex sentText sentNum sentScore
1 I'm looking for a design pattern, advice, or directions. [sent-1, score-0.543]
2 I need to count views/downloads of a set of resources, each identified by its respective URL. [sent-2, score-0.879]
3 I also need to keep a list of resources viewed/downloaded in the last X days. [sent-4, score-0.949]
4 This list needs to be updated every now and then to reflect the real last X days of usage. [sent-5, score-1.274]
5 So resources that were requested more than X days ago get evicted from it. [sent-6, score-1.255]
6 So it's sort of a black box: you feed messages (download requests) in, and it gives you that list of URLs with counters on the other end. [sent-7, score-1.152]
wordName wordTfidf (topN-words)
[('evicted', 0.312), ('respective', 0.298), ('list', 0.291), ('resources', 0.269), ('requested', 0.226), ('days', 0.22), ('last', 0.21), ('identified', 0.2), ('reflect', 0.194), ('urls', 0.192), ('prior', 0.188), ('counters', 0.184), ('black', 0.184), ('seeking', 0.179), ('count', 0.162), ('feed', 0.16), ('advice', 0.155), ('updated', 0.154), ('pattern', 0.143), ('designing', 0.128), ('download', 0.124), ('messages', 0.122), ('sort', 0.115), ('box', 0.114), ('request', 0.1), ('gives', 0.096), ('needs', 0.083), ('need', 0.077), ('let', 0.077), ('real', 0.068), ('design', 0.066), ('set', 0.065), ('keep', 0.063), ('go', 0.059), ('big', 0.058), ('every', 0.054), ('would', 0.043), ('get', 0.04), ('also', 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed
Introduction: I'm looking for a design pattern, advice, or directions. I need to count views/downloads of a set of resources, each identified by its respective URL. This is not a big problem. I also need to keep a list of resources viewed/downloaded in the last X days. This list needs to be updated every now and then to reflect the real last X days of usage. So resources that were requested more than X days ago get evicted from it. So it's sort of a black box: you feed messages (download requests) in, and it gives you that list of URLs with counters on the other end. How would you go about designing it?
2 0.13634582 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
Introduction: Counting at scale in a distributed environment is surprisingly hard. And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory, How to update video views count effectively?, Numbers Everyone Should Know (sharded counters). Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has an "over and under counting problem in partitioned environments." Which is often fine. But if you want more accuracy there's a PN-counter, which is a CRDT (convergent replicated data type) that "store[s] all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."
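As a rough illustration of the PN-counter idea described above (this is a toy sketch, not the Cassandra or any production implementation), each node keeps its own increment and decrement tallies, and merging takes the element-wise maximum so replicas converge on the same value:

    class PNCounter:
        """Toy PN-counter CRDT: per-node increment (P) and decrement (N)
        tallies; the counter's value is sum(P) - sum(N)."""

        def __init__(self, node_id):
            self.node_id = node_id
            self.p = {}   # node_id -> total increments recorded at that node
            self.n = {}   # node_id -> total decrements recorded at that node

        def increment(self, amount=1):
            self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

        def decrement(self, amount=1):
            self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

        def value(self):
            return sum(self.p.values()) - sum(self.n.values())

        def merge(self, other):
            # Element-wise max makes merging idempotent and commutative,
            # which is what lets replicas converge after a partition heals.
            for node, count in other.p.items():
                self.p[node] = max(self.p.get(node, 0), count)
            for node, count in other.n.items():
                self.n[node] = max(self.n.get(node, 0), count)

    # Two replicas count independently, then converge after merging.
    a, b = PNCounter("a"), PNCounter("b")
    a.increment(3)
    b.increment(2)
    b.decrement(1)
    a.merge(b)
    b.merge(a)
    assert a.value() == b.value() == 4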
3 0.10559785 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
Introduction: Facebook did it again. They've built another system capable of doing something useful with ginormous streams of realtime data. Last time we saw Facebook release their New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month. This time it's a realtime analytics system handling over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. Alex Himel, Engineering Manager at Facebook, explains what they've built (video) and the scale required: Social plugins have become an important and growing source of traffic for millions of websites over the past year. We released a new version of Insights for Websites last week to give site owners better analytics on how people interact with their content and to help them optimize their websites in real time. To accomplish this, we had to engineer a system that could process over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. Alex does a
4 0.10429595 761 high scalability-2010-01-17-Applications Become Black Boxes Using Markets to Scale and Control Costs
Introduction: This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. We tend to think of compute resources as residing primarily in datacenters. Given the fast pace of innovation we will likely see compute resources become pervasive. Some will reside in datacenters, but compute resources can be anywhere, not just in the datacenter; we'll actually see the bulk of compute resources live outside of datacenters in the future. Given the diversity of compute resources it's reasonable to assume they won't be homogeneous or conform to a standard API. They will specialize by service. Programmers will have to use those specialized service interfaces to build applications that are adaptive enough to take advantage of whatever leverage they can find, whenever and wherever they can find it. Once found, the application will have to reorganize on the fly to use whatever new resources it has found and let go of whatever resources it doe
5 0.10050022 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
Introduction: Guest post by Thierry Schellenbach, Founder/CTO of Fashiolista.com, follow @tschellenbach on Twitter and Github. Fashiolista started out as a hobby project which we built on the side. We had absolutely no idea it would grow into one of the largest online fashion communities. The entire first version took about two weeks to develop and our feed implementation was dead simple. We’ve come a long way since then and I’d like to share our experience with scaling feed systems. Feeds are a core component of many large startups such as Pinterest, Instagram, Wanelo and Fashiolista. At Fashiolista the feed system powers the flat feed, aggregated feed and the notification system. This article will explain the troubles we ran into when scaling our feeds and the design decisions involved with building your own solution. Understanding the basics of how these feed systems work is essential as more and more applications rely on them. Furthermore we’ve open sourced Feedly, the Python m
6 0.089734092 489 high scalability-2009-01-11-17 Distributed Systems and Web Scalability Resources
7 0.088375978 256 high scalability-2008-02-21-Tracking usage of public resources - throttling accesses per hour
8 0.086518012 1074 high scalability-2011-07-06-11 Common Web Use Cases Solved in Redis
9 0.084714979 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
11 0.084505327 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
12 0.078905083 400 high scalability-2008-10-01-The Pattern Bible for Distributed Computing
13 0.077877998 340 high scalability-2008-06-06-Economies of Non-Scale
14 0.07660111 141 high scalability-2007-11-05-Quick question about efficiently implementing Facebook 'news feed' like functionality
15 0.07502488 1609 high scalability-2014-03-11-Building a Social Music Service Using AWS, Scala, Akka, Play, MongoDB, and Elasticsearch
16 0.074880622 1201 high scalability-2012-02-29-Strategy: Put Mobile Video Into Cold Storage After 30 Days
17 0.072955839 1325 high scalability-2012-09-19-The 4 Building Blocks of Architecting Systems for Scale
18 0.072634384 1049 high scalability-2011-05-31-Awesome List of Advanced Distributed Systems Papers
19 0.07088723 1415 high scalability-2013-03-04-7 Life Saving Scalability Defenses Against Load Monster Attacks
20 0.069006257 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
topicId topicWeight
[(0, 0.091), (1, 0.046), (2, -0.018), (3, 0.007), (4, -0.004), (5, -0.02), (6, -0.003), (7, 0.044), (8, -0.017), (9, -0.038), (10, 0.0), (11, 0.044), (12, 0.003), (13, 0.005), (14, 0.027), (15, -0.004), (16, -0.021), (17, -0.014), (18, 0.015), (19, 0.001), (20, -0.021), (21, -0.025), (22, -0.03), (23, 0.042), (24, 0.01), (25, -0.027), (26, 0.01), (27, 0.065), (28, -0.012), (29, -0.009), (30, -0.008), (31, -0.04), (32, 0.007), (33, 0.07), (34, -0.036), (35, -0.029), (36, -0.002), (37, -0.052), (38, -0.023), (39, -0.007), (40, 0.002), (41, -0.02), (42, 0.057), (43, 0.037), (44, 0.036), (45, 0.018), (46, -0.049), (47, 0.055), (48, 0.007), (49, 0.009)]
simIndex simValue blogId blogTitle
same-blog 1 0.96882081 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed
Introduction: I'm looking for a design pattern, advice, or directions. I need to count views/downloads of a set of resources, each identified by its respective URL. This is not a big problem. I also need to keep a list of resources viewed/downloaded in the last X days. This list needs to be updated every now and then to reflect the real last X days of usage. So resources that were requested more than X days ago get evicted from it. So it's sort of a black box: you feed messages (download requests) in, and it gives you that list of URLs with counters on the other end. How would you go about designing it?
2 0.61746532 1415 high scalability-2013-03-04-7 Life Saving Scalability Defenses Against Load Monster Attacks
Introduction: We talked about 42 Monster Problems That Attack As Loads Increase. Here are a few ways you can defend yourself, secrets revealed by scaling masters across the ages. Note that these are low-level programming moves, not large architecture-type strategies. Use Resources Proportional To a Fixed Limit. This is probably the most important rule for achieving scalability within an application. What it means: find the resource that has a fixed limit that you know you can support. For example, a guarantee to handle a certain number of objects in memory. So if we always use resources proportional to the number of objects it is likely we can prevent resource exhaustion. Devise ways of tying what you need to do to the individual resources. Some examples (a sketch follows this paragraph): Keep a list of purchase orders with line items over $20 (or whatever). Do not keep a list of the line items because the number of items can be much larger than the number of purchase orders. You have kept the resource usage
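A minimal sketch of the purchase-order example above (the order IDs below are made up; the $20 threshold is the article's own illustration restated in code): track the bounded entity, the orders, rather than the unbounded one, the line items.

    # Remember which purchase orders contain a line item over $20, not the
    # line items themselves, so memory grows with the number of orders
    # (bounded) rather than the number of line items (potentially unbounded).
    flagged_orders = set()

    def record_line_item(order_id, amount):
        if amount > 20:
            flagged_orders.add(order_id)

    record_line_item("po-1", 12.50)
    record_line_item("po-1", 35.00)   # po-1 is flagged once, however many items it has
    record_line_item("po-2", 5.00)
    print(sorted(flagged_orders))     # ['po-1']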
Introduction: This is a guest post by Matt Abrams (@abramsm), from Clearspring, discussing how they are able to accurately estimate the cardinality of sets with billions of distinct elements using surprisingly small data structures. Their servers receive well over 100 billion events per month. At Clearspring we like to count things. Counting the number of distinct elements (the cardinality) of a set is a challenge when the cardinality of the set is large. To better understand the challenge of determining the cardinality of large sets let's imagine that you have a 16 character ID and you'd like to count the number of distinct IDs that you've seen in your logs. Here is an example: 4f67bfc603106cb2 These 16 characters represent 128 bits. 65K IDs would require 1 megabyte of space. We receive over 3 billion events per day, and each event has an ID. Those IDs require 384,000,000,000 bits or 45 gigabytes of storage. And that is just the space that the ID field requires! To get the
4 0.61419183 489 high scalability-2009-01-11-17 Distributed Systems and Web Scalability Resources
Introduction: Here's a short list of some great resources that I've found very inspirational and thought-provoking. I've broken these resources up into two lists: Blogs and Presentations.
5 0.58640277 1526 high scalability-2013-10-02-RFC 1925 - The Twelve (Timeless) Networking Truths
Introduction: The Twelve Networking Truths is one of a long series of timeless truths documented in sacred April Fools' RFCs. Though issued in 1996, it's no less true today. It's hard to pick a favorite because they are all good. But if I had to pick, it would be: Some things in life can never be fully appreciated nor understood unless experienced firsthand. As we grow comfortable behind garden walls, clutching gadgets like lifelines and ideologies like shields, empathy is the true social network. Network Working Group R. Callon, Editor Request for Comments: 1925 IOOF Category: Informational 1 April 1996 The Twelve Networking Truths Status of this Memo This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Abstract Thi
6 0.58456194 311 high scalability-2008-04-29-Strategy: Sample to Reduce Data Set
7 0.56857562 951 high scalability-2010-12-01-8 Commonly Used Scalable System Design Patterns
8 0.56791431 293 high scalability-2008-03-31-Read HighScalability on Your Mobile Phone Using WidSets Widgets
9 0.56012559 1418 high scalability-2013-03-06-Low Level Scalability Solutions - The Aggregation Collection
10 0.55286813 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
11 0.55071878 1408 high scalability-2013-02-19-Puppet monitoring: how to monitor the success or failure of Puppet runs
12 0.54697895 478 high scalability-2008-12-29-Paper: Spamalytics: An Empirical Analysisof Spam Marketing Conversion
13 0.54316962 719 high scalability-2009-10-09-Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so
14 0.54222333 390 high scalability-2008-09-23-Scaling your cookie recipes
15 0.53963864 363 high scalability-2008-08-12-Strategy: Limit The New, Not The Old
16 0.53873587 1225 high scalability-2012-04-09-Why My Slime Mold is Better than Your Hadoop Cluster
17 0.5377897 561 high scalability-2009-04-08-N+1+caching is ok?
18 0.53667277 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
19 0.53459555 322 high scalability-2008-05-19-Conference: Infoscale 2008 in Italy (June 4-6)
20 0.53105694 1611 high scalability-2014-03-12-Paper: Scalable Eventually Consistent Counters over Unreliable Networks
topicId topicWeight
[(1, 0.083), (2, 0.293), (40, 0.276), (61, 0.201)]
simIndex simValue blogId blogTitle
same-blog 1 0.92727399 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed
Introduction: I'm looking for a design pattern, advice, or directions. I need to count views/downloads of a set of resources, each identified by its respective URL. This is not a big problem. I also need to keep a list of resources viewed/downloaded in the last X days. This list needs to be updated every now and then to reflect the real last X days of usage. So resources that were requested more than X days ago get evicted from it. So it's sort of a black box: you feed messages (download requests) in, and it gives you that list of URLs with counters on the other end. How would you go about designing it?
Introduction: While working with Memcache the other night, it dawned on me that its usage as a distributed caching mechanism was really just one of many ways to use it. That there are in fact many alternative usages that one could find for Memcache if they could just realize what Memcache really is at its core – a simple distributed hash-table – is an important point worthy of further discussion. To be clear, when I say “simple”, by no means am I implying that Memcache’s implementation is simple, just that the ideas behind it are such. Think about that for a minute. What else could we use a simple distributed hash-table for, besides caching? How about using it as an alternative to the traditional shard lookup method we used in our Master Index Lookup scalability strategy, discussed previously here.
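As a hedged sketch of the shard-lookup idea mentioned above (the ShardLookup class and key scheme are assumptions, and a plain dict stands in for a Memcache client exposing get/set):

    class ShardLookup:
        """Sketch: use a distributed hash-table (here a plain dict standing
        in for a Memcache client) as the master index that maps an entity
        to the shard holding its data."""

        def __init__(self, kv_store, shard_count=16):
            self.kv = kv_store          # anything with dict-like get/set
            self.shard_count = shard_count

        def assign(self, user_id, shard_id):
            self.kv["shard:" + user_id] = shard_id

        def shard_for(self, user_id):
            shard = self.kv.get("shard:" + user_id)
            if shard is None:
                # Fall back to a default placement; a stable hash function
                # would be used in practice rather than Python's hash().
                shard = hash(user_id) % self.shard_count
            return shard

    store = {}                          # stand-in for a real memcache client
    lookup = ShardLookup(store, shard_count=4)
    lookup.assign("user:42", 3)         # place or move a user explicitly
    print(lookup.shard_for("user:42"))  # 3
    print(lookup.shard_for("user:99"))  # falls back to the default placement

The appeal of an explicit index over pure hashing is that individual entities can be migrated between shards without rehashing everything.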
3 0.88976264 860 high scalability-2010-07-17-Hot Scalability Links for July 17, 2010
Introduction: And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning! Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again? Tweets of Gold: jamesurquhart: Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features. tjake: People who choose their datastore based on hearsay and not their own evaluation are doomed. b6n: No global lock ever goes unpunished. MichaelSurt
4 0.88824034 1419 high scalability-2013-03-07-It's a VM Wasteland - A Near Optimal Packing of VMs to Machines Reduces TCO by 22%
Introduction: In Algorithm Design for Performance Aware VM Consolidation we learn some shocking facts (gambling in Casablanca?): Average server utilization in many data centers is low, estimated between 5% and 15%. This is wasteful because an idle server often consumes more than 50% of peak power. Surely that's just for old style datacenters? Nope. In Google data centers, workloads that are consolidated use only 50% of the processor cores. Every other processor core is left unused simply to ensure that performance does not degrade. It's a VM wasteland. The goal is to reduce waste by packing VMs onto machines without hurting performance or wasting resources. The idea is to select VMs that interfere the least with each other and place them together on the same server. It's an NP-Complete problem, but this paper describes a practical method that performs provably close to the optimal. Interestingly they can optimize for performance or power efficiency, so you can use different algorithm
5 0.88676131 402 high scalability-2008-10-05-Paper: Scalability Design Patterns
Introduction: I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to: Introduce Scalability Optimize Algorithm Add Hardware Add Parallelism Add Intra-Process Parallelism Add Inter-Process Parallelism Add Hybrid Parallelism Optimize Decentralization Control Shared Resources Automate Scalability
7 0.87881041 330 high scalability-2008-05-27-Should Twitter be an All-You-Can-Eat Buffet or a Vending Machine?
8 0.84167731 1471 high scalability-2013-06-06-Paper: Memory Barriers: a Hardware View for Software Hackers
9 0.82763195 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010
10 0.8153342 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013
11 0.81523234 892 high scalability-2010-09-02-Distributed Hashing Algorithms by Example: Consistent Hashing
12 0.81293082 146 high scalability-2007-11-08-scaling drupal - an open-source infrastructure for high-traffic drupal sites
13 0.81018198 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010
14 0.80689597 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?
15 0.80269539 778 high scalability-2010-02-15-The Amazing Collective Compute Power of the Ambient Cloud
16 0.80163568 300 high scalability-2008-04-07-Scalr - Open Source Auto-scaling Hosting on Amazon EC2
17 0.79964048 379 high scalability-2008-09-04-Database question for upcoming project
18 0.79881245 97 high scalability-2007-09-18-Session management in highly scalable web sites
19 0.79283386 879 high scalability-2010-08-12-Think of Latency as a Pseudo-permanent Network Partition
20 0.78705978 768 high scalability-2010-02-01-What Will Kill the Cloud?