high_scalability high_scalability-2008 high_scalability-2008-221 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update: A fun exploration of applied searching in How to search for the word "pen1s" in 185 emails every second. When indexOf doesn't cut it you just trie harder. Has a drunken friend ever inspired you to create a first-of-its-kind internet service that is loved by millions, deemed subversive by thousands, all while handling over 1.2 billion emails a year on one rickety old server? That's how Paul Tyma came to build Mailinator. Mailinator is a free no-setup web service for thwarting evil spammers by creating throw-away registration email addresses. If you don't give web sites your real email address they can't spam you. They spam Mailinator instead :-) I love design with a point-of-view and Mailinator has a big giant hairy one: performance first, second, and last. Why? Because Mailinator is free and that allows Paul to showcase his different perspective on design. While competitors buy big iron to handle load, Paul uses a big idea instead: pick the right problem and create a
sentIndex sentText sentNum sentScore
1 If you don't give web sites your real email address they can't spam you. [sent-7, score-0.603]
2 Mailinator runs for months unattended and very few emails are lost, even under constant spam attacks and high peak loads. [sent-27, score-0.649]
3 The original flow of email handling was: - Sendmail received email in a single on-disk mailbox. [sent-39, score-0.785]
4 The system broke down because of disk contention between Mailinator and the email subsystem. [sent-46, score-0.52]
5 - The web application, the email server, and all email storage run in one JVM. [sent-48, score-0.75]
6 - On arrival each email passes through a filter system and is stored in RAM if all filters are passed. [sent-68, score-0.525]
7 Emails are compressed in RAM: - Since 99% of emails are never looked at, compressed email saves RAM. [sent-73, score-0.893]
8 - Mailinator can store about 80,000 emails in RAM, using under 300MB of RAM compared to the 20,000 emails which were stored in 1GB RAM in the original design. [sent-75, score-0.781]
9 When a web site asks you for an email address you can just enter a Mailinator address. [sent-84, score-1.04]
10 Typing in the email address effectively creates the Mailinator account. [sent-86, score-1.04]
11 To be accepted an email must pass the following filter chain: - Bounce: all bounced emails are dropped. [sent-92, score-0.839]
12 - IP: too much email from a single IP is dropped - Subject: too much email on the same subject is dropped - Potty: subjects containing words that indicate hate or crimes or just downright nastiness are dropped. [sent-93, score-1.004]
13 - When a sender reaches a threshold email count the sender is blocked. [sent-97, score-0.505]
14 - This filtering is a little more complex than IP blocking because you have to parse enough of the email to get the subject line and matching subject strings is a little more resource intensive. [sent-103, score-0.669]
15 - When something like 20 emails with the same subject arrive within 2 minutes, all emails with that subject are then banned for 1 hour. [sent-104, score-0.996]
16 - Interestingly, subjects are not banned forever because that would mean Mailinator would have to track subjects forever and the system design is inherently transient. [sent-105, score-0.491]
17 At the cost of a few "bad" emails getting through the system is much simpler because no persistent list must be managed and that list surely would become a bottleneck. [sent-107, score-0.565]
18 - From my reading Mailinator filters only on IP and subject, so it doesn't have to read the email body to accept or reject the email. [sent-110, score-0.579]
19 This slows down spammers who are trying to send out spam as fast as possible and may make them rethink sending email again to that address. [sent-114, score-0.628]
20 Keeping emails for a short period of time, allowing some spam to get through, and accepting less than 100% uptime create a strong vision for the system that helps drive the design in all areas. [sent-129, score-0.651]
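The subject filter described in sentences 15 to 17 can be sketched as follows. This is a minimal illustration, not Mailinator's actual code: the class name `SubjectFilter`, its parameters, and the choice to reject the email that crosses the threshold are all assumptions. The defaults (20 emails, 2 minutes, 1 hour) come from the text. Everything lives in memory and bans expire by themselves, so no persistent list has to be managed.

```python
import time
from collections import defaultdict, deque


class SubjectFilter:
    """Hypothetical transient subject ban; nothing is written to disk."""

    def __init__(self, limit=20, window=120.0, ban=3600.0, clock=time.monotonic):
        self.limit = limit      # emails with one subject that trigger a ban
        self.window = window    # seconds over which arrivals are counted
        self.ban = ban          # seconds a banned subject stays banned
        self.clock = clock      # injectable so the filter can be tested
        self.seen = defaultdict(deque)  # subject -> recent arrival times
        self.banned_until = {}          # subject -> time the ban lifts

    def accept(self, subject):
        """Return True if an email with this subject should be kept."""
        now = self.clock()
        until = self.banned_until.get(subject)
        if until is not None:
            if now < until:
                return False
            del self.banned_until[subject]  # ban expired, forget it

        times = self.seen[subject]
        times.append(now)
        # Discard arrivals that have fallen out of the counting window.
        while times and now - times[0] > self.window:
            times.popleft()

        if len(times) >= self.limit:
            # Assumption: the email that crosses the threshold is dropped too.
            self.banned_until[subject] = now + self.ban
            del self.seen[subject]
            return False
        return True
```

Calling `SubjectFilter().accept(subject)` once per arriving email is enough to apply the rule. A real system would also need to evict subjects that go quiet, since this sketch keeps a small deque for every subject it has ever seen.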
wordName wordTfidf (topN-words)
[('mailinator', 0.62), ('email', 0.375), ('emails', 0.373), ('spam', 0.183), ('ip', 0.147), ('subject', 0.098), ('attacks', 0.093), ('disk', 0.081), ('smtp', 0.08), ('ram', 0.078), ('spammers', 0.07), ('design', 0.069), ('subjects', 0.068), ('paul', 0.067), ('reject', 0.067), ('resource', 0.066), ('sender', 0.065), ('system', 0.064), ('inbox', 0.059), ('sendmail', 0.057), ('banned', 0.054), ('period', 0.052), ('uptime', 0.051), ('filter', 0.051), ('zombie', 0.048), ('looked', 0.047), ('usage', 0.045), ('address', 0.045), ('goals', 0.045), ('would', 0.044), ('much', 0.044), ('compression', 0.044), ('addresses', 0.042), ('create', 0.042), ('survival', 0.041), ('must', 0.04), ('blocked', 0.04), ('forever', 0.04), ('resources', 0.039), ('robust', 0.038), ('original', 0.035), ('body', 0.035), ('filters', 0.035), ('memory', 0.033), ('compressed', 0.033), ('saves', 0.032), ('constraints', 0.032), ('accept', 0.032), ('filtering', 0.032), ('limited', 0.031)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999946 221 high scalability-2008-01-24-Mailinator Architecture
2 0.31646752 253 high scalability-2008-02-19-Building a email communication system
Introduction: hi, the website I work for is looking to build an email system that can handle a fair few emails (up to a hundred thousand a day). These comprise emails like registration emails, newsletters, lots of user triggered emails and overnight emails. At present we queue them in SQL and feed them into an smtp server on one of our web servers when the queue drops below a certain level. This has caused our mail system to crash as well as hammer our DB server (shared!!!). We have got an architecture of what we want to build but thought there might be something we could buy off the shelf that allowed us to keep templated emails, lists of recipients, schedule sends etc and report on it. We can't find anything. What do big websites like Amazon etc use, or people a little smaller but who still send loads of mail (Flickr, eBuyer, or other ecommerce sites)? Cheers tarqs
3 0.28806248 1269 high scalability-2012-06-20-iDoneThis - Scaling an Email-based App from Scratch
Introduction: This is a guest post by Rodrigo Guzman, CTO of iDoneThis , which makes status reporting happen at your company with the lightest possible touch. iDoneThis is a simple management application that emails your team at the end of every day to ask, "What'd you get done today?" Just reply with a few lines of what you got done. The following morning everyone on your team gets a digest with what the team accomplished the previous day to keep everyone in the loop and kickstart another awesome day. Before we launched, we built iDoneThis over a weekend in the most rudimentary way possible. I kid you not, we sent the first few batches of daily emails using the BCC field of a Gmail inbox. The upshot is that we’ve had users on the site from Day 3 of its existence on. We’ve gone from launch in January 2011 when we sent hundreds of emails out per day by hand to sending out over 1 million emails and handling over 200,000 incoming emails per month. In total, customers have recorded over 1.
4 0.2192596 202 high scalability-2008-01-06-Email Architecture
Introduction: I would like to know email architecture used by large ISPs.. or even used by google. Can someone point me to some sites?? Thanks..
5 0.19068463 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
Introduction: With Lavabit shutting down under murky circumstances , it seems fitting to repost an old (2009), yet still very good post by Ladar Levison on Lavabit's architecture. I don't know how much of this information is still current, but it should give you a general idea what Lavabit was all about. Getting to Know You What is the name of your system and where can we find out more about it? Note: these links are no longer valid... Lavabit http://lavabit.com http://lavabit.com/network.html http://lavabit.com/about.html What is your system for? Lavabit is a mid-sized email service provider. We currently have about 140,000 registered users with more than 260,000 email addresses. While most of our accounts belong to individual users, we also provide corporate email services to approximately 70 companies. Why did you decide to build this system? We built the system to compete against the other large free email providers, with an emphasis on serving the privacy c
6 0.13241188 478 high scalability-2008-12-29-Paper: Spamalytics: An Empirical Analysis of Spam Marketing Conversion
7 0.13068821 551 high scalability-2009-03-30-Lavabit Architecture - Creating a Scalable Email Service
9 0.1195717 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
10 0.1167728 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
12 0.11481323 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
13 0.11298014 1306 high scalability-2012-08-16-Stuff The Internet Says On Scalability For August 17, 2012
14 0.10946524 168 high scalability-2007-11-30-Strategy: Efficiently Geo-referencing IPs
15 0.10755236 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
16 0.10293275 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
17 0.1027213 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
18 0.10161337 821 high scalability-2010-05-03-MocoSpace Architecture - 3 Billion Mobile Page Views a Month
19 0.10019267 289 high scalability-2008-03-27-Amazon Announces Static IP Addresses and Multiple Datacenter Operation
20 0.099929862 765 high scalability-2010-01-25-Let's Welcome our Neo-Feudal Overlords
topicId topicWeight
[(0, 0.167), (1, 0.091), (2, -0.014), (3, -0.034), (4, -0.001), (5, -0.041), (6, 0.025), (7, 0.039), (8, -0.027), (9, -0.026), (10, -0.015), (11, 0.044), (12, 0.014), (13, 0.024), (14, 0.067), (15, -0.036), (16, -0.031), (17, 0.031), (18, -0.02), (19, 0.037), (20, -0.019), (21, -0.035), (22, -0.019), (23, 0.033), (24, 0.015), (25, -0.029), (26, 0.03), (27, 0.025), (28, -0.05), (29, -0.008), (30, -0.023), (31, 0.002), (32, -0.021), (33, -0.028), (34, -0.022), (35, -0.015), (36, -0.018), (37, 0.073), (38, 0.028), (39, 0.02), (40, 0.043), (41, 0.092), (42, 0.061), (43, 0.041), (44, 0.017), (45, 0.033), (46, -0.004), (47, -0.054), (48, 0.014), (49, 0.011)]
simIndex simValue blogId blogTitle
same-blog 1 0.95755231 221 high scalability-2008-01-24-Mailinator Architecture
2 0.79907334 551 high scalability-2009-03-30-Lavabit Architecture - Creating a Scalable Email Service
Introduction: Ladar Levison of Lavabit has written an incredible article on how they took a centralized off-the-shelf email server that could handle only few thousand users and built their own custom distributed infrastructure for handling hundreds of thousands of email users. Lavabit processes 70 gigabytes of data per day, is made up of 26 servers, hosts 260,000 email addresses, and processes 600,000 emails a day. That's a lot of email. Lavabit's mission has a little edge to it too: Lavabit was founded as a direct reaction to the larger free e-mail services available. We felt it was possible to create an e-mail service that was fast, reliable, feature rich and didn't achieve profitability by prostituting its user base to marketers. What I really like about this article is that Lavabit has some challenging elements in dealing with different email protocols while being able to scale to a lot of users. There's more going on than just trying to scale out a database. Many products contain com
Introduction: You know your product is doing well when most of your early blog posts deal with the status of the waiting list of hundreds of thousands of users eagerly waiting to download your product. That's the enviable position Mailbox , a free mobile email management app, found themselves early in their release cycle. Hasn't email been done already? Apparently not. Mailbox scaled to one million users in a paltry six weeks with a team of about 14 people . As of April they were delivering over 100 million messages per day . How did they do it? Mailbox engineering lead, Sean Beausoleil , gave an informative interview on readwrite.com on how Mailbox planned to scale... Gather signals early . A pre-release launch video helped generate interest, but it also allowed them to gauge early interest before even releasing. From the overwhelming response they knew they would need to have to scale and scale quickly. Have something unique . The average person might not think a mailbox app
4 0.77132553 253 high scalability-2008-02-19-Building a email communication system
Introduction: hi, the website I work for is looking to build an email system that can handle a fair few emails (up to a hundred thousand a day). These comprise emails like registration emails, newsletters, lots of user triggered emails and overnight emails. At present we queue them in SQL and feed them into an smtp server on one of our web servers when the queue drops below a certain level. This has caused our mail system to crash as well as hammer our DB server (shared!!!). We have got an architecture of what we want to build but thought there might be something we could buy off the shelf that allowed us to keep templated emails, lists of recipients, schedule sends etc and report on it. We can't find anything. What do big websites like Amazon etc use, or people a little smaller but who still send loads of mail (Flickr, eBuyer, or other ecommerce sites)? Cheers tarqs
5 0.77102971 1269 high scalability-2012-06-20-iDoneThis - Scaling an Email-based App from Scratch
Introduction: This is a guest post by Rodrigo Guzman, CTO of iDoneThis , which makes status reporting happen at your company with the lightest possible touch. iDoneThis is a simple management application that emails your team at the end of every day to ask, "What'd you get done today?" Just reply with a few lines of what you got done. The following morning everyone on your team gets a digest with what the team accomplished the previous day to keep everyone in the loop and kickstart another awesome day. Before we launched, we built iDoneThis over a weekend in the most rudimentary way possible. I kid you not, we sent the first few batches of daily emails using the BCC field of a Gmail inbox. The upshot is that we’ve had users on the site from Day 3 of its existence on. We’ve gone from launch in January 2011 when we sent hundreds of emails out per day by hand to sending out over 1 million emails and handling over 200,000 incoming emails per month. In total, customers have recorded over 1.
6 0.74832493 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?
7 0.73141539 202 high scalability-2008-01-06-Email Architecture
8 0.72758776 985 high scalability-2011-02-08-Mollom Architecture - Killing Over 373 Million Spams at 100 Requests Per Second
9 0.72186381 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
10 0.71441299 1012 high scalability-2011-03-28-Aztec Empire Strategy: Use Dual Pipes in Your Aqueduct for High Availability
11 0.71104896 1568 high scalability-2013-12-23-What Happens While Your Brain Sleeps is Surprisingly Like How Computers Stay Sane
13 0.69814098 1638 high scalability-2014-04-28-How Disqus Went Realtime with 165K Messages Per Second and Less than .2 Seconds Latency
14 0.69611126 1584 high scalability-2014-01-22-How would you build the next Internet? Loons, Drones, Copters, Satellites, or Something Else?
15 0.68926114 1460 high scalability-2013-05-17-Stuff The Internet Says On Scalability For May 17, 2013
16 0.68658501 347 high scalability-2008-07-07-Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy
17 0.68626112 1635 high scalability-2014-04-21-This is why Microsoft won. And why they lost.
18 0.68482101 682 high scalability-2009-08-16-ThePort Network Architecture
19 0.68475145 80 high scalability-2007-09-06-Product: Perdition Mail Retrieval Proxy
20 0.68453521 1503 high scalability-2013-08-19-What can the Amazing Race to the South Pole Teach us About Startups?
topicId topicWeight
[(1, 0.079), (2, 0.283), (10, 0.062), (15, 0.023), (17, 0.013), (28, 0.096), (30, 0.021), (33, 0.01), (40, 0.012), (47, 0.017), (61, 0.077), (77, 0.018), (79, 0.074), (85, 0.023), (94, 0.064), (99, 0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.96341717 221 high scalability-2008-01-24-Mailinator Architecture
2 0.95699096 1261 high scalability-2012-06-08-Stuff The Internet Says On Scalability For June 8, 2012
Introduction: It's HighScalability Time: 21TB : Tumblr relational data Quotable Quotes: @ajbaird : Scalability is not a "feature" tacked on at the end development. @h_ingo : I like Doron's comparison : Build a MySQL scale-out cluster instead, then buy 2 Ferrari's with the money saved :-) You might figure Harry Potter would have some sort of scaling spell, but no, he has to rely on the muggle powered Azure . Pottermore uses Azure to handle 110 million page impressions a day. Ian Bogost in What Should We Do for a Living? brings up a sobering idea from the Facebook Illusion , the Internet economy will not save us, it sucks at scaling jobs and exists because it is subsidized by surpluses from the old economy it was supposed to replace. Where is that replicator when we need it? In Praise of Idleness . Bruce Dawson argues against busy waiting and for locks, in most cases. Couldn't agree more, that's a lot of CPU doing nothing and programmers quickly lose trac
Introduction: Counting at scale in a distributed environment is surprisingly hard . And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory , How to update video views count effectively? , Numbers Everyone Should Know (sharded counters) . Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has "over and under counting problem in partitioned environments." Which is often fine. But if you want more accuracy there's a PN-counter, which is a CRDT (convergent replicated data type) where "all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."
Introduction: In Resolved: Widespread Application Outage , Heroku tells their story of how they dealt with the Amazon outage . While taking 100% responsibility for the downtime, they also shared a number of the strategies they used to bring their service back to full working order. One of Heroku's most interesting strategies wasn't a technical hack at all, but how they consciously went about deploying their Ops personnel in response to the emergency. An outline of their strategy is: Monitoring systems immediately alerted Ops to the problem. An on-call engineer applied triage logic to the problem and classified it as serious, which caused the on-call Incident Commander to be woken out of restful slumber. The IC contacted AWS . They were in constant contact with their AWS representative and worked closely with AWS to solve problems. The IC alerted Heroku engineers. A full crew: support, data, and other engineering teams worked around the clock to bring every
5 0.94638216 1421 high scalability-2013-03-11-Low Level Scalability Solutions - The Conditioning Collection
Introduction: We talked about 42 Monster Problems That Attack As Loads Increase . And in The Aggregation Collection we talked about the value of prioritizing work and making smart queues as a way of absorbing and not reflecting traffic spikes. Now we move on to our next batch of strategies where the theme is conditioning , which is the idea of shaping and controlling flows of work within your application... Use Resources Proportional To a Fixed Limit This is probably the most important rule for achieving scalability within an application. What it means: Find the resource that has a fixed limit that you know you can support. For example, a guarantee to handle a certain number of objects in memory. So if we always use resources proportional to the number of objects it is likely we can prevent resource exhaustion. Devise ways of tying what you need to do to the individual resources. Some examples: Keep a list of purchase orders with line items over $20 (or whatever). Do not keep
6 0.94246417 1439 high scalability-2013-04-12-Stuff The Internet Says On Scalability For April 12, 2013
7 0.94190907 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
8 0.9413833 960 high scalability-2010-12-20-Netflix: Use Less Chatty Protocols in the Cloud - Plus 26 Fixes
9 0.94133282 1413 high scalability-2013-02-27-42 Monster Problems that Attack as Loads Increase
11 0.93961972 752 high scalability-2009-12-17-Oracle and IBM databases: Disk-based vs In-memory databases
12 0.9391281 556 high scalability-2009-04-05-At Some Point the Cost of Servers Outweighs the Cost of Programmers
13 0.9389655 1387 high scalability-2013-01-15-More Numbers Every Awesome Programmer Must Know
14 0.9388113 1418 high scalability-2013-03-06-Low Level Scalability Solutions - The Aggregation Collection
15 0.93794984 1207 high scalability-2012-03-12-Google: Taming the Long Latency Tail - When More Machines Equals Worse Results
16 0.93633819 1244 high scalability-2012-05-11-Stuff The Internet Says On Scalability For May 11, 2012
17 0.93579525 1456 high scalability-2013-05-13-The Secret to 10 Million Concurrent Connections -The Kernel is the Problem, Not the Solution
18 0.93542081 1294 high scalability-2012-08-01-Prismatic Update: Machine Learning on Documents and Users
19 0.93506265 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
20 0.93496269 661 high scalability-2009-07-25-Latency is Everywhere and it Costs You Sales - How to Crush it