high_scalability high_scalability-2011 high_scalability-2011-1059 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Sometimes I get article proposals and then there's no follow up. Though these TripAdvisor data points are from 2010, I thought them worth sharing: Our site serves in excess of 100M dynamically generated page views a day (all media and static content goes through a CDN), and we do this with about 100 machines, no single point of failure, supported by a distributed service architecture that responds to over 2B requests a day, and a data warehouse of over 20TB that is used to drive email campaigns, SEM, and general reporting. We are a Linux/Java/Apache/Tomcat/Postgres/Lucene shop, and have built our own distributed computing architecture. We also maintain duplicate data centers (one active, one standby) for redundancy and maintenance purposes. Too bad, it sounds like it would have been a good article. Related Articles: A new twist on "data-driven site" by Mac Slocum. How a billion points of app data shape TripAdvisor's website.
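Those numbers imply a surprisingly modest steady-state load per machine. Here's a quick back-of-envelope sketch using only the figures quoted above; these are daily averages, and real traffic peaks would be a small multiple of them:

```python
# Back-of-envelope math on the quoted 2010 TripAdvisor numbers (averages only).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

page_views_per_day = 100e6              # 100M dynamic page views/day
service_requests_per_day = 2e9          # 2B distributed-service requests/day
machines = 100

print(f"page views/sec site-wide:   {page_views_per_day / SECONDS_PER_DAY:,.0f}")            # ~1,157
print(f"page views/sec per machine: {page_views_per_day / SECONDS_PER_DAY / machines:.1f}")  # ~11.6
print(f"service requests/sec:       {service_requests_per_day / SECONDS_PER_DAY:,.0f}")      # ~23,148
print(f"service calls per page:     {service_requests_per_day / page_views_per_day:.0f}")    # ~20
```

Roughly twenty backend service calls per page view is consistent with the fan-out you'd expect from a distributed service architecture.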
sentIndex sentText sentNum sentScore
1 Sometimes I get article proposals and then there's no follow up. [sent-1, score-0.416]
2 We are a Linux/Java/Apache/Tomcat/Postgres/Lucene shop, and have built our own distributed computing architecture. [sent-3, score-0.081]
3 We also maintain duplicate data centers (one active, one standby) for redundancy and maintenance purposes. [sent-4, score-0.704]
4 Too bad, it sounds like it would have been a good article. [sent-5, score-0.111]
5 Related Articles A new twist on "data-driven site" by Mac Slocum. [sent-6, score-0.171]
6 How a billion points of app data shape TripAdvisor's website. [sent-7, score-0.472]
wordName wordTfidf (topN-words)
[('tripadvisor', 0.39), ('proposals', 0.238), ('sem', 0.23), ('campaigns', 0.222), ('excess', 0.192), ('standby', 0.189), ('points', 0.181), ('responds', 0.179), ('mac', 0.177), ('shop', 0.177), ('twist', 0.171), ('duplicate', 0.159), ('warehouse', 0.149), ('shape', 0.139), ('maintenance', 0.129), ('serves', 0.126), ('redundancy', 0.124), ('generated', 0.123), ('dynamically', 0.121), ('supported', 0.12), ('day', 0.12), ('site', 0.111), ('sounds', 0.111), ('sometimes', 0.11), ('centers', 0.108), ('sharing', 0.107), ('drive', 0.106), ('follow', 0.105), ('media', 0.105), ('maintain', 0.104), ('cdn', 0.103), ('email', 0.099), ('active', 0.099), ('static', 0.097), ('articles', 0.096), ('bad', 0.093), ('worth', 0.088), ('general', 0.087), ('thought', 0.083), ('failure', 0.083), ('view', 0.081), ('distributed', 0.081), ('data', 0.08), ('though', 0.079), ('goes', 0.077), ('article', 0.073), ('billion', 0.072), ('machines', 0.071), ('content', 0.07), ('page', 0.07)]
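The scores above look like standard tf-idf weights. A minimal sketch of that computation follows; it uses the common raw-tf times log-idf variant, which is an assumption, since the mining pipeline's exact formula isn't documented here:

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each term in each document as tf * idf, with
    idf = log(N / document frequency). One common variant of many."""
    n = len(docs)
    df = Counter()                                   # docs containing each term
    for doc in docs:
        df.update(set(doc.lower().split()))
    return [{t: c * math.log(n / df[t])
             for t, c in Counter(doc.lower().split()).items()}
            for doc in docs]

docs = ["tripadvisor serves 100M page views a day",
        "tripadvisor runs duplicate data centers",
        "a data warehouse drives email campaigns"]
print(sorted(tfidf(docs)[0].items(), key=lambda kv: -kv[1])[:3])
```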
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1059 high scalability-2011-06-14-A TripAdvisor Short
2 0.19874957 1072 high scalability-2011-07-01-TripAdvisor Strategy: No Architects, Engineers Work Across the Entire Stack
Introduction: If you are an insect, don't work at TripAdvisor: specialization is out. One of the most commented-on strategies from the TripAdvisor architecture article is their rather opinionated take on the role of engineers in the organization. Typically engineers live in a box. They are specialized; they do database work and not much else, or they just do programming, not much else. TripAdvisor takes the road less traveled: Engineers work across the entire stack - HTML, CSS, JS, Java, scripting. If you do not know something, you learn it. The only thing that gets in the way of delivering your project is you, as you are expected to work at all levels - design, code, test, monitoring, CSS, JS, Java, SQL, scripting. We do not have "architects." At TripAdvisor, if you design something, you code it, and if you code it, you test it. Engineers who do not like to go outside their comfort zone, or who feel certain work is "beneath" them, will simply get in the way. A radical take for an e
3 0.18709299 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
Introduction: This is a guest post by Andy Gelfond, VP of Engineering for TripAdvisor. Andy has been with TripAdvisor for six and a half years, wrote a lot of code in the earlier days, and has been building and running a first-class engineering and operations team that is responsible for the world's largest travel site. There's an update for this article at An Epic TripAdvisor Update: Why Not Run On The Cloud? The Grand Experiment. For TripAdvisor, scalability is woven into our organization on many levels - data center, software architecture, development/deployment/operations, and, most importantly, within the culture and organization. It is not enough to have a scalable data center, or a scalable software architecture. The process of designing, coding, testing, and deploying code also needs to be scalable. All of this starts with hiring and a culture and an organization that values and supports a distributed, fast, and effective development and operation of a complex and highly scalable co
4 0.18642096 1353 high scalability-2012-11-01-Cost Analysis: TripAdvisor and Pinterest costs on the AWS cloud
Introduction: This is a guest post by Ali Khajeh-Hosseini, Technical Lead at PlanForCloud.com. I read a recent blog post about TripAdvisor's experiment with AWS where they attempted to process 700K HTTP requests per minute on a replica of their live site. There was also an interesting blog post on Pinterest's massive growth on AWS. These blogs highlighted exactly the types of questions we're interested in, mainly: How much would it cost to deploy System X on Cloud Y? e.g. how much would it cost to host TripAdvisor on the AWS US-East cloud? Would it be cheaper to use deployment option X or Y? e.g. would it be cheaper to use reserved instances, different types of instances, different cloud providers... What happens to costs when the system grows? e.g. Pinterest has around 410TB of data on S3, what if that keeps growing at a rate of 25% every month, like it has been in the last 10 months? I created a couple of deployments in PlanForCloud to explore these qu
Introduction: This is a guest post by Ali Khajeh-Hosseini, Technical Lead at PlanForCloud. The original article was published on their site. With 29 cloud price reductions I thought it would be interesting to see how the bottom line would change compared to an article we published last year. The result is surprisingly little for TripAdvisor because prices for On Demand instances have not dropped as fast as those for other instance types. Over the last year and a half, we counted 29 price reductions in cloud services provided by AWS, Google Compute Engine, Windows Azure, and Rackspace Cloud. Price reductions have a direct effect on cloud users, but given the usual tiny reductions, how significant is that effect on the bottom line? Last year I wrote about cloud cost forecasts for TripAdvisor and Pinterest. TripAdvisor was experimenting with AWS and attempted to process 700K HTTP requests per minute on a replica of its live site, and Pinterest was growing massively on AWS. In th
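The growth question above is where the arithmetic gets interesting: 25% monthly growth compounds to roughly 14.5x in a year. A quick sketch; the $/TB-month price is a made-up placeholder, not AWS's actual S3 rate:

```python
# 410TB of S3 data growing 25% per month, compounded over a year.
tb = 410.0
price_per_tb_month = 100.0    # placeholder $/TB-month, NOT real S3 pricing
for month in range(1, 13):
    tb *= 1.25
    print(f"month {month:2d}: {tb:8,.0f} TB  ~${tb * price_per_tb_month:>11,.0f}/month")
# month 12: ~5,966 TB, about 14.5x the starting size (1.25**12 ≈ 14.55)
```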
6 0.14182813 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub
8 0.094897509 202 high scalability-2008-01-06-Email Architecture
9 0.094399512 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
11 0.081779614 100 high scalability-2007-09-26-Use a CDN to Instantly Improve Your Website's Performance by 20% or More
12 0.081513762 382 high scalability-2008-09-09-Content Delivery Networks (CDN) – a comprehensive list of providers
13 0.079344817 152 high scalability-2007-11-13-Flickr Architecture
14 0.07589826 576 high scalability-2009-04-21-What CDN would you recommend?
15 0.075165659 182 high scalability-2007-12-12-Oracle Can Do Read-Write Splitting Too
16 0.074463144 758 high scalability-2010-01-11-Have We Reached the End of Scaling?
17 0.07433629 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center
18 0.073444128 1 high scalability-2007-07-06-Start Here
19 0.072200269 39 high scalability-2007-07-30-Product: Akamai
20 0.071971141 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.
topicId topicWeight
[(0, 0.127), (1, 0.066), (2, -0.011), (3, -0.052), (4, -0.026), (5, -0.063), (6, -0.038), (7, -0.025), (8, 0.024), (9, 0.053), (10, -0.019), (11, -0.034), (12, -0.026), (13, -0.04), (14, 0.027), (15, 0.024), (16, -0.014), (17, -0.022), (18, 0.006), (19, -0.036), (20, -0.02), (21, 0.014), (22, 0.008), (23, 0.016), (24, -0.056), (25, -0.069), (26, 0.021), (27, 0.009), (28, 0.001), (29, -0.006), (30, -0.008), (31, 0.066), (32, -0.029), (33, 0.047), (34, -0.028), (35, 0.008), (36, 0.054), (37, -0.011), (38, -0.023), (39, 0.047), (40, -0.027), (41, 0.014), (42, 0.02), (43, 0.05), (44, 0.015), (45, -0.084), (46, 0.025), (47, -0.005), (48, -0.097), (49, -0.002)]
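A guess at how the simValue numbers in the next block are produced: each post is reduced to a topic-weight vector like the row above, and similarity is plausibly the cosine between two such vectors. That is an assumption about the pipeline, not something this dump states:

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-weight vectors given as
    {topicId: weight} dicts; 1.0 means identical direction."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    denom = (math.sqrt(sum(w * w for w in u.values())) *
             math.sqrt(sum(w * w for w in v.values())))
    return dot / denom if denom else 0.0

# Toy vectors shaped like the topicId/topicWeight rows in this dump.
post_a = {0: 0.127, 1: 0.066, 2: -0.011, 3: -0.052}
post_b = {0: 0.120, 1: 0.070, 2: -0.010, 3: -0.050}
print(cosine(post_a, post_a))   # 1.0, like the same-blog rows
print(cosine(post_a, post_b))   # close to 1.0 for near-identical topic mixes
```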
simIndex simValue blogId blogTitle
same-blog 1 0.9319455 1059 high scalability-2011-06-14-A TripAdvisor Short
2 0.70218259 730 high scalability-2009-10-28-GemFire: Solving the hardest problems in data management
Introduction: GemStone's website recently received a major facelift over at www.gemstone.com. I felt that the users of this site might find our detailed description of how we solve the hardest problems in data management interesting. This can be viewed at: http://www.gemstone.com/hardest-problems (PDF available for download). Also check out our industry page to see how GemFire applies to multiple industries, then head over to the solutions page to see how GemFire enables mainframe migration, real-time BI in data warehousing, RDB scaleup/speedup, and the cloud. Finally, check out our community site if you want a more technical view of GemFire. We hope you enjoy the new facelift and content!
3 0.65813321 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush
Introduction: Election night is a big traffic boost for news and social sites. Yahoo expects up to 400 million page views on Election Day. Data Center Knowledge has an excellent article on how various sites are preparing to handle spikes in election night traffic. Some interesting bits: Prepare ahead. Don't wait to handle spikes; plan and prepare before the blessed event. Use a CDN. Daily Kos puts images on a CDN, but the dynamic nature of their site means they can't use a CDN for their other content. Scale up. Daily Kos: "to handle the traffic better, we moved to a cluster of six quad core Xeons with 8GB RAM for webheads that all boot off a central NFS (Network File System) root, with the capability of adding more webheads as needed." They also "added two 16GB eight-core Xeons and a 6×73GB RAID-10 array for database files running a MySQL master/slave setup." Add Cache. Daily Kos added 1GB memcached instances to each webhead. Change Caching Strategy. Daily Kos puts fully rendered pa
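The Daily Kos changes (memcached on every webhead, plus caching fully rendered pages) amount to the classic cache-aside pattern. A minimal sketch using pymemcache; the library choice, key naming, and TTL are illustrative, not from the article:

```python
from pymemcache.client.base import Client   # pip install pymemcache

cache = Client(("localhost", 11211))        # the webhead's local memcached

def get_rendered_page(slug, render_fn, ttl=60):
    """Cache-aside: return the fully rendered page from memcached if present,
    otherwise render it once and store it for subsequent requests."""
    key = f"page:{slug}"
    page = cache.get(key)
    if page is None:
        page = render_fn(slug).encode("utf-8")
        cache.set(key, page, expire=ttl)
    return page
```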
4 0.647466 238 high scalability-2008-02-04-IPS-IDS for heavy content site
Introduction: All, my site would have heavy content (video/pictures). I'm looking for an efficient IPS/IDS solution which would not introduce much latency. I'm more familiar with Cisco ASA and also familiar with Juniper, Foundry, and others. I also came across Snort but haven't used it before. I'm mostly looking for an appliance (for the ease of configuration, support, etc.). Could anyone share their thoughts on the performance of IPS/IDS solutions from these vendors? Thanks! Janakan Rajendran
5 0.64633822 159 high scalability-2007-11-18-Reverse Proxy
Introduction: Hi, I saw a year ago that NetApp sold NetCache to Blue Coat. My site is a heavy NetCache user and we cached 83% of our site. We tested with Blue Coat and F5 WA and we are not getting the same performance as NetCache. Do any of you have the same issue? Or does somebody know another product that can handle as much traffic? Thanks, Rodrigo
6 0.63191015 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub
7 0.62064999 440 high scalability-2008-11-11-Arhcitecture for content management
8 0.61948967 144 high scalability-2007-11-07-What CDN would you recommend?
9 0.61531889 1 high scalability-2007-07-06-Start Here
10 0.61399531 28 high scalability-2007-07-25-Product: NetApp MetroCluster Software
11 0.60036927 965 high scalability-2010-12-29-Pinboard.in Architecture - Pay to Play to Keep a System Small
12 0.60034484 714 high scalability-2009-10-02-HighScalability has Moved to Squarespace.com!
13 0.59936357 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site
14 0.59757513 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
15 0.59392822 1108 high scalability-2011-08-31-Pud is the Anti-Stack - Windows, CFML, Dropbox, Xeround, JungleDisk, ELB
16 0.59061503 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails
17 0.58955801 71 high scalability-2007-08-22-Profiling WEB applications
18 0.58793455 1070 high scalability-2011-06-29-Second Hand Seizure: A New Cause of Site Death
19 0.58189821 287 high scalability-2008-03-24-Advertise
20 0.57984942 611 high scalability-2009-05-31-Need help on Site loading & database optimization - URGENT
topicId topicWeight
[(2, 0.209), (61, 0.165), (77, 0.376), (79, 0.117), (94, 0.015)]
simIndex simValue blogId blogTitle
Introduction: Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, I see, time and time again, developers choosing a particular solution based on an ill-placed belief in their solution as a “silver bullet”, a solution that conquers all, despite the inevitable occurrence of changing requirements. Regardless of the reasons behind this phenomenon, I’d like to outline a few of the methods I use to ensure that I’m making good scalable decisions without losing sight of the trade-offs that accompany them. I’d also like to compile (pun intended) the issues at hand by formulating a simple theorem that we can use to describe this oft-occurring situation.
2 0.86734468 258 high scalability-2008-02-24-Yandex Architecture
Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3.5 billion pages in the search index. Over several thousand servers. 35 million searches a day. Several data centers around Russia. Two-layer architecture. The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings them together for the user. Languages
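That two-layer design (split the index into pieces, query every piece, merge for the user) is the scatter-gather pattern. A minimal sketch with a made-up Shard interface; nothing here is from Yandex's actual code:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class Shard:
    """Stand-in for one slice of the search index (hypothetical interface)."""
    def __init__(self, docs):
        self.docs = docs                       # {doc_id: text}
    def search(self, query):
        # Return (score, doc_id) pairs for documents matching the query.
        return [(text.count(query), doc_id)
                for doc_id, text in self.docs.items() if query in text]

def scatter_gather(query, shards, top_k=10):
    """Fan the query out to every shard in parallel, then merge the
    partial hit lists into one global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: shard.search(query), shards)
    return heapq.nlargest(top_k, (hit for hits in partials for hit in hits))

shards = [Shard({"d1": "search engine"}), Shard({"d2": "search search index"})]
print(scatter_gather("search", shards))    # [(2, 'd2'), (1, 'd1')]
```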
3 0.8403517 1116 high scalability-2011-09-15-Paper: It's Time for Low Latency - Inventing the 1 Microsecond Datacenter
Introduction: In It's Time for Low Latency, Stephen Rumble et al. explore the idea that it's time to rearchitect our stack to live in the modern era of low-latency datacenters instead of high-latency WANs. The implications for program architectures will be revolutionary. Luiz André Barroso, Distinguished Engineer at Google, sees ultra-low latency as a way to make computer resources as fungible as possible, that is, interchangeable and location independent, effectively turning a datacenter into a single computer. Abstract from the paper: The operating systems community has ignored network latency for too long. In the past, speed-of-light delays in wide area networks and unoptimized network hardware have made sub-100µs round-trip times impossible. However, in the next few years datacenters will be deployed with low-latency Ethernet. Without the burden of propagation delays in the datacenter campus and network delays in the Ethernet devices, it will be up to us to finish
same-blog 4 0.83520067 1059 high scalability-2011-06-14-A TripAdvisor Short
5 0.83467579 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
Introduction: With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects; it can also be used as an embedded object-oriented database for projects of all sizes. From the NoSQL Archive, the summary of HyperGraphDB is: API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web. So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was: A HyperGraphD
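To answer that buried question directly: in a hypergraph an edge can connect any number of nodes, not just two, and in HyperGraphDB an edge is itself an atom that other edges can point at. A tiny in-memory sketch of the idea; this is not HyperGraphDB's actual API:

```python
# Minimal hypergraph: a hyperedge targets any number of atoms, and the
# edge is itself an atom, so edges can point at other edges.
atoms = {}    # atom_id -> value (None for edges)
edges = {}    # edge_id -> tuple of target atom/edge ids

def add_atom(aid, value):
    atoms[aid] = value

def add_edge(eid, *targets):
    assert all(t in atoms for t in targets), "targets must exist first"
    edges[eid] = targets
    atoms[eid] = None                # the edge is registered as an atom too

add_atom("alice", "Alice"); add_atom("bob", "Bob"); add_atom("carol", "Carol")
add_edge("meeting", "alice", "bob", "carol")   # one edge linking three nodes
add_edge("cited_by", "meeting", "carol")       # an edge whose target is an edge
print(edges["meeting"])                        # ('alice', 'bob', 'carol')
```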
6 0.81438828 753 high scalability-2009-12-21-Hot Holiday Scalability Links for 2009
7 0.81312048 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats
8 0.80359751 1195 high scalability-2012-02-17-Stuff The Internet Says On Scalability For February 17, 2012
10 0.79882628 959 high scalability-2010-12-17-Stuff the Internet Says on Scalability For December 17th, 2010
11 0.79861367 1571 high scalability-2014-01-02-xkcd: How Standards Proliferate:
12 0.76413596 525 high scalability-2009-03-05-Product: Amazon Simple Storage Service
13 0.73456341 1158 high scalability-2011-12-16-Stuff The Internet Says On Scalability For December 16, 2011
14 0.73348039 1531 high scalability-2013-10-13-AIDA: Badoo’s journey into Continuous Integration
15 0.72346735 222 high scalability-2008-01-25-Application Database and DAL Architecture
16 0.7219345 1377 high scalability-2012-12-26-Ask HS: What will programming and architecture look like in 2020?
17 0.68130893 1567 high scalability-2013-12-20-Stuff The Internet Says On Scalability For December 20th, 2013
18 0.67680997 439 high scalability-2008-11-10-Scalability Perspectives #1: Nicholas Carr – The Big Switch
19 0.66241586 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
20 0.64297545 1107 high scalability-2011-08-29-The Three Ages of Google - Batch, Warehouse, Instant