high_scalability high_scalability-2011 high_scalability-2011-1059 knowledge-graph by maker-knowledge-mining

1059 high scalability-2011-06-14-A TripAdvisor Short


meta infos for this blog

Source: html

Introduction: Sometimes I get article proposals and then there's no follow up. Though these TripAdvisor data points are from 2010, I thought them worth sharing: Our site serves in excess of 100M dynamically generated page views a day (all media and static content goes through a CDN), and we do this with about 100 machines, no single point of failure, supported by a distributed service architecture that responds to over 2B requests a day, and a data warehouse of over 20TB that is used to drive email campaigns, SEM, and general reporting. We are a Linux/Java/Apache/Tomcat/Postgres/Lucene shop, and have built our own distributed computing architecture. We also maintain duplicate data centers (one active, one standby) for redundancy and maintenance purposes. Too bad, it sounds like it would have been a good article. Related Articles: A new twist on "data-driven site" by Mac Slocum. How a billion points of app data shape TripAdvisor's website.
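The quoted figures are daily totals. Converting them to average per-second rates makes them easier to compare with other systems. A back-of-the-envelope sketch, assuming traffic is spread evenly over the day and across all machines (real traffic peaks well above the average, and not every machine necessarily serves pages):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_rate(per_day: float) -> float:
    """Convert a daily total into an average per-second rate (assumes flat load)."""
    return per_day / SECONDS_PER_DAY

# Figures quoted above (2010); all are lower bounds or approximations.
PAGE_VIEWS_PER_DAY = 100_000_000          # "in excess of 100M" dynamic page views
SERVICE_REQUESTS_PER_DAY = 2_000_000_000  # "over 2B" service requests
MACHINES = 100                            # "about 100 machines"

views_per_sec = average_rate(PAGE_VIEWS_PER_DAY)
requests_per_sec = average_rate(SERVICE_REQUESTS_PER_DAY)
requests_per_view = SERVICE_REQUESTS_PER_DAY / PAGE_VIEWS_PER_DAY

print(f"average page views/s:       {views_per_sec:,.0f}")            # 1,157
print(f"average service requests/s: {requests_per_sec:,.0f}")         # 23,148
print(f"service requests per view:  {requests_per_view:.0f}")         # 20
print(f"page views/s per machine:   {views_per_sec / MACHINES:.1f}")  # 11.6
```

The ratio of roughly 20 internal service requests per dynamic page view is only an inference from the two quoted totals; the source does not say how those requests break down.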


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sometimes I get article proposals and then there's no follow up. [sent-1, score-0.416]

2 We are a Linux/Java/Apache/Tomcat/Postgres/Lucene shop, and have built our own distributed computing architecture. [sent-3, score-0.081]

3 We also maintain duplicate data centers (one active, one standby) for redundancy and maintenance purposes. [sent-4, score-0.704]

4 Too bad, it sounds like it would have been a good article. [sent-5, score-0.111]

5 Related Articles: A new twist on "data-driven site" by Mac Slocum. [sent-6, score-0.171]

6 How a billion points of app data shape TripAdvisor's website. [sent-7, score-0.472]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('tripadvisor', 0.39), ('proposals', 0.238), ('sem', 0.23), ('campaigns', 0.222), ('excess', 0.192), ('standby', 0.189), ('points', 0.181), ('responds', 0.179), ('mac', 0.177), ('shop', 0.177), ('twist', 0.171), ('duplicate', 0.159), ('warehouse', 0.149), ('shape', 0.139), ('maintenance', 0.129), ('serves', 0.126), ('redundancy', 0.124), ('generated', 0.123), ('dynamically', 0.121), ('supported', 0.12), ('day', 0.12), ('site', 0.111), ('sounds', 0.111), ('sometimes', 0.11), ('centers', 0.108), ('sharing', 0.107), ('drive', 0.106), ('follow', 0.105), ('media', 0.105), ('maintain', 0.104), ('cdn', 0.103), ('email', 0.099), ('active', 0.099), ('static', 0.097), ('articles', 0.096), ('bad', 0.093), ('worth', 0.088), ('general', 0.087), ('thought', 0.083), ('failure', 0.083), ('view', 0.081), ('distributed', 0.081), ('data', 0.08), ('though', 0.079), ('goes', 0.077), ('article', 0.073), ('billion', 0.072), ('machines', 0.071), ('content', 0.07), ('page', 0.07)]
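The tfidf weights above rank words that are frequent in this post but rare across the blog corpus. A minimal sketch of one common tf-idf variant (raw term count times log(N/df), L2-normalized per document); the weighting the generator actually used is not documented here, so this is an assumption, and the three-document corpus is hypothetical:

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw count * log(N / df), then L2-normalize each document."""
    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Leave an all-zero vector as-is to avoid dividing by zero.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

def top_n(vector: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Highest-weighted terms first, like the (word, weight) list above."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical three-document corpus, for illustration only.
docs = [
    "tripadvisor serves page views through cdn".split(),
    "pinterest stores data on s3".split(),
    "flickr serves photos through cdn".split(),
]
vectors = tfidf(docs)
print(top_n(vectors[0], 3))
```

Terms shared by many documents get a small idf and sink, while distinctive terms such as "tripadvisor" rise to the top, which matches the shape of the list above. Because the generator's exact weighting and normalization are unknown, this sketch will not reproduce the scores shown.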

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1059 high scalability-2011-06-14-A TripAdvisor Short

2 0.19874957 1072 high scalability-2011-07-01-TripAdvisor Strategy: No Architects, Engineers Work Across the Entire Stack

Introduction: If you are an insect, don't work at TripAdvisor; specialization is out. One of the most commented-on strategies from the TripAdvisor architecture article is their rather opinionated take on the role of engineers in the organization. Typically engineers live in a box. They are specialized: they do database work and not much else, or they just do programming and not much else. TripAdvisor takes the road less traveled: Engineers work across the entire stack - HTML, CSS, JS, Java, scripting. If you do not know something, you learn it. The only thing that gets in the way of delivering your project is you, as you are expected to work at all levels - design, code, test, monitoring, CSS, JS, Java, SQL, scripting. We do not have "architects." At TripAdvisor, if you design something, you code it, and if you code it you test it. Engineers who do not like to go outside their comfort zone, or who feel certain work is "beneath" them will simply get in the way. A radical take for an e

3 0.18709299 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data

Introduction: This is a guest post by Andy Gelfond, VP of Engineering for TripAdvisor. Andy has been with TripAdvisor for six and a half years, wrote a lot of code in the earlier days, and has been building and running a first-class engineering and operations team that is responsible for the world's largest travel site. There's an update for this article at An Epic TripAdvisor Update: Why Not Run On The Cloud? The Grand Experiment. For TripAdvisor, scalability is woven into our organization on many levels - data center, software architecture, development/deployment/operations, and, most importantly, within the culture and organization. It is not enough to have a scalable data center, or a scalable software architecture. The process of designing, coding, testing, and deploying code also needs to be scalable. All of this starts with hiring and a culture and an organization that values and supports a distributed, fast, and effective development and operation of a complex and highly scalable co

4 0.18642096 1353 high scalability-2012-11-01-Cost Analysis: TripAdvisor and Pinterest costs on the AWS cloud

Introduction: This is a guest post by Ali Khajeh-Hosseini, Technical Lead at PlanForCloud.com. I read a recent blog post about TripAdvisor's experiment with AWS where they attempted to process 700K HTTP requests per minute on a replica of their live site. There was also an interesting blog post on Pinterest's massive growth on AWS. These blogs highlighted exactly the types of questions we're interested in, mainly: How much would it cost to deploy System X on Cloud Y? e.g. how much would it cost to host TripAdvisor on the AWS US-East cloud? Would it be cheaper to use deployment option X or Y? e.g. would it be cheaper to use reserved instances, different types of instances, different cloud providers... What happens to costs when the system grows? e.g. Pinterest has around 410TB of data on S3, what if that keeps growing at a rate of 25% every month, like it has been in the last 10 months? I created a couple of deployments in PlanForCloud to explore these qu

5 0.17350876 1480 high scalability-2013-06-24-Update on How 29 Cloud Price Drops Changed the Bottom Line of TripAdvisor and Pinterest - Results Mixed

Introduction: This is a guest post by Ali Khajeh-Hosseini, Technical Lead at PlanForCloud. The original article was published on their site. With 29 cloud price reductions, I thought it would be interesting to see how the bottom line would change compared to an article we published last year. The result is surprisingly little change for TripAdvisor because prices for On Demand instances have not dropped as fast as for other instance types. Over the last year and a half, we counted 29 price reductions in cloud services provided by AWS, Google Compute Engine, Windows Azure, and Rackspace Cloud. Price reductions have a direct effect on cloud users, but given the usual tiny reductions, how significant is that effect on the bottom line? Last year I wrote about cloud cost forecasts for TripAdvisor and Pinterest. TripAdvisor was experimenting with AWS and attempted to process 700K HTTP requests per minute on a replica of its live site, and Pinterest was growing massively on AWS. In th

6 0.14182813 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub

7 0.10299871 1597 high scalability-2014-02-17-How the AOL.com Architecture Evolved to 99.999% Availability, 8 Million Visitors Per Day, and 200,000 Requests Per Second

8 0.094897509 202 high scalability-2008-01-06-Email Architecture

9 0.094399512 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?

10 0.088457517 1086 high scalability-2011-07-26-Sponsored Post: BetterWorks, New Relic, eHarmony, TripAdvisor, NoSQL Now!, Surge, Tungsten, Aconex, Mathworks, AppDynamics, ScaleOut, Couchbase, CloudSigma, ManageEngine, Site24x7

11 0.081779614 100 high scalability-2007-09-26-Use a CDN to Instantly Improve Your Website's Performance by 20% or More

12 0.081513762 382 high scalability-2008-09-09-Content Delivery Networks (CDN) – a comprehensive list of providers

13 0.079344817 152 high scalability-2007-11-13-Flickr Architecture

14 0.07589826 576 high scalability-2009-04-21-What CDN would you recommend?

15 0.075165659 182 high scalability-2007-12-12-Oracle Can Do Read-Write Splitting Too

16 0.074463144 758 high scalability-2010-01-11-Have We Reached the End of Scaling?

17 0.07433629 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center

18 0.073444128 1 high scalability-2007-07-06-Start Here

19 0.072200269 39 high scalability-2007-07-30-Product: Akamai

20 0.071971141 1331 high scalability-2012-10-02-An Epic TripAdvisor Update: Why Not Run on the Cloud? The Grand Experiment.


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.127), (1, 0.066), (2, -0.011), (3, -0.052), (4, -0.026), (5, -0.063), (6, -0.038), (7, -0.025), (8, 0.024), (9, 0.053), (10, -0.019), (11, -0.034), (12, -0.026), (13, -0.04), (14, 0.027), (15, 0.024), (16, -0.014), (17, -0.022), (18, 0.006), (19, -0.036), (20, -0.02), (21, 0.014), (22, 0.008), (23, 0.016), (24, -0.056), (25, -0.069), (26, 0.021), (27, 0.009), (28, 0.001), (29, -0.006), (30, -0.008), (31, 0.066), (32, -0.029), (33, 0.047), (34, -0.028), (35, 0.008), (36, 0.054), (37, -0.011), (38, -0.023), (39, 0.047), (40, -0.027), (41, 0.014), (42, 0.02), (43, 0.05), (44, 0.015), (45, -0.084), (46, 0.025), (47, -0.005), (48, -0.097), (49, -0.002)]
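Each post is represented here as a list of (topicId, topicWeight) pairs, and the simValue column ranks other posts against this one. Cosine similarity is a common choice for comparing LSI vectors; the sketch below assumes it, although the page does not state which measure the generator used. Blog ids 9001 and 9002 and their vectors are hypothetical, and only the first four LSI weights of this post are used:

```python
import math

SparseVec = list[tuple[int, float]]  # (topicId, topicWeight) pairs

def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors; missing topic ids count as 0."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query: SparseVec, corpus: dict[int, SparseVec], k: int) -> list[tuple[int, float]]:
    """Return the k most similar (blogId, simValue) pairs, best first."""
    scores = [(blog_id, cosine(query, vec)) for blog_id, vec in corpus.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

# First few LSI weights of this post, plus two hypothetical neighbours.
this_post = [(0, 0.127), (1, 0.066), (2, -0.011), (3, -0.052)]
corpus = {
    1059: this_post,
    9001: [(0, 0.120), (1, 0.050), (3, -0.040)],  # hypothetical similar post
    9002: [(0, -0.100), (2, 0.090)],              # hypothetical unrelated post
}
print(rank(this_post, corpus, k=3))
```

Comparing a vector with itself gives 1.0 in this sketch, while the same-blog entry in the list that follows scores 0.93, so the generator's stored and query vectors evidently differ in a way the page does not explain.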

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9319455 1059 high scalability-2011-06-14-A TripAdvisor Short

2 0.70218259 730 high scalability-2009-10-28-GemFire: Solving the hardest problems in data management

Introduction: GemStone's website recently received a major facelift over at www.gemstone.com. I felt that the users of this site might find our detailed description of how we solve the hardest problems in data management interesting. This can be viewed at: http://www.gemstone.com/hardest-problems (PDF available for download). Also check out our industry page to see how GemFire applies to multiple industries, then head over to the solutions page to see how GemFire enables mainframe migration, real-time BI in data warehousing, RDB scale-up/speedup and the cloud. Finally, check out our community site if you want a more technical view of GemFire. We hope you enjoy the new facelift and content!

3 0.65813321 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush

Introduction: Election night is a big traffic boost for news and social sites. Yahoo expects up to 400 million page views on Election Day. Data Center Knowledge has an excellent article on how various sites are preparing to handle spikes in election night traffic. Some interesting bits: Prepare ahead. Don't wait to handle spikes; plan and prepare before the blessed event. Use a CDN. Daily Kos puts images on a CDN, but the dynamic nature of their site means they can't use a CDN for their other content. Scale up. Daily Kos: "to handle the traffic better, we moved to a cluster of six quad core Xeons with 8GB RAM for webheads that all boot off a central NFS (Network File System) root, with the capability of adding more webheads as needed." They also "added two 16GB eight-core Xeons and a 6×73GB RAID-10 array for database files running a MySQL master/slave setup." Add Cache. Daily Kos added 1GB memcached instances to each webhead. Change Caching Strategy. Daily Kos puts fully rendered pa

4 0.647466 238 high scalability-2008-02-04-IPS-IDS for heavy content site

Introduction: All, my site will have heavy content (video/pictures). I'm looking for an efficient IPS/IDS solution which would not introduce much latency. I'm more familiar with Cisco ASA and also familiar with Juniper, Foundry and others. I also came across Snort but haven't used it before. I'm mostly looking for an appliance (for ease of configuration, support, etc.). Could anyone share their thoughts on the performance of IPS/IDS from these vendors? Thanks! Janakan Rajendran

5 0.64633822 159 high scalability-2007-11-18-Reverse Proxy

Introduction: Hi, I saw a year ago that NetApp sold NetCache to Blue Coat; my site is a heavy NetCache user and we cached 83% of our site. We tested with Blue Coat and F5 WA and we are not getting the same performance as NetCache. Do any of you have the same issue? Or does somebody know another product that can handle this much traffic? Thanks Rodrigo

6 0.63191015 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub

7 0.62064999 440 high scalability-2008-11-11-Architecture for content management

8 0.61948967 144 high scalability-2007-11-07-What CDN would you recommend?

9 0.61531889 1 high scalability-2007-07-06-Start Here

10 0.61399531 28 high scalability-2007-07-25-Product: NetApp MetroCluster Software

11 0.60036927 965 high scalability-2010-12-29-Pinboard.in Architecture - Pay to Play to Keep a System Small

12 0.60034484 714 high scalability-2009-10-02-HighScalability has Moved to Squarespace.com!

13 0.59936357 181 high scalability-2007-12-11-Hosting and CDN for startup video sharing site

14 0.59757513 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data

15 0.59392822 1108 high scalability-2011-08-31-Pud is the Anti-Stack - Windows, CFML, Dropbox, Xeround, JungleDisk, ELB

16 0.59061503 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails

17 0.58955801 71 high scalability-2007-08-22-Profiling WEB applications

18 0.58793455 1070 high scalability-2011-06-29-Second Hand Seizure : A New Cause of Site Death

19 0.58189821 287 high scalability-2008-03-24-Advertise

20 0.57984942 611 high scalability-2009-05-31-Need help on Site loading & database optimization - URGENT


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.209), (61, 0.165), (77, 0.376), (79, 0.117), (94, 0.015)]
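The LDA vector lists only five topics, and their weights sum to 0.882, so the remaining probability mass is presumably spread over topics not shown. For comparing topic distributions, Hellinger distance is a common alternative to cosine similarity; the sketch below uses it purely as an illustration, since the page does not say which measure produced the simValue column:

```python
import math

def hellinger_similarity(p: list[tuple[int, float]], q: list[tuple[int, float]]) -> float:
    """1 - Hellinger distance between two sparse topic distributions.

    Topics absent from a list are treated as probability 0.
    """
    dp, dq = dict(p), dict(q)
    topics = dp.keys() | dq.keys()  # union of topic ids present in either vector
    sq = sum((math.sqrt(dp.get(t, 0.0)) - math.sqrt(dq.get(t, 0.0))) ** 2 for t in topics)
    return 1.0 - math.sqrt(0.5 * sq)

# LDA topic weights of this post, copied from the line above.
this_post = [(2, 0.209), (61, 0.165), (77, 0.376), (79, 0.117), (94, 0.015)]
print(sum(w for _, w in this_post))                # listed probability mass, below 1.0
print(hellinger_similarity(this_post, this_post))  # identical inputs -> 1.0
```

Unlike cosine similarity, Hellinger distance compares square roots of probabilities, so it stays between 0 and 1 and treats a post as a distribution over topics rather than as a direction in topic space.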

similar blogs list:

simIndex simValue blogId blogTitle

1 0.90565664 474 high scalability-2008-12-21-The I.H.S.D.F. Theorem: A Proposed Theorem for the Trade-offs in Horizontally Scalable Systems

Introduction: Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, I see time and time again, developers choosing a particular solution based on an ill-placed belief in their solution as a “silver bullet”, or a solution that conquers all, despite the inevitable occurrence of changing requirements. Regardless of the reasons behind this phenomenon, I’d like to outline a few of the methods I use to ensure that I’m making good scalable decisions without losing sight of the trade-offs that accompany them. I’d also like to compile (pun intended) the issues at hand, by formulating a simple theorem that we can use to describe this oft occurring situation.

2 0.86734468 258 high scalability-2008-02-24-Yandex Architecture

Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3.5 billion pages in the search index. Over several thousand servers. 35 million searches a day. Several data centers around Russia. Two-layer architecture. The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user. Languages

3 0.8403517 1116 high scalability-2011-09-15-Paper: It's Time for Low Latency - Inventing the 1 Microsecond Datacenter

Introduction: In It's Time for Low Latency, Stephen Rumble et al. explore the idea that it's time to rearchitect our stack to live in the modern era of low-latency datacenters instead of high-latency WANs. The implications for program architectures will be revolutionary. Luiz André Barroso, Distinguished Engineer at Google, sees ultra low latency as a way to make computer resources, as much as possible, fungible, that is, interchangeable and location independent, effectively turning a datacenter into a single computer. Abstract from the paper: The operating systems community has ignored network latency for too long. In the past, speed-of-light delays in wide area networks and unoptimized network hardware have made sub-100µs round-trip times impossible. However, in the next few years datacenters will be deployed with low-latency Ethernet. Without the burden of propagation delays in the datacenter campus and network delays in the Ethernet devices, it will be up to us to finish

same-blog 4 0.83520067 1059 high scalability-2011-06-14-A TripAdvisor Short

5 0.83467579 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database

Introduction: With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects; it can also be used as an embedded object-oriented database for projects of all sizes. From the NoSQL Archive the summary on HyperGraphDB is: API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web. So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was: A HyperGraphD

6 0.81438828 753 high scalability-2009-12-21-Hot Holiday Scalability Links for 2009

7 0.81312048 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats

8 0.80359751 1195 high scalability-2012-02-17-Stuff The Internet Says On Scalability For February 17, 2012

9 0.8033917 1493 high scalability-2013-07-17-Steve Ballmer Says Microsoft has Over 1 Million Servers - What Does that Really Mean?

10 0.79882628 959 high scalability-2010-12-17-Stuff the Internet Says on Scalability For December 17th, 2010

11 0.79861367 1571 high scalability-2014-01-02-xkcd: How Standards Proliferate:

12 0.76413596 525 high scalability-2009-03-05-Product: Amazon Simple Storage Service

13 0.73456341 1158 high scalability-2011-12-16-Stuff The Internet Says On Scalability For December 16, 2011

14 0.73348039 1531 high scalability-2013-10-13-AIDA: Badoo’s journey into Continuous Integration

15 0.72346735 222 high scalability-2008-01-25-Application Database and DAL Architecture

16 0.7219345 1377 high scalability-2012-12-26-Ask HS: What will programming and architecture look like in 2020?

17 0.68130893 1567 high scalability-2013-12-20-Stuff The Internet Says On Scalability For December 20th, 2013

18 0.67680997 439 high scalability-2008-11-10-Scalability Perspectives #1: Nicholas Carr – The Big Switch

19 0.66241586 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture

20 0.64297545 1107 high scalability-2011-08-29-The Three Ages of Google - Batch, Warehouse, Instant