high_scalability high_scalability-2008 high_scalability-2008-258 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:
- 3.5 billion pages in the search index.
- Over several thousand servers.
- 35 million searches a day.
- Several data centers around Russia.
- Two-layer architecture: the database is split into pieces, and when a search is requested, it pulls the bits from the different database servers and brings them together for the user.
Languages
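The crash anecdote above is easy to make concrete. Here is a minimal sketch, in Django, of how an innocent-looking write to request.session turns every request into a database write when the database-backed session engine is in use; the view names and the last_seen bookkeeping are my own illustration, not Yandex's actual code.

```python
# Sketch of the failure mode described above (my reconstruction, not Yandex's code).
# With SessionMiddleware and the default database session backend
# (django.contrib.sessions.backends.db), any assignment to request.session marks
# the session as modified, so the middleware writes to the django_session table
# at the end of every request.
import time

from django.http import HttpResponse

def search(request):
    # Innocent-looking bookkeeping: this one line turns every read-only
    # search request into an InnoDB write.
    request.session['last_seen'] = time.time()
    return HttpResponse("results...")

def search_fixed(request):
    # Cheapest fix: only touch the session when the value actually changes,
    # so ordinary requests stay read-only.
    now = int(time.time())
    if request.session.get('last_seen', 0) < now - 300:  # at most one write per 5 min
        request.session['last_seen'] = now
    return HttpResponse("results...")
```

The "two-layer architecture" above is the classic scatter-gather pattern: a front end fans the query out to the database pieces and merges the partial results for the user. A generic sketch under that reading follows; the shard hosts, result shapes, and merge rule are illustrative assumptions, not Yandex's actual interfaces.

```python
# Generic scatter-gather sketch of the two-layer design described above.
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["db1.example", "db2.example", "db3.example"]  # hypothetical shard hosts

def query_shard(host, terms):
    # In a real system this would be an RPC/SQL call against one database piece;
    # here we just fake a list of (doc_id, score) pairs.
    return [(f"{host}/doc{i}", 1.0 / (i + 1)) for i in range(3)]

def search(terms, top_k=10):
    # Scatter: ask every shard in parallel for its best partial results.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda host: query_shard(host, terms), SHARDS)
    # Gather: merge the partial lists and keep the globally best hits.
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:top_k]

print(search(["yandex", "architecture"]))
```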
sentIndex sentText sentNum sentScore
1 Update: Anatomy of a crash in a new part of Yandex written in Django . [sent-1, score-0.129]
2 Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. [sent-2, score-0.642]
3 Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. [sent-4, score-0.428]
4 We only know a few fun facts about how they do things, nothing at a detailed architecture level. [sent-7, score-0.403]
5 Hopefully we'll learn more later, but I thought it would still be interesting. [sent-8, score-0.182]
6 From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3. [sent-9, score-0.1]
7 The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user. [sent-15, score-1.017]
wordName wordTfidf (topN-words)
[('yandex', 0.406), ('search', 0.25), ('stern', 0.235), ('russian', 0.211), ('anatomy', 0.191), ('allen', 0.186), ('sizing', 0.175), ('pages', 0.166), ('pulls', 0.162), ('requested', 0.16), ('ilya', 0.153), ('facts', 0.149), ('innodb', 0.147), ('searches', 0.13), ('crash', 0.129), ('billion', 0.127), ('unexpected', 0.127), ('variable', 0.124), ('hopefully', 0.118), ('cto', 0.115), ('magic', 0.114), ('pieces', 0.113), ('brings', 0.11), ('learn', 0.109), ('bits', 0.109), ('revenue', 0.108), ('perl', 0.108), ('thousand', 0.108), ('caused', 0.106), ('million', 0.102), ('split', 0.101), ('interview', 0.1), ('fixed', 0.099), ('centers', 0.095), ('detailed', 0.094), ('wrong', 0.092), ('seconds', 0.091), ('index', 0.089), ('database', 0.086), ('took', 0.086), ('session', 0.085), ('went', 0.084), ('fun', 0.083), ('writing', 0.082), ('nothing', 0.077), ('thought', 0.073), ('later', 0.073), ('engine', 0.072), ('writes', 0.071), ('useful', 0.07)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 258 high scalability-2008-02-24-Yandex Architecture
Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django . Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3.5 billion pages in the search index. Over several thousand servers. 35 million searches a day. Several data centers around Russia. Two-layer architecture. The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user. Languages
2 0.21265011 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
Introduction: Update 3: Always Refer to Your V1 As a Prototype. You really do have to plan to throw one away. Update 2: Lessons Learned Scaling the Audiogalaxy Search Engine. Things he should have done and fun things he couldn't justify doing. Update: Design details of Audiogalaxy.com's high-performance MySQL search engine. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows. Search was one of the most interesting problems at Audiogalaxy. It was one of the core functions of the site, and somewhere between 50 and 70 million searches were performed every day. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows.
3 0.17319921 332 high scalability-2008-05-28-Job queue and search engine
Introduction: Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But I don't know if it is a good design... Why? Search results can be large! (e.g., 100+ pages with 25 documents per page.) With an asynchronous system, I need to store the results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks Mat. (A minimal sketch of this asynchronous-search design appears after this list.)
4 0.14363658 899 high scalability-2010-09-09-How did Google Instant become Faster with 5-7X More Results Pages?
Introduction: We don't have a lot of details on how Google pulled off their technically very impressive Google Instant release, but in Google Instant behind the scenes , they did share some interesting facts: Google was serving more than a billion searches per day. With Google Instant they served 5-7X more results pages than previously. Typical search results were returned in less than a quarter of a second. A team of 50+ worked on the project for an extended period of time. Although Google is associated with muscular data centers, they didn't just throw more server capacity at the problem; they worked smarter too. What were their general strategies? Increase backend server capacity. Add new caches to handle high request rates while keeping results fresh as the web is continuously crawled and re-indexed. Add user-state data to the back-ends to keep track of the results pages already shown to a given user, preventing the same results from being re-fetched repeatedly. Optim
5 0.12779757 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
Introduction: While Kngine has just announced some improvements and new features, I would like to take you on a small tour of the Snippet Search research project at Kngine. What is Kngine? Kngine is a startup company working on search technologies. At Kngine we aim to organize humanity's systematic knowledge and experience and make it accessible to everyone. We aim to collect and organize all objective data, and make it possible and easy to access. Our goal is to build a Web 3.0 search engine on the advances of web search engines, the Semantic Web, and data representation technologies: a new form of web search engine that will unleash a revolution of new possibilities. Introduction to Snippet Search: Today, the web search engine is the Web's gateway, especially for getting specific information. But unfortunately, search engines haven't changed much as the Web has changed since the '90s. Since the '90s, web search engines still provide the same kind of results: links to documents. We i
6 0.1128052 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
7 0.10587684 64 high scalability-2007-08-10-How do we make a large real-time search engine?
8 0.097306542 365 high scalability-2008-08-16-Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages
9 0.096490271 638 high scalability-2009-06-26-PlentyOfFish Architecture
10 0.094788544 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture
11 0.092036434 900 high scalability-2010-09-11-Google's Colossus Makes Search Real-time by Dumping MapReduce
12 0.087990403 331 high scalability-2008-05-27-eBay Architecture
13 0.082281001 342 high scalability-2008-06-08-Search fast in million rows
14 0.082211658 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
15 0.082150526 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
16 0.081138596 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
17 0.080266133 1052 high scalability-2011-06-03-Stuff The Internet Says On Scalability For June 3, 2011
18 0.07788267 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center
19 0.077324495 1297 high scalability-2012-08-03-Stuff The Internet Says On Scalability For August 3, 2012
20 0.075993098 1372 high scalability-2012-12-14-Stuff The Internet Says On Scalability For December 14, 2012
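For the "Job queue and search engine" question above (entry 3), here is a minimal sketch of the design being asked about: search jobs submitted to a queue, executed by a worker, and results cached under a short expiration so the client can poll for them. The in-process queue, cache, and worker are stand-ins for a real job broker and an external cache; all names are illustrative.

```python
# Minimal sketch of the asynchronous-search design asked about above.
import queue
import threading
import time
import uuid

jobs = queue.Queue()
results = {}              # job_id -> (expires_at, result_pages)
RESULT_TTL = 5 * 60       # keep results ~5 minutes, as suggested in the question

def submit_search(query):
    job_id = str(uuid.uuid4())
    jobs.put((job_id, query))
    return job_id         # the client polls with this id

def worker():
    while True:
        job_id, query = jobs.get()
        pages = [f"result page {i} for {query!r}" for i in range(3)]  # fake search
        results[job_id] = (time.time() + RESULT_TTL, pages)
        # Evict anything past its expiration so large result sets don't pile up.
        for jid, (expires, _) in list(results.items()):
            if expires < time.time():
                del results[jid]
        jobs.task_done()

def fetch(job_id):
    entry = results.get(job_id)
    if entry and entry[0] > time.time():
        return entry[1]
    return None           # not ready yet, or already expired

threading.Thread(target=worker, daemon=True).start()
job = submit_search("lucene scalability")
time.sleep(0.1)           # give the worker a moment in this toy example
print(fetch(job))
```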
topicId topicWeight
[(0, 0.121), (1, 0.074), (2, -0.037), (3, -0.057), (4, 0.02), (5, 0.006), (6, -0.041), (7, 0.036), (8, 0.046), (9, 0.024), (10, -0.001), (11, -0.032), (12, -0.006), (13, 0.001), (14, 0.009), (15, 0.017), (16, -0.077), (17, -0.007), (18, 0.062), (19, 0.016), (20, 0.083), (21, -0.057), (22, -0.027), (23, 0.014), (24, -0.036), (25, -0.044), (26, -0.102), (27, 0.051), (28, 0.013), (29, 0.103), (30, -0.059), (31, 0.018), (32, -0.065), (33, 0.004), (34, 0.126), (35, -0.004), (36, -0.029), (37, 0.021), (38, -0.032), (39, -0.033), (40, 0.075), (41, 0.0), (42, -0.008), (43, 0.084), (44, -0.015), (45, 0.052), (46, 0.058), (47, 0.038), (48, -0.002), (49, -0.017)]
simIndex simValue blogId blogTitle
same-blog 1 0.96653825 258 high scalability-2008-02-24-Yandex Architecture
Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django . Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3.5 billion pages in the search index. Over several thousand servers. 35 million searches a day. Several data centers around Russia. Two-layer architecture. The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user. Languages
2 0.84455806 332 high scalability-2008-05-28-Job queue and search engine
Introduction: Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But I don't know if it is a good design... Why? Search results can be large! (e.g., 100+ pages with 25 documents per page.) With an asynchronous system, I need to store the results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks Mat
3 0.842255 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
Introduction: Update 3: Always Refer to Your V1 As a Prototype. You really do have to plan to throw one away. Update 2: Lessons Learned Scaling the Audiogalaxy Search Engine. Things he should have done and fun things he couldn't justify doing. Update: Design details of Audiogalaxy.com's high-performance MySQL search engine. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows. Search was one of the most interesting problems at Audiogalaxy. It was one of the core functions of the site, and somewhere between 50 and 70 million searches were performed every day. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows.
4 0.82378572 342 high scalability-2008-06-08-Search fast in million rows
Introduction: I have a table. This table has many columns, but search is performed on one column, and the table can have more than a million rows. The data in this column is something like funny, new york, hollywood. A user can search with parameters such as funny hollywood. I need to take these two words and then check whether the column contains them, and how many times. It is not possible to index here. If the search returns, say, 1200 results, then I can't determine the number of results without comparing each and every column value; I need to compare each and every one. This query is very frequent. How can I approach this problem? What type of architecture and tools would be helpful? I only know that this can be accomplished with a distributed system, but how can I build such a system? I also see on this website that LinkedIn uses Lucene for search. Would Lucene be helpful in my case? My table also has lots of insertions; however, updates are not very frequent. (A tiny inverted-index sketch appears after this list.)
5 0.80707604 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
Introduction: While Kngine has just announced some improvements and new features, I would like to take you on a small tour of the Snippet Search research project at Kngine. What is Kngine? Kngine is a startup company working on search technologies. At Kngine we aim to organize humanity's systematic knowledge and experience and make it accessible to everyone. We aim to collect and organize all objective data, and make it possible and easy to access. Our goal is to build a Web 3.0 search engine on the advances of web search engines, the Semantic Web, and data representation technologies: a new form of web search engine that will unleash a revolution of new possibilities. Introduction to Snippet Search: Today, the web search engine is the Web's gateway, especially for getting specific information. But unfortunately, search engines haven't changed much as the Web has changed since the '90s. Since the '90s, web search engines still provide the same kind of results: links to documents. We i
6 0.80506796 246 high scalability-2008-02-12-Search the tags across all post
7 0.78576976 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
8 0.74513578 899 high scalability-2010-09-09-How did Google Instant become Faster with 5-7X More Results Pages?
9 0.73167342 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
10 0.72431129 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
11 0.68877256 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
12 0.68290758 1253 high scalability-2012-05-28-The Anatomy of Search Technology: Crawling using Combinators
13 0.64007592 810 high scalability-2010-04-14-Parallel Information Retrieval and Other Search Engine Goodness
14 0.63682812 1295 high scalability-2012-08-02-Ask DuckDuckGo: Is there Anything you Want to Know About DDG?
15 0.62310362 64 high scalability-2007-08-10-How do we make a large real-time search engine?
16 0.61764085 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
17 0.61300749 900 high scalability-2010-09-11-Google's Colossus Makes Search Real-time by Dumping MapReduce
18 0.58020443 331 high scalability-2008-05-27-eBay Architecture
19 0.57232404 365 high scalability-2008-08-16-Strategy: Serve Pre-generated Static Files Instead Of Dynamic Pages
20 0.5668118 600 high scalability-2009-05-15-Wolfram|Alpha Architecture
topicId topicWeight
[(1, 0.099), (2, 0.222), (61, 0.079), (77, 0.408), (79, 0.078)]
simIndex simValue blogId blogTitle
Introduction: Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, I see, time and time again, developers choosing a particular solution based on an ill-placed belief in their solution as a “silver bullet”, a solution that conquers all, despite the inevitable occurrence of changing requirements. Regardless of the reasons behind this phenomenon, I’d like to outline a few of the methods I use to ensure that I’m making good scalable decisions without losing sight of the trade-offs that accompany them. I’d also like to compile (pun intended) the issues at hand by formulating a simple theorem that we can use to describe this oft-occurring situation.
Introduction: James Hamilton in Counting Servers is Hard has an awesome breakdown of what one million plus servers really means in terms of resource usage. The summary from his calculations is eye-popping: Facilities: 15 to 30 large datacenters; Capital expense: $4.25 billion; Total power: 300 MW; Power consumption: 2.6 TWh annually. The power consumption is about the same as that used by Nicaragua, and the capital cost is about a third of what Americans spent on video games in 2012. Now that's web scale. (A quick arithmetic check of these figures appears after this list.)
same-blog 3 0.89428329 258 high scalability-2008-02-24-Yandex Architecture
Introduction: Update: Anatomy of a crash in a new part of Yandex written in Django . Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: 3.5 billion pages in the search index. Over several thousand servers. 35 million searches a day. Several data centers around Russia. Two-layer architecture. The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user. Languages
4 0.89214271 1116 high scalability-2011-09-15-Paper: It's Time for Low Latency - Inventing the 1 Microsecond Datacenter
Introduction: In It's Time for Low Latency, Stephen Rumble et al. explore the idea that it's time to rearchitect our stack to live in the modern era of low-latency datacenters instead of high-latency WANs. The implications for program architectures will be revolutionary. Luiz André Barroso, Distinguished Engineer at Google, sees ultra-low latency as a way to make computer resources as fungible as possible, that is, interchangeable and location independent, effectively turning a datacenter into a single computer. Abstract from the paper: The operating systems community has ignored network latency for too long. In the past, speed-of-light delays in wide area networks and unoptimized network hardware have made sub-100µs round-trip times impossible. However, in the next few years datacenters will be deployed with low-latency Ethernet. Without the burden of propagation delays in the datacenter campus and network delays in the Ethernet devices, it will be up to us to finish
5 0.8751722 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
Introduction: With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, but it can also be used as an embedded object-oriented database for projects of all sizes. From the NoSQL Archive, the summary on HyperGraphDB is: API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web. So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? (A tiny illustrative sketch of a hypergraph appears after this list.) Buried in the tutorial was: A HyperGraphD
6 0.85708475 1195 high scalability-2012-02-17-Stuff The Internet Says On Scalability For February 17, 2012
7 0.84405965 959 high scalability-2010-12-17-Stuff the Internet Says on Scalability For December 17th, 2010
8 0.8431294 525 high scalability-2009-03-05-Product: Amazon Simple Storage Service
9 0.84151679 753 high scalability-2009-12-21-Hot Holiday Scalability Links for 2009
10 0.82804102 211 high scalability-2008-01-13-Google Reveals New MapReduce Stats
11 0.82397658 1531 high scalability-2013-10-13-AIDA: Badoo’s journey into Continuous Integration
12 0.79758537 1059 high scalability-2011-06-14-A TripAdvisor Short
13 0.77615517 1377 high scalability-2012-12-26-Ask HS: What will programming and architecture look like in 2020?
14 0.77393466 439 high scalability-2008-11-10-Scalability Perspectives #1: Nicholas Carr – The Big Switch
15 0.76695079 1158 high scalability-2011-12-16-Stuff The Internet Says On Scalability For December 16, 2011
16 0.76298964 1571 high scalability-2014-01-02-xkcd: How Standards Proliferate:
17 0.75386363 612 high scalability-2009-05-31-Parallel Programming for real-world
18 0.75364822 1188 high scalability-2012-02-06-The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
19 0.72413832 1567 high scalability-2013-12-20-Stuff The Internet Says On Scalability For December 20th, 2013
20 0.7155475 977 high scalability-2011-01-21-PaaS shouldn’t be built in Silos
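Two small follow-ups to entries above. First, a quick arithmetic check of the figures quoted in the "Counting Servers is Hard" entry: 300 MW of continuous draw over a year does come out to roughly 2.6 TWh.

```python
# Sanity check on the power figures quoted in the "Counting Servers is Hard" entry.
power_mw = 300                                      # total power draw, from the post
hours_per_year = 365 * 24                           # 8760
energy_twh = power_mw * hours_per_year / 1_000_000  # MWh -> TWh
print(energy_twh)                                   # ~2.63 TWh, matching the quoted 2.6 TWh/year
```

Second, for the "what the heck is a hypergraph" question raised in the HyperGraphDB entry, a tiny illustrative sketch in plain Python (not HyperGraphDB's API): an ordinary graph edge joins exactly two nodes, while a hyperedge joins any number of them.

```python
# Tiny illustration of a hypergraph (plain Python, not HyperGraphDB's API).
ordinary_edge = ("alice", "bob")                    # a normal edge: exactly two nodes

# A hyperedge relates any number of nodes at once; a hypergraph is a set of them.
hypergraph = {
    "co-authored": [{"alice", "bob", "carol"}],     # one relation over three nodes
    "cited":       [{"paper-42", "paper-7"}],
}

def incident(node):
    # All hyperedges a given node participates in.
    return [(label, edge) for label, edges in hypergraph.items()
            for edge in edges if node in edge]

print(incident("alice"))  # the 'co-authored' hyperedge containing alice, bob and carol
```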