high_scalability high_scalability-2010 high_scalability-2010-775 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: ElasticSearch is an open source, distributed, RESTful search engine built on top of Lucene. Its features include: Distributed and Highly Available Search Engine. Each index is fully sharded with a configurable number of shards. Each shard can have zero or more replicas. Read/search operations can be performed on any replica shard. Multi Tenant with Multi Types. Support for more than one index. Support for more than one type per index. Index level configuration (number of shards, index storage, ...). A variety of APIs. HTTP RESTful API. Native Java API. All APIs perform automatic node operation rerouting. Document oriented. No need for upfront schema definition. Schema can be defined per type for customization of the indexing process. Reliable, asynchronous write-behind for long-term persistence. (Near) Real Time Search. Built on top of Lucene. Each shard is a fully functional Lucene index. All the power of Lucen
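The sharding and replica features above can be pictured with a small model. This is a minimal sketch under stated assumptions, not ElasticSearch's code: documents are routed to a primary shard by hashing the document ID modulo the configured number of shards, and reads rotate round-robin across the shard's copies. The class `IndexLayout`, its methods, and the use of CRC32 as the hash are hypothetical stand-ins; the post does not describe the engine's actual routing hash or replica-selection policy.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class IndexLayout:
    """Hypothetical model of one index: a fixed number of primary shards,
    each with the same configurable number of replicas."""
    number_of_shards: int
    number_of_replicas: int
    _next_copy: dict = field(default_factory=dict)  # round-robin cursor per shard

    def shard_for(self, doc_id: str) -> int:
        # Route a document to a primary shard by hashing its ID.
        # CRC32 is an illustrative stand-in, not the engine's real hash.
        return zlib.crc32(doc_id.encode("utf-8")) % self.number_of_shards

    def copy_for_read(self, shard: int) -> int:
        # Copy 0 is the primary; copies 1..number_of_replicas are replicas.
        # Successive reads of the same shard rotate across all of its copies.
        copies = 1 + self.number_of_replicas
        n = self._next_copy.get(shard, 0)
        self._next_copy[shard] = n + 1
        return n % copies


layout = IndexLayout(number_of_shards=5, number_of_replicas=1)
shard = layout.shard_for("tweet-42")
print(shard, [layout.copy_for_read(shard) for _ in range(4)])  # copies: 0, 1, 0, 1
```

Because the shard is derived from `hash % number_of_shards`, the shard count must stay fixed once documents are indexed, which is one reason it is an index-level setting; the replica count only changes how many copies can serve reads.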
sentIndex sentText sentNum sentScore
1 ElasticSearch is an open source, distributed, RESTful search engine built on top of Lucene. [sent-1, score-0.395]
2 Each index is fully sharded with a configurable number of shards. [sent-3, score-0.688]
3 Read/search operations can be performed on any replica shard. [sent-5, score-0.434]
4 Index level configuration (number of shards, index storage, ...). [sent-9, score-0.427]
5 All APIs perform automatic node operation rerouting. [sent-16, score-0.461]
6 Schema can be defined per type for customization of the indexing process. [sent-19, score-0.645]
7 Reliable, asynchronous write-behind for long-term persistence. [sent-20, score-0.107]
8 Each shard is a fully functional Lucene index. [sent-23, score-0.468]
9 All the power of Lucene easily exposed through simple configuration / plugins. [sent-24, score-0.35]
10 Single document level operations are atomic, consistent, isolated and durable. [sent-26, score-0.471]
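Sentences 6 and the intro's "No need for upfront schema definition" describe schema-free indexing with an optional per-type schema. The sketch below illustrates that idea under stated assumptions: field types are inferred from the first document of a type, and an explicit per-type schema overrides any inferred entry. The functions `infer_field_type` and `resolve_schema` and the type labels are hypothetical; they are not ElasticSearch's mapping API.

```python
from typing import Any, Dict, Optional


def infer_field_type(value: Any) -> str:
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    # Anything else is treated as a string in this sketch.
    return "string"


def resolve_schema(doc: Dict[str, Any],
                   explicit: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical: derive a field -> type schema from a document, letting an
    explicit per-type schema override the inferred types."""
    explicit = explicit or {}
    return {name: explicit.get(name, infer_field_type(v)) for name, v in doc.items()}


doc = {"user": "alice", "age": 30, "score": 1.5, "active": True}
print(resolve_schema(doc))                     # every type inferred
print(resolve_schema(doc, {"age": "integer"}))  # "age" customized per type
```

The design point is that a schema is only needed when the inferred choice is wrong for the indexing process, so simple documents can be indexed immediately.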
wordName wordTfidf (topN-words)
[('restful', 0.318), ('lucene', 0.308), ('shard', 0.209), ('tenant', 0.199), ('search', 0.195), ('customization', 0.194), ('index', 0.185), ('operation', 0.182), ('multi', 0.174), ('upfront', 0.163), ('fully', 0.153), ('exposed', 0.147), ('type', 0.146), ('isolated', 0.146), ('configurable', 0.143), ('zero', 0.138), ('atomic', 0.137), ('configuration', 0.133), ('top', 0.125), ('automatic', 0.125), ('replica', 0.123), ('shards', 0.121), ('sharded', 0.117), ('performed', 0.114), ('defined', 0.113), ('schema', 0.111), ('operations', 0.11), ('level', 0.109), ('indexing', 0.109), ('near', 0.109), ('term', 0.107), ('document', 0.106), ('functional', 0.106), ('source', 0.101), ('apis', 0.098), ('follow', 0.097), ('asynchronous', 0.095), ('number', 0.09), ('updates', 0.089), ('either', 0.087), ('consistent', 0.087), ('news', 0.084), ('apache', 0.084), ('per', 0.083), ('perform', 0.079), ('engine', 0.075), ('distributed', 0.075), ('node', 0.075), ('behind', 0.071), ('easily', 0.07)]
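The (word, weight) list above ranks terms by TF-IDF. The dump does not state which weighting variant produced these numbers, so the sketch below uses one common form as an assumption: raw term count times `log(N / df)`, followed by L2 normalization of each document vector. The helper names `tfidf` and `top_n` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Generic TF-IDF with L2 normalization (one common variant; the exact
    weighting behind the list above is not stated in the dump)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def top_n(vector: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    # Highest-weighted terms first, like the (word, weight) list above.
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


docs = [["lucene", "shard", "shard", "search"], ["search", "mysql"], ["search", "cache"]]
print(top_n(tfidf(docs)[0], 2))  # "shard" outranks "lucene"; "search" scores 0
```

A term that appears in every document (here "search") gets an IDF of zero, which is why generic words can be absent from a top-N list even when they are frequent.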
simIndex simValue blogId blogTitle
same-blog 1 1.0 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
2 0.17729454 342 high scalability-2008-06-08-Search fast in million rows
Introduction: I have a table. This table has many columns, but search is performed on one column, and the table can have more than a million rows. The data in these columns is something like "funny, new york, hollywood". A user can search with parameters such as "funny hollywood". I need to take these 2 words and then search the column for whether it contains these words and how many times. It is not possible to index here. If the search returns, say, 1200 results, I can't determine the number of results without comparing each and every column; I need to compare each and every column. This query is very frequent. How can I approach this problem? What type of architecture and tools would help? I just know that this can be accomplished with a distributed system, but how can I build this system? I also see on this website that LinkedIn uses Lucene for search. Is Lucene helpful in my case? My table also has lots of insertions; however, updates are not very frequent.
3 0.16911031 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
Introduction: This is a guest post by Zardosht Kasheff , Software Developer at Tokutek , a storage engine company that delivers 21st-Century capabilities to the leading open source data management platforms. As software developers, we value abstraction. The simpler the API, the more attractive it becomes. Arguably, MongoDB’s greatest strengths are its elegant API and its agility , which let developers simply code. But when MongoDB runs into scalability problems on big data , developers need to peek underneath the covers to understand the underlying issues and how to fix them. Without understanding, one may end up with an inefficient solution that costs time and money. For example, one may shard prematurely, increasing hardware and management costs, when a simpler replication setup would do. Or, one may increase the size of a replica set when upgrading to SSDs would suffice. This article shows how to reason about some big data scalability problems in an effort to find efficient solut
4 0.16092911 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems . Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong too. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook , huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure
5 0.14924307 682 high scalability-2009-08-16-ThePort Network Architecture
Introduction: ThePort Network's Director of Engineering, TJ Muehleman, was kind enough to share some of the architectural details for their white label social media system. It currently runs about 50 social networks varying in size from less than 1000 members to more than 300,000 members, all on a Microsoft stack. In addition to their social networking platform, they offer Javascript APIs and web service APIs (both REST and SOAP) which account for a significant percentage of overall system usage. ThePort is an excellent example of a real world in-the-trenches product offering real value to customers. One of the most interesting problems they have to solve is multi-tenancy. How do you provide good performance, complete customization, support, develop new features, and provide individual search indexes for each customer? It's not an easy problem to solve. How did they solve their problems and build a successful system? Site: http://theport.com Platform Microsoft .NET 3.5 C# / VB.NET
6 0.1469721 152 high scalability-2007-11-13-Flickr Architecture
7 0.13820019 358 high scalability-2008-07-26-Sharding the Hibernate Way
8 0.138042 332 high scalability-2008-05-28-Job queue and search engine
9 0.12112959 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
10 0.12076151 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
11 0.11856006 1565 high scalability-2013-12-16-22 Recommendations for Building Effective High Traffic Web Software
12 0.11289527 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
13 0.10404623 1068 high scalability-2011-06-27-TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
14 0.10302491 64 high scalability-2007-08-10-How do we make a large real-time search engine?
15 0.10267533 847 high scalability-2010-06-23-Product: dbShards - Share Nothing. Shard Everything.
16 0.10241037 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
17 0.096017323 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python
18 0.094628334 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
19 0.093893804 134 high scalability-2007-10-26-Paper: Wikipedia's Site Internals, Configuration, Code Examples and Management Issues
20 0.092088833 300 high scalability-2008-04-07-Scalr - Open Source Auto-scaling Hosting on Amazon EC2
topicId topicWeight
[(0, 0.135), (1, 0.059), (2, -0.026), (3, -0.013), (4, 0.014), (5, 0.098), (6, 0.039), (7, -0.027), (8, 0.021), (9, 0.01), (10, 0.006), (11, 0.046), (12, -0.057), (13, -0.025), (14, 0.0), (15, 0.054), (16, -0.116), (17, -0.004), (18, 0.026), (19, 0.006), (20, 0.056), (21, -0.058), (22, -0.014), (23, 0.014), (24, -0.101), (25, -0.02), (26, -0.043), (27, -0.057), (28, 0.011), (29, 0.114), (30, -0.017), (31, 0.007), (32, -0.014), (33, -0.008), (34, 0.154), (35, -0.042), (36, -0.018), (37, -0.029), (38, -0.079), (39, -0.033), (40, 0.024), (41, 0.083), (42, 0.049), (43, 0.003), (44, -0.056), (45, 0.053), (46, 0.001), (47, 0.095), (48, 0.007), (49, -0.028)]
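The dump does not say how `simValue` is computed. A common choice for comparing sparse vectors like the topicId/topicWeight list above is cosine similarity; the sketch below assumes that and ranks a small corpus against a query vector. The helpers `cosine` and `rank`, and the toy corpus, are hypothetical illustrations rather than the miner's method.

```python
import math
from typing import Dict, List, Tuple

Sparse = List[Tuple[int, float]]  # (topicId, topicWeight) pairs, as printed above


def cosine(a: Sparse, b: Sparse) -> float:
    # Dot product over shared topic IDs, divided by both vector lengths.
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: Sparse, corpus: Dict[str, Sparse], k: int) -> List[Tuple[str, float]]:
    """Return the k blog IDs most similar to `query`, best first."""
    scored = [(blog_id, cosine(query, vec)) for blog_id, vec in corpus.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


query = [(0, 0.135), (1, 0.059), (2, -0.026)]  # first few weights from the list above
corpus = {"775": query, "342": [(0, 0.1), (1, 0.02)], "999": [(5, 1.0)]}
print(rank(query, corpus, 2))  # "775" (itself) first, then "342"
```

Topics that appear in only one vector contribute nothing to the dot product, so documents with no shared topics score 0.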
simIndex simValue blogId blogTitle
same-blog 1 0.97981721 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
2 0.73249269 342 high scalability-2008-06-08-Search fast in million rows
3 0.70954382 246 high scalability-2008-02-12-Search the tags across all post
Introduction: Suppose I have a table which stores tags. A user can enter keywords, and I have to search through all the records in the table and find posts which contain the tags entered by the user. The user can enter more than one keyword. What strategy or technique can I use to search fast? There may be millions of records, and many users are firing the same query. Thanks
4 0.67458856 332 high scalability-2008-05-28-Job queue and search engine
Introduction: Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But I don't know if it is a good design... Why? Search results can be large! (e.g. 100+ pages with 25 documents per page) With an asynchronous system, I need to store results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks Mat
5 0.66143548 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
Introduction: Kngine is a Knowledge Web search engine designed to provide meaningful search results, such as: semantic information about the keywords/concepts, answer the user’s questions, discover the relations between the keywords/concepts, and link the different kind of data together, such as: Movies, Subtitles, Photos, Price at sale store, User reviews, and Influenced story Goals Kngine's long-term goal is to make all of human beings' systematic knowledge and experience accessible to everyone. I aim to collect and organize all objective data, and make it possible and easy to access. Our goal is to build on the advances of Web search engines, the semantic web, and data representation technologies a new form of Web search engine that will unleash a revolution of new possibilities. Kngine tries to combine the power of Web search engines with the power of Semantic search and the data representation to provide meaningful search results compromising user needs. Status Kngine starts as a research project in O
6 0.66055793 358 high scalability-2008-07-26-Sharding the Hibernate Way
7 0.64363152 258 high scalability-2008-02-24-Yandex Architecture
8 0.64065856 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
9 0.61603677 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
10 0.6118269 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python
11 0.60957778 24 high scalability-2007-07-24-Product: Hibernate Shards
12 0.60287923 857 high scalability-2010-07-13-DbShards Part Deux - The Internals
13 0.59686053 152 high scalability-2007-11-13-Flickr Architecture
14 0.594477 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
15 0.58111358 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
16 0.57104242 623 high scalability-2009-06-10-Dealing with multi-partition transactions in a distributed KV solution
17 0.56961805 810 high scalability-2010-04-14-Parallel Information Retrieval and Other Search Engine Goodness
18 0.56808168 64 high scalability-2007-08-10-How do we make a large real-time search engine?
19 0.55966556 847 high scalability-2010-06-23-Product: dbShards - Share Nothing. Shard Everything.
20 0.55652553 207 high scalability-2008-01-10-Sharding with Cookie-Based Session Storage
topicId topicWeight
[(1, 0.117), (2, 0.198), (10, 0.056), (13, 0.108), (47, 0.016), (61, 0.222), (79, 0.057), (85, 0.035), (94, 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 0.95810902 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
2 0.92967635 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
Introduction: Slashdot effect: overwhelming unprepared sites with an avalanche of readers' clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school? Site: http://slashdot.org Information Sources Slashdot's Setup, Part 1- Hardware Slashdot's Setup, Part 2- Software History of Slashdot Part 3- Going Corporate The History of Slashdot Part 4 - Yesterday, Today, Tomorrow The Platform MySQL Linux (CentOS/RHEL) Pound Apache Perl Memcached LVS The Stats Started building the system in 1999
3 0.92860204 1411 high scalability-2013-02-22-Stuff The Internet Says On Scalability For February 22, 2013
Introduction: Hey, it's HighScalability time: Quotable Quotes: @p337er : I have committed some truly horrendous crimes against scalability today. @ErrataRob : doubling performance doesn't double scalability. @rsingel : In 2008 when Yahoo.com linked out, I had a Wired story get 1M visitors in an hour from their homepage. @philiph : Lets solve this scalability problem with a queuing system @jaykreps : Transferring data across data centers? Read this page and go tune your TCP buffer sizes... @gwestr : In which the node community showers schadenfreude upon the rails community for "scalability is not my problem" architectures @pbailis : Makes sense, though I think there's a tradeoff re: coordination and scalability (always homogeneous vs dynamically heterogenous) @pembleton : To summarize Yoav's philosophy: we started as quick as we can and then we accelerated #operationgrandma in #reversim @surfichris : “We chose Heroku because we be
4 0.92645675 1461 high scalability-2013-05-20-The Tumblr Architecture Yahoo Bought for a Cool Billion Dollars
Introduction: It's being reported Yahoo bought Tumblr for $1.1 billion. You may recall Instagram was profiled on HighScalability and they were also bought by Facebook for a ton of money. A coincidence? You be the judge. Just what is Yahoo buying? The business acumen of the deal is not something I can judge, but if you are doing due diligence on the technology then Tumblr would probably get a big thumbs up. To see why, please keep on reading... With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do. Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers. One of the common patt
Introduction: With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do. Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers. One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr's situation. Now with twenty engineers there's enough energy to work on issues an
6 0.92638093 1191 high scalability-2012-02-13-Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
7 0.92502666 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011
8 0.92372191 1031 high scalability-2011-04-28-PaaS on OpenStack - Run Applications on Any Cloud, Any Time Using Any Thing
9 0.92343712 1287 high scalability-2012-07-20-Stuff The Internet Says On Scalability For July 20, 2012
10 0.91686469 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things
11 0.91394299 1184 high scalability-2012-01-31-Performance in the Cloud: Business Jitter is Bad
12 0.91281486 337 high scalability-2008-05-31-memcached and Storage of Friend list
13 0.91108888 1538 high scalability-2013-10-28-Design Decisions for Scaling Your High Traffic Feeds
14 0.90998238 100 high scalability-2007-09-26-Use a CDN to Instantly Improve Your Website's Performance by 20% or More
15 0.90981942 856 high scalability-2010-07-12-Creating Scalable Digital Libraries
16 0.90664279 501 high scalability-2009-01-25-Where do I start?
17 0.90454102 6 high scalability-2007-07-11-Friendster Architecture
18 0.90105474 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
19 0.89963084 383 high scalability-2008-09-10-Shard servers -- go big or small?
20 0.89954531 477 high scalability-2008-12-29-100% on Amazon Web Services: Soocial.com - a lesson of porting your service to Amazon