high_scalability high_scalability-2008 high_scalability-2008-342 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I have a table. This table has many columns, but search is performed on one column, and the table can have more than a million rows. The data in this column looks like "funny, new york, hollywood". A user can search with parameters such as "funny hollywood". I need to take these two words and, for each row, check whether the column contains them and how many times. It is not possible to index here. If the search returns, say, 1200 results, I can't determine the number of results without comparing each and every row; I need to compare every row. This query is very frequent. How should I approach this problem, and what kind of architecture and tools would help? I know this can be accomplished with a distributed system, but how can I build such a system? I also saw on this website that LinkedIn uses Lucene for search. Would Lucene help in my case? My table also has lots of inserts, though updates are not very frequent.
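The standard fix for this kind of query is an inverted index, which is what Lucene maintains for you: a map from each term to the rows that contain it and how often, so a query touches only the index entries for its words instead of scanning a million rows. A minimal sketch in Python (the rows and the comma-split tokenizer are illustrative, not the actual schema):

```python
from collections import defaultdict

# Toy rows: row id -> the comma-separated searchable column.
rows = {
    1: "funny,new york,hollywood",
    2: "hollywood,drama",
    3: "funny,funny,comedy",
}

# Build the inverted index once: term -> {row_id: occurrence count}.
# On insert, only the new row's terms are added, so writes stay cheap.
index = defaultdict(dict)
for row_id, text in rows.items():
    for term in text.split(","):
        term = term.strip().lower()
        index[term][row_id] = index[term].get(row_id, 0) + 1

def search(query):
    """Return {row_id: total hits} for rows containing any query word."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for row_id, count in index.get(word, {}).items():
            hits[row_id] += count
    return dict(hits)

matches = search("funny hollywood")  # rows 1 and 3 hit twice, row 2 once
```

Lucene does exactly this at scale (plus tokenization, ranking, and on-disk segment storage), and sharding the index across machines is how it is distributed.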
sentIndex sentText sentNum sentScore
1 This table has many columns but search performed based on 1 columns ,this table can have more than million rows. [sent-2, score-1.967]
2 The data in these columns is something like funny,new york,hollywood User can search with parameters as funny hollywood . [sent-3, score-1.242]
3 I need to take this 2 words and then search on column whether that column contain this words and how many times . [sent-4, score-1.829]
4 If the results return say 1200 results then without comparing each and every column i can't determine no of results. [sent-6, score-1.125]
5 I just know that this can be accomplished with distributed system but how can i make this system. [sent-11, score-0.27]
6 I also see in this website that LinkedIn uses Lucene for search . [sent-12, score-0.441]
7 My table has also lots of insertion ,however updation in not very frequent. [sent-14, score-0.599]
wordName wordTfidf (topN-words)
[('columns', 0.425), ('column', 0.374), ('table', 0.313), ('lucene', 0.286), ('frequent', 0.255), ('search', 0.241), ('hollywood', 0.227), ('words', 0.213), ('insertion', 0.169), ('accomplished', 0.146), ('funny', 0.137), ('parameters', 0.132), ('comparing', 0.131), ('results', 0.13), ('contain', 0.116), ('helpful', 0.113), ('linkedin', 0.112), ('compare', 0.11), ('performed', 0.106), ('whether', 0.104), ('determine', 0.098), ('return', 0.092), ('index', 0.086), ('every', 0.074), ('type', 0.068), ('query', 0.065), ('lots', 0.064), ('say', 0.059), ('website', 0.059), ('many', 0.057), ('approach', 0.053), ('also', 0.053), ('possible', 0.053), ('need', 0.053), ('uses', 0.052), ('times', 0.051), ('million', 0.049), ('ca', 0.047), ('something', 0.044), ('know', 0.038), ('based', 0.038), ('without', 0.037), ('see', 0.036), ('distributed', 0.035), ('take', 0.033), ('make', 0.028), ('system', 0.023), ('like', 0.019), ('data', 0.017)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 342 high scalability-2008-06-08-Search fast in million rows
2 0.17729454 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
Introduction: ElasticSearch is an open source, distributed, RESTful search engine built on top of Lucene . Its features include: Distributed and Highly Available Search Engine. Each index is fully sharded with a configurable number of shards. Each shard can have zero or more replicas. Read / Search operations performed on either replica shard. Multi Tenant with Multi Types. Support for more than one index. Support for more than one type per index. Index level configuration (number of shards, index storage, ...). Various set of APIs. HTTP RESTful API. Native Java API. All APIs perform automatic node operation rerouting. Document oriented. No need for upfront schema definition. Schema can be defined per type for customization of the indexing process. Reliable, Asynchronous Write Behind for long term persistency. (Near) Real Time Search. Built on top of Lucene. Each shard is a fully functional Lucene index. All the power of Lucen
3 0.17341527 589 high scalability-2009-05-05-Drop ACID and Think About Data
Introduction: The abstract for the talk given by Bob Ippolito, co-founder and CTO of Mochi Media, Inc: Building large systems on top of a traditional single-master RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Is your application a good fit for caches, bloom filters, bitmap indexes, column stores, distributed key/value stores, or document databases? Learn how they work (in theory and practice) and decide for yourself. Bob does an excellent job highlighting different products and the key concepts to understand when pondering the wide variety of new database offerings. It's unlikely you'll be able to say oh, this is the database for me after watching the presentation, but you will be much better informed on your options. And I imagine slightly confused as to what to do :-) An interesting observation in the talk is that the more robust products are internal
4 0.16372018 578 high scalability-2009-04-23-Which Key value pair database to be used
Introduction: My table has 2 columns. Column1 is the id; Column2 contains information given by users about the item in Column1. A user can give 3 types of information about an item. I separate the opinions of a single user by commas, and the opinions of different users by semicolons. Example: 23-34,us,56;78,in,78. I need to calculate the opinions of all users very fast. My idea is to have an index on the key so that searching would be very fast. Currently I'm using MySQL. My problem is that the maximum column size is below my requirement; if any overflow occurs, I make a new row with the same id and insert the data into the new row. Practically I would have around 5-10 rows per id at most. I wonder if there is any database that removes this application code. I just learned about key-value pair databases, which are exactly what I need, but which don't put constraints on column size (I mean, are much better than an RDBMS there). This application is not in production.
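Whatever store is chosen, the delimited format in this question can be parsed into a real key-to-value-list mapping, so the database rather than the application handles arbitrarily long opinion lists. A sketch, assuming (as inferred from the example above) that '-' separates the id from the value, ';' separates users, and ',' separates the three fields of one user's opinion:

```python
def parse_opinions(record):
    """Parse 'id-a,b,c;a,b,c;...' into (item_id, list of 3-field tuples).

    The delimiters are inferred from the question's example
    '23-34,us,56;78,in,78' and may not match the real format exactly.
    """
    item_id, _, value = record.partition("-")
    opinions = [tuple(op.split(",")) for op in value.split(";") if op]
    return item_id, opinions

item_id, opinions = parse_opinions("23-34,us,56;78,in,78")

# A key-value store then maps item_id -> list of opinions; appending a
# new opinion is a single get/modify/put, with no column-size limit.
store = {}
store.setdefault(item_id, []).extend(opinions)
```

With this model, a key-value store keyed on the item id gives the fast lookup the question asks for, and the overflow-row workaround disappears.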
5 0.16292354 332 high scalability-2008-05-28-Job queue and search engine
Introduction: Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But I don't know if it is a good design... Why? Search results can be large! (e.g. 100+ pages with 25 documents per page.) With an asynchronous system, I need to store the results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks, Mat
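The design described above amounts to a job queue plus a result cache with a time-to-live; whether results live in memory, memcached, or on disk is a separate choice. A minimal single-process sketch (the 5-minute TTL and the `search_fn` callback are placeholders, not part of any real framework):

```python
import time
from queue import Queue

RESULT_TTL = 300  # seconds; the ~5 min expiry suggested in the question

jobs = Queue()     # search jobs waiting for a worker
results = {}       # job_id -> (expires_at, result pages)

def submit(job_id, query):
    """Client side: enqueue a search job and return immediately."""
    jobs.put((job_id, query))

def worker_step(search_fn):
    """Worker side: run one queued search and cache its result with a TTL."""
    job_id, query = jobs.get()
    results[job_id] = (time.time() + RESULT_TTL, search_fn(query))

def fetch(job_id):
    """Client side: return cached pages, or None if missing or expired."""
    entry = results.get(job_id)
    if entry is None or entry[0] < time.time():
        results.pop(job_id, None)
        return None
    return entry[1]
```

One way to shrink the storage problem is to cache only the document ids per page and re-fetch page contents lazily, so an expired or unread result set costs little.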
6 0.14443873 246 high scalability-2008-02-12-Search the tags across all post
7 0.13950872 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
8 0.13254772 658 high scalability-2009-07-17-Against all the odds
9 0.12249528 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
10 0.11791687 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
11 0.10739473 1080 high scalability-2011-07-15-Stuff The Internet Says On Scalability For July 15, 2011
12 0.10319018 222 high scalability-2008-01-25-Application Database and DAL Architecture
13 0.10315229 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
14 0.1018455 682 high scalability-2009-08-16-ThePort Network Architecture
15 0.10039165 829 high scalability-2010-05-20-Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning
16 0.096664444 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
17 0.091425568 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
18 0.091318503 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
19 0.08943411 961 high scalability-2010-12-21-SQL + NoSQL = Yes !
20 0.08861205 339 high scalability-2008-06-04-LinkedIn Architecture
topicId topicWeight
[(0, 0.089), (1, 0.07), (2, -0.022), (3, -0.023), (4, 0.018), (5, 0.064), (6, -0.008), (7, 0.016), (8, 0.057), (9, 0.001), (10, 0.022), (11, 0.003), (12, -0.053), (13, -0.012), (14, 0.032), (15, 0.009), (16, -0.095), (17, -0.009), (18, 0.047), (19, -0.019), (20, 0.054), (21, -0.069), (22, -0.004), (23, 0.044), (24, -0.034), (25, -0.051), (26, -0.085), (27, 0.001), (28, 0.0), (29, 0.096), (30, -0.079), (31, 0.013), (32, -0.047), (33, 0.053), (34, 0.101), (35, 0.045), (36, -0.018), (37, -0.039), (38, -0.054), (39, -0.08), (40, 0.098), (41, -0.014), (42, -0.008), (43, 0.024), (44, -0.041), (45, 0.064), (46, 0.002), (47, 0.052), (48, 0.084), (49, 0.025)]
simIndex simValue blogId blogTitle
same-blog 1 0.97814304 342 high scalability-2008-06-08-Search fast in million rows
2 0.85668778 246 high scalability-2008-02-12-Search the tags across all post
Introduction: Suppose I have a table which stores tags. Users can enter keywords, and I have to search through all the records in the table and find the posts which contain the tags entered by the user. A user can enter more than one keyword. What strategy or technique should I use to search fast? There may be more than a million records, and many users firing the same query. Thanks
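For this tag-search question, an inverted index again avoids the full scan: map each tag to the set of post ids carrying it, and answer a multi-keyword query by intersecting those sets. A toy sketch (the posts and tags are illustrative):

```python
from collections import defaultdict

# Toy posts: post_id -> set of tags.
posts = {
    1: {"scalability", "lucene", "search"},
    2: {"mysql", "search"},
    3: {"lucene", "search", "index"},
}

# Inverted index: tag -> set of post ids. Built once, updated on insert.
tag_index = defaultdict(set)
for post_id, tags in posts.items():
    for tag in tags:
        tag_index[tag].add(post_id)

def find_posts(keywords):
    """Posts containing ALL keywords: intersect the per-tag id sets."""
    sets = [tag_index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()
```

Since many users fire the same query, putting a cache (e.g. memcached) in front of `find_posts` keyed on the sorted keyword list would cut repeated work further.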
3 0.84652925 332 high scalability-2008-05-28-Job queue and search engine
4 0.83473021 746 high scalability-2009-11-26-Kngine Snippet Search New Indexing Technology
Introduction: While Kngine has just announced some improvements and new features, I would like to take you on a small trip through the Snippet Search research project at Kngine. What is Kngine? Kngine is a startup company working on search technologies. At Kngine we aim to organize humanity's systematic knowledge and experience and make it accessible to everyone. We aim to collect and organize all objective data and make it possible and easy to access. Our goal is to build a Web 3.0 search engine on the advances of Web search engines, the Semantic Web, and data-representation technologies: a new form of Web search engine that will unleash a revolution of new possibilities. Introduction to Snippet Search: Today, the Web search engine is the Web gateway, especially for getting specific information. But unfortunately, search engines haven't changed much since the Web of the 90s. Since the 90s, Web search engines have provided the same kind of results: links to documents. We i
5 0.79557896 630 high scalability-2009-06-14-kngine 'Knowledge Engine' milestone 2
Introduction: Kngine is a knowledge Web search engine designed to provide meaningful search results, such as: semantic information about keywords/concepts, answers to the user's questions, the relations between keywords/concepts, and links between different kinds of data, such as: movies, subtitles, photos, prices at sale stores, user reviews, and influenced stories. Goals: Kngine's long-term goal is to make all of humanity's systematic knowledge and experience accessible to everyone. We aim to collect and organize all objective data and make it possible and easy to access. Our goal is to build, on the advances of Web search engines, the semantic web, and data-representation technologies, a new form of Web search engine that will unleash a revolution of new possibilities. Kngine tries to combine the power of Web search engines with the power of semantic search and data representation to provide meaningful search results matching user needs. Status: Kngine started as a research project in O
6 0.78792965 258 high scalability-2008-02-24-Yandex Architecture
7 0.73822016 775 high scalability-2010-02-10-ElasticSearch - Open Source, Distributed, RESTful Search Engine
8 0.72136027 1601 high scalability-2014-02-25-Peter Norvig's 9 Master Steps to Improving a Program
9 0.70435643 1395 high scalability-2013-01-28-DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
10 0.69959271 269 high scalability-2008-03-08-Audiogalaxy.com Architecture
11 0.69850051 810 high scalability-2010-04-14-Parallel Information Retrieval and Other Search Engine Goodness
12 0.69581616 64 high scalability-2007-08-10-How do we make a large real-time search engine?
13 0.65687937 899 high scalability-2010-09-09-How did Google Instant become Faster with 5-7X More Results Pages?
14 0.65361696 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
15 0.61002558 1253 high scalability-2012-05-28-The Anatomy of Search Technology: Crawling using Combinators
16 0.59831476 1295 high scalability-2012-08-02-Ask DuckDuckGo: Is there Anything you Want to Know About DDG?
17 0.58721536 1650 high scalability-2014-05-19-A Short On How the Wayback Machine Stores More Pages than Stars in the Milky Way
18 0.57290089 587 high scalability-2009-05-01-FastBit: An Efficient Compressed Bitmap Index Technology
19 0.56664819 578 high scalability-2009-04-23-Which Key value pair database to be used
20 0.56505078 281 high scalability-2008-03-18-Database Design 101
topicId topicWeight
[(2, 0.226), (30, 0.141), (43, 0.291), (61, 0.098), (94, 0.087)]
simIndex simValue blogId blogTitle
1 0.80033678 505 high scalability-2009-02-01-More Chips Means Less Salsa
Introduction: Yes, I just got through watching the Super Bowl, so chips and salsa are on my mind and in my stomach. In recreational eating, more chips require downing more salsa. With multicore chips, it turns out that as cores go up, salsa goes down, salsa obviously being a metaphor for speed. Sandia National Laboratories found in their simulations: a significant increase in speed going from two to four cores, but an insignificant increase from four to eight cores. Exceeding eight cores causes a decrease in speed. Sixteen cores perform barely as well as two, and after that, a steep decline is registered as more cores are added. The problem is the lack of memory bandwidth, as well as contention between processors over the memory bus available to each processor. The implication for those following a diagonal scaling strategy is to work like heck to make your system fit within eight cores. After that you'll need to consider some sort of partitioning strategy. What's interesti
same-blog 2 0.76983035 342 high scalability-2008-06-08-Search fast in million rows
3 0.69236743 1336 high scalability-2012-10-09-Batoo JPA - The new JPA Implementation that runs over 15 times faster...
Introduction: This post is by Hasan Ceylan, an Open Source software enthusiast from Istanbul. I loved JPA 1.0 back in the early 2000s. I started using it together with EJB 3.0 even before the stable releases. I loved it so much that I contributed bits and parts to the JBoss 3.x implementations. Those were the days when our company was still considerably small in size. Creating new features and applications was a higher priority than performance, because we had a lot of ideas and we needed to develop and market them as fast as we could. We no longer needed to write tedious and error-prone XML descriptions for the data model and deployment descriptors, nor did we need to use the curse called "XDoclet". On the other side, our company grew steadily, and our web site became the top portal in the country for live events and ticketing. Now we had performance problems! Although the company grew considerably, due to the economics of the industry, we did not make a lot of money. The ch
4 0.67705584 1182 high scalability-2012-01-27-Stuff The Internet Says On Scalability For January 27, 2012
Introduction: If you’ve got the time, we’ve got the HighScalability: 9nm : IBM's carbon nanotube transistor that outperforms silicon; YouTube : 4 Billion Views/Day; 864GB RAM : 37signals Memcache, $12K Quotable Quotes: Chad Dickerson : You can only get growth by feeding opportunities. @launchany : It amazes me how many NoSQL database vendors spend more time detailing their scalability and no time detailing the data model and design Google : Let's make TCP faster. WhatsApp : we are now able to easily push our systems to over 2 million tcp connections! Sidney Dekker : In a complex system…doing the same thing twice will not predictably or necessarily lead to the same results. @Rasmusfjord : Just heard about an Umbraco site running on Azure that handles 20.000 requests /*second* Herb Sutter with an epic post, Welcome to the Jungle , touching on a lot of themes we've explored on HighScalability, only in a dramatically more competent way. What's after
5 0.67592782 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
Introduction: Dr. Daniel Abadi, author of the DBMS Musings blog and cofounder of Hadapt, which offers a product improving Hadoop performance by 50x on relational data, is now taking his talents to graph data in Hadoop's tremendous inefficiency on graph data management (and how to avoid it), which shares the secrets of getting Hadoop to perform 1000x better on graph data. TL;DR: Analyzing graph data is at the heart of important data-mining problems. Hadoop is the tool of choice for many of these problems. Hadoop-style MapReduce works best on key-value processing, not graph processing, and can be well over a factor of 1000 less efficient than it needs to be. Hadoop's inefficiency has consequences in the real world. Inefficiency on graph data problems, like improving power utilization, minimizing carbon emissions, and improving product designs, leads to a lot of value being left on the table, in the form of negative environmental consequences, increased server costs, and increased data center spa
6 0.65947795 177 high scalability-2007-12-08-thesimsonstage.ea.com
7 0.65565509 726 high scalability-2009-10-22-Paper: The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM
8 0.63551825 1624 high scalability-2014-04-01-The Mullet Cloud Selection Pattern
9 0.63536286 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster
10 0.63013202 917 high scalability-2010-10-08-4 Scalability Themes from Surgecon
11 0.61600816 1603 high scalability-2014-02-28-Stuff The Internet Says On Scalability For February 28th, 2014
12 0.61548173 893 high scalability-2010-09-03-Hot Scalability Links For Sep 3, 2010
13 0.61385173 470 high scalability-2008-12-18-Risk Analysis on the Cloud (Using Excel and GigaSpaces)
14 0.61091298 54 high scalability-2007-08-02-Multilanguage Website
15 0.60873747 464 high scalability-2008-12-13-Strategy: Facebook Tweaks to Handle 6 Time as Many Memcached Requests
16 0.60739821 1475 high scalability-2013-06-13-Busting 4 Modern Hardware Myths - Are Memory, HDDs, and SSDs Really Random Access?
17 0.60570186 721 high scalability-2009-10-13-Why are Facebook, Digg, and Twitter so hard to scale?
18 0.60104728 1123 high scalability-2011-09-23-The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…
19 0.59694105 437 high scalability-2008-11-03-How Sites are Scaling Up for the Election Night Crush
20 0.59682429 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In