high_scalability high_scalability-2009 high_scalability-2009-561 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Hibernate, iBATIS, and similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is this: to retrieve a set of widgets from a table, one query is used to retrieve the ids of all the matching widgets (select widget_id from widget where ...), and then for each id another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it takes 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? That is, you run the first query to get your list of ids, and then retrieve each widget from the cache. Surely in that case N+1 (+caching) is good? Assuming, of course, that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
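To make the query counts in the question concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not how Hibernate or iBATIS implement loading or caching; the `widget` schema beyond `widget_id`, the `CountingDb` wrapper, the plain dict standing in for an entity cache, and all function names are assumptions made up for illustration.

```python
import sqlite3


def make_db(n=100):
    """Build an in-memory widget table with n rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE widget (widget_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO widget (widget_id, name) VALUES (?, ?)",
        [(i, f"widget-{i}") for i in range(1, n + 1)],
    )
    return conn


class CountingDb:
    """Hypothetical wrapper that counts how many SELECTs reach the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_n_plus_1(db):
    """One query for the ids, then one query per id: N+1 queries in total."""
    ids = [row[0] for row in db.query("SELECT widget_id FROM widget ORDER BY widget_id")]
    return [db.query("SELECT * FROM widget WHERE widget_id = ?", (i,))[0] for i in ids]


def load_single_query(db):
    """Fetch ids and details together in one query."""
    return db.query("SELECT * FROM widget ORDER BY widget_id")


def load_with_cache(db, cache):
    """N+1 shape, but details come from the cache; only misses hit the database."""
    ids = [row[0] for row in db.query("SELECT widget_id FROM widget ORDER BY widget_id")]
    result = []
    for i in ids:
        if i not in cache:  # cache miss: fall back to a per-id select
            cache[i] = db.query("SELECT * FROM widget WHERE widget_id = ?", (i,))[0]
        result.append(cache[i])
    return result


if __name__ == "__main__":
    db = CountingDb(make_db(100))

    load_n_plus_1(db)
    print("N+1, no cache:", db.queries)      # 101 queries for 100 widgets

    db.queries = 0
    load_single_query(db)
    print("single query:", db.queries)       # 1 query

    cache = {}
    db.queries = 0
    load_with_cache(db, cache)
    print("N+1, cold cache:", db.queries)    # 101 queries: every lookup misses

    db.queries = 0
    load_with_cache(db, cache)
    print("N+1, warm cache:", db.queries)    # 1 query: only the id lookup
```

With a fully warm cache the database sees only the id query, which is the case the question assumes. The sketch counts database queries only: it says nothing about the cost of each cache lookup (which may itself be a network round trip if the cache is remote), and every miss still costs one extra select.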
sentIndex sentText sentNum sentScore
1 Hibernate and iBATIS and other similar tools have documentation with recommendations for avoiding the "N+1 select" problem. [sent-1, score-0.375]
2 The problem being that if you wanted to retrieve a set of widgets from a table, one query would be used to retrieve all the ids of the matching widgets (select widget_id from widget where . [sent-2, score-2.696]
3 ) and then for each id, another select is used to retrieve the details of that widget (select * from widget where widget_id = ? [sent-5, score-1.902]
4 If you have 100 widgets, it requires 101 queries to get the details of them all. [sent-7, score-0.247]
5 I can see why this is bad, but what if you're doing entity caching? [sent-8, score-0.124]
6 If you run the first query to get your list of ids, and then for each widget you retrieve it from the cache. [sent-11, score-0.793]
7 Assuming of course that there is a high probability of all of the matching entities being in the cache. [sent-13, score-0.478]
8 I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data that are in use these days. [sent-14, score-0.761]
wordName wordTfidf (topN-words)
[('widget', 0.541), ('widgets', 0.398), ('retrieve', 0.318), ('select', 0.31), ('matching', 0.203), ('ids', 0.202), ('ibatis', 0.174), ('implied', 0.133), ('surely', 0.124), ('hibernate', 0.116), ('probability', 0.113), ('details', 0.105), ('entities', 0.104), ('assuming', 0.102), ('query', 0.1), ('documentation', 0.098), ('asking', 0.097), ('entity', 0.096), ('obviously', 0.089), ('recommendations', 0.089), ('avoiding', 0.089), ('mechanisms', 0.087), ('whose', 0.084), ('caching', 0.081), ('wanted', 0.073), ('id', 0.064), ('storing', 0.061), ('bad', 0.061), ('table', 0.06), ('answer', 0.06), ('question', 0.058), ('course', 0.058), ('days', 0.058), ('similar', 0.054), ('requires', 0.052), ('list', 0.051), ('used', 0.049), ('queries', 0.048), ('tools', 0.045), ('case', 0.044), ('get', 0.042), ('another', 0.038), ('set', 0.034), ('scalable', 0.032), ('one', 0.031), ('problem', 0.031), ('run', 0.03), ('may', 0.029), ('first', 0.029), ('see', 0.028)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 561 high scalability-2009-04-08-N+1+caching is ok?
Introduction: Hibernate, iBATIS, and similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is this: to retrieve a set of widgets from a table, one query is used to retrieve the ids of all the matching widgets (select widget_id from widget where ...), and then for each id another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it takes 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? That is, you run the first query to get your list of ids, and then retrieve each widget from the cache. Surely in that case N+1 (+caching) is good? Assuming, of course, that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
2 0.17674048 293 high scalability-2008-03-31-Read HighScalability on Your Mobile Phone Using WidSets Widgets
Introduction: Jean-Paul de Vooght of our Switzerland contingent created a nifty little WidSets widget that lets you better read HighScalability from your mobile phone. I thought untethered readers might like to give it a try. Thanks to Jean-Paul for making it available! WidSets is: a simple service that brings you information normally accessed via the Internet by sending it directly to your mobile phone. Using mini-applications called widgets, it sends you the latest updates to your favorite websites. The system uses RSS feeds to push information from these websites directly to your mobile phone the minute they’re updated.
3 0.13647616 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
Introduction: We just released a social section to our iOS app several days ago and we are already facing scaling issues with the users' news feeds. We're basically using a Fan-out-on-write (push) model for the users' news feeds (posts of people and topics they follow) and we're using Redis for this (backend is Rails on Heroku). However, our current 60,000 news feeds are ballooning our Redis store to almost 1GB in just a few days (it's growing way too fast for our budget). Currently we're storing the entire news feed for the user (post id, post text, author, icon url, etc.) and we cap the entries to 300 per feed. I'm wondering if we need to just store the post IDs of each user feed in Redis and then store the rest of the post information somewhere else? Would love some feedback here. In this case, our iOS app would make an api call to our Rails app to retrieve a user's news feed. Rails app would retrieve news feed list (just post IDs) from Redis, and then Rails app would need to query to g
4 0.12233895 771 high scalability-2010-02-04-Hot Scalability Links for February 4, 2010
Introduction: Lots of cool stuff happening this week... Voldemort gets rebalancing. It's one thing to shard data to scale, it's a completely different level of functionality to manage those shards intelligently. Voldemort has stepped up by adding advanced rebalancing functionality: Dynamic addition of new nodes to the cluster; Deletion of nodes from cluster; Load balancing of data inside a cluster. Microsoft Finally Opens Azure for Business. Out of the blue Microsoft opens up their platform as a service service. Good to have more competition and we'll keep an eye out for experience reports. New details on LinkedIn architecture by Greg Linden. LinkedIn appears to only use caching minimally, preferring to spend their efforts and machine resources on making sure they can recompute computations quickly than on hiding poor performance behind caching layers. The end of SQL and relational databases? by David Intersimone. For new projects, I believe, we have genuine non-relational a
5 0.12154493 358 high scalability-2008-07-26-Sharding the Hibernate Way
Introduction: Update: A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards. Max defines Hibernate Shards (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they need to do in the future (query, replication, operational tools), and how it relates to Google AppEngine (not much). To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Hibernate Shards to the rescue! Hibernate shards is: an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API you know the shards API. No learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way and it’s very much worth looking at, especially if you us
6 0.10779367 714 high scalability-2009-10-02-HighScalability has Moved to Squarespace.com!
7 0.10002115 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
8 0.094786145 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
9 0.093292907 1032 high scalability-2011-05-02-Stack Overflow Makes Slow Pages 100x Faster by Simple SQL Tuning
10 0.093187451 62 high scalability-2007-08-08-Partial String Matching
11 0.092137508 279 high scalability-2008-03-17-Microsoft's New Database Cloud Ready to Rumble with Amazon
12 0.088963687 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question
13 0.088148415 24 high scalability-2007-07-24-Product: Hibernate Shards
14 0.087207429 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
15 0.077877924 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
16 0.069985576 986 high scalability-2011-02-10-Database Isolation Levels And Their Effects on Performance and Scalability
17 0.067468226 1579 high scalability-2014-01-14-SharePoint VPS solution
18 0.064931378 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability
19 0.061963815 359 high scalability-2008-07-29-Ehcache - A Java Distributed Cache
20 0.061767519 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second
topicId topicWeight
[(0, 0.065), (1, 0.049), (2, -0.021), (3, -0.019), (4, 0.023), (5, 0.031), (6, -0.012), (7, -0.014), (8, 0.015), (9, -0.035), (10, -0.002), (11, 0.02), (12, -0.027), (13, 0.009), (14, 0.009), (15, -0.035), (16, -0.039), (17, -0.036), (18, 0.021), (19, -0.009), (20, -0.01), (21, -0.048), (22, -0.001), (23, -0.003), (24, -0.012), (25, -0.009), (26, -0.003), (27, -0.017), (28, 0.001), (29, 0.036), (30, -0.01), (31, -0.034), (32, 0.018), (33, 0.046), (34, 0.02), (35, 0.015), (36, 0.064), (37, -0.014), (38, -0.057), (39, -0.011), (40, -0.039), (41, 0.056), (42, 0.025), (43, 0.005), (44, 0.05), (45, 0.032), (46, -0.021), (47, 0.029), (48, -0.043), (49, 0.021)]
simIndex simValue blogId blogTitle
same-blog 1 0.93174261 561 high scalability-2009-04-08-N+1+caching is ok?
Introduction: Hibernate, iBATIS, and similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is this: to retrieve a set of widgets from a table, one query is used to retrieve the ids of all the matching widgets (select widget_id from widget where ...), and then for each id another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it takes 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? That is, you run the first query to get your list of ids, and then retrieve each widget from the cache. Surely in that case N+1 (+caching) is good? Assuming, of course, that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
2 0.69063312 358 high scalability-2008-07-26-Sharding the Hibernate Way
Introduction: Update: A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards. Max defines Hibernate Shards (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they need to do in the future (query, replication, operational tools), and how it relates to Google AppEngine (not much). To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it’s not that easy. Hibernate Shards to the rescue! Hibernate shards is: an extension to the core Hibernate product that adds facilities for horizontal partitioning. If you know the core Hibernate API you know the shards API. No learning curve at all. Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way and it’s very much worth looking at, especially if you us
3 0.62167835 281 high scalability-2008-03-18-Database Design 101
Introduction: I am working on the design for my database and can't seem to come up with a firm schema. I am torn between normalizing the data and dealing with the overhead of joins and denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to be normalized and have separate User, Artist, Album and Song databases with a table full of INTs to tie them together. This will be in a mostly read based environment and with about 80% being searches of data by artist, album, or song. By the time I begin the query for artist, album or song I will already have a list of UserIDs to limit the search by. The problem is that the tables can get unmanageably large pretty quickly and my plan was to shard off users once it got too big. Given this simple data relationship what are the pros and cons of normalizing the data vs denormalizing it? Should I go with 4 separate, normalized tables or one 4 column table? Perhaps it might
4 0.6211766 476 high scalability-2008-12-28-How to Organize a Database Table’s Keys for Scalability
Introduction: The key (no pun intended) to understanding how to organize your dataset’s data is to think of each shard not as an individual database, but as one large singular database. Just as in a normal single server database setup where you have a unique key for each row within a table, each row key within each individual shard must be unique to the whole dataset partitioned across all shards. There are a few different ways we can accomplish uniqueness of row keys across a shard cluster. Each has its pros and cons and the one chosen should be specific to the problems you’re trying to solve.
5 0.60742635 24 high scalability-2007-07-24-Product: Hibernate Shards
Introduction: If you want to adopt a shard architecture, but don't want to start from scratch, you may want to consider Hibernate's sharding system. Hibernate Shards is a framework that is designed to encapsulate and minimize this complexity by adding support for horizontal partitioning to Hibernate Core. Hibernate Shards key features: Standard Hibernate programming model - Hibernate Shards allows you to continue using the Hibernate APIs you know and love: SessionFactory, Session, Criteria, Query. If you already know how to use Hibernate, you already know how to use Hibernate Shards. Flexible sharding strategies - Distribute data across your shards any way you want. Use one of the default strategies we provide or plug in your own application-specific logic. Support for virtual shards - Think your sharding strategy is never going to change? Think again. Adding new shards and redistributing your data is one of the toughest operational challenges you will face once you've deployed your
6 0.58925951 1303 high scalability-2012-08-13-Ask HighScalability: Facing scaling issues with news feeds on Redis. Any advice?
7 0.58864075 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
8 0.58666641 1135 high scalability-2011-10-31-15 Ways to Make Your Application Feel More Responsive under Google App Engine
9 0.57924676 514 high scalability-2009-02-18-Numbers Everyone Should Know
10 0.57492697 546 high scalability-2009-03-20-Alternate strategy for database sharding
11 0.5673238 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
12 0.55951124 678 high scalability-2009-08-09-Writing about cisco loadbalancer?
13 0.55737633 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard
14 0.55512226 435 high scalability-2008-10-30-The case for functional decomposition
15 0.55335003 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question
16 0.55161965 141 high scalability-2007-11-05-Quick question about efficiently implementing Facebook 'news feed' like functionality
17 0.54149508 345 high scalability-2008-06-11-Pyshards aspires to build sharding toolkit for Python
18 0.53805715 578 high scalability-2009-04-23-Which Key value pair database to be used
19 0.53743786 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
20 0.53455812 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
topicId topicWeight
[(1, 0.012), (2, 0.137), (40, 0.042), (51, 0.031), (53, 0.263), (61, 0.186), (79, 0.083), (85, 0.097)]
simIndex simValue blogId blogTitle
same-blog 1 0.84702998 561 high scalability-2009-04-08-N+1+caching is ok?
Introduction: Hibernate, iBATIS, and similar tools have documentation with recommendations for avoiding the "N+1 select" problem. The problem is this: to retrieve a set of widgets from a table, one query is used to retrieve the ids of all the matching widgets (select widget_id from widget where ...), and then for each id another select is used to retrieve the details of that widget (select * from widget where widget_id = ?). If you have 100 widgets, it takes 101 queries to get the details of them all. I can see why this is bad, but what if you're doing entity caching? That is, you run the first query to get your list of ids, and then retrieve each widget from the cache. Surely in that case N+1 (+caching) is good? Assuming, of course, that there is a high probability of all of the matching entities being in the cache. I may be asking a daft question here - one whose answer is obviously implied by the large scalable mechanisms for storing data th
Introduction: Instagram is a free photo sharing and social networking service for your iPhone that has been an instant success. Growing to 14 million users in just over a year, they reached 150 million photos in August while amassing several terabytes of photos, and they did this with just 3 Instaneers, all on the Amazon stack. The Instagram team has written up what can be considered the canonical description of an early stage startup in this era: What Powers Instagram: Hundreds of Instances, Dozens of Technologies. Instagram uses a pastiche of different technologies and strategies. The team is small yet has experienced rapid growth riding the crest of a rising social and mobile wave, it uses a hybrid of SQL and NoSQL, it uses a ton of open source projects, they chose the cloud over colo, Amazon services are highly leveraged rather than building their own, reliability is through availability zones, async work scheduling links components together, the system is composed as much as possible
3 0.68139517 1224 high scalability-2012-04-09-The Instagram Architecture Facebook Bought for a Cool Billion Dollars
Introduction: It's been a well kept secret, but you may have heard Facebook will Buy Photo-Sharing Service Instagram for $1 Billion. Just what is Facebook buying? Here's a quick gloss I did a little over a year ago on a presentation Instagram gave on their architecture. In that article I called Instagram's architecture the "canonical description of an early stage startup in this era." Little did we know how true that would turn out to be. If you want to learn how they did it then don't take a picture, just keep on reading... Instagram is a free photo sharing and social networking service for your iPhone that has been an instant success. Growing to 14 million users in just over a year (now 30 million users), they reached 150 million photos in August while amassing several terabytes of photos, and they did this with just 3 Instaneers, all on the Amazon stack. The Instagram team has written up what can be considered the canonical description of an early stage startup in this era: Wh
4 0.64259827 1287 high scalability-2012-07-20-Stuff The Internet Says On Scalability For July 20, 2012
Introduction: It's HighScalability Time: 4 Trillion Objects: Windows Azure Storage Quotable Quotes: @benjchristensen: “What if we could make the data dense and cheap instead of sparse and expensive?” James Gosling @liquidrinc @sinetpd360: People trying new things and sharing is what helps create scalability. Jim Rickabaugh #siis2012 @rbranson: This h1.4xlarge running 160GB PostgreSQL database pushing ~17,200 index scan rows/sec. r_await is 0.79ms, box is 92% idle. @sturadnidge: faster net and disk greatly reduces repair time and impact so we can load up the instances with far more dat With Amazon announcing 2TB SSD instances the age of SSD has almost arrived. Netflix has already published a very thorough post on the wonderfulness of SSD for both performance and taming the long latency tail. They see 100K IOPS or 1GByte/sec on an untuned system. Netflix projects: The hi1.4xlarge configuration is about half the system cost for the same throughput; The mea
5 0.64218301 522 high scalability-2009-02-25-Learn how to manage change and complexity by Zachman Live.
Introduction: John Zachman (Father of enterprise architecture) Given this renascent interest, who better to explain the principles behind Enterprise Architecture than the man himself, John Zachman, the originator of the "Zachman Framework for Enterprise Architecture"? Join this workshop in Johannesburg on 25th March 09 and in Cape Town on 27th March 09, and Mr. Zachman will explain how and why Enterprise Architecture provides measure. Such an implementation is a daunting task, with opportunities to fail lurking in many places. For more details visit http://www.ITArchitectureSummit.com. For registrations, group discounts or further details please contact Caroline.smith@icmgworld.com
6 0.63907373 332 high scalability-2008-05-28-Job queue and search engine
8 0.63394725 1181 high scalability-2012-01-25-Google Goes MoreSQL with Tenzing - SQL Over MapReduce
9 0.63390112 347 high scalability-2008-07-07-Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy
10 0.62955081 1411 high scalability-2013-02-22-Stuff The Internet Says On Scalability For February 22, 2013
11 0.62390924 99 high scalability-2007-09-23-HA for switches
12 0.62289959 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
13 0.6208716 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale
14 0.61975932 930 high scalability-2010-10-28-NoSQL Took Away the Relational Model and Gave Nothing Back
15 0.61862963 322 high scalability-2008-05-19-Conference: Infoscale 2008 in Italy (June 4-6)
16 0.61800772 739 high scalability-2009-11-09-10 NoSQL Systems Reviewed
17 0.61625141 141 high scalability-2007-11-05-Quick question about efficiently implementing Facebook 'news feed' like functionality
18 0.61530828 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching
19 0.61361498 670 high scalability-2009-08-05-Anti-RDBMS: A list of distributed key-value stores
20 0.61295068 1522 high scalability-2013-09-25-Great Open Source Solution for Boring HA and Scalability Problems