
379 high scalability-2008-09-04-Database question for upcoming project


Meta information for this blog

Source: html

Introduction: We will be developing an RIA that will have a lot of database access. Think something like a QuickBooks but with about 50 transactions entered per hour per user. Users will be in the system for 7 to 9 hours a day and there will be around 20,000 users, all logged in at the same time. Reporting will be done just like a QuickBooks style app plus a lot of extra things you don't do in QuickBooks. Our operations is familiar with W2003 Server and MS SQL Server so they are recommending we stick with that. I originally requested Linux and PostgreSQL. How far can a single database server get me? If we have a 4 processor, 8 core, 128gb server, how far am I going to get before I need to shard or do something else? I know there are a lot of factors involved but in general for this size of a site, what should the strategy be? I've read almost all articles on this website but most of the applications are not RIA type of apps with this type of usage or they are architectures for
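As a rough sizing aid, the figures in the question (20,000 concurrent users, about 50 transactions per hour per user, 7 to 9 hours a day) can be turned into an average write rate. The sketch below uses only those numbers, except for `PEAK_FACTOR`, which is an assumed illustration value and not something the post states; real peaks depend on how the workload clusters during the day.

```python
# Back-of-the-envelope load estimate from the figures in the question.
USERS = 20_000                 # concurrent users (from the post)
TX_PER_USER_PER_HOUR = 50      # transactions entered per hour per user (from the post)
HOURS_LOW, HOURS_HIGH = 7, 9   # hours per day in the system (from the post)
PEAK_FACTOR = 3                # ASSUMPTION: peak load relative to average


def average_tx_per_second(users: int, tx_per_user_per_hour: int) -> float:
    """Average transactions per second across all users."""
    return users * tx_per_user_per_hour / 3600


avg_tps = average_tx_per_second(USERS, TX_PER_USER_PER_HOUR)
peak_tps = avg_tps * PEAK_FACTOR
daily_low = USERS * TX_PER_USER_PER_HOUR * HOURS_LOW
daily_high = USERS * TX_PER_USER_PER_HOUR * HOURS_HIGH

print(f"average rate: {avg_tps:.0f} tx/s")
print(f"assumed peak: {peak_tps:.0f} tx/s")
print(f"daily volume: {daily_low:,} to {daily_high:,} transactions")
```

This counts only entered transactions. Reporting queries, which the post says will be heavy, are not included and may dominate the actual database load.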


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We will be developing an RIA that will have a lot of database access. [sent-1, score-0.248]

2 Think something like a QuickBooks but with about 50 transactions entered per hour per user. [sent-2, score-0.578]

3 Users will be in the system for 7 to 9 hours a day and there will be around 20,000 users, all logged in at the same time. [sent-3, score-0.269]

4 Reporting will be done just like a QuickBooks style app plus a lot of extra things you don't do in QuickBooks. [sent-4, score-0.513]

5 Our operations is familiar with W2003 Server and MS SQL Server so they are recommending we stick with that. [sent-5, score-0.428]

6 If we have a 4 processor, 8 core, 128gb server, how far am I going to get before I need to shard or do something else? [sent-8, score-0.445]

7 I know there are a lot of factors involved but in general for this size of a site, what should the strategy be? [sent-9, score-0.549]

8 I've read almost all articles on this website but most of the applications are not RIA type of apps with this type of usage or they are architectures for sites with millions of users which we also won't have. [sent-10, score-0.962]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('quickbooks', 0.507), ('ria', 0.507), ('recommending', 0.198), ('entered', 0.161), ('requested', 0.156), ('far', 0.15), ('logged', 0.139), ('type', 0.137), ('originally', 0.13), ('factors', 0.126), ('stick', 0.125), ('server', 0.122), ('ms', 0.118), ('users', 0.117), ('involved', 0.108), ('hour', 0.106), ('familiar', 0.105), ('lot', 0.103), ('processor', 0.102), ('plus', 0.1), ('shard', 0.098), ('extra', 0.098), ('style', 0.095), ('something', 0.09), ('developing', 0.089), ('articles', 0.083), ('else', 0.081), ('hours', 0.078), ('per', 0.078), ('architectures', 0.078), ('general', 0.075), ('almost', 0.075), ('usage', 0.072), ('strategy', 0.07), ('linux', 0.07), ('sites', 0.07), ('wo', 0.068), ('apps', 0.067), ('millions', 0.067), ('size', 0.067), ('sql', 0.066), ('transactions', 0.065), ('core', 0.061), ('website', 0.059), ('done', 0.059), ('app', 0.058), ('database', 0.056), ('get', 0.055), ('going', 0.052), ('day', 0.052)]
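The page does not say how the similarity values in the lists are computed. A common choice for comparing TF-IDF vectors is cosine similarity, which would be consistent with a post scoring 1.0 against itself. The sketch below assumes that measure; `other_post` and its weights are hypothetical and not taken from any real post.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# A few of the top weights listed above for this post.
this_post = {"quickbooks": 0.507, "ria": 0.507, "server": 0.122, "shard": 0.098}
# HYPOTHETICAL weights for another post, for illustration only.
other_post = {"shard": 0.4, "server": 0.2, "mysql": 0.5}

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

Only shared terms contribute to the dot product, so two posts that share a few low-weight words such as "server" and "shard" score low even when both are about databases.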

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 379 high scalability-2008-09-04-Database question for upcoming project


2 0.10131562 152 high scalability-2007-11-13-Flickr Architecture

Introduction: Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers. Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge, they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it? Site: http://www.flickr.com Information Sources Flickr and PHP (an early document) Capacity Planning for LAMP Federation at Flickr: Doing Billions of Queries a Day by Dathan Pattishall. Building Scalable Web Sites by Cal Henderson from Flickr. Database War Stories #3: Flickr by Tim O'Reilly Cal Henderson's Talks . A lot of useful PowerPoint presentations. Platform PHP MySQL Shards Memcached for a caching layer. Squid in reverse-proxy for html and images. Linux (RedHat) Smarty for templating Perl PEAR for XML and Email parsing ImageMagick, for ima

3 0.092290573 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?

Introduction: For everything given something seems to be taken. Caching is a great scalability solution, but caching also comes with problems . Sharding is a great scalability solution, but as Foursquare recently revealed in a post-mortem about their 17 hours of downtime, sharding also has problems. MongoDB, the database Foursquare uses, also contributed their post-mortem of what went wrong too. Now that everyone has shared and resharded, what can we learn to help us skip these mistakes and quickly move on to a different set of mistakes? First, like for Facebook , huge props to Foursquare and MongoDB for being upfront and honest about their problems. This helps everyone get better and is a sign we work in a pretty cool industry. Second, overall, the fault didn't flow from evil hearts or gross negligence. As usual the cause was more mundane: a key system, that could be a little more robust, combined with a very popular application built by a small group of people, under immense pressure

4 0.084796406 672 high scalability-2009-08-06-An Unorthodox Approach to Database Design : The Coming of the Shard

Introduction: Update 4: Why you don’t want to shard. by Morgon on the MySQL Performance Blog. Optimize everything else first, and then if performance still isn’t good enough, it’s time to take a very bitter medicine. Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding. Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you. Update: Dan Pritchett shares some excellent Sharding Lessons : Size Your Shards, Use Math on Shard C

5 0.080613717 304 high scalability-2008-04-19-How to build a real-time analytics system?

Introduction: Hello everybody! I am a developer of a website with a lot of traffic. Right now we are managing the whole website using perl + postgresql + fastcgi + memcached + mogileFS + lighttpd + roundrobin DNS distributed over 5 servers and I must say it works like a charm, load is stable and everything works very fast and we are recording about 8 million pageviews per day. The only problem is with postgres database since we have it installed only on one server and if this server goes down, the whole "cluster" goes down. That's why we have a master2slave replication so we still have a backup database except that when the master goes down, all inserts/updates are disabled so the whole website is just read only. But this is not a problem since this configuration is working for us and we don't have any problems with it. Right now we are planning to build our own analytics service that would be customized for our needs. We tried various different software packages but were not satisfi

6 0.074946262 70 high scalability-2007-08-22-How many machines do you need to run your site?

7 0.07323879 1361 high scalability-2012-11-22-Gone Fishin': PlentyOfFish Architecture

8 0.072815791 598 high scalability-2009-05-12-P2P server technology?

9 0.072549634 638 high scalability-2009-06-26-PlentyOfFish Architecture

10 0.072497338 276 high scalability-2008-03-15-New Website Design Considerations

11 0.072267689 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

12 0.071038291 383 high scalability-2008-09-10-Shard servers -- go big or small?

13 0.070867442 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?

14 0.069384336 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years

15 0.069336116 72 high scalability-2007-08-22-Wikimedia architecture

16 0.068977863 501 high scalability-2009-01-25-Where do I start?

17 0.067812987 954 high scalability-2010-12-06-What the heck are you actually using NoSQL for?

18 0.065893821 261 high scalability-2008-02-25-Make Your Site Run 10 Times Faster

19 0.065843664 232 high scalability-2008-01-29-When things aren't scalable

20 0.065306477 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.121), (1, 0.061), (2, -0.018), (3, -0.075), (4, 0.011), (5, -0.012), (6, -0.026), (7, -0.006), (8, 0.02), (9, -0.002), (10, -0.023), (11, 0.011), (12, -0.021), (13, 0.047), (14, 0.045), (15, -0.008), (16, -0.036), (17, 0.001), (18, -0.008), (19, 0.031), (20, 0.015), (21, -0.039), (22, -0.034), (23, -0.025), (24, 0.017), (25, -0.039), (26, -0.013), (27, -0.051), (28, 0.006), (29, 0.006), (30, 0.021), (31, 0.025), (32, 0.002), (33, -0.011), (34, -0.003), (35, 0.03), (36, 0.007), (37, 0.036), (38, 0.0), (39, -0.013), (40, -0.011), (41, 0.019), (42, -0.003), (43, -0.046), (44, 0.021), (45, 0.001), (46, 0.025), (47, 0.027), (48, -0.038), (49, 0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96301639 379 high scalability-2008-09-04-Database question for upcoming project


2 0.76669997 222 high scalability-2008-01-25-Application Database and DAL Architecture

Introduction: Hi gurus, I'm totally new to this high scalability thing. I'm trying to create a website with scalability in mind (personal project). In my application I'll have forums for different groups of people (each group will have their own forums, members of groups can still post in other groups' forums but each group will mainly be using their forums most of the time). Now, I'm going to start with about 2000 groups with the potential of reaching up to 10000 groups (this is the maximum due to the nature of my application). I was thinking that having all posts in one table will be way too much for one table (esp. that some groups are expected to post hundreds or even thousands times per day, let's say about 500 of the groups, the rest of the groups won't be that active though) as I'll have to index the PostID, ParentPostID, GroupID and PostDate which can produce large indexes (consequentially causing slow inserts) if having everything in one table. So, I'm thinking of a way to divide the posts

3 0.75481355 606 high scalability-2009-05-25-non-sequential, unique identifier, strategy question

Introduction: (Please bear with me, I'm a new, passionate, confident and terrified programmer :D ) Background: I'm pre-launch and 1 year into the development of my application. My target is to be able to eventually handle millions of registered users with 5-10% of them concurrent. Up to this point I've used auto-increment to assign unique identifiers to rows. I am now considering switching to a non-sequential strategy. Oh, I'm using the LAMP configuration. My reasons for avoiding auto-increment: 1. Complicates replication when scaling horizontally. Risk of collision is significant (when running multiple masters). Note: I've read the other entries in this forum that relate to ID generation and there have been some great suggestions -- including a strategy that uses auto-increment in a way that avoids this pitfall... That said, I'm still nervous about it. 2. Potential bottleneck when retrieving/assigning IDs -- IDs assigned at the database. My reasons for being nervous about

4 0.74464995 671 high scalability-2009-08-05-Stack Overflow Architecture

Introduction: Update 2 : Stack Overflow Architecture Update - Now At 95 Million Page Views A Month Update: Startup – ASP.NET MVC, Cloud Scale & Deployment shows an interesting alternative approach for a Windows stack using ServerPath/GoGrid for a dedicated database machine, elastic VMs for the front end, and a free load balancer. Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky . In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good. I fell in deep like with Stack Overflow for purely selfish reasons, it helped me solve a few difficult problems that were jabbing my eyes out with pain. I also appreciate their no-apologies anthropologically based desig

5 0.73762131 304 high scalability-2008-04-19-How to build a real-time analytics system?


6 0.73255092 711 high scalability-2009-09-22-How Ravelry Scales to 10 Million Requests Using Rails

7 0.73231697 1288 high scalability-2012-07-23-Ask HighScalability: How Do I Build My MegaUpload + Itunes + YouTube Startup?

8 0.72309196 1268 high scalability-2012-06-20-Ask HighScalability: How do I organize millions of images?

9 0.71392524 1131 high scalability-2011-10-24-StackExchange Architecture Updates - Running Smoothly, Amazon 4x More Expensive

10 0.71254665 435 high scalability-2008-10-30-The case for functional decomposition

11 0.70994914 675 high scalability-2009-08-08-1dbase vs. many and cloud hosting vs. dedicated server(s)?

12 0.70466721 152 high scalability-2007-11-13-Flickr Architecture

13 0.70107204 511 high scalability-2009-02-12-MySpace Architecture

14 0.69749123 1438 high scalability-2013-04-10-Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution

15 0.69485831 965 high scalability-2010-12-29-Pinboard.in Architecture - Pay to Play to Keep a System Small

16 0.6900683 1507 high scalability-2013-08-26-Reddit: Lessons Learned from Mistakes Made Scaling to 1 Billion Pageviews a Month

17 0.68936509 262 high scalability-2008-02-26-Architecture to Allow High Availability File Upload

18 0.6838932 519 high scalability-2009-02-23-Database Sharding at Netlog, with MySQL and PHP

19 0.68365812 383 high scalability-2008-09-10-Shard servers -- go big or small?

20 0.68347883 231 high scalability-2008-01-29-Too many databases


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.125), (2, 0.204), (10, 0.013), (18, 0.213), (30, 0.034), (40, 0.087), (61, 0.138), (79, 0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.87474459 475 high scalability-2008-12-22-SLAs in the SaaS space

Introduction: This may be a bit higher level than the general discussion here, but I think this is an important issue in how it relates to reliability and uptime. What kind of SLAs should we be expecting from SaaS services and platforms (e.g. AWS, Google App Engine, Google Premium Apps, salesforce.com, etc.)? Up to today, most SaaS services either have no SLAs or offer very weak penalties. What will it take to get these services up to the point where they can offer the SLAs that users (and more importantly, businesses) require? I presume most of the members here want to see more movement into the cloud and to SaaS services, and I'm thinking that until we see more substantial SLA guarantees, most businesses will continue to shy away as long as they can. Would love to hear what others think. Or am I totally off base?

same-blog 2 0.863325 379 high scalability-2008-09-04-Database question for upcoming project


3 0.84461945 1140 high scalability-2011-11-10-Kill the Telcos Save the Internet - The Unsocial Network

Introduction: Someone is killing the Internet. Since you probably use the Internet everyday you might find this surprising. It almost sounds silly, and the reason is technical, but our crack team of networking experts has examined the patient and made the diagnosis. What did they find? Diagnostic team : the  Packet Pushers  gang ( Greg Ferro , Jan Zorz , Ivan Pepelnjak ) in the podcast  How We Are Killing the Internet . Diagnosis : invasive tunnelation. ( tubes anyone? ) Prognosis : even Dr. House might not be able to help. Cure : go back to what the Internet was; kill the tunnels; route IPv4 and IPv6; have public addresses on everything; disrupt the telcos. This is a classic story in a strange setting--the network--but the themes are universal: centralization vs. decentralization (that's where the telcos obviously come in), good vs. evil, order vs. disorder, tyranny vs. freedom, change vs. stasis, simplicity vs. complexity. And it's all being carried out on battlefield few get

4 0.82079482 139 high scalability-2007-10-30-Paper: Dynamo: Amazon’s Highly Available Key-value Store

Introduction: Update 2 : Read/WriteWeb has a good article talking about the scalability issues of relational databases and how Dynamo solves them: Amazon Dynamo: The Next Generation Of Virtual Distributed Storage . But since Dynamo is just another frustrating walled garden protected by barbed wire and guard dogs, its relevance is somewhat overstated. Update : Greg Linden has a take on the paper where he questions some of Amazon's design choices: emphasizing write availability over fast reads, a lack of indexing support, use of random distribution for load balancing, and punting on some scalability issues. Werner Vogels, Amazon's avuncular CTO, just announced a new paper on the internal database technology Amazon uses to handle tens of millions customers. I'll dive into more details later, but I thought you'd want to read it hot off the blog. The bad news is it won't be a service. They are keeping this tech not so secret, but very safe. Happily, it's another real-life example to learn from.

5 0.80918288 1344 high scalability-2012-10-19-Stuff The Internet Says On Scalability For October 19, 2012

Introduction: It's HighScalability Time: @davilagrau : Youtube, GitHub,..., Are cloud services facing a entropic limit to scalability? Async all the way down?  The Tyranny of the Clock : The cost of logic and memory dominated Turing's thinking, but today, communication rather than logic should dominate our thinking. Clock-free design uses less than half, about 40%, as much energy per addition as its clocked counterpart. We can regain the efficiency of local decision making by revolting against the pervasive beat of an external clock.  Why Google Compute Engine for OpenStack . Smart move. Having OpenStack work inside a super charged cloud, in private clouds, and as a bridge between the two ought to be quite attractive to developers looking for some sort of ally for independence. All it will take are a few victories to cement new alliances. 3 Lessons That Startups Can Learn From Facebook’s Failed Credits Experiment . I thought this was a great idea too. So what happened? FACEBOOK DID NOT

6 0.80094665 1390 high scalability-2013-01-21-Processing 100 Million Pixels a Day - Small Amounts of Contention Cause Big Problems at Scale

7 0.78590214 64 high scalability-2007-08-10-How do we make a large real-time search engine?

8 0.7828207 423 high scalability-2008-10-19-Alternatives to Google App Engine

9 0.78232336 280 high scalability-2008-03-17-Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

10 0.777349 482 high scalability-2009-01-04-Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index

11 0.77652097 757 high scalability-2010-01-04-11 Strategies to Rock Your Startup’s Scalability in 2010

12 0.77539438 187 high scalability-2007-12-14-The Current Pros and Cons List for SimpleDB

13 0.77251536 564 high scalability-2009-04-10-counting # of views, calculating most-least viewed

14 0.770145 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management

15 0.76828796 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010

16 0.7681061 685 high scalability-2009-08-20-Dependency Injection and AOP frameworks for .NET

17 0.76717532 1089 high scalability-2011-07-29-Stuff The Internet Says On Scalability For July 29, 2011

18 0.76635444 1609 high scalability-2014-03-11-Building a Social Music Service Using AWS, Scala, Akka, Play, MongoDB, and Elasticsearch

19 0.76596427 1492 high scalability-2013-07-17-How do you create a 100th Monkey software development culture?

20 0.7648316 961 high scalability-2010-12-21-SQL + NoSQL = Yes !