high_scalability high_scalability-2009 high_scalability-2009-516 knowledge-graph by maker-knowledge-mining

516 high scalability-2009-02-19-Heavy upload server scalability


meta info for this blog

Source: html

Introduction: Hi, We are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS v3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure? Hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!
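For the NAS sharding option raised above, a minimal sketch of the idea, assuming a stable client identifier and hypothetical mount points (neither of which comes from the post), is to hash each client onto one of N NFS mounts so upload writes spread across several filers:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Minimal NAS-sharding sketch: route each client's uploads to one of
 * several NFS mounts by hashing a stable client id. The mount points
 * and the id scheme are hypothetical, not from the original post.
 */
public class NasShardRouter {
    // One entry per NFS filer, mounted at the same path on every Tomcat node.
    private static final String[] MOUNTS = {
        "/mnt/nas-shard-0", "/mnt/nas-shard-1", "/mnt/nas-shard-2"
    };

    /** Pick a shard deterministically so a client always lands on the same filer. */
    public static Path shardFor(String clientId, String fileName) {
        int shard = Math.floorMod(clientId.hashCode(), MOUNTS.length);
        return Paths.get(MOUNTS[shard], clientId, fileName);
    }
}
```

Plain modulo remaps most clients whenever a shard is added; consistent hashing, or a persistent client-to-shard mapping kept in a database, avoids that mass migration.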


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Hi, We are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). [sent-1, score-1.344]

2 We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. [sent-2, score-0.805]

3 Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS v3). [sent-3, score-1.009]

4 Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure? [sent-4, score-0.606]

5 Should we go towards NAS sharding, more servers, NIO on Tomcat. [sent-6, score-0.184]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nas', 0.432), ('compound', 0.294), ('night', 0.294), ('nio', 0.239), ('clients', 0.22), ('rising', 0.212), ('sticky', 0.209), ('files', 0.191), ('uploads', 0.185), ('inputs', 0.179), ('tomcat', 0.179), ('nfs', 0.177), ('recommend', 0.176), ('balancer', 0.14), ('sessions', 0.137), ('thanks', 0.137), ('basically', 0.136), ('towards', 0.132), ('hardware', 0.128), ('sharding', 0.114), ('backup', 0.112), ('worked', 0.108), ('per', 0.1), ('load', 0.096), ('currently', 0.093), ('servers', 0.084), ('via', 0.079), ('written', 0.079), ('requests', 0.075), ('file', 0.072), ('put', 0.072), ('http', 0.066), ('day', 0.066), ('solution', 0.065), ('infrastructure', 0.06), ('running', 0.055), ('number', 0.054), ('go', 0.052), ('architecture', 0.05), ('every', 0.048), ('software', 0.048), ('could', 0.047), ('scale', 0.041), ('would', 0.038), ('system', 0.03), ('data', 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 516 high scalability-2009-02-19-Heavy upload server scalability

Introduction: Hi, We are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS v3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure? Hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!

2 0.24438563 98 high scalability-2007-09-18-Sync data on all servers

Introduction: I have a few Apache servers (around 11 at the moment) serving a small amount of data (around 44 GB right now). For some time I have been using rsync to keep all the content equal on all servers, but the amount of data has been growing, rsync takes far too much time to "compare" all data from source to destination, and it creates a lot of I/O. I have been taking a look at MogileFS; it seems a good and reliable option, but as the FUSE module is not finished, we would have to rewrite all our apps, and that's not an option at the moment. Any ideas? I just want a "real time, non resource-hungry" alternative to rsync. If I get more features on the way, then they are welcome :) Why do I prefer to use a distributed file system instead of NAS + NFS? - I need 2 NAS boxes if I don't want a point of failure, and NAS hardware is expensive. - Non-shared hardware; every server has its own local disks. - As files are replicated, I can save a lot of money; RAID is not a MUST. Thn
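One alternative to the periodic rsync described above is to replicate each file at write time, so nothing ever has to re-scan 44 GB. A minimal sketch, assuming peers accept plain HTTP PUTs at hypothetical internal URLs:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/**
 * Sketch of push-on-write replication as an rsync alternative: when a
 * file is written locally, PUT it to every peer immediately instead of
 * letting a periodic rsync re-compare the whole tree. Peer URLs are
 * hypothetical placeholders.
 */
public class WriteTimeReplicator {
    private static final String[] PEERS = {
        "http://web02.internal:8080/files", "http://web03.internal:8080/files"
    };
    private final HttpClient client = HttpClient.newHttpClient();

    public void replicate(Path localFile, String relativePath)
            throws IOException, InterruptedException {
        for (String peer : PEERS) {
            HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create(peer + "/" + relativePath))
                .PUT(HttpRequest.BodyPublishers.ofFile(localFile))
                .build();
            HttpResponse<Void> resp =
                client.send(put, HttpResponse.BodyHandlers.discarding());
            if (resp.statusCode() >= 300) {
                // A real system would queue and retry; the sketch just reports.
                throw new IOException("replication to " + peer + " failed: " + resp.statusCode());
            }
        }
    }
}
```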

3 0.13358539 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

Introduction: I've been trying to find a high-availability file storage solution without success. I tried GlusterFS, which looks very promising, but experienced problems with stability, and I don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: two NFS servers, a primary and a warm backup. The primary server will be rsynced with the warm backup every minute or two. I can do it so frequently as a PHP script will know which directories have changed recently from a database and only rsync those. Both servers will be NFS mounted on a cluster of web servers as /mnt/nfs-primary (symlinked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers' availability every couple of seconds, and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn
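The symlink flip that the Ucarp up script performs can be made atomic, so web servers never observe a missing /home/websites. The original setup would drive this from a shell script; here is the same idea as a Java NIO.2 sketch (paths as in the post, everything else illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/**
 * Sketch of the failover step: repoint /home/websites from the primary
 * NFS mount to the backup atomically, so readers never see a dangling
 * link. Build the new symlink under a temporary name, then rename it
 * over the old one.
 */
public class SymlinkFailover {
    public static void pointTo(String target) throws IOException {
        Path link = Paths.get("/home/websites");
        Path tmp = Paths.get("/home/.websites-new");
        Files.deleteIfExists(tmp);
        Files.createSymbolicLink(tmp, Paths.get(target));
        // rename(2) over an existing name is atomic on POSIX filesystems.
        Files.move(tmp, link,
                StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        pointTo("/mnt/nfs-backup"); // invoked when Ucarp detects the primary is down
    }
}
```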

4 0.13301542 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System

Introduction: Update: Parascale's CTO on what's different about Parascale. Let's say you have gigglebytes of data to store and you aren't sure you want to use a CDN. Amazon's S3 doesn't excite you. And you aren't quite ready to join the grid nation. You want to keep it all in house. Wouldn't it be nice to have something like the Google File System you could use to create a unified file system out of all the disks sitting on all your nodes? According to Robin Harris, a.k.a. StorageMojo (a great blog BTW), you can now have your own GFS: Parascale launches Google-like storage software. Parascale calls their software a Virtual Storage Network (VSN). It "aggregates disks across commodity Linux x86 servers to deliver petabyte-scale file storage. With features such as automated, transparent file replication and file migration, Parascale eliminates storage hotspots and delivers massive read/write bandwidth." Why should you care? I don't know about you, but the "storage problem" is one

5 0.13215801 310 high scalability-2008-04-29-High performance file server

Introduction: We have a bunch of applications which run on Debian servers and process a huge amount of data stored on a shared NFS drive. We have 3 applications working as a pipeline over the data on the NFS drive: the first application processes the data and stores the output in some folder on the NFS drive, the second app in the pipeline processes the data from the previous step, and so on. The data load into the pipeline is about 1 GB per minute. I think the NFS drive is the bottleneck here. Would buying a specialized file server improve the performance of data reads and writes to disk?
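Before buying a specialized file server, one cheap change worth trying, assuming each pipeline stage has some local disk to spare, is to stage output locally and publish finished files to the NFS drive with one streamed copy plus an atomic rename, so downstream stages never read half-written files and many small synchronous NFS writes collapse into one large sequential one. A sketch with hypothetical paths:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Sketch for the pipeline: each stage writes its output to local disk,
 * then publishes to the shared NFS drive via a temp file and an atomic
 * rename *within* the NFS filesystem (cross-filesystem renames are not
 * atomic, hence the copy step first).
 */
public class NfsPublisher {
    public static void publish(Path localResult, Path nfsDir, String name)
            throws IOException {
        Path tmp = nfsDir.resolve(name + ".part"); // temp name on the same NFS fs
        Path done = nfsDir.resolve(name);
        Files.copy(localResult, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE); // rename within NFS
    }
}
```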

6 0.12572663 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service

7 0.12535483 683 high scalability-2009-08-18-Hardware Architecture Example (geographical level mapping of servers)

8 0.11938266 262 high scalability-2008-02-26-Architecture to Allow High Availability File Upload

9 0.11880652 283 high scalability-2008-03-18-Shared filesystem on EC2

10 0.11180025 399 high scalability-2008-10-01-Joyent - Cloud Computing Built on Accelerators

11 0.11041577 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon

12 0.10588711 620 high scalability-2009-06-05-SSL RPC API Scalability

13 0.10409953 42 high scalability-2007-07-30-Product: GridLayer. Utility computing for online application

14 0.09856084 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication

15 0.096746318 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.

16 0.093517467 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability

17 0.092411645 313 high scalability-2008-05-02-Friends for Sale Architecture - A 300 Million Page View-Month Facebook RoR App

18 0.091846168 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?

19 0.091013581 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)

20 0.089463219 1126 high scalability-2011-09-27-Use Instance Caches to Save Money: Latency == $$$


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.109), (1, 0.05), (2, -0.036), (3, -0.114), (4, -0.01), (5, -0.033), (6, 0.074), (7, -0.064), (8, 0.006), (9, 0.043), (10, -0.009), (11, -0.03), (12, 0.034), (13, -0.043), (14, 0.027), (15, 0.046), (16, -0.002), (17, 0.046), (18, -0.044), (19, 0.059), (20, 0.018), (21, 0.027), (22, -0.031), (23, -0.024), (24, 0.055), (25, -0.018), (26, 0.052), (27, -0.033), (28, -0.069), (29, 0.016), (30, 0.017), (31, -0.03), (32, -0.021), (33, 0.017), (34, -0.045), (35, 0.047), (36, 0.052), (37, -0.057), (38, 0.013), (39, -0.052), (40, -0.027), (41, -0.03), (42, -0.026), (43, -0.013), (44, -0.017), (45, 0.03), (46, 0.038), (47, 0.011), (48, 0.048), (49, -0.019)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95826846 516 high scalability-2009-02-19-Heavy upload server scalability

Introduction: Hi, We are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS v3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure? Hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!

2 0.78663868 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.

Introduction: I am planning the scaling of a hosted service, similar to Typepad etc., and would appreciate feedback on my plan so far. Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer. So, I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey), filename, ownerid. For adding the replication and scalability, I would add a few more columns: serveroneid, servertwoid, serverthreeid, s3. At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cro
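The hourly pass described above, which fills in the extra replica columns, might look roughly like this JDBC sketch; the table and column names follow the post's simplified schema, and the actual copy step is stubbed out:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Sketch of the hourly replication sweep the post describes: find files
 * whose second-copy column is still empty, copy them to another server,
 * and record where the replica landed.
 */
public class ReplicationSweep {
    public void run(Connection db) throws SQLException {
        String pick = "SELECT fileid, filename, serveroneid FROM files WHERE servertwoid IS NULL";
        String mark = "UPDATE files SET servertwoid = ? WHERE fileid = ?";
        try (PreparedStatement sel = db.prepareStatement(pick);
             ResultSet rs = sel.executeQuery();
             PreparedStatement upd = db.prepareStatement(mark)) {
            while (rs.next()) {
                long fileId = rs.getLong("fileid");
                int source = rs.getInt("serveroneid");
                int target = chooseTargetServer(source);
                copyBetweenServers(source, target, rs.getString("filename"));
                upd.setInt(1, target);   // record the replica location
                upd.setLong(2, fileId);
                upd.executeUpdate();
            }
        }
    }

    private int chooseTargetServer(int excluding) { return excluding == 1 ? 2 : 1; } // placeholder policy
    private void copyBetweenServers(int from, int to, String file) { /* scp or HTTP PUT, omitted */ }
}
```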

3 0.7741729 283 high scalability-2008-03-18-Shared filesystem on EC2

Introduction: Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using GlusterFS to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system. One node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we backup files to S3 from this machine). The files

4 0.71437895 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

Introduction: I've been trying to find a high-availability file storage solution without success. I tried GlusterFS, which looks very promising, but experienced problems with stability, and I don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: two NFS servers, a primary and a warm backup. The primary server will be rsynced with the warm backup every minute or two. I can do it so frequently as a PHP script will know which directories have changed recently from a database and only rsync those. Both servers will be NFS mounted on a cluster of web servers as /mnt/nfs-primary (symlinked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers' availability every couple of seconds, and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn

5 0.71055514 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing

Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations at each location and wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers and have the application pick a random server to serve the file, as a temporary fix... It's a small video streaming service. I want to spoonfeed the stream to the client with a max bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seek. It's pseudo-streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution as it supports lighttpd and Apache (with mod_rewrite or similar). However, I don't see how I can make MogileFS take bandwidth % usage into account. Any recommendations for how I can solve this?
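Instead of a uniform random pick, the server choice can be weighted by each location's remaining bandwidth budget, so nearly saturated locations receive fewer new streams. A sketch, with the per-mirror budget numbers assumed to come from whatever monitoring is available:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch of bandwidth-aware server selection: pick a mirror with
 * probability proportional to its remaining bandwidth headroom.
 * The capacity/usage figures are hypothetical monitoring inputs.
 */
public class BandwidthBalancer {
    public record Mirror(String url, double capacityMbps, double usedMbps) {
        double headroom() { return Math.max(0, capacityMbps - usedMbps); }
    }

    public static Mirror pick(List<Mirror> mirrors) {
        double total = mirrors.stream().mapToDouble(Mirror::headroom).sum();
        if (total <= 0) { // everything saturated: fall back to a uniform pick
            return mirrors.get(ThreadLocalRandom.current().nextInt(mirrors.size()));
        }
        double r = ThreadLocalRandom.current().nextDouble(total);
        for (Mirror m : mirrors) {
            r -= m.headroom();
            if (r < 0) return m;
        }
        return mirrors.get(mirrors.size() - 1); // floating-point edge case
    }
}
```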

6 0.70205384 620 high scalability-2009-06-05-SSL RPC API Scalability

7 0.68858814 72 high scalability-2007-08-22-Wikimedia architecture

8 0.68708611 98 high scalability-2007-09-18-Sync data on all servers

9 0.67212236 118 high scalability-2007-10-09-High Load on production Webservers after Sourcecode sync

10 0.67080516 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon

11 0.66626567 140 high scalability-2007-11-02-How WordPress.com Tracks 300 Servers Handling 10 Million Pageviews

12 0.65746802 81 high scalability-2007-09-06-Scaling IMAP and POP3

13 0.65492278 111 high scalability-2007-10-04-Number of load balanced servers

14 0.65177041 157 high scalability-2007-11-16-Product: lbpool - Load Balancing JDBC Pool

15 0.64884943 566 high scalability-2009-04-13-High Performance Web Pages – Real World Examples: Netflix Case Study

16 0.64685231 1593 high scalability-2014-02-10-13 Simple Tricks for Scaling Python and Django with Apache from HackerEarth

17 0.64170921 1260 high scalability-2012-06-07-Case Study on Scaling PaaS infrastructure

18 0.63940287 251 high scalability-2008-02-18-How to deal with an I-O bottleneck to disk?

19 0.63789934 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way

20 0.63723725 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.049), (2, 0.303), (20, 0.207), (30, 0.086), (61, 0.173), (79, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92137355 516 high scalability-2009-02-19-Heavy upload server scalability

Introduction: Hi, We are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS v3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure? Hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!

2 0.85271966 1566 high scalability-2013-12-18-How to get started with sizing and capacity planning, assuming you don't know the software behavior?

Introduction: Here's a common situation and question from the mechanical-sympathy Google group by Avinash Agrawal on the black art of capacity planning: How do you get started with sizing and capacity planning, assuming you don't know the software's behavior and it's a completely new product? Gil Tene, Vice President of Technology and CTO & Co-Founder, wrote a very understandable and useful answer that is worth highlighting: Start with requirements. I see way too many "capacity planning" exercises that go off spending weeks measuring some irrelevant metrics about a system (like how many widgets per hour this thing can do) without knowing what they actually need it to do. There are two key sets of metrics to state here: the "how much" set and the "how bad" set. In the "How Much" part, you need to establish, based on expected business needs, numbers for things (like connections, users, streams, transactions or messages per second) that you expect to interact with at the peak t
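The "how much" and "how bad" numbers turn into a sizing estimate with simple arithmetic; for instance, Little's law (concurrency = arrival rate × time in system) converts a peak request rate and a latency bound into a concurrency target. A worked sketch with made-up figures, not numbers from the post:

```java
/**
 * Worked sketch of turning "how much" and "how bad" requirements into a
 * sizing number via Little's law. All figures are illustrative.
 */
public class CapacitySketch {
    public static void main(String[] args) {
        double peakRequestsPerSec = 2_000; // "how much": expected peak arrival rate
        double p99LatencySec = 0.25;       // "how bad": worst acceptable time in system
        double concurrent = peakRequestsPerSec * p99LatencySec; // ~500 requests in flight
        int perServer = 100;               // measured safe concurrency per node
        int servers = (int) Math.ceil(concurrent / perServer);
        System.out.printf("~%.0f concurrent requests -> %d servers (plus headroom)%n",
                concurrent, servers);
    }
}
```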

3 0.83963615 714 high scalability-2009-10-02-HighScalability has Moved to Squarespace.com!

Introduction: You may have noticed something is a little different when visiting HighScalability today: We've Moved! HighScalability.com has switched hosting services to Squarespace.com. House-warming gifts are completely unnecessary. Thanks for the thought though. It's been a long long long process. Importing a largish Drupal site to Wordpress and then into Squarespace is a bit like dental work without the happy juice, but the results are worth it. While the site is missing a few features, I think it looks nicer, feels faster, and I'm betting it will be more scalable and more reliable. All good things. I'll explain more about the move later in this post, but there's some administrivia that needs to be handled to make the move complete: If you have posted on HighScalability before then you have a user account, but since I don't know your passwords I had to make new passwords up for you. So please contact me and I'll give you your password so you can log in and change it.

4 0.83605975 685 high scalability-2009-08-20-Dependency Injection and AOP frameworks for .NET

Introduction: We're looking to implement a framework for Dependency Injection and AOP for a new solution we're working on. It will likely get hit pretty hard, so we'd like to choose a framework that's proven to scale well and operates well under pressure. Right now we're looking closely at Spring.NET, Castle Project's Windsor framework, and Unity. Does anyone have any feedback on implementing any of these in large, high-traffic environments?

5 0.82181865 772 high scalability-2010-02-05-High Availability Principle : Concurrency Control

Introduction: One important high-availability principle is concurrency control. The idea is to allow through only as much traffic as your system can handle successfully. For example: if your system is certified to handle a concurrency of 100, then the 101st request should either time out, be asked to try later, or wait until one of the previous 100 requests finishes. The 101st request should not be allowed to negatively impact the experience of the other 100 users. Only the 101st request should be impacted. Read more here...
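In Java this admission-control idea maps naturally onto a counting semaphore: the 101st request waits briefly, then is turned away, and never degrades the 100 already in flight. A minimal sketch; the limit, timeout, and error mapping are illustrative choices, not from the post:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Minimal sketch of the concurrency-control principle: at most `limit`
 * requests run at once; an over-limit request waits briefly for a
 * permit and is then rejected rather than degrading everyone else.
 */
public class AdmissionGate {
    private final Semaphore permits;

    public AdmissionGate(int limit) {
        this.permits = new Semaphore(limit, true); // fair, so waiters are FIFO
    }

    public <T> T call(Supplier<T> work, long waitMillis) throws InterruptedException {
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            // Maps naturally to HTTP 503 "try again later".
            throw new IllegalStateException("server busy, try again later");
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }
}
```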

6 0.82038379 1135 high scalability-2011-10-31-15 Ways to Make Your Application Feel More Responsive under Google App Engine

7 0.81985962 342 high scalability-2008-06-08-Search fast in million rows

8 0.81941634 23 high scalability-2007-07-24-Major Websites Down: Or Why You Want to Run in Two or More Data Centers.

9 0.81915671 745 high scalability-2009-11-25-Brian Aker's Hilarious NoSQL Stand Up Routine

10 0.81752372 961 high scalability-2010-12-21-SQL + NoSQL = Yes !

11 0.81590688 825 high scalability-2010-05-10-Sify.com Architecture - A Portal at 3900 Requests Per Second

12 0.81486923 917 high scalability-2010-10-08-4 Scalability Themes from Surgecon

13 0.81328362 252 high scalability-2008-02-18-limit on the number of databases open

14 0.81219655 703 high scalability-2009-09-12-How Google Taught Me to Cache and Cash-In

15 0.80978072 1074 high scalability-2011-07-06-11 Common Web Use Cases Solved in Redis

16 0.80716431 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things

17 0.80660462 1337 high scalability-2012-10-10-Antirez: You Need to Think in Terms of Organizing Your Data for Fetching

18 0.80651134 1138 high scalability-2011-11-07-10 Core Architecture Pattern Variations for Achieving Scalability

19 0.80638188 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

20 0.80443454 1615 high scalability-2014-03-19-Strategy: Three Techniques to Survive Traffic Surges by Quickly Scaling Your Site