high_scalability high_scalability-2007 high_scalability-2007-98 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I have a few Apache servers (around 11 at the moment) serving a small amount of data (around 44 GB right now). For some time I have been using rsync to keep the content identical on all servers, but the amount of data has been growing, and rsync takes too much time to "compare" all the data from source to destination and creates a lot of I/O. I have been taking a look at MogileFS; it seems a good and reliable option, but as the FUSE module is not finished, we would have to rewrite all our apps, and that's not an option at the moment. Any ideas? I just want a "real-time, non-resource-hungry" alternative to rsync. If I get more features along the way, they are welcome :) Why do I prefer a distributed file system over NAS + NFS? - I would need 2 NAS boxes if I don't want a single point of failure, and NAS hardware is expensive. - Non-shared hardware: every server has its own local disks. - As files are replicated, I can save a lot of money; RAID is not a must. Thn
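One way to avoid rsync's full-tree comparison on every run (a minimal sketch, not from the original question; the helper names are made up) is to keep a manifest of each file's size and mtime, and transfer only the entries whose signature changed since the last sync:

```python
import os

def build_manifest(root):
    """Walk the tree once and record a (size, mtime) signature per relative path."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            manifest[rel] = (st.st_size, int(st.st_mtime))
    return manifest

def changed_files(old, new):
    """Paths that are new or modified since the previous manifest."""
    return [path for path, sig in new.items() if old.get(path) != sig]
```

Each server would push only `changed_files(previous, current)` to its peers, so the per-sync cost is one local tree walk plus the changed bytes, rather than a full source-to-destination comparison. (Deletions would need a separate pass over paths present in `old` but missing from `new`.)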
sentIndex sentText sentNum sentScore
1 I have a few apache servers ( arround 11 atm ) serving a small amount of data ( arround 44 gigs right now ). [sent-1, score-1.625]
2 For some time I have been using rsync to keep all the content equal on all servers, but the amount of data has been growing, and rsync takes a few too much time to "compare" all data from source to destination, and create a lot of I/O. [sent-2, score-1.217]
3 I have been taking a look at MogileFS, it seems a good and reliable option, but as the fuse module is not finished, we should have to rewrite all our apps, and its not an option atm. [sent-3, score-0.767]
4 I just want a "real time, non resource-hungry" solution alternative for rsync. [sent-5, score-0.263]
5 If I get more features on the way, then they are welcome :) Why I prefer to use a Distributed File System instead of using NAS + NFS? [sent-6, score-0.29]
6 - I need 2 NAS, if I dont want a point of failure, and NAS hard is expensive. [sent-7, score-0.259]
7 - Non-shared hardware, all server has their own local disks. [sent-8, score-0.064]
8 - As files are replicated, I can save a lot of money, RAID is not a MUST. [sent-9, score-0.191]
wordName wordTfidf (topN-words)
[('arround', 0.446), ('nas', 0.444), ('rsync', 0.307), ('fuse', 0.181), ('atm', 0.164), ('mogilefs', 0.164), ('option', 0.162), ('english', 0.157), ('gigs', 0.157), ('dont', 0.148), ('destination', 0.138), ('finished', 0.138), ('sorry', 0.136), ('welcome', 0.131), ('nfs', 0.122), ('advance', 0.119), ('amount', 0.118), ('module', 0.116), ('non', 0.11), ('rewrite', 0.11), ('prefer', 0.11), ('equal', 0.109), ('compare', 0.098), ('raid', 0.095), ('alternative', 0.087), ('replicated', 0.084), ('serving', 0.077), ('reliable', 0.07), ('apache', 0.069), ('want', 0.066), ('money', 0.066), ('files', 0.066), ('taking', 0.066), ('ideas', 0.065), ('local', 0.064), ('save', 0.064), ('growing', 0.063), ('failure', 0.063), ('seems', 0.062), ('time', 0.061), ('lot', 0.061), ('apps', 0.059), ('servers', 0.058), ('content', 0.053), ('file', 0.05), ('takes', 0.05), ('instead', 0.049), ('hard', 0.045), ('data', 0.045), ('right', 0.045)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 98 high scalability-2007-09-18-Sync data on all servers
2 0.24438563 516 high scalability-2009-02-19-Heavy upload server scalability
Introduction: Hi, we are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently handle about 10 GB of data per night, via HTTP PUT requests (one per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL 4/5), and a NAS (NFS 3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure, hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!
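The NAS sharding mentioned above can be sketched by hashing each file id to one of several mounts (the mount paths and the simple modulo scheme are assumptions; a consistent-hash ring would reduce data movement when a shard is added):

```python
import hashlib

def shard_for(file_id, shards):
    """Map a file id to one NAS mount by a stable hash, so every
    server agrees where a given file lives without a lookup table."""
    digest = hashlib.md5(file_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Hypothetical mount points for three NAS shards.
shards = ["/mnt/nas0", "/mnt/nas1", "/mnt/nas2"]
```

Because the hash is deterministic, all five Tomcat servers route the same PUT to the same shard; the trade-off is that changing `len(shards)` remaps most keys, which is why consistent hashing is usually preferred once shards are added regularly.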
3 0.15449926 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
Introduction: I've been trying to find a high-availability file storage solution without success. I tried GlusterFS, which looks very promising, but experienced problems with stability, and I don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: two NFS servers, a primary and a warm backup. The primary server will be rsynced to the warm backup every minute or two. I can do it that frequently because a PHP script will know from a database which directories have changed recently and only rsync those. Both servers will be NFS-mounted on a cluster of web servers as /mnt/nfs-primary (symlinked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers' availability every couple of seconds, and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn
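The symlink switch that the Ucarp up script would perform can be sketched in Python (paths are hypothetical; the key point is that rename(2) is atomic on POSIX, so web servers never observe a missing /home/websites link mid-switch):

```python
import contextlib
import os

def repoint(link_path, new_target):
    """Atomically repoint a symlink: build a temporary link next to it,
    then rename over the old one in a single atomic step."""
    tmp = link_path + ".tmp"
    with contextlib.suppress(FileNotFoundError):
        os.remove(tmp)            # clear any leftover from a crashed run
    os.symlink(new_target, tmp)   # create the replacement link
    os.replace(tmp, link_path)    # atomic on POSIX: readers see old or new, never neither
```

A failover script would call `repoint("/home/websites", "/mnt/nfs-backup")` on each web server when Ucarp detects the primary is down.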
4 0.15359208 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System
Introduction: Update: Parascale's CTO on what's different about Parascale. Let's say you have gigglebytes of data to store and you aren't sure you want to use a CDN. Amazon's S3 doesn't excite you. And you aren't quite ready to join the grid nation. You want to keep it all in house. Wouldn't it be nice to have something like the Google File System you could use to create a unified file system out of all the disks sitting on all your nodes? According to Robin Harris, a.k.a. StorageMojo (a great blog, BTW), you can now have your own GFS: Parascale launches Google-like storage software. Parascale calls their software a Virtual Storage Network (VSN). It "aggregates disks across commodity Linux x86 servers to deliver petabyte-scale file storage. With features such as automated, transparent file replication and file migration, Parascale eliminates storage hotspots and delivers massive read/write bandwidth." Why should you care? I don't know about you, but the "storage problem" is one
5 0.14049056 488 high scalability-2009-01-08-file synchronization solutions
Introduction: I have two servers connected via the Internet (NOT ON THE SAME LAN) serving the same website (http://www.ourexample.com). The problem is that files uploaded to serverA and serverB cannot see each other immediately, so rsync at fixed intervals is not a good solution. Can anybody give me some advice on the following options? 1. NFS over the Internet for file sharing 2. sshfs 3. inotify (our system's kernel does not support this and we do not want to risk upgrading our kernel either) 4. DRBD in active-active mode 5. or any other solution. Any suggestions will be welcomed. Thank you in advance.
6 0.13149229 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
7 0.1314341 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication
8 0.12312539 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
9 0.10111936 42 high scalability-2007-07-30-Product: GridLayer. Utility computing for online application
10 0.098445348 310 high scalability-2008-04-29-High performance file server
11 0.089768454 1147 high scalability-2011-11-25-Stuff The Internet Says On Scalability For November 25, 2011
12 0.086479135 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
13 0.08525829 53 high scalability-2007-08-01-Product: MogileFS
14 0.084885135 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
15 0.084722444 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
16 0.081430107 67 high scalability-2007-08-17-What is the best hosting option?
17 0.078159958 690 high scalability-2009-08-31-Scaling MySQL on Amazon Web Services
18 0.072502188 143 high scalability-2007-11-06-Product: ChironFS
19 0.069830991 776 high scalability-2010-02-12-Hot Scalability Links for February 12, 2010
20 0.063514248 283 high scalability-2008-03-18-Shared filesystem on EC2
topicId topicWeight
[(0, 0.1), (1, 0.044), (2, -0.013), (3, -0.053), (4, -0.014), (5, -0.006), (6, 0.026), (7, -0.032), (8, 0.023), (9, 0.011), (10, -0.016), (11, -0.041), (12, -0.013), (13, -0.028), (14, 0.062), (15, 0.032), (16, -0.005), (17, 0.039), (18, -0.056), (19, 0.001), (20, 0.013), (21, 0.009), (22, -0.011), (23, 0.042), (24, 0.025), (25, 0.002), (26, 0.091), (27, -0.032), (28, -0.089), (29, 0.007), (30, -0.044), (31, 0.003), (32, -0.005), (33, 0.001), (34, -0.04), (35, 0.016), (36, 0.006), (37, -0.016), (38, 0.017), (39, -0.059), (40, -0.015), (41, -0.061), (42, -0.058), (43, -0.005), (44, -0.008), (45, -0.021), (46, 0.005), (47, 0.016), (48, 0.046), (49, -0.016)]
simIndex simValue blogId blogTitle
same-blog 1 0.9158904 98 high scalability-2007-09-18-Sync data on all servers
2 0.83859468 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
Introduction: I am planning the scaling of a hosted service, similar to Typepad etc., and would appreciate feedback on my plan so far. Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer. So I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey) filename ownerid For adding the replication and scalability, I would add a few more columns: serveroneid servertwoid serverthreeid s3 At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cro
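The scheme above can be sketched with SQLite (the column names come from the post; everything else, including the helper names, is a hypothetical sketch of how the upload path and the hourly replication job could query the table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE files (
    fileid        INTEGER PRIMARY KEY,
    filename      TEXT,
    ownerid       INTEGER,
    serveroneid   INTEGER,            -- server that received the upload
    servertwoid   INTEGER,            -- filled in by the replication job
    serverthreeid INTEGER,
    s3            INTEGER DEFAULT 0)""")

def record_upload(filename, ownerid, upload_server):
    """Register a fresh upload and the server it landed on."""
    cur = conn.execute(
        "INSERT INTO files (filename, ownerid, serveroneid) VALUES (?, ?, ?)",
        (filename, ownerid, upload_server))
    return cur.lastrowid

def unreplicated():
    """Files the hourly job still needs to copy to a second server."""
    return conn.execute(
        "SELECT fileid, filename, serveroneid FROM files "
        "WHERE servertwoid IS NULL").fetchall()
```

The hourly job would iterate over `unreplicated()`, copy each file from its `serveroneid` server to a second one, and set `servertwoid`, so the table itself records how many replicas exist.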
3 0.79779607 283 high scalability-2008-03-18-Shared filesystem on EC2
Introduction: Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using GlusterFS to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system: one node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure it to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we back up files to s3 from this machine). The files
4 0.75911498 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations on each location, and wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers, and have the application pick a random server to serve the file as a temporary fix... It's a small video-streaming service. I want to spoon-feed the stream to the client with a max bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seeking. It's pseudo-streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution as it supports lighttpd and Apache (with mod_rewrite or similar). However, I don't see how I can apply MogileFS to check for bandwidth % usage? Any recommendations for how I can solve this?
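Instead of picking a random server, the application could weight the choice by each mirror's remaining bandwidth (a hypothetical sketch; the host names, caps, and usage numbers are made up, and in practice the usage figures would come from the servers' own reporting):

```python
import random

def pick_server(servers):
    """Choose a mirror with probability proportional to its remaining
    bandwidth (cap minus current usage), so loaded mirrors get fewer streams."""
    weights = [max(cap - used, 0) for _host, cap, used in servers]
    if sum(weights) == 0:
        raise RuntimeError("all mirrors saturated")
    hosts = [host for host, _cap, _used in servers]
    return random.choices(hosts, weights=weights)[0]

# (host, bandwidth cap in Mbit/s, current usage in Mbit/s) -- numbers are made up
mirrors = [("eu1", 100, 90), ("us1", 100, 20), ("as1", 100, 50)]
```

With these numbers `us1` (80 Mbit/s free) is picked about eight times as often as `eu1` (10 Mbit/s free), which evens out the per-location load without any coordination between mirrors.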
5 0.75103086 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System
6 0.74202174 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
7 0.71162093 143 high scalability-2007-11-06-Product: ChironFS
8 0.71053487 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
9 0.69155395 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
10 0.68780327 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
11 0.68643409 278 high scalability-2008-03-16-Product: GlusterFS
12 0.68316406 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
13 0.68255085 516 high scalability-2009-02-19-Heavy upload server scalability
14 0.66378677 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)
16 0.65352315 488 high scalability-2009-01-08-file synchronization solutions
17 0.6522373 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
18 0.63522077 1442 high scalability-2013-04-17-Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS
19 0.63268 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
20 0.62154335 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon
topicId topicWeight
[(1, 0.09), (2, 0.215), (51, 0.357), (61, 0.074), (79, 0.064), (94, 0.071)]
simIndex simValue blogId blogTitle
1 0.86639112 1168 high scalability-2012-01-04-How Facebook Handled the New Year's Eve Onslaught
Introduction: How does Facebook handle the massive New Year's Eve traffic spike? Thanks to Mike Swift, in Facebook gets ready for New Year's Eve, we get a little insight into their method for the madness; nothing really detailed, but still interesting. Problem Setup Facebook expects that one billion+ photos will be shared on New Year's Eve. Facebook's 800 million users are scattered around the world; three quarters live outside the US. Each user is linked to an average of 130 friends. Photos and posts must appear in less than a second. Opening a homepage requires executing requests on 100 different servers, and those requests have to be ranked, sorted, and privacy-checked, and then rendered. Different events put different stresses on different parts of Facebook. Photo and Video Uploads - Holidays require hundreds of terabytes of capacity News Feed - News events like big sports events and the death of Steve Jobs drive user status updates Coping Strategies Try
2 0.85303873 818 high scalability-2010-04-30-Behind the scenes of an online marketplace
Introduction: In a presentation originally held at the 4th O2 Hosting Event in Hamburg, I spoke about the technology behind a large online marketplace in Germany called Hitmeister. Some of the topics discussed include: what makes up a marketplace, technically; system principles; development patterns; tools; philosophy; data model; hardware. I am looking forward to comments and suggestions for both the presentation and our work.
same-blog 3 0.81684095 98 high scalability-2007-09-18-Sync data on all servers
4 0.76566935 481 high scalability-2009-01-02-Strategy: Understanding Your Data Leads to the Best Scalability Solutions
Introduction: In the article Building Super-Scalable Web Systems with REST, Udi Dahan tells an interesting story of how they made a weather-reporting system scale to over 10 million users. So many users hitting their weather database didn't scale. Caching in a straightforward way wouldn't work because weather is obviously local. Caching all local reports would bring the entire database into memory, which would work for some companies but wasn't cost-efficient for them. So in typical REST fashion they turned locations into URIs, for example: http://weather.myclient.com/UK/London. This allows the weather information to be cached by intermediaries instead of hitting their servers. Hopefully for each location their servers will be hit a few times and then the caches will be hit until expiry. In order to send users directly to the correct location, an IP location check is performed on login and stored in a cookie. The lookup is done once, and from then on a GET is performed directly on the r
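The location-as-URI idea can be sketched in a few lines (the hostname comes from the article's example; the helper names and the 30-minute TTL are assumptions, not details from the source):

```python
def weather_uri(country, city):
    """Turn a location into a stable resource path: every user asking about
    the same place issues a GET on the same URI, so shared caches can serve it."""
    return "http://weather.myclient.com/%s/%s" % (country, city)

def cache_headers(max_age=1800):
    """Response headers letting intermediaries keep a report for 30 minutes
    (a made-up TTL) before the next request falls through to the origin."""
    return {"Cache-Control": "public, max-age=%d" % max_age}
```

The scalability comes from the URI being identical across users: the origin is hit roughly once per location per TTL, no matter how many of the 10 million users ask about London.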
5 0.76331592 510 high scalability-2009-02-09-Paper: Consensus Protocols: Two-Phase Commit
Introduction: Henry Robinson has created an excellent series of articles on consensus protocols. Henry starts with a very useful discussion of what all this talk about consensus really means: the consensus problem is the problem of getting a set of nodes in a distributed system to agree on something - it might be a value, a course of action or a decision. Achieving consensus allows a distributed system to act as a single entity, with every individual node aware of and in agreement with the actions of the whole of the network. In this article Henry tackles Two-Phase Commit, the protocol most databases use to arrive at a consensus for database writes. The article is very well written, with lots of pretty and informative pictures. He did a really good job. In conclusion we learn that 2PC is very efficient: a minimal number of messages are exchanged and latency is low. The problem is that when a coordinator fails, availability is dramatically reduced. This is why 2PC isn't generally used on highly distributed
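The two phases Henry describes can be sketched as an in-memory toy (this is not a real distributed implementation; the class and method names are made up, and real 2PC also needs durable logging and timeout handling):

```python
class Node:
    """A participant that first votes on, then applies, a proposed value."""
    def __init__(self, name):
        self.name = name
        self.committed = None

    def prepare(self, value):
        # Phase 1: vote yes unless we cannot accept the value.
        return value is not None

    def commit(self, value):
        # Phase 2: apply the value once the coordinator says so.
        self.committed = value

    def abort(self):
        self.committed = None

def two_phase_commit(nodes, value):
    """Coordinator: commit only if every participant voted yes in phase 1;
    a single 'no' vote aborts the transaction everywhere."""
    if all(node.prepare(value) for node in nodes):
        for node in nodes:
            node.commit(value)
        return True
    for node in nodes:
        node.abort()
    return False
```

The sketch also makes the availability problem visible: if the coordinator dies between the two loops, participants that voted yes are left blocked, holding their vote without knowing the outcome.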
6 0.76188213 1644 high scalability-2014-05-07-Update on Disqus: It's Still About Realtime, But Go Demolishes Python
7 0.75426555 741 high scalability-2009-11-16-Building Scalable Systems Using Data as a Composite Material
8 0.74603289 1134 high scalability-2011-10-28-Stuff The Internet Says On Scalability For October 28, 2011
10 0.71588176 1629 high scalability-2014-04-10-Paper: Scalable Atomic Visibility with RAMP Transactions - Scale Linearly to 100 Servers
12 0.69464028 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
13 0.69036239 953 high scalability-2010-12-03-GPU vs CPU Smackdown : The Rise of Throughput-Oriented Architectures
14 0.66388839 298 high scalability-2008-04-07-Lazy web sites run faster
16 0.64271426 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
17 0.63228476 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
18 0.62395412 332 high scalability-2008-05-28-Job queue and search engine
19 0.62242723 529 high scalability-2009-03-10-Paper: Consensus Protocols: Paxos
20 0.61723518 846 high scalability-2010-06-22-Sponsored Post: Jobs: Etsy, Digg, Huffington Post Event: Velocity Conference