high_scalability high_scalability-2007 high_scalability-2007-98 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I have a few Apache servers (around 11 at the moment) serving a small amount of data (around 44 GB right now). For some time I have been using rsync to keep the content identical on all servers, but the amount of data has been growing, and rsync takes too much time to "compare" all the data from source to destination and creates a lot of I/O. I have been taking a look at MogileFS; it seems a good and reliable option, but as the FUSE module is not finished, we would have to rewrite all our apps, and that's not an option at the moment. Any ideas? I just want a "real-time, non-resource-hungry" alternative to rsync. If I get more features along the way, they are welcome :) Why do I prefer a distributed file system over NAS + NFS? - I would need 2 NAS boxes if I don't want a single point of failure, and NAS hardware is expensive. - Non-shared hardware: every server has its own local disks. - As files are replicated, I can save a lot of money; RAID is not a must. Thn
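One way to avoid rsync's full-tree comparison on every run (a minimal sketch, not from the original question; the helper names are made up) is to keep a manifest of each file's size and mtime, and transfer only the entries whose signature changed since the last sync:

```python
import os

def build_manifest(root):
    """Walk the tree once and record a (size, mtime) signature per relative path."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            manifest[rel] = (st.st_size, int(st.st_mtime))
    return manifest

def changed_files(old, new):
    """Paths that are new or modified since the previous manifest."""
    return [path for path, sig in new.items() if old.get(path) != sig]
```

Each server would push only `changed_files(previous, current)` to its peers, so the per-sync cost is one local tree walk plus the changed bytes, rather than a full source-to-destination comparison. (Deletions would need a separate pass over paths present in `old` but missing from `new`.)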
sentIndex sentText sentNum sentScore
1 I have a few apache servers ( arround 11 atm ) serving a small amount of data ( arround 44 gigs right now ). [sent-1, score-1.625]
2 For some time I have been using rsync to keep all the content equal on all servers, but the amount of data has been growing, and rsync takes a few too much time to "compare" all data from source to destination, and create a lot of I/O. [sent-2, score-1.217]
3 I have been taking a look at MogileFS, it seems a good and reliable option, but as the fuse module is not finished, we should have to rewrite all our apps, and its not an option atm. [sent-3, score-0.767]
4 I just want a "real time, non resource-hungry" solution alternative for rsync. [sent-5, score-0.263]
5 If I get more features on the way, then they are welcome :) Why I prefer to use a Distributed File System instead of using NAS + NFS? [sent-6, score-0.29]
6 - I need 2 NAS, if I dont want a point of failure, and NAS hard is expensive. [sent-7, score-0.259]
7 - Non-shared hardware, all server has their own local disks. [sent-8, score-0.064]
8 - As files are replicated, I can save a lot of money, RAID is not a MUST. [sent-9, score-0.191]
wordName wordTfidf (topN-words)
[('arround', 0.446), ('nas', 0.444), ('rsync', 0.307), ('fuse', 0.181), ('atm', 0.164), ('mogilefs', 0.164), ('option', 0.162), ('english', 0.157), ('gigs', 0.157), ('dont', 0.148), ('destination', 0.138), ('finished', 0.138), ('sorry', 0.136), ('welcome', 0.131), ('nfs', 0.122), ('advance', 0.119), ('amount', 0.118), ('module', 0.116), ('non', 0.11), ('rewrite', 0.11), ('prefer', 0.11), ('equal', 0.109), ('compare', 0.098), ('raid', 0.095), ('alternative', 0.087), ('replicated', 0.084), ('serving', 0.077), ('reliable', 0.07), ('apache', 0.069), ('want', 0.066), ('money', 0.066), ('files', 0.066), ('taking', 0.066), ('ideas', 0.065), ('local', 0.064), ('save', 0.064), ('growing', 0.063), ('failure', 0.063), ('seems', 0.062), ('time', 0.061), ('lot', 0.061), ('apps', 0.059), ('servers', 0.058), ('content', 0.053), ('file', 0.05), ('takes', 0.05), ('instead', 0.049), ('hard', 0.045), ('data', 0.045), ('right', 0.045)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 98 high scalability-2007-09-18-Sync data on all servers
2 0.24438563 516 high scalability-2009-02-19-Heavy upload server scalability
Introduction: Hi, we are running a backup solution that uploads every night the files our clients worked on during the day (Carbonite-like). We currently handle about 10 GB of data per night, via HTTP PUT requests (one per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL 4/5), and a NAS (NFS 3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure, hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!
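The NAS sharding mentioned above can be sketched by hashing each file id to one of several mounts (the mount paths and the simple modulo scheme are assumptions; a consistent-hash ring would reduce data movement when a shard is added):

```python
import hashlib

def shard_for(file_id, shards):
    """Map a file id to one NAS mount by a stable hash, so every
    server agrees where a given file lives without a lookup table."""
    digest = hashlib.md5(file_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Hypothetical mount points for three NAS shards.
shards = ["/mnt/nas0", "/mnt/nas1", "/mnt/nas2"]
```

Because the hash is deterministic, all five Tomcat servers route the same PUT to the same shard; the trade-off is that changing `len(shards)` remaps most keys, which is why consistent hashing is usually preferred once shards are added regularly.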
3 0.15449926 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
Introduction: I've been trying to find a high-availability file storage solution without success. I tried GlusterFS, which looks very promising, but experienced problems with stability, and I don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: two NFS servers, a primary and a warm backup. The primary server will be rsynced to the warm backup every minute or two. I can do it that frequently because a PHP script will know from a database which directories have changed recently and only rsync those. Both servers will be NFS-mounted on a cluster of web servers as /mnt/nfs-primary (symlinked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers' availability every couple of seconds, and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn
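The symlink switch that the Ucarp up script would perform can be sketched in Python (paths are hypothetical; the key point is that rename(2) is atomic on POSIX, so web servers never observe a missing /home/websites link mid-switch):

```python
import contextlib
import os

def repoint(link_path, new_target):
    """Atomically repoint a symlink: build a temporary link next to it,
    then rename over the old one in a single atomic step."""
    tmp = link_path + ".tmp"
    with contextlib.suppress(FileNotFoundError):
        os.remove(tmp)            # clear any leftover from a crashed run
    os.symlink(new_target, tmp)   # create the replacement link
    os.replace(tmp, link_path)    # atomic on POSIX: readers see old or new, never neither
```

A failover script would call `repoint("/home/websites", "/mnt/nfs-backup")` on each web server when Ucarp detects the primary is down.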
4 0.15359208 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System
Introduction: Update: Parascale's CTO on what's different about Parascale. Let's say you have gigglebytes of data to store and you aren't sure you want to use a CDN. Amazon's S3 doesn't excite you. And you aren't quite ready to join the grid nation. You want to keep it all in house. Wouldn't it be nice to have something like the Google File System you could use to create a unified file system out of all the disks sitting on all your nodes? According to Robin Harris, a.k.a. StorageMojo (a great blog, BTW), you can now have your own GFS: Parascale launches Google-like storage software. Parascale calls their software a Virtual Storage Network (VSN). It "aggregates disks across commodity Linux x86 servers to deliver petabyte-scale file storage. With features such as automated, transparent file replication and file migration, Parascale eliminates storage hotspots and delivers massive read/write bandwidth." Why should you care? I don't know about you, but the "storage problem" is one
5 0.14049056 488 high scalability-2009-01-08-file synchronization solutions
Introduction: I have two servers connected via the Internet (NOT ON THE SAME LAN) serving the same website (http://www.ourexample.com). The problem is that files uploaded to serverA and serverB cannot see each other immediately, so rsync at fixed intervals is not a good solution. Can anybody give me some advice on the following options? 1. NFS over the Internet for file sharing 2. sshfs 3. inotify (our system's kernel does not support this and we do not want to risk upgrading our kernel either) 4. DRBD in active-active mode 5. or any other solution. Any suggestions will be welcomed. Thank you in advance.
6 0.13149229 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
7 0.1314341 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication
8 0.12312539 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
9 0.10111936 42 high scalability-2007-07-30-Product: GridLayer. Utility computing for online application
10 0.098445348 310 high scalability-2008-04-29-High performance file server
11 0.089768454 1147 high scalability-2011-11-25-Stuff The Internet Says On Scalability For November 25, 2011
12 0.086479135 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
13 0.08525829 53 high scalability-2007-08-01-Product: MogileFS
14 0.084885135 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
15 0.084722444 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service
16 0.081430107 67 high scalability-2007-08-17-What is the best hosting option?
17 0.078159958 690 high scalability-2009-08-31-Scaling MySQL on Amazon Web Services
18 0.072502188 143 high scalability-2007-11-06-Product: ChironFS
19 0.069830991 776 high scalability-2010-02-12-Hot Scalability Links for February 12, 2010
20 0.063514248 283 high scalability-2008-03-18-Shared filesystem on EC2
topicId topicWeight
[(0, 0.1), (1, 0.044), (2, -0.013), (3, -0.053), (4, -0.014), (5, -0.006), (6, 0.026), (7, -0.032), (8, 0.023), (9, 0.011), (10, -0.016), (11, -0.041), (12, -0.013), (13, -0.028), (14, 0.062), (15, 0.032), (16, -0.005), (17, 0.039), (18, -0.056), (19, 0.001), (20, 0.013), (21, 0.009), (22, -0.011), (23, 0.042), (24, 0.025), (25, 0.002), (26, 0.091), (27, -0.032), (28, -0.089), (29, 0.007), (30, -0.044), (31, 0.003), (32, -0.005), (33, 0.001), (34, -0.04), (35, 0.016), (36, 0.006), (37, -0.016), (38, 0.017), (39, -0.059), (40, -0.015), (41, -0.061), (42, -0.058), (43, -0.005), (44, -0.008), (45, -0.021), (46, 0.005), (47, 0.016), (48, 0.046), (49, -0.016)]
simIndex simValue blogId blogTitle
same-blog 1 0.9158904 98 high scalability-2007-09-18-Sync data on all servers
2 0.83859468 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.
Introduction: I am planning the scaling of a hosted service, similar to Typepad etc., and would appreciate feedback on my plan so far. Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer. So I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey) filename ownerid For adding the replication and scalability, I would add a few more columns: serveroneid servertwoid serverthreeid s3 At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cro
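The scheme above can be sketched with SQLite (the column names come from the post; everything else, including the helper names, is a hypothetical sketch of how the upload path and the hourly replication job could query the table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE files (
    fileid        INTEGER PRIMARY KEY,
    filename      TEXT,
    ownerid       INTEGER,
    serveroneid   INTEGER,            -- server that received the upload
    servertwoid   INTEGER,            -- filled in by the replication job
    serverthreeid INTEGER,
    s3            INTEGER DEFAULT 0)""")

def record_upload(filename, ownerid, upload_server):
    """Register a fresh upload and the server it landed on."""
    cur = conn.execute(
        "INSERT INTO files (filename, ownerid, serveroneid) VALUES (?, ?, ?)",
        (filename, ownerid, upload_server))
    return cur.lastrowid

def unreplicated():
    """Files the hourly job still needs to copy to a second server."""
    return conn.execute(
        "SELECT fileid, filename, serveroneid FROM files "
        "WHERE servertwoid IS NULL").fetchall()
```

The hourly job would iterate over `unreplicated()`, copy each file from its `serveroneid` server to a second one, and set `servertwoid`, so the table itself records how many replicas exist.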
3 0.79779607 283 high scalability-2008-03-18-Shared filesystem on EC2
Introduction: Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using GlusterFS to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system: one node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure it to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we back up files to s3 from this machine). The files
4 0.75911498 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing
Introduction: I am looking for a way to distribute files over servers in different physical locations. My main concern is that I have bandwidth limitations on each location, and wish to spread the bandwidth load evenly. At the moment I just have 1:1 copies of the files on all servers, and have the application pick a random server to serve the file as a temporary fix... It's a small video-streaming service. I want to spoon-feed the stream to the client with a max bandwidth output, and support seek. At present I use PHP to limit the network stream, and read the file at a given offset sent as a GET parameter from the player for seeking. It's pseudo-streaming, but it works. I have been looking at MogileFS, which would solve the storage part. With MogileFS I can make use of my current PHP solution as it supports lighttpd and Apache (with mod_rewrite or similar). However, I don't see how I can apply MogileFS to check for bandwidth % usage? Any recommendations for how I can solve this?
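Instead of picking a random server, the application could weight the choice by each mirror's remaining bandwidth (a hypothetical sketch; the host names, caps, and usage numbers are made up, and in practice the usage figures would come from the servers' own reporting):

```python
import random

def pick_server(servers):
    """Choose a mirror with probability proportional to its remaining
    bandwidth (cap minus current usage), so loaded mirrors get fewer streams."""
    weights = [max(cap - used, 0) for _host, cap, used in servers]
    if sum(weights) == 0:
        raise RuntimeError("all mirrors saturated")
    hosts = [host for host, _cap, _used in servers]
    return random.choices(hosts, weights=weights)[0]

# (host, bandwidth cap in Mbit/s, current usage in Mbit/s) -- numbers are made up
mirrors = [("eu1", 100, 90), ("us1", 100, 20), ("as1", 100, 50)]
```

With these numbers `us1` (80 Mbit/s free) is picked about eight times as often as `eu1` (10 Mbit/s free), which evens out the per-location load without any coordination between mirrors.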
5 0.75103086 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System
6 0.74202174 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?
7 0.71162093 143 high scalability-2007-11-06-Product: ChironFS
8 0.71053487 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option
9 0.69155395 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files
10 0.68780327 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?
11 0.68643409 278 high scalability-2008-03-16-Product: GlusterFS
12 0.68316406 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
13 0.68255085 516 high scalability-2009-02-19-Heavy upload server scalability
14 0.66378677 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)
16 0.65352315 488 high scalability-2009-01-08-file synchronization solutions
17 0.6522373 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
18 0.63522077 1442 high scalability-2013-04-17-Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS
19 0.63268 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way
20 0.62154335 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon
topicId topicWeight
[(1, 0.09), (2, 0.215), (51, 0.357), (61, 0.074), (79, 0.064), (94, 0.071)]
simIndex simValue blogId blogTitle
1 0.86639112 1168 high scalability-2012-01-04-How Facebook Handled the New Year's Eve Onslaught
Introduction: How does Facebook handle the massive New Year's Eve traffic spike? Thanks to Mike Swift, in Facebook gets ready for New Year's Eve, we get a little insight into their method for the madness; nothing really detailed, but still interesting. Problem Setup Facebook expects that one billion+ photos will be shared on New Year's Eve. Facebook's 800 million users are scattered around the world; three quarters live outside the US. Each user is linked to an average of 130 friends. Photos and posts must appear in less than a second. Opening a homepage requires executing requests on 100 different servers, and those requests have to be ranked, sorted, and privacy-checked, and then rendered. Different events put different stresses on different parts of Facebook. Photo and Video Uploads - Holidays require hundreds of terabytes of capacity News Feed - News events like big sports events and the death of Steve Jobs drive user status updates Coping Strategies Try
2 0.85303873 818 high scalability-2010-04-30-Behind the scenes of an online marketplace
Introduction: In a presentation originally held at the 4th O2 Hosting Event in Hamburg, I spoke about the technology behind a large online marketplace in Germany called Hitmeister. Some of the topics discussed include: what makes up a marketplace, technically; system principles; development patterns; tools; philosophy; data model; hardware. I am looking forward to comments and suggestions for both the presentation and our work.
same-blog 3 0.81684095 98 high scalability-2007-09-18-Sync data on all servers
4 0.76566935 481 high scalability-2009-01-02-Strategy: Understanding Your Data Leads to the Best Scalability Solutions
Introduction: In the article Building Super-Scalable Web Systems with REST, Udi Dahan tells an interesting story of how they made a weather-reporting system scale to over 10 million users. So many users hitting their weather database didn't scale. Caching in a straightforward way wouldn't work because weather is obviously local. Caching all local reports would bring the entire database into memory, which would work for some companies but wasn't cost-efficient for them. So in typical REST fashion they turned locations into URIs, for example: http://weather.myclient.com/UK/London. This allows the weather information to be cached by intermediaries instead of hitting their servers. Hopefully for each location their servers will be hit a few times and then the caches will be hit until expiry. In order to send users directly to the correct location, an IP location check is performed on login and stored in a cookie. The lookup is done once, and from then on a GET is performed directly on the r
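The location-as-URI idea can be sketched in a few lines (the hostname comes from the article's example; the helper names and the 30-minute TTL are assumptions, not details from the source):

```python
def weather_uri(country, city):
    """Turn a location into a stable resource path: every user asking about
    the same place issues a GET on the same URI, so shared caches can serve it."""
    return "http://weather.myclient.com/%s/%s" % (country, city)

def cache_headers(max_age=1800):
    """Response headers letting intermediaries keep a report for 30 minutes
    (a made-up TTL) before the next request falls through to the origin."""
    return {"Cache-Control": "public, max-age=%d" % max_age}
```

The scalability comes from the URI being identical across users: the origin is hit roughly once per location per TTL, no matter how many of the 10 million users ask about London.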
5 0.76331592 510 high scalability-2009-02-09-Paper: Consensus Protocols: Two-Phase Commit
Introduction: Henry Robinson has created an excellent series of articles on consensus protocols. Henry starts with a very useful discussion of what all this talk about consensus really means: the consensus problem is the problem of getting a set of nodes in a distributed system to agree on something - it might be a value, a course of action or a decision. Achieving consensus allows a distributed system to act as a single entity, with every individual node aware of and in agreement with the actions of the whole of the network. In this article Henry tackles Two-Phase Commit, the protocol most databases use to arrive at a consensus for database writes. The article is very well written, with lots of pretty and informative pictures. He did a really good job. In conclusion we learn that 2PC is very efficient: a minimal number of messages are exchanged and latency is low. The problem is that when a coordinator fails, availability is dramatically reduced. This is why 2PC isn't generally used on highly distributed
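The two phases Henry describes can be sketched as an in-memory toy (this is not a real distributed implementation; the class and method names are made up, and real 2PC also needs durable logging and timeout handling):

```python
class Node:
    """A participant that first votes on, then applies, a proposed value."""
    def __init__(self, name):
        self.name = name
        self.committed = None

    def prepare(self, value):
        # Phase 1: vote yes unless we cannot accept the value.
        return value is not None

    def commit(self, value):
        # Phase 2: apply the value once the coordinator says so.
        self.committed = value

    def abort(self):
        self.committed = None

def two_phase_commit(nodes, value):
    """Coordinator: commit only if every participant voted yes in phase 1;
    a single 'no' vote aborts the transaction everywhere."""
    if all(node.prepare(value) for node in nodes):
        for node in nodes:
            node.commit(value)
        return True
    for node in nodes:
        node.abort()
    return False
```

The sketch also makes the availability problem visible: if the coordinator dies between the two loops, participants that voted yes are left blocked, holding their vote without knowing the outcome.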
6 0.76188213 1644 high scalability-2014-05-07-Update on Disqus: It's Still About Realtime, But Go Demolishes Python
7 0.75426555 741 high scalability-2009-11-16-Building Scalable Systems Using Data as a Composite Material
8 0.74603289 1134 high scalability-2011-10-28-Stuff The Internet Says On Scalability For October 28, 2011
10 0.71588176 1629 high scalability-2014-04-10-Paper: Scalable Atomic Visibility with RAMP Transactions - Scale Linearly to 100 Servers
12 0.69464028 138 high scalability-2007-10-30-Feedblendr Architecture - Using EC2 to Scale
13 0.69036239 953 high scalability-2010-12-03-GPU vs CPU Smackdown : The Rise of Throughput-Oriented Architectures
14 0.66388839 298 high scalability-2008-04-07-Lazy web sites run faster
16 0.64271426 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
17 0.63228476 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability
18 0.62395412 332 high scalability-2008-05-28-Job queue and search engine
19 0.62242723 529 high scalability-2009-03-10-Paper: Consensus Protocols: Paxos
20 0.61723518 846 high scalability-2010-06-22-Sponsored Post: Jobs: Etsy, Digg, Huffington Post Event: Velocity Conference