high_scalability high_scalability-2008 high_scalability-2008-283 knowledge-graph by maker-knowledge-mining

283 high scalability-2008-03-18-Shared filesystem on EC2


meta info for this blog

Source: html

Introduction: Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using GlusterFS to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system. One node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure it to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we back up files to S3 from this machine). The files are mostly thumbnails, but also some larger images and media files.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I'm looking for a way to share files between EC2 nodes. [sent-2, score-0.492]

2 It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. [sent-4, score-0.471]

3 We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. [sent-5, score-0.942]

4 This takes ages, and will take even longer the more files we get. [sent-6, score-0.488]

5 What worries me is that it seems to make each node a point of failure for the entire system. [sent-7, score-0.536]

6 One node crashes and soon the entire cluster has crashed. [sent-8, score-0.554]

7 It seems like you have to take down the whole thing, reconfigure to include the new node, and restart. [sent-10, score-0.325]

8 We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. [sent-12, score-0.06]

9 GlusterFS is installed on the web server machines as well as the DB slave machine (we back up files to S3 from this machine). [sent-13, score-0.778]

10 The files are mostly thumbnails, but also some larger images and media files. [sent-14, score-0.716]

11 Does anyone have a good solution for sharing files between EC2 nodes? [sent-15, score-0.569]

12 I like the ThruDB concept of using the local filesystem as a cache for S3, but I'm not sure if ThruDB is mature enough yet. [sent-18, score-0.482]

13 Or maybe some kind of distributed filesystem built on top of git would work? [sent-19, score-0.487]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('files', 0.427), ('thrudb', 0.337), ('glusterfs', 0.315), ('filesystem', 0.216), ('node', 0.172), ('defeats', 0.169), ('filing', 0.157), ('worries', 0.153), ('crashed', 0.153), ('reconfigure', 0.149), ('db', 0.149), ('thumbnails', 0.146), ('restarting', 0.146), ('ages', 0.134), ('mature', 0.124), ('crashes', 0.117), ('seems', 0.115), ('trouble', 0.112), ('kind', 0.112), ('restart', 0.11), ('removing', 0.102), ('entire', 0.096), ('slave', 0.095), ('installed', 0.095), ('git', 0.092), ('cluster', 0.091), ('machine', 0.089), ('horizontal', 0.088), ('thanks', 0.088), ('concept', 0.082), ('soon', 0.078), ('mostly', 0.076), ('images', 0.076), ('sharing', 0.075), ('media', 0.074), ('starting', 0.072), ('backup', 0.072), ('recently', 0.071), ('past', 0.069), ('master', 0.068), ('maybe', 0.067), ('anyone', 0.067), ('reliable', 0.065), ('share', 0.065), ('larger', 0.063), ('longer', 0.061), ('include', 0.061), ('ideas', 0.061), ('local', 0.06), ('instances', 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 283 high scalability-2008-03-18-Shared filesystem on EC2


2 0.20149402 278 high scalability-2008-03-16-Product: GlusterFS

Introduction: Adapted from their website: GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware, such as x86-64 servers with SATA-II RAID and Infiniband HBAs. Cluster file systems are still not mature for the enterprise market. They are too complex to deploy and maintain, though they are extremely scalable and cheap, and can be entirely built out of commodity OS and hardware. GlusterFS hopes to solve this problem. GlusterFS achieved 35 GBps read throughput. The GlusterFS Aggregated I/O Benchmark was performed on a 64-brick clustered storage system over a 10 Gbps Infiniband interconnect. A cluster of 220 clients pounded the storage system with multiple dd (disk-dump) instances, each reading / writing a 1 GB file with a 1 MB block size. GlusterFS was configured with the unify translator and round-robin scheduler

3 0.18644832 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?

Introduction: We are serving dynamic (PHP) websites and their assets (JS/CSS/images/videos/binary downloads) via the same apache hosts. The static files are only being used as origins for CDN services used to distribute those files. Yet, in the current development-deploy pipeline, these files are checked into the same version control repositories as the code is. This is what we would like to change, for several reasons (decouple asset deployment from development & developers, lessen size of code repositories, etc.) My idea is to do the following: Set up a media server (cluster) which serves as an API (REST e.g.). You can PUT files to it, and get back the URL the file is available through from the public. In between input and output, the media service deals with everything that's necessary to serve the files: Upload them to the CDN, create the public URL, write the meta data to a (relational?) database, assign a version number... This API can be used by a) the application/website directly to provid

4 0.12894788 525 high scalability-2009-03-05-Product: Amazon Simple Storage Service

Introduction: Update: HostedFTP.com - Amazon S3 Performance Report. How fast is S3? Based on their own study, HostedFTP.com has found: 10 to 12 MB/second when storing and receiving files, and 140 ms per file stored as a fixed overhead cost. Update: A Quantitative Comparison of Rackspace and Amazon Cloud Storage Solutions. S3 isn't the only cloud storage service out there. Mosso is saying they can save you money while offering support. There are a number of scenarios in their paper, but for 5TB of cloud storage Mosso will save you 17% over S3 without support and 42% with support. For their CDN, on a global test Mosso says the average response time is 333ms for CloudFront vs. 107ms for Cloud Files, which means globally, Cloud Files is 3.1 times or 211% faster than CloudFront. Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers. This service allows you to link directly to files at a cost of 15 cents per GB of storage, and 20 cents per GB

5 0.1226511 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.

Introduction: I am planning the scaling of a hosted service, similar to Typepad etc., and would appreciate feedback on my plan so far. Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer. So, I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey) filename ownerid For adding the replication and scalability, I would add a few more columns: serveroneid servertwoid serverthreeid s3 At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cro
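The column-per-replica scheme described in this post can be sketched against an in-memory table. The column names follow the post; the helper functions and the SQLite backing are illustrative assumptions, not the poster's actual code:

```python
import sqlite3

def make_db():
    """Create the simplified files table described in the post."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE files (
        fileid INTEGER PRIMARY KEY,
        filename TEXT, ownerid INTEGER,
        serveroneid INTEGER, servertwoid INTEGER,
        serverthreeid INTEGER, s3 INTEGER DEFAULT 0)""")
    return db

def record_upload(db, filename, ownerid, server_id):
    """On upload, remember which server received the first copy."""
    cur = db.execute(
        "INSERT INTO files (filename, ownerid, serveroneid) VALUES (?, ?, ?)",
        (filename, ownerid, server_id))
    return cur.lastrowid

def pending_replication(db):
    """Files an hourly cron job would still need to copy to a second server."""
    return db.execute(
        "SELECT fileid, filename, serveroneid FROM files "
        "WHERE servertwoid IS NULL").fetchall()
```

The periodic job would copy each pending file to another server (or S3) and fill in the corresponding column, so a file's row doubles as its replication state.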

6 0.11880652 516 high scalability-2009-02-19-Heavy upload server scalability

7 0.11266251 619 high scalability-2009-06-05-HotPads Shows the True Cost of Hosting on Amazon

8 0.10812598 1102 high scalability-2011-08-22-Strategy: Run a Scalable, Available, and Cheap Static Site on S3 or GitHub

9 0.10739019 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication

10 0.10562605 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way

11 0.10465055 1142 high scalability-2011-11-14-Using Gossip Protocols for Failure Detection, Monitoring, Messaging and Other Good Things

12 0.10257481 1268 high scalability-2012-06-20-Ask HighScalability: How do I organize millions of images?

13 0.10211971 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files

14 0.10011287 7 high scalability-2007-07-12-FeedBurner Architecture

15 0.099445172 1221 high scalability-2012-04-03-Hazelcast 2.0: Big Data In-Memory

16 0.098397657 30 high scalability-2007-07-26-Product: AWStats a Log Analyzer

17 0.096737333 1440 high scalability-2013-04-15-Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years

18 0.095911883 601 high scalability-2009-05-17-Product: Hadoop

19 0.095588684 304 high scalability-2008-04-19-How to build a real-time analytics system?

20 0.095486812 53 high scalability-2007-08-01-Product: MogileFS


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.132), (1, 0.07), (2, -0.02), (3, -0.088), (4, -0.013), (5, -0.004), (6, 0.065), (7, -0.06), (8, 0.027), (9, 0.024), (10, -0.03), (11, -0.08), (12, 0.019), (13, -0.076), (14, 0.065), (15, 0.022), (16, -0.005), (17, 0.003), (18, -0.048), (19, 0.013), (20, -0.009), (21, 0.007), (22, -0.039), (23, 0.104), (24, 0.023), (25, 0.02), (26, 0.114), (27, -0.059), (28, -0.066), (29, -0.03), (30, 0.018), (31, -0.039), (32, -0.03), (33, -0.037), (34, -0.029), (35, 0.019), (36, 0.014), (37, -0.07), (38, 0.017), (39, -0.053), (40, -0.054), (41, -0.032), (42, -0.065), (43, 0.008), (44, -0.041), (45, 0.055), (46, 0.026), (47, 0.03), (48, -0.021), (49, -0.006)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97062498 283 high scalability-2008-03-18-Shared filesystem on EC2


2 0.76540768 229 high scalability-2008-01-29-Building scalable storage into application - Instead of MogileFS OpenAFS etc.

Introduction: I am planning the scaling of a hosted service, similar to Typepad etc., and would appreciate feedback on my plan so far. Looking into scaling storage, I have come across MogileFS and OpenAFS. My concern with these is that I am not at all experienced with them, and as the sole tech guy I don't want to build something into this hosting service that proves complex to update and administer. So, I'm thinking of building replication and scalability right into the application, in a similar but simplified way to how MogileFS works (I think). So, for our database table of uploaded files, here's how it currently looks (simplified): fileid (pkey) filename ownerid For adding the replication and scalability, I would add a few more columns: serveroneid servertwoid serverthreeid s3 At the time the user uploads a file, it will go to a specific server (managed by the application) and the id of that server will be placed in the "serverone" column. Then hourly or so, a cro

3 0.74526578 1442 high scalability-2013-04-17-Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS

Introduction: Tachyon (github) is an interesting new filesystem brought to us by the folks at the UC Berkeley AMP Lab: Tachyon is a fault-tolerant distributed file system enabling reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. It offers up to 300 times higher throughput than HDFS by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read. It has a Java-like File API, native support for raw tables, a pluggable file system, and it works with Hadoop with no modifications. It might work well for streaming media too, as you wouldn't have to wait for the complete file to hit the disk before rendering. Discuss on Hacker News

4 0.74037385 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?

Introduction: We are serving dynamic (PHP) websites and their assets (JS/CSS/images/videos/binary downloads) via the same apache hosts. The static files are only being used as origins for CDN services used to distribute those files. Yet, in the current development-deploy pipeline, these files are checked into the same version control repositories as the code is. This is what we would like to change, for several reasons (decouple asset deployment from development & developers, lessen size of code repositories, etc.) My idea is to do the following: Set up a media server (cluster) which serves as an API (REST e.g.). You can PUT files to it, and get back the URL the file is available through from the public. In between input and output, the media service deals with everything that's necessary to serve the files: Upload them to the CDN, create the public URL, write the meta data to a (relational?) database, assign a version number... This API can be used by a) the application/website directly to provid

5 0.71698266 488 high scalability-2009-01-08-file synchronization solutions

Introduction: I have two servers connected via the Internet (NOT IN THE SAME LAN) serving the same website (http://www.ourexample.com). The problem is that files uploaded on serverA and serverB cannot see each other immediately, thus rsync at certain intervals is not a good solution. Can anybody give me some advice on the following options? 1. NFS over the Internet for file sharing 2. sshfs 3. inotify (our system's kernel does not support this and we do not want to risk upgrading our kernel as well) 4. drbd in active-active mode 5. or any other solutions Any suggestions will be welcomed. Thank you in advance.

6 0.71212119 98 high scalability-2007-09-18-Sync data on all servers

7 0.70720041 278 high scalability-2008-03-16-Product: GlusterFS

8 0.67114478 605 high scalability-2009-05-22-Distributed content system with bandwidth balancing

9 0.66428334 889 high scalability-2010-08-30-Pomegranate - Storing Billions and Billions of Tiny Little Files

10 0.65693408 516 high scalability-2009-02-19-Heavy upload server scalability

11 0.6441642 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option

12 0.64081341 53 high scalability-2007-08-01-Product: MogileFS

13 0.60850286 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

14 0.60608137 50 high scalability-2007-07-31-BerkeleyDB & other distributed high performance key-value databases

15 0.60058755 566 high scalability-2009-04-13-High Performance Web Pages – Real World Examples: Netflix Case Study

16 0.59423435 262 high scalability-2008-02-26-Architecture to Allow High Availability File Upload

17 0.59135836 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System

18 0.59044802 143 high scalability-2007-11-06-Product: ChironFS

19 0.58295405 884 high scalability-2010-08-23-6 Ways to Kill Your Servers - Learning How to Scale the Hard Way

20 0.57941389 1288 high scalability-2012-07-23-Ask HighScalability: How Do I Build My MegaUpload + Itunes + YouTube Startup?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.054), (2, 0.207), (10, 0.018), (61, 0.106), (66, 0.166), (77, 0.017), (79, 0.239), (85, 0.067), (94, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95316589 283 high scalability-2008-03-18-Shared filesystem on EC2


2 0.92202842 225 high scalability-2008-01-27-Windows and SQL Server : Receive so much negativity in terms of the Highly Available, Scalable Platform..

Introduction: I remain neutral, but time and again, when people talk about Windows or SQL Server, they seem to consider them unreliable, with limits around scalability, performance and availability. And then you start looking at some of the big boys you have listed here in the architectural section, and most of them are on Linux, MySQL, or Oracle platforms; we don't see Windows and SQL Server in there. What are your thoughts?

3 0.90421623 375 high scalability-2008-09-01-A Scalability checklist?

Introduction: Hi everyone, I'm researching scalability for a college paper, and found this site great, but it has too many tips, articles and the like, and I can't see a hierarchical organization of subjects. I would need something like a checklist of things, fields, or technologies to take into account when assessing scalability. So far I've identified these: - Hardware scalability: - scale out - scale up - Cache What types of cache are there? app-level, os-level, network-level, I/O-level? - Load Balancing - DB Clustering Am I missing something important? (I'm sure I am) I don't expect you to give a lecture here, but maybe point some things out, give me some useful links... Thanks!

4 0.8790251 1048 high scalability-2011-05-27-Stuff The Internet Says On Scalability For May 27, 2011

Introduction: Submitted for your scaling pleasure:  Good idea: Open The Index And Speed Up The Internet . SmugMug estimates 50% of their CPU is spent serving crawler robots. Having a common meta-data repository wouldn't prevent search engines from having their own special sauce. Then the problem becomes one of syncing data between repositories and processing change events. A generous soul could even offer a shared MapReduce service over the data. Now that would speed up the internet . Scaling Achievements: YouTube Sees 3 Billion Views per Day ; Twitter produces a sustained feed of 35 Mb per second ;  companies processing billions of APIs calls  (Twitter, Netflix, Amazon, NPR, Google, Facebook, eBay, Bing);  Astronomers Identify the Farthest Object Ever Observed, 13.14 Billion Light Years Away Quotes that are Quotably Quotable: eekygeeky : When cloud computing news is slow? Switch to "big data"-100% of the vaguery, none of the used-up, mushy marketing feel!

5 0.87833965 912 high scalability-2010-10-01-Google Paper: Large-scale Incremental Processing Using Distributed Transactions and Notifications

Introduction: This paper,  Large-scale Incremental Processing Using Distributed Transactions and Notifications  by Daniel Peng and Frank Dabek, is Google's much anticipated description of Percolator, their new  real-time indexing  system. The abstract: Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.   We have built Percolator, a system f

6 0.86607194 1494 high scalability-2013-07-19-Stuff The Internet Says On Scalability For July 19, 2013

7 0.86559427 1420 high scalability-2013-03-08-Stuff The Internet Says On Scalability For March 8, 2013

8 0.86526334 867 high scalability-2010-07-27-YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World

9 0.86262578 1403 high scalability-2013-02-08-Stuff The Internet Says On Scalability For February 8, 2013

10 0.86075252 786 high scalability-2010-03-02-Using the Ambient Cloud as an Application Runtime

11 0.85921884 871 high scalability-2010-08-04-Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm

12 0.8575623 526 high scalability-2009-03-05-Strategy: In Cloud Computing Systematically Drive Load to the CPU

13 0.85631883 1392 high scalability-2013-01-23-Building Redundant Datacenter Networks is Not For Sissies - Use an Outside WAN Backbone

14 0.85592544 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters

15 0.85556626 1535 high scalability-2013-10-21-Google's Sanjay Ghemawat on What Made Google Google and Great Big Data Career Advice

16 0.85546732 1018 high scalability-2011-04-07-Paper: A Co-Relational Model of Data for Large Shared Data Banks

17 0.85545266 601 high scalability-2009-05-17-Product: Hadoop

18 0.85513902 1485 high scalability-2013-07-01-PRISM: The Amazingly Low Cost of ­Using BigData to Know More About You in Under a Minute

19 0.85448754 1548 high scalability-2013-11-13-Google: Multiplex Multiple Works Loads on Computers to Increase Machine Utilization and Save Money

20 0.84738815 1242 high scalability-2012-05-09-Cell Architectures