high_scalability high_scalability-2008 high_scalability-2008-310 knowledge-graph by maker-knowledge-mining

310 high scalability-2008-04-29-High performance file server


Meta information for this blog

Source: html

Introduction: We have a bunch of applications running on Debian servers that process a huge amount of data stored on a shared NFS drive. Three of these applications work as a pipeline over the data on the NFS drive: the first application processes the data and stores its output in a folder on the NFS drive, the second application processes the output of the previous step, and so on. The data load into the pipeline is about 1 GB per minute. I think the NFS drive is the bottleneck here. Would buying a specialized file server improve disk read/write performance?
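A rough sanity check on the "NFS is the bottleneck" hunch, as a minimal sketch. It assumes (the post does not say) that every stage reads its input from the share and writes an output of similar size back to it, so the server carries about twice the ingest rate per stage. At 1 GB per minute (about 16.7 MB/s) through three stages that comes to roughly 100 MB/s, a large fraction of the ~125 MB/s theoretical line rate of a single gigabit Ethernet link. The post does not state the network speed either; the link speed and the function name below are illustrative assumptions.

```python
# Back-of-envelope estimate of the traffic a shared NFS server sees when a
# multi-stage pipeline reads each stage's input from, and writes its output to,
# the same export.
# Assumption (hypothetical): each stage's output is about the size of its input.

GIGABIT_ETHERNET_MB_PER_S = 125  # theoretical 1 Gbit/s line rate, before protocol overhead


def nfs_traffic_mb_per_s(ingest_gb_per_min: float, stages: int) -> float:
    """Aggregate read + write throughput (MB/s) hitting the NFS server."""
    ingest_mb_per_s = ingest_gb_per_min * 1000 / 60  # decimal GB/min -> MB/s
    # Each stage reads its input once and writes its output once.
    return ingest_mb_per_s * stages * 2


load = nfs_traffic_mb_per_s(ingest_gb_per_min=1.0, stages=3)
print(f"~{load:.0f} MB/s of NFS traffic")
print(f"~{load / GIGABIT_ETHERNET_MB_PER_S:.0%} of one gigabit link's line rate")
```

If the estimate is close, the network path to the NFS server may limit throughput as much as the disks do. In that case a faster file server on the same link might not help much. Measuring actual throughput is the way to tell.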


Summary: the most important sentences, as selected by a TF-IDF model

sentIndex sentText sentNum sentScore

1 We have a bunch of applications, running on Debian servers, that process a huge amount of data stored on a shared NFS drive. [sent-1, score-0.944]

2 We have 3 applications working as a pipeline, which process data stored on the NFS drive. [sent-2, score-0.537]

3 The first application processes the data and stores the output in a folder on the NFS drive; the second app in the pipeline processes the data from the previous step, and so on. [sent-3, score-1.806]

4 The data load into the pipeline is about 1 GB per minute. [sent-4, score-0.682]

5 Would buying a specialized file server improve the performance of data reads and writes to the disk? [sent-6, score-0.754]
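One way to test whether the NFS drive really is the bottleneck, before buying hardware, is to compare sequential write throughput on the NFS mount against a local disk. The sketch below is a crude probe, not a benchmark suite. The helper name and default sizes are hypothetical. It calls `os.fsync` so the timing includes pushing data to the server, not just filling the client's page cache.

```python
import os
import time


def write_throughput_mb_per_s(directory: str, size_mb: int = 256, block_kb: int = 1024) -> float:
    """Write size_mb of random data to a temp file in `directory`, fsync it, and return MB/s."""
    path = os.path.join(directory, f".throughput_probe_{os.getpid()}")
    block = os.urandom(block_kb * 1024)  # random bytes defeat any compression along the path
    blocks = (size_mb * 1024) // block_kb
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the (NFS) server
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)  # never leave the probe file behind
    return (blocks * block_kb / 1024) / elapsed
```

For example, run `write_throughput_mb_per_s("/mnt/nfs")` against a hypothetical mount point, then against a local directory such as `/tmp`. If the NFS figure sits near the network's line rate while the local disk is much faster, the link or server, not the pipeline code, is the likely limit.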


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nfs', 0.638), ('pipeline', 0.453), ('folder', 0.215), ('drive', 0.211), ('debian', 0.191), ('processes', 0.175), ('stored', 0.155), ('output', 0.128), ('buying', 0.128), ('bunch', 0.126), ('specialized', 0.121), ('data', 0.119), ('bottleneck', 0.116), ('previous', 0.114), ('process', 0.107), ('step', 0.104), ('applications', 0.088), ('shared', 0.081), ('improve', 0.08), ('huge', 0.078), ('amount', 0.077), ('disk', 0.068), ('working', 0.068), ('app', 0.067), ('file', 0.065), ('store', 0.063), ('second', 0.062), ('write', 0.059), ('read', 0.049), ('think', 0.047), ('run', 0.045), ('per', 0.045), ('first', 0.044), ('load', 0.043), ('servers', 0.038), ('application', 0.036), ('server', 0.035), ('performance', 0.03), ('like', 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 310 high scalability-2008-04-29-High performance file server

2 0.34011191 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

Introduction: I've been trying to find a high availability file storage solution without success. I tried GlusterFS which looks very promising but experienced problems with stability and don't want something I can't easily control and rely on. Other solutions are too complicated or have a SPOF. So I'm thinking of the following setup: Two NFS servers, a primary and a warm backup. The primary server will be rsynced with the warm backup every minute or two. I can do it so frequently as a PHP script will know which directories have changed recently from a database and only rsync those. Both servers will be NFS mounted on a cluster of web servers as /mnt/nfs-primary (sym linked as /home/websites) and /mnt/nfs-backup. I'll then use Ucarp (http://www.ucarp.org/project/ucarp) to monitor both NFS servers availability every couple of seconds and when one goes down, the Ucarp up script will be set to change the symbolic link on all web servers for the /home/websites dir from /mnt/nfs-primary to /mn

3 0.22337461 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)

Introduction: pNFS (parallel NFS) is the next generation of NFS and its main claim to fame is that it's clustered, which "enables clients to directly access file data spread over multiple storage servers in parallel. As a result, each client can leverage the full aggregate bandwidth of a clustered storage service at the granularity of an individual file." About pNFS StorageMojo says: pNFS is going to commoditize parallel data access. In 5 years we won’t know how we got along without it. Something to watch.

4 0.16363499 1501 high scalability-2013-08-13-In Memoriam: Lavabit Architecture - Creating a Scalable Email Service

Introduction: With Lavabit shutting down under murky circumstances, it seems fitting to repost an old (2009), yet still very good post by Ladar Levison on Lavabit's architecture. I don't know how much of this information is still current, but it should give you a general idea what Lavabit was all about. Getting to Know You What is the name of your system and where can we find out more about it? Note: these links are no longer valid... Lavabit http://lavabit.com http://lavabit.com/network.html http://lavabit.com/about.html What is your system for? Lavabit is a mid-sized email service provider. We currently have about 140,000 registered users with more than 260,000 email addresses. While most of our accounts belong to individual users, we also provide corporate email services to approximately 70 companies. Why did you decide to build this system? We built the system to compete against the other large free email providers, with an emphasis on serving the privacy c

5 0.15732104 1618 high scalability-2014-03-24-Big, Small, Hot or Cold - Examples of Robust Data Pipelines from Stripe, Tapad, Etsy and Square

Introduction: This is a guest repost by Pete Soderling, Founder at Hakka Labs, creating a community where software engineers come to grow. In response to a recent post from MongoHQ entitled “You don’t have big data,” I would generally agree with many of the author’s points. However, regardless of whether you call it big data, small data, hot data or cold data - we are all in a position to admit that *more* data is here to stay - and that’s due to many different factors. Perhaps primarily, as the article mentions, this is due to the decreasing cost of storage over time. Other factors include access to open APIs, the sheer volume of ever-increasing consumer activity online, as well as a plethora of other incentives that are developing (mostly) behind the scenes as companies “share” data with each other. (You know they do this, right?) But one of the most important things I’ve learned over the past couple of years is that it’s crucial for forward thinking companies to start to design

6 0.14945945 63 high scalability-2007-08-09-Lots of questions for high scalability - high availability

7 0.14555791 1521 high scalability-2013-09-23-Salesforce Architecture - How they Handle 1.3 Billion Transactions a Day

8 0.13215801 516 high scalability-2009-02-19-Heavy upload server scalability

9 0.13061152 68 high scalability-2007-08-20-TypePad Architecture

10 0.11979818 1472 high scalability-2013-06-07-Stuff The Internet Says On Scalability For June 7, 2013

11 0.10919528 1384 high scalability-2013-01-09-The Story of How Turning Disk Into a Service Lead to a Deluge of Density

12 0.10496105 840 high scalability-2010-06-10-The Four Meta Secrets of Scaling at Facebook

13 0.10310745 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector

14 0.10269906 1293 high scalability-2012-07-30-Prismatic Architecture - Using Machine Learning on Social Networks to Figure Out What You Should Read on the Web

15 0.098445348 98 high scalability-2007-09-18-Sync data on all servers

16 0.098423555 1402 high scalability-2013-02-07-Ask HighScalability: Web asset server concept - 3rd party software available?

17 0.098184854 448 high scalability-2008-11-22-Google Architecture

18 0.097609892 411 high scalability-2008-10-14-Implementing the Lustre File System with Sun Storage: High Performance Storage for High Performance Computing

19 0.097152904 150 high scalability-2007-11-12-Slashdot Architecture - How the Old Man of the Internet Learned to Scale

20 0.086157791 959 high scalability-2010-12-17-Stuff the Internet Says on Scalability For December 17th, 2010
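The NFS failover idea in the 2008-04-22 entry above depends on re-pointing the `/home/websites` symlink from `/mnt/nfs-primary` to `/mnt/nfs-backup` on every web server. That poster's Ucarp script is cut off in the excerpt. A common way to do the swap without a moment where the link is missing is to create the new link under a temporary name and `rename` it over the old one. A minimal sketch, assuming a POSIX system; `repoint_symlink` is a hypothetical helper, not part of Ucarp.

```python
import os


def repoint_symlink(link_path: str, new_target: str) -> None:
    """Atomically make link_path point at new_target (relies on POSIX rename semantics)."""
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    # Build the new link under a temporary name first...
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # ...then rename it over the old link. rename(2) replaces the destination
    # name atomically and does not follow the old symlink, so readers see
    # either the old target or the new one, never a missing link.
    os.replace(tmp_link, link_path)
```

A failover hook could then call `repoint_symlink("/home/websites", "/mnt/nfs-backup")` on each web server. Open file handles on the old mount are not migrated by this; it only changes where new path lookups resolve.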


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.105), (1, 0.051), (2, -0.014), (3, -0.058), (4, -0.01), (5, 0.052), (6, 0.077), (7, -0.013), (8, 0.009), (9, 0.033), (10, 0.032), (11, -0.013), (12, 0.016), (13, -0.001), (14, 0.038), (15, 0.045), (16, -0.047), (17, 0.026), (18, -0.044), (19, 0.045), (20, 0.004), (21, -0.002), (22, 0.024), (23, 0.013), (24, 0.049), (25, 0.01), (26, 0.032), (27, -0.035), (28, -0.049), (29, 0.036), (30, -0.009), (31, -0.026), (32, 0.032), (33, 0.0), (34, -0.035), (35, 0.021), (36, 0.037), (37, -0.03), (38, 0.023), (39, -0.005), (40, -0.009), (41, -0.093), (42, -0.049), (43, 0.015), (44, 0.064), (45, 0.044), (46, 0.013), (47, -0.024), (48, 0.002), (49, 0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90681463 310 high scalability-2008-04-29-High performance file server

2 0.767223 128 high scalability-2007-10-21-Paper: Standardizing Storage Clusters (with pNFS)

3 0.72221708 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases

Introduction: This is the third guest post (part 1, part 2) of a series by Greg Lindahl, CTO of blekko, the spam free search engine. Previously, Greg was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters. blekko's home-grown NoSQL database was designed from the start to support a web-scale search engine, with 1,000s of servers and petabytes of disk. Data replication is a very important part of keeping the database up and serving queries. Like many NoSQL database authors, we decided to keep R=3 copies of each piece of data in the database, and not use RAID to improve reliability. The key goal we were shooting for was a database which degrades gracefully when there are many small failures over time, without needing human intervention. Why don't we like RAID for big NoSQL databases? Most big storage systems use RAID levels like 3, 4, 5, or 10 to improve relia

4 0.70270878 103 high scalability-2007-09-28-Kosmos File System (KFS) is a New High End Google File System Option

Introduction: There's a new clustered file system on the spindle: Kosmos File System (KFS). Thanks to Rich Skrenta for turning me on to KFS and I think his blog post says it all. KFS is an open source project written in C++ by search startup Kosmix. The team members have a good pedigree so there's a better than average chance this software will be worth considering. After you stop trying to turn KFS into "Kentucky Fried File System" in your mind, take a look at KFS' intriguing feature set: Incremental scalability: New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes. Availability: Replication is used to provide availability due to chunk server failures. Typically, files are replicated 3-way. Per file degree of replication: The degree of replication is configurable on a per file basis, with a max. limit of 64. Re-replication: Whenever the degree of replication for a file drops below the configured amount (

5 0.69900656 516 high scalability-2009-02-19-Heavy upload server scalability

Introduction: Hi, we are running a backup solution that uploads, every night, the files our clients worked on during the day (Carbonite-like). We currently have about 10GB of data per night, via HTTP PUT requests (1 per file), and the files are written as-is on a NAS. Our architecture is basically composed of a load balancer (hardware, sticky sessions), 5 servers (Tomcat under RHEL4/5) and a NAS (NFS 3). Since our number of clients is rising (as is our system load), how would you recommend we scale our infrastructure, both hardware and software? Should we go towards NAS sharding, more servers, NIO on Tomcat...? Thanks for your inputs!

6 0.6859656 98 high scalability-2007-09-18-Sync data on all servers

7 0.6836161 1035 high scalability-2011-05-05-Paper: A Study of Practical Deduplication

8 0.66181582 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication

9 0.66157222 666 high scalability-2009-07-30-Learn How to Think at Scale

10 0.66153324 448 high scalability-2008-11-22-Google Architecture

11 0.65987116 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops

12 0.65412629 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?

13 0.65405226 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

14 0.65034759 492 high scalability-2009-01-16-Database Sharding for startups

15 0.64998037 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector

16 0.63498276 1104 high scalability-2011-08-25-Colmux - Finding Memory Leaks, High I-O Wait Times, and Hotness on 3000 Node Clusters

17 0.63488561 112 high scalability-2007-10-04-You Can Now Store All Your Stuff on Your Own Google Like File System

18 0.63231593 716 high scalability-2009-10-06-Building a Unique Data Warehouse

19 0.62859529 748 high scalability-2009-11-30-Why Existing Databases (RAC) are So Breakable!

20 0.62578785 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.26), (2, 0.205), (10, 0.079), (30, 0.16), (49, 0.042), (85, 0.033), (94, 0.05)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96177906 319 high scalability-2008-05-14-Scaling an image upload service

Introduction: Hi, first of all I want to say that this is an extremely interesting and informative website. I have enjoyed reading the various posts on how the big sites scale to meet the needs of their customers. The service we are developing is a webcam service. The client application sends images to the server via HTTP POST and they are saved in a folder specified by the user's id. When a new image is sent to the server it will overwrite the current image. Users can then view the images via our web server. Ideally we want the images to upload as quickly as possible and allow users to view them as quickly as possible. Would I be correct to assume that when the number of uploading clients exceeds the capability of the server, the only way to scale is to add more hardware? Also, I assume that using HTTP accelerator caches will not speed up viewing the images, as the new images will invalidate the cache. I appreciate any input on the subject.

same-blog 2 0.95063663 310 high scalability-2008-04-29-High performance file server

3 0.94195247 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users

Introduction: This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because: that's a lot of users, it takes big cajones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article. In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace music. These new features included the ability to watch music videos, search for artist’s videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live. If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises. You want to understand your breakin

4 0.93417883 1081 high scalability-2011-07-18-Building your own Facebook Realtime Analytics System

Introduction: Recently, I was reading Todd Hoff's write-up on FaceBook real time analytics system. As usual, Todd did an excellent job in summarizing this video from Engineering Manager at Facebook Alex Himel. In the first post, I’d like to summarize the case study, and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Realtime Analytics for Big-Data that might be easier to implement, using Facebook's experience as a starting point and guide as well as the experience gathered through a recent work with few of GigaSpaces customers. The second post provides a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system. References Real Time analytics for Big Data: Facebook's New Realtime Analytics System Real Time Analytics for Big Data: An Alternative Approach

5 0.93053162 334 high scalability-2008-05-29-Amazon Improves Diagonal Scaling Support with High-CPU Instances

Introduction: Now you can buy more cores on EC2 without adding more machines: The High-CPU Medium Instance is billed at $0.20 (20 cents) per hour. It features 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units Each), and 350 GB of instance storage, all on a 32-bit platform. The High-CPU Extra Large Instance is billed at $0.80 (80 cents) per hour. It features 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), and 1,690 GB of instance storage, all on a 64-bit platform. Diagonal Scaling is making a site faster by removing machines. More on this intriguing idea in Diagonal Scaling - Don't Forget to Scale Out AND Up.

6 0.92239058 263 high scalability-2008-02-27-Product: System Imager - Automate Deployment and Installs

7 0.91189611 699 high scalability-2009-09-10-How to handle so many socket connection

8 0.90968388 783 high scalability-2010-02-24-Hot Scalability Links for February 24, 2010

9 0.90406835 1284 high scalability-2012-07-16-Cinchcast Architecture - Producing 1,500 Hours of Audio Every Day

10 0.90180129 16 high scalability-2007-07-16-Book: High Performance MySQL

11 0.89732724 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day

12 0.89658618 182 high scalability-2007-12-12-Oracle Can Do Read-Write Splitting Too

13 0.88927054 748 high scalability-2009-11-30-Why Existing Databases (RAC) are So Breakable!

14 0.8865155 308 high scalability-2008-04-22-Simple NFS failover solution with symbolic link?

15 0.88499695 1618 high scalability-2014-03-24-Big, Small, Hot or Cold - Examples of Robust Data Pipelines from Stripe, Tapad, Etsy and Square

16 0.88131976 1072 high scalability-2011-07-01-TripAdvisor Strategy: No Architects, Engineers Work Across the Entire Stack

17 0.88096362 500 high scalability-2009-01-22-Heterogeneous vs. Homogeneous System Architectures

18 0.88000637 535 high scalability-2009-03-12-Paper: Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments

19 0.87976915 407 high scalability-2008-10-10-The Art of Capacity Planning: Scaling Web Resources

20 0.87918758 913 high scalability-2010-10-01-Hot Scalability Links For Oct 1, 2010
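The High-CPU prices quoted in the 2008-05-29 Amazon entry above work out to the same price per EC2 Compute Unit for both instance types. The larger instance therefore buys density (more cores per machine) rather than cheaper compute, which is the premise of the "diagonal scaling" idea. A quick check using only the figures from that excerpt; the dictionary layout and helper name are illustrative.

```python
# 2008 High-CPU instance figures as quoted in the excerpt above.
instances = {
    "High-CPU Medium": {"usd_per_hour": 0.20, "ecu": 5, "memory_gb": 1.7},
    "High-CPU Extra Large": {"usd_per_hour": 0.80, "ecu": 20, "memory_gb": 7.0},
}


def usd_per_ecu_hour(spec: dict) -> float:
    """Hourly price divided by EC2 Compute Units."""
    return spec["usd_per_hour"] / spec["ecu"]


for name, spec in instances.items():
    gb_per_dollar_hour = spec["memory_gb"] / spec["usd_per_hour"]
    print(f"{name}: ${usd_per_ecu_hour(spec):.3f} per ECU-hour, "
          f"{gb_per_dollar_hour:.2f} GB of memory per $/hour")
```

Both come to $0.04 per ECU-hour. Memory per dollar is also nearly identical (8.5 vs 8.75 GB per $/hour), so the choice between them mostly comes down to how many machines you want to manage.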