high_scalability high_scalability-2012 high_scalability-2012-1276 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is a guest post by Douglas Wilson, EMEA Field Application Engineer at Raima, based on insights from building their Raima Database Manager. Scalability and Hardware Scalability is the ability to maintain performance as demands on the system increase, by adding further resources. Normally those resources will be in the form of hardware. Since processor speeds are no longer increasing much, scaling up the hardware normally means adding extra processors or cores, and more memory. Scalability and Software However, scalability requires software that can utilize the extra hardware effectively. The software must be designed to allow parallel processing. In the context of a database engine this means that the server component must be multi-threaded, to allow the operating system to schedule parallel tasks on all the cores that are available. Not only that, but the database engine must provide an efficient way to break its workload into as many parallel tasks as there are cores.
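The point about sizing parallelism to the hardware can be sketched as follows. This is an illustrative assumption, not RDM code: `handle_request` stands in for one unit of database work, and the pool is sized from `os.cpu_count()` so that an eight-core machine really runs twice as many parallel tasks as a four-core one.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    # Stand-in for one unit of database work (illustrative only).
    return n * n

# Size the worker pool to the cores actually present; a server
# hard-coded to four threads would gain nothing from extra cores.
workers = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(handle_request, range(8)))
print(results)
```

A fixed `max_workers=4` here would reproduce the problem the article describes: the operating system could never schedule more than four of these tasks at once, regardless of core count.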
sentIndex sentText sentNum sentScore
1 Since processor speeds are no longer increasing much, scaling up the hardware normally means adding extra processors or cores, and more memory. [sent-4, score-0.527]
2 The software must be designed to allow parallel processing. [sent-6, score-0.4]
3 In the context of a database engine this means that the server component must be multi-threaded , to allow the operating system to schedule parallel tasks on all the cores that are available. [sent-7, score-1.008]
4 Not only that, but the database engine must provide an efficient way to break its workload into as many parallel tasks as there are cores. [sent-8, score-0.726]
5 So, for example, if the database server always uses only four threads then it will make very little difference whether this server runs on a four-core machine or an eight-core machine. [sent-9, score-0.507]
6 Distributed Design Splitting up the workload of a database engine to take full advantage of the available hardware is non-trivial, and not all data management systems do this well. [sent-10, score-0.618]
7 B-Trees allow indexed values to be located quickly, and they also allow relatively efficient insertion and deletion, but for this they need to be “balanced”, i.e. [sent-15, score-0.718]
8 the tree structure must have the same depth across all its branches. [sent-17, score-0.322]
9 The need to keep the tree balanced means that a single insertion or deletion may trigger changes that ripple all the way to the root of the tree. [sent-18, score-0.786]
10 This makes it difficult to share the management of a B-Tree between multiple threads, and therefore between multiple cores. [sent-19, score-0.403]
11 The threads may frequently compete for access to the root of the tree, which becomes a bottleneck . [sent-20, score-0.473]
12 Minimize Shared Resources Scalability is all about minimizing the number of such shared resources, so that different threads can run on different cores without ever having to wait for each other to release shared resources. [sent-21, score-0.583]
13 Without this independence, adding extra cores will not greatly improve performance . [sent-22, score-0.396]
14 RDM has intelligent support for distributed databases and allows the application to distribute data across the available hardware and minimize contention between different threads and processes. [sent-24, score-0.526]
15 Since this objective cannot usually be achieved without knowledge of the data structure and use cases, the database engine must allow the application writer to specify which data will be handled by which server. [sent-25, score-0.945]
16 RDM’s server process (called the Transactional File Server, or TFS) is a relatively lightweight process and multiple instances of the TFS can run on multi-core systems; each assigned to different databases, and completely independent of each other. [sent-26, score-0.36]
17 RDM therefore provides building blocks that allow the creation of a truly scalable application. [sent-28, score-0.305]
18 Multi Version Concurrency Control - Simultaneous Access In situations where simultaneous read and write access to the same data is required, support for Multi Version Concurrency Control (MVCC) allows this simultaneous access without blocking the threads or processes involved. [sent-29, score-0.945]
19 It may also help by reducing the number of processes trying to access the master database. [sent-33, score-0.363]
20 Replication – the client side can retrieve data from a master database or a replicated slave database – this is transparent to the application. [sent-36, score-0.559]
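The MVCC idea in sentence 18 can be sketched with a toy versioned store. This is a minimal illustration, not RDM's implementation: writers serialize among themselves and install new immutable snapshots, while readers pin a snapshot and are never blocked by concurrent writes.

```python
import threading

class MVCCStore:
    """Toy multi-version store: readers pin an immutable snapshot,
    so simultaneous read and write access never blocks readers."""
    def __init__(self):
        self._lock = threading.Lock()  # serializes writers only
        self._versions = [{}]          # list of immutable snapshots

    def snapshot(self):
        # Readers take the latest version; no lock needed.
        return self._versions[-1]

    def write(self, key, value):
        with self._lock:
            new = dict(self._versions[-1])
            new[key] = value
            self._versions.append(new)  # older snapshots stay valid

store = MVCCStore()
store.write("x", 1)
snap = store.snapshot()      # a reader pins this snapshot
store.write("x", 2)          # a concurrent write does not disturb it
print(snap["x"], store.snapshot()["x"])
```

The pinned reader still sees `x == 1` after the write commits `x == 2`; a real engine would add garbage collection of old versions and conflict checks at commit, which this sketch omits.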
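Sentences 14–16, on distributing data so that threads and processes avoid contention, can be sketched as key routing across independent server instances. The server names and the hash scheme below are hypothetical, loosely echoing RDM's model of running several independent TFS processes, each owning different databases:

```python
import hashlib

# Hypothetical server instances, one per database partition,
# each completely independent of the others.
SERVERS = ["tfs-0", "tfs-1", "tfs-2", "tfs-3"]

def route(key: str) -> str:
    """Hash-partition keys across servers so different threads and
    processes rarely compete for the same shared structures."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

print(route("customer:42"))
```

As the article notes, a good partitioning usually cannot be derived automatically: the application writer must choose the routing function with knowledge of the data structure and use cases, which is why the engine exposes this choice rather than hiding it.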
wordName wordTfidf (topN-words)
[('rdm', 0.318), ('threads', 0.224), ('tfs', 0.212), ('allow', 0.206), ('cores', 0.183), ('simultaneous', 0.154), ('deletion', 0.153), ('engine', 0.147), ('mvcc', 0.146), ('insertion', 0.144), ('tree', 0.141), ('database', 0.129), ('processors', 0.127), ('writer', 0.124), ('extra', 0.123), ('retrieve', 0.117), ('must', 0.113), ('multiple', 0.108), ('lightweight', 0.105), ('master', 0.105), ('normally', 0.104), ('balanced', 0.099), ('therefore', 0.099), ('raima', 0.096), ('processes', 0.095), ('workload', 0.092), ('efficient', 0.092), ('emea', 0.09), ('adding', 0.09), ('management', 0.088), ('shared', 0.088), ('connect', 0.086), ('root', 0.086), ('hardware', 0.083), ('ripple', 0.083), ('access', 0.083), ('parallel', 0.081), ('douglas', 0.08), ('may', 0.08), ('data', 0.079), ('resources', 0.078), ('server', 0.077), ('allows', 0.073), ('concurrency', 0.073), ('tasks', 0.072), ('wilson', 0.071), ('relatively', 0.07), ('multi', 0.068), ('structure', 0.068), ('databases', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000002 1276 high scalability-2012-07-04-Top Features of a Scalable Database
2 0.1983602 636 high scalability-2009-06-23-Learn How to Exploit Multiple Cores for Better Performance and Scalability
Introduction: InfoQueue has this excellent talk by Brian Goetz on the new features being added to Java SE 7 that will allow programmers to fully exploit our massively multi-processor future. While the talk is about Java it's really more general than that and there's a lot to learn here for everyone. Brian starts with a short, coherent, and compelling explanation of why programmers can't expect to be saved by ever faster CPUs and why we must learn to exploit the strengths of multiple core computers to make our software go faster. Some techniques for exploiting multiple cores are given in an equally short, coherent, and compelling explanation of why divide and conquer as the secret to multi-core bliss, fork-join, how the Java approach differs from map-reduce, and lots of other juicy topics. The multi-core "problem" is only going to get worse. Tilera founder Anant Agarwal estimates by 2017 embedded processors could have 4,096 cores, server CPUs might have 512 cores and desktop chips could use
3 0.17924938 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
Introduction: We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point will be soon. Let's take a short trip down web architecture lane: It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database It's 1995: Scale-up the database. It's 1998: LAMP It's 1999: Stateless + Load Balanced + Database + SAN It's 2001: In-memory data-grid. It's 2003: Add a caching layer. It's 2004: Add scale-out and partitioning. It's 2005: Add asynchronous job scheduling and maybe a distributed file system. It's 2007: Move it all into the cloud. It's 2008: C
4 0.16296925 1508 high scalability-2013-08-28-Sean Hull's 20 Biggest Bottlenecks that Reduce and Slow Down Scalability
Introduction: This article is a lightly edited version of 20 Obstacles to Scalability by Sean Hull ( with permission) from the always excellent and thought provoking ACM Queue . 1. TWO-PHASE COMMIT Normally when data is changed in a database, it is written both to memory and to disk. When a commit happens, a relational database makes a commitment to freeze the data somewhere on real storage media. Remember, memory doesn't survive a crash or reboot. Even if the data is cached in memory, the database still has to write it to disk. MySQL binary logs or Oracle redo logs fit the bill. With a MySQL cluster or distributed file system such as DRBD (Distributed Replicated Block Device) or Amazon Multi-AZ (Multi-Availability Zone), a commit occurs not only locally, but also at the remote end. A two-phase commit means waiting for an acknowledgment from the far end. Because of network and other latency, those commits can be slowed down by milliseconds, as though all the cars on a highway were slowe
Introduction: This is a guest repost by Ron Pressler, the founder and CEO of Parallel Universe, a Y Combinator company building advanced middleware for real-time applications. Little's Law helps us determine the maximum request rate a server can handle. When we apply it, we find that the dominating factor limiting a server's capacity is not the hardware but the OS. Should we buy more hardware if software is the problem? If not, how can we remove that software limitation in a way that does not make the code much harder to write and understand? Many modern web applications are composed of multiple (often many) HTTP services (this is often called a micro-service architecture). This architecture has many advantages in terms of code reuse and maintainability, scalability and fault tolerance. In this post I'd like to examine one particular bottleneck in the approach, which hinders scalability as well as fault tolerance, and various ways to deal with it (I am using the term "scalability" very loosely in this post
6 0.14967911 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
7 0.14546233 1514 high scalability-2013-09-09-Need Help with Database Scalability? Understand I-O
8 0.13723591 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
10 0.1335714 448 high scalability-2008-11-22-Google Architecture
11 0.12920216 750 high scalability-2009-12-16-Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud
12 0.12912963 1204 high scalability-2012-03-06-Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
13 0.1291243 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
15 0.12864824 511 high scalability-2009-02-12-MySpace Architecture
16 0.12731676 961 high scalability-2010-12-21-SQL + NoSQL = Yes !
17 0.12280226 317 high scalability-2008-05-10-Hitting 300 SimbleDB Requests Per Second on a Small EC2 Instance
18 0.12219626 687 high scalability-2009-08-24-How Google Serves Data from Multiple Datacenters
19 0.12142841 1236 high scalability-2012-04-30-Masstree - Much Faster than MongoDB, VoltDB, Redis, and Competitive with Memcached
20 0.12054147 1319 high scalability-2012-09-10-Russ’ 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
topicId topicWeight
[(0, 0.242), (1, 0.11), (2, -0.028), (3, -0.012), (4, -0.029), (5, 0.095), (6, 0.098), (7, -0.039), (8, -0.12), (9, -0.029), (10, 0.013), (11, 0.027), (12, -0.02), (13, -0.014), (14, 0.02), (15, -0.015), (16, 0.003), (17, 0.006), (18, 0.032), (19, 0.05), (20, 0.013), (21, -0.041), (22, -0.03), (23, -0.023), (24, 0.071), (25, -0.038), (26, -0.011), (27, -0.037), (28, 0.056), (29, 0.02), (30, -0.01), (31, 0.042), (32, 0.006), (33, -0.004), (34, -0.002), (35, -0.033), (36, 0.011), (37, -0.021), (38, 0.049), (39, 0.042), (40, -0.015), (41, -0.052), (42, -0.044), (43, -0.004), (44, 0.004), (45, 0.016), (46, -0.032), (47, 0.005), (48, 0.042), (49, 0.102)]
simIndex simValue blogId blogTitle
same-blog 1 0.95416534 1276 high scalability-2012-07-04-Top Features of a Scalable Database
2 0.85001063 1553 high scalability-2013-11-25-How To Make an Infinitely Scalable Relational Database Management System (RDBMS)
Introduction: This is a guest post by Mark Travis , Founder of InfiniSQL . InfiniSQL is the specific "Infinitely Scalable RDBMS" to which the title refers. It is free software, and instructions for getting, building, running and testing it are available in the guide . Benchmarking shows that an InfiniSQL cluster can handle over 500,000 complex transactions per second with over 100,000 simultaneous connections, all on twelve small servers. The methods used to test are documented, and the code is all available so that any practitioner can achieve similar results. There are two main characteristics which make InfiniSQL extraordinary: It performs transactions with records on multiple nodes better than any clustered/distributed RDBMS It is free, open source. Not just a teaser "community" version with the good stuff proprietary. The community version of InfiniSQL will also be the enterprise version, when it is ready. InfiniSQL is still in early stages of development--it already has m
Introduction: This a guest post by Rajkumar Iyer , a Member of Technical Staff at Aerospike. About a year ago, Aerospike embarked upon a quest to increase in-memory database performance - 1 Million TPS on a single inexpensive commodity server. NoSQL has the reputation of speed, and we saw great benefit from improving latency and throughput of cacheless architectures. At that time, we took a version of Aerospike delivering about 200K TPS, improved a few things - performance went to 500k TPS - and published the Aerospike 2.0 Community Edition. We then used kernel tuning techniques and published the recipe for how we achieved 1 M TPS on $5k of hardware. This year we continued the quest. Our goal was to achieve 1 Million database transactions per second per server; more than doubling previous performance. This compares to Cassandra’s boast of 1M TPS on over 300 servers in Google Compute Engine - at a cost of $2 million dollars per year. We achieved this without kernel tuning. This article d
4 0.77800047 1299 high scalability-2012-08-06-Paper: High-Performance Concurrency Control Mechanisms for Main-Memory Databases
Introduction: If you stayed up all night watching the life reaffirming Curiosity landing on Mars , then this paper, High-Performance Concurrency Control Mechanisms for Main-Memory Databases , has nothing to do with that at all, but it is an excellent look at how to use optimistic MVCC schemes to reduce lock overhead on in-memory datastructures: A database system optimized for in-memory storage can support much higher transaction rates than current systems. However, standard concurrency control methods used today do not scale to the high transaction rates achievable by such systems. In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. Both use multiversioning to isolate read-only transactions from updates but differ in how atomicity is ensured: one is optimistic and one is pessimistic. To avoid expensive context switching, transactions never block during normal processing but they may have to wait before commit to ensure corr
5 0.77747512 484 high scalability-2009-01-05-Lessons Learned at 208K: Towards Debugging Millions of Cores
Introduction: How do we debug and profile a cloud full of processors and threads? It's a problem more will be seeing as we code big scary programs that run on even bigger scarier clouds. Logging gets you far, but sometimes finding the root cause of problem requires delving deep into a program's execution. I don't know about you, but setting up 200,000+ gdb instances doesn't sound all that appealing. Tools like STAT (Stack Trace Analysis Tool) are being developed to help with this huge task. STAT "gathers and merges stack traces from a parallel application’s processes." So STAT isn't a low level debugger, but it will help you find the needle in a million haystacks. Abstract: Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large paralle
6 0.76637191 636 high scalability-2009-06-23-Learn How to Exploit Multiple Cores for Better Performance and Scalability
7 0.76566529 398 high scalability-2008-09-30-Scalability Worst Practices
8 0.76083547 983 high scalability-2011-02-02-Piccolo - Building Distributed Programs that are 11x Faster than Hadoop
9 0.75778711 958 high scalability-2010-12-16-7 Design Patterns for Almost-infinite Scalability
10 0.75579125 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
11 0.75098062 1580 high scalability-2014-01-15-Vedis - An Embedded Implementation of Redis Supporting Terabyte Sized Databases
12 0.74711061 1319 high scalability-2012-09-10-Russ’ 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
13 0.74637091 350 high scalability-2008-07-15-ZooKeeper - A Reliable, Scalable Distributed Coordination System
14 0.73504233 1591 high scalability-2014-02-05-Little’s Law, Scalability and Fault Tolerance: The OS is your bottleneck. What you can do?
15 0.73494846 609 high scalability-2009-05-28-Scaling PostgreSQL using CUDA
16 0.73455811 1454 high scalability-2013-05-08-Typesafe Interview: Scala + Akka is an IaaS for Your Process Architecture
17 0.72989529 1487 high scalability-2013-07-05-Stuff The Internet Says On Scalability For July 5, 2013
18 0.72918397 1425 high scalability-2013-03-18-Beyond Threads and Callbacks - Application Architecture Pros and Cons
19 0.72696346 881 high scalability-2010-08-16-Scaling an AWS infrastructure - Tools and Patterns
20 0.72269082 236 high scalability-2008-02-03-Ideas on how to scale a shared inventory database???
topicId topicWeight
[(1, 0.17), (2, 0.269), (10, 0.054), (30, 0.015), (40, 0.017), (47, 0.014), (54, 0.09), (56, 0.012), (61, 0.074), (79, 0.149), (85, 0.017), (94, 0.04)]
simIndex simValue blogId blogTitle
1 0.98114944 1430 high scalability-2013-03-27-The Changing Face of Scale - The Downside of Scaling in the Contextual Age
Introduction: Robert Scoble is a kind of Brothers Grimm for the digital age. Instead of inspired romantics walking around the country side collecting the folk tales of past ages, he is an inspired technologist documenting the current mythology of startups. One of the developments Robert is exploring is the rise of the contextual age . Where every bit of information about you is continually being prodded, pulled, and observed, shoveled into a great learning machine, and turned into a fully actionable knowledge graph of context. A digital identity more real to software than your physical body ever was. Sinner or saviour, the Age of Context has interesting implications for startups. It raises the entrance bar to dizzying heights. Much of the reason companies are tearing down the Golden Age of the Web, one open protocol at a time, is to create a walled garden of monopolized information. To operate in this world you will have to somehow create a walled garden of your own. And it will be a damn
same-blog 2 0.97725689 1276 high scalability-2012-07-04-Top Features of a Scalable Database
Introduction: Most every programmer who gets sucked into deep performance analysis for long running processes eventually realizes memory allocation is the heart of evil at the center of many of their problems. So you replace malloc with something less worse. Or you tune your garbage collector like a fine ukulele. But there's a smarter approach brought to you from the folks at RAMCloud , a Stanford University production, which is a large scale, distributed, in-memory key-value database. What they've found is that typical memory management approaches don't work and using a log structured approach yields massive benefits: Performance measurements of log-structured memory in RAMCloud show that it enables high client through- put at 80-90% memory utilization, even with artificially stressful workloads. In the most stressful workload, a single RAMCloud server can support 270,000-410,000 durable 100-byte writes per second at 90% memory utilization. The two-level approach to cleaning improves perform
4 0.97023726 1118 high scalability-2011-09-19-Big Iron Returns with BigMemory
Introduction: This is a guest post by Greg Luck Founder and CTO, Ehcache Terracotta Inc. Note: this article contains a bit too much of a product pitch, but the points are still generally valid and useful. The legendary Moore’s Law, which states that the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years, has held true since 1965. It follows that integrated circuits will continue to get smaller, with chip fabrication currently at a minuscule 22nm process (1). Users of big iron hardware, or servers that are dense in terms of CPU power and memory capacity, benefit from this trend as their hardware becomes cheaper and more powerful over time. At some point soon, however, density limits imposed by quantum mechanics will preclude further density increases. At the same time, low-cost commodity hardware influences enterprise architects to scale their applications horizontally, where processing is spread across clusters of l
5 0.96555811 327 high scalability-2008-05-27-How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
Introduction: Update 3 : ReadWriteWeb says Google App Engine Announces New Pricing Plans, APIs, Open Access . Pricing is specified but I'm not sure what to make of it yet. An image manipulation library is added (thus the need to pay for more CPU :-) and memcached support has been added. Memcached will help resolve the can't write for every read problem that pops up when keeping counters. Update 2 : onGWT.com threw a GAE load party and a lot of people came. The results at Load test : Google App Engine = 1, Community = 0 . GAE handled a peak of 35 requests/second and a sustained 10 requests/second. Some think performance was good, others not so good. My GMT watch broke and I was late to arrive. Maybe next time. Also added a few new design rules from the post. Update : Added a few new rules gleaned from the GAE Meetup : Design By Explicit Cost Model and Puts are Precious. How do you structure your database using a distributed hash table like BigTable ? The answer isn't what you might expect. If
6 0.96521002 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
7 0.96465808 1304 high scalability-2012-08-14-MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) and Familiar (SQL)
9 0.96366853 274 high scalability-2008-03-12-YouTube Architecture
10 0.96333635 1011 high scalability-2011-03-25-Did the Microsoft Stack Kill MySpace?
11 0.96323729 1186 high scalability-2012-02-02-The Data-Scope Project - 6PB storage, 500GBytes-sec sequential IO, 20M IOPS, 130TFlops
12 0.96321684 1431 high scalability-2013-03-29-Stuff The Internet Says On Scalability For March 29, 2013
13 0.96272439 920 high scalability-2010-10-15-Troubles with Sharding - What can we learn from the Foursquare Incident?
14 0.96265739 837 high scalability-2010-06-07-Six Ways Twitter May Reach its Big Hairy Audacious Goal of One Billion Users
15 0.96258622 1470 high scalability-2013-06-05-A Simple 6 Step Transition Guide for Moving Away from X to AWS
16 0.96255177 1209 high scalability-2012-03-14-The Azure Outage: Time Is a SPOF, Leap Day Doubly So
17 0.96254349 1345 high scalability-2012-10-22-Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale
19 0.96220481 1479 high scalability-2013-06-21-Stuff The Internet Says On Scalability For June 21, 2013
20 0.96171516 1502 high scalability-2013-08-16-Stuff The Internet Says On Scalability For August 16, 2013