Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions

High Scalability, December 22, 2011
Constructing a scalable risk analysis solution is a fascinating architectural challenge. If you come from Financial Services you are sure to appreciate that. But even architects from other domains are bound to find the challenges fascinating, and the architectural patterns of my suggested solution highly useful in other domains. Recently I held an interesting webinar on architecting scalable, near-real-time risk analysis solutions, based on experience gathered with Financial Services customers. Seeing the vast interest in the webinar, I would like to share the highlights with you here.

From an architectural point of view, risk analysis is a data-intensive and compute-intensive process which also involves elaborate orchestration logic.
Volumes in this domain are massive and ever-increasing, together with an ever-increasing demand to reduce response time. These trends are aggravated by the global financial regulatory reforms enacted after the late-2000s financial crisis, which mandate reducing exposure to risk by shortening risk settlement cycles.
Traditional architectures were based on overnight batch processing using compute grids and relational databases. But can the traditional architecture meet the new demand for near-real-time processing? Constructing a massively-scalable, near-real-time risk analysis solution requires a new architectural approach.
It's important to realize that intraday data changes very frequently but is of limited volume, whereas historical data changes much less frequently but is of much higher volume. A good architecture should accommodate these inherent differences by employing a multi-tiered design: an in-memory data grid for the intraday data, a NoSQL database for the historical data, and a processing layer that unifies the two datastores so they appear as one for querying purposes.
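The unifying processing layer can be pictured as a small read facade over the two stores. The following is a minimal Python sketch under assumed interfaces (the `UnifiedRiskStore` name and the dict-based stores are hypothetical, not any product's API); a real grid would push the filtering down into each tier rather than scan in the client.

```python
class UnifiedRiskStore:
    """Facade that makes an intraday store and a historical store
    look like a single queryable datastore (hypothetical API)."""

    def __init__(self, intraday, historical):
        self.intraday = intraday      # small, hot: e.g. an in-memory data grid
        self.historical = historical  # large, cold: e.g. a NoSQL database

    def positions(self, instrument, since=None):
        # Intraday data changes frequently but is of limited volume.
        recent = [p for p in self.intraday.get(instrument, [])
                  if since is None or p["ts"] >= since]
        # Historical data is of much higher volume; a real system would
        # skip this tier entirely when the query window is intraday-only.
        older = [p for p in self.historical.get(instrument, [])
                 if since is None or p["ts"] >= since]
        # Callers see one ordered timeline, not two datastores.
        return sorted(older + recent, key=lambda p: p["ts"])

intraday = {"AAPL": [{"ts": 100, "px": 150.25}]}
historical = {"AAPL": [{"ts": 1, "px": 140.10}]}
store = UnifiedRiskStore(intraday, historical)
timeline = store.positions("AAPL")
```

The key property is that querying code never knows which tier a record came from, so the intraday/historical split stays an internal optimization.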
Another challenge in risk calculations is streaming results back to the clients as they arrive, and dispatching ticks and other events, which arrive at a high rate, back to the UI. I find Event-Driven Architecture (EDA) highly suitable for handling these use cases. Support for asynchronous data fetch, and the ability to treat a data mutation as an event that can be dispatched, are among the characteristics I look for when implementing such architectures.
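"Data mutation as an event" can be sketched in a few lines: a store whose writes are themselves the notifications, so results stream to subscribers the moment they land. This is an illustrative Python sketch (the `ObservableStore` name is made up); a production grid would dispatch asynchronously and per-client rather than with a synchronous callback loop.

```python
class ObservableStore:
    """Minimal sketch of treating a data mutation as a dispatchable event."""

    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, callback):
        # Clients (e.g. a UI session) register interest in mutations.
        self.subscribers.append(callback)

    def put(self, key, value):
        self.data[key] = value
        # The write itself is the event; no separate publish step,
        # so results stream to clients as they arrive.
        for notify in self.subscribers:
            notify(key, value)

received = []
store = ObservableStore()
store.subscribe(lambda k, v: received.append((k, v)))
store.put("VaR:portfolio-1", 1.23e6)
```

The same mechanism covers both directions mentioned above: calculation results flowing out to clients, and inbound ticks fanned out to interested consumers.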
The risk calculations are usually accompanied by ETL pre-processing of the data, to align data formats with industry standards, and by post-processing of the calculation results for aggregation. Given the high rate at which data is streamed, this pre-processing and post-processing logic should execute very efficiently. Remote invocation of this logic on the data is too cumbersome. The ability to execute this logic co-located with the data, preferably in the very same VM, is what I look for in such architectures.
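The co-location idea is "send the code to the data" rather than pulling the data to the code. Here is a toy Python sketch of that inversion under assumed names (`Partition`, `execute`); in a real platform the task would be serialized to the node owning the partition and run in the same VM as the data.

```python
class Partition:
    """A data partition that runs a task locally instead of shipping
    its records to a remote caller ('code to data')."""

    def __init__(self, records):
        self.records = records

    def execute(self, task):
        # The task runs next to the data, so only the small result
        # crosses the network, not the high-rate record stream.
        return task(self.records)

partition = Partition([{"trade": "t1", "pnl": 5.0},
                       {"trade": "t2", "pnl": -2.0}])
# Post-processing aggregation executed in place on the partition:
total_pnl = partition.execute(lambda rs: sum(r["pnl"] for r in rs))
```

Pre-processing fits the same shape: a formatting task dispatched to each partition normalizes records where they live, before the calculation runs.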
For such challenges I find Elastic Application Platforms a suitable tool: they provide both the in-memory data grid and the ability to execute business logic and messaging co-located with the data, achieving scalability by sharding the data and high availability via redundant synchronous replicas. For my implementation I used GigaSpaces XAP, which in addition to all of the above provided easy integration with the back-end NoSQL database holding the historical data, making it easy to host the end-to-end big-data solution as one cohesive system.
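The scale-out and HA scheme described above can be sketched as hash-based sharding with a synchronous backup write per partition. This is a self-contained Python illustration of the general pattern, not how XAP is implemented internally; real platforms place primaries and replicas on different machines and handle failover.

```python
import zlib

class ShardedGrid:
    """Sketch of scaling via data sharding plus HA via a redundant
    synchronous replica per partition."""

    def __init__(self, num_shards):
        self.primaries = [dict() for _ in range(num_shards)]
        self.replicas = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        # Stable hash so a given key always routes to the same partition,
        # which is what lets logic be co-located with "its" data.
        return zlib.crc32(key.encode()) % len(self.primaries)

    def put(self, key, value):
        s = self._shard(key)
        self.primaries[s][key] = value
        # Synchronous backup: the write is not done until the replica has it,
        # so a primary failure loses no acknowledged data.
        self.replicas[s][key] = value

    def get(self, key):
        return self.primaries[self._shard(key)].get(key)

grid = ShardedGrid(num_shards=4)
grid.put("position:AAPL", {"qty": 100})
```

Adding capacity means adding partitions and rebalancing keys; adding resilience means more replicas, at the cost of write latency for each synchronous copy.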