Introduction: We have written agents that are deployed across the network. Each agent sends data every 15 seconds, possibly as often as every 5 seconds. We are working on a service to which all agents can post data tuples with a minimal payload. A drop rate of up to 5% is acceptable. Ultimately the data will be segregated and stored in a DBMS (currently we are using MySQL). Questions I am looking to have answered: 1. Client/Server Communication: Agents can post data. The status of each send is not that important, but there is a remote case where agents need to be notified if the server-side system generates an event based on the data sent. - A lot of advice on the internet suggests using a message bus (ActiveMQ) for async communication. Multicast and UDP are the alternatives. 2. Persistence: After some evaluation, the data is to be stored in a DBMS. - The end result of processing is an aggregated record, for which MySQL looks scalable. But the volume of data is growing exponentially, so we are considering HBase as an option. Looking if there are any alter
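One way to read the transport question above: since the status of each send is unimportant and up to 5% loss is acceptable, plain UDP datagrams are a candidate. The following is a minimal Python sketch under stated assumptions, not the poster's design. The JSON wire format, the per-agent sequence number, and the names `make_datagram`, `send_tuple`, and `DropTracker` are all hypothetical and introduced only for illustration. The sequence number lets the receiving side estimate loss against the 5% budget.

```python
import json
import socket


def make_datagram(agent_id: str, seq: int, payload: dict) -> bytes:
    """Encode one small tuple as a JSON datagram (hypothetical wire format)."""
    return json.dumps({"agent": agent_id, "seq": seq, "data": payload}).encode("utf-8")


def send_tuple(sock: socket.socket, addr: tuple, agent_id: str, seq: int, payload: dict) -> None:
    """Fire and forget: no acknowledgement and no retry."""
    sock.sendto(make_datagram(agent_id, seq, payload), addr)


class DropTracker:
    """Estimate per-agent loss from gaps in sequence numbers.

    Assumes each agent numbers its datagrams consecutively and that
    datagrams are not duplicated in transit.
    """

    def __init__(self) -> None:
        self.first = {}     # agent -> first sequence number seen
        self.last = {}      # agent -> highest sequence number seen
        self.received = {}  # agent -> number of datagrams received

    def record(self, datagram: bytes) -> dict:
        msg = json.loads(datagram.decode("utf-8"))
        agent, seq = msg["agent"], msg["seq"]
        self.first.setdefault(agent, seq)
        self.last[agent] = max(self.last.get(agent, seq), seq)
        self.received[agent] = self.received.get(agent, 0) + 1
        return msg

    def drop_rate(self, agent: str) -> float:
        expected = self.last[agent] - self.first[agent] + 1
        return 1.0 - self.received[agent] / expected
```

An agent would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_tuple` on each interval; the server would pass every received datagram to `DropTracker.record`. UDP alone does not cover the server-to-agent notification case, so that path would still need a separate channel such as the message bus mentioned in the question.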
sentIndex sentText sentNum sentScore
1 Agents send data every 15 seconds, possibly even every 5 seconds. [sent-2, score-0.236]
2 Working on a service/system to which all agents can post data/tuples with a minimal payload. [sent-3, score-0.657]
3 Ultimately the data will be segregated and stored in a DBMS (currently we are using MySQL). [sent-5, score-0.462]
4 But there is a remote case where agents need to be notified if the server-side system generates an event based on the data sent. [sent-9, score-0.625]
5 - A lot of advice on the internet suggests using a message bus (ActiveMQ) for async communication. [sent-10, score-0.534]
6 Persistence: After some evaluation, the data is to be stored in a DBMS. [sent-13, score-0.344]
7 - The end result of processing is an aggregated record, for which MySQL looks scalable. [sent-14, score-0.472]
8 Looking for any alternatives for the above two scenarios, and for expert advice. [sent-17, score-0.361]
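Sentences 6 and 7 above describe storing only an aggregated record after evaluation. A minimal Python sketch of that step follows, under stated assumptions: the `(agent, timestamp, value)` tuple shape, the 60-second window, the `agent_summary` table, and the function names are hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example is self-contained.

```python
import sqlite3
from collections import defaultdict


def aggregate(tuples, window_secs=60):
    """Group (agent, timestamp, value) tuples into per-agent, per-window rows."""
    buckets = defaultdict(lambda: [0, 0.0])  # (agent, window_start) -> [count, total]
    for agent, ts, value in tuples:
        window_start = int(ts) - int(ts) % window_secs
        bucket = buckets[(agent, window_start)]
        bucket[0] += 1
        bucket[1] += value
    # One output row per (agent, window): agent, window_start, count, total
    return [(a, w, c, t) for (a, w), (c, t) in sorted(buckets.items())]


def store(conn, rows):
    """Write aggregated rows; sqlite3 is only a stand-in for the real DBMS."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_summary ("
        "agent TEXT, window_start INTEGER, sample_count INTEGER, total REAL, "
        "PRIMARY KEY (agent, window_start))"
    )
    conn.executemany("INSERT OR REPLACE INTO agent_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Aggregating before the write means the row count in the database grows with the number of agents and windows rather than with the raw tuple rate, which is the property the question relies on when it says the aggregated record looks manageable in MySQL.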
wordName wordTfidf (topN-words)
[('agent', 0.395), ('agents', 0.298), ('dbms', 0.246), ('segregated', 0.219), ('advices', 0.219), ('secs', 0.219), ('upto', 0.196), ('marginal', 0.174), ('notified', 0.17), ('activemq', 0.17), ('multicast', 0.156), ('udp', 0.149), ('aggregated', 0.137), ('ultimately', 0.132), ('alternatives', 0.131), ('suggests', 0.13), ('stored', 0.128), ('bus', 0.128), ('async', 0.125), ('expert', 0.121), ('sends', 0.121), ('generates', 0.119), ('data', 0.115), ('persistence', 0.11), ('scenarios', 0.109), ('considering', 0.104), ('hbase', 0.103), ('advice', 0.102), ('evaluation', 0.101), ('drop', 0.1), ('sending', 0.099), ('status', 0.098), ('volume', 0.096), ('record', 0.093), ('remote', 0.092), ('post', 0.088), ('communication', 0.084), ('message', 0.083), ('rate', 0.073), ('looks', 0.071), ('currently', 0.069), ('side', 0.068), ('event', 0.061), ('internet', 0.06), ('written', 0.059), ('working', 0.056), ('processing', 0.056), ('looking', 0.052), ('end', 0.049), ('mysql', 0.048)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1578 high scalability-2014-01-14-Ask HS: Design and Implementation of scalable services?
2 0.12891181 431 high scalability-2008-10-27-Notify.me Architecture - Synchronicity Kills
Introduction: What's cool about starting a new project is you finally have a chance to do it right. You of course eventually mess everything up in your own way, but for that one moment the world has a perfect order, a rightness that feels satisfying and good. Arne Claassen, the CTO of notify.me, a brand new real time notification delivery service, is in this honeymoon period now. Arne has been gracious enough to share with us his philosophy of how to build a notification service. I think you'll find it fascinating because Arne goes into a lot of useful detail about how his system works. His main design philosophy is to minimize the bottlenecks that form around synchronous access, that is when some resource is requested and the requestor ties up more resources, waiting for a response. If the requested resource can’t be delivered in a timely manner, more and more requests pile up until the server can’t accept any new ones. Nobody gets what they want and you have an outage. Breaking synchronous op
3 0.10197268 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day
Introduction: This is a guest post by Brian Doll, Application Performance Engineer at New Relic. New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. We believe that good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. Here we'll show you how we do it. New Relic is Application Performance Management (APM) as a Service: In-app agent instrumentation (bytecode instrumentation, etc.); support for 5 programming languages (Ruby, Java, PHP, .NET, Python); 175,000+ app processes monitored globally; 10,000+ customers. The Stats: 20+ Billion application metrics collected every day; 1.7+ Billion web page metrics collected every week; each "timeslice" metric is about 250 bytes; 100k timeslice records inserted every second; 7 Billion new rows of d
4 0.096413389 228 high scalability-2008-01-28-Product: ISPMan Centralized ISP Management System
Introduction: From FRESH Ports and their website: ISPMan is ISP management software written in Perl, using an LDAP backend to manage virtual hosts for an ISP. It can be used to manage DNS, virtual hosts for Apache config, Postfix configuration, Cyrus mailboxes, ProFTPD, etc. ISPMan was written as a management tool for the network at 4unet, where between 30 and 50 domains are hosted and the number is growing rapidly. Managing these domains and their users was a little time consuming, and needed an administrator who knows Linux and these daemons fluently. Now the help desk can easily manage the domains and users. LDAP data can be easily replicated site wide, and the mailbox server can be scaled from 1 to n as required. An LDAP entry called maildrop tells the SMTP server (Postfix) where to deliver the mail. The SMTP servers can be load balanced with one of many load balancing techniques. The program is written with scalability and high availability in mind. This may not be the right s
5 0.089913845 857 high scalability-2010-07-13-DbShards Part Deux - The Internals
Introduction: This is a follow-up article by Cory Isaacson to the first article on DbShards, Product: dbShards - Share Nothing. Shard Everything, describing some of the details about how DbShards works on the inside. The dbShards architecture is a true “shared nothing” implementation of Database Sharding. The high-level view of dbShards is shown here: The above diagram shows how dbShards works for achieving massive database scalability across multiple database servers, using native DBMS engines and our dbShards components. The important components are: dbS/Client: A design goal of dbShards is to make database sharding as seamless as possible to an application, so that application developers can write the same type of code they always have. A key component to making this possible is the dbShards Client. The dbShards Client is our intelligent driver that is an exact API emulation of a given vendor’s database driver. For example, with MySQL we have full support for JDBC, and the
6 0.086451098 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
7 0.08354383 795 high scalability-2010-03-16-1 Billion Reasons Why Adobe Chose HBase
8 0.08210218 1408 high scalability-2013-02-19-Puppet monitoring: how to monitor the success or failure of Puppet runs
9 0.079426266 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
10 0.079332843 943 high scalability-2010-11-16-Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
11 0.076539382 1271 high scalability-2012-06-25-StubHub Architecture: The Surprising Complexity Behind the World’s Largest Ticket Marketplace
12 0.076399848 122 high scalability-2007-10-14-Product: The Spread Toolkit
13 0.07242161 545 high scalability-2009-03-19-Product: Redis - Not Just Another Key-Value Store
14 0.069352254 179 high scalability-2007-12-10-Future of EJB3 !! ??
15 0.067593969 151 high scalability-2007-11-12-a8cjdbc - Database Clustering via JDBC
16 0.064930722 27 high scalability-2007-07-25-Product: 3 PAR REMOTE COPY
17 0.064762495 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
18 0.064753123 933 high scalability-2010-11-01-Hot Trend: Move Behavior to Data for a New Interactive Application Architecture
19 0.064723998 401 high scalability-2008-10-04-Is MapReduce going mainstream?
20 0.064662397 1008 high scalability-2011-03-22-Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
topicId topicWeight
[(0, 0.093), (1, 0.035), (2, -0.013), (3, -0.01), (4, 0.016), (5, 0.057), (6, 0.038), (7, -0.003), (8, 0.012), (9, 0.008), (10, -0.004), (11, 0.041), (12, 0.032), (13, -0.043), (14, 0.02), (15, 0.039), (16, 0.002), (17, -0.006), (18, -0.022), (19, -0.011), (20, -0.007), (21, -0.001), (22, -0.011), (23, 0.02), (24, 0.041), (25, -0.0), (26, -0.003), (27, 0.007), (28, -0.008), (29, 0.012), (30, -0.021), (31, 0.002), (32, -0.003), (33, 0.018), (34, 0.014), (35, 0.025), (36, 0.028), (37, -0.002), (38, 0.01), (39, 0.002), (40, 0.051), (41, 0.02), (42, 0.023), (43, 0.015), (44, 0.028), (45, 0.001), (46, -0.012), (47, -0.001), (48, -0.043), (49, -0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.89543068 1578 high scalability-2014-01-14-Ask HS: Design and Implementation of scalable services?
2 0.73841256 1211 high scalability-2012-03-19-LinkedIn: Creating a Low Latency Change Data Capture System with Databus
Introduction: This is a guest post by Siddharth Anand, a senior member of LinkedIn's Distributed Data Systems team. Over the past 3 years, I've had the good fortune to work with many emerging NoSQL products in the context of supporting the needs of a high-traffic, customer-facing web site. In 2010, I helped Netflix to successfully transition its web scale use-cases from Oracle to SimpleDB, AWS' hosted database service. On completion of that migration, we started a second migration, this time from SimpleDB to Cassandra. The first transition was key to our move from our own data center to AWS' cloud. The second was key to our expansion from one AWS Region to multiple geographically-distributed Regions -- today Netflix serves traffic out of two AWS Regions, one in Virginia, the other in Ireland (F1). Both of these transitions have been successful, but have involved integration pain points such as the creation of database replication technology. In December 2011, I moved to LinkedIn's D
3 0.73676717 1000 high scalability-2011-03-08-Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge
Introduction: Mobile developers have a huge scaling problem ahead: doing something useful with massive continuous streams of telemetry data from millions and millions of devices. This is a really good problem to have. It means smartphone sales are finally fulfilling their destiny: slaughtering PCs in the sales arena. And it also means mobile devices aren't just containers for simple standalone apps anymore, they are becoming the dominant interface to giant backend systems. While developers are now rocking mobile development on the client side, their next challenge is how to code those tricky backend bits. A company facing those same exact problems right now is Medialets, a mobile rich media ad platform. What they do is help publishers create high quality interactive ads, though for our purposes their ad stuff isn't that interesting. What I did find really interesting about their system is how they are tackling the problem of defeating the mobile device data deluge. Each day Medialets munc
4 0.73593748 1362 high scalability-2012-11-26-BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data
Introduction: This is a guest post by Jon Vlachogiannis. Jon is the founder and CTO of BugSense. BugSense is an error-reporting and quality metrics service that tracks thousands of apps every day. When mobile apps crash, BugSense helps developers pinpoint and fix the problem. The startup delivers first-class service to its customers, which include VMWare, Samsung, Skype and thousands of independent app developers. Tracking more than 200M devices requires fast, fault tolerant and cheap infrastructure. Over the last six months, we decided to use our BigData infrastructure to provide users with metrics about their apps' performance and stability and let them know how the errors affect their user base and revenues. We knew that our solution should be scalable from day one, because more than 4% of the smartphones out there will start DDOSing us with data. We wanted to be able to: abstract the application logic and feed browsers with JSON; run complex algorithms on the fly; Expe
Introduction: This is a guest repost by Pete Soderling, Founder at Hakka Labs, creating a community where software engineers come to grow. In response to a recent post from MongoHQ entitled “You don’t have big data,” I would generally agree with many of the author’s points. However, regardless of whether you call it big data, small data, hot data or cold data - we are all in a position to admit that *more* data is here to stay - and that’s due to many different factors. Perhaps primarily, as the article mentions, this is due to the decreasing cost of storage over time. Other factors include access to open APIs, the sheer volume of ever-increasing consumer activity online, as well as a plethora of other incentives that are developing (mostly) behind the scenes as companies “share” data with each other. (You know they do this, right?) But one of the most important things I’ve learned over the past couple of years is that it’s crucial for forward-thinking companies to start to design
6 0.71602374 1020 high scalability-2011-04-12-Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast
7 0.70818347 233 high scalability-2008-01-30-How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
9 0.69876629 956 high scalability-2010-12-08-How To Get Experience Working With Large Datasets
10 0.69187337 558 high scalability-2009-04-06-How do you monitor the performance of your cluster?
11 0.69116259 553 high scalability-2009-04-03-Collectl interface to Ganglia - any interest?
12 0.68711072 907 high scalability-2010-09-23-Working With Large Data Sets
13 0.68369162 716 high scalability-2009-10-06-Building a Unique Data Warehouse
14 0.66943884 882 high scalability-2010-08-18-Misco: A MapReduce Framework for Mobile Systems - Start of the Ambient Cloud?
16 0.66455668 809 high scalability-2010-04-13-Strategy: Saving Your Butt With Deferred Deletes
17 0.66347367 119 high scalability-2007-10-10-WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers
18 0.66314822 449 high scalability-2008-11-24-Product: Scribe - Facebook's Scalable Logging System
19 0.66302633 1161 high scalability-2011-12-22-Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions
20 0.66176689 587 high scalability-2009-05-01-FastBit: An Efficient Compressed Bitmap Index Technology
topicId topicWeight
[(1, 0.128), (2, 0.242), (16, 0.285), (30, 0.063), (61, 0.054), (79, 0.015), (85, 0.09)]
simIndex simValue blogId blogTitle
same-blog 1 0.8581174 1578 high scalability-2014-01-14-Ask HS: Design and Implementation of scalable services?
2 0.85064834 110 high scalability-2007-10-03-Why most large-scale Web sites are not written in Java
Introduction: There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, WikiPedia and others. I've summarized this issue in a blog post here. I would really appreciate your opinion on this matter.
Introduction: With a new Planet of the Apes coming out, this may be a touchy subject with our new overlords, but Netflix is using a whole lot more trouble-injecting monkeys to test and iteratively harden their systems. We learned previously how Netflix used Chaos Monkey, a tool to test failover handling by continuously failing EC2 nodes. That was just a start. More monkeys have been added to the barrel. Node failure is just one problem in a system. Imagine a problem and you can imagine creating a monkey to test if your system is handling that problem properly. Yury Izrailevsky talks about just this approach in this very interesting post: The Netflix Simian Army. I know what you are thinking, if monkeys are so great then why has Netflix been down lately. Dmuino addressed this potential embarrassment, putting all fears of cloud inferiority to rest: Unfortunately we're not running 100% on the cloud today. We're working on it, and we could use more help. The latest outage was caused by a com
4 0.7642529 388 high scalability-2008-09-23-Event: CloudCamp Silicon Valley Unconference on 30th September
Introduction: CloudCamp is an interesting unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate. CloudCamp Silicon Valley 08 is scheduled for Tuesday, September 30, 2008 from 06:00 PM - 10:00 PM in Sun Microsystems' EBC Briefing Center, 15 Network Circle, Menlo Park, CA 94025. CloudCamp follows an interactive, unscripted unconference format. You can propose your own session or you can attend a session proposed by someone else. Either way, you are encouraged to engage in the discussion and “Vote with your feet”, which means … “find another session if you don’t find the session helpful”. Pick and choose from the conversat
5 0.76186478 1640 high scalability-2014-04-30-10 Tips for Optimizing NGINX and PHP-fpm for High Traffic Sites
Introduction: Adrian Singer has boiled down 7 years of experience to a set of 10 very useful tips on how to best optimize NGINX and PHP-fpm for high traffic sites: Switch from TCP to UNIX domain sockets. When communicating with processes on the same machine, UNIX sockets have better performance than TCP because there's less copying and fewer context switches. Adjust worker processes. Set worker_processes in your nginx.conf file to the number of cores your machine has and increase the number of worker_connections. Set up upstream load balancing. Multiple upstream backends on the same machine produce higher throughput than a single one. Disable access log files. Log files on high traffic sites involve a lot of I/O that has to be synchronized across all threads, which can have a big impact. Enable GZip. Cache information about frequently accessed files. Adjust client timeouts. Adjust output buffers. /etc/sysctl.conf tuning. Monitor. Continually monitor the number
6 0.75980002 1558 high scalability-2013-12-04-How Can Batching Requests Actually Reduce Latency?
7 0.75977951 1071 high scalability-2011-07-01-Stuff The Internet Says On Scalability For July 1, 2011
8 0.74215609 484 high scalability-2009-01-05-Lessons Learned at 208K: Towards Debugging Millions of Cores
9 0.71976948 1004 high scalability-2011-03-14-Twitter by the Numbers - 460,000 New Accounts and 140 Million Tweets Per Day
10 0.71230996 790 high scalability-2010-03-09-Applications as Virtual States
11 0.70884413 1652 high scalability-2014-05-21-9 Principles of High Performance Programs
12 0.70199192 1600 high scalability-2014-02-21-Stuff The Internet Says On Scalability For February 21st, 2014
13 0.70165062 1349 high scalability-2012-10-29-Gone Fishin': Welcome to High Scalability
14 0.69733882 52 high scalability-2007-08-01-Product: Memcached
15 0.69144803 237 high scalability-2008-02-03-Product: Collectl - Performance Data Collector
16 0.69091791 1473 high scalability-2013-06-10-The 10 Deadly Sins Against Scalability
17 0.68652898 942 high scalability-2010-11-15-Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests
18 0.68596292 699 high scalability-2009-09-10-How to handle so many socket connection
19 0.68522847 638 high scalability-2009-06-26-PlentyOfFish Architecture
20 0.6851446 317 high scalability-2008-05-10-Hitting 300 SimpleDB Requests Per Second on a Small EC2 Instance