high_scalability high_scalability-2007 high_scalability-2007-58 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Cacti is a network statistics graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. It is intended to be intuitive and easy to use, as well as robust and scalable. It is generally used to graph time-series data like CPU load and bandwidth use. The frontend is written in PHP; it can handle multiple users, each with their own graph sets, so it is sometimes used by web hosting providers (especially dedicated server, virtual private server, and colocation providers) to display bandwidth statistics for their customers. It can be used to configure the data collection itself, allowing certain setups to be monitored without any manual configuration of RRDtool.
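To make the RRDtool relationship concrete, here is a minimal sketch of the round-robin database workflow that Cacti automates behind its web UI: create an RRD, feed it samples, and render a graph image. The file name, data-source name, and sampling parameters are illustrative assumptions, not values from a real Cacti install; Python's subprocess is used only to drive the standard rrdtool command line.

# A minimal sketch (assumed file and data-source names) of the RRDtool
# workflow that Cacti automates for you.
import subprocess

def rrd(*args):
    """Run one rrdtool subcommand and fail loudly if it returns non-zero."""
    subprocess.run(["rrdtool", *args], check=True)

# 1. Create an RRD with one COUNTER data source sampled every 300 s,
#    keeping 288 five-minute averages (one day of history).
rrd("create", "traffic.rrd", "--step", "300",
    "DS:inoctets:COUNTER:600:U:U",
    "RRA:AVERAGE:0.5:1:288")

# 2. Insert a sample ("N" means "now"); Cacti's poller does this step on a
#    schedule, typically from SNMP interface counters.
rrd("update", "traffic.rrd", "N:1234567890")

# 3. Render the last day of data as a PNG.
rrd("graph", "traffic.png", "--start=-1d",
    "DEF:in=traffic.rrd:inoctets:AVERAGE",
    "LINE1:in#0000FF:Inbound octets")

Cacti's contribution is running the update step on a schedule and generating the create and graph definitions from templates, which is what lets setups be monitored without hand-editing RRDtool commands like these.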
sentIndex sentText sentNum sentScore
1 Cacti is a network statistics graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. [sent-1, score-1.623]
2 It is intended to be intuitive and easy to use, as well as robust and scalable. [sent-2, score-0.505]
3 It is generally used to graph time-series data like CPU load and bandwidth use. [sent-3, score-0.641]
4 The frontend is written in PHP; it can handle multiple users, each with their own graph sets, so it is sometimes used by web hosting providers (especially dedicated server, virtual private server, and colocation providers) to display bandwidth statistics for their customers. [sent-4, score-2.144]
5 It can be used to configure the data collection itself, allowing certain setups to be monitored without any manual configuration of RRDtool. [sent-5, score-1.17]
wordName wordTfidf (topN-words)
[('rrdtool', 0.471), ('graphing', 0.407), ('frontend', 0.294), ('providers', 0.238), ('statistics', 0.237), ('setups', 0.199), ('graph', 0.177), ('monitored', 0.173), ('colocation', 0.167), ('intuitive', 0.153), ('intended', 0.142), ('bandwidth', 0.138), ('configure', 0.126), ('display', 0.125), ('robust', 0.111), ('manual', 0.111), ('used', 0.106), ('private', 0.106), ('sometimes', 0.104), ('generally', 0.102), ('allowing', 0.1), ('collection', 0.099), ('hosting', 0.095), ('sets', 0.094), ('certain', 0.091), ('dedicated', 0.09), ('php', 0.085), ('especially', 0.078), ('tool', 0.074), ('configuration', 0.068), ('virtual', 0.068), ('written', 0.067), ('server', 0.067), ('designed', 0.067), ('cpu', 0.058), ('easy', 0.057), ('data', 0.056), ('handle', 0.054), ('multiple', 0.047), ('users', 0.042), ('well', 0.042), ('without', 0.041), ('storage', 0.041), ('load', 0.041), ('network', 0.04), ('web', 0.031), ('use', 0.022), ('like', 0.021)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 58 high scalability-2007-08-04-Product: Cacti
Introduction: Cacti is a network statistics graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. It is intended to be intuitive and easy to use, as well as robust and scalable. It is generally used to graph time-series data like CPU load and bandwidth use. The frontend is written in PHP; it can handle multiple users, each with their own graph sets, so it is sometimes used by web hosting providers (especially dedicated server, virtual private server, and colocation providers) to display bandwidth statistics for their customers. It can be used to configure the data collection itself, allowing certain setups to be monitored without any manual configuration of RRDtool.
2 0.14938958 101 high scalability-2007-09-27-Product: Ganglia Monitoring System
Introduction: Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
3 0.12848827 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
Introduction: Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. (A minimal sketch of this property-graph model appears after this listing.) In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships." A graph looks something like: For more lovely examples take a look at the Graph Image Gal
4 0.12350857 140 high scalability-2007-11-02-How WordPress.com Tracks 300 Servers Handling 10 Million Pageviews
Introduction: WordPress.com hosts 300 servers in 5 different data centers. It's always useful to learn how large installations manage all their unruly children: Currently we use Nagios for server health monitoring, Munin for graphing various server metrics, and a wiki to keep track of all the server hardware specs, IPs, vendor IDs, etc. All of these tools have suited us well up until now, but there have been some scaling issues. The post covers how these different tools are working for them and the comment section has some interesting discussions too.
5 0.1162081 944 high scalability-2010-11-17-Some Services are More Equal than Others
Introduction: Remember when the iPhone launched? Remember the complaints about the device not maintaining calls well? Was it really the hardware? Or was it the service provider network, overwhelmed by not just the call volume but millions of hyper-customers experimenting with their new toy? Look – a video! Look, a video and a call! Hey, I’m on Facebook, Twitter, YouTube, and streaming audio at the same time I’m making a call! How awesome is that? Meanwhile, there’s an entire army of operators at a service provider’s NOC who are stalking through the data center with scissors because it’s the only way to stop the madness. Service providers, probably better than any other, understand “services”. For longer than the enterprise has been talking about them, service providers have been implementing them. They’ve got their own set of standards and reference architectures and even language to describe them, but in a nutshell that’s what a service provider does: offers services. The proble
6 0.11212949 479 high scalability-2008-12-29-Platform virtualization - top 25 providers (software, hardware, combined)
7 0.10760257 621 high scalability-2009-06-06-Graph server
8 0.10413127 808 high scalability-2010-04-12-Poppen.de Architecture
9 0.10255086 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
10 0.096507072 105 high scalability-2007-10-01-Statistics Logging Scalability
11 0.092596993 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned
12 0.088273712 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
13 0.085207969 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
14 0.083052613 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
16 0.078361996 444 high scalability-2008-11-14-Private-Public Cloud
17 0.076560594 1593 high scalability-2014-02-10-13 Simple Tricks for Scaling Python and Django with Apache from HackerEarth
19 0.072444476 1386 high scalability-2013-01-14-MongoDB and GridFS for Inter and Intra Datacenter Data Replication
20 0.072062172 42 high scalability-2007-07-30-Product: GridLayer. Utility computing for online application
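The Neo4j entry above (item 3) describes the property-graph model: nodes and relationships that both carry arbitrary key-value pairs, with relationships as first-class citizens. Below is a minimal in-memory sketch of that model in Python; it illustrates the data structure only and is not Neo4j's actual API.

# Minimal in-memory property graph: nodes and relationships both carry
# key-value properties, and relationships are first-class objects.
# This is a sketch of the model, not Neo4j's API.
class Node:
    def __init__(self, **props):
        self.props = dict(props)
        self.rels = []            # outgoing relationships

class Relationship:
    def __init__(self, start, end, rel_type, **props):
        self.start, self.end, self.type = start, end, rel_type
        self.props = dict(props)  # relationships hold data too
        start.rels.append(self)

# A tiny social graph: two people joined by a KNOWS relationship that
# itself holds data (since when they have known each other).
alice = Node(name="Alice")
bob = Node(name="Bob")
knows = Relationship(alice, bob, "KNOWS", since=2007)

# Traversal is just following relationship objects, not joining tables.
friends = [r.end.props["name"] for r in alice.rels if r.type == "KNOWS"]
print(friends)  # ['Bob']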
topicId topicWeight
[(0, 0.097), (1, 0.039), (2, 0.009), (3, -0.047), (4, -0.027), (5, 0.008), (6, 0.028), (7, -0.042), (8, 0.008), (9, 0.081), (10, 0.041), (11, 0.013), (12, -0.007), (13, -0.046), (14, -0.009), (15, 0.012), (16, 0.016), (17, 0.118), (18, 0.03), (19, 0.025), (20, -0.079), (21, -0.054), (22, -0.03), (23, -0.032), (24, 0.019), (25, 0.049), (26, -0.029), (27, 0.007), (28, 0.004), (29, -0.037), (30, -0.043), (31, -0.03), (32, 0.001), (33, -0.004), (34, -0.024), (35, 0.067), (36, -0.004), (37, 0.013), (38, -0.056), (39, 0.045), (40, -0.043), (41, 0.031), (42, 0.019), (43, 0.033), (44, 0.058), (45, -0.029), (46, 0.007), (47, -0.019), (48, 0.038), (49, -0.06)]
simIndex simValue blogId blogTitle
same-blog 1 0.9458921 58 high scalability-2007-08-04-Product: Cacti
Introduction: Cacti is a network statistics graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. It is intended to be intuitive and easy to use, as well as robust and scalable. It is generally used to graph time-series data like CPU load and bandwidth use. The frontend is written in PHP; it can handle multiple users, each with their own graph sets, so it is sometimes used by web hosting providers (especially dedicated server, virtual private server, and colocation providers) to display bandwidth statistics for their customers. It can be used to configure the data collection itself, allowing certain setups to be monitored without any manual configuration of RRDtool.
2 0.6974178 628 high scalability-2009-06-13-Neo4j - a Graph Database that Kicks Buttox
Introduction: Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database. If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships." A graph looks something like: For more lovely examples take a look at the Graph Image Gal
3 0.68956482 1285 high scalability-2012-07-18-Disks Ain't Dead Yet: GraphChi - a disk-based large-scale graph computation
Introduction: GraphChi uses a Parallel Sliding Windows method which can process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation. The result is that graphs with billions of edges can be processed on just a single machine. It uses a vertex-centric computation model similar to Pregel, which supports iterative algorithms, as opposed to the batch style of MapReduce. Streaming graph updates are supported. (A toy sketch of the vertex-centric model appears after this listing.) About GraphChi, Carlos Guestrin, codirector of Carnegie Mellon's Select Lab, says: A Mac Mini running GraphChi can analyze Twitter's social graph from 2010—which contains 40 million users and 1.2 billion connections—in 59 minutes. "The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers." Related Articles Aapo Kyrola Home Page Your Laptop Can Now Analyze Big Data by JOHN PAVLUS Example Applications Runn
4 0.68708313 621 high scalability-2009-06-06-Graph server
Introduction: I've seen it mentioned a few times that sites like Digg or LinkedIn use graph servers to hold their social graphs. But the only sort of open source graph server I've found is http://neo4j.org/. Can anyone recommend an open source graph server? Thanks, Aaron
5 0.68560934 626 high scalability-2009-06-10-Paper: Graph Databases and the Future of Large-Scale Knowledge Management
Introduction: Relational databases, document databases, and distributed hash tables get most of the hype these days, but there's another option: graph databases. Back to the future, it seems. Here's a really interesting paper by Marko A. Rodriguez introducing the graph model and its extension to representing the world wide web of data. Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis.
6 0.6850009 1406 high scalability-2013-02-14-When all the Program's a Graph - Prismatic's Plumbing Library
7 0.64385474 1136 high scalability-2011-11-03-Paper: G2 : A Graph Processing System for Diagnosing Distributed Systems
8 0.63726515 155 high scalability-2007-11-15-Video: Dryad: A general-purpose distributed execution platform
9 0.63415128 766 high scalability-2010-01-26-Product: HyperGraphDB - A Graph Database
10 0.62851739 805 high scalability-2010-04-06-Strategy: Make it Really Fast vs Do the Work Up Front
11 0.62062901 801 high scalability-2010-03-30-Running Large Graph Algorithms - Evaluation of Current State-of-the-Art and Lessons Learned
12 0.58379656 631 high scalability-2009-06-15-Large-scale Graph Computing at Google
13 0.56834781 722 high scalability-2009-10-15-Hot Scalability Links for Oct 15 2009
14 0.55773556 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
15 0.54117483 827 high scalability-2010-05-14-Hot Scalability Links for May 14, 2010
16 0.53462273 680 high scalability-2009-08-13-Reconnoiter - Large-Scale Trending and Fault-Detection
17 0.52160025 542 high scalability-2009-03-17-IBM WebSphere eXtreme Scale (IMDG)
18 0.51585495 1283 high scalability-2012-07-13-Stuff The Internet Says On Scalability For July 13, 2012
19 0.49662879 1486 high scalability-2013-07-03-5 Rockin' Tips for Scaling PHP to 30,000 Concurrent Users Per Server
20 0.49280205 240 high scalability-2008-02-05-Handling of Session for a site running from more than 1 data center
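The GraphChi entry above (item 3) describes a vertex-centric, Pregel-like model in which each vertex repeatedly updates its value from its in-edges. The toy sketch below runs a PageRank-style iteration entirely in memory to illustrate that programming model; GraphChi's actual contribution, the Parallel Sliding Windows method that streams edge values from disk, is not reproduced here, and the graph and constants are illustrative.

# Toy vertex-centric iteration (Pregel-style), in memory only.
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]          # (src, dst) pairs
out_deg = defaultdict(int)
in_nbrs = defaultdict(list)
for s, d in edges:
    out_deg[s] += 1
    in_nbrs[d].append(s)

vertices = {v for e in edges for v in e}
rank = {v: 1.0 / len(vertices) for v in vertices}

for _ in range(20):                               # iterative, not batch
    new_rank = {}
    for v in vertices:
        # Each vertex reads values along its in-edges and updates itself.
        incoming = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
        new_rank[v] = 0.15 / len(vertices) + 0.85 * incoming
    rank = new_rank

print(rank)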
topicId topicWeight
[(1, 0.072), (2, 0.252), (61, 0.039), (79, 0.127), (93, 0.296), (94, 0.067)]
simIndex simValue blogId blogTitle
1 0.89078319 403 high scalability-2008-10-06-Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview
Introduction: Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for a fraction of the cost of earlier possibilities. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?
2 0.88912922 349 high scalability-2008-07-10-Can cloud computing smite down evil zombie botnet armies?
Introduction: In the more cool stuff I've never heard of before department is something called Self Cleansing Intrusion Tolerance (SCIT). Botnets are created when vulnerable computers live long enough to become infected with the will to do the evil bidding of their evil masters. Security is almost always about removing vulnerabilities (a process which to outside observers often looks like a dog chasing its tail). SCIT takes a different approach: it works on the availability angle. Something I never thought of before, but which makes a great deal of sense once I thought about it. With SCIT you stop and restart VM instances every minute (or whatever interval your desired vulnerability window dictates).... This short exposure window means worms and viruses do not have long enough to fully infect a machine and carry out a coordinated attack. (A sketch of this rotation loop appears after this listing.) A machine is up for a while. Does work. And then is torn down again only to be reborn as a clean VM with no possibility of infection (unless of course the VM
same-blog 3 0.86836267 58 high scalability-2007-08-04-Product: Cacti
Introduction: Cacti is a network statistics graphing tool designed as a frontend to RRDtool's data storage and graphing functionality. It is intended to be intuitive and easy to use, as well as robust and scalable. It is generally used to graph time-series data like CPU load and bandwidth use. The frontend is written in PHP; it can handle multiple users, each with their own graph sets, so it is sometimes used by web hosting providers (especially dedicated server, virtual private server, and colocation providers) to display bandwidth statistics for their customers. It can be used to configure the data collection itself, allowing certain setups to be monitored without any manual configuration of RRDtool.
4 0.84496766 168 high scalability-2007-11-30-Strategy: Efficiently Geo-referencing IPs
Introduction: A lot of apps need to map IP addresses to locations. Jeremy Cole in On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS succinctly explains the many uses for such a feature: Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications. This is difficult to do efficiently; at least, it gives me a bit of brain freeze. In the same post Jeremy nicely explains where to get the geo-referencing data, how to load it, and the performance of different approaches for IP address searching. It's a great practical introduction to the subject. (A minimal lookup sketch appears after this listing.)
5 0.81728911 166 high scalability-2007-11-27-Solving the Client Side API Scalability Problem with a Little Game Theory
Introduction: Now that the internet has become defined as a mashup over a collection of service APIs, we have a little problem: for clients, using APIs is a lot like drinking beer through a straw. You never get as much beer as you want and you get a headache after. But what if I've been a good boy and deserve a bigger straw? Maybe we can use game theory to model trust relationships over a lifetime of interactions across many different services and then give more capabilities/beer to those who have earned them? (A sketch of a trust-scaled rate limiter appears after this listing.) Let's say Twitter limits me to downloading only 20 tweets at a time through their API. But I want more. I may even want to do something so radical as download all my tweets. Of course Twitter can't let everyone do that. They would be swamped serving all this traffic and service would be denied. So Twitter does the rational thing and limits API access as a means of self protection. As does Google, Yahoo, Skynet, and everyone else. But when I hit the API limit I think, but hey it
6 0.79722446 1450 high scalability-2013-05-01-Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
7 0.79326677 1513 high scalability-2013-09-06-Stuff The Internet Says On Scalability For September 6, 2013
8 0.77490681 1198 high scalability-2012-02-24-Stuff The Internet Says On Scalability For February 24, 2012
9 0.75489622 1279 high scalability-2012-07-09-Data Replication in NoSQL Databases
10 0.74453074 1330 high scalability-2012-09-28-Stuff The Internet Says On Scalability For September 28, 2012
11 0.72935981 1233 high scalability-2012-04-25-The Anatomy of Search Technology: blekko’s NoSQL database
12 0.72619134 573 high scalability-2009-04-16-Serving 250M quotes-day at CNBC.com with aiCache
13 0.72399944 1637 high scalability-2014-04-25-Stuff The Internet Says On Scalability For April 25th, 2014
14 0.72344971 944 high scalability-2010-11-17-Some Services are More Equal than Others
15 0.71322715 1378 high scalability-2012-12-28-Stuff The Internet Says On Scalability For December 28, 2012
16 0.71191889 538 high scalability-2009-03-16-Are Cloud Based Memory Architectures the Next Big Thing?
17 0.70791572 17 high scalability-2007-07-16-Paper: Guide to Cost-effective Database Scale-Out using MySQL
18 0.69662142 188 high scalability-2007-12-19-How can I learn to scale my project?
19 0.69543165 1313 high scalability-2012-08-28-Making Hadoop Run Faster
20 0.69365406 936 high scalability-2010-11-09-Facebook Uses Non-Stored Procedures to Update Social Graphs
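The SCIT entry in the last listing (item 2) boils down to a rotation loop: replace every server from a known-clean image on a fixed cadence, so a compromise can survive at most one rotation interval. A sketch of that loop follows; launch_clean_vm and terminate_vm are hypothetical placeholders for whatever hypervisor or cloud API is actually in use, and the pool size and interval are illustrative.

# Sketch of the SCIT rotation idea with hypothetical provisioning calls.
import itertools
import time

ROTATION_SECONDS = 60          # the desired exposure window
POOL_SIZE = 4

_vm_ids = itertools.count(1)

def launch_clean_vm():
    vm = f"vm-{next(_vm_ids)}"
    print(f"booting {vm} from pristine image")   # stand-in for a real API call
    return vm

def terminate_vm(vm):
    print(f"destroying {vm}")                    # stand-in for a real API call

def rotate_forever():
    pool = [launch_clean_vm() for _ in range(POOL_SIZE)]
    while True:
        time.sleep(ROTATION_SECONDS / POOL_SIZE)  # stagger replacements
        pool.append(launch_clean_vm())            # bring the replacement up first
        terminate_vm(pool.pop(0))                 # then retire the oldest instance

if __name__ == "__main__":
    rotate_forever()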
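For the geo-referencing entry in the last listing (item 4), here is a minimal IP-to-location lookup using MaxMind's geoip2 Python library. The post itself describes loading MaxMind data into MySQL and querying it with GIS functions; this sketch only shows what a single lookup yields, and the .mmdb file path and documentation-range address are assumptions.

# Minimal IP-to-location lookup with MaxMind's geoip2 library.
import geoip2.database

reader = geoip2.database.Reader("GeoLite2-City.mmdb")  # assumed local database file
response = reader.city("203.0.113.7")                  # documentation/test address
print(response.country.iso_code)                       # e.g. 'US'
print(response.city.name)                              # e.g. 'Mountain View'
print(response.location.latitude, response.location.longitude)
reader.close()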
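For the client-side API scalability entry in the last listing (item 5), here is one way the trust idea could look in code: a per-client quota whose size scales with a trust score accumulated over the client's history. The scoring rule and all numbers are illustrative assumptions, not anything Twitter or any other provider actually does.

# Sketch of a trust-scaled rate limiter (hourly window, illustrative numbers).
import time

class TrustScaledLimiter:
    def __init__(self, base_rate=20, max_multiplier=10):
        self.base_rate = base_rate            # requests/hour for a new client
        self.max_multiplier = max_multiplier  # cap for the most trusted clients
        self.trust = {}                       # client_id -> score in [0, 1]
        self.used = {}                        # client_id -> (window_start, count)

    def allowance(self, client_id):
        score = self.trust.get(client_id, 0.0)
        return int(self.base_rate * (1 + score * (self.max_multiplier - 1)))

    def allow(self, client_id):
        now = time.time()
        start, count = self.used.get(client_id, (now, 0))
        if now - start > 3600:                # hourly window rolls over
            start, count = now, 0
        if count >= self.allowance(client_id):
            return False                      # over the (trust-scaled) cap
        self.used[client_id] = (start, count + 1)
        return True

    def record_good_behavior(self, client_id, delta=0.01):
        # e.g. called when a billing cycle passes with no abuse reports
        self.trust[client_id] = min(1.0, self.trust.get(client_id, 0.0) + delta)

limiter = TrustScaledLimiter()
limiter.record_good_behavior("client-42", delta=0.5)   # long, clean history
print(limiter.allowance("client-42"))                  # 110 requests/hour
print(limiter.allowance("newcomer"))                   # 20 requests/hour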