high_scalability high_scalability-2009 high_scalability-2009-707 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Save 25% on Hadoop Conference Tickets Apache Hadoop is a hot technology getting traction all over the enterprise and in the Web 2.0 world. Now, there's going to be a conference dedicated to learning more about Hadoop. It'll be Friday, October 2 at the Roosevelt Hotel in New York City. Hadoop World, as it's being called, will be the first Hadoop event on the east coast. Morning sessions feature talks by Amazon, Cloudera, Facebook, IBM, and Yahoo! Then it breaks out into three tracks: applications, development / administration, and extensions / ecosystems. In addition to the conference itself, there will also be 3 days of training prior to the event for those looking to go deeper. In addition to general sessions speakers, presenters include Hadoop project creator Doug Cutting, as well as experts on large-scale data from Intel, Rackspace, Softplayer, eHarmony, Supermicro, Impetus, Booz Allen Hamilton, Vertica, About.com, and other companies. Readers get a 25% discount if you register b
sentIndex sentText sentNum sentScore
1 Save 25% on Hadoop Conference Tickets Apache Hadoop is a hot technology getting traction all over the enterprise and in the Web 2. [sent-1, score-0.119]
2 Now, there's going to be a conference dedicated to learning more about Hadoop. [sent-3, score-0.163]
3 Hadoop World, as it's being called, will be the first Hadoop event on the east coast. [sent-5, score-0.087]
4 Morning sessions feature talks by Amazon, Cloudera, Facebook, IBM, and Yahoo! [sent-6, score-0.136]
5 Then it breaks out into three tracks: applications, development / administration, and extensions / ecosystems. [sent-7, score-0.187]
6 In addition to the conference itself, there will also be 3 days of training prior to the event for those looking to go deeper. [sent-8, score-0.34]
7 In addition to general sessions speakers, presenters include Hadoop project creator Doug Cutting, as well as experts on large-scale data from Intel, Rackspace, Softplayer, eHarmony, Supermicro, Impetus, Booz Allen Hamilton, Vertica, About. [sent-9, score-0.452]
8 Readers get a 25% discount if you register by Sept. [sent-11, score-0.169]
9 Essential storage tradeoff: Simple Reads vs. [sent-16, score-0.12]
10 Data in denormalized chunks is easy to read and complex to write. [sent-18, score-0.188]
11 Kickfire uses column-oriented storage and execution to address I/O bottlenecks and FPGA-based data-flow architecture to address processing and memory bottlenecks. [sent-20, score-0.302]
12 A DBMS that is optimized for compression through and through, especially one with a query executor that features just-in-time decompression, will not just reduce I/O and storage overhead, but also offer better query performance with lower CPU resource utilization. [sent-22, score-0.669]
13 Some perspective on this DIY storage server mentioned at Storagemojo, by Joerg Moellenkamp. [sent-31, score-0.12]
wordName wordTfidf (topN-words)
[('decompression', 0.244), ('hadoop', 0.196), ('squid', 0.19), ('discount', 0.169), ('varnish', 0.164), ('conference', 0.163), ('joerg', 0.146), ('storagemojoby', 0.146), ('executor', 0.137), ('parallelismby', 0.137), ('sessions', 0.136), ('diy', 0.131), ('eharmony', 0.131), ('databasesby', 0.126), ('supermicro', 0.126), ('doug', 0.122), ('bradford', 0.122), ('presenters', 0.122), ('storage', 0.12), ('traction', 0.119), ('allen', 0.116), ('bryan', 0.113), ('dare', 0.111), ('vertica', 0.109), ('denormalized', 0.105), ('imply', 0.105), ('friday', 0.104), ('morning', 0.102), ('extensions', 0.102), ('cloudera', 0.101), ('creator', 0.1), ('denormalization', 0.097), ('tradeoff', 0.095), ('addition', 0.094), ('analytic', 0.094), ('daniel', 0.094), ('address', 0.091), ('hotel', 0.091), ('overcome', 0.089), ('october', 0.089), ('suck', 0.089), ('tracks', 0.088), ('adopt', 0.087), ('east', 0.087), ('cutting', 0.086), ('breaks', 0.085), ('query', 0.084), ('chunks', 0.083), ('prior', 0.083), ('movement', 0.082)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 707 high scalability-2009-09-17-Hot Links for 2009-9-17
2 0.15572153 74 high scalability-2007-08-23-Product: Varnish
Introduction: Varnish is a state-of-the-art, high-performance HTTP accelerator. Varnish is targeted primarily at the FreeBSD 6 and Linux 2.6 platforms, and will take full advantage of the virtual memory system and advanced I/O features offered by these operating systems. Varnish was written from the ground up to be a high-performance caching reverse proxy. Squid is a forward proxy that can be configured as a reverse proxy. Besides, Squid is rather old and designed the way computer programs were supposed to be designed in 1980. Varnish is reported to be 10x-20x faster than Squid on the same hardware.
3 0.13684914 1321 high scalability-2012-09-12-Using Varnish for Paywalls: Moving Logic to the Edge
Introduction: This is a guest post from Per Buer, founder and CEO of Varnish Software, provider of Varnish Cache, an open source web application accelerator freely available at varnish-cache.org. Varnish powers a lot of really big websites worldwide. We at Varnish Software are all about speed. Varnish Cache is built for speed. It executes its policy code more or less a thousand times faster than your typical Java or PHP based application servers, mostly because the configuration is compiled into system-call-free machine code. System calls require expensive context switches, stall the CPU, and wreak havoc in the CPU cache, so avoiding them makes the code fly. There are strong limitations on what kind of logic you can move into Varnish Cache, but the logic that you do move there will run very fast. An example is using Varnish for access control to serve access-controlled content from the caching edge layer. The Varnish Paywall Who gets to access your content? In a tradi
4 0.1298088 627 high scalability-2009-06-11-Yahoo! Distribution of Hadoop
Introduction: Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop. Read more and get the Hadoop distribution from Yahoo
Introduction: Who's Hiring? An exciting opportunity for a Software Engineer to join Apple's Messaging Services team. We build the cloud systems that power some of the busiest applications in the world. You'll have the opportunity to explore a wide range of technologies, developing the server software that is driving the future of messaging and mobile services. To apply please visit this URL. Two Sigma is building our next generation research environment, and we're looking for a functional programmer with a passion for distributed computing. We're scaling machine learning and operations research to tens of thousands of CPUs. Please send qualifications to buildstuff@twosigma.com. Blurocket is looking for smart and fun people to build its next generation ecommerce platform. If creating scalable services is in your DNA, let us know! (Salary $250k+). Apply over at StackOverflow. LogicMonitor is looking for a Front End developer to have a huge impact, be valued, realize the
6 0.12495589 694 high scalability-2009-09-04-Hot Links for 2009-9-4
7 0.12038124 601 high scalability-2009-05-17-Product: Hadoop
8 0.115421 996 high scalability-2011-02-28-A Practical Guide to Varnish - Why Varnish Matters
10 0.11250932 669 high scalability-2009-08-03-Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2
11 0.1117495 662 high scalability-2009-07-27-Handle 700 Percent More Requests Using Squid and APC Cache
12 0.11130779 1088 high scalability-2011-07-27-Making Hadoop 1000x Faster for Graph Problems
16 0.10882422 1254 high scalability-2012-05-30-Strategy: Get Servers for Free and Make Users Happy by Turning on Compression
18 0.10835008 811 high scalability-2010-04-16-Hot Scalability Links for April 16, 2010
topicId topicWeight
[(0, 0.177), (1, 0.009), (2, 0.012), (3, 0.02), (4, 0.026), (5, 0.086), (6, -0.025), (7, -0.006), (8, 0.028), (9, 0.046), (10, 0.01), (11, -0.066), (12, 0.09), (13, -0.047), (14, -0.01), (15, -0.013), (16, 0.009), (17, -0.005), (18, 0.011), (19, 0.02), (20, -0.039), (21, 0.09), (22, 0.085), (23, 0.018), (24, 0.024), (25, 0.012), (26, 0.026), (27, -0.02), (28, -0.014), (29, 0.003), (30, 0.059), (31, 0.037), (32, -0.004), (33, 0.011), (34, 0.019), (35, 0.051), (36, -0.055), (37, 0.078), (38, -0.019), (39, -0.037), (40, -0.056), (41, 0.045), (42, -0.05), (43, -0.072), (44, -0.016), (45, 0.023), (46, -0.038), (47, 0.005), (48, 0.017), (49, 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.94535905 707 high scalability-2009-09-17-Hot Links for 2009-9-17
2 0.74237329 443 high scalability-2008-11-14-Paper: Pig Latin: A Not-So-Foreign Language for Data Processing
Introduction: Yahoo has developed a new language called Pig Latin that fits in a sweet spot between high-level declarative querying in the spirit of SQL and low-level, procedural programming à la map-reduce, combining the best of both worlds. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source map-reduce implementation. Pig has just graduated from the Apache Incubator and joined Hadoop as a subproject. The paper has a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. References: Apache Pig Wiki
3 0.72857368 851 high scalability-2010-07-02-Hot Scalability Links for July 2, 2010
Introduction: What says 4th of July like Nathan's ultimate scalable hot dog eating contest? This totally requires a scale-up strategy. Facebook at 60,000 servers and counting. Deepak Singh has collected some impressive massive data stats on extreme Hadoop usage: Facebook: 36 PB of uncompressed data, 2250 machines, 23,000 cores, 32 GB of RAM per machine, processing 80-90 TB/day; Yahoo: 70 PB of data in HDFS, 170 PB spread across the globe, 34000 servers, processing 3 PB per day, 120 TB flow through Hadoop every day; Twitter: 7 TB/day into HDFS; LinkedIn: 120 billion relationships; 82 Hadoop jobs daily (IIRC); 16 TB of intermediate data. Who knew DevOps could be so funny? Adam Jacob, CTO of Opscode, gave a hilarious talk at the Velocity conference on the true nature of DevOps. Warning: your neck may get sore from nodding in agreement so much and your belly may ache from laughing so much. Pig at LinkedIn. Not your average article: For me, understanding my work over the last year b
4 0.71406674 627 high scalability-2009-06-11-Yahoo! Distribution of Hadoop
5 0.68864059 968 high scalability-2011-01-04-Map-Reduce With Ruby Using Hadoop
Introduction: A demonstration, with repeatable steps, of how to quickly fire up a Hadoop cluster on Amazon EC2, load data onto the HDFS (Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java. Fire Up Your Hadoop Cluster I chose the Cloudera distribution of Hadoop, which is still 100% Apache licensed but has some additional benefits. One of these benefits is that it is released by Doug Cutting, who started Hadoop and drove its development at Yahoo! He also started Lucene, which is another of my favourite Apache projects, so I have good faith that he knows what he is doing. Another benefit, as you will see, is that it is simple to fire up a Hadoop cluster. I am going to use C
6 0.67973846 601 high scalability-2009-05-17-Product: Hadoop
7 0.67429173 1265 high scalability-2012-06-15-Stuff The Internet Says On Scalability For June 15, 2012
8 0.67015535 1173 high scalability-2012-01-12-Peregrine - A Map Reduce Framework for Iterative and Pipelined Jobs
10 0.63209438 1076 high scalability-2011-07-08-Stuff The Internet Says On Scalability For July 8, 2011
11 0.62051123 1445 high scalability-2013-04-24-Strategy: Using Lots of RAM Often Cheaper than Using a Hadoop Cluster
12 0.61245984 1414 high scalability-2013-03-01-Stuff The Internet Says On Scalability For February 29, 2013
13 0.60804218 883 high scalability-2010-08-20-Hot Scalability Links For Aug 20, 2010
14 0.60477692 397 high scalability-2008-09-28-Product: Happy = Hadoop + Python
15 0.60032088 848 high scalability-2010-06-25-Hot Scalability Links for June 25, 2010
16 0.59518087 819 high scalability-2010-04-30-Hot Scalability Links for April 30, 2010
17 0.5949648 647 high scalability-2009-07-02-Hypertable is a New BigTable Clone that Runs on HDFS or KFS
18 0.59139526 1382 high scalability-2013-01-07-Analyzing billions of credit card transactions and serving low-latency insights in the cloud
19 0.58594865 254 high scalability-2008-02-19-Hadoop Getting Closer to 1.0 Release
20 0.58501333 1174 high scalability-2012-01-13-Stuff The Internet Says On Scalability For January 13, 2012
topicId topicWeight
[(1, 0.234), (2, 0.126), (10, 0.068), (30, 0.072), (47, 0.032), (61, 0.065), (77, 0.03), (79, 0.081), (85, 0.077), (98, 0.124)]
simIndex simValue blogId blogTitle
same-blog 1 0.94443977 707 high scalability-2009-09-17-Hot Links for 2009-9-17
2 0.89808714 420 high scalability-2008-10-15-Tokyo Tech Tsubame Grid Storage Implementation
Introduction: This Sun BluePrint article describes the storage architecture of the Tokyo Institute of Technology TSUBAME grid. The Tokyo Institute of Technology is one of the world's leading technical institutes, and recently created the fastest supercomputer in Asia, and one of the largest supercomputers outside of the United States. By deploying Sun Fire x64 servers and data servers in a grid architecture, Tokyo Tech built a cost-effective and flexible supercomputer consisting of hundreds of systems, thousands of processors, terabytes of memory, and a petabyte of storage that supports users running common off-the-shelf applications. This is the second of a three-article series. It describes the steps to install and configure the Lustre file system within the storage architecture.
3 0.88965791 1014 high scalability-2011-03-31-8 Lessons We Can Learn from the MySpace Incident - Balance, Vision, Fearlessness
Introduction: A surprising amount of heat and light was generated by the whole Microsoft vs MySpace discussion. Why people feel so passionate about this I'm not quite sure, but fortunately for us, in the best sense of the web, it generated an amazing number of insightful comments and observations. If we stand back and take a look at the whole incident, what can we take away that might help us in the future? All computer companies are technology companies first. A repeated theme was that you can't be an entertainment company first. You are a technology company providing entertainment using technology. The tech can inform the entertainment side, the entertainment side drives features, but they really can't be separated. An awesome stack that does nothing is useless. A great idea on a poor stack is just as useless. There's a difficult balance that must be achieved and both management and developers must be aware that there's something to balance. All pigs are equal. All business f
4 0.8831957 624 high scalability-2009-06-10-Hive - A Petabyte Scale Data Warehouse using Hadoop
Introduction: This post about using Hive and Hadoop for analytics comes straight from Facebook engineers. Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook's Lexicon product. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up cost-effectively with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook. Read the rest of the article on Engineering @ Facebook's Notes page
5 0.87820381 824 high scalability-2010-05-06-Going global on EC2
Introduction: Since its inception, Amazon EC2 has enabled companies to run highly scalable infrastructure with minimal overhead. Over the years, Amazon Web Services has expanded with new offerings and additional regions around the world. All this growth has made establishing a global footprint easier than ever. And yet, most EC2 customers still choose to operate in a single region. While this is fine for many applications, customers with significant web infrastructure are depriving users of drastically improved performance. Deploying infrastructure in EC2's new regions cuts out one of the biggest sources of latency: distance. In this post, I describe how Bizo significantly reduced load times by implementing Global Server Load Balancing (GSLB) to distribute traffic across all Amazon regions. Click here to read more on Bizo's dev blog
7 0.87645209 1082 high scalability-2011-07-18-New Relic Architecture - Collecting 20+ Billion Metrics a Day
8 0.87359703 688 high scalability-2009-08-26-Hot Links for 2009-8-26
9 0.87253571 788 high scalability-2010-03-04-How MySpace Tested Their Live Site with 1 Million Concurrent Users
10 0.87196696 466 high scalability-2008-12-16-Facebook is Hiring
11 0.87150139 425 high scalability-2008-10-22-Scalability Best Practices: Lessons from eBay
12 0.87140822 918 high scalability-2010-10-12-The CIO’s Problem: Cloud “Mess” or Cloud “Mash”
13 0.86947894 1557 high scalability-2013-12-02-Evolution of Bazaarvoice’s Architecture to 500M Unique Users Per Month
14 0.86841786 1579 high scalability-2014-01-14-SharePoint VPS solution
15 0.86770439 570 high scalability-2009-04-15-Implementing large scale web analytics
16 0.8664993 1115 high scalability-2011-09-14-Big List of Scalabilty Conferences
17 0.86532146 603 high scalability-2009-05-19-Scaling Memcached: 500,000+ Operations-Second with a Single-Socket UltraSPARC T2
18 0.86321002 755 high scalability-2009-12-28-Zynga Needs a Server-side Systems Engineer
19 0.86311001 617 high scalability-2009-06-04-New Book: Even Faster Web Sites: Performance Best Practices for Web Developers
20 0.86270374 1482 high scalability-2013-06-26-Leveraging Cloud Computing at Yelp - 102 Million Monthly Vistors and 39 Million Reviews