Product: Happy = Hadoop + Python (High Scalability, 2008-09-28)
Introduction: Has a Java-only Hadoop been getting you down? Now you can be Happy. Happy is a framework for writing map-reduce programs for Hadoop using Jython. It files off the sharp edges on Hadoop and makes writing map-reduce programs a breeze. There's really no history yet on Happy, but I'm delighted at the idea of being able to map-reduce in other languages. The more ways the better.

From the website: Happy is a framework that allows Hadoop jobs to be written and run in Python 2.2 using Jython. It is an easy way to write map-reduce programs for Hadoop, and includes some new useful features as well. The current release supports Hadoop 0.17.2.

Map-reduce jobs in Happy are defined by sub-classing happy.HappyJob and implementing a map(records, task) and reduce(key, values, task) function. Then you create an instance of the class, set the job parameters (such as inputs and outputs) and call run(). When you call run(), Happy serializes your job instance and copies it and all accompanying libraries out to the Hadoop cluster. Then for each task in the Hadoop job, your job instance is de-serialized and map or reduce is called. The task results are written out using a collector, but aggregate statistics and other roll-up information can be stored in the happy.results dictionary, which is returned from the run() call. Jython modules and Java jar files that are being called by your code can be specified using the environment variable HAPPY_PATH. These are added to the Python path at startup, and are also automatically included when jobs are sent to Hadoop.
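To make that concrete, here is a minimal sketch of what such a job could look like. It is pieced together from the description above rather than copied from Happy's documentation: the WordCountJob class, the inputpaths/outputpath attribute names, and the task.collect() collector call are assumptions about how the pieces fit together, so check the real API before relying on them.

```python
# Hypothetical Happy word-count job, sketched from the post's description.
# happy.HappyJob, map(records, task), reduce(key, values, task) and run()
# come from the post; inputpaths/outputpath and task.collect() are assumed.
import happy

class WordCountJob(happy.HappyJob):
    def __init__(self, inputpath, outputpath):
        happy.HappyJob.__init__(self)
        self.inputpaths = [inputpath]   # job parameter: where to read from
        self.outputpath = outputpath    # job parameter: where to write results

    def map(self, records, task):
        # runs on the cluster after the job instance is de-serialized
        for key, value in records:
            for word in value.split():
                task.collect(word, "1")  # emit (word, 1) via the collector

    def reduce(self, key, values, task):
        # values holds every count emitted for this word
        total = 0
        for v in values:
            total = total + int(v)
        task.collect(key, str(total))

if __name__ == "__main__":
    job = WordCountJob("wordcount/input", "wordcount/output")
    # run() serializes the job and its libraries, ships them to Hadoop,
    # and (per the post) returns the roll-up results when the job finishes
    job.run()
```

Any extra Jython modules or Java jars the job imports would be listed in HAPPY_PATH before launching, so they land on the Python path locally and get shipped to the cluster along with the job.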
Related posts:
- Yahoo! Distribution of Hadoop (2009-06-11)
- Product: Hadoop (2009-05-17)
- Map-Reduce With Ruby Using Hadoop (2011-01-04)
- Hadoop - A Primer (2008-10-15)
- Strategy: Consider When a Service Starts Billing in Your Algorithm Cost (2010-07-20)
- Making Hadoop 1000x Faster for Graph Problems (2011-07-27)
- Peregrine - A Map Reduce Framework for Iterative and Pipelined Jobs (2012-01-12)
- Making Hadoop Run Faster (2012-08-28)
- Learn How to Think at Scale (2009-07-30)
- MongoDB and GridFS for Inter and Intra Datacenter Data Replication (2013-01-14)
- Strategy: Using Lots of RAM Often Cheaper than Using a Hadoop Cluster (2013-04-24)
- Running Hadoop MapReduce on Amazon EC2 and Amazon S3 (2007-08-03)
- Facebook Uses Non-Stored Procedures to Update Social Graphs (2010-11-09)
- Product: Hbase (2009-07-02)
- How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data (2008-01-30)
- Medialets Architecture - Defeating the Daunting Mobile Device Data Deluge (2011-03-08)
- Google Architecture (2008-11-22)
- What Google App Engine Price Changes Say About the Future of Web Architecture (2011-09-07)