Note: If you want to know about setting up Hadoop on an Ubuntu server, please refer to my previous posts. In fact, this post assumes that you have already read them and are working with a similar setup.
1. Install Hadoop 2.5.x on Ubuntu Trusty (14.04.1 LTS)
2. Basic setup of the Hadoop file system
The table below lists the example jobs available in Hadoop 2.5.2; a sample invocation follows the table.
| Job Name | Description |
|---|---|
| aggregatewordcount | An Aggregate-based map/reduce program that counts the words in the input files. |
| aggregatewordhist | An Aggregate-based map/reduce program that computes the histogram of the words in the input files. |
| bbp | A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi. |
| dbcount | An example job that counts the pageviews stored in a database. |
| distbbp | A map/reduce program that uses a BBP-type formula to compute exact bits of Pi. |
| grep | A map/reduce program that counts the matches of a regex in the input. |
| join | A job that effects a join over sorted, equally partitioned datasets. |
| multifilewc | A job that counts words from several files. |
| pentomino | A map/reduce tile-laying program that finds solutions to pentomino problems. |
| pi | A map/reduce program that estimates Pi using a quasi-Monte Carlo method. |
| randomtextwriter | A map/reduce program that writes 10 GB of random textual data per node. |
| randomwriter | A map/reduce program that writes 10 GB of random data per node. |
| secondarysort | An example defining a secondary sort to the reduce. |
| sort | A map/reduce program that sorts the data written by the random writer. |
| sudoku | A sudoku solver. |
| teragen | Generates data for the terasort. |
| terasort | Runs the terasort. |
| teravalidate | Checks the results of the terasort. |
| wordcount | A map/reduce program that counts the words in the input files. |
| wordmean | A map/reduce program that computes the average length of the words in the input files. |
| wordmedian | A map/reduce program that computes the median length of the words in the input files. |
| wordstandarddeviation | A map/reduce program that computes the standard deviation of the length of the words in the input files. |
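All of these jobs ship in the examples jar bundled with the Hadoop distribution. As a minimal sketch, assuming the default tarball layout under `$HADOOP_HOME` (adjust the jar path if your installation differs, and substitute your own HDFS input/output paths):

```bash
# List the bundled example programs; running the jar with no arguments
# prints a list much like the table above.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar

# Run the 'pi' estimator with 10 map tasks and 100 samples per map.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar pi 10 100

# Run 'wordcount' against an HDFS input directory. The paths below are
# placeholders; the output directory must not already exist.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar wordcount /user/hduser/input /user/hduser/output
```

Invoking a job name with missing or wrong arguments prints that job's usage, which is a handy way to see exactly what parameters each example expects.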