Friday, November 28, 2014

Basic setup of the Hadoop file system

In my last post, I described the simplest way of setting up Hadoop 2.5.2 on a fresh Ubuntu Trusty installation. In this post I am going to set up the Hadoop file system (HDFS) on my local installation.

First, let us format the NameNode, which keeps track of where data is stored on Hadoop’s file system. The actual data is stored on DataNodes; typically, multiple DataNodes are used to achieve RAID-like replication, but in this case we will configure only one node. (Note: in RAID, we replicate the data across multiple disks of the same server, but in Hadoop, the DataNodes can be on multiple servers!)

:/usr/local/hadoop$ bin/hdfs namenode -format

Typically, a DataNode connects to the NameNode and responds to requests coming from it. Once we locate the DataNodes, we can even talk to them directly. In certain scenarios (e.g. data replication), a DataNode may also communicate directly with other DataNodes. More on this later, but for now let us focus on our simple use case!

After formatting the NameNode, let’s start the daemons for the NameNode and DataNode.

:/usr/local/hadoop$ sbin/start-dfs.sh

If everything goes well, you should be able to verify the running daemons by visiting the web interface on port 50070 of the Ubuntu installation. In my case, the output was as below:

> http://<my_server_IP>:50070/ 

[Image: NameNode web interface overview page]
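Alternatively, the daemons can be verified from the command line itself. A quick check (assuming the JDK’s jps tool is on your PATH) is to list the running Java processes; you should see entries for NameNode, DataNode, and SecondaryNameNode.

:/usr/local/hadoop$ jps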

If by any chance you are unable to get this running, please check the log files located in the logs directory of the Hadoop installation (e.g. /usr/local/hadoop/logs).
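For example, the NameNode log can be inspected as follows (a sketch; the exact file name depends on your user name and host name, shown here as placeholders):

:/usr/local/hadoop$ tail -n 50 logs/hadoop-&lt;user&gt;-namenode-&lt;hostname&gt;.log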

The web interface can be used to obtain a lot of useful information on our setup, including the configured DataNodes, configuration details, etc. Another good feature of the web interface is the ability to browse our distributed file system. Click on Utilities and select Browse the file system. Initially, the view should be something like below (i.e. empty).

[Image: empty HDFS file browser in the web interface]
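Much of the same information is also available from the command line. As a quick example, the standard dfsadmin tool prints a report of the configured capacity and the live DataNodes:

:/usr/local/hadoop$ bin/hdfs dfsadmin -report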

Now let us create some folders via the command line interface.

:/usr/local/hadoop$ bin/hdfs dfs -mkdir /helloworld

This will create a folder named helloworld on our distributed file system. Let us verify this by refreshing our web-based file browser.

[Image: HDFS file browser showing the helloworld directory]

Similarly, there are many commands available to work with the file system, such as listing directories, removing directories, copying files, etc. Please look at the help pages of hdfs for more information.
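For instance, here are a few illustrative commands (the file used with -put is just an example taken from the Hadoop configuration directory):

:/usr/local/hadoop$ bin/hdfs dfs -ls /
:/usr/local/hadoop$ bin/hdfs dfs -put etc/hadoop/core-site.xml /helloworld
:/usr/local/hadoop$ bin/hdfs dfs -cat /helloworld/core-site.xml
:/usr/local/hadoop$ bin/hdfs dfs -rm /helloworld/core-site.xml
:/usr/local/hadoop$ bin/hdfs dfs -help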

We will conclude this post by removing the helloworld directory from the file system. In a future post we will continue our quest of running MapReduce jobs on a local machine.

:/usr/local/hadoop$ bin/hdfs dfs -rmdir /helloworld
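Note that -rmdir only removes empty directories; to delete a directory together with its contents, the recursive variant can be used instead:

:/usr/local/hadoop$ bin/hdfs dfs -rm -r /helloworld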

