In my last post, I described the simplest way of setting up Hadoop 2.5.2 on a fresh Ubuntu Trusty installation. In this post I am going to set up the Hadoop file system (and its DataNode) on my local installation.
First, let us format the NameNode, which initializes the metadata store of Hadoop's file system. The data itself is stored on DataNodes. Typically, multiple DataNodes are used to achieve RAID-like redundancy, but in this case we will configure only one node. (Note: RAID replicates data across multiple disks of the same server, whereas in Hadoop the DataNodes can be on multiple servers!)
:/usr/local/hadoop$ bin/hdfs namenode -format
Typically, a DataNode connects to the NameNode and responds to requests coming from it. Once a client has located the DataNodes through the NameNode, it can talk to them directly. In certain scenarios (e.g. data replication), a DataNode may also communicate directly with other DataNodes. More on this later, but for now let us focus on our simple use case!
After formatting the NameNode, let's start the NameNode and DataNode daemons.
:/usr/local/hadoop$ sbin/start-dfs.sh
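Before turning to the web interface, a quick command-line check can confirm that the daemons came up. The sketch below assumes the JDK's jps tool is on the PATH and that start-dfs.sh launches a NameNode, a DataNode and a SecondaryNameNode on a single-node setup. The helper name check_hdfs_daemons is hypothetical, not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): reads a jps-style listing
# ("<pid> <ClassName>" per line) on stdin and reports which of the
# expected HDFS daemons appear in it.
check_hdfs_daemons() {
  listing=$(cat)
  # Assumption: these three daemons are the ones start-dfs.sh launches
  # on a single-node installation.
  for daemon in NameNode DataNode SecondaryNameNode; do
    # -w matches whole words only, so "NameNode" does not match
    # inside "SecondaryNameNode".
    if printf '%s\n' "$listing" | grep -qw "$daemon"; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT running"
    fi
  done
}

# Usage, assuming jps is available:
#   jps | check_hdfs_daemons
```

If any daemon is reported as not running, the log files are the next place to look.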
If everything goes well, you should be able to verify the running daemons by visiting the web interface running on port 50070 of the Ubuntu installation. In my case, the output was as shown below:
> http://<my_server_IP>:50070/
If, by any chance, you are unable to get this running, please check the log files located in the logs directory of the Hadoop installation (e.g. /usr/local/hadoop/logs).
The web interface provides a lot of useful information about our setup, including the configured DataNodes, configuration details, etc. Another good feature of the web interface is the ability to browse our distributed file system. Click on Utilities and select Browse the file system. Initially, the view should be empty.
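One way to scan those logs is sketched below. It assumes the daemons write files ending in .log into the logs directory and that problems are logged at the ERROR or FATAL level. The helper name show_hadoop_errors is hypothetical, and the default path is just the example location mentioned above.

```shell
# Hypothetical helper (not part of Hadoop): print the most recent
# ERROR/FATAL lines found in the *.log files of a log directory.
show_hadoop_errors() {
  # Assumption: default to the example log location used in this post.
  log_dir="${1:-/usr/local/hadoop/logs}"
  if [ ! -d "$log_dir" ]; then
    echo "log directory not found: $log_dir" >&2
    return 1
  fi
  # -H prefixes each match with its file name; tail keeps the last 20 matches.
  grep -H -E 'ERROR|FATAL' "$log_dir"/*.log 2>/dev/null | tail -n 20
}

# Usage:
#   show_hadoop_errors                  # uses the default directory
#   show_hadoop_errors /path/to/logs    # or pass another directory
```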
Now let us create some folders via the command line interface.
:/usr/local/hadoop$ bin/hdfs dfs -mkdir /helloworld
This will create a folder named helloworld on our distributed file system. Let us verify this by refreshing our web-based file browser.
Similarly, there are many commands available for working with the file system, such as listing directories, removing directories, copying files, etc. Please look at the help pages of hdfs (bin/hdfs dfs -help) for more information.
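As a rough illustration of a few of those commands, the sketch below creates a directory, uploads a local file, lists the directory and then cleans up. The function name hdfs_demo, the HDFS_CMD variable and the /helloworld-demo directory are all hypothetical names chosen for this example. The default binary path assumes the /usr/local/hadoop installation used in this post.

```shell
# Assumption: the hdfs launcher lives under the installation used in this post.
HDFS_CMD="${HDFS_CMD:-/usr/local/hadoop/bin/hdfs}"

# Hypothetical helper: exercise a few common file system commands
# against a scratch directory, then remove that directory again.
hdfs_demo() {
  local_file="${1:-}"
  if [ -z "$local_file" ]; then
    echo "usage: hdfs_demo <local-file>" >&2
    return 2
  fi
  "$HDFS_CMD" dfs -mkdir -p /helloworld-demo &&          # create the directory
  "$HDFS_CMD" dfs -put "$local_file" /helloworld-demo/ && # copy a local file into HDFS
  "$HDFS_CMD" dfs -ls /helloworld-demo &&                # list the directory
  "$HDFS_CMD" dfs -rm -r /helloworld-demo                # remove it recursively
}

# Usage:
#   hdfs_demo /etc/hostname
```

Keeping the launcher path in a variable makes it easy to point the sketch at a different installation.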
We will conclude this post by removing the helloworld directory from the file system. In a future post we will continue our quest of running MapReduce jobs on a local machine.
:/usr/local/hadoop$ bin/hdfs dfs -rmdir /helloworld