How to create a Hadoop cluster with automation using Ansible

Anirudh Bambhania
4 min read · Jan 13, 2021


In this article we will create an Ansible playbook that installs and configures Hadoop and starts the cluster services.

Hadoop :- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Ansible :- It is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Ansible Inventory: First we create an Ansible inventory file containing the IPs of the target nodes on which we will install and configure the Hadoop cluster. The group [hadoop-name] holds the target node that will act as the namenode (master node), [hadoop-data] holds the node that will act as the datanode (slave node), and [hadoop:children] is a parent group that contains both groups.
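
A minimal sketch of such an inventory; the IP addresses, user and password below are placeholders, not values from the original article:

```ini
# inventory.txt -- placeholder IPs and credentials, replace with your own
[hadoop-name]
192.168.1.10  ansible_user=root  ansible_ssh_pass=yourpassword

[hadoop-data]
192.168.1.20  ansible_user=root  ansible_ssh_pass=yourpassword

[hadoop:children]
hadoop-name
hadoop-data
```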

Ansible Playbook:

  • First we need the software required to install Hadoop. For this we need the Java JDK and the Hadoop package, which we copy to the target nodes with the copy module and install with the command module. We use the hadoop group, so these tasks run on both target nodes (a sketch of the complete playbook follows this list).
  • Next we use the [hadoop-name] group, which is the namenode (master node). After installing Hadoop we configure its configuration files (typically core-site.xml and hdfs-site.xml). On the namenode we also need to create a directory in which all the metadata will be stored. After the configuration we format that directory, only once and only on the namenode, and finally start the Hadoop namenode service.
  • Similarly, we use the [hadoop-data] group, which is the datanode (slave node), and configure the Hadoop files there after installing. On the datanode we also create a directory whose storage will be shared with the namenode. This directory does not need to be formatted; we simply start the Hadoop datanode service.
  • Finally, the last thing we need to do is create a firewall rule that allows traffic on port 9001 and make it permanent, so the datanodes can reach the namenode on that port.
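
Below is a minimal sketch of what such a playbook could look like. The package names, Hadoop version, configuration file paths, the /nn and /dn directories and the property values are assumptions for illustration (Hadoop 1.x style commands are assumed); the structure follows the steps described above.

```yaml
# hadoop.yml -- a sketch only; package names, paths and IPs are placeholders
- name: Install the JDK and Hadoop on all nodes
  hosts: hadoop
  tasks:
    - name: Copy the JDK package to the target node
      copy:
        src: jdk-8u171-linux-x64.rpm            # assumed package name
        dest: /root/jdk-8u171-linux-x64.rpm
    - name: Copy the Hadoop package to the target node
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm          # assumed package name
        dest: /root/hadoop-1.2.1-1.x86_64.rpm
    - name: Install the JDK
      command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
    - name: Install Hadoop (forced past the dependency check on newer JDKs)
      command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force

- name: Configure and start the namenode
  hosts: hadoop-name
  tasks:
    - name: Create the metadata directory
      file:
        path: /nn                               # assumed directory
        state: directory
    - name: Point hdfs-site.xml at the metadata directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property><name>dfs.name.dir</name><value>/nn</value></property>
          </configuration>
    - name: Set the HDFS address in core-site.xml
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property><name>fs.default.name</name><value>hdfs://0.0.0.0:9001</value></property>
          </configuration>
    - name: Format the metadata directory (only once, only on the namenode)
      shell: echo Y | hadoop namenode -format   # answers the re-format prompt
      args:
        creates: /nn/current                    # skip if already formatted
    - name: Start the namenode service
      command: hadoop-daemon.sh start namenode

- name: Configure and start the datanode
  hosts: hadoop-data
  tasks:
    - name: Create the storage directory shared with the namenode
      file:
        path: /dn                               # assumed directory
        state: directory
    - name: Point hdfs-site.xml at the storage directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property><name>dfs.data.dir</name><value>/dn</value></property>
          </configuration>
    - name: Point core-site.xml at the namenode (placeholder namenode IP)
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property><name>fs.default.name</name><value>hdfs://192.168.1.10:9001</value></property>
          </configuration>
    - name: Start the datanode service (no formatting needed here)
      command: hadoop-daemon.sh start datanode

- name: Allow Hadoop traffic through the firewall
  hosts: hadoop
  tasks:
    - name: Permanently open port 9001
      firewalld:
        port: 9001/tcp
        permanent: yes
        immediate: yes
        state: enabled
```
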
Now run the playbook from the controller node.
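
Assuming the inventory and playbook file names used in the sketches above, the run looks like this:

```
ansible-playbook -i inventory.txt hadoop.yml
```
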
  • As we can see, the playbook runs successfully and both software packages are copied to the target nodes.
  • The Hadoop files are also configured on both target nodes.
  • The Hadoop datanode service is started.
  • The Hadoop namenode service is started.
  • Finally, the cluster is set up and 1 datanode is available, which we can confirm with the commands below.
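
One way to verify this on the namenode, assuming the Hadoop 1.x command line tools installed above:

```
jps                       # lists the running Hadoop Java processes on this node
hadoop dfsadmin -report   # prints the cluster report, including the live datanodes
```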


Thank You
