How to create a Hadoop cluster with automation using Ansible

Anirudh Bambhania
4 min read · Jan 13, 2021


In this article we will create an Ansible playbook that installs and configures Hadoop and starts the cluster services.

Hadoop :- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Ansible :- It is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Ansible Inventory: First we create an Ansible inventory file containing the IPs of the target nodes on which we will install and configure the Hadoop cluster. The group [hadoop-name] holds the target node that will act as the namenode (master node), [hadoop-data] holds the node that will act as the datanode (slave node), and [hadoop:children] is a parent group that contains both groups.
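
A minimal sketch of such an inventory; the IP addresses, user and password below are placeholders, not values from the original article:

```ini
# inventory.txt -- placeholder IPs and credentials, replace with your own
[hadoop-name]
192.168.1.10  ansible_user=root  ansible_ssh_pass=yourpassword

[hadoop-data]
192.168.1.20  ansible_user=root  ansible_ssh_pass=yourpassword

[hadoop:children]
hadoop-name
hadoop-data
```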

Ansible Playbook:

  • First we need the software required to install Hadoop. For this we need the Java JDK and the Hadoop package, which we copy to the target nodes with the copy module and install with the command module. We use the hadoop group, so these tasks run on both target nodes (a sketch of the complete playbook follows this list).
  • Next we use the [hadoop-name] group, which is the namenode (master node). After installing Hadoop we configure its configuration files (typically core-site.xml and hdfs-site.xml). On the namenode we also need to create a directory in which all the metadata will be stored. After the configuration we format that directory, only once and only on the namenode, and finally start the Hadoop namenode service.
  • Similarly, we use the [hadoop-data] group, which is the datanode (slave node), and configure the Hadoop files there after installing. On the datanode we also create a directory whose storage will be shared with the namenode. This directory does not need to be formatted; we simply start the Hadoop datanode service.
  • Finally, the last thing we need to do is create a firewall rule that allows traffic on port 9001 and make it permanent, so the datanodes can reach the namenode on that port.
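
Below is a minimal sketch of what such a playbook could look like. The package names, Hadoop version, configuration file paths, the /nn and /dn directories and the property values are assumptions for illustration (Hadoop 1.x style commands are assumed); the structure follows the steps described above.

```yaml
# hadoop.yml -- a sketch only; package names, paths and IPs are placeholders
- name: Install the JDK and Hadoop on all nodes
  hosts: hadoop
  tasks:
    - name: Copy the JDK package to the target node
      copy:
        src: jdk-8u171-linux-x64.rpm            # assumed package name
        dest: /root/jdk-8u171-linux-x64.rpm
    - name: Copy the Hadoop package to the target node
      copy:
        src: hadoop-1.2.1-1.x86_64.rpm          # assumed package name
        dest: /root/hadoop-1.2.1-1.x86_64.rpm
    - name: Install the JDK
      command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
    - name: Install Hadoop (forced past the dependency check on newer JDKs)
      command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force

- name: Configure and start the namenode
  hosts: hadoop-name
  tasks:
    - name: Create the metadata directory
      file:
        path: /nn                               # assumed directory
        state: directory
    - name: Point hdfs-site.xml at the metadata directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property><name>dfs.name.dir</name><value>/nn</value></property>
          </configuration>
    - name: Set the HDFS address in core-site.xml
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property><name>fs.default.name</name><value>hdfs://0.0.0.0:9001</value></property>
          </configuration>
    - name: Format the metadata directory (only once, only on the namenode)
      shell: echo Y | hadoop namenode -format   # answers the re-format prompt
      args:
        creates: /nn/current                    # skip if already formatted
    - name: Start the namenode service
      command: hadoop-daemon.sh start namenode

- name: Configure and start the datanode
  hosts: hadoop-data
  tasks:
    - name: Create the storage directory shared with the namenode
      file:
        path: /dn                               # assumed directory
        state: directory
    - name: Point hdfs-site.xml at the storage directory
      copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <configuration>
            <property><name>dfs.data.dir</name><value>/dn</value></property>
          </configuration>
    - name: Point core-site.xml at the namenode (placeholder namenode IP)
      copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <configuration>
            <property><name>fs.default.name</name><value>hdfs://192.168.1.10:9001</value></property>
          </configuration>
    - name: Start the datanode service (no formatting needed here)
      command: hadoop-daemon.sh start datanode

- name: Allow Hadoop traffic through the firewall
  hosts: hadoop
  tasks:
    - name: Permanently open port 9001
      firewalld:
        port: 9001/tcp
        permanent: yes
        immediate: yes
        state: enabled
```
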
Now run the playbook from the controller node.
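
Assuming the inventory and playbook file names used in the sketches above, the run looks like this:

```
ansible-playbook -i inventory.txt hadoop.yml
```
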
  • As we can see, the playbook runs successfully and both software packages are copied to the target nodes.
  • The Hadoop files are also configured on both target nodes.
  • The Hadoop datanode service is started.
  • The Hadoop namenode service is started.
  • Finally, the cluster is set up and 1 datanode is available, which we can confirm with the commands below.
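
One way to verify this on the namenode, assuming the Hadoop 1.x command line tools installed above:

```
jps                       # lists the running Hadoop Java processes on this node
hadoop dfsadmin -report   # prints the cluster report, including the live datanodes
```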


Thank You
