LVM and its integration with Hadoop

Anirudh Bambhania
5 min read · Nov 12, 2020

What is LVM? Logical Volume Management (LVM) is an advanced approach to partitioning a hard disk, sometimes described as adding elasticity to storage. It is a system for managing logical volumes, or filesystems, that is far more flexible than the traditional method of partitioning a disk into one or more segments and formatting each partition with a filesystem.

LVM is used for the following purposes:

  • Creating single logical volumes from multiple physical volumes or entire hard disks, allowing for dynamic volume resizing.
  • Managing large hard disk farms by allowing disks to be added and replaced without downtime or service disruption.
  • On small systems (like a desktop), instead of having to estimate at installation time how big a partition might need to be, LVM allows filesystems to be easily resized as needed.

Here’s what the LVM architecture looks like.

Basically, LVM has three main components:

1. PV (Physical Volume)

2. VG (Volume Group)

3. LV (Logical Volume)

Let’s understand each part in detail

  1. PV (Physical Volume) :- a physical volume is a storage device, or a partition of one, that has been initialized for use by LVM. For a PV we can use an existing hard disk partition or a whole new hard disk.
  2. VG (Volume Group) :- a volume group is a pool of storage made up entirely of PVs; it behaves like one large virtual hard disk. A VG can contain any number of PVs.
  3. LV (Logical Volume) :- a logical volume is a partition carved out of the storage in a VG.
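Each of these three layers has its own read-only report command, which is handy for checking state before and after every step below. A minimal sketch (these commands ship with the lvm2 package and generally need root):

```shell
pvs   # physical volumes: device name, owning VG, size, free space
vgs   # volume groups: PV/LV counts, total and free size
lvs   # logical volumes: name, parent VG, size
```

The longer-form `pvdisplay`, `vgdisplay`, and `lvdisplay` commands print the same information with more detail.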

Let’s see how to create an LV with the help of a PV and a VG. First we have to look for disks that can become PVs.

fdisk -l

This command shows all the volumes attached to our system. Now, to create the PVs, type

pvcreate /dev/sdb

pvcreate /dev/sdc

this will create two new PVs. Here /dev/sdb and /dev/sdc are the device names of the hard disks. Now we will create a new VG from these two PVs. To create a VG, type

vgcreate myvg1 /dev/sdb /dev/sdc

this will create a new VG named myvg1 whose size equals that of the two PVs combined. Technically, we have created a new virtual hard disk by combining the PVs. Now that we have successfully created a VG, let’s create a new LV.
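The result can be verified with vgdisplay, which prints the VG’s total size, free space, and PV count (assuming the name myvg1 from above):

```shell
# Show details of the new volume group: total size, free extents, PV count.
vgdisplay myvg1
```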

lvcreate --name mylv1 --size 15G myvg1

this will create a new LV of size 15 GB. This LV is a partition carved out of the VG myvg1. Now we have to format the partition and then mount it.

mkfs.ext4 /dev/myvg1/mylv1

mount /dev/myvg1/mylv1 /abc

These commands format the partition as ext4 and mount it on a folder named /abc; you can mount it on any other folder or directory. We can use the “df -hT” command to verify the mount. That completes the basics, but the real use case of LVM is increasing or decreasing the size of a partition.
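One caveat: a mount made with the mount command does not survive a reboot. To make it permanent, a line like the following can be added to /etc/fstab (a sketch, assuming the LV path and /abc mount point used above):

```
/dev/myvg1/mylv1  /abc  ext4  defaults  0  0
```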

To extend the partition size :-

To extend the partition size of the LV we use the remaining free space in the VG. This increases the partition size of the LV on the fly, meaning without disturbing the other processes that are using the partition. To do this, type

lvextend --size +5G /dev/myvg1/mylv1

this will extend mylv1 by 5 GB on the fly. Since new storage has been added to the partition, the filesystem must be grown to cover it. We can’t reformat the whole partition (that would destroy the data); we only need to extend the filesystem over the extra space, so type

resize2fs /dev/myvg1/mylv1
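As a side note, lvextend can do both steps at once: the -r / --resizefs flag runs the filesystem resize for you after extending the LV. A sketch, using the same LV as above:

```shell
# Extend the LV by 5 GB and grow the ext4 filesystem in the same command.
lvextend --resizefs --size +5G /dev/myvg1/mylv1
```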

To reduce the partition size :-

To reduce the partition size we first need to unmount the partition, because there might be users accessing it, so we can’t shrink the partition on the fly. We follow these steps to reduce the storage:

  1. first unmount the partition
  2. then run a filesystem check to make sure the data on the volume is consistent
  3. shrink the filesystem down to the target size
  4. now reduce the LV to the same size, e.g. 10 GB here; the freed space is returned to the VG
  5. mount it again

umount /abc

e2fsck -f /dev/myvg1/mylv1

resize2fs /dev/myvg1/mylv1 10G

lvreduce --size 10G /dev/myvg1/mylv1

mount /dev/myvg1/mylv1 /abc
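The shrink sequence is error-prone if the filesystem is resized to the wrong value, so lvreduce’s -r / --resizefs flag is worth knowing: it runs the filesystem check and shrink for you before reducing the LV, in the right order (the partition must still be unmounted first). A sketch with the same names as above:

```shell
# Unmount, then let lvreduce drive e2fsck + resize2fs + the LV reduction.
umount /abc
lvreduce --resizefs --size 10G /dev/myvg1/mylv1
mount /dev/myvg1/mylv1 /abc
```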

Integrating LVM with Hadoop :-

After learning about LVM, it is a piece of cake to integrate it with Hadoop. If you already know about Hadoop, that’s great, but if you don’t, you can refer to my previous article https://medium.com/@bambhaniaanirudh/what-is-big-data-f058f7b3a7a6

Now go to the Hadoop directory where all the configuration files live. Use these commands to go to the Hadoop directory and list the files:

cd /etc/hadoop

ls

In Hadoop, storage is provided by the DataNodes. Sometimes a DataNode needs extra storage because all of its existing storage has been used up. In real life we don’t remove the old hard disk and attach a new one: buying a new hard disk every time one fills up is costly, and since the servers are live, users might face errors while accessing the data. Instead, we use LVM. To achieve this we just need to point the DataNode’s storage directory at a directory backed by an LVM volume, in this case the /abc directory we created.

Now we can increase or decrease the size of the DataNode’s storage using the concept of LVM.
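Concretely, this means editing hdfs-site.xml in the configuration directory above so that the DataNode’s storage directory points at the LVM-backed mount (a sketch: dfs.datanode.data.dir is the property name in Hadoop 2.x and later; older 1.x releases call it dfs.data.dir):

```xml
<!-- inside the <configuration> block of hdfs-site.xml -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/abc</value>
</property>
```

After restarting the DataNode, extending the LV as shown earlier grows the node’s reported capacity without any downtime.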


Thank you

Anirudh Bambhania
