What is “BIG DATA”?
Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
How big is “BIG DATA”?
- The big data growth we’ve been witnessing is only natural. We constantly generate data. On Google alone, we submit 40,000 search queries per second. That amounts to 1.2 trillion searches yearly!
- Each minute, 300 new hours of video show up on YouTube. That’s why there’s more than 1 billion gigabytes (1 exabyte) of data on its servers!
- People share more than 500 terabytes of data on Facebook daily. Every minute, users send 31 million messages and view 2.7 million videos.
- The amount of data created each year is growing faster than ever before. By 2020, every human on the planet will be creating 1.7 megabytes of information… each second!
- In only a year, the accumulated world data will grow to 44 zettabytes (that’s 44 trillion gigabytes)! For comparison, today it’s about 4.4 zettabytes.
Types Of Big Data
- Structured :- Any data that can be stored, accessed, and processed in a fixed format is termed ‘structured’ data.
- Unstructured :- Any data with an unknown form or structure is classified as unstructured data. A typical example of unstructured data is a heterogeneous data source containing a combination of plain text files, images, videos, etc.
- Semi-structured :- Semi-structured data can contain both forms of data. It appears structured in form but is not actually defined by, for example, a table definition in a relational DBMS. A typical example of semi-structured data is data represented in an XML file, as illustrated in the sketch below.
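To make the distinction concrete, here is a minimal Python sketch (the records and the XML snippet are made up for illustration) showing structured rows with a fixed schema alongside semi-structured XML parsed with the standard library:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured data: every record follows the same fixed schema (id, name, age).
structured_csv = "id,name,age\n1,Asha,29\n2,Ravi,34\n"
for row in csv.DictReader(io.StringIO(structured_csv)):
    print(row["name"], row["age"])

# Semi-structured data: XML carries its own tags, but no table definition
# enforces which fields each record must contain.
semi_structured_xml = """
<employees>
    <employee><name>Asha</name><age>29</age></employee>
    <employee><name>Ravi</name><skill>Hadoop</skill></employee>
</employees>
"""
root = ET.fromstring(semi_structured_xml)
for emp in root.findall("employee"):
    # Fields may or may not be present, so we look them up defensively.
    name = emp.findtext("name")
    age = emp.findtext("age", default="unknown")
    print(name, age)
```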
Challenges Of Big Data
(i) Volume — The name Big Data itself relates to a size that is enormous. The size of the data plays a crucial role in determining the value that can be derived from it.
(ii) Variety — Variety refers to heterogeneous sources and the nature of the data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications.
(iii) Velocity — The term ‘velocity’ refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
(iv) Variability — This refers to the inconsistency the data can show at times, which hampers the ability to handle and manage the data effectively.
What is Hadoop? And why do many companies use it?
Hadoop, developed by The Apache Software Foundation, is a popular open-source platform for distributed processing of large datasets across clusters of computers. Each node in an Apache Hadoop cluster acts both as a storage device and as a computation platform. It is one of the most widely used platforms for developers building Big Data solutions. It scales easily from a single system to thousands of machines and runs on commodity hardware, which reduces costs for organizations.
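To make the "computation runs where the data lives" idea concrete, here is a minimal word-count sketch in the Hadoop Streaming style. In a real job the mapper and reducer would be two stdin/stdout scripts submitted with the streaming jar (the command in the docstring shows the usual pattern, though the jar path varies by installation); the sample input lines are made up, and the local `sorted()` call stands in for Hadoop's shuffle-and-sort step.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style.

With Hadoop Streaming the same logic would be split into mapper.py and
reducer.py and submitted roughly like this (jar path varies by install):
  hadoop jar hadoop-streaming*.jar -mapper mapper.py -reducer reducer.py \
      -input /data/in -output /data/out
Here both phases run locally on a tiny sample so the sketch is self-contained.
"""
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word.lower()}\t1"


def reducer(sorted_pairs):
    """Sum the counts for each word; input must already be sorted by key."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    sample = [
        "big data needs big clusters",
        "hadoop stores and processes big data",
    ]
    # Hadoop sorts mapper output by key before the reduce phase;
    # sorted() stands in for that shuffle-and-sort step here.
    for result in reducer(sorted(mapper(sample))):
        print(result)
```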
1. Range of data sources
The data collected from various sources, such as social media, clickstream data, or even email conversations, can be structured or unstructured. A lot of time would normally be needed to convert all of this data into a single format. Hadoop saves this time because it can derive value from data in any form. It also supports a variety of use cases such as data warehousing, fraud detection, marketing campaign analysis, etc.
2. Cost effective
With conventional methods, companies had to spend a considerable portion of their budget on storing large amounts of data. In some cases they even had to delete large sets of raw data to make space for new data, risking the loss of valuable information. Hadoop solves this problem by providing a cost-effective solution for data storage. This helps in the long run because the entire raw data generated by a company can be retained. If the company changes the direction of its processes in the future, it can easily go back to the raw data and take the necessary steps. This would not have been possible with the traditional approach, where the raw data would have been deleted because of rising storage costs.
3. Speed
Every organization wants a platform that gets work done quickly, and Hadoop enables this for data storage and processing. Data is stored on a distributed file system, and because the processing tools run on the same servers as the data, processing is carried out much faster. As a result, you can process terabytes of data within minutes using Hadoop.
4. Multiple copies
Hadoop automatically replicates the data stored in it, keeping multiple copies across nodes. This ensures that data is not lost if a failure occurs. Hadoop treats the data stored by the company as important and does not lose it unless the company deliberately discards it.
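Under the hood this is HDFS replication: files are split into blocks and each block is written to several DataNodes (the default replication factor is 3), so a single node failure does not lose data. The Python sketch below is a simplified simulation of that placement idea, not actual HDFS code; the node names, tiny block size, and round-robin placement are illustrative assumptions.

```python
import itertools

REPLICATION_FACTOR = 3          # HDFS default (dfs.replication)
BLOCK_SIZE = 16                 # bytes here, just to keep the demo tiny
NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster


def place_blocks(data: bytes):
    """Split data into blocks and assign each block to several nodes.

    Real HDFS placement is rack-aware; simple round-robin over the node
    list is used here only to show why one node failure loses nothing.
    """
    placement = {}
    node_cycle = itertools.cycle(range(len(NODES)))
    for offset in range(0, len(data), BLOCK_SIZE):
        start = next(node_cycle)
        replicas = [NODES[(start + i) % len(NODES)]
                    for i in range(REPLICATION_FACTOR)]
        placement[f"block_{offset // BLOCK_SIZE}"] = replicas
    return placement


if __name__ == "__main__":
    layout = place_blocks(b"an example file that is split into small blocks")
    for block, replicas in layout.items():
        print(block, "->", replicas)

    # Even if one node fails, every block still has surviving copies.
    failed = "node2"
    survivors = {b: [n for n in r if n != failed] for b, r in layout.items()}
    assert all(len(r) >= 2 for r in survivors.values())
    print(f"after losing {failed}, every block still has at least two copies")
```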
Advantages of Big Data
- Big data allows you to re-develop the products you are selling. Information on what others think about your products, such as unstructured text from social networking sites, helps you in product development.
- Big data allows you to test different variations of CAD (computer-aided design) images to determine how minor changes affect your process or product. This makes big data invaluable in the manufacturing process.
- Predictive analysis keeps you ahead of your competitors. Big data can facilitate this by, for example, scanning and analyzing social media feeds and newspaper reports. Big data also helps you run health checks on your customers, suppliers, and other stakeholders to reduce risks such as default.
- Big data allows you to diversify your revenue streams. Analyzing big data can reveal trend data that could help you come up with a completely new revenue stream.
— — — — — — — — — — — — — — — — — — — — — — — —
#bigdata #hadoop #bigdatamanagement #arthbylw
#vimaldaga #righteducation #educationredefine
#rightmentor #worldrecordholder #arth #linuxworld
#makingindiafutureready #righeudcation