Hadoop: The Baby Elephant for Big Data
What is Hadoop?
The Apache Software Foundation (ASF), a non-profit organization and global community of software developers, develops “Hadoop”, an open source framework that allows distributed processing of large datasets across clusters of computers using a simple programming model.
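The text above does not name the programming model; the one Hadoop is best known for is MapReduce. As a rough illustration only, the sketch below imitates the three MapReduce steps (map, shuffle, reduce) for a word count in plain Python on a single machine. It does not use any Hadoop API, and the function names map_phase, shuffle, reduce_phase and word_count are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce steps would run on many machines at once, each working on its own slice of the data; here everything runs in one process so that the idea stays visible.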
Hadoop: The True Story!
In the 1990s, search engines were created to help people find relevant information on the web. Open source web search engines were developed to process these tasks simultaneously. Search engines use programs called “web crawlers” to visit pages and copy their contents. The pages visited are then downloaded and indexed by the search engine.
Doug Cutting and Mike Cafarella developed the “Nutch” search engine. Doug's son reportedly used the word “Nutch” for a meal. In 2006, when Doug joined Yahoo, the distributed storage and processing part of “Nutch” was split off from the web crawler into a separate project. Doug named it “Hadoop”, after his son's toy baby elephant. Around the same period, Google had started using a distributed environment for storing and processing data, to return more relevant results with faster fetching.
Assumptions behind the Development:
The design of the Hadoop framework is based on some interesting assumptions:
The first assumption behind the design of Hadoop is that “hardware is made to fail”: failures are expected, so the framework has to keep working through them. The second concerns latency, the time a system takes to respond to a request. Hadoop assumes that data will be processed in batches, so its design favors high overall throughput over a fast response to any single request. It is also assumed that HDFS (Hadoop Distributed File System) will handle very large files, typically gigabytes to terabytes in size. Portability is another feature that was considered while designing the framework.
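To make the “hardware is made to fail” assumption concrete, here is a small sketch of how splitting a large file into blocks and keeping several copies of each block lets the data survive a failed machine. The 128 MB block size and the three copies per block are assumptions taken from common HDFS defaults, not from the text above. The round-robin placement is a deliberate simplification and not the placement policy HDFS really uses, and all function and node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (a common HDFS default)
REPLICATION = 3      # assumed number of copies per block (a common HDFS default)


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return how many blocks a file of the given size occupies."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified for illustration; real HDFS placement also considers racks.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """A file stays readable if every block has at least one live copy."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(split_into_blocks(1024), nodes)
    print(readable_after_failure(placement, {"node2"}))
    print(readable_after_failure(placement, {"node1", "node2", "node3"}))
```

With three copies of every block, losing one or even two machines still leaves a live copy of each block. The file becomes unreadable only when every node holding some block fails at the same time.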
Hadoop is a scalable, cost-effective, flexible and reliable framework for managing and analysing big data. Organizations are switching over to Hadoop for data management, and the opportunities are endless. All that is needed is a passion for data handling and analytics.
To learn more interesting facts about Hadoop, visit http://skillville.in/course-detail%20-%20data-engg.html
References:
https://opensource.com/life/14/8/intro-apache-hadoop-big-data
Hadoop-The beginners guide