An introduction to Apache Hadoop

Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project, built and used by a global community of contributors and users, and it is licensed under the Apache License 2.0.


Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally developed to support distribution for the Nutch search engine project. Doug, who was working at Yahoo! at the time and is now Chief Architect of Cloudera, named the project after his son's toy elephant. Cutting's son was two years old at the time and just beginning to talk, and he called his stuffed yellow elephant "Hadoop".


The Apache Hadoop framework is composed of the following modules:


  Hadoop Common:


It contains the libraries and utilities needed by the other Hadoop modules.


  Hadoop Distributed File System (HDFS):


A distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.


  Hadoop YARN:


A resource-management platform responsible for managing compute resources in clusters and using them for scheduling users' applications.


  Hadoop MapReduce:


A programming model for large-scale data processing, illustrated by the sketch below.
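
To make the MapReduce model concrete, here is a minimal sketch of the classic word-count job, adapted from the official Hadoop MapReduce tutorial. The job reads text files from the input path given as the first argument and writes per-word counts to the output path given as the second; the class names are the tutorial's, not anything mandated by the framework.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: emit (word, 1) for every word in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sum the counts the framework has grouped under each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper emits (word, 1) pairs, the framework shuffles and groups them by key, and the reducer sums each group; the same reducer doubles as a combiner to cut shuffle traffic.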


All the modules in Hadoop are designed with a fundamental assumption: hardware failures are common and should be automatically handled in software by the framework. Hadoop's MapReduce and HDFS components were derived from Google's MapReduce and Google File System papers.


Beyond these core modules, the entire Hadoop "platform" now consists of a number of related projects as well, such as Apache Pig, Apache Hive, Apache HBase, and others.


For end-users, though MapReduce Java code is common, any programming language can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's program. Apache Pig and Apache Hive, among other related projects, expose higher-level user interfaces like Pig Latin and a SQL variant, respectively. The Hadoop framework itself is mostly written in the Java programming language, with some native code in C, and its command line utilities are written as shell scripts.
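
As a quick illustration of Hadoop Streaming, a job can be launched with ordinary Unix executables as the mapper and reducer, along the lines of the example in the Hadoop Streaming documentation. The streaming jar's location and the HDFS input/output paths below are placeholders that vary by installation and version:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

Here /bin/cat passes each input line through unchanged as the "map" step, and /usr/bin/wc counts the lines, words, and bytes it receives as the "reduce" step, with no Java code written at all.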
