HDFS is an Apache Software Foundation project and a subproject of the Apache Hadoop project. Hadoop, which is ideal for storing large amounts of data such as terabytes and petabytes, uses HDFS as its storage system. HDFS lets you connect nodes contained within clusters over which data files are distributed. You can then access and store the data files as one seamless file system. Access to data files is handled in a streaming manner, meaning that applications or commands are executed directly using the MapReduce processing model.
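For example, a client application can read such a distributed file through Hadoop's Java FileSystem API without ever dealing with individual blocks or DataNodes. The following is a minimal sketch; the NameNode address hdfs://namenode:9000 and the file /data/sample.txt are hypothetical placeholders.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (hypothetical address).
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        // The FileSystem API hides how blocks are spread across DataNodes;
        // the cluster appears as one seamless file system.
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(fs.open(new Path("/data/sample.txt")),
                                       StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```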
HDFS provides high-throughput access to large data sets. This section gives a high-level view of the primary features of HDFS.
HDFS has many similarities with other distributed file systems, but it is different in several respects. One noticeable difference is HDFS's write-once-read-many model, which relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access.
Another attribute of HDFS is the viewpoint that it is usually better to locate processing logic near the data rather than moving the data to the application space.
HDFS restricts data writing to one writer at a time. Bytes are always appended to the end of a stream, and byte streams are guaranteed to be stored in the order written.
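A rough sketch of this single-writer, append-only behavior using the same Java API follows; the NameNode address and file path are again hypothetical. create() obtains the file for a single writer, and each write() lands at the end of the stream in the order issued.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteOnce {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical NameNode

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/data/events.log"); // hypothetical path

            // HDFS's lease mechanism allows only one writer per file at a
            // time; a second concurrent writer on the same path is refused.
            try (FSDataOutputStream out = fs.create(path)) {
                // Bytes go to the end of the stream, in the order written.
                out.write("first record\n".getBytes(StandardCharsets.UTF_8));
                out.write("second record\n".getBytes(StandardCharsets.UTF_8));
            }

            // Once closed, the file is typically read many times and,
            // apart from appends, not modified in place.
        }
    }
}
```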
HDFS has many goals. Here are some of the most notable:
  Fault tolerance by detecting faults and applying quick, automatic recovery
  Data access via MapReduce streaming (a minimal job of this shape is sketched after this list)
  Simple and robust coherency model
  Processing logic close to the data, rather than the data close to the processing logic
  Portability across commodity hardware and operating systems
  Scalability to store and process large amounts of data
  Economy by distributing data and processing across clusters of commodity personal computers
  Efficiency by distributing data and the logic to process it in parallel on nodes where the data is located
  Reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures
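As a concrete illustration of the MapReduce streaming goal above, the classic word-count shape of a MapReduce job over HDFS data is sketched below. Mappers run on the nodes holding the input blocks and stream through their local slice; input and output paths are supplied on the command line, and the class names are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The mapper streams through its local block of the input,
    // emitting (word, 1) for every token it sees.
    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The reducer sums the per-word counts emitted by the mappers.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```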
HDFS provides interfaces for applications to move themselves closer to where the data is located.
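One such interface is FileSystem.getFileBlockLocations(), which reports which hosts hold each block of a file; a scheduler can use this information to place computation on, or near, those nodes. A minimal sketch, again with a hypothetical NameNode address and file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical NameNode

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/data/sample.txt"); // hypothetical file
            FileStatus status = fs.getFileStatus(path);

            // Ask the NameNode which DataNodes hold each block of the file.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset %d, length %d, hosts: %s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
            }
        }
    }
}
```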
 
