Overview of HDFS | Hadoop Training in Hyderabad

 
HDFS is an Apache Software Foundation project and a subproject of the Apache Hadoop project. Hadoop is ideal for storing large amounts of data, on the order of terabytes and petabytes, and uses HDFS as its storage system. HDFS connects the nodes contained within clusters, over which data files are distributed. You can then access and store the data files as one seamless file system. Access to data files is handled in a streaming manner, meaning that applications or commands are executed directly using the MapReduce processing model.
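To make the idea of files distributed over nodes yet read back as one seamless file system more concrete, here is a minimal Python sketch. It is a toy model, not the HDFS API: the `ToyCluster` class, the 4-byte block size, the node names, and the round-robin placement are all assumptions chosen for illustration.

```python
class ToyCluster:
    """Toy model of files split into blocks and spread over nodes (not real HDFS)."""

    def __init__(self, node_names, block_size=4):
        # block_size is tiny here so the example is easy to follow (assumption).
        self.block_size = block_size
        # node name -> {(path, block number): bytes stored on that node}
        self.nodes = {name: {} for name in node_names}
        # path -> list of node names, one entry per block, in file order
        self.index = {}

    def write(self, path, data):
        """Split data into fixed-size blocks and place them round-robin."""
        names = list(self.nodes)
        locations = []
        for offset in range(0, len(data), self.block_size):
            block_no = offset // self.block_size
            node = names[block_no % len(names)]  # hypothetical placement policy
            self.nodes[node][(path, block_no)] = data[offset:offset + self.block_size]
            locations.append(node)
        self.index[path] = locations

    def read(self, path):
        """Reassemble the blocks in order, hiding where each one is stored."""
        return b"".join(
            self.nodes[node][(path, block_no)]
            for block_no, node in enumerate(self.index[path])
        )


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.write("/logs/a.txt", b"hello hdfs world")
block_nodes = cluster.index["/logs/a.txt"]
contents = cluster.read("/logs/a.txt")
```

The caller only ever works with a path and the file's bytes. Which node holds which block is bookkeeping kept inside the cluster object.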

HDFS provides high-throughput access to large data sets. This overview describes the primary features of HDFS at a high level.


HDFS has many similarities with other distributed file systems, but it differs in several respects. One noticeable difference is HDFS's write-once-read-many model, which relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access.

Another distinguishing attribute of HDFS is its viewpoint that it is usually better to locate processing logic near the data rather than to move the data to the application space.

HDFS restricts data writing to one writer at a time. Bytes are always appended to the end of a stream, and byte streams are stored in the order written.
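The single-writer, append-only rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how HDFS implements it: the `AppendOnlyFile` class, its method names, and the client names are hypothetical.

```python
class AppendOnlyFile:
    """Toy file that allows one writer at a time and only appends (not real HDFS)."""

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # name of the client currently allowed to write

    def open_for_write(self, client):
        # Reject a second writer while another client still holds the file.
        if self._writer is not None:
            raise PermissionError(f"{self._writer} is already writing")
        self._writer = client

    def append(self, client, payload):
        # There is no seek or overwrite: bytes can only go at the end.
        if client != self._writer:
            raise PermissionError(f"{client} is not the current writer")
        self._data.extend(payload)

    def close(self, client):
        if client == self._writer:
            self._writer = None

    def read(self):
        # Readers see the bytes in exactly the order they were written.
        return bytes(self._data)


f = AppendOnlyFile()
f.open_for_write("client-a")
f.append("client-a", b"abc")

second_writer_rejected = False
try:
    f.open_for_write("client-b")  # a second writer is refused
except PermissionError:
    second_writer_rejected = True

f.append("client-a", b"def")
f.close("client-a")
stored = f.read()
```

Because there is only ever one writer and no in-place updates, readers never need to coordinate with concurrent modifications, which is what keeps the model simple.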

HDFS has many goals. Here are some of the most notable:

  Fault tolerance by detecting faults and applying quick, automatic recovery

  Data access via MapReduce streaming

  Simple and robust coherency model

  Processing logic close to the data, rather than the data close to the processing logic

  Portability across commodity hardware and operating systems

  Scalability to store and process large amounts of data

  Economy by distributing data and processing across clusters of commodity personal computers

  Efficiency by distributing data and the logic to process it in parallel on the nodes where the data is located

  Reliability by maintaining many copies of data and redeploying processing logic in the event of failures
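The fault-tolerance and reliability goals above (keep several copies of each block, and recover automatically when a node fails) can be illustrated with a small Python sketch. It is only a simulation under assumptions: the function names, the node and block names, the copy count of 3, and the round-robin placement are hypothetical and do not reflect the actual HDFS placement policy.

```python
def replicate(block_ids, nodes, copies):
    """Place each block on `copies` distinct nodes (round-robin, hypothetical policy)."""
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(copies)}
    return placement


def handle_failure(placement, nodes, failed, copies):
    """Drop the failed node and copy under-replicated blocks to surviving nodes."""
    live = [n for n in nodes if n != failed]
    for block, holders in placement.items():
        holders.discard(failed)
        if not holders:
            # Every copy was on the failed node: nothing left to copy from.
            raise RuntimeError(f"block {block} is lost")
        for candidate in live:
            if len(holders) >= copies:
                break
            if candidate not in holders:
                holders.add(candidate)  # stands in for copying the block's bytes
    return placement


nodes = ["n1", "n2", "n3", "n4"]
placement = replicate(["b0", "b1"], nodes, copies=3)
before = {block: sorted(holders) for block, holders in placement.items()}

handle_failure(placement, nodes, failed="n2", copies=3)
after = {block: sorted(holders) for block, holders in placement.items()}
```

In this run each block loses the copy held on `n2` and gains a new one on a surviving node, so the number of copies returns to three without any action from the application.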

HDFS provides interfaces that let applications move themselves closer to where the data is located.
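As a rough illustration of moving the application to the data, the Python sketch below assigns each task to a node that already stores the block it needs, and falls back to another node only when no local slot is free. It is not the HDFS or MapReduce scheduler: the `schedule` function, the block locations, and the slot counts are all made up for the example.

```python
def schedule(tasks, block_locations, free_slots):
    """Assign each task (one per block) to a node, preferring nodes that hold the block.

    Returns {block: (node, is_local)}. Greedy and simplified on purpose.
    """
    slots = dict(free_slots)  # copy so the caller's dict is left untouched
    assignments = {}
    for block in tasks:
        # First choice: a node that already stores this block and has a free slot.
        local = [n for n in block_locations[block] if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the data must travel to it.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignments[block] = (node, is_local)
    return assignments


block_locations = {"b0": ["n1"], "b1": ["n2"], "b2": ["n3"]}  # hypothetical layout
free_slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}             # n3 is busy
assignments = schedule(["b0", "b1", "b2"], block_locations, free_slots)
```

Here the tasks for `b0` and `b1` run where their blocks live, while the task for `b2` has to run on `n4` and read its block over the network because `n3` has no free slot.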