HDFS is an Apache Software Foundation project and a subproject of the Apache Hadoop project. Hadoop is ideal for storing large amounts of data, such as terabytes and petabytes, and uses HDFS as its storage system. HDFS lets you connect nodes contained within clusters over which data files are distributed. You can then access and store the data files as one seamless file system. Access to data files is handled in a streaming manner, meaning that applications or commands are run directly using the MapReduce processing model.
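For example, a client application reads a file through the single org.apache.hadoop.fs.FileSystem API, regardless of how many nodes actually hold the file's blocks. The following sketch illustrates this; the NameNode address and the file path are hypothetical placeholders, not values from this article.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Points the client at the NameNode; hostname and port are
            // hypothetical examples.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            // The same call works whether the file's blocks live on one
            // node or hundreds: the client sees one seamless file system.
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"));
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }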
HDFS is fault tolerant and provides high-throughput access to large data sets. This section gives a high-level view of the primary features of HDFS.
HDFS has many similarities with other distributed file systems, but it is different in several respects. One noticeable difference is HDFS's write-once-read-many model, which relaxes concurrency control requirements, simplifies data coherency, and enables high-throughput access.
Another distinctive attribute of HDFS is the viewpoint that it is usually better to locate processing logic near the data rather than moving the data to the application space.
HDFS restricts data writing to one writer at a time. Bytes are always appended to the end of a stream, and byte streams are guaranteed to be stored in the order written.
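A minimal sketch of this append-only model using the Hadoop FileSystem API follows; the file path is a hypothetical example. create() starts a new byte stream, and append() can only add bytes at the end of an existing one; there is no call to seek backward and overwrite bytes in the middle of a file.

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAppendOnly {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path("/user/demo/log.txt"); // hypothetical path

            // The first writer creates the file; bytes are stored in the
            // order they are written.
            try (FSDataOutputStream out = fs.create(path, /* overwrite */ false)) {
                out.write("first record\n".getBytes(StandardCharsets.UTF_8));
            }

            // Later writes can only append to the end of the stream, and
            // HDFS rejects a second concurrent writer on the same file.
            try (FSDataOutputStream out = fs.append(path)) {
                out.write("second record\n".getBytes(StandardCharsets.UTF_8));
            }
            fs.close();
        }
    }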
HDFS has many goals. Here are some of the most notable:
Fault tolerance by detecting faults and applying quick, automatic recovery
Data access via MapReduce streaming
Simple and robust coherency model
Processing logic close to the data, rather than the data close to the processing logic
Portability across commodity hardware and operating systems
Scalability to store and process large amounts of data
Economy by distributing data and processing across clusters of commodity personal computers
Efficiency by distributing data and the logic to process it in parallel on the nodes where the data is located
Reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures
To achieve these goals, HDFS provides interfaces for applications to move themselves closer to where the data is located.
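For instance, an application can ask where a file's blocks physically live and schedule its processing on those hosts. The sketch below uses the standard getFileBlockLocations() call from the FileSystem API; the file path is a hypothetical example, and a real scheduler (such as the MapReduce framework) performs this placement for you.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/demo/input.txt"));

            // Each BlockLocation names the DataNodes holding replicas of
            // one block; processing can then be placed on those hosts.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset %d, length %d, hosts %s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }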