HDF5 (Hierarchical Data Format 5)
HDF5 is a file format and library for storing and managing large scientific datasets. It organizes data in a hierarchical group/dataset structure similar to a filesystem, with datasets of arbitrary dimensionality. HDF5 is widely used for satellite imagery, genomics, and physics simulations.
MIME Type
application/x-hdf5
Type
Binary
Compression
Lossless
Advantages
- + Handles datasets from kilobytes to exabytes
- + Hierarchical structure organizes complex data
- + Built-in compression and chunked storage for performance
- + Parallel I/O support for HPC clusters
Disadvantages
- − Complex API with a steep learning curve
- − Not suited for simple tabular data (use Parquet or CSV)
- − Risk of file corruption under concurrent writes unless SWMR mode or external locking is used
When to Use .hdf5
Use HDF5 for large scientific datasets, multi-dimensional arrays, and any data requiring hierarchical organization with efficient I/O.
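The hierarchical layout described above can be sketched with the h5py library. This is a minimal, illustrative example; the file name, group path, and dataset name are arbitrary choices, not part of any standard layout.

```python
# Minimal sketch: write and read back a hierarchical HDF5 file with h5py.
import numpy as np
import h5py

# Write: groups act like directories, datasets hold n-dimensional arrays.
with h5py.File("experiment.h5", "w") as f:
    grp = f.create_group("run_001")                    # group, like a directory
    dset = grp.create_dataset("temperature", data=np.arange(10.0))
    dset.attrs["units"] = "kelvin"                     # metadata attribute

# Read: datasets are addressed with filesystem-style paths.
with h5py.File("experiment.h5", "r") as f:
    dset = f["run_001/temperature"]
    print(dset.shape, dset.attrs["units"])
```

Attributes attach small metadata (units, timestamps, instrument settings) directly to the group or dataset they describe, keeping the data self-documenting.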
Technical Details
HDF5 files organize data in groups (analogous to directories) and datasets (multidimensional arrays), each of which can carry metadata attributes. The format supports chunked storage, lossless compression filters (gzip, LZF, SZIP), and parallel I/O for high-performance computing.
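Chunking and compression as described above can be combined in a single dataset. A sketch with h5py follows; the chunk shape (100, 100), gzip level, and dataset name are illustrative choices, and tuning chunk shape to the access pattern is what actually drives performance.

```python
# Sketch: chunked, gzip-compressed storage in HDF5 via h5py.
import numpy as np
import h5py

data = np.random.rand(1000, 1000)

with h5py.File("compressed.h5", "w") as f:
    f.create_dataset(
        "matrix",
        data=data,
        chunks=(100, 100),       # stored and compressed per 100x100 tile
        compression="gzip",      # lossless filter; "lzf" is a faster option
        compression_opts=4,      # gzip level 1-9 (speed vs. ratio trade-off)
    )

with h5py.File("compressed.h5", "r") as f:
    # Lossless round trip: decompressed data is bit-identical.
    assert np.array_equal(f["matrix"][...], data)
```

Because compression is applied per chunk, reading a small slice only decompresses the tiles it touches rather than the whole array.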
History
The HDF Group (originally at NCSA, University of Illinois) created HDF in the late 1980s. HDF5 was released in 1998 as a complete redesign, and it is now used by NASA, CERN, and the genomics community.