Apache Arrow IPC (In-Memory Columnar)
Apache Arrow IPC is the serialization format of Apache Arrow, a language-agnostic columnar layout for in-memory data. Because the serialized bytes match the in-memory layout, processes and languages (Python, R, C++, Java) can share data with little or no copying or conversion, which is why many data processing pipelines use it as their interchange format.
MIME Type
application/vnd.apache.arrow.file (file mode), application/vnd.apache.arrow.stream (streaming mode)
Type
Binary
Compression
Optional, lossless (per-buffer LZ4 frame or Zstandard)
Advantages
- + Zero-copy data sharing between languages and processes
- + Optimized for SIMD and vectorized computation
- + Standard memory layout supported by modern data tools (DuckDB, Polars, pandas)
Disadvantages
- − Not designed for long-term persistent storage; use Parquet for that
- − Files are usually larger than the same data stored as Parquet or compressed CSV
- − More complex than CSV for simple data exchange
When to Use .ARROW
Use Arrow for exchanging data between processes or languages, for passing intermediate results between stages of a data processing pipeline, and for any scenario where data should be shared without copying or re-encoding.
Technical Details
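Arrow stores each column in contiguous memory buffers: fixed-width types use a single values buffer, while variable-length types such as strings add an offsets buffer. The IPC format serializes record batches as a sequence of messages and comes in two modes: streaming (sequential messages, read from start to finish) and file (the same messages framed by the magic string ARROW1 and followed by a footer that enables random access to record batches). Null values are tracked in a per-column validity bitmap, one bit per value.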
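To make the buffer layout concrete, here is a minimal standard-library sketch that decodes a string column from three hand-built buffers: a validity bitmap, an int32 offsets buffer, and a data buffer. The function names are hypothetical, not part of any Arrow library. The buffers are also left unpadded for brevity, whereas real Arrow buffers are padded and aligned.

```python
import struct

def is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if slot i is non-null (bits are numbered from the least significant bit)."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)

def decode_utf8_column(validity: bytes, offsets: bytes, data: bytes, length: int):
    """Decode a variable-length string column from its three buffers."""
    # length + 1 little-endian int32 offsets delimit each value inside `data`.
    offs = struct.unpack(f"<{length + 1}i", offsets[: 4 * (length + 1)])
    return [
        data[offs[i]:offs[i + 1]].decode("utf-8") if is_valid(validity, i) else None
        for i in range(length)
    ]

# Hand-built buffers for the values ["ab", None, "cde"] (illustrative only).
validity = bytes([0b00000101])            # slots 0 and 2 are valid, slot 1 is null
offsets = struct.pack("<4i", 0, 2, 2, 5)  # the null slot has zero length
data = b"abcde"
values = decode_utf8_column(validity, offsets, data, 3)
```

Because every value is located by arithmetic on the offsets rather than by parsing, a reader can jump to any slot directly, which is what makes the layout suitable for vectorized processing.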
History
Apache Arrow became an Apache top-level project in 2016. Wes McKinney (creator of pandas) was among its co-creators, and the project set out to remove the cost of serializing data between tools. Arrow's memory layout is used natively by Polars and is supported by DuckDB and pandas 2.0.