Apache Arrow IPC (In-Memory Columnar)

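Apache Arrow defines a language-agnostic columnar format for in-memory data, and Arrow IPC is the serialized form of that layout used to move data between processes and languages (Python, R, C++, Java). Because the serialized bytes match the in-memory layout, a reader can use the buffers directly instead of parsing and rebuilding them, which avoids most serialization overhead and is why many data processing tools exchange data through it.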

MIME Type

application/vnd.apache.arrow.file

Type

Binary

Compression

Lossless (uncompressed by default; optional LZ4 or Zstandard buffer compression)

Advantages

+ Zero-copy data sharing between languages and processes
+ Optimized for SIMD and vectorized computation
+ Common memory layout supported by modern data tools (DuckDB, Polars, pandas)

Disadvantages

− Not designed for long-term persistent storage; Parquet is the usual choice for that
− Uncompressed files are often larger than the same data in Parquet or compressed CSV
− More complex than CSV for simple data exchange

When to Use .ARROW

Use Arrow for inter-process data exchange, building data processing pipelines, and any scenario requiring zero-copy data sharing.

Technical Details

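Arrow stores each column as one or more contiguous memory buffers, with the layout determined by the column type: fixed-width types use a single values buffer, while variable-length types such as strings use an offsets buffer plus a data buffer. The IPC format has two modes. The streaming mode is a sequence of messages (a schema followed by record batches) read in order. The file mode wraps the same messages between ARROW1 magic bytes and adds a footer that records where each batch starts, which allows random access. Null values are tracked in a per-column validity bitmap, where a set bit marks a valid (non-null) slot.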
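
The buffer layout for a fixed-width column can be sketched with the Python standard library alone. This is a simplified illustration rather than the Arrow implementation: it builds a validity bitmap (least-significant bit first, 1 meaning valid) and a little-endian 32-bit values buffer, and it omits the buffer padding and alignment that real Arrow buffers use. The function names are hypothetical.

```python
import struct


def encode_int32_column(values):
    """Return (validity_bitmap, values_buffer) for a list of ints or None."""
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per slot
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # set bit i, LSB-first
        # Null slots still occupy 4 bytes; their content is unspecified (0 here).
        data += struct.pack("<i", 0 if v is None else v)
    return bytes(bitmap), bytes(data)


def decode_int32_column(bitmap, data, length):
    """Rebuild the Python list, using the bitmap to restore nulls."""
    out = []
    for i in range(length):
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        (v,) = struct.unpack_from("<i", data, i * 4)
        out.append(v if valid else None)
    return out


bitmap, data = encode_int32_column([10, None, 30, 40])
decoded = decode_int32_column(bitmap, data, 4)
```

Because every slot has the same width, the value at index i is always at byte offset i * 4, which is what makes vectorized access to the column straightforward.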

History

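Apache Arrow was launched as a top-level Apache project in 2016, with Wes McKinney (creator of pandas) among its co-creators, to remove the inefficiency of serializing data between tools. Its memory layout is the basis of Polars, is available as an optional backend in pandas 2.0, and is supported for data interchange by DuckDB.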

Convert from .ARROW

.arrow → .avro, .arrow → .csv, .arrow → .json, .arrow → .ndjson, .arrow → .parquet, .arrow → .xlsx

Convert to .ARROW

.avro → .arrow, .csv → .arrow, .json → .arrow, .ndjson → .arrow, .parquet → .arrow, .xlsx → .arrow

Related Formats

.avro .bson .geojson .hdf5 .msgpack .ndjson .parquet .protobuf .sqlite

Related Terms

Arrow