Apache Avro (Row-Based Serialization)
Avro is a row-based data serialization format whose container files embed the writer's JSON schema, making the data self-describing. It excels at schema evolution: readers and writers can use different but compatible schemas. Avro is a de facto standard for Kafka message serialization and Hadoop data pipelines.
MIME Type
application/avro
Type
Binary
Compression
Lossless
Advantages
- + Schema evolution — add/remove fields without breaking readers
- + Compact binary encoding with efficient compression
- + Self-describing — schema embedded in the file
- + Standard in Kafka and the Hadoop ecosystem
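The schema-evolution advantage comes from reader-side schema resolution. The sketch below is a toy model of that rule for a flat record (the helper name and field-dict shape are illustrative, not the Avro library's API): fields the writer lacked are filled from the reader schema's defaults, and writer-only fields are dropped.

```python
def resolve_record(writer_record, reader_fields):
    """Toy model of Avro schema resolution for a flat record.

    reader_fields: list of dicts like {"name": ..., "default": ...};
    a "default" is required for any field the writer may not have written.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            resolved[name] = writer_record[name]   # present in both schemas
        elif "default" in field:
            resolved[name] = field["default"]      # new reader field: use default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved  # fields only the writer knew about are simply ignored

# An old writer produced {"id", "name", "legacy_flag"}; a newer reader
# also expects "email" (with a default) and no longer reads "legacy_flag".
old_record = {"id": 7, "name": "ada", "legacy_flag": True}
reader_fields = [
    {"name": "id"},
    {"name": "name"},
    {"name": "email", "default": ""},
]
print(resolve_record(old_record, reader_fields))
# {'id': 7, 'name': 'ada', 'email': ''}
```

Real Avro resolution also handles type promotions and union branches, but the default-filling shown here is why adding a field with a default never breaks old data.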
Disadvantages
- − Row-based — less efficient than Parquet for analytical queries
- − Not human-readable in binary form
- − JSON schema specification has a learning curve
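For a sense of what that JSON schema specification looks like, here is a minimal record schema (the `User` record and its field names are illustrative): a named record with a required `long`, a required `string`, and an optional field expressed as a union with `null`.

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

Note that a default for a union field must match the union's first branch, which is why optional fields conventionally list `"null"` first.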
When to Use .AVRO
Use Avro for Kafka message schemas, Hadoop/Spark data pipelines, and any system where schema evolution and compact row storage are priorities.
Technical Details
Avro object container files begin with a header holding the writer's JSON schema and codec metadata, followed by binary-encoded data blocks that may be compressed (deflate and Snappy are the most common codecs). At read time the reader's schema is resolved against the writer's, so fields can be added or removed when defaults are provided, and renamed via aliases, without breaking consumers.
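The compactness of the binary encoding is easiest to see for integers: per the Avro specification, `int` and `long` values are written with zigzag coding followed by a little-endian base-128 varint, so small magnitudes (positive or negative) take a single byte. A stdlib-only sketch:

```python
def encode_long(n: int) -> bytes:
    """Avro long encoding: zigzag, then little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)       # zigzag: small |n| -> small unsigned value
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(buf: bytes) -> int:
    z = shift = 0
    for b in buf:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:           # continuation bit clear: last byte
            break
    return (z >> 1) ^ -(z & 1)     # undo zigzag

print(encode_long(1).hex())    # 02
print(encode_long(-1).hex())   # 01
print(encode_long(64).hex())   # 8001  (two bytes, not eight)
```

Strings use the same trick: a varint byte-length prefix followed by raw UTF-8, with no per-field tags, which is a large part of why Avro rows are smaller than equivalent JSON.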
History
Doug Cutting created Avro in 2009 as part of the Hadoop ecosystem. Unlike Thrift and Protocol Buffers, Avro was designed for dynamic schema resolution without code generation.