Apache Avro

Definition - What does Apache Avro mean?

Apache Avro is a data serialization and remote procedure call (RPC) framework developed within the Apache Hadoop project. It provides both a serialization format for persistent data and a wire format for communication between Hadoop nodes and between client programs and Hadoop services.

Avro uses JSON to define data types and protocols, and serializes data into a compact binary format.
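As a rough illustration of a JSON-defined data type, the sketch below declares a small Avro record schema and inspects it with Python's standard json module. The record name, namespace, and fields are hypothetical and chosen only for this example. No Avro library is involved, only the JSON text that such a library would consume.

```python
import json

# Hypothetical schema for illustration; "User", "example.avro" and the
# field names are assumptions, not taken from any real project.
USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

# An Avro schema is plain JSON, so any JSON parser can load it.
schema = json.loads(USER_SCHEMA_JSON)

# Map each field name to its declared type. A list denotes a union,
# here "null or string", which is how optional fields are usually written.
field_types = {field["name"]: field["type"] for field in schema["fields"]}

for name, avro_type in field_types.items():
    print(name, "->", avro_type)
```

Because the schema is ordinary JSON, it can be written by hand, stored alongside the data, or generated by tools without any Avro-specific syntax.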

Techopedia explains Apache Avro

Apache Avro is a big data serialization framework that produces data in a compact binary format and does not require code generation or proxy objects.

It is used as a data serialization component for Apache Hadoop. Avro relies on schemas: when Avro data is read, the schema that was used to write it is always present.

Because the schema is available at read time, each datum can be written with no per-value overhead, which makes serialization both fast and compact. And because the data, together with its schema, is fully self-describing, Avro is easy to use from dynamic scripting languages.
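To make the "no per-value overhead" point concrete, here is a minimal hand-written sketch of Avro's binary encoding for a record with string and int/long fields, using only the Python standard library. It assumes the encoding rules of the Avro specification (zig-zag variable-length integers, length-prefixed UTF-8 strings, record fields written in schema order). The helper names and the sample schema are hypothetical, and a real application would use an Avro library rather than this code.

```python
import json

def encode_long(n: int) -> bytes:
    """Encode an int/long as a zig-zag varint (values within 64-bit range)."""
    z = (n << 1) ^ (n >> 63)          # zig-zag: small magnitudes become small codes
    out = bytearray()
    while z & ~0x7F:                  # more than 7 bits remain
        out.append((z & 0x7F) | 0x80) # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode a string as a long byte count followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

# Only the primitive types used in this sketch are supported.
ENCODERS = {"string": encode_string, "int": encode_long, "long": encode_long}

def encode_record(schema: dict, datum: dict) -> bytes:
    """Write field values in schema order, with no names or type tags."""
    return b"".join(ENCODERS[f["type"]](datum[f["name"]]) for f in schema["fields"])

# Hypothetical schema and record for illustration.
schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
}
record = {"name": "Ada", "age": 36}

encoded = encode_record(schema, record)
as_json = json.dumps(record).encode("utf-8")
print(len(encoded), "bytes as Avro binary,", len(as_json), "bytes as JSON text")
```

The encoded bytes contain only the values. Field names and types live in the schema, which is why a reader needs the writer's schema to interpret the data.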

When Avro data is stored in a file, its schema is stored with it, so that any program can process the file later. If the program reading the data expects a different schema, the difference can be resolved because both schemas are present.
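The following is a simplified sketch of how a mismatch between the writer's and the reader's schema can be resolved for records. It works on an already-decoded record held in a Python dict and covers only three rules: fields are matched by name, fields known only to the writer are dropped, and fields known only to the reader take their declared default. The function name and schemas are hypothetical. Full Avro schema resolution also handles type promotion, aliases, unions, and nested types, which are omitted here.

```python
def resolve_record(writer_schema: dict, reader_schema: dict, datum: dict) -> dict:
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Present in both schemas: take the written value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Reader-only field: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Writer-only fields are never copied, so they are silently ignored.
    return resolved

# Hypothetical schemas: the reader dropped "age" and added "email".
writer_schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
}
reader_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

written = {"name": "Ada", "age": 36}
result = resolve_record(writer_schema, reader_schema, written)
print(result)
```

A reader field that is missing from the written data and has no default raises an error in this sketch, since there is no safe value to supply.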

Avro provides:

  • A compact and fast binary data format

  • Rich data structures

  • A container file for storing persistent data

  • Remote procedure call (RPC)

  • Integration with dynamic languages

Code generation is not required to read or write data files, or to use or implement RPC protocols.
