Apache Kafka

Definition - What does Apache Kafka mean?

Apache Kafka is an open-source publish-subscribe messaging system designed to provide fast, scalable and fault-tolerant handling of real-time data feeds. Unlike traditional enterprise messaging software, Kafka is designed to handle very high volumes of data, potentially all of the data flowing through a company, and to do so in near real time.

Kafka is written in Scala and Java and was originally developed at LinkedIn, which released it as open source in 2011. Since then, a number of companies have used it to build real-time data platforms.

Techopedia explains Apache Kafka

Kafka has many similarities to transaction logs, and it maintains feeds of messages in categories called topics. Producers write data to topics and consumers read from those topics, which are partitioned and replicated across multiple nodes in a distributed cluster. Kafka treats each topic partition as a log, that is, an ordered, append-only sequence of messages, and each message in a partition is assigned an offset that is unique within that partition.

Kafka retains all messages for a configurable amount of time, whether or not they have been read, and consumers are responsible for tracking their own position (offset) in each log. This differs from earlier messaging systems, where the broker was responsible for tracking what each consumer had received, which severely limited the system's ability to scale as the number of consumers increased. Because the broker keeps so little per-consumer state, Kafka can support many consumers and retain large amounts of data with very low overhead.
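
The log-and-offset model described above can be illustrated with a small in-memory sketch. This is a toy model written for this entry, not Kafka's implementation or its client API: the names PartitionLog, Consumer, append, read, expire and poll are all hypothetical, and real Kafka adds replication, persistence to disk and network protocols that are omitted here.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one topic partition: an append-only message log."""
    retention_seconds: float
    base_offset: int = 0  # offset of the oldest message still retained
    entries: list = field(default_factory=list)  # (timestamp, message) pairs

    def append(self, message, now=None):
        """Append a message and return the offset assigned to it."""
        now = time.time() if now is None else now
        self.entries.append((now, message))
        return self.base_offset + len(self.entries) - 1

    def read(self, offset, max_messages=10):
        """Return (offset, message) pairs starting at the requested offset."""
        # A reader that fell behind retention resumes at the oldest message.
        start = max(offset, self.base_offset) - self.base_offset
        chunk = self.entries[start:start + max_messages]
        return [(self.base_offset + start + i, msg)
                for i, (_, msg) in enumerate(chunk)]

    def expire(self, now=None):
        """Drop messages older than the retention period, read or unread."""
        now = time.time() if now is None else now
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.pop(0)
            self.base_offset += 1


class Consumer:
    """Tracks its own position; the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # next offset this consumer wants to read

    def poll(self, max_messages=10):
        records = self.log.read(self.position, max_messages)
        if records:
            self.position = records[-1][0] + 1
        return [msg for _, msg in records]


if __name__ == "__main__":
    log = PartitionLog(retention_seconds=60)
    for ts, msg in [(0, "a"), (10, "b"), (50, "c")]:
        log.append(msg, now=ts)

    fast, slow = Consumer(log), Consumer(log)
    print(fast.poll())                # fast reads everything at once
    print(slow.poll(max_messages=1))  # slow reads only the first message
    log.expire(now=65)                # "a" is past retention and is dropped
    print(slow.poll())                # slow continues from its own offset
```

In this sketch, adding another consumer costs the log nothing, because the only per-consumer state is the position integer held by the consumer itself. That is the design choice the paragraph above credits for Kafka's ability to scale to many consumers.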

Kafka can be used:

  • As a traditional message broker
  • For website activity tracking
  • For log aggregation
  • For big data stream processing

Kafka can be used alongside Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.
