What Does Apache Kafka Mean?
Apache Kafka is an open-source publish-subscribe message system designed to provide quick, scalable and fault-tolerant handling of real-time data feeds. Unlike traditional enterprise messaging software, Kafka is able to handle all the data flowing through a company, and to do it in near real time.
Kafka is written in Scala and was originally developed by LinkedIn. Since that time, a number of companies have used it to build real-time platforms.
Techopedia Explains Apache Kafka
Kafka has many similarities to transaction logs, and it maintains feeds of messages in topics. Producers write data to topics and consumers read from those topics, which are partitioned and replicated across multiple nodes in a distributed system format. Kafka is unique in that it treats each topic partition as a log, and each message in a partition is assigned a unique offset. It retains all messages for a certain amount of time, and consumers are responsible for tracking their location in each log. This differs from previous systems, where brokers were responsible for this tracking, which severely limited the system’s ability to scale as the number of consumers increased. This structure allows Kafka to support many consumers and retain large amounts of data with very low overhead.
Kafka can be used:
- As a traditional message broker
- For website activity tracking
- For log aggregation
- For big data stream processing
Kafka can be used alongside Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.