Apache Kudu

Definition - What does Apache Kudu mean?

Apache Kudu is an open-source storage engine for structured data and a member of the Apache Hadoop ecosystem. It supports low-latency random access together with efficient analytical access patterns. It was designed and implemented to bridge the gap between the widely used Hadoop Distributed File System (HDFS) and the HBase NoSQL database. Although those systems may still prove advantageous for some workloads, Apache Kudu can serve many common workloads on its own and dramatically simplify their architecture.

Techopedia explains Apache Kudu

Apache Kudu was developed primarily as a project at Cloudera, and most of the contributions to date have come from developers employed by Cloudera. At its initial release, only convenience binaries were available, through Cloudera’s repositories; on joining the Apache Incubator, the project adopted the Apache Software Foundation (ASF) source release process. Kudu is designed specifically for use cases that require fast analytics on fast data. It was engineered to take advantage of next-generation hardware and in-memory processing, and it significantly lowers query latency for Apache Impala and Apache Spark. Kudu stores data in a columnar format and distributes it through horizontal partitioning. It then replicates each partition using Raft consensus, which provides low mean time to recovery and low tail latencies.
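The idea of horizontal partitioning followed by majority-based replication can be illustrated with a small Python sketch. This is not Kudu code and does not use the Kudu client API: the function names, the CRC32 hash, the sample rows, and the bucket count are all assumptions chosen for illustration. It only shows how rows might be assigned to partitions by hashing a key, and how many replicas of a partition must agree before a Raft-style write is committed.

```python
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is an illustrative stand-in, not the hash Kudu uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition_rows(rows, key_column, num_buckets):
    """Split rows horizontally: each row lands in exactly one partition."""
    partitions = {b: [] for b in range(num_buckets)}
    for row in rows:
        partitions[bucket_for(row[key_column], num_buckets)].append(row)
    return partitions


def raft_majority(num_replicas: int) -> int:
    """Number of replicas that must acknowledge a write for it to commit."""
    return num_replicas // 2 + 1


# Hypothetical sample data: ten metric rows keyed by host name.
rows = [{"host": f"host-{i}", "cpu": i * 1.5} for i in range(10)]
partitions = partition_rows(rows, "host", num_buckets=4)

for bucket, members in partitions.items():
    print(bucket, [r["host"] for r in members])

# With 3 replicas per partition, one replica can fail and writes still commit.
print("majority of 3 replicas:", raft_majority(3))
```

Because a write needs only a majority of replicas, a partition with three replicas keeps accepting writes when one replica is down, which is the property behind the low recovery time described above.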

Though Kudu was designed within the context of the Apache Hadoop ecosystem, it also supports integration with other data analytics projects both inside and outside the ASF.

Because Apache Kudu can process real-time analytic workloads across a single storage layer, it gives architects the flexibility to address a wider variety of use cases without exotic workarounds.
