Apache Kudu is an open-source storage engine in the Apache Hadoop ecosystem, designed for structured data and supporting both low-latency random access and efficient analytical access patterns. It was built to bridge the gap between the widely used Hadoop Distributed File System (HDFS) and the HBase NoSQL database. While those systems still have their place, Kudu can serve many common workloads on its own and dramatically simplify their architecture.
Apache Kudu began as a project at Cloudera, and most contributions to date have come from developers employed there. Before joining the Apache Incubator, only convenience binaries were published through Cloudera's repositories; upon joining, the project adopted the Apache Software Foundation (ASF) source release process. Kudu is designed for use cases that require fast analytics on fast data, and it is engineered to take advantage of next-generation hardware and in-memory processing, significantly lowering query latency for Apache Impala and Apache Spark. Internally, it stores data in a columnar storage engine, distributes it through horizontal partitioning, and replicates each partition using Raft consensus, providing low mean-time-to-recovery and low tail latencies.
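As a rough illustration of how horizontal partitioning and replication fit together, the sketch below hash-partitions row keys into tablets and places three replicas of each tablet on distinct servers, one of which would act as the Raft leader. This is a minimal conceptual sketch, not Kudu's actual implementation: the server names, tablet count, and placement scheme are all hypothetical.

```python
import hashlib

NUM_TABLETS = 4    # number of hash partitions (illustrative choice)
REPLICATION = 3    # Kudu's default replication factor
# Hypothetical tablet servers; a real cluster is managed by Kudu masters.
SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3", "ts-4"]

def tablet_for_key(key: str) -> int:
    """Map a primary key to a tablet via hash partitioning."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_TABLETS

def replicas_for_tablet(tablet: int) -> list[str]:
    """Place REPLICATION copies of a tablet on distinct servers.
    Under Raft, one replica is elected leader; the rest follow."""
    return [SERVERS[(tablet + i) % len(SERVERS)] for i in range(REPLICATION)]

row_key = "host-42"
t = tablet_for_key(row_key)
print(f"key {row_key!r} -> tablet {t} on replicas {replicas_for_tablet(t)}")
```

Because any single server holds only one replica of a given tablet, losing that server leaves two live copies that can elect a new leader quickly, which is the property behind the low mean-time-to-recovery mentioned above.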
Though Kudu was designed within the Apache Hadoop ecosystem, it also integrates with other data analytics projects both inside and outside the ASF.
Apache Kudu is efficient because it can serve real-time analytic workloads from a single storage layer, giving architects the flexibility to address a wider variety of use cases without exotic workarounds.