Data Lakehouse

What Does Data Lakehouse Mean?

A data lakehouse is a unified storage architecture that combines the cost benefits of a data lake with the analytic benefits of a data warehouse.

An important purpose of a data lakehouse is to make it easier for machine learning engineers (MLEs) to use the same large data sets for different types of artificial intelligence (AI) workloads.

A data lakehouse architecture has five layers (see the sketch after this list):

  • Ingestion layer – pulls structured and unstructured data from a variety of sources.
  • Storage layer – stores data at rest as storage objects in one layer of the architecture.
  • Metadata layer – used to locate specific storage objects and assign schema on read.
  • Application programming interface (API) layer – exposes programmatic access so applications can determine which data items are required to complete a particular task and how to retrieve them.
  • Consumption layer – provides support for analytics and reporting.
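The following is a minimal Python sketch of how these layers fit together. It is illustrative only: a local directory stands in for an object store bucket, and a plain JSON file stands in for a real catalog or open table format such as Apache Iceberg or Delta Lake; the file paths, table name and helper function are hypothetical.

```python
import json
import pathlib

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Ingestion layer: pull raw records from a source (an in-memory list stands in
# for an API feed or log stream; the records are illustrative).
raw_events = [
    {"user_id": 1, "event": "click", "ts": "2023-01-01T10:00:00"},
    {"user_id": 2, "event": "view", "ts": "2023-01-01T10:01:00"},
]

# Storage layer: persist the data at rest as Parquet objects in one storage
# location (a local directory stands in for an object store bucket).
lake_root = pathlib.Path("lake/events")
lake_root.mkdir(parents=True, exist_ok=True)
pq.write_table(pa.Table.from_pylist(raw_events), lake_root / "part-0001.parquet")

# Metadata layer: record where the objects live and what schema to apply when
# reading them (schema on read). A JSON file stands in for a real catalog.
catalog = {
    "events": {
        "objects": [str(lake_root / "part-0001.parquet")],
        "schema": {"user_id": "int64", "event": "string", "ts": "timestamp"},
    }
}
pathlib.Path("lake/_catalog.json").write_text(json.dumps(catalog, indent=2))

# API layer: a small helper that applications call to resolve a logical table
# name to its underlying storage objects and read them back.
def read_table(name: str) -> pd.DataFrame:
    entry = json.loads(pathlib.Path("lake/_catalog.json").read_text())[name]
    frames = [pq.read_table(obj).to_pandas() for obj in entry["objects"]]
    return pd.concat(frames, ignore_index=True)

# Consumption layer: analytics and reporting on top of the same storage.
events = read_table("events")
print(events.groupby("event").size())
```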

Techopedia Explains Data Lakehouse

A data lakehouse allows the same unified storage layer to be used for multiple purposes — including predictive analytics, prescriptive analytics, deep learning and reporting.
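A short sketch of that idea, assuming the hypothetical Parquet objects written in the example above: the same files feed a SQL-style report and a machine learning feature matrix, with no copy into a separate warehouse. DuckDB and pandas are used here only as example engines.

```python
import duckdb
import pandas as pd

# Hypothetical path from the sketch above.
PARQUET_GLOB = "lake/events/*.parquet"

# Workload 1: reporting. A SQL engine scans the Parquet objects directly and
# produces an aggregate suitable for a dashboard.
report = duckdb.query(
    f"SELECT event, COUNT(*) AS n FROM read_parquet('{PARQUET_GLOB}') GROUP BY event"
).to_df()
print(report)

# Workload 2: machine learning. The same files are loaded as a DataFrame and
# turned into a numeric feature matrix for model training.
df = pd.read_parquet("lake/events")       # reads every Parquet part in the directory
features = pd.get_dummies(df["event"])    # one-hot encode the event type
print(features.head())
```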

This emerging architecture uses metadata to combine the flexibility of a data lake with the benefits of a data warehouse. Popular data lakehouse vendors include:

  • Cloudera – an open source, open standards-based data lakehouse built on Apache Iceberg’s open table format.
  • Databricks – the Databricks Lakehouse Platform can be delivered and managed as a service on AWS, Microsoft Azure and Google Cloud.
  • Dremio – provides fully managed services designed to help customers experiment with a lakehouse architecture at a lower total cost of ownership (TCO).
  • Snowflake – integrates subject-specific data marts, data warehouses and data lakes into a single source of truth (SSOT) that can be used to power different types of workloads.
