Data Lake


What Does Data Lake Mean?

A data lake is a centralized storage repository for large volumes of structured and unstructured data. Rather than organizing data into hierarchical folders or predefined tables, a data lake has a flat architecture and typically uses object storage to store data.
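To make the idea of a flat architecture more concrete, here is a minimal Python sketch built on a simplified in-memory object store. The class `FlatObjectStore`, its methods and the example keys are hypothetical. Real object stores expose their own client APIs. The sketch shows that every object lives under a single key in one flat namespace, and that what look like folders are only key prefixes.

```python
class FlatObjectStore:
    """Toy in-memory stand-in for an object store (hypothetical, not a real client)."""

    def __init__(self) -> None:
        # One flat mapping from key to raw bytes; there are no real directories.
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are stored as-is: no schema or format is checked on write.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: listing just filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("raw/clickstream/2024-01-01.json", b'{"user": 1, "page": "/home"}')
store.put("raw/images/cat.png", b"\x89PNG placeholder bytes")
store.put("raw/logs/app.log", b"INFO started")
print(store.list_keys("raw/"))
```

JSON, binary images and plain-text logs all go into the same store unchanged, so the lake does not need to know their structure in advance.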


Data lakes play an important role in helping data scientists visualize and analyze data from disparate sources in its native format. In data science, this is especially important when the scope of the data, and its uses, may not yet be fully known.

Although data lakes offer strong data access benefits, they require a management component to help users find the most relevant data, understand relationships between data sets and integrate heterogeneous data sources.

A data lake may also be referred to as a schema-agnostic or schema-less data repository.
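A common form of the management component described earlier in this section is a metadata catalog. The following minimal Python sketch is hypothetical: `Catalog`, `DatasetEntry` and the example data set keys are invented for this illustration and do not correspond to any particular product. For each data set, the catalog records its native format, owner and tags. Users can search by tag, and the catalog stores relationships between data sets.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    key: str   # location of the raw object in the lake
    fmt: str   # native format, e.g. "json", "csv" or "png"
    owner: str
    tags: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)  # keys of related data sets


class Catalog:
    """Hypothetical metadata catalog that sits alongside the raw data."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.key] = entry

    def link(self, a: str, b: str) -> None:
        # Record a relationship in both directions so either side can find the other.
        self._entries[a].related.add(b)
        self._entries[b].related.add(a)

    def search(self, tag: str) -> list[str]:
        # Return the keys of every data set carrying the tag, in a stable order.
        return sorted(k for k, e in self._entries.items() if tag in e.tags)

    def related_to(self, key: str) -> list[str]:
        return sorted(self._entries[key].related)


catalog = Catalog()
catalog.register(DatasetEntry("raw/crm/customers.csv", "csv", "sales-ops", {"customers", "crm"}))
catalog.register(DatasetEntry("raw/sales/orders.csv", "csv", "finance", {"customers", "orders"}))
catalog.register(DatasetEntry("raw/web/clicks.json", "json", "web-team", {"events"}))
catalog.link("raw/crm/customers.csv", "raw/sales/orders.csv")

print(catalog.search("customers"))
print(catalog.related_to("raw/sales/orders.csv"))
```

The catalog holds only metadata. The data sets themselves stay in the lake in their native formats.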

Techopedia Explains Data Lake

The data lake architecture takes a store-everything approach to big data. Data is not classified when it is stored in the repository, and its value may not be clear at the outset. Only when the data is accessed is it classified and organized for analysis.
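This store-now, classify-later approach is commonly called schema-on-read. The minimal Python sketch below assumes that the raw data is stored as JSON lines. The field names and the `SCHEMA` mapping are hypothetical. The raw records are kept exactly as they arrived, and a schema is applied only when a reader parses them. A different analysis could therefore read the same raw lines with a different schema.

```python
import json
from datetime import date
from typing import Any, Callable

# Raw records land in the lake exactly as they arrived; nothing is validated on write.
raw_lines = [
    '{"user_id": "42", "amount": "19.99", "day": "2024-03-01"}',
    '{"user_id": 7, "amount": 5, "day": "2024-03-02", "coupon": "SPRING"}',
    "not valid json",
]

# Hypothetical schema chosen by this particular reader at query time.
SCHEMA: dict[str, Callable[[Any], Any]] = {
    "user_id": int,
    "amount": float,
    "day": date.fromisoformat,
}


def read_with_schema(lines: list[str], schema: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """Parse raw lines and coerce the requested fields only at read time."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            # Keep only the fields the schema asks for, converting each one.
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, TypeError, ValueError):  # json.JSONDecodeError is a ValueError
            continue  # skip records that do not fit this reader's schema
    return rows


rows = read_with_schema(raw_lines, SCHEMA)
print(rows)
```

In this sketch, records that do not fit the reader's schema are simply skipped. Whether to drop, repair or quarantine such records is the reader's decision and is not enforced when the data is written.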

Data lakes were developed to promote the accessibility and reuse of data. Hadoop, an open-source framework for processing and analyzing big data, can be used to sift through the data in the repository.

Data Lake vs. Data Swamp

Getting business value out of a data lake has proved challenging for some companies because this "junk drawer" approach to storage can be difficult to govern. A poorly governed lake, in which data is hard to find, trust or use, is often called a data swamp.

In response, three emerging architectures aim to make distributed data storage easier to manage and data with different schemas easier to query: data mesh, data fabric and data lakehouse.

Data mesh – distributes data ownership to the domain teams that know the data best, so that each team can manage and serve its data independently rather than relying on a central data team.

Data fabric – standardizes data governance policies across cloud storage, on-premises storage and edge devices.

Data lakehouse – combines the flexibility of a data lake with the benefits of a data warehouse in one storage layer.



Margaret Rouse
Technology Expert

Margaret is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.