The Data Lake Survival Guide: The What, Why and How of the Data Lake
In times past, when we thought about digital data, it made sense to divide it into two kinds. The first was transactional data: the data captured by business applications, stored in database tables and presented by BI tools. The second was all other data: emails, web pages, images, video and so on. Nowadays we tend to refer to such “other data” as unstructured data.
Nevertheless, such data can be analyzed, and software for deriving value from it has now crossed the chasm into mainstream use. It was that analytical imperative, more than anything else, which gave rise to the original concept of a data lake. This was a data store for both species of data and, additionally, for data harvested from multiple sources external to the business, some of which was inevitably unstructured.
In this paper, we will examine the new ecosystem created by the data lake, which no longer consists solely of the transactions (or events) of the business. It also includes data from other sources, which the business uses to perform analytics and to give its users the information on which decisions can be based. The system of record will remain, as it always was, the golden copy of corporate data and the audit trail of the business's IT activities.