Data Lake Management – Spotlight

Report Author(s): Philip Howard, Daniel Howard

Part of the reason data lakes have become popular is that setting up a basic one is inexpensive and easy: all you really need is some spare hardware and Hadoop, and you’re off to the races. Unfortunately, without additional software, such lakes are likely to fail when used for anything substantial, because they lack effective processes. The open source nature of the data lake community exacerbates the problem: many of the available software offerings are open source, and therefore cheap to try out, but, in part because of this proliferation, there are no pre-packaged, one-size-fits-all solutions. Building a truly effective data lake is therefore difficult, as a suite of mostly open source solutions must be assembled manually to address a variety of issues. This paper discusses these issues and how they might be addressed. A companion paper, a Market Update on Data Lake Management, discusses the solutions that a range of vendors provide to prevent your lake from turning into a swamp.


Bloor Research

Bloor is an independent research and analyst house focused on the idea that Evolution is Essential to business success and ultimately survival. For nearly 30 years we have enabled businesses to understand the potential offered by technology and choose the optimal solutions for their needs.