Saturday, May 18, 2024

What is a data lake? Massively scalable storage for big data analytics

In 2011, James Dixon, then CTO of the business intelligence company Pentaho, coined the term "data lake." He described the data lake in contrast to the information silos typical of data marts, which were popular at the time:

If you think of a data mart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Data lakes have evolved since then, and now compete with data warehouses for a share of big data storage and analytics. Various tools and products support faster SQL querying in data lakes, and all three major cloud providers (AWS, Microsoft Azure, and Google Cloud) offer data lake storage and analytics. There’s even the new data lakehouse concept, which combines governance, security, and analytics with affordable storage. This article is a high dive into data lakes, including what they are, how they’re used, and how to ensure your data lake does not become a data swamp.


Source: InfoWorld Cloud Computing


