Business Reporter

Implementing an effective data lakehouse


Jonny Dixon at Dremio explains how to build and execute a successful data lakehouse strategy

 

The concept of a data lakehouse has become increasingly popular among modern enterprises wishing to combine the benefits of a data warehouse and a data lake. It allows for high-speed data transformation and querying, as well as the consolidation of multi-structured data in flexible object stores.

 

This combination supports both business intelligence (BI) and data science workloads while simplifying how enterprises meet the ever-growing demand for analytics.

 

While the data lakehouse is still in the early stages of adoption, many companies are starting to see it as an efficient way to streamline their architectures, cut costs, and facilitate the governance of self-service analytics.

 

However, many organisations don’t know where to start – and the risk of spending time and money only for it to go wrong is putting many off taking advantage of data lakehouses’ benefits.

 

So, what must organisations do to build and execute the right strategy to modernise their open data stacks?

 

Exploring the benefits of the data lakehouse

The data lakehouse is designed to combine the structure and performance of a data warehouse with the flexibility of a data lake. It is a type of data architecture that leverages data warehouse commands, often in SQL, to query data lake object stores quickly, whether on-premises or in the cloud.

 

Some everyday use cases for a data lakehouse include support for data mesh, a unified access layer for analytics, data warehouse consolidation, data modernisation for hybrid cloud environments, departmental lakehouses, and FinOps program support.

 

The data lakehouse can handle BI and data science workloads by executing queries on relational and multi-structured data stored as files. Some lakehouse platforms offer a semantic layer that consolidates virtual views of the underlying physical data, improving communication and client interaction.
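To make the idea of a virtual view concrete, here is a minimal sketch in Python. It assumes SQLite (from the standard library) as a stand-in for a lakehouse query engine, and the table, view, and column names are hypothetical. It is not how any particular lakehouse platform implements its semantic layer; it only shows that a view is a stored definition over the physical data rather than a second copy of it.

```python
import sqlite3


def build_semantic_view(conn):
    """Create hypothetical physical data plus one virtual view over it."""
    # Physical data: in a real lakehouse this would be files in an object store.
    conn.execute(
        "CREATE TABLE raw_orders (customer_id TEXT, amount_pence INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            ("c1", 1000, "paid"),
            ("c1", 2000, "paid"),
            ("c2", 500, "paid"),
            ("c2", 9900, "cancelled"),
        ],
    )
    # Virtual view: a business-friendly definition; no rows are copied.
    conn.execute(
        """
        CREATE VIEW customer_revenue AS
        SELECT customer_id, SUM(amount_pence) / 100.0 AS revenue_gbp
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer_id
        """
    )


def query_revenue(conn):
    """Query the view the way an analyst would, without touching raw_orders."""
    cursor = conn.execute(
        "SELECT customer_id, revenue_gbp FROM customer_revenue ORDER BY customer_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_semantic_view(connection)
    for customer_id, revenue_gbp in query_revenue(connection):
        print(customer_id, revenue_gbp)
```

Because the view is evaluated at query time, a row added to the underlying table appears in the next query against the view with no extract or reload step.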

 

Enterprises adopt the data lakehouse for two primary advantages. The first is a simplified architecture. By running fast queries directly on the object store within the data lake, enterprises eliminate the need to copy or move data to meet BI performance requirements. This reduces the reliance on data extracts and data warehouses, minimising the effort of managing multiple copies and reducing costs. It also makes it easier to accommodate changing business needs.

 

The second advantage is workload consolidation. The data lakehouse’s support for both BI and data science workloads enables enterprises to consolidate those workloads, eliminating the need for separate platforms. At the same time, they can maintain an open architecture and open formats so that the lakehouse interoperates with other tools.

 

Lakehouse platforms that consolidate data views into a semantic layer further simplify data access, enable self-service capabilities, and facilitate governance. Data analysts and scientists can prototype new analytics approaches without duplicating or moving data. This reduces the workload for data engineers, allowing them to focus on more innovative tasks.

 

Executing the right strategy

To successfully implement a data lakehouse, it is essential to follow a strategic approach. Organisations should define and prioritise the most critical business use cases for the data lakehouse. These could include periodic reporting, interactive reports and dashboards, ad-hoc queries, 360-degree customer views, or artificial intelligence/machine learning (AI/ML) projects.

 

They should identify the use cases that can deliver a "quick win" and prioritise the architectural characteristics required to support them, such as unification, simplicity, accessibility, high performance, cost-effectiveness, governance, or openness.

 

They should also plan and execute the first project that supports their highest-priority use case(s). An early priority is to assemble the right team of stakeholders, including an executive sponsor, data analyst or scientist, data engineer, architect, and governance manager.

 

Together, they can create and implement a roadmap for incremental changes to their environment. For example, if the focus is to support 360-degree customer views, the team may migrate semi-structured customer data from HDFS to a cloud object store in the target lakehouse.

 

Once the team can demonstrate the business value with the first project and achieve a "quick win," they can gain the budget, executive support, and architectural platform to expand further. This could mean planning and executing a second project that migrates other functional data, such as finance or supply-chain records, from HDFS to the lakehouse.

 

Alternatively, the team can extend the unified access layer to support legacy databases on-premises. This will enable a data mesh approach where business domain owners can publish data products to self-service users across the organisation. The goal is to proceed with a series of incremental and achievable projects, each demonstrating the data lakehouse’s return on investment (ROI).

 

Organisations continue to realise the value of leveraging unstructured data with AI and machine learning. As data lakehouses represent a more advanced stage of maturity compared to the combined data lake and data warehouse model, this approach is predicted to gain popularity.

 

Over time, lakehouses will continue to close gaps while maintaining simplicity, cost-efficiency, and the ability to serve various data applications. In the meantime, having the right strategic approach to implementing the data lakehouse will increase ROI and pave the way for the success of future projects.

 


 

Jonny Dixon is Senior Product Manager at Dremio

 

Main image courtesy of iStockPhoto.com



© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543