Chris Royles at Cloudera explains how to ensure that your data lakehouse doesn’t become a data swamp
The era of hybrid cloud is upon us. As the new de facto operating model for IT, it’s helping to power organisations as diverse as retailers and healthcare providers, banks and utilities.
An estimated 82% of enterprises have now adopted hybrid cloud, with most splitting their data and workloads between on-premises and public/private cloud environments.
Without the right data architecture in place, firms can quickly become bogged down, unable to extract insights from the huge quantities of structured and unstructured data residing in the distributed enterprise.
Hybrid cloud also introduces new risks, especially around governance and compliance.
Navigating data governance and compliance
We’ve moved to a new epoch of data, where huge volumes are distributed across the enterprise, from on-premises servers out to the public cloud and network edge. It’s forecast that 120ZB of data will be created, captured, copied, and consumed this year – a 24% increase from 2022.
Making sense of all of this data for critical business decision-making could be the difference between success and failure for today’s companies. The first step to generating this insight is managing it efficiently across the entire lifecycle: ingest, prepare, analyse, predict and publish.
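By way of illustration, here is a minimal Python sketch of how those five stages might hang together as a pipeline. The Record type and every function in it are illustrative assumptions made for this article, not part of any particular platform’s API.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """A single data record moving through the lifecycle (illustrative)."""
    payload: dict
    tags: set

def ingest(raw_rows):
    """Ingest: wrap raw rows from any source as records."""
    return [Record(payload=row, tags=set()) for row in raw_rows]

def prepare(records):
    """Prepare: drop incomplete rows and normalise field names."""
    return [
        Record({k.lower(): v for k, v in r.payload.items()}, r.tags)
        for r in records
        if all(v is not None for v in r.payload.values())
    ]

def analyse(records):
    """Analyse: derive a simple aggregate for decision-making."""
    return sum(r.payload.get("amount", 0) for r in records)

def predict(total):
    """Predict: a placeholder forecast (naive 5% growth, purely illustrative)."""
    return total * 1.05

def publish(forecast):
    """Publish: hand the result to downstream consumers."""
    print(f"Forecast next period: {forecast:.2f}")

rows = [{"Amount": 100}, {"Amount": 250}, {"Amount": None}]
publish(predict(analyse(prepare(ingest(rows)))))
```

However the stages are implemented in practice, the point stands: governance has to travel with the data through each of them, not be bolted on at the end.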
But beyond forward-thinking planning and strategy, there’s another critical, more retrospective requirement for business success: effective governance. This too demands lifecycle management, especially around the “analyse” phase and particularly for companies in highly regulated industries like telecoms and financial services.
If data is poorly managed, it can quickly submerge companies in a swamp of poor decision-making and compliance risk.
Having flexibility when managing this huge quantity of data is vital – not only because of surging data volumes and highly distributed computing environments, but also because of the volatility and complexity of the compliance landscape. Gartner predicts that by the end of this year, two-thirds of the world’s population will be governed by “modern” privacy regulations like the GDPR.
Data privacy for citizens and customers is vital to maintaining trust and building long-term, sustainable relationships. Yet the privacy and data protection laws coming into force each differ slightly. Many apply globally. Some even carry criminal liability for serious infractions.
The goalposts also keep moving with new court rulings, adding further complexity. The saga of data transfers between Europe and the US has seen multiple new policies applied and then struck down again by judges. That’s not to mention the added layer of industry-specific data protection and privacy regulations such as HIPAA in the US.
In short, companies must ensure they have the right governance rules in place today to comply with this patchwork of regulations and legislation, and the agility to respond to changes in the future.
This isn’t easy in many large organisations, which may have invested in data infrastructure from multiple providers, each with a different way of handling security and governance. Nor is centralisation necessarily the answer, given the varied privacy laws that firms must comply with around the world.
Avoiding a data swamp
Instead, organisations should look to migrate their data into a data fabric that supports global security, policy management and governance across multiple open data lakehouses, while assets – such as documents, web pages, applications or databases – continue to reside in multiple clouds.
The unified platform approach means security and governance are handled consistently. But there is enough flexibility to apply governance rules and security controls individually to each cloud, according to localised industry, jurisdictional and customer requirements. The governance and security applied to a US public cloud instance may be markedly different from that applied to an on-premises healthcare cloud in the UK, or a European public cloud, for example.
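To make that flexibility concrete, the sketch below shows one way a unified policy layer might merge a global baseline with per-deployment overrides. The deployment names and rule keys are hypothetical examples, not a description of any specific product.

```python
# A minimal sketch of unified governance with per-cloud overrides.
# The baseline applies everywhere; each deployment layers on local rules.
BASELINE = {"encrypt_at_rest": True, "encrypt_in_transit": True, "audit": True}

OVERRIDES = {
    "eu-public-cloud":  {"data_residency": "EU", "right_to_erasure": True},
    "uk-health-onprem": {"data_residency": "UK", "retention_years": 8},
    "us-public-cloud":  {"breach_notice_hours": 72},
}

def effective_policy(deployment: str) -> dict:
    """Merge the global baseline with deployment-specific rules."""
    return {**BASELINE, **OVERRIDES.get(deployment, {})}

print(effective_policy("uk-health-onprem"))
# {'encrypt_at_rest': True, 'encrypt_in_transit': True, 'audit': True,
#  'data_residency': 'UK', 'retention_years': 8}
```

The design choice here is the important part: one baseline enforced everywhere, with local rules layered on top rather than maintained as separate, drifting copies.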
Such an architecture can automatically detect and catalogue sensitive data across the company, wherever it resides, and apply the appropriate controls, depending on policy.
These might include user access controls, encryption at rest and in transit, and data classification, lineage, modelling and auditing. The system can be updated dynamically as these requirements change – for example, by replicating data and workloads to a different cloud or location, or accessing them in place.
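As a toy illustration of automatic detection and tagging, the Python sketch below classifies field values with simple patterns and looks up the controls a policy might require. The patterns and the control mapping are simplistic stand-ins invented for this example; a real classifier would be far more sophisticated.

```python
import re

# Simplistic stand-in patterns for a real sensitive-data classifier.
PATTERNS = {
    "email":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Hypothetical mapping from classification to required controls.
CONTROLS = {
    "email":       ["mask_on_read", "audit_access"],
    "credit_card": ["encrypt_at_rest", "restrict_access", "audit_access"],
}

def classify(value: str) -> list[str]:
    """Return the sensitivity tags detected in a field value."""
    return [tag for tag, rx in PATTERNS.items() if rx.search(value)]

def required_controls(value: str) -> set[str]:
    """Look up the controls implied by a value's classification."""
    return {c for tag in classify(value) for c in CONTROLS[tag]}

print(required_controls("contact: jane.doe@example.com"))
# -> {'mask_on_read', 'audit_access'} (set order may vary)
```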
Most importantly, all of the above can be done seamlessly without impacting critical business processes, ensuring organisations can still derive critical insight whilst minimising compliance risk.
And the same machine learning capabilities used to generate this insight can be deployed to unearth intelligence for compliance reporting – no matter where the data resides or in what format.
Consistency across clouds
The bottom line is that effective governance will help to uncover business value and demonstrate compliance. But only if it’s treated proactively and not as an afterthought.
Given the sheer volume of data generated by modern enterprises, and the mind-boggling number of new privacy regulations that are themselves constantly in flux, consistency is key.
The way to avoid a siloed approach to security and governance is via a single data architecture. Ultimately, it will free organisations from the data swamp so they can fulfil their potential.
Chris Royles is EMEA Field CTO at Cloudera