
Talk data to me: protecting secrets from prying AI ears

Sponsored by BigID

The AI revolution is upon us – but are we prepared? What are the cyber-risks posed by generative AI, and how do we use it safely?


They say (well, Spider-Man’s Uncle Ben does) that with great power comes great responsibility. And these days, that responsibility includes training your AI on safe-to-use datasets, which can be a challenge.

GPT and the other OpenAI models are amazingly powerful tools – but since this is AI, you’re going to face the classic moral dilemma: how does it know right from wrong? Black from white from grey? What data is safe to publish or to be trained on, and what should be kept locked down?

These new generative AI frameworks (such as large language models, or LLMs) essentially impersonate a human. They’re trained on a huge volume of unsupervised data – billions of words scraped from the internet – alongside a smaller set of supervised data labelled by humans. And that data is the key.

This highlights a new risk vector: training LLMs on client data, customer data, copyrighted data or regulated data – essentially using data outside its given purpose – can violate consumer privacy and increase the risk of compromising both the data you know about and the data you don’t.

A multitude of security and privacy risks come with this type of innovation. There have already been reports of employees feeding ChatGPT confidential information (not maliciously, but to boost productivity), investigations into OpenAI over regulatory concerns about data use, and researchers jailbreaking LLMs to get around their safeguards.

Consumers’ privacy can be compromised if a model is trained on an ungoverned set of data: the AI could ingest private and personal data, regulated data, customer data and more. Companies can put themselves at risk too – if this type of conversational AI is trained on dark data, on data they don’t know is sensitive, or on data they didn’t even know was there in the first place, the result can be breaches of trade secrets, intellectual property, confidential information, financial data and more.

And what makes this even more difficult? Generative AI now incorporates unstructured data as well. That means your documents, Excel spreadsheets, Slack messages, emails, files, PDFs, notes and more – a potential treasure trove of private, sensitive and regulated data. It’s incredibly easy for privacy and security to be compromised here: just look at the sheer volume of unintentional data leaks and insider risks that already result in security breaches, then multiply that by the scale at which AI operates.

If conversational AI is trained on a sprawl of ungoverned data, with no classification of or insight into what that data is, it will only amplify risk, accelerating data breaches, account compromise and violations of data privacy.

But what if you could train LLMs on only the data that’s safe to use?

When generative AI is trained on a dataset that’s been properly governed – data that’s been vetted – there’s much less risk. You can embrace the AI revolution without compromising the security or privacy of your organisation, your employees and your customers.

How? Organisations of all shapes and sizes need, first and foremost, to know their data: what it is, whose it is, how sensitive or regulated it is, and where it lives. They need data visibility and control across their tech stack, from the data centre to the cloud. With that in place, they can automatically define which datasets are safe for training based on the data itself, effectively governing the data that goes into their AI training sets.
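
To make that concrete, here is a minimal sketch of the idea in Python. It is illustrative only, not BigID’s method: the regex detectors, function names and sample corpus are all simplified assumptions, and real classification tooling uses far more robust techniques. The point is simply that every record is labelled first, and only records with no sensitive labels flow into the training set.

```python
import re

# Illustrative detectors only (assumed patterns): real data classification
# relies on far more robust techniques than a handful of regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "uk_national_insurance": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),  # simplified
}

def classify(record: str) -> set[str]:
    """Return the sensitive-data labels detected in a record."""
    return {label for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(record)}

def safe_training_set(records: list[str]) -> list[str]:
    """Keep only records with no detected sensitive labels."""
    return [r for r in records if not classify(r)]

corpus = [
    "Our Q3 roadmap focuses on usability improvements.",
    "Refund Jane Doe, card 4111 1111 1111 1111, jane@example.com.",
]
print(safe_training_set(corpus))
# -> ['Our Q3 roadmap focuses on usability improvements.']
```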

By doing this, you can choose to exclude employee and customer data, proprietary data or confidential data, for instance – or focus exclusively on non-confidential public data. You can point generative AI at open-source data that has already been scanned for vulnerabilities, keeping it free of secrets and potentially compromising information.
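
That scanning step can be sketched the same way. Again, this is an assumption-laden illustration rather than a production scanner: the signatures below (an AWS-style access-key prefix, a private-key header and a generic hard-coded api_key assignment) are just well-known examples of the kinds of secrets such a scan would gate on.

```python
import re

# Illustrative secret signatures (assumed, not exhaustive): real scanners
# combine many more patterns with entropy checks and allow-lists.
SECRET_SIGNATURES = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS-style key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),           # hard-coded keys
]

def contains_secret(text: str) -> bool:
    """True if any known secret signature appears in the text."""
    return any(sig.search(text) for sig in SECRET_SIGNATURES)

files = {
    "README.md": "A public guide to our open-source tooling.",
    "config.py": "api_key = 'sk-live-abc123'",
}
# Only files that pass the scan are approved as training input.
approved = [name for name, body in files.items() if not contains_secret(body)]
print(approved)  # -> ['README.md']
```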


As AI and ML become more powerful through GPT and open-source training, it’s more important than ever to manage, protect and govern the data that’s sourcing the future. To find out how to get ahead of data security governance for the AI revolution and beyond, visit bigid.com.


Sarah Hospelhorn, Chief Marketing Officer, BigID
