Business Reporter

Is ChatGPT a disaster for data privacy?

Camilla Winlo at Gemserv asks questions about how this powerful tool really uses the data it pulls from the web

 

As the dust settles and the novelty of ChatGPT starts to wear off, a few major queries around its use of data have arisen. We know that ChatGPT uses a large language model trained by OpenAI on billions of data points from across the internet, using this data to formulate a response to any question or instruction that a user inputs.

 

Therefore, ChatGPT’s responses could be fuelled by data scraped, without permission, from any of our digital footprints, including personal websites and even social media posts.

 

We’ve already seen the fallout of this data collection method from various AI image generators. Just last month, Getty Images kickstarted legal proceedings against Stability AI, claiming that the generator used its database to train its image generation model.

 

In addition, Clearview AI, a platform that built its facial recognition database using images scraped from the internet, was served enforcement notices by several data protection regulators last year.

 

With new AI chatbot iterations currently in development, including Google's recently released Bard, the risk of data privacy disputes and copyright infringement claims against conversational AI is growing.

 

Is your data being stolen?

ChatGPT’s large language model requires a huge amount of data. OpenAI originally built the tool using 300 billion words lifted directly from the internet – everything from articles and books to webpages and product reviews.

 

All of this data was scraped without the original posters’ consent, meaning your personal information could well have been collected and processed by ChatGPT and may now be used in its conversations with strangers.

 

The company is now worth around $29 billion, yet the individuals and companies that produced the data it scraped from the internet have not been compensated. Even in cases where data is publicly available, ChatGPT has the potential to breach contextual integrity, a fundamental principle of privacy which holds that information should not be revealed outside the context in which it was produced.

 

The prompts that a user inputs into ChatGPT can also be a privacy risk, as any sensitive information inadvertently handed over could end up in the public domain. For example, if a legal professional used the tool to draft an agreement or contract, any information included in this content becomes part of ChatGPT’s database and could be included in a response to another user’s prompt.

 

How does this affect compliance?

In the EU, scraping data points from sites can be a direct breach of the GDPR, the ePrivacy directive, and the Charter of Fundamental Rights. In the US, no federal law regulates the use of personal data within AI models.

 

However, organisations that collect and use data from individuals are required to comply with regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the Children’s Online Privacy Protection Act (COPPA).

 

And the California Consumer Privacy Act (CCPA), which covers the state out of which many of the world’s tech giants operate, enforces many privacy requirements similar to those of the GDPR.

 

As of today, ChatGPT offers no way to request the removal of data from its database, a right guaranteed under both the GDPR and the CCPA. Other machine learning developers are working on ways to enable the removal of specific data points, but these efforts are still at an early stage.

 

There are also major technical problems in removing data from machine learning models once that data has been used to train the model itself, as doing so can degrade the accuracy of responses.

 

The “right to be forgotten” is particularly important in cases where information is inaccurate, biased or misleading, which seems to be a burgeoning threat for ChatGPT. If the tool’s training data includes errors or misinformation, or even if the algorithm used to train it is biased, it can lead to the spread of false information in sensitive areas like politics.

 

Without the ability to easily remove this data as part of the right to be forgotten, these incomplete or inaccurate outputs could become a much larger problem.

 

Cyber-criminals and ChatGPT

Another major data privacy risk lies in the nefarious actions of criminals online, who may have found their new favourite toy in ChatGPT. The billions of data points ChatGPT has scraped can now fuel any number of targeted attacks, including malware, ransomware, phishing, Business Email Compromise (BEC) and social engineering.

 

ChatGPT’s ability to create instant, realistic-sounding conversations could be an effective tool in drafting phishing emails urging victims to click on malicious links, install malware or give away sensitive information. It makes malicious impersonation much easier, allowing cyber-criminals to gain their victims’ trust.

 

ChatGPT can also generate large volumes of automated messages for spam attacks that overwhelm servers, while any sensitive information obtained can be held to ransom or sold on the dark web.

 

As the use of these large language models becomes more widespread, it’s never been more vital for companies like OpenAI to find a solution for privacy issues such as the right to be forgotten. Businesses also need to ensure that their teams understand the data privacy ramifications of tools like ChatGPT before they roll them out for use.

 

Being mindful of these risks, conducting in-depth risk assessments and taking a proactive, rather than reactive, stance to any issues that might arise is the only way to harness a tool like ChatGPT without putting data in danger. 

 


 

Camilla Winlo is Head of Data Privacy at Gemserv

 

Main image courtesy of iStockPhoto.com


© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543