Business Reporter

Managing data bias in AI

Matthieu Jonglez at  Progress explores the dangers of data bias and ways to mitigate it

  

In our quest to comprehend the world around us, humans instinctively seek patterns when faced with insufficient data. This tendency toward bias has ramifications for individuals and for the technologies we adopt and deploy, and it clouds our understanding of businesses and organisations.

 

Large-scale data bias is particularly alarming, given the widespread integration of technologies like machine learning, AI and other data platforms used by hundreds of millions of users worldwide. The potential fallout from these errors is far-reaching.

 

The rise in adoption of AI is undisputed – according to the latest McKinsey Global Survey on AI, 65 percent of respondents report that their organisations are regularly using generative AI, nearly double the percentage from the previous survey just ten months before. But despite the advantages of speed and automation offered by AI, there's a hidden issue that is often overlooked: unwanted bias in data.

 

Left unchecked, this bias in organisational data flows through AI processes and can cause issues for sales, hiring, customer experience and more. In fact, a Progress research report has revealed that almost two thirds (65%) of organisations suffer from data bias.

  

With only 13% of surveyed organisations having a clear strategy to address this, business leaders need to understand the risks and scale of data bias in order to eliminate the problem. By doing so, they could minimise risk, make better decisions, advance market opportunities, become an attractive employer for tech talent, and improve company reputation. 

  

What is data bias? 

With AI, algorithms are only as good as the data used to train them. Flawed or biased data sets lead to incorrect assumptions and decisions. Data bias is the term for when a machine gives one set of outputs to one defined group and a different set of outputs to another, typically in line with historical human biases around race, disability, gender, sex, age or nationality.

  

Biased data can include flawed data sets, blind assumptions, automated testing protocols that are not appropriately inclusive, and models that wrongly discriminate against under-represented groups. 
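To make this concrete, one common way to quantify the group-dependent outputs described above is a demographic parity gap: the difference in positive-outcome rates between groups. The following is a minimal sketch using hypothetical hiring decisions (the group labels, data and the 0.5 result are illustrative assumptions, not from the article or the Progress research):

```python
from collections import defaultdict

def selection_rates(records):
    """Per-group positive-outcome rates from (group, outcome) pairs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome  # outcome is 1 (positive) or 0
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(records):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(records)
    return max(rates.values()) - min(rates.values())

# Hypothetical shortlisting outcomes: 1 = shortlisted, 0 = rejected
decisions = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
             ("B", 0), ("B", 1), ("B", 0), ("B", 0)]
gap = demographic_parity_gap(decisions)  # 0.75 - 0.25 = 0.5
```

A gap near zero suggests the system treats groups similarly on this metric; a large gap, as here, is a signal worth investigating.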

  

Business and societal implications of bias 

Unmonitored AI trained on flawed data will begin to reflect human biases in day-to-day operations, and these biased decisions can negatively impact all areas of the business. Bias can arise in processes such as online hiring, predictive modelling and credit assessment.

 

As a by-product, unfair bias and stereotyping lead to challenges in customer acquisition and to poor customer experiences that may even damage reputation. They are also likely to undermine inclusion and diversity efforts.

  

Assessing its impact on everyday life, data bias has also surfaced in everyday technologies such as facial recognition, digital accessibility tools, search engines and beyond, and those affected can suffer adverse outcomes from intrinsically biased AI algorithms. As many as three quarters (76%) of organisations surveyed by Progress believe wider societal impacts are likely if enterprises do not adequately address data bias.

  

Main causes of data bias 

Some factors that influence the development of data bias include:

  

Selective information capture - At the point of data creation, the user may capture what they think is relevant and important, which may be very different from what the organisation or project actually needs.

  

Cultural or personal influence - When data is collected and used to train machine learning models, the models inherit the personal and cultural biases of the human experts building them. A data scientist could use data that ignores large groups of people, or intentionally omit specific sets of data. Unconscious bias can creep in at any point during the AI and app development lifecycle.

  

Data collection purpose – A project inherits the scope of the original data collection: if data collected for one purpose is re-used for another without being fully understood, the assumptions behind the original collection become a form of bias.

  

Accessibility bias – Bias can also be designed into a build in ways that exclude certain groups from understanding the true meaning of a communication – for instance, when language complexity in everyday apps and websites creates barriers for people with disabilities.

  

Barriers to addressing data bias 

The biggest barriers to addressing the issue are a lack of awareness of potential biases, difficulty in identifying bias, and a lack of access to expert resources such as data scientists.

  

Greater understanding is needed of the training, processes and technology required to tackle data bias successfully. In fact, 51% of organisations surveyed by Progress consider lack of awareness and understanding of bias a barrier to addressing it, and many do not know where to start.

    

Strategies to mitigate data bias

Some fundamental considerations for organisations to tackle data bias include:

 

A robust data bias policy - One of the first steps should be to appoint an effective leader who can take a holistic view of data bias across the organisation and drive policies for change. In fact, 76% of respondents agreed that data bias is best tackled centrally across the organisation rather than in siloed departments. A Chief Information Officer or Chief Technology Officer is well placed to lead data bias initiatives, but a Chief Data Officer or COO would also be suitable.

 

Giving AI the right data – It's necessary to provide AI with diverse data sets – unbiased data delivers unbiased results. Companies typically have mountains of structured and unstructured data, ranging from Excel files to financial reports, layered throughout their operations. By feeding AI a wide range of diverse data, the machine can use all of it to make the best, least biased decision. Businesses can train AI algorithms to avoid culture-, age- and gender-based bias by providing as many data points as possible about a subject, in theory making the resulting answer more "right".
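One practical way to stop an over-represented group from dominating training, in the spirit of the diverse-data point above, is to reweight records so each group contributes equally. This is a minimal sketch of a standard reweighing scheme; the group labels and sizes are hypothetical:

```python
from collections import Counter

def balancing_weights(groups):
    """Weight each record so every group contributes equally to training.

    A record in a group of size n gets weight total / (num_groups * n),
    so each group's total weight comes out the same.
    """
    counts = Counter(groups)
    total, k = len(groups), len(counts)
    return [total / (k * counts[g]) for g in groups]

# Hypothetical data where group B is under-represented (1 record vs 3)
groups = ["A", "A", "A", "B"]
weights = balancing_weights(groups)
# Group A: 3 records * 2/3 = 2.0 total; group B: 1 record * 2.0 = 2.0 total
```

Most training frameworks accept per-sample weights, so a scheme like this slots in without changing the underlying data.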

 

Tools for transparency - Achieving this requires an agile, transparent, rules-based data platform where data can be ingested, harmonised and curated for the AI tool. Transparency is the antidote to bias: data lineage features allow a human expert to track every change made to the data, right back to the moment bias was introduced. Every touchpoint in the tech and development stack must factor in the reality of data bias, including data selection and preparation, business logic and analytical model development, testing and results analysis.
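At its simplest, the data lineage capability described above is an append-only record of who changed which field, when, and why, so a reviewer can trace a biased result back to the change that introduced it. This is a minimal sketch, not any particular platform's API; the field names and sample entry are illustrative:

```python
import datetime

class LineageLog:
    """Minimal provenance record: who changed which field, when, and why."""

    def __init__(self):
        self.entries = []

    def record(self, field, old, new, author, reason):
        self.entries.append({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "field": field, "old": old, "new": new,
            "author": author, "reason": reason,
        })

    def history(self, field):
        """All changes to one field, oldest first - the audit trail."""
        return [e for e in self.entries if e["field"] == field]

log = LineageLog()
# Hypothetical change: narrowing an age filter is exactly the kind of
# step a lineage review would flag as a potential source of bias.
log.record("age_band", "18-99", "25-40", "analyst_1",
           "filtered to target segment")
```

Production data platforms capture lineage automatically at ingestion and transformation steps, but the principle is the same: no silent changes.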

 

Mechanisms to identify and measure bias

For those working to combat data bias, effective measures were found to include education and training; improved transparency and traceability of algorithms and data; more time spent on model training, building and evaluation; and tools that help locate bias within data sets. Only a continuous commitment to assessment and removal will keep bias in check.

 

Add human oversight - AI can't govern itself: left unattended, it struggles ethically and often produces inaccurate and discriminatory predictions. Regardless of the quality and quantity of data provided, AI requires context for whatever it is being trained on. Companies need to create a rules-based system that can accurately categorise data using the appropriate classifications.
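A common way to put human oversight into practice is a confidence gate: automated decisions the model is sure about go through, and everything else is escalated to a reviewer. This is a minimal sketch under assumed inputs; the labels, confidence scores and 0.9 threshold are illustrative choices, not values from the article:

```python
def route_prediction(label, confidence, threshold=0.9):
    """Accept high-confidence predictions; escalate the rest to a human."""
    if confidence >= threshold:
        return ("auto", label)
    return ("human_review", label)

# Hypothetical model outputs: (predicted label, model confidence)
outputs = [("approve", 0.97), ("reject", 0.62), ("approve", 0.91)]
decisions = [route_prediction(label, conf) for label, conf in outputs]
# Only the 0.62-confidence case is escalated for human review.
```

The threshold is a policy choice: lowering it automates more decisions, raising it sends more to reviewers, and the rejected-by-model cases often deserve the most scrutiny.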

 

Diverse teams - For AI to be sustainable over time, the pool of people developing these algorithms must become more diverse – across the racial and gender spectrums, and including those with less-advanced degrees and those from a broader cross-section of professions and backgrounds.

 

An ethical approach to innovation

AI can greatly improve human decision-making and is a boon for delivering better services for citizens and business. As AI/ML use increases, more data scientists, practitioners and programmers will dive into datasets and produce ever more algorithms, meaning the issue of data bias could intensify.

 

The good news, however, is that organisations are aware the problem needs addressing. By providing transparency, an organisation gives people reason to trust that its systems produce unbiased results. Taking proactive measures to address and mitigate data bias, as part of a sound digital ethics strategy, is how leaders can demonstrate a responsible approach to AI use.

 

Bad AI decision-making, typically caused by bias, can largely be mitigated by proactive human intervention. A comprehensive approach that combines people, technology and tools, training and ongoing policy vigilance will help eradicate data bias from AI/ML practices, benefitting the company, its stakeholders and beyond.

 


 

Matthieu Jonglez is Vice President, Technology - Application and Data Platform at Progress

 


© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543