Artificial intelligence is only as neutral as its creators allow it to be, and as unbiased as the data it’s trained on. That’s why the algorithms that sift through our data and ultimately guide decisions can easily be compromised by built-in bias.
In some cases, bias – whether unintentional, unconscious, or shaped by environmental factors such as a lack of diversity in development teams – spills over into the development of artificial intelligence itself. For example, the predominantly female characterisation of virtual assistants has been said to project workplace stereotypes onto AI.
But when training AI using reams of existing data, the presence of bias is usually the result of historic factors, or the lack of diversity in the source material: for example, images of managers that are predominantly male, or facial recognition systems that are largely trained on white faces. Without adequate balance in the source data, algorithms may simply regurgitate that bias, while giving it a veneer of computer-generated neutrality.
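To make that mechanism concrete, here is a deliberately simplified, hypothetical sketch in Python. The figures are invented for illustration – they do not come from any of the systems mentioned in this article – but they show how a model fitted to skewed data reproduces the skew wholesale, even while looking "accurate" on its own training material:

```python
# Hypothetical sketch: a corpus in which 90% of "manager" examples are male,
# and the simplest possible model, which just predicts the most common label.
# All numbers are invented for illustration.
from collections import Counter

train = ["male"] * 900 + ["female"] * 100
majority_label = Counter(train).most_common(1)[0][0]  # "male"

# Evaluate against a balanced real-world population (50/50).
test = ["male"] * 500 + ["female"] * 500
predictions = [majority_label for _ in test]

overall = sum(p == t for p, t in zip(predictions, test)) / len(test)
female = sum(p == t for p, t in zip(predictions, test) if t == "female") / 500

print(f"Model always predicts: {majority_label}")          # male
print(f"Accuracy on a balanced population: {overall:.0%}")  # 50%
print(f"Accuracy on women alone: {female:.0%}")             # 0%
```

The model has not learned anything malicious; it has simply inherited the imbalance of its source data and presented it back as an apparently neutral output.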
Gender and racial bias have been highlighted in AI systems before, from imaging technologies optimised to identify light skin tones, to MIT’s facial recognition system that proved incapable of identifying a black woman, due to a lack of diversity in the training data.
Discussing the system's shortcomings at the 2017 World Economic Forum in Davos, MIT Media Lab's Joichi Ito acknowledged that most of his own students were young, white males who preferred the binary world of computers to the messy, emotional world of other human beings. That, he suggested, was the root cause of the problem.
IBM releases world’s largest dataset for bias studies
Cognitive services giant IBM has announced a number of measures to tackle bias in AI systems and better understand how it develops. In particular, the company is focused on ensuring that facial recognition software is built and trained responsibly.
Later this year, IBM will make public two datasets to be used as tools for the technology industry and AI research community.
The first will be made up of one million annotated images, harvested from photography platform Flickr. The dataset will rely on Flickr’s geo-tags to balance the source material and reduce sample selection bias.
According to IBM, the current largest facial attribute dataset is made up of just 200,000 images.
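IBM has not published the details of its pipeline, so the snippet below is only a generic, hypothetical illustration of the kind of geo-tag balancing described above: every region contributes the same number of images, which reduces the selection bias of a corpus dominated by a handful of locations. The field names (`region`, `url`), the region labels, and the sampling strategy are all assumptions made for the sake of the example.

```python
# Hypothetical illustration of balancing a photo corpus by geo-tag region.
# Field names and regions are invented; IBM's actual pipeline is not public.
import random
from collections import defaultdict

def balance_by_region(photos, per_region, seed=0):
    """Downsample so every geo-tagged region contributes equally."""
    random.seed(seed)
    by_region = defaultdict(list)
    for photo in photos:
        by_region[photo["region"]].append(photo)

    balanced = []
    for region, items in by_region.items():
        # Take the same number from each region to reduce selection bias;
        # skip regions with too few images rather than oversampling them.
        if len(items) >= per_region:
            balanced.extend(random.sample(items, per_region))
    return balanced

# Example: a corpus dominated by one region...
corpus = ([{"region": "north_america", "url": f"img{i}"} for i in range(800)]
          + [{"region": "east_asia", "url": f"img{i}"} for i in range(150)]
          + [{"region": "west_africa", "url": f"img{i}"} for i in range(50)])

subset = balance_by_region(corpus, per_region=50)
print(len(subset))  # 150: 50 images from each of the three regions
```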
IBM will also be releasing an annotated dataset of up to 36,000 images that are equally distributed across skin tones, genders, and ages. The company hopes that it will help algorithm designers to identify and address bias in their facial analysis systems.
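One common way a demographically balanced benchmark like this can be used is to compare a system's error rate group by group: a large gap between groups is the signal that the model needs rebalancing or retraining before deployment. The sketch below is a hypothetical illustration of that comparison, with invented predictions and group labels – it is not IBM's methodology.

```python
# Hypothetical sketch: expose bias by comparing error rates across groups
# on a balanced benchmark. Predictions and groups are invented.
from collections import defaultdict

def error_rate_by_group(records):
    """records: iterable of dicts with 'group', 'label', 'prediction' keys."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        errors[r["group"]] += int(r["prediction"] != r["label"])
    return {g: errors[g] / totals[g] for g in totals}

# Invented benchmark results for two skin-tone groups of equal size.
results = ([{"group": "lighter", "label": 1, "prediction": 1}] * 96
           + [{"group": "lighter", "label": 1, "prediction": 0}] * 4
           + [{"group": "darker", "label": 1, "prediction": 1}] * 70
           + [{"group": "darker", "label": 1, "prediction": 0}] * 30)

for group, rate in error_rate_by_group(results).items():
    print(f"{group}: {rate:.0%} error rate")  # lighter: 4%, darker: 30%
```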
Addressing bias before training begins
Part of dealing with bias is acknowledging that, for whatever reason, it exists.
In a blog post outlining the steps the company will be taking this year, IBM Fellows Aleksandra Mojsilovic and John Smith highlighted the importance of training development teams – which tend to be dominated by young white men – to recognise how bias occurs and becomes problematic.
“It is therefore critical that any organisations using AI – including visual recognition or video analysis capabilities – train the teams working with it to understand bias, including implicit and unconscious bias, monitor for it, and know how to address it,” they wrote.
There is irony in the need for flawed humans, afflicted with unconscious bias, to train machines to be neutral. But given AI's rapid emergence and adoption across a multitude of applications, the need to prevent bias from entering AI systems is pressing.
“We believe no technology — no matter how accurate — can or should replace human judgement, intuition and expertise,” said IBM.
“The power of advanced innovations, like AI, lies in their ability to augment, not replace, human decision-making. AI holds significant power to improve the way we live and work, but only if AI systems are developed and trained responsibly, and produce outcomes we trust. Making sure that the system is trained on balanced data, and rid of biases, is critical to achieving such trust.”
Internet of Business says
If the past two years have taught us anything, it is the need to acknowledge that gender, racial, and other biases exist at every level of society – whether unconsciously, as artefacts of an evolving culture, as systemic or sector-based problems, or as beliefs or policy decisions.
A number of reports have suggested that up to 90 percent of coders and developers are male, the vast majority of them young and white. A UK-RAS presentation at last year’s UK Robotics Week, for example, cited the statistic that 83 percent of people across all STEM careers (science, technology, engineering, and maths) are men.
But AI systems need to reflect all of human society, and work for everyone. So developers need to design protocols that counterbalance not only the lack of diversity in development teams, but also the historic bias that may exist across decades of data – for example, in the legal system, in human resources, in location and property data, and so on.
Failure to do so may actively entrench those biases in our systems and, as our author suggests, give them a veneer of neutrality and evidenced fact.
We support IBM’s forward-looking research, and welcome its focus on AI assistance and augmentation, not replacement – an aim it shares with Microsoft.
• Additional analysis: Chris Middleton.