What is the impact of bias on AI? A vast question, the answer to which touches on both technical and societal aspects.
While artificial intelligence continues to develop and the barrier to entry for deep learning keeps falling, technological and societal challenges remain. Bias is one of the major issues in data and AI today.
It is a subject of study that has come to prominence in recent years and attracted a great deal of attention. The biggest industry players in AI are taking an interest, notably because some of their public "failures" have been caused by these biases, and those failures have in turn fuelled ethical debates of varying merit. Here, we will try to shed some light on these biases, so that we can understand how to counter them.
The importance of an unbiased training base
First of all, it is important to understand that much of this bias can come from the data.
The problem is that AIs derive their behavior from the data they have learned from. Biased data therefore necessarily leads to biased behavior, and the consequences can be far-reaching, both for the algorithm's performance and for its wider impact, particularly in ethical terms.
In critical areas, the consequences can be even greater. It is therefore easy to see why bias must be understood and avoided [1].
The difficulty, however, is that biases are inherently hard to identify. In some cases, one could even argue that no unbiased form of representation exists, and that bias is therefore unavoidable. This is why projects with biased behavior have, without their designers being aware of it, been released and made the news.
However, with the necessary awareness on the part of designers, and with the solutions being developed, biases can be managed. Over time, the effects of bias in artificial intelligence systems should diminish, and users' trust in these algorithms should grow.
What are the different types of bias that AI can encounter?
Biases can be divided into two categories: processing biases and cognitive biases [2]. Processing biases are more technical in nature, while cognitive biases are more human. Among the processing biases, we can mention the following:
Sampling biases
Sampling biases are directly linked to the algorithm's training set. They concern the over- or under-representation of certain data. This was the case, for example, with facial recognition algorithms that were trained on too few photos of people of color, and therefore had more trouble identifying them.
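One way to catch this kind of sampling bias early is simply to audit the group distribution of the training set before training. The sketch below is a minimal, hypothetical example (the group labels and the 10% threshold are invented for illustration):

```python
from collections import Counter

def representation_report(labels, threshold=0.10):
    """Return each group's share of the dataset and whether that share
    falls below the given under-representation threshold."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: (n / total, n / total < threshold)
            for group, n in counts.items()}

# Toy labels for a hypothetical face dataset audited by skin-tone group.
labels = ["light"] * 90 + ["dark"] * 10
for group, (share, flagged) in representation_report(labels, threshold=0.20).items():
    print(f"{group}: {share:.0%}", "UNDER-REPRESENTED" if flagged else "ok")
```

A real audit would of course use the sensitive attributes relevant to the application, and the acceptable threshold is a design decision, not a universal constant.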
Measurement biases
Measurement biases come into play during data acquisition. They arise from changes in the measuring equipment, for example when the camera used to take the photos is replaced partway through collection.
Reinforcement biases
More complex, reinforcement biases are linked to the way the algorithm operates over time. They can occur when one decision has an impact on the next, which is the principle behind recommendation systems on the Internet.
In the United States, such a system was used to distribute police forces across neighborhoods considered dangerous. But the more police there were on the ground, the more offenses were recorded, so the algorithm inevitably sent even more police, concluding that the neighborhood was very dangerous. Depending on how it is designed, an algorithm can thus become locked into a decision loop that gradually reinforces itself.
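This lock-in can be shown with a deliberately simplified toy model (not the actual system described above): each day the squad is sent to the neighborhood with the most recorded offenses, and offenses are only recorded where police are present, even though the true offense rate is identical everywhere.

```python
def simulate_greedy_patrols(recorded, offenses_per_day=5, days=10):
    """Toy reinforcement loop: patrols go where recorded counts are
    highest, and only patrolled neighborhoods add to their counts."""
    history = []
    for _ in range(days):
        # Pick the neighborhood with the most recorded offenses so far.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        # Offenses are witnessed (and thus recorded) only there.
        recorded[target] += offenses_per_day
        history.append(target)
    return history, recorded

# Neighborhood 0 starts with a single extra recorded offense; that
# alone locks the allocation in its favor forever.
history, final = simulate_greedy_patrols([3, 2, 2])
print(history)  # neighborhood 0 is chosen every day
print(final)
```

The point of the sketch is that the initial imbalance never gets a chance to be corrected, because the algorithm's own decisions generate the only data it sees.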
Exclusion biases
Exclusion biases come into play when the data is processed. One may decide to exclude certain data because it is not deemed relevant or because it causes too many processing problems. However, deleting data results in a loss of information and can affect the way the algorithm works. What's more, even when a variable is deleted, the information it carried may still be used indirectly.
For example, if a field characterizing gender is deleted, the algorithm can recover it from other data, such as first names, and therefore still take it into account [3].
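A hypothetical sketch of this proxy effect (the names and the name-to-gender mapping below are invented for illustration): even though the gender column was deleted, a simple first-name lookup reconstructs it.

```python
# Invented lookup table standing in for, e.g., a public first-name registry.
NAME_TO_GENDER = {"marie": "F", "julie": "F", "pierre": "M", "paul": "M"}

def reconstruct_gender(record):
    """Recover the deleted attribute from a remaining proxy feature."""
    return NAME_TO_GENDER.get(record["first_name"].lower(), "unknown")

applicants = [{"first_name": "Marie"}, {"first_name": "Paul"}]
print([reconstruct_gender(a) for a in applicants])  # ['F', 'M']
```

A learned model does the same thing implicitly: if first names correlate with gender, it can exploit that correlation without the gender column ever being present.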
Cognitive biases
Cognitive biases are broader and more complex to measure. We won't go into much detail about them in this article, as they are tied to humans and the societies they live in. These biases can stem from human behavior.
For example, if a recruiter discriminates and the algorithm is trained to reproduce his or her decisions, then the algorithm will also discriminate. However, these biases can also arise within the organization itself, generally during labeling. In the world of medicine, for example, nurses are predominantly women and doctors predominantly men. In 2013, a Google natural-language algorithm learned the association "doctor – man + woman = nurse", reflecting an inequality already present in our society [4].
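That "doctor – man + woman" arithmetic is literal: word embeddings represent words as vectors, and analogies are answered by vector addition followed by a nearest-neighbor search. The sketch below uses tiny 2-D vectors invented for illustration (axis 1 loosely encoding gender, axis 2 the medical profession); real embeddings are learned from large text corpora, which is exactly how they absorb societal imbalances.

```python
import math

# Invented toy vectors; NOT real learned embeddings.
VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "doctor": (1.0, 1.0),
    "surgeon": (1.0, 1.0),
    "nurse": (-1.0, 1.0),
}

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c)."""
    query = tuple(x - y + z
                  for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], query))

print(analogy("doctor", "man", "woman"))  # -> nurse
```

With these toy vectors, "doctor" minus the gender component plus the opposite gender lands on "nurse" rather than "surgeon", mirroring the biased association the article describes.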
How to detect bias in your dataset?
Once biases have been correctly identified, we still need to know how to counter or avoid them. An obvious first step is to be aware of their existence and nature, so that they can be taken into account when developing an AI system.
General awareness of the issue of bias is growing, in line with the current interest in characterizing AI itself [5]. International bodies are examining the issue in order to begin the move towards "trusted" AI. Industry players are doing the same, and standardization bodies such as CEN-CENELEC and ISO are paving the way towards AI standardization.
In addition to raising awareness, verification processes must be implemented at the algorithmic level. AI systems need to be tested and evaluated, which requires several datasets: a training set, a validation set and a test set. During this validation work, it is possible to fine-tune the system's hyperparameters and to test the algorithm on a dataset that has been checked for bias [6].
Tools also exist to help analyze AI systems. One of their aims is to explain a model's behavior, so that it can be corrected if necessary or a malfunction anticipated. In this way, the relevance of its decisions can be verified. Public datasets verified by a trusted (unbiased) third party could also be a solution, allowing algorithms to be trained or tested on them [7].
Bias is an aspect of AI that industry and institutions increasingly take into account. Even if the image of the "black box" still clings to AI, it remains possible to work on validating what goes into that black box. By mitigating the risks posed by bias, the industry is reinforcing confidence in AI and enabling its adoption to move up a gear.
References
1. Les biais cachés de l’IA – Ellisphere
2. Biais et intelligence artificielle : une fatalité évitable ?
3. Apprentissage automatique et biais : Impacts et solutions – IBM-France
4. Reconnaître les biais dans l’intelligence artificielle
5. Intelligence artificielle : combattre les biais des algorithmes
6. L’IA et les biais algorithmiques – Le M emlyon – Tech (PnP)
7. Quelles pistes pour lutter efficacement contre les biais des algorithmes ?