Artificial Intelligence (AI) is a rapidly expanding field that is now part of our everyday life. From our phone cameras to the translators on our phones, everything uses AI. AI helps people by making decision making easier: it can narrow down the possible solutions to a problem, making it easier for us to find answers to our questions. It is now being used in many areas, from deciding whether a candidate is suitable for a job to whether a person should be given a loan.

Now, what would happen if these AI systems favored one class of people over another? Bias could enter any system through human intervention, but what if it were introduced by the very methods used to design these systems? To understand how this bias arises and how we can address it, we first need to understand what these systems are and how they work.

What is AI and how did it all start?

AI is a broad branch of computer science concerned with building machines smart enough to perform tasks that typically require human intelligence. It is a field with many approaches, but advances in some of them, such as machine learning and deep learning, are creating a paradigm shift in every sector of the tech industry.

The field of AI began with a question posed by the mathematician Alan Turing. After breaking the Nazi encryption machine Enigma, which helped the Allied forces win World War II, Turing asked a simple question: “Can machines think?” His paper “Computing Machinery and Intelligence” and the Turing Test it proposed established the fundamental goal and vision of artificial intelligence.

Today, artificial intelligence is based on the principle that human intelligence can be defined in a way that a machine can mimic, executing tasks from the simplest to the most complex. The goals of artificial intelligence include mimicking human cognitive activity.

How is AI being used?

Artificial Intelligence generally falls under two broad categories:

  • Narrow AI: Also known as weak AI, this kind of AI operates within a limited context and is a simulation of human intelligence. Its main focus is on performing a single task extremely well.
  • Artificial General Intelligence: This is the kind of AI we see in movies, like the robots from Westworld or Data from Star Trek: The Next Generation. Artificial General Intelligence is a machine with general intelligence and, much like a human being, it can apply that intelligence to solve any problem.

Let's talk a little more about Narrow AI.

Narrow AI

Narrow AI is all around us and, with its focus on performing specific tasks, is easily the most successful realization of artificial intelligence to date.

A few examples of Narrow AI include:

  • Google Search
  • Image recognition software
  • Siri, Alexa and other personal assistants
  • Self-driving cars

Much of Narrow AI is powered by breakthroughs in machine learning and deep learning. The difference between artificial intelligence, machine learning and deep learning can be confusing. To put it in simple words,

“Artificial Intelligence is a set of algorithms and intelligence to try to mimic human intelligence. Machine Learning is one of them, and deep learning is one of those machine learning techniques.”

Simply put, machine learning feeds a computer data and uses statistical techniques to help it “learn” how to get progressively better at a task, without having been specifically programmed for that task, eliminating the need for millions of lines of written code. Machine learning consists of both supervised learning (using labeled data sets) and unsupervised learning (using unlabeled data sets).  
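
As a rough illustration of the two learning styles, here is a minimal sketch in Python using scikit-learn; the dataset and model choices are purely illustrative, not a prescription:

    # Synthetic two-cluster data, purely for illustration.
    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)

    # Supervised learning: the model sees the inputs X and the labels y.
    clf = LogisticRegression().fit(X, y)
    print("supervised predictions:", clf.predict(X[:5]))

    # Unsupervised learning: the model sees only X and must find structure on its own.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("unsupervised cluster labels:", km.labels_[:5])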

Deep learning is a type of machine learning that runs inputs through a biologically-inspired neural network architecture. The neural networks contain a number of hidden layers through which the data is processed, allowing the machine to go “deep” in its learning, making connections and weighting input for the best results.
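
To make the “hidden layers” idea concrete, here is a toy sketch using scikit-learn's MLPClassifier on synthetic data; the layer sizes are arbitrary, and each entry in hidden_layer_sizes adds one stacked hidden layer:

    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    # Synthetic, non-linearly separable data.
    X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

    # Two hidden layers of 16 units each: the "deep" part of deep learning.
    # Each layer transforms and re-weights the output of the previous one.
    net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
    net.fit(X, y)
    print("training accuracy:", net.score(X, y))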

The Problem

Evidently, our lives are surrounded predominantly by applications of Narrow AI: the search engines we use and the voice assistants we talk to all fall under this category. Beyond these, governments in some countries now use recommender systems to identify people who may benefit from a particular government policy.

All of these narrow AI solutions are mostly built using machine learning and, as mentioned above, machine learning depends on two things: the data it is fed, and the algorithm it uses to learn to optimize its task without being specifically programmed for it.

With advancements in technology, optimizing these algorithms has become a matter of prime importance. Most modern-day research in the field of AI is done in this area, i.e., optimizing algorithms to perform with the least amount of resources at their disposal. This has led to newer and more optimized algorithms being developed, but the accuracy with which these algorithms predict outcomes has not seen a similar rise.

The main reason for this is that we have been so focused on optimizing our algorithms that we haven't cared enough about the data being fed to these systems. This data, which largely determines what a machine learning model predicts, has received far less attention, and that has allowed a huge bias to creep into the predictions.

How important is unbiased data?

To understand the consequences of biased data on real-life decision making, let's take a few examples.

Consider the 2020 US census. A census is foundational to many social and economic policy decisions of any government, and hence it needs to account for 100 percent of the country's population. However, with the pandemic and the political controversy over the citizenship question, there is a threat of undercounting: significant undercounting is expected among minority groups who are hard to locate, contact, persuade and interview for the census. This undercounting would introduce bias and degrade the quality of the data infrastructure.

Consider the undercounts in the 2010 census, in which approximately 16 million people were omitted from the final counts. That is equal to the combined population of four states, Iowa, Arizona, Oklahoma and Arkansas, for that year. In addition, close to 1 million kids under the age of 5 were undercounted.

This undercounting is not restricted to the US census; it is common in other national censuses as well, because minorities can be harder to reach, may mistrust the government, or may live in areas under political unrest. For example, the 2016 Australian Census undercounted the Aboriginal and Torres Strait Islander population, two of Australia's Indigenous minorities, by about 17.5 percent. The undercounting in 2020 is estimated to be much higher than it was in 2010.

The implications of these biases are going to be massive, as the census is the most trusted, open and publicly available source of rich data on population composition and characteristics. While businesses have proprietary information about their consumers, the Census Bureau reports definitive public counts on age, gender, ethnicity, race, employment, family status and geographic distribution, which form the foundation of the population data infrastructure. Census data is then used in models supporting public transportation, housing, health care insurance, educational policy reforms, etc. When minorities are undercounted in the data fed to these models, the trained models develop a bias against them, which could lead a model to pick the wrong people for a government program aimed at benefiting a minority community.

The first step to improving results is to make the database representative of age, gender, ethnicity and race per census data. The census being so important, every effort should be made to count 100 percent of the population. Investing in data quality and accuracy is essential to making AI work not only for the few and privileged, but for everyone in society. Most systems today use data that's already available or was collected for some other purpose, simply because it's convenient and cheap.
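
As a hedged sketch of what that first step could look like in practice, the snippet below audits a made-up training set against made-up census shares and assigns sample weights so undercounted groups are no longer drowned out; the group names and proportions are hypothetical:

    import pandas as pd

    # Hypothetical population shares, standing in for real census figures.
    census_share = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}

    # A skewed training set in which group_c is undercounted.
    df = pd.DataFrame({"group": ["group_a"] * 70 + ["group_b"] * 25 + ["group_c"] * 5})

    observed = df["group"].value_counts(normalize=True)

    # Weight each row by (census share / observed share): undercounted
    # groups get weights above 1, so a model trained with these sample
    # weights no longer learns to ignore them.
    df["weight"] = df["group"].map(lambda g: census_share[g] / observed[g])
    print(df.groupby("group")["weight"].first())

Such weights could then be passed to a model's sample_weight argument during training, though reweighting is only a partial substitute for actually collecting representative data.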

Consider another example. To collect data to measure retail sales in stores, a team of data scientists at Nielsen visited retail stores outside Bangalore in India. They specifically chose small stores that are informal and hard to reach, even though they could easily have selected stores inside the city, where the electronic data could be integrated straight into a data pipeline to train their model, making the process cheap, convenient and easy. Why didn't they? Their answer was simple: because in countries like India, according to the International Labour Organization, 65 percent of the population lives in these rural areas.

Source: TED

Imagine the bias in decisions when 65 percent of consumption in India is excluded from a model; this would lead to decisions favouring the urban population over the rural population.

Without this urban-rural context and signal on livelihood, lifestyle, economy and values, retail brands will make the wrong investments in pricing, advertising and marketing, and the urban bias may even lead to wrong policy decisions with regard to health, transportation, housing and other investments.

These wrong decisions cannot be blamed on bad algorithm design; they are due to bad data, data that excludes the very areas intended to be measured in the first place. The priority here is the data in context, not the algorithms.

To sum up the two case studies: bias is a big issue, especially when dealing with humanitarian crises, because it can influence who gets help and who doesn't.

How are we tackling this?

As of now, there hasn't been much progress on this issue, but in a conversation with students from the Wharton School of the University of Pennsylvania, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research in the US and Canada, talked about the ethical and people considerations in data collection and artificial intelligence, and how we could work towards removing these biases. According to her,

“Many of these systems know only what you show them. If you understand that there were some problems in the past, and you want to improve the future, then you need to go back to the data and understand who is represented in that data set, but even more importantly, how they are represented. Maybe you have everyone in, but the way in which you gathered data about them — the way in which you decided what signals were important when you made a decision — are equally important.”

In research, there are two main kinds of approaches to reducing bias. One is called “individual fairness,” where the idea is to ensure that similar individuals are treated similarly. The challenge here is how we identify that two individuals are similar: what types of attributes should one include, and what type of mathematical function should one use?
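
A minimal sketch of how such a check might look, assuming a plain Euclidean distance as the similarity function and hand-picked thresholds; both of these choices are exactly the open questions described above:

    import numpy as np

    def individual_fairness_violations(X, scores, dist_eps=0.1, score_eps=0.05):
        """Count pairs that are similar in features yet scored very differently."""
        violations = 0
        for i in range(len(X)):
            for j in range(i + 1, len(X)):
                similar = np.linalg.norm(X[i] - X[j]) < dist_eps
                treated_differently = abs(scores[i] - scores[j]) > score_eps
                if similar and treated_differently:
                    violations += 1
        return violations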

The second approach is focused on what is known as “group fairness.” Here, the idea is to ensure that the error rates for different groups of people are similar. In this approach, the problem is how you define an error, how you compute or aggregate errors across groups, and so on.
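
Again as a rough sketch, with placeholder labels and group assignments, a group-fairness audit boils down to comparing per-group error rates:

    import numpy as np

    def error_rate_by_group(y_true, y_pred, groups):
        """Return each group's error rate; large gaps signal group unfairness."""
        return {g: float(np.mean(y_true[groups == g] != y_pred[groups == g]))
                for g in np.unique(groups)}

    y_true = np.array([1, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 1, 1, 0])
    groups = np.array(["a", "a", "a", "b", "b", "b"])
    print(error_rate_by_group(y_true, y_pred, groups))  # similar rates suggest fairer treatment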

Another issue that these approaches do not address is process fairness, i.e., when the system makes a decision, is the process used to reach that decision fair for everyone involved? This is currently considered a work in progress.

Summary

  • The field of Artificial Intelligence is fast growing and is influencing every aspect of our everyday life.
  • The use of AI can be categorised into two main domains: Narrow AI and Artificial General Intelligence.
  • Narrow AI is what we see predominantly in our daily life: voice assistants, search queries, etc.
  • Artificial General Intelligence is a much more advanced field, focused on generalizing intelligence rather than building specific solutions for specific problems.
  • The key to making good AI has long been considered to be the optimization of algorithms, while ignoring the data.
  • This led to little effort being put into collecting data, which meant that minority portions of the data were ignored entirely.
  • This under-representation induced a bias in the models, which then further ignored these groups in decision making.
  • This problem is now being recognised and, according to researchers, addressing it is currently a work in progress.