What is Explainable AI (XAI)?

An introduction to XAI — the field aimed at making machine learning models understandable to humans

Conor O'Sullivan
Towards Data Science


Updated: 8 March 2023

(created with DALLE Mini)

Should we always trust a model that performs well?

A model could reject your application for a mortgage or diagnose you with cancer. These decisions have consequences. Serious consequences. Even if they are correct, we would expect an explanation.

A human could give one. A human would be able to tell you that your income is too low or that a cluster of cells is malignant. To get similar explanations from a model, we look to the field of explainable AI.

We will explore this field and understand what it aims to achieve. Below, we discuss the main approaches in XAI, with a focus on the first two: intrinsically interpretable models and model agnostic methods. To end, we touch on what it takes to go from technical interpretations to human-friendly explanations.

Summary of approaches in XAI (source: author)

You may also enjoy this video on the topic. And, if you want to learn more, check out my course — XAI with Python. You can get free access if you sign up to my newsletter.

What is XAI?

XAI, also known as interpretable machine learning (IML), aims to build machine learning models that humans can understand. It is both a field of research and the set of tools and methodologies that field has developed. This includes:

  • methods to interpret black-box models
  • modelling methodologies used to build models that are easy to interpret

We can think of XAI as both interpreting models and making models more interpretable. Explaining predictions to a less technical audience also falls in the domain. That is how we go from interpretations that data scientists can understand to human-friendly explanations. So, really, XAI involves any method used to understand or explain how a model makes predictions.

XAI aims to build models that can be understood by humans

Understanding a model can mean understanding how individual predictions are made. We call these local interpretations. With these, we want to know how each individual model feature has contributed to the prediction. It can also mean understanding how a model works as a whole. We call these global interpretations.

Why do we need XAI?

The obvious benefit is the aim of XAI — an understanding of a model. Through local interpretations, we can understand individual decisions made using machine learning and explain those to customers. Through global interpretations, we can understand what trends the model is using to make predictions. From this, many other benefits flow.

It can help increase trust in machine learning and lead to greater adoption in other fields. You can also gain knowledge of your dataset and tell better stories about your results. You can even improve accuracy and performance in production. This video discusses these six benefits in depth:

Machine learning models are notoriously complicated. So, how do we go about understanding them? Well, we have a few approaches available.

Intrinsically interpretable models

The first approach is to build models that are intrinsically interpretable. These are simple models that can be understood by a human without the need for additional methods [1]. We only need to look at the model’s parameters or a model summary. These will tell us how an individual prediction was made or even what trends are captured by the model.

Interpretable models can be understood by a human without any other aids/techniques.

A decision tree is a good example of this type of model. Looking at Figure 1, suppose we want to understand why we gave a loan to a 29-year-old student with a $3000 monthly income. The person is over 25, so we go right at the first node. Then, because her income is at least $2000, we go right again and arrive at a “No” leaf node. The model predicts that the student will not default, and the automated underwriting system approves the loan.

Figure 1: example of a decision tree (source: author)
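To make this concrete, here is a minimal sketch of fitting a shallow decision tree with scikit-learn and printing its rules. The feature names and values are made up for illustration; they are not the data behind Figure 1.

```python
# A shallow decision tree on a hypothetical loan-default dataset.
# The feature names and values are made up for illustration.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

X = pd.DataFrame({
    "age":            [22, 24, 29, 35, 41, 52, 23, 30],
    "monthly_income": [1500, 1800, 3000, 2500, 4000, 5000, 1200, 1900],
})
y = [1, 1, 0, 0, 0, 0, 1, 1]  # 1 = defaulted, 0 = repaid

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The fitted tree can be read directly as a set of if/else rules,
# which is what makes it interpretable without extra tooling.
print(export_text(tree, feature_names=list(X.columns)))
```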

Other examples are linear and logistic regression. To understand how these models work, we can look at the parameter values given to each model feature. Multiplying a parameter by its feature value gives that feature's contribution to the prediction. The sign and magnitude of each parameter tell us the direction and strength of the relationship between the feature and the target variable.
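As a quick sketch on synthetic data, we can read a fitted linear regression directly: each coefficient multiplied by the corresponding feature value is that feature's contribution to the prediction.

```python
# Reading a linear regression directly: the contribution of feature i to a
# prediction is coefficient_i * feature_value_i. The data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

model = LinearRegression().fit(X, y)

x = X[0]                              # one observation to explain
contributions = model.coef_ * x       # parameter * feature value
print("contributions:", contributions)
print("prediction:", model.intercept_ + contributions.sum())
```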

Using these models shifts us away from a typical machine learning workflow and towards a more statistical mindset of building models. Much more thought goes into building an intrinsically interpretable model. We need to put more time into feature engineering and selecting a small group of uncorrelated features. The benefit is having a simple model that is easy to interpret.

Black-box models

Some problems cannot be solved with simple models. For tasks like image recognition, we move towards less interpretable or black box models. We can see some examples of these in Figure 2.

Figure 2: intrinsically interpretable models vs black box models (source: author)

Black box models are too complicated to be understood directly by humans. To understand a random forest we need to simultaneously understand all the decision trees. Similarly, a neural network will have too many parameters to comprehend at once. We need additional methods to peer into the black box.

Model agnostic methods

This brings us to model agnostic methods. They include methods like PDPs, ICE Plots, ALE Plots, SHAP, LIME and Friedman’s h-statistic. These methods can interpret any model. The algorithm really is treated as a black box that can be swapped out for any other model. These can be classified by what they aim to interpret and how they are calculated.

An overview of model agnostic approaches (source: author)

One approach is to use surrogate models. These methods start by using the original model to make predictions. We then train another model (i.e. the surrogate model) on these predictions. That is, we use the original model's predictions instead of the target variable. In this way, the surrogate model learns what features the original model used to make predictions. The surrogate model must be intrinsically interpretable. This allows us to interpret the original model by looking directly at the surrogate model.
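Here is a minimal sketch of a global surrogate, assuming a fitted black-box regression model called `black_box` and a pandas DataFrame of features `X` already exist (both are hypothetical names).

```python
# Global surrogate model. `black_box` is an assumed fitted regression model
# and `X` an assumed pandas DataFrame of features.
from sklearn.tree import DecisionTreeRegressor, export_text

# 1. Use the black-box model's predictions as the new target.
y_hat = black_box.predict(X)

# 2. Fit an intrinsically interpretable model to those predictions.
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, y_hat)

# 3. Check how faithfully the surrogate mimics the black box (R^2),
#    then read its rules as an approximation of the original model.
print("fidelity (R^2):", surrogate.score(X, y_hat))
print(export_text(surrogate, feature_names=list(X.columns)))
```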

Another approach is to use permutations. This involves changing/permuting model features. We use the model to make predictions on these permuted features. We can then understand how changes in feature values lead to changes in predictions.

Good examples of permutation methods are PDPs and ICE Plots. You can see both in Figure 3. Specifically, the ICE Plot is given by all the individual lines. There is a line for each observation in our dataset. To create each line, we permute the value of one feature and record the resulting predictions while holding the values of the other features constant. The bold yellow line is the PDP. This is the average of all the individual lines.

Figure 3: PDP and ICE Plot example (source: author)

We can see that, on average, the prediction decreases with the feature. Looking at the ICE Plot, some of the observations do not follow this trend. This indicates a potential interaction in our data. PDPs and ICE Plots are an example of global interpretation methods. We can use them to understand the trends captured by a model. They can’t be used to understand how individual predictions were made.
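In scikit-learn, a PDP with its ICE lines can be drawn in a couple of lines. This is a minimal sketch, assuming a fitted model `model` and a feature DataFrame `X` with a hypothetical "income" column.

```python
# PDP + ICE plot with scikit-learn. `model` is an assumed fitted estimator
# and `X` an assumed DataFrame containing an "income" column.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# kind="both" draws one thin ICE line per observation plus the bold
# average line, which is the partial dependence (PDP).
PartialDependenceDisplay.from_estimator(model, X, features=["income"], kind="both")
plt.show()
```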

Shapley values, in contrast, can. As seen in Figure 4, there is a value for each model feature. These values tell us how each feature has contributed to the prediction f(x) when compared to the average prediction E[f(x)].

Figure 4: example of Shapley values (source: author)

In the past, Shapley values have been approximated using permutations. A more recent method called SHAP has significantly increased the speed of these approximations. Its model-agnostic version, KernelSHAP, uses a combination of permutations and surrogate models. The feature values of an individual observation are permuted. A weighted linear regression model is then trained on these permuted samples, and its coefficients give the approximate Shapley values.
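In practice, the shap library wraps these approximations behind a single interface. A minimal sketch, assuming a fitted regression model `model` and a feature DataFrame `X` (both hypothetical names):

```python
# SHAP values with the shap library. `model` is an assumed fitted regression
# model and `X` an assumed pandas DataFrame of features.
import shap

explainer = shap.Explainer(model, X)   # shap picks a suitable algorithm
shap_values = explainer(X)

# Waterfall plot for one prediction: how each feature moves the output
# f(x) away from the average prediction E[f(x)].
shap.plots.waterfall(shap_values[0])
```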

In general, the approach KernelSHAP takes is known as a local surrogate model. We train a surrogate model on permutations of a single observation rather than on the whole dataset. LIME is another method that uses this approach.
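A minimal LIME sketch for tabular data, assuming a fitted classifier `model` and a training DataFrame `X_train` (both hypothetical names):

```python
# LIME on tabular data. `model` is an assumed fitted classifier and
# `X_train` an assumed pandas DataFrame used for training.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=np.array(X_train),
    feature_names=list(X_train.columns),
    mode="classification",
)

# LIME perturbs the chosen observation and fits a weighted linear
# surrogate model around it; the surrogate's weights are the explanation.
explanation = explainer.explain_instance(
    np.array(X_train.iloc[0]), model.predict_proba, num_features=5
)
print(explanation.as_list())
```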

Additional XAI methods

Intrinsically interpretable models and model agnostic methods are the main approaches to XAI. Some other methods include counterfactual explanations, causal models, and adversarial examples. The last two would actually be considered fields of their own. They do however share methods and goals with XAI. Really, any method that aims to understand how a model makes predictions will fall under XAI. Many methods have been developed for specific models. We call these non-agnostic methods.

XAI includes any method used to understand how a model makes predictions

Counterfactual explanations

Counterfactual explanations could be considered a permutation method. They rely on permutations of feature values. Importantly, they focus on finding feature values that change a prediction. For example, we want to see what it would take to go from a negative to a positive diagnosis.

More specifically, a counterfactual explanation is the smallest change we need to make to the feature values to change the prediction. For continuous target variables, the counterfactual is the smallest change needed to move the prediction by a predefined percentage or amount.

Counterfactual explanations are useful for answering contrastive questions: questions where the customer compares their current position to a potential future position. For example, after being rejected for a loan they could ask:

“How can I be accepted?”

With counterfactual explanations, we could reply:

“You need to increase your monthly income by $200” or “You need to decrease your existing debt exposure by $10000”.
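As a toy illustration of the idea (not how dedicated counterfactual libraries such as DiCE or Alibi work internally), we could search over one feature for the smallest change that flips the prediction. The fitted classifier `model`, the rejected applicant `x` (a pandas Series) and its "monthly_income" feature are all hypothetical.

```python
# A toy counterfactual search over a single feature. `model` is an assumed
# fitted classifier (1 = accepted) and `x` an assumed pandas Series for the
# rejected applicant, with a hypothetical "monthly_income" feature.
def smallest_income_increase(model, x, step=50, max_increase=5000):
    """Return the smallest income increase that flips the prediction to 'accepted'."""
    for increase in range(step, max_increase + step, step):
        candidate = x.copy()
        candidate["monthly_income"] += increase
        if model.predict(candidate.to_frame().T)[0] == 1:
            return increase
    return None  # no counterfactual found within the search range

increase = smallest_income_increase(model, x)
print(f"You need to increase your monthly income by ${increase}")
```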

Causal models

Machine learning only cares about correlations. A model could use country of origin to predict the chance of developing skin cancer. However, the true cause is the varying levels of sunshine in each country. We call the country of origin a proxy for the amount of sunshine.

When building causal models we aim to use only causal relationships. We do not want to include any model features that are proxies for the true causes. To do this we need to rely on domain knowledge and put more effort into feature selection.

Building causal models does not mean the model is easier to interpret. It means that any interpretation will be true to reality. The contributions of features to a prediction will be close to the true causes of an event. Any explanations you give will also be more convincing.

Why did you diagnose me with skin cancer?

“Because you are from South Africa”, is not a convincing reason.

Adversarial examples

Adversarial examples are observations that lead to unintuitive predictions. If a human had looked at the data, they would have made a different prediction.

Finding adversarial examples is similar to finding counterfactual explanations. The difference is that we change feature values to intentionally trick the model. We are still trying to understand how the model works, but the goal is not to provide interpretations. We want to find weaknesses in the model and defend against adversarial attacks.

Adversarial examples are common for applications like image recognition. It is possible to create images that look perfectly normal to a human but lead to incorrect predictions.

For example, researchers at Google showed how introducing a layer of noise could change the prediction of an image recognition model. Looking at Figure 5, you can see that, to a human, the layer of noise is not even noticeable. Yet, the model now predicts that the panda is a gibbon.

Figure 5: Adversarial example (Source: I. Goodfellow et al.)
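The panda example was created with the fast gradient sign method (FGSM). Here is a minimal sketch of the same idea applied to a plain logistic regression rather than a deep network; `model` (a fitted sklearn LogisticRegression), the input vector `x` and its true label `y` are all assumed.

```python
# Fast gradient sign method (FGSM) on a plain logistic regression.
# `model` is an assumed fitted sklearn LogisticRegression, `x` a 1-D
# numpy feature vector and `y` its true label (0 or 1).
import numpy as np

def fgsm(model, x, y, eps=0.1):
    """Nudge x in the direction that most increases the model's loss."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]   # P(class = 1)
    grad = (p - y) * model.coef_[0]                   # d(log-loss)/dx for logistic regression
    return x + eps * np.sign(grad)

x_adv = fgsm(model, x, y)
print("original prediction:   ", model.predict(x.reshape(1, -1))[0])
print("adversarial prediction:", model.predict(x_adv.reshape(1, -1))[0])
```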

Non-agnostic methods

Many methods have been developed for specific black-box models. For tree-based models, we can count the number of splits made on each feature. For neural networks, we have methods like pixel-wise decomposition and DeepLIFT.
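For example, here is a minimal sketch of counting splits per feature in a random forest, assuming a fitted RandomForestClassifier `forest` and a feature DataFrame `X` (both hypothetical names).

```python
# Counting splits per feature in a random forest. `forest` is an assumed
# fitted RandomForestClassifier and `X` an assumed pandas DataFrame.
import numpy as np

split_counts = np.zeros(X.shape[1], dtype=int)
for tree in forest.estimators_:
    # tree_.feature stores the feature index used at each node
    # (negative values mark leaf nodes, so we drop them).
    used = tree.tree_.feature
    split_counts += np.bincount(used[used >= 0], minlength=X.shape[1])

for name, count in zip(X.columns, split_counts):
    print(f"{name}: {count} splits")
```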

Although SHAP would be considered model agnostic, it also has non-agnostic approximation methods. For example, TreeSHAP can only be used for tree-based models and DeepSHAP for neural networks.

The obvious downside is that non-agnostic methods can only be used with specific models. This is why research has been directed toward agnostic methods. These give us more flexibility when it comes to algorithm selection. It also means that our interpretation methods are future-proof. They could be used to interpret algorithms that haven’t been developed yet.

From interpretations to explanations

The methods we have discussed are all technical. They are used by data scientists to explain models to other data scientists. In reality, we will be expected to explain our models to a non-technical audience. This includes colleagues, regulators or customers. To do this we need to bridge the gap between technical interpretations and human-friendly explanations.

You will need to:

  • adjust the level based on the expertise of the audience
  • put thought into which features to explain

A good explanation does not necessarily require the contributions of all features to be explained.

We discuss this process in more depth in the article below. As an example, we walk through how to use SHAP feature contributions to give a convincing explanation.

XAI is an exciting field. If you want to learn more, check out the tutorials below:

I hope you found this article helpful! If you want to see more you can support me by becoming one of my referred members. You’ll get access to all the articles on medium and I’ll get part of your fee.

You can find me on | Twitter | YouTube | Newsletter — sign up for FREE access to a Python SHAP course

Image Sources

All images are my own or obtained from www.flaticon.com. In the case of the latter, I have a “Full license” as defined under their Premium Plan.

References

[1] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead (2019)

C. Molnar, Interpretable Machine Learning (2021), https://christophm.github.io/interpretable-ml-book/

S. Masís, Interpretable Machine Learning with Python (2021)

R. Moraffah, M. Karami, R. Guo, A. Raglin and H. Liu, Causal Interpretability for Machine Learning: Problems, Methods and Evaluation (2020), https://arxiv.org/abs/2003.03934

Microsoft, Causality and Machine Learning, https://www.microsoft.com/en-us/research/group/causal-inference/
