Explainable AI (XAI) is the study of how to make machine learning models understandable to humans. It is a subfield of AI concerned with the interpretability, explainability, and transparency of a given model.

In general, ML models are considered black boxes: you feed in training data and get outputs on unseen data. Unlike explicit programming, where you define the rules for how something works, ML models are trained, and the ‘stuff’ inside them does not make sense directly.

XAI can also help find issues with a model, such as poor results or bias toward certain groups.

Why care about XAI?

Let’s take two scenarios:

In a recommendation system, an ML model might give the user a wrong suggestion (say, for a Netflix series).

So what is the outcome of this wrong suggestion? Most likely, the user just spends a bit more time finding something else to watch. Not a big deal. Worst-case scenario, after several bad suggestions, the user cancels the subscription.

But what if the wrong suggestion is something like a medical diagnosis? The user might not be able to find a better answer and could end up acting on a wrong diagnosis. This could lead to serious consequences.

Ok, that escalated quickly. But it should give you a decent idea of why XAI is important.

Summary of why XAI matters

  • Helps verify and debug ML models
  • Helps improve ML models
  • Uncovers new insights about the data
  • Supports regulatory compliance

Explanation Methods and Approaches

Dimensions of Explainability

Demystifying black-box algorithms requires us to look at the following dimensions:

  • Outcomes - Understanding why a certain outcome was predicted for a given input.
  • End users - Providing the right level of abstraction to the end user. You don’t have to give every detail for a user to trust a model.
  • Data - The data used to build the model. Robust training data, combined with a good model, gives good results.
  • Model - Understanding the model as a whole: how the input is mapped to the output, and what the model’s limitations are.

Questions of Explainability

What do we understand from the data?

We should spend enough time analyzing and exploring the data. This helps us find potential issues like gaps, inconsistencies, and biases that might impact the model and its predictions.

It also helps to know what to expect and which parts of the data contribute to a certain output.
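A minimal sketch of what this first pass can look like, assuming a hypothetical tabular dataset `data.csv` with a categorical `target` column:

```python
import pandas as pd

# Hypothetical dataset with a categorical "target" column.
df = pd.read_csv("data.csv")

print(df.isna().sum())        # gaps: missing values per column
print(df.describe())          # ranges and outliers that hint at inconsistencies
print(df["target"].value_counts(normalize=True))  # class imbalance, a common source of bias
```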

How is the model created?

This is about understanding the model as a whole: how the input is mapped to the output, and what the model’s limitations are. This is the phase where we look at the inductive bias of the algorithm and relate it to the data we used. If it is unclear how the algorithm builds the model, the model becomes less transparent and thus less interpretable.

Global Interpretability of a trained model

This means finding the key feature values and the complex interactions happening inside the model. It is especially hard to achieve for complex deep networks, where there are huge numbers of parameters to work with.
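One common way to get such a global view is permutation feature importance: shuffle one feature at a time and measure how much the model’s score drops. A sketch with scikit-learn on a toy dataset:

```python
# Permutation importance: shuffling a feature that matters should hurt the
# model's score, giving a global ranking of feature relevance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Print the five most important features, globally.
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```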

Influence of different parts of the model on the final output

For example, in deep neural networks (DNNs), different layers learn different features that feed into the final prediction. This is about understanding the contribution of each layer to the final output. When model predictions are incorrect, it helps to see which part of the model contributed most to the wrong prediction and work on it.
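As a sketch of how one might peek inside a network (assuming PyTorch; the tiny model here is just a placeholder), forward hooks can record what each layer produces for a given input:

```python
# Record per-layer activations with forward hooks, to inspect what each
# layer contributes on the way to a prediction.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach a hook to every layer of interest.
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.ReLU)):
        module.register_forward_hook(save_activation(name))

x = torch.randn(1, 4)  # a placeholder input
model(x)

for name, act in activations.items():
    print(name, tuple(act.shape), act.abs().mean().item())
```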

Types of explanation methods

Local and Global explainability

Local explainability is about understanding the model’s behaviour for a specific input, or a certain range of inputs. Global explainability is about understanding the model’s overall behaviour, or the important features that contribute to a specific set of model outcomes.
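The same attribution values can serve both views. A sketch using the SHAP library on a toy regression model: one row’s values are a local explanation, and averaging their magnitudes over the dataset gives a global ranking:

```python
# SHAP values give local explanations (one row at a time) and, aggregated,
# a global picture of feature importance. Assumes the `shap` package.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

print(shap_values[0])                    # local: each feature's contribution to one prediction
print(np.abs(shap_values).mean(axis=0))  # global: mean |contribution| per feature
```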

Intrinsic and extrinsic explainability

Some basic models, like linear models and decision trees, are intrinsically simple to understand, since we clearly know the logic and the mathematical mapping from input to output. Extrinsic (also known as post-hoc) explainability, by contrast, first trains an ML model on the given data and then applies explainability techniques to generate insights about the model.
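For instance, a small decision tree is intrinsically interpretable: the fitted tree itself is the explanation, and its rules can be printed directly. A sketch with scikit-learn:

```python
# An intrinsically interpretable model: the fitted tree *is* the explanation.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Prints the learned if/then rules -- no post-hoc technique needed.
print(export_text(tree, feature_names=load_iris().feature_names))
```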

Model-agnostic and model-specific explainability

Model-agnostic explainability uses techniques that work with any model, typically by treating it as a black box. Model-specific explainability uses techniques that rely on the internals of a particular model type.
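A quick sketch of the contrast with scikit-learn: `feature_importances_` is model-specific (it exists only because tree ensembles expose their internals), while permutation importance only needs the model’s predictions:

```python
# Model-specific vs model-agnostic importance for the same fitted model.
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

X, y = load_wine(return_X_y=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

print(model.feature_importances_)  # model-specific: relies on tree internals
agnostic = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(agnostic.importances_mean)   # model-agnostic: only needs predict/score
```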

Model-centric and data-centric explainability

Model-centric explainability is about understanding the model as a whole. Data-centric explainability is about understanding the data and how it shapes what the model learns.

Accuracy and Interpretability trade-off

Ideally, we would want a model that is both highly accurate and highly interpretable, but this is not always possible: there is a trade-off between accuracy and interpretability. Generally, the more complex the model, the more accurate but less interpretable it tends to be; the simpler the model, the less accurate but more interpretable.
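A rough way to see this on a single dataset (just a sketch; the exact numbers depend on the data and tuning):

```python
# Comparing an interpretable model against a more complex one.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

simple = LogisticRegression(max_iter=5000)         # coefficients are directly readable
complex_ = RandomForestClassifier(random_state=0)  # hundreds of trees, hard to inspect

print("logistic:", cross_val_score(simple, X, y, cv=5).mean())
print("forest:  ", cross_val_score(complex_, X, y, cv=5).mean())
```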

Figure: the accuracy vs. interpretability trade-off across model types.

