Explainable AI
Explainable AI (XAI) refers to methods and techniques that make the decisions and predictions of artificial intelligence systems interpretable and understandable to human users, addressing the opacity of complex machine learning models.
Explainable AI (XAI) encompasses the methods, frameworks, and design principles that allow human users — including domain experts, regulators, developers, and affected individuals — to understand why an AI system produced a particular output. As machine learning models have grown in predictive power, they have also become more opaque: deep neural networks with billions of parameters, gradient-boosted ensembles, and large language models offer little inherent insight into their reasoning. XAI addresses this opacity by surfacing post-hoc explanations, designing inherently interpretable models, or providing visualisations and statistics that reveal how inputs relate to outputs.[^1]
Three kinds of requirement drive the need for explainability: regulatory (laws that mandate transparency in automated decision-making), operational (engineers must be able to identify and correct model errors), and ethical (individuals must be able to contest automated decisions that affect them).
Why Explainability Matters
In high-stakes domains — medical diagnosis, credit lending, criminal justice risk scoring, and hiring — opaque AI models create legal and ethical risks. A loan applicant denied credit by an algorithmic system has a reasonable expectation of understanding the basis for that decision. A clinician relying on an AI diagnosis tool must be able to assess whether the model's reasoning aligns with medical knowledge before acting on its recommendation. A regulator auditing a bank's credit model must be able to verify that the model does not rely on proxies for protected characteristics such as race or gender.[^2]
Beyond ethics and compliance, explainability supports debugging and model improvement: an explanation that reveals a model is relying on a spurious feature (such as the hospital name embedded in a medical image) enables practitioners to retrain or re-engineer the model.
Methods
XAI methods are typically categorised along two dimensions: scope (local vs. global) and model dependency (model-agnostic vs. model-specific).
SHAP (SHapley Additive exPlanations)
SHAP is grounded in cooperative game theory. It assigns each input feature a Shapley value: that feature's marginal contribution to the model's prediction, averaged with combinatorial weights over all possible subsets (coalitions) of the other features. Individual Shapley values are local explanations (why did the model predict X for this specific instance?); aggregating them across a dataset, for example by mean absolute value, yields global explanations (which features matter most overall?). Shapley values satisfy four desirable mathematical properties (efficiency, symmetry, dummy, and additivity) that make them theoretically principled; efficiency means that the attributions for an instance sum to the difference between its prediction and the baseline output.[^3] Implementations such as TreeSHAP exploit the structure of tree-based models to compute exact Shapley values efficiently, while KernelSHAP provides a model-agnostic, sampling-based approximation.
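The definition above can be illustrated with a brute-force computation. The following is a minimal sketch, not the SHAP library: it enumerates every feature subset, so it is only practical for a handful of features. The function `toy_model`, the all-zero baseline, and the choice to "remove" a feature by substituting its baseline value are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every subset of features.

    Features outside a coalition are replaced by their baseline value,
    which is one common (assumed) way to simulate a missing feature.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features):
    """Hypothetical model: one additive term and one interaction term."""
    a, b, c = features
    return 2.0 * a + b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

In this toy case the additive term is attributed entirely to the first feature, and the interaction term is split equally between the second and third. The attributions sum to `toy_model(x) - toy_model(baseline)`, which is the efficiency property.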
LIME (Local Interpretable Model-Agnostic Explanations)
LIME generates explanations by perturbing the input around a specific instance, observing how the model's output changes, and fitting a simple, interpretable surrogate model (such as a linear regression) to this local neighbourhood. The surrogate's coefficients are presented as the explanation for that instance. LIME is model-agnostic — it treats the underlying model as a black box — and supports tabular, text, and image data. However, LIME explanations are inherently local and may not generalise across the input space, and results can be sensitive to the perturbation sampling strategy.[^4]
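The perturb, weight, and fit loop can be sketched in plain Python. This is a simplified illustration, not the `lime` package: it uses Gaussian perturbations on continuous features, an exponential proximity kernel, and a weighted least-squares surrogate. The names `lime_style_explanation` and `black_box`, and the parameters `scale` and `kernel_width`, are assumptions chosen for this example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_style_explanation(predict, x, n_samples=2000, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around the instance x.

    Returns (intercept, slopes); the slopes serve as the local explanation.
    """
    rng = random.Random(seed)
    d = len(x)
    # Accumulate the weighted normal equations (X^T W X) w = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]      # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist2 / kernel_width ** 2)          # proximity weight
        row = [1.0] + [zi - xi for zi, xi in zip(z, x)]   # intercept + offsets
        y = predict(z)                                    # query the black box
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]


def black_box(z):
    """Hypothetical opaque model: nonlinear in z[0], linear in z[1]."""
    return z[0] ** 2 + 3.0 * z[1]


intercept, slopes = lime_style_explanation(black_box, [1.0, 2.0])
print(slopes)
```

Around the point `[1.0, 2.0]` the surrogate's slopes should land close to the local gradient of the black box (about 2 and 3). Changing `scale` or `kernel_width` changes the neighbourhood and therefore the explanation, which is the sampling sensitivity noted above.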
Attention Visualisation
In transformer-based models, attention weights describe how strongly each token attends to every other token within a given layer and head. Visualising these patterns can give an intuitive picture of which words or image patches a model focused on. However, research has shown that attention weights do not always correspond to feature importance in the causal sense, so attention maps are better treated as a diagnostic aid than as a faithful explanation.
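As a rough illustration of what an attention visualisation displays, the sketch below computes scaled dot-product attention weights for a single head and renders them as a text grid. The tokens and the two-dimensional query and key vectors are hand-picked assumptions, not values from a real model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)) per row."""
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights.append(softmax(scores))
    return weights


def render_heatmap(tokens, weights):
    """Format the weight matrix as text, one row per attending token."""
    lines = []
    for token, row in zip(tokens, weights):
        cells = " ".join(f"{w:.2f}" for w in row)
        lines.append(f"{token:>6} | {cells}")
    return "\n".join(lines)


# Hypothetical tokens and vectors, chosen only to make the pattern visible.
tokens = ["the", "scan", "shows"]
queries = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]

weights = attention_weights(queries, keys)
print(render_heatmap(tokens, weights))
```

Each row is a probability distribution over the tokens being attended to. A real visualisation would plot one such matrix per layer and head, which is part of why a single attention map gives only a partial view of the model's computation.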
Saliency Maps and Gradient Methods
For image classification CNNs, gradient-based methods such as Grad-CAM (Gradient-weighted Class Activation Mapping) produce heatmaps that highlight image regions most influential for a given prediction. These maps are widely used in medical imaging to verify that a model is attending to clinically relevant anatomy rather than image artefacts.
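The core arithmetic of Grad-CAM can be sketched without a deep learning framework. The example assumes that the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted; the tiny 2x2 arrays below are made-up values. Each channel is weighted by its spatially averaged gradient, and a ReLU keeps only regions that support the class.

```python
def grad_cam(feature_maps, gradients):
    """Combine feature maps A_k and gradients dY/dA_k into a class heatmap.

    feature_maps, gradients: lists of channels, each a 2-D list (H x W).
    Returns ReLU(sum_k alpha_k * A_k), where alpha_k is the mean gradient
    of channel k over all spatial positions.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # alpha_k: global-average-pooled gradient for channel k
    alphas = [sum(sum(row) for row in grad) / (height * width)
              for grad in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU: discard regions that count as evidence against the class
    return [[max(0.0, v) for v in row] for row in cam]


# Hypothetical activations: channel 0 fires top-left, channel 1 bottom-right.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
# Hypothetical gradients: channel 0 supports the class, channel 1 opposes it.
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]

heatmap = grad_cam(feature_maps, gradients)
print(heatmap)
```

In practice the resulting low-resolution map is upsampled to the input image size and overlaid on the image; that step is omitted here.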
Intrinsically Interpretable Models
An alternative to post-hoc explanation is to use models that are inherently interpretable: decision trees, linear regression, rule-based systems, and generalised additive models (GAMs). Techniques such as Neural Additive Models (NAMs) learn each feature's contribution with a small neural network and sum the results, aiming to combine the expressive power of neural networks with the interpretability of additive models.
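What makes an additive model interpretable is that its prediction decomposes exactly into one term per feature. The sketch below illustrates that structure with hand-written shape functions; the feature names, functions, and intercept are hypothetical, whereas a fitted GAM or NAM would learn the shape functions from data.

```python
def additive_prediction(intercept, shape_functions, x):
    """Score x with an additive model and return the per-feature terms.

    shape_functions maps a feature name to a one-dimensional function;
    the prediction is the intercept plus the sum of the function outputs.
    """
    contributions = {name: f(x[name]) for name, f in shape_functions.items()}
    return intercept + sum(contributions.values()), contributions


# Hypothetical shape functions for a toy credit score.
shape_functions = {
    "income": lambda v: 0.5 * min(v, 4.0),   # benefit saturates above 4.0
    "debt_ratio": lambda v: -2.0 * v,        # higher debt lowers the score
}

applicant = {"income": 6.0, "debt_ratio": 0.25}
score, contributions = additive_prediction(1.0, shape_functions, applicant)
print(score, contributions)
```

Because each contribution depends on a single feature, the shape functions can be plotted and inspected one at a time, and the explanation for any individual prediction is simply the list of terms.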
Regulatory Context
Regulatory pressure has made XAI a compliance requirement rather than merely a best practice in many jurisdictions. The EU AI Act (2024) imposes transparency, human-oversight, and documentation obligations on high-risk AI systems, a category that includes systems used in medical devices, credit scoring, recruitment, and law enforcement. Fines reach up to 3% of global annual turnover for breaching these obligations and up to 7% for prohibited practices. Under the EU GDPR, Article 22 grants individuals the right not to be subject to solely automated decisions with legal or similarly significant effects, and Articles 13 to 15 entitle them to meaningful information about the logic involved. Similar expectations appear in financial supervision: the Basel Committee on Banking Supervision has identified model explainability as a supervisory concern in banks' use of AI and machine learning.[^2]
References
[^1]: Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G. Z. (2019). XAI: Explainable artificial intelligence. Science Robotics, 4(37).
[^2]: Basel Committee on Banking Supervision. (2022). Principles for the effective management and supervision of climate-related financial risks. Bank for International Settlements. [Model risk governance document that includes AI explainability provisions.]
[^3]: Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems, 30.
[^4]: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.