Shadow Mode

Shadow mode is a machine learning deployment strategy in which a new model processes live production traffic in parallel with the existing model, capturing outputs for evaluation without affecting users or business operations.

6 min read | Last updated June 2026 | Infrastructure

Shadow mode is a deployment strategy in machine learning systems in which a newly developed model runs alongside the currently serving model, receiving the same inputs from real production traffic but having its outputs suppressed from the user-facing response. The shadow model's predictions are logged and compared against those of the live model — and against eventual ground truth when it becomes available — allowing the engineering team to validate the new model's behaviour under realistic conditions before committing to a full rollout.

Motivation

Offline evaluation on held-out test sets is a necessary but insufficient condition for deploying a new ML model. Test sets reflect a static snapshot of the data distribution, which may diverge from what the model encounters in production. Users generate edge cases that evaluation sets rarely anticipate. System-level factors — hardware characteristics, serialisation differences, upstream data pipeline changes — can produce discrepancies between offline performance and production behaviour.

Shadow mode addresses this gap by exposing the candidate model to real traffic while insulating users from any risk. If the shadow model underperforms, crashes, or produces unexpected outputs, the impact is limited to the monitoring logs rather than to user experience or business outcomes.

How Shadow Mode Works

In a typical shadow mode setup, the production inference server intercepts each incoming request and dispatches it to two parallel paths: the live model (the champion) and the shadow model (the challenger). The live model's output is returned to the user in the normal response pathway. The shadow model's output is captured asynchronously — often written to a logging system or feature store — without affecting response latency as seen by the user.

The shadow model may run in the same inference cluster or in a dedicated shadow environment. Asynchronous processing is common to ensure that the overhead of running two models does not degrade the live user experience, particularly when the shadow model is larger or slower than the champion.

Once sufficient shadow predictions have accumulated, the team compares metrics including prediction accuracy, output distribution, latency, error rates, and performance on specific user segments or data subsets. When the shadow model demonstrates acceptable or improved behaviour across all relevant dimensions, the team proceeds to a gradual rollout — typically a canary deployment — before promoting the challenger to champion status.
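As an illustration of the comparison step, the sketch below summarises a batch of paired predictions. The record fields, the nearest-rank percentile, and the thresholds (`max_error_rate`, `max_p95_ms`) are assumptions chosen for the example; a real gate would use whichever metrics and limits the team has agreed on, and would typically break results down by segment as well.

```python
import math


def p95(values):
    # Nearest-rank 95th percentile of a non-empty list.
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def compare_shadow(records, max_error_rate=0.01, max_p95_ms=50.0):
    # Assumes at least one successful shadow call and one labelled record.
    ok = [r for r in records if r["shadow"] is not None]    # shadow call succeeded
    labelled = [r for r in ok if r["truth"] is not None]    # ground truth has arrived
    summary = {
        "shadow_error_rate": 1 - len(ok) / len(records),
        "agreement": sum(r["live"] == r["shadow"] for r in ok) / len(ok),
        "live_accuracy": sum(r["live"] == r["truth"] for r in labelled) / len(labelled),
        "shadow_accuracy": sum(r["shadow"] == r["truth"] for r in labelled) / len(labelled),
        "live_p95_ms": p95([r["live_ms"] for r in records]),
        "shadow_p95_ms": p95([r["shadow_ms"] for r in ok]),
    }
    # Hypothetical gate: at least as accurate, few errors, latency within budget.
    summary["ready_for_canary"] = (
        summary["shadow_accuracy"] >= summary["live_accuracy"]
        and summary["shadow_error_rate"] <= max_error_rate
        and summary["shadow_p95_ms"] <= max_p95_ms
    )
    return summary


records = [
    {"live": 1, "shadow": 1, "truth": 1, "live_ms": 10, "shadow_ms": 12},
    {"live": 0, "shadow": 0, "truth": 0, "live_ms": 11, "shadow_ms": 13},
    {"live": 1, "shadow": 0, "truth": 0, "live_ms": 9, "shadow_ms": 14},
    {"live": 0, "shadow": 0, "truth": None, "live_ms": 10, "shadow_ms": 15},
]
summary = compare_shadow(records)
```

For this toy batch the two models agree on three of four requests, and the challenger is correct on all three labelled requests while the champion is correct on two.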

Relationship to Other Deployment Strategies

Shadow mode is one of several progressive delivery strategies used in ML systems.

A/B testing exposes a fraction of live users to each model variant and measures downstream business metrics — click-through rates, task completion, user satisfaction — rather than model-level prediction metrics. A/B testing is appropriate when the success criterion is a business outcome rather than a ground-truth comparison.

Canary deployment routes a small but nonzero fraction of live traffic to the new model, making its outputs visible to the canary user segment. Unlike shadow mode, canary deployments have real user impact, which is why shadow mode is typically used before or instead of an early canary phase for higher-risk changes.

Blue-green deployment switches all traffic from one environment to another at a point in time, with rollback possible by switching back. It lacks the gradual validation characteristics of shadow mode.
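The routing differences between these strategies can be made concrete with a small sketch. The function and strategy names below are hypothetical, and the hash-based bucketing is one common way to keep a user's assignment stable, not a requirement of any of the strategies.

```python
import hashlib


def bucket(user_id):
    # Map a user ID to a stable number in [0, 1).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def plan_request(strategy, user_id, fraction=0.05, active="champion"):
    """Return which model serves the user and which model is only logged."""
    if strategy == "shadow":
        # Every user sees the champion; the challenger's output is only logged.
        return {"serve": "champion", "log_only": "challenger"}
    if strategy in ("canary", "ab_test"):
        # A stable share of users actually sees the challenger's output.
        serve = "challenger" if bucket(user_id) < fraction else "champion"
        return {"serve": serve, "log_only": None}
    if strategy == "blue_green":
        # All traffic goes to whichever environment is currently active.
        return {"serve": active, "log_only": None}
    raise ValueError(f"unknown strategy: {strategy}")
```

In this sketch, canary and A/B testing share the same routing mechanism and differ mainly in the fraction used and in which metrics are analysed afterwards. Only the shadow strategy runs the challenger without ever serving its output.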

Shadow Mode for Agentic AI Systems

In 2025, shadow mode was extended to agentic AI systems in which models take multi-step actions with real-world consequences — placing orders, modifying records, sending communications. For such systems, running the agent in shadow mode means the agent processes real events and produces action recommendations that are logged but not executed. Human reviewers or automated evaluators compare the agent's proposed actions against what human operators actually did, and the agent is promoted to live operation only when its accuracy on a defined set of decision types meets a specified threshold.

This pattern is particularly common in high-stakes domains such as financial compliance, fraud detection, and clinical decision support, where the cost of an incorrect autonomous action is high.
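A promotion gate of this kind might look like the sketch below, which compares logged agent proposals with what human operators actually did, grouped by decision type. The tuple layout, the exact-match comparison, and the `threshold` and `min_cases` values are illustrative assumptions; real systems may need a richer notion of whether two actions are equivalent.

```python
from collections import defaultdict


def promotion_report(cases, threshold=0.95, min_cases=50):
    """cases: iterable of (decision_type, proposed_action, human_action)."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for decision_type, proposed, human in cases:
        totals[decision_type] += 1
        if proposed == human:
            matches[decision_type] += 1

    report = {}
    for decision_type, n in totals.items():
        agreement = matches[decision_type] / n
        report[decision_type] = {
            "n": n,
            "agreement": agreement,
            # Promote only with enough evidence and high enough agreement.
            "promote": n >= min_cases and agreement >= threshold,
        }
    return report


cases = [
    ("refund", "approve", "approve"),
    ("refund", "approve", "approve"),
    ("account_freeze", "freeze", "escalate"),
    ("account_freeze", "freeze", "freeze"),
]
report = promotion_report(cases, threshold=0.95, min_cases=2)
```

Evaluating each decision type separately allows an agent to go live for the decisions where it is reliable while staying in shadow mode for the rest.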

Infrastructure Considerations

Running shadow mode at scale requires infrastructure to duplicate request traffic, maintain separate model serving endpoints, capture and store shadow predictions alongside their corresponding inputs, and provide tooling to analyse the accumulated comparison data. Managed model serving platforms such as Amazon SageMaker, Google Vertex AI, and Azure Machine Learning offer shadow testing or traffic mirroring capabilities, although the level of built-in support differs by platform. Open-source serving frameworks including Seldon Core and BentoML can also be configured for shadow routing.
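To illustrate the capture-and-store requirement, the sketch below writes each shadow prediction next to its input as one JSON line and later attaches ground truth by request ID. The `ShadowRecord` fields and the in-memory sink are assumptions for the example; a real deployment would write to its own log store or warehouse.

```python
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class ShadowRecord:
    request_id: str
    features: dict
    live_output: float
    shadow_output: float
    shadow_model_version: str


def write_records(records, sink):
    # One JSON object per line keeps the log append-only and easy to stream.
    for record in records:
        sink.write(json.dumps(asdict(record)) + "\n")


def join_ground_truth(lines, truth_by_request):
    # Yield only the records whose ground truth has arrived.
    for line in lines:
        row = json.loads(line)
        if row["request_id"] in truth_by_request:
            row["truth"] = truth_by_request[row["request_id"]]
            yield row


sink = io.StringIO()
write_records(
    [
        ShadowRecord("req-1", {"amount": 120.0}, 0.2, 0.3, "challenger-v2"),
        ShadowRecord("req-2", {"amount": 9500.0}, 0.7, 0.9, "challenger-v2"),
    ],
    sink,
)
sink.seek(0)
joined = list(join_ground_truth(sink, {"req-2": 1}))
```

Storing the model version with every record makes it possible to separate results when several challengers are tested over time.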

Malaysian Context — Shadow Mode in Malaysian Financial and Healthcare Deployments

Shadow mode is particularly relevant in Malaysia's financial services and healthcare sectors, where model deployment decisions carry regulatory weight and the consequences of errors are significant. Bank Negara Malaysia's AI guidelines and the Malaysia AI Governance Framework both emphasise rigorous testing and validation before deploying AI systems in consequential contexts. Shadow mode testing can provide auditable evidence of pre-deployment validation in support of these expectations.

Malaysian banks including Maybank, CIMB, and Public Bank use ML models for credit scoring, fraud detection, and anti-money laundering. Replacing or upgrading these models involves regulatory notification and internal risk committee approval. Shadow mode allows these institutions to accumulate weeks or months of production-traffic validation data before presenting a change to risk committees, reducing the procedural burden of model updates.

In the Malaysian healthcare context, hospital systems in the Kementerian Kesihatan Malaysia (Ministry of Health) network and private hospital groups such as IHH Healthcare and KPJ Healthcare have piloted AI models for clinical decision support. Deploying such models in shadow mode first, where recommendations are logged but not shown to clinicians, allows clinical informaticists to validate performance against actual clinical decisions before moving to an advisory display mode.

Malaysian public sector AI deployments, including those under MAMPU (Malaysian Administrative Modernisation and Management Planning Unit) for government service automation, have adopted shadow testing as a standard step in their AI project delivery methodology, aligned with civil service procurement guidelines that require demonstrated reliability before production use.

Tags: shadow-mode, mlops, model-deployment, testing, production

Type	ML deployment and testing strategy
Also known as	Shadow deployment, shadow testing, dark launch
Category	MLOps
Key benefit	Validate new models on real traffic without user impact
Related	Canary deployment, A/B testing (ML), Model serving, MLOps