AIWiki
Malaysia

Amazon SageMaker

Amazon SageMaker is a fully managed cloud platform from AWS that provides an integrated environment for building, training, and deploying machine learning models at scale, incorporating tools for data preparation, model development, MLOps, and generative AI.

Last updated June 2026. Category: Companies & Tools.

Amazon SageMaker is a fully managed machine learning platform developed by Amazon Web Services (AWS). Launched in November 2017 at AWS re:Invent, SageMaker provides data scientists, machine learning engineers, and developers with an integrated set of tools for the complete ML lifecycle: data exploration and preparation, model training and evaluation, deployment and serving, and ongoing monitoring. In December 2024, AWS renamed the original ML service Amazon SageMaker AI and reassigned the Amazon SageMaker name to a broader, unified data, analytics, and AI platform built around it.

History and Evolution

SageMaker was initially conceived as a managed environment that would remove the infrastructure burden from data science teams — provisioning compute, managing dependencies, and handling model hosting — so that practitioners could focus on modelling rather than DevOps. At launch, it offered managed Jupyter notebook instances, built-in training algorithms, and one-click endpoint deployment.

Over subsequent years, AWS expanded SageMaker into a broad suite of integrated capabilities. Major additions included SageMaker Studio (a web-based integrated development environment for ML), SageMaker Autopilot (AutoML), SageMaker Clarify (bias detection and explainability), SageMaker Pipelines (MLOps workflow orchestration), SageMaker Model Monitor (production monitoring), and SageMaker HyperPod (distributed training infrastructure for large models).

In December 2024, AWS introduced the next generation of Amazon SageMaker, unifying data engineering, analytics, and AI development into a single platform with SageMaker Unified Studio as the central interface. The original machine learning capabilities described below continue within that platform as Amazon SageMaker AI.

Core Components

SageMaker AI (Training and Inference)

The foundational service allows users to submit training jobs that run on managed compute instances, ranging from small CPU instances for prototyping to clusters of hundreds of NVIDIA A100 or H100 GPUs for large model training. SageMaker handles container provisioning, distributed training setup, and checkpoint management. Trained models can be deployed to managed real-time endpoints, batch transform jobs, or serverless inference endpoints that automatically scale to zero when not in use.
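To make the training and serving options above more concrete, the sketch below builds request payloads for a training job with checkpointing and for a serverless endpoint. The dictionaries follow the parameter shapes of boto3's `create_training_job` and `create_endpoint_config` calls on the `sagemaker` client, but no AWS request is made. The job name, image URI, IAM role ARN, bucket, and instance choices are hypothetical placeholders, and field requirements should be checked against the current API reference before use.

```python
"""Sketch: payloads for SageMaker's CreateTrainingJob and
CreateEndpointConfig APIs, built as plain dicts (no AWS call is made).
All names, ARNs, bucket paths, and image URIs are hypothetical."""

# Serverless memory sizes: 1 GB steps from 1024 MB to 6144 MB.
SERVERLESS_MEMORY_SIZES_MB = range(1024, 6145, 1024)


def training_job_request(job_name: str, image_uri: str, role_arn: str,
                         bucket: str, instance_type: str = "ml.p4d.24xlarge",
                         instance_count: int = 2) -> dict:
    """Build keyword arguments for sagemaker.create_training_job."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        # Managed compute: SageMaker provisions these instances for the job.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 100,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 3600},
        # Files written to LocalPath are copied to S3Uri, so a restarted
        # job can resume from the latest checkpoint.
        "CheckpointConfig": {
            "S3Uri": f"s3://{bucket}/checkpoints/",
            "LocalPath": "/opt/ml/checkpoints",
        },
    }


def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> dict:
    """Build keyword arguments for sagemaker.create_endpoint_config."""
    if memory_mb not in SERVERLESS_MEMORY_SIZES_MB:
        raise ValueError(f"unsupported serverless memory size: {memory_mb}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # ServerlessConfig takes the place of InstanceType and
            # InitialInstanceCount; capacity scales to zero when idle.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


if __name__ == "__main__":
    req = training_job_request(
        "demo-job", "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
        "arn:aws:iam::123456789012:role/DemoSageMakerRole", "demo-bucket")
    print(req["ResourceConfig"])
    cfg = serverless_endpoint_config("demo-config", "demo-model")
    print(cfg["ProductionVariants"][0]["ServerlessConfig"])
```

With credentials configured, each dict would be passed as keyword arguments, for example `boto3.client("sagemaker").create_training_job(**req)`. Keeping payload construction separate from the API call makes the configuration easy to review and unit-test.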

SageMaker Studio

SageMaker Studio is a web-based IDE that provides notebook environments, experiment tracking, model registration, pipeline visualisation, and debugging tools in a single interface. It integrates with SageMaker's broader platform services and supports collaboration between team members on shared projects.

SageMaker JumpStart

JumpStart is SageMaker's model hub and solution accelerator. It provides one-click deployment of pre-trained foundation models including Llama, DeepSeek, Mistral, Qwen, and Amazon's own Nova family of models, as well as fine-tuning pipelines and industry-specific ML solution templates. JumpStart lowers the barrier to deploying state-of-the-art models by abstracting infrastructure provisioning.

SageMaker Pipelines

SageMaker Pipelines provides a directed acyclic graph (DAG) orchestration layer for building repeatable ML workflows. A pipeline can chain data preprocessing, training, evaluation, conditional deployment steps, and notification actions, with each step tracked in the experiment management system. Pipelines can be triggered on a schedule, in response to new data, or via an API call.
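The orchestration idea behind a pipeline can be sketched with Python's standard `graphlib` module. This is a conceptual model of DAG execution with a conditional deployment step, not the SageMaker Pipelines SDK. The step names, the simulated accuracy value, and the 0.90 deployment threshold are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Optional

Context = dict
Guard = Optional[Callable[[Context], bool]]
# Each step: (action, names of upstream steps, optional run condition).
Step = tuple[Callable[[Context], None], set, Guard]

ACCURACY_THRESHOLD = 0.90  # hypothetical quality gate for deployment


def preprocess(ctx: Context) -> None:
    ctx["features"] = [x / 10 for x in ctx["raw"]]


def train(ctx: Context) -> None:
    ctx["model"] = f"model-{len(ctx['features'])}-rows"


def evaluate(ctx: Context) -> None:
    # Stand-in for a real evaluation metric.
    ctx["accuracy"] = ctx["simulated_accuracy"]


def deploy(ctx: Context) -> None:
    ctx["endpoint"] = f"endpoint-for-{ctx['model']}"


STEPS: dict[str, Step] = {
    "preprocess": (preprocess, set(), None),
    "train": (train, {"preprocess"}, None),
    "evaluate": (evaluate, {"train"}, None),
    # Conditional step: deploy only if the evaluation clears the gate.
    "deploy": (deploy, {"evaluate"},
               lambda ctx: ctx["accuracy"] >= ACCURACY_THRESHOLD),
}


def run_pipeline(steps: dict[str, Step], ctx: Context) -> list[str]:
    """Run steps in dependency order; return the names of steps executed."""
    graph = {name: deps for name, (_, deps, _) in steps.items()}
    executed: list[str] = []
    skipped: set[str] = set()
    for name in TopologicalSorter(graph).static_order():
        action, deps, guard = steps[name]
        # Skip a step if an upstream step was skipped or its guard fails.
        if deps & skipped or (guard is not None and not guard(ctx)):
            skipped.add(name)
            continue
        action(ctx)
        executed.append(name)
    return executed
```

Running `run_pipeline(STEPS, {"raw": [1, 2, 3], "simulated_accuracy": 0.93})` executes all four steps, while a simulated accuracy of 0.85 stops after `evaluate`, so no endpoint is created. Propagating skips to downstream steps mirrors how a failed condition prunes a whole branch of the graph.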

SageMaker HyperPod

HyperPod is SageMaker's purpose-built infrastructure for large-scale distributed training of foundation models. It provides resilient training clusters with automatic failure detection and recovery, health-aware job scheduling, and integration with distributed training frameworks such as DeepSpeed and Megatron-LM. HyperPod targets organisations training models with tens of billions or hundreds of billions of parameters.
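Resilient training rests on a simple mechanism: persist checkpoints periodically and, when a fault is detected, roll back to the last checkpoint instead of restarting from scratch. The simulation below is a conceptual sketch of that mechanism, not HyperPod's implementation. The failure schedule and checkpoint interval are hypothetical inputs.

```python
def train_with_recovery(total_steps: int, checkpoint_every: int,
                        failures_at: set[int]) -> dict:
    """Simulate a training loop that resumes from the last checkpoint
    whenever a (simulated) fault interrupts a step."""
    last_checkpoint = 0          # last step saved to durable storage
    step = 0                     # last completed step
    restarts = 0
    lost_steps = 0               # completed work discarded by rollbacks
    pending = set(failures_at)   # each simulated fault fires once
    while step < total_steps:
        attempt = step + 1
        if attempt in pending:
            # Fault detected while running `attempt`: roll back to the last
            # checkpoint (a real cluster would also replace the faulty node).
            pending.discard(attempt)
            restarts += 1
            lost_steps += step - last_checkpoint
            step = last_checkpoint
            continue
        step = attempt
        if step % checkpoint_every == 0:
            last_checkpoint = step
    return {"restarts": restarts, "lost_steps": lost_steps, "final_step": step}
```

For example, `train_with_recovery(10, 4, {6})` finishes all 10 steps after one restart and redoes only one step (step 5), because step 4 was checkpointed. The checkpoint interval is the main design trade-off: frequent checkpoints cost I/O but bound the work lost per failure.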

SageMaker Model Monitor

Model Monitor runs scheduled monitoring jobs that detect data quality issues, model quality degradation, bias drift, and feature attribution drift in deployed models. Each job compares the statistical properties of captured inference data against a baseline, typically computed from the training dataset, and reports violations as Amazon CloudWatch metrics that can trigger alerts when significant deviations are detected.
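The baseline comparison can be illustrated with a two-sample Kolmogorov-Smirnov (KS) statistic, which measures the largest gap between the empirical distributions of a baseline sample (for example, a feature's values in the training data) and live inference data. KS is a swapped-in technique chosen for illustration; Model Monitor's own checks may use different statistics, and the 0.2 threshold is hypothetical.

```python
from bisect import bisect_right

DRIFT_THRESHOLD = 0.2  # hypothetical; tune per feature and sample size


def ks_statistic(baseline: list[float], live: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical
    CDFs of the baseline and live samples."""
    if not baseline or not live:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(live)
    # Both empirical CDFs are step functions that change only at observed
    # values, so checking every observed value finds the maximum gap.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def check_drift(baseline: list[float], live: list[float],
                threshold: float = DRIFT_THRESHOLD) -> dict:
    """Flag drift when the KS statistic exceeds the threshold."""
    d = ks_statistic(baseline, live)
    return {"ks_statistic": d, "drift": d > threshold}
```

Identical samples give a statistic of 0, while fully separated samples (for example, values 1 to 10 against 11 to 20) give 1 and are flagged as drift.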

Pricing Model

SageMaker charges separately for compute consumed by training jobs and endpoints, for storage, and for any additional services used. Training instances are billed per second. Provisioned real-time endpoints are charged per instance-hour for as long as their instances are running, whether or not they receive traffic, while serverless endpoints are charged for the compute time spent processing requests plus the amount of data processed. SageMaker Savings Plans offer discounts of up to 64% on eligible usage in exchange for a commitment to a consistent amount of usage, measured in dollars per hour, over a one- or three-year term.
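The arithmetic behind per-second billing and a Savings Plans discount can be worked through as follows. The hourly rates and the 40% discount are hypothetical figures, not actual AWS prices, and applying a flat discount to a bill simplifies how Savings Plans actually match usage against an hourly commitment.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def per_second_cost(hourly_rate: Decimal, seconds: int) -> Decimal:
    """Cost of instance usage billed per second at a given hourly price."""
    return (hourly_rate * seconds / 3600).quantize(CENT, rounding=ROUND_HALF_UP)


def with_savings_plan(cost: Decimal, discount: Decimal) -> Decimal:
    """Apply a flat discount rate, e.g. Decimal("0.40") for 40% off."""
    if not Decimal(0) <= discount <= Decimal(1):
        raise ValueError("discount must be between 0 and 1")
    return (cost * (1 - discount)).quantize(CENT, rounding=ROUND_HALF_UP)


# 90-minute training job on a hypothetical $4.00/hour instance.
job = per_second_cost(Decimal("4.00"), 90 * 60)
# A hypothetical $1.50/hour endpoint left running for 30 days.
endpoint = per_second_cost(Decimal("1.50"), 30 * 24 * 3600)
print(job, with_savings_plan(job, Decimal("0.40")), endpoint)
```

Under these assumptions the training job costs $6.00 ($3.60 after the discount), while the always-on endpoint costs $1,080.00 for the month. This contrast is why idle provisioned endpoints, not training jobs, often dominate a bill.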

Competitive Position

| Platform | Primary Cloud | Key Differentiator |
|---|---|---|
| Amazon SageMaker | AWS | Breadth of integrated ML services |
| Google Vertex AI | Google Cloud | Integration with Google foundation models |
| Azure Machine Learning | Microsoft Azure | Integration with Microsoft tools and OpenAI |
| IBM watsonx | IBM Cloud | Enterprise governance and explainability |

