
Google Vertex AI

Google Vertex AI is a unified machine learning platform on Google Cloud that consolidates data preparation, model training, deployment, and monitoring for both custom-built models and Google's foundation models including Gemini.

6 min read | Last updated May 2026 | Companies & Tools

Google Vertex AI is Google Cloud's managed machine learning platform, providing an integrated set of services for building, deploying, and managing machine learning models. Announced at Google I/O in May 2021, Vertex AI consolidated and rebranded several earlier services, chiefly AI Platform and the standalone AutoML products, into a single product surface. It supports traditional supervised and unsupervised machine learning workflows as well as a broad catalogue of foundation models, including Google's first-party Gemini family and third-party models accessible through the Model Garden.

Platform Architecture

Vertex AI is organised around a small number of core resources: datasets, training pipelines, models, endpoints, and predictions. Source data typically resides in Google Cloud Storage or BigQuery and can be ingested into Vertex AI as managed datasets. Training jobs run on Google-managed infrastructure (CPUs, NVIDIA GPUs, and Google's Tensor Processing Units, or TPUs) and produce model artifacts that are registered in the Vertex AI Model Registry. Registered models are either deployed to endpoints, which serve auto-scaling online predictions, or run in batch prediction mode against large datasets.
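The core resources above are addressed in the API by fully qualified resource names of the form projects/PROJECT/locations/LOCATION/COLLECTION/ID. The following is a minimal sketch of building and parsing such names, not part of any Google SDK: the class ResourceName and the function parse_resource_name are hypothetical helpers, the project and ID values are made up, and nested resources with deeper paths are deliberately not handled.

```python
import re
from dataclasses import dataclass

# Matches top-level names such as
# projects/demo-project/locations/asia-southeast1/endpoints/123
# (nested resources with longer paths are out of scope for this sketch).
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/(?P<collection>[^/]+)/(?P<resource_id>[^/]+)$"
)


@dataclass(frozen=True)
class ResourceName:
    """Hypothetical helper type; not part of the Vertex AI SDK."""
    project: str
    location: str
    collection: str   # e.g. "datasets", "models", "endpoints"
    resource_id: str

    def __str__(self) -> str:
        return (
            f"projects/{self.project}/locations/{self.location}"
            f"/{self.collection}/{self.resource_id}"
        )


def parse_resource_name(name: str) -> ResourceName:
    """Split a fully qualified resource name into its four components."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"not a fully qualified resource name: {name!r}")
    return ResourceName(**match.groupdict())


if __name__ == "__main__":
    endpoint = ResourceName("demo-project", "asia-southeast1", "endpoints", "123")
    print(endpoint)
    print(parse_resource_name(str(endpoint)).collection)
```

Keeping the location inside the name makes it explicit that datasets, models, and endpoints are regional resources, which matters when data residency is a requirement.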

Auxiliary services include Vertex AI Pipelines, an orchestration service based on Kubeflow Pipelines; Vertex AI Workbench, a managed Jupyter notebook environment; Vertex AI Feature Store, a managed online and offline feature store; Vertex AI Matching Engine (since renamed Vector Search), a managed vector similarity search service; Vertex AI Model Monitoring, for drift and skew detection; and Vertex AI Experiments, for tracking machine learning experiments.
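To make the idea of vector similarity search concrete, the sketch below ranks stored embedding vectors by cosine similarity to a query vector. It is an exhaustive, in-memory illustration only: a managed service would use approximate nearest-neighbour indexing at far larger scale, and the function names, item IDs, and two-dimensional vectors here are all invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (item_id, score) pairs most similar to the query.

    `index` maps item IDs to embedding vectors. Every vector is scored,
    so the cost grows linearly with the size of the index.
    """
    scored = [
        (cosine_similarity(query, vector), item_id)
        for item_id, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


if __name__ == "__main__":
    # Toy embeddings; real embeddings usually have hundreds of dimensions.
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for item_id, score in top_k([1.0, 0.1], index):
        print(item_id, round(score, 3))
```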

Foundation Models and Generative AI

Vertex AI hosts Google's first-party foundation models through the Generative AI on Vertex AI service. The Gemini family, including Gemini Pro, Gemini Ultra, and successor releases, is available with multimodal input across text, image, audio, and video; Gemini Nano, the smallest variant, is designed for on-device use rather than cloud serving. Imagen provides text-to-image generation, Imagen Edit and Imagen Customisation support image editing, Chirp is a multilingual speech recognition model, and Codey is the family of code generation and completion models.

The Model Garden offers a curated catalogue of foundation models that includes Google's own models alongside selected partner and open-source models such as Anthropic's Claude, Meta's Llama series, Mistral's open models, and others. Models are accessed through a consistent API surface, with managed serving infrastructure that handles scaling, observability, and tenancy isolation.

AutoML

AutoML in Vertex AI lets users without deep machine learning expertise train models on tabular, image, video, and text data by uploading labelled examples; the system then searches model architectures and hyperparameters automatically. These modalities correspond to the earlier standalone products AutoML Tables, AutoML Vision, AutoML Video Intelligence, and AutoML Natural Language. The trained models can be deployed to Vertex AI endpoints or exported as containers for deployment elsewhere, including edge devices via Coral.

MLOps Capabilities

Vertex AI provides end-to-end MLOps functionality. Pipelines define the sequence of data preparation, training, evaluation, and deployment steps as a directed acyclic graph that runs in a managed environment with caching and lineage tracking. The Model Registry stores model versions with metadata, evaluations, and approval status, supporting model governance requirements. Endpoint traffic splitting enables canary and blue-green deployments. Model Monitoring tracks input feature distributions, prediction distributions, and explanation-based attributions over time, generating alerts when drift exceeds configured thresholds.
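The drift alerting described above can be illustrated with a small distance computation. The sketch below compares the distribution of one categorical input feature in a baseline window with a recent serving window, using L-infinity distance (the largest absolute difference in any category's share). This is one common choice for categorical features and is not claimed to be the service's exact computation; the function names, the payment-method feature, and the 0.3 threshold are illustrative assumptions.

```python
from collections import Counter


def category_distribution(values):
    """Map each category to its share of the observed values."""
    if not values:
        raise ValueError("at least one value is required")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in share across all categories."""
    categories = set(baseline) | set(current)
    return max(
        abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories
    )


def drift_alert(baseline_values, current_values, threshold=0.3):
    """Return (distance, alert) where alert is True above the threshold."""
    distance = l_infinity_distance(
        category_distribution(baseline_values),
        category_distribution(current_values),
    )
    return distance, distance > threshold


if __name__ == "__main__":
    # Hypothetical feature: payment method seen at training vs. serving time.
    baseline = ["card"] * 6 + ["cash"] * 4
    recent = ["card"] * 2 + ["cash"] * 3 + ["ewallet"] * 5
    distance, alert = drift_alert(baseline, recent)
    print(f"distance={distance:.2f} alert={alert}")
```

In this toy data the new "ewallet" category accounts for half of recent traffic, so its share moves from 0.0 to 0.5 and dominates the distance. Numerical features would need a different measure, for example a divergence computed over binned values.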

Integration and Pricing

Vertex AI integrates natively with other Google Cloud services including BigQuery for data warehousing, Dataflow for streaming and batch data processing, Pub/Sub for messaging, Cloud Storage for object storage, and Identity and Access Management (IAM) for permissions. Pricing combines compute charges for training and prediction, storage charges for datasets and model artifacts, and per-call or per-token charges for foundation model APIs.
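As a worked illustration of how these three pricing components add up, the sketch below estimates a monthly bill from compute hours, stored gigabytes, and foundation model token counts. Every rate in it is a placeholder chosen for easy arithmetic, not a published Google Cloud price, and the Rates, Usage, and estimate_monthly_cost names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices in an arbitrary currency (not real prices)."""
    node_hour: float                  # training or prediction compute
    storage_gb_month: float           # datasets and model artifacts
    input_per_million_tokens: float   # foundation model input tokens
    output_per_million_tokens: float  # foundation model output tokens


@dataclass(frozen=True)
class Usage:
    node_hours: float
    storage_gb: float
    input_tokens: int
    output_tokens: int


def estimate_monthly_cost(rates: Rates, usage: Usage) -> dict:
    """Break a month's usage into compute, storage, and token charges."""
    compute = usage.node_hours * rates.node_hour
    storage = usage.storage_gb * rates.storage_gb_month
    tokens = (
        usage.input_tokens * rates.input_per_million_tokens
        + usage.output_tokens * rates.output_per_million_tokens
    ) / 1_000_000
    return {
        "compute": compute,
        "storage": storage,
        "tokens": tokens,
        "total": compute + storage + tokens,
    }


if __name__ == "__main__":
    rates = Rates(2.0, 0.02, 1.0, 2.0)  # illustrative values only
    # 1,000 requests a day for 30 days, 500 input and 200 output tokens each.
    usage = Usage(
        node_hours=100,
        storage_gb=500,
        input_tokens=1_000 * 500 * 30,
        output_tokens=1_000 * 200 * 30,
    )
    for item, amount in estimate_monthly_cost(rates, usage).items():
        print(f"{item}: {amount:.2f}")
```

With these placeholder rates the compute charge (100 node hours at 2.0) is much larger than the token charge (15 million input and 6 million output tokens); real workloads can show the opposite balance, so each component is worth estimating separately.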

Regional Availability

Vertex AI is available in many Google Cloud regions worldwide. The Singapore region (asia-southeast1) has hosted Vertex AI services from launch, providing low-latency access for Southeast Asian customers. Google Cloud's announced Malaysia region (Kuala Lumpur) was confirmed to open with Vertex AI capabilities, supporting in-country data residency for Malaysian customers subject to the Personal Data Protection Act (PDPA) and sectoral regulations. The identifier asia-southeast2 belongs to the Jakarta region and does not designate the Malaysia region.
