What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Foundation Model

A large-scale AI model pretrained on broad, diverse datasets and designed to be adapted to a wide range of downstream tasks through fine-tuning, prompting, or retrieval augmentation.

6 min read | Last updated June 2026 | Models

A foundation model is a large-scale artificial intelligence model trained on broad, diverse datasets (typically at internet scale) using self-supervised or weakly supervised learning, and subsequently adapted to a wide variety of downstream tasks. The term was introduced in a 2021 report by the Center for Research on Foundation Models (CRFM) at Stanford University (Bommasani et al., 2021), which characterised foundation models by two properties: they are trained on broad data at scale, and they can be adapted to a wide range of downstream tasks. Common adaptation routes include fine-tuning, prompting, and retrieval augmentation.

The "foundation" metaphor is deliberate: these models serve as a common base upon which task-specific applications are built, rather than training a separate specialised model for each application. Practitioners leverage the general representations learned during pretraining and invest relatively little additional compute in task-specific adaptation.

Pretraining at Scale

Foundation models are distinguished above all by the scale of their pretraining. They are trained on datasets comprising hundreds of billions to trillions of tokens of text, billions of images, or combinations of modalities, using distributed training across thousands of specialised accelerators (GPUs or TPUs) over weeks or months.

The pretraining objective varies by modality. Language foundation models typically use next-token prediction (autoregressive language modelling) or masked token prediction (as in BERT). Vision foundation models use contrastive objectives, masked image modelling, or image-text alignment (as in CLIP). Multimodal models combine these approaches.
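To make the autoregressive objective concrete, the following is a minimal sketch in plain Python. The three-token vocabulary, the hand-written logits, and the function names are all hypothetical; a real model would produce the logits itself. The sketch computes the average cross-entropy of predicting each next token from the prefix before it, which is the quantity next-token pretraining minimises.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from the prefix up to t.

    logits_per_position[t] holds the (hypothetical) model scores over the
    vocabulary after reading token_ids[: t + 1]; the target is token_ids[t + 1].
    """
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy data: a vocabulary of 3 tokens and the sequence [0, 2, 1].
tokens = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # model knows nothing
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]    # model favours the right token

uniform_loss = next_token_loss(uniform_logits, tokens)
confident_loss = next_token_loss(confident_logits, tokens)
```

With uniform logits the loss equals the natural log of the vocabulary size; as the model assigns more probability to the correct next token, the loss falls towards zero.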

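Scaling laws describe how model performance changes with size. Kaplan and colleagues at OpenAI (2020) found empirically that a language model's test loss falls predictably, as a power law, with increases in model parameters, training data, and compute, provided none of the three becomes a bottleneck. Hoffmann and colleagues at DeepMind (Chinchilla, 2022) refined the recipe, finding that for a fixed compute budget, parameters and training tokens should be scaled in roughly equal proportion, which implies that many earlier large models were undertrained for their size. These regularities have guided the design of successive model generations.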
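One way to read the Chinchilla result is as a budgeting rule. The sketch below rests on two widely used approximations that are assumptions here, not statements from this article: training compute is roughly 6 FLOPs per parameter per token, and a compute-optimal run uses roughly 20 training tokens per parameter. The function name and constants are illustrative.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6   # assumed approximation: C ~= 6 * N * D
TOKENS_PER_PARAM = 20       # assumed rule of thumb drawn from Hoffmann et al. (2022)

def compute_optimal_split(compute_flops):
    """Split a training compute budget into (parameters, tokens).

    Solves C = 6 * N * D with D = 20 * N, which gives N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Budget matching Chinchilla's reported configuration (70B parameters, 1.4T tokens).
budget = 6 * 70e9 * 1.4e12
params, tokens = compute_optimal_split(budget)
```

Under these assumptions the budget above resolves back to about 70 billion parameters and 1.4 trillion tokens. Real planning would fit the constants to measured loss curves instead of relying on the rule of thumb.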

Architecture

The dominant architecture for foundation models is the Transformer, introduced by Vaswani and colleagues in 2017. Its self-attention mechanism lets every token attend to every other token and parallelises well across accelerator arrays, although in the standard formulation its compute and memory cost grows quadratically with context length. GPT-family models use a decoder-only Transformer; BERT uses an encoder-only architecture; T5 uses an encoder-decoder design. The Llama family, Mistral, and Qwen use decoder-only architectures with refinements including grouped-query attention, rotary positional embeddings (RoPE), and SwiGLU activation functions.
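The core of a decoder-only Transformer can be illustrated with a single attention head. The sketch below is a plain-Python toy, assuming no learned projection matrices and only one head (real models use many heads, learned projections, and optimised tensor libraries). It implements scaled dot-product attention with a causal mask, so each position attends only to itself and earlier positions.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per position. Position i
    attends only to positions 0..i, as in a decoder-only Transformer.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Score the query against the current and earlier keys only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w[j] * values[j][c] for j in range(i + 1))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy input: three positions with two-dimensional embeddings, used as Q, K and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
```

The first position can only see itself, so its output equals its own value vector; later positions blend in the positions before them.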

Adaptation Methods

Foundation models are not typically deployed directly from pretraining weights. Several adaptation strategies are used.

Fine-tuning updates some or all of the model's parameters on a task-specific labelled dataset. Full fine-tuning is computationally expensive; parameter-efficient methods such as LoRA (Low-Rank Adaptation) and adapters instead train a small number of added parameters while keeping the pretrained weights frozen.
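The idea behind LoRA can be sketched in a few lines. The example below is a toy in plain Python, not the API of any particular library: the helper names and the tiny matrices are hypothetical. It follows the formulation in the LoRA paper, where a frozen weight matrix W is supplemented by a trainable low-rank product B·A scaled by alpha / r, and B starts at zero so that training begins from the pretrained behaviour.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; only A (r x d_in) and
    B (d_out x r) would be updated during fine-tuning.
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

def trainable_fraction(d_out, d_in, rank):
    """Share of parameters trained by LoRA relative to the full matrix."""
    return rank * (d_out + d_in) / (d_out * d_in)

W = [[1.0, 0.0], [0.0, 1.0]]        # frozen 2 x 2 weight
A = [[1.0, 1.0]]                    # rank-1 factor, 1 x 2
B_zero = [[0.0], [0.0]]             # zero initialisation, 2 x 1
B_trained = [[0.5], [-0.5]]         # hypothetical values after training
x = [2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, alpha=1.0)
y_trained = lora_forward(W, A, B_trained, x, alpha=1.0)
fraction = trainable_fraction(4096, 4096, 8)
```

For a 4096 by 4096 weight matrix and rank 8, the two factors hold about 0.39% as many parameters as the full matrix, which is where the compute and memory savings come from.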

Prompting and in-context learning bypass weight updates entirely. A carefully constructed natural language prompt — potentially including examples of the task — is prepended to the input, and the model generates the desired output in a zero-shot or few-shot setting.
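A few-shot prompt is ordinary text assembled before the model is called. The sketch below shows one way to build it; the function name, the "Review" and "Sentiment" labels, and the sample data are hypothetical, and no model call is made. Passing an empty example list yields the zero-shot form.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unanswered for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

instruction = "Classify the sentiment of each review as positive or negative."
examples = [
    ("The delivery was fast and the staff were helpful.", "positive"),
    ("The app crashed every time I tried to pay.", "negative"),
]
query = "Setup took five minutes and everything worked."

few_shot_prompt = build_few_shot_prompt(instruction, examples, query)
zero_shot_prompt = build_few_shot_prompt(instruction, [], query)
```

The model's weights are untouched in both cases; only the text it conditions on changes.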

Retrieval-augmented generation (RAG) augments a frozen foundation model with an external knowledge base, enabling it to incorporate facts not present in its training data without retraining.
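The retrieval step can be sketched without any external services. The example below is a simplified stand-in: it ranks documents by bag-of-words cosine similarity, whereas production systems typically use dense embeddings and a vector index. The function names, the prompt template, and the sample documents are hypothetical, and the final call to a foundation model is omitted.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(query, documents, k=1):
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

docs = [
    "LoRA fine-tunes a small number of added parameters.",
    "JENDELA is Malaysia's national broadband improvement plan.",
    "The Transformer architecture was introduced in 2017.",
]
question = "What is JENDELA?"
top = retrieve(question, docs, k=1)
prompt = build_rag_prompt(question, docs, k=1)
```

The resulting prompt would then be sent to the frozen model, which answers from the supplied context instead of relying only on what it memorised during pretraining.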

Reinforcement learning from human feedback (RLHF) further aligns a pretrained model with human preferences and instructions, producing instruction-following models such as ChatGPT and Claude.
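RLHF pipelines commonly begin by fitting a reward model on pairs of responses that human labellers have ranked. That step is an assumption beyond what this article describes, and the sketch below covers only the pairwise preference loss often used for it; the function name and the reward values are hypothetical. The loss is small when the reward model scores the preferred response higher than the rejected one.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Rewritten as log(1 + exp(-margin)), which is the same quantity.
    """
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# Hypothetical reward-model scores for a preferred and a rejected response.
loss_agrees = preference_loss(2.0, 0.0)      # reward model agrees with the labeller
loss_undecided = preference_loss(0.0, 0.0)   # reward model has no preference
loss_disagrees = preference_loss(0.0, 2.0)   # reward model disagrees with the labeller
```

Once trained, the reward model supplies the signal that a reinforcement learning algorithm uses to adjust the language model's outputs.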

Notable Examples

As of mid-2026, prominent foundation models include GPT-4 and GPT-5 (OpenAI), Claude 3 Opus and Claude Sonnet 4 (Anthropic), Gemini 2.5 Pro (Google DeepMind), Llama 3.x and Llama 4 (Meta), Mistral Large (Mistral AI), Qwen 2.5 and Qwen 3 (Alibaba), and DeepSeek-V3 (DeepSeek AI). These span text, vision, code, audio, and multimodal capabilities, and are deployed via cloud APIs, on-premises installations, and edge devices depending on model size.

Open-weight models — whose parameters are publicly released — such as Llama and Mistral have enabled a broad ecosystem of derivative models fine-tuned for specific languages, domains, and applications.

Governance and Concerns

Foundation models raise policy and ethical concerns. Their training data often contains copyrighted material, personal information, and biased content, raising questions about intellectual property, privacy, and fairness. The concentration of capability in a small number of large organisations, a consequence of the enormous capital required to train at scale, has prompted discussions about access, competition, and AI governance. The EU AI Act (2024) places transparency obligations on providers of general-purpose AI (GPAI) models and presumes that models trained with more than 10^25 floating-point operations pose systemic risk, which triggers additional evaluation and safety obligations.

Malaysian Context — Foundation Models and National AI Strategy

Foundation models are central to Malaysia's ambitions under the Malaysia AI Roadmap and the MyDigital Blueprint. Cloud hyperscalers — Microsoft Azure, Amazon Web Services, and Google Cloud — have announced significant data centre investments in Malaysia to serve the Southeast Asian market, making foundation model APIs widely accessible to Malaysian businesses.

Amazon Web Services, through Amazon Bedrock, provides API access to a range of foundation models including Claude, Llama, and Titan, and has engaged with Malaysian enterprises through its local operations. Microsoft's Azure AI platform delivers GPT-4 and related models through local partners. These cloud-delivered foundation models have been adopted by Malaysian banks (Maybank, CIMB, Public Bank), telecommunications companies (CelcomDigi, Maxis), and government agencies as the basis for customer service automation, document processing, and internal productivity tools.

MDEC has worked with cloud providers to make foundation model access available to Malaysian SMEs through the Malaysia Digital SME Programme, subsidising API usage and providing training resources. HRD Corp has approved foundation-model-focused upskilling modules covering fine-tuning, prompt engineering, and RAG as part of its human capital development mandate.

AI Teragrid (AITG Sdn Bhd), a Penang-based AI infrastructure company, builds enterprise AI solutions on top of foundation models from Amazon Bedrock and other providers, implementing retrieval-augmented generation and multi-agent systems for Malaysian and Southeast Asian clients. The national broadband improvements under Jalinan Digital Negara (JENDELA) are reducing latency for API-based access to cloud-hosted foundation models in East Malaysia and rural areas.

Malaysia's National AI Office is considering policies around domestic foundation model development, including whether to invest in training Bahasa Malaysia-capable foundation models to serve government, education, and public service use cases with greater linguistic fidelity than English-first international models.

References

Bommasani, R., et al. (2021). On the opportunities and risks of foundation models. arXiv:2108.07258. Stanford CRFM.
Kaplan, J., et al. (2020). Scaling laws for neural language models. arXiv:2001.08361. OpenAI.
Hoffmann, J., et al. (2022). Training compute-optimal large language models. arXiv:2203.15556. DeepMind.
Vaswani, A., et al. (2017). Attention is all you need. NeurIPS 2017.
AWS. (2025). What are foundation models? Amazon Web Services documentation. aws.amazon.com.

Tags: foundation model, pretraining, large language model, transfer learning, generative AI

Type	AI model class
Term coined	Stanford CRFM, 2021
Key examples	GPT-4, Claude, Gemini, Llama, Stable Diffusion
Training approach	Self-supervised pretraining on internet-scale data
Related	Large language model, transfer learning, fine-tuning, RLHF
