What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning technique that adapts large pre-trained models by injecting small trainable low-rank matrices into transformer layers, drastically reducing the number of trainable parameters with little or no loss in task performance.

6 min read | Last updated May 2026 | Applications

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that allows large pre-trained language models — and, increasingly, diffusion models and vision models — to be adapted for specific tasks by training only a small number of additional parameters while keeping the original model weights frozen. Introduced by Edward Hu, Yelong Shen, and colleagues at Microsoft Research in 2021, LoRA has become one of the most widely deployed fine-tuning techniques in the industry, enabling practitioners to customise foundation models at a fraction of the computational and financial cost of full fine-tuning.

The Problem LoRA Solves

Full fine-tuning of a large language model requires updating every weight in the model on task-specific data. For models with tens or hundreds of billions of parameters — such as GPT-4, LLaMA 3, or Mistral Large — this demands GPU clusters with hundreds of gigabytes of memory and days or weeks of training time. The resulting fine-tuned model also occupies the same storage footprint as the original, making it impractical to maintain separate fine-tuned versions for multiple downstream tasks.

LoRA addresses this by operating on the hypothesis that the weight updates required for task adaptation have low intrinsic rank. That is, although each weight matrix in a transformer is large, potentially thousands of dimensions on each side, the meaningful change to that matrix during adaptation can be approximated by the product of two much smaller matrices whose shared inner dimension (the rank) is small. The update itself has the same shape as the original matrix, but far fewer numbers are needed to describe it.

Mathematical Foundation

Given a weight matrix W ∈ ℝ^(d×k) in the pre-trained model, LoRA learns two small matrices: A ∈ ℝ^(r×k) and B ∈ ℝ^(d×r), where r is the rank hyperparameter and r ≪ min(d, k). During the forward pass, the effective weight used is W + BA, where BA is the low-rank approximation of the update ΔW. The matrices A and B are initialised so that BA = 0 at the start of training (A is randomly initialised; B is zero-initialised), ensuring the model begins fine-tuning from the pre-trained baseline.
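The forward pass and initialisation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the layer sizes, the initialisation scale for A, and the helper name lora_forward are assumptions chosen for readability, and the scaling factor α/r that the original paper applies to the update is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumption); real transformer layers are far larger.
d, k, r = 8, 6, 2

W = rng.normal(size=(d, k))              # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, randomly initialised
B = np.zeros((d, r))                     # trainable, zero-initialised


def lora_forward(x, W, A, B):
    """Compute (W + BA) x without materialising the d x k product BA."""
    return W @ x + B @ (A @ x)


x = rng.normal(size=(k,))

# Because B is zero, BA is zero and the adapted layer matches the base layer.
starts_at_baseline = np.allclose(lora_forward(x, W, A, B), W @ x)

# After training, B is non-zero; a random stand-in shows the update has rank <= r.
B_trained = rng.normal(size=(d, r))
rank_after = np.linalg.matrix_rank(B_trained @ A)
```

Computing B @ (A @ x) instead of (B @ A) @ x keeps the intermediate result at size r, which is where the memory and compute savings during training come from.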

Only A and B are trained; W remains frozen throughout. The rank r controls the capacity of the adaptation — higher rank allows the model to learn more complex task-specific transformations at the cost of more trainable parameters. In practice, r values of 4 to 64 are common, and the ratio of trainable to total parameters is typically 0.01%–1%.
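To make the effect of the rank concrete, the sketch below counts trainable parameters for a single weight matrix. The 4096 × 4096 layer size and the function names are assumptions for illustration. Note that these are per-matrix ratios; the model-wide ratio is lower, because LoRA is usually applied to only some of a model's weight matrices.

```python
def full_param_count(d, k):
    """Parameters updated when the whole d x k matrix is fine-tuned."""
    return d * k


def lora_param_count(d, k, r):
    """Parameters in A (r x k) plus B (d x r)."""
    return r * (d + k)


d = k = 4096  # hypothetical layer size, chosen for illustration

ratios = {}
for r in (4, 16, 64):
    ratios[r] = lora_param_count(d, k, r) / full_param_count(d, k)
    print(f"r={r:>2}: {lora_param_count(d, k, r):>7} trainable "
          f"of {full_param_count(d, k)} ({ratios[r]:.4%})")
```

The trainable count grows linearly with r, while the size of the frozen matrix is unaffected.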

Efficiency Gains

The reduction in trainable parameters is substantial. Compared to full fine-tuning of GPT-3 175B using the Adam optimiser, LoRA reduces the number of trainable parameters by approximately 10,000 times and GPU memory requirements by roughly three times. Because the LoRA matrices A and B can be merged directly back into the original weight matrix W after training — by computing W' = W + BA — there is zero additional inference latency. This contrasts with adapter-based methods that insert additional computations into the forward pass.
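The merge step can be checked numerically. The sketch below uses random matrices as stand-ins for trained adapters (an assumption; no real model is involved) and confirms that the merged weight W' = W + BA gives the same outputs as running the base layer and the adapter side by side.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 8, 6, 2  # illustrative sizes (assumption)

W = rng.normal(size=(d, k))  # frozen base weight
A = rng.normal(size=(r, k))  # stand-in for a trained A
B = rng.normal(size=(d, r))  # stand-in for a trained B

# Merge once after training: same shape as W, so inference cost is unchanged.
W_merged = W + B @ A

x = rng.normal(size=(5, k))  # batch of 5 inputs, one per row

y_adapter = x @ W.T + (x @ A.T) @ B.T  # base path plus separate adapter path
y_merged = x @ W_merged.T              # single matrix multiply

outputs_match = np.allclose(y_adapter, y_merged)

# Subtracting BA recovers the base weight, so adapters can be swapped out again.
W_restored = W_merged - B @ A
```

Because the merged matrix has exactly the shape of W, the deployed layer performs one matrix multiply, the same as the unadapted model.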

QLoRA: Combining Quantisation and LoRA

QLoRA, introduced by Tim Dettmers and colleagues in 2023, combines LoRA with 4-bit quantisation of the base model weights. The base model is loaded in a compressed 4-bit format (using NormalFloat4, a quantisation scheme optimised for normally distributed weights), dramatically reducing memory requirements, while LoRA adapters are trained in higher precision. QLoRA makes it feasible to fine-tune models with tens of billions of parameters on a single consumer GPU, democratising access to custom model development. A 65B parameter model can be fine-tuned on a single 48 GB GPU using QLoRA, a task that would previously have required a multi-GPU cluster.
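A rough back-of-the-envelope calculation shows why 4-bit storage matters for the 65B example. The sketch counts only the base model weights; it deliberately ignores quantisation constants, the LoRA adapters, gradients, optimiser state, and activations, all of which add to the real footprint. The function name and the use of decimal gigabytes are assumptions.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 65_000_000_000  # the 65B model mentioned above

fp16_gb = weight_memory_gb(n_params, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(n_params, 4)    # 4-bit weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {nf4_gb:.1f} GB")
```

Under these simplifications the 16-bit weights alone come to 130 GB, well beyond a 48 GB card, while the 4-bit weights come to 32.5 GB, leaving headroom for the adapters and training overhead.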

Application to Diffusion and Vision Models

LoRA was initially developed for language models but has been widely adopted for fine-tuning text-to-image diffusion models such as Stable Diffusion. In this context, LoRA adapters encode the visual style, subject, or composition of a small set of reference images, allowing the base model to generate new images consistent with the learned concept without modifying the base model itself. Platforms such as Civitai host thousands of community-trained LoRA adapters for Stable Diffusion, illustrating the breadth of the technique's adoption.

Variants and Extensions

DoRA (Weight-Decomposed Low-Rank Adaptation) decomposes each pre-trained weight matrix into magnitude and direction components, training the magnitude directly and applying a LoRA update to the direction; its authors report better performance than LoRA at the same rank. AdaLoRA adaptively allocates the rank budget across different weight matrices based on their estimated importance, concentrating capacity where it matters most. LoftQ initialises the LoRA matrices to compensate for the quantisation error of the base model, improving convergence when LoRA is combined with quantisation.

Malaysian Context — LoRA Adoption in Malaysian AI Development

LoRA and related parameter-efficient fine-tuning techniques are directly relevant to Malaysia's enterprise AI development landscape, where organisations seek to customise large foundation models for local languages, industry-specific terminology, and regulatory requirements without incurring the cost of training large models from scratch or fully fine-tuning them. The commercial availability of Bahasa Malaysia datasets and the development of multilingual models with Malay language support have created practical use cases for LoRA-adapted models across the public sector, media, and financial services.

Several Malaysian AI startups and technology companies have employed LoRA to adapt models such as LLaMA and Mistral for Bahasa Malaysia chatbots, legal document analysis, and Malay-language summarisation tools. The technique's low hardware requirements — particularly when combined with QLoRA — allow Malaysian teams to achieve domain adaptation using cloud GPU instances or even high-end workstations rather than dedicated AI clusters, substantially reducing development costs.

ILMU, described as Malaysia's first home-grown large language model and deployed by digital banks, illustrates the broader trend of building or adapting language models for local linguistic and regulatory contexts. LoRA-style fine-tuning is a natural fit for such efforts, allowing the base model to be adapted for specific banking use cases — such as interpreting Bank Negara Malaysia (BNM) policy documents or responding to customer queries in Malaysian English — with minimal compute.

HRD Corp-accredited AI training programmes in Malaysia increasingly cover parameter-efficient fine-tuning methods, reflecting industry demand for practitioners who can adapt large models to enterprise requirements. MDEC's AI initiatives and the Malaysia Digital Economy Blueprint (MyDIGITAL) both encourage local capability building in AI model development, for which LoRA has become a standard practitioner skill given its accessibility and demonstrated effectiveness.

References

Hu, E. J., Shen, Y., Wallis, P., et al. (2021). LoRA: Low-rank adaptation of large language models. arXiv:2106.09685.
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of quantized LLMs. arXiv:2305.14314.
Liu, S.-Y., Wang, C.-Y., Yin, H., et al. (2024). DoRA: Weight-decomposed low-rank adaptation. ICML 2024.
Zhang, Q., Chen, M., Bukharin, A., et al. (2023). AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning. ICLR 2023.
IBM. (2025). What is LoRA (Low-Rank Adaptation)? IBM Think. https://www.ibm.com/think/topics/lora
Raschka, S. (2023). Practical tips for finetuning LLMs using LoRA. Ahead of AI Newsletter.

Tags: LoRA fine-tuning parameter-efficient LLM adaptation

Type	Parameter-efficient fine-tuning (PEFT) method
Introduced by	Edward Hu, Yelong Shen et al. (Microsoft Research, 2021)
Paper	arXiv:2106.09685
Key benefit	Up to 10,000x fewer trainable parameters vs full fine-tuning
Related	Fine-tuning, PEFT, Transformer architecture, Quantisation