Fine-Tuning

The process of further training a pre-trained machine learning model on a smaller, task-specific dataset to adapt its weights for a particular domain, task, or desired behaviour.

Last updated May 2026 | Applications

Fine-tuning is the process of taking a neural network that has already been trained on a large, general dataset and continuing to train it on a smaller, more specific dataset to adapt its behaviour for a particular task or domain. The technique sits at the intersection of transfer learning and supervised training, and it has become a standard way of adapting large language models (LLMs) and other foundation models for production use.

The core intuition is that a large pre-trained model has already learned general representations of language, vision, or other modalities at significant computational expense. Fine-tuning allows practitioners to leverage these representations without starting from scratch, dramatically reducing the data, time, and compute required to achieve strong performance on a new task.

Full Fine-Tuning

In full fine-tuning, all parameters of the pre-trained model are updated during training on the target dataset. This approach can yield the best task-specific performance because every weight is free to adapt. However, it is computationally intensive: fine-tuning a model with 70 billion parameters requires storing not only the model weights but also a gradient and optimiser state for every weight, placing heavy demands on GPU memory.[^1] Full fine-tuning also risks catastrophic forgetting, the degradation of the model's general capabilities as it specialises. A low learning rate, learning rate warmup, and regularisation such as weight decay help mitigate this.
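As a rough illustration of why memory becomes the bottleneck, the sketch below estimates the training state for a 70-billion-parameter model. It assumes mixed-precision training with an Adam-style optimiser at about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for full-precision master weights and two moment estimates). The function name and byte counts are illustrative assumptions rather than figures from the cited sources, and activation memory is ignored.

```python
def full_finetune_memory_gb(n_params: float,
                            weight_bytes: int = 2,
                            grad_bytes: int = 2,
                            optimizer_bytes: int = 12) -> float:
    """Estimate memory for weights, gradients and optimiser state in GB (10^9 bytes).

    The per-parameter byte counts are assumptions about a mixed-precision
    Adam setup; other optimisers and precisions will give different totals.
    """
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return n_params * bytes_per_param / 1e9


estimate_70b = full_finetune_memory_gb(70e9)
print(f"{estimate_70b:.0f} GB")  # prints "1120 GB"
```

Under these assumptions the training state alone exceeds a terabyte, which is why full fine-tuning at this scale is normally sharded across many GPUs.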

Parameter-Efficient Fine-Tuning (PEFT)

Parameter-efficient fine-tuning (PEFT) methods address the memory and compute costs of full fine-tuning by keeping most of the base model frozen and training only a small subset of parameters. IBM describes PEFT as selectively modifying only a small portion of an LLM's parameters, either by adding new layers or by adjusting existing ones in a task-specific manner, while achieving performance comparable to full fine-tuning at a fraction of the cost.[^2]

PEFT encompasses several families of methods, the most widely adopted being Low-Rank Adaptation (LoRA) and its variants.

LoRA (Low-Rank Adaptation)

LoRA injects trainable low-rank matrices into each transformer layer. The key insight is that the change to a model's weight matrix during fine-tuning tends to lie in a low-dimensional subspace, meaning it can be approximated by the product of two smaller matrices. For a weight matrix W of shape (d × k), LoRA adds a perturbation ΔW = BA, where B has shape (d × r) and A has shape (r × k), with rank r much smaller than d or k.[^3] W itself stays frozen; only B and A are trained.
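The decomposition can be sketched in a few lines of plain Python. This is a toy illustration using lists of rows rather than a tensor library. The helper names (`matmul`, `matadd`, `lora_forward`) and the tiny sizes are assumptions made for the example, while the zero initialisation of B and the random Gaussian initialisation of A follow the LoRA paper.[^3]

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Add two matrices of the same shape element by element."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lora_forward(W, B, A, x):
    """Compute (W + BA) x without forming the d x k update explicitly."""
    base = matmul(W, x)               # frozen path: W x
    update = matmul(B, matmul(A, x))  # trainable path: B (A x)
    return matadd(base, update)


d, k, r = 6, 4, 2  # toy sizes; real layers have d and k in the thousands
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen, d x k
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, r x k
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zero init
x = [[rng.gauss(0, 1)] for _ in range(k)]                       # input column vector, k x 1

# With B = 0 the update BA is zero, so the adapted layer starts out
# identical to the pre-trained layer.
assert lora_forward(W, B, A, x) == matmul(W, x)
```

Computing B(Ax) instead of (BA)x keeps the intermediate result at size r, so the full d × k update never needs to be materialised during training.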

This reduces the number of trainable parameters by orders of magnitude. The memory savings are largest when LoRA is combined with quantisation: according to the QLoRA paper, regular 16-bit fine-tuning of a 65-billion-parameter LLaMA model requires more than 780 GB of GPU memory, whereas QLoRA (LoRA applied on top of a 4-bit quantised base model) brings this below 48 GB.[^4] LoRA adapters can be saved separately from the base model and swapped in at inference time, enabling a single hosted base model to serve many fine-tuned variants.
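To make the "orders of magnitude" claim concrete, the arithmetic for a single weight matrix can be sketched as follows. The sizes d = k = 4096 and r = 8 are illustrative assumptions, not values taken from the cited papers.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)


d, k, r = 4096, 4096, 8                # illustrative sizes (assumed)
full = d * k                           # 16,777,216 weights in the full matrix
lora = lora_trainable_params(d, k, r)  # 65,536 weights in the adapter
print(full // lora)                    # prints 256
```

Because the adapter for each matrix is this small, storing one adapter per task alongside a shared base model is cheap compared with storing a full copy of the model per task.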

QLoRA

QLoRA extends LoRA by additionally quantising the frozen base model weights to 4-bit precision, substantially reducing the memory footprint of the base model itself. It introduces a new data type, NF4 (4-bit NormalFloat), designed to limit quantisation error for normally distributed weights, and double quantisation, which quantises the quantisation constants themselves to save further memory.[^4]
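The memory effect of 4-bit storage and double quantisation can be checked with simple arithmetic. The sketch below is not an implementation of NF4. It only counts bits, and its defaults (blocks of 64 weights, 32-bit first-level constants, 8-bit second-level quantisation in blocks of 256) follow the configuration described in the QLoRA paper.[^4] Treat them as assumptions if your setup differs. The function name is hypothetical.

```python
def quant_constant_overhead_bits(block_size: int = 64,
                                 const_bits: int = 32,
                                 double_quant: bool = False,
                                 second_block_size: int = 256,
                                 second_const_bits: int = 8) -> float:
    """Bits per parameter spent on storing quantisation constants."""
    if not double_quant:
        # One full-precision constant per block of weights.
        return const_bits / block_size
    # First-level constants are themselves quantised to 8 bits, in blocks of 256,
    # and each of those blocks needs one 32-bit second-level constant.
    return second_const_bits / block_size + const_bits / (block_size * second_block_size)


n_params = 65e9
weights_gb = n_params * 4 / 8 / 1e9                   # 4 bits per weight, in GB
plain = quant_constant_overhead_bits()                # 0.5 bits per parameter
dq = quant_constant_overhead_bits(double_quant=True)  # about 0.127 bits per parameter

print(f"{weights_gb:.1f} GB")  # prints "32.5 GB"
print(plain, round(dq, 3))     # prints "0.5 0.127"
```

For a 65-billion-parameter model, cutting the constant overhead from 0.5 to roughly 0.127 bits per parameter saves about 3 GB on top of the 4-bit weights.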

Other PEFT Techniques

Prefix tuning prepends learnable virtual-token vectors to the keys and values of every attention layer, conditioning the model without modifying any of its original weights. Adapter layers insert small bottleneck modules inside each transformer block. Prompt tuning learns a small set of continuous input embeddings (soft prompts) that are prepended to the input, while leaving the full model frozen. Each approach involves different trade-offs between parameter count, convergence speed, and task performance.
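As a minimal sketch of the prompt-tuning idea, the snippet below prepends a handful of trainable vectors to a sequence of token embeddings. Everything here is a toy assumption: the sizes, the function name `prepend_soft_prompt`, and the use of plain lists in place of tensors. In a real setup only `soft_prompt` would receive gradient updates.

```python
import random


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Return the sequence the frozen model sees: soft prompt rows, then token rows."""
    return soft_prompt + token_embeddings


d_model, n_virtual, seq_len = 8, 4, 5  # toy sizes chosen for illustration
rng = random.Random(0)

# Trainable: one d_model-sized vector per virtual token.
soft_prompt = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(n_virtual)]
# Frozen: stands in for rows looked up from the model's embedding table.
token_embeddings = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]

model_input = prepend_soft_prompt(soft_prompt, token_embeddings)
trainable = n_virtual * d_model

print(len(model_input), trainable)  # prints "9 32"
```

The trainable parameter count depends only on the number of virtual tokens and the embedding width, not on the size of the frozen model.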

Instruction Fine-Tuning

Instruction fine-tuning is a supervised fine-tuning variant in which the model is trained on a curated dataset of (instruction, response) pairs, teaching it to follow natural-language directives. OpenAI's InstructGPT (2022) and subsequent ChatGPT models are prominent examples; their training pipeline combines supervised fine-tuning with reinforcement learning from human feedback (RLHF) in multiple stages.[^5]
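A common way to prepare such pairs is to render each one into a single training string and compute the loss only on the response. The sketch below shows one possible approach. The template text, the function name, and the character-level mask (standing in for a token-level mask) are all hypothetical choices for illustration, and some pipelines train on the prompt as well.

```python
# Hypothetical template; real projects each define their own format.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(instruction: str, response: str) -> dict:
    """Format one (instruction, response) pair and mark what contributes to the loss."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    text = prompt + response
    # 0 = ignored by the loss (prompt part), 1 = trained on (response part).
    # A real pipeline would build this mask over tokens, not characters.
    loss_mask = [0] * len(prompt) + [1] * len(response)
    return {"text": text, "loss_mask": loss_mask}


example = build_example("Translate 'selamat pagi' to English.", "Good morning.")
print(example["text"])
```

Masking the prompt means the model is optimised to produce the response given the instruction, rather than to reproduce the instruction itself.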

Instruction fine-tuning is distinct from domain fine-tuning: the former shapes the model's behavioural style and ability to follow instructions, while the latter injects specialised knowledge.

Evaluation and Overfitting

A persistent risk in fine-tuning on small datasets is overfitting—the model memorises training examples rather than generalising. Common mitigation strategies include early stopping based on validation loss, data augmentation, learning rate scheduling, and mixing a small proportion of general-purpose data into the fine-tuning set to preserve breadth.
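Early stopping, the first of these strategies, can be sketched as a small function over a list of per-epoch validation losses. The function name, the `patience` and `min_delta` parameters, and the example losses are assumptions made for illustration. Training frameworks typically provide their own equivalent.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the best epoch seen before stopping.

    Stops once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # stop training; later epochs are never run
    return best_epoch


losses = [1.20, 0.95, 0.90, 0.93, 0.97, 0.80]  # made-up validation losses
print(early_stopping_epoch(losses))  # prints 2
```

In this made-up run the loss rises for two epochs after epoch 2, so training stops there and the checkpoint from epoch 2 is kept. The later value of 0.80 is never reached, which shows the trade-off that the `patience` setting controls.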

References

  1. Databricks. (2024). Efficient Fine-Tuning with LoRA: A Guide to Optimal Parameter Selection for Large Language Models. https://www.databricks.com/blog/efficient-fine-tuning-lora-guide-llms
  2. IBM. (2024). What is parameter-efficient fine-tuning (PEFT)? IBM Think. https://www.ibm.com/think/topics/parameter-efficient-fine-tuning
  3. Hu, E., Shen, Y., Wallis, P., et al. (2022). LoRA: Low-Rank Adaptation of Large Language Models. ICLR 2022. arXiv:2106.09685.
  4. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.
  5. Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training language models to follow instructions with human feedback. arXiv:2203.02155.
  6. Fintechnews Malaysia. (2025). What Malaysian Banks Are Getting Right (and Wrong) About AI. https://fintechnews.my/52657/banking/what-malaysian-banks-are-getting-right-and-wrong-about-ai/