What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics, including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Multi-Task Learning

Multi-task learning is a machine learning approach in which a model is trained simultaneously on multiple related tasks, using shared representations to improve generalisation and data efficiency compared to training separate single-task models.

7 min read | Last updated June 2026 | Foundations

Multi-task learning (MTL) is a training paradigm in which a single model is trained to perform multiple related tasks simultaneously, sharing representations across tasks to improve learning efficiency and generalisation. First formalised by Rich Caruana in a 1997 paper, multi-task learning has become a foundational technique in modern deep learning and underlies many of the most capable models in natural language processing and computer vision. The approach contrasts with the default practice of training a separate model for each task, which ignores potentially useful information shared between related problems.

Core Intuition

The central hypothesis of multi-task learning is that related tasks share underlying structure — statistical regularities, useful features, or domain knowledge — that a jointly trained model can exploit. When a model is trained on multiple tasks simultaneously, the gradient signals from each task constrain the shared parameters toward representations that are broadly useful, acting as a form of implicit regularisation. A model that performs well at sentiment analysis, named entity recognition, and text classification simultaneously must develop features that capture general linguistic structure, rather than overfitting to the idiosyncrasies of a single task's training data.

Caruana's original framing described this as using auxiliary tasks to bias the model toward representations that are generalisable. Tasks that share inductive biases — assumptions about what makes a good solution — benefit from joint training.

Multi-task learning in deep neural networks is implemented through two main architectural patterns.

Hard parameter sharing is the most common approach. A shared backbone network processes inputs from all tasks, and task-specific output heads branch off the final shared layer. Only the output heads are task-specific, so the large majority of parameters are shared. This design is computationally efficient and reduces overfitting, because the shared parameters must simultaneously satisfy the gradient demands of all tasks rather than fitting the noise of any single one.
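A minimal sketch of hard parameter sharing in plain Python follows. It assumes a single linear shared layer (no activation) and one scalar linear head per task; real backbones are deep networks trained with an autodiff framework, and the class name `HardSharedModel` and the task names are hypothetical. The point is the gradient flow: each head is updated only by its own task's error, while the shared weights `W` accumulate gradient from every task.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


class HardSharedModel:
    """Hypothetical toy model: one shared linear layer plus one linear head per task."""

    def __init__(self, in_dim, hidden_dim, tasks, seed=0):
        rng = random.Random(seed)
        # Shared backbone weights, used by every task.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(hidden_dim)]
        # Task-specific heads: one weight vector per task.
        self.heads = {t: [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for t in tasks}

    def forward(self, x):
        h = matvec(self.W, x)  # shared representation, computed once
        return h, {t: dot(v, h) for t, v in self.heads.items()}

    def joint_step(self, x, targets, lr=0.05):
        """One gradient step on the summed squared error of all tasks; returns the pre-step loss."""
        h, preds = self.forward(x)
        grad_W = [[0.0] * len(x) for _ in self.W]
        loss = 0.0
        for t, y in targets.items():
            err = preds[t] - y
            loss += err ** 2
            v = self.heads[t]
            # Shared weights accumulate gradient from every task:
            # dL/dW[i][j] += 2 * err * v[i] * x[j]
            for i in range(len(self.W)):
                for j in range(len(x)):
                    grad_W[i][j] += 2 * err * v[i] * x[j]
            # Each head receives gradient only from its own task.
            self.heads[t] = [vi - lr * 2 * err * hi for vi, hi in zip(v, h)]
        for i in range(len(self.W)):
            for j in range(len(x)):
                self.W[i][j] -= lr * grad_W[i][j]
        return loss


model = HardSharedModel(in_dim=3, hidden_dim=4, tasks=["sentiment", "ner"])
x = [1.0, 0.5, -0.2]
targets = {"sentiment": 1.0, "ner": -1.0}
losses = [model.joint_step(x, targets) for _ in range(200)]
```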

Soft parameter sharing gives each task its own full network and couples the networks instead of merging them. The most direct form adds a regularisation loss that encourages corresponding parameters in the different task networks to stay similar, for example an L2 penalty on the distance between corresponding weights. Related designs, such as cross-task attention mechanisms, instead let each task selectively borrow intermediate representations from the others. Soft parameter sharing is more flexible but computationally more expensive, because the number of parameters grows with the number of tasks.
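The L2 coupling penalty can be sketched as follows, assuming each task network's parameters are flattened into a list of floats and using a hypothetical coupling-strength parameter `lam`. In training, the penalty's gradient would be added to each task network's own loss gradient; here only the penalty gradient is applied, to show that it pulls the networks toward each other.

```python
def l2_coupling_penalty(params_by_task, lam):
    """lam * sum over task pairs of ||theta_a - theta_b||^2."""
    tasks = sorted(params_by_task)
    total = 0.0
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            total += sum((pa - pb) ** 2
                         for pa, pb in zip(params_by_task[a], params_by_task[b]))
    return lam * total


def l2_coupling_grad(params_by_task, lam):
    """Gradient of the penalty with respect to each task's parameter vector."""
    grads = {t: [0.0] * len(p) for t, p in params_by_task.items()}
    for a, pa in params_by_task.items():
        for b, pb in params_by_task.items():
            if a != b:
                for k, (wa, wb) in enumerate(zip(pa, pb)):
                    grads[a][k] += 2 * lam * (wa - wb)
    return grads


params = {"task_a": [1.0, 2.0], "task_b": [1.5, 1.0]}
penalty = l2_coupling_penalty(params, lam=0.1)
grads = l2_coupling_grad(params, lam=0.1)
# Applying only the coupling gradient pulls the two networks closer together.
stepped = {t: [w - 0.5 * g for w, g in zip(p, grads[t])] for t, p in params.items()}
```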

Multi-Task Learning in Natural Language Processing

Multi-task learning has had particularly strong impact in NLP. Early influential work by Collobert and Weston (2008) trained a single neural network on part-of-speech (POS) tagging, chunking, named entity recognition, semantic role labelling, and language modelling simultaneously, and reported that the shared representations improved performance over single-task baselines.

Modern large language models such as GPT and T5 are arguably multi-task learners at scale. They are pre-trained with self-supervised text prediction objectives (next-token prediction for GPT, span corruption for T5) on diverse corpora that implicitly contain many tasks, and are often then fine-tuned on mixed-task datasets. Instruction tuning, in which a model is fine-tuned on many diverse tasks (often hundreds or more), each phrased as a natural-language instruction, is a form of multi-task learning that improves zero-shot generalisation to unseen tasks.

T5 (Text-to-Text Transfer Transformer) makes this explicit: it frames every NLP task as a text-to-text transformation, with a short task prefix in the input identifying the task, and trains on a mixture of tasks, yielding a versatile model that transfers well to new tasks.
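The sketch below shows one way to build such a mixture, assuming toy datasets and hypothetical task prefixes. The mixing rule is examples-proportional mixing with an artificial size cap, one of the strategies explored in the T5 work. The cap keeps a very large task from crowding out smaller ones.

```python
import random


def to_text_to_text(prefix, examples):
    """Cast (source, target) pairs into prefixed text-to-text examples."""
    return [(f"{prefix}: {src}", tgt) for src, tgt in examples]


def mixing_rates(task_sizes, cap):
    """Examples-proportional mixing with a size cap:
    rate_m = min(n_m, cap) / sum_n min(n_n, cap)."""
    capped = {t: min(n, cap) for t, n in task_sizes.items()}
    total = sum(capped.values())
    return {t: c / total for t, c in capped.items()}


def sample_batch(datasets, rates, batch_size, rng):
    """Pick a task per example according to the mixing rates, then an example from it."""
    tasks = list(rates)
    weights = [rates[t] for t in tasks]
    batch = []
    for _ in range(batch_size):
        task = rng.choices(tasks, weights=weights, k=1)[0]
        batch.append(rng.choice(datasets[task]))
    return batch


# Hypothetical toy tasks and prefixes.
datasets = {
    "sentiment": to_text_to_text("sentiment", [("Great service!", "positive"),
                                               ("Too slow.", "negative")]),
    "translate": to_text_to_text("translate English to Malay",
                                 [("Thank you", "Terima kasih")]),
}
# Pretend the full corpora hold 50,000 and 2,000 examples; cap = 10,000.
rates = mixing_rates({"sentiment": 50_000, "translate": 2_000}, cap=10_000)
batch = sample_batch(datasets, rates, batch_size=4, rng=random.Random(0))
```

Without the cap, the sentiment task would receive about 96% of sampled examples (50,000 / 52,000); with the cap of 10,000 its share falls to about 83% (10,000 / 12,000).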

Multi-Task Learning in Computer Vision

In computer vision, multi-task learning is used to jointly train models for object detection, semantic segmentation, depth estimation, and surface normal estimation. Models such as HydraNet train a shared convolutional backbone with task-specific decoder heads for each perception task. This is particularly valuable in autonomous driving, where a single model must simultaneously perform many perception tasks in real time on constrained hardware.
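The inference pattern can be sketched with stand-in functions, assuming a hypothetical `MultiHeadPerception` wrapper and toy heads named after the perception tasks; a real system would use a convolutional backbone and learned decoder networks. The backbone, which is the expensive step, runs once per image, and every head reuses its features.

```python
class MultiHeadPerception:
    """Hypothetical wrapper: one shared backbone feeding several task heads."""

    def __init__(self, backbone, heads):
        self.backbone = backbone
        self.heads = heads  # task name -> function(features) -> prediction
        self.backbone_calls = 0

    def __call__(self, image):
        self.backbone_calls += 1
        features = self.backbone(image)  # expensive step, run once per image
        return {task: head(features) for task, head in self.heads.items()}


def toy_backbone(image):
    """Stand-in feature extractor: mean and max pixel intensity."""
    pixels = [p for row in image for p in row]
    return {"mean": sum(pixels) / len(pixels), "max": max(pixels)}


# Toy placeholder heads; real heads would be small decoder networks.
model = MultiHeadPerception(
    toy_backbone,
    {
        "detection": lambda f: f["max"] > 0.8,
        "segmentation": lambda f: "bright" if f["mean"] > 0.5 else "dark",
        "depth": lambda f: 1.0 - f["mean"],
    },
)
outputs = model([[0.25, 0.75], [0.5, 1.0]])
```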

Multi-task learning also enables cross-modal transfer: a model trained jointly on image classification and image captioning may learn richer visual representations than one trained on classification alone, because the captioning task forces the model to encode semantic content expressible in natural language.

Challenges

Despite its benefits, multi-task learning introduces challenges not present in single-task training. Task interference occurs when the gradient update required by one task conflicts with the update required by another (their gradients point in opposing directions), degrading performance relative to single-task training. It is especially likely when tasks are semantically dissimilar or when one task has far more training data than the others. Negative transfer describes the resulting net drop in performance on a target task caused by joint training with unrelated or weakly related auxiliary tasks.
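One way to quantify transfer between two tasks is a lookahead affinity score in the spirit of Fifty et al. (2021): take a gradient step on the shared parameters using task i alone, then measure how task j's loss changed, with affinity(i, j) = 1 - L_j(after step) / L_j(before step). A positive score means task i's update helped task j; a negative score signals interference. The sketch assumes toy quadratic losses in place of real network losses.

```python
def quad_loss(theta, centre):
    """Toy task loss: squared distance from a task-specific optimum."""
    return sum((a - c) ** 2 for a, c in zip(theta, centre))


def quad_grad(theta, centre):
    return [2 * (a - c) for a, c in zip(theta, centre)]


def affinity(theta, centre_i, centre_j, lr=0.1):
    """1 - L_j(theta after one step on task i) / L_j(theta)."""
    stepped = [a - lr * g for a, g in zip(theta, quad_grad(theta, centre_i))]
    return 1 - quad_loss(stepped, centre_j) / quad_loss(theta, centre_j)


theta = [0.0, 0.0]  # shared parameters
centres = {"a": [1.0, 1.0], "b": [1.2, 0.8], "c": [-1.0, -1.0]}
table = {(i, j): affinity(theta, centres[i], centres[j])
         for i in centres for j in centres if i != j}
```

Tasks a and b, whose optima lie close together, help each other; task c pulls the shared parameters the opposite way and hurts task a. Grouping tasks by such scores is one basis for deciding which tasks to train together.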

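Addressing task interference is an active research area. Techniques include gradient surgery (projecting each task's gradient to remove components that conflict with other tasks' gradients), uncertainty-weighted loss (scaling each task loss by the inverse of a learned, task-specific uncertainty so that tasks contribute on comparable scales), and task grouping (clustering tasks by similarity and training only compatible tasks together). PCGrad, a gradient surgery method, and GradNorm, which adaptively tunes per-task loss weights to keep the tasks' gradient magnitudes balanced, are widely used to mitigate task interference in practice.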
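A minimal PCGrad-style projection on flat gradient vectors is sketched below, assuming gradients are plain lists of floats; a production version would operate on framework tensors. For each task gradient g_i and every other task's original gradient g_j (visited in random order), if the two conflict (negative dot product), the component of g_i along g_j is removed: g_i becomes g_i - (g_i . g_j / ||g_j||^2) g_j. The projected gradients are then summed into one update.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads, rng=None):
    """Return the summed update after projecting away pairwise gradient conflicts."""
    rng = rng or random.Random(0)
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)
        others = [g_j for j, g_j in enumerate(task_grads) if j != i]
        rng.shuffle(others)
        for g_j in others:
            d = dot(g, g_j)
            if d < 0:  # conflicting directions
                scale = d / dot(g_j, g_j)
                g = [gk - scale * gjk for gk, gjk in zip(g, g_j)]
        projected.append(g)
    # Combined update direction: sum of the de-conflicted task gradients.
    return [sum(components) for components in zip(*projected)]


combined = pcgrad([[1.0, 0.0], [-1.0, 1.0]])
```

Here g1 = [1, 0] and g2 = [-1, 1] conflict (dot product -1). Projection turns them into [0.5, 0.5] and [0, 1], each orthogonal to the other task's original gradient, so the summed update [0.5, 1.5] has a positive dot product with both g1 and g2. The plain sum [0, 1] would make no first-order progress on task 1.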

| Challenge | Description | Mitigation |
|---|---|---|
| Task interference | Conflicting gradients degrade performance | Gradient surgery, PCGrad |
| Negative transfer | Auxiliary tasks hurt target task | Task similarity analysis, task grouping |
| Loss scaling | Different task losses on different scales | Uncertainty weighting, GradNorm |
| Task imbalance | Large tasks dominate training | Sampling strategies, loss normalisation |
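The uncertainty-weighting mitigation can be illustrated with one common formulation, in which each task i has a learned log-variance s_i and the combined loss is the sum over tasks of 0.5 * exp(-s_i) * L_i + 0.5 * s_i. The sketch holds the task losses fixed and optimises only the s_i, which is a simplification: in practice the network weights and the s_i are trained together. Under that assumption each s_i converges to log L_i, so every weighted term settles at 0.5 regardless of the raw loss scale.

```python
import math


def weighted_total(losses, log_vars):
    """Combined loss: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i."""
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(losses, log_vars))


def log_var_grads(losses, log_vars):
    # d(total) / d(s_i) = -0.5 * exp(-s_i) * L_i + 0.5
    return [-0.5 * math.exp(-s) * L + 0.5 for L, s in zip(losses, log_vars)]


# Two tasks whose raw losses differ by three orders of magnitude.
losses = [0.02, 25.0]
log_vars = [0.0, 0.0]
for _ in range(2000):
    grads = log_var_grads(losses, log_vars)
    log_vars = [s - 0.1 * g for s, g in zip(log_vars, grads)]

# Effective weight applied to each raw task loss.
effective_weights = [0.5 * math.exp(-s) for s in log_vars]
```

The 0.5 * s_i term is what prevents the trivial solution of shrinking every weight to zero: raising s_i lowers the weighted loss but is penalised linearly.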

Relationship to Transfer Learning and Foundation Models

Multi-task learning is closely related to transfer learning: both exploit shared structure across tasks, but multi-task learning trains on all tasks simultaneously while transfer learning trains sequentially (pre-training then fine-tuning). Foundation models such as GPT-4 and Claude are products of multi-task pre-training at scale, where exposure to diverse tasks during pre-training endows the model with versatile capabilities that transfer to downstream tasks.
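The difference in training schedule can be illustrated with a toy example, assuming two quadratic task losses (theta - c)^2 on a single scalar shared parameter. Real networks are far from quadratic, so this only shows where each schedule leaves the shared parameter, not which schedule generalises better.

```python
def grad(theta, centre):
    """Gradient of the toy loss (theta - centre)^2."""
    return 2 * (theta - centre)


def train(theta, centres, steps, lr=0.1):
    """Gradient descent on the sum of the given tasks' losses."""
    for _ in range(steps):
        theta -= lr * sum(grad(theta, c) for c in centres)
    return theta


c_source, c_target = 0.0, 4.0
start = 10.0

# Multi-task learning: both tasks optimised simultaneously.
joint = train(start, [c_source, c_target], steps=200)

# Transfer learning: pre-train on the source task, then fine-tune on the target.
pretrained = train(start, [c_source], steps=200)
finetuned = train(pretrained, [c_target], steps=200)
```

Joint training settles at 2, a compromise between the two task optima, whereas fine-tuning from the pre-trained value moves all the way to the target optimum of 4.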

Malaysian Context — Multi-Task Models in Malaysian AI Applications

Multi-task learning is relevant to Malaysian AI development particularly in the context of building models that can handle both Bahasa Malaysia and English, as well as other regional languages such as Tamil and Mandarin Chinese, simultaneously. Malaysian NLP researchers at institutions including Universiti Malaya, Universiti Teknologi Malaysia, and Universiti Sains Malaysia have explored multi-task learning approaches for code-switching detection, sentiment analysis across languages, and named entity recognition in mixed-language Malaysian text.

The Malaysian government's AI initiatives under the Malaysia Digital Economy Blueprint (MyDIGITAL) and the National AI Office (NAIO) have emphasised the importance of AI systems capable of serving multilingual Malaysian communities. Multi-task models that handle Bahasa Malaysia, English, and other community languages within a single architecture are more cost-effective than maintaining separate single-language models, a practical consideration for government agencies with limited AI infrastructure budgets.

Maybank and CIMB have developed multi-task models for their retail banking operations that simultaneously perform transaction fraud detection, customer churn prediction, and product recommendation from a shared representation of customer behaviour data. This approach reduces the number of separate models that must be maintained and monitored in production, lowering operational complexity.

In Malaysian manufacturing, particularly in the Penang and Klang Valley industrial clusters, multi-task learning is applied in quality control systems that simultaneously detect visual defects, estimate product dimensions, and classify surface finishes from camera images in a single inference pass, meeting the throughput requirements of high-speed production lines.

MDEC's AI research grants and the Academy of Sciences Malaysia have funded university projects applying multi-task learning to challenges specific to Malaysia's digital economy, including AI models for the halal certification supply chain that simultaneously perform product image classification, text extraction from packaging, and compliance rule matching against JAKIM standards.

References

Caruana, R. (1997). Multitask Learning. Machine Learning, 28(1), 41-75. Kluwer Academic Publishers.
Ruder, S. (2017). An Overview of Multi-Task Learning in Deep Neural Networks. arXiv:1706.05098.
Collobert, R., and Weston, J. (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. Proceedings of the 25th International Conference on Machine Learning (ICML).
Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., and Finn, C. (2020). Gradient Surgery for Multi-Task Learning. Advances in Neural Information Processing Systems (NeurIPS).
Fifty, C., Amid, E., Zhao, Z., Yu, T., Anil, R., and Finn, C. (2021). Efficiently Identifying Task Groupings for Multi-Task Learning. Advances in Neural Information Processing Systems (NeurIPS).

Tags: training, generalisation, shared-representation, transfer-learning

Abbreviation	MTL
Type	Machine learning training paradigm
Key benefit	Improved generalisation via shared representations
Related concepts	Transfer learning, federated learning, meta-learning
Applications	NLP, computer vision, robotics, drug discovery