Contrastive Learning

Contrastive learning is a self-supervised machine learning paradigm that trains models to produce similar representations for related data pairs and dissimilar representations for unrelated pairs, enabling powerful feature learning without labelled data.

6 min read | Last updated June 2026 | Foundations

Contrastive learning is a self-supervised learning approach that trains neural networks to produce meaningful representations by comparing data samples against one another. The core principle is straightforward: a model should assign similar representation vectors to data points that are semantically related (positive pairs) and dissimilar vectors to data points that are unrelated (negative pairs). By learning to enforce these similarity constraints from unlabelled data — using data augmentation or cross-modal correspondence to define what counts as related — contrastive methods enable powerful feature learning without requiring human-annotated labels. Contrastive learning underpins some of the most influential models in computer vision and multimodal AI, including SimCLR, CLIP, MoCo, and BYOL.

Core Concepts

The building block of contrastive learning is the contrastive loss, which rewards a model for pulling positive pairs together in embedding space while pushing negative pairs apart. The most widely used formulation is the InfoNCE loss (its temperature-scaled, cosine-similarity variant is called NT-Xent in SimCLR), which frames the objective as a classification problem: given a query embedding, identify its matching positive key among a large set of negative keys. Minimising the InfoNCE loss maximises a lower bound on the mutual information between paired views, yielding representations that capture shared semantic content while discarding view-specific noise.
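The classification framing above can be sketched in a few lines of dependency-free Python. This is a minimal illustration for a single query, not a reference implementation: the function names, the temperature of 0.1, and the toy two-dimensional vectors are all assumptions chosen for readability.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive_key, negative_keys, temperature=0.1):
    """Cross-entropy of picking the positive key (index 0) among all keys."""
    q = l2_normalize(query)
    keys = [l2_normalize(k) for k in [positive_key, *negative_keys]]
    # Cosine similarities scaled by the (assumed) temperature act as logits.
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Toy vectors (made up for illustration).
query = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned_loss = info_nce(query, [0.9, 0.1], negatives)     # positive close to query
misaligned_loss = info_nce(query, [0.0, 1.0], negatives)  # positive far from query
```

The loss is small when the positive key points in nearly the same direction as the query and grows when a negative is at least as similar as the positive, which is the behaviour the paragraph describes.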

A critical design choice is how to define positive pairs. In vision applications, positive pairs are typically formed by applying two different random augmentations to the same image — for example, random cropping, colour jittering, and Gaussian blurring. Two augmented views of the same image should be deemed similar, while views from different images form negative pairs. In cross-modal contrastive learning (as used in CLIP), a positive pair consists of an image and its natural language caption, while other image-caption combinations in the batch form negatives.
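As a rough sketch of how augmentation defines positive pairs, the snippet below builds two random views per image using only the standard library. Images are plain nested lists of grey-level values in [0, 1]; the crop size, jitter strength, and helper names are hypothetical, and a brightness jitter stands in for the fuller colour jitter and blur used in practice.

```python
import random


def random_crop(image, size, rng):
    """Cut a random size x size window out of a 2-D image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def brightness_jitter(image, strength, rng):
    """Scale all pixels by one random factor, clamped back into [0, 1]."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment(image, rng, crop_size=6, strength=0.4):
    return brightness_jitter(random_crop(image, crop_size, rng), strength, rng)


def make_views(images, rng):
    """Return 2N views; views 2i and 2i+1 are two augmentations of image i."""
    views = []
    for image in images:
        views.append(augment(image, rng))
        views.append(augment(image, rng))
    return views


def is_positive_pair(i, j):
    """Two distinct views are positives only if they share a source image."""
    return i != j and i // 2 == j // 2


rng = random.Random(0)
images = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]
views = make_views(images, rng)
```

Every other pairing of views in the batch, for example views 0 and 2, is treated as a negative pair.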

Influential Models

SimCLR

Simple Framework for Contrastive Learning of Visual Representations (SimCLR), proposed by Chen et al. at Google in 2020, demonstrated that contrastive learning with sufficiently strong data augmentation and large batch sizes could match or exceed supervised pre-training performance on downstream tasks. SimCLR uses a non-linear projection head between the encoder and the contrastive loss, a design choice subsequently adopted across the field. SimCLR-v2 extended the framework with larger models and semi-supervised learning, achieving near-supervised accuracy on ImageNet using only 1-10% of labels.
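The role of the projection head can be illustrated with a small, dependency-free sketch: encoder outputs h pass through a two-layer MLP to give z, and the NT-Xent loss is computed on z rather than on h. The layer sizes, random weights, temperature of 0.5, and the convention that views 2i and 2i+1 form a positive pair are assumptions made for this example, not details taken from the SimCLR code base.

```python
import math
import random


def linear(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def projection_head(h, params):
    """Non-linear head: z = W2 * ReLU(W1 * h + b1) + b2."""
    hidden = [max(0.0, v) for v in linear(h, params["w1"], params["b1"])]
    return linear(hidden, params["w2"], params["b2"])


def cosine(u, v, eps=1e-12):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / (den + eps)


def nt_xent(z, temperature=0.5):
    """Average NT-Xent loss over 2N projections; views 2i and 2i+1 are positives."""
    total = 0.0
    for i in range(len(z)):
        partner = i ^ 1  # flips the last bit: 0<->1, 2<->3, ...
        logits = [cosine(z[i], z[k]) / temperature for k in range(len(z)) if k != i]
        positive = cosine(z[i], z[partner]) / temperature
        m = max(logits)
        total += m + math.log(sum(math.exp(x - m) for x in logits)) - positive
    return total / len(z)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
params = {
    "w1": random_matrix(8, 4, rng), "b1": [0.0] * 8,
    "w2": random_matrix(3, 8, rng), "b2": [0.0] * 3,
}
h = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]  # 2N = 4 encoder outputs
z = [projection_head(v, params) for v in h]
loss = nt_xent(z)
```

After pre-training, the head is typically discarded and the encoder output h is what downstream tasks consume.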

CLIP

Contrastive Language-Image Pre-training (CLIP), developed by OpenAI in 2021, applied cross-modal contrastive learning to learn joint image-text embeddings from 400 million internet-scraped image-caption pairs. CLIP's learned representations enable remarkable zero-shot transfer: a CLIP model can classify images into categories it was never explicitly trained on simply by comparing the image embedding against text embeddings of candidate class descriptions, with no task-specific fine-tuning. CLIP encoders are also widely used as components of text-to-image models: Stable Diffusion conditions image generation on a CLIP text encoder, and DALL-E 2 generates images from CLIP image embeddings.
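Zero-shot classification with a CLIP-style model reduces to a similarity lookup once the embeddings exist. The sketch below assumes the image and text embeddings have already been produced by some encoder; the three-dimensional vectors, the class prompts, and the logit scale of 100 are made-up illustrative values, and no real CLIP API is called.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_embedding, class_text_embeddings, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt."""
    image = l2_normalize(image_embedding)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(image, l2_normalize(text)))
        for label, text in class_text_embeddings.items()
    }
    m = max(logits.values())
    exps = {label: math.exp(value - m) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical embeddings; a real system would obtain these from the encoders.
class_text_embeddings = {
    "a photo of a durian": [0.9, 0.1, 0.0],
    "a photo of a rambutan": [0.1, 0.9, 0.1],
    "a photo of a mangosteen": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

probabilities = zero_shot_classify(image_embedding, class_text_embeddings)
predicted_label = max(probabilities, key=probabilities.get)
```

Adding a new category only requires embedding one more text prompt, which is why no task-specific fine-tuning is needed.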

MoCo and BYOL

Momentum Contrast (MoCo), from Facebook AI Research, maintains a queue of negative keys produced by a momentum encoder, a slowly updated moving average of the query encoder. Because the queue is decoupled from the mini-batch, MoCo can contrast each query against many negatives without the GPU memory cost of SimCLR's large batches. Bootstrap Your Own Latent (BYOL), from DeepMind, eliminated negative samples entirely: an online network is trained to predict a target network's representation of a different augmented view of the same image, where the target network's weights are an exponential moving average of the online network's weights. Why this does not collapse to a trivial constant solution remains an active research question.
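Two mechanics from this section are easy to show in isolation: the exponential-moving-average weight update shared by MoCo's key encoder and BYOL's target network, and a fixed-size first-in-first-out store of negative keys. The sketch below treats parameters as a flat list of floats; the class name, queue size, and momentum value are assumptions for illustration.

```python
from collections import deque


def momentum_update(slow_params, fast_params, momentum=0.999):
    """EMA update: the slow network drifts towards the fast (trained) network."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow_params, fast_params)]


class KeyQueue:
    """Fixed-size FIFO of encoded keys used as negatives (hypothetical helper)."""

    def __init__(self, max_size):
        self._keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        # With maxlen set, the oldest keys are dropped automatically.
        self._keys.extend(batch_keys)

    def negatives(self):
        return list(self._keys)


# The slow encoder moves only 10% of the way towards the fast encoder here.
updated = momentum_update([0.0, 2.0], [1.0, 2.0], momentum=0.9)

queue = KeyQueue(max_size=4)
queue.enqueue(["k1", "k2", "k3"])
queue.enqueue(["k4", "k5", "k6"])
current_negatives = queue.negatives()
```

A momentum close to 1 keeps the keys in the queue consistent with one another, since the encoder that produced them changes only slowly between batches.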

Applications

Beyond image classification pre-training, contrastive learning has found applications across a wide range of AI domains. In natural language processing, contrastive objectives are used to train sentence encoders (SimCSE) that excel at semantic textual similarity tasks. In retrieval and search, contrastive-trained dense encoders power bi-encoder retrieval systems, including those used in retrieval-augmented generation pipelines. In recommender systems, contrastive learning on user-item interaction graphs learns user and item embeddings that improve recommendation quality. In medical imaging, contrastive pre-training on large unlabelled radiology datasets has improved diagnostic model performance where labelled data is scarce.
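For the retrieval use case, a bi-encoder embeds queries and documents separately and ranks documents by similarity at query time. The following sketch assumes the embeddings were produced offline by a contrastively trained encoder; the document identifiers and vectors are invented for the example, and a production system would use an approximate nearest-neighbour index rather than a full sort.

```python
import math


def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def top_k(query_embedding, doc_embeddings, k=2):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_embedding, emb)) for doc_id, emb in doc_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-computed document embeddings.
doc_embeddings = {
    "doc_pdpa": [0.9, 0.1, 0.0],
    "doc_clip": [0.1, 0.9, 0.2],
    "doc_moco": [0.0, 0.3, 0.9],
}
query_embedding = [0.2, 0.8, 0.3]

results = top_k(query_embedding, doc_embeddings, k=2)
retrieved_ids = [doc_id for doc_id, _ in results]
```

In a retrieval-augmented generation pipeline, the retrieved documents would then be passed to the language model as context.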

Malaysian Context — Contrastive Learning for Malaysian Computer Vision and Search

Contrastive learning is particularly relevant to Malaysian AI applications in domains where labelled training data is scarce or expensive to produce. Malaysia's manufacturing sector — including semiconductor and electronics manufacturing hubs in Penang, Johor, and Selangor — increasingly deploys visual quality inspection systems. Contrastive pre-training on large collections of unlabelled product images allows these systems to learn robust visual representations before fine-tuning on a small set of labelled defect examples, reducing the annotation burden significantly.

In the e-commerce and retail space, companies such as Lazada Malaysia and Shopee Malaysia use embedding-based visual search powered by contrastive learning models. CLIP-style multimodal models that align product images with text descriptions enable cross-modal search, allowing customers to describe products in natural language and retrieve visually relevant results. Malaysian language support for such systems requires cross-lingual and multilingual extensions of CLIP-style training to cover Bahasa Malaysia.

MDEC has supported AI adoption programmes in the creative and design industries, where contrastive learning models underpin content-based image retrieval tools used by graphic designers and advertising agencies. The PDPA (Personal Data Protection Act) has implications for contrastive learning systems trained on datasets that include personal images; organisations must ensure that pre-training data sourced in Malaysia complies with data protection requirements under PDPA 2010 and its amendments.

Malaysian universities including Universiti Malaya and Multimedia University (MMU) have published research applying contrastive learning to medical imaging datasets from local hospitals, exploring self-supervised approaches for analysing chest X-rays and histopathology slides where radiologist-labelled data is limited. HRD Corp-funded training programmes in advanced machine learning increasingly include contrastive learning as a topic, reflecting its growing role in practical deep learning pipelines.

References

Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of ICML 2020. arXiv:2002.05709.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of ICML 2021.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. Proceedings of CVPR 2020.
Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z. D., Azar, M. G., Piot, B., Kavukcuoglu, K., Munos, R., & Valko, M. (2020). Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. Advances in Neural Information Processing Systems 33 (NeurIPS 2020).

Tags: self-supervised, representation-learning, clip, deep-learning

Type	Self-supervised learning paradigm
Key models	SimCLR (Google), CLIP (OpenAI), MoCo (Facebook AI), BYOL (DeepMind)
Training signal	Similarity between augmented views of the same sample
Key applications	Visual representation learning, multimodal embeddings, retrieval
Related	Self-supervised Learning, Embedding, Multimodal AI, Transfer Learning
