What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Cross-Entropy Loss

Cross-entropy loss is the standard objective function for training classification models, measuring the divergence between a predicted probability distribution and the true distribution of labels.

4 min read | Last updated June 2026 | Foundations

Cross-entropy loss is the objective function most commonly minimised when training classification models, including the vast majority of neural network classifiers and large language models. It quantifies how far a model's predicted probability distribution lies from the true distribution of the labels, returning a small value when the model assigns high probability to the correct class and a large value when it is confidently wrong. Minimising cross-entropy is equivalent to maximising the likelihood of the observed data under the model.

Definition

For a single example with a true label distribution y (usually a one-hot vector) and a predicted distribution p, cross-entropy loss is L = -sum_i y_i * log(p_i): the negative sum, over classes, of each class's true probability times the logarithm of its predicted probability. Because the true distribution is usually one-hot, with y_i = 1 for the single correct class c and 0 elsewhere, the expression simplifies to -log(p_c), the negative logarithm of the probability the model assigned to the correct class. If the model assigns a probability close to 1 to the right class, the logarithm is near zero and the loss is small; if it assigns a tiny probability, the logarithm is a large negative number and the loss is large.
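The definition translates directly into a few lines of code. The sketch below is a minimal plain-Python illustration, assuming natural logarithms (the usual convention in deep-learning frameworks); the helper name `cross_entropy` and the example probabilities are hypothetical, chosen only to contrast a confident correct prediction with a confident wrong one.

```python
import math

def cross_entropy(y, p):
    """Categorical cross-entropy L = -sum_i y_i * log(p_i), natural log.

    y: true distribution (e.g. one-hot); p: predicted distribution.
    Terms with y_i == 0 contribute nothing, so they are skipped.
    """
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)

# Three classes; the correct class is index 1 (one-hot label).
y = [0.0, 1.0, 0.0]
confident_right = [0.05, 0.90, 0.05]
confident_wrong = [0.90, 0.05, 0.05]

print(cross_entropy(y, confident_right))  # -log(0.90), about 0.105
print(cross_entropy(y, confident_wrong))  # -log(0.05), about 2.996
```

With a one-hot y only the correct-class term survives, which is why the loss reduces to -log(p_c).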

For binary classification, where the true label y is 0 or 1 and p is the model's single output probability for the positive class, the formula reduces to binary cross-entropy, L = -(y * log(p) + (1 - y) * log(1 - p)).
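As a minimal sketch (plain Python, natural logarithms), the binary formula can be written as below. The clipping constant `eps` is an assumption added here so that `math.log` never receives exactly 0, and both helper names are hypothetical. The second function evaluates the same loss through the categorical formula with two classes, y = [1 - y, y] and p = [1 - p, p], to show that the binary form is just that special case.

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for one example (natural log).

    y: true label, 0 or 1; p: predicted probability that the label is 1.
    p is clipped to [eps, 1 - eps] (an assumed safeguard) so that
    math.log never receives 0.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def two_class_categorical(y, p):
    """Same loss via -sum_i y_i * log(p_i) with y = [1 - y, y], p = [1 - p, p]."""
    y_vec = [1 - y, y]
    p_vec = [1 - p, p]
    return -sum(yi * math.log(pi) for yi, pi in zip(y_vec, p_vec))


print(binary_cross_entropy(1, 0.8))  # -log(0.8), about 0.223
print(binary_cross_entropy(0, 0.8))  # -log(0.2), about 1.609
```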

Connection to information theory and likelihood

The name comes from information theory, where cross-entropy measures the average number of bits (or nats, when natural logarithms are used) needed to encode events drawn from one distribution using a code optimised for another. Cross-entropy equals the entropy of the true distribution plus the Kullback-Leibler divergence between the two distributions, H(y, p) = H(y) + KL(y || p); since the entropy term is fixed by the data, minimising cross-entropy minimises the divergence and drives the predicted distribution toward the true one. From a statistical viewpoint, minimising the average cross-entropy over a training set is exactly maximum likelihood estimation, which is one reason the loss is so well motivated.
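The decomposition of cross-entropy into entropy plus KL divergence can be checked numerically. This sketch uses base-2 logarithms so the values read as bits, matching the coding interpretation; with natural logarithms the same identity holds in nats. The two distributions are illustrative values chosen because their probabilities are powers of two, which keeps the arithmetic exact; the function names are hypothetical.

```python
import math

def entropy_bits(y):
    # H(y) = -sum_i y_i * log2(y_i); zero-probability terms contribute 0.
    return -sum(yi * math.log2(yi) for yi in y if yi > 0)

def cross_entropy_bits(y, p):
    # H(y, p) = -sum_i y_i * log2(p_i): average bits to code events
    # drawn from y using a code optimised for p.
    return -sum(yi * math.log2(pi) for yi, pi in zip(y, p) if yi > 0)

def kl_divergence_bits(y, p):
    # KL(y || p) = sum_i y_i * log2(y_i / p_i)
    return sum(yi * math.log2(yi / pi) for yi, pi in zip(y, p) if yi > 0)

y = [0.5, 0.25, 0.25]  # "true" distribution
p = [0.25, 0.5, 0.25]  # model's predicted distribution

h = entropy_bits(y)             # 1.5 bits
ce = cross_entropy_bits(y, p)   # 1.75 bits
kl = kl_divergence_bits(y, p)   # 0.25 bits
print(h, ce, kl, h + kl)        # cross-entropy equals entropy + KL
```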

Why it pairs with softmax

Cross-entropy is almost always applied to the output of a softmax layer. This pairing is favoured because the gradient of the combined operation with respect to the network's raw scores is simply the predicted probability minus the true label. This clean gradient avoids the saturation problems that arise when squared error is used with sigmoid or softmax outputs, where gradients can become vanishingly small and stall learning. The simple gradient propagates efficiently through backpropagation, making optimisation by gradient descent fast and stable. Deep-learning frameworks typically fuse softmax and cross-entropy into one numerically stable operation.
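The following plain-Python sketch illustrates both points in this section under stated assumptions. It computes a fused softmax plus cross-entropy for one example using the log-sum-exp trick (one common way to obtain numerical stability, not necessarily how any particular framework implements it), returns the analytic gradient p - y, and compares that gradient with a central finite-difference estimate. The function name and the example logits are hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """Fused softmax + cross-entropy for one example with integer label `label`.

    Subtracting max(logits) before exp() avoids overflow, and computing
    log p_label as logits[label] - logsumexp(logits) avoids taking log(0).
    Returns (loss, gradient of the loss with respect to the logits).
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    log_sum_exp = m + math.log(total)
    loss = log_sum_exp - logits[label]
    probs = [e / total for e in exps]
    # Analytic gradient: predicted probability minus one-hot true label.
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, grad


logits = [2.0, 1.0, 0.1]
loss, grad = softmax_cross_entropy(logits, label=0)

# Compare the analytic gradient with a central finite difference.
h = 1e-6
numeric_grad = []
for i in range(len(logits)):
    up = list(logits)
    up[i] += h
    down = list(logits)
    down[i] -= h
    diff = softmax_cross_entropy(up, 0)[0] - softmax_cross_entropy(down, 0)[0]
    numeric_grad.append(diff / (2 * h))
print(grad)
print(numeric_grad)

# A naive math.exp(1000.0) would overflow; the shifted form stays finite.
print(softmax_cross_entropy([1000.0, 0.0], label=1)[0])  # 1000.0
```

The gradient entry for the correct class is negative (pushing its score up) and the others are positive, and the entries sum to zero because the probabilities sum to one.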

| Variant | Setting |
| --- | --- |
| Categorical cross-entropy | Multi-class, one-hot labels |
| Binary cross-entropy | Two-class problems |
| Sparse categorical cross-entropy | Multi-class, integer labels |
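The categorical and sparse categorical variants compute the same number; they differ only in how labels are supplied. The sketch below is a hypothetical plain-Python illustration that averages per-example losses over a small batch (mean reduction is a common default, assumed here). It shows that indexing each row with an integer label gives the same result as multiplying by a one-hot vector, while skipping the one-hot construction, which matters when the number of classes is large.

```python
import math

def categorical_ce_batch(one_hot_labels, probs):
    # One-hot labels: one vector per example, multiplied term by term.
    losses = [-sum(y * math.log(p) for y, p in zip(row_y, row_p) if y > 0)
              for row_y, row_p in zip(one_hot_labels, probs)]
    return sum(losses) / len(losses)

def sparse_categorical_ce_batch(int_labels, probs):
    # Integer labels: index each row's probability directly.
    losses = [-math.log(row_p[k]) for k, row_p in zip(int_labels, probs)]
    return sum(losses) / len(losses)

probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.2, 0.3, 0.5]]
int_labels = [0, 1, 2]
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(categorical_ce_batch(one_hot, probs))
print(sparse_categorical_ce_batch(int_labels, probs))  # same value
```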

Role in language models

In large language models, next-token prediction is a classification problem over the vocabulary, and cross-entropy is the loss used during pre-training and fine-tuning. The exponential of the average cross-entropy is the perplexity, a standard measure of how well a language model predicts text. Lower cross-entropy means lower perplexity and a better fit to the training distribution.
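A minimal sketch of the relationship between cross-entropy and perplexity, assuming natural-log cross-entropy and a hypothetical list of the probabilities a model assigned to each actual next token (the values are made up for illustration):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean cross-entropy), with natural-log cross-entropy.

    token_probs: probability the model assigned to each actual next token.
    """
    avg_ce = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_ce)

# Illustrative probabilities for four next-token predictions.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4
print(perplexity([0.9, 0.8, 0.95, 0.7]))     # about 1.20: better predictions
```

A model that spreads probability uniformly over 4 tokens has perplexity 4, which is why perplexity is often read as an effective number of equally likely choices per token.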

Malaysian Context — Training the Models Behind Local AI

Cross-entropy loss is the optimisation target behind the AI systems being trained in Malaysia. Homegrown large language models such as ILMU, built by YTL AI Labs in collaboration with Universiti Malaya, and MaLLaM are trained by minimising cross-entropy over Bahasa Melayu, English and dialect text, so the quality of their Malay-language fluency is measured in part through cross-entropy and the related perplexity metric.

The growing domestic capacity to train such models depends on AI-cloud and data-centre infrastructure, including YTL's facilities developed with Nvidia and other regional investments aligned with the MyDigital Blueprint and Malaysia's sovereign-AI ambitions. Classification systems built by Malaysian banks such as Maybank and CIMB, and by telecommunications operators including Maxis and CelcomDigi, are likewise trained with cross-entropy for tasks from credit scoring to document classification.

Because it is a core part of the machine-learning curriculum, cross-entropy is taught in Malaysian university programmes and in industry upskilling courses funded through HRD Corp and supported by MDEC. As local organisations train and fine-tune more models in-country, a working understanding of the loss function that drives that training remains fundamental to Malaysia's AI talent base.

Type	Loss / objective function
Used for	Classification tasks
Formula	-sum_i y_i * log(p_i)
Common pairing	Softmax output layer
Roots in	Information theory
Related	Maximum likelihood, KL divergence