Softmax Function

The softmax function converts a vector of real-valued scores into a probability distribution, and is widely used as the output layer of neural network classifiers and in attention mechanisms.

4 min read | Last updated June 2026 | Foundations

The softmax function takes a vector of arbitrary real numbers, often called logits, and transforms it into a vector of values between 0 and 1 that sum to 1, thereby forming a valid probability distribution. For an input vector z with components z_j, the i-th output is softmax(z)_i = exp(z_i) / sum_j exp(z_j): the exponential of that component divided by the sum of the exponentials of all components. The function is a smooth, differentiable generalisation of the arg-max operation, which is why it is sometimes described as a soft version of selecting the largest element.
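The definition can be sketched directly in plain Python. This is a minimal illustration rather than a production implementation; the function name `softmax` and the example logits are chosen here for demonstration only.

```python
import math

def softmax(logits):
    """Direct translation of exp(z_i) / sum_j exp(z_j)."""
    exps = [math.exp(z) for z in logits]   # exponentiate each logit
    total = sum(exps)                      # normalising constant
    return [e / total for e in exps]

# Hypothetical logits for a three-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # roughly [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0 up to floating-point rounding
```

The largest logit receives the largest probability, and the three outputs sum to one.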

Properties

Softmax has several properties that make it well suited to machine learning. Its outputs are always positive and always sum to one, so they can be interpreted directly as class probabilities. It is monotonic, preserving the order of the inputs, so the largest logit yields the largest probability. The exponential exaggerates differences between inputs: a logit that is moderately larger than the others receives a disproportionately large share of the probability mass, while small differences are softened.

A useful feature is that softmax is invariant to adding a constant to every input. Subtracting the maximum logit from each component before exponentiating changes nothing mathematically but prevents the exponentials from overflowing to very large numbers, and this trick is standard in numerically stable implementations.
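The max-subtraction trick described above might look like the following sketch. The name `stable_softmax` and the test values are illustrative assumptions; real frameworks implement the same idea in optimised kernels.

```python
import math

def stable_softmax(logits):
    """Softmax that subtracts the maximum logit before exponentiating."""
    m = max(logits)
    # Every shifted logit is <= 0, so exp() never exceeds 1 and cannot overflow.
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

small = stable_softmax([1.0, 2.0, 3.0])
# The same logits shifted by a constant of 1000 give the same distribution.
shifted = stable_softmax([1001.0, 1002.0, 1003.0])

# Exponentiating the unshifted value directly overflows a Python float.
try:
    math.exp(1003.0)
    naive_overflows = False
except OverflowError:
    naive_overflows = True

print(small)
print(shifted)
print(naive_overflows)  # True
```

Because the shift cancels in the ratio, both calls return the same probabilities, while the naive exponential of the large logit fails outright.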

Relationship to classification

Softmax is the canonical output layer for multi-class classification networks. The raw scores produced by the final linear layer are passed through softmax to obtain a probability for each class, and the class with the highest probability is taken as the prediction. During training, the softmax output is compared against the true class using cross-entropy loss, a pairing so common that many frameworks fuse the two operations for efficiency and numerical stability. The gradient of cross-entropy with respect to the logits takes a simple form: the vector of predicted probabilities minus the one-hot vector of the true label. This makes optimisation by gradient descent straightforward.
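The gradient identity can be checked numerically. The sketch below compares the analytic expression (probabilities minus the one-hot label) against a central finite-difference estimate. The logits, the true class index, and the step size `eps` are arbitrary values assumed for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])

logits = [2.0, -1.0, 0.5]   # hypothetical logits
true_class = 0              # hypothetical label
one_hot = [1.0 if i == true_class else 0.0 for i in range(len(logits))]

# Analytic gradient: predicted probabilities minus the one-hot label.
analytic = [p - y for p, y in zip(softmax(logits), one_hot)]

# Central finite-difference estimate of the same gradient.
eps = 1e-6
numeric = []
for i in range(len(logits)):
    up = list(logits)
    down = list(logits)
    up[i] += eps
    down[i] -= eps
    diff = cross_entropy(up, true_class) - cross_entropy(down, true_class)
    numeric.append(diff / (2 * eps))

print(analytic)
print(numeric)
```

The two lists agree to within the error of the finite-difference approximation, and the analytic gradient sums to zero because both the probabilities and the one-hot vector sum to one.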

For two-class problems, softmax reduces to the logistic sigmoid function applied to the difference between the two logits. Softmax can therefore be seen as the multi-class extension of logistic regression.
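The reduction follows from dividing numerator and denominator by exp(z_1): exp(z_1) / (exp(z_1) + exp(z_2)) = 1 / (1 + exp(-(z_1 - z_2))). A small sketch, with arbitrarily chosen logits `z1` and `z2`, confirms the equivalence numerically.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

z1, z2 = 1.3, -0.4   # hypothetical logits for the two classes

p_softmax = softmax([z1, z2])[0]   # probability of the first class
p_sigmoid = sigmoid(z1 - z2)       # sigmoid of the logit difference

print(p_softmax, p_sigmoid)
```

Both expressions give the same probability up to floating-point rounding.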

Beyond classification

The function appears throughout modern deep learning beyond the final classification layer. In the attention mechanism at the heart of transformer architectures, softmax converts raw compatibility scores between tokens into attention weights that sum to one, determining how much each token attends to every other. In reinforcement learning, softmax over action values produces a stochastic policy. A temperature parameter is often introduced to control sharpness: the logits are divided by the temperature before softmax is applied, so a high temperature flattens the distribution toward uniform, encouraging exploration or diversity, while a low temperature concentrates mass on the top choice. This temperature control is one of the main settings that govern the randomness of text produced by large language models during sampling.
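Temperature scaling is commonly implemented by dividing the logits by the temperature before applying softmax. The sketch below assumes that convention; the function name `softmax_with_temperature` and the example logits are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # hypothetical scores

sharp = softmax_with_temperature(logits, temperature=0.1)    # near one-hot
neutral = softmax_with_temperature(logits, temperature=1.0)  # plain softmax
flat = softmax_with_temperature(logits, temperature=10.0)    # near uniform

print(sharp)
print(neutral)
print(flat)
```

At a temperature of 0.1 almost all of the mass sits on the top choice, while at 10 the three probabilities are close to one third each.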

| Setting | Role of softmax |
| --- | --- |
| Image or text classifier | Produces class probabilities |
| Transformer attention | Normalises attention weights |
| Language model sampling | Shapes next-token distribution |
| Reinforcement learning | Defines a stochastic policy |

Malaysian Context — A Building Block in Local AI Systems

The softmax function is a foundational component in virtually every neural network deployed by Malaysian organisations, even though it is rarely visible to end users. Homegrown large language models such as ILMU, developed by YTL AI Labs, and MaLLaM rely on softmax both in their attention layers and in the next-token prediction that generates Bahasa Melayu and Manglish text. Neural classification services offered by Malaysian technology firms, from document categorisation to sentiment analysis, typically end in a softmax output layer.

In the financial sector, models built by banks including Maybank and CIMB for credit scoring and transaction categorisation produce class probabilities through softmax. When properly calibrated, these probabilities support the explainability that Bank Negara Malaysia expects of analytics used in regulated decisions. Telecommunications operators such as Maxis and CelcomDigi apply the same machinery in churn and recommendation models.

Because softmax is part of the standard deep-learning curriculum, it features in AI courses delivered through Malaysian universities and in upskilling programmes funded by HRD Corp and coordinated under MDEC. As Malaysia builds out domestic AI-cloud and data-centre infrastructure to train and serve larger models, efficient and numerically stable softmax computation on GPUs and accelerators remains a quiet but essential ingredient of local AI capability.

Tags: activation function, neural networks, classification, probability

Type	Activation / normalising function
Input	Vector of real numbers (logits)
Output	Probability distribution summing to 1
Formula	exp(z_i) / sum_j exp(z_j)
Common pairing	Cross-entropy loss
Related	Sigmoid, attention mechanism