Rotary Positional Embedding (RoPE)

A method for encoding token positions in transformer models by rotating query and key vectors, capturing relative position through rotation angles rather than additive position vectors.

5 min read | Last updated July 2026 | Foundations

Rotary Positional Embedding, commonly abbreviated as RoPE, is a technique for encoding the position of tokens within a sequence in transformer neural networks. It was introduced in the 2021 paper "RoFormer: Enhanced Transformer with Rotary Position Embedding" by Jianlin Su and collaborators. Rather than adding a separate positional vector to each token embedding, RoPE rotates the query and key vectors used in the self-attention mechanism by an angle that depends on each token's position. The rotation itself is a function of absolute position, yet the attention computation ends up seeing only the relative distance between any two tokens.

Background and motivation

Transformer models process all tokens in a sequence in parallel and therefore have no inherent notion of order. Some form of positional information must be injected so that the model can distinguish a sentence from a shuffled version of the same words. Early transformers used fixed sinusoidal position vectors or learned absolute position embeddings, both of which are added to the token embeddings before the first layer. These approaches encode absolute position well but represent relative position only indirectly, and learned absolute embeddings generalise poorly to sequences longer than those seen during training.
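
For comparison with what follows, the fixed sinusoidal scheme can be sketched in a few lines. This is a minimal illustration of the formula in Vaswani et al. (2017), not code from any particular model; the function name, the sequence length of 6 and the embedding size of 8 are assumptions chosen for the example.

```python
import numpy as np

def sinusoidal_positions(seq_len, dim, base=10000.0):
    """Fixed sinusoidal position vectors in the style of Vaswani et al. (2017)."""
    positions = np.arange(seq_len)[:, None]             # (seq_len, 1)
    freqs = base ** (-2.0 * np.arange(dim // 2) / dim)  # one frequency per sin/cos pair
    angles = positions * freqs[None, :]                 # (seq_len, dim/2)
    table = np.zeros((seq_len, dim))
    table[:, 0::2] = np.sin(angles)  # even dimensions hold sines
    table[:, 1::2] = np.cos(angles)  # odd dimensions hold cosines
    return table

# Hypothetical token embeddings; position is added once, before the first layer.
token_embeddings = np.random.default_rng(0).standard_normal((6, 8))
model_input = token_embeddings + sinusoidal_positions(6, 8)
```

The key point is the final line: position is added to the embeddings a single time at the input, which is the design that RoPE replaces.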

RoPE addresses these limitations by changing where and how position enters the model. Position is applied multiplicatively to queries and keys at every attention layer rather than added once at the input.

How it works

The core idea is to treat pairs of dimensions in the query and key vectors as coordinates in a two-dimensional plane, and to rotate each pair by an angle proportional to the token position. A query vector at position m and a key vector at position n are each rotated, and because the attention score is the dot product between them, the two rotations combine into a single rotation by the angle difference. Position therefore enters the score only through the offset m - n, not through m or n individually. In effect, the relative distance between two tokens is baked directly into their similarity score.
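
The relative-position property can be checked numerically on a single two-dimensional pair. The sketch below is illustrative only: the vectors, the positions and the angle step `theta=0.5` are arbitrary assumptions, and `rotate_2d` is a hypothetical helper rather than part of any library.

```python
import numpy as np

def rotate_2d(vec, position, theta=0.5):
    """Rotate a 2-D vector by the angle position * theta."""
    angle = position * theta
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ vec

q = np.array([1.0, 2.0])   # hypothetical query pair
k = np.array([0.5, -1.0])  # hypothetical key pair

# Same offset m - n = 3 at two very different absolute locations.
score_near = rotate_2d(q, 5) @ rotate_2d(k, 2)
score_far = rotate_2d(q, 105) @ rotate_2d(k, 102)

# A different offset, m - n = 4, for comparison.
score_other = rotate_2d(q, 6) @ rotate_2d(k, 2)

print(np.isclose(score_near, score_far))    # same offset: scores agree
print(np.isclose(score_near, score_other))  # different offset: scores differ
```

Shifting both tokens by 100 positions leaves the score unchanged, while changing the gap between them does not.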

Concretely, the embedding dimensions are partitioned into two-dimensional chunks. Each chunk is rotated by an angle set by a frequency parameter, with lower dimensions rotating quickly and higher dimensions rotating slowly. This spread of frequencies lets the model represent both fine-grained local ordering and coarse long-range position. A useful way to picture it: each two-dimensional chunk of a query or key vector is treated as a complex number, and a position acts on it as a pure rotation.
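
Extending the idea to a full vector, a minimal NumPy sketch might look like the following. It assumes the frequency schedule from the RoFormer paper (base 10000) and pairs adjacent dimensions; some implementations pair the first half of the vector with the second half instead. The function names and the 8-dimensional random vectors are assumptions for illustration.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies: pair 0 rotates fastest, later pairs slower."""
    pair_index = np.arange(dim // 2)
    return base ** (-2.0 * pair_index / dim)

def apply_rope(x, positions, base=10000.0):
    """Rotate adjacent dimension pairs of x (seq_len, dim) by position-dependent angles."""
    seq_len, dim = x.shape
    freqs = rope_frequencies(dim, base)            # (dim/2,)
    angles = positions[:, None] * freqs[None, :]   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))  # hypothetical query vector
k = rng.standard_normal((1, 8))  # hypothetical key vector

# Offset of 4 between query and key, at two different absolute locations.
score_a = apply_rope(q, np.array([7])) @ apply_rope(k, np.array([3])).T
score_b = apply_rope(q, np.array([1007])) @ apply_rope(k, np.array([1003])).T
print(np.allclose(score_a, score_b))
```

In a real model this rotation would be applied to the queries and keys of every attention head at every layer; values are left unrotated.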

Advantages

RoPE has several properties that explain its wide adoption. It introduces no additional learnable parameters tied to position, so it adds negligible memory and compute cost. Because rotations preserve vector length, the norm of query and key vectors is unchanged, which keeps attention scores numerically stable. Most importantly, because attention scores depend on relative offsets and the rotation angles are sinusoidal, RoPE offers some length extrapolation, allowing a model to handle sequences somewhat longer than those it was trained on. Later techniques such as position interpolation and NTK-aware scaling extend this much further, enabling context windows of hundreds of thousands of tokens by rescaling the RoPE frequencies.
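
Position interpolation, mentioned above, can be sketched as a simple rescaling of position indices so that a longer sequence is squeezed into the angle range seen during training. The lengths 2048 and 8192, the dimension of 8 and the helper `rope_angles` are assumptions for illustration; NTK-aware scaling is a different adjustment and is not shown here.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles per (position, pair); scale < 1 compresses the positions."""
    freqs = base ** (-2.0 * np.arange(dim // 2) / dim)
    scaled = np.asarray(positions, dtype=float) * scale
    return scaled[:, None] * freqs[None, :]

train_len, target_len = 2048, 8192   # hypothetical training and target lengths
scale = train_len / target_len       # 0.25

plain = rope_angles([train_len - 1], dim=8)                       # last trained position
stretched = rope_angles([target_len - 1], dim=8)                  # unseen, larger angles
interpolated = rope_angles([target_len - 1], dim=8, scale=scale)  # back inside trained range

print(plain.max(), stretched.max(), interpolated.max())
```

Without rescaling, the last position of the longer sequence produces angles the model never saw in training; with rescaling, the same position maps to an angle within the trained range, at the cost of finer spacing between neighbouring tokens.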

The table below contrasts RoPE with earlier schemes.

| Method | Position type | Parameters | Length extrapolation |
| --- | --- | --- | --- |
| Sinusoidal (additive) | Absolute | None | Limited |
| Learned absolute | Absolute | Yes | Poor |
| Relative bias | Relative | Some | Moderate |
| RoPE | Relative via rotation | None | Good, extensible |

Limitations

RoPE is not without issues. Research has shown that reduced numerical precision, such as the BFloat16 format widely used in training, can degrade the relative position property during long-context training. Extending context length usually requires rescaling the rotation frequencies rather than working out of the box. Despite these caveats, RoPE has become the default position encoding in most open-weight large language models released since 2023, including the Llama, Mistral, Qwen and DeepSeek families.

Malaysian Context — Foundations for Sovereign Language Models

Rotary Positional Embedding is a building block inside almost every modern large language model, including those being developed within Malaysia. ILMU, the Malaysian large language model launched by YTL and trained on the YTL AI Cloud, and MaLLaM, developed by the Mesolitica research community, both build on transformer architectures that rely on RoPE-style position encoding for handling long documents in Malay, English and code-mixed text.

Long-context handling matters for Malaysian applications because official documents, from Bank Negara Malaysia circulars to parliamentary Hansard records, are often lengthy and multilingual. Robust relative position encoding helps models trained by local groups such as MIMOS and university research labs maintain coherence across long passages of Bahasa Melayu.

For AI engineering talent being trained under HRD Corp programmes and at Malaysian universities, understanding position encoding is part of the core curriculum for anyone fine-tuning open-weight models. As Malaysia pursues sovereign AI capability through the National AI Office and MyDigital initiatives, familiarity with foundational components like RoPE underpins the ability to adapt global models to local languages and contexts.

References

Su, J. et al. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
Vaswani, A. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems.
labml.ai. (2023). Rotary Positional Embeddings (RoPE). nn.labml.ai.

Tags: transformers, positional encoding, attention, large language models

Type	Positional encoding technique
Introduced	2021 (RoFormer paper)
Proposed by	Jianlin Su and colleagues
Key use	Relative position encoding in transformers
Adopted in	Llama, Mistral, Qwen, DeepSeek and others
Related	Attention mechanism, Transformer architecture