AIWiki
Malaysia

GPT-4

GPT-4 is a large multimodal language model developed by OpenAI, released in March 2023, that accepts both image and text inputs and demonstrates human-level performance on numerous professional and academic benchmarks.

6 min readLast updated May 2026Models

GPT-4 (Generative Pre-trained Transformer 4) is a large multimodal language model developed by OpenAI and released on 14 March 2023. It represents the fourth generation of OpenAI's GPT series and is distinguished from its predecessors by its ability to accept both image and text inputs — producing text outputs — and by its substantially improved performance across a wide range of academic, professional, and reasoning tasks.[^1] OpenAI declined to publish detailed information about GPT-4's architecture, parameter count, or training infrastructure in the accompanying technical report, citing competitive and safety considerations, making it one of the most capable yet least transparent major AI models at the time of its release.

Capabilities and Benchmarks

At launch, GPT-4 demonstrated performance that OpenAI described as approximately human-level on a range of professional examinations. On a simulated bar exam it scored in approximately the top 10% of test takers, compared with GPT-3.5 which scored in the bottom 10%. It achieved scores above the passing threshold on the United States Medical Licensing Examination (USMLE) and performed at the level of a passing candidate on the Graduate Record Examination (GRE). On Massive Multitask Language Understanding (MMLU), a benchmark spanning 57 subjects from elementary mathematics to professional law, GPT-4 achieved approximately 86%, a substantial improvement over its predecessors.[^2]

These benchmark results attracted significant attention because they suggested that sufficiently large language models were capable of performing tasks long considered to require specialised human training. Critics noted, however, that strong benchmark performance does not necessarily translate to reliable real-world deployment, particularly in high-stakes domains such as medicine or law where errors carry serious consequences.

Architecture

OpenAI has not publicly disclosed the architectural details of GPT-4. Based on information disclosed by company personnel and independent analysis, researchers believe it employs a Mixture of Experts (MoE) design — routing each input token to a subset of specialised sub-networks — which would allow the total parameter count to be substantially larger than the active parameter count at inference time, improving both capacity and compute efficiency. This architecture pattern had been discussed in academic literature prior to GPT-4 but had not been confirmed for large proprietary models at that scale.[^3]

Like earlier GPT models, GPT-4 uses a transformer-based decoder architecture and was trained using a combination of unsupervised pre-training on large text corpora and Reinforcement Learning from Human Feedback (RLHF), which aligns model outputs toward responses rated as helpful, harmless, and accurate by human raters.

Multimodal Vision Input

GPT-4 was the first GPT model to accept image inputs. Users can submit a photograph, diagram, chart, or screenshot alongside a text prompt, and GPT-4 can describe the image, answer questions about its content, reason about visual relationships, and extract structured information from charts or tables. OpenAI initially deployed the vision capability only to select partners before making it broadly available through ChatGPT and the API.

The practical applications of the vision capability span a wide range, from accessibility tools that describe images for visually impaired users to automated analysis of medical imaging, interpretation of engineering schematics, and extraction of data from scanned documents.

Context Window

The initial release of GPT-4 supported a 32,768-token context window — roughly 25,000 words — a significant increase over GPT-3.5's 4,096-token limit. In November 2023, OpenAI introduced GPT-4 Turbo, which extended the context window to 128,000 tokens (approximately 100,000 words), enabling the model to reason over book-length documents, large codebases, or extended conversation histories in a single prompt.[^4]

Variants and Successors

OpenAI subsequently released several GPT-4 variants. GPT-4 Turbo combined the larger context window with updated training data and reduced API pricing. GPT-4o (May 2024), described as an "omni" model, extended native multimodality to include audio input and output alongside text and images, and operated substantially faster and at lower cost than GPT-4 Turbo. The GPT-4.1 series and the reasoning-focused o1 and o3 model families followed, each emphasising different performance trade-offs between speed, cost, and complex multi-step reasoning.

Safety and Alignment

OpenAI invested heavily in safety testing prior to GPT-4's release, engaging more than 50 external experts in adversarial testing ("red-teaming") across domains including cybersecurity, bioweapons risk, and disinformation. The model was trained with a system prompt mechanism that allows API users to specify behavioural guidelines and constrain model outputs for their specific use case. Despite these measures, GPT-4 — like all large language models — remains susceptible to prompt injection, jailbreaking, and the generation of plausible-sounding but factually incorrect content (hallucination).

References

  1. OpenAI. (2023). GPT-4 Technical Report. arXiv:2303.08774.
  2. OpenAI. (2023). GPT-4 Technical Report: Benchmark Performance. OpenAI.
  3. Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538.
  4. OpenAI. (2023). Introducing GPT-4 Turbo. OpenAI DevDay Blog.