Search Results
51 results for “text”
AI Benchmarking
The systematic evaluation of AI systems using standardised datasets, tasks, and metrics to measure capability, compare models, and track progress across research and deployment contexts.
AI Literacy
AI literacy is the set of knowledge, skills, and attitudes that enable individuals to understand, evaluate, and use artificial intelligence tools effectively and responsibly in personal, professional, and civic contexts.
AI Memory
AI memory refers to the mechanisms that allow artificial intelligence agents to retain, retrieve, and use information across interactions, extending capability beyond a single context window.
AI Music Generation
AI music generation is the use of machine learning models to compose, arrange, or produce music from text prompts or other inputs, spanning full songs with vocals, instrumental tracks, and sound design.
AI Video Generation
AI video generation refers to the automated creation of video content from text prompts, images, or other inputs using generative neural networks, enabling synthetic video production without cameras or traditional animation.
AI Watermarking
AI watermarking refers to techniques for embedding detectable signals into AI-generated content to establish provenance, enable detection, and support content authenticity verification across images, audio, video, and text.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.
Chatbot
A chatbot is a software application designed to simulate human conversation through text or voice, ranging from simple rule-based systems to sophisticated AI assistants powered by large language models.
ChatGLM
A family of open-source bilingual (Chinese-English) large language models developed by Zhipu AI and Tsinghua University, known for strong reasoning capabilities, large context windows, and enterprise-grade open-weight releases under MIT licensing.
CLIP
CLIP (Contrastive Language-Image Pre-training) is a multimodal neural network model developed by OpenAI that learns visual concepts from natural language descriptions by jointly training an image encoder and a text encoder on 400 million image-text pairs.
Context Window
The maximum number of tokens (including the prompt, prior conversation, retrieved documents, and the model's own output) that a large language model can attend to at one time.
DALL-E
DALL-E is a series of text-to-image generative AI models developed by OpenAI that create photorealistic and artistic images from natural language prompts; the first version was an autoregressive transformer, while later versions use diffusion-based techniques.
Diffusion Model
A class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.
ElevenLabs
ElevenLabs is an AI audio research and deployment company founded in 2022 that develops text-to-speech, voice cloning, dubbing, and conversational voice agent technologies based on proprietary deep learning models.
Embedding
An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.
Gemini
Gemini is a family of multimodal large language models developed by Google DeepMind, designed to natively process and generate text, code, images, audio, and video across a range of model sizes.
Generative AI
Generative AI refers to artificial intelligence systems capable of producing new content — text, images, audio, video, or code — by learning the underlying distribution of training data.
GPT-4
GPT-4 is a large multimodal language model developed by OpenAI, released in March 2023, that accepts both image and text inputs and demonstrates human-level performance on numerous professional and academic benchmarks.
Hunyuan
A family of large language models developed by Tencent, integrated across WeChat, QQ, and Tencent Cloud, offering multimodal capabilities including text, image, video, voice, and 3D generation through a unified omni-modal architecture.
ILMU (Malaysian Large Language Model)
ILMU is Malaysia's first homegrown multimodal large language model, developed by YTL AI Labs to understand and generate Bahasa Melayu, Manglish and regional dialects across text, voice and vision.
In-Context Learning
In-context learning is the ability of large language models to perform new tasks by conditioning on examples or instructions provided within the input prompt, without updating model weights.
Kimi
A conversational AI assistant and long-context large language model developed by Moonshot AI, a Beijing startup, known for its very long context windows and strong performance on agentic reasoning tasks.
Kling AI
A family of generative AI video models developed by Kuaishou Technology in China, capable of producing photorealistic short-form video with synchronised audio from text or image prompts.
Large Language Models
Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.
Machine Translation
Machine translation is the automated conversion of text or speech from one natural language into another using rule-based, statistical, or neural systems.
Mamba (Structured State Space Model)
Mamba is a selective state space model architecture whose computation scales linearly with sequence length, offering a computationally efficient alternative to the Transformer for long-context tasks.
Midjourney
An independent AI research lab and image generation service that produces images and video from natural-language text prompts, accessible primarily through Discord and a web application.
MiniMax
A Chinese AI company and model developer known for the MiniMax-M1 and M2 large language models featuring ultra-long context windows of up to 4 million tokens, strong agentic performance, and open MIT-licensed releases.
Model Context Protocol
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in 2024 that defines a universal interface for connecting large language models to external tools, data sources, and services.
Multimodal AI
Artificial intelligence systems that can process, understand, and generate information across multiple data types simultaneously, including text, images, audio, video, and other modalities.
Named Entity Recognition
Named entity recognition (NER) is a natural language processing task that identifies and classifies named entities in text — such as people, organisations, locations, and dates — into predefined categories.
Natural Language Generation
Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.
Natural Language Processing
Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.
Optical Character Recognition
A computer vision technology that converts images of typed, handwritten, or printed text into machine-readable digital text, increasingly powered by deep learning and transformer-based vision models.
Pika Labs
A United States-based artificial intelligence startup founded in 2023 that develops the Pika text-to-video and image-to-video generation models, competing with Runway and OpenAI's Sora in the AI video generation market.
Prompt Caching
Prompt caching is an inference optimisation technique that stores precomputed key-value representations of repeated prompt prefixes, reducing latency and token processing costs for applications with stable system prompts or long shared contexts.
Runway ML
Runway is a generative artificial intelligence company that develops video generation and editing models, best known for its Gen-series text-to-video systems used in filmmaking and content creation.
Self-Supervised Learning
A machine learning training paradigm in which a model generates its own supervisory signal from unlabelled data by solving pretext tasks, learning rich representations without human-annotated labels.
Semantic Search
Semantic search is a search paradigm that retrieves results based on the meaning and intent of a query rather than exact keyword matches, using vector embeddings to measure conceptual similarity between text.
Sentiment Analysis
Sentiment analysis is a natural language processing technique that automatically identifies and classifies the emotional tone of text as positive, negative, or neutral, and is widely used in customer feedback, social media monitoring, and financial analysis.
Sora
Sora is a text-to-video generative AI model developed by OpenAI that produces short, high-fidelity video clips with synchronised audio from natural-language prompts.
Speech Recognition
Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.
Stability AI
A British artificial intelligence company best known for developing and releasing Stable Diffusion, an open-weight text-to-image generative model, and a family of related image, video, audio, and 3D models.
Stable Diffusion
Stable Diffusion is an open-weight latent diffusion model, released by Stability AI in collaboration with researchers from the CompVis group at LMU Munich and Runway, that generates high-quality images from text prompts and runs efficiently on consumer-grade hardware.
Text Summarisation
Text summarisation is the natural language processing task of producing a shorter version of a document that preserves its key information, using extractive or abstractive techniques.
Text-to-Speech
Text-to-speech is the technology that converts written text into synthesised spoken audio using rule-based, concatenative, or neural network methods.
Token
A token is the smallest unit of text that a large language model processes, typically a word, subword, or character, and serves as the basic input and output element during training and inference.
Tokenisation
Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.
Tool Use
Tool use in AI refers to the capability of language models to invoke external functions, APIs, or services to retrieve information, perform actions, or extend their abilities beyond text generation.
Vision-Language Model
A multimodal AI system that jointly processes and generates information from both images and text, extending large language models with visual perception capabilities through cross-modal alignment.
Word2Vec
A neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.
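Several of the entries above (Embedding, Semantic Search, Word2Vec) rest on the same idea: items are represented as vectors, and relatedness is measured by how closely those vectors point in the same direction. The sketch below illustrates only that ranking step, using cosine similarity. The three-dimensional vectors and the `DOC_VECS` table are hypothetical, hand-written values chosen for illustration; a real system would obtain vectors from a trained embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical embeddings: these numbers are made up, not produced by any model.
DOC_VECS = {
    "Text-to-Speech": [0.9, 0.1, 0.0],
    "Speech Recognition": [0.8, 0.3, 0.1],
    "Stable Diffusion": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a speech-related query.
query = [1.0, 0.2, 0.0]
ranked = rank(query, DOC_VECS)
print(ranked)
```

With these made-up vectors, the two speech-related entries rank ahead of the image-generation entry because their vectors point in nearly the same direction as the query.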