Search Results
6 results for “multimodal”
Gemini
Gemini is a family of multimodal large language models developed by Google DeepMind, designed to natively process and generate text, code, images, audio, and video across a range of model sizes.
GPT-4
GPT-4 is a large multimodal language model developed by OpenAI, released in March 2023, that accepts both image and text inputs, produces text outputs, and demonstrates human-level performance on numerous professional and academic benchmarks.
Kling AI
A family of generative AI video models developed by Kuaishou Technology in China, capable of producing photorealistic short-form video with synchronised audio from text or image prompts.
Multimodal AI
Artificial intelligence systems that can process, understand, and generate information across multiple data types (modalities), such as text, images, audio, and video.
Text-to-Speech
Text-to-speech is a technology that converts written text into synthesised spoken audio using rule-based, concatenative, or neural-network methods.
Transformer Architecture
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of modern large language models and multimodal AI systems.