Search Results
14 results for “audio”
AI Music Generation
AI music generation is the use of machine learning models to compose, arrange, or produce music from text prompts or other inputs, spanning full songs with vocals, instrumental tracks, and sound design.
AI Watermarking
AI watermarking refers to techniques for embedding detectable signals into AI-generated content to establish provenance, enable detection, and support content authenticity verification across images, audio, video, and text.
Diffusion Model
A class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.
ElevenLabs
ElevenLabs is an AI audio research and deployment company founded in 2022 that develops text-to-speech, voice cloning, dubbing, and conversational voice agent technologies based on proprietary deep learning models.
Embedding
An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.
Gemini
Gemini is a family of multimodal large language models developed by Google DeepMind, designed to natively process and generate text, code, images, audio, and video across a range of model sizes.
Generative AI
Generative AI refers to artificial intelligence systems capable of producing new content — text, images, audio, video, or code — by learning the underlying distribution of their training data.
Kling AI
A family of generative AI video models developed by Kuaishou Technology in China, capable of producing photorealistic short-form video with synchronised audio from text or image prompts.
Multimodal AI
Artificial intelligence systems that can process, understand, and generate information across multiple data types simultaneously, including text, images, audio, video, and other modalities.
Sora
Sora is a text-to-video generative AI model developed by OpenAI that produces short, high-fidelity video clips with synchronised audio from natural-language prompts.
Speech Recognition
Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.
Stability AI
A British artificial intelligence company best known for developing and releasing Stable Diffusion, an open-weight text-to-image generative model, and a family of related image, video, audio, and 3D models.
Text-to-Speech
Text-to-speech is the technology that converts written text into synthesised spoken audio using rule-based, concatenative, or neural network methods.
Whisper
Whisper is an open-source automatic speech recognition system developed by OpenAI, trained on 680,000 hours of multilingual audio data and capable of transcription, translation into English, and language identification across nearly 100 languages.