AIWiki

Search Results

14 results for "audio"

Applications

AI Music Generation

AI music generation is the use of machine learning models to compose, arrange, or produce music from text prompts or other inputs, spanning full songs with vocals, instrumental tracks, and sound design.

5 min read · Updated June 2026
Applications

AI Watermarking

AI watermarking refers to techniques for embedding detectable signals into AI-generated content to establish provenance, enable detection, and support content authenticity verification across images, audio, video, and text.

6 min read · Updated June 2026
Foundations

Diffusion Model

A diffusion model belongs to a class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.

7 min read · Updated May 2026
Companies & Tools

ElevenLabs

ElevenLabs is an AI audio research and deployment company founded in 2022 that develops text-to-speech, voice cloning, dubbing, and conversational voice agent technologies based on proprietary deep learning models.

5 min read · Updated May 2026
Foundations

Embedding

An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.

6 min read · Updated May 2026
Models

Gemini

Gemini is a family of multimodal large language models developed by Google DeepMind, designed to natively process and generate text, code, images, audio, and video across a range of model sizes.

6 min read · Updated May 2026
Applications

Generative AI

Generative AI refers to artificial intelligence systems capable of producing new content — text, images, audio, video, or code — by learning the underlying distribution of training data.

4 min read · Updated May 2026
Models

Kling AI

Kling AI is a family of generative AI video models developed by Kuaishou Technology in China, capable of producing photorealistic short-form video with synchronised audio from text or image prompts.

6 min read · Updated June 2026
Foundations

Multimodal AI

Multimodal AI refers to artificial intelligence systems that can process, understand, and generate information across multiple data types simultaneously, such as text, images, audio, and video.

5 min read · Updated May 2026
Models

Sora

Sora is a text-to-video generative AI model developed by OpenAI that produces short, high-fidelity video clips with synchronised audio from natural-language prompts.

5 min read · Updated May 2026
Applications

Speech Recognition

Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.

6 min read · Updated May 2026
Companies & Tools

Stability AI

Stability AI is a British artificial intelligence company best known for developing and releasing Stable Diffusion, an open-weight text-to-image generative model, along with a family of related image, video, audio, and 3D models.

6 min read · Updated May 2026
Applications

Text-to-Speech

Text-to-speech is the technology that converts written text into synthesised spoken audio using rule-based, concatenative, or neural network methods.

5 min read · Updated May 2026
Applications

Whisper

Whisper is an open-source automatic speech recognition system developed by OpenAI, trained on 680,000 hours of multilingual audio data and capable of transcription, speech translation into English, and language identification across nearly 100 languages.

5 min read · Updated June 2026