Kling AI

A family of generative AI video models developed by Kuaishou Technology in China, capable of producing photorealistic short-form video with synchronised audio from text or image prompts.

Last updated June 2026

Kling AI is a family of generative video models developed by Kuaishou Technology, the Beijing-based operator of the short-video platform Kuaishou and the international Kwai application. First released in June 2024, Kling AI was among the earliest publicly accessible video generation systems to match the qualitative leap demonstrated by OpenAI's Sora announcement earlier that year. Since launch, the series has gone through multiple model generations that produce high-resolution video clips from text or image prompts, with rapidly expanding controls over motion, style, and duration and, in the most recent releases, natively generated audio.

Model generations

The original Kling 1.0 model, released in June 2024, offered text-to-video and image-to-video generation at up to 1080p resolution, with clip lengths of around five to ten seconds. Successive releases through 2024 and 2025 introduced improved motion consistency, longer-duration generation, and finer prompt adherence. Kling 2.0 improved physical realism, while the Kling 2.5 Turbo model, released in September 2025, focused on reducing inference cost and improving benchmark results against competing systems such as Veo and Seedance.

In December 2025, Kuaishou released Kling Video 2.6, which introduced simultaneous audio-visual generation. The model produces visuals, voiceovers, ambient sound, and sound effects within a single forward pass, eliminating the separate dubbing stage used in traditional AI video pipelines. The same period saw the launch of Kling O1, marketed as a unified multimodal video model that consolidates text-to-video, image-to-video, video editing, in-painting, style transfer, and shot extension into a single engine. Kling 3.0, launched globally in early 2026, extended generation duration to roughly fifteen seconds and added multilingual native audio output across multiple languages and dialects.

Capabilities

Kling supports several generation modes. Text-to-video produces a clip from a natural-language description. Image-to-video animates a still input image. Start-frame and end-frame conditioning lets users specify the first and last frame, with the model interpolating motion in between. Reference-based generation lets a user supply a character, style, or scene reference that the generated clip should respect. Editing operations such as in-painting allow specific regions of an existing clip to be modified, and shot extension lets a user lengthen an existing clip beyond its original duration.
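The modes above differ mainly in which inputs a request has to carry. The following Python sketch makes that explicit. It is not Kling's API: the `Mode` enum, the input names, and the `missing_inputs` helper are all hypothetical, and the assumption that reference-based generation and in-painting also take a text prompt (and that in-painting takes a region mask) goes beyond what the description states.

```python
from enum import Enum


class Mode(Enum):
    """Generation modes named in the text (identifiers are hypothetical)."""
    TEXT_TO_VIDEO = "text_to_video"
    IMAGE_TO_VIDEO = "image_to_video"
    START_END_FRAME = "start_end_frame"
    REFERENCE = "reference"
    INPAINT = "inpaint"
    EXTEND = "extend"


# Assumed inputs per mode. "prompt" for REFERENCE and INPAINT, and "mask"
# for INPAINT, are illustrative assumptions, not documented requirements.
REQUIRED_INPUTS = {
    Mode.TEXT_TO_VIDEO: {"prompt"},
    Mode.IMAGE_TO_VIDEO: {"image"},
    Mode.START_END_FRAME: {"start_frame", "end_frame"},
    Mode.REFERENCE: {"prompt", "reference"},
    Mode.INPAINT: {"source_clip", "mask", "prompt"},
    Mode.EXTEND: {"source_clip"},
}


def missing_inputs(mode: Mode, provided: dict) -> set:
    """Return the names of required inputs that are absent or empty."""
    return {name for name in REQUIRED_INPUTS[mode] if not provided.get(name)}
```

For example, `missing_inputs(Mode.START_END_FRAME, {"start_frame": "first.png"})` reports that `end_frame` still has to be supplied before the model could interpolate between the two frames.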

Audio-visual generation in Kling 2.6 and later produces synchronised speech, singing, ambient sound, and effects aligned with the visual content. Multilingual support in Kling 3.0 covers major world languages and several dialects. Output resolution typically ranges from 720p to 1080p depending on tier and use case.

Access and ecosystem

Kling is accessible through the Kling AI web interface, the Kuaishou mobile applications, and an API for enterprise customers. The service operates on a credit-based pricing model in which different model versions, resolutions, and durations consume varying numbers of credits. Kuaishou has reported tens of millions of registered creators and tens of thousands of enterprise clients using Kling AI, particularly in advertising, e-commerce content production, short-form entertainment, and education.
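To illustrate how a credit-based scheme of this kind could combine model version, resolution, and duration, here is a minimal Python sketch. Every number and label in it is invented for illustration: the version names, per-second rates, and resolution multipliers are assumptions and do not reflect Kuaishou's actual pricing.

```python
import math

# Hypothetical per-second base rates by model version (not real prices).
BASE_CREDITS_PER_SECOND = {"basic": 2, "advanced": 7}

# Hypothetical multipliers by output resolution (not real prices).
RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 1.5}


def estimate_credits(version: str, resolution: str, seconds: int) -> int:
    """Estimate the credits one clip would consume under the assumed rates."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = BASE_CREDITS_PER_SECOND[version] * RESOLUTION_MULTIPLIER[resolution]
    # Round up so a fractional credit is always charged as a whole one.
    return math.ceil(rate * seconds)
```

Under these assumed rates, a five-second 1080p clip on the "basic" version would cost `estimate_credits("basic", "1080p", 5)` credits, that is 2 × 1.5 × 5 = 15.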

Comparison to other video models

| Model | Developer | Notable feature | Typical clip length |
|---|---|---|---|
| Kling | Kuaishou (China) | Native audio in Kling 2.6+; multilingual | 5-15 seconds |
| Sora | OpenAI (USA) | High-fidelity long-form generation | up to 60+ seconds |
| Runway Gen-3 | Runway ML (USA) | Strong creative tooling integration | 5-10 seconds |
| Veo | Google DeepMind | Cinematic style, high resolution | 8 seconds and up |
| Pika | Pika Labs (USA) | Strong stylisation, fast iteration | 3-10 seconds |
| Seedance | ByteDance (China) | High-throughput generation | 5-10 seconds |

Safety and policy

Like other foundation video models, Kling implements content safety policies covering depiction of real public figures, violence, sexual content, child safety, and trademarks. Kuaishou applies both pre-generation prompt filtering and post-generation content review. Watermarking and provenance disclosures are used in some jurisdictions, and the platform has invested in detection tools to support downstream content moderation. The model operates under Chinese AI governance rules, including the 2023 Interim Measures for the Management of Generative AI Services, which require labelling of AI-generated content and registration of major generative services.
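The two-stage arrangement described above, a prompt filter before generation and a content review after it, can be sketched generically in Python. This is not Kuaishou's implementation: the blocklist, the function names, and the `generate` and `review` callables are hypothetical stand-ins, and production systems would typically rely on trained classifiers rather than a word list.

```python
from typing import Callable, Optional

# Hypothetical blocklist; a real filter would use far richer signals.
BLOCKED_TERMS = {"example-blocked-term"}


def prompt_allowed(prompt: str) -> bool:
    """Pre-generation check: reject prompts containing a blocked term."""
    return not any(word in BLOCKED_TERMS for word in prompt.lower().split())


def moderated_generate(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str], bool],
) -> Optional[str]:
    """Run the filter, then generation, then post-generation review.

    Returns the clip only if both stages pass; otherwise returns None.
    """
    if not prompt_allowed(prompt):
        return None  # blocked before any compute is spent on generation
    clip = generate(prompt)
    return clip if review(clip) else None
```

Filtering first means a rejected prompt never reaches the generator, while the review stage catches outputs that an acceptable-looking prompt still managed to produce.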

References

  1. Kuaishou Technology. (2025). Kling AI Launches Video 2.6 Model with Simultaneous Audio-Visual Generation. Investor Relations announcement.
  2. Kuaishou Technology. (2025). Kling AI 2.5 Turbo Video Model Release. Investor Relations announcement.
  3. Kuaishou Technology. (2026). Kling 3.0 Global Launch. Investor Relations announcement.
  4. Cyberspace Administration of China. (2023). Interim Measures for the Management of Generative AI Services.