Qwen Researchers Release Qwen3-TTS: an Open Multilingual TTS Suite with Real-Time Latency and Fine-Grained Voice Control

Breaking News: Alibaba Cloud’s Qwen Team Revolutionizes Text-to-Speech Technology with Qwen3-TTS

In a major breakthrough, the Qwen team at Alibaba Cloud has just released Qwen3-TTS, an open-source multilingual text-to-speech (TTS) suite that’s set to change the game in AI-powered interactions. And the best part? It’s not just faster and more accurate – it’s also ridiculously customizable.

So, what makes Qwen3-TTS so special? For starters, it’s got three core tasks under its belt: voice cloning, voice design, and high-quality speech generation. And with support for 10 languages, including Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian, this suite is ready to conquer the globe.

But let’s talk about the really cool stuff. Qwen3-TTS can produce speech in real time, thanks to its 12 Hz speech tokenizer, and it comes in two language model sizes. And with over 5 million hours of multilingual speech data behind its training, the suite is able to learn the nuances of each language and generate speech that’s both natural-sounding and accurate.
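To get a feel for why a low tokenizer rate helps with latency, here is a rough back-of-the-envelope sketch. It takes the 12 Hz figure above at face value; the helper names are made up for illustration and are not part of any Qwen3-TTS API.

```python
import math

# Nominal frame rate quoted in the article. Everything else is an
# illustrative assumption, not a documented Qwen3-TTS detail.
FRAME_RATE_HZ = 12


def frames_for(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * frame_rate_hz)


def ms_per_frame(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of audio represented by a single frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


if __name__ == "__main__":
    print(frames_for(10))            # frames the language model must emit for 10 s of speech
    print(round(ms_per_frame(), 1))  # audio duration covered by each frame, in ms
```

At this rate each frame covers roughly 83 ms of audio, so the language model only has to emit about a dozen frames per second of speech. That small per-second budget is what makes real-time generation plausible.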

But the real game-changer here is the fine-grained voice control. The VoiceDesign model lets users create new voices from scratch, specifying everything from tone and pace to pitch and accent. Yeah, it’s a whole new level of control, and it’s got massive potential applications across industries – from entertainment to education to healthcare.

So, what does it look like under the hood? Qwen3-TTS is all about flexibility and customization. It’s got a tokenizer that turns audio into discrete speech tokens, a streaming decoder for generating speech in real time, and a range of evaluation metrics and tools to help you assess the quality and accuracy of the generated speech.
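The idea behind a streaming decoder can be shown with a toy sketch: instead of waiting for the whole token sequence, audio is emitted in small chunks as frames arrive. This is not the Qwen3-TTS implementation. `decode_frame` is a stand-in that returns silence, and the chunk size and the 2000 samples per frame (24 kHz audio at 12 frames per second) are assumptions chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def decode_frame(frame: int, samples_per_frame: int) -> List[float]:
    """Hypothetical stand-in decoder: maps one token frame to silent samples."""
    return [0.0] * samples_per_frame


def stream_decode(
    frames: Iterable[int],
    frames_per_chunk: int = 4,
    samples_per_frame: int = 2000,  # assumed: 24 kHz audio / 12 frames per second
) -> Iterator[List[float]]:
    """Yield audio chunks as soon as `frames_per_chunk` frames have been decoded."""
    buffer: List[float] = []
    pending = 0
    for frame in frames:
        buffer.extend(decode_frame(frame, samples_per_frame))
        pending += 1
        if pending == frames_per_chunk:
            yield buffer
            buffer, pending = [], 0
    if buffer:
        # Flush the final, possibly shorter, chunk.
        yield buffer


if __name__ == "__main__":
    for i, chunk in enumerate(stream_decode(range(10))):
        print(f"chunk {i}: {len(chunk)} samples")
```

Because the function is a generator, a caller can start playing the first chunk while later frames are still being produced. That is the property that matters for perceived latency.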

In terms of performance? Qwen3-TTS is a beast. On the Seed-TTS test set, it scored a word error rate of 0.77 on Chinese and 1.24 on English (lower is better). And on the InstructTTSEval test set? It comes out ahead of the competition there, too.
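For readers new to the metric, word error rate is the word-level edit distance between a reference text and a transcript, divided by the number of reference words. TTS benchmarks typically get the transcript by running the synthesized audio through a speech recognizer. Chinese is often scored per character rather than per word. The sketch below is a minimal, generic implementation, not the benchmark's own scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

If the reported figures are percentages, as is common for this kind of benchmark, a score of 1.24 would mean roughly one word error per 80 reference words.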

So, what does this mean for you? Qwen3-TTS is a major breakthrough in TTS technology, and it’s got the potential to revolutionize the way we interact with AI-powered systems. Want to try it out for yourself? Head on over to the model weights, repo, and playground to get started.

