**Breaking News: Inworld AI Unveils TTS-1.5, Revolutionizing Real-Time Voice Agents**
I just got my hands on some super exciting news from Inworld AI, and I had to share it with all of you, pronto! The company has just launched TTS-1.5, a text-to-speech system designed specifically for real-time voice agents with strict constraints on latency, quality, and cost. And, according to Artificial Analysis, TTS-1.5 is now the number one ranked text-to-speech system.
So, what makes TTS-1.5 stand out? For starters, it prioritizes P90 time to first audio, meaning the latency within which 90% of requests start returning audio, which is a big deal for user-perceived responsiveness. The TTS-1.5 Max model has a P90 time to first audio of under 250 ms, while the TTS-1.5 Mini model comes in at under 130 ms. That’s about 4 times faster than its predecessor, making it a good fit for latency-sensitive applications like real-time gaming or highly responsive voice agents.
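To make the P90 figure a bit more concrete, here is a minimal Python sketch of how you might compute it from your own time-to-first-audio measurements. The sample values, the `p90` helper, and the choice of the nearest-rank percentile method are all my own illustrative assumptions, not Inworld AI’s measurement methodology.

```python
def p90(samples_ms):
    """Return the 90th percentile of the samples (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank: 1-based rank = ceil(0.9 * n), done in integer arithmetic.
    rank = (9 * n + 9) // 10
    return ordered[rank - 1]


# Hypothetical time-to-first-audio measurements in milliseconds (made-up data).
samples = [95, 102, 110, 98, 120, 105, 99, 240, 101, 97]

# Budget taken from the article's "under 130 ms" figure for TTS-1.5 Mini.
MINI_BUDGET_MS = 130

p90_ms = p90(samples)
print(p90_ms)                   # 120
print(p90_ms < MINI_BUDGET_MS)  # True
```

Note that the single 240 ms outlier does not move the P90 here. A tail percentile like this describes what most users experience rather than the single worst request, which is why it is a common target for voice agents.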
But that’s not all. TTS-1.5 also improves on its previous version, TTS-1, delivering about 30% more expressive range and 40% higher stability. In practice, the system can generate more nuanced and varied speech while producing fewer errors such as truncated sentences, unintended word substitutions, or audio artifacts.
Now, you might be wondering about the pricing. I know I was, and I’m happy to report that Inworld AI has priced TTS-1.5 for consumer-scale deployments. The TTS-1.5 Mini model costs $5 per 1 million characters, while the TTS-1.5 Max model costs $10 per 1 million characters. This means that you can run TTS continuously in high-usage products like voice-native companions, education platforms, or customer support lines without the cost getting out of hand.
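If you want to sanity-check what those prices mean for your own product, the arithmetic is simple. Here is a rough Python sketch using the per-million-character prices quoted above. The `monthly_cost_usd` helper and the 50 million characters per month volume are hypothetical, chosen only for illustration, and the sketch ignores anything beyond the list prices (discounts, free tiers, and so on).

```python
# List prices from the article, in USD per 1 million characters.
PRICE_PER_MILLION_CHARS_USD = {
    "mini": 5.0,   # TTS-1.5 Mini
    "max": 10.0,   # TTS-1.5 Max
}


def monthly_cost_usd(model, chars_per_month):
    """Estimate monthly spend for a given model and character volume."""
    price = PRICE_PER_MILLION_CHARS_USD[model]
    return price * chars_per_month / 1_000_000


# Hypothetical workload: 50 million synthesized characters per month.
volume = 50_000_000

print(monthly_cost_usd("mini", volume))  # 250.0
print(monthly_cost_usd("max", volume))   # 500.0
```

At any volume, Max costs exactly twice as much as Mini, so the choice between them mostly comes down to whether the quality difference is worth the extra latency and spend for your use case.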
Multilingual coverage is another major feature: TTS-1.5 supports 15 languages, namely English, Spanish, French, Korean, Dutch, Chinese, German, Italian, Japanese, Polish, Portuguese, Russian, Hindi, Arabic, and Hebrew. This allows a single TTS pipeline to cover a wide range of markets without the need for separate models per region.
And, for those looking to create custom or branded voices, Inworld AI offers both instant voice cloning and professional voice cloning options. Instant voice cloning can create a custom voice from just 15 seconds of audio, while professional voice cloning requires at least 30 minutes of clean audio and targets branded voices and less common accents.
Finally, TTS-1.5 is available as both a cloud API and an on-prem deployment option, allowing you to choose the deployment method that best fits your needs. Whether you’re looking to integrate with existing voice agent stacks or maintain data sovereignty and compliance, Inworld AI has got you covered.
So, what are the key takeaways? TTS-1.5 delivers real-time performance, with P90 time to first audio under 250 ms for Max and under 130 ms for Mini. It increases expressiveness by about 30% and improves stability, with a 40% reduction in word error rate. It’s priced for consumer-scale deployments, supports 15 languages, and offers both instant and professional voice cloning options. And, best of all, it’s available as both a cloud API and an on-prem deployment option.
