FlashLabs Researchers Launch Chroma 1.0: A 4B Real-Time Speech Dialogue Model With Personalized Voice Cloning

By Naveed Ahmad · 22/01/2026 (Updated: 31/01/2026)

    **The Future of Speech Dialogue: FlashLabs’ Chroma 1.0**

    In the world of natural language processing (NLP), there’s been a buzz about the latest breakthrough in spoken dialogue systems. FlashLabs, a research group, has just released Chroma 1.0, an open-source, end-to-end spoken dialogue model that’s taken the NLP community by storm. In this blog post, we’ll dive into the details of this remarkable model and explore its capabilities.

    **What’s So Special About Chroma 1.0?**

Chroma 1.0 is a 4B parameter, end-to-end spoken dialogue system that operates directly on discrete speech representations rather than text transcripts. This means it can generate speech in real time while preserving the speaker’s identity throughout multi-turn conversations. It also achieves high speaker similarity, a key goal in personalized voice cloning.
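Speaker similarity of the kind discussed above is typically measured as the cosine similarity between speaker embeddings of a reference clip and a generated utterance. The sketch below is illustrative only; the embedding values are made up, and the actual SEED-TTS-EVAL protocol uses a dedicated speaker-verification model to produce the embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of a reference clip and a cloned utterance.
reference = [0.2, 0.9, 0.4]
cloned = [0.25, 0.85, 0.45]
print(cosine_similarity(reference, cloned))  # high similarity, close to 1.0
```

A score of 0.81 under this kind of metric means the cloned voice’s embedding points in nearly the same direction as the reference speaker’s.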

    **Key Features of Chroma 1.0**

    Here are some of the key highlights of Chroma 1.0:

    * **Personalized voice cloning**: Achieves a speaker similarity score of 0.81 on the SEED-TTS-EVAL protocol, outperforming other TTS baselines, including CosyVoice-3.
    * **Real-time performance**: Begins generating speech in well under a second, with a Time to First Token (TTFT) of 146.87 ms and a Real-Time Factor (RTF) of 0.43.
    * **Multi-turn conversations**: Preserves the speaker’s identity throughout multi-turn conversations.
    * **Low latency**: Single-stream inference on an H200 GPU yields an overall TTFT of about 147 ms.
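To make the latency figures above concrete: RTF is the ratio of synthesis time to the duration of the audio produced, so any value below 1.0 means the model generates speech faster than it plays back. A minimal sketch of the arithmetic:

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF < 1.0 means audio is produced faster than it plays back."""
    return synthesis_seconds / audio_seconds

# With the reported RTF of 0.43, 10 seconds of speech takes
# roughly 4.3 seconds to synthesize.
rtf = real_time_factor(4.3, 10.0)
print(round(rtf, 2))  # 0.43
```

Combined with a TTFT of about 147 ms, this is what makes the model usable for live, interactive conversation rather than offline synthesis.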

    **How Does Chroma 1.0 Work?**

    Chroma 1.0 consists of two primary subsystems: the Chroma Reasoner and the speech stack. The Chroma Reasoner is built on the Thinker module from the Qwen-Omni series, while the speech stack is built from a 1B parameter LLaMA backbone, a 100M Chroma Decoder, and a Mimi-based codec decoder.
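The data flow implied by that component list can be sketched as a simple pipeline. Everything below is a toy stand-in based only on the component names in the article; the function names, tagged strings, and the exact hand-off between stages are assumptions, not the real implementation.

```python
def reasoner(speech_tokens):
    # Stand-in for the Qwen-Omni Thinker: maps input speech tokens
    # to semantic hidden states (here, just tagged strings).
    return [f"hidden({t})" for t in speech_tokens]

def backbone(hidden_states, speaker_ref):
    # Stand-in for the 1B LLaMA backbone: conditions on a speaker
    # reference and predicts coarse acoustic tokens.
    return [f"coarse({h},{speaker_ref})" for h in hidden_states]

def chroma_decoder(coarse_tokens):
    # Stand-in for the 100M Chroma Decoder: refines coarse tokens
    # into full codec tokens.
    return [f"codec({c})" for c in coarse_tokens]

def codec_decoder(codec_tokens):
    # Stand-in for the Mimi-based codec decoder: codec tokens to audio.
    return f"waveform[{len(codec_tokens)} frames]"

audio = codec_decoder(chroma_decoder(backbone(reasoner(["t1", "t2"]), "alice")))
print(audio)  # waveform[2 frames]
```

The key architectural point survives even in this toy form: speech stays as discrete tokens end to end, and the speaker reference conditions the acoustic stages rather than the reasoning stage.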

    **Training Setup and Synthetic Speech-to-Speech (S2S) Data**

    The researchers used a synthetic speech-to-speech (S2S) pipeline to generate training pairs. These synthetic pairs train the Backbone and Decoder to perform acoustic modeling and voice cloning, while the Reasoner remains frozen, providing textual embeddings and multimodal hidden states.
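The frozen-Reasoner setup described above is a standard selective-training pattern: gradients are simply never applied to the frozen module. A toy sketch of the idea, in plain Python (a real setup would use a framework like PyTorch and set `requires_grad=False` on the Reasoner’s parameters; the parameter names here are hypothetical):

```python
params = {
    "reasoner.weight": 1.0,   # frozen
    "backbone.weight": 1.0,   # trainable
    "decoder.weight": 1.0,    # trainable
}
frozen_prefixes = ("reasoner.",)

def sgd_step(params, grads, lr=0.1):
    """Apply one gradient step, skipping any frozen module."""
    for name, grad in grads.items():
        if name.startswith(frozen_prefixes):
            continue  # frozen parameters are left untouched
        params[name] -= lr * grad
    return params

grads = {name: 0.5 for name in params}
sgd_step(params, grads)
print(params["reasoner.weight"], params["backbone.weight"])  # 1.0 0.95
```

Freezing the Reasoner keeps its language understanding intact while the speech stack specializes in acoustics and voice identity.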

    **Evaluation and Results**

    The researchers evaluated Chroma 1.0 on various benchmarks, including URO Bench, SEED-TTS-EVAL, and more. The results show impressive performance, with reported scores ranging from 57.44% to 62.07%. Additionally, the model achieves strong results on various spoken dialogue metrics.

    **Conclusion**

    Chroma 1.0 is an exciting development in the world of spoken dialogue systems. With its ability to generate speech in real-time, preserve speaker identity, and achieve high speaker similarity, this model has the potential to revolutionize the way we interact with machines. If you’re interested in exploring more, be sure to check out the paper, model weights, project, and playground.
