Hanabi AI, a voice technology startup, has announced the release of OpenAudio S1, which it describes as the world’s first generative AI voice actor model. Unlike conventional text-to-speech engines, OpenAudio S1 gives users real-time control over emotion, tone, and pacing, turning AI speech into something closer to a performance. The model is now live in open beta on fish.audio and is free to try.
1. A New Paradigm: From Speech Generation to Voice Acting
- OpenAudio S1 reimagines speech synthesis as emotional voice performance, not just scripted output.
- Users can direct speech with real-time control over emotion, pacing, and tone.
- Achieves emotional authenticity, replicating subtle expressions like trembling, hesitation, or excitement.
2. Technical Excellence Backed by Research
- Built on a 4-billion-parameter architecture trained on diverse text and audio datasets.
- Integrates fully with fish.audio, making professional-grade tools accessible to creators at all levels.
- Real-time control works through inline instructions written in parentheses, such as:
  - (confident but hiding fear)
  - (whispering with urgency)
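To make the idea of inline instructions concrete, here is a minimal sketch of how a script containing parenthesized directions like the two examples above could be split into spoken segments, each paired with the direction that applies to it. This is an illustration only, not the OpenAudio S1 or fish.audio implementation: the `Segment` type, the `split_directions` function, and the rule that a direction stays in effect until the next one appears are all assumptions made for this example.

```python
import re
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """A piece of spoken text plus the direction in effect (hypothetical type)."""
    direction: Optional[str]  # e.g. "whispering with urgency"; None if no direction yet
    text: str


# Assumption: any parenthesized phrase without nested parentheses is a direction.
_DIRECTION = re.compile(r"\(([^()]*)\)")


def split_directions(script: str) -> List[Segment]:
    """Split a script into segments, attaching the most recent direction to each."""
    segments: List[Segment] = []
    current: Optional[str] = None
    pos = 0
    for match in _DIRECTION.finditer(script):
        # Text before this marker belongs to the previous direction.
        spoken = script[pos:match.start()].strip()
        if spoken:
            segments.append(Segment(current, spoken))
        # An empty "()" clears the direction instead of setting one.
        current = match.group(1).strip() or None
        pos = match.end()
    # Whatever follows the last marker uses the last direction seen.
    tail = script[pos:].strip()
    if tail:
        segments.append(Segment(current, tail))
    return segments


if __name__ == "__main__":
    line = "(confident but hiding fear) I'm fine. (whispering with urgency) We need to leave."
    for segment in split_directions(line):
        print(segment)
```

Because this sketch treats every parenthetical as a direction, ordinary parentheses in the script would be misread. A real system would need a stricter marker syntax or a learned parser.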
3. Benchmark-Leading Performance
- Tops Hugging Face’s TTS Arena benchmarks in key areas:
  - Expressiveness: Handles complex emotions (sarcasm, joy, sadness, fear).
  - Latency: Delivers latency under 100 ms, suited to live interaction (games, assistants).
  - Controllability: Lets users adjust tone, pitch, and emotion mid-session with granular precision.
- Outperforms models by ElevenLabs, OpenAI, and Cartesia.
4. Voice Cloning and Multilingual Fluency
- Excels at voice cloning—replicating rhythm, timbre, and pacing with high fidelity.
- Supports 11 languages and multi-speaker dialogues, maintaining tonal consistency across languages.
5. A Vision for Emotionally Intelligent AI
- Hanabi AI sees voice as the emotional core of human-AI interaction.
- OpenAudio S1 is a stepping stone toward AI companions that communicate with empathy, intent, and warmth.
- Ongoing research by Hanabi’s OpenAudio Lab explores future improvements in emotional nuance and fidelity.
6. Commercial Traction & Gen Z-Led Growth
- Founded by a four-person Gen Z team, Hanabi scaled from $400K to $5M ARR in four months, a 12.5-fold increase.
- Monthly active users jumped from 50K to 420K, driven by adoption of Fish Audio tools.
- Founder Shijia Liao brings 7+ years of speech AI expertise and deep roots in open-source innovation (So-VITS-SVC, GPT-SoVITS, Bert-VITS2).
7. Accessible Today – and Open Tomorrow
- OpenAudio S1 is available in open beta via fish.audio.
- Hanabi plans to release parts of its model architecture, training pipeline, and inference stack, reinforcing its commitment to open research and community.
With OpenAudio S1, Hanabi AI aims to move AI-generated voice from robotic text-to-speech toward emotionally rich, real-time voice acting. The company positions the release, with its research backing, benchmark results, and open access, as a new creative standard for voice-driven AI experiences and a step toward human-like interaction in storytelling, gaming, and digital communication.