HOLY SHITT! @ZyphraAI just dropped Zonos - Apache 2.0 licensed, Multilingual, Text to Speech model with INSTANT voice cloning! 🔥
> Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output
> Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone
> Multilingual Support: Supports English, Japanese, Chinese, French, and German
> Audio Quality & Emotion Control: Fine-grained control over speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear)
> Fast Performance: Runs at ~2x real-time speed on an RTX 4090
> Available on the Hugging Face Hub 🤗
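
Quick back-of-the-envelope on what "~2x real-time" means in practice: a rough sketch, assuming the speedup holds steady across clip lengths (it probably varies with hardware and settings). The helper names `generation_seconds` and `speaker_sample_ok` are made up for illustration and are not part of Zonos.

```python
def generation_seconds(audio_seconds: float, realtime_factor: float = 2.0) -> float:
    """Estimate wall-clock seconds needed to synthesize `audio_seconds` of speech.

    `realtime_factor` is the assumed speedup over playback time
    (2.0 means the model produces audio twice as fast as it plays).
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def speaker_sample_ok(sample_seconds: float,
                      min_seconds: float = 10.0,
                      max_seconds: float = 30.0) -> bool:
    """Check that a speaker sample falls in the suggested 10-30 second range."""
    return min_seconds <= sample_seconds <= max_seconds


if __name__ == "__main__":
    # One minute of output speech at the quoted ~2x real-time speed.
    print(generation_seconds(60.0))
    # A 15-second reference clip sits inside the suggested range.
    print(speaker_sample_ok(15.0))
```

So, under that assumption, a minute of speech would take roughly half a minute to generate on an RTX 4090.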