# TTS Tools

Last update: Mar 4, 2024

# Introduction

TTS stands for Text To Speech: an AI that converts any given text into spoken audio.
The tools listed here offer a decent variety of features & options, such as model training, fine-tuning, 0 shot training (cloning a voice from a short sample with no extra training), or being mixed with RVC (Retrieval-based Voice Conversion).
Here's an index of the best TTS tools out there:

# ElevenLabs/11Labs

ElevenLabs is a freemium service that offers TTS, training TTS models & translating videos into different languages.

# Bark TTS

Bark is a multilingual TTS model created by Suno AI.
It stands out for its ability to express emotions & produce non-speech sounds.
Examples include laughter ([laughter] & [laughs]), sighs, gasps, throat clearing, hesitations (- or …), word emphasis (words written in CAPITALS), etc.
It can be used either locally or in the cloud:
- Official Guide
- Fixed Fork (with UI & fine-tuning)
- Voice Cloning
- Bark TTS Colab
- GUI Version (with fine-tuning)
- 0 Shot Voice Cloning
- Official HF Space
- 0 Shot Voice Cloning HF Space
  - For training you'll need a paid GPU; otherwise you can only run TTS.

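Bark reads these cues straight from the prompt text. The sketch below builds such a prompt in Python; the bracketed tokens follow the list in Bark's README, while `cue`, `emphasize` & `speak` are hypothetical helper names. `speak` assumes the `bark` package (with its `SAMPLE_RATE`, `generate_audio` & `preload_models` exports) and SciPy are installed; both are imported lazily so the prompt helpers work without them.

```python
# Non-speech cues from Bark's README, written into the prompt as bracketed tokens.
BARK_CUES = {
    "laughter": "[laughter]",
    "laughs": "[laughs]",
    "sighs": "[sighs]",
    "gasps": "[gasps]",
    "clears throat": "[clears throat]",
    "music": "[music]",
}


def cue(name: str) -> str:
    """Hypothetical helper: bracketed token for a known cue (KeyError if unknown)."""
    return BARK_CUES[name]


def emphasize(word: str) -> str:
    """Hypothetical helper: Bark tends to stress words written in CAPITALS."""
    return word.upper()


def speak(prompt: str, path: str = "bark_out.wav") -> None:
    """Hypothetical helper: generate speech for `prompt` with Bark & save a WAV."""
    # Lazy imports: only needed when actually generating audio.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads/loads the model weights on first use
    audio = generate_audio(prompt)
    write_wav(path, SAMPLE_RATE, audio)


# "..." marks a hesitation; CAPITALS mark emphasis.
prompt = " ".join(["Well...", "that was", emphasize("really"), "funny", cue("laughs")])
print(prompt)  # Well... that was REALLY funny [laughs]
```

Calling `speak(prompt)` on a machine with Bark installed would then render that line, laugh included.
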
# Edge TTS

Microsoft Edge TTS offers good quality, is multilingual & works great on long sentences.
It only works online: via its API, through the Edge browser, a HF/Colab space, or mixed with RVC.
1. Download Microsoft Edge if you don't already have it.
2. Open Notepad & paste the following code:

   ```html
   <!DOCTYPE html>
   <html>
   <body style="background-color:#dddddd">
     <!-- aria-hidden hides the page's own UI from assistive tech, leaving the pasted text to be read -->
     <h3 aria-hidden="true">Browser TTS "Hack"</h3>
     <textarea rows="10" cols="50" id="ttsText" style="background-color:#eeeeee"></textarea>
     <br />
     <button aria-hidden="true" onclick="genText()">Generate</button>
     <pre id="tts"></pre>
     <script>
       // Copy the typed text into the <pre> so Read Aloud has plain text to read
       function genText() {
         var x = document.getElementById("ttsText").value;
         // textContent (not innerHTML) keeps characters like < and & as literal text
         document.getElementById("tts").textContent = x;
       }
     </script>
   </body>
   </html>
   ```

3. Save it as “Microsoft Edge TTS.txt”.
4. Rename it to “Microsoft Edge TTS.html”.
5. Open Microsoft Edge & drag the .html file into it.
6. Use Audacity to record the audio. Set the recording mode to loopback to record the internal audio (on Windows, the WASAPI audio host with a "(loopback)" input; a Realtek driver might be needed).
7. Type the text you want into the box & click Generate, then start Edge's Read Aloud (Ctrl+Shift+U). Stop recording when the voice is done.
8. You can then open Voice options in the Read Aloud toolbar to pick another voice or make the speech faster/slower.
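
Outside the browser, the same online voices can also be reached from Python through the community `edge-tts` package (not an official Microsoft SDK). This sketch assumes that package is installed; `edge_tts.Communicate(text, voice, rate=...)` & its async `save()` come from it, while `speed_to_rate` & `synthesize` are hypothetical helpers and `en-US-AriaNeural` is just one example voice.

```python
import asyncio


def speed_to_rate(speed: float) -> str:
    """Hypothetical helper: speed multiplier (1.0 = normal) -> edge-tts percent string."""
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"  # e.g. 1.25 -> "+25%", 0.8 -> "-20%"


async def synthesize(text: str, path: str,
                     voice: str = "en-US-AriaNeural", speed: float = 1.0) -> None:
    """Hypothetical helper: send `text` to the online Edge TTS service & save the MP3."""
    import edge_tts  # pip install edge-tts; imported lazily

    communicate = edge_tts.Communicate(text, voice, rate=speed_to_rate(speed))
    await communicate.save(path)


def main() -> None:
    # Needs an internet connection, like every Edge TTS route.
    asyncio.run(synthesize("Hello from Edge TTS.", "edge_output.mp3", speed=1.25))
```

Calling `main()` writes `edge_output.mp3` at 1.25× speed, the scripted counterpart of changing the speed in Voice options.
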
- 📒 Google Colab
- 🤗 Hugging Face
- Ilaria RVC
- Applio Colab
- Local Applio

Some of these (like Ilaria RVC & Applio) mix Edge TTS with RVC: they generate the speech & then run the output through RVC, applying the voice model.

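Conceptually, that's a two-stage pipeline. The sketch below is a hypothetical outline, not code from Ilaria RVC or Applio: `synthesize` stands in for any TTS engine (Edge TTS, for example) and `convert` for an RVC inference call that applies a voice model file. Audio is passed as raw bytes purely for illustration; the real tools pass WAV files or arrays.

```python
from typing import Callable

# Hypothetical stage signatures, for illustration only.
Synthesizer = Callable[[str], bytes]            # text -> base TTS audio
VoiceConverter = Callable[[bytes, str], bytes]  # (audio, voice model path) -> converted audio


def tts_then_rvc(text: str, model_path: str,
                 synthesize: Synthesizer, convert: VoiceConverter) -> bytes:
    """Hypothetical helper: speak `text` with a base TTS voice, then re-voice it with RVC."""
    base_audio = synthesize(text)           # stage 1: wording, pacing & intonation
    return convert(base_audio, model_path)  # stage 2: swap the timbre to the model's voice
```

Because RVC only converts the voice, the pacing & intonation of the result still come from the TTS stage, so a better base TTS usually gives a better final output.
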
# StyleTTS2

StyleTTS 2 aims for human-level TTS synthesis, but only in English.

It works best on full sentences, is available both locally & online, and can be fine-tuned on your own dataset.

It has 2 versions:

- LJSpeech:
  Trained on single-speaker recordings, so your dataset should only contain one speaker. Suitable for training models with a consistent voice.
- LibriTTS:
  Trained on multi-speaker recordings, so your dataset can contain several speakers. Allows StyleTTS 2 to adapt to different voices.

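Which version to start from mostly depends on how many speakers your dataset has. The sketch below assumes StyleTTS 2's training lists use one `filename.wav|transcription|speaker` row per clip (check the example lists in the official repo before relying on this); `read_data_list` & `suggest_base_model` are hypothetical helpers that parse such a list & apply the single- vs multi-speaker rule above.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (wav filename, transcription, speaker id)


def read_data_list(lines: Iterable[str]) -> List[Row]:
    """Hypothetical helper: parse 'filename.wav|transcription|speaker' rows, skipping blanks."""
    rows: List[Row] = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 '|'-separated fields, got {len(parts)}")
        wav, text, speaker = parts
        rows.append((wav, text, speaker))
    return rows


def suggest_base_model(rows: List[Row]) -> str:
    """Hypothetical helper: LJSpeech for one speaker, LibriTTS for several."""
    speakers = {speaker for _, _, speaker in rows}
    if not speakers:
        raise ValueError("empty data list")
    return "LJSpeech" if len(speakers) == 1 else "LibriTTS"
```

For a list file on disk (e.g. a hypothetical `train_list.txt`), pass the open file object straight to `read_data_list`, since it accepts any iterable of lines.
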
Official StyleTTS2 Guide

- LJSpeech Colab
- LibriTTS Colab
- StyleTTS2 Finetuning Colab
- StyleTTS2 HF Space (duplicate the Space to skip the queue; without a GPU you can only run inference)

# Tortoise TTS

Expressive but a little slow. Available both locally & online.

Official GitHub Repository

# XTTS2

Built on 🐢 Tortoise TTS & developed by Coqui AI, which has unfortunately shut down.
Its model changes make cross-language 0 Shot voice cloning & multilingual speech generation super easy.
It also needs less training data: at least 2 minutes of audio.
It can be used either online or locally:

- Inference 0 Shot Training UI Colab (run it & click the public link)
- Training & Inference UI Colab
- Inference 0 Shot Training HF Space

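The sketch below checks the 2-minute minimum before training and shows a 0 Shot cloning call. `wav_seconds` & `enough_for_training` use only Python's standard `wave` module (uncompressed PCM WAV files only); `clone_and_speak` assumes the Coqui `TTS` Python package is still installable, and uses its `TTS(...)` class with the `tts_models/multilingual/multi-dataset/xtts_v2` model name & `tts_to_file(...)`. All three helper names are hypothetical.

```python
import wave
from typing import Iterable

MIN_TRAINING_SECONDS = 120  # "at least 2 minutes" of audio, per this guide


def wav_seconds(path: str) -> float:
    """Hypothetical helper: duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def enough_for_training(paths: Iterable[str], minimum: float = MIN_TRAINING_SECONDS) -> bool:
    """Hypothetical helper: True when the clips add up to at least `minimum` seconds."""
    return sum(wav_seconds(p) for p in paths) >= minimum


def clone_and_speak(text: str, speaker_wav: str,
                    out_path: str = "xtts_out.wav", language: str = "en") -> None:
    """Hypothetical helper: 0 Shot cloning, speaking `text` in the voice of `speaker_wav`."""
    from TTS.api import TTS  # Coqui TTS package; imported lazily

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
```

Setting `language` to something other than the reference clip's language is what cross-language cloning looks like in practice.
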
# OpenVoice

Offers versatile instant voice cloning (aka 0 Shot Training).
Supports cross-lingual voice cloning & flexible voice style control.
Available both locally & online:

Official GitHub repo

# MetaVoice-1B

MetaVoice-1B is a 1.2B parameter base model, trained on 100k hours of speech for TTS.
It has been built with the following priorities:
- Emotional speech rhythm and tone in English.
- Zero-shot cloning for American & British voices, with 30s reference audio.

Available both locally & online:

Model GitHub Repo

- TTS with 0 Shot Training Demo | Easier Version
- TTS with 0 Shot Training HF Space
- Freemium MetaVoice Studio (only premade voices)

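MetaVoice-1B's cloning is built around about 30 seconds of reference audio. The sketch below trims a longer recording down to that length with Python's standard `wave` module (PCM WAV only); `trim_reference` is a hypothetical helper, not part of MetaVoice.

```python
import wave

REFERENCE_SECONDS = 30  # reference length MetaVoice-1B is built around


def trim_reference(src: str, dst: str, seconds: float = REFERENCE_SECONDS) -> float:
    """Hypothetical helper: copy at most `seconds` of audio from src to dst; return seconds written."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(int(seconds * params.framerate))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)    # same channels, sample width & rate as the source
        writer.writeframes(frames)  # the frame count in the header is fixed up on close
    frame_size = params.sampwidth * params.nchannels
    return len(frames) // frame_size / params.framerate
```

A return value below 30 means the source was shorter than the suggested reference length, so a longer recording may clone better.
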
# MeloTTS

MeloTTS is a high-quality multilingual TTS library made by MyShell.ai.
It supports near real-time inference.
It can be used both locally and online:

Official GitHub Repo

# GPT-SoVITS

GPT-SoVITS supports cross-language inference, though the output may contain some noise.
It's very good with Chinese, and also with English.
Most parts are in Japanese & not deeply tested, so expect some instability.
It can be used both locally & online:

Official GitHub Repo

Colab Space (with fine-tuning, inference & UI)

# gTTS

gTTS (Google Text-to-Speech) uses the same voices as Google Translate.

It offers very few voices across its languages, sounds a little robotic & doesn’t let you choose the voice's gender (except in the Colab, but it's not that great).

It can only be used online (via its API).

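In Python, the `gTTS` package wraps that API. The sketch assumes it's installed; `gTTS(text=..., lang=..., tld=..., slow=...)` & `.save()` are its calls, while `save_speech` is a hypothetical helper. Changing `tld` (the Google domain, e.g. `co.uk`) changes the accent, which is the main way to vary the voice.

```python
def save_speech(text: str, path: str = "gtts_out.mp3",
                lang: str = "en", tld: str = "com", slow: bool = False) -> str:
    """Hypothetical helper: fetch speech for `text` from Google's TTS endpoint & save it as MP3."""
    from gtts import gTTS  # pip install gTTS; imported lazily

    # `tld` selects the Google domain (e.g. "co.uk", "com.au"), which changes the accent.
    gTTS(text=text, lang=lang, tld=tld, slow=slow).save(path)
    return path
```

For example, `save_speech("Hello there", "hello_uk.mp3", tld="co.uk")` requests a British-accented English voice (internet connection required).
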
# `You have reached the end.`

Report Issues