# Glossary

Last update: June 15, 2024

# List of smaller keywords.

‎: ‎

# Backing vocals

Vocal lines that support or harmonize with the lead vocals in a song.

# ‎

# Bit depth

In the field of digital audio, it's the number of bits used to store each sample, which defines the dynamic range.
The dynamic range is the difference between the quietest & loudest sound that can be represented.
Basically, higher bit depths represent the loudness of an audio more accurately.
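
As a rough illustration of how bit depth relates to dynamic range, here is a minimal sketch. It uses the standard approximation for linear PCM (about 6 dB per bit); the function name `dynamic_range_db` is made up for this example.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM audio, in decibels.

    Each extra bit doubles the number of loudness levels,
    which adds roughly 6.02 dB of range.
    """
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

With this approximation, 16-bit audio covers about 96 dB and 24-bit audio about 144 dB.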

# ‎

# Bitrate

The amount of data processed per unit of time, usually measured in kilobits per second (kbps).
For the same format, a higher bitrate generally means higher quality and a bigger file.
You can think of it like video resolution (240, 480, 1080, etc.).
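
To make the link between bitrate and file size concrete, here is a small sketch. It assumes a constant bitrate and ignores metadata and container overhead; `file_size_mb` is a hypothetical helper, not part of any tool mentioned here.

```python
def file_size_mb(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate size of a constant-bitrate audio file, in megabytes.

    kilobits/second -> bits/second -> total bits -> bytes -> megabytes.
    """
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000


# A 3-minute song at two common bitrates.
for kbps in (128, 320):
    print(f"{kbps} kbps: {file_size_mb(kbps, 180):.2f} MB")
```

Under these assumptions, a 3-minute track takes about 2.88 MB at 128 kbps and about 7.2 MB at 320 kbps.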

# ‎

# Checkpoints

In RVC, these are files of a model that are generated over the course of training. They can be very useful, for example for resuming training.
‎
The rate at which they're saved is determined by the save frequency value (also called save rate or similar names). For newbies, it's recommended to use a value of 15.
‎
They are divided into two types:
‎
- Weights:
  - These are actual models.
  - They're organized with this format: modelname_epoch_step.pth
  - Example: Tyler_e60_s120.pth
    ‎
- G and D:
  - Named G_ and D_, followed by the step number & .pth.
  - Example: G_70.pth and D_70.pth
  - These allow you to resume training, as long as the G and D step numbers match.
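
The naming rules above can be sketched in code. This is only an illustration that assumes the `e` and `s` prefixes seen in the example filename; the helper names `parse_weight` and `can_resume` are hypothetical and not part of RVC.

```python
import re
from typing import Optional, Tuple

# Assumed patterns, based on the example filenames in this section.
WEIGHT_RE = re.compile(r"^(?P<name>.+)_e(?P<epoch>\d+)_s(?P<step>\d+)\.pth$")
GD_RE = re.compile(r"^(?P<kind>[GD])_(?P<step>\d+)\.pth$")


def parse_weight(filename: str) -> Optional[Tuple[str, int, int]]:
    """Return (model name, epoch, step) for a weight file, or None."""
    match = WEIGHT_RE.match(filename)
    if match is None:
        return None
    return match["name"], int(match["epoch"]), int(match["step"])


def can_resume(g_file: str, d_file: str) -> bool:
    """True when a G file and a D file share the same step number."""
    g = GD_RE.match(g_file)
    d = GD_RE.match(d_file)
    if g is None or d is None:
        return False
    return g["kind"] == "G" and d["kind"] == "D" and g["step"] == d["step"]


print(parse_weight("Tyler_e60_s120.pth"))
print(can_resume("G_70.pth", "D_70.pth"))
```

Here `parse_weight` splits the example weight name into model name, epoch, and step, and `can_resume` only accepts a G and D pair whose step numbers match.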

# ‎

# Cloud-based

Any software or application that's stored, managed, and made available through a provider's remote servers, and is usually accessed through a web browser.
It's the opposite of local running.

# ‎

# CUDA

A technology developed by NVIDIA that uses the power of graphics cards to perform calculations that require great processing power.
It's especially useful for AI tools such as RVC and UVR, as it greatly speeds up their processing.
The CUDA driver components are installed automatically as part of the NVIDIA driver.

# ‎

# DAW

It stands for Digital Audio Workstation, and it's any software used for making and mixing music.
For basic audio editing, we recommend Audacity.
For professional mixing, we recommend FL Studio.

# ‎

# Fine-tuning

Further improving an already trained AI model by training it on another dataset.

# ‎

# Fork

It's a copy of a main GitHub project that is developed separately. It aims to offer a different version of the project with improvements, changes & new features.

# ‎

# Gradio

Gradio is an open-source Python package that makes it easy for developers to create user-friendly web interfaces for machine learning models and other applications, such as RVC.
It serves the program on a Local URL, which runs locally on the machine, and optionally on a Public Share Link, which is a tunnel that exposes the Local URL. The Public Share Link is used, for example, in Google Colab notebooks, and is powered by Gradio's Share API. Sometimes the Share API goes down; you can check its status here.
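
For reference, here is a minimal sketch of how a Gradio app gets its Local URL and Public Share Link. The `shout` function and the `launch_demo` wrapper are made up for this example; it assumes the `gradio` package is installed, and it isn't how RVC itself builds its interface.

```python
def shout(text: str) -> str:
    """Toy function to put behind the web interface."""
    return text.upper()


def launch_demo(share: bool = False) -> None:
    """Serve `shout` through a Gradio web page.

    share=False -> only the Local URL is created.
    share=True  -> Gradio also requests a Public Share Link.
    """
    import gradio as gr  # third-party package, imported only when needed

    demo = gr.Interface(fn=shout, inputs="text", outputs="text")
    demo.launch(share=share)
```

Calling `launch_demo(share=True)` is the kind of call a Colab notebook would make, since the Local URL of a Colab machine isn't reachable from your own browser.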

# ‎

# Google Colab

Google Colaboratory is a product of Google that allows anybody to write & execute arbitrary Python code through the browser.
Its free version is slower & limits the usage of their GPUs to around 3 hours a day. Once you exhaust it, you'll have to wait 12 - 24 hours.
Learn how to bypass their limitations here.

# ‎

# GPU

It stands for Graphics Processing Unit. It's designed to rapidly manipulate and alter memory to accelerate the creation of images.
In AI training, it's used to run many independent computations in parallel, which increases the speed substantially.
Basically, the speed at which RVC/UVR works depends on how good your GPU is.

# ‎

# Inference

In the context of AI, it's using an already trained AI model to complete a task.
For this, using the GPU is more convenient as it's faster. Normally you can still use the CPU, though it takes longer.
For example, in RVC, inference is when a voice model is used to transform an audio so that it sounds like the model's voice.
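
The GPU-or-CPU choice can be sketched like this. It assumes a PyTorch-based tool and falls back to the CPU when PyTorch or a CUDA-capable GPU isn't available; `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" when a usable NVIDIA GPU is found, else "cpu"."""
    try:
        import torch  # third-party; assumed to be what the tool runs on
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Inference would run on: {pick_device()}")
```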

# ‎

# Local running

Running locally means running apps on your own machine, using its resources.
It's the opposite of cloud-based.

# ‎

# Lossless Formats

Audio formats that keep the original quality intact, without discarding any data.
‎
They're recommended for RVC, as the higher the quality of an audio, the more accurate the results will be.
‎
The main ones are WAV & FLAC:
‎
- FLAC:
  - Its algorithm compresses the data without losing quality.
  - It's recommended over WAV since it's space-efficient.
    ‎
- WAV:
  - Doesn't do any kind of compression. It's purely the original data.
  - Therefore it has a much bigger file size.
  - It's more accurate to describe it as an uncompressed format.

Both formats give the same results & have no audible difference between them.
Converting a lossy audio to a lossless one won't restore the lost quality.
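
The difference between the two ideas can be shown with a toy experiment. This is only a sketch: `zlib` stands in for a lossless codec like FLAC, and dropping the low 8 bits of every sample stands in for a lossy codec. Neither is how the real formats work internally.

```python
import math
import struct
import zlib


def sine_pcm16(n_samples: int = 4000, freq_hz: float = 440.0, rate_hz: int = 8000) -> list[int]:
    """Generate a sine wave as signed 16-bit sample values."""
    return [
        round(20000 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n_samples)
    ]


samples = sine_pcm16()
raw = struct.pack(f"<{len(samples)}h", *samples)  # uncompressed, WAV-like payload

# Lossless: compress, then decompress. The bytes come back identical.
packed = zlib.compress(raw)
lossless_roundtrip = zlib.decompress(packed) == raw

# Lossy (toy): throw away the 8 least significant bits of every sample.
degraded = [(s >> 8) << 8 for s in samples]

# "Converting back to lossless": the file is 16-bit again, but the detail is gone.
reconverted = struct.pack(f"<{len(degraded)}h", *degraded)
restored = reconverted == raw

print("Lossless round trip identical:", lossless_roundtrip)
print("Quality restored after lossy step:", restored)
```

The lossless round trip returns exactly the original bytes, while the data that went through the lossy step stays different from the original no matter which format it's saved in afterwards.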

# ‎

# Lossy Formats

Audio formats that discard part of the original data when compressing. They're built to be space-efficient.
So by getting rid of some data (in this case, quality), they achieve a smaller file size.
Common lossy formats are MP3, OGG (Vorbis), OPUS, M4A (AAC), etc.

# ‎

# Localtunnel

Localtunnel is a tunneling tool made to expose a local URL (like http://localhost:3000) to the internet.
It's used in Google Colab notebooks to expose the Local URL, so that users can access the program running in the cloud.

# ‎

# Model training

In the field of AI, it's the process in which an AI model is fed its dataset & learns from it.

# ‎

# Specs

It refers to a computer's specifications, meaning its hardware: GPU, CPU, RAM, etc.
The performance of a computer's hardware directly affects the performance of all its software.

# ‎

# 0 Shot Training

Doing inference with an AI model on data it was never explicitly trained on.
It's faster but with less quality, and you won't be able to save the model.
For example, in TTS you do inference by cloning a voice from an audio clip that the model hasn't seen before.
This is different from making a dataset & doing the long training process, which depends on lots of criteria such as epochs.
Depending on the tool, it runs on the GPU or on the CPU.

# ‎

# `You have reached the end.`
