Nvidia’s ‘Fugatto’ AI Audio Model Synthesizes Sounds That Have Never Existed

Photo Credit: Yassine Ait Tahit

Nvidia unveils its Fugatto generative AI model, capable of synthesizing sounds that have never existed. Nvidia recently announced its new AI audio generator, “Fugatto,” which can synthesize music, speech, or sounds based on a text prompt. What sets Fugatto apart from other generative AI audio models is its inference-level techniques, which enable it to transform any mix of audio and to create sounds that have never been heard before.

A “Swiss Army knife for sound,” Fugatto is able to put together songs based on some pretty novel prompts, like creating a trumpet that meows or a saxophone that barks. Whatever a user can describe, the model can create, according to Nvidia. Other examples provided by the company include the ability to produce unique sound effects from a description: “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up.”

The model is able to edit music, such as isolating the vocals in a song, changing instruments, or switching up the melody. Fugatto can even transform the sound of someone’s voice, such as changing their accent or giving them a calm or angry tone.

“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia, and one of the researchers behind Fugatto, who is also an orchestral conductor and composer. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

One of the hardest parts of developing such a robust model, according to the company, was generating a blended dataset containing millions of audio samples used for training. “The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data,” says Nvidia.

The model is not currently publicly available, and the company didn’t reveal what the timeline for a release might look like, or whether it will ever become widely available. A website full of samples showcases its uses, providing a glimpse into the future of what’s possible with ethical generative AI.