Now Hear This: World's Most Flexible Sound Machine Debuts
25/11/2024
While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.
Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.
For example, it can create a music snippet based on a text prompt, add instruments to an existing song or remove them from it, change the accent or emotion in a voice - even let people produce sounds never heard before.
"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."
A Sound Grasp of Audio
"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, who is also an orchestral conductor and composer.
Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.
"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.
A Sample Playlist of Use Cases
For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.
"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."
An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.
Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.
Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.
Making a Joyful Noise
"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.
For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.
With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.
Users Get Artistic Controls
Several capabilities add to Fugatto's novelty.
During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.
The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
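To make the idea of weighted emphasis concrete, here is a minimal Python sketch. It is purely illustrative and is not Fugatto's or ComposableART's actual implementation: the helper name `blend_instructions`, the toy three-number vectors and the weights are all assumptions invented for this example. It shows only the general notion of a weighted average over per-instruction vectors, where each weight sets how much emphasis an attribute receives.

```python
def blend_instructions(vectors, weights):
    """Return the weighted average of equal-length instruction vectors.

    Hypothetical helper for illustration only; not part of any Fugatto API.
    """
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    # Divide by the total so the emphasis values act as proportions.
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


# Toy stand-ins for "sad feeling" and "French accent" (made-up numbers).
sad = [1.0, 0.0, 0.0]
french = [0.0, 1.0, 0.0]

# Three times as much emphasis on sadness as on the accent.
mix = blend_instructions([sad, french], [3.0, 1.0])
print(mix)
```

Raising or lowering one weight relative to the other shifts the blend toward that attribute, which is the kind of dial the article describes for the heaviness of an accent or the degree of sorrow.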
"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.
"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.
The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.
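One simple way to picture a sound that changes over time is a schedule of weights that shifts gradually from one description to another. The Python sketch below is an assumption-laden illustration, not the model's actual mechanism: the helper `crossfade_schedule` and the choice of a linear ramp are hypothetical and were picked only because they are easy to follow.

```python
def crossfade_schedule(num_steps):
    """Return (weight_start, weight_end) pairs that move linearly over time.

    Hypothetical helper for illustration only; a linear ramp is an assumption.
    """
    if num_steps < 2:
        raise ValueError("need at least two steps")
    schedule = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # runs from 0.0 at the start to 1.0 at the end
        schedule.append((1.0 - t, t))
    return schedule


# Five time steps fading from a "start" sound to an "end" sound.
schedule = crossfade_schedule(5)
for step, (start_weight, end_weight) in enumerate(schedule):
    print(step, start_weight, end_weight)
```

A different curve, for example one that holds the first sound longer before fading, would change how the soundscape evolves without changing the two descriptions themselves.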
Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
A Look Under the Hood
Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.
The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.
Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.
One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.
They also scrutinized existing datasets to reveal new relationships among the data.