More Than Fine: Multi-LoRA Support Now Available in NVIDIA RTX AI Toolkit
28/08/2024
Large language models are driving some of the most exciting developments in AI with their ability to quickly understand, summarize and generate text-based content.
These capabilities power a variety of use cases, including productivity tools, digital assistants, non-playable characters in video games and more. But they're not a one-size-fits-all solution, and developers often must fine-tune LLMs to fit the needs of their applications.
The NVIDIA RTX AI Toolkit makes it easy to fine-tune and deploy AI models on RTX AI PCs and workstations through a technique called low-rank adaptation, or LoRA. A new update, available today, enables support for using multiple LoRA adapters simultaneously within the NVIDIA TensorRT-LLM AI acceleration library, improving the performance of fine-tuned models by up to 6x.
Fine-Tuned for Performance
LLMs must be carefully customized to achieve higher performance and meet growing user demands.
These foundational models are trained on huge amounts of data but often lack the context needed for a developer's specific use case. For example, a generic LLM can generate video game dialogue, but it will likely miss the nuance and subtlety needed to write in the style of a woodland elf with a dark past and a barely concealed disdain for authority.
To achieve more tailored outputs, developers can fine-tune the model with information related to the app's use case.
Take the example of developing an app to generate in-game dialogue using an LLM. Fine-tuning starts from the weights of a pretrained model, which already capture general knowledge such as what a character may say in the game. To get the dialogue in the right style, a developer can tune the model on a smaller dataset of examples, such as dialogue written in a more spooky or villainous tone.
In some cases, developers may want to run several of these fine-tuned variants simultaneously. For example, they may want to generate marketing copy written in different voices for various content channels. At the same time, they may want to summarize a document and make stylistic suggestions, as well as draft a video game scene description and imagery prompt for a text-to-image generator.
It's not practical to run multiple models simultaneously, as they won't all fit in GPU memory at the same time. Even if they did, their inference time would be impacted by memory bandwidth - how fast data can be read from memory into GPUs.
Lo(RA) and Behold
A popular way to address these issues is to use fine-tuning techniques such as low-rank adaptation. A simple way of thinking of it is as a patch file containing the customizations from the fine-tuning process.
Once trained, customized LoRA adapters can integrate seamlessly with the foundation model during inference, adding minimal overhead. Developers can attach the adapters to a single model to serve multiple use cases. This keeps the memory footprint low while still providing the additional details needed for each specific use case.
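To make the "patch file" idea concrete, here is a minimal, framework-free sketch of how a LoRA adapter is commonly applied to a single linear layer: the frozen base weight W is left untouched, and two small matrices A and B add a low-rank correction to its output. The function names, the toy matrices and the alpha/rank scaling convention are illustrative assumptions; this is not code from the RTX AI Toolkit or TensorRT-LLM.

```python
# Illustrative sketch only: names, sizes and values are assumptions.

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Return W x + (alpha / rank) * B (A x).

    W is the frozen base weight (d_out x d_in).
    A (rank x d_in) and B (d_out x rank) are the small adapter matrices.
    """
    base = matvec(W, x)                 # output of the unmodified base layer
    low_rank = matvec(B, matvec(A, x))  # the adapter's low-rank "patch"
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, low_rank)]


# Toy example: a 3x3 identity base weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]        # 1 x 3
B = [[0.5], [0.0], [-0.5]]   # 3 x 1
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=1.0, rank=1)
print(y)  # [4.0, 2.0, 0.0]
```

Because the base weight is never modified, swapping adapters only means swapping the small A and B matrices.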
Architecture overview of supporting multiple clients and use cases with a single foundation model using multi-LoRA capabilities.
In practice, this means that an app can keep just one copy of the base model in memory, alongside many customizations using multiple LoRA adapters.
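A rough back-of-the-envelope calculation shows why one shared base model plus several adapters is so much smaller than several fully fine-tuned copies. The sketch below counts parameters for a single weight matrix only. The 4096 x 4096 layer size and the four use cases are hypothetical values chosen for illustration, and the rank of 64 matches the maximum adapter rank quoted for the benchmark in this post.

```python
# Illustrative parameter counts for ONE weight matrix; sizes are assumptions.

def full_params(d_in, d_out):
    """Parameters in a full d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


d = 4096            # hypothetical layer width
rank = 64           # adapter rank
num_use_cases = 4   # hypothetical number of fine-tuned variants

# Option 1: one fully fine-tuned copy of the matrix per use case.
separate_copies = num_use_cases * full_params(d, d)

# Option 2: one shared base matrix plus one small adapter per use case.
shared_base = full_params(d, d) + num_use_cases * lora_params(d, d, rank)

print(f"separate copies: {separate_copies:,} parameters")
print(f"shared base + adapters: {shared_base:,} parameters")
print(f"each adapter is {full_params(d, d) // lora_params(d, d, rank)}x smaller than the full matrix")
```

Under these assumptions each adapter holds about 1/32 of the parameters of the matrix it modifies, so adding another use case costs far less memory than adding another model.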
This process is called multi-LoRA serving. When multiple calls are made to the model, the GPU can process all of the calls in parallel, maximizing the use of its Tensor Cores and minimizing the demands of memory and bandwidth so developers can efficiently use AI models in their workflows. Fine-tuned models using multi-LoRA adapters perform up to 6x faster.
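The idea behind multi-LoRA serving can be sketched as a batch of requests that all pass through the same base weights, with each request adding the correction from whichever adapter it names. The adapter names, the toy weights and the `serve_batch` helper below are hypothetical and do not reflect the TensorRT-LLM API. A real GPU implementation would process the batch in parallel rather than in a Python loop.

```python
# Conceptual sketch of multi-LoRA serving; all names and values are assumptions.

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def serve_batch(W, adapters, requests):
    """Run a batch of (adapter_name, input) requests against one shared base weight.

    adapters maps a name to (A, B, scale). An adapter_name of None means
    "use the base model without any adapter".
    """
    outputs = []
    for adapter_name, x in requests:
        y = matvec(W, x)  # the same base weights serve every request
        if adapter_name is not None:
            A, B, scale = adapters[adapter_name]
            delta = matvec(B, matvec(A, x))  # per-request low-rank correction
            y = [yi + scale * di for yi, di in zip(y, delta)]
        outputs.append(y)
    return outputs


# One shared 2x2 base weight and two hypothetical adapters.
W = [[1.0, 0.0],
     [0.0, 1.0]]
adapters = {
    "story":        ([[1.0, 0.0]], [[1.0], [0.0]], 1.0),
    "image_prompt": ([[0.0, 1.0]], [[0.0], [2.0]], 1.0),
}
requests = [
    ("story", [1.0, 2.0]),
    ("image_prompt", [1.0, 2.0]),
    (None, [1.0, 2.0]),
]

results = serve_batch(W, adapters, requests)
print(results)  # [[2.0, 2.0], [1.0, 6.0], [1.0, 2.0]]
```

The same input produces a different output for each adapter, while only one copy of W is ever held in memory.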
LLM inference performance on GeForce RTX 4090 Desktop GPU for Llama 3B int4 with LoRA adapters applied at runtime. Input sequence length is 43 tokens and output sequence length is 100 tokens. LoRA adapter max rank is 64.
In the example of the in-game dialogue application described earlier, the app's scope could be expanded, using multi-LoRA serving, to generate both story elements and illustrations - driven by a single prompt.
The user could input a basic story idea, and the LLM would flesh out the concept, expanding on the idea to provide a detailed foundation. The application could then use the same model, enhanced with two distinct LoRA adapters, to refine the story and generate corresponding imagery. One LoRA adapter generates a Stable Diffusion prompt to create visuals using a locally deployed Stable Diffusion XL model. Meanwhile, the other LoRA adapter, fine-tuned for story writing, could craft a well-structured and engaging narrative.
In this case, the same model is used for both inference passes, ensuring that the space required for the process doesn't significantly increase. The second pass, which involves both text and image generation, is performed using batched inference, making the process exceptionally fast and efficient on NVIDIA GPUs. This allows users to rapidly iterate through different versions of their stories, refining the narrative and the illustrations with ease.
This process is outlined in more detail in a recent technical blog.
LLMs are becoming one of the most important components of modern AI. As adoption and integration grow, demand for powerful, fast LLMs with application-specific customizations will only increase. The multi-LoRA support added today to the RTX AI Toolkit gives developers a powerful new way to accelerate these capabilities.