How to Accelerate Larger LLMs Locally on RTX With LM Studio
23/10/2024
Large language models (LLMs) are reshaping productivity. They're capable of drafting documents, summarizing web pages and, having been trained on vast quantities of data, accurately answering questions about nearly any topic.
LLMs are at the core of many emerging use cases in generative AI, including digital assistants, conversational avatars and customer service agents.
Many of the latest LLMs can run locally on PCs or workstations. This is useful for a variety of reasons: users can keep conversations and content private on-device, use AI without the internet, or simply take advantage of the powerful NVIDIA GeForce RTX GPUs in their system. Other models, because of their size and complexity, don't fit into the local GPU's video memory (VRAM) and require hardware in large data centers.
However, it's possible to accelerate part of a prompt on a data-center-class model locally on RTX-powered PCs using a technique called GPU offloading. This allows users to benefit from GPU acceleration without being as limited by GPU memory constraints.
Size and Quality vs. Performance
Model size trades off against both response quality and performance. In general, larger models deliver higher-quality responses but run more slowly. With smaller models, performance goes up while quality goes down.
This tradeoff isn't always straightforward. There are cases where performance might be more important than quality. Some users may prioritize accuracy for use cases like content generation, since it can run in the background. A conversational assistant, meanwhile, needs to be fast while also providing accurate responses.
The most accurate LLMs, designed to run in the data center, are tens of gigabytes in size, and may not fit in a GPU's memory. This would traditionally prevent the application from taking advantage of GPU acceleration.
However, GPU offloading uses part of the LLM on the GPU and part on the CPU. This allows users to take maximum advantage of GPU acceleration regardless of model size.
Optimize AI Acceleration With GPU Offloading and LM Studio
LM Studio is an application that lets users download and host LLMs on their desktop or laptop computer, with an easy-to-use interface that allows for extensive customization in how those models operate. LM Studio is built on top of llama.cpp, so it's fully optimized for use with GeForce RTX and NVIDIA RTX GPUs.
LM Studio and GPU offloading take advantage of GPU acceleration to boost the performance of a locally hosted LLM, even if the model can't be fully loaded into VRAM.
With GPU offloading, LM Studio divides the model into smaller chunks, or subgraphs, which represent layers of the model architecture. Subgraphs aren't permanently fixed on the GPU, but loaded and unloaded as needed. With LM Studio's GPU offloading slider, users can decide how many of these layers are processed by the GPU.
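Because LM Studio is built on llama.cpp, the same layer-count choice can be illustrated with llama.cpp's command-line tool, which exposes a `--n-gpu-layers` option. The sketch below only assembles such a command; that the LM Studio slider corresponds directly to this option is an assumption, and the model path, layer count and prompt are hypothetical placeholders.

```python
import shlex


def build_llama_cli_args(model_path: str, gpu_layers: int, prompt: str) -> list[str]:
    """Assemble an argument list for llama.cpp's `llama-cli` binary.

    gpu_layers is the number of model layers to place on the GPU;
    0 keeps every layer on the CPU.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be zero or positive")
    return [
        "llama-cli",
        "--model", model_path,
        "--n-gpu-layers", str(gpu_layers),
        "--prompt", prompt,
    ]


# Hypothetical model file and layer count, for illustration only.
args = build_llama_cli_args("models/example-model.gguf", 20, "Hello")
print(shlex.join(args))
```

Raising or lowering the layer count in this command plays the same role as moving the slider: more layers on the GPU means more VRAM used and, typically, higher throughput.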
LM Studio's interface makes it easy to decide how much of an LLM should be loaded to the GPU. For example, imagine using this GPU offloading technique with a large model like Gemma 2 27B. "27B" refers to the number of parameters in the model, which gives an estimate of how much memory is required to run it.
With 4-bit quantization, a technique for reducing the size of an LLM without significantly reducing accuracy, each parameter takes up half a byte of memory. This means that the model should require about 13.5 billion bytes, or 13.5GB, plus some overhead, which generally ranges from 1-5GB.
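The arithmetic behind this estimate can be written out as a short sketch. The function name is illustrative, and the 1-5GB overhead range is taken from the text rather than measured.

```python
def estimate_weight_bytes(num_parameters: float, bits_per_parameter: float) -> float:
    """Approximate memory needed for the weights alone, in bytes."""
    return num_parameters * bits_per_parameter / 8


GB = 1e9  # decimal gigabytes, matching the article's "13.5 billion bytes"

# 27 billion parameters at 4 bits (half a byte) each.
weights_gb = estimate_weight_bytes(27e9, 4) / GB

# Overhead range quoted in the text: roughly 1-5GB on top of the weights.
low_gb = weights_gb + 1
high_gb = weights_gb + 5

print(f"weights: {weights_gb:.1f} GB")
print(f"with overhead: {low_gb:.1f}-{high_gb:.1f} GB")
```

The same function shows why quantization matters: at 16 bits per parameter, the weights alone would take four times as much memory.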
Accelerating this model entirely on the GPU requires 19GB of VRAM, available on the GeForce RTX 4090 desktop GPU. With GPU offloading, the model can run on a system with a lower-end GPU and still benefit from acceleration.
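As a rough planning aid, the number of layers that fit on a smaller GPU can be estimated from the VRAM budget. This is a minimal sketch under loud assumptions: every layer is treated as the same size, `reserved_gb` is a hypothetical allowance for overhead such as context buffers, and the layer count of 46 is an illustrative input that should be read from the model's metadata in practice.

```python
import math


def max_gpu_layers(model_gb: float, total_layers: int, vram_gb: float,
                   reserved_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers.

    reserved_gb is a hypothetical allowance for non-weight overhead.
    """
    per_layer_gb = model_gb / total_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


# 13.5GB of weights split across an assumed 46 layers.
for vram_gb in (8, 12, 16, 24):
    layers = max_gpu_layers(13.5, 46, vram_gb)
    print(f"{vram_gb}GB VRAM: about {layers} of 46 layers on the GPU")
```

Real layer sizes vary and runtime overhead depends on context length, so the result is a starting point for the slider, not a guarantee.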
The table above shows how to run several popular models of increasing size across a range of GeForce RTX and NVIDIA RTX GPUs. The maximum level of GPU offload is indicated for each combination. Note that even with GPU offloading, users still need enough system RAM to fit the whole model.
In LM Studio, it's possible to assess the performance impact of different levels of GPU offloading, compared with CPU only. The table below shows the results of running the same query across different offloading levels on a GeForce RTX 4090 desktop GPU.
Depending on the percent of the model offloaded to GPU, users see increasing throughput performance compared with running on CPUs alone. For the Gemma 2 27B model, performance goes from an anemic 2.1 tokens per second to increasingly usable speeds the more the GPU is used. This enables users to benefit from the performance of larger models that they otherwise would've been unable to run. On this particular model, even users with an 8GB GPU can enjoy a meaningful speedup versus running only on CPUs. Of course, an 8GB GPU can always run a smaller model that fits entirely in GPU memory and get full GPU acceleration.
Achieving Optimal Balance
LM Studio's GPU offloading feature is a powerful tool for unlocking the full potential of LLMs designed for the data center, like Gemma 2 27B, locally on RTX AI PCs. It makes larger, more complex models accessible across the entire lineup of PCs powered by GeForce RTX and NVIDIA RTX GPUs.
Download LM Studio to try GPU offloading on larger models, or experiment with a variety of RTX-accelerated LLMs running locally on RTX AI PCs and workstations.
Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what's new and what's next by subscribing to the AI Decoded newsletter.