
How to Accelerate Larger LLMs Locally on RTX With LM Studio

23/10/2024

Editor's note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.

Large language models (LLMs) are reshaping productivity. They're capable of drafting documents, summarizing web pages and, having been trained on vast quantities of data, accurately answering questions about nearly any topic.

LLMs are at the core of many emerging use cases in generative AI, including digital assistants, conversational avatars and customer service agents.

Many of the latest LLMs can run locally on PCs or workstations. This is useful for a variety of reasons: users can keep conversations and content private on-device, use AI without the internet, or simply take advantage of the powerful NVIDIA GeForce RTX GPUs in their system. Other models, because of their size and complexity, don't fit into the local GPU's video memory (VRAM) and require hardware in large data centers.

However, it's possible to accelerate part of a prompt on a data-center-class model locally on RTX-powered PCs using a technique called GPU offloading. This allows users to benefit from GPU acceleration without being as limited by GPU memory constraints.

Size and Quality vs. Performance

There's a tradeoff between model size, response quality and performance. In general, larger models deliver higher-quality responses, but run more slowly. With smaller models, performance goes up while quality goes down.

This tradeoff isn't always straightforward. There are cases where performance might be more important than quality. Some users may prioritize accuracy for use cases like content generation, since it can run in the background. A conversational assistant, meanwhile, needs to be fast while also providing accurate responses.

The most accurate LLMs, designed to run in the data center, are tens of gigabytes in size, and may not fit in a GPU's memory. This would traditionally prevent the application from taking advantage of GPU acceleration.

However, GPU offloading runs part of the LLM on the GPU and part on the CPU. This allows users to take maximum advantage of GPU acceleration regardless of model size.

Optimize AI Acceleration With GPU Offloading and LM Studio

LM Studio is an application that lets users download and host LLMs on their desktop or laptop computer, with an easy-to-use interface that allows for extensive customization in how those models operate. LM Studio is built on top of llama.cpp, so it's fully optimized for use with GeForce RTX and NVIDIA RTX GPUs.

LM Studio and GPU offloading take advantage of GPU acceleration to boost the performance of a locally hosted LLM, even if the model can't be fully loaded into VRAM.

With GPU offloading, LM Studio divides the model into smaller chunks, or subgraphs, which represent layers of the model architecture. Subgraphs aren't permanently fixed on the GPU, but loaded and unloaded as needed. With LM Studio's GPU offloading slider, users can decide how many of these layers are processed by the GPU.

LM Studio's interface makes it easy to decide how much of an LLM should be loaded to the GPU. For example, imagine using this GPU offloading technique with a large model like Gemma 2 27B. 27B refers to the number of parameters in the model, informing an estimate as to how much memory is required to run the model.

With 4-bit quantization, a technique for reducing the size of an LLM without significantly reducing accuracy, each parameter takes up half a byte of memory. This means that the model should require about 13.5 billion bytes, or 13.5GB, plus some overhead, which generally ranges from 1-5GB.
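The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: the function name and parameters are illustrative, 1GB is taken as one billion bytes as in the text, and the overhead values are simply the 1-5GB range quoted above rather than anything measured.

```python
def estimate_model_memory_gb(num_params: float,
                             bits_per_param: float = 4.0,
                             overhead_gb: float = 0.0) -> float:
    """Rough memory estimate in GB (1 GB = 1e9 bytes) for a quantized model.

    Illustrative helper, not part of LM Studio or llama.cpp.
    """
    bytes_per_param = bits_per_param / 8  # 4 bits -> half a byte
    weights_gb = num_params * bytes_per_param / 1e9
    return weights_gb + overhead_gb


# Gemma 2 27B at 4-bit quantization, using the overhead range from the text.
weights_only = estimate_model_memory_gb(27e9)
low = estimate_model_memory_gb(27e9, overhead_gb=1.0)
high = estimate_model_memory_gb(27e9, overhead_gb=5.0)

print(f"weights only: {weights_only:.1f} GB")
print(f"with overhead: {low:.1f} to {high:.1f} GB")
```

The same function can be reused for other parameter counts or quantization levels by changing `num_params` and `bits_per_param`.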

Accelerating this model entirely on the GPU requires 19GB of VRAM, available on the GeForce RTX 4090 desktop GPU. With GPU offloading, the model can run on a system with a lower-end GPU and still benefit from acceleration.
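To get a feel for how far the offloading slider might go on a smaller GPU, the sketch below estimates how many layers fit in a given amount of VRAM. It is a simplification built on several assumptions that are not from the article: all layers are treated as the same size, the layer count of 46 is an assumed value that should be checked against the model's own metadata, and `reserved_gb` is a hypothetical allowance for context and other overhead. It does not reflect how LM Studio itself chooses a setting.

```python
import math


def layers_that_fit(vram_gb: float,
                    model_gb: float,
                    num_layers: int,
                    reserved_gb: float = 1.0) -> int:
    """Estimate how many equal-sized layers fit in VRAM.

    Illustrative helper; assumes every layer has the same size and
    that reserved_gb of VRAM is kept free for overhead.
    """
    per_layer_gb = model_gb / num_layers
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(num_layers, math.floor(usable_gb / per_layer_gb))


MODEL_GB = 13.5    # 4-bit weights for a 27B-parameter model
NUM_LAYERS = 46    # assumed layer count, verify for the actual model

for vram in (8, 12, 16, 24):
    n = layers_that_fit(vram, MODEL_GB, NUM_LAYERS)
    print(f"{vram}GB GPU: about {n} of {NUM_LAYERS} layers "
          f"({n / NUM_LAYERS:.0%}) on the GPU")
```

Under these assumptions, an 8GB GPU holds roughly half of the layers, with the rest processed by the CPU from system RAM.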

The table above shows how to run several popular models of increasing size across a range of GeForce RTX and NVIDIA RTX GPUs. The maximum level of GPU offload is indicated for each combination. Note that even with GPU offloading, users still need enough system RAM to fit the whole model.

In LM Studio, it's possible to assess the performance impact of different levels of GPU offloading, compared with CPU only. The table below shows the results of running the same query across different offloading levels on a GeForce RTX 4090 desktop GPU.

Depending on the percent of the model offloaded to GPU, users see increasing throughput performance compared with running on CPUs alone. For the Gemma 2 27B model, performance goes from an anemic 2.1 tokens per second to increasingly usable speeds the more the GPU is used. This enables users to benefit from the performance of larger models that they otherwise would've been unable to run. On this particular model, even users with an 8GB GPU can enjoy a meaningful speedup versus running only on CPUs. Of course, an 8GB GPU can always run a smaller model that fits entirely in GPU memory and get full GPU acceleration.
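One way to reason about why throughput climbs as more of the model moves to the GPU is a simple time-per-token model: each token spends a share of its time in the GPU-resident layers and the rest in the CPU-resident layers. The sketch below is a toy model under stated assumptions, not a benchmark. The 2.1 tokens per second CPU figure comes from the text, while `gpu_tps` is a hypothetical placeholder for fully accelerated speed, and transfer and scheduling costs are ignored.

```python
def estimated_tokens_per_s(gpu_fraction: float,
                           cpu_tps: float,
                           gpu_tps: float) -> float:
    """Toy throughput estimate for a partially offloaded model.

    gpu_fraction: share of layers on the GPU, between 0.0 and 1.0.
    cpu_tps / gpu_tps: tokens per second when the model runs entirely
    on the CPU or entirely on the GPU.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0.0 and 1.0")
    # Time per token is the weighted sum of per-token time on each device.
    time_per_token = gpu_fraction / gpu_tps + (1.0 - gpu_fraction) / cpu_tps
    return 1.0 / time_per_token


CPU_TPS = 2.1    # CPU-only figure quoted in the text
GPU_TPS = 20.0   # hypothetical value, for illustration only

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    tps = estimated_tokens_per_s(fraction, CPU_TPS, GPU_TPS)
    print(f"{fraction:.0%} offloaded: ~{tps:.1f} tokens/s (toy estimate)")
```

In this model the slower CPU portion dominates until most layers are offloaded, so measured numbers from LM Studio remain the only reliable guide.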

Achieving Optimal Balance

LM Studio's GPU offloading feature is a powerful tool for unlocking the full potential of LLMs designed for the data center, like Gemma 2 27B, locally on RTX AI PCs. It makes larger, more complex models accessible across the entire lineup of PCs powered by GeForce RTX and NVIDIA RTX GPUs.

Download LM Studio to try GPU offloading on larger models, or experiment with a variety of RTX-accelerated LLMs running locally on RTX AI PCs and workstations.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what's new and what's next by subscribing to the AI Decoded newsletter.
LINK: https://blogs.nvidia.com/blog/ai-decoded-lm-studio/...