How the Economics of Inference Can Maximize AI Value

23/04/2025

As AI models evolve and adoption grows, enterprises must perform a delicate balancing act to achieve maximum value.

That's because inference - the process of running data through a model to get an output - offers a different computational challenge than training a model.

Pretraining a model - the process of ingesting data, breaking it down into tokens and finding patterns - is essentially a one-time cost. But in inference, every prompt to a model generates tokens, each of which incurs a cost.

That means that as AI model performance and use increase, so do the number of tokens generated and their associated computational costs. For companies looking to build AI capabilities, the key is generating as many tokens as possible - with maximum speed, accuracy and quality of service - without sending computational costs skyrocketing.
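To make the per-token cost concrete, here is a minimal sketch of how daily spend grows with token volume. The traffic figures and the flat price per million tokens are hypothetical values chosen for illustration; they are not from the article or any vendor's price list.

```python
def daily_inference_cost(prompts_per_day: int,
                         tokens_per_prompt: int,
                         usd_per_million_tokens: float) -> float:
    """Daily cost in USD, assuming every generated token is billed at one flat rate."""
    total_tokens = prompts_per_day * tokens_per_prompt
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 100,000 prompts a day at an assumed $2 per million tokens.
baseline_cost = daily_inference_cost(100_000, 500, 2.0)
# Same traffic, but each response is four times longer.
longer_cost = daily_inference_cost(100_000, 2_000, 2.0)

print(f"baseline: ${baseline_cost:.2f}/day, longer responses: ${longer_cost:.2f}/day")
```

Under this simplified model, cost scales linearly with tokens generated, so quadrupling response length quadruples spend even when the number of users stays the same.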

As such, the AI ecosystem has been working to make inference cheaper and more efficient. Inference costs have been trending down for the past year thanks to major leaps in model optimization, leading to increasingly advanced, energy-efficient accelerated computing infrastructure and full-stack solutions.

According to the Stanford University Institute for Human-Centered AI's 2025 AI Index Report, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November 2022 and October 2024. At the hardware level, costs have declined by 30% annually, while energy efficiency has improved by 40% each year. Open-weight models are also closing the gap with closed models, reducing the performance difference from 8% to just 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI.
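As a rough illustration of what these rates imply, the sketch below converts the 280-fold drop into an annualized figure and compounds the hardware and energy trends over two years. It assumes the November 2022 to October 2024 window spans 23 months and that each rate compounds uniformly; both are simplifying assumptions, not claims made by the report.

```python
# Figures quoted from the AI Index Report summary above.
TOTAL_FOLD_DROP = 280          # cost reduction for GPT-3.5-level performance
MONTHS = 23                    # assumed length of the Nov 2022 - Oct 2024 window
HARDWARE_COST_DECLINE = 0.30   # per year
ENERGY_EFFICIENCY_GAIN = 0.40  # per year


def annualized_fold(total_fold: float, months: int) -> float:
    """Equivalent per-year fold reduction, assuming a constant monthly rate."""
    return total_fold ** (12 / months)


def hardware_cost_after(years: int) -> float:
    """Fraction of the original hardware cost remaining after `years`."""
    return (1 - HARDWARE_COST_DECLINE) ** years


def efficiency_after(years: int) -> float:
    """Energy efficiency relative to the starting point after `years`."""
    return (1 + ENERGY_EFFICIENCY_GAIN) ** years


yearly_fold = annualized_fold(TOTAL_FOLD_DROP, MONTHS)
print(f"annualized inference cost drop: about {yearly_fold:.1f}-fold per year")
print(f"hardware cost after 2 years: {hardware_cost_after(2):.2f}x of original")
print(f"energy efficiency after 2 years: {efficiency_after(2):.2f}x of original")
```

The gap between the annualized model-level drop and the hardware-level decline suggests that most of the reduction came from software and model optimization rather than from hardware alone.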

As models evolve and generate more demand and create more tokens, enterprises need to scale their accelerated computing resources to deliver the next generation of AI reasoning tools or risk rising costs and energy consumption.

What follows is a primer on the economics of inference. By understanding these concepts, enterprises can position themselves to achieve efficient, cost-effective and profitable AI solutions at scale.

Key Terminology for the Economics of AI Inference

Knowing key terms of the economics of inference helps set the foundation for understanding its importance.

Tokens are the fundamental unit of data in an AI model. During training, they're derived from data such as text, images, audio clips and videos. Through a process called tokenization, each piece of data is broken down into smaller constituent units. During training, the model learns the relationships between tokens so it can perform inference and generate an accurate, relevant output.

Throughput refers to the amount of data - typically measured in tokens - that the model can output in a specific amount of time, which itself is a function of the infrastructure running the model. Throughput is often measured in tokens per second, with higher throughput meaning greater return on infrastructure.

Latency is a measure of the amount of time between inputting a prompt and the start of the model's response. Lower latency means faster responses. The two main ways of measuring latency are:

Time to First Token: A measurement of the initial processing time required by the model to generate its first output token after a user prompt.

Time per Output Token: The average time between consecutive tokens - or the time it takes to generate a completion token for each user querying the model at the same time. It's also known as inter-token latency or token-to-token latency.
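Both measurements can be derived from the arrival times of a streamed response. The following is a minimal sketch; the function name and the timestamps are illustrative, and it assumes timestamps are recorded in seconds on a single clock.

```python
def latency_metrics(request_sent: float,
                    token_times: list[float]) -> tuple[float, float]:
    """Return (time_to_first_token, time_per_output_token) in seconds.

    request_sent: time the prompt was submitted.
    token_times:  arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    ttft = token_times[0] - request_sent
    if len(token_times) == 1:
        return ttft, 0.0  # no gaps between tokens to average
    # Average gap between consecutive tokens after the first one.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot


# Illustrative timestamps: prompt sent at t=0, four tokens streamed back.
ttft, tpot = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(f"TTFT = {ttft:.2f} s, TPOT = {tpot:.2f} s")
```

Time to first token captures how long the user waits before anything appears, while time per output token captures how smoothly the rest of the response streams.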

Time to first token and time per output token are helpful benchmarks, but they're just two pieces of a larger equation. Focusing solely on them can still lead to a deterioration of performance or cost.

To account for other interdependencies, IT leaders are starting to measure goodput, which is defined as the throughput achieved by a system while maintaining target time to first token and time per output token levels. This metric allows organizations to evaluate performance in a more holistic manner, ensuring that throughput, latency and cost are aligned to support both operational efficiency and an exceptional user experience.
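One way to operationalize this definition is to count only the tokens from requests that met both latency targets. This is a sketch under that assumption; the `Request` record, the target values and the sample data are hypothetical, and real serving stacks may define goodput per request rather than per token.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float        # measured time to first token, seconds
    tpot_s: float        # measured time per output token, seconds
    output_tokens: int   # tokens generated for this request


def throughput_and_goodput(requests: list[Request],
                           window_s: float,
                           max_ttft_s: float,
                           max_tpot_s: float) -> tuple[float, float]:
    """Return (throughput, goodput) in tokens per second over the window."""
    total = sum(r.output_tokens for r in requests)
    # Only tokens from requests that met both latency targets count as "good".
    good = sum(r.output_tokens for r in requests
               if r.ttft_s <= max_ttft_s and r.tpot_s <= max_tpot_s)
    return total / window_s, good / window_s


# Hypothetical 10-second window with three requests.
sample = [
    Request(ttft_s=0.2, tpot_s=0.02, output_tokens=400),  # meets both targets
    Request(ttft_s=0.9, tpot_s=0.02, output_tokens=400),  # first token too slow
    Request(ttft_s=0.3, tpot_s=0.08, output_tokens=200),  # streaming too slow
]
throughput, goodput = throughput_and_goodput(sample, 10.0, 0.5, 0.05)
print(f"throughput = {throughput:.0f} tok/s, goodput = {goodput:.0f} tok/s")
```

In this example the system's raw throughput looks healthy, but less than half of it was delivered within the latency targets, which is the gap goodput is meant to expose.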

Energy efficiency is the measure of how effectively an AI system converts power into computational output, expressed as performance per watt. By using accelerated computing platforms, organizations can maximize tokens per watt while minimizing energy consumption.
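Since a watt is one joule per second, tokens per second per watt is equivalent to tokens per joule. The sketch below uses that identity to estimate the energy needed for a given token volume; the throughput and power figures are hypothetical and assume average power draw is constant.

```python
JOULES_PER_KWH = 3.6e6


def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    """Performance per watt: (tokens/s) / (J/s) = tokens per joule."""
    return tokens_per_second / avg_power_watts


def energy_kwh(total_tokens: float, efficiency_tokens_per_joule: float) -> float:
    """Energy in kWh needed to generate `total_tokens` at a given efficiency."""
    return total_tokens / efficiency_tokens_per_joule / JOULES_PER_KWH


# Hypothetical system: 5,000 tokens/s at an average draw of 1,000 W.
efficiency = tokens_per_joule(5_000, 1_000)
billion_token_energy = energy_kwh(1e9, efficiency)
print(f"{efficiency:.1f} tokens/J, {billion_token_energy:.1f} kWh per billion tokens")
```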

How the Scaling Laws Apply to Inference Cost

The three AI scaling laws are also core to understanding the economics of inference:

Pretraining scaling: The original scaling law that demonstrated that by increasing training dataset size, model parameter count and computational resources, models can achieve predictable improvements in intelligence and accuracy.

Post-training: A process where models are fine-tuned for accuracy and specificity so they can be applied to application development. Techniques like retrieval-augmented generation can be used to return more relevant answers from an enterprise database.

Test-time scaling (aka long thinking or reasoning): A technique by which models allocate additional computational resources during inference to evaluate multiple possible outcomes before arriving at the best answer.
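One simple form of test-time scaling is best-of-N sampling: generate several candidate answers, score them, and keep the best. The sketch below illustrates the idea and its token cost. It is not the specific method any given reasoning model uses; `toy_generate` and `toy_score` are hypothetical stand-ins for a model and a verifier, and whitespace-separated words serve as a crude proxy for tokens.

```python
from typing import Callable


def best_of_n(prompt: str,
              generate: Callable[[str, int], str],
              score: Callable[[str, str], float],
              n: int) -> tuple[str, int]:
    """Sample n candidates, return the highest-scoring one and tokens spent."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    best = max(candidates, key=lambda c: score(prompt, c))
    # Every candidate costs tokens, including the ones that are discarded.
    tokens_used = sum(len(c.split()) for c in candidates)
    return best, tokens_used


# Hypothetical stand-ins: canned answers instead of a real model.
CANNED = ["5", "22", "2 + 2 = 4", "4"]


def toy_generate(prompt: str, seed: int) -> str:
    return CANNED[seed % len(CANNED)]


def toy_score(prompt: str, candidate: str) -> float:
    # Pretend verifier: rewards answers whose final word is the correct result.
    return 1.0 if candidate.split()[-1] == "4" else 0.0


single, single_tokens = best_of_n("What is 2 + 2?", toy_generate, toy_score, n=1)
scaled, scaled_tokens = best_of_n("What is 2 + 2?", toy_generate, toy_score, n=4)
print(f"n=1: {single!r} ({single_tokens} tokens)")
print(f"n=4: {scaled!r} ({scaled_tokens} tokens)")
```

Sampling more candidates improves the chance of a correct answer here, but the token count, and therefore the inference cost, grows with every extra candidate.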

While AI is evolving and post-training and test-time scaling techniques become more sophisticated, pretraining isn't disappearing and remains an important way to scale models. Pretraining will still be needed to support post-training and test-time scaling.

Profitable AI Takes a Full-Stack Approach

In comparison to inference from a model that's only gone through pretraining and post-training, models that harness test-time scaling generate multiple tokens to solve a complex problem. This results in more accurate and relevant model outputs - but
LINK: https://blogs.nvidia.com/blog/ai-inference-economics/...