AI, Go Fetch! New NVIDIA NeMo Retriever Microservices Boost LLM Accuracy and Throughput
23/07/2024
To help developers efficiently fetch the best proprietary data to generate knowledgeable responses for their AI applications, NVIDIA today announced four new NVIDIA NeMo Retriever NIM inference microservices.
Combined with NVIDIA NIM inference microservices for the Llama 3.1 model collection, also announced today, NeMo Retriever NIM microservices enable enterprises to scale to agentic AI workflows, in which AI applications operate accurately with minimal intervention or supervision, while delivering the highest-accuracy retrieval-augmented generation, or RAG.
NeMo Retriever allows organizations to seamlessly connect custom models to diverse business data and deliver highly accurate responses for AI applications using RAG. In essence, these production-ready microservices provide the precise information retrieval on which highly accurate AI applications are built.
For example, NeMo Retriever can boost model accuracy and throughput for developers who are creating AI agents and customer service chatbots, analyzing security vulnerabilities, or extracting insights from complex supply chain information.
NIM inference microservices enable high-performance, easy-to-use, enterprise-grade inferencing. With NeMo Retriever NIM microservices, developers get all of these benefits, superpowered by their own data.
These new NeMo Retriever embedding and reranking NIM microservices are now generally available:
NV-EmbedQA-E5-v5, a popular community base embedding model optimized for text question-answering retrieval
NV-EmbedQA-Mistral7B-v2, a popular multilingual community base model fine-tuned for text embedding for high-accuracy question answering
Snowflake-Arctic-Embed-L, an optimized community model, and
NV-RerankQA-Mistral4B-v3, a popular community base model fine-tuned for text reranking for high-accuracy question answering.
They join the collection of NIM microservices easily accessible through the NVIDIA API catalog.
Embedding and Reranking Models
NeMo Retriever NIM microservices comprise two model types, embedding and reranking, with open and commercial offerings that ensure transparency and reliability.
Figure: Example RAG pipeline using NVIDIA NIM microservices for Llama 3.1 and NeMo Retriever embedding and reranking NIM microservices for a customer service AI chatbot application.
An embedding model transforms diverse data, such as text, images, charts and video, into numerical vectors that capture its meaning and nuance; the vectors are then stored in a vector database. Embedding models are fast and computationally less expensive than traditional large language models, or LLMs.
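To make the embed, store and search flow concrete, here is a minimal, self-contained Python sketch. It is not NeMo Retriever code: `embed` is a hypothetical stand-in that hashes words into a fixed-size vector (a real embedding NIM returns learned dense vectors that capture meaning, not just shared words), and `VectorStore` is a toy in-memory substitute for a vector database. Only the shape of the process carries over: text goes in, a normalized vector comes out, and search ranks the stored vectors by cosine similarity to the query vector.

```python
import math
import re
import zlib


def embed(text: str, dim: int = 1024) -> list[float]:
    """Hypothetical toy embedding: hash each word into one of `dim` buckets.

    A real embedding model would return a learned dense vector instead.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalize so that a plain dot product equals cosine similarity.
    return [x / norm for x in vec] if norm else vec


class VectorStore:
    """Toy in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        # Embed once at ingestion time and keep the vector next to its text.
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Embed the query, score every stored vector, return the k best.
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), text) for v, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = VectorStore()
for doc in [
    "how to reset your account password",
    "shipping times for international orders",
    "return policy for damaged items",
]:
    store.add(doc)

results = store.search("reset password", k=2)
print(results)
```

Normalizing at ingestion time means each search is just a dot product per stored item; production vector databases replace this linear scan with approximate nearest-neighbor indexes.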
A reranking model ingests data and a query, then scores the data according to its relevance to the query. Such models offer significant accuracy improvements but are more computationally complex and slower than embedding models.
NeMo Retriever provides the best of both worlds. By using an embedding NIM to cast a wide net over the data to be retrieved, then a reranking NIM to trim the results down to the most relevant, developers tapping NeMo Retriever can build a pipeline that delivers the most helpful, accurate results for their enterprise.
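The wide-net-then-trim pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NeMo Retriever API: `embed` is a hypothetical hashed bag-of-words stand-in for an embedding model, and `rerank_score` is a hypothetical hand-written rule standing in for a reranking model. What the sketch does show is the structure: the cheap similarity search runs over the whole corpus, while the more expensive scorer, which reads the query and each passage together, runs only on the short candidate list.

```python
import math
import re
import zlib


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, dim: int = 1024) -> list[float]:
    """Stage-1 stand-in for an embedding model (hypothetical hashed bag of words)."""
    vec = [0.0] * dim
    for tok in tokens(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def rerank_score(query: str, passage: str) -> float:
    """Stage-2 stand-in for a reranking model (hypothetical scoring rule).

    Scores query and passage together: the share of query words found in the
    passage, plus a 0.5 bonus for each adjacent query word pair found in order.
    """
    q, p = tokens(query), tokens(passage)
    if not q:
        return 0.0
    p_words = set(p)
    coverage = sum(word in p_words for word in q) / len(q)
    p_pairs = set(zip(p, p[1:]))
    pair_hits = sum(pair in p_pairs for pair in zip(q, q[1:]))
    return coverage + 0.5 * pair_hits


def retrieve_then_rerank(
    query: str, corpus: list[str], wide_k: int = 10, final_k: int = 3
) -> list[str]:
    # Stage 1: cast a wide net with cheap vector similarity over the whole corpus.
    # (In practice the document vectors would be precomputed in a vector database.)
    q_vec = embed(query)
    by_similarity = sorted(
        corpus,
        key=lambda doc: sum(a * b for a, b in zip(q_vec, embed(doc))),
        reverse=True,
    )
    candidates = by_similarity[:wide_k]
    # Stage 2: apply the slower, query-aware scorer only to the candidates.
    reranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return reranked[:final_k]


corpus = [
    "password rules and password history",
    "how do i reset password on the account page",
    "shipping times for international orders",
    "return policy for damaged items",
]
top = retrieve_then_rerank("reset password", corpus, wide_k=3, final_k=2)
print(top)
```

With this toy corpus, the hashed vectors (barring hash collisions) rank the passage that repeats "password" above the one that actually answers the question; the reranker reverses that order because the second passage contains both query words, in order. Keeping `wide_k` small bounds how many times the expensive scorer runs per query.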
With NeMo Retriever, developers get access to state-of-the-art open and commercial models for building text Q&A retrieval pipelines that provide the highest accuracy. When compared with alternative models, NeMo Retriever NIM microservices produced 30% fewer inaccurate answers for enterprise question answering.
Figure: Comparison of NeMo Retriever embedding NIM and embedding plus reranking NIM microservices performance versus lexical search and an alternative embedder.
Top Use Cases
From RAG and AI agent solutions to data-driven analytics and more, NeMo Retriever powers a wide range of AI applications.
The microservices can be used to build intelligent chatbots that provide accurate, context-aware responses. They can help analyze vast amounts of data to identify security vulnerabilities. They can assist in extracting insights from complex supply chain information. And they can boost AI-enabled retail shopping advisors that offer natural, personalized shopping experiences, among other tasks.
NVIDIA AI workflows for these use cases provide an easy, supported starting point for developing generative AI-powered technologies.
Dozens of NVIDIA data platform partners are working with NeMo Retriever NIM microservices to boost their AI models' accuracy and throughput.
DataStax has integrated NeMo Retriever embedding NIM microservices in its Astra DB and Hyper-Converged platforms, enabling the company to bring accurate, generative AI-enhanced RAG capabilities to customers with faster time to market.
Cohesity will integrate NVIDIA NeMo Retriever microservices with its AI product, Cohesity Gaia, to help customers put their data to work to power insightful, transformative generative AI applications through RAG.
Kinetica will use NVIDIA NeMo Retriever to develop LLM agents that can interact with complex networks in natural language to respond more quickly to outages or breaches - turning insights into immediate action.
NetApp is collaborating with NVIDIA to connect NeMo Retriever microservices to exabytes of data on its intelligent data infrastructure. Every NetApp ONTAP customer will be able to seamlessly talk to their data to access proprietary business insights without having to compromise the security or privacy of their data.
NVIDIA global system integrator partners including Accenture, Deloitte, Infosys, LTTS, Tata Consultancy Services, Tech Mahindra and Wipro, as well as service delivery partners Data Monsters, EXLService (Ireland) Limited, Latentview, Quantiphi, Slalom, SoftServe and Tredence, are developing services to help enterprises add NeMo Retriever NIM microservices into their AI pipelines.