• NeMo Retriever, a new service, is part of Nvidia's NeMo suite of frameworks and tools for developing, customizing, and deploying generative AI models.
• NeMo Retriever allows the integration of current data into a large language model (LLM) from various sources such as databases, HTML, PDFs, images, videos, and other formats.

Nvidia Corp. has announced a new generative artificial intelligence microservice that lets enterprises connect custom chatbots, copilots, and AI summarization tools to real-time proprietary company data so that those tools return more accurate results.

The newly introduced NeMo Retriever is a component of Nvidia NeMo, a cloud-native framework and toolset for developing, customizing, and deploying generative AI models. The service is designed to let enterprises add retrieval-augmented generation capabilities to their generative AI applications.

Retrieval-augmented generation (RAG) is a technique that improves the accuracy and reliability of generative AI models by filling the “knowledge” gaps inherent in large language models with facts and data retrieved from external sources. A large language model is first trained extensively to acquire general knowledge and capabilities, including understanding conversational prompts, summarization, and question answering. Because training is expensive and time-consuming, it is typically done only once, or infrequently, to prepare the model for deployment.

Once deployed, however, the model lacks real-time information and the latest domain-specific knowledge, which can lead to inaccuracies and so-called “hallucinations,” in which a large language model answers a question confidently but incorrectly.
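The retrieval step that RAG adds can be illustrated with a minimal sketch. This is not NeMo Retriever's API: the function names (`retrieve`, `build_prompt`), the sample documents, and the bag-of-words cosine similarity are all assumptions chosen for illustration. Production systems typically rely on learned embeddings and a vector database rather than word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query (zero scores dropped)."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved passages to the question before it reaches the LLM."""
    context = retrieve(query, documents, k)
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
    )


# Hypothetical internal documents standing in for proprietary company data.
DOCS = [
    "The Q3 travel policy caps hotel rates at 250 dollars per night.",
    "The cafeteria is open from 8 am to 3 pm on weekdays.",
    "Support tickets are triaged within four business hours.",
]

if __name__ == "__main__":
    print(build_prompt("What is the hotel rate cap in the travel policy?", DOCS, k=1))
```

The prompt that reaches the model now carries the relevant company fact, so the model does not have to rely on whatever it memorized during training.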

With NeMo Retriever, current data can be integrated into an LLM from various sources, such as databases, HTML, PDFs, images, videos, and other formats. The model thereby gains a comprehensive collection of facts drawn from the enterprise customer’s proprietary data, and that collection is updated as new information emerges. The data can reside anywhere, whether in the cloud, in data centers, or on-premises, and is accessed securely.
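One way to picture how retrieved facts stay current is an index keyed by document ID, where re-ingesting a document replaces its earlier version. The sketch below is a toy under stated assumptions: the `DocumentIndex` class, its `upsert` and `search` methods, the sample documents, and the shared-word scoring are hypothetical and do not reflect how NeMo Retriever stores or ranks data.

```python
import re


class DocumentIndex:
    """Toy in-memory index (hypothetical): one entry per document ID."""

    def __init__(self):
        self._docs = {}  # doc_id -> (text, set of tokens)

    @staticmethod
    def _tokens(text):
        """Lowercase the text and return its set of alphanumeric tokens."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, doc_id, text):
        """Insert a new document or replace the stored version with the same ID."""
        self._docs[doc_id] = (text, self._tokens(text))

    def search(self, query, k=1):
        """Return up to k (doc_id, text) pairs ranked by shared-word count."""
        query_tokens = self._tokens(query)
        scored = [
            (len(query_tokens & tokens), doc_id, text)
            for doc_id, (text, tokens) in self._docs.items()
        ]
        scored = [entry for entry in scored if entry[0] > 0]
        scored.sort(key=lambda entry: entry[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


index = DocumentIndex()
index.upsert("pricing", "The standard plan costs 10 dollars per month.")
index.upsert("hours", "Support is available on weekdays.")

# New information arrives: the same ID is re-ingested with fresh content.
index.upsert("pricing", "The standard plan costs 12 dollars per month.")

print(index.search("How much does the standard plan cost?"))
```

Because the second `upsert` overwrites the entry for `"pricing"`, the search returns only the updated text, and no retraining of the model is involved.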

Ian Buck, vice president of hyperscale and high-performance computing at Nvidia, said, “This is the holy grail for chatbots across the enterprise because the vast majority of useful data is the proprietary data that is not the publicly available data embedded inside of these models but what is available inside companies. So, combining AI with a customer’s database makes it more productive, more accurate, more useful and lets customers optimize models’ capabilities.”

Integrating proprietary data reduces inaccurate answers because the LLM has better contextual information from which to generate results. Much as research papers cite their sources, Retriever’s RAG capability supplies the model with additional expert information drawn from a company’s internal domain-specific knowledge, equipping it to give more precise and accurate responses to the questions it is asked.

Unlike community-driven open-source RAG toolkits, Nvidia says, Retriever is designed to support commercial, production-ready generative AI models. These models are already optimized for RAG and come with enterprise support and managed security patches.

Nvidia is already working with enterprise customers, including Dropbox Inc., SAP SE, ServiceNow Inc., and electronics systems designer Cadence Design Systems Inc., among others, to use the new service to integrate RAG into their custom generative AI tools, apps, and services.

According to Anirudh Devgan, President and CEO of Cadence, the company’s researchers are collaborating with Nvidia to use Retriever to improve accuracy and help make higher-quality electronics. Devgan said, “Generative AI introduces innovative approaches to address customer needs, such as tools to uncover potential flaws early in the design process.”

According to Buck, Retriever lets customers get more accurate results while spending less time training generative AI models. Enterprise customers can deploy off-the-shelf models and apply their internal data without the extensive time, cost, and effort traditionally required to maintain model consistency.

NeMo Retriever and its RAG capabilities will be part of Nvidia AI Enterprise, an end-to-end cloud-native software platform designed to simplify AI application development. Developers can now register for early access to NeMo Retriever.