According to NVIDIA, the value of a generative AI application rests on its accuracy, and that accuracy rests on data.
Combined with the recently announced NVIDIA NIM inference microservices for the Llama 3.1 model, the NeMo Retriever NIM microservices let enterprises scale agentic AI workflows, so AI applications can operate accurately with minimal intervention or supervision while delivering highly accurate retrieval-augmented generation (RAG).
According to NVIDIA, NeMo Retriever lets enterprises seamlessly connect custom models to a variety of business data and use RAG to deliver accurate responses in AI applications. In essence, these production-ready microservices retrieve information with high precision, which is the foundation for building highly accurate AI applications. For example, NeMo Retriever can improve model accuracy and throughput for developers building AI agents and customer service chatbots, analyzing security vulnerabilities, or extracting insights from complex supply chain information.
A NIM inference microservice provides easy-to-use, high-performance, enterprise-grade inference. The NeMo Retriever NIM microservices extend those benefits to an enterprise's own data.
The new NeMo Retriever embedding and reranking NIM microservices are:
- NV-EmbedQA-E5-v5: A popular community embedding model optimized for text question-answering retrieval
- NV-EmbedQA-Mistral7B-v2: A popular multilingual community base model fine-tuned for text embeddings for accurate question answering
- Snowflake-Arctic-Embed-L: An optimized community embedding model
- NV-RerankQA-Mistral4B-v3: A popular community base model fine-tuned for text reranking for accurate question answering

These models join the collection of NIM microservices easily accessible through the NVIDIA API Catalog.
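As a rough sketch of how a developer might call one of these embedding microservices: NIM services generally expose an OpenAI-compatible HTTP API, but the endpoint URL, the model identifier, and the `input_type` field below are assumptions based on the NVIDIA API Catalog, not details given in this article.

```python
import json
import urllib.request

def build_embedding_request(texts, model="nvidia/nv-embedqa-e5-v5"):
    """Build an OpenAI-style /v1/embeddings payload.

    The `input_type` field (distinguishing queries from passages) and the
    model name are assumptions about the API Catalog, not from the article.
    """
    return {
        "model": model,
        "input": texts,
        "input_type": "query",
    }

def embed(texts, api_key, base_url="https://integrate.api.nvidia.com/v1"):
    """POST the payload and return one embedding vector per input text."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(build_embedding_request(texts)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```

An API key from the NVIDIA API Catalog would be required to actually run the `embed` call.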
NeMo Retriever NIM microservices come in two model types, embedding and reranking, both offered as open, commercial products for transparency and reliability.
Embedding models convert diverse data such as text, images, charts, and video into numeric vectors that capture meaning and context, and store them in a vector database. Embedding models are faster and computationally cheaper than typical large language models (LLMs).
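The idea can be illustrated with a toy sketch: a simple bag-of-words "embedding" stands in for a learned model such as NV-EmbedQA-E5-v5, and a list of (text, vector) pairs stands in for a vector database searched by cosine similarity. Everything here is illustrative, not NVIDIA's implementation.

```python
import math
from collections import Counter

def toy_embed(text, vocab):
    """Unit-length bag-of-words vector over a fixed vocabulary;
    a crude stand-in for a learned embedding model."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-length, so the dot product is the cosine
    return sum(x * y for x, y in zip(a, b))

docs = [
    "GPU inference microservice for enterprises",
    "extracting insights from supply chain data",
    "customer service chatbot with retrieval",
]
vocab = sorted({w for d in docs for w in d.lower().split()})

# A minimal "vector database": precomputed (text, vector) pairs
index = [(d, toy_embed(d, vocab)) for d in docs]

def search(query, k=2):
    """Return the k stored documents most similar to the query."""
    q = toy_embed(query, vocab)
    return sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)[:k]

top = search("chatbot for customer support")
```

A real embedding model captures semantic similarity rather than literal word overlap, but the store-then-search flow is the same.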
Reranking models take a query and a set of candidate documents, then score each document's relevance to the query. They are more computationally complex and slower than embedding models, but deliver significantly higher accuracy.
NVIDIA says NeMo Retriever offers the best of both worlds: the embedding NIM casts a wide net over the data to be searched, then the reranking NIM refines the candidates by relevance. This lets developers using NeMo Retriever build pipelines that return the most useful and accurate results for their businesses.
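The two-stage pipeline described above can be sketched as follows. Both scoring functions are deliberately simple stand-ins: word overlap substitutes for an embedding-similarity lookup, and a character-level sequence match substitutes for a reranking model such as NV-RerankQA-Mistral4B-v3. Only the shape of the pipeline (cheap wide retrieval, then expensive reranking of a shortlist) reflects the article.

```python
from difflib import SequenceMatcher

def embed_score(query, doc):
    """Cheap first-stage score: word overlap (Jaccard), a stand-in
    for an embedding-similarity lookup in a vector database."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def rerank_score(query, doc):
    """Slower, finer-grained second-stage score, a stand-in for a
    cross-encoder reranking model."""
    return SequenceMatcher(None, query.lower(), doc.lower()).ratio()

def retrieve(query, corpus, candidates=3, final=1):
    # Stage 1: cast a wide net with the cheap embedding score
    shortlist = sorted(corpus, key=lambda d: embed_score(query, d),
                       reverse=True)[:candidates]
    # Stage 2: rerank only the shortlist with the expensive scorer
    return sorted(shortlist, key=lambda d: rerank_score(query, d),
                  reverse=True)[:final]

corpus = [
    "how to reset a password",
    "password reset steps for admins",
    "gpu driver installation guide",
]
best = retrieve("reset my password", corpus, candidates=2, final=1)
```

The design point is cost: the expensive reranker only ever sees the small shortlist, so accuracy improves without scoring the whole corpus.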
NeMo Retriever gives developers state-of-the-art, open, commercial models for building text retrieval pipelines that deliver the highest accuracy. Compared with alternative models, NeMo Retriever NIM microservices produced 30% fewer incorrect answers on enterprise question answering.
NVIDIA NIM microservices can be used together or individually, giving developers a modular approach to building AI applications. They can be integrated with community models, NVIDIA models, or customers' own custom models in the cloud, on premises, or in hybrid environments, giving developers greater flexibility.
Enterprises can use NIM with the NVIDIA AI Enterprise software platform to deploy AI applications in production environments.
It can also run on customers’ preferred accelerated infrastructure, including cloud instances from AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, as well as NVIDIA Certified Systems from global server manufacturing partners including Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro.
NVIDIA Developer Program members will have free access to NIM for research, development, and testing on their preferred infrastructure.
editor@itworld.co.kr
Source: www.itworld.co.kr