Exploring 'Nvidia Chat with RTX' as an Offline LLM

How ‘Nvidia Chat with RTX’ Brings ‘LLMs as an Operating System’ Closer

Earlier this month, Nvidia released Chat with RTX, a free-to-download, generative AI-powered chatbot that users can interact with and customize as long as they have relatively affordable GPUs in their desktops.

Users can query the chatbot to locate content stored locally in .txt, .pdf, .doc, .docx, and .xml files, which are connected to open-source language models like Llama 2 and Mistral.
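To make the idea concrete, here is a minimal sketch of the retrieval step that lets a local chatbot pull the right file into a model's prompt. It is not Nvidia's implementation: the function names, the sample files, and the simple word-overlap scoring are all assumptions chosen for illustration. Production systems typically use learned embeddings and a vector index, and parsing formats like .pdf or .docx is left out here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return up to top_k (name, text) pairs that best match the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), name, text)
        for name, text in documents.items()
    ]
    # Highest similarity first; documents with no word overlap are dropped.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, text) for score, name, text in scored[:top_k] if score > 0]


def build_prompt(query, passages):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"[{name}] {text}" for name, text in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical local files, already read into memory as plain text.
documents = {
    "trip_notes.txt": "The hotel in Lisbon is booked for the first week of June.",
    "recipes.txt": "The pasta sauce needs tomatoes, garlic, and basil.",
}

query = "Which week is the Lisbon hotel booked?"
passages = retrieve(query, documents)
prompt = build_prompt(query, passages)
print(prompt)
```

In a full pipeline, the resulting prompt would be passed to a locally running model such as Llama 2 or Mistral. That step is omitted here because it depends on the runtime in use.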

What’s notable about this approach is that it brings a virtual assistant directly onto the user’s local device without reaching out to online server farms.

In announcing Chat with RTX, Nvidia said: “Since Chat with RTX runs locally on Windows RTX PCs and workstations, the provided results are fast – and the user’s data stays on the device.

“Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.”

Key Takeaways

Nvidia’s latest release, Chat with RTX, brings a generative AI-powered chatbot directly onto local devices, with no internet connection required.
By using retrieval-augmented generation (RAG), users can process data locally and customize virtual assistants on their Windows PCs.
Combined with Google’s Gemma, it adds to the momentum toward AI that runs on the desktop rather than on the web, along with decentralized AI trained on specific datasets.

LLMs as an Operating System

For Nvidia, Chat with RTX is a step toward reimagining how LLMs are accessed and used.

Giving users the ability to customize a chatbot locally on a Windows PC, on their own documents and data, opens the door for developers to create their own personalized virtual assistants.

A Step Toward a Decentralized Future

The decentralization of AI has been brewing for quite some time – as researchers look to build their own custom models with specific datasets.

After all, while tools like ChatGPT and Claude can be useful for tasks like text summarization or even content creation, they aren’t a fit for every use case.

Giving developers the option to tailor open-source models offline to their local files with Chat with RTX provides more control over model development by presenting an opportunity to build on a highly curated dataset.

This local customization is a key point of differentiation from other solution providers like Google and Microsoft – which have attempted to develop LLM-driven tools powered by web-based data.

In each case, these two tech giants have attempted to allow users to process data stored across their cloud environments and product ecosystems.

For instance, Microsoft Copilot enables users to query data stored in documents, emails, calendars, chats, meetings, and contacts. This means users can ask Copilot questions about documents, pull data from emails, or even use the assistant inside popular Microsoft 365 apps like Word, Excel, PowerPoint, Outlook, and Teams.

Likewise, Google Gemini can integrate with Google Workspace products like Gmail, Docs, and Sheets so that users can search for content stored within these tools or even generate content directly with Gemini when using them.

Chat with RTX provides an alternative to this approach by giving users the option to search through content stored on their local device rather than in the cloud.

The Bottom Line

Chat with RTX highlights that the concept of the virtual assistant is evolving to embrace personalization. Local files and data can be as valuable for generating insights as web-based data and are much easier to protect from unauthorized third parties.

The future of generative AI is moving toward personalized models, trained on select datasets and optimized for specific use cases rather than more generic consumer-grade models.