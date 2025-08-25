Hallucinations are one of the biggest obstacles to scaling generative artificial intelligence (AI). Large language models (LLMs) can produce well-written and convincing answers to prompt questions, but they often fabricate facts, misquote data, or cite unreliable sources. One emerging approach is to pair AI with blockchain to improve data integrity and traceability.
Blockchain ledgers can anchor model outputs to real, auditable sources. At the same time, new standards like the Model Context Protocol (MCP) are gaining traction as tools for coordinating multi-agent AI systems.
Techopedia spoke with Matthijs de Vries, founder of Nuklai, which recently launched Nexus, an AI and data engine focused on data validation, about blockchain’s role in explainable AI and why protocols like MCP are becoming central to the next generation of AI infrastructure.
Key Takeaways
- AI has been getting smarter, but hallucinations remain a challenge to scalability. Solving them requires anchoring model outputs to verifiable, real-world data.
- Blockchain can add trust and transparency by providing tamper-proof provenance and audit trails for the data that AI systems rely on.
- Integration with decentralized compute and storage networks such as io.net and Filecoin enables users to access a range of open-source LLMs on distributed systems, which is key for Web3 applications.
- Businesses with large volumes of data don’t necessarily need blockchain technology, which would require the data to be replicated across nodes. Blockchain adds value when it comes to traceability and revenue distribution mechanisms.
- Protocols like MCP are emerging as critical infrastructure for advanced AI applications, enabling coordination across multi-agent systems.
About Matthijs de Vries
Matthijs de Vries is a serial entrepreneur with a passion for blockchain, data, and the future of AI.
He is the founder of Nuklai, a decentralized data infrastructure provider and the team behind Nexus – a flagship query engine backed by io.net, Filecoin, and Fetch, built to solve AI hallucinations by delivering traceable, source-connected insights.
Matthijs is also one of the active developers of Nexus and has co-founded projects including Nexera (tokenization infrastructure) and Nuant (DeFi trading strategy software).
Integrating Blockchain into AI Systems
A: It doesn’t necessarily eliminate hallucinations; it acts more as an enabler in combating them. One approach is to provide structured data so that the LLM forms its answers from this data rather than relying on its own training data and assumptions. For business-specific purposes, that data alone is already enough.
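As a rough illustration of the structured-data approach described above, the sketch below builds a prompt that restricts a model to a set of supplied records. The function name, the record fields, and the prompt wording are all hypothetical; the interview does not describe how Nexus does this, and no model is actually called here.

```python
import json


def build_grounded_prompt(question: str, records: list[dict]) -> str:
    """Build a prompt asking the model to answer only from the supplied records."""
    # Serialize each record on its own line so the model sees plain structured data.
    context = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    return (
        "Answer the question using only the records below. "
        "If the records do not contain the answer, say so.\n\n"
        f"Records:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical business data; in practice this would come from a query.
sales = [
    {"region": "EU", "quarter": "Q1", "revenue_eur": 120000},
    {"region": "EU", "quarter": "Q2", "revenue_eur": 135000},
]
prompt = build_grounded_prompt("How did EU revenue change from Q1 to Q2?", sales)
print(prompt)
```

The resulting string would then be sent to whichever LLM the user has chosen.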
You trust your business’s own data; you don’t need someone else to validate that data for your internal processes. If you use Nexus as a solution on top of your own data to get more insights, you don’t necessarily need blockchain technology. Nexus is a product that stands next to Unix, our Layer-1 blockchain. They can operate independently, but they can enhance each other.
When discussing data that will be shared outside your organization, you should start asking questions. Where did the data come from? Who generated it, who timestamped it? Not all data is free to use. Look at how OpenAI came to life, took the world by storm, but also did some damage. The New York Times, for example, sued OpenAI for scraping millions of articles without paying a dime.
Businesses sit on a lot of untapped data because they’re scared of who’s going to use it, and if someone does use it, they at least want some payment. That is when blockchain matters: as a revenue distribution and access control mechanism. Indirectly, in those kinds of use cases, it contributes to battling LLM hallucinations, but realistically it doesn’t always have to be the solution.
A: Blockchain is, first and foremost, insurance. It doesn’t necessarily make things easier – it makes things trustless. There are specific cases where you would say, I’m a business inside the EU. I sit on automotive data, and I only want to sell it to companies outside the automotive industry, inside the EU. I don’t want Russia to have access to this data, for example.
You could run KYC checks on every business that attempts to buy your data, which is going to be very expensive and time-consuming, or you could leverage existing decentralized ID protocols and on-chain zero-knowledge proofs. You don’t necessarily know exactly where the buyer is from, but you can guarantee that it’s from inside the EU and not from your industry.
Consider a common example: a car insurance provider allows you to install a device in your car that collects data on your driving habits, including speed. This data is then used to determine if you need to pay an additional premium on top of your monthly insurance payment. But what if we have a mechanism where the insurance provider could potentially use this data outside of the organization? It could be used for infrastructure optimization, and then a government or commercial company can buy access to that data. It’s not fair if only the insurance provider is paid for that.
Suppose instead that all the contributors push their data into a community-owned or collectively owned data set and, each time that data is purchased, whitelisted, or accessed, most of the revenue is distributed to the drivers who actually generated the data. Blockchain is a fair and logical technology to facilitate that.
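The revenue split described here can be sketched in a few lines. This is only an illustration under stated assumptions: the 80% contributor share, the driver names, and the rule that rounding remainders go to the operator are all hypothetical, and a real system would run this logic in a smart contract rather than in Python.

```python
from fractions import Fraction


def distribute_revenue(
    payment_cents: int,
    contributions: dict[str, int],
    contributor_share: Fraction = Fraction(4, 5),  # assumed 80% to contributors
) -> tuple[dict[str, int], int]:
    """Split a payment pro rata by number of contributed records.

    Returns (payout per contributor in cents, amount kept by the operator).
    """
    total_records = sum(contributions.values())
    if total_records <= 0:
        raise ValueError("no contributions to pay out against")
    pool = int(payment_cents * contributor_share)  # rounded down to whole cents
    payouts = {
        driver: pool * records // total_records
        for driver, records in contributions.items()
    }
    # Whatever is not paid out (operator share plus rounding) stays with the operator.
    operator_cut = payment_cents - sum(payouts.values())
    return payouts, operator_cut


payouts, operator_cut = distribute_revenue(
    10_000, {"alice": 600, "bob": 300, "carol": 100}
)
print(payouts, operator_cut)
```

Integer cents and floor division keep the arithmetic exact, so the payouts and the operator's cut always add up to the original payment.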
When it comes to the verifiability of data, a common use case is a truck that transports frozen or chilled goods and has temperature sensors. It logs or transmits the temperature every few minutes. If there is a dispute because the food arrives spoiled, you need to see the temperature data.
If we have an on-chain system where this sensor data is logged – so that you can see, not the data itself but a hash of the data – you can see the data hasn’t been tampered with, and you can use it for insurance purposes or dispute resolution.
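A minimal sketch of that idea follows, with a plain Python list standing in for the on-chain record. The field names and readings are made up, and chaining each hash to the previous one is an added design choice for this example (it makes removed or reordered readings detectable), not something the interview describes.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first reading


def reading_hash(reading: dict, prev_hash: str) -> str:
    """Hash one reading together with the hash of the reading before it."""
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":")) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def anchor_log(readings: list[dict]) -> list[str]:
    """Compute the hashes that would be published; the raw readings stay private."""
    hashes = []
    prev = GENESIS
    for reading in readings:
        prev = reading_hash(reading, prev)
        hashes.append(prev)
    return hashes


def verify_log(readings: list[dict], anchored: list[str]) -> bool:
    """Recompute the hashes from the raw readings and compare with the anchored ones."""
    return anchor_log(readings) == anchored


readings = [
    {"minute": 0, "temp_c": -18.2},
    {"minute": 5, "temp_c": -18.0},
    {"minute": 10, "temp_c": -17.9},
]
anchored = anchor_log(readings)

# Someone later edits a reading to hide a temperature spike.
tampered = [dict(r) for r in readings]
tampered[1]["temp_c"] = -18.1

print(verify_log(readings, anchored))
print(verify_log(tampered, anchored))
```

In a dispute, either party can rerun the verification against the published hashes without the raw temperature data ever having been made public.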
A: Businesses know when their data is trustworthy or not, and when it comes from outside businesses, you either trust the business or not. If someone makes a variation of the data, there should be a link that connects it to the original dataset. If you can dig far enough to the source of the original data, you will see if somewhere down the line it became data that results in fake news.
That’s where blockchain contributes. You don’t have to store the data itself on-chain, just a hash of the data. If you can find the original data, you can hash it with the same algorithm and compare the result with the hash stored on-chain. If the two match, you know you’re dealing with the same data. This way, you have privacy-preserving information on-chain and can still determine whether the data is real.
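The link between a derived dataset and its original can be illustrated with a small registry keyed by content hash. This is a hedged sketch: the in-memory `registry` dictionary stands in for an on-chain record, and the `register` and `lineage` helpers and the sample data are hypothetical names invented for this example.

```python
import hashlib
from typing import Optional


def content_hash(data: bytes) -> str:
    """SHA-256 digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the on-chain registry: dataset hash -> parent hash (None for an original).
registry: dict[str, Optional[str]] = {}


def register(data: bytes, parent: Optional[bytes] = None) -> str:
    """Record a dataset's hash and, for a variation, the hash of its parent."""
    digest = content_hash(data)
    registry[digest] = content_hash(parent) if parent is not None else None
    return digest


def lineage(data: bytes) -> list[str]:
    """Follow parent links from a dataset back to its original source.

    Raises KeyError if the dataset (or one of its ancestors) was never registered.
    """
    chain: list[str] = []
    digest: Optional[str] = content_hash(data)
    while digest is not None:
        if digest in chain:
            raise ValueError("cycle in provenance links")
        chain.append(digest)
        digest = registry[digest]
    return chain


original = b"temperature,timestamp\n-18.2,2025-01-01T00:00Z\n"
derived = b"avg_temperature\n-18.2\n"
register(original)
register(derived, parent=original)
print(lineage(derived))
```

Only hashes are stored, so the registry reveals nothing about the data itself, yet anyone holding a copy can check whether it traces back to a registered source.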
Powering AI Through Decentralized Compute
A: They have distributed compute; they scale across tens of thousands of GPUs and other hardware. In our case, they also host a couple of open LLMs like DeepSeek, Qwen, and now OpenAI’s new gpt-oss on distributed infrastructure.
You can use LLMs that are not necessarily hosted by a closed enterprise; that is important, especially for Web3. The added benefit is that users can access those models for free up to a daily limit, which resets every day.
Because Nexus is completely model-agnostic, users bring their own LLM through an API key. Io.net helps lower the barrier for potential customers who want to play around and see how to use Nexus without spending money right away, because they can get an API key for free. They can select a model of their choice that’s open source, decentralized, and hosted, and get started right away.
A: There have been massive steps taken in the last year when it comes to decentralized AI, and where we are today is already good. What io.net does with decentralized, hosted models is cool. The fact that OpenAI brought out the new open model is a good step for small and medium businesses.
For some use cases, decentralization is not necessarily an added value – it could actually work against the business. You need massive amounts of data for training LLMs, and on a properly decentralized network, all that data would need to be replicated. If there are terabytes of data, then every node needs to hold a copy of those terabytes.
When it comes to Nexus and the data that’s being onboarded, we don’t store that data; we transform it. The data stays on-premises. We only receive and manage access to it, so the connection is our resource. And that means the data never comes on-chain. There are not many use cases where putting the data itself on-chain makes sense. And where it does make sense, we use Filecoin, for example.
A: Yes, this is the problem of the industry sometimes, thinking we need to decentralize everything we find.
We build solutions, and then we look for a problem. Instead, we need to see decentralization as an insurance policy that allows us to say: it was like this at some point in time, and it cannot be changed, because the smart contract recorded it that way.
This is where blockchain matters – replication of data is not necessarily a good use case for blockchain, and if it is, it’s already solved by Filecoin.
Future Protocol Applications
A: MCP is simply genius. A model context protocol is necessary for AI to start doing things. An LLM only generates text, so on its own it cannot really do anything except guess the next word and show it.
MCP is simply a way for an LLM to know what kind of structured text to generate. That structured text can be intercepted from the LLM’s answer and used to execute functionality, whose output is then fed back and worked into the answer. That functionality can book a hotel room, start a car, turn off lights, and so on.
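The intercept-and-execute pattern described here can be sketched as follows. This is a simplified illustration of tool calling in general, not the MCP wire format: the `{"tool": ..., "arguments": ...}` shape, the `turn_off_lights` function, and the `TOOLS` table are all hypothetical, and the model output is hard-coded rather than generated.

```python
import json


def turn_off_lights(room: str) -> str:
    """Stand-in for real functionality; a real tool would talk to a device."""
    return f"Lights in {room} are off."


# Tools the application is willing to execute on the model's behalf.
TOOLS = {"turn_off_lights": turn_off_lights}


def handle_model_output(text: str) -> str:
    """Run the requested tool if the output is a known tool call, else pass text through."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # ordinary prose, nothing to execute
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return text  # valid JSON, but not a call to a registered tool
    return TOOLS[call["tool"]](**call.get("arguments", {}))


# Pretend the model produced this structured text instead of a prose answer.
model_output = '{"tool": "turn_off_lights", "arguments": {"room": "kitchen"}}'
print(handle_model_output(model_output))
print(handle_model_output("The lights are controlled from the hallway panel."))
```

The model never executes anything itself; the surrounding application decides which structured requests to honor and returns the result.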
There’s a first-mover advantage for those who implement MCP fully.
We saw that opportunity, so we immediately went all-in on MCP. If I go to Nexus and configure my agent and its capabilities, I have thousands of tools I can connect with.
That’s also why it’s not necessarily a first-mover advantage, because MCP allows us to work together in a unified way, so that we all can become stronger and become valuable. It’s a bit of both. Eventually, we are all going to connect and work together.
A: We are working on our own model that’s optimized for SQL generation. Although it can only answer in SQL, its usefulness will increase because the quality and complexity of SQL output will significantly improve, enabling it to handle more complicated tasks and generate more valuable insights.
We are also going to implement a feature where you can create agentic teams. You can have an agent that can write code and an agent that can access your data, and they can work together under an orchestrator that manages the work and lets you know if it needs a human to intervene, provide additional information, or give feedback before it continues.
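Since the feature is not yet released, the following is only a guess at the general shape of such an orchestrator, not a description of Nexus. The agents are plain functions with canned behavior, and every name here (`StepResult`, `data_agent`, `code_agent`, `orchestrate`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    output: str
    needs_human: bool = False


Agent = Callable[[str], StepResult]


def data_agent(task: str) -> StepResult:
    """Stand-in for an agent with data access; asks for help on unknown tables."""
    if "unknown_table" in task:
        return StepResult("", needs_human=True)
    return StepResult("rows: 3")


def code_agent(task: str) -> StepResult:
    """Stand-in for an agent that writes code from the previous agent's output."""
    return StepResult(f"script using [{task}]")


def orchestrate(task: str, agents: list[Agent]) -> StepResult:
    """Run agents in sequence, pausing as soon as one needs human input."""
    current = task
    for agent in agents:
        result = agent(current)
        if result.needs_human:
            message = f"Paused at {agent.__name__}: human input required"
            return StepResult(message, needs_human=True)
        current = result.output  # each agent works on the previous agent's output
    return StepResult(current)


print(orchestrate("count cells per sample", [data_agent, code_agent]))
print(orchestrate("query unknown_table", [data_agent, code_agent]))
```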
Some platforms focus on marketing-driven lead generation and content creation, but we are examining this for scientific purposes, such as cell analysis.
I want to contribute to those kinds of use cases with Nexus, where data is an important factor, because it all starts with the data from the cell analysis, and then you have all these different steps that are run by autonomous agents.
It will have a far more positive impact on the world than another content-generation AI agent system.
The Bottom Line
As AI systems move deeper into enterprise and mission-critical use cases, reliability is no longer a nice-to-have; it’s a necessity. By tying AI outputs to verifiable data sources and using blockchain to secure provenance, developers are working to make models more trustworthy and explainable.
Emerging protocols like MCP and the rise of decentralized compute suggest that the next phase of AI will be shaped by infrastructure designed to reduce hallucinations and improve accountability.