Unlocking AI with Subsquid

Subsquid Editor

May 3, 2024 • 4 min read

TL;DR

AI requires high-quality data
Subsquid offers easy access to on-chain data at scale
Using blockchain data, one can build trading bots that aim for long-term value growth, even during downturns

If you look around Crypto Twitter in 2024, you’ll notice that every project has suddenly become an AI project. From those focused on providing GPUs for training AIs to those that simply added the term AI to their website, the spectrum is broad, and the path from serious AI project to memecoin is short. All it takes is one thread by an (in)famous thought leader.

Needless to say, people have started asking us if Subsquid is an AI project. The hurdle to identifying as such is low; just as you can identify as a season these days, you can be an AI company.

Yet, Subsquid provides something much more valuable than an interface to GPT-4 that answers questions about our developer docs.

What is often forgotten amid the shinier aspects of AI, such as generating images or having it discuss ethical dilemmas, is that without data, AI is nothing.

While some are still busy discussing how AI will take all our jobs, others have noticed a problem with some AI models: they produce bad output. Sometimes, the output is just incorrect and attributed to hallucinations. Other times, it’s outright prejudiced or woke.

The AI model isn’t the one at fault. It’s the training data. Data plays a foundational role in AI. The only way to get good outputs is to feed the model quality data to train on. Otherwise, it won’t be very effective.

It’s fair to say that AI is only as intelligent as the data it’s been given allows it to be. If you really want to mess with AI, all you have to do is generate a lot of terrible input to poison its data sets.

Often, vast amounts of input are needed for complex models such as ChatGPT. GPT-3 was trained on data sets containing 570 GB of text from books, websites, and articles to provide it with contextual data and speech patterns.

More often than not, the authors of the text weren’t informed. However, the ethics of using people’s data to train an AI model without their consent warrant a whole separate discussion.

The good news for researchers and developers looking to train models based on blockchain data is that all of that data is publicly available anyway. What’s more, since blockchains go through a process of decentralized consensus, the data has high integrity and is hard to corrupt.

Despite blockchain data in theory being public, it remains hard to extract if you go straight to the source: the blockchain nodes. That’s why many developers so far have relied on centralized providers that offered the information via API.

Subsquid is a decentralized alternative to such providers, making Web3 data of more than 150 EVM, SVM, and Substrate networks available through its decentralized data lake and query engine. The easiest way for AI developers and researchers to get started is by using the Subsquid SDK, an open-source, permissionless toolkit facilitating the creation of Squids, the name we use for indexers.

Since all blockchain data is stored in raw format in the Subsquid data lake, users are free to specify which pieces of the data they want to extract and in which format, making it a perfect match for integration with ML/AI pipelines.
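To make the idea of picking fields and an output format concrete, here is a minimal Python sketch. It does not use the Subsquid SDK or any of its APIs. The record layout, the field names, and the helpers `select_fields` and `to_csv` are all hypothetical and only illustrate the kind of shaping step that could sit between raw on-chain records and an ML pipeline.

```python
import csv
import io

# Hypothetical raw transfer records; the keys and values are made up for
# illustration and do not reflect any real schema.
RAW_RECORDS = [
    {"block": 1, "from": "0xaaa", "to": "0xbbb", "value": 5, "gas_used": 21000},
    {"block": 2, "from": "0xbbb", "to": "0xccc", "value": 7, "gas_used": 21000},
]


def select_fields(records, fields):
    """Keep only the requested keys from each raw record, in the given order."""
    return [{name: record.get(name) for name in fields} for record in records]


def to_csv(rows, fields):
    """Serialize the selected rows as CSV text, a common input format for ML tooling."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    wanted = ["block", "value"]
    print(to_csv(select_fields(RAW_RECORDS, wanted), wanted))
```

Swapping `to_csv` for another serializer would change the output format without touching the selection step, which is the separation the paragraph above describes.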

By solving the most challenging part of accessing multichain Web3 data, Subsquid powers MEV and AI researchers who can use our SDK to extract high-quality, curated data. As such, we’re contributing to the next generation of AI models, trained on blockchain data.

Here is an idea of what that might look like:

Take the saying “Sell in May and stay away”, often thrown around at this time of the year. Using an AI model, researchers could analyze all historical data of crypto market movements, taking into account things like halvings and Fed rate cuts, to then come up with a trading bot that’d still make money - even if, or especially when, everyone else is losing it.
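Before any model gets involved, a first sanity check on such a saying could be a simple seasonal comparison. The Python sketch below is an illustration under stated assumptions, not a trading strategy. The prices are invented, the May to October window is one common reading of the saying, and the helpers `monthly_returns` and `seasonal_means` are hypothetical names.

```python
from statistics import mean

# One common reading of the "stay away" window: May through October.
SUMMER_MONTHS = {5, 6, 7, 8, 9, 10}

# Hypothetical month-end closing prices as (year, month, close).
# These numbers are made up for illustration; they are not market data.
MONTHLY_CLOSES = [
    (2023, 4, 100.0), (2023, 5, 96.0), (2023, 6, 99.0), (2023, 7, 97.0),
    (2023, 8, 92.0), (2023, 9, 94.0), (2023, 10, 101.0), (2023, 11, 108.0),
    (2023, 12, 115.0), (2024, 1, 112.0), (2024, 2, 125.0), (2024, 3, 131.0),
    (2024, 4, 127.0),
]


def monthly_returns(closes):
    """Turn [(year, month, close), ...] into [(month, simple_return), ...].

    Each return is attributed to the month in which it was earned, i.e. the
    month of the later of the two closes being compared.
    """
    ordered = sorted(closes)  # sorts by year, then month
    returns = []
    for (_, _, prev), (_, month, curr) in zip(ordered, ordered[1:]):
        returns.append((month, curr / prev - 1.0))
    return returns


def seasonal_means(closes):
    """Return (mean summer return, mean winter return); None if a bucket is empty."""
    rets = monthly_returns(closes)
    summer = [r for m, r in rets if m in SUMMER_MONTHS]
    winter = [r for m, r in rets if m not in SUMMER_MONTHS]
    return (mean(summer) if summer else None, mean(winter) if winter else None)


if __name__ == "__main__":
    summer_avg, winter_avg = seasonal_means(MONTHLY_CLOSES)
    print(f"mean May-Oct return: {summer_avg:.4f}")
    print(f"mean Nov-Apr return: {winter_avg:.4f}")
```

A real study would need many years of data and would have to control for events such as halvings and rate decisions, which is where richer features and learned models come in.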

Another way blockchain data could benefit Web3 overall is by training models on it to recognize threats early on (bad news for Lazarus). Already, some auditing companies are experimenting with augmenting their security team’s skills with AI.

While crypto is still niche, our open-source, community-driven approach, where anyone can deploy an indexer customized to their use case, makes Subsquid an attractive solution for those looking to gain insights from and train models on blockchain data.

What’s more, even if the AI hype dies down eventually, Subsquid’s data will remain in demand as it powers any dApp that doesn't want to show users an empty interface upon connecting.

After all, data can exist without AI. But AI can’t exist without data.