Training and running an artificial intelligence (AI) system requires large volumes of high-performance, low-latency storage. In recent years, companies have shifted from on-site servers to cloud-based storage, but as the bills for hosting large amounts of data have grown, some have brought their data back on-premise.
In between, many companies are moving to hybrid systems, which run locally and then ‘burst’ to the cloud as needed.
Techopedia spoke to Daniel Valdivia, an engineer at AI-ready object storage provider MinIO, about the hybrid cloud approach.
About Daniel Valdivia
Daniel Valdivia has more than 18 years of engineering experience. He began his career as a developer for Mexvax and as a project manager and developer for Expert Creations. After that, he spent about three years at Oracle as a mobile application developer and a programmer analyst.
He then spent almost three years at ServiceNow as a senior applications developer within Financial Management.
In 2019, he joined MinIO as an engineer, and he is on the front line as enterprises work out their cloud approach in a world of AI.
Q: First off, what is the concept of the ‘hybrid burst’?
A: Companies are noticing that most of the value of their operations sits in their data, and the one thing about data is that it keeps growing at a predictable rate.
At some point, being in the cloud with your data is no longer cost-effective — data keeps growing, and the cloud becomes prohibitively expensive.
If you’re paying bills for hosting many petabytes of data on the cloud, at some point it’s simply better to keep it on-premise.
So that’s where we’re moving to, the point of burst compute, because now companies are repatriating data from the cloud to on-premise and building these massive deployments. They’re bringing many petabytes back on-premise and keeping the data securely within their firewalls.
Now, with these new trends of machine learning and AI, which are very compute-intensive, what’s enabling the modern wave of AI is that we finally have enough compute to process massive amounts of data.
But not everyone can buy GPUs now. They’re a commodity in short supply, so even if you need them right now, it will take some time for you to procure them.
So, it might be better to say: “I’m going to keep the data — my private data, my most important asset — on my infrastructure, but when I run my AI pipelines or my machine learning pipelines, I’m going to the cloud to spin up my instances with GPUs”.
Instances are expensive, so you’ll spin them up and then start training on the data that you’re hosting on your own private network and then spin them down again.
So that’s where the bursting into the cloud concept comes into place.
You Don’t Leave GPUs Needlessly Spinning
Q: How does this hybrid model address the challenges of balancing and optimizing the use of the cloud or the GPU?
A: It’s important because cloud providers are now trying to make it more attractive to do compute on the cloud. They have these tensor processing units or other specialized training systems.
These are expensive, so you don’t want to spin them up and have them sit there idle, waiting for you to decide what to do with them. Sometimes, you may want to reserve them, but that may be a different story.
If you architect your machine learning pipelines or your AI pipelines to always work straight out of object storage, all the previous stages that come before intensive compute can be performed there. You can pre-process all the data.
Companies are now experts at big data, so now they can pre-process their data, put it back in object storage, and have it ready for the machine learning algorithms to come and take it in a ‘streaming fashion’.
If you’re training on 10,000 GPUs, it’s better that all 10,000 GPUs stream the data straight from your object store as and when it is needed.
Then the next batch is getting loaded into memory, and then it’s getting trained on, and it just keeps things more efficient.
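The streaming pattern described here can be sketched in a few lines of Python. This is a minimal illustration, not MinIO's implementation: `stream_batches`, the `fetch` callable, and the in-memory `fake_store` are hypothetical names introduced for the example. In a real pipeline, `fetch` would wrap a GET request against an S3-compatible object store.

```python
import queue
import threading
from typing import Callable, Iterator, List


def stream_batches(
    keys: List[str], fetch: Callable[[str], bytes], depth: int = 2
) -> Iterator[bytes]:
    """Yield objects in key order while a background thread fetches ahead."""
    buffer = queue.Queue(maxsize=depth)  # bounded, so memory use stays small
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        try:
            for key in keys:
                buffer.put(fetch(key))  # blocks while the buffer is full
            buffer.put(done)
        except Exception as exc:  # hand any fetch failure to the consumer
            buffer.put(exc)

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


# Hypothetical stand-in for an object store bucket: key -> bytes.
fake_store = {f"batch-{i:03d}": bytes([i]) * 4 for i in range(5)}

# While the "training step" handles one batch, the next ones are already loading.
batches = list(stream_batches(sorted(fake_store), fake_store.__getitem__))
```

The bounded queue is the design choice that matters: the trainer never waits on storage as long as the producer stays ahead, and only `depth` batches sit in memory at any time.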
When you’re done training, you tear everything down.
So the idea of loading the data onto a local file system, where only one experiment or one pipeline can run at a time, is not how the cloud is meant to work.
You have to architect and build things so you can stream straight from object storage, and then build in checkpoints so that as training progresses or results are generated, the models can put them back in object storage.
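A rough sketch of the checkpointing idea follows, under stated assumptions: `InMemoryStore` is a hypothetical stand-in for an object store bucket, the key layout is invented for the example, and the "training step" is a placeholder that only shrinks a loss value. The point it shows is that a burst instance can be torn down and a fresh one can resume from the latest checkpoint object.

```python
import json
from typing import Dict, List, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for an object store bucket (key -> bytes)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def save_checkpoint(store: InMemoryStore, run_id: str, step: int, state: dict) -> str:
    # Zero-padded step numbers make lexical key order match training order.
    key = f"{run_id}/checkpoints/step-{step:08d}.json"
    store.put(key, json.dumps({"step": step, "state": state}).encode())
    return key


def load_latest_checkpoint(store: InMemoryStore, run_id: str) -> Optional[Tuple[int, dict]]:
    keys = store.list(f"{run_id}/checkpoints/")
    if not keys:
        return None
    record = json.loads(store.get(keys[-1]).decode())
    return record["step"], record["state"]


def train(store: InMemoryStore, run_id: str, total_steps: int, checkpoint_every: int) -> Tuple[int, dict]:
    # Resume from the newest checkpoint if one exists, otherwise start fresh.
    resumed = load_latest_checkpoint(store, run_id)
    step, state = resumed if resumed else (0, {"loss": 1.0})
    while step < total_steps:
        step += 1
        state = {"loss": state["loss"] * 0.9}  # placeholder for a real training step
        if step % checkpoint_every == 0:
            save_checkpoint(store, run_id, step, state)
    return step, state


store = InMemoryStore()
train(store, "run-a", total_steps=5, checkpoint_every=2)  # first burst, then torn down
resumed_step, _ = load_latest_checkpoint(store, "run-a")  # a new instance picks up here
final_step, final_state = train(store, "run-a", total_steps=10, checkpoint_every=2)
```

Because the only durable state lives in the store, the compute side stays disposable, which is what makes spinning instances up and down practical.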
And then the last part of the pipeline is when you start to do inference, which can also be served straight from object storage. You have different types of machines that will be deploying those models. They just need to load from object storage and start serving.
When to Be On-Site and When to Be on the Cloud
Q: What should companies consider when deciding which parts of the workload should be on-premise and which should go to the cloud?
A: The main consideration is data — data is the biggest asset, so they should protect it and pick a resilient solution that can tolerate failure and is not prone to data loss. And it needs to be cost-efficient.
When it comes to compute, we see these trends that every three years the compute is just going to get much better. You’re going to spend a lot of money on these expensive compute boxes, and three years later, there will be a newer version that will be even better.
So, that’s where you can mix and match and see which parts of the cloud make you more agile.
The nice thing about cloud is that you can say: “I have a burst of traffic and I need more compute”, so the cloud providers will provide it, and when the traffic’s gone you take it away.
So you don’t need to build your entire infrastructure, which tends to become very expensive. And as you go from a medium to a large company, it becomes prohibitively expensive.
So, I would suggest architecting your pipelines or data infrastructure to be hosted by the company itself and to have some degree of compute. You can get pretty far with Spark, Presto, and traditional big data analytics tools being run on-premise, but you need specialized compute hardware when it comes to AI algorithms.
Q: How can companies maintain balance and avoid cloud egress costs?
A: No cloud provider will charge you for bringing data into the cloud, and that’s exactly what you’re doing with your training; you’re bringing data out from the data center and bringing data into the cloud.
We see startups and companies setting up on the edge of the cloud. When you’re training deep neural networks, you need to go over the same datasets over and over, so they set up MinIO on the edge of the cloud and send the data.
Once they load the data, MinIO is hosting it on the cloud next to the compute, which reduces the latency, and when they’re done training, they just tear it down.
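The benefit of staging data next to the compute can be illustrated with a small read-through cache. This is a sketch under assumptions, not a description of how MinIO is deployed: `ReadThroughCache` and the `on_prem` dictionary are hypothetical, and the counter simply records how often a read has to cross from the data center into the cloud.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Keeps a cloud-side copy of each object after its first remote read."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, bytes] = {}
        self.remote_reads = 0  # reads that had to leave the cloud-side copy

    def get(self, key: str) -> bytes:
        if key not in self._local:
            self._local[key] = self._fetch_remote(key)
            self.remote_reads += 1
        return self._local[key]

    def tear_down(self) -> None:
        # Drop the cloud-side copy once training is finished.
        self._local.clear()


# Hypothetical on-premise dataset: four shards of placeholder bytes.
on_prem = {f"shard-{i}": b"x" * 8 for i in range(4)}
cache = ReadThroughCache(on_prem.__getitem__)

epochs = 3
for _ in range(epochs):
    for key in sorted(on_prem):
        cache.get(key)  # only the first epoch reaches back to the data center

reads_without_cache = epochs * len(on_prem)
```

With three passes over four shards, the cache goes back to the data center four times instead of twelve; every later epoch is served from the copy sitting beside the GPUs.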
Q: What are some of the other applications where this hybrid approach can have cost benefits?
A: With big data, if you don’t want to run the analytics workload yourself, or you have workloads that are just too large or too sporadic, every cloud provider has some sort of stack that you can spawn. Amazon, for example, has Amazon Elastic MapReduce: you can spawn compute nodes, and they come preconfigured with all the tools that you need.
You run your pipelines, and when you’re done analyzing, you tear down the setup.
This approach works well for anything that needs to interact with massive amounts of data — for machine learning pipelines, traditional big data analytics, and modern AI pipelines that are now working with massive amounts of data.
It’s amazing to see car companies managing massive infrastructure (180 servers, a multi-petabyte setup) and it’s just one guy. They used to have a whole storage team, and now it’s just one guy who said: “This is simple enough to take care of”.
Edge, Cloud and Infrastructure Trends for 2024
Q: What are the trends you see for 2024?
A: Last year, with the scare of recession, we saw a lot of companies looking to cut down costs and repatriate a lot of data back to on-premise.
When it comes to trends, companies are now realizing that with these brand new large language models (LLMs), all this generative AI that’s coming, it’s just getting started. They need massive amounts of data to make it work for their businesses.
From 2020 to 2022, we saw people going hybrid cloud, but now that people realize they can have the same storage layer across cloud providers and on their own infrastructure, they are starting to repatriate data.
For example, we saw a large streaming service coming out of Amazon into their own infrastructure. They realized they could set up a large MinIO deployment on the fastest NVMe drives, keep the most up-to-date programming over there, and just serve it to millions of people.
But then, not every program is needed right away, so they built a separate large MinIO cluster with spinning drives to host the programs that people will eventually want to see, and that works great for them. And if they can do it, anyone else can do it.
The cloud was magical to some extent over the last decade because it did things that not everyone knew or had the time to learn.
Now, storage is such an essential feature of every application. Every application out there has storage.
An analogy that I love to make is that every company is building a car, but every car needs a road, and storage is that road.
Now, people are realizing we can have this nice software that can lay out the road for us, and we are seeing this trend where they are taking the initiative to start repatriating data.