Perhaps you’ve heard of autonomic
computing. It refers to a computer or system’s ability to self-organize and
self-manage. And, until recently, it was still a bit of a futuristic pipe
dream. We wanted to learn a bit more about how an
autonomic system works, so we talked to Ben Nye, the CEO of Turbonomic and the
managing director of Bain Capital Ventures. Turbonomic (formerly VMTurbo)
recently underwent a re-brand in order to more accurately depict what their
software does. The new name incorporates Turbonomic’s core themes in its application
management platform: Turbo (real-time performance), autonomic control
(self-organizing and managing workloads) and economic principles (supply and
demand). Here Ben talks about autonomic systems and the importance of
automation in increasingly complex, data-driven environments.
Techopedia: You’ve appeared numerous times on the Forbes Midas List for top venture capitalists (VCs). As a VC, you have an interesting vantage point to see the whole technology landscape with how much the world has changed over the years. What surprises you as you look back over how much things have changed in the data center?
Ben Nye: The short answer is I think the pace of change in the data center has truly accelerated beyond anything folks saw. What happened was this development of the software defined data center and fundamentally the abstraction away from hardware. That opened up a whole growth drive within the elements of software.
So now, instead of dealing with the refresh cycles of hardware vendors (who for a long time almost served as a gatekeeper to the data center), it is now literally open to how fast you can create ideas – because software, really, is ideas. Without the constraints on idea generation, it’s been a very exciting and fun time, but the pace of change in the data center and even the definition of the data center has evolved materially and more rapidly than ever before.
Something I find very interesting about that is when we went to a software defined data center, all of the controllers and APIs and knobs of the hardware world were redefined in software. What we did [at Turbonomic] was think about this in terms of a new way to drive performance and productivity, which would be to take the application and the change in demand on this application and tie them to the redefined controllers in software because, ultimately, it’s software to software.
When you do that you can now remove the human middleware from between the application layer and the infrastructure layer because now, for the first time, you can tie them directly together – here’s an important word – autonomically, meaning literally allowing the applications to be self-managing and self-organizing.
It also makes it economic in the sense that now demand is finding supply and we’re focused on a consumption model of IT, an economic model instead of an allocation-based model or supply-based model. That is a pretty fundamental twist in the story of how IT or a tech industry management model should run. And it resulted in better performance and more efficiency in terms of cost. It also makes customers much more agile and resilient, and makes way better use of labor in the marketplace.
Here’s what is so ironic about what’s happened in 2016 with every one of the software-defined data centers. First, you are monitoring your hardware to find out when the applications break, meaning they violated a quality of service or an SLA, but while we’re using software to find the error, we then go back to the hardware for machine-generated alerts. The second clue is we’re allowing applications that run the business to break, and then the third one is we take those repeating machine-generated alerts and we hand those alerts to people.
This has got to be backwards.
And so that’s where we wanted to change the IT management model away from allocations or guessing and back to a demand-based, consumption-based model.
Techopedia: Now that you mention it, yeah, we’re making software-defined anything, but then the alerts are just being sent to the slow part of the process, which is, as you said, the human middleware.
You mentioned the term autonomic. Can you maybe talk a little bit more about the importance of autonomic systems in IT? Given the name change from VMTurbo to Turbonomic, I’m guessing it’s more important than most people realize.
Ben Nye: Absolutely. First and foremost, the definition of autonomic, when it’s applied to computing, is around systems that can self-manage, self-organize.
So think of Bayesian networks, think of search algorithms, think of big data, which people are now calling “deep learning.” Those are forms of artificial intelligence. What I think is most interesting about Turbonomic is it’s the ultimate form of artificial intelligence because the application workloads are making decisions autonomically in software as to which infrastructure elements they should run on and when they should move themselves, size themselves, start and stop themselves, clone themselves. That is really, really interesting – and we do that by leveraging the abstraction and liquidity afforded by either virtualization, or containers, or clouds.
Then, having a similar abstraction of all the different forms of demands – so you can have VMs, you have containers, you could have JVMs – we’re looking at all these forms of demand and all these forms of supply and they’re abstracted. So, let’s let the demand then pick or match itself to the supply. And then if they’re on one physical host and it starts to congest, rather than let it fail and generate an alert and have the application, you know, blow up, why not just simply allow it to make a decision to move itself? As long as you’re pricing in your decision – the cost of the move and the cost of the move back – then you can actually make far more interesting resource allocation decisions.
Techopedia: I love the supply and demand analogy. In economic theory, the sources of supply are fixed in the short run and can only change over a long period of time. In what you’re describing – if you keep
that economic analogy – you’re changing the entire paradigm. That is, you can
change supply in the short run, right? You’ve got complete flexibility to
actually be more efficient and, thinking about the resource utilization as a
market, have a nearly efficient market in real time?
Ben Nye: You’re exactly right. It’s an economic
model that becomes the principle around which demand finds supply, so that IT
is managed using economic principles. And as John Maynard Keynes said, “In the
long run we’re all dead.”
Techopedia: I don’t think you’d meet any CIO right now who hasn’t already moved or isn’t seriously contemplating a move to put more resources in the cloud. Where do you see the industry going in the coming years?
Ben Nye: I think you’re going to see a number of changes. It’s pretty clear to us that it’s not going to be an entire replatforming of technology. Just like the mainframe is still here, I don’t think you’ll ever see a 100% replatform. More than likely you’ll see a hybrid world. You’ll have private and public; however, I think public would really be public multi-cloud, not public single cloud. In looking at the largest players here, there are only a handful. But when you go to Europe or the rest of the world, you see many carriers that are all clouds as well, and so I don’t think that’s a big leap, right? The real question, though, is how do customers source the right clouds to run their workloads? Our theory behind the company is that any workload should be able to run on any infrastructure, anywhere. Meaning on-prem or off and at any time because, remember, time is a surrogate for demand.
So, when demand changes, you may want to burst to the cloud. Or if you’re going to move those workloads to the cloud permanently, what workloads are you going to pull back in? Because now you have capacity in your data center. Why pay twice? And so one of the things that we do together today with Verizon Intelligent Cloud Control but also with other environments is allow customers to base their decision about where to run those workloads, not just on price because price can lock you in, but also more importantly on application performance. Then you can have other considerations such as price, or compliance, or data sovereignty, or security, and other resources that are just fundamentally tradeable resources in this marketplace we’re describing.
Techopedia: That’s the economic model?
Ben Nye: Yeah. So it’s all back to the economic model. Just think about how logical this is. It’s not just an analogy, by the way, it’s actually the way the model works. The workloads have budget and the workloads look at queuing theory and congestion, and so it’s far more expanded. It’s not a linear price increase when it begins to congest; it goes up exponentially, forcing the budget to be impacted and therefore the workload to make a decision to move.
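The pricing idea described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not Turbonomic's actual model: the price curve `base_price / (1 - utilization)^2` is only one way to make price climb steeply as a host congests (loosely echoing how queueing delay grows near full utilization), and the names `price`, `choose_host`, `budget` and `move_cost` are hypothetical.

```python
def price(base_price: float, utilization: float) -> float:
    """Hypothetical congestion price: climbs steeply as utilization nears 1."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return base_price / (1.0 - utilization) ** 2


def choose_host(current: str, hosts: dict, budget: float, move_cost: float) -> str:
    """Return the name of the host the workload should run on.

    hosts maps host name -> utilization (0.0 to just under 1.0).
    The workload stays put while the current price fits its budget.
    Otherwise it moves to the cheapest host, but only if that price
    plus the one-off cost of moving is lower than the price of staying.
    """
    prices = {name: price(1.0, u) for name, u in hosts.items()}
    if prices[current] <= budget:
        return current
    cheapest = min(prices, key=prices.get)
    if prices[cheapest] + move_cost < prices[current]:
        return cheapest
    return current


# Illustrative numbers only: host-a is congested, host-b is half full.
hosts = {"host-a": 0.90, "host-b": 0.50}
print(price(1.0, 0.50))  # 4.0
print(round(price(1.0, 0.90), 2))  # about 100: far more than a linear increase
print(choose_host("host-a", hosts, budget=20.0, move_cost=5.0))  # host-b
```

Including `move_cost` in the comparison keeps a workload from bouncing between hosts whose prices are nearly equal.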
As long as you’ve abstracted away all the complexities in the data center you can now trade IOPS of an XtremIO box, a Pure Storage box, and a Compellent box, and a 3Par box because they all have different IOPS characteristics but the application can therefore buy those resources at its own choosing. It’s no different than looking at CPU or vCPU, MEM or vMEM, right? They’re all tradeable, so should I run here or here? It doesn’t matter!
The common commodity here is the infrastructure supply and the reason that matters is – I’m going to use an analogy – if you remember back in 1978 we deregulated the airlines. Before that, every seat was the same and we priced them all the same, and while it was logical it was wrong, because on the consumption side the willingness to pay was very much differentiated. So, the seats were a commodity, but by changing the focus to demand you could ascertain a different willingness to pay per seat, even though the seats were the same. So what we did is we took the resource that represented the common commodity, published it on the web – first it was Sabre and Apollo, but then it became Travelocity, Kayak and Priceline.
All of a sudden, when you let demand pick the supply, lo and behold the entire industry changed. Load factors went up but the cost of flying went down and the entire airline infrastructure that we have in this country became modernized. It was a great advancement. Oh, and by the way, if you look at Priceline today it is worth $70 billion. That’s more than any airline and they don’t own a single plane.
Techopedia: Interesting. I’ve never really thought of it that way …
Ben Nye: They don’t own a plane, they don’t own a gate, they don’t own a seat, they don’t employ a pilot, right? And then you say, “But what other examples do we have of a supply-centered economy?” Let’s switch. Hotels are supply-based, right? You have a hotel, you can’t move it. You’ve got these rooms but how do you price those rooms? And along comes Hotels.com, and Expedia, and Travelclick, etc. And the same thing happened. You look at restaurants and you’ve got OpenTable. You look at Yellow Pages. That was replaced mostly by Google. You look at classified ads in the newspapers and they were replaced by eBay or Craigslist.
One of my favorite examples is Uber. If you walk around in any city you’ll see a line of cabs waiting for people and then you go up to another part of the same city and there’s a line of people waiting for cabs. And you think, this can’t be right. Then along comes Uber, which uses the smartphone to let demand drive supply. Now, with Uber, you have 90% of demand met within 10 minutes, while in the taxi cab world, 90% of demand is not met within 10 minutes and that’s why Uber’s last round valued it at $62 billion. And remember, they don’t own a cab or a car!
Techopedia: So in a typical data center we’re essentially doing the same thing as hailing a cab, right?
Ben Nye: So think about it this way: The workloads are the budget holders, because that’s why we built the data center. So, they are effectively your humans in this example. Then I have this resource, this common resource, all fully abstracted. That’s called supply and it can be everywhere – it’s everything below the application needs, from the server and computer environment down to the network, down to the storage. Now what we want is to make sure that this is an efficient market. So, those budget holders have to be able to act autonomically, meaning autonomously and in real time given the amount of change in demand on the workload itself or, in this case, on the application. That’s why this is very analogous to demand finding supply. Using this system, you wind up with far better application performance because you’re not waiting on a human labor bottleneck to respond to a machine-generated alert to go make a care-and-feeding decision for the app. You’re instead doing it in real time. And you’re doing it at scale because these institutions, these customers, run thousands of apps a day, and they have to perform.
So, first of all you’re getting a much better performance experience. In addition, you don’t have people spending their days being doers. Instead, they’re going back to being thinkers and they’re not just taking machine-generated alerts, they’re thinking about how they can actually help the business. They’re thinking about the microservices strategy and the hybrid and multi-cloud strategy and about software defined networks and network functions and virtualization – all these things that actually advance the business and get them out of the world of break-fix application care and feeding, or alert responding.
We’re actually finding that anywhere between 40% and 60% of the data center capital is over-provisioned and we can afford a lot of that to be either re-appropriated – so, avoiding the purchase of new hardware – or decommissioned and the reason that matters so much is –
Techopedia: Sorry, let me check this: 40-60%? Sorry, that number is astounding.
Ben Nye: Yes. And what’s more important is 14% of electricity in this country is consumed by data centers.
Techopedia: So we could save 5-8% of the nation’s entire electricity consumption if we just didn’t over-provision our data centers?
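The 5-8% figure follows from multiplying the two numbers quoted in the conversation. The sketch below simply reproduces that arithmetic; it takes both figures at face value and assumes, as a simplification, that electricity use scales linearly with provisioned capacity.

```python
# Figures quoted in the interview, taken at face value (not independently verified).
datacenter_share = 0.14         # share of national electricity used by data centers
overprovisioned = (0.40, 0.60)  # quoted range of over-provisioned capacity

# Simplifying assumption: electricity use scales linearly with provisioned capacity.
savings = tuple(datacenter_share * fraction for fraction in overprovisioned)

for fraction, saved in zip(overprovisioned, savings):
    print(f"{fraction:.0%} over-provisioned -> up to {saved:.1%} of national electricity")
# 40% over-provisioned -> up to 5.6% of national electricity
# 60% over-provisioned -> up to 8.4% of national electricity
```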
Ben Nye: Let me give you some back-up to explain to you why, OK? It goes back to the world of a supply-based economy. First, when you have a new application and you’re running an IT shop, how do you size it?
Techopedia: Yeah, you go to the architect and they sort of guess, right? And then they wait until it breaks.
Ben Nye: Exactly. You go to the line of business, and you have a conversation, and they know nothing that you don’t know. So they’re guessing and you’re guessing, and together we try and guess about what the size should be.
So, you’re going to allocate four or eight vCPUs. Now, what’s interesting is that allocation includes a physical footprint, or the virtual footprint on a physical server. Every single time that a request comes from that application, it’s going to be queued as four or eight vCPUs. It’s essentially like going to a restaurant and saying you’re a party of four or eight, even though you may only be a party of one. You’ll never get seated.
We over-allocate with our guesses, which means we get the worst performance and it’s wildly expensive. That’s problem number one. Problem number two is that now you can’t accurately size your application, which raises the question: how do you place it if you can’t size it?
You’re guessing again. OK, so now we’re guessing on the first thing, we’re guessing on the second thing, then there’s this thing called VM sprawl, or a VM without demand on it. It is left in its state instead of being removed and that reserves hardware as well. Then what we do is try to put all these things together in a human-based historical capacity model and because we only run that once or twice a year, we’ve got to build another hedge in, so we’re talking a 20-30% hedge because demand might increase on all these apps and then we’re going to “close off the cluster,” because we’re going to deem that bunch of hosts to be “full”. Right there you have now locked in as much as half of your data center capacity and it’s over-provisioned.
Techopedia: It’s like you’re set up for failure, like there’s no possible way in the old paradigm of actually not over-provisioning or not having sprawl …
Ben Nye: If all you see and manage is infrastructure supply, how in the world do you know if you have enough supply to be resilient if you don’t see and understand and in real time tie in demand? If all you see is supply, how do you know if you have enough? How do you know if you have too much?
Techopedia: Well, you probably hire a few more heads to guess some more. You spend more money on investigating that problem, don’t you?
Ben Nye: And you still wind up fundamentally over-provisioned on the order of, call it half, and you’re buying hardware unnecessarily. The whole concept behind virtualization in its first instantiation was that, instead of having a dedicated stack of hardware for every single app, I’m going to be able to move these workloads between stacks and, therefore, the whole idea was to provision hardware to the average of the peaks instead of the sum of the peaks of all that hardware capital.
However, when you now take real-time
autonomic control, performance control, the consumption side of the VM or
container or cloud, and you think about the same thing; what do we do? We go
out and we stress test every single app and there are thousands – there are
hundreds to thousands of apps in an environment depending on the size of the
customer – and so we go stress test those for CPU, for vCPU, for MEM, for vMEM,
and so on, for all the different elements or resources, right? And then we
provision based on the sum of the peaks again. The difference is if you don’t
have a lag or a bottleneck associated with labor and you can now provision to
the average of the peaks, guess what we can do? We can manage that environment
actively because the apps don’t ever all spike at one time.
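A toy calculation shows why it matters that the apps do not all spike together. The demand figures and app names below are invented for illustration, and "the average of the peaks" is read here as sizing for the highest combined demand actually observed, which is an interpretation of the speaker's shorthand rather than his definition.

```python
# Hypothetical hourly CPU demand (in vCPUs) for three apps over six hours.
demand = {
    "billing":   [2, 2, 8, 3, 2, 2],
    "reporting": [1, 6, 1, 1, 1, 1],
    "web":       [3, 3, 3, 4, 7, 3],
}

# Static sizing: reserve every app's individual peak at the same time.
sum_of_peaks = sum(max(series) for series in demand.values())

# Shared sizing: reserve only the highest combined demand seen in any hour.
combined = [sum(hour) for hour in zip(*demand.values())]
peak_of_combined = max(combined)

print(sum_of_peaks)      # 21
print(peak_of_combined)  # 12
print(f"{1 - peak_of_combined / sum_of_peaks:.0%} less capacity reserved")  # 43% less capacity reserved
```

The gap only exists because each app peaks in a different hour; if all three peaked together, the two numbers would be identical.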
Techopedia: Wow. That’s really getting back to what virtualization was supposed to be all about in the first place.
Ben Nye: This is virtualization or containerization 2.0: Real time, autonomic performance control.
Techopedia: So if the old break-fix loop is an outdated way of thinking, how do you explain that to the average guy on the front line?
Ben Nye: Let me ask you a simple question: Why does one monitor?
Techopedia: Well you want to know what’s going wrong or when something is going wrong, right?
Ben Nye: OK. Yeah. You want to know when it breaks. But why do you want to let it break? That’s the whole question. Look, you’re inevitably going to have some monitoring for some divisions or parts of your data center, but fundamentally, if I can ensure that my applications are running performantly in what we call the desired state, which is the right amount of resources to support them in real time, that’s a far better world than waiting for monitoring, and alerting, and trying to respond to that.
When virtualization first gave rise to the software defined data centers, it was a really interesting advancement, but they took it a step too far because they called themselves the data center operating system of the future and it was straight from the box, right? But if you actually go and look up the five things that an operating system is supposed to do, the first one is performance management. So, let me ask you, does a hypervisor do performance management?
Techopedia: Of course not.
Ben Nye: No. Right. Then the second thing it has to do is resource allocation. So, does the hypervisor do resource allocation? No.
How about job scheduling? How about reservations? How about planning? No, no, and no. So all of a sudden you realize the way they accomplished this is they generate alerts and the number of alerts grows and grows as we use the resources at a higher level but also as we create more apps, and more forms of workload, and more places in which they can run. All of a sudden, we’re crushing people with all these alerts.
But the biggest thing is that what we’re doing by having humans chase those alerts is turning people into the modern data center operating systems, and that’s weird because, as it turns out, people sleep. People have families, people take vacations, and so people cannot be operating systems and that’s why what we did is we created this application performance control system, Turbonomic, to be able to do exactly those five things. We agree that the hypervisor is a great invention, and containers, and clouds, but we view them as providers of liquidity; they’re not an operating system. The rest of the operating system comes from having an application performance control system. It does those things: performance management, resource allocation, job scheduling, reservations, and planning – that’s the whole value of what we have. That’s why we exist in the marketplace.
Techopedia: Tell me what role you think machine learning or AI plays in this over the next, you know, two to five years? How does Turbonomic with AI change the data center?
Ben Nye: There are some incredibly interesting inferences one can make in all kinds of different environments. I would say that what we’re doing is being far more precise than that. Remember that one of the issues with big data sets is that you need time to develop that data and then to correlate it and draw the inferences on that data.
Sometimes, you’ll draw the wrong inference and it’s very hard to know how long it takes for the big data set to unlearn that inference, be it right or wrong. Then at the end it’s still back-ended with a human or some form of static human labor component to actually take an action. In our case, this is autonomic intelligence. It’s not only artificial intelligence: these workloads are truly making decisions on their own in the model, and they’re doing it with a degree of precision that’s far greater than what can be accomplished with simply a big data set.
Techopedia: If you could leave one message with the average system admin, or the average data center architect, or the average CIO, where are things going to be in the next year or two? What is it that people don’t realize now that they need to know about 2017, 2018 and beyond?
Ben Nye: I think the most important thing is to remember why we entered the technology arena: because we are fundamentally curious, and we want to enable the U.S. economy – or any economy – to do more with less. That’s the way enterprises run and roll. It cannot be right to stick with yesterday’s approach of an allocation- or supply-based model when it requires us to run on the order of roughly 50% over-provisioned, and in an application world of break-fix, and where we’ve turned our labor from thinkers to doers.
There is a better way. The better way is to embrace new ideas and new technologies from new vendors that afford you the opportunity to look at the demand side of the equation, the consumption side of a VM, of a container, of a cloud, and to run more performantly at more scale with smarter labor and better efficiencies in your capital, and flexibility in terms of both agility and resiliency across your operations.
That’s why I found this opportunity so compelling that I wanted to run it, and why I believe in it so fully.