Rebecca Jozwiak: Ladies and gentlemen, hello, and welcome to Hot Technologies of 2016. Today we’ve got “Edge Analytics: The IoT Economy at Last.” My name is Rebecca Jozwiak. I will be your moderator for today’s webcast. We do tweet with a hashtag of #HOTTECH16 if you want to join the Twitter conversation.
So the IoT, definitely a hot topic this year and the internet of things, it’s really about the machine data, sensor data, log data, device data. None of which is new, we’ve had that type of data forever, but it’s that we haven’t really been able to use it and now we’re seeing just a ton of new ways to use that data. Particularly in the medical industry, the financial markets, with oil and gas, commodities, it’s just a wealth of information that’s previously been untapped. And not a whole lot of people have really gotten a good grasp on how to do that well. We’re talking about a lot of little data, but it’s a lot of data and, you know, there’s network issues involved, there’s hardware involved, or needs to be processing, and how do you do that without clogging up your system? Well that’s what we’re going to learn about today.
Here’s our lineup of experts. We’ve got Dr. Robin Bloor, our chief analyst at The Bloor Group. We also have Dez Blanchfield, our data scientist at The Bloor Group. And we’re happy to have Shawn Rogers, director of global marketing and channels from Dell Statistica. And with that, I’m going to pass the ball to Robin.
Dr. Robin Bloor: Okay, well thank you for that. I shall push a button and throw up a slide. I have no idea why I created this apocalyptic picture for the internet of things. Possibly because I think it’s going to get chaotic in the end. I’ll move straight on. This is par for the course in any IoT presentation. You have, in one way or another, to say something outrageous about where it’s all going. And actually, most of this is probably true, if you look at the way that these curves are gradually expanding. You know, personal computers, smartphones and tablets are probably going to continue to rise. Smart TVs will probably rise. Wearables, they’re probably exploding right now, compared to what they were a few years ago. Connected cars, it’s inevitable that pretty much all cars are going to be thoroughly connected and transmitting data all the time. And everything else. And this particular graph by BI Intelligence indicates that everything else will outweigh the obvious things very, very quickly.
So what to say about the IoT? The first thing is just an architectural point. You know, when you’ve got data and you’ve got processing you, in one way or another, you’re going to have to put the two together. And with data at the volumes it is now, and gathering in various places, the two aren’t naturally together anymore. They used to be in the old mainframe days, I guess. So you can think in terms of there being a processing layer, a transport layer and a data layer. And in one way or another, transport layer nowadays is going to move the processing around or move the data around across networks. So here are the choices: You can move the data to the processing, you can move the processing to the data, you can move the processing and the data to a convenient execution point, or you can shard the processing and shard the data. And as regard to the internet of things, the data is pretty much already sharded at birth and the likelihood is that an awful lot of the processing is going to be sharded in order that the applications that need to be run can take place.
So I’ve painted a picture. The interesting thing to me about the IoT, I talk about an aggregation domain in this diagram, and I point out that there are sub-domains. So you can imagine that the IoT domain 1 here is a car of some kind, and domain 2 and domain 3 and domain 4, are cars of some kind, and you will aggregate data locally, you will run local apps on that data, and you will put various things into action. But in order to have analytics about all of the cars, you’re going to have to transfer data to the center, not necessarily all the data, but you’re going to have to aggregate in the center. And if you think about this, then you may want to have many, many different aggregation domains to the same set of IoT things. And the domains themselves might further aggregate. So you could have this repeating hierarchy. And basically what we’ve got there is an incredibly complex network. Far more complex than anything we had to have before.
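As a rough illustration of the aggregation idea Robin describes, here is a minimal Python sketch in which each car summarizes its own readings locally and only the summaries travel to the center. The `Summary` class, the `summarize` helper and the sample readings are assumptions made for illustration; they are not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical per-domain summary; small enough to ship to the center."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Fold one local reading into the running summary.
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Summary") -> "Summary":
        # Combine two summaries; this is what a higher aggregation domain does.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize(readings):
    """Aggregate locally, inside one IoT domain (for example, one car)."""
    summary = Summary()
    for reading in readings:
        summary.add(reading)
    return summary


# Two cars aggregate locally; the center merges summaries, not raw data.
car_1 = summarize([60.0, 80.0])
car_2 = summarize([100.0])
fleet = car_1.merge(car_2)
```

Because `merge` returns another `Summary`, the same step can be repeated at each level of the hierarchy, which is the repeating structure described above.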
I’ve got a note at the bottom here. All network nodes, including leaf nodes, can be data creators, data stores and processing points. And that gives you a possibility of distribution, the like of which we haven’t seen before. Dez is going to talk a bit more about that, so I shall move on to this particular point. Once we’re at the internet of things and all the data has actually resolved into being events, the point about this slide is just to indicate that we’re going to have to standardize on events. We’re going to have to, at the very least, we’re going to have to have this. We’re going to have the time the event occurred, the geographic location it occurred, the virtual or logical location of the process that created it, the source device that created it, device ID so you know exactly which source device created it, ownership of the data and actors, those people who have a right to use the data in some way or other, it’s going to have to carry its permissions with it, which means really, it’s going to have to carry security with it, and then there’s the data itself. And when you look at this you realize that, you know, even if you’ve got a sensor that’s doing nothing more than reporting the temperature of something every second or so, there’s actually quite a lot of data just to identify exactly where the data originated and what it actually is. By the way, this is not an exhaustive list.
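The event fields Robin lists could be sketched as a simple record type. This is only an illustration under stated assumptions: the class name `Event`, the field names, the permission model and the `may_use` helper are all hypothetical, and, as he says, the list is not exhaustive.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical standardized event record (field names are assumptions)."""
    occurred_at: datetime                    # time the event occurred
    geo_location: tuple[float, float]        # geographic location (lat, lon)
    logical_location: str                    # virtual/logical location of the process
    source_device: str                       # kind of device that created it
    device_id: str                           # exactly which device created it
    owner: str                               # ownership of the data
    permissions: dict[str, tuple[str, ...]]  # actor -> operations they may perform
    payload: Any                             # the data itself


def may_use(event: Event, actor: str, operation: str) -> bool:
    """The event carries its permissions with it, so the check needs no lookup."""
    return operation in event.permissions.get(actor, ())


# A one-second temperature reading still carries a lot of identifying data.
reading = Event(
    occurred_at=datetime(2016, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    geo_location=(51.5, -0.12),
    logical_location="plant-a/line-3/temp-monitor",
    source_device="thermometer",
    device_id="dev-0042",
    owner="plant-a",
    permissions={"plant-a": ("read", "share"), "analyst-7": ("read",)},
    payload={"celsius": 21.4},
)
```

Even in this toy form, the payload is a single number while the identifying metadata is eight fields, which is the point being made about the overhead per event.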
So, in terms of the future IT landscape, the way that I see it is this: it isn’t just the internet of things, there’s also the fact that we will be in a world of event-driven activity, and therefore we will have to have event-driven architectures, and those architectures are going to have to span large networks. And the other thing is real-time everything. It isn’t necessarily the case that everything has to be real-time, but there is something I refer to as business-time, which is the time within which data actually has to be served up and ready to be processed. That might not be, you know, a millisecond after it’s created. But there is always such a time for every piece of data, and once you have an event-driven architecture it starts to become more sensible to think in terms of a real-time approach to the way that the world works.
So boiling it down, because what we’re really actually talking about is analytics on the IoT. Despite all of that, it’s still all about time to insight, and it’s not just time to insight, insight has to be followed by actions. So, time to insight and time to action is what I would boil it down to. Having said that, I shall pass the ball back to Dez.
Dez Blanchfield: Thank you, Robin. Insightful as always. I love the fact that it’s a hard act to follow on every instance, but I will do my best.
One of the things that I’m seeing, and I’m often entertained by it, to be honest, and not in a disingenuous and negative slant form, but there’s a lot of concern and panic about the internet of things taking over the world and slotting us and you’ll start to lose your data, so I want to have a bit of a look back at some of the things that we’ve done before in the last two to three decades that were a close facsimile to the internet of things, but maybe not quite at the same scale. And just to show ourselves that we’ve actually been here and solved some of the problems, not at this level of scale and not at this speed. Because it means that we can actually solve the problem and that we know what some of the answers are; we’ve just got to hunker down and reapply some of the learnings we had before. And I know this is the entire conversation we’re about to have and I’ve got a whole range of fun things just to chat through about in the Q&A section.
But when we think about the internet of things in the circle, there’s a great deal of centralization currently at a design level that was written in the very early days. Fitbit devices, for example, all tend to go to one central place and it’s likely to be hosted in a cloud platform somewhere and all that data from all those devices hits the same, let’s just say, front end of a stack, including web and app and data-based services. But over time that scale will require a re-engineering to cope with the amount of data that’s coming at them and they’ll re-engineer that so there’s multiple front ends and multiple copies of the stack in multiple locations and regions. And we’re seeing this and there’s a number of examples that I’m going to give you that we can discuss.
The key point of this is that even though we’ve seen some of these solutions that I’m about to cover, the scale and volume of the data and the network traffic that the internet of things will generate does urgently require a shift from central to distributed architectures in my view, and we know this but we haven’t necessarily grasped what the solution is. When we think about the concept of what the internet of things is, it’s a large-scale network model. It’s lots and lots of things that are now making noise. Things that didn’t make noise up until recently. And in fact, I think it was yesterday, I was jokingly talking about the stack, but I went to buy a new toaster and it came with an option that could tell me various things, including when it needed cleaning. And a new microwave with a very similar feature and could even actually ping an app on my phone to say that the thing that I was reheating was now done. And I’m very much of the opinion that if there’s a couple of things I don’t want talking to me it’s my fridge, microwave and toasters. I’m pretty comfortable with them being dumb devices. But I’ve got a new car recently, a little Audi, and it does talk to me and I’m quite pleased with that, because the things that it talks about are things of interest. Like updating maps in real-time to tell me where there’s a better route to get from point A to point B because it’s detected traffic through various mechanisms with data it gets sent.
I have this slide. We’ve already seen the high-volume network models require a shift from central to distributed capture and delivery of data processing and analytics models. We’ve seen things move from the three little graph diagrams there on the right-hand edge where we’ve got, the one on the left out of the three, there’s a centralized model with all the little devices come to the central location and collect data and the scale isn’t so great, they cope just fine there. In the middle we’ve got a slightly more decentralized model and hub and spoke, which is what I think we’re going to need with the internet of things in the next generation. And then on the right-hand side we’ve got this fully distributed and meshed network which is where the internet of things and machine-to-machine is going to go in the very short term in the future, but we’re not quite there for a range of reasons. And predominantly because we’re using internet platforms for most of the communications so far and we haven’t actually built a second network to carry a lot of this data.
There are second networks that exist already, such as the telco network. A lot of people don’t think about the fact that the telecoms networks are not the internet. The internet is a very separate thing in many ways. They’re routing data from smartphones over the phone networks and then into the internet in general, so they’re actually layering two networks. But it’s entirely possible and likely that the internet of things will need another network. We talk about the industrial internet as a topic generally, which we won’t go into in detail now, but essentially we’re talking about another network that’s specifically designed for the types of carriage needed for internet of things data and machine-to-machine communication.
But some of the examples I wanted to share where we’ve seen high-volume networks and distributed data work very well are things like the internet. The internet was specifically designed and architected from day one to be capable of surviving a nuclear war. If parts of the U.S. were blown up, the internet was designed so that data could still move around the damage and the rest of us would stay connected. And that still exists today on a global scale. The internet has multiple capabilities around redundancy and routing packets. And in fact the internet’s controlled by a thing called BGP, Border Gateway Protocol, and BGP is specifically designed to cope with a router or switch or server being down. When you send or receive an email, if you send three emails in a row there’s no guarantee that each of those emails will follow the same route to the same end destination. They may move through different parts of the internet for various reasons. There could be an outage, there could be maintenance windows where things are offline to be upgraded, there could just be congestion in the network, and we see the same thing in traffic networks with cars and public transport and ships and planes. We get content to our devices like our laptops and tablets and computers through browsers and so forth every day through content delivery networks. Content delivery networks are about taking copies of content from your primary serving platform, such as the web server, caching small amounts of it at the edge of the network and delivering it to you from the nearest part of the edge.
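The content delivery idea Dez closes with can be sketched in a few lines of Python: an edge node serves a cached copy when it has one and goes back to the origin only on a miss. The `EdgeCache` class and the dictionary standing in for the origin web server are assumptions for illustration, not how any particular CDN is implemented.

```python
class EdgeCache:
    """Hypothetical edge node that keeps copies of origin content nearby."""

    def __init__(self, origin):
        self.origin = origin      # stand-in for the primary web server
        self.store = {}           # copies held at the edge
        self.origin_fetches = 0   # how often we had to go back to the origin

    def get(self, path):
        # Serve from the edge when possible; fetch from the origin on a miss.
        if path not in self.store:
            self.origin_fetches += 1
            self.store[path] = self.origin[path]
        return self.store[path]


origin_server = {"/index.html": "<h1>hello</h1>"}
edge = EdgeCache(origin_server)
first = edge.get("/index.html")   # miss: fetched from the origin
second = edge.get("/index.html")  # hit: served from the edge copy
```

A real edge node would also expire and refresh its copies; that is left out here to keep the sketch short.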
Anti-spam and cybersecurity – if a spam event takes place in Canada and Microsoft detects it and sees that there are lots of copies of the same email being sent to a group of random people, checksums are taken on that, a signature for that message is created and put into a network and distributed immediately. And so that email never gets into my inbox, or if it does, it gets tagged as spam immediately because it’s been detected somewhere else at the edge of the network. And so other parts of the edge of the network are told about this spam message signature and it’s put into a database index, and if those messages start appearing on the other side of the planet, we detect them and we know they’re spam. And the same applies to cybersecurity. A hack that’s taking place on one side of the planet is detected and registered and mapped, and all of a sudden on the other part of the network we can fight it and push out rules and policy changes to see if we can block it. Particularly with the new impact of things like denial-of-service or distributed denial-of-service, where thousands of machines are used to attack a central website.
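The spam example can be sketched roughly as follows: one edge node sees enough copies of the same message, takes a checksum as a signature and shares it with its peers, so the message is recognized elsewhere on first sight. The `EdgeNode` class, the threshold of three copies and the use of SHA-256 are assumptions for illustration; the talk does not say how any vendor actually does this.

```python
import hashlib


def signature(message_body: str) -> str:
    """Checksum of a whitespace- and case-normalized message body."""
    normalized = " ".join(message_body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class EdgeNode:
    """Hypothetical mail edge that shares spam signatures with its peers."""

    def __init__(self, name, threshold=3):
        self.name = name
        self.threshold = threshold  # copies seen before we call it spam (assumed)
        self.seen_counts = {}       # signature -> copies seen locally
        self.known_spam = set()     # signatures detected here or by a peer
        self.peers = []

    def receive(self, body: str) -> str:
        sig = signature(body)
        if sig in self.known_spam:
            return "spam"
        self.seen_counts[sig] = self.seen_counts.get(sig, 0) + 1
        if self.seen_counts[sig] >= self.threshold:
            # Detected locally: record it and distribute the signature at once.
            self.known_spam.add(sig)
            for peer in self.peers:
                peer.known_spam.add(sig)
            return "spam"
        return "delivered"


canada = EdgeNode("canada")
sydney = EdgeNode("sydney")
canada.peers.append(sydney)

results = [canada.receive("WIN a FREE cruise") for _ in range(3)]
far_side = sydney.receive("win a free   cruise")  # first copy seen in Sydney
```

Only the signature crosses the network, not the messages themselves, which is what keeps this kind of detection cheap at the edge.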
Bitcoin and the blockchain – the blockchain is, by its nature, a distributed ledger, and it copes with any outages or breakages in the network. Fraud detection and prevention, power and water utilities – we’re seeing, you know, with the power network, if one part of the network has a tree land on it and it takes out a pole and a wire, my house still gets power. I don’t even know about it, I often don’t even see it in the news. And we’re all used to the transport networks where originally there was a centralized model, “All roads led to Rome,” as they say, and then eventually we had to go to the decentralized model with hubs and spokes, and then we went to a meshed network where you could get from one side of the city to the other through various meshed routes and different intersections. And so what we see here is that this centralized model of what we’re doing now with the internet of things is going to have to push out to the edge of the network. And this applies to analytics more than ever, and that is that we need to push analytics out into the network. And to do that requires a completely new approach in how we access and process that data and the streams of data, in my view. We’re talking about a scenario now where I believe we see limited intelligence pushed out to the edge of the network on internet-connected devices, but we’re soon going to see those devices increase in intelligence and increase the level of analytics they want to do. And as a result of that we’re going to need to push those smarts out further and further through the network.
For example, smart apps and social media – if we think about social media and some of the smart apps, they are still very central. You know, there’s only two or three data centers for the likes of Facebook. Google have gotten a lot more decentralized, but there’s still a limited number of data centers around the world. Then when we think about content personalization, you have to think down at a very local level. A lot of that’s being done in your browser or at a local content delivery network layer. And we think about health and fitness trackers – a lot of the data that’s being collected from them is getting analyzed locally and so the new versions of the Garmin and Fitbit devices you put on your wrist, they’re becoming smarter and smarter in the device. They now don’t send all the data about your heart rate back to a centralized server to try and get the analytics done; they’re building that intelligence directly into the device. In-car navigation, it used to be that the car would constantly be getting updates and maps from a central location, now the smarts are in the car and the car’s making decisions all by itself and eventually the cars will mesh. The cars will talk to each other via wireless networks of some form, that may be over a 3G or a 4G wireless network in the next generation, but eventually it’ll be device to device. And the only way we’re going to cope with the volume of that is by making the devices smarter.
We already have emergency warning systems that will collect information locally and send that centrally or into a mesh network and make decisions about what’s happening locally. For example, in Japan, there’s applications that people run on their smartphones with accelerometers in the smartphone. The accelerometers in the smartphone will detect vibrations and movement and can determine the difference between just normal day-to-day movement and the tremors and shocks of an earthquake. And that phone will start to alert you immediately, locally. The actual app knows that it detects earthquakes. But it also shares that data up through a network in a distributed hub and spoke model so that people near you get warned immediately or as soon as possible as the data’s flowing through the network. And then eventually when it gets to a central location or a distributed copy of the central location it pushes back out to people who are not in the immediate area, haven’t detected the movement of the planet, but need to be warned of it because maybe a tsunami’s coming.
And smart city infrastructure – the concept of intelligent infrastructure, we’re already building the intellect into smart buildings and smart infrastructure. In fact, yesterday I parked my car in the city in a new area where part of the city’s being refurbished and rebuilt. And they’ve re-done all the streets, and there are sensors in the streets, and the actual parking meter knows when I’ve driven in with a car. It knows, when I go to refresh for the two-hour limit, that the car hasn’t moved, and it wouldn’t actually let me top up and stay for another two hours. I had to get in the car, pull out of the space and then pull back in to trick it into allowing me to stay there for another two hours. But what’s interesting is that eventually we’re going to the point where it’s not just detecting the car entering the area with a localized sensor, but things like optical character recognition will be applied with cameras looking at my license plate, and it will know that I actually just pulled out and pulled back in and tricked it, and it just won’t let me renew and I’ll move on. And then it’ll distribute that data and make sure that I can’t do that anywhere else and trick the network on an ongoing basis as well. Because it has to, by nature, get smarter, otherwise we’ll all continue to fool it.
There’s an example of this that I’ve actually personally lived through, in firewall technology. In the late ‘80s and early ‘90s there was a product called Check Point FireWall-1, a very simple firewall technology that we used to build policies and rules to say which types of traffic, through certain ports and IP addresses and networks, could get to and from each other: the web traffic from one place to another, going from the browser and client end to our server end. We solved this problem by actually taking the logic out of the firewalls themselves and moving it into the ASIC, the application-specific integrated circuit, that was controlling the ports in Ethernet switches. We found that the server computers, the computers we were actually using as servers to make decisions as firewalls, weren’t powerful enough to handle the volume of traffic going through them for every little bit of packet inspection. We solved the problem by moving the logic required to do packet inspection and intrusion detection into the network switches, which were distributed and able to handle the volume of data going through at the network level. We didn’t worry about it at a centralized level with firewalls, we moved it out to the switches.
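The idea of pushing firewall rules out to each switch port can be sketched in Python as below. This is a software illustration only: the `Rule` and `SwitchPort` names, the first-match-wins ordering and the default-deny behavior are assumptions, and the real logic Dez describes ran in switch hardware rather than in code like this.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule: traffic from a network to a destination port."""
    src_net: str
    dst_port: int
    allow: bool


class SwitchPort:
    """Each port holds its own copy of the policy and decides locally."""

    def __init__(self, rules):
        self.rules = list(rules)

    def admits(self, src_ip: str, dst_port: int) -> bool:
        # First matching rule wins; anything unmatched is dropped (assumed default).
        for rule in self.rules:
            if dst_port == rule.dst_port and ip_address(src_ip) in ip_network(rule.src_net):
                return rule.allow
        return False


# The same policy is pushed to every port instead of living in one central firewall.
policy = [
    Rule("10.0.0.0/8", 80, True),   # internal clients may reach the web server
    Rule("0.0.0.0/0", 23, False),   # nobody may use telnet
]
ports = [SwitchPort(policy) for _ in range(4)]
```

Every port makes the same decision independently, so the inspection load scales with the number of ports rather than with the capacity of one central box.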
And so we had the manufacturers build the capability for us to push paths and rules and policies into the Ethernet switch, so that it happened at the actual Ethernet port level. Maybe a lot of folk aren’t familiar with this because we’re all living in a wireless world now, but once upon a time everything had to plug in via Ethernet. Now, at the Ethernet port level, we were doing inspection of packets to see whether the packets were even allowed to move into the switch and into the network. Some of this is what we’re solving now around this challenge of capturing data in the network, specifically from the IoT devices, and inspecting it and doing analysis on it and probably analytics on it in real time to make decisions on it. And some of it’s to gain insights in business intelligence and information so that humans make better decisions, and some of it is analytics and performance for the machine-to-machine level stuff where devices are talking to devices and making decisions.
And this is going to be a trend that we have to look at solving in the immediate future because if we don’t, we’re just going to end up with this deluge of noise. And we’ve seen in the big data world, we’ve seen things like data lakes turn into data swamps that we just end up with a deluge of noise that we haven’t figured out how to solve the processing analytics for in a centralized fashion. If we don’t solve this problem, in my view, with the IoT immediately and get a platform solution very quickly we’re going to end up in a very, very bad place.
And with that in mind I’m going to close with my point which is that I believe that one of the biggest changes taking place in the big data and analytics space now is being driven by the immediate need to react to the impact of the internet of things on high-volume and real-time analytics, in that we need to move the analytics out into the network and then eventually to the edge of the network just to cope with the sheer volume of it, just to process it. And then eventually, hopefully, we put the intelligence into the network and the edge of the network in a hub and spoke model that we can actually manage it and gain insights in real time and get value from it. And with that I’m going to pass over to our guest and see where this conversation takes us.
Shawn Rogers: Thank you very much. This is Shawn Rogers from Dell Statistica, and boy, just to begin with, I totally agree with all the major topics that have been touched here. And Rebecca, you started off with one around the idea of, you know, this data’s not new, and it’s remarkable to me how much time and energy is spent discussing the data, the data, the data of the IoT. And it’s certainly relevant, you know, Robin made a good point, even if you’re doing something really simple and you’re tapping into a thermostat once a second, you know, you do that 24 hours a day and you actually do have, you know, some interesting data challenges. But, you know, in the end – and I think a lot of people in the industry are talking about the data this way – that it’s not really all that interesting and, to Rebecca’s point, it’s been around a good, long time, but we haven’t in the past been able to make great use of it. And I think the advanced analytics industry and the BI industry in general are starting to really turn their heads towards IoT. And Dez, to your final point, this being part of or one of the challenging points of the big data landscape I think is very true. I think everybody’s very excited about what we can do with this type of data, but at the same time, if we can’t figure out how to apply insight, take action and, you know, get analytics where the data is, I think we’re going to have challenges that people don’t see really coming their way.
With that said, in the advanced analytics space we’re big fans of what we think can happen with IoT data, especially if we’re applying analytics to it. And there’s a lot of information on this slide and I’ll let everybody just hunt and peck around, but if you look at different sectors, like retail off to the far right, they’re seeing their opportunity arising around being able to be more innovative, or having some cost savings or process optimization or improvements, as very important, and they’re seeing a lot of use cases for that. If you look, you know, left to right across the slide, you’ll see how each of these individual industries is claiming new capabilities and new differentiating opportunities for themselves when they’re applying analytics to IoT. And I think the bottom line is, if you are going to endeavor to go down that path you have to not only worry about the data, as we’ve been discussing, and the architecture, but you also have to look at how best to apply the analytics to it and where the analytics need to take place.
For a lot of us on today’s call, you know, Robin and I have known each other a very long time and had countless conversations about traditional architectures in the past, those around centralized databases or enterprise data warehouses and so on, and as we’ve found over the last decade or so we do a pretty good job of stretching the limitations of those infrastructures. And they’re not as steadfast or as strong as we’d like them to be today in order to support all of the great analytics that we’re applying to the information and of course the information’s breaking the architecture as well, you know, the speed of data, the volume of data and so on, are definitely stretching the limitations of some of our more traditional approaches and strategies to this type of work. And so I think it kind of starts to call for the need for companies to take a more agile and perhaps more flexible viewpoint of this and that’s the part, I guess, I’d like to talk about a little bit around the IoT side.
Before I do, I will take a moment just to let everybody on the call, give you a little bit of background on what Statistica is and what we do. As you can see on the title of this slide, Statistica is a predictive analytics, big data and visualization for IoT platform. The product itself is a little over 30 years old and we compete with the other leaders in the market who you’re probably familiar with along the lines of being able to apply predictive analytics, advanced analytics to data. We saw an opportunity to expand our reach of where we were putting our analytics and started working on some technologies a while back that have positioned us rather well to take advantage of what both Dez and Robin have talked about today, which is this new approach and where you’re going to put the analytics and how you’re going to meld it with the data. Along that side comes other things that you have to be able to address with the platform, and as I mentioned, Statistica’s been in the market a good long time. We’re very good at the data blending side of things and I think, you know, we haven’t talked too much about data access today, but being able to reach across these diverse networks and get your hands on the right data at the right time is becoming more and more interesting and important to the end users.
Lastly, I’ll comment on one more piece here, because Dez made a good point about the networks themselves: having some level of control and security over analytic models throughout your environment, and how they attach themselves to data, is becoming very important. When I got into this industry a few years back – nearly 20 I think at this point – when we talked about advanced analytics, it was in a very curated manner. Only a couple of people in the organization had their hands on it, they deployed it and they gave people the answer as required or provided insights as required. That’s really changing, and what we see is a lot of people who want a more diverse and more flexible way of reaching the data, applying security and governance to the data and then being able to collaborate on it. Those are some of the important things that Dell Statistica looks at.
But I want to dive into the topic that’s a little closer to today’s title, which is, how should we be addressing the data that comes from the internet of things, and what might you want to be looking for when you’re looking at different solutions. The slide I’ve got up in front of you right now is kind of the traditional view, and both Dez and Robin kind of touched on this, you know, this idea of talking to a sensor, whether it be an automobile or a toaster or a wind turbine, or what have you, and then moving that data from the data source across your network back to a centralized sort of configuration, as Dez was mentioning. And it works quite well, and a lot of companies getting into the IoT space originally are starting to do it with that model.
The other thing that came along, if you look towards the bottom of the slide, is this idea of taking other traditional data sources, augmenting your IoT data and then at this sort of core, whether your core happens to be a data center or it might be in the cloud, it doesn’t really matter, you would take a product like Statistica and then apply analytics to it at that point and then provide those insights to the consumers off to the right. And I think that this is table stakes at this point. This is something that you have to be able to do, and you have to have an open enough architecture for an advanced analytics platform to talk to all of these, sort of, diverse data sources, all of these sensors and all of these different destinations where you have the data. And I think that you’ll find it to be true that a lot of leaders in the market are able to do these types of things. Here at Statistica we kind of talk about this as core analytics. Go get the data, bring the data back to the core, process it, add more data if necessary or if advantageous, and do your analytics and then share that information for action or for insight.
And so I think, certainly from a function standpoint, we’d probably all agree that, you know, this is the bare necessity and everyone needs to be doing this. Where it starts to get kind of interesting is where you have massive amounts of data, you know, coming from diverse data sources, like IoT sensors, as I mentioned, whether it’s a car or security camera or a manufacturing process; there starts to become an advantage to being able to do the analytic where the data is actually being produced. And the advantage to most people, I think, when we start to move the analytic from the core out to the edge is this capability of defusing some of the data challenges that are happening, and Dez and Robin will probably comment on this at the end today, but I think that you have to be able to monitor and take action on data out at the edge so that it’s not always necessary to move all of that data across your network. Robin talked about this in his, sort of, architecture pictures that he drew up, where you have all these different sources but there’s usually some aggregation point. The aggregation point we see quite often is at a sensor level, but even more often at a gateway level. And these gateways exist as sort of an intermediary in the data flow from the data sources before you get back to the core.
One of the opportunities that Dell Statistica took advantage of is our capability to export a model from our centralized advanced analytics platform and then execute that model out at the edge on a different platform, like a gateway or inside of a database, or what have you. And I think that the flexibility that gives us is really the interesting point of today’s conversation: do you have that in your infrastructure today? Are you capable of moving an analytic to where the data lives versus just always moving the data to where your analytics live? And that’s something that Statistica’s been focusing on for quite some time, and as you look closer at the slides you’ll see that there’s some other technology in there from our sister company, Dell Boomi. Dell Boomi’s a data integration and application integration platform in the cloud and we actually utilize Dell Boomi as a trafficking device to move our models from Dell Statistica, through Boomi and off to edge devices. And we think that this is an agile approach that companies are going to be demanding. As much as they like the version I showed you a minute ago, which is the sort of core idea of moving data from the sensors all the way back to the center, at the same time companies are going to want to be able to do it the way that I’m kind of outlining here. And the advantage of doing this goes to some of the points that both Robin and Dez made, which is, can you make a decision and take action at the speed of your business? Can you move analytics from one place to another and save yourself the time, money, energy and complexity of constantly moving that edge data back to the core?
Now I’m the first to say that some of the edge data will always be of high enough merit where it would make sense to store that data and keep it and bring it back across to your network, but what edge analytics will allow you to do is the ability to make decisions at the speed that the data is actually coming to, right? That you’re able to apply the insight and the action at a speed where the highest possible value is. And I think that that’s something that we’re all going to be looking for when it comes to utilizing advanced analytics and IoT data is this opportunity to move at the speed of the business or the speed that the customer demands. I think our position is, is that I think you need to be able to do both. And I think that pretty soon and very quickly as more companies are looking at more diverse data sets, especially those from the IoT side, they are going to start looking at the vendor space and demanding what Statistica’s capable of doing. Which is to deploy a model at the core, as we’ve done traditionally for many years, or to deploy it on platforms that are maybe perhaps nontraditional, like an IoT gateway, and to actually be able to score and apply analytics to the data at the edge as the data’s produced. And I think that that’s where the exciting part of this conversation comes in. Because by being able to apply an analytic at the edge at the time the data is coming off of a sensor, allows us to take action as fast as we need to, but also allows us to decide, does this data need to go all the way back to the core immediately? Can we batch it here and then send it back in pieces and parts and do further analyzation later? And that’s what we’re seeing a lot of our leading customers do.
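The score-at-the-edge pattern described here can be sketched in a few lines of Python. This is a minimal illustration, not Statistica's or Boomi's actual API: the `EdgeGateway` class, the `toy_model` function, the temperature threshold and the batch size are all assumptions made up for the example. The gateway scores each reading as it arrives, acts immediately on high scores, and holds routine readings in a batch that is shipped back to the core in pieces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EdgeGateway:
    """Hypothetical gateway that scores readings locally and batches the rest."""
    score: Callable[[float], float]   # stand-in for an exported model
    alert_threshold: float            # scores at or above this trigger immediate action
    batch_size: int = 3               # routine readings held before shipping to the core
    batch: List[float] = field(default_factory=list)
    alerts: List[float] = field(default_factory=list)
    shipped: List[List[float]] = field(default_factory=list)

    def ingest(self, reading: float) -> None:
        # Score at the edge, at the moment the data is produced.
        if self.score(reading) >= self.alert_threshold:
            self.alerts.append(reading)      # act now; no round trip to the core
            return
        # Routine data is batched and sent back later for further analysis.
        self.batch.append(reading)
        if len(self.batch) >= self.batch_size:
            self.shipped.append(self.batch)
            self.batch = []


def toy_model(temperature_c: float) -> float:
    """Illustrative scoring function: risk rises linearly above 80 C."""
    return max(0.0, (temperature_c - 80.0) / 20.0)


gateway = EdgeGateway(score=toy_model, alert_threshold=0.5)
for value in [70.0, 72.0, 95.0, 71.0, 69.0]:
    gateway.ingest(value)
```

With these sample readings, only the 95.0 reading is acted on immediately; the first three routine readings leave the gateway as one batch and the last one waits for the next batch.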
The way that Dell Statistica does this is, say for instance you build a neural network inside of Statistica and you want to put the neural network somewhere else in your data landscape. We have the capability of outputting those models in all the languages that you noticed in the right-hand corner there – Java, PMML, C and SQL and so on, we also include Python and we’re able to export our scripts as well – and as you move that off of our platform, which is centralized, you can then deploy that model or that algorithm wherever you need it. And as I mentioned earlier, we use Dell Boomi to put it and park it where we need to run it and then we can bring the results back, or we can help bring data back, or score the data and take action utilizing our rules engine. All of those things become important when we start looking at this type of data.
This is something that most of you on the phone are going to have a need to do because it will become very expensive and taxing on your network, as Dez mentioned, to move data from the left of these diagrams to the right of these diagrams over time. It doesn’t sound like a lot but we’ve seen manufacturing customers with ten thousand sensors in their factories. And if you have ten thousand sensors in your factory, even if you’re just doing these one-a-second sort of tests or signals, you’re talking about roughly eighty-six thousand rows of data from each of those individual sensors per day. And so the data definitely piles up and Robin sort of mentioned that. Up front I mentioned a couple of the industries where we’re seeing people get some pretty interesting things done using our software and IoT data: building automation, energy, utilities is a really important space. We see a lot of work being done on system optimization, even customer service and of course overall operations and maintenance, within energy facilities and within building automation. And these are some use cases that we see are pretty powerful.
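The sensor arithmetic is easy to check. The sketch below assumes exactly one reading per second per sensor and ten thousand sensors, as in the example above; the variable names are illustrative. One reading a second works out to 86,400 rows per sensor per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds in a day

sensors = 10_000                          # sensors in the example factory
readings_per_second = 1                   # assumed sampling rate per sensor

rows_per_sensor_per_day = SECONDS_PER_DAY * readings_per_second
rows_per_factory_per_day = sensors * rows_per_sensor_per_day

print(f"{rows_per_sensor_per_day:,} rows per sensor per day")
print(f"{rows_per_factory_per_day:,} rows per factory per day")
```

At that rate a single factory produces 864 million rows a day before any other data source is added.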
We’ve been doing edge analytics before, I guess, the term was coined. As I mentioned, we’ve got deep roots at Statistica. The company was founded nearly 30 years ago so we’ve got customers going back quite some time that are integrating IoT data with their analytics and have been for a while. And Alliant Energy is one of our use cases or reference customers. And you can imagine the issue an energy company has with a physical plant. Scaling beyond the brick walls of a physical plant is difficult and so energy companies like Alliant are looking for ways to optimize their energy output, basically enhancing their manufacturing process and optimizing it to the highest level. And they use Statistica to manage the furnaces within their plants. And for all of us who go back to our early days in science class we all know that the furnaces make heat, the heat makes steam, the turbines spin, we get electricity. The problem for companies like Alliant is actually optimizing how things heat up and burn within those big cyclone furnaces. And optimizing the output to avoid the extra costs of pollution, carbon displacement, and so on. And so you have to be able to monitor the inside of one of these cyclone furnaces with all of these devices, sensors, and then take all of that sensor data and make changes to the energy process on an ongoing basis. And that’s exactly what Statistica’s been doing for Alliant since about 2007, before even the term IoT was super popular.
To Rebecca’s point early on, the data’s certainly not new. The ability to process it and use it correctly is really where the exciting things are happening. We’ve talked a little bit about health care in the pre-call today and we’re seeing all kinds of applications for folks to do things like better patient care, preventative maintenance, supply chain management and operational efficiencies in health care. And that’s quite ongoing and there’s a lot of different use cases. One that we’re very proud of here at Statistica is with our customer Shire Biopharmaceuticals. And Shire makes specialty drugs for really difficult-to-treat illnesses. And when they create a batch of their medicine for their customers, it’s an extremely expensive process and that extremely expensive process also takes time. When you think about a manufacturing process, as you see, the challenges are unifying all of the data, being flexible enough across different ways of putting data into the system, validating the information and then being able to be predictive about how we help that customer. And the processes that we’re pulling most of the information from are manufacturing systems and of course the devices and sensors that drive these manufacturing systems. And it’s a great use case for how companies are avoiding loss and optimizing their manufacturing processes using a combination of sensor data, IoT data and regular data from their processes.
So you know, a good example of where manufacturing, and especially high-tech manufacturing, are benefiting the health care industry around this type of work and data. I think I’ve got just a couple of other points I’d like to make before I wrap it up and give it back to Dez and Robin. But you know, I think that this idea of being able to push your analytic anywhere within your environment is something that’s going to become extremely important for most companies. Being tethered to the traditional format of ETL-ing data from sources back to central locations will always have a place in your strategy, but shouldn’t be your only strategy. You have to take a much more flexible approach to things today. In order to apply the security I mentioned, avoid the taxing of your network, to be able to manage and filter the data as it comes from the edge, and determine what data is worth keeping for the long term, what data is worth moving across to our network, or what data just needs to be analyzed at the time that it’s created, for us to make the best possible decisions. This everywhere and anywhere analytic approach is something that we take quite to heart at Statistica and it’s something that we’re very proficient at. And it goes back to one of those slides I mentioned earlier, the ability to export your models in a variety of languages, so that they can match up and align with the platforms where the data’s being created. And then of course having a distribution device for those models which is also something that we bring to the table and that we’re very excited about. I think that the conversation today is, if we really are going to get serious about this data that’s been in our systems a good long time and we’d like to find a competitive edge and an innovative angle to utilize it, you have to apply some technology to it that allows you to get away from some of those restrictive models that we’ve used in the past.
Again, my point being that if you’re going to do IoT, I think you have to be able to do it at the core, and bring the data in and match it up with other data and do your analytics. But also, equally as important or perhaps even more important is, you have to have this flexibility to put the analytic with the data and move the analytic out from the central side of your architecture out to the edge for the advantages that I’ve mentioned before. That’s a little bit about who we are and what we’re doing in the marketplace. And we’re very excited about IoT, we think it’s definitely coming of age and there’s great opportunities for everybody here to influence their analytics and critical processes with this type of data.
Rebecca Jozwiak: Shawn, thanks so much, that was a really fantastic presentation. And I know Dez is probably dying to ask you a few questions so Dez, I’ll let you go first.
Dez Blanchfield: I have a million questions but I’ll contain myself because I know Robin will have as well. One of the things I’m seeing far and wide is a question that comes up and I’m really keen to get some insight on your experience in this given that you’re right in the heart of things. Organizations are struggling with the challenge, and look, some of them have just read the likes of Klaus Schwab’s “The Fourth Industrial Revolution” and then had a panic attack. And for those that aren’t familiar with this book, it’s essentially an insight by a gentleman, Klaus Schwab, who I think is a professor, who is the founder and Executive Chairman of the World Economic Forum from memory, and the book’s essentially about this whole ubiquitous internet of things explosion and some of the impact on the world in general. Organizations I’m talking to are unsure whether they should go and retrofit their current environment or invest everything in building an all-new environment, infrastructure and platforms. At Dell Statistica, are you seeing people retrofit current environments and deploy your platform out into existing infrastructure, or are you seeing them shift their focus to building all-new infrastructure and prepare for this deluge?
Shawn Rogers: You know, we’ve had the opportunity to serve both types of customers and being in the market as long as we have, you get those opportunities to kind of go wide. We have customers that have created brand new fab plants in the last couple of years and outfitted them with sensor data, IoT, analytics from the edge, end to end throughout that process. But I would have to say that most of our customers are people that have been doing this type of work for a while but have been forced to ignore that data. You know, Rebecca made the point right up front – this isn’t new data, this type of information has been sort of available in a lot of different formats for a very long time, but where the problem had been is connecting to it, moving it, bringing it someplace where you could get something smart done with it.
And so I would say that most of our customers are looking at what they have today, and Dez, you made this point before, that this is part of that big data revolution and I think what it’s really about is, is it’s about the all data revolution, right? We don’t have to ignore certain system data or manufacturing data or building automation data anymore, we now have the right toys and tools to go get it and then to do smart things with it. And I think that there’s a lot of drivers in this space that are making that happen and some of them are technological. You know, the big data infrastructure solutions like Hadoop and others have made it a little bit less expensive and a little easier for some of us to think about creating a data lake of that type of information. And we’re now looking around the enterprise to go, “Hey, we’ve got analytics in our manufacturing process, but would they be enhanced if we could add some insight from these processes?” And that’s, I think, what most of our customers are doing. It’s not so much creating from the ground up, but augmenting and optimizing the analytics they already have with data that’s new to them.
Dez Blanchfield: Yeah, there’s some exciting things coming through at some of the biggest industries we’ve seen, and you mentioned, the power and utilities. Aviation is just going through this boom where one of my all-time favorite devices that I talk about regularly, the Boeing 787 Dreamliner, and certainly the Airbus equivalent, the A330 has gone down the same route. There was like six thousand sensors in the 787 when it was first released, and I think they’re now talking about fifteen thousand sensors in the new version of it. And the curious thing about talking to some of the folk who are in that world was that the idea of putting sensors in the wings and so forth, and the amazing thing about 787 in a design platform is that, you know, they reinvented everything in the airplane. Like the wings, for example, when the airplane takes off the wings flex up to twelve and a half meters. But in extremes the wings can flex at the tip up to 25 meters. This thing looks like a bird flapping. But what they didn’t have time to get fixed was the engineering of the analytics of all this data, so they have sensors that make LEDs flash green and red if something bad happens, but they don’t actually end up with deep insights in real-time. And they also didn’t solve the problem of how to move the volume of data around because in the domestic airspace in the U.S. on a daily basis there’s 87,400 flights. When every airplane catches up with its buyoffs of a 787 Dreamliner, that’s 43 petabytes a day of data, because these airplanes currently create about half a terabyte of data each. And when you multiply that 87,400 flights a day domestically in the U.S. by point five or half a terabyte, you end up with 43.5 petabytes of data. We physically can’t move that around. So by design we’re having to push the analytics out into the device.
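The volume Dez describes can be checked with a quick calculation. This is a rough sketch that takes the quoted figures at face value (87,400 domestic flights a day, half a terabyte per flight) and assumes decimal units, 1 PB = 1,000 TB; the variable names are illustrative.

```python
flights_per_day = 87_400        # figure quoted in the discussion, not verified here
terabytes_per_flight = 0.5      # figure quoted in the discussion, not verified here
TB_PER_PB = 1_000               # decimal petabyte (assumption)

total_tb = flights_per_day * terabytes_per_flight
total_pb = total_tb / TB_PER_PB

print(f"{total_tb:,.0f} TB per day, about {total_pb:.1f} PB per day")
```

That comes to 43,700 TB, or about 43.7 PB a day in decimal units, in line with the roughly 43 petabytes quoted.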
But one of the things that is interesting when I look at this whole architecture – and I’m keen to see what you think about this – is we’ve moved towards the master data management, sort of, first principles of data management, pulling everything into a central location. We’ve got data lakes, and then we create little data ponds if you like, extracts of that that we do analytics on, but by distributing to the edge, one of the things that keeps coming up, particularly from database people and data managers or people in the business of managing information, is what happens when I’ve got lots of distributed little miniature data lakes? What kind of things have been applied to this thinking with regard to edge analytics in your solution, in that, traditionally, everything would come centrally with the data lake, now we end up with these little puddles of data everywhere, and even though we can perform analytics on them locally to get some local insight, what are some of the challenges that you’ve faced and how you’ve solved that, having that distributed data set, and particularly when you get the microcosms of data lakes and distributed areas?
Shawn Rogers: Well I think that’s one of the challenges, right? As we go away from, you know, trucking all the data back to the center location or the core analytic example that I gave and then we do the distributed version is that you do end up with all these little silos, right? Just as you depicted, right? They’re doing a little bit of work, some analytics are running, but how do you bring them back together? And I think that the key is going to be orchestration across all of that and I think that you guys will agree with me, but I’m happy if you don’t, that I think that we’ve been watching this evolution for quite some time.
Going back to the days of our friends Mr. Inmon and Mr. Kimball, who helped everybody with the architecture of their early data warehouse investments, the point is that we’ve been moving away from that centralized model for a long time. We have adopted this new idea of allowing the data to demonstrate its gravity for where it best should reside inside of your ecosystem and aligning the data with the best possible platform for the best possible outcome. And we’ve started to kind of take, I think, a more orchestrated approach to our ecosystem as an overarching sort of way of doing things, where we’re trying to align all those pieces at once. What type of analytic or work am I going to do with the data, what type of data is it, that’ll help dictate where it should live. Where is it being produced and what type of gravity does the data have?
You know, we see a lot of these big data examples where people are talking about having 10- and 15-petabyte data lakes. Well if you have a data lake that is that big, it’s very impractical for you to move it and so you have to be able to bring analytics to it. But when you do that, to the core of your question, I think it raises a lot of new challenges for everybody to orchestrate the environment and to apply governance and security, and understand what needs to be done with that data to curate it and to get the highest value out of it. And to be honest with you – I’d love to hear your opinion here – I think we’re early days there and I think there’s a lot of good work yet to be done. I think programs like Statistica are focusing on giving more people access to data. We’re definitely focused on these new personas like the citizen data scientist who wants to drive predictive analytics to places within the organization that it might not have been before. And I think that those are some of the early days around this, but I think the maturity arc is going to have to demonstrate a high level of orchestration and alignment between these platforms, and an understanding of what’s on them and why. And that is an age-old problem for all of us data folks.
Dez Blanchfield: Indeed it is and I completely agree with you on that and I think the great thing we’re hearing here today is at least the front end of the problem of actually capturing the data at, I guess, gateway level at the edge of the network and the ability to do analytics at that point is essentially solved now. And it now frees us up to actually start thinking about the next challenge, which is distributed data lakes. Thank you very much for that, it was a fantastic presentation. I really appreciate the chance to chat with you about it.
I’m going to pass to Robin now because I know he has, and then Rebecca’s also got a long list of great questions from the audience after Robin. Robin?
Dr. Robin Bloor: Okay. Shawn, I’d like you to say a little bit more and I’m not trying to give you the chance to advertise it, but it is actually very important. I’m interested in knowing at what point in time Statistica actually generated the model export capability. But I also, I would like you to say something about Boomi because all that you’ve kind of said so far about Boomi is that it’s ETL, and it is indeed ETL. But it actually is quite capable ETL and for the kind of timings we’re talking about, and some of the situations we’re discussing here, that’s a very important thing. Could you speak to those two things for me?
Shawn Rogers: Sure, yeah, I absolutely can. You know, our movement in this direction was certainly iterative and it was sort of a step-by-step process. We’re just getting ready this coming week to launch Version 13.2 of Statistica. And it has the newest updates of all of the capabilities that we’re talking about today. But going back to Version 13, a year ago October, we announced our capability to export models from our platform, and we called it NDAA at the time. The acronym stood for Native Distributed Analytics Architecture. What we did is we put a lot of time, energy and focus in on opening up our platform with the opportunity to use it as a central command center for your advanced analytics, but also to deploy from there. And the first places, Robin, that we deployed we did a really, really great addition to the platform around machine learning. And so we had the ability to deploy from Statistica to Microsoft’s Azure Cloud to use the power of Azure to power machine learning, as you know, is very intensive and it’s a great way to utilize cloud technologies. And so that was the first bit.
Now here we were exporting our models to Azure and using Azure to run them and then sending the data, or the results, back to the Statistica platform. And then we moved on to other languages that we wanted to be able to export from, and of course one of them being Java opens up the door for us to now start to export our models outward to other locations like Hadoop, so then it gave us a play there as well.
And lastly we focused on being able to output our models with that release into databases. And so that was the first iteration and to be honest with you, the end game was IoT but we weren’t quite there yet with Version 13 last October. Since then we have gotten there and that has to do with the ability to do all of the things I just mentioned, but then to have some sort of transportation device. And going back to Dez’s question of, you know, what’s the challenge and how do we do this when we have all of these analytics running around? Well we use Boomi as kind of a distribution hub and so because it’s in the cloud and because it is so powerful, as I mentioned before, it’s a data integration platform, but it’s also an application integration platform, and it uses JVMs to allow us to park and do work anywhere that you can land a Java virtual machine. That’s what really swung the door open for all of these gateways and edge computing platforms and edge servers, because all of them have the compute and the platform that’s available to run a JVM in. And because we can run the JVM anywhere, Boomi has turned out to be a wonderful distribution and, using my word from earlier, an orchestration device.
And this is getting really important because we’ve all, you know, I think the airplane scenario a minute ago was a great one, and I mentioned, you know, manufacturers like Shire who have ten thousand sensors in one of their factories, you have to start addressing the sort of central approach to advanced analytics at some point. Being ad hoc about it doesn’t really work anymore. It used to when the volume of models and algorithms that we were running was minimal, but now it’s at maximum. There’s thousands of them in an organization. So we have, part of our platform is server based and when you have our enterprise software you also have the ability to tweak and score and manage your models across the environment. And that’s also part of that orchestration thing. We needed to have a layer, Robin, in place that not only allowed you to get a model there in the first place, but also gave you a conduit to tweaking the models and replacing them on an ongoing basis as often as you needed, because this isn’t something you can do manually. You can’t walk around a refinery with a thumb drive trying to upload models to gateways. You have to have a transportation and management system in between it, and so the combination of Statistica and Boomi gives that to our customers.
Dr. Robin Bloor: Yeah. Well I’ll be very brief but, you know, this statement that was made before about the data lake and the idea of accumulating petabytes in any given place, and the fact that it has gravity. You know, when you started talking about orchestration it just started to get me thinking about the very simple fact that, you know, putting a data lake that’s very large in one place probably means you actually have to back it up and it probably means that you have to move a lot of the data around anyway. You know, the real data architecture is much more, in my opinion anyway, much more in the direction that you’re talking about. Which is to distribute it to sensible places, is probably the thing that I would say. And it looks like you’ve got a very good capability to do this. I mean, I’m well briefed on Boomi so it’s kind of, in one way or another, almost unfair that I can see it and maybe the audience can’t. But Boomi is so essential, in my view, in terms of what you’re doing because it has application capabilities. And also because the truth of the matter is that you don’t do these analytic calculations without wanting to action something somewhere for some reason or another. And Boomi plays a part in that, right?
Shawn Rogers: Yeah, absolutely. And so as you know from previous conversations, Statistica has a full-blown business rules engine in it. And I think that that’s really important when we get down to why we do this. You know, I joked up front that there’s really no reason to do IoT at all unless you’re going to analyze, utilize the data to make better decisions or take actions. And so what we focused on wasn’t just being able to put the model out there but being able to tag along with it, a rule set. And because Boomi is so robust in its capabilities to move things from one place to another, within a Boomi atom we can also embed the capability to trigger, to alert and to take action.
And so that’s where we start to get that sort of sophisticated view of IoT data where we say, “Okay, this data is worth listening to.” But really, you know, knowing that “the light is on, the light is on, the light is on, the light is on” isn’t as interesting as when the light goes out or when the smoke detector goes off or when whatever happens to our manufacturing process goes out of spec. When that occurs we want to be able to take immediate action. And the data becomes almost secondary here at this point. Because it’s not so important that we saved all of those “it’s okay, it’s okay, it’s okay” signals, what’s important is that we notice the “Hey, it’s bad” and we take immediate action. Whether it’s sending off an email to someone, or getting domain expertise involved, or whether we set off a series of other processes to take immediate action, whether that be corrective or in response to the information. And I think that that’s why you have to have this orchestrated view of it. You can’t just focus on dealing your algorithms all over the place. You have to be able to coordinate and orchestrate them. You need to be able to see how they’re performing. And really, most importantly, I mean, why the heck would you do this if you can’t add the opportunity to take some immediate action against the data?
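The idea that the repeated "it's okay" signals matter less than the moment something goes out of spec can be sketched as a simple transition rule. This is an illustrative example only, not the Statistica rules engine: the function name `detect_transitions` and the spec limits are assumptions. It reports a reading only when the process moves from in spec to out of spec, which is the point where an email or a corrective process would be triggered.

```python
from typing import Iterable, List, Tuple


def detect_transitions(
    readings: Iterable[float], low: float, high: float
) -> List[Tuple[int, float]]:
    """Return (index, value) for each reading that newly goes out of spec."""
    events: List[Tuple[int, float]] = []
    was_ok = True  # assume the process starts in spec
    for index, value in enumerate(readings):
        is_ok = low <= value <= high
        if was_ok and not is_ok:
            # The interesting moment: in spec a moment ago, out of spec now.
            events.append((index, value))
        was_ok = is_ok
    return events


events = detect_transitions([5.0, 5.1, 7.2, 7.3, 5.0, 2.0], low=4.0, high=6.0)
for index, value in events:
    print(f"reading {index} went out of spec: {value}")
```

Here the second out-of-spec reading (7.3) is not reported again, since the alert for that excursion has already fired; only the two transitions are surfaced.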
Dr. Robin Bloor: Okay, Rebecca, I believe you’ve got questions from the audience?
Rebecca Jozwiak: I do. I have a ton of audience questions. Shawn, I know you didn’t want to hang on too long past the top of the hour. What do you think?
Shawn Rogers: I’m happy. Go ahead. I can answer a few.
Rebecca Jozwiak: Let’s see. I know one of the things you mentioned was that the IoT is in early days and it does have a degree of maturity that is going to have to take place, and it kind of speaks to this question one attendee asked: is the IPv6 framework going to be robust enough to accommodate the growth of IoT in the next five or ten years?
Shawn Rogers: Oh, I’m going to let Dez echo off of my answer because I think he’s closer to this type of information than I am. But I’ve always been of the thought that we’re on a very fast track to bend and break most of the frameworks that we have in place. And while I think the addition of that new sort of spec or the direction that we’re going with IPv6 frameworks is important, and it opens up the door for us to have a lot more devices, and to be able to give everything that we want to give an address, I think that everything I’m reading and seeing with my customers, and the number of addresses that are required, is at some point going to cause another shift in that landscape. But I’m not really a networking expert so I can’t say a hundred percent that we’re going to break it at some point. But my experience tells me that we’re going to disrupt that model at some point.
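For a sense of scale on the addressing question, the raw sizes of the two address spaces can be computed directly. This sketch only counts addresses (32-bit for IPv4, 128-bit for IPv6); it says nothing about how addresses are allocated in practice or about any other limit of the protocol, and the variable names are illustrative.

```python
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:,} addresses")
print(f"IPv6 is 2**{IPV6_BITS - IPV4_BITS} times larger")
```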
Rebecca Jozwiak: I wouldn’t be surprised. I think frameworks are kind of breaking under the weight of all kinds of things. And that’s just logical, right? I mean, you can’t send an email with a typewriter. Another attendee is asking, “Can you use a Hadoop framework?” but I guess I might change that to say, how would you use a Hadoop framework for distributed analytics?
Shawn Rogers: Well, Robin did me the favor of asking me a historical question and so since Version 13 about a year ago for Statistica, we’ve had the ability to drive models out of our system and into Hadoop. And we work very closely with all the big flavors of Hadoop. We’ve got really great success stories around the ability to work with Cloudera as one of the main Hadoop distributions that we work with. But because we can output in Java, it gives us this ability to be open and place our analytics anywhere. Placing them into a Hadoop cluster is something that we do on a normal and regular and everyday basis for many of our customers. The short answer is yes, absolutely.
Rebecca Jozwiak: Excellent. And I’m just going to throw one more out at you and let you get on with your vacation. Another attendee is asking, with IoT analytics plus machine learning, do you think all data needs to be stored for historical purposes and how will that impact the solution architecture?
Shawn Rogers: Well, I don’t think that all data has to be stored. But I do think it’s very interesting to have the ability to entertain, listen to any data source that we want within our organization, wherever it comes from. And I do think that the changes that we’ve seen in the marketplace over the last few years have enabled us to take that all-data approach to things, and it seems to be really kind of paying off. But it’s going to be different for every company and every use case. You know, when we’re looking at health data, now there’s a lot of regulatory issues, a lot of compliance issues to be concerned about, and that makes us save data that other companies might not understand why it needs to be saved, right? In the manufacturing processes, for a lot of our manufacturing customers, there’s a real upside to be able to historically examine your processes and be able to look back on large amounts of this data to learn from it and to build better models from it.
I do think that a lot of the data will need to be kept and I do think we’ve got solutions that make that more economical and scalable today. But at the same time I think every company will find value in data that they don’t have to keep in an atomic level, but they will want to analyze in a real-time sort of way and make decisions on it to drive innovation within their company.
Rebecca Jozwiak: Okay, good. Now, audience, I did not get to everybody’s questions today, but I will forward them along to Shawn so he can reach you directly and answer those questions. But thank you everybody for attending. Thanks so much to Shawn Rogers from Dell Statistica and to our analysts, Dez Blanchfield and Dr. Robin Bloor. You can find the archive here at insideanalysis.com and on SlideShare, where we’ve been starting to put our stuff back up again, and we are revamping our YouTube so look for that there as well. Thanks so much folks. And with that I’m going to bid you farewell and we’ll see you next time.