Rebecca Jozwiak: Ladies and gentlemen, hello, and welcome to Hot Technologies of 2016. Today's topic, "Application Running Slowly? Time to Get Precise." And don't we all know too well the problems that can happen when stuff is running slowly? This is Rebecca Jozwiak, I am filling in for Eric who is kind of doing a new role here, today. Yes, this year is hot and, you know, when it comes to technology, like I said, the one thing you really don't want is a slow running anything, any part of your system. And just to kind of use a consumer example, I mean if you have a restaurant, it doesn't matter how great the food is, if the service is slow, you're probably not going to end up going back. Now, it's easy, kind of, in a restaurant to figure out why something's running slowly. Maybe the kitchen is short staffed or there was a malfunction with some equipment, or maybe the wait staff is a little lazy, and it's kind of easy to identify and get that fixed.
But when you think about a data center, it's a completely different story. It could be a network issue, a bad query that's jamming things up, application performance, or a faulty cable can even cause some problems. And troubleshooting with that type of complexity can be, you know, difficult at best. That's kind of what we're going to be talking about today. And we've got, as I said, Eric Kavanagh chiming in as analyst today. We've got Dez Blanchfield our data scientist, and we have Bill Ellis from IDERA, who's going to talk about his company's solution that helps with application performance management. And with that, I'm going to pass the ball over to Eric. Eric, the floor is yours.
Eric Kavanagh: Alrighty, sounds good, folks. And that was a great analogy, actually, because you spoke to the difficulties or ease with which troubleshooting can be accomplished and you get right down to it. Performance issues always result from some kind of problem that's in the network. I mean, it could be as simple as old hardware for example, but the bottom line is any situation like that calls for troubleshooting. That's what I'm going to talk about today. And let's go ahead and jump on the slides here.
Here comes trouble. Troubleshooting – it's fun for people who like it, that's the cool thing. If you find someone who likes to do troubleshooting, hold on to that person, get them some tools to get the job done, because it's really good stuff if you can find someone who can get to the bottom of something and get it done. But the bottom line is that troubleshooting is problematic, and it always has been and it always will be, and if you start talking about troubleshooting, what you're really getting at is root cause analysis. What is causing the problem?
Well, if you just sit back and think for a second about even the mainframe days, there were all kinds of issues that could occur. And back then you had to have people who really knew their stuff because there weren’t even good tools to do troubleshooting, so you really had to know your command prompt, and we'll talk about that in a second. And I actually forgot to put in one of my favorite slides, I'll look for it while we're on the show today, maybe during Dez's presentation. But I wanted to show, for anyone who hasn't seen it, one of the funniest British TV shows ever, it's called “The IT Crowd.” And in terms of troubleshooting, the Irish man, who is one of two IT people in the whole company, always says the same thing whenever any call begins, “Have you tried turning it off and on again?” So, do try turning it off and on again. You'd be amazed how often that simple thing can solve some problems.
Those of you who've done troubleshooting at home, maybe with your parents or friends (probably not with your kids, because they tend to know what to do): turn it off and on again. But regardless, troubleshooting is not easy, it's not ever going to be easy, but we're going to talk today about some of the things you can do to make it easier. So, the command prompt – yes, indeed, I'm old enough to remember the early days of computing when all you had was the command prompt: type DIR, hit Enter. That's all that would do, show you a directory of files, and you'd feel positive that it actually got some command done, right? Dez, of course, our data scientist, he knows how to use the command prompt. And if you can use the command prompt, that is great stuff, because most of us mere mortals use some kind of a GUI, a graphical user interface, but there's always something, there's always some disconnect between the GUI and the command line underneath. And just to give you a random example, if you want to know how much code some of the basic programs out there bake into documents these days, go into the latest version of Microsoft Word, type "hello world" and then do "save as HTML." Then open up that resulting document in a text editor, and you'll probably see pages and pages of tags. That's called code bloat, and code bloat is not really good for troubleshooting, just to be blunt.
Of course, client-server came along and that was great stuff. And in a way we're kind of going back in that direction, but just think about the complexity that came with the situation: now where is the problem? Is it on the client, is it on the server, is it the network? Where is it? Then just think about viruses, and when a virus can get onto a network, what can happen? It can go anywhere. Data breaches are crazy these days. They cause performance problems. We've had Russian hackers we can identify by the IP address. We're pretty sure they're Russian, or they're very close, or they're very clever Ukrainians or Polish or even Americans, using proxies. But we've had hackers come into our little old site, Inside Analysis, over the years and cause all kinds of issues. Stuff just stops working, you can't get stuff done. Stuff that used to work doesn't work. How do you know? How do you know what it is? Just as another example, a modern website is a very complex environment; it's very difficult to get into the weeds and really understand how things are taking place and working for us, especially if you've got a whole bunch of plug-ins. Stuff can go crazy pretty quickly. I'm kind of getting ahead of myself.
I threw in here: always be wary of the upgrade. Upgrades always scare the daylights out of me. Certainly operating systems. I remember the days when Microsoft would actually suggest that, yes, you could upgrade your operating system from this version to that version. Well, I tried a few times, and that never, ever worked. Just remember, the larger and more complex an environment is, the more unwieldy the situation is going to become. And then there's virtualization. Think about what VMware did to IT. It revolutionized IT, but it also created this layer of abstraction. If you've got a layer of abstraction at that foundational level, that's a whole new ball game, a whole new ball of wax, and you really have to reassess what you're doing, and all the old tools had to change. And now of course it's the cloud, right? For the customer, the cloud is great, because it's very simple, the user interface is pretty straightforward, but of course you don't really have a lot of control over the cloud. But for the folks who are behind the scenes, there's a whole lot of stuff that they need to know and understand these days. The environment has become much, much more complex. And certainly with e-commerce, think of all the money that trades hands these days. That's why you will not find me in favor of a cashless society anytime soon. The bottom line here is that the situation is getting more problematic by the day.
And keeping performance optimal is always going to involve some element of troubleshooting. I don't care what anyone tells you, there's no perfect tool, there's no silver bullet and there never will be, because – in another interesting perspective here – we're still learning to speak silicon. We're still learning to understand how even networking works at the nitty-gritty level. If you look at systems management software, it's getting pretty good these days. But still, you're looking at lines going up and down, you're looking at representations of reality, and it's going to take a person who knows what's going on, even staring at the most optimal tools, to fit together the clues and understand what's working and what isn't, and it's a lot of trial and error, just to be blunt. With that, I'm going to hand it over to Dez Blanchfield, and then we'll hear from Bill Ellis of IDERA, who's going to put us to shame with his knowledge. With that, Dez, take it away.
Dez Blanchfield: Hey, thanks Eric. Thank you. That led nicely into my little segue. My title, "Performance Art," I think is extremely apt in the context of what we're chatting about today, because in many ways when we think about performance art, we think about dancing and music and other creative things. And frankly, more often than not, if we're solving problems in very large-scale IT environments and business systems, there is indeed an element of art, and often black art, because the situation in my experience of some 25-plus years is that modern app stacks are increasing in complexity at a rate that we have never seen before. And we're frankly struggling to keep up, and there are organizations such as Uber, for example, and the Pokémon Go development team; I mean, they're experiencing growth and increase of complexity at rates that are just astronomical. There aren't even books written about it, because we hadn't conceived of that level of growth. My view is that the core definition of an application stack has morphed exponentially, and I'm going to explain why I think that's the case, and then lead into the challenge at hand, which my good friends at IDERA appear to have a solution to solve.
Very briefly, we all know these, but just to recap them: in the early days we had what I call application architecture version 1.0. It was a server computer, in this case the mainframe, with a bunch of terminals attached. It was relatively easy to diagnose issues: if you weren't seeing things on the terminal, you could track down the cable between the terminal and the server computer, and it was either the cable or a connector or some such issue; and if the issue was not related to the terminal, and you were seeing things on the screen, it was pretty easy to work out that the stuff causing the issues was in the machine itself. And you could slowly diagnose where in the stack that was, from the hardware all the way up through the software layer and the user interface. In what I call version 1.1, we made it a little bit more complex. We put devices in the middle so we could put more terminals in place. They were some sort of communications device, often muxes or multiplexers, and they would run over either a dedicated line or a dial-up line, and so you had a mainframe at a distant location – it could be interstate or international – and some device connected over an SNA link or some sort of WAN connectivity, and those terminals still operated in the same way. But you had a little bit more complexity, because you had to figure out whether the issue was between the terminals and the comms device, or between the comms device and the mainframe. But the stack stayed relatively similar in the mainframe.
Version 1.2, a little bit more complex again, because now we added more devices, we added printers and other things, and we clustered these things; think of a front-end processor that would handle all the issues of the devices locally – printers, terminals and so forth – with the mainframe at the distant end. A little bit more complexity. But again, the consistent theme was the mainframe with the apps running on it, so the problem-solving stayed fairly similar inside the application stack. And then we had people with skills around sorting out issues with terminals and printers and cluster controllers. But then we complicated things: we built networks, and all of a sudden the same sort of architecture introduces a network layer. All of a sudden we had a network switch, and workstations were a lot more complex. And in this version of the architecture we often had graphical user interface apps at the workstation. Not only did we have a server running the app stack, but we also had another stack of applications running locally, and of course the same basic model of devices connecting to a server. Then we took a quantum leap to the more recent model, what I call 2.1, which is where we took that app stack and made it a lot more complex, a lot harder to diagnose. We introduced a lot more devices at the front end – web browsers, PCs, mobile devices and so forth. And here the application stack started to dive a little bit deeper into integration with the operating system and the hypervisor.
In this image here on the right-hand side, we've got the whole stack, including network infrastructure, storage, servers, virtual machines, the operating system, and then the traditional three tiers of database, middleware and applications at the front. Diagnosing application issues and performance issues on this model just became a lot harder. There are so many more moving parts, and trying to drill down through that stack just became a nightmare, and you had to involve additional skill sets and parts of the organization to deal with that. It wasn't just your application team anymore; all of a sudden you had infrastructure people, you had database specialists working purely on databases and nothing else – as opposed to a systems programmer who knew their way around databases. Now we've got a scenario where IT departments have to deal with the significantly broader complexity of "as a service," and this is where the world just exploded and our problem-solving challenges went from being a nightmare to something that's almost intolerable in some ways.
And this came about because of the scale at which we're trying to deliver services. Version 3 of what I consider the application stack has introduced this as-a-service model. In the traditional model on the left-hand side, the enterprise IT stack, everything had to be managed at our end, as both the consumer and the supplier of services – from the application, security, database, operating systems, virtualization, servers, storage, networking and data centers – we had to manage it all, but we had access to it all, and so we could scale out our capability and technical skill sets, and we could drill right down through that stack and find things. But as the infrastructure-as-a-service, platform-as-a-service and software-as-a-service models came along, all of a sudden our access to the back-end infrastructure, our access to the platforms and the tools we delivered services from, was kind of taken away from us. As we started to consume infrastructure as a service, we only really had the top four pieces – the operating system, the database, the security environment and the application stack and above – available to us. Everything under that was black magic. And it gets even more interesting when you move to platform as a service, because then you're just managing the application stack.
When you get to software as a service – and the traditional model of that is webmail or internet banking – all you have is access to a web browser, so trying to diagnose what's behind that is intolerable, definitely. And I've broken this up into time zones, into slots of time or areas of time if you like, or generations, from left to right: we've gone from the pre-2000s and the traditional stack, where we had access to the entire environment and could drill down through it, through the early 2000s, the mid-2000s and the late 2000s to the current day, where we've gone from infrastructure as a service, to platform as a service, to software as a service, to what we're now essentially referring to as business as a service. And the complexity has increased dramatically. There are so many more moving parts. But the availability of skills gets harder and harder and more and more difficult to avail ourselves of: finding people with the right skill sets, with the right access to the right tools, to dive into this stack and find out where something is running slow. Is it my laptop or my desktop, is it my phone or my tablet, is it my connectivity over 3G or 4G, or my dedicated link with ADSL, or ISDN, whatever it might be? Or even dial-up, although that is less and less the case these days. Is it the web server end, is it something inside the web server? Is it the app server? Is it something around the memory, disk, CPU and network performance inside the application server? Is it the database running in there?
And you can imagine, you draw this picture very quickly, of the complexity that starts to expand kind of like a big-bang image, this ever-increasing bubble that we're trying to get our arms around and have the skills to dive into, and the knowledge and wherewithal to dissect and pull apart. And we're very much now at the era where, you know, human beings can't cope with the physical scale, even if you've got the capability to pull the database environment apart, pull that database apart and dive into the detail within that database. The number of databases you have to manage now is growing rapidly. Everything is now powered by a database. Very few applications these days are not powered by a database. And the types of databases are growing rapidly as well. It's not just the traditional SQL databases anymore; sometimes it's SQL, sometimes it's non-SQL, sometimes it's a graph database, sometimes it's a document database. And there are all these different types of functions that these different types of databases have, and as a result each of them has different performance challenges and different performance criteria. Logging databases and document databases perform very, very differently, and perform a different function, to a traditional ACID-compliant, ANSI 92-compliant SQL database. And so do the types of things that we store in there.
We're at a point, in my mind – and I think Eric alluded to this – where human beings are struggling to keep up with the complexity of what we're building and the speed at which we're building it, and we're now at the point where the only way for us to manage this infrastructure, and the only way to monitor and delve into the issues we're facing, is with tools, and the right types of tools. And then, invariably, the right generation of tools. Tools that actually understand the back-end infrastructure. It isn't OK anymore just to throw an SQL monitor or an SQL query tool at something and start to pull apart a query and see what makes it work. We actually need a tool that understands the formation of queries, the appropriate way to form queries, and the appropriate ways for queries to talk to the infrastructure at the back end, and how they're performing as they do that. And to look at the timing of those interactions and the order in which they take place.
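To make that idea concrete, here is a minimal sketch of what "understanding how a query will run" looks like in code, using Python's built-in sqlite3 module; the table and index names are illustrative only, and real engines expose the same idea through their own EXPLAIN facilities.

```python
import sqlite3

# A "query-aware" check inspects how a query will actually execute,
# rather than just running it and timing the result after the fact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_customer ON orders (customer)")

def explain(sql):
    """Return the engine's plan steps (SQLite's EXPLAIN QUERY PLAN detail column)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# An indexed lookup versus a predicate that defeats the index:
plan_ok = explain("SELECT * FROM orders WHERE customer = 'acme'")
plan_bad = explain("SELECT * FROM orders WHERE customer LIKE '%acme%'")
print(plan_ok)   # the plan uses idx_customer
print(plan_bad)  # the plan falls back to scanning the table
```

A performance tool built on this principle compares the plan it sees against what the schema makes possible, instead of only reporting elapsed time.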
And that's a much more complex challenge, and it leads me to my wrap-up point, which is this: as the complexity of the app stacks we're developing increases, the performance tools and the tools that we use to manage those necessarily need to become increasingly smarter and much more capable of looking at more things. But also a lot smarter at how they delve into what's running at the back end, what they can discover about it, and potentially even some sort of analytics being performed over that, to understand the interactions, how the performance is being delivered, and why it's performing slower or faster.
And then with that I'm going to pass to our dear friend from IDERA, Bill Ellis, and see what he has to say today about how they solve this issue. Bill, over to you.
Bill Ellis: Alright. My name is Bill Ellis and thank you very much. We're going to talk about "my application is running slowly, time to get Precise." Let's see what Precise, an IDERA product, can do and how it can help you. A lot of times you only find out that there's been a performance problem because an end user has called you, and that's really a big problem in itself: out of everybody in IT, nobody knew until the phone rang. Now, the next big problem is how do we help this particular individual, and it's really not a trivial problem. There's one takeaway from this, above and beyond this slide and above and beyond the others, and I want you to see if you can get what it is. But, as we had mentioned, an application relies on a lot of different technologies; the application stack is tall and growing. And many people access an application via a browser, and surprisingly there's more and more processing happening in the browser with scripting, etc., and then of course you have the network, the web server, the business logic code and the database. What I want you to consider is that every significant business transaction interacts with the database: whether it's time card reporting, inventory lookup or a purchase order, the database is being updated. And so the database becomes really the foundation of performance. And the database, of course, relies downstream on storage. Each of these technologies is tightly coupled, and being able to see what's happening – to know what's going on so you can measure it – is critical.
Now, one thing that we find is that many of our customers have a tool, and they have a tool for each technology, but what they don't have is context. And context is basically the ability to connect the dots between every tier in the application stack, and this is actually relatively simple. We used to have a limitation of twelve tiers, but we basically changed it: we have unlimited tiers and we support mixed environments, so we can basically handle extremely complicated environments with the Precise solution.
Now, at a high level, this is how we solve the problem: by focusing on the transaction, the end-user transaction, from click to disk. It tells us which ones are running slowly and which ones are consuming resources, but the key is this – we allow you to pick up the user ID and their location, and not only the entire transaction time, but how much time is spent at each individual step. Time is the currency of performance, and it also shows where resources are being consumed. We don't know up front where the problem is going to be, so we need to have adequate metrics and analytics at each of the tiers to be able to diagnose what the problem is, and where it might be.
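The "how much time is spent at each individual step" idea can be sketched very roughly in a few lines of Python: wrap each tier of a transaction in a timer, then attribute the total response time per tier. The tier names and sleeps below are illustrative only; a real product instruments the tiers without code changes.

```python
import time
from contextlib import contextmanager

# Accumulate wall-clock time per tier of a simulated transaction.
timings = {}

@contextmanager
def tier(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Simulated click-to-disk transaction:
with tier("web"):
    time.sleep(0.01)
with tier("app"):
    time.sleep(0.02)
with tier("database"):
    time.sleep(0.03)

total = sum(timings.values())
for name, spent in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {spent:.3f}s ({100 * spent / total:.0f}%)")
```

Sorting the breakdown by time spent is what lets the slowest tier "bubble up," which is exactly the triage step described in the rest of this section.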
Now, in today's presentation I'm going to focus on this area, and I want you to rest assured that we basically provide the same level of visibility at every tier in the application stack. The crucial thing is that this is going to tell us who, what and where, and then this part is going to tell us why. And it's really the why that is absolutely critical to solving problems, not just knowing about them. Now, the other thing that came out very clearly in the presentation was that it's impossible to do this manually. You need automation. And automation means that you have alerting, you have something that tells you, hopefully before the end-user community notices: you have an ongoing trend built up, with deviation-from-trend alerting. And then we also offer a line in the sand, for when you actually breach the SLA. Now, we offer a lot of different information – not everybody needs to consume the buffet, some people just want to have a light snack, a salad – and so with that we offer a portal where we can publish just the information that a particular user or a particular community needs about performance. The application is running slowly, it's time to get Precise. We're really going to focus on four things. One is the location, including the end user. Another is that context that connects the dots. And the third part: research shows that nearly 90 percent of application issues are in the database, and so it's really kind of a travesty that the majority of performance solutions might tell you one SQL statement, but they don't tell you why that SQL statement is running slowly.
And so, the why is always the crucial thing, and Precise is excellent at showing why, for every tier and in particular the database. And just to share a little bit about our support matrix with you: we support SQL Server, Sybase, DB2 and Oracle. The look and feel of the solution is very similar, so if you're looking at multiple applications with slightly different architectures, the information I'm sharing here has the same look and feel and approach, no matter what the underlying technologies in use happen to be. Precise is web enabled. We come in, we authenticate to Precise, and with that we go in, and the first thing that we might want to look at is performance by location. And so you can actually see here the different locations where people are actually accessing the application, and their executions. You can see if somebody abandoned a page before it fully rendered, or if there are errors.
Now, one thing with these applications is that the network, or the distance from the application server, does make a difference. It's very easy to kind of see here that there is some level of network impact. I can see when people became busy, and then another interesting thing – we talked about how there's processing within the browser – we actually notice that some of the different browser types provide a better environment for fast processing. And so, knowing whether people are accessing by Chrome or IE, or whatever it happens to be, you can very often find that one browser type or version is actually superior to another. Now, sometimes you have publicly facing applications where you don't control the browser; sometimes the applications are internal facing, where you can recommend a browser type to your end-user community. And so these are the types of deep-dive visibility and analytics that Precise is able to provide. Now, we get into looking at an application.
I'm not sure if you guys can see my pointer, but I wanted to describe the top graph to you. The y-axis shows average response time. The x-axis is time across a day. And there's actually a stacked bar graph: the total shows you what the performance is, and then it shows a tiering of how much time is spent in each individual step, each individual tier of the application – from the client, through the web server, the green is the Java, this place is using Tuxedo, and down into the database. Now, the lower half of the screen shows the different web menus that are being accessed, and we have them sorted, with just a little green arrow pointing down, in descending order, so the slowest web menus bubble up to the top. We actually show the execution time, the response time, of each individual technology, and then there's actually a bar graph for each of those web menus, and so we start getting an idea of what's going on. Now, remember we started this all with an end user who would call, but how do I find the end user? I come in here, I open up a menu that allows me to filter on a particular user, so I set that user to Alex Net, click OK, and then we're focused on just the activity from Alex Net. Now, what this does is allow IT and IT management to be directly responsive to an end user; in particular, they were looking at content management, which had six executions with a response time of a little over three seconds. Well, three seconds is pretty good, it's not terrible, but maybe it's slower than it should be.
What I can do with this is slice and dice the information in different ways. I could say, well, is this transaction slow for everybody? Is it slower today for Alex than it had been yesterday? Is it slow for every user within a particular location? What that does is allow me to kind of slice and dice and get an idea of what's happening and how universal the problem is, and it's very important to be able to identify the end user, because it's not just about the software and the infrastructure, it's also about how the end users are exercising the application. Oftentimes you might have a new employee or somebody with a new job function, and they're not familiar with certain SAP screens or certain PeopleSoft panels and they need a little pointer; maybe they're leaving fields blank or putting in wildcards, and they're forcing large results to be returned from the database. But having the user ID, you can actually call them before they call you. The other thing that we find is that once the user community is aware that IT knows what they're doing, a lot of times they become better behaved, and a lot of things that had been issues just kind of evaporate, because people just operate a little more carefully. They use the system with greater care.
End-user identification is essential; in the end, it's essential for IT to be able to help a particular end user. Now, what we've done here is we've gone to the “Flow” tab. You can see that in the top left-hand corner. And we've focused in on one particular component of the web menu. On the right-hand side is an analysis of that particular transaction, and so at the top it's actually the browser, and then the next icon – just to get familiar with a little bit of the icons within the GUI – is for the web server, so we can see that point in the flow. And then the “J” is for Java, the “T” is for Tuxedo and naturally the “Q” is SQL. Well, that hash value basically identifies a particular SQL statement. Consider what this does. We've connected a user to a transaction, to the underlying application code, including the individual SQL statements. Now, when I look at those individual SQL statements, I can see that of the total response time, each of them is responsible for about six percent, and when you add up the top four SQL statements, they took about a quarter of the transaction time.
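The "hash value" idea mentioned here can be sketched roughly in Python: a monitoring tool normalizes a statement's text, so that different literal values map to the same logical statement, then hashes it to get a stable identifier. This is a hypothetical illustration of the general technique, not Precise's actual algorithm.

```python
import hashlib
import re

def sql_hash(sql: str) -> str:
    """Stable ID for a SQL statement: normalize case/whitespace/literals, then hash."""
    normalized = re.sub(r"\s+", " ", sql.strip().lower())
    normalized = re.sub(r"'[^']*'", "?", normalized)  # string literals -> ?
    normalized = re.sub(r"\b\d+\b", "?", normalized)  # numeric literals -> ?
    return hashlib.sha1(normalized.encode()).hexdigest()[:12]

# The same logical statement, run with different literals, maps to one hash,
# so its executions can be aggregated and tracked over time:
a = sql_hash("SELECT * FROM orders WHERE customer = 'acme' AND qty > 10")
b = sql_hash("select *  from orders where customer = 'zenith' and qty > 99")
print(a == b)  # True
```

Grouping by a normalized hash is what lets a tool say "this one statement ran 10,000 times" instead of reporting 10,000 unrelated queries.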
Now, often the database is the easiest to manipulate. It's usually the easiest place to get an inexpensive, much superior performance gain. Now, I need to go a little bit deeper to find out what's going on, and what I want the example to do is actually reveal the individual SQL statement. And you know, I can almost guarantee you that every single shop out there has some sort of database tool, and what a database tool does, by looking at just one technology in isolation, is focus on the health of that technology. And a lot of times people look at a top-ten list. Now, this SQL statement is pretty fast, it's not going to be on the top-ten list, but it is the SQL statement that this transaction relies upon. And so what I can do, back at that word, context, is bring this to the DBA's attention, but in the context of the individual SQL statement.
Now, the DBA can open up Precise in the context of the individual SQL statement, and Precise captures the actual execution plan that it uses and the execution time – this is important stuff to the DBA. It will actually show, and you can see here, that 50 percent of the time is spent waiting on storage and 50 percent of the time is used in the CPU, so you start to get ideas of where the time is being spent and how you might wiggle that time down. And the idea is to give people options, because different responses have different costs and risks associated. Ideally we're after the low-risk, low-cost solution to a problem. Now, that SQL statement is tracked by a hash value, and in the left-hand middle of the screen there's this little “Tune” button, and what that's going to do is take you to a SQL task. And this SQL task is kind of a pre-built workbench, and what it does is allow me to really analyze specifically what's impacting the SQL statement, starting out with the execution plan. The execution plan is chosen by the optimizer when the statement is parsed – back to the food analogy, it's the recipe that's followed to resolve the SQL statement.
And some recipes are more complicated than others, and so we provide findings. And it will actually show here, hey, a lot of the time it's doing sequential I/O on a particular index. And so now, going back to the findings, follow up on this index. Has that index been defragmented recently? What's the health of it? What table space does it live in? Is the table space segregated from the table it references? And so it starts giving you all sorts of ideas on how you might go about solving the problem. Now obviously, you know, rebuilding an index is a lot lower risk and a lot easier than maybe moving an index from one table space to another table space, so what we want to do is build up options, so that we can deploy the lowest-cost, lowest-risk option to solve the problem.
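Bill's point about execution plans and sequential I/O can be made concrete with a small, self-contained sketch. This is not Precise itself – just an illustration using Python's built-in SQLite, with a hypothetical `orders` table – of how an optimizer's plan flips from a full table scan to an index search once a suitable index exists:

```python
import sqlite3

# Hypothetical table; in the transcript's terms, this is the object the
# slow transaction's SQL statement relies upon.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 50, i * 1.5) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports how SQLite will resolve the statement
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 7"
print(plan(query))  # full table scan, e.g. "SCAN orders" (sequential I/O)

# The low-risk, low-cost option from the transcript: build an index.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(plan(query))  # e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

The same before/after comparison is what a DBA is weighing when choosing an index build over costlier, riskier changes.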
Precise can also do things like capture the bind variables that are passed to a SQL statement. Obviously the variables that are passed are going to control the size of the result set, and that controls how long the SQL statement takes to execute and how much data has to be passed and processed by the application – through the Java, through .NET, out the web server, across the network, and finally rendered in the end user's browser. What happens in the database directly impacts that browser time. And so it's crucial to have this level of visibility, so we can know exactly what's going on and give the DBA the most options, so that they can choose which one makes the most sense given a particular situation.
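As a rough illustration of why captured bind variables matter – the SQL text (and its hash) stays the same while the work varies with the values bound – here is a sketch with a hypothetical `line_items` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (order_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO line_items VALUES (?, ?)",
                 [(i % 100, 1) for i in range(10_000)])

# One statement, one hash – but very different result-set sizes
# depending on the bound value.
sql = "SELECT * FROM line_items WHERE order_id < ?"

narrow = conn.execute(sql, (1,)).fetchall()    # small result set
wide   = conn.execute(sql, (100,)).fetchall()  # every row comes back

print(len(narrow), len(wide))  # → 100 10000
```

Knowing which values were actually bound when the statement ran slowly is what lets a DBA reproduce and reason about the bad case.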
Now, these are some of the quotes, and these happen to be from a PeopleSoft shop that has a global deployment. Precise supports PeopleSoft, SAP, Siebel, Oracle E-Business Suite, and homegrown Java and .NET applications. We support [inaudible], so if you make web service calls to multiple JVMs, from Java to .NET back to Java, we can track all of that. It could be on-prem, it could be in the cloud. The crucial thing is that things need to be instrumented.
And so, just a couple of quotes from one of our customers. “Before Precise, our DBAs were using OEM” – that’s a database-only tool – and they basically said, “Hey, the instances look great.” But they couldn’t tell where, or address, a problem with a particular transaction. Precise provided the visibility to do that. And so having that information about the SQL statements was critical to giving the DBAs the visibility to fully squeeze performance out of the database. And so that was really nice, kind of above and beyond some of the tools that you might be looking at.
And then IT management really loved the fact that Precise was able to translate a complex URL into a panel name. That way, if an end user calls up and says, “Hey, I’m having trouble with this,” you can isolate and see who that user is, what they’re executing and what kind of performance they’re getting; we’re actually measuring the rendering time in the end user’s browser. It’s a true measure of the end-user experience. And so also, having that user ID is absolutely essential to helping the particular person who calls.
How does Precise do this? We’d like to share our architecture. Precise lives on its own server; it can live in a VM, it can live in the cloud. On the front end, Precise is web-enabled, whether you’re using dashboards, the alerting interface or the Expert GUI. On the data collection side we can actually go agentless for several different technologies. Oftentimes, though, we will require an agent, and there are pluses and minuses to having an agent. A big plus is that the data that’s collected can be preprocessed before it’s sent across your LAN. And so that means we can minimize the total impact of the monitoring solution on the target environment.
Now just consider the alternative: if you go “agentless,” there’s still a data collector, it’s just a matter of where it lives, and it’s making calls and passing raw data about the target application across your LAN. And that’s actually pretty expensive. So by preprocessing we can actually minimize the footprint. You’ll be able to monitor both physical and virtual. And one thing I wanted to say about virtual technology is that what [inaudible] really focuses on is utilization. What Precise focuses on is contention. When is the VMware technology actually limiting resources to your guest VM? And that’s easy to miss: if you’re only looking at the [inaudible] within a guest VM, you have only part of the picture. Being able to automatically detect and alert on contention is really essential.
Precise can monitor up to 500 instances, so very large deployments basically have multiple Precise servers, and for a global deployment there will typically be a Precise server in each data center. Incidentally, for the very largest deployments you can actually federate these together, so you can look corporate-wide at what’s going on and be able to offer reporting, etc. Now, as I mentioned, we have a lot of technical analytics. Not everybody needs to go into the Expert GUI, so we offer a customizable dashboard, and each of these portlets, or widgets, is optional. And somebody might just want to ask, “Hey, has there been an alert on any tier within our environment? How are the end-user groups doing from a performance perspective?” Or maybe you might have a question about the infrastructure, getting into maybe even Tuxedo performance. Or even load balancing. It’s kind of interesting here, in this load balancing part. Looking at the portlet in the middle on the left-hand side, you can see that the number of executions is very similar between each of the web servers, but the response time is very different on the top one. You can actually drill in and find out exactly why the response time on that web server was much slower than on the other ones.
One thing about load balancing, and this is very important: not every load balancing policy is appropriate for every application. It’s actually really helpful to validate your load balancing policy. We’re actually seeing with some applications, like the new PeopleSoft Fluid GUI, that some web servers will go offline. And so that’s something that’s really critical. If you’re deploying the PeopleSoft Fluid GUI, please contact us; we can provide you with a lot of insight and a lot of knowledge about what other customers have faced with it. Each of these portlets can be pretty detailed. Like the middle-right one, with the blue and green, which actually shows the sawtooth pattern; it kind of shows that your garbage collection within the WebLogic tier is running the way you expect it to run. Each of these portlets can be highly focused or can be very high level. And the reason that’s important, or could be important, is that a lot of times it’s not good enough to just have this information within IT; sometimes you have to share this information with application owners, and sometimes with senior management, about what’s going on.
I wanted to share with you a couple of stories, kind of “Success in the Datacenter.” These are database focused, and I have other stories that are middle-tier focused, but for today I really want to focus on the database tier. Let’s take a look at screen freezes. Now, what happened here is that this particular shop had a business SLA that if an order is received by 3 p.m., the order ships that day. And so the warehouse is extremely busy during that time frame, and getting screen freezes was very frustrating. And so the supervisor – this is a smaller company – actually walked into IT and of course goes up to the DBA and says, “Now, what is going on?” And so what we did is we were able to show exactly what was going on. Now this is JD Edwards, a multi-tier application; this is the sales order screen. You can get an idea of what the business was: basically just-in-time inventory, so you’re basically looking at warehouse applications, and you’re basically shipping to a number of various customer sites, different stores. And what we did is we opened up Precise.
Now in this case, before we looked at Oracle; here we’re looking at SQL Server. The top half shows us a stacked bar graph of where the SQL statements spend their time while executing. Every wait state is accounted for on the y-axis. The x-axis is of course time, and you can see that the stacked bar graph changes from time slice to time slice depending upon what’s executing and how it uses the system. Now in this particular case we focused on the third SQL sequence from the top. Its text is SELECT FROM PS_PROD, and you can see in that column that we’ve captured the actual execution plan, and you can see the number of executions. That particular SQL statement was responsible for 9.77 percent of the resource consumption during this time frame that we’re looking at – and that’s an important point, the time frame; Precise keeps a rolling history – so I can basically dial in and find out what happened at any particular point in time, or over time. I’m able to view trending.
Now this SQL statement, you see that stacked bar graph there, it’s dark blue. That says we’re using all CPU. Let’s go ahead and focus by clicking this “TUNE” button on that particular SQL statement. What we do is take it into that pre-built workshop that’s designed to answer, “Well, what’s the DBA going to want to know about this particular SQL statement?” And you can see on the right-hand side there’s a tab called “History” that has been selected. What I’d like you to do now is shift over to the left-hand side, where it says “Changes vs Duration Average,” the average duration. And each of those bars represents a day.
You can see that on Wednesday, Thursday and Friday the execution time was, I’m going to round, point two seconds. The y-axis shows point four seconds, so point two. Very few screen freezes, operations are going great, [inaudible] in the SLA. Unfortunately, on February 27th the execution plan changed, and that caused an immediate change in the execution time. All of a sudden the execution time is going up, four X, maybe five X, and things are running really poorly. Now Precise, in its repository, actually journals all the changes that might impact behavior. And you can see here that we’ve actually captured the access plan changes. The one in the middle says “Table Volume Changed.” And so the tables are growing and we’re right on the cusp: when the SQL statement is parsed, the optimizer chooses one execution plan or a different execution plan.
Now luckily, on this week here, on Monday it flip-flopped back, so it was at a good time. Unfortunately it flip-flops again, and you know what, the end users start anticipating screen freezes and start resubmitting that screen, and they push the execution count up and up and up. We have a huge amount of detail, but to solve this problem, and then avoid it in the future, we need one additional piece of information. And that’s shown to me in the comparison of those execution plans. On March 5th, when it was fast and efficient, the left-hand side shows the execution plan. When it was slow and inefficient, on March 12th, you can see it’s doing a filter join. The filter join just forces a lot more CPU consumption, doing a lot more work. The outcome is identical, it’s just doing a lot more work. It’s like going and getting your supplies one ingredient at a time, rather than going to the pantry and getting all of the ingredients at once. So there is a more efficient way to do this. Now, knowing this, the DBA was able to use a query plan guide to avoid the slow execution plan and lock in fast, high performance.
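The mechanics of locking in a plan vary by database (plan guides in SQL Server, SQL plan baselines in Oracle). As a rough sketch of the idea only, SQLite’s `INDEXED BY` clause forces a statement onto a named index rather than letting the optimizer flip-flop; the table and index names here are hypothetical, not the PS_PROD object from the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ps_prod (item_id INTEGER, qty INTEGER)")
conn.execute("CREATE INDEX idx_prod_item ON ps_prod (item_id)")
conn.executemany("INSERT INTO ps_prod VALUES (?, ?)",
                 [(i, i) for i in range(500)])

# INDEXED BY pins the access path: if the named index can't be used,
# the statement errors out instead of silently picking a slower plan.
pinned = "SELECT qty FROM ps_prod INDEXED BY idx_prod_item WHERE item_id = ?"
rows = conn.execute("EXPLAIN QUERY PLAN " + pinned, (1,)).fetchall()
print(rows[0][3])  # e.g. "SEARCH ps_prod USING INDEX idx_prod_item (item_id=?)"
```

The design choice is the same one described above: trade a little flexibility for a guarantee that the fast plan stays in place as the table grows.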
Now the next kind of war story was “Reports Are Late.” I think a lot of people can identify with this scenario. You might have ad hoc reporting, you might use a tool like NVISION, you might have some third-party reporting tool. And what happens is the tool generates the SQL, and oftentimes the SQL isn’t really coded that well. This could also apply to a situation where you have some third-party application, where the SQL wasn’t written in-house, and so as a DBA, “I don’t control the SQL, what am I going to do about it?” Well, Precise provides something that I’m not aware of any other database tool providing, and that is an object view, combined with recommendations and modeling. And so what we can do is actually turn visibility on its head. Rather than just look at activity, let’s investigate: what object is heaviest on the system? In the lower part of the screen you can see the order line SQL and you can see the “in MS-SQL” column. And the order line table is like ten times busier than any other table in the system. I think what you’ll also notice in the top half is that the space allocation is growing, and you can also look at the specs on the server, what version of software we’re running. Precise will actually track changes to the primary settings. Once again, cause and effect.
Now, focusing on the order line table, what I can do with my detailed historic repository is actually correlate the SQL statements that go against the order line table. And you can start to look at the where clauses in those SQL statements, and you start to notice that the where clause is pretty similar between the different SQL statements. I would suggest that in your reporting system you’ll find the same thing, because the business users, the business analysts, are going to want to do things like aggregate business activity over the last day, the last week, the last month, the last quarter, the last year. You’re going to see very similar where clauses, order bys, group bys, and that means there are going to be certain indexes that make sense for those SQL statements.
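The observation that similar where clauses imply a common index can be sketched mechanically. This toy script – the statements and the crude regex are illustrative, not how Precise’s recommendation engine actually works – counts which columns every statement filters on:

```python
import re
from collections import Counter

# Hypothetical reporting statements against an order_line table, all
# filtering on the same columns with different aggregations.
statements = [
    "SELECT * FROM order_line WHERE business_unit = ? AND ship_date >= ?",
    "SELECT SUM(qty) FROM order_line WHERE business_unit = ? AND ship_date >= ? GROUP BY item",
    "SELECT COUNT(*) FROM order_line WHERE business_unit = ? AND ship_date >= ?",
]

columns = Counter()
for sql in statements:
    where = sql.split("WHERE", 1)[1]
    # crude extraction of "column <op> ?" predicates
    columns.update(re.findall(r"(\w+)\s*(?:>=|<=|=|<|>)\s*\?", where))

# Columns filtered on by every statement are composite-index candidates
candidates = [col for col, n in columns.items() if n == len(statements)]
print(candidates)  # → ['business_unit', 'ship_date']
```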
And so Precise has a recommendation engine – you can see that in the top right-hand corner – and what we can do is actually get recommendations. Say, “Hey, I’m running all of these SQL statements, what indexes would address them?” The indexes are presented to you and you can actually see the DDL. Now, Precise is read-only; it doesn’t offer the ability to click a button and create the index, but that’s easy enough to do outside of Precise. But here’s the crucial thing: Precise allows you to evaluate and model the changes, so there’s this Evaluate button in the lower left-hand corner of the screen. And what that does is show the SQL statements before and after.
Let’s look at these SQL statements. Do you see this column here that says “in MS-SQL,” and it says one hour, four minutes? That top SQL statement executes, or consumes, about 64 minutes’ worth of resources, and its projected improvement is 98 percent. These changes are going to save hours’ worth of processing. The next SQL statement is 27 minutes and will basically save a third; that’s about ten minutes of processing. Summed together, you’re actually going to save hours and hours’ worth of processing with these proposed changes. And so you’re able to know this up front, able to model this. You can also use the “what-if” capability to say, “Well, I don’t want to make that index,” or, “What happens if I change the order of the columns?” And so I can use this modeling capability to find out exactly what’s going to go on.
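The arithmetic behind that projection is straightforward; a back-of-envelope version using the numbers quoted above (a hypothetical two-statement workload):

```python
# Projected savings per monitoring window from the proposed index changes.
minutes_before = [64, 27]        # "in MS-SQL" time per statement
improvement    = [0.98, 1 / 3]   # projected improvement per statement

saved = sum(m * i for m, i in zip(minutes_before, improvement))
print(round(saved))  # → 72 (minutes of processing saved)
```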
The other thing that’s crucial is that when I make the change I can actually measure for an individual SQL statement. You saw the SQL statement history in the previous example, and I can actually verify if I achieved the savings that were modeled. And so that feedback, completing the feedback loop is absolutely crucial.
Alright, here is the final example I was going to have for you. This is an SAP shop and, you know, they had gone through a major upgrade, they were doing some stuff with custom transactions, and basically an end user was unsatisfied with the performance. And so what we did is we were able to focus in on what that end user experienced. And you can see at the top of the list, “CHOUSE,” and the response time is a little over 61 seconds. This thing is taking a minute to execute. Now you can see we have a stacked bar graph that’s geared towards SAP. On the right-hand side it shows client time and queueing time. The blue is application time, and in an SAP environment that’s ABAP code, and then the database. And the database, you know, it could be Oracle, it could be SQL, it could be HANA. We basically are able to show that.
Now, what we do with Precise is focus, for that transaction and that user, on what SQL statements were coming out. Once again, that context to connect the dots. Now this top SQL statement, you can see it’s circled, executes in two milliseconds. You really can’t blame the database if it’s executing that quickly. The execution count is very high. We were actually able to go back to the ABAP coder and say, “Hey, what’s going on?” We actually found that the call to the database was put in the wrong place, nested in the wrong place within the loop. We made the change, and then we were able to measure after. You can actually see what the performance is after, not only at the SQL statement level but also at the custom code level. And so they could live with a four-and-a-half-second execution time. These are just a couple of examples of how Precise might be leveraged, how you might leverage it. Just as a quick recap, Precise shows performance by location, by end-user ID; it provides context through the application stack; you can drill in on root cause. And I think one of the big differentiators is being able to know not just the SQL statement, but why the SQL statement is running slowly, and being able to identify the contention and basically offer more options for solving problems. That’s what Precise has to offer, and you can consume us, you know, in a lightweight way, or if you have very deep, very challenging problems, we love to take those on as well.
Eric Kavanagh: Alright, I have to say that was a lot of fantastic detail, Bill. Thank you for showing all those screenshots. And from my perspective you really fulfilled what I was kind of explaining at the top of the hour which is, number one, you need to have the right tool. You must have a tool that allows you the amount of context required to identify all the elements in the equation, as someone said in a movie once, that was kind of funny. But let me go ahead and hand it over to Dez because I bet he’s got some questions for you and I want to push one more of these slides just for visual candy, if you will. I’m actually, hold on, let me take this back. But Dez, I’m sure you’ve got some questions, take it away.
Dez Blanchfield: Yeah, I do, wow. This tool’s come a long way since I originally knew it, and I wasn’t aware you’d actually gotten quite so deep in the stack now. It’s just quite mind-boggling. Just really quickly, a couple of things. The deployment model, can you just really quickly, in a minute or two, just outline the traditional or typical deployment model? You mentioned it was available as a virtual machine. It can be run in the cloud. And I guess one of the questions that will probably come up, and I think there were a couple of questions that came through in the Q&A section. Just to recap them in summary: the normal deployment model and the type of access that’s required, is it traditionally deployed on-premise or hosted or in the cloud? What are the types of deployment models that you normally see? And what type of access is required to get that to work? Do we have to change things at the security level around network access, and so forth? Or can this just behave as an end user?
Bill Ellis: Yeah, so currently the majority of installations are on-prem. More and more people are putting components of the application stack into the cloud, and we can handle that as well. For the deployment we need a server to run on, and it’s going to meet certain specifications. We need to have a database to store the historic repository, so meeting those prerequisites is kind of the first step. The next thing is that we definitely need to have some knowledge of the application itself, and the installation is wizard-driven: basically fill in the blanks. Because of the depth of information we’re getting, you know, from the web process level down to the code that’s executing, we do need to have some privileges. We have a secure data model – or security model, I should say – because the agents run under credentials that are totally separate from the people who are using the metadata about the transactions, etc. Precise does communicate via TCP/IP and so we require certain ports to be open. As a quick example, our default port is 2702. That type of detailed stuff is something that, if people are interested, we could get into in more detail. But typically we’re very quick time-to-value. If somebody is facing a big problem, we can often get the thing installed and shine a bright light on a situation in a matter of hours.
Dez Blanchfield: Yeah, I definitely got that sense too. In the deployment model you talked about a very large scale, up to 500 instances, and how that could be federated. At the very entry level, what does it look like if somebody wants to try it? Because I know IDERA’s very big on giving access to free trials and free demos, and I remember seeing on the website that almost everything can sort of be played with. For folk on here, and I think I missed it earlier on, but I think there was a question that came up around what a typical site looks like, and how people get access to this and start to play with it and get that type of experience where they can see whether they’ve got a way to address some performance issues. Can they download an OVA and spin it up on their hypervisor, Hyper-V, or a laptop, or do they need a dedicated machine to run it on? You outlined the architecture before, but just very briefly, in a minute or two, what does that look like for the entry-level deployment, just to do a proof of concept, for example?
Bill Ellis: Yeah, so our model’s a little bit different than the IDERA tools. We kind of fit more into the Embarcadero scenario where you would want to contact one of our sales reps. We would want to just discuss with you what are the challenges and then we quite typically would, you know, one of the SEs would be assigned and would basically work through the installation with somebody. Typically you would not run Precise on your laptop. You would want to have a VM or a server within the data center where the application lives, to do the collections. But we’d help you through every step of that. If anybody’s interested in pursuing that, you definitely want to contact IDERA.
Dez Blanchfield: One of the other things that struck me was that, I mean, a lot of what we’ve covered today is around reacting to performance issues on live environments as people are using them. So, as your first slide showed, somebody picks up the phone and says, “Application’s running slow, help.” But it struck me that during prerelease of applications, or upgrades, or new patches and fixes, you could go through a bunch of capacity planning and stress testing, and have Precise looking at the entire environment and actually find issues before you even put end users on the environment. Is that a use case that you’ve seen before, are people sort of doing that as well, or is that not a typical use case?
Bill Ellis: Absolutely, we would want to use Precise throughout the application development life cycle, or the upgrade life cycle as well. Precise offers a scalability view: it will show the number of executions overlaid with the response time. Obviously, if both the number of executions and the response time grow together, you’re not scaling and you need to do something. That type of thing has helped immensely. I think it’s a little less true now, but when people started putting production applications onto VMware they were a little bit hesitant, and at first they’d be like, “Oh, we need to move this to physical.” And what we can actually do is show what the resource consumption is, so you can make the application more efficient. At every step of the application life cycle you definitely want to use Precise. But I would have to say that production is really where performance matters the most, and Precise is geared towards 24/7 production monitoring, and so you really don’t want to run your production applications with no visibility.
Dez Blanchfield: Absolutely. One other quick question just on that – dev, test, integration, UAT and so forth – I mean, it’s great to have this tool, and I imagine app developers would absolutely love to have access to this through the stages of the development life cycle. With the more complex architectures you’re seeing now – we’ve moved from dedicated servers to virtualization, and we’re moving now to, you know, adoption of outsourcing to cloud hosting, and we’re also seeing a transition to containerization – have you seen many people deploy this and model the sort of regions or zones? Someone might have – and in Australia we have a very big issue around privacy, and I know in Europe it’s the same thing, and I think it’s becoming more of a case in the U.S. – data that is able to identify me personally often needs to be in a more secure environment than the actual application layer or the web layer. And so we have these deployments now where people might keep their database and their application stuff internally, but they can put their web layer and their delivery end and application and so forth with a cloud provider such as Azure or [inaudible], or Amazon Web Services. How does that work with your normal deployment? Is it a case of just having another set of collectors in the region that aggregate some more? What does that look like in the Precise world, in today’s sort of bimodal approach of running IT, with the old legacy stuff in one place and your goods sometimes in the cloud?
Bill Ellis: Yeah, so we support a mixed environment. One thing to consider is that there are different contracts with the cloud providers. Some of them will not allow any kind of agent, or any kind of outside monitoring, within the cloud. In order to install and monitor with Precise, you need to have a type of contract that allows that type of access. There are definitely some restrictions that sometimes we have to work through, and so those are important kinds of criteria to consider when you’re, I guess, first signing those contracts, and/or when you need to deploy Precise.
Dez Blanchfield: Yeah, I’ve seen a number of instances where, even with a traditional database environment, if you’re procuring that as part of the service – particularly with the likes of Azure, as you’re procuring the likes of HDInsight or SQL as a service, as a platform – your usual tools can only dive so deep, because they’re not really that keen for you to look at what’s under the hood. And so you kind of end up with a certain level or depth that you can monitor to, and all of a sudden you just can’t see behind the magic curtain. Is self-service a thing? Is this traditionally something that would run inside a network operations center, where only the technical team, the folk under the CIO, would get access, or is this also something that you can provide a level of access to for end users? Maybe not necessarily the reception desk and traditional HR and finance people, but more savvy users who are doing, you know, like for example, data scientists, actuaries, statisticians, people who are running really heavy workloads. Is it a case that they can get some sort of self-service access to see what’s happening when they run these heavy queries and where the pain is coming from, so they can sort of tune how their workload runs?
Bill Ellis: There’s pretty good security within Precise, so you can set up users that have different levels of access. At very basic levels, just the dashboards provide oversight. And then, you know, if somebody did want to go into the Expert GUI, you can kind of restrict what they’re able to see and what they’re able to do. And kind of circling back to your previous question: you know, in health care you have all the HIPAA laws, and so there are definitely some considerations, and there are actually some deployment options, so that we can work in both environments. One thing to consider with the data that you’ve seen in this presentation is that it’s all metadata about performance, not the content of the tables, you know, and so it’s really not going to get into those types of privacy concerns.
Dez Blanchfield: Yeah, I did like that. I had a eureka moment about your fourth or fifth screenshot when I realized you’re pulling performance data – you’re pulling, as you said, metadata out of the various levels of the stack – you’re not actually looking at the content. And I think this is an interesting thing, because it’s one of those tools where you could deploy it for a short term and look at what’s happening in the environment, but you don’t have to have access to the data itself. You can even look at the way the queries are being run. The last thing, I guess, just quickly, and then I’ll hand back to Eric, so if you’ve got a question, then get Rebecca to wrap up: you mentioned before that the overhead is nominal. Is it a case that there’s even a noticeable overhead from the monitoring side of things, just watching in the background, or is it such a negligible amount of overhead that it’s just not worth considering?
Bill Ellis: Yeah, so I think on the database tier, you know, each technology is a little bit different. On the database tier Precise is pretty well known to have the lowest overhead. On the middle tier there is, you know, kind of a balancing act – it’s not just Precise, it applies to everybody – in terms of visibility versus overhead. And so one of the things is we offer a number of sophisticated tools to control what the overhead is. We are designed for production and, you know, it is definitely useful to nip as many problems in the bud in development and QA, but, you know, there’s nothing like knowing what’s happening in production.
Dez Blanchfield: Eric, across to you, have you got any final questions?
Eric Kavanagh: Yeah, I’ll just say that I think you did a great job of pointing out that context really is the key and it’s almost like if we move towards this era of the internet of things, you want everything instrumented. And I think the standard now in manufacturing is to do that, which is good news, right? Because you want to be able to pull information from all of these different environments and stitch it all together. And I guess I’ll just turn it over to you for some follow-up comments, though. That’s what you guys are focused on is providing a visual interface through which some analyst, an IT analyst essentially, can monitor and analyze what’s happening in this complex environment and then figure out what to change. Because it’s not just a tool. You must have the tool but you need that person who’s going to dig into that detail and find the answers, right?
Bill Ellis: Yeah, I kind of see it as boiling up to the top and prioritizing where the most payback is, you know? Because it may turn out it’s a different situation; not every problem is in the database. If the database is, you know, executing things in a tenth of a second, but on the application tier things are taking three seconds, that’s where the most payback is. So being able to isolate the problem tier, and then what’s happening within the tier, to really focus in on where the payback is – that really accelerates the resolution and the optimization of the application, and it’s so much faster and so much better and so much more fun than people gathered in a conference room going, “Well, it’s not me, it must be someone else.”
Eric Kavanagh: That’s right. I saw a great meme the other day that said something like, “Be informed, not just opinionated.” You walk into a meeting, you have the information, you can point to the data. That’s the key and we’re getting there, thank goodness. Okay folks we’re going to go ahead and wrap up, but we do archive all these webcasts for later viewing. Feel free to check it out anytime. We list all of our webcasts now, the Hot Tech series and the Briefing Room series at Techopedia.com, so hop online and check those folks out. With that we’re going to bid you farewell. Thanks for your time today, Bill. Thanks to you and all of your hard work, Dez. And we’ll talk to you next time, folks. Take care. Bye bye.