The Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasts

KEY TAKEAWAYS

Host Eric Kavanagh discusses forecasting with Dr. Robin Bloor, Rick Sherman and IDERA's Bullett Manale.

Eric Kavanagh: Ladies and gentlemen, hello once again and welcome back to the Hot Technologies webcast series! My name is Eric Kavanagh, I'll be your host for today's web seminar, called “Saving Time, Money and Trouble with Optimal Forecasts.” ‘Course I missed the first part of the title there, “The Best Laid Plans.” We always talk about that on this show. So, Hot Technologies of course is our forum for understanding what some of the cool products are out there in the world today, the world of enterprise technology, what folks are doing with them, how they work, all that kind of fun stuff.

And the topic today, as I suggest, deals with forecasting. Really you're trying to understand what's going to be happening in your organization. How are you going to keep your users happy, no matter what they're doing? If they're doing analysis, if they're doing real work, they're facing real customers with transactional systems, whatever the case may be, you want to understand how your systems are running and what's going on, and that's what we'll talk about today. It's kind of funny because forecasting is not something I like to do, 'cause I'm superstitious, like I think if I forecast too much, bad things will happen, but that is just me. Don't follow my lead.

So, here are our presenters today, yours truly in the top left-hand corner, Rick Sherman is dialing in from Boston, our buddy Bullett Manale from IDERA and our very own Dr. Robin Bloor. And with that, I'll hand it over to Robin and just remind folks: Ask questions, don't be shy, we love good questions, we'll put them out to our presenters and others today. And with that, Robin, take it away.

Robin Bloor: OK, well, as I'm in the pole position as they say, I thought I'd tell a SQL story today, because it's the background for the discussion that's going to go on, and it will inevitably not clash with what Rick has to say, because Rick is not focusing on this. So, the SQL story. There are some interesting things about SQL because it's so dominant. See, that's a typo, SQL is a declarative language. The idea was that you could create a language in which you would request what you wanted, and the database would work out how to get it. And it's worked out rather well, actually, but there are a number of things that are kind of worth saying about it, the consequences of basing the whole of the IT industry on a declarative language. The user does not know or care about the physical organization of the data, and that's the good thing about a declarative language – it separates you out from all of that, so you don't even worry about it – just ask for whatever you want, and the database will go and get it.

But the user has no idea whether the way they structure the SQL query is going to affect the performance of the query and that's a bit of a downside. I've seen queries that are hundreds and hundreds of lines long, that are just one SQL request, you know, begins with “select” and just goes on and on with sub-queries and so on and so forth. And it actually turns out that if you want a particular collection of data out of a database, you can ask for it in many different ways with SQL, and get the same answer if you kind of have some familiarity with the data. So, one SQL query isn't necessarily the best way to ask for data, and databases will respond quite differently according to the SQL that you put into them.
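To make Robin's point concrete, here is a minimal sketch of two logically equivalent requests; the Customers and Orders tables are hypothetical, and assuming CustomerID is unique in Customers, both return the same rows, yet the optimizer may plan them very differently:

```sql
-- Version 1: EXISTS with a correlated subquery.
SELECT c.CustomerID, c.CustomerName
FROM Customers AS c
WHERE EXISTS (SELECT 1
              FROM Orders AS o
              WHERE o.CustomerID = c.CustomerID
                AND o.OrderDate >= '2017-01-01'
                AND o.OrderDate <  '2018-01-01');

-- Version 2: a join plus DISTINCT. Same result set, but the chosen plan
-- (hash join vs. nested loops, when duplicates get removed) can differ,
-- and so can the runtime on large tables.
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM Customers AS c
JOIN Orders AS o
  ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2017-01-01'
  AND o.OrderDate <  '2018-01-01';
```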

And so, SQL actually affects performance. That's true of the people that use SQL, and it's also true of the programmers that write SQL, and they're even less likely to think about the impact that they're going to have, because most of their focus is actually on the manipulation of data and not on the getting and putting of data. And the same is also true of BI tools. I've seen the SQL that gets, if you like, squeezed out of BI tools against various databases, and it has to be said that a lot of it is, well, I wouldn't write SQL queries like that. Somebody has created, if you like, a little motor that, whatever the parameters are, will throw out some SQL, and again, that SQL will not necessarily be efficient SQL.

Then I thought I'd mention the impedance mismatch: the data that programmers use is different than the data as it's stored. Our DBMS stores data in tables, but object-oriented code – most coders are programming in object-oriented form nowadays – organizes data in object structures, so the one doesn't map to the other. So, there's a necessity to translate from what the programmer thinks the data is to what the database thinks the data is. Which seems like we must have done something wrong for that to be the case. SQL has DDL for data definition, and it has DML – data manipulation language – select, project and join, for getting at that data. Now, there's very little math and very little time-based stuff, so it's an imperfect language, although it has to be said it's been extended and continues to be extended.
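As a small illustration of the split Robin describes, here is a sketch of DDL versus DML; the OrderLines and Products tables are made up for the example:

```sql
-- DDL: defining the structure the database will store.
CREATE TABLE OrderLines (
    OrderID   INT NOT NULL,
    LineNo    INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity  INT NOT NULL,
    PRIMARY KEY (OrderID, LineNo)
);

-- DML: select (restrict the rows), project (pick the columns) and join.
SELECT ol.OrderID, p.ProductName, ol.Quantity        -- project
FROM OrderLines AS ol
JOIN Products   AS p ON p.ProductID = ol.ProductID   -- join
WHERE ol.Quantity > 10;                              -- select/restrict
```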

And then, you get the SQL barrier problem, which is always starker than the diagram suggests, in that a lot of people asking questions for analytical reasons, once they've got the answer to one question in data terms, want to ask another question. So, it becomes a dialog thing, and SQL wasn't built for dialogs; it was built for asking for what you want all at once. And it's kind of worth knowing that, because there are some products out there that actually forsake SQL in order to make conversation possible between the user and the data.

In terms of database performance – and this kind of spreads out to everything – yes, there's CPU, there's memory, there's disk, there's network overhead and there's the locking problem of more than one person wishing to have exclusive use of the data at a given point in time. But there are also poor SQL calls; there's an awful lot that can be done, if you actually optimize the SQL, in terms of performance. So, database performance factors: bad design, bad program design, workload concurrency, missing load balancing, query structure, capacity planning and data growth. And in a few words, SQL is convenient, but it doesn't self-optimize.

Having said that, I think we can pass on to Rick.

Eric Kavanagh: All right, Rick, let me give you the keys to the WebEx car. Take it away.

Rick Sherman: All right, great. Well thanks, Robin. As we said at the beginning of the presentation, my graphics are still pretty boring, but we'll go with it. So, I agree with everything Robin talked about on the SQL side. But what I want to concentrate on a little bit now is the demand for data, which we'll go through very quickly, and then the supply, as in the tools used in that space, or the need for the tools in that space.

First off, almost every article that you read has something to do with big data: lots of data, unstructured data coming from the cloud, big data everywhere that you can imagine. But the growth of the database market has continually been with SQL; relational database, probably as of 2015, is still 95 percent of the database market. The top three relational vendors have about 88 percent of the market share in that space. So, we're still talking, as Robin did, about SQL. And in fact, even if we're looking at the Hadoop platform, Hive and Spark SQL – which my son, who's a data scientist, uses all the time now – are certainly the dominant way for people to get to data.

Now, on the database side, there are two broad categories of database usage. One is operational database management systems, so enterprise resource planning, customer relationship management, the Salesforces, ERPs, Oracles, Epics, N4s, etc., of the world. And then there's a wide and expanding amount of data that's in data warehouses and other business intelligence-based systems. 'Cause everything, regardless of where and how it is captured, stored or transacted, eventually gets analyzed, and so there's a huge demand and increase in the use of databases, particularly relational databases, out in the marketplace.

Now, we've got the demand; we have huge amounts of data coming. And I'm not really talking just about big data, I'm talking about the use of data across all sorts of enterprises. But accompanying that, from a supply side, for people who can manage those resources, we have first off a DBA shortage. According to the Bureau of Labor Statistics, from 2014–2024 DBA jobs are only going to grow by 11 percent – now, that's people who have DBA job titles, but we'll talk about that in a second – versus the 40-plus percent annual data growth in that space. And we have a lot of DBAs; that same study talked about the average age being pretty high compared to other IT professions. And then we have a lot of people leaving the field, not necessarily retiring, but shifting into other aspects, going into management, or whatever.

Now, part of the reason they're leaving is because the DBA job keeps getting harder and harder. First off, we have DBAs managing many different databases themselves, physical databases located all over the place, as well as different types of databases. Those might be relational, or they may be other types of databases, too. But even if it's relational, they could have any of one, two, three, four different vendors that they're actually trying to manage. DBAs typically get involved after the design of the database or the application. Robin talked about how databases or applications get designed, how SQL gets designed. Well, when we're talking about data modeling, ER modeling, extended ER modeling, dimensional modeling, advanced dimensional modeling, whatever, typically application programmers and application developers design with their end goal in mind – they're not designing for the efficiency of the database structure itself. So we have a lot of poor design.

Now, I'm not talking about the commercial enterprise application vendors; they usually have ER models or extended ER models. What I'm talking about is that there are a lot more business processes and applications being built by application developers in every company – those are the ones that aren't necessarily designed for efficiency or effectiveness of deployment. And the DBAs themselves are overworked, they sometimes have 24/7 responsibility, and they keep getting more and more databases. I think that has a little bit to do with the fact that people don't quite understand what DBAs do, or how they do it. They're their own little group, and people just keep thinking, “Well, all these tools are just so easy to use, we can just keep throwing more and more databases onto their workload,” which isn't the case.

Which leads us to the part-time and accidental DBAs. We have IT teams that are small and cannot necessarily afford a dedicated DBA. That's true of small- to medium-sized businesses, where the expansion of databases and database applications has exploded in the last decade and continues to expand. But it's also the case in large corporations that have typically been doing data warehousing and business intelligence analytics for a long, long time. A long time ago we used to get dedicated DBAs for those projects; we never get a dedicated DBA anymore. We're responsible for designing the database, which is fine, if it's somebody who has experience. But in general, the DBAs are application developers, they often take that role on as a part-time part of their job, they don't have formal training in it, and again, they're designing for their end goals, they're not designing for efficiencies.

And there's a lot of difference between design and development versus deployment and management. So, we have the “penny wise, pound foolish” problem, with a little piggy bank there, skimping on getting the skills and resources needed in the projects. Thinking that everybody is from “Revenge of the Nerds,” my little picture there. Now, as far as what people need: we have an expanding usage of databases and data in SQL. We have a constrained number of DBAs – people who are skilled and expert at these tuning, designing, management and deployment situations. And we have more and more part-time or accidental DBAs, people who haven't had the formal training.

So, what are some of the other things that are also getting into the issue of the fact that these databases aren't being tuned as well, or managed as well? First off, many people assume that the database systems themselves have sufficient tools to manage themselves. Now, the tools are getting easier and easier for design and development, but that's different than doing a good design, and good management, capacity planning, monitoring, etc. for deployment. So, first off, people assume that they have all the tools they need. Second off, if you're a part-time or accidental DBA, you don't know what you don't know.

I guess I forgot part of the phrase there, but a lot of times they just don't understand what they even need to look at in the design, or when they're managing or operating the databases. If that isn't your profession, then you're not going to understand what you need to do. The third one is that SQL is a go-to tool, so Robin talked about SQL and how poorly SQL sometimes, or often, is constructed. And also, one of my pet peeves in the BI, data warehousing, data migration, data engineering space is that rather than using tools, people have a tendency to write SQL code and stored procedures; even if they're using an expensive data integration tool or an expensive BI tool, they often really use it just to run stored procedures. So understanding database design and the construction of SQL is getting more and more important.

And finally, there is this silo approach, in which we have individual people looking at individual databases. They don't look at how the applications work and interact with each other. And they also really are often looking at the databases rather than at the applications that use them. So, the workload that you get on the database is critical in the design, critical in tuning it, critical in trying to figure out how to plan for capacity, etc. So, rather than seeing the forest for the trees, people are in the weeds, looking at the individual tables and databases and not looking at the overall interaction of these applications in the workload.

Finally, people need to look at the key areas that they need to look at. When they're planning to manage databases, they need to first develop some application-centric performance metrics, so they need to look at not just how a table is structured or how it's particularly modeled, but how is it used? So, if you have an enterprise application that's doing supply chain management, if you're taking orders off the web, if you're doing BI – whatever you're doing – you need to look at who's using it, how they're using it, what the data volumes are and when it's going to happen. What you're really trying to look for is the wait times, because no matter what, all applications are judged by how long it takes to get something done, whether by a person or just for the interchange of data between applications or processors. And what are the bottlenecks? So often when you're trying to debug issues, of course, you're really trying to look at what the real bottlenecks are – not necessarily how to tune everything, but how do you get rid of the bottlenecks and move the performance up – the wait times and throughput, whatever you need to look at.
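On SQL Server, the raw material for the wait-time view Rick describes is exposed through standard DMVs. A minimal sketch, with an abbreviated (not exhaustive) list of benign waits filtered out:

```sql
-- Top waits since the last restart or stats clear, as a share of the total.
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       waiting_tasks_count,
       wait_time_ms * 100.0 / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'WAITFOR',
                        'XE_TIMER_EVENT', 'CHECKPOINT_QUEUE')  -- idle waits
ORDER BY wait_time_ms DESC;
```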

And you really need to separate out the data capture, the transactions and the transformation aspects in the database from the analytics. Each of those has different design patterns, each of them has different usage patterns and each of them needs to be tuned differently. So, you need to think about how this data is used, when it's used, what it's used for, and figure out what the performance metrics are and what the key things are that you want to analyze related to that usage. Now, when you're looking at monitoring the performance, you want to look at the database operations themselves; you want to look at the data structures, so the indexes, partitioning and other physical aspects of the database, even the structure of the database – whether it's an ER model or dimensional model, however it's structured – all those things have an impact on performance, especially in the different contexts of data capture, analytics and the transformations that happen.

And as Robin mentioned on the SQL side, looking at the SQL that's being run by these different applications across these databases, and tuning it, is critical. And so is looking at the overall application workloads, and the infrastructure environment that these databases and applications run on. So, the networks, the servers, the cloud – whatever they're running on – also looking at the impact that these applications and these databases have within that context; all these have interplay in being able to tune the database.

And finally, when you're looking at tools, you want to be able to look at the three different kinds of analytics related to that. You want to look at descriptive analysis: what's happening and where, related to the database and the application performance. You want to have the ability to do diagnostic analytics to figure out not only what is happening but why is it happening, where are the bottlenecks, where are the problems, what's running well, what's not running well? But being able to analyze and drill down into the problem areas in order to address those, either for design or whatever you need to do.

And finally, the most aggressive or proactive type of analysis is to actually do some predictive analysis, predictive analytics modeling, whatever. We know that the database and the applications work in this context; if we up the capacity, if we get more users, if we do more throughput, whatever we're doing, being able to project out what, how and where that'll impact the database and the applications allows us to plan for and figure out proactively where the bottlenecks are, where the wait times might suffer and what we need to do in order to fix things. So we want to have tools that are able to implement the performance metrics, monitor the performance, and do these three types of analysis. And that's my overview.

Eric Kavanagh: All right, let me hand it off to – those are two great presentations, by the way – let me hand this off to Bullett Manale to take it from there. And folks, don't forget to ask good questions; we have some good content already. Take it away, Bullett.

Bullett Manale: Sounds good. Thanks, Eric. So, a lot of what Rick said and Robin said, obviously I agree with 100 percent. I pulled this slide up 'cause I think it's fitting. I don't know, for those of you that are “A-Team” fans, back in the ‘80s John “Hannibal” Smith had a saying he'd always say: “I love it when a plan comes together.” And I think that when you're talking particularly about SQL Server, which is where we're focusing – that's the product that we're going to talk about today, SQL Diagnostic Manager – it's definitely one of those things that you have to have. You have to be able to leverage the data that you have and be able to make decisions from that data, and in some cases, you're not looking for a decision; you're looking for something to tell you when something is going to run out of resources, when you're going to run out of resources, when you're going to have a bottleneck, those kinds of things.

It's not just about monitoring a specific metric. So, with Diagnostic Manager, one of the things it does very well is help you in terms of forecasting and understanding, specific to the workloads, and we're going to talk about a lot of that today. The tool is geared toward the data manager, the DBA or the acting DBA, so a lot of the things that Rick was mentioning about the acting DBA are so true. In a lot of cases, if you're not a DBA, there are going to be a lot of question marks when it comes time to manage a SQL environment, things you don't know. And so you're looking for something to help you, to take you through that process, and also educate you in the process as well. And so, it's important that the tool you use for those kinds of decisions gives you some insight into the reasons why those decisions are being made; it's not just telling you, “Hey, do this.”

Because if I'm the acting DBA, eventually I might be the full-blown DBA with the actual expertise and knowledge to back that title up. So, that said, when we're talking about being a database administrator – I always kind of show this slide first, because the DBA has some different roles and, depending on the organization that you're with, those are going to vary from one place to another – but typically, you're always going to be in some way responsible for your storage: your planning of that storage and anticipating, I should say, how much space you're going to need, whether it's for your backups or whether it's for the databases themselves. You're going to need to understand and assess that.

In addition, you're going to need to be able to understand and optimize things on an as-needed basis, and as you go through the monitoring of the environment, it's obviously important that you make changes as they're needed, based on things that change within the environment itself. So, things like the number of users, things like the popularity of applications, the seasonality of a database, all should be considered when you're doing your forecasting. And then, obviously, looking at other things in terms of being able to provide the reports and the information that's necessary as it relates to making those decisions. In a lot of cases that means doing comparative analysis; it means being able to look specifically at a particular metric and understand what the value of that metric has been over time, so that you can anticipate where it's going to go moving forward.

So, Diagnostic Manager has a lot of those capabilities, and people use it every day for things like forecasting, and I've put the definition here of capacity planning. It's a pretty broad and actually pretty vague definition, which is just the process of determining the production capacity needed by an organization to meet the changing demands for its products, and at the end of the day, that's really what it's all about: it's about being able to take information that you have some way or another, and taking that information and making decisions, to help you move forward as you progress through the life cycle of your databases. And so, the reasons why people need to do this are, obviously, first and foremost, in most cases, to save money. For businesses, obviously, the main goal is to make money and save money. But in the process, along with that, that also means being able to make sure that there is no downtime. And being able to make sure that you mitigate any chance of downtime occurring, so keeping it from happening to begin with – in other words, not waiting for it to happen and then reacting to it.

As well as being able to overall increase the productivity of your users, making them more efficient so that you can get more business done, is obviously the key here. So these are the types of things that the DBA, or somebody involved in forecasting or capacity planning, is going to have to be able to wade through the information to make decisions about. And then, overall, this is obviously going to help you eliminate waste – not just waste in terms of money, but also in terms of time and, generally, resources that could possibly be used for other things. So, being able to eliminate that waste so that you don't have opportunity costs that are tied to the waste itself.

So, with that said, what are the types of questions that we get, specific to the person that's a DBA? When am I going to run out of space? That's a big one: not only how much space am I consuming now, but when am I going to run out, based off the trends and the past history? Same thing with the actual instances of SQL, the databases: which servers can I consolidate? I'm going to put some on the VMs; what makes sense in terms of which databases I'm going to consolidate, and which instances of SQL should they reside on? All those types of questions need to be able to be answered. Because in most cases, if you're a DBA or acting DBA, you're going to consolidate sometime in your career. In a lot of cases you're going to be doing that on an ongoing basis. So, you need to be able to make those decisions quickly, not play guessing games when it comes to that.

We talked about bottlenecks and where they're going to occur next, being able to anticipate that, once again, instead of waiting for them to happen. So, obviously, all of these things we're talking about make sense in the sense that you're relying on historical data, in most cases, to be able to generate these recommendations, or in some cases to be able to formulate decisions yourself, to be able to come up with these answers. But it reminds me of when you hear the radio ads for somebody selling securities or something like that; it's always “past performance is not indicative of future results” and those kinds of things. And the same thing holds true here. You're going to have situations where these forecasts and these analyses may not be 100 percent right. But if you're dealing with things that have happened in the past and the known, and being able to take a lot of these types of questions you're going to run into and do the “what if” with them, that is very valuable and it's going to get you a lot further than playing the guessing game.

So, these types of questions obviously are going to come up, so how do we handle a lot of these questions with Diagnostic Manager? First of all, we have forecasting capabilities, being able to do this at the database, at the table, as well as at the drive or the volume. To be able not only to say, “Hey, we're out of space,” but six months from now, two years from now, five years from now, if I'm budgeting for that, how much drive space am I going to need to budget for? Those are questions I'm going to have to ask, and I'm going to need to be able to use some method of doing that rather than guessing and putting my finger up in the air and waiting to see which way the wind blows, which is, a lot of times, unfortunately, the way a lot of these decisions are made.

In addition to that, being able to – looks like my slide got cut off there a little bit – but being able to provide some assistance in the form of recommendations. It's one thing to be able to show you a dashboard full of metrics and be able to say, “OK, here's all the metrics and where they're at,” but to be able to have some understanding of what to do based off of that is another leap. And in some cases, people are educated enough in the role of DBA to be able to make those decisions. And so we have some mechanisms in the tool that will help with that, which we'll show you in just a second. But being able to show not only what the recommendation is, but also to provide some insight as to why that recommendation is being made, and then, on top of that, in some cases being able to actually come up with a script that automates the remediation of that issue, is ideal as well.

Moving on to the next one here, which we'll see, it's just generally speaking understanding, down to the metric level, what is normal. I can't tell you what's not normal if I don't know what normal is. And so, having some way to measure that is key, and you've got to be able to take into consideration multiple types of areas – or I should say time frames – and different groupings of servers, and be able to do this dynamically, from an alerting perspective. In other words, during the middle of the night, during my maintenance window, I expect my CPU to be running at 80 percent, based on all of the maintenance that's going on. So, I might want to increase my thresholds higher during those time frames, versus maybe in the middle of the day, when I'm not having as much activity.

Those are some things that obviously will be environmental, but things that you can apply to what's being managed, to be able to help you manage that environment more efficiently, and making it easier to do so. The other area, obviously, is being able to just overall provide the reports and the information to be able to answer those types of “what if” questions. If I've just made a change to my environment, I want to understand what that impact has been, so that I can apply that same change to other instances or other databases in my environment. I want to be able to have some information or some ammunition to be able to make that change with some peace of mind and knowing that it's going to be a good change. So, being able to do that comparative reporting, being able to rank my instances of SQL, being able to rank my databases against each other, to say, “Which is my highest consumer of CPU?” Or which one is taking the longest in terms of waits and things like that? So a lot of that information is going to be available with the tool as well.

And then, last but not least, is just an overall ability: you need a tool that's going to be able to handle whatever situation comes your way, and what I mean by that is, if you've got a large environment with a lot of instances, you're probably going to run into situations where you need to pull metrics that traditionally are not metrics that a DBA would even want to monitor, in some cases, depending on that particular situation. So, having a tool that's extensible, to be able to add additional metrics and to be able to use those metrics in the same form and fashion that you would use them if you were using an out-of-the-box metric, for example. So, being able to run reports, being able to alert, to baseline – all the things we're talking about – is also a key part of being able to do this forecasting and making it so you get the answers you're looking for to be able to make those decisions moving forward.

Now, the way Diagnostic Manager does this: we have a centralized service, a group of services that runs and collects data against SQL Server 2000 through 2016 instances. Then we take that data and put it into a central repository, and what we'll do with that data, obviously, is a lot, to be able to provide further insight. Now, in addition to that – and one of the things that's not on here – is we also have a service that runs in the middle of the night, which is our predictive analysis service, and that does some number crunching and helps you, as a DBA or acting DBA, to be able to make those types of recommendations, and also to provide some insight in terms of baselines.

So, what I'd like to do – and this is just a quick example of the architecture; the big takeaway here is that there aren't any agents or services actually sitting on the instances that you're managing – is to take you into the application here and give you a quick demo. And let me just go out and make that happen. So, let me know – Eric, can you see that OK?

Eric Kavanagh: I got it now, yeah.

Bullett Manale: OK, so I'm going to take you through some of these different parts that I spoke on. And essentially let's start with the kinds of things that are more along the lines of, here's something that you need to do, or here is something that's a point in time in the future, and we're going to give you some insight around it. And this is being able to really anticipate – or I should say dynamically anticipate – things as they're happening. Now, in the case of reports, one of the things we have in the tool are three different forecasting reports. And in the case, for example, of a database forecast, what I would probably do is anticipate the size of a database over a period of time, and I'll just give you a couple of examples of that. So, I'm going to take my audit database, which is pretty I/O intensive – it's got a lot of data going to it. We've got, let's see, we'll do this one here, and let's just pick the healthcare database up here.

But the point is, I'm not just seeing what the space on this is; I'm able to say, “Look, let's take the last year's worth of data” – and I'm going to fib here a little bit, I don't really have a year's worth of data, I have about two months' worth of data – but, because I'm choosing a sample rate of months here, I'm going to be able to anticipate or forecast out, in this case, the next 36 units, because our sample rate is set to months – a unit is a month – and then I would be able to run a report to basically show me where we would anticipate our future growth for these three databases. And we can see we have a varying degree of difference, or variance, between the three different databases, particular to the amount of data they're consuming historically.

We can see that the data points here represent the historical data, and then the line is going to be providing us with the forecast, along with the numbers to back that up. So we can do that at the table level, and we can do that even at the drive level, where I can anticipate how big my drives are going to get, including mount points. We would be able to forecast this same type of information out, but once again, the sample rate will allow me to determine how many units out we're taking what we want to forecast. Notice also that we have different forecast types. So you get a lot of options and flexibility when it comes time to do the forecasting. Now, that's one thing we'll do, in actually giving you a specific date and being able to say, “Hey, on this date, this is where we would anticipate the growth of your data being.” In addition to that, though, we can provide you with other insights that are related to some of the analysis that we perform during the off hours, when the service runs. Some of the things it does is try to anticipate the things that will likely happen, based off the history of when things occurred in the past.
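For readers who want a feel for the arithmetic behind a linear forecast like this, here is a hedged sketch; the dbo.SizeHistory table (one size sample per month) is hypothetical, and Diagnostic Manager's own repository schema and forecast models, which go beyond a straight line, are not shown:

```sql
-- Fit a least-squares line through monthly size samples, then project it
-- 36 sample units (months) past the last observation.
WITH samples AS (
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY SampleMonth) AS float) AS x,
           CAST(SizeMB AS float)                                   AS y
    FROM dbo.SizeHistory
),
fit AS (
    SELECT (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y))
         / (COUNT(*) * SUM(x * x) - SUM(x) * SUM(x)) AS slope,
           AVG(y) AS mean_y,
           AVG(x) AS mean_x
    FROM samples
)
-- The fitted value at a point x* is mean_y + slope * (x* - mean_x).
SELECT mean_y + slope * ((SELECT MAX(x) FROM samples) + 36 - mean_x)
       AS forecast_mb_36_months_out
FROM fit;
```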

So we can see here, actually, that a forecast is providing us some insight into the likelihood of us having problems throughout the evening, based off of things that have, once again, happened in the past. So, obviously this is great, especially if I'm not a DBA; I can look at these things, but what's even better if I'm not a DBA is this analyze tab. Before this was in the tool, we would go through and show the product to people and they would say, “That's great, I see all these numbers, I see everything, but I don't know what to do” (laughs) “as a result of that.” And so what we have here is a better way for you to be able to understand, if I'm going to take action to help with performance, or even to help with the health of my environment: being able to have a ranked way of providing those recommendations, as well as useful tips and information to learn more about those recommendations, and even having external links to some of that data, that will take me to the reasons why these recommendations are being made.

And in many cases, being able to provide a script that would automate, like I said, the remediation of these issues. Now, part of what we're doing here with this analysis – and I'll show you when I go in to configure the properties of this instance and go to the analysis configuration section – is we have a lot of different categories that are listed here, and part of that is index optimization and query optimization. So, we're evaluating not only the metrics themselves and things like that, but also things like the workloads and the indexes. In the case here, we'll actually do some additional hypothetical index analysis. It's one of those situations where, in many cases, I don't want to add an index if I don't need to. But at some point there's kind of a tipping point, where I say, “Well, the table is getting to the size, or the types of queries that are running within the workload make it make sense, to add an index now. But it wouldn't have made sense maybe six weeks prior.” So this allows you to dynamically get that insight as to things that will likely, like I said, improve performance based off of what's happening in the environment, what's happening within the workloads, and those kinds of things.
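One built-in source for this kind of recommendation on SQL Server is the missing-index DMVs; a sketch follows. This is only illustrative, and the tool's hypothetical index analysis weighs the workload in ways this query does not:

```sql
-- Missing-index suggestions accumulated by the optimizer since restart,
-- ranked by a rough estimate of how much they would help.
SELECT TOP (10)
       DB_NAME(d.database_id)       AS database_name,
       d.statement                  AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_total_user_cost * s.avg_user_impact * s.user_seeks
                                    AS improvement_measure
FROM sys.dm_db_missing_index_details     AS d
JOIN sys.dm_db_missing_index_groups      AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
ORDER BY improvement_measure DESC;
```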

And so you get a lot of good information here, as well as the ability to optimize these things automatically. So, that's another area where we would be able to help out, in terms of what we call predictive analysis. Now, in addition to that, I should say, we also have other areas that I think just generally lend themselves to helping you make decisions. And when we talk about making decisions, once again, being able to look at historical data, provide some insight to get us to where we need to be to improve that performance.

Now, one of the things we can do is we have a baseline visualizer, which allows us to selectively choose whichever metric we want – and let me find a decent one here – I'll go to SQL CPU usage, but the point is you can go back over however many weeks to paint these pictures for you, to see when your outliers are, to see, generally speaking, where that value falls within the periods of time in which we've been collecting data. And then, in addition to that, you'll also notice that when we go out to the actual instance itself, we have the ability to configure our baselines. The baselines are a really important part of being able to automate things, as well as being able to be notified of things. And the challenge, as most DBAs would tell you, is that your environment is not always running the same throughout the course of the day versus the evening and whatnot, as we mentioned earlier in the example with the maintenance periods of time, when we have high levels of CPU or whatever might be happening.

So, in the case here, with these actual baselines, we can have multiple baselines. I might have a baseline, for example, that's during my maintenance hours. But I could just as easily create a baseline for my production hours. And the point of doing that is, when we go into an instance of SQL and we actually have these multiple baselines, we would be able to anticipate and perform some type of automation, some type of remediation, or just alerting in general, differently, specific to those windows of time. So, one of the things you'll see here is that these baselines we generate are using the historical data to provide that analysis, but more importantly, I can change these thresholds statically, and I can also automate these dynamically as well. So, as the maintenance window – or I should say the maintenance baseline window – comes up, these thresholds would automatically switch, specific to the loads that I'm encountering during that window of time, versus maybe in the middle of the day, when my loads are not as much, when the workloads are not as impactful.
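The idea behind a time-windowed baseline can be sketched in a few lines; the dbo.MetricHistory table of collected samples and the 1 a.m.–5 a.m. maintenance window are assumptions for the example, not the tool's actual repository or logic:

```sql
-- Mean and a two-standard-deviation threshold for CPU, computed separately
-- for the maintenance window and for everything else, over the last 30 days.
SELECT CASE WHEN DATEPART(HOUR, SampleTime) BETWEEN 1 AND 5
            THEN 'maintenance window' ELSE 'production hours' END AS window_name,
       AVG(CpuPercent)                          AS baseline_mean,
       AVG(CpuPercent) + 2 * STDEV(CpuPercent)  AS suggested_alert_threshold
FROM dbo.MetricHistory
WHERE MetricName = 'SQL CPU usage'
  AND SampleTime >= DATEADD(DAY, -30, SYSDATETIME())
GROUP BY CASE WHEN DATEPART(HOUR, SampleTime) BETWEEN 1 AND 5
              THEN 'maintenance window' ELSE 'production hours' END;
```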

So, that's something else to keep in mind in terms of the baselines. Obviously these are going to be really helpful for you in terms of understanding what is normal, and being able to gauge when you're going to be running out of resources. Now, the other kind of thing that we have in the tool that's going to help you make decisions, in addition to the baselining and being able to set up alerts around those baselines and the thresholds that you create dynamically, is, like I said earlier, just being able to run a whole myriad of reports that help me answer questions about what's going on.

So, as an example, if I had 150 instances I'm managing – in my case I don't, so we have to play the pretend game here – but if I had all my production instances and I needed to understand where the area is that needs my attention, in other words, if I'm only going to have a limited amount of time to perform some type of administration to improve performance, I want to focus on the key areas. And so, with that said, I would be able to say, “Based on that environment, rank my instances against each other, and give me that ranking by contention type.” So whether it's disk usage, memory usage, whether it's waits, whether it's response time, I am able to correlate – or I should say rank – those instances against each other. Obviously the instance that's at the top of each list, if it's the same instance, is probably something I really want to focus on, because it's obviously, once again, at the top of the list.
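As a rough single-instance analogue of that ranking, here is a sketch that orders databases by CPU accumulated in the plan cache; cross-instance ranking in the tool works off its central repository, which this stand-in query doesn't model:

```sql
-- Databases on this instance ordered by total CPU burned by cached plans.
SELECT DB_NAME(t.dbid)                     AS database_name,
       SUM(qs.total_worker_time) / 1000.0  AS total_cpu_ms,
       SUM(qs.execution_count)             AS executions
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
WHERE t.dbid IS NOT NULL
GROUP BY DB_NAME(t.dbid)
ORDER BY total_cpu_ms DESC;
```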

So, you have a lot of reports in the tool that help you in terms of ranking the environment at the instance level; you can do this at the database level as well, where I can rank my databases against one another. Particular to thresholds and areas that I can set, I can even set up wildcards here, if I want to, to only focus on particular databases, but the point is that I can compare my databases in the same fashion. Also, as far as other types of comparative analysis go, the big one in this tool is the baseline analysis that we have. So if you scroll down to the service view here, you'll see that there's a baseline statistics report. This report obviously is going to help us understand not only what the metric values are, but, for a specific instance, I could go out and, for any of these metrics, actually look at the baselines for those metrics.

So, whatever it might be, as a percent or whatever, I could go out and say, “Let's see the baseline for this, broken out over the last 30 days,” in which case it's going to show me the actual values versus the baseline, and I would be able to make some decisions using that information, obviously. So this is one of those situations where it's going to depend on what question it is that you're asking at the time. But this is obviously going to help you with a lot of those questions. I wish I could say we had one report that does it all, kind of like the easy report, where you press a button and it just answers every “what if” question you could ever ask. But the reality is, you're going to have a lot of attributes and a lot of options to choose from in these pull-downs, to be able to formulate those “what if”-type questions that you're looking for.

So a lot of these reports are geared towards being able to answer those types of questions. And it's really important, too, that with these reports, and in addition to all of the things we've already shown you in the tool, you have, as I mentioned before, the flexibility to incorporate new metrics to be managed – even being able to create counters, or SQL queries, that are incorporated into your polling intervals – to help answer questions that maybe out of the box we didn't anticipate monitoring; you can add that stuff. And you would be able to then do all the same things I just showed you: baseline it, run reports, create reports from that metric, and answer and do a lot of these different types of things that I'm showing you here.

Now, in addition to that – and one of the things we've obviously run into quite a bit lately – first it was everybody flipping or switching to VMs, and now we've got a lot of people that are heading off to the cloud. And there are a lot of questions that come up around those types of things. Does it make sense for me to move to the cloud? Am I going to save money by moving to the cloud? If I were to put these things on a VM, on a shared-resource machine, how much money can I save? Those types of questions, obviously, are going to be coming up as well. So, with a lot of that stuff, keep in mind that with Diagnostic Manager we can add and pull from the virtualized environments of both VMware and Hyper-V. We can also add instances that are out on the cloud, so for environments like Azure DB, for example, or even RDS, we can pull metrics from those environments as well.

So there's a lot of flexibility and a lot of being able to answer those questions as it relates to those other types of environments that we see people heading off to. And there's still a lot of questions around this stuff, and as we see people consolidating those environments they're going to need to be able to answer those questions as well. So, that's a pretty good overview, I'd say, of Diagnostic Manager, as it relates to this topic. I know that the subject of business intelligence came up and we also have a tool for business intelligence that we didn't talk about today, but it also is going to provide you insight in terms of answering these types of questions as it relates to your cubes and all of those different types of things, as well. But hopefully this has been a good overview, at least in terms of how this product can help with being able to formulate a good plan.

Eric Kavanagh: All right, good stuff. Yeah, I'll throw it out to Rick, if he's still out there. Rick, any questions from you?

Rick Sherman: Yes, so first up, this is great, I like it. I particularly like the extending out to VMs and clouds. I see a lot of app developers think that if it's in the cloud then they don't need to tune it. So—

Bullett Manale: Right, we still have to pay for it, right? You've still got to pay for whatever it is that people are putting on the cloud, so if it's running poorly, or if it's causing a lot of CPU cycles, that's more money you've got to pay. So you still need to measure this stuff, absolutely.

Rick Sherman: Yeah, I've seen a lot of poor designs in the cloud. I did want to ask, would this product also be used – I know you mentioned the BI product and you have tons of other products that interact with each other – but would you start looking at SQL performance, individual queries in this tool? Or would it be other tools that would be used for that?

Bullett Manale: No, absolutely, this would. That's one of the things that I didn't cover and meant to: the queries portion of it. We have a lot of different ways to identify query performance, whether it's related specifically to waits, like we see on this view here, or whether it's related to the resource consumption of queries overall; there's a whole number of ways we can analyze query performance, whether it's duration, CPU or I/O, and once again, we can also look at the workloads themselves to provide some insight. We can provide the recommendations in the analyze section, and we also have a web-based version that provides information around the queries themselves. So I can get recommendations on missing indexes and the ability to view the execution plan and all that kind of stuff; that's also a capability as well. So, absolutely, we can diagnose queries seven ways to Sunday (laughs) and be able to provide that insight in terms of the number of executions, the resource consumption, the waits, the duration, all that good stuff.
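For a sense of the raw material behind that kind of analysis, here is a minimal sketch against the plan cache; the columns are standard DMV columns, though the tool's own collection is richer than this:

```sql
-- Top cached statements by average duration, with CPU and I/O alongside.
SELECT TOP (10)
       SUBSTRING(t.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(t.text)
                     ELSE qs.statement_end_offset END
                   - qs.statement_start_offset) / 2) + 1)  AS statement_text,
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_duration_ms,
       qs.total_worker_time  / qs.execution_count / 1000.0 AS avg_cpu_ms,
       qs.total_logical_reads / qs.execution_count         AS avg_logical_reads
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY avg_duration_ms DESC;
```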

Rick Sherman: OK, great. And then what's the load on the instances themselves with all this monitoring?

Bullett Manale: It's a good question. The challenge with answering that question is that it depends, just like anything else. A lot of what our tool has to offer provides flexibility, and part of that flexibility is that you get to tell it what to collect and what not to collect. So for example, with the queries themselves, I don't have to collect the wait information, or I can. I can collect information related to queries that surpass a certain duration of execution. As an example of that, if I were to go into the configure query monitor and say, “Let's change this value to zero,” the reality is that basically makes the tool collect every query that runs, and that's really not the spirit of why that's there, but generally speaking, if I wanted to provide a full sample of data for all the queries, I could do that.

So, it's very relative to what your settings are. Generally speaking, out of the box, it's anywhere from about 1–3 percent overhead, but there are other conditions that will apply. It also depends on how many queries are running in your environment, right? It also depends on the method of collection of those queries and what version of SQL it is. So, for example, with SQL Server 2005, we're not going to be able to pull from extended events, so we would pull from a trace to do that. So, it would be a little bit different in terms of the way we would go about gathering that data, but that said, like I said, we've been around since about 2004 with this product. It's been around a long time, and we've got thousands of customers, so the last thing we want to do is have a performance monitoring tool that causes performance problems (laughs). And so we try to steer clear of that as much as possible, but generally speaking, about 1–3 percent is a good rule of thumb.
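On versions that support extended events, the "only collect queries past a duration threshold" idea looks roughly like this; the session name and the one-second threshold are made up for the example:

```sql
-- A lightweight Extended Events session that captures only statements
-- running longer than one second (duration is in microseconds here).
CREATE EVENT SESSION [LongQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.database_name, sqlserver.sql_text)
    WHERE duration > 1000000
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [LongQueries] ON SERVER STATE = START;
```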

Rick Sherman: OK, and that's pretty low, so that's terrific.

Eric Kavanagh: Good. Robin, any questions from you?

Robin Bloor: I'm sorry, I was on mute. You've got a multiple-database capability, and I'm interested in that, because you can look at multiple databases and therefore you can know how a larger resource base is possibly divided up between various virtual machines and so on and so forth. I'm interested in how people actually use that. I'm interested in what the customers are doing with that. Because that looks to me like, well, it certainly was, when I was messing about with databases, something I never had on hand. And I would only ever consider one instance in any meaningful way at any given point in time. So, how do people use this?

Bullett Manale: Generally speaking, you're talking about just the tool itself in general? How they're using it? I mean, generally, it's about being able to have a central point of presence for the environment. Having peace of mind, and knowing that if they stare at a screen and they see green, they know everything is good. It's when problems happen, and obviously in most of the cases, from a DBA's perspective, a lot of times those problems happen when they're in front of the console, so being able to be notified as soon as the problem is happening. But in addition to that, being able to understand, when the problem does happen, being able to get to the heart of the information that provides them some context in terms of why it's happening. And so that's, I think, the biggest part: being proactive about it, not being reactive.

Most of the DBAs I talk to – and I don't know, it's a good percentage of them – unfortunately are still in the reactive type of environment; they wait for a consumer to approach them to tell them there's a problem. And so, we see a lot of people trying to break away from that, and I think that's a big part of the reason why people like this tool: it helps them to be proactive, but it also provides them insight into what is going on, what the problem is. And in a lot of cases, what we find at least – and maybe it's just the DBAs telling us this – is that the perception is it's always the DBA's problem. Even if it's the application developer that wrote the application that didn't write it properly, the DBAs are the ones that are going to take the blame, 'cause they're taking that application onto their systems or servers, and then when the performance is bad, everybody points to the DBA and says, “Hey, it's your fault.”

So this tool, a lot of times, is going to be used to help make the case for the DBA to say, “Hey, this is where the problem lies, and it's not me.” (Laughs) We need to improve this, whether it be changing the queries or whatever it might be. In some cases it will fall in their bucket in terms of their responsibility, but at least having the tool to be able to help them understand that and know that, and doing it in a timely way, is obviously the ideal approach.

Robin Bloor: Yeah, at most of the sites that I'm familiar with – but it has been a while since I've been out there, looking at various multi-database sites – mostly what I used to find was that there would be DBAs that focused on a handful of databases. And those would be the databases that, if they ever went down, would be a real big problem for the business, and so on and so forth. And for the other ones, they'd just be collecting stats every now and then to see they didn't run out of space, and they'd never look at them at all. And while you were doing the demo I was looking at this and I was thinking, well, in one way or another, just by providing something like this for databases that often nobody cared too much about – because they have data growth, and they have application growth at times as well – you're extending DBA coverage in quite a dramatic way. So that's what the question is really about: is it that, with a set of tools like this, you end up being able to pretty much give a DBA service to every database that is in the corporate network?

Bullett Manale: Sure. I mean, the challenge is that, like you said pretty eloquently, there are some databases that the DBAs care about and then some they don't care about as much. And the way that this particular product is licensed is on a per-instance basis. So, there is, I guess you'd say, a threshold where people decide, “Hey, this isn't a critical enough instance that I want to manage it with this tool.” That said, there are other tools that we do have that cater more, I guess, to those less important instances of SQL. One of them would be the Inventory Manager, where we do light health checks against the instances, but in addition to that, what we do is discovery, so we identify new instances that have been brought online, and then, from that point, as a DBA, I can say, “OK, here's a new instance of SQL. Now, is it Express? Is it the free version, or is it an enterprise version?” That's probably a question I want to ask myself, but secondly, how important is that instance to me? If it's not that important, I might have this tool going out and doing what I would call generic health checks, in the sense that they're the elemental types of things I care about as a DBA: Is the drive filling up? Is the server responding to issues? The main things, right?

Whereas with Diagnostic Manager, the tool I was just showing you, it's going to get down to the query level, it's going to get down into the recommendation of indexes, looking at the execution plan and all that good stuff, whereas the other tool is mainly focused on who owns what, what is it that I own and who's responsible for it? What service packs and hot fixes do I have? And are my servers running with the main ingredients of what I would consider to be a healthy instance of SQL? So to answer your question, there's a little bit of a mix. When we have people looking at this tool, they're typically looking at a more critical set of instances. That said, we have some folks that buy for every instance they have and manage it, so it just depends. But I tell you, overall, there's definitely a threshold of folks that consider their environment important enough to have a tool like this to manage those instances.

Robin Bloor: Okay, another question before I hand it on to Eric. The impression one gets, just from watching the industry is that databases still have a life, but all of the data is pouring into all of these data lakes and so on and so forth. That's the hype, really, and the hype never reflects the reality, so I'm interested in what kind of reality you're perceiving out there? Are the important databases within an organization, are they experiencing the traditional data growth, which I used to think of as 10 percent a year? Or are they growing more than that? Is big data making these databases balloon? What's the picture you see?

Bullett Manale: I think in a lot of cases we're seeing some of the data being moved into those other segments where it makes more sense, where there are other technologies that have become available – as of recent, some of the bigger data stuff. But for these databases, I would say it's hard to generalize in a lot of cases, 'cause everybody is a little bit different. Generally speaking, though, I do see some divergence. I see, like I said, people moving to the elastic models in a lot of cases, because they want to grow the resources, and not so much in other areas. Some folks are moving to the big data. But it's hard to get a feel for, as you say, the perception, because generally speaking the folks I'm talking to all have traditional databases and are using this on a SQL Server environment.

That said, I'd say in terms of SQL itself, I definitely still think it's gaining market share. And I think there are a lot of folks that are still heading towards SQL from other places like Oracle, because it's more affordable, and obviously, as SQL versions become more advanced – and you're seeing this with the more recent things that are going on with SQL, in terms of encryption and all of the other capabilities that are making it an environment or a database platform that is very mission-critical capable, I guess – I think we're seeing that as well. Where you're seeing a shift, it's still happening. I mean, it was happening 10 years ago, and it's still happening, I think, in terms of SQL Server, where the environment's growing and the market share is growing.

Robin Bloor: OK, Eric, I presume the audience has a question or two?

Eric Kavanagh: Yeah, let me throw one quick one over to you. It's a pretty good question, actually. One of the attendees is asking, will this tool tell me if a table may need an index to speed up the query? If so, can you show an example?

Bullett Manale: Yeah, so I don't know if I have one for specifically adding an index, but you can see here, we have fragmentation recommendations here. Also, I believe we just had – and this was part of the Diagnostic Manager offering, the web-based version – a case where it's telling me I have a missing index. And we can view those recommendations, and it'll tell us the potential gain of indexing that information. The other thing I should just mention is that when we do the recommendations, for many of these, the script will be built for it. That one's not a good example, but you would be able to see, yes, the situations where an index – either a duplicate index, or the addition of an index – would improve performance, and like I said earlier, we do a lot of that through hypothetical index analysis. So, it really helps in terms of understanding the workload, to be able to apply that to the recommendation.

Eric Kavanagh: That's great stuff, and this will give me a good segue to the final comments here. Robin and I and Rick as well, have heard over many years now, there's talk about self-tuning databases. It's a self-tuning database! All I can tell you is: Don't believe them.

Bullett Manale: Don't believe the hype.

Eric Kavanagh: There may be some small little things that get done dynamically, but even that, you might want to check it out and make sure it's not doing something you don't want it to do. So, for quite some time, we're going to need tools like this to understand what's happening at the database level and like Robin said, data lakes are fascinating concepts, but there's probably about as much chance of them taking over as there is of there being a Loch Ness Monster anytime soon. So, I would just say again, the real world has a lot of database technology, we need people, DBAs, to look at this stuff and synthesize it. You can tell, you need to know what you're doing to make this stuff work. But you need the tools to give you the information to know what you're doing. So, bottom line is DBAs are going to be doing just fine.

And big thanks to Bullett Manale and our friends at IDERA. And of course, Rick Sherman and Robin Bloor. We do archive all of these webcasts, so hop online to insideanalysis.com or to our partner site www.techopedia.com for more information on all that.

And with that, we'll bid you farewell, folks. Thanks again, we'll talk to you next time. Take care. Bye bye.