Building a Business-Driven Data Architecture

KEY TAKEAWAYS

Host Rebecca Jozwiak discusses data architecture solutions with Eric Little of OSTHUS, Malcolm Chisholm of First San Francisco Partners and Ron Huizenga of IDERA.

Rebecca Jozwiak: Ladies and gentlemen, hello, and welcome to Hot Technologies of 2016. Today we’re discussing “Building a Business-Driven Data Architecture,” definitely a hot topic. My name is Rebecca Jozwiak, I will be your host for today’s webcast. We do tweet with a hashtag of #HotTech16 so if you’re in Twitter already, please feel free to join in on that as well. If you have questions at any time, please send them in to the Q&A pane at the bottom right of your screen and we’ll make sure they get answered. If not, we’ll make sure that our guests get them for you.

So today we’ve got a really fascinating lineup. A lot of heavy hitters on with us today. We have Eric Little, VP of data science from OSTHUS. We have Malcolm Chisholm, chief innovation officer, which is a really cool title, for First San Francisco Partners. And we have Ron Huizenga, senior product manager from IDERA. And, you know, IDERA’s got a really full suite of data management and modeling solutions. And today he’s going to give us a demo about how his solution works. But before we get to that, Eric Little, I am going to pass the ball to you.

Eric Little: Okay, thanks a lot. So I’m going to go through a couple of topics here that I think are going to relate to Ron’s talk a bit and hopefully set the stage for some of these topics as well, some Q&A.

So the thing that interested me with what IDERA’s doing is that I think they correctly point out that complex environments really are driving a lot of business values nowadays. And by complex environments we mean complex data environments. And technology is really moving fast and it’s hard to keep up in today’s business environment. So those people who work in technology spaces will often see that you have customers who are working out problems with, “How do I use big data? How do I incorporate semantics? How do I link some of this new stuff with my older data?” and so on, and that kind of leads us nowadays into these four v’s of big data that many people are pretty familiar with, and I understand there can be more than four sometimes – I’ve seen as many as eight or nine – but normally, when people talk about things like big data or if you’re talking about big data then you usually are looking at something that’s kind of enterprise scale. And so people will say, okay, well, think about the volume of your data, which is normally the focus – that’s just how much you have. The velocity of the data has to do with either how fast I can move it around or how fast I can query it or get the answers, and so on. And personally I think the left side of that is something that’s being solved and handled relatively quickly by a lot of different approaches. But on the right side I see a lot of capability for improvement and a lot of new technologies that are really coming to the foreground. And that’s really having to do with the third column, the data variety.

So in other words, most companies nowadays are looking at structured, semi-structured and unstructured data. Image data is starting to become a hot topic, so being able to use computer vision, look at pixels, being able to scrape text, NLP, entity extraction, you have graph information that’s coming out of either statistical models or that’s coming out of semantic models, you have relational data that’s existing in tables, and so on. And so pulling all of that data together and all these different types is really representing a large challenge and you’ll see this in, you know, in Gartner and other people who are sort of following the trends in the industry.

And then the final thing that people talk about in big data is often this notion of veracity, which is really the uncertainty of your data, the fuzziness of it. How well do you know what your data’s about, how well do you understand what’s in there? The ability to use statistics and the ability to use some type of information around what you might know or to use some context, can be of value there. And so the ability to look at data in this way in terms of how much you have, how fast you need to move it around or get at it, all the types of data you may have in your enterprise and how certain you are about where it is, what it is, what quality it’s in, and so on. This really requires a large, coordinated effort now between a lot of individuals to manage their data effectively. Modeling data, therefore, is increasingly important in today’s world. So good data models are really driving a lot of success in enterprise applications.

You have data sources from a variety of sources, like we were saying, which really requires a lot of different kinds of integration. So pulling it all together is really useful to be able to run queries, for example, across numerous types of data sources, and pull information back. But in order to do that you need good mapping strategies and so mapping those kinds of data and keeping up on those mappings can be a real challenge. And then you have this issue of, well how do I link my legacy data to all these new data sources? So suppose I’ve got graph, do I take all my relational data and put it into graph? Usually that’s not a good idea. So how is it that people are able to manage all of these kinds of data models that are going on? Analysis really has to be run on a lot of these different kinds of data sources and combinations. So the answers that are coming out of this, the answers that people need to really make good business decisions is critical.

So this is not about just building technology for the sake of technology, it’s really, what am I going to do, what can I do with it, what kind of analysis can I run, and the ability, therefore, as I’ve already been talking about, to pull this stuff together, to integrate it is really, really important. And one of these types of analyses then runs on things like federated search and query. That’s really becoming a must. Your queries have to normally be threaded across multiple kinds of sources and pull information back in a reliable way.
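
As a rough illustration of what federated query means in practice, here is a minimal Python sketch (purely illustrative, with made-up in-memory “sources” standing in for a relational store and a document store) that answers one business question across two differently shaped sources and merges the results on a shared key:

```python
# Illustrative only: two toy "sources" standing in for a relational store and a document store.
relational_rows = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]
document_records = [
    {"cust": 1, "orders": [{"sku": "A-100", "qty": 3}]},
    {"cust": 2, "orders": [{"sku": "B-220", "qty": 1}]},
]

def federated_customer_orders(customer_name):
    """Answer one business question by querying both sources and joining on the shared key."""
    matches = [r for r in relational_rows if r["name"] == customer_name]
    results = []
    for row in matches:
        docs = [d for d in document_records if d["cust"] == row["customer_id"]]
        for d in docs:
            results.append({"customer": row["name"], "orders": d["orders"]})
    return results

print(federated_customer_orders("Acme Corp"))
```

The hard part, as noted above, is the mapping strategy: something has to record that the relational customer_id and the document cust field mean the same thing, and keep that mapping current.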

One key element that people often look at, especially with semantic technologies – and this is something that I’m hoping Ron is going to talk about a bit in the IDERA approach – is how do you separate or manage the model layer of your data from the data layer itself, from that raw data? So down at the data layer you may have databases, you may have document data, you may have spreadsheet data, you may have image data. If you’re in areas like the pharmaceutical industries you’ve got vast amounts of scientific data. And then on top of this people normally look for a way to build a model that allows them to quickly integrate that data and really when you’re looking for data now you’re not looking to pull all of the data up into your model layer, what you’re looking at the model layer to do is to give you a nice logical representation of what things are, common vocabularies, common types of entities and relationships, and the ability to really reach into the data where it is. So it has to say what it is, and it has to say where it is, and it has to say how to go fetch it and bring it back.
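
To make the “what it is, where it is, how to fetch it” idea concrete, here is a minimal sketch (not any particular product’s implementation; names and the connection string are invented) of a model layer kept separate from the data layer: each logical entity carries a description, a location, and a fetch routine, while the raw data stays where it lives:

```python
# Minimal sketch: a logical model layer that points at data rather than copying it.
from dataclasses import dataclass
from typing import Callable

@dataclass
class LogicalEntity:
    name: str                      # what it is (common vocabulary term)
    description: str
    location: str                  # where it is (connection string, path, endpoint)
    fetch: Callable[[str], list]   # how to go get it and bring it back

def fetch_from_orders_db(query: str) -> list:
    # Placeholder for a real database call against a hypothetical source.
    return [{"order_id": 42, "status": "shipped"}]

model_layer = {
    "Order": LogicalEntity(
        name="Order",
        description="A customer's request to purchase goods.",
        location="postgresql://orders-db/sales",   # hypothetical location
        fetch=fetch_from_orders_db,
    ),
}

# A consumer asks the model layer, not the raw store, for data.
print(model_layer["Order"].fetch("SELECT * FROM orders WHERE status = 'shipped'"))
```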

So this has been an approach that has been quite successful in propelling semantic technologies forward, which is an area that I work in a lot. So a question that I wanted to pose for Ron, and that I think will be useful in the Q&A section, is to see how this is accomplished by the IDERA platform. So is the model layer actually separate from the data layer? Are they more integrated? How does that work and what are some of the results and benefits that they’re seeing from their approach? Reference data, therefore, is also becoming really critical. So if you’re going to have these kinds of data models, if you’re going to be able to federate out and search across things, you really have to have good reference data. But the problem is reference data can be really hard to maintain. So oftentimes naming standards in and of themselves are a difficult challenge. One group will call something X and one group will call something Y and now you have the problem of how does someone find X and Y when they’re looking for this type of information? Because you don’t want to just give them a portion of the data, you want to give them everything related. At the same time terms change, software becomes deprecated, and so on, how do you keep up and maintain that reference data over time?

And, again, semantic technologies, specifically using things like taxonomies and vocabularies, data dictionaries, have provided a standards-based way of doing this, which is really highly robust, it utilizes certain kinds of standards, but the database community has done this for a long time as well, just in different ways. I think one of the keys here is to think about how to use perhaps entity-relationship models, how to use perhaps graph models or some type of an approach here that’s really going to give you hopefully a standards-based way of handling your reference data. And then of course once you have the reference data, the mapping strategies have to manage a wide variety of names and entities. So subject matter experts often like to use their own terms.

So a challenge in this is always, how do you give someone information but make it relevant to the way that they talk about it? So one group may have one way of looking at something, for example, you may be a chemist working on a drug, and you may be a structural biologist working on the same drug, and you may have different names for the same types of entities that relate to your field. You have to figure out ways to bring those personalized terminologies together, because the other approach is, you have to force people to drop their term and use someone else’s, which they often don’t like. Another point here is handling large numbers of synonyms gets difficult, so there are lots of different words in many people’s data that can refer to the same thing. You have a problem of reference there using a many-to-one set of relations. Specialized terms vary from industry to industry so if you’re going to come up with kind of an overarching solution for this type of data management, how easily portable is it from one project, or one application over to another? That can be another challenge.

Automation is important and it’s also a challenge. It’s expensive to manually handle reference data. It’s expensive to have to keep manually mapping and it’s expensive to have subject matter experts stop doing their day-to-day jobs and have to go in and constantly fix data dictionaries and re-update definitions and so on, and so on. Replicable vocabularies really show a lot of value. So those are vocabularies oftentimes that you can find external to your organization. If you’re doing work in crude oil, for example, there will be certain kinds of vocabularies you can borrow from open-source spaces, same with pharmaceuticals, same with the banking industry and financial, same with lots of these kinds of areas. People are putting reusable, generic, replicable vocabularies out there for people to use.

And, again, looking at the IDERA tool, I’m curious to see how they’re handling this in terms of using kinds of standards. In the semantics world you often see things like SKOS models that provide standards for at least broader than/narrower than relationships and those things can be difficult to do in ER models but, you know, not impossible, it just depends on how much of that machinery and that linking that you can handle in those types of systems.
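
For readers unfamiliar with SKOS, here is a small, self-contained example using the rdflib Python library (the vocabulary, terms, and namespace are made up for illustration) of how preferred labels, synonyms, and broader-than/narrower-than relationships are typically expressed:

```python
# A tiny SKOS vocabulary built with rdflib (pip install rdflib); the terms are made up.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS, RDF

EX = Namespace("http://example.org/vocab/")   # hypothetical namespace
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

# One group calls it X, another calls it Y: prefLabel vs. altLabel captures that.
g.add((EX.Aspirin, RDF.type, SKOS.Concept))
g.add((EX.Aspirin, SKOS.prefLabel, Literal("aspirin", lang="en")))
g.add((EX.Aspirin, SKOS.altLabel, Literal("acetylsalicylic acid", lang="en")))

# Broader-than / narrower-than relationships, as mentioned above.
g.add((EX.Analgesic, RDF.type, SKOS.Concept))
g.add((EX.Analgesic, SKOS.prefLabel, Literal("analgesic", lang="en")))
g.add((EX.Aspirin, SKOS.broader, EX.Analgesic))
g.add((EX.Analgesic, SKOS.narrower, EX.Aspirin))

print(g.serialize(format="turtle"))
```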

So lastly I just wanted to kind of make a comparison to some semantic engines that I see in the industry, and kind of ask Ron and prime him a little bit to talk about perhaps how IDERA’s system has been used in conjunction with any semantic technologies. Is it capable of being integrated with triple stores, graph databases? How easy is it to use external sources because those kinds of things in the semantic world can often be borrowed using SPARQL endpoints? You can import RDF or OWL models directly into your model – refer back to them – so, for example, the gene ontology or the protein ontology, that can live somewhere in its own space with its own governance structure and I can simply import all or part of that as I need it into my own models. And I’m curious to know how IDERA approaches this issue. Do you have to maintain everything internally, or are there ways to go use other kinds of standardized models and pull them in and how does that work? And the last thing I mentioned here is how much manual work is really involved to build the glossaries and the metadata repositories?
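
As an example of what “borrowing” an external source looks like in the semantic world, here is a hedged sketch: it loads part of an externally governed OWL/RDF model into a local graph with rdflib, and queries a SPARQL endpoint with SPARQLWrapper. The URLs are placeholders, not real services; the point is only the pattern.

```python
# Sketch of reusing external semantic sources; the URLs are hypothetical placeholders.
from rdflib import Graph
from SPARQLWrapper import SPARQLWrapper, JSON

# 1) Import all or part of an externally governed OWL/RDF model into a local graph.
local_model = Graph()
local_model.parse("https://example.org/ontologies/protein.owl", format="xml")

# 2) Or leave the data where it is and query a remote SPARQL endpoint on demand.
endpoint = SPARQLWrapper("https://example.org/sparql")   # hypothetical endpoint
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?concept ?label WHERE {
        ?concept rdfs:label ?label .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["concept"]["value"], row["label"]["value"])
```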

So I know Ron’s going to show us some demos on these kinds of things which will be really interesting. But the problems that I often see consulting with customers is that a lot of errors occur if people are writing in their own definitions or their own metadata. So you get misspellings, you get fat-finger errors, that’s one thing. You also get people who may take something from, you know, just Wikipedia or a source that’s not necessarily of the quality you may want in your definition, or your definition is only from one person’s perspective so it’s not complete, and it’s not clear then how the governance process works. Governance, of course, being a very, very big issue any time you’re talking about reference data and any time you’re talking about how this may fit in to someone’s master data, how they’re going to use their metadata, and so on.

So I just wanted to put some of these topics out there. These are items that I see in the business space across a lot of different kinds of consulting engagements and a lot of different spaces, and I’m really interested to see what Ron is going to show us with IDERA to point out some of these topics. So thank you very much.

Rebecca Jozwiak: Thanks so much, Eric, and I really like your comment that many errors can occur if people are writing their own definitions or metadata. I know in the journalism world there’s a mantra that “many eyes make the errors few,” but when it comes down to practical applications, too many hands in the cookie jar tends to leave you with a lot of broken cookies, right?

Eric Little: Yeah, and germs.

Rebecca Jozwiak: Yeah. With that I’m going to go ahead and pass it off to Malcolm Chisholm. Malcolm, the floor is yours.

Malcolm Chisholm: Thank you very much, Rebecca. I would like to kind of look a little bit at what Eric’s been talking about, and add, kind of, a few observations which, you know, Ron may care to respond to also, in talking about “Toward Business-Driven Data Architecture” – what does it mean to be business driven and why is that important? Or is it just some form of hype? I don’t think it is.

If we look at what’s been going on since, you know, mainframe computers really became available to companies – say, around 1964 – to today, we can see that there’s been a lot of changes. And these changes I would summarize as being a shift from process-centricity to data-centricity. And that’s what makes business-driven data architectures so important and so relevant for today. And I think, you know, it’s not just a buzzword, it’s something that’s absolutely real.

But we can appreciate it a little bit more if we do dive into history, so going back in time, way back to the 1960s and for some time thereafter, mainframes dominated. These then gave way to PCs where you actually had a rebellion of the users when PCs came in. Rebellion against centralized IT, who they thought were not fulfilling their needs, weren’t agile enough. That quickly gave rise to distributed computing, when PCs were linked together. And then the internet started to happen, which blurred the boundaries of the enterprise – it could now interact with parties outside itself in terms of data exchange, which had not been happening before. And now we’ve gone into the era of cloud and big data where the cloud is a set of platforms which really are commoditizing infrastructure and so we’re relieving, as it were, IT of the need to run big data centers because, you know, we’ve got the cloud capacity available to us, and concomitant with that big data which Eric has, you know, so eloquently discussed. And overall, as we see, as the shift in technology occurred, it has become more data-centric, we do care more about data. Like with the internet, how data is being exchanged. With big data, the four or more v’s of the data itself.

At the same time, and perhaps more importantly, business use cases shifted. When computers were first introduced, they were used to automate things like books and records. And anything that was a manual process, that involved ledgers or things like that, were programmed, essentially, in house. That shifted in the 80s to the availability of operational packages. You didn’t need to write your own payroll anymore, you could buy something that did it. That resulted in a large downsizing at the time, or restructuring, in many IT departments. But then business intelligence, with things like data warehouses appeared, mostly in the 90s. Followed by dotcom business models which were, of course, a big frenzy. Then MDM. With MDM you start to see that we’re thinking not about automation; we’re just actually focusing on curating data as data. And then analytics, representing the value you can get out of the data. And within analytics you see companies that are very successful whose core business model revolves around data. Google, Twitter, Facebook would be part of that, but you could argue also that Walmart is.

And so the business is now really thinking about data. How can we get value out of data? How data can drive the business, the strategy, and we are in the golden age of data. So given that, what’s happening in terms of our data architecture, if data is no longer regarded as simply the exhaust that comes out of the back end of applications, but is really central to our business models? Well, part of the problem that we have in achieving that is IT is really stuck in the past with the systems development life cycle which was a consequence of having to deal rapidly with that process automation phase in the early age of IT, and working in projects is a similar thing to IT – and this is a little bit of a caricature – but what I’m trying to say is that some of the barriers to getting a business-driven data architecture are because we’ve, kind of, uncritically accepted a culture in IT which derives from a bygone age.

So everything’s a project. Tell me your requirements in detail. If things don’t work, it’s because you didn’t tell me your requirements. Well that doesn’t work today with data because we’re not starting with un-automated manual processes or a, you know, a technical conversion of business processes, we’re starting very often with already existing production data that we’re trying to get value out of. But nobody who is sponsoring a data-centric project really understands that data in depth. We have to do data discovery, we have to do source data analysis. And that doesn’t really match with the systems development, you know – waterfall, SDLC lifecycle – of which Agile, I would maintain, is a kind of a better version of that.

And what is being focused on is technology and functionality, not data. For instance, when we do testing in a testing phase it’ll typically be, does my functionality work, let’s say my ETL, but we’re not testing the data. We’re not testing our assumptions about the source data coming in. If we did, we would be in perhaps better shape and as somebody who has done data warehouse projects and suffered through upstream changes, busting my ETLs, I would appreciate that. And in fact, what we want to see is testing as a preliminary step to continuous production data quality monitoring. So we’ve got here a lot of attitudes where it’s difficult to achieve the business-driven data architecture because we’re conditioned by the era of process-centricity. We need to make a transition to data-centricity. And this is not a total transition, you know, there’s still a lot of process work to do out there, but we’re not really thinking in data-centric terms when we need to, and the circumstances that occur when we’re really obliged to do that.

Now the business realizes the value of the data, they want to unlock the data, so how are we going to do that? So how do we do the transition? Well, we put data at the heart of development processes. And we let the business lead with information requirements. And we understand that nobody understands the existing source data at the start of the project. You could argue that the data structure and the data itself got there through IT and operations, respectively, so we should know that, but really, we don’t. This is data-centric development. So, in thinking about where and how we do data modeling in a data-centric world, we have to have feedback loops to the users in terms of refining their information requirements, as we do data discovery and data profiling for source data analysis, and as we gradually get more and more certainty about our data. And now I’m speaking of a more traditional project like an MDM hub or a data warehouse, not necessarily the big data projects, although this is still, I maintain, fairly close to that. And so those feedback loops include the data modelers, you know, gradually advancing their data model and interacting with the users to make sure the information requirements are refined based on what’s possible, what’s available, from the source data as they better understand it, so it’s not a case anymore of the data model being, you know, in a state that’s either not there or completely done, it’s a gradual bringing into focus of it.

Similarly, more downstream of that we have quality assurance where we develop rules for data quality testing to make sure that the data is within the parameters that we’re making assumptions about. Going in, Eric was referring to changes in reference data, which may happen. You don’t want to be, as it were, a downstream victim of, sort of, unmanaged change in that area, so the quality assurance rules can go into post-production, continuous data quality monitoring. So you can start to see if we are going to be data-centric, how we do data-centric development is quite different to the functionality-based SDLC and Agile. And then we have to pay attention to business views as well. We have – and again this echoes what Eric was saying – we have a data model defining a data storage blueprint for our database, but at the same time we need those conceptual models, those business views of data which traditionally haven’t been done in the past. We’ve sometimes, I think, thought that the data model can do it all, but we need to have the conceptual view, the semantics, and look into the data, render it through an abstraction layer which translates the storage model into the business view. And, again, all the things that Eric was talking about in terms of the semantics become important to do that, so we actually have additional modeling tasks. I think that’s, you know, interesting if you’ve come up in the ranks as a data modeler like I did, and again, something new.
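
To make the idea of data quality rules concrete, here is a small sketch using pandas; the column names, thresholds, and reference list are invented for the example. Each rule encodes an assumption about the source data, and the same checks can keep running after go-live as continuous monitoring:

```python
# Illustrative data quality rules; the columns, values, and thresholds are invented.
import pandas as pd

source = pd.DataFrame({
    "customer_id": [1, 2, 3, None],
    "country_code": ["US", "CA", "US", "ZZ"],
    "order_total": [120.0, -5.0, 88.5, 40.0],
})

VALID_COUNTRIES = {"US", "CA", "GB"}   # assumed reference data

def run_quality_rules(df: pd.DataFrame) -> dict:
    """Each entry encodes an assumption we are making about the incoming data."""
    return {
        "customer_id_not_null": df["customer_id"].notna().all(),
        "country_in_reference_list": df["country_code"].isin(VALID_COUNTRIES).all(),
        "order_total_non_negative": (df["order_total"] >= 0).all(),
    }

# Run at test time, and keep running in production as continuous data quality monitoring.
for rule, passed in run_quality_rules(source).items():
    print(f"{rule}: {'PASS' if passed else 'FAIL'}")
```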

And finally I’d like to say that the larger architecture has also got to reflect this new reality. Traditional customer MDM, for instance, is kind of, okay, let’s get our customer data into a hub where we can, you know, make sense of it in terms of really just data quality for back office applications. Which from a business strategy viewpoint is kind of a yawn. Today, however, we are looking at customer MDM hubs that have additional customer profile data in them, not just the static data, which then really do have a bidirectional interface with transaction applications of the customer. Yes, they still support the back office, but now we know about these behaviors of our customers as well. This is more expensive to build. This is more complex to build. But it is business-driven in a way in which the traditional customer MDM isn’t. You are trading off an orientation to the business against simpler designs which are easier to implement, but for the business, this is what they want to see. We’re really in a new era and I think there are a number of levels at which we have to respond to the business driving data architecture, and I think it’s a very exciting time to be doing things.

So thank you, back to you Rebecca.

Rebecca Jozwiak: Thanks Malcolm, and I really enjoyed what you said about data models must feed the business view, because, kind of like what you were saying, IT held the reins for so long and it’s just kind of not the case anymore, and the culture does need to shift. And I’m pretty sure that there was a dog in the background who agreed with you 100%. And with that I am going to pass the ball to Ron. I’m really excited to see your demo. Ron, the floor is yours.

Ron Huizenga: Thank you very much and before we jump into that, I’ll be going through a few slides and then a little bit of demo because, as Eric and Malcolm have pointed out, this is a very broad and deep topic, and with what we’re talking about today we’re just scraping the surface of it because there are so many aspects and so many things that we really need to consider and look at from a business-driven architecture. And our approach is to really make that model-based and derive true value out of the models because you can use them as a communication vehicle as well as a layer to enable other systems. Whether you’re doing service-oriented architecture, or other things, the model really becomes the lifeblood of what’s going on, with all the metadata around it and the data that you have in your business.

What I want to talk about, though, is almost taking this a step backwards, because Malcolm had touched on some of the history of the way solutions have evolved and that type of thing. One way to really point out how important it is to have a sound data architecture is a use case that I used to run into very often when I was consulting before I came into a product management role, and that was, I would go into organizations whether they were doing business transformation or just replacing some existing systems and that type of thing, and it became evident very quickly how poorly organizations understand their own data. If you take a particular use case like this one, whether you’re a consultant going in or maybe it’s a person that has just started with an organization, or your organization has been built up over the years by acquiring different companies, what you end up with is a very complex environment very quickly, with a number of new different technologies, as well as legacy technology, ERP solutions and everything else.

So one of the things that we can really do with our modeling approach is to answer the question of, how do I make sense of all of this? We can really start to piece the information together, so the business can leverage the information that we have properly. And it comes down to, what is it that we have out there in those environments? How can I use the models to drive out the information that I need and understand that information better? And then we have the traditional types of metadata for all the different things like the relational data models, and we’re used to seeing things like definitions and data dictionaries, you know, data types and that type of thing. But what about additional metadata that you want to capture to really give even more meaning to it? Such as, which entities are really the candidates that should be reference data objects, which should be master data management objects and those types of things and tie them together. And how does the information flow through the organization? Data flows – how data is consumed – from both a process point of view, but also data lineage in terms of the journey of information through our businesses and how it makes its way through the different systems, or through the data stores, so we know when we’re building the BI solutions, or those types of things, that we’re actually consuming the correct information for the task at hand.

And then very importantly is, how can we get all those stakeholders to collaborate, and particularly the business stakeholders because they are the ones that do give us the true meaning of what that data is. The business, at the end of the day, owns the data. They supply the definitions for the vocabularies and things that Eric was talking about, so we need a place to tie all of that together. And the way we do that is through our data modeling and data repository architectures.

I’m going to touch on a few things. I’m going to be talking about ER/Studio Enterprise Team Edition. Primarily I’m going to be talking about the data architecture product where we do the data modeling and that type of thing, but there are a lot of other components of the suite that I’m just going to touch on very briefly. You’ll see one snippet of the Business Architect, which is where we can do conceptual models, but we can also do business process models and we can tie those process models in to link the actual data that we have in our data models. It really helps us to bring that tie together. Software Architect allows us to do additional constructs such as some UML modeling and those types of things to give supporting logic to some of those other systems and processes that we’re talking about. But very importantly as we move down we have the repository and team server, and I’ll talk about that as kind of two halves of the same thing. The repository is where we store all of the modeled metadata as well as all the business metadata in terms of the business glossaries and terms. And because we have this repository-based environment, we can then stitch all these different things together in that same environment and then we can actually make those available for consumption, not only for the technical folks but for the businesspeople as well. And that’s how we really start to drive collaboration.

And then the last piece that I’ll talk about briefly is, when you walk into these environments, it’s not just databases that you have out there. You’re going to have a number of databases, data stores, you’re also going to have a lot of, what I would call, legacy artifacts. Maybe people have used Visio or other diagrams to map out some things. Maybe they’ve had other modeling tools and that type of thing. So what we can do with the MetaWizard is actually extract some of that information and bring it into our models, make it current and be able to use it, consume it, in a current fashion again, rather than just having it sit out there. It now becomes an active part of our working models, which is very important.

When you walk into an organization, like I said, a lot of disparate systems are out there, a lot of ERP solutions, mismatched departmental solutions. Many organizations are also using SaaS solutions, which are also externally controlled and managed, so we don’t control the databases and those types of things hosted on those platforms, but we still need to know what that data looks like and, of course, the metadata around that. What we also find is a lot of obsolete legacy systems that haven’t been cleaned out because of that project-based approach that Malcolm had talked about. It’s amazing how in recent years organizations will spin up projects, they’ll replace a system or a solution, but there’s often not enough project budget left to decommission the obsolete solutions, so now they’re just in the way. And we have to figure out what we can actually get rid of in our environment as well as what’s useful going forward. And that ties into the poor decommissioning strategy. That’s part and parcel of that same thing.

What we also find, because a lot of organizations have been built up from all these different solutions, is we see a lot of point-to-point interfaces with a lot of data moving around in a number of places. We need to be able to rationalize that and figure out that data lineage that I briefly mentioned before so we can have a more cohesive strategy such as utilization of service-oriented architecture, enterprise service buses and those types of things, to deliver the correct information to a publish-and-subscribe type of model that we use correctly throughout our business. And then, of course, we still need to do some kind of analytics, whether we’re using data warehouses, data marts with traditional ETL or using some of the new data lakes. It all comes down to the same thing. It’s all data, whether it’s big data, whether it’s traditional data in relational databases, we need to bring all of that data together so that we can manage it and know what we’re dealing with throughout our models.

Again, to deal with that complexity, there are a number of steps that we want to be able to take. First of all, you walk in and you may not have those blueprints of what that information landscape looks like. In a data modeling tool like ER/Studio Data Architect you’re first going to be doing a lot of reverse engineering in terms of let’s point at the data sources that are out there, bring them in and then actually stitch them together into more representative models that represent the entire business. The important thing is, we want to be able to decompose those models as well along business lines so that we can relate to them in smaller chunks, which our business people can also relate to, and our business analysts and other stakeholders that are working on it.

Naming standards are extremely important and I’m talking about it in a couple of different ways here. Naming standards in terms of how we name things in our models. It’s fairly easy to do in logical models, where we have a good naming convention and a good data dictionary for our models, but then also, we see different naming conventions for a lot of these physical models that we’re bringing in. When we reverse engineer, quite often we see abbreviated names and that type of thing that I’ll talk about. And we need to translate those back into meaningful English names that are meaningful to the business so that we can understand what all these data pieces are that we have in the environment. And then universal mappings is how we stitch them together.

On top of that we would then document and define further and that’s where we can classify our data further with something called Attachments, that I’ll show you a couple of slides on. And then closing the loop, we want to apply that business meaning, which is where we tie in our business glossaries and can link them to our different model artifacts, so we know, when we’re talking about a certain business term, where that’s implemented in our data throughout the organization. And then lastly, I’ve already talked about the fact that we need all this to be repository based with a lot of collaboration and publishing capabilities, so our stakeholders can utilize it. I’m going to talk about reverse engineering fairly quickly. I’ve already kind of given you a very quick highlight of that. I will show that to you in an actual demo just to show you some of the things that we can bring in there.

And I want to talk about some of the different model types and diagrams that we would produce in this type of a scenario. Obviously we’ll do the conceptual models in a lot of instances; I’m not going to spend much time on that. I really want to talk about logical models, physical models and the specialized types of models that we can create. And it’s important that we can create these all in the same modeling platform so that we can stitch them together. That includes dimensional models and also models that utilize some of the new data sources, such as the NoSQL that I’ll show you. And then, what does the data lineage model look like? And how do we stitch that data into a business process model, is what we’ll be talking about next.

I’m going to switch over to a modeling environment here just to give you a very high-level and quick overview. And I believe you should be able to see my screen now. First of all I want to talk about just a traditional type of data model. And the way we want to organize the models when we bring them in, is we want to be able to decompose them. So what you’re seeing here on the left-hand side is we have logical and physical models in this particular model file. The next thing is, we can break it down along the business decompositions, so that’s why you see the folders. The light blue ones are logical models and the green ones are physical models. And we can also drill down, so within ER/Studio, if you have a business decomposition, you can go as many levels deep or create as many sub-models as you like, and changes that you make at the lower levels automatically reflect up at the higher levels. So it becomes a very powerful modeling environment very quickly.

Something that I also want to point out that’s very important to start to pull this information together is we can have multiple physical models that correspond to one logical model as well. Quite often you may have a logical model but you may have physical models on different platforms and that type of thing. Maybe one’s a SQL Server instance of it, maybe another’s an Oracle instance. We have the ability to tie all of that together in the same modeling environment. And there again, what I’ve got here is an actual data warehouse model that can, again, be in the same modeling environment or we can have it in the repository and link it in across different things as well.

What I really wanted to show you on this is some of the other things and other variants of the models that we get into. So when we get into a traditional data model like this we’re used to seeing the typical entities with the columns and the metadata and that type of thing, but that viewpoint varies very quickly when we start to deal with some of these newer NoSQL technologies, or as some people still like to call them, the big data technologies.

So now let’s say we’ve also got Hive in our environment. If we reverse engineer from a Hive environment – and we can forward and reverse engineer from Hive with this exact same modeling tool – we see something that’s a little bit different. We still see all the data constructs there, but our DDL’s different. Those of you that are used to seeing SQL, what you would see now is Hive QL, which is very SQL-like but out of the same tool you’re now able to start working with the different scripting languages. So you can model in this environment, generate it out into the Hive environment, but just as importantly, in the scenario that I’ve described, you can reverse engineer it all in and make sense of it and start to stitch it together as well.
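
As a very rough illustration of the “same model, different DDL” point, here is a sketch that generates both an ANSI-style and a Hive-style CREATE TABLE statement from one simple column specification. The table, columns, and type mapping are made up and simplified; a real tool handles far more of the dialect differences.

```python
# Simplified sketch: one logical definition, two DDL dialects. Table and columns are made up.
columns = [("order_id", "INT"), ("customer_name", "STRING"), ("order_total", "DOUBLE")]

def ansi_ddl(table, cols):
    type_map = {"STRING": "VARCHAR(255)", "INT": "INTEGER", "DOUBLE": "DOUBLE PRECISION"}
    body = ",\n  ".join(f"{name} {type_map[t]}" for name, t in cols)
    return f"CREATE TABLE {table} (\n  {body}\n);"

def hive_ddl(table, cols):
    body = ",\n  ".join(f"{name} {t}" for name, t in cols)
    return f"CREATE TABLE {table} (\n  {body}\n)\nSTORED AS PARQUET;"

print(ansi_ddl("orders", columns))
print(hive_ddl("orders", columns))
```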

Let’s take another one that’s a little bit different. MongoDB is another platform that we support natively. And when you start getting into the JSON types of environments where you have document stores, JSON’s a different animal and there are constructs in it that do not correspond to relational models. You soon start to deal with concepts like embedded objects and embedded arrays of objects when you start to interrogate the JSON and those concepts don’t exist in the traditional relational notation. What we’ve done here is we’ve actually extended the notation and our catalog to be able to handle that in the same environment.

If you look over on the left here, instead of seeing things like entities and tables, we are calling them objects. And you see different notations. You still see the typical types of reference notations here, but these blue entities that I’m showing in this particular diagram are actually embedded objects. And we show different cardinalities. The diamond cardinality means that it’s an object on the one end, but the cardinality of one means that, within the publisher, if we follow that relationship, we have an embedded address object. In interrogating the JSON we’ve found it’s exactly the same structure of objects that’s embedded in the patron, but that’s actually embedded as an array of objects. We’re seeing that, not only through the connectors themselves, but if you look at the actual entities you’ll see addresses under patron, which is also classified as an array of objects. You get a very descriptive viewpoint of how you can bring that in.
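
To ground the embedded-object versus embedded-array distinction, here are two made-up MongoDB-style documents and a tiny classifier showing how interrogating the JSON reveals which case you are in (this is only an illustration of the concept, not the tool’s reverse-engineering logic):

```python
# Made-up documents illustrating an embedded object vs. an embedded array of objects.
publisher = {
    "name": "Example House",
    "address": {"street": "1 Main St", "city": "Springfield"},        # embedded object
}
patron = {
    "name": "J. Reader",
    "addresses": [                                                     # embedded array of objects
        {"street": "2 Elm St", "city": "Springfield"},
        {"street": "9 Oak Ave", "city": "Shelbyville"},
    ],
}

def classify_field(value):
    """Rough classification of a JSON field the way a reverse-engineering pass might see it."""
    if isinstance(value, dict):
        return "embedded object"
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        return "embedded array of objects"
    return "scalar or simple array"

print(classify_field(publisher["address"]))    # embedded object
print(classify_field(patron["addresses"]))     # embedded array of objects
```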

And again, now what we’ve seen so far in just a few seconds is traditional relational models that are multi-level, we can do the same thing with Hive, we can do the same thing with MongoDB, and other big data sources as well. What we can also do, and I’m just going to show you this very quickly is, I talked about the fact of bringing things in from other different areas. You might assume I’m going to import a model from a database or reverse engineer it, but instead I’m going to bring it in from external metadata. Just to give you a very quick viewpoint of all the different types of things that we can start to bring in.

As you see, we have a myriad of things that we can actually bring the metadata into our modeling environment with. Starting with things like even Amazon Redshift, Cassandra, a lot of the other big data platforms, so you see a lot of these listed. This is in alphabetical order. We’re seeing a lot of big data sources and that type of thing. We’re also seeing a lot of traditional or older modeling environments that we can actually bring that metadata through. If I go through here – and I’m not going to spend time on every one of them – we see a lot of different things that we can bring it in from, in terms of modeling platforms and data platforms. And something that’s very important to realize here is another part that we can do when we start to talk about data lineage, on the Enterprise Team Edition we can also interrogate ETL sources, whether it’s things like Talend or SQL Server Integration Services mappings, we can actually bring that in to start our data lineage diagrams as well and draw a picture of what’s happening in those transformations. Altogether out of the box we have over 130 of these different bridges that are actually part of the Enterprise Team Edition product, so it really helps us to pull together all artifacts into one modeling environment very quickly.

Last but not least, we can’t lose sight of the fact that we need the other types of constructs if we’re doing data warehousing or any types of analytics. We still want to have the ability to do things like dimensional models where we have fact tables and we have dimensions and those types of things. One thing I want to show you as well is we can also have extensions to our metadata that help us to categorize what are the types of dimensions and everything else. So if I look at the dimensional data tab here, for instance, on one of these, it will actually automatically detect, based on the model pattern that it sees, and give you a starting point as to whether it thinks it’s a dimension or a fact table. But beyond that, what we can do is within the dimensions and that type of thing we even have different types of dimensions that we can use to classify the data in a data warehousing type of environment as well. So very powerful capabilities that we’re stitching this all together with.

I’m going to jump into this one since I’m in the demo environment right now and show you a couple of other things before I jump back to the slides. One of the things that we’ve recently added to ER/Studio Data Architect is we’ve run into situations – and this is a very common use case when you’re working on projects – developers think in terms of objects, whereas our data modelers tend to think in terms of tables and entities and that type of thing. This is a very simplistic data model, but it represents a few basic concepts, where the developers or even business analysts or business users, might think of them as different objects or business concepts. It’s been very difficult to classify these until now but what we’ve actually done in ER/Studio Enterprise Team Edition, in the 2016 release, is we now have a concept called Business Data Objects. And what that allows us to do is it allows us to encapsulate groups of entities or tables into true business objects.

For instance, what we’ve got here on this new view is the Purchase Order header and Order Line have been pulled together now, they’re encapsulated as an object, we would think of them as a unit of work when we persist the data, and we bring them together so it’s now very easy to relate that to different audiences. They’re reusable throughout the modeling environment. They are a true object, not just a drawing construct, but we also have the added benefit that when we’re actually communicating from modeling perspective we can selectively collapse or expand them so we can produce a summarized view for dialogues with certain stakeholder audiences, and we can also, of course, keep the more detailed view like we’re seeing here for more of the technical audiences. It really gives us a really good vehicle of communication. What we see now is combining multiple different model types, augmenting them with the concept of business data objects, and now I’m going to talk about how we actually apply some more meaning to these types of things and how we stitch them together in our overall environments.
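
A minimal sketch of the business data object idea (invented names, not ER/Studio’s internal representation) is simply a named grouping of entities that can be rendered collapsed for business audiences or expanded for technical ones:

```python
# Sketch only: a business data object as a named grouping of entities.
from dataclasses import dataclass, field

@dataclass
class BusinessDataObject:
    name: str
    entities: list = field(default_factory=list)   # entities/tables persisted as a unit of work

    def render(self, collapsed: bool = True) -> str:
        if collapsed:
            # Summarized view for business stakeholder audiences.
            return f"[{self.name}]"
        # Detailed view for more technical audiences.
        return f"[{self.name}: " + ", ".join(self.entities) + "]"

purchase_order = BusinessDataObject("Purchase Order", ["PurchaseOrderHeader", "OrderLine"])
print(purchase_order.render(collapsed=True))
print(purchase_order.render(collapsed=False))
```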

I’m just trying to find my WebEx back here so that I am able to do that. And there we go, back to the Hot Tech slides. I’m just going to fast forward a few slides here because you’ve already seen these in the model demonstration itself. I want to talk about naming standards very quickly. We want to apply and enforce different naming standards. What we want to do is, we have the capability to actually store naming standards templates in our repositories to basically build that meaning through words or phrases or even abbreviations, and tie them back to a meaningful English type of word. We’re going to use business terms, the abbreviations for each, and we can specify the order, the cases and add prefixes and suffixes. The typical use case for this is when people have been building a logical model and then actually going forward to create a physical model where they might have been using abbreviations and everything else.

The beautiful thing is it’s just as powerful, even more powerful in reverse, if we can just tell what some of those naming standards were on some of those physical databases that we’ve reverse engineered, we can take those abbreviations, turn them into longer words, and bring them backwards into English phrases. We actually now can derive proper names for what our data looks like. Like I say, the typical use case is we would move forward, logical to physical, and map the data stores and that type of thing. If you look at the screenshot on the right-hand side, you’ll see that there are abbreviated names from the source names and when we’ve applied a naming standards template, we’ve actually got more full names. And we could put spaces and everything like that in if we want to, depending on the naming standards template we used. We can make it look however we want it to look to bring into our models. Only when we know what something is called can we actually start to attach definitions to it, because unless we know what it is, how can we apply a meaning to it?

The nice thing is, we can actually invoke this when we’re doing all kinds of things. I talked about reverse engineering, we can actually invoke naming standards templates simultaneously when we’re doing reverse engineering. So in one set of steps through a wizard, what we’re able to do is, if we know what the conventions are, we can reverse engineer a physical database, it’s going to bring it back as physical models in a modeling environment and it’s also going to apply those naming conventions. So we will see what the English-like representations of names are in the corresponding logical model in the environment. We can also do it and combine it with XML Schema generation so we can take a model and even push it out with our abbreviations, whether we’re doing SOA frameworks or that type of thing, so we can then also push out different naming conventions that we actually have stored in the model itself. It gives us a lot of very powerful capabilities.

Again, here’s an example of what it would look like if I had a template. This one is actually showing that I had EMP for “employee,” SAL for “salary,” PLN for “plan” in a naming standards convention. I can also apply them to have them running interactively as I’m building out models and putting things in. If I was using this abbreviation and I typed in “Employee Salary Plan” as the entity name, it would act on the naming standards template I have defined here and would have given me EMP_SAL_PLN as I was creating the entities, giving me the corresponding physical names right away.
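
As an illustration of what such a naming standards template does (a toy version of the idea, not the product’s actual algorithm), the sketch below maps business words to abbreviations in the forward direction and expands abbreviations back into English-like names in reverse, using the EMP/SAL/PLN example above:

```python
# Toy naming standards template based on the EMP/SAL/PLN example; real templates handle much more.
ABBREVIATIONS = {"employee": "EMP", "salary": "SAL", "plan": "PLN"}
EXPANSIONS = {abbr: word for word, abbr in ABBREVIATIONS.items()}

def to_physical(logical_name: str) -> str:
    """Logical to physical: 'Employee Salary Plan' -> 'EMP_SAL_PLN'."""
    parts = [ABBREVIATIONS.get(word.lower(), word.upper()) for word in logical_name.split()]
    return "_".join(parts)

def to_logical(physical_name: str) -> str:
    """Reverse engineering direction: 'EMP_SAL_PLN' -> 'Employee Salary Plan'."""
    parts = [EXPANSIONS.get(part, part.lower()).title() for part in physical_name.split("_")]
    return " ".join(parts)

print(to_physical("Employee Salary Plan"))   # EMP_SAL_PLN
print(to_logical("EMP_SAL_PLN"))             # Employee Salary Plan
```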

Again, very good for when we’re designing and forward engineering as well. We have a very unique concept and this is where we really start to bring these environments together. And it’s called Universal Mappings. Once we’ve brought all of these models into our environment, what we’re able to do, assuming that we’ve now applied these naming conventions and they’re easy to find, we can now use a construct called Universal Mappings in ER/Studio. We can link entities across models. Wherever we see “customer” – we’ll probably have “customer” in a lot of different systems and a lot of different databases – we can start to link all of those together so that when I’m working with it in one model I can see where the manifestations of customers are in the other models. And because we’ve got the model layer representing that, we can even tie it in to data sources and bring it down, in our where-used inquiries, to which databases these reside in as well. It really gives us an ability to tie all this together very cohesively.
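
The universal mapping idea can be pictured as a simple cross-model registry. The sketch below (invented names, not the repository’s actual structure) links one business concept to its manifestations in several models and answers a where-used question:

```python
# Sketch of a cross-model mapping registry; model, table, and database names are invented.
universal_mappings = {
    "Customer": [
        {"model": "CRM Logical Model", "entity": "Customer"},
        {"model": "Billing Physical Model", "table": "CUST_MSTR", "database": "billing_db"},
        {"model": "Data Warehouse", "table": "DIM_CUSTOMER", "database": "edw"},
    ],
}

def where_used(concept: str):
    """List every model/database in which this business concept manifests."""
    return universal_mappings.get(concept, [])

for hit in where_used("Customer"):
    print(hit)
```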

I’ve shown you the business data objects. I also want to talk about the metadata extensions, which we call Attachments, very quickly. What that does is it gives us the ability to create additional metadata for our model objects. Quite often I would set up these types of properties to drive a lot of different things out from a data governance and data quality perspective, and also to help us with master data management and data retention policies. The basic idea is you create these classifications and you can attach them wherever you want to, at the table level, column level, those types of things. The most common use case, of course, is at the entity or table level, and then I can define: what are my master data objects, what are my reference tables, what are my transactional tables? From a data quality perspective I can do classifications in terms of importance to the business so that we can prioritize data cleansing efforts and that type of thing.

Something that is often overlooked is, what is the retention policy for different types of data in our organization? We can set these up and we can actually attach them to the different types of information artifacts in our modeling environment and, of course, our repository as well. The beauty is, these attachments live in our data dictionary so when we’re utilizing enterprise data dictionaries in the environment, we can attach them to multiple models. We only have to define them once and we can leverage them over and over again across the different models in our environment. This is just a quick screenshot to show that you can actually specify when you do an attachment, what all the pieces are that you want to attach it to. And this example here is actually a list of values, so when they’re going in you can pick from a list of values, you have a lot of control in the modeling environment of what’s being picked, and you can even set what the default value is if a value isn’t picked. So a lot of power there. They live in the data dictionary.
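
A stripped-down sketch of the attachment idea (again, invented rather than the product’s data dictionary format): a named classification with an allowed list of values and a default, which can then be attached to tables or columns:

```python
# Sketch of a metadata extension ("attachment") with a list of values and a default.
from dataclasses import dataclass

@dataclass
class Attachment:
    name: str
    allowed_values: list
    default: str

    def validate(self, value: str) -> str:
        # Fall back to the default when no valid value is picked.
        return value if value in self.allowed_values else self.default

data_classification = Attachment(
    name="Data Object Type",
    allowed_values=["Master Data", "Reference Data", "Transactional"],
    default="Transactional",
)

# Attach the classification to model objects (table level here); names are made up.
table_metadata = {
    "CUSTOMER": {"Data Object Type": data_classification.validate("Master Data")},
    "ORDER_LINE": {"Data Object Type": data_classification.validate("")},  # falls back to default
}
print(table_metadata)
```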

Something I want to show you a little further down on this screen, in addition you see the attachments kind of showing up in the top part, underneath it you see data security information. We can actually apply data security policies to the different pieces of information in the environment as well. For different compliance mappings, data security classifications, we ship a number of them out of the box that you can just generate and start to use, but you can define your own compliance mappings and standards as well. Whether you’re doing HIPAA or any of the other initiatives out there. And you can really start to build up this very rich set of metadata in your environment.

And then the Glossary and Terms – this is where the business meaning is tied in. We quite often have data dictionaries out there that an organization is using as a starting point to drive out glossaries, but the terminology and the verbiage is often very technical. So what we can do is we can, if we wish, use those as a starting point to drive out glossaries, but we really want the business to own these. What we’ve done in the team server environment is we’ve given the ability for people to create business definitions and then we can link them to the different model artifacts that they correspond to in the modeling environment as well. We also recognize the point that was discussed earlier which is, the more people you have typing, the more potential there is for human error. What we also do in our glossary structure is, one, we do support a hierarchy of glossaries, so we can have different glossary types or different types of things in the organization, but just as importantly, if you already have some of these sources out there with the terms and everything defined, we can actually do a CSV import to pull these into our modeling environment, and our team server or our glossary as well, and then start linking from there. And every time something is changed there’s a full audit trail of what the before and after images were, in terms of the definitions and everything else, and what you’re going to see coming in the very near future is also more of an authorization workflow so we can really control who’s in charge of it, approvals by committees or individuals, and that type of thing, to make the governance process even more robust as we go forward.
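
For the CSV import point, here is a minimal sketch (the file layout, columns, and steward names are hypothetical) of pulling existing terms into a glossary structure and keeping a simple before/after trail when a definition changes:

```python
# Sketch of importing glossary terms from CSV and keeping a simple audit trail.
import csv
import io

csv_text = """term,definition,steward
Customer,A party that purchases goods or services.,Sales Ops
Order,A customer's request to purchase goods.,Fulfillment
"""  # hypothetical export; a real file would come from an existing data dictionary

glossary, audit_trail = {}, []

for row in csv.DictReader(io.StringIO(csv_text)):
    glossary[row["term"]] = {"definition": row["definition"], "steward": row["steward"]}

def update_definition(term: str, new_definition: str, user: str):
    """Record the before and after images whenever a definition changes."""
    before = glossary[term]["definition"]
    glossary[term]["definition"] = new_definition
    audit_trail.append({"term": term, "before": before, "after": new_definition, "by": user})

update_definition("Order", "A confirmed request from a customer to purchase goods.", "data.steward")
print(audit_trail)
```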

What this also does for us is when we have these glossary terms in our team server glossary, this is an example of editing an entity in the model itself that I’ve brought up here. It may have linked terms, but what we also do is if there are words that are in that glossary that appear in the notes or descriptions of what we have in our entities here, those are automatically shown in a lighter hyperlinked color, and if we mouse over them, we can actually see the definition from the business glossary as well. It even gives us richer information when we’re consuming the information itself, with all the glossary terms that are out there. It really helps to enrich the experience and apply the meaning to everything that we’re working with.

So, again, that was a very quick flyby. Obviously we could spend days on this as we delve into the different parts, but this is a very quick flyby over the surface. What we’re really striving to do is understand what those complex data environments look like. We want to improve the visibility of all of those data artifacts and collaborate to drive them out with ER/Studio. We want to enable more efficient and automated data modeling. And that’s all types of data that we’re talking about, whether it’s big data, traditional relational data, document stores or anything else. And again, we accomplished that because we have powerful forward and reverse engineering capabilities for the different platforms and the other tools that you may have out there. And it’s all about sharing and communicating across the organization with all the stakeholders that are involved. That’s where we apply meaning through naming standards. We then apply definitions through our business glossaries. And then we can do further classifications for all of our other governance capabilities with the metadata extensions such as data quality extensions, classifications for master data management, or any other types of classifications that you want to apply to that data. And then we can summarize further and enhance communication even more with the business data objects, with the different stakeholder audiences, particularly between modelers and developers.

And again, what’s very important about this is that behind it all is an integrated repository with very robust change management capabilities and audit trails. I didn’t have time to show it today because it gets fairly complex, but you can do named releases, you can do named versions, and for those of you who are doing change management, we can tie that right into your tasks. We have the ability today to put tasks in and associate your model changes with those tasks, just as developers would associate their code changes with the tasks or user stories that they’re working on as well.

Again, that was a very quick overview. I hope it’s been enough to whet your appetite so that we can engage in much deeper conversations on some of these topics as we go forward. Thank you for your time, and back to you, Rebecca.

Rebecca Jozwiak: Thanks, Ron, that was fantastic, and I have quite a few questions from the audience, but I do want to give our analysts a chance to sink their teeth into what you’ve had to say. Eric, why don’t you go ahead first – perhaps you want to address this slide, or a different one, or ask any other question.

Eric Little: Sure. Sorry, what was the question, Rebecca? You want me to ask something specific or…?

Rebecca Jozwiak: I know you had some questions initially for Ron. If you want, you can ask him now to address any of those, or some of them off your slide, or anything else that piqued your interest about IDERA’s modeling functionality.

Eric Little: Yeah, so one of the things, Ron – it looks like the diagrams that you were showing are general kinds of entity-relationship diagrams like you would normally use in database construction, correct?

Ron Huizenga: Yeah, generally speaking, but of course we have extended types for the document stores and that type of thing as well. We’ve actually gone beyond pure relational notation and added additional notations for those other stores as well.

Eric Little: Is there a way that you guys can utilize graph-based types of modeling? Is there a way to integrate, for example – let’s suppose I’ve done something in TopQuadrant’s TopBraid Composer tool, or in Protégé, or, you know, like the financial guys with FIBO, who are doing a lot of work in semantics, RDF stuff – is there a way to bring that type of entity-relationship graph modeling into this tool and utilize it?

Ron Huizenga: We’re actually looking at how we can handle graphs. We’re not explicitly handling graph databases and that type of thing today, but we’re looking at ways that we can do that to extend our metadata. I mean, we can bring things in through XML and that type of thing right now, if we can at least do some kind of a rendition of XML to bring it in as a starting point. But we’re looking at more elegant ways to bring that in.

And I also showed you that list of reverse engineering bridges that we have there as well, so we’re always looking at extending those bridges for specific platforms. It’s a continual, ongoing effort, if that makes sense, to start to embrace a lot of these new constructs and the different platforms out there. But I can say that we’re definitely at the forefront of doing that. The stuff I showed on, for instance, MongoDB and that type of thing – we’re the first data modeling vendor to actually do that natively in our own product.

Eric Little: Okay, yeah. So the other question I had for you was in terms of governance and maintenance – like when you showed the example of the person who’s an “employee,” I believe with a “salary,” and then you have a “plan” – how do you manage, for example, different kinds of people that may have… let’s suppose you have a large architecture, right, a large enterprise, and people start pulling things together in this tool, and you’ve got one group over here that has the word “employee” and one group over here that has the word “worker.” And one person over here says “salary” and another person says “wage.”

How do you guys reconcile and manage and govern those kinds of distinctions? Because I know how we would do it in the graph world, you know, you would use synonym lists or you would say there’s one concept and it has several attributes, or you could say in the SKOS model I have a preferred label and I have numerous alternate labels that I can use. How do you guys do that?
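
For reference, here is a minimal sketch of the SKOS pattern Eric is describing – one governed concept carrying a preferred label and several alternate labels (synonyms) – written in Python with the rdflib library; the example.org namespace and the labels are purely illustrative:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

# One concept, one preferred label, several alternate labels (synonyms).
EX = Namespace("http://example.org/hr#")
g = Graph()
g.add((EX.Employee, SKOS.prefLabel, Literal("employee", lang="en")))
g.add((EX.Employee, SKOS.altLabel, Literal("worker", lang="en")))
g.add((EX.Employee, SKOS.altLabel, Literal("staff member", lang="en")))

# Any alternate label still resolves back to the single governed concept.
preferred = g.value(EX.Employee, SKOS.prefLabel)
for label in g.objects(EX.Employee, SKOS.altLabel):
    print(f"'{label}' is a synonym of '{preferred}'")
```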

Ron Huizenga: We do it in a couple of different ways; let’s talk about the terminology first. One of the things that we do, of course, is we want to have the defined or sanctioned terms, and the business glossary is obviously where we want them. We do allow links to synonyms in the business glossary as well, so you can say, here’s my term, but you can also define what all the synonyms for it are.

Now the interesting thing, of course, is when you start looking across this vast data landscape with all these different systems that you’ve got out there, you can’t just go out there and change the corresponding tables and those types of things to correspond to that naming standard because it may be a package that you bought, so you have no control over changing the database or anything at all.

What we can do there, in addition to associating the glossary definitions, is use those universal mappings that I talked about. The recommended approach is to have an overarching logical model that lays out what these different business concepts are that you’re talking about. Tie the business glossary terms into those, and the nice thing is that once you’ve got this construct that represents a logical entity, as it were, you can then start to link from that logical entity to all of the implementations of that logical entity in the different systems.

Then where you need to [inaudible] on that, you can see that “person” here is called “employee” in this system, and “salary” here is called “wage” in this other system. You’ll see all the different manifestations of those because you’ve linked them together.
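
A toy data structure can make that idea concrete – one logical entity linked to its physical implementations across systems, with the system, table and column names invented for the example (this is not ER/Studio’s repository format):

```python
# One logical entity, mapped to its implementations in different systems.
universal_mappings = {
    "Person": [
        {"system": "HR System", "table": "EMPLOYEE", "columns": {"Salary": "SALARY"}},
        {"system": "Payroll Package", "table": "WORKER", "columns": {"Salary": "WAGE"}},
    ],
}

# Given the logical name, list every physical manifestation across systems.
for impl in universal_mappings["Person"]:
    print(f'"Person" is implemented as {impl["system"]}.{impl["table"]} '
          f'(Salary -> {impl["columns"]["Salary"]})')
```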

Eric Little: Okay great, yeah, got it. In that sense, is it safe to say that’s kind of like some of the object-oriented approaches?

Ron Huizenga: Somewhat. It’s a little more intensive than that, I guess you could say. I mean, you could take the approach of manually linking and going through and inspecting and doing all of them as well. The one thing I didn’t really have a chance to talk about – because, again, we have a lot of capabilities – is that we also have a full automation interface in the Data Architect tool itself, and a macro capability, which is really a programming language in the tool. So we can actually do things like write macros, have them go out and interrogate things, and create links for you. We use it for importing and exporting information; we can use it for changing things or adding attributes, event-based, in the model itself; or we can run it in batches to go out and interrogate things and populate different constructs in the model. So there’s a full automation interface that people can take advantage of as well, and utilizing the universal mappings with those would be a very powerful way to do that.
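
As a very rough sketch of that kind of batch automation – written here in Python rather than the tool’s own macro language, and working against an invented in-memory model structure rather than the product’s object model – a script might walk every entity and add a standard set of attributes:

```python
# Walk every entity in a made-up in-memory model and add standard audit
# attributes if they are missing. Illustrative only; the product's macros
# operate on its own object model, not this structure.
def add_audit_attributes(entities, audit_attrs=("CREATED_DATE", "UPDATED_DATE")):
    for entity in entities:
        for attr in audit_attrs:
            if attr not in entity["attributes"]:
                entity["attributes"].append(attr)
    return entities

model = [{"name": "EMPLOYEE", "attributes": ["EMP_ID", "SALARY"]}]
print(add_audit_attributes(model))
```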

Rebecca Jozwiak: Okay, thanks Ron, and thanks Eric. Those were great questions. I know we’re running a little bit past the top of the hour, but I’d like to give Malcolm a chance to toss some questions Ron’s way. Malcolm?

Malcolm Chisholm: Thanks, Rebecca. So, Ron, it’s very interesting, I see there’s a lot of capabilities here. One of the areas that I’m very interested in is, say we have a development project: how do you see the data modeler using these capabilities and working maybe more collaboratively with business analysts, with a data profiler, with a data quality analyst, and with the business sponsors who are ultimately going to be responsible for the actual information requirements in the project? How does the data modeler really, you know, make the project more effective and efficient with the capabilities we’re looking at?

Ron Huizenga: I think one of the first things you have to do there as a data modeler – and I don’t mean to pick on some of the modelers, but I will anyway – is recognize that some folks still have the impression that the data modeler is really a gatekeeper type of role: we’re defining how it works, we’re the guards that make sure that everything is correct.

Now there is an aspect of that – you have to make sure that you’re defining a sound data architecture and everything else. But the more important thing, as a data modeler – and I found this quite a bit, obviously, when I was consulting – is that you have to become a facilitator, so you have to pull these people together.

It’s not going to be design up front, generate, build databases anymore – what you need to be able to do is work with all these different stakeholder groups, doing things like reverse engineering, importing information, having other people collaborate, whether it’s on the glossaries or the documentation, everything like that – and be a facilitator to pull this into the repository, link the concepts together in the repository, and work with those people.

It really is much more of a collaborative type of environment where, even through the definition of tasks or the discussion threads and that type of thing that we have in team server, people can actually collaborate, ask questions and arrive at the final end products that they need for their data architecture and their organization. Did that sort of answer your question?

Malcolm Chisholm: Yeah, I agree. You know, I think that the facilitation skill is something that’s really highly desirable in data modelers. I agree that we don’t always see that, but I think that’s necessary and I would suggest that there is an inclination sometimes to stay in your corner doing your data modeling, but you really need to be out there working with the other stakeholder groups or you just don’t understand the data environment that you’re dealing with, and I think the model suffers as a result. But that’s just my opinion.

Ron Huizenga: And it’s interesting, because you mentioned something earlier in your slides about the history of how businesses have kind of turned away from IT because IT wasn’t delivering solutions in a timely fashion, and those types of things.

It’s very interesting that in my later consulting engagements, prior to becoming a product manager, most of the projects that I did in the last two years before that, were business sponsored, where it was really the business that was driving it and the data architects and modelers were not a part of IT. We were a part of a business-sponsored group and we were there as facilitators working with the rest of the project teams.

Malcolm Chisholm: So I think that’s a very interesting point. I think we’re starting to see a shift in the business world where the business is asking, or thinking, not so much about what do I do, process-wise, but also starting to think about what is the data that I work with, what are my data needs, what is the data I’m dealing with as data, and to what extent can we get IDERA products and capabilities to support that viewpoint – and I think that [inaudible] needs of the business, even though it’s still a little bit nascent.

Ron Huizenga: I agree with you, and I think we’re seeing it go more and more that way. We’ve seen an awakening, and you touched on it earlier in terms of the importance of data. We saw the importance of data early in IT, or in the evolution of databases; then, as you say, we got into this whole process management cycle – and process is extremely important, don’t get me wrong there – but when that happened, data kind of lost focus.

And now organizations are realizing that data really is the focal point. Data represents everything that we’re doing in our business so we need to make sure that we have accurate data, that we can find the correct information that we need to make our decisions. Because not everything comes from a defined process. Some of the information is a byproduct of other things and we still need to be able to find it, know what it means, and be able to translate the data that we see there ultimately into knowledge that we can use to drive our businesses better.

Malcolm Chisholm: Right, and now another area I’ve been struggling with is what I would call the data life cycle which is, you know, if we look at the sort of data supply chain going through an enterprise, we’d start with data acquisition or data capture, which might be the data entry but it might equally be, I’m getting data from outside the enterprise from some data vendor.

And then from data capture we go to data maintenance, where I’m thinking about standardizing this data and shipping it around to places where it’s needed. And then data use – the actual points where you’re going to get value out of the data.

And in the old days this was all done in one individual style, but today it might be, you know, an analytics environment, for instance, and then beyond that an archive, a store, where we put the data when we no longer need it, and finally a purge kind of process. How do you see data modeling fitting into the management of this entire data life cycle?

Ron Huizenga: That’s a very good question, and one thing I really didn’t have time to delve into in any detail today is what we’re really starting to talk about, which is data lineage. We actually have a data lineage capability in our tools and, like I say, we can extract some of it from ETL tools, but you can also map it just by drawing the lineage as well. For any of these data models or databases that we’ve captured and brought into models, we can reference the constructs from them in our data lineage diagram.

What we’re able to do is draw a data flow, like you say, from source to target, and through the overall life cycle of how that data transits through the different systems. And what you’re going to find is – let’s take employee data; it’s one of my favorites, based on a project that I did years ago – I worked with an organization that had employee data in 30 different systems. What we ended up doing there – and Rebecca’s popped up the data lineage slide; this is a fairly simplistic data lineage slide – was bringing in all the data structures, referencing them in the diagram, and then we could actually start to look at what the flows between them are and how those different data entities are linked together in a flow. And we can go beyond that as well. This is part of a data flow or lineage diagram that we see here. If you want to go beyond that, we also have the business architect part of this suite, and the same thing applies there.
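
To make the lineage idea concrete, here is a minimal sketch – a plain adjacency list with invented system and column names, independent of any particular lineage tool – that traces where a single data element flows downstream:

```python
# Each source maps to the targets it feeds; names are invented for the example.
lineage = {
    "HR System.EMPLOYEE.SALARY": ["Payroll.WORKER.WAGE"],
    "Payroll.WORKER.WAGE": ["Warehouse.DIM_EMPLOYEE.SALARY_AMT"],
    "Warehouse.DIM_EMPLOYEE.SALARY_AMT": [],
}

def downstream(node, graph, seen=None):
    """Return every target the given source ultimately flows into."""
    seen = set() if seen is None else seen
    for target in graph.get(node, []):
        if target not in seen:
            seen.add(target)
            downstream(target, graph, seen)
    return seen

print(downstream("HR System.EMPLOYEE.SALARY", lineage))
```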

Any of the data structures that we have captured in the data modeling environment can be referenced in the business modeling tool, so that even in your business model diagrams or your business process diagrams, you can reference individual data stores out of the data modeling environment if you wish, and while you’re using them in the folders in your business process model, you can even specify the CRUD on those as well – how that information is either consumed or produced – and then we can start to generate things like impact analysis reports and diagrams out of that.
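
A tiny sketch of that CRUD-plus-impact idea, with made-up process and data store names (this is not the product’s actual report format), might look like this:

```python
# Which business processes create, read, update or delete each data store.
crud = {
    ("Onboard Employee", "EMPLOYEE"): "CR",
    ("Run Payroll", "EMPLOYEE"): "R",
    ("Run Payroll", "PAYCHECK"): "C",
}

def impacted_processes(data_store):
    """Simple impact analysis: which processes touch this data store?"""
    return sorted({proc for (proc, store) in crud if store == data_store})

print(impacted_processes("EMPLOYEE"))  # ['Onboard Employee', 'Run Payroll']
```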

What we’re aiming to get to – and we have a lot of capabilities already, but this is one of the things we have as a goalpost as we continue to evolve our tools going forward – is being able to map out that end-to-end, organizational data lineage and the full life cycle of data.

Malcolm Chisholm: Okay. Rebecca, am I allowed one more?

Rebecca Jozwiak: I will allow you one more, Malcolm, go ahead.

Malcolm Chisholm: Thank you so much. Thinking about data governance and thinking about data modeling, how do we get those two groups to work effectively together and understand each other?

Ron Huizenga: Well, it’s interesting – I think it really depends on the organization, and it goes back to my earlier point: in those organizations where the initiatives were business driven, we were tied right in. For instance, I was leading a data architecture team, but we were tied right in with the business users and we were actually helping them stand up their data governance program. Again, it was more of a consultative approach, but it’s really more of a business function.

What you really need to be able to do that is data modelers and architects who really understand the business, can relate to the business users, and can then help them stand up the governance processes around it. The business wants to do it, but generally speaking we have the technology knowledge to be able to help them stand up those types of programs. It really does have to be a collaboration, but it does need to be business owned.

Malcolm Chisholm: Okay, that’s great. Thank you.

Eric Little: Okay.

Rebecca Jozwiak: Okay, thanks so much. Audience members, I’m afraid we did not get to your questions, but I will make sure that they get forwarded to the appropriate guest we had on the line today. I want to say thank you so much to Eric, Malcolm and Ron for being our guests today. This was great stuff, folks. And if you enjoyed today’s IDERA webcast, IDERA is also going to be on Hot Technologies next Wednesday, discussing the challenges of indexing in Oracle, so join us for another fascinating topic.

Thank you so much, folks, take care, and we’ll see you next time. Bye bye.