Data Warehousing 101
Data warehousing provides a solid foundation for consolidating historical and current data, allowing an organization to generate reports, conduct advanced analysis and perform data mining.
Many businesses continuously collect large amounts of data. But in order to use that information, a functional set of processes and procedures must be put in place to make sense of it.
Whether you’re a data warehouse developer or you're hearing the term data warehousing for the first time, understanding the basics of data warehousing – including what it means, how it's used and the benefits it can provide – is essential.
Once data is properly analyzed, it can be used to create a clearer picture of the positive and negative impacts that common trends and patterns have on an enterprise. That sounds simple enough, but ensuring that data is useful is one of the major challenges in data warehousing.
What Is Data Warehousing?
A data warehouse is a centralized repository (database) that defines and assembles data in detail. Using an integrated data model, it can hold information about an organization's customer base, service providers, suppliers, transactions or business processes.
Data warehousing pulls data from various sources across an enterprise so that it can be analyzed in many different ways. A data warehouse is an integrated, nonvolatile, time-variant and subject-oriented collection of information. In practice, this means a data warehouse should achieve the following goals:
- Capture business metadata and provide access to it
- Improve data quality and minimize inconsistencies between generated reports
- Integrate data from many different sources and provide for data sharing
- Speed up reporting and improve its performance by merging historical and current data effectively and efficiently
Types of Data
A data warehouse supports business intelligence by taking data from various sources and letting business users quickly access critical data from one shared location. The data it holds is subject-oriented, integrated, nonvolatile and time-variant, meaning each record is identified with a specific time period.
Data in a warehouse is commonly described by four key characteristics:
Nonvolatile data is not changed by operational updates once it has been loaded. The data warehouse is a physically separate store of data, transformed from the data found in the operational environment, so routine updates in operational systems do not overwrite it. Nonvolatile data only needs to be loaded initially and then accessed; it does not require the transaction processing, recovery or concurrency control mechanisms that operational databases depend on.
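To make the idea concrete, here is a minimal Python sketch of a nonvolatile table. The `NonvolatileTable` class and its `load`/`query` methods are hypothetical and do not belong to any real product: rows can only be appended by a load step and then read, so a change in an operational system shows up as a new row rather than an overwrite.

```python
class NonvolatileTable:
    """Hypothetical append-only warehouse table: rows are loaded, then only read."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Loading appends copies of the incoming rows; existing rows are never modified.
        self._rows.extend(dict(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Access returns copies, so callers cannot alter the stored history.
        return [dict(row) for row in self._rows if predicate(row)]

    # Deliberately no update() or delete(): the table needs only loading and access.


table = NonvolatileTable()
table.load([{"customer_id": 1, "city": "Boston", "load_date": "2023-01-01"}])
# The customer moves: the operational system overwrites the city, the warehouse appends.
table.load([{"customer_id": 1, "city": "Denver", "load_date": "2023-06-01"}])
print(table.query(lambda row: row["customer_id"] == 1))
```

Because nothing is ever changed in place, the sketch needs no locking or rollback logic, which mirrors why a warehouse can skip transaction processing and recovery.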
Time-variant data provides a historical perspective rather than only a snapshot of the current state. Every key structure in the data warehouse contains an element of time, either explicitly or implicitly, so that information can be analyzed across a horizon such as the past five to 10 years.
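A time-variant store can be sketched by stamping every row with a date and answering "as of" questions. The `snapshots` data, the field names and the `as_of` and `within_horizon` helpers below are illustrative assumptions, not part of any specific warehouse product.

```python
from datetime import date

# Hypothetical snapshots of one customer's credit limit; every row carries a time element.
snapshots = [
    {"customer_id": 42, "credit_limit": 1000, "snapshot_date": date(2019, 1, 1)},
    {"customer_id": 42, "credit_limit": 2500, "snapshot_date": date(2021, 7, 1)},
    {"customer_id": 42, "credit_limit": 4000, "snapshot_date": date(2023, 3, 15)},
]


def as_of(rows, customer_id, when):
    """Return the snapshot in effect for customer_id on `when`, or None if none existed."""
    candidates = [
        r for r in rows
        if r["customer_id"] == customer_id and r["snapshot_date"] <= when
    ]
    # The latest snapshot on or before `when` is the one that applied at that time.
    return max(candidates, key=lambda r: r["snapshot_date"], default=None)


def within_horizon(rows, start, end):
    """Keep rows whose snapshot_date falls inside an analysis horizon [start, end]."""
    return [r for r in rows if start <= r["snapshot_date"] <= end]


print(as_of(snapshots, 42, date(2022, 1, 1))["credit_limit"])  # 2500
```

Keeping every dated snapshot, instead of only the latest value, is what lets the warehouse report how a figure looked at any point within its horizon.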
Subject-oriented data is organized around a business’s major subjects, such as customers, sales, products and services. Rather than mirroring day-to-day operations, subject orientation provides a simple, concise view of each subject by focusing on the modeling and analysis of data that the organization’s key decision makers will use, and by excluding data that is not useful for decision support.
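As a rough illustration, the sketch below reorganizes order-centric operational rows around the "customer" subject and drops fields that only matter to day-to-day processing. The records, the field names (`session_token`, `bin`) and the `customer_subject_area` function are all hypothetical.

```python
# Hypothetical operational order records, including fields used only for processing
# (session token, warehouse bin) that add nothing to decision support.
orders = [
    {"order_id": 1, "customer": "C1", "region": "East", "amount": 120.0,
     "session_token": "x9f", "bin": "A-14"},
    {"order_id": 2, "customer": "C2", "region": "West", "amount": 80.0,
     "session_token": "k2p", "bin": "B-03"},
    {"order_id": 3, "customer": "C1", "region": "East", "amount": 45.5,
     "session_token": "m7q", "bin": "A-02"},
]


def customer_subject_area(rows):
    """Summarize order rows per customer, keeping only decision-relevant attributes."""
    subject = {}
    for row in rows:
        # The region is taken from the first order seen for each customer (an assumption).
        entry = subject.setdefault(
            row["customer"], {"region": row["region"], "order_count": 0, "total_spend": 0.0}
        )
        entry["order_count"] += 1
        entry["total_spend"] += row["amount"]
    return subject


print(customer_subject_area(orders))
```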
Integrated data is built from multiple, heterogeneous sources, such as relational databases, online transaction records and flat files. Data cleaning and data integration techniques are applied as the sources are combined, ensuring consistency in naming conventions, encoding structures, attribute measures and key terms through data conversion.
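The conversion step can be sketched as mapping each source onto one target schema. In this minimal example the two sources, their field names, the `COUNTRY_CODES` lookup and the target schema are hypothetical; it shows the three kinds of consistency named above: naming (`CustID` vs `customer_id`), encoding (country names vs codes) and measures (dollars vs cents, different date formats).

```python
from datetime import datetime

# Two hypothetical sources that record the same kind of sale differently.
crm_rows = [{"CustID": "007", "Country": "United States", "Spend_USD": "19.99", "Date": "03/15/2024"}]
pos_rows = [{"customer_id": 7, "country": "US", "spend_cents": 2500, "date": "2024-03-16"}]

# Hypothetical lookup table mapping one source's encoding onto the warehouse's.
COUNTRY_CODES = {"United States": "US", "Germany": "DE"}


def from_crm(row):
    """Convert a CRM row to the warehouse's naming, encoding and units."""
    return {
        "customer_id": int(row["CustID"]),                 # "007" -> 7
        "country": COUNTRY_CODES[row["Country"]],          # full name -> two-letter code
        "amount_usd": float(row["Spend_USD"]),             # text -> number, already dollars
        "sale_date": datetime.strptime(row["Date"], "%m/%d/%Y").date(),
    }


def from_pos(row):
    """Convert a point-of-sale row to the same target schema."""
    return {
        "customer_id": row["customer_id"],
        "country": row["country"],                         # already a two-letter code
        "amount_usd": row["spend_cents"] / 100,            # cents -> dollars
        "sale_date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
    }


def integrate(crm, pos):
    """Combine both sources into one consistently structured list."""
    return [from_crm(r) for r in crm] + [from_pos(r) for r in pos]


for record in integrate(crm_rows, pos_rows):
    print(record)
```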
Business Analytics and Report Generation
A data warehouse is based on multidimensional data modeling. A multidimensional data model creates a variety of different views in the form of a data cube, which allows data to be modeled and viewed in multiple dimensions. A data warehouse is often one of the first steps an organization takes as it expands and evolves, and it is primarily used when a company decides to start investing in business analysis. Business analysis uses a variety of technologies and procedures to identify business needs and opportunities for improvement based on statistical data.
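A data cube can be illustrated by computing every group-by combination of a fact table's dimensions; each combination is one view of the cube. The following naive in-memory sketch uses a hypothetical fact table with three dimensions (region, product, quarter) and one measure (sales). Real warehouses rely on OLAP engines or SQL, and many SQL databases (for example PostgreSQL, SQL Server and Oracle) support `GROUP BY CUBE` for the same set of groupings.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: three dimensions and one measure.
facts = [
    {"region": "East", "product": "Laptop", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "Phone", "quarter": "Q1", "sales": 60},
    {"region": "West", "product": "Laptop", "quarter": "Q2", "sales": 80},
    {"region": "West", "product": "Phone", "quarter": "Q1", "sales": 40},
]
DIMENSIONS = ("region", "product", "quarter")


def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; each subset is one view of the cube."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # The key holds this row's values for the dimensions in the current view.
                totals[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(totals)
    return cube


cube = data_cube(facts, DIMENSIONS, "sales")
print(cube[("region",)])  # {('East',): 160, ('West',): 120}
```

With three dimensions the cube has 2^3 = 8 views, from the grand total (no dimensions) down to the full region-product-quarter breakdown.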
Business analytics helps organizations discover and recognize patterns that can be used to predict, shape and improve business outcomes. However, it's the results of this process that really count, because they drive the creation, implementation and management of new strategies.
Business analytics solutions use quantitative, statistical and fact-based data to evaluate past performance and to prepare for future business planning and alternatives. Much of this business data is generated automatically by machines and applications and is analyzed with statistical software, which is why many companies rely on statistical software to make improvements based on analytics.
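As one simple example of using past performance for planning, the sketch below fits an ordinary least-squares trend line to hypothetical quarterly revenue and projects the next quarter. The revenue figures and the `linear_trend` helper are assumptions for illustration; statistical packages offer far richer forecasting models.

```python
def linear_trend(values):
    """Ordinary least-squares fit of values against period index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical quarterly revenue (in thousands) pulled from the warehouse.
revenue = [200.0, 210.0, 225.0, 235.0]
slope, intercept = linear_trend(revenue)
next_quarter = intercept + slope * len(revenue)  # extrapolate one period ahead
print(slope, next_quarter)  # 12.0 247.5
```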
Statistical Software and Business Intelligence
Statistical software is often grouped under the broader label of business intelligence (BI) software. Some companies have no formal software selection process; others follow a corporate standard or already have a database or reporting tool in place that just needs to be activated. Where there is a selection process, it begins with creating a BI strategy that complies with the overall business requirements already in place.
Business managers and analysts play a large role in selecting the appropriate software and in making sure their business analysis techniques start them off in the right direction. Amazon, for example, is known to track purchasing trends among customers to work out which price ranges its target market is most comfortable with. This lets a business set competitive prices without eroding its overall profit margins. Without a predefined BI strategy, organizations commonly end up buying software that lacks the customization abilities they need.
Data mining involves digging deep into data to produce insights that support evidence- and fact-based decisions. In technical terms, data mining can be used to find correlations or patterns among the fields of large relational databases. More specifically, it is the process of analyzing information from multiple perspectives and summarizing it into useful information. In a best-case scenario, these insights can help a business cut costs, increase sales and improve other key performance indicators.
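Finding correlations among fields can be sketched with the Pearson correlation coefficient, computed for every pair of numeric fields and filtered by a threshold. The weekly records, field names and the 0.8 threshold are hypothetical choices, and this is only one simple correlation measure among many used in data mining.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant field has no defined correlation; treat it as none
    return cov / (spread_x * spread_y)


def correlated_fields(rows, fields, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair whose |r| meets the threshold."""
    found = []
    for a, b in combinations(fields, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical weekly records for one store.
weekly = [
    {"ad_spend": 1, "sales": 110, "returns": 5},
    {"ad_spend": 2, "sales": 120, "returns": 3},
    {"ad_spend": 3, "sales": 130, "returns": 6},
    {"ad_spend": 4, "sales": 140, "returns": 4},
]
print(correlated_fields(weekly, ["ad_spend", "sales", "returns"]))
```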
Data mining is a powerful technology for discovering the dimensions, categories and relationships that exist among different data sources and records. In the retail sector, for example, data mining can help a company recognize sales patterns and customer behavior and then use that information to its advantage. One infamous example is Target's ability to identify which of its shoppers were likely pregnant, enabling the store to send coupons for baby items at a time when parents tend to begin shopping for them.
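A common way to surface retail sales patterns is association-rule mining over shopping baskets. The sketch below is a simplified pairwise version (not the full Apriori algorithm, and not Target's actual method, which the text does not describe): it counts how often two products appear together (support) and how often buying one implies buying the other (confidence). The baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"diapers", "wipes", "lotion"},
    {"diapers", "wipes"},
    {"bread", "milk"},
    {"diapers", "lotion"},
    {"milk", "wipes", "diapers"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs meeting both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Sorting each basket gives every pair one canonical order before counting.
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for rule in pair_rules(baskets):
    print(rule)
```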
Data Warehousing In a Nutshell
When combined with data warehousing techniques, business analytics methods and BI software help organizations refine their overall business strategies and make better-informed decisions. Analytics plays a vital role in any organization, and techniques such as data mining can support data collection and marketing efforts. Data warehousing also opens up new opportunities: improving customer service, simplifying inventory management, cross-promoting products tailored to individual customer needs, and providing critical product and service analysis.
Data warehousing is what allows organizations to find the answers to complex questions in large sets of data. That's the power of digital data collection and storage.