Data Warehouse Glossary
70
BI - Business Intelligence
Business Process
A business process is basically a set of related activities. Business processes are roughly classified by the topics of interest to the business. To extract a candidate list of high potential business processes necessitates prioritization of requirements. Examples of business processes are customers, profit, sales, organizations, and products.
When we refer to a business process, we are not simply referring to a business department. For example, consider a scenario where the sales and marketing department access the orders data. We build a single dimensional model to handle orders data rather than building separate dimensional models for the sales and marketing departments. Creating dimensional models based on departments would no doubt result in duplicate data. This duplication, or data redundancy, can result in many data quality and data consistency issues.
Consolidation
Synonyms: Roll-up, Aggregate
See: Formula, Hierarchical Relationships, Children, Parents
Cube
Synonyms: Array, Hypercube, Multi-dimensional Structure
Data mart
Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses.
The data mart typically contains a subset of corporate data that is valuable to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captured from transaction processing systems (called independent data marts), or from an existing enterprise data warehouse (called dependent data marts). It is important to realize that the functional scope of the data mart's users defines the data mart, not the size of the data mart database.
Data Mining
Data Warehouse
Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining. Although data warehouses are built on relational database technology, the design of a data warehouse database differs substantially from the design of an online transaction processing system (OLTP) database.
Contrast with Datamart.
Dimension
A dimension is sometimes referred to as an axis for business analysis. Time, Location and Product are the classic dimensions.
A dimension is a structural attribute of a cube that is a list of members, all of which are of a similar type in the user's perception of the data. For example, all months, quarters, years, etc., make up a time dimension; likewise all cities, regions, countries, etc., make up a geography dimension.
A dimension table is one of the set of companion tables to a fact table and normally contains attributes or (fields) used to constrain and group data when performing data warehousing queries.
Dimensions correspond to the "branches" of a star schema.
DW Components
The DW components differ not only by content of data but also by the way they store the data and by whom it can be accessed.
Staging area: For handling data extracted from source systems. There can be data transformations at this point and/or as the data is loaded into the data warehouse. The structure of the staging area depends on the approach and tools used for the extract, transform, and load (ETL) processes. The data model design affects not only performance, but also scalability and ability to process new data without recreating the entire model.
Data warehouse: This is the area, also called the system of record (SOR), that contains the history data in 3NF and is typically not accessed for query and analysis. Use it for populating the summary area, analytical areas, and the dependent data marts.
Summary area: This area contains aggregations. Structures are usually derived from the data warehouse where one or more attributes are at the higher grain (less detail) than in the data warehouse. These are constructed for high performance data analysis where low level detail is not required.
Analytical area: Contains multidimensional (MD) structures, such as the star schema, snowflakes, or multi-star schemas, constructed for high performance data analysis.
E/R modeling
E/R modeling is a design technique in which we store the data in highly normalized form inside a relational database.
The E/R model basically focuses on three things, entities, attributes, and relationships. An entity is any category of an object in which the business is interested. Each entity has a corresponding business definition, which is used to define the boundaries of the entity - allowing you to decide whether a particular object belongs to that category or entity.
A disadvantage of an E/R model is that it is not as efficient when performing very large queries involving multiple tables. In other words, an E/R model is good at INSERT, UPDATE, or DELETE processing, but not as good for SELECT processing.
EDW (Enterprise Data Warehouse)
An enterprise data warehouse is one that will support all, or a large part, of the business requirement for a more fully integrated data warehousing environment that has a high degree of data access and usage across departments or lines of business. That is, the data warehouse is designed and constructed based on the needs of the business as a whole. Consider it a common repository for decision-support data that is available across the entire organization, or a large subset of that data.
We use the term Enterprise here to reflect the scope of data access and usage, not the physical structure.
The enterprise data warehouse may also be called a Hub and Spoke data warehouse implementation if the control is logically centralized even if the data is spread out and physically distributed
ETL - Extract, transform, and load
ETL is a process that involves
- extracting data from outside sources,
- transforming it to fit business needs, and ultimately
- loading it into the data warehouse.
It is the way data actually gets loaded into the data warehouse.
Fact Table
A fact table consists of the measurements, metrics or facts of a business process.
The Fact Table is located at the centre of a star schema, surrounded by dimension tables.
Formula
A formula is a virtual hypercube calculated on the fly from other measures. It is generally not stored in the database. For example tThe formula Average Price is dimensioned according to Time and Style and has a decimal type.
From the user's point of view there is no difference between a stored measure and a formula. Both are defined by dimensions and by type.
Grain
Hierarchy
The positions of a dimension organised according to a series of cascading one to many relationships. This way of organizing data is comparable to a logical tree, where each member has only one parent but a variable number of children.
For example the positions of the Time dimension might be months, but also days, periods or years.
Hierarchical Level
In a hierarchy, positions are classified into levels. All the positions for a level correspond to a unique classification. For example, in a "Time" dimension, level one stands for days, level two for months and level three for years.
KPI
Measure
Measure is a member with a numeric value. It is a business indicator or KPI (Key Performance Indicator) which is dimensioned by the axis of analysis. For example a measure Quantity could have the dimensions Time and Brand.
Member
A dimension member is a discrete name or identifier used to identify a data item's position and description within a dimension. For example, January 1989 or 1Qtr93 are typical examples of members of a Time dimension. Wholesale, Retail, etc., are typical examples of members of a Distribution Channel dimension.
Synonyms: Position, Item, Attribute
Multidimensional Modeling
The dimensional modeling approach provides a way to improve query performance for summary reports without affecting data integrity. A dimensional database generally requires much more space than its relational counterpart.
The technique uses in particular the modeling of a particular data construct known as a star schema or its derived form known as a snowflake schema.
Multi-Star Schema
A multi-star model is a dimensional model that consists of multiple fact tables, joined together through dimensions.
Navigation
Navigation is a term used to describe the processes employed by users to explore a cube interactively by drilling, rotating and screening, usually using a graphical OLAP client connected to an OLAP server.
ODS (Operational Data Store)
An operational data store (ODS) is a type of database often used as an interim area for a data warehouse (ODSs are commonly used to populate data warehouses and data marts).
It can be used also as a set of integrated, scrubbed data without history or summarization provided for tactical decision support.
Unlike a data warehouse, which contains static data, the contents of the ODS are updated through the course of business operations.
An ODS is designed to quickly perform relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse.
An ODS is similar to your short term memory in that it stores only very recent information; in comparison, the data warehouse is more like long term memory in that it stores relatively permanent information.
OLAP Database
Online analytical processing (OLAP) is a technology designed to provide superior performance for ad hoc business intelligence queries. OLAP is designed to operate efficiently with data organized in accordance with the common dimensional model used in data warehouses.
OLTP database
OLTP systems are designed to meet the day-to-day operational needs of the business, and the database performance is tuned for those operational needs. Consequently, the database can retrieve a small number of records quickly, but it can be slow if you need to retrieve a large number of records and summarize data on the fly.
OLTP systems are almost exclusively associated with E/R data modeling.
Page Dimension
A page dimension is generally used to describe a dimension which is not one of the two dimensions of the page being displayed, but for which a member has been selected to define the specific page requested for display. All page dimensions must have a specific member chosen in order to define the appropriate page for display.
Position
Scoping
Restricting the view of database objects to a specified subset. Further operations, such as update or retrieve, will affect only the cells in the specified subset. For example, scoping allows users to retrieve or update only the sales data values for the first quarter in the east region, if that is the only data they wish to receive.
Slice
A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
For example, if the member Actuals is selected from the Scenario dimension, then the sub-cube of all the remaining dimensions is the slice that is specified.
The data omitted from this slice would be any data associated with the non-selected members of the Scenario dimension, for example Budget, Variance, Forecast, etc. From an end user perspective, the term slice most often refers to a two- dimensional page selected from the cube.
Snowflake Schema
In a star schema each dimension is typically stored in one table; the snowflake design principle expands a dimension and creates tables for each level of a dimensional hierarchy.
For example, a Region dimension may contain the levels Street, City, State and Country. In a star schema, all these attributes would be stored in one table, in a snowflake schema one would expand the schema and a designer might add city and state secondary tables.
Source Field
The database field that data is extracted from to be populated into a data warehouse or data mart system.
Subject Area
A data warehouse is subject-oriented. It is oriented to specific selected subject areas in the organization, such as customer and product.
Star (Join) Schema
Arrangement of data in a relational database. In the middle is the fact table, whose columns constitute the multidimensional measures. The branches of the star, which radiate from the fact table, correspond to the dimensions. The conceptual data model represents this star schema.
Staging Area
The staging area is the place where the extracted and transformed data is placed in preparation for being loaded into the data warehouse.
It contains a collection of data - extracted from OLTP systems - primarily in 3NF, and represented by an E/R model. However, the staging area may also contain denormalized models.
In case of independent data warehouse architecture, there are separate staging areas for each data mart and therefore no sharing of data between these disparate staging areas.
Target Field
A field in a data warehouse or data mart system that is to be populated with data from a source system.
Variable
Synonym of Measure.
PrintShare it! — Rate it: up down flag this hub








