A data warehouse is a centralized repository that holds all the data collected by an enterprise's various business systems. The repository can be physical or logical. It is a very large, subject-oriented database, most often built to analyze large volumes of highly detailed data that is durable, in principle time-stamped, and that has been stored and organized (data sourcing) on a powerful computer system.
The goal is to synthesize this data in order to extract the most relevant, critical information and thus facilitate decision-making. This analysis, known as data mining, moves from a mass of detail to a workable synthesis. The added value of a data warehouse lies in the quality of the data it contains. It must therefore be supplied with sufficiently reliable and consistent data.
Data warehouses emphasize capturing data from diverse sources for the purposes of access and analysis. However, the original concept ignores the point of view of the end user, who is likely to need access to specialized, sometimes local, databases. That need is better served by a data mart.
The data warehouse can be built using two approaches: top-down and bottom-up.
The first approach, top-down, creates the data warehouse in its entirety and then breaks it down into data marts for specific user groups.
The second approach, bottom-up, starts by building data marts and then combines them into a single data warehouse.
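The two directions described above can be illustrated with a minimal sketch. The record layout and the helper names `derive_mart` and `merge_marts` are hypothetical, chosen only for illustration; a real warehouse would rely on database tooling rather than in-memory lists.

```python
# Hypothetical warehouse rows: (user_group, product, amount).
WAREHOUSE = [
    ("sales", "widget", 120.0),
    ("sales", "gadget", 80.0),
    ("finance", "widget", 45.5),
]


def derive_mart(warehouse, user_group):
    """Warehouse first: carve out the data store for one user group."""
    return [row for row in warehouse if row[0] == user_group]


def merge_marts(marts):
    """Data stores first: combine separately built stores into one warehouse."""
    merged = []
    for mart in marts:
        merged.extend(mart)
    return merged


# Direction 1: start from the complete warehouse and split it.
sales_mart = derive_mart(WAREHOUSE, "sales")
finance_mart = derive_mart(WAREHOUSE, "finance")

# Direction 2: start from the individual stores and assemble them.
rebuilt = merge_marts([sales_mart, finance_mart])
```

In this toy case both directions end with the same rows; in practice the difference lies in which structure is designed and loaded first.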
In general, the data warehouse is hosted on the company's mainframe server, but it is increasingly hosted in the cloud.
Data from different sources, including online transaction processing (OLTP) applications, is selectively extracted for use by analytical applications and user queries.
The term data warehouse was coined by William H. Inmon, known as the "Father of the Data Warehouse." Inmon describes it as a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decision-making.
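As a rough sketch of selective extraction, the example below copies only some rows and columns from a transactional source into a separate analytical store. It uses Python's standard `sqlite3` module with two in-memory databases; the table names `orders` and `fact_orders`, the columns, and the sample data are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical OLTP source with a small orders table.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, status TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "acme", 250.0, "completed"),
        (2, "globex", 75.0, "cancelled"),
        (3, "acme", 40.0, "completed"),
    ],
)

# Separate warehouse database that analytical queries read from.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
)

# Selective extraction: keep only completed orders and only the needed columns.
rows = oltp.execute(
    "SELECT id, customer, amount FROM orders WHERE status = 'completed'"
).fetchall()
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
warehouse.commit()

# The analytical query runs against the warehouse, not the OLTP system.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer"
).fetchall()
```

Keeping the analytical query on the warehouse side means reporting workloads do not compete with the transactional system for resources.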
The 4 characteristics of a data warehouse:
- Subject-oriented
At the heart of the data warehouse, data is organized by subject. Data on a specific subject, such as sales, is pulled from the different OLTP production databases and consolidated.
- Integrated
The data comes from disparate sources, each using its own format. It is integrated before being made available for use.
- Non-volatile
The data does not disappear and does not change as it is processed over time (read-only).
- Time-variant
The non-volatile data is also time-stamped, so the evolution of a given value over time can be traced.
The level of detail archived depends on the nature of the data. Not all data deserves to be archived.
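The characteristics above can be seen together in a small sketch: two sources with different formats are normalized to one layout, loaded into an append-only store, and then queried for the dated history of a value. Everything here is hypothetical (the source layouts, the `Warehouse` class, the field names); it is an illustration under those assumptions, not a reference design.

```python
from datetime import date

# Two hypothetical sources describing the same subject in different formats.
source_a = [{"sku": "W1", "price_eur": "19.90", "day": "2024-01-05"}]
source_b = [{"item": "W1", "price_cents": 2150, "day": "05/02/2024"}]


def from_a(rec):
    """Normalize source A: ISO dates, price as a decimal string."""
    return {
        "sku": rec["sku"],
        "price": float(rec["price_eur"]),
        "valid_on": date.fromisoformat(rec["day"]),
    }


def from_b(rec):
    """Normalize source B: DD/MM/YYYY dates, price as integer cents."""
    d, m, y = (int(part) for part in rec["day"].split("/"))
    return {
        "sku": rec["item"],
        "price": rec["price_cents"] / 100,
        "valid_on": date(y, m, d),
    }


class Warehouse:
    """Append-only store: rows are added with a date and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        self._rows.extend(rows)

    def history(self, sku):
        """Return the dated values of one SKU, oldest first."""
        matches = [r for r in self._rows if r["sku"] == sku]
        matches.sort(key=lambda r: r["valid_on"])
        return [(r["valid_on"], r["price"]) for r in matches]


wh = Warehouse()
wh.load(from_a(r) for r in source_a)  # integration happens before loading
wh.load(from_b(r) for r in source_b)
price_history = wh.history("W1")
```

Because the store exposes no update or delete operation, each new price is added as a new dated row, which is what makes the history query possible.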