Sunday, April 20, 2008

Enterprise data warehousing

Introduction:

A data warehouse is the main repository of an organization's historical data, its corporate memory.

It contains the raw material for management's decision support system.

The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems.
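To make that separation concrete, here is a minimal Python sketch using the standard-library sqlite3 module. Two in-memory databases stand in for an operational system and a warehouse. The table names, columns and the `extract_and_load` helper are hypothetical, and the copy is a single full batch with no incremental change tracking or transformation.

```python
import sqlite3

# Hypothetical operational (OLTP) database holding live orders.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER, sold_on TEXT)")
oltp.executemany(
    "INSERT INTO orders (product, qty, sold_on) VALUES (?, ?, ?)",
    [("widget", 5, "1992-05-01"), ("widget", 3, "1992-05-02"), ("gadget", 7, "1992-05-02")],
)
oltp.commit()

# Separate analytical store; in practice another server, here a second in-memory DB.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (product TEXT, qty INTEGER, sold_on TEXT)")


def extract_and_load(source, target):
    """Copy order rows from the operational DB into the warehouse (one full batch)."""
    rows = source.execute("SELECT product, qty, sold_on FROM orders").fetchall()
    target.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)


loaded = extract_and_load(oltp, warehouse)

# Heavy analysis now runs against the warehouse copy, not the operational DB.
totals = dict(warehouse.execute("SELECT product, SUM(qty) FROM sales GROUP BY product").fetchall())
print(loaded, totals)
```

Real warehouses usually load on a schedule and only pick up changed rows, but the principle is the same: analysts query the copy, so the operational system keeps its resources for day-to-day transactions.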

Bill Inmon, an early and influential practitioner, formally defined a data warehouse in the following terms:

Subject-oriented
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;

Time-variant
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;

Non-volatile
Data in the database is never overwritten or deleted; once committed, the data is static and read-only, but it is retained for future reporting; and

Integrated
The database contains data from most or all of an organization's operational applications, and this data is made consistent.
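The four properties can be illustrated with a small in-memory sketch in Python (all names are hypothetical): source-specific product codes are mapped to one canonical key (integrated), each record describes one real-world subject (subject-oriented), every version carries the date it took effect (time-variant), and records are only ever appended, never updated or deleted (non-volatile). It illustrates the definitions only; it is not how any particular warehouse product stores data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping that makes codes from different source systems consistent.
CANONICAL_PRODUCT = {"WDG": "widget", "Widget": "widget", "widget": "widget"}


@dataclass(frozen=True)  # frozen: a stored record cannot be modified afterwards
class Version:
    subject: str       # the real-world object this record describes
    price: float
    valid_from: date   # when this version took effect


class AppendOnlyStore:
    """Toy store: records are appended, never overwritten or deleted."""

    def __init__(self) -> None:
        self._rows: list[Version] = []

    def record(self, source_code: str, price: float, valid_from: date) -> None:
        subject = CANONICAL_PRODUCT.get(source_code, source_code)
        self._rows.append(Version(subject, price, valid_from))

    def as_of(self, subject: str, when: date) -> Optional[float]:
        """Return the price in effect on `when`, or None if nothing was recorded yet."""
        versions = [r for r in self._rows if r.subject == subject and r.valid_from <= when]
        if not versions:
            return None
        return max(versions, key=lambda r: r.valid_from).price

    def history(self, subject: str) -> list[tuple[date, float]]:
        rows = sorted((r for r in self._rows if r.subject == subject), key=lambda r: r.valid_from)
        return [(r.valid_from, r.price) for r in rows]


store = AppendOnlyStore()
store.record("WDG", 1.50, date(1992, 1, 1))      # loaded from one source system
store.record("Widget", 1.75, date(1992, 6, 1))   # loaded from another
print(store.as_of("widget", date(1992, 5, 15)))  # 1.5
print(store.history("widget"))
```

Because the June price change is added as a new version instead of overwriting the January one, a report can still ask what the price was in May.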

A data warehouse might be used to find the day of the week on which a company sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001–2005.
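As a sketch of the first question, assume a hypothetical flat `sales` table with `product`, `qty` and ISO-formatted `sold_on` columns. The query groups May 1992 widget sales by weekday using SQLite's `strftime('%w', ...)`, which returns 0 for Sunday through 6 for Saturday. The sample rows are invented for illustration.

```python
import sqlite3

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (product TEXT, qty INTEGER, sold_on TEXT)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("widget", 10, "1992-05-01"),
        ("widget", 12, "1992-05-04"),
        ("widget", 4, "1992-05-08"),
        ("widget", 3, "1992-05-30"),
        ("widget", 100, "1992-06-01"),  # outside May: excluded
        ("gadget", 50, "1992-05-05"),   # different product: excluded
    ],
)

# ISO date strings compare correctly as text, so a half-open range selects May.
dow, best_total = db.execute(
    """
    SELECT strftime('%w', sold_on) AS dow, SUM(qty) AS total
    FROM sales
    WHERE product = 'widget'
      AND sold_on >= '1992-05-01' AND sold_on < '1992-06-01'
    GROUP BY dow
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()

best_day = DAY_NAMES[int(dow)]
print(best_day, best_total)  # Friday 14
```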

While operational systems are optimized for simplicity and speed of modification (see OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Data in data warehouses is frequently heavily denormalised, summarised or stored in a dimension-based model.

However, such denormalisation is not always required to achieve acceptable query response times.
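A common dimension-based layout is the star schema: a central fact table of measurements keyed to surrounding dimension tables. The sketch below (Python with sqlite3; all table names, columns and rows are hypothetical) stores the weekday name directly in the date dimension rather than deriving it at query time, a small example of denormalisation, so reports can group by descriptive attributes with simple joins.

```python
import sqlite3
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, full_date TEXT,
                             day_name TEXT, month INTEGER, year INTEGER);
    CREATE TABLE dim_store  (store_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER, qty INTEGER);
    """
)

# Date dimension for May 1992; the weekday name is stored, not computed per query.
start = date(1992, 5, 1)
for offset in range(31):
    d = start + timedelta(days=offset)
    db.execute(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
        (int(d.strftime("%Y%m%d")), d.isoformat(), WEEKDAYS[d.weekday()], d.month, d.year),
    )

db.executemany(
    "INSERT INTO dim_store VALUES (?, ?, ?)",
    [(1, "Los Angeles", "CA"), (2, "San Diego", "CA"), (3, "Albany", "NY")],
)
# Fact rows hold only keys and the measure (quantity sold).
db.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(19920501, 1, 10), (19920501, 2, 5), (19920504, 3, 12), (19920508, 3, 4)],
)

rows = db.execute(
    """
    SELECT s.state, d.day_name, SUM(f.qty)
    FROM fact_sales AS f
    JOIN dim_date  AS d ON d.date_key = f.date_key
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.state, d.day_name
    ORDER BY s.state, d.day_name
    """
).fetchall()
print(rows)
```

The trade-off is redundancy: attributes such as the day name are repeated where a normalised OLTP design would derive or factor them out, in exchange for simpler and usually faster analytical queries.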

More coming your way, so keep visiting! :)
