Enterprise data warehousing: April 2008

Thursday, April 24, 2008

Enterprise Data Architecture

The Data Architecture - Different methods of storing data in a data warehouse

OLTP:

OLTP databases, handle real time transactions which inherently have some special requirements. If you’re running a store, for instance, you need to ensure that as people order products they are properly and efficiently updating the inventory tables while they are updating the purchases tables, while their updating the customer tables, so on and so forth. OLTP databases must be atomic in nature (an entire transaction either succeeds or fails, there is no middle ground), be consistent (each transaction leaves the affected data in a consistent and correct state), be isolated (no transaction affects the states of other transactions), and be durable (changes resulting from committed transactions are persistent). All of this can be a fairly tall order but is essential to running a successful OLTP database.

Benefits

Online Transaction Processing has two key benefits: simplicity and efficiency.

Reduced paper trails and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses. It also provides a concrete foundation for a stable organization because of the timely updating. Another simplicity factor is that of allowing consumers the choice of how they want to pay, making it that much more enticing to make transactions.

OLTP is proven efficient because it vastly broadens the consumer base for an organization, the individual processes are faster, and it’s available 24/7.

Disadvantages:

For B2B transactions, businesses must go offline to complete certain steps of an individual process, causing buyers and suppliers to miss out on some of the efficiency benefits that the system provides. As simple as OLTP is, the simplest disruption in the system has the potential to cause a great deal of problems, causing a waste of both time and money. Another economic cost is the potential for server errors. This can cause delay or even wipe out an immeasurable amount of data.

OLAP:

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

Types:

Multi-Dimensional

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP uses database structures that are generally optimal for attributes such as time period, location, product or account code. The way that each dimension will be aggregated is defined in advance by one or more hierarchies.

Relational

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. Depends on a specialized schema design

Hybrid

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level because this provides for the most useful and flexible basis for use in reporting and information analysis.

There are two leading approaches to organizing the data in a data warehouse. The dimensional approach advocated by Ralph Kimball and the normalized approach advocated by Bill Inmon.

In the "dimensional" approach, transaction data is partitioned into either a measured "facts" which are generally numeric data that captures specific values or "dimensions" which contain the reference information that gives each transaction its context. As an example, a sales transaction would be broken up into facts such as the number of products ordered, and the price paid, and dimensions such as date, customer, product, geographical location and salesperson.

The main advantages of a dimensional approach are that the Data Warehouse is easy for business staff with limited information technology experience to understand and use. Also, because the data is pre-joined into the dimensional form, the Data Warehouse tends to operate very quickly

The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business.

The "normalized" approach uses database normalization. In this method, the data in the data warehouse is stored in third normal form. Tables are then grouped together by subject areas that reflect the general definition of the data (Customer, Product, Finance, etc.). The main advantage of this approach is that it is quite straightforward to add new information into the database -- the primary disadvantage of this approach is that because of the number of tables involved, it can be rather slow to produce information and reports. Furthermore, since the segregation of facts and dimensions is not explicit in this type of data model, it is difficult for users to join the required data elements into meaningful information without a precise understanding of the data structure.

Monday, April 21, 2008

Data Warehouse Architecture

What is Data Warehouse?

"A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process"
Example: In order to store data, over the years, many application designers in each branch have made their individual decisions as to how an application and database should be built. So source systems will be different in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has got several branches in several countries, has millions of customers and the lines of business of the enterprise are savings, and loans. The following example explains how the data is integrated from source systems to target systems.

Example of Source Data

System Name	Attribute Name	Column Name	Datatype	Values
Source System 1	Customer Application Date	CUSTOMER_APPLICATION_DATE	NUMERIC(8,0)	11012005
Source System 2	Customer Application Date	CUST_APPLICATION_DATE	DATE	11012005
Source System 3	Application Date	APPLICATION_DATE	DATE	01NOV2005

In the aforementioned example, attribute name, column name, datatype and values are entirely different from one source system to another. This inconsistency in data can be avoided by integrating the data into a data warehouse with good standards

Example of Target Data(Dara Warehouse)

Target System	Attribute Name	Column Name	Datatype	Values
Record #1	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005
Record #2	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005
Record #3	Customer Application Date	CUSTOMER_APPLICATION_DATE	DATE	01112005

In the above example of target data, attribute names, column names, and datatypes are consistent throughout thetarget system. This is how data from various source systems is integrated and accurately stored into the data warehouse

See Figure 1.12 below for Data Warehouse Architecture Diagram

Sunday, April 20, 2008

Enterprise data warehousing

Introduction :

A data warehouse is the main repository of an organization's historical data, its corporate memory.

It contains the raw material for management's decision support system.

The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems.

Bill Inmon,an early and influential practitioner, has formally defined a data warehouse in the following terms;

Subject-oriented: The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;

Time-variant: The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;

Non-volatile: Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting; and

Integrated: The database contains data from most or all of an organization's operational applications, and that this data is made consistent.

A data warehouse might be used to find the day of the week on which a company sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001–2005.

While operational systems are optimized for simplicity and speed of modification (see OLTP) through heavy use of db normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Frequently data in data warehouses are heavily denormalised, summarised or stored in a dimension-based model.

However, this is not always required to achieve acceptable query response times.

More comming ur way so keep visiting..... :)

Thursday, April 17, 2008

Google maps

src="http://www.google.com/jsapi?key=ABQIAAAA_Z537LLRS0at1ZRkyRYLERSxZzijNKwAW_xgBG20OiJu7ZvUYBS-FAhCf3x-P2nhP2yGYGJ72OWv3w" type="text/javascript">

src="http://www.google.com/jsapi?key=ABQIAAAA_Z537LLRS0at1ZRkyRYLERSxZzijNKwAW_xgBG20OiJu7ZvUYBS-FAhCf3x-P2nhP2yGYGJ72OWv3w" type="text/javascript">