Thursday, April 24, 2008

Enterprise Data Architecture

The Data Architecture - Different methods of storing data in a data warehouse

OLTP:

OLTP databases, handle real time transactions which inherently have some special requirements. If you’re running a store, for instance, you need to ensure that as people order products they are properly and efficiently updating the inventory tables while they are updating the purchases tables, while their updating the customer tables, so on and so forth. OLTP databases must be atomic in nature (an entire transaction either succeeds or fails, there is no middle ground), be consistent (each transaction leaves the affected data in a consistent and correct state), be isolated (no transaction affects the states of other transactions), and be durable (changes resulting from committed transactions are persistent). All of this can be a fairly tall order but is essential to running a successful OLTP database.

Benefits

Online Transaction Processing has two key benefits: simplicity and efficiency.

Reduced paper trails and the faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses. It also provides a concrete foundation for a stable organization because of the timely updating. Another simplicity factor is that of allowing consumers the choice of how they want to pay, making it that much more enticing to make transactions.

OLTP is proven efficient because it vastly broadens the consumer base for an organization, the individual processes are faster, and it’s available 24/7.

Disadvantages:

For B2B transactions, businesses must go offline to complete certain steps of an individual process, causing buyers and suppliers to miss out on some of the efficiency benefits that the system provides. As simple as OLTP is, the simplest disruption in the system has the potential to cause a great deal of problems, causing a waste of both time and money. Another economic cost is the potential for server errors. This can cause delay or even wipe out an immeasurable amount of data.

OLAP:

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

Types:

Multi-Dimensional

MOLAP
is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP uses database structures that are generally optimal for attributes such as time period, location, product or account code. The way that each dimension will be aggregated is defined in advance by one or more hierarchies.

Relational

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. Depends on a specialized schema design

Hybrid

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level because this provides for the most useful and flexible basis for use in reporting and information analysis.

There are two leading approaches to organizing the data in a data warehouse. The dimensional approach advocated by Ralph Kimball and the normalized approach advocated by Bill Inmon.

In the "dimensional" approach, transaction data is partitioned into either a measured "facts" which are generally numeric data that captures specific values or "dimensions" which contain the reference information that gives each transaction its context. As an example, a sales transaction would be broken up into facts such as the number of products ordered, and the price paid, and dimensions such as date, customer, product, geographical location and salesperson.

The main advantages of a dimensional approach are that the Data Warehouse is easy for business staff with limited information technology experience to understand and use. Also, because the data is pre-joined into the dimensional form, the Data Warehouse tends to operate very quickly

The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business.

The "normalized" approach uses database normalization. In this method, the data in the data warehouse is stored in third normal form. Tables are then grouped together by subject areas that reflect the general definition of the data (Customer, Product, Finance, etc.). The main advantage of this approach is that it is quite straightforward to add new information into the database -- the primary disadvantage of this approach is that because of the number of tables involved, it can be rather slow to produce information and reports. Furthermore, since the segregation of facts and dimensions is not explicit in this type of data model, it is difficult for users to join the required data elements into meaningful information without a precise understanding of the data structure.

No comments: