Saturday, August 29, 2009

Nature of OLAP Analysis / Reporting

Nature of OLAP Analysis

• Aggregation Comparison
• Budget vs. Expenses
• Ranking
• Access to detailed and aggregate data
• Complex criteria specification
• Visualization

Reporting






Strengths of OLAP / Typical OLAP Query

Strengths of OLAP

• It is a powerful visualization paradigm.
• It provides fast, interactive response times.
• It is good for analyzing time series.
• It can be useful to find some clusters and outliners.

Typical OLAP Query

• Write a multi-table join to compare Budget for each Org YTD this year vs. last year.
• Repeat the above process to find the top 5 Donors to Funds
• Repeat the above process to find the Funds of a Org to new vs. existing Donor.

Reporting

Business intelligence tools: These are software applications that simplify the process of development and production of business reports based on data warehouse data.

Executive information systems (known more widely as Dashboard (business):
These are software applications that are used to display complex business metrics and information in a graphical way to allow rapid understanding.

OLAP Tools: OLAP tools form data into logical multi-dimensional structures and allow users to select which dimensions to view data by.

Data Mining: Data mining tools are software that allow users to perform detailed mathematical and statistical calculations on detailed data warehouse data to detect trends, identify patterns and analyze data.

Star Schema Data Star Schema Data Model

Star Schema Architecture

• Dimension tables
– Textual descriptions of the dimensions of data Textual descriptions of the dimensions of data
– Represents the data that you slice and dice Represents the data that you slice and dice

• Fact tables Fact tables
– Place where numerical measurements of business are Place where numerical measurements of business are stored

Goal: Unified Access to Data










•Collects and combines information
•Provides integrated view, uniform user interface
•Supports sharing

Heterogeneous Information Sources











• Different interfaces
• Different data representations
• Duplicate and inconsistent information
• Vertical fragmentation of informational systems (vertical stove pipes)
• Result of application (user)-driven development of operational systems
Digital Libraries

What are the users What are the users saying

•Data should be integrated across the Data should be integrated across the
enterprise enterprise

•Summary data had a real value to the Summary data had a real value to the
organization organization

•Historical data held the key to Historical data held the key to
understanding data over time understanding data over time

•What What-if capabilities are if capabilities are required

What What does a data warehouse do?

• Integrate divergent information from various systems which Integrate divergent information from various systems enable users to quickly produce powerful ad enable users to quickly produce powerful ad-hoc queries hoc queries and perform complex analysis and perform complex analysis

• Create an infrastructure for reusing the data in numerous Create an infrastructure for reusing the data in numerous ways ways

• Create an open systems environment to make useful Create an open systems environment to make useful information easily accessible to authorized users information easily accessible to authorized users

• Help managers make informed decisions Help managers make informed decisions

Data Warehouse VS OLTP

Data Warehouse V’s OLTP
Enterprise Data Warehouse Traditional Database
Integrated Data Application-specific Data
Current/Historical Data Current Data
Organized by Subject Organized for Data Entry
Non-Volatile Data Updated Data
Denormalized Data Normalized Data
Descriptive Data Encoded Data
Detailed/Summarized Data Raw Data
Knowledge user (Manager) Clerical User

What is a Data Warehouse?

A data warehouse is a subject-integrated, time-varying, non-volatile collection of data in support of the management's decision-making process."
• ---Bill Inmon

“A data warehouse is simply a single,complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
• ---Barry Devlin, IBM Consultant

Sunday, March 22, 2009

Data Integration Tools

Data Integration Tools

Tags: Extract, transform and load (ETL), Enterprise data integration (EDI), Service-oriented architecture (SOA) data services, Data integration tools, VIEW ALL TAGS

These Apps Can Help You Know Your Customers

CRM (customer relationship management) is tricky business. First there are the CRM tools themselves: big, bulky systems from mega-vendors such as Siebel, Oracle, and SAP, or lesser players such as Vantive and Clarify. Installing them can be a major project; teaching users to use them can be even bigger.

As if that weren't enough, most firms store their data in different silos across the enterprise, each with its own format and subset of data. The result? Customer data—your lifeblood—is often fractured, flawed, and messy.

What's more, companies can't build unified views of data across their many systems, meaning they can't get a grasp on who their customers are, what they buy, when they buy it, how they buy it, and most crucially, why they buy it.

A Triad Of Tools

Of course, there are tools to help you cure your data disease. In fact, there are many, but the top three are ETL, EII, and DQ.

ETL. Short for extract, transform, and load, ETL tools are used to grab data from one database and merge it into another. The first step, extraction, reads data from the first database and stores it in memory. Next comes the transformation phase: The ETL tools convert the data into a form that matches the new database. Last is loading, in which the tool writes the data to the target database.

Of course, a tool that can do this has a broader range than simple data migration. ETL can be used to form data marts (a database for a single department or division) and even the odd data warehouse (a larger system that's used across the enterprise and often pools the data from several database systems at once).

EII. Next on the list of tools is EII, short for enterprise information integration. As the name implies, these tools are used to merge data from different systems across the enterprise and show the data in a single, unified (also called "federated") view, as though it came from a single source. Unlike its cousin ETL, EII does not write to a new database or change any data source. Instead, it holds the data in a cache that lets you read—but not write—different records.

DQ. Last is DQ, or data quality tools. These are the simplest of the three. They're used to "scrub" or "cleanse" dirty data (data that's missing fields or contains other errors, such as spelling mistakes).

DQ is utterly needed to keep a clean database that you can rely on (after all, it's awkward to send a form letter to a customer and spell his name wrong). But it's merely a first, and partial, step in cleaning your data. For quality business use, cleansed data needs to be carefully vetted by human eyes, a costly step that not every enterprise can afford.



The Goal

Here's yet another acronym for you to master: CDI. Short for Customer Data Integration, it refers to the process of building a customer hub: a single, central place to store and view data from multiple (and often disparate) systems across the enterprise. A customer hub is one of the seminal concepts of data integration, as it can be used as an end product in itself (a tool for users such as managers and business strategists to view and vet data) or a source for downstream applications (a "feeder" that sends data to other tools, such as analytics and finance software).

You can buy a soup-to-nuts CDI solution from several vendors, the most famous of which is Siperian (www.siperian.com), whose Master Reference Manager and Activity Manager let you troll internal and external silos for customer data and integrate that data into a single, sensible hub.

Or you can build one of your own using the tools mentioned above. After all, ETL, EII, and DQ are meant to gather and clean data. But do they work for such a daunting task as CDI? Remember, the average enterprise can have upward of 20 data silos in just as many departments.

The answer is yes: ETL, EII, and DQ tools can help you with CDI, but they're largely inefficient and ill-suited to the task. First, these tools support only one data modality (or method of moving data from place to place): batch or real-time. Yet most CDI apps need both because there are times when you'll need to update the hub with a high-volume batch process that loads data into it from a new silo, as well as times when you'll need to retrieve single records in real-time. And second, none of the ETL, EII, and DQ tools uses a metadata framework. Metadata, or data about the data in your database, lets data stewards and others quickly configure and extend your data systems without having to write reams of code.

Anurag Wadehra, vice president of marketing for Siperian, notes that using ETL, EII, and DQ for customer data integration is like "having a plumber build your house. Like plumbing in a house, the tools that push data through the pipes don't represent the overarching blueprint needed for CDI architecture."

It's a point well taken, and one you'll need to ponder as you begin to integrate your customer data. Proceed with care: It's one of the hardest tasks that IT can face.

by David Garrett

Data Lifecycle Management

Data, like everything else in this world, has a lifecycle, from the time it's first written to a database to the time it grows obsolete and gets deleted from the system.

Short for data lifecycle management, DLM tools let you manage data from youth to old age, storing newer, more vital data on faster servers or systems, and moving old, rarely used data to slower, cheaper storage, such as tape.

CDI (customer data integration) and DLM systems need to be linked in the enterprise, since the former unifies data from different silos into a customer hub, and the latter decides how and where to store the hub's information. When planning your CDI architecture, be sure to use experts who can take DLM into account.



CDI Vendors

CDI (customer data integration) is the process of building a single, central place to store and view data from many systems across the enterprise. Its major vendors include:

Acxiom | www.acxiom.com ; (501) 342-1000

Ascential | www.ascential.com ; (508) 366-3888

Journee | www.journee.com ; (512) 634-5111

Siebel | www.siebel.com ; (650) 295-5000

Siperian | www.siperian.com ; (650) 350-2200



RELATED CONTENT
Extract, transform and load (ETL)
Data integration costs hurting companies with poor negotiation skills
Wholesaler turns to data integration adapters for mainframe to Oracle migration
Pervasive the most persuasive, cost-effective data integration platform, study says
Integration competency centers centralize data integration projects
Data warehouse case study: Midsized insurer goes enterprise with data warehouse
Enterprise data integration quiz
Data migration planning: Key things to remember
The ETL process and MySQL
Gartner data integration Magic Quadrant 2007: Platforms, market expand
ETL tools and EDR tools: What's the difference?

Enterprise data integration (EDI)
Data integration costs hurting companies with poor negotiation skills
Wholesaler turns to data integration adapters for mainframe to Oracle migration
Exec explains IBM's Information On Demand (IOD) initiative
SaaS-based data quality and integration tools gaining momentum
Exploring key trends in data integration for data warehouses
Pervasive the most persuasive, cost-effective data integration platform, study says
IBM unveils new data management software, services
How to develop a sustainable data integration plan for business intelligence projects
Data integration certifications: Finding the value
Data mashups meet business intelligence: "Bashups" explained

Service-oriented architecture (SOA) data services
How to complete the MDM requirements-gathering analysis process
Event-driven architectures: Understanding concepts, benefits and the bottom line for data management
Event-driven architectures' implications for enterprises and data management professionals
What are the components of service-oriented architecture (SOA)?
What's the difference between SOA and Web services?
Pervasive the most persuasive, cost-effective data integration platform, study says
Master data management as a service-oriented architecture enabler
Understanding master data management and service-oriented architectures, with IBM's Dan Wolfson
SOA governance best practices
A growing integration service providers market gives customers better choices, lower prices

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary