Wednesday, December 26, 2012

Customer Data Integration Software

Customer Data Integration: Creating a Single Version of the Truth

Effective customer management depends on online, accurate, integrated and up-to-date customer information. In most organizations, customer information is distributed and duplicated across various applications, making it difficult to get a single version of the truth. To enhance customer management, organizations are increasingly investing time and money in customer data management through a customer data integration (CDI) system. CDI involves integrating and unifying customer information from disparate and heterogeneous business applications. The resulting integrated repository becomes the central source of customer information for various applications and produces the true view of the customer.

The customer is the heart and soul of an organization: the central entity around which its business revolves and the golden nugget on which its survival and prosperity depend. In a true sense, the entire universe may be considered an organization's potential customer base. The customer can be viewed from various perspectives, three of which are shown in Figure 1.



Figure 1: Customer Classification
From the business interaction angle, customers can be:
  • Account holders: The existing customers possessing active, inactive, dormant or closed accounts.
  • Organizations: The internal organization can be analyzed at various levels of granularity and functions, e.g., the employees themselves can be account holders.
  • Partners: These are channels as well as business supporters. Partner behavior can be studied to understand their needs through the business events. This can be effectively utilized to earn loyalty as well as to grow business.
  • Competitors: In order to strategically frame the business, competitor activities need to be monitored.
Based on status, the customers can be classified as:
  • Active customer: One whose account is active and running.
  • Inactive customer: A customer with a valid account that has not been used for a period of time.
  • Dormant customer: An individual or corporation with whom the organization did business in the past.
  • Prospective customer: An individual or corporation that can be targeted as a potential customer.
Based on the organization, the customer can be grouped as:
  • Individual: A specific person of interest to the organization, e.g., an employee, agent or dealer.
  • Corporate: A group of individuals who have banded together for a commercial purpose.

CDI Overview

Organizations have to build strong customer relationships to stay competitive and grow in today's market. Effective customer management mandates easy and quick access to up-to-date and accurate customer information. To enhance customer management, organizations are investing time and money in building an integrated central customer repository that can provide online, accurate, integrated and up-to-date customer information.
Customer data integration (CDI) is an approach to integrating and unifying customer information from disparate and heterogeneous business applications. CDI processes consolidate customer information from all available sources, such as operational systems, call centers, customer relationship management (CRM) and data warehousing (DW) applications, and ensure that the relevant departments and business groups have access to a current and complete view of customer information.



Figure 2: CDI Context Diagram
A successful CDI solution helps organizations in:
  • Effective customer management by providing a timely and accurate understanding of customer needs and behaviors.
  • Improved cross-selling and up-selling opportunities by understanding the prospective customers.
  • Removing duplicate and misleading customer information and providing a single version of the truth across the various business units of an organization.
  • Providing effective campaign management.
  • Complying with legislation, regulations and privacy requirements.
  • Optimizing operational, maintenance and enhancement costs by having a central integrated environment (hardware, software).

Customer Data Integration - Challenges

In most organizations customer information is distributed across various applications, so the unification and integration of customer information from heterogeneous and dispersed applications is a big challenge. Forrester Research has found that though 92 percent of companies say that having an integrated customer application is critical or important, only 2 percent have managed to achieve it. There are numerous challenges faced during customer data integration:
Duplicate Customer Data
Duplicate customer records hinder the organization's ability to identify the customer uniquely and correctly. They also cause problems in relating customer transactions to a single customer record, and they make it difficult for customer service representatives to correctly understand the history of interactions with a customer. Another significant drawback of duplicate records is duplicate campaigning. Key factors influencing data duplication issues are:
  • Local maintenance and storage of customer information in individual applications;
  • Inorganic growth of the organization (merger and acquisition) resulting in heterogeneous processes and systems to maintain and support customer information;
  • Different customer details fed through different channels (Web, telephone, etc.);
  • Data entry error;
  • Relaxed data entry service level agreements (SLA) and audit;
  • Lack of briefing, training and education for customer service/front-end staff about the importance and significance of customer data fields.
Individual systems have their own ways of maintaining customer details, which leads to ambiguous and duplicate customer information. Data entry errors or inconsistent data entry by customer service agents lead to an ambiguous data set. To compound the problem, customers maintain different contact details when interacting through different channels. For example, a name and address can be captured in various ways, which leads to duplicate or inconsistent customer details within and across applications.
Inconsistent and Inaccurate Data
Inconsistent and inaccurate customer data limits the organization's ability to understand and analyze the customer. This leads to poor decision-making that causes customer dissatisfaction. An inconsistent and inaccurate data set can generate different versions of the customer information and defeat the prime purpose of CDI, which is to produce a single version of the truth. It also leads to data reconciliation issues and affects the functioning of business applications. Key factors influencing consistency and correctness issues are:
  • Lack of common metadata control: Distributed and disintegrated customer metadata across applications can lead to inconsistent definitions of the customer.
  • Clerical errors (data entry error): Inaccurate and insufficient data entered by data entry operators or call center agents leads to data sufficiency and accuracy issues.
  • Lack of data ownership, infrequent audits and relaxed SLAs: Based on business needs, individual business groups (sales, operations, marketing, human resources, etc.) focus primarily on a given subset of customer information. Data fields not used by a given business group can contain a default or meaningless value in the database. Inconsistent domain ranges and/or default value definitions generate data consistency issues; e.g., the default value for birthdate can differ between departments. Missing data fields cause data sufficiency issues. (A profiling sketch follows this list.)
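One cheap way to surface these issues is to profile the suspect columns before integration. The sketch below is illustrative only: it assumes a staging table named customer_staging with a birth_date column, and the row-limit syntax varies by database.

SELECT birth_date, COUNT(*) AS n
FROM customer_staging
GROUP BY birth_date
ORDER BY n DESC
FETCH FIRST 10 ROWS ONLY    -- limit syntax varies by database

The most frequent values are often department-specific defaults (for example, 1900-01-01) rather than real birthdates.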

With the increasing volume and velocity of data, managing data growth and maintaining the latest, accurate customer information is a challenging task. Decayed, old data holds no value for the business. The two major areas of data management are growth management and current-data management.

Growth management. Business applications generate millions of customer records every year. Inefficient data management and storage can have an adverse effect on the performance and usability of the application. Factors contributing to data growth are:
  • Nature of operational systems (business applications),
  • Inappropriate historical data management strategy,
  • Lack of data archival and housekeeping strategy, and
  • Inappropriate reference data management strategy.
Current data management. Customer information changes over time. The CDI application should track the changes and maintain the most current customer information. Factors contributing to information changes are:
  • Changes in customer credentials;
  • Changes in customer contact details; and
  • Changes in customer demographic, psychographic and geographic details.

The CDI Solution Architecture

A customer data integration (CDI) system is the central application to capture, integrate and distribute customer information. The goal of a CDI application is to integrate customer information from different applications with minimum latency. Based on the needs of an organization and the dynamism of the customer data, the CDI architecture can be implemented either using batch processes (ETL - extract, transform and load) or using real-time messaging (EAI - enterprise application integration).
Figure 3: CDI Logical Architecture (Hub-and-Spoke Model)

A CDI system extracts customer information from disparate applications and performs data cleansing, customer matching (deduping) and integration per predefined cleansing, matching and integration rules. The central repository contains the integrated customer data with different views of customer information. The data access interface defines the data access modes, restrictions and privileges; business applications and user communities can access only the data sets they are authorized to use. Business rules (data cleansing, customer matching, data integration and data access rules) can be stored in a central metadata repository or reside in the individual tools' repositories.

Conceptual Data Model (CDM)
CDI is an application to store and distribute meaningful customer information. The CDI data model contains the customer and related entities. The generic CDM is illustrated in Figure 4.

Figure 4: CDI Conceptual Data Model

Customer and customer classification. A customer is a person or organization of interest. Customers enter into relationships with other customers, and the nature of this involvement determines whether a specific customer is an external customer, employee, supplier, partner or competitor. The customer can be viewed from various perspectives, as discussed at the beginning of this article.

Customer relationship. This entity stores the relationship between two customers. Customer relationships can be categorized as personal or professional, e.g., parent-child or employer-employee.

Customer contact. This entity captures the customer contact details. Customer contact details can be the physical address, telephone contact and electronic information. The postal address can be subgrouped as current address, permanent address, office address, bill-to-address and ship-to-address. Telephone contact consists of home phone number, office phone number, cell number, corporate office number and local office number. Electronic address consists of personal, office and corporate email ID.

Customer details. This entity captures the customer demographic, psychographic and geographic details. This information can be used for customer segmentation and analysis.

The demographic details to be captured are gender, age group, marital status, number of children, profession, income group, other financial details, etc. The psychographic information captured includes channel preference, privacy specifications, market research, etc. The geographic details to be captured are location (country, region), population groups, country development status, primary currency etc.

Customer accounts. A customer account is a contractual relationship between a customer and an organization, associated with a given product or service. The account entity stores the account details, account type, account status and other related information.

Customer household and household details. Households are collections of existing or prospective customers. Households and their demographic, psychographic and geographic information help in understanding associated patterns and defining proactive campaign management.
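To make these entities concrete, here is a minimal DDL sketch of the core model. All table, column and type names are illustrative assumptions, not the actual CDM:

CREATE TABLE customer (
    customer_id    INTEGER PRIMARY KEY,
    customer_type  VARCHAR(20)        -- e.g., 'INDIVIDUAL' or 'CORPORATE'
);

CREATE TABLE customer_relationship (
    from_customer_id  INTEGER REFERENCES customer(customer_id),
    to_customer_id    INTEGER REFERENCES customer(customer_id),
    relationship      VARCHAR(30)     -- e.g., 'PARENT-CHILD', 'EMPLOYER-EMPLOYEE'
);

CREATE TABLE account (
    account_id    INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES customer(customer_id),
    account_type  VARCHAR(20),
    status        VARCHAR(10)         -- e.g., 'ACTIVE', 'DORMANT', 'CLOSED'
);

The contact, details and household entities would hang off customer in the same way.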

CDI Processes
CDI processes facilitate the consolidation and unification of disparate customer data into integrated and meaningful customer information. The key driver for customer data integration is to provide the true view of the customer. The process steps involved are data acquisition, data cleansing, data integration and data management.


Figure 5: CDI Processes

Data Acquisition
The data acquisition phase helps in understanding the customer data and defining the data extraction strategy. It involves the identification, analysis and extraction of customer data from various business applications (operational systems). A detailed study of source data is performed to understand the data format, characteristics, pattern and usability. A data extraction strategy and approach is defined to extract the relevant customer information from source systems.

Data Cleansing
The data cleansing phase encompasses the processes and procedures for data correction and standardization. Data correction is the process of fixing spelling and correcting addresses, ZIP codes, Social Security numbers and permanent account numbers. Once the data has been corrected, it is standardized to a predefined format and structure, such as storing the Social Security number as 999-99-9999.
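As an illustration of that standardization rule, the reformatting could be expressed in SQL as follows. This is a sketch only: customer_staging and ssn are assumed names, and in practice a CDI tool would do this in a cleansing transformation rather than a hand-written update.

UPDATE customer_staging
SET ssn = SUBSTR(ssn, 1, 3) || '-' || SUBSTR(ssn, 4, 2) || '-' || SUBSTR(ssn, 6, 4)
WHERE LENGTH(ssn) = 9
  AND ssn NOT LIKE '%-%'    -- only reformat raw 9-digit values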

Data Integration
The data integration phase includes the processes for matching, merging and linking customer information (a matching sketch follows the list below). This involves the following processes:
  • Customer matching and linking - Customer data is deduped to remove duplicate customer records and generate a single customer record valid across the business applications (source systems). Customer records also get linked with other related records, e.g., households and organizations.
  • Data transformation and integration - Data is transformed and integrated to produce the true view of the customer. On a need basis, in-house customer information is integrated with external third-party customer data sets (e.g., Dun & Bradstreet, Experian) to produce the integrated customer database with various data access views.
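A minimal sketch of the matching step, assuming a customer_staging table and a naive match key of uppercased name and address (production CDI matching uses fuzzy and probabilistic rules, not simple equality):

SELECT customer_id, name, address, source_system
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY UPPER(name), UPPER(address)   -- simplistic match key
               ORDER BY last_updated DESC                 -- keep the freshest record
           ) AS rn
    FROM customer_staging s
) t
WHERE rn = 1    -- rows with rn > 1 are candidate duplicates to merge or purge
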
Data Management
Data management includes the processes for monitoring and maintaining customer data, which is dynamic by nature and changes over time. Periodic data monitoring and maintenance keep up-to-date customer information available.

Data monitoring processes periodically analyze customer data to detect changes in customer information. Data maintenance makes the latest information available and archives the old data set. The archived data set is required to reproduce a snapshot of customer information at any given point in time.
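One common way to support such point-in-time snapshots is effective dating. The sketch below assumes a customer_history table with effective_from/effective_to columns and a bind variable for the requested date:

SELECT *
FROM customer_history
WHERE :as_of_date BETWEEN effective_from
                      AND COALESCE(effective_to, DATE '9999-12-31')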

Customer Data Integration Architecture



As the figure above depicts, building a customer data hub requires both bulk data movement from ERP, CRM and other operational systems and transaction-level data validation and customer master management from the customer touch points.

In reality, most data is generated by the operational systems, such as an SAP R/3 system or a Siebel application. Customer name and address data will be maintained by the various operational components that need to communicate with the customer. These systems perform tasks such as invoicing, campaign execution and shipping – each of which provides customer touch points that can aggregate more customer information. One approach to maintaining data integrity would be to attack the problem at the operational system level. This seems to be a practical approach; after all, operational systems are the place where detailed transactions are completed. However, these applications are dedicated to performing one function that represents specific business requirements. The data collected in this environment is a by-product of the transactions that have been executed, and for the most part, the applications found here are not integrated with any other applications. Furthermore, each application is its own standalone environment, optimized for the particular needs of the application. While this data is optimized for the operational system, to fully understand your customer you need to consolidate that data, by customer, into a single customer-centric database.

 
The goal of CDI is to provide the best information from the combination of the customer systems. By combining the systems, you know the customer at each touch point across every line of business. This requires an accurate, coherent customer view. Specifically, the goal is to:

  • Resolve customer data duplications and ambiguities throughout the entire enterprise.
  • Supplement gaps in the knowledge of customers from external sources.
  • Support customer data extraction and creation of an integrated customer database.



Customer Data Integration Techniques

Techniques for managing complexity

Attributes and their values can become extremely complex and dynamic due to the many changes individuals go through. Multiply all these fields by the millions of records a business or organization may have in its data sources, then factor in how quickly and how often this information changes. As The Data Warehousing Institute (TDWI) puts it: “The problem with data is that its quality quickly degenerates over time. Experts say 2% of records in a customer file become obsolete in one month because customers die, divorce, marry and move.”
To put this statistic into perspective, assume that a company or charity has 500,000 customers, donors or prospects in its databases. If 2% of these records become obsolete each month, 10,000 records go stale per month, or 120,000 records every year. Within two years, about half of all the records may become obsolete if left unchecked.
Peppers and Rogers call the problem "an ocean of data." Jill Dyche and Evan Levy, gurus in this field, have boiled the challenges down to five primary categories:
  1. completeness – organizations lack all the data required to make sound business or organizational decisions
  2. latency – it takes too long to make the data valuable: by the time of use, too much has become obsolete or outdated (slowed by operational systems or extraction methods)
  3. accuracy
  4. management – data integration, governance, stewardship, operations and distribution all combine to make-or-break data-value
  5. ownership – the more disparate the data-source owners, the more silos of data exist, and the more difficult it becomes to solve problems

History of customer data integration

In the late 1990s Acxiom and GartnerGroup coined the term "customer data integration" (CDI). The process of CDI, as Acxiom and Gartner described it, includes:
  1. cleansing, updating, completing contact-data
  2. consolidating the appropriate records, purging duplicates and linking records from disparate sources to enable customer or donor recognition at any touch-point
  3. enriching internal and transactional data with external knowledge and segmentation
  4. ensuring compliance with contact suppression to protect the individual and the organization
As of 2009, service providers deliver CDI as a hosted solution in batch volumes, on demand using a software as a service (SaaS) model, or on-site as licensed software in companies and organizations with the resources to drive their own data integration processing. CDI enables companies to optimize merchandising (assortment, promotion, pricing and rotation) based on demographics, lifestyle and life stage, to ensure inventory turn and to reduce waste. CDI also aids companies and organizations in choosing the best locations for new branch offices or outlets.
CDI commonly supports both customer relationship management and master data management. It gives these enterprise applications access to information that confidently describes everything known about a customer, donor or prospect, including all attributes and cross-references, along with the critical definition and identification necessary to uniquely differentiate one customer from another and their individual needs.

Customer Data Integration Software

When companies and organizations wish to compile all of their customer or consumer information in one place, they use customer data integration software. Such software integrates customer addresses, sales, demographics, customer needs and features that would appeal to certain customers into one interface, so that the business can view all of this information at once to speed up production and make more sales. In this article, we will look at customer data integration software and various products available on the market.

In data processing, customer data integration (CDI) combines the technology, processes and services needed to set up and maintain an accurate, timely, complete and comprehensive representation of a customer across multiple channels, business lines and enterprises – typically from multiple sources of associated data in multiple application systems and databases. It applies data integration techniques in this specific area.

What is Customer Data Integration Software
Customer data integration software is used to integrate customer information into one user-friendly console so that companies, organizations and small businesses can review the information without wasting time searching through large databases or files. This class of software is generally high-end, and some products are considerably more expensive than others. While many of these programs are similar in nature, each has slightly different functions and display methods that may make it easier to view customer data. To find customer data integration software that is a good fit for your company, you will need to preview several programs and select the one that works best.

Popular Customer Data Integration Software
Customer data integration comes in various forms and is sometimes hard to distinguish from other forms of data integration, such as application integration. Many companies that provide other forms of data integration, however, also provide customer data integration software. For this reason, we have compiled a small list of customer data integration products for you to review.

IBM WebSphere Customer Center
The IBM WebSphere Customer Center is a powerful and user-friendly customer data integration product that will get the job done without taxing your patience. It comes with over 500 individual services and functions to help you manage your customer information. The software is also based on open-source technology, so its services are constantly being updated and there is much support available on the Internet. Key features include the ability to recognize and process duplicate customer data and the ability to integrate WebSphere Customer Center with your other enterprise software.

Adeptia Customer Data Integration Accelerator
The Adeptia Customer Data Integration Accelerator is more than just customer data integration software: it lets you store and process information not only from customers but also from distributors, manufacturers, suppliers and even partner companies. It can process files from multiple sources and in multiple formats, so you never have to worry about converting your data to a single file type. It also works with databases, email clients and a large number of applications, which makes it capable of cross-platform functionality.

SAS
SAS is first and foremost a data analysis program, but it can also be used for customer data integration. SAS can process large amounts of customer information and present it in readily available reports. These reports let you not only view the information you need but also manage that data in a powerful way. SAS can compile all customer information in one spot and then make predictions about your company's future.

Altova MapForce
Altova MapForce is data mapping software that is also capable of processing and integrating data into one interface. It can collect data from databases, spreadsheets and other documents and sources, and then build a customized customer data integration solution by letting the user combine functions and tools. Altova MapForce can be combined with environments such as Visual Studio and Eclipse to build even more advanced software, and it allows the user to build web-based applications that the entire company can use from one online location.

Siperian
Siperian, now part of Informatica, is a source for customer data integration software as well as other forms of data integration. Siperian specializes in data integration and can be used by almost anyone, with or without prior experience in customer data integration. Siperian offers many of the same features as other customer data integration software but executes its services with more functionality and support.

Wednesday, December 19, 2012

SQL Transformation in Informatica


The SQL transformation is a connected transformation used to process SQL queries in the midstream of a pipeline. We can insert, update, delete and retrieve rows from the database at run time using the SQL transformation.

The SQL transformation processes external SQL scripts or SQL queries created in the SQL editor. You can also pass the database connection information to the SQL transformation as input data at run time.

The following SQL statements can be used in the SQL transformation:
  • Data definition statements (CREATE, ALTER, DROP, TRUNCATE, RENAME)
  • Data manipulation statements (INSERT, UPDATE, DELETE, MERGE)
  • Data retrieval statements (SELECT)
  • Data control language statements (GRANT, REVOKE)
  • Transaction control statements (COMMIT, ROLLBACK)

Configuring SQL Transformation


The following options can be used to configure an SQL transformation:
  • Mode: The SQL transformation runs either in script mode or query mode.
  • Active/Passive: By default, the SQL transformation is an active transformation. You can configure it as a passive transformation.
  • Database type: The type of database that the SQL transformation connects to.
  • Connection type: You can pass database connection information or use a connection object.

We will see how to create an SQL transformation in script mode and in query mode, and how to pass a dynamic database connection, with examples.

Creating SQL Transformation in Query Mode


Query mode: The SQL transformation executes a query defined in the query editor. You can pass parameters to the query to define dynamic queries. The SQL transformation can output multiple rows when the query has a SELECT statement. In query mode, the SQL transformation acts as an active transformation.

You can create the following types of SQL queries:

Static SQL query: The SQL query statement does not change, though you can pass parameters to it. The Integration Service prepares the query once and runs the same query for all input rows.

Dynamic SQL query: The query statement itself can change for each input row. You can pass a full query or a partial query to the SQL transformation input ports, and the Integration Service prepares and runs the query for each input row.

SQL Transformation in Informatica Example Using Static SQL query

Q1) Let's say we have the Products and Sales tables with the data below.

Table Name: Products
PRODUCT 
-------
SAMSUNG
LG
IPhone

Table Name: Sales
PRODUCT QUANTITY PRICE
----------------------
SAMSUNG 2        100
LG      3        80
IPhone  5        200
SAMSUNG 5        50
 
Create a mapping to join the Products and Sales tables on the product column using the SQL transformation (a sample static query follows the expected output). The output will be:

PRODUCT QUANTITY PRICE
----------------------
SAMSUNG 2        100
SAMSUNG 5        50
LG      3        80
IPhone  5        200
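
One way to wire this up: assuming the Products rows feed an input port named PRODUCT on the SQL transformation (an assumed port name), the static query in the SQL editor could look like the following, with the input port bound using the ?port? notation:

SELECT PRODUCT, QUANTITY, PRICE
FROM Sales
WHERE PRODUCT = ?PRODUCT?

The Integration Service prepares this query once and runs it for each input row, emitting one output row per matching Sales row.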
 

SQL Transformation in Informatica Example Using Full Dynamic query


Q2) I have a source table that contains the data below.


Table Name: Del_Tab
Del_statement
------------------------------------------
Delete FROM Sales WHERE Product = 'LG'
Delete FROM products WHERE Product = 'LG'

Solution:

Just follow the same steps used to create the SQL transformation in example 1.
  • Go to the "SQL Ports" tab of the SQL transformation and create an input port named "Query_Port". Connect this input port to the Source Qualifier transformation.
  • In the "SQL Ports" tab, enter the SQL query as ~Query_Port~. The tilde indicates variable substitution for the queries.
  • As we don't need any output, just connect the SQLError port to the target.
  • Now create a workflow and run it.

SQL Transformation in Informatica Example Using Partial Dynamic query

Q3) In example 2, you can see the delete statements are similar except for the table name. Now we will pass only the table name to the SQL transformation. The source table contains the data below.


Table Name: Del_Tab
Tab_Names
----------
sales
products

Solution:

Create the input port in the SQL transformation as Table_Name and enter the query below in the SQL Query window.

Delete FROM ~Table_Name~ WHERE Product = 'LG'

Monday, August 27, 2012

The State of Dashboards in 2012: Pathetic


Over the last several months, my colleague VP and Research Director Tony Cosentino and I have been assessing vendors and products in the business intelligence market as part of our upcoming Value Index.

Tony recently wrote about the swirling world of business analytics, covering many of the dynamics of this industry. He and I have been reviewing the breadth and depth of over 15 of these vendors using our Value Index methodology, which examines the products closely in terms of usability, adaptability, reliability, capability and manageability. As we have gone through this analysis, we see the dashboard as the most common tool for displaying business intelligence. The early forms of dashboards appeared in the 1980s, but in my honest evaluation, today’s dashboards have not gotten much more intelligent in all those years. The graphics have gotten better, and we can interact with charts in what is commonly called visual discovery, drilling into and paging through data to change its presentation. So some progress has been made, but the basic presentation of a number of charts on the screen has not improved significantly, and worse yet, neither has the usefulness of the charts. Let’s face it: it’s a big mistake to place several bar and pie charts on a screen side by side and assume that business viewers will know what they mean and what is important in them. We cannot assume that individuals in an audience have the ability to interpret charts and draw the right conclusions from them; just being pretty or interactive will not communicate the desired message.

The lack of adoption of business intelligence, including dashboards, is notorious in this industry, and so are the billions of dollars that companies have spent on BI products in the last decade. It is not helpful to make a big statement that the technology has failed; we should look for the reasons that have held it back. Here we might start by questioning whether the tools present the right information in a useful form for business people, or whether organizations have properly configured the tools they have purchased. If the goal is to inform through dashboards, then maybe we need to make explicit what the dashboard or collection of charts actually means. Typically, this means describing in words the issues or priorities that need to be examined further. A little discipline in populating the dashboard could help, such as presenting only the charts that clearly point out issues needing attention and using analytics to determine which ones to show. If we ask why Microsoft PowerPoint is so popular as a business intelligence tool, we probably would find that the answer is the descriptive text boxes that accompany charts, providing summary sentences or emphasizing specific bullets in a list on the slide. While many people do not like the static nature of Microsoft Excel-based charts in presentations or PDF versions of them, through human intervention with annotation and commentary those charts provide better explanation than dashboards do today. If we expect our organizations to move beyond personal productivity tools and work in a collaborative enterprise environment with dashboards, we had better understand how business intelligence should adapt to the way people work and operate, not the other way around. In this case the old saying that one picture is worth a thousand words may not hold true; a hundred or so words explaining the relevance of the chart could really help.

Many technology vendors believe they need to provide better context in their dashboards, so they try to align the charts to the geographic area of focus, or to the product line of responsibility or to management key performance indicators to make them more usable. Providing better role-based dashboards that are generated based on the individual’s level of responsibility and the business context is a good first step, though most business intelligence vendors do not provide this level of support. But just presenting charts tuned to the context of the individual’s role that may or may not require action is not enough. We need to prioritize the information and make it like the news, with headlines and stories that people can read to determine if they need to make decisions or take action. Whether you are reading the physical or the digital version of The Wall Street Journal or USA Today, newspapers have survived over the centuries as the main source of what humans read in formats they can comprehend. When is the last time you saw a dashboard that communicated the story of its charts and explained the analytics?

Well, once upon a time analytics and logic were applied to generate stories: in the early 1990s, a product called IRI CoverStory did exactly that. It was classified then as an expert system that programmatically created English sentences, based on its interpretation of the analytics, in a memo the system generated. I would even be happy with dynamically created titles and subtitles that guide an individual to what the chart is meant to represent. Many of the current business intelligence technologies do not even allow a free-form text box to be placed beside a chart, which is really sad, as this is one of the most basic methods used in business today. It would be great if dashboards could make these steps forward and make it easier to understand what is presented, but 20 years later, they have not.
Another thing dashboards need to do is help individuals take action based on the information they receive. My colleague Robert Kugel has written about action-oriented information technology frameworks and how they can help increase the productivity and effectiveness of our workers. To date, most development of the notion of an action-enabled dashboard has focused on data discovery and supporting root-cause analysis; that cannot match the familiar human actions that happen in our organizations – collaboration through dialogue to address issues and opportunities.

Some of my industry colleagues have written books on dashboards to capitalize on the hype surrounding the topic. It’s about time for a set of books about the death of the dashboard or moving beyond dashboards; the current designs are not advancing the ability to take appropriate action on the information presented or provide the right level of guidance using analytics. We are entering the next wave of discussion on visual discovery, but so far much of this focus is just about using visualization on greater volumes and velocity of data, not making it more useful for the general population of business users. If we want to learn from the disappointing decades of business intelligence deployments, then we should find out what our business users really need to take action and make decisions on the information; delivering prettier charts won’t help. Until then, we are just perpetuating the past, and we know it has not had the best track record in advancing usefulness and adoption of business intelligence and dashboards.

I will follow up on this rant about the state of dashboards by writing about the lack of improvement in the types of metrics and indicators used in business analytics, which is another source of the problems that underlie our current methods of delivering and providing access to analytics through business intelligence. We all can do a much better job of meeting the needs of business and truly advancing the usefulness of a technology that still holds promise for significantly improving organizations’ effectiveness.
This blog originally appeared at Ventana Research.

Principles of Data Visualization

Eight Principles of Data Visualization

Imagine you are walking out of the office after a long day and your phone buzzes with a new email. Taking a quick glance, you see that it’s from Joe in operations: "Hey, wondering if you could run me a few numbers and put them in a nice chart to show how well our new store layouts are doing along with the latest sale promo we started last week. Need to put it into a presentation for the executive team next Monday. Thanks."

What does Joe really need? Where do you start? For anyone in a business environment who collects or manages some kind of raw data – tasks that are becoming more pervasive – the need to process that data into a human-usable form is increasingly common.

Visualizations, like the chart Joe asked for, are a great way to accomplish this, but they can be difficult to do properly, as anyone who has sat through a slide show presentation with an unreadable pie chart or vague growth projection graph can attest. As available data becomes more complex and extensive, weaving it into a visualization that invites engagement, understanding and decision-making is a bigger challenge, with a bigger opportunity for payoff.

Some of the traditional business standbys, like a one-off pie chart or simple line graph, even if done well, may not offer enough data to answer multi-faceted questions like Joe's. (See Figures 1 and 2, at left.) How can we take visualizations to the next level, so they can take on the challenge of today's business complexity?

Get the Fundamentals Right

The first step is to back up and focus on the basics. If you have ever played a team sport with a good coach, you may recall that he or she spent a lot of time working on fundamentals. Trick plays or advanced moves don’t win a game without solid fundamentals supporting them, and data visualization is no different. The most complex, data-rich graphic is useless unless it follows basic principles of good visualization:

1. Understand the problem domain. If you are producing visualization for your own use or that of your department, chances are good you already understand the area you will be working in. But if, as in our scenario with Joe, the visualization is for another department, or even an external stakeholder such as a customer or partner, you may need to ask questions and do more research to understand what is involved. In this case, you should investigate when these initiatives started, whether any others are in progress at the same time and what metrics the executive team will use to determine success.

2. Get sound data. This may seem obvious, but good data is at the heart of any effective visualization. Make sure the data you select is as accurate as possible, and that you have a sense of how it was gathered and what errors or inadequacies may exist. For example, maybe our store sales data for Joe is only current as of the last close of business, thanks to an older cash register system. Make sure you get relevant data and enough of it. We probably want not only sales data after these changes, but also the month or quarter before and even the same period in past years for comparison purposes. Above all, to create an effective visualization, you need to understand the meaning of the data you are working with. This can be a challenge if it has been stored as raw numbers. In this case, we may need to determine the store visitor counting method being used to know what those numeric tallies mean.

3. Show the data and show comparisons. Picking the best type of visualization is an art and science; however, the basic rule of thumb is to choose a spatial metaphor that will show your data and the relationships within it, with minimum distractions or effort on the part of the viewer. As Eddie Breidenbach explains, most graphic arrangements fall into one of four categories or metaphors (see Figure 3, at left):
  • Network - to show connections, sometimes in a radial layout.
  • Linear - to show how something varies over time or in relation to another factor, often on an X/Y space.
  • Hierarchical - to show groupings and importance; these can come in many different layouts.
  • Parallel - to show reach, frequency or shares of a whole; these can come in many different layouts.
For Joe's chart, we can start with a well-labeled, linear line graph since we want to see how sales have been affected since introducing these new initiatives. (See Figure 4, at left.)

4. Incorporate visual design principles. Using sound visual design elements, like line, form, shape, value and color, with principles like balance and variety, makes a visualization both more inviting and easier to read for trends and comparisons. (See Figure 5, at left.) This will become particularly important as we take our linear metaphor visualization to the next level.

Bring in More Dimensions

Once we have good data and a sound underlying spatial metaphor (in this case, a linear metaphor), it is time to take account of the complexity at play. Though it might seem like we have satisfied the initial question at face value (“Sales are up since changing the store layout and starting the new promo”), this answer is likely to spur more questions.

Based on our knowledge and research into the problem domain, we can come up with initial follow-up questions after looking at the simple linear metaphor visualization:
  1. We started both of these initiatives right before a holiday weekend. How do we know that this uptick in sales is not just a seasonal trend?
  2. Total sales are up, but has the new store layout succeeded in improving the performance of some departments that were struggling before?
  3. Are we succeeding in getting more customers into the store and not just selling more to existing ones?
  4. Are customers shopping more departments and buying a more diverse mix of items?
Asking these kinds of questions is a great exercise to begin taking a visualization to the next level, because they prompt us to add more dimensions that allow viewers to explore and understand the subject from additional angles and in more detail. There are a variety of solid techniques that can help achieve this additional dimensionality. The techniques below each address one of these questions:

5. Add small multiples. As described by author Edward Tufte, small repeated variations of a graphic side-by-side allow for quick visual comparison. Whenever possible, scales should be kept the same and the axis of comparison, aligned. Adding some small, stacked thumbnails of our chart next to the main one allows a comparison of sales trends for the same period last year, and the one before that. (See Figure 6, at left.) This answers our first question: sales do normally go up this time of year, but the increase seems to be quite a bit bigger this time, so it is probably not just the normal seasonal cycle.

6. Add layers. Adding extra levels of information, while preserving the high-level summary data, can make a graphic more flexible and useful. Next, we are going to break down the "top line" of total sales into departments. (See Figure 7, at left.) The resulting stacked area chart answers our second question, showing that sales from the appliances department have increased as a proportion of the whole, but media department sales have not improved much.

7. Add axes or coding patterns. Another way to get more dimensions in a graphic is to add additional patterns for coding information, such as varying the shape or color of points on a plot based on a variable. In some cases, an extra axis in space, alongside an existing one or in a new direction (for a 3D chart), can also be useful for showing new variables. It's important to be careful with this approach, as it can add clutter, but when used sparingly and with good design principles it can increase a graphic's usefulness. In Figure 8 (at left) we added an additional vertical axis on the right to show daily foot traffic into the store, with its scale overlaid carefully to be comparable but distinct. The answer to question number three: yes, we have increased foot traffic, but only after the sales promotion started.

8. Combine metaphors. So far, we have used a linear metaphor for our visualization. However, to answer our last question, we want to add a network metaphor to show connections between product categories in purchases. A pair of circular relationship (chord) diagrams showing snapshots at the beginning and end of the time period under consideration can help compare these connections. Like a pie chart, each product category is assigned a section of the circle, by percentage of total sales, but the center of the circle is hollow. If a majority of purchases containing items in one category also included items in a second category, a line is drawn to that second category; line width is based on the average proportion of both categories in the mixed purchases. As shown in Figure 9 (at left), the increase in these chord lines from the first to second diagram suggests there are indeed more purchases that cross departments since our initiatives went into place.
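
The data behind such a chord diagram is just pairwise co-purchase counts. One way to derive them, assuming a hypothetical order_lines table with order_id and category columns, is a self-join:

SELECT a.category AS category_1,
       b.category AS category_2,
       COUNT(DISTINCT a.order_id) AS shared_orders
FROM order_lines a
JOIN order_lines b
  ON a.order_id = b.order_id
 AND a.category < b.category    -- count each unordered pair once
GROUP BY a.category, b.category

Running this once per snapshot period gives the line weights for the before-and-after diagrams.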
This relationship data would be even better if we could see it at any chosen point in time (for example, to see what effect, if any, the layout change alone had, before the promotion started). A zoomed-in view of the chord diagrams for detailed study might be useful, too. Clearly, some presentation media lend themselves to these opportunities more than others. As our graphics increase in complexity and sophistication, we need to think more carefully about how to deliver them.

Consider New (and Old) Delivery Methods

The point of any visualization is to be viewed by the right people, in the right context. Unfortunately, many business visualizations have a fleeting life on a slide, up one minute on a low-resolution projector to be scanned from across the room, and nothing but a vague memory the next.
What if, instead of a “flash on a slide” with all of these limitations, Joe's final visualization was printed in high-resolution color on a handout? Everyone could refer back to it as a touchstone during the whole presentation, seeing how the data backs up Joe's conclusions. Afterward, they could tack it up on a whiteboard for further study and follow-up.
On the other hand, maybe Joe needs people at a remote site to see this graphic or he would just prefer not to kill so many trees. He might consider putting a high-resolution version on the Web (or corporate intranet) for viewing on a PC or tablet. This could be as simple as a static graphic like the paper copy, but it also opens all kinds of possibilities for interactivity. To give just a few examples, we could enable scrubbing through time (great for seeing more network metaphors), drilling down and zooming out for a bird's eye view, seeing new data live as it becomes available or even manipulating future variables to watch different scenarios play out.

For more ideas of what's possible, and a great tool for building these using HTML standards that will work on the boss’s iPad, the Data Driven Documents JavaScript library is a great place to start.

Toward the Future

As visualization moves toward delivery via electronic media, complex data visualization is increasingly blending into the discipline of user experience design and programming. Business analysts, IT staff and knowledge workers will need more skill in designing, building and using fluid, interactive, dynamic visualizations. Fortunately, there are great tools and great groups of people focused on user experience. The potential payoff for the investment is huge: visualizations invite us to explore, understand and decide, not as one-off disposable products, but rather as robust, enduring touchstones that customers and leaders return to for insight, conversation and connection.
Note: For more on visualization fundamentals, a good place to start is Edward Tufte's excellent series beginning with “The Visual Display of Quantitative Information.” Also see “Visual Design Fundamentals: A Digital Approach” by Alan Hashimoto.

Ryan Bell is a user interface developer for EffectiveUI, where he gets to employ his passion for building great user experiences and indulge his inner information-design enthusiast.

Principles for Enterprise Data Warehouse Design

Seven Principles for Enterprise Data Warehouse Design

This month, I'd like to narrow the focus to one particular aspect of the enterprise information management spectrum: data warehouse (DW) design.
Contrary to popular sentiment, data warehousing is not a moribund technology; it's alive and kicking. Indeed, most companies deploy data warehousing technology to some extent, and many have an enterprise-wide DW.
However, as with any technology, a DW can quickly become a quagmire if it's not designed, implemented and maintained properly. With this in mind, I'd like to discuss seven principles that I believe will help you start - and keep - your DW design and implementation on the road to achieving your desired results (see Figure 1). I'm including both business and IT principles because most IT issues really involve business and IT equally.

Business Principles 
Organizational Consensus
From the outset of the data warehousing effort, there should be a consensus-building process that helps guide the planning, design and implementation process. If your knowledge workers and managers see the DW as an unnecessary intrusion - or worse, a threatening intrusion - into their jobs, they won't like it and won't use it.
Make every effort to gain acceptance for, and minimize resistance to, the DW. If you involve the stakeholders early in the process, they're much more likely to embrace the DW, use it and, hopefully, champion it to the rest of the company.
Data Integrity
The brass ring of data warehousing - of any business intelligence (BI) project - is a single version of the truth about organizational data. The path to this brass ring begins with achieving data integrity in your DW.
Therefore, any design for your DW should begin by minimizing the chances for data replication and inconsistency. It should also promote data integration and standardization. Any reasonable methodology you choose to achieve data integrity should work, as long as you implement the methodology effectively with the end result in mind.
Implementation Efficiency
To help meet the needs of your company as early as possible and minimize project costs, the DW design should be straightforward and efficient to implement.  This is truly a fundamental design issue. You can design a technically elegant DW, but if that design is difficult to understand or implement or doesn't meet user needs, your DW project will be mired in difficulty and cost overruns almost from the start.
Opt for simplicity in your design plans and choose (to the most practical extent) function over beautiful form. This choice will help you stay within budgetary constraints, and it will go a long way toward meeting user needs effectively.
User Friendliness
User friendliness and ease of use issues, though they are addressed by the technical people, are really business issues. Why? Because, again, if the end business users don't like the DW or if they find it difficult to use, they won't use it, and all your work will be for naught.
To help achieve a user-friendly design, the DW should leverage a common front-end across the company - based on user roles and security levels, of course. It should also be intuitive enough to have a minimal learning curve for most users.  Of course, there will be exceptions, but your rule of thumb should be that even the least technical users will find the interface reasonably intuitive.
Operational Efficiency
This principle is really a corollary to the principle of implementation efficiency.  Once implemented, the data warehouse should be easy to support and facilitate rapid responses to business change requests. Errors and exceptions should also be easy to remedy, and support costs should be moderate over the life of the DW. 
The reason I say that this principle is a corollary to the implementation efficiency principle is that operational efficiency can be achieved only with a DW design that is easy to implement and maintain. Again, a technically elegant solution might be beautiful, but a practical, easy-to-maintain solution can yield better results in the long run.
IT Principles 
Scalability
Scalability is often a big problem with DW design. The solution is to build in scalability from the start. Choose toolsets and platforms that support future expansions of data volumes and types as well as changing business requirements.  It's also a good idea to look at toolsets and platforms that support integration of, and reporting on, unstructured content and document repositories.
Compliance with IT Standards
Perhaps the most important IT principle to keep in mind is to not reinvent the wheel when you build your DW. That is, the toolsets and platforms you choose to implement your DW should conform to and leverage existing IT standards.
You also want, as much as possible, to leverage existing skill sets of IT and business users. In a way, this is a corollary of the user friendliness principle. The more your users know going in, the easier they'll find the DW to use once they see it.


Following these principles won't guarantee you will always achieve your desired results in designing and implementing your DW. Beware of any vendors that tell you it's a slam-dunk if you follow their methodology. There will almost always be problems that seem intractable at first - and may eventually prove to be so. Nevertheless, if you build your DW following these seven principles, you should be in a better position to recognize and address potential problems before they turn into project killers.

Rich Cohen is a principal in Deloitte Consulting LLP's Information Dynamics practice where he is responsible for the strategy, development and implementation of data governance, data warehousing, decision support and data mining engagements to support the emergence of world-class business intelligence applications. Cohen has more than 27 years of experience in the design, development, implementation and support of information technology in a variety of industries. Over the last 18 years, he has had extensive experience in the creation of technology strategies, implementations and deployment of CRM and business intelligence solutions to drive improved business performance.

Monday, August 20, 2012

Informatica - Power Center - Transformations

Transformations



Power Center Transformations (partial list)

Source Qualifier: reads data from flat file and relational sources
Expression: performs row-level calculations
Filter: drops rows conditionally
Sorter: sorts data
Aggregator: performs aggregate calculations
Joiner: joins heterogeneous sources
Lookup: looks up values and passes them to other objects
Update Strategy: tags rows for insert, update, delete, reject
Router: routes rows conditionally (contrasted with Filter in the sketch after this list)
Transaction Control: allows data-driven commits and rollbacks
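
To make the row-level behavior concrete, here is a minimal plain-Java sketch contrasting the Filter and Router transformations. This is not the PowerCenter API; the amounts and thresholds are invented for illustration.

import java.util.*;

// Plain-Java illustration (not the PowerCenter API): a Filter drops
// non-matching rows, while a Router sends every row to one of several
// conditional output groups.
public class FilterVsRouter {
    public static void main(String[] args) {
        List<Integer> amounts = List.of(25, 75, 150, 5);

        // Filter: rows that fail the condition are simply discarded.
        List<Integer> filtered = amounts.stream().filter(a -> a >= 50).toList();

        // Router: every row is directed to a named group by condition.
        Map<String, List<Integer>> routed = new LinkedHashMap<>();
        for (int a : amounts) {
            String group = a >= 100 ? "LARGE" : (a >= 50 ? "MEDIUM" : "SMALL");
            routed.computeIfAbsent(group, g -> new ArrayList<>()).add(a);
        }

        System.out.println("Filter kept:   " + filtered);  // [75, 150]
        System.out.println("Router groups: " + routed);    // {SMALL=[25, 5], MEDIUM=[75], LARGE=[150]}
    }
}

The point of the contrast: a Filter loses the rejected rows entirely, while a Router preserves every row in some output group.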

Advanced Power Center Transformations

Union: merges rows from two or more data streams, similar to a SQL UNION ALL
Java: allows Java syntax to be used within Power Center
Midstream XML Parser: reads XML from anywhere in a mapping
Midstream XML Generator: writes XML anywhere in a mapping

More Source Qualifiers: read from XML, message queues
and applications

Mapplet - a set of transformations that can be reused in multiple mappings


Example: Data Sources Defined Outside Mapplet


Recap

1. ETL: Extract, transform and load data
2. Designer: Create mapping objects
3. Mapping: Logically defines the ETL process
4. Transformation: Generates or manipulates data
5. Mapplet: Set of transformations that can be reused in multiple mappings

Informatica - Power Center Basic Concepts

Power Center Introduction
  • Is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover, and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed
  • An ETL Tool (Extract, Transform and Load)
Power Center Client Applications


Designer Tools – Create mappings



Mapping



A mapping is a set of source and target definitions linked by transformation
objects that define the rules for data transformation. Mappings represent the
data flow between sources and targets. When the Integration Service runs a
session, it uses the instructions configured in the mapping to read,
transform, and write data.
• Every mapping must contain the following components:
  • Source definition. Describes the characteristics of a source table or file.
  • Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
  • Target definition. Defines the target table or file.
  • Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.
• A mapping can also contain one or more mapplets. A mapplet is a set of transformations that you can reuse in multiple mappings.

Example

Give me an Excel file with Total Order Amount per Customer. I also need to know when this data was extracted (date) and the customer type initial (first letter of the customer type).
• Define the sources:
  • Orders
  • Customers
• Define any required transformations:
  • Sum of order amount per customer
  • Get the extraction date
  • Get the first letter of the customer type
• Create the file (a minimal sketch of this logic follows)
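
In practice this would be built as a mapping in the Designer, but a hypothetical plain-Java sketch of the same logic may help; the record fields, sample data, and CSV layout (which Excel can open) are assumptions for illustration.

import java.io.PrintWriter;
import java.time.LocalDate;
import java.util.*;

// Hypothetical plain-Java equivalent of the mapping's logic: an Aggregator
// (sum per customer), Expressions (extraction date, first letter of the
// customer type), and a flat-file target.
public class TotalOrdersPerCustomer {
    record Order(int customerId, double amount) {}
    record Customer(int id, String name, String type) {}

    public static void main(String[] args) throws Exception {
        List<Customer> customers = List.of(
                new Customer(1, "Acme", "Retail"),
                new Customer(2, "Globex", "Wholesale"));
        List<Order> orders = List.of(
                new Order(1, 100.0), new Order(1, 50.0), new Order(2, 200.0));

        // Aggregator: total order amount per customer.
        Map<Integer, Double> totals = new HashMap<>();
        for (Order o : orders)
            totals.merge(o.customerId(), o.amount(), Double::sum);

        // Expression: the extraction date, captured once per run.
        LocalDate extracted = LocalDate.now();

        // Target: a CSV file that Excel can open.
        try (PrintWriter out = new PrintWriter("order_totals.csv")) {
            out.println("customer,total_amount,extract_date,type_initial");
            for (Customer c : customers) {
                double total = totals.getOrDefault(c.id(), 0.0);
                char initial = c.type().charAt(0);  // Expression: first letter of type
                out.printf("%s,%.2f,%s,%c%n", c.name(), total, extracted, initial);
            }
        }
    }
}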

Saturday, July 21, 2012

Migrating to an InfoSphere Warehouse instance that is installed on a different computer

You can migrate from Data Warehouse Edition V9.1.x to InfoSphere Warehouse when these products are installed on two separate computers. Assume that you want to migrate a Data Warehouse Edition V9.1.x instance that is installed on computer A to an InfoSphere Warehouse instance that is installed on computer B.
Before you begin
  1. Back up the metadata and scheduler databases on computer A.
  2. Copy all of the data warehouse projects that you want to migrate to computer B.
  3. Copy all of the deployed data warehousing applications from computer A to the same location on computer B.
    To deploy data warehousing applications, three directories are used:
    • Application home directory
    • Log directory
    • Working directory
    All three directories must be created at exactly the same location on computer B. For example, if an application is deployed on computer A at C:\application_dir\application_1, then that directory is its application home directory. Assume that C:\log\ is the log directory and C:\temp\working\ is the working directory. You must create these three directories on computer B and copy all of the files in them from computer A; otherwise, the migrated application will not work. Note that each data warehouse application can have its own three directories, so if you have 100 applications, there can be 300 different directories that you must copy to computer B. If some directories are not copied, you might see an error message in the migration log indicating that certain files are missing. A rough pre-flight check for this appears below.
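
Because missing directories are an easy way for an application migration to fail, a hypothetical plain-Java sketch like the following can confirm that the directories exist on computer B before you run the migration. The paths are the example locations from the note above, not product-mandated paths.

import java.nio.file.*;
import java.util.List;

// Hypothetical pre-migration check: confirm that each deployed application's
// three directories (copied from computer A) exist on computer B.
public class CheckMigrationDirs {
    public static void main(String[] args) {
        List<String> required = List.of(
                "C:\\application_dir\\application_1",  // application home directory
                "C:\\log",                             // log directory
                "C:\\temp\\working");                  // working directory

        for (String dir : required) {
            if (Files.isDirectory(Paths.get(dir)))
                System.out.println("OK      " + dir);
            else
                System.out.println("MISSING " + dir + " (the migrated application will not work)");
        }
    }
}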
Procedure
To complete the migration from computer A to computer B:
  1. Restore the metadata and scheduler databases on computer B.
  2. Migrate the WebSphere® application profile. For detailed help on migrating the WebSphere application profile that is located on a separate computer, see the WebSphere Information Center.
  3. Run the InfoSphere Warehouse Configuration Tool on computer B.
  4. Specify the migration settings in the migration.properties file.
  5. Run the migration script on computer B.

Post-migration tasks

After migrating the data warehouse projects with the InfoSphere Warehouse Migration wizard, you must perform some post-migration tasks to ensure that you do not encounter errors when using the migrated data warehouse projects in the Design Studio. You must also specify the installation location of InfoSphere Warehouse in the config.properties file before you start using the Administration Console.
Before you begin
  • Migrate the data warehouse projects from Data Warehouse Edition V9.1.x to InfoSphere Warehouse.
Procedure
Perform the following post-migration tasks:
  1. Switch to a new workspace for working with the migrated projects in the Design Studio, Version 9.5.1.
  2. Import the migrated projects into this new workspace:
    1. Click File > Import. The Import window opens.
    2. Click General > Existing Projects into Workspace and then click Next.
    3. Browse to the directory that contains the migrated projects.
    4. Select the projects that you want to import into the workspace and click Finish. The imported projects are displayed in the Data Project Explorer.
    Recommended: After migrating the old Data Warehouse Edition V9.1.x projects to InfoSphere Warehouse, import the migrated projects into a new workspace before you start using them in InfoSphere Warehouse. If you import the migrated projects into an old workspace that was used in Data Warehouse Edition V9.1.x, you might not see some of the views in the BI perspective or the new menu shortcuts (for example, Workload Management, Text Analysis, and so on). To continue using the old workspace, you must perform one of the following two steps:
    • Reset the BI perspective by selecting Window > Reset Perspective.
    • Select Window > Customize Perspective and select the shortcuts that you would like to see in the various menus of the BI perspective.
  3. Re-create the database connections in the Database Explorer view. The migration wizard does not migrate the workspace preferences and the database connections to the destination directory. Only the data warehouse projects that are contained in the source directory are migrated. Therefore, you must re-create the database connections in the Database Explorer view.
  4. If you migrated mining flows that are contained in your data warehouse projects, then you must restore the links to the database connections that are used by the mining flows. Perform one of the following two steps:
    • In the Data Project Explorer, right-click the Mining Flows folder in your data warehouse project. Select Set Online Database and then select the mining flows for which you want to restore the database links. Design Studio sets the online database to the same value as the SQL execution database for each of the selected mining flows.
    • Alternatively, you can right-click the name of a mining flow and select the Set Database option. Then, select the name of the database connection and click OK. This method is less convenient for restoring database links, particularly when you have to restore links to several database connections.
  5. Before you start using the InfoSphere Warehouse Administration Console, perform these steps:
    1. Open the config.properties file that is located in the InfoSphere Warehouse installation directory (product_installation_directory\DWEAdmin\lib\custom).
    2. Add the keyword dwe.installLocation and specify the complete path to the product installation directory.
      For example, dwe.installLocation = C\:Program Files\IBM\dwe95
      Remember: The keyword dwe.installLocation is case sensitive. (A small verification sketch follows this procedure.)
    3. Restart the WebSphere® Application Server.
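
Before restarting the server, you can sanity-check the edit. The hypothetical snippet below reads the file with java.util.Properties and verifies that the case-sensitive key is present; the installation path is assumed, and the backslash-escaping caveat applies only if the Administration Console parses the file with standard Java properties semantics.

import java.io.FileInputStream;
import java.util.Properties;

// Hypothetical check that config.properties contains the case-sensitive
// dwe.installLocation key. Note: java.util.Properties treats backslash as an
// escape character, so literal backslashes in a Windows path normally need
// to be doubled (or forward slashes used) when the file is read this way.
public class CheckConfigProperties {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed location; substitute your product_installation_directory.
        try (FileInputStream in = new FileInputStream(
                "C:\\Program Files\\IBM\\dwe95\\DWEAdmin\\lib\\custom\\config.properties")) {
            props.load(in);
        }
        String loc = props.getProperty("dwe.installLocation");  // exact case required
        System.out.println(loc != null
                ? "dwe.installLocation = " + loc
                : "dwe.installLocation is missing (check the key's case)");
    }
}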

Migrating data warehouse projects and OLAP metadata

You can migrate data warehouse projects and OLAP metadata from within the Design Studio by using the InfoSphere Warehouse Migration wizard.
Before you begin
Ensure that:
  • You installed the Design Studio, migration tool plug-ins, and the SQL Warehousing Tool plug-ins of InfoSphere Warehouse.
  • The common repository database is created on the DB2® V9.5 database server.
  • The projects to be migrated are from DB2 Data Warehouse Edition V9.1.x.
About this task
You can migrate the following components by using the InfoSphere Warehouse Migration wizard:
  • Data warehouse projects (SQW and mining flows)
  • OLAP metadata in databases
You can migrate one component at a time or both of the components together. In addition, you can run the wizard multiple times on different data warehouse projects. For example, if you have 100 data warehouse projects to migrate, you can choose to migrate 50 projects in the first run, and then run the migration wizard again to migrate the remaining 50 projects. You don't need to migrate all of the projects at the same time.
Remember: Do not run the Migration wizard a second time to migrate OLAP metadata that was migrated successfully in the first run; you will receive an error if you do. However, if the first run fails, the Migration wizard rolls back the migration process and you can run the wizard again.
Procedure
To migrate data warehouse projects and OLAP metadata:
  1. Start the InfoSphere Warehouse Design Studio by using a new workspace. You will use this workspace for migration purposes only.
  2. In the Database Explorer view, create a connection to:
    • The database from which you will migrate the OLAP metadata.
    • The databases that are used in the data warehouse projects.
  3. From the main menu, click Data Warehousing > InfoSphere Warehouse Migration. The Migration wizard opens.
  4. In the Component Selection page, perform these steps:
    1. Select the components to migrate.
    2. Specify the location to save the migration log. By default, a MigrationLog.txt file is created in your current workspace directory.
    3. Click Next.
  5. To migrate data warehouse projects:
    1. In the Project Selection page, click Add. The Project Selection window opens.
    2. In the Source directory field, click the ellipsis (...) button and browse to the directory that contains the data warehouse projects to be migrated. This can be any directory where the projects are stored. It need not be a genuine Eclipse workspace directory. Ensure that you have read permission on this directory. The list of projects that are stored in the selected directory is displayed in the Select Projects area.
      Note: The directories must be real local file system directories. If the projects are stored in a version control system, such as CVS, ClearCase®, and so on, you must first check out the files and copy them to the local file system. Then, after running the Migration wizard, check the migrated files back into the version control system, if needed.
    3. In the Select projects area, select one or more data warehouse projects to migrate. By default, the wizard selects all of the projects in the selected directory for migration.
    4. In the Destination directory field, click the ellipsis (...) button and browse to a location on your local computer where you want to store the migrated project. By default, the destination directory is the same as the source directory but with a _95 suffix added to it. Ensure that you have write permission on this directory. For example, if the name of the source directory is v912DWHProjects, then the default name of the destination directory is v912DWHProjects_95.
      Note: If the name of the destination directory is the same as the source directory, then the old data warehouse projects are overwritten, and you can immediately use the migrated projects without importing the migrated projects into the workspace.
    5. Click OK. The source and destination directories are displayed on the Project Selection page.
  6. To migrate OLAP metadata:
    1. Select the databases that are to be used for migration.
    2. Optional: If you did not create a connection for the databases, click New Connection and enter the database details to create a connection. Click Next.
    3. Specify the connection information for the common repository database that you created during the InfoSphere Warehouse installation. Test the repository connection.
  7. Click Finish. Based on your selection for migration, the data warehouse projects or OLAP metadata is migrated. The data warehouse projects are migrated to the destination directory, and the OLAP metadata is migrated to the common repository database. A Migration Log window opens and displays the migration summary.
After a successful migration, the old V9.1.x data warehouse projects are not modified, unless you selected a destination directory that is the same as the source directory. The existing OLAP metadata is dropped from the database.