The Complete Project Lifecycle for Decision Support Applications
Step 9. ETL Design
9.1 Create the source to target mapping document
Source data for the BI applications will come from a variety of platforms, which are managed by a variety of operating systems and applications. The purpose of the ETL process is to merge data from these heterogeneous platforms into a standard format for the BI target databases in the BI decision support environment. When building this environment, it is critical to perform the common data transformations for all BI target databases only once and to reconcile these data transformations back to the operational source files and source databases. It is related to validity of the data in the various BI target databases and related to the data consistency among the various BI target databases. The ETL process begins with preparations for reformatting, reconciling and cleansing the source data
One of the stage of ETL design especially ninth stage is create the source to target mapping document. Data mapping is the process of creating a link between two distinct data models (named source and target) tables and attributes. A source to target mapping document is the result of data mapping process. Target mapping document using the source data analysis results and the business rules from the previous steps and incorporate them into the transformation specifications. The most important information contained in such document are the relationships between source datastores fields and target datastore fields.
Before the ETL process flow can be designed, the detailed ETL transformation specifications for data extraction, transformation, and reconciliation have to be developed, given that they will dictate the process flow. A common way to document the ETL transformation specifications is in a source to target mapping document, which can be a matrix or a spreadsheet.
The source to target mapping document should list all BI tables and columns and their data type and lengths. It should map the applicable source data elements to the columns, along with their data types and lengths, and it should show the source files and source databases from which the source data elements are being extracted. The document should specify the transformation logic for each column. This document can then be used to create the actual programming specifications for the ETL developers or to create instructions (technical meta data) for the ETL tool.
9.2 Test the ETL tool functions
When using an ETL tool, the transformation specifications get translated into instructions for the ETL tool. These instructions can then be stored as technical meta data in the meta data repository. Expanding the ETL process and running regression tests are made easier with the tool because there is less human intervention and therefore fewer chances to introduce errors.
It is very important to test the ETL tool functions before designing the ETL process flow and before deciding how to set up the staging area. For example, it would be worthless to install a currently popular ETL tool that cannot read flat files on a mainframe if 90 percent of your source data is in flat files on your mainframe. Therefore, test the ETL tool functions and determine whether supplemental code must be written to perform some complicated and lengthy transformations that the tool cannot handle
When evaluating ETL products, follow these steps:
- Perform a cost benefit analysis to compare licensing (buying) an ETL product with building the ETL process in house
- Compile a list of ETL products and vendors that are likely to meet business requirements
- Compare the ETL products and vendors for data transformation requirements
- Compare the ETL products with another products
- Evaluate each ETL product objectively
- Choose the one products and test the tools