White Paper Strategic Approach to Complex ETL Testing PDF

Title White Paper Strategic Approach to Complex ETL Testing
Author N K Pandey
Course Fundamentals of Engineering
Institution Chhattisgarh Swami Vivekanand Technical University
Pages 5
File Size 468.9 KB
File Type PDF
Total Downloads 37
Total Views 142

Summary

This is a white paper on strategic approach to complete ETL testing....


Description

A Strategic Approach to Complex ETL Testing Jeffrey R. Bocarsly, Ph.D Vice President and Chief QuerySurge Architect

Connect:

©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017

Data Warehouse Testing A data warehouse is a repository of transactional data that has been extracted from original electronic sources and transformed so that query, analysis and reporting on trends within historic data are possible and efficient. The analyses provided by data warehouses may involve strategic planning, decision support, and monitoring the outcomes of a chosen strategy. Typically, data that is loaded into a data warehouse is derived from diverse sources of operational data, which may consist of data from databases, feeds, application files or flat files. The data must be extracted from these diverse sources, transformed to a common format, and loaded into the data warehouse. Typically, it is further aggregated into a data mart for efficient reporting. The ETL (Extract, transform and load) process is a critical step in any data warehouse implementation, and continues to be an area of major significance whenever the ETL code is updated. Once the data warehouse and data marts are populated, business intelligence applications facilitate querying, analysis and reporting. The business intelligence tools may provide simple presentations of data based on queries, or may support sophisticated statistical analysis options. Data warehouses may have multiple front-end applications, depending on the needs of the user community. Figure 1 shows a simplified data warehouse scheme.

An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:

1.

The ETL layer

2.

The data warehouse itself

3.

Associated data marts

4.

The front-end business intelligence/reporti ng applications

Each of these units must be treated separately and in combination, and since there may be multiple components in each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, multiple data marts, and multiple front-end applications), each of these subsystems must be individually validated.

One issue specific to data warehousing in the Pharmaceutical industry is the variety and types of data sources. In addition to application databases with varied data models, data sources may utilize Pharma-specific information exchange formats (e.g., HL7, CDISC). Organizations may have multiple proprietary internal data formats, which may have been acquired in the process of industry consolidation. Thus, pharmaceutical data warehouses may have to support a broader range of input formats to their ETL layers than data warehouses in other industries.

©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017

Data Verification The recommended pre-deployment strategy is to build test automation (both functional and performance) for every test entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal of the strategy is to provide automated tools for rapid localization of issues between test entry points (see Figure 1).

HL7 CDISC

ETL ETL

XML

ETL

ETL

File

Data

Data

ETL ETL BI Tools

F i gur e 1 . A simplified Data Warehouse scheme

In Figure 1, data from a variety of sources is transformed by the ETL into the Data Warehouse. A second ETL “leg” aggregates data from the Data Warehouse into Datamart tables for efficient reporting. Front-end applications and Business Intelligence applications access the Datamart in order to provide historical and statistical analyses of company data.

Data Comparison

Data Comparison

HL7 CDISC XML File

ETL ETL

ETL

ETL

Data

Data

ETL BI Tools

ETL

Data Comparison Data Comparison

F i gur e 2 . Data comparison points in the Data Warehouse scheme

As suggested by Figure 1, there are four major points of data comparison in the scheme. These are indicated in Figure 2 and are found between the:

• • • •

Source(s) and Data Warehouse Data Warehouse and Datamart Datamart and BI Tool Source(s) and Datamart

©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017

In automated Data Warehouse testing, the emphasis is the validation of data integrity between all points of comparison to ensure the proper implementation of the ETL mappings and transformations across the architecture. Using test automation, data can be tracked from the source layer, through the ETL processing and through the Data Warehouse and Datamart components, to the front-end applications. If corrupt data is found in a front-end application, the execution of automated tests can quickly determine whether the problem is located in the data source, an ETL process, in a data warehouse database, a datamart or in the frontend/Business Intelligence tool. Rapid determination of the application tier in which an issue is located can dramatically lower turnaround times for remediation, as well as enhance quality.

Performance and Scalability Performance

HL7 CDISC

Performance ETL ETL

XML

ETL

ETL

File

Data

Data

ETL ETL BI Tools Performance

F i gur e 3 . RTTS’ Data Warehouse Post-Deployment Testing Strategy

On the performance side, the test entry points analogous to the data comparison points are used, to focus on characterizing subsystem response under load. These are indicated in Figure 3. Using industry standard performance tools, the performance of each of the components in the ETL processes can be evaluated. Often, a major focus of performance and scalability testing is the Business Intelligence tool client interfaces because these are the user-facing components of the Data Warehouse architecture. Perceived poor performance by the user community is frequently a significant concern, and performance measurements across a series of releases can often reduce project tensions around this issue.

Overall Strategy The emphasis on rapid localization of either data or performance problems in complex data warehouse architectures provides a key tool for:



Promoting efficiencies in the development cycle



Shortening build cycles and meeting release targets



Delivering high quality Data Warehouse architectures.

©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017

Bibliography 1.

Inmon, W. H., Building the Data Warehouse, John Wiley & Sons; 3rd edition, 2002

2.

Kimball, R.; Reeves, L.; Ross, M.; Thornthwaite, W., The Data Warehouse Lifecycle Toolkit : Expert Methods for Designing, Developing, and Deploying Data Warehouses, John Wiley & Sons, 1998

About the Author Jeffrey R. Bocarsly, Ph.D.,

a Vice President and functional testing division manager with RTTS, has

implemented many automated software testing projects at Fortune 500 firms. His experience includes projects in various sectors including brokerage, utility, media, pharmaceutical, so ftware, banking, and insurance. Jeff is a co-author of “Software Test Engineering with IBM Rational Functional Tester: The Definitive Resource,” IBM Press, 2010.

He holds a BS (UCLA) and Masters and PhD degrees (Columbia University).

About RTTS RTTS offers the most comprehensive suite of quality assurance services to help organizations

drive

positive

results

from

their

critical

software

projects.

Headquartered in New York, NY, our expert team has worked closely with over 400 clients

to

improve

their

testing

processes,

tool

knowledge,

and

application

deployment outcomes. RTTS was founded in 1996, and has forged partnerships with

the

world’s

leading

test

tool

vendors.

Our

satellite

locations

are

in

Philadelphia, Atlanta, and Phoenix, and many of our consulting and education services are offered through the cloud. No matter where you are, RTTS will ensure application

functionality,

performance,

scalability,

and

security

for

your

organization.

About QuerySurge™ RTTS’ team of test experts developed QuerySurge™ to address the unique testing needs

in

ranging

the

data

from

data

warehousing warehousing

space. It and

ETL

has

been

processes

implemented to

data

on

projects

migrations

and

database upgrades. QuerySurge™ can verify as much as 100% of all data from source systems, through the ETL process, to the target data warehouse. The tool has increased test coverage and reduced test cycle time for several organizations, helping them to mitigate risk and meet business requirements.

QuerySurge™ is offered exclusively by RTTS, as are any accompanying Data Warehouse Testing services. If you are interested in learning more, or to schedule a private demonstration, please contact us by e-mail here: [email protected]. If you would like to speak with our Sales team, please call (212) 240-9050.

©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017...


Similar Free PDFs