Title | White Paper Strategic Approach to Complex ETL Testing |
---|---|
Author | N K Pandey |
Course | Fundamentals of Engineering |
Institution | Chhattisgarh Swami Vivekanand Technical University |
Pages | 5 |
File Size | 468.9 KB |
File Type | |
Total Downloads | 37 |
Total Views | 142 |
This is a white paper on strategic approach to complete ETL testing....
A Strategic Approach to Complex ETL Testing Jeffrey R. Bocarsly, Ph.D Vice President and Chief QuerySurge Architect
Connect:
©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017
Data Warehouse Testing A data warehouse is a repository of transactional data that has been extracted from original electronic sources and transformed so that query, analysis and reporting on trends within historic data are possible and efficient. The analyses provided by data warehouses may involve strategic planning, decision support, and monitoring the outcomes of a chosen strategy. Typically, data that is loaded into a data warehouse is derived from diverse sources of operational data, which may consist of data from databases, feeds, application files or flat files. The data must be extracted from these diverse sources, transformed to a common format, and loaded into the data warehouse. Typically, it is further aggregated into a data mart for efficient reporting. The ETL (Extract, transform and load) process is a critical step in any data warehouse implementation, and continues to be an area of major significance whenever the ETL code is updated. Once the data warehouse and data marts are populated, business intelligence applications facilitate querying, analysis and reporting. The business intelligence tools may provide simple presentations of data based on queries, or may support sophisticated statistical analysis options. Data warehouses may have multiple front-end applications, depending on the needs of the user community. Figure 1 shows a simplified data warehouse scheme.
An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:
1.
The ETL layer
2.
The data warehouse itself
3.
Associated data marts
4.
The front-end business intelligence/reporti ng applications
Each of these units must be treated separately and in combination, and since there may be multiple components in each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, multiple data marts, and multiple front-end applications), each of these subsystems must be individually validated.
One issue specific to data warehousing in the Pharmaceutical industry is the variety and types of data sources. In addition to application databases with varied data models, data sources may utilize Pharma-specific information exchange formats (e.g., HL7, CDISC). Organizations may have multiple proprietary internal data formats, which may have been acquired in the process of industry consolidation. Thus, pharmaceutical data warehouses may have to support a broader range of input formats to their ETL layers than data warehouses in other industries.
©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017
Data Verification The recommended pre-deployment strategy is to build test automation (both functional and performance) for every test entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal of the strategy is to provide automated tools for rapid localization of issues between test entry points (see Figure 1).
HL7 CDISC
ETL ETL
XML
ETL
ETL
File
Data
Data
ETL ETL BI Tools
F i gur e 1 . A simplified Data Warehouse scheme
In Figure 1, data from a variety of sources is transformed by the ETL into the Data Warehouse. A second ETL “leg” aggregates data from the Data Warehouse into Datamart tables for efficient reporting. Front-end applications and Business Intelligence applications access the Datamart in order to provide historical and statistical analyses of company data.
Data Comparison
Data Comparison
HL7 CDISC XML File
ETL ETL
ETL
ETL
Data
Data
ETL BI Tools
ETL
Data Comparison Data Comparison
F i gur e 2 . Data comparison points in the Data Warehouse scheme
As suggested by Figure 1, there are four major points of data comparison in the scheme. These are indicated in Figure 2 and are found between the:
• • • •
Source(s) and Data Warehouse Data Warehouse and Datamart Datamart and BI Tool Source(s) and Datamart
©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017
In automated Data Warehouse testing, the emphasis is the validation of data integrity between all points of comparison to ensure the proper implementation of the ETL mappings and transformations across the architecture. Using test automation, data can be tracked from the source layer, through the ETL processing and through the Data Warehouse and Datamart components, to the front-end applications. If corrupt data is found in a front-end application, the execution of automated tests can quickly determine whether the problem is located in the data source, an ETL process, in a data warehouse database, a datamart or in the frontend/Business Intelligence tool. Rapid determination of the application tier in which an issue is located can dramatically lower turnaround times for remediation, as well as enhance quality.
Performance and Scalability Performance
HL7 CDISC
Performance ETL ETL
XML
ETL
ETL
File
Data
Data
ETL ETL BI Tools Performance
F i gur e 3 . RTTS’ Data Warehouse Post-Deployment Testing Strategy
On the performance side, the test entry points analogous to the data comparison points are used, to focus on characterizing subsystem response under load. These are indicated in Figure 3. Using industry standard performance tools, the performance of each of the components in the ETL processes can be evaluated. Often, a major focus of performance and scalability testing is the Business Intelligence tool client interfaces because these are the user-facing components of the Data Warehouse architecture. Perceived poor performance by the user community is frequently a significant concern, and performance measurements across a series of releases can often reduce project tensions around this issue.
Overall Strategy The emphasis on rapid localization of either data or performance problems in complex data warehouse architectures provides a key tool for:
•
Promoting efficiencies in the development cycle
•
Shortening build cycles and meeting release targets
•
Delivering high quality Data Warehouse architectures.
©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017
Bibliography 1.
Inmon, W. H., Building the Data Warehouse, John Wiley & Sons; 3rd edition, 2002
2.
Kimball, R.; Reeves, L.; Ross, M.; Thornthwaite, W., The Data Warehouse Lifecycle Toolkit : Expert Methods for Designing, Developing, and Deploying Data Warehouses, John Wiley & Sons, 1998
About the Author Jeffrey R. Bocarsly, Ph.D.,
a Vice President and functional testing division manager with RTTS, has
implemented many automated software testing projects at Fortune 500 firms. His experience includes projects in various sectors including brokerage, utility, media, pharmaceutical, so ftware, banking, and insurance. Jeff is a co-author of “Software Test Engineering with IBM Rational Functional Tester: The Definitive Resource,” IBM Press, 2010.
He holds a BS (UCLA) and Masters and PhD degrees (Columbia University).
About RTTS RTTS offers the most comprehensive suite of quality assurance services to help organizations
drive
positive
results
from
their
critical
software
projects.
Headquartered in New York, NY, our expert team has worked closely with over 400 clients
to
improve
their
testing
processes,
tool
knowledge,
and
application
deployment outcomes. RTTS was founded in 1996, and has forged partnerships with
the
world’s
leading
test
tool
vendors.
Our
satellite
locations
are
in
Philadelphia, Atlanta, and Phoenix, and many of our consulting and education services are offered through the cloud. No matter where you are, RTTS will ensure application
functionality,
performance,
scalability,
and
security
for
your
organization.
About QuerySurge™ RTTS’ team of test experts developed QuerySurge™ to address the unique testing needs
in
ranging
the
data
from
data
warehousing warehousing
space. It and
ETL
has
been
processes
implemented to
data
on
projects
migrations
and
database upgrades. QuerySurge™ can verify as much as 100% of all data from source systems, through the ETL process, to the target data warehouse. The tool has increased test coverage and reduced test cycle time for several organizations, helping them to mitigate risk and meet business requirements.
QuerySurge™ is offered exclusively by RTTS, as are any accompanying Data Warehouse Testing services. If you are interested in learning more, or to schedule a private demonstration, please contact us by e-mail here: [email protected]. If you would like to speak with our Sales team, please call (212) 240-9050.
©2014 Real-Time Technology Solutions (212) 240-9050 • www.rtts.com • [email protected] 360 Lexington Ave. Fl 9, New York, NY 10017...