Key considerations when deciding on data virtualization software

Title	Final PDF Key considerations when deciding on data virtualization software
Author	UNE ELLE
Course	Big Data, traitement des données massives
Institution	Université Paris 1 Panthéon-Sorbonne
Pages	15
File Size	473.3 KB
File Type	PDF


Description

TIBCO whitepaper

Contents
Introduction
Challenges, Benefits, Drawbacks
    Challenges Addressed by Data Virtualization
    Benefits of Data Virtualization
    Drawbacks of Data Virtualization
When to Use Data Virtualization
    Analytics Use Cases
    Operational Use Cases
    Emerging Data Virtualization Use Cases
    Enterprise Architecture Use Cases
Best Practices for Data Virtualization
Technical Considerations
Data Virtualization Deployment Strategies
Conclusion

This document is intended to provide architects, analysts, and developers with guidance to make informed decisions on data virtualization software.

Introduction

Data virtualization provides a comprehensive approach to managing, accessing, integrating, and providing data to drive better decision-making throughout an organization.

Data virtualization is an established form of data integration. Gartner has estimated that organizations that include data virtualization in their overall data integration strategy spend approximately 45% less than those that do not.[1] It has become an important part of modern data infrastructure because it speeds development, increases data reuse, reduces data replication and movement, and breaks down data silos. Combining the agility and flexibility of data virtualization with the reliability of traditional forms of data integration such as ETL enables customers to derive higher value from their data and meet a wider range of use cases. While data virtualization has a number of advantages, it is not a panacea for all data integration challenges. The following sections outline some of the key considerations when deciding whether data virtualization is the right choice for your organization.

[1] Zaidi, Ehtisham et al. Gartner Market Guide for Data Virtualization. November 16, 2018.

TIBCO whitepaper | 2

Challenges, Benefits, Drawbacks

Challenges Addressed by Data Virtualization

Data virtualization (DV) is designed to meet the demands of today’s complex hybrid, multi-cloud data silo environments. The technology has the following capabilities to make it easy to access data, no matter where it resides:
• Solution – Abstraction is used to map incompatible native structures and formats into the required forms.
• Location – DV federates and formats data stored in multiple locations (on premises or in the cloud) and makes it appear as if it is stored in a single logical location. To consumers, it appears as a single dataset.
• Completeness – The technology allows fragments of data from disparate sources to be combined into a full picture.
• Latency – DV allows real-time access to current data.
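The Location and Latency capabilities above can be illustrated with a minimal sketch. It does not use any data virtualization product: it assumes two attached in-memory SQLite databases as stand-ins for an on-premises and a cloud source, and the table, column, and view names (customers, accounts, all_customers) are hypothetical. A temporary view maps the two incompatible schemas into one logical dataset without copying any rows.

```python
import sqlite3

# One connection with two attached in-memory databases, standing in for
# an on-premises source and a cloud source (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS onprem")
conn.execute("ATTACH DATABASE ':memory:' AS cloud")

# The two sources use different native structures for the same concept.
conn.execute("CREATE TABLE onprem.customers (cust_id INTEGER, full_name TEXT)")
conn.execute("CREATE TABLE cloud.accounts (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO onprem.customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO cloud.accounts VALUES (?, ?)", [(3, "Linus")])

# A TEMP view may reference tables in several attached databases.
# No data is copied: the view is evaluated on every query.
conn.execute("""
    CREATE TEMP VIEW all_customers AS
    SELECT cust_id AS customer_id, full_name AS name, 'onprem' AS origin
      FROM onprem.customers
    UNION ALL
    SELECT id, name, 'cloud'
      FROM cloud.accounts
""")


def list_customers(connection):
    """Consumer code sees a single logical dataset."""
    query = ("SELECT customer_id, name, origin "
             "FROM all_customers ORDER BY customer_id")
    return connection.execute(query).fetchall()


before = list_customers(conn)

# A change in one source is visible on the next query, with no reload step.
conn.execute("INSERT INTO cloud.accounts VALUES (4, 'Margaret')")
after = list_customers(conn)
```

Because the view is resolved at query time, the consumer function never needs to know which source holds a given row.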

Benefits of Data Virtualization

When you use data virtualization to implement a modern data layer, you provide data consumers with:
• Business-friendly data – Deliver data in a business-relevant way instead of how it is stored in IT schemas. Maintain consistency with business definitions so everyone is on the same page.
• Faster time-to-data – Take advantage of the latest data from across distributed data sources. Provision new data requests quickly and react rapidly as requirements change.
• Self-service data access – Let users focus on how to best apply data to analytics and applications while technical teams focus on how to provision and manage it.

Technical teams also benefit via:
• Reduced IT costs – Reduce data engineering demand and the need for physical copies of data to, in turn, lower the cost of physical storage as well as the need for rigid ETL and its associated time and cost.
• Enhanced governance and security – Protect your data assets from unauthorized access. Comply with regulations including those that require encryption and masking.
• Enterprise scale – Support diverse demands driven by multiple lines of business, hundreds of projects, and thousands of users, while meeting your most demanding SLAs.


Drawbacks of Data Virtualization

Transformations are repeated with every query: This is the nature of on-demand transformations. Repeated transformations may not be an issue if the load is small, or if the servers and infrastructure can be scaled to handle the load. One option is to use caching to convert the transformations from on-demand to periodic. Otherwise, this drawback is often accepted in exchange for real-time access to data.

Complex joins or transformations can be very slow: Queries or transformations that are very complex or operate on large datasets can take a long time to process, which may not be acceptable for a user-interactive application. Specialized join algorithms, query optimizers, and statistics-gathering can often overcome this challenge. Caching helps, but may introduce data latency. The optimization techniques are often sufficient, with caching implemented as a last resort.

No data history: The only historical data available is in source systems. Because data is not physically persisted with data virtualization, historical archives are not accumulated. Cumulative, incremental caching strategies can be used to preserve historical data (similar to warehouse trickle-feeding techniques) at the cost of implementation complexity and storage management.
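The trade-off between on-demand transformation and caching can be sketched as follows. This is an illustration under stated assumptions, not how any particular product implements caching: the CachedView class, its time-to-live parameter, and the injected clock are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Optional


class CachedView:
    """Wraps an on-demand transformation with a time-to-live cache.

    A TTL of 0 means every query re-runs the transformation (pure on-demand).
    """

    def __init__(self, transform: Callable[[], list], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._transform = transform
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[list] = None
        self._loaded_at = 0.0
        self.transform_runs = 0  # how often the transformation was executed

    def query(self) -> list:
        now = self._clock()
        expired = self._cached is None or (now - self._loaded_at) >= self._ttl
        if expired:
            self._cached = self._transform()
            self._loaded_at = now
            self.transform_runs += 1
        return self._cached


# Hypothetical source data and transformation.
source = [1, 2, 3]


def double_all() -> list:
    return [value * 2 for value in source]


# A controllable clock keeps the example deterministic.
fake_now = [0.0]
on_demand = CachedView(double_all, ttl_seconds=0, clock=lambda: fake_now[0])
cached = CachedView(double_all, ttl_seconds=60, clock=lambda: fake_now[0])

for _ in range(3):
    on_demand.query()
    cached.query()

source.append(4)               # the source system changes
fresh = on_demand.query()      # sees the change, at the cost of another run
stale = cached.query()         # cheaper, but still the old result
fake_now[0] = 61.0             # the TTL elapses
refreshed = cached.query()     # the cache is rebuilt on the next query
```

The on-demand view runs the transformation on every query and always returns current data, while the cached view runs it once per TTL window and can serve a stale result in between.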

When to Use Data Virtualization

How do you decide when to use data virtualization? To make an informed decision, you should:
• Clarify your business requirements or problem statement
• Understand the use cases for data virtualization
• Identify the benefits you expect
• Examine existing usage patterns prescribed for data virtualization
• Follow the recommendations given by industry experts
• Consider the technical constraints of the environment

Once the business requirements have been clearly defined, surveying the use cases for data virtualization can be effective in helping you understand the ways that this technology can be used. This section lists common data virtualization use cases. Please see the solution brief, Applying Data Virtualization: 13 Use Cases that Matter, for more details.


Analytics Use Cases
• Prototyping for Physical Data Integration
• Data Access / Semantic Layer for Analytics (BI, reporting, self-service, data science, etc.)
• Logical Data Warehouse Architecture
• Data Preparation

Operational Use Cases
• Abstract Data Access Layer/Virtual ODS
• Registry-style Master Data Management
• Legacy System Migration
• Application Data Access

Emerging Use Cases
• Cloud Data Sharing
• Edge Data Access in IoT Integration
• Data Hub Enablement
• Regulatory Constraints on Data Usage
• Data as a Service
• Data Fabric

Enterprise Architecture Use Cases
• Enterprise Shared Data Services

Analytics Use Cases

Prototyping for Physical Data Integration Use Cases: Physical integration is a proven approach to analytic data integration, but it requires significant data engineering effort and a complex software development lifecycle that takes, on average, seven weeks, according to TDWI. The main benefits of data virtualization for this use case include the ability to interactively refine requirements and, based on actual data, build virtual data services side-by-side with business users. Once physical integration is tested, you can transparently migrate from virtual to physical without loss of service.
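The transparent migration from virtual to physical described above can be illustrated with a small sketch. It assumes SQLite as a stand-in for both the virtual layer and the physical warehouse, and the names orders, sales_by_region, and report are hypothetical. The consumer query stays identical while the object behind the name changes from a view to a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 70.0)])

# Prototype phase: the dataset exists only as a view (virtual).
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region
""")


def report(connection):
    """Consumer query; it only depends on the name and the columns."""
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return connection.execute(query).fetchall()


def object_type(connection, name):
    """Returns 'view' or 'table' for the given schema object."""
    row = connection.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)).fetchone()
    return row[0]


virtual_result = report(conn)
type_before = object_type(conn, "sales_by_region")

# Migration phase: persist the result, then swap it in under the same name.
conn.execute("""
    CREATE TABLE sales_by_region_physical AS
    SELECT region, total FROM sales_by_region
""")
conn.execute("DROP VIEW sales_by_region")
conn.execute("ALTER TABLE sales_by_region_physical RENAME TO sales_by_region")

physical_result = report(conn)
type_after = object_type(conn, "sales_by_region")
```

After the swap the physical table no longer tracks changes in orders automatically, which is the latency cost of physical integration that a real migration would handle with a load process.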


Figure 1. Virtual Data Warehouse Serves as Prototype to Enable Rapid Development

Data Access / Semantic Layer for Analytics (BI, reporting, self-service, data science) Use Cases: Vendor-specific analytic semantic layers provide specialized data access and semantic transformation capabilities that simplify your analytic application development. However, these vendor-specific solutions have limitations, including the inability to share analytic datasets with other vendors’ analytic tools. In this case, the main benefit provided by data virtualization is a vendor-agnostic solution for data access/semantic layer. You can access any data source required, model and transform analytic datasets quickly, and share and reuse datasets across multiple vendors’ tools. See also https://www.tibco.com/solutions/virtual-data-layer

Logical Data Warehouse Architecture Use Cases: Logical data warehouse (LDW) architecture combines the strengths of traditional warehouses with alternative data management and access strategies to improve your agility, accelerate innovation, and respond more efficiently to changing business requirements. The main benefits of data virtualization in this use case include one logical place to go for analytic datasets regardless of source or application, which leads to higher quality analysis from broader data access; the ability for more complex transformations; and consistent, well-understood data. Enterprise data security and governance are also strengthened. See also https://www.tibco.com/solutions/logical-data-warehouse


[Figure labels: Business Intelligence, Self-Service Analytics, Data Science; Logical Data Warehouse; Data Warehouse, Cloud Data Warehouse]

Data Preparation Use Cases: Self-service data preparation has proven to be a great way for business users to quickly transform raw data into more analytic-friendly datasets. However, some agile data preparation requirements need data engineering skills and higher-level integration capabilities. For this use case, data virtualization provides rapid, IT-grade datasets that meet analytic data needs, either as-is or as the foundation for additional self-service data preparation by analysts.

Operational Use Cases

Abstract Data Access Layer/Virtual ODS Use Cases: Physical operational data stores (ODS) have proven a useful compromise that balances operational data access needs with operational system SLAs. Setting up a physical ODS requires significant development investment, in addition to higher operating costs for managing the associated infrastructure. A virtual ODS can mitigate these challenges. The main benefits of data virtualization for this use case are that it provides one virtual place to go for operational data and lowers costs because there is less replicated data to maintain in physical ODSs. In addition, query optimization and intelligent caching reduce the impact on operational sources. See also https://www.tibco.com/solutions/virtual-data-layer


Figure 2. Using Data Virtualization to Create Virtual ODS

Registry-style Master Data Management Use Cases: Master Data Management (MDM) is an essential capability. Analyst firms such as Gartner have identified four MDM implementation styles (consolidation, registry, centralized, and coexistence) that you can deploy independently or combine to help enable successful MDM efforts. The main benefit of data virtualization here is that it provides a complete solution for registry-style MDM implementations, with integrated support for the consolidation, centralized, and coexistence implementation styles, to create 360° views. See also https://www.tibco.com/solutions/anything-360

[Figure labels: 360° View, Master Data Hub, ETL Server, XLS, XML]


Legacy System Migration Use Cases: New technology provides more advanced capabilities and lower-cost infrastructure, and you want to take advantage of it. However, migrating legacy data repositories to new ones, or legacy applications to new application technology, is not easy. Business continuity requires non-stop operations before, during, and after the migration. Applications and data repositories are often tightly coupled, making them difficult to change. Data virtualization provides a flexible solution for legacy system migration challenges. You can create a loosely coupled middle tier of data services that mirrors as-is data access, transformation, and delivery functionality. And you can test and tune these data services on the side without impacting current operations.

Application Data Access Use Cases: Your applications run on data. However, application data access can be difficult. Challenges include the need to understand and access increasingly diverse and distributed data sources and types, including data-in-motion and data-at-rest. Using data virtualization, you’ll have one place to go for both analytic and operational application data. Broader data access, with support for more complex data transformations, increases an application’s functionality and value. In addition, costs are lower because application datasets are reused across diverse applications.

Emerging Data Virtualization Use Cases

Cloud Data Sharing Use Cases: With the rise of cloud-based applications and infrastructure, more data than ever resides outside your enterprise. As a result, your need to share data across your cloud and on-premises enterprise sources has grown significantly. With data virtualization, you can access nearly any major cloud data source and quickly model and transform those cloud datasets. You can integrate data from both cloud and on-premises sources and deliver it to a wide range of application development tools via industry-standard APIs, protocols, and architectures including ODBC, JDBC, SOAP, REST, and more.

[Figure labels: Business Intelligence, Self-Service Analytics, Data Science, 360° View, Transactional Applications; Firewall; On-Premise: Data Virtualization, Cache, On-Prem Data Services, Metadata Repository; Cloud: Data Virtualization, Cache, Cloud Data Services, Metadata Repository]


Edge Data Access in IoT Integration Use Cases: Device data from IoT presents new analytic and operational application opportunities. Taking advantage of these opportunities requires understanding and accessing increasingly diverse and distributed IoT data sources and types, some of which are in motion. For edge access, the main benefit of data virtualization is the access and transformation of IoT data using standard streaming data manipulation functions including enrichment, cleansing, and sliding windows. In addition, you can model and combine IoT data and other data sources to create integrated IoT datasets.

Data Hub Enablement Use Cases: The data hub is a logical architecture that enables data sharing by connecting producers of data (applications, processes, and teams) with consumers of data (other applications, processes, and teams). Master data hubs, customer data hubs, reference data stores, and others are examples. Data hub domains might be geographically focused, business process focused, or application focused. For these use cases, data virtualization offers a complete solution. It can provide complete visibility into data hub data flows and end-to-end security and governance.

Regulatory Constraints on Data Usage Use Cases: Regulatory constraints on data continue to expand with more governments passing new laws on data privacy and use. These constraints or requirements include:
• Limits on what data can be seen and by whom
• The ability to anonymize data
• The ability to delete personally identifiable information
• The need to report what data you have, who has seen that data, and in what context
• Limits on moving or replicating data beyond an enterprise and/or a country
Data virtualization provides many capabilities to comply with these new regulations. It provides virtual data services that eliminate the need to replicate regulated data. It can also implement a variety of security protocols to conform with compliance policies, including using column and row masking rules to hide, replace, and/or obfuscate personally identifiable information.

Data as a Service Use Cases: Data-as-a-service (DaaS) is another modern data architecture concept that offers the flexibility to address various data demands and consumption methods. It can provision data via microservices, choreographed calls, or APIs for any internal and external data source and take advantage of open source, market data, data aggregators, and data brokers.
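The column and row masking rules mentioned under Regulatory Constraints on Data Usage can be sketched in a few lines. This is a simplified illustration, not a vendor API: the User type, the country-based row rule, the email masking format, and the sample records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    country: str       # row rule: a user only sees rows from this country
    can_see_pii: bool  # column rule: whether PII columns are shown unmasked


# Hypothetical source records; in practice these would come from the sources.
RECORDS = [
    {"id": 1, "country": "FR", "email": "ada@example.com", "balance": 120},
    {"id": 2, "country": "DE", "email": "grace@example.com", "balance": 80},
]


def mask_email(email: str) -> str:
    """Keeps the first character and the domain, obfuscates the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def secure_view(user: User, records: list) -> list:
    """Applies row filtering, then column masking, without altering the source."""
    visible = []
    for record in records:
        if record["country"] != user.country:   # row-level rule
            continue
        row = dict(record)                       # copy; the source is untouched
        if not user.can_see_pii:                 # column-level rule
            row["email"] = mask_email(row["email"])
        visible.append(row)
    return visible


analyst = User(name="analyst", country="FR", can_see_pii=False)
auditor = User(name="auditor", country="DE", can_see_pii=True)

analyst_rows = secure_view(analyst, RECORDS)
auditor_rows = secure_view(auditor, RECORDS)
```

Because the rules are applied when the view is read, the regulated data stays in place and each consumer receives only the rows and values its policy allows.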


Data virtualization can provide the flexible foundation to integrate both internal and external data sources and then provision that data to both internal and external consumers through called services and APIs. See also https://www.tibco.com/solutions/data-service

Data Fabric Use Cases: A data fabric is an emerging data architecture design concept that is enabled by an optimized combination of data management and data integration capabilities that leverage AI/ML and self-service techniques including master data management, metadata management, data governance, data catalog, data security, data preparation, data virtualization, data integration, and data streaming. Data virtualization can accelerate the implementation of a data fabric architecture. It can support more use cases and data sources without the rebuild of existing architecture or replacement of the systems already in use. Further, it allows a data fabric to be deployed in phases across distributed on-premises, hybrid, and multi-cloud environments.

Enterprise Architecture Use Cases

Enterprise Shared Data Services Use Cases: Data virtualization creates a data abstraction layer to transform data from native sources into reusable views and services; federates data warehouses, data lakes, cloud data sources, and other sources into a unified layer to provide common enterprise data; and creates shareable data services that integrate well with SOA infrastructure to promote greater agility and re-use. With data virtualization, creating re-usable data views and service assets promotes business agility and data consistency.

Figure 3. Re-usable Data Views and Service Assets Promote Business Agility


Best Practices for Data Virtualization

TIBCO has a free recommendation tool that provides simple 80/20 type decision-making to help you quickly and accurately select the right data integration strategy for each new development project. Scenarios where data virtualization is a good fit:
• Use data virtualization for prototyping data warehouse and BI projects. A virtual data prototype can assist with gathering, refining, and documenting business needs. This agile development approach is less time-consuming, allows changes to be made on the fly, and reduces risk of missed requirements.
• Augment data warehouses with current data. Combine historical data from data warehouses with data virtualization snapshots for a more complete view.
• Use virtual data marts over physical ones. Create virtual data marts based on a data warehouse instead of moving the data into marts.
• Augment ETL. Data from ETL is latent, while data from virtualization is current. Combine the two for a more complete view.
• Monitor the impact of data virtualization on source systems. Monitor performance of operational systems and collect performance metrics over time to predict and avert impacts of BI activities.
• Determine data virtualization infrastructure requirements. Take into account hardware, software, and network performance and scalability when implementing data virtualization projects to make sure adequate infrastructure exists to support them.
• Define a shared business view of source data. Create shared business views to ensure a consistent view of source data, reduce the proliferation of individual views of the same data, and isolate data consumers from changes to data sources.
• Introduce a shared business vocabulary (SBV) to promote a consistent understanding of data. Use of a SBV, for example, for common data names and data definitions to ens...