Title DABI
Course Data and software analytics
Institution NEOMA Business School
Pages 6



Description

Introduction The changing business environment creates evolving needs for decision support and analytics: group communication and collaboration; increased hardware, software and network capabilities; managing data warehouses and big data; improved data management; overcoming limits in processing and storing data; analytical support; knowledge management; and anywhere, anytime support. Computerized decision support evolved into analytics and data science through several stages: decision support systems, enterprise/executive information systems, business intelligence, business analytics and big data. Business Intelligence is an umbrella term that combines architectures, tools, data warehouses, analytical tools, applications and methodologies; it corresponds to descriptive analytics tools and techniques. There are four major components of business intelligence: the data warehouse (DW), business analytics, business performance management (BPM) and the user interface (dashboard). Online Transaction Processing (OLTP) uses operational data to capture transactions, while Online Analytical Processing (OLAP) uses data in the data warehouse to support decision making. Real-time, on-demand BI tools include RFID, web services and intelligent agents. Critical BI system considerations: develop or acquire the BI system (make versus buy, BI shells), justification and cost-benefit analysis, security and privacy protection, and integration with other systems and applications. Business Analytics is the process of developing decisions or recommendations for actions based on insights generated from historical data; it combines computer technology, management science and statistics. There are three types of business analytics: descriptive, predictive and prescriptive. Descriptive Analytics seeks to understand what is happening in the organization and the underlying trends and causes of such occurrences. Predictive Analytics determines what is likely to happen in the future. And Prescriptive Analytics uses insights from descriptive and predictive analytics to determine the best possible decisions. Business analytics is used across industries such as healthcare and the retail value chain. Big Data is data that cannot be processed or stored easily using traditional tools. It typically refers to data that are large, structured, unstructured and continuous. The 3Vs: volume, variety and velocity.

Descriptive Analytics I Data refers to a collection of facts obtained from experiments, observations, transactions or experiences. Data can consist of numbers, letters, words, images, audio and video. Data is the lowest level of abstraction from which information and knowledge are derived (data is the source of information and knowledge). Automated data collection systems enable us to collect larger volumes of data and enhance the quality and integrity of data. The Metrics for Analytics-Ready Data are: data source reliability, data content accuracy, data accessibility, data security and privacy, data richness, data consistency, data currency, data granularity, data validity and data relevancy. Data can be classified as Structured Data and Unstructured Data (with Semi-Structured Data in between). Structured data can be classified as categorical (nominal and ordinal) and numerical (interval and ratio); unstructured data can be classified as textual, multimedia (image, audio and video) and web content. Real-world data is dirty, misaligned, overly complex and inaccurate, and there are Four Phases of Data Preprocessing to make it ready for analytics: data consolidation (collect data, select data, integrate data), data cleaning (impute missing values, reduce noise, eliminate duplicates), data transformation (normalize data, discretize data, create attributes) and data reduction (reduce dimension, reduce volume, balance data). Descriptive Analytics refers to knowing what is happening in the organization and understanding underlying trends and the causes of such occurrences. There are Two Branches of Descriptive Analytics: OLAP and statistics. Statistics is a collection of mathematical techniques to characterize and interpret data. Statistical Methods can be classified as descriptive statistics and inferential statistics: Descriptive Statistics describe the basic characteristics of the data at hand, while Inferential Statistics draw conclusions about a population from a sample of its data.
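The four preprocessing phases can be sketched on a toy record set (all values below are hypothetical; standard-library Python only):

```python
# Two hypothetical source tables with a duplicate id and a missing value.
raw_a = [{"id": 1, "age": 25.0}, {"id": 2, "age": None}]
raw_b = [{"id": 2, "age": None}, {"id": 3, "age": 45.0}]

# Data consolidation: collect and integrate records from both sources.
records = raw_a + raw_b

# Data cleaning: eliminate duplicate ids, then impute missing ages
# with the mean of the known values.
seen, unique = set(), []
for r in records:
    if r["id"] not in seen:
        seen.add(r["id"])
        unique.append(r)
known = [r["age"] for r in unique if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in unique:
    if r["age"] is None:
        r["age"] = mean_age

# Data transformation: min-max normalize age into [0, 1].
lo, hi = min(r["age"] for r in unique), max(r["age"] for r in unique)
for r in unique:
    r["norm_age"] = (r["age"] - lo) / (hi - lo)

# Data reduction: keep only the attributes needed downstream.
reduced = [{"id": r["id"], "norm_age": r["norm_age"]} for r in unique]
print(reduced)
```

In a real project each phase would be far richer (noise reduction, discretization, dimensionality reduction), but the order of the phases is the same.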
The Methods in Descriptive Statistics can be classified as measures of central tendency (mean, median, mode), measures of dispersion (range, variance, standard deviation, mean absolute deviation, quartiles and interquartile range, and the box-and-whiskers plot) and the shape of the data distribution (skewness and kurtosis). A core technique of Inferential Statistics is regression, which can be classified as linear regression, logistic regression and time-series forecasting. A linear regression can be developed with a scatter chart and the ordinary least squares (OLS) method. The regression model can be evaluated by R2 (R-square), the overall F-test, and the root mean square error (RMSE); R2 ranges from 0 to 1, where values near 0 indicate a poor fit and values near 1 a good fit. Business Reports can be classified as metric management reports, dashboard-type reports and balanced scorecards. Data Visualization is used in dashboard-type reports, and a Performance Dashboard presents three layers of information: monitoring, analysis and management.
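The OLS mechanics and the R2/RMSE evaluation can be illustrated with a small, made-up sample (the data and numbers below are purely illustrative):

```python
# Minimal sketch of simple linear regression fitted by ordinary least
# squares (OLS) and evaluated with R-squared and RMSE, stdlib only.
import math

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical observations

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates: slope = cov(x, y) / var(x); intercept from the means.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

pred = [intercept + slope * xi for xi in x]

# R2 = 1 - SSE/SST: near 1 means a good fit, near 0 a poor one.
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
sst = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - sse / sst

# RMSE: typical prediction error, in the units of y.
rmse = math.sqrt(sse / n)

print(f"y = {intercept:.2f} + {slope:.2f}x, R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

For this fabricated sample the fit is nearly perfect, so R2 comes out close to 1 and RMSE is small; on noisier data both metrics degrade accordingly.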

Descriptive Analytics II Business Intelligence refers to the use of data for managerial decision support; BI is the early stage of business analytics - descriptive analytics. Business intelligence has four major components: data warehouse, business analytics, business performance management and user interface. A Data Warehouse is a physical repository where relational data are specially organized to provide enterprise-wide, cleaned data in a standardized format; it is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process. There are four major components of data warehousing: data marts, the operational data store, the enterprise data warehouse and metadata. A Data Mart is a subset of a data warehouse; it is usually smaller and focuses on a single subject or department, and there are two types: dependent data marts and independent data marts. An Operational Data Store is an interim staging area for the data warehouse. An Enterprise Data Warehouse is a large-scale data warehouse used across the enterprise. And Metadata is data about data. Data Warehousing Process: data sources - ETL process - data warehouse (metadata) - data marts - middleware - front-end applications. Data Warehouse Architectures are called n-tier architectures; a three-tier architecture has data acquisition (back-end) software, the data warehouse and client (front-end) software, and there are five alternative data warehouse architectures: independent data marts, data mart bus architecture, hub-and-spoke architecture, centralized data warehouse and federated data warehouse. Data Integration comprises three major processes: data access, data federation and change capture, and there are three major Integration Technologies: enterprise application integration, enterprise information integration and ETL (extraction, transformation and load). Data Warehouse Development has two Approaches: the Inmon Model - the EDW approach - and the Kimball Model - the DM approach.
Data Representation in the Data Warehouse has always been based on the concept of dimensional modeling, and there are two types of data representation: the star schema and the snowflake schema. OLAP is used to analyze the data in the data warehouse. Data warehouses face the issues of implementation, massive scale and scalability, and security, and the data warehouse administrator takes care of the warehouse. The future of data warehousing lies in sourcing and infrastructure (a data lake is an unstructured data storage technology for big data). Business Performance Management refers to the business processes, methodologies, metrics and technologies used by enterprises to measure, monitor and manage business performance. There are four components of the Closed-Loop BPM Cycle: strategize, plan, monitor, and act and adjust. And it has two types of systems: Performance Measurement Systems (KPIs) and Performance Management Systems (balanced scorecard and Six Sigma).
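The star schema idea - a central fact table whose rows reference dimension tables by key - can be sketched with plain Python dictionaries (all tables and figures below are hypothetical):

```python
# Dimension tables: descriptive attributes, keyed by surrogate id.
dim_product = {1: {"name": "laptop", "category": "electronics"},
               2: {"name": "desk",   "category": "furniture"}}
dim_store = {10: {"city": "Paris"}, 20: {"city": "Rouen"}}

# Fact table: foreign keys into the dimensions plus numeric measures.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 1200.0},
    {"product_id": 1, "store_id": 20, "amount": 900.0},
    {"product_id": 2, "store_id": 10, "amount": 300.0},
]

# OLAP-style roll-up: join fact rows to the product dimension and
# aggregate total sales by category.
totals = {}
for row in fact_sales:
    category = dim_product[row["product_id"]]["category"]
    totals[category] = totals.get(category, 0.0) + row["amount"]

print(totals)  # {'electronics': 2100.0, 'furniture': 300.0}
```

A snowflake schema would further normalize the dimensions (e.g., splitting category into its own table); the fact table and the join-then-aggregate pattern stay the same.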

Predictive Analytics Predictive Analytics determines what is likely to happen in the future by analyzing historical data; it has four major enablers: data mining, text mining, web mining and social media analytics. Data Mining is the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data stored in structured databases. (Data mining is a blend of multiple disciplines: statistics, artificial intelligence, machine learning, management science, information systems and databases.) Using existing and relevant data obtained from within and outside the organization, data mining builds models to discover patterns among the attributes present in the data set (models are mathematical representations that identify the patterns among the attributes of the things described within the data set). It uncovers four major types of patterns: predictions, clusters, associations and sequential relationships. Data mining and statistics both look for relationships within data; statistics is the foundation of data mining. Statistics starts with a well-defined proposition and hypothesis, whereas data mining starts with a loosely defined discovery statement. Statistics uses sample data; data mining uses all the existing data to discover novel patterns and relationships. Statistics uses the right size of data; data mining uses data sets that are as big as possible. Data mining can be applied in banking, brokerage and securities trading, insurance, retailing and logistics, manufacturing and production, customer relationship management, homeland security and law enforcement, the entertainment industry, computer hardware and software, sports, healthcare, medicine, government and defense, and the travel industry. Data mining has three major standardized processes. CRISP-DM (Cross-Industry Standard Process for Data Mining) has six steps: business understanding, data understanding, data preparation, model building, testing and evaluation, and deployment. SEMMA has five steps: sample, explore, modify, model and assess. KDD (Knowledge Discovery in Databases) has five steps: data selection, data preprocessing, data transformation, data mining and interpretation/evaluation. Data mining has three major methods. Classification learns patterns from past data to place new instances into their respective groups or classes; a confusion matrix estimates the accuracy of the classification, and the main modeling techniques are decision tree analysis, statistical analysis and neural networks. Cluster analysis classifies items, events or concepts into common groups called clusters, and the most commonly used clustering algorithms are k-means and self-organizing maps. Association rule mining discovers two or more items that go together; it is commonly referred to as market basket analysis, and the most commonly used association algorithm is Apriori.
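A confusion matrix for a binary classifier can be computed directly from actual and predicted labels (the labels below are made up for illustration):

```python
# Minimal sketch of a 2x2 confusion matrix and the accuracy derived
# from it; 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model output

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

# Accuracy: share of instances the classifier placed in the right class.
accuracy = (tp + tn) / len(actual)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.2f}")
```

The same four cells also yield precision (TP / (TP + FP)) and recall (TP / (TP + FN)), which matter when the classes are imbalanced and accuracy alone is misleading.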

Prescriptive Analytics Prescriptive Analytics involves using an analytical model to guide a decision maker in making a decision, or automating the decision process so that a model can make recommendations or decisions. The decision-making process can be supported by Mathematical Models. All quantitative models are made up of four basic components: result (or outcome) variables, decision variables, uncontrollable variables (or parameters), and intermediate result variables; these components are linked by mathematical (algebraic) expressions - equations or inequalities. Depending on the knowledge level of the decision makers (ranging from complete knowledge to complete ignorance), decision making can take place under certainty, risk or uncertainty. Decision models can be developed and implemented in spreadsheet software; the spreadsheet is the most popular modeling tool. It has many powerful functions such as what-if analysis, goal seeking, optimization and simulation, and it can incorporate both static and dynamic models. Mathematical Programming is a family of optimization tools (often spreadsheet-based in practice) for solving managerial problems in which the decision maker must allocate scarce resources among competing activities to optimize a measurable goal. Linear Programming (LP) is the best-known technique of mathematical programming; in linear programming, all relationships among the variables are linear. An LP model has three components: decision variables, result variables, and uncontrollable variables (constraints). Multiple-Goal Decision Making can be modeled and analyzed with sensitivity analysis. Sensitivity Analysis attempts to assess the impact of a change in the input data or parameters on the proposed solution; What-If Analysis and Goal Seeking are the two most common methods of sensitivity analysis. Simple Decision Making can be modeled and solved with Decision Tables and Decision Trees. The decision-making process can also be supported by simulation.
Simulation is the imitation of reality; in decision support systems, simulation is a technique for conducting experiments (e.g., what-if analyses) with a computer on a model of a management system.
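What-if analysis and goal seeking can be sketched on a simple profit model (all figures are hypothetical); goal seeking here finds the break-even volume by bisection search:

```python
# A toy quantitative model: units is the decision variable, price /
# unit_cost / fixed_cost are uncontrollable parameters, and profit is
# the result variable.
def profit(units, price=50.0, unit_cost=30.0, fixed_cost=10_000.0):
    return units * (price - unit_cost) - fixed_cost

# What-if analysis: vary an input and observe the result variable.
for units in (400, 500, 600):
    print(f"units={units}  profit={profit(units):.0f}")

# Goal seeking: find the volume where profit == 0 (break-even) by
# bisection over a plausible range, since profit rises with units.
lo, hi = 0.0, 10_000.0
for _ in range(60):
    mid = (lo + hi) / 2
    if profit(mid) < 0:
        lo = mid
    else:
        hi = mid
break_even = (lo + hi) / 2
print(f"break-even at about {break_even:.0f} units")
```

A spreadsheet's Goal Seek feature performs essentially this search automatically; optimization (e.g., LP via a solver add-in) generalizes it to many decision variables under constraints.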

Future Trends The Internet of Things (IoT) is the phenomenon of connecting the physical world to the internet, in contrast to the internet of humans, which connects people to each other through technology. In IoT, physical devices are fitted with sensors that collect data on the operation, location and state of a device. IoT uses sensors or sensing devices, and it powers applications such as self-driving cars, fitness trackers, smart bins and smart refrigerators. There are three reasons for the exponential growth of IoT: hardware is smaller, more affordable and more powerful; BI tools are widely available; and new and innovative use cases keep emerging. There are four building blocks of the IoT technology infrastructure: hardware, connectivity, software and applications. Radio Frequency Identification (RFID) is one of the earliest sensor technologies; it uses radio-frequency waves to identify objects, and it is mostly used by retailers to improve their supply chains, reduce costs and increase sales. The data produced by sensors in IoT are voluminous, and Fog Computing proposes fog nodes that process the data close to the IoT devices; a fog node can be placed anywhere along the network connection between the devices and the cloud. Cloud Computing is a style of computing in which dramatically scalable and virtualized resources are provided over the internet. It has three deployment models: private cloud, public cloud and hybrid cloud. A Service-Oriented Decision Support System is a cloud-based analytics system offering several types of services: data as a service (DaaS), software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS). Virtualization, the essential technology of cloud computing, is the creation of a virtual version of an operating system or server. There are three levels of virtualization: network virtualization, storage virtualization and server virtualization.
There are two types of location-based analytics: Geospatial Analytics and Geographic Information Systems (GIS); GIS is used in agricultural applications, crime detection and disease-spread prediction. There are three major issues in analytics: legality, privacy and ethics.

