Exam 1 cheat sheet

Title: Exam 1 cheat sheet
Author: Michael Oboite
Course: Organismal Biology
Institution: University of Georgia



Description

Strategic decision: A decision that involves higher-level issues and is concerned with the overall direction of the organization, defining the overall goals and aspirations for the organization's future.
Tactical decision: A decision concerned with how the organization should achieve the goals and objectives set by its strategy.
Operational decision: A decision concerned with how the organization is run from day to day.
Descriptive analytics: Analytical tools that describe what has happened. The oldest form of analytics.
Data query: A request for information with certain characteristics from a database.
Data dashboard: A collection of tables, charts, and maps that helps management monitor selected aspects of the company's performance.
Predictive analytics: Techniques that use models constructed from past data to predict the future or to ascertain the impact of one variable on another. Techniques used in predictive analytics include:
• Linear regression.
• Time series analysis.
• Models of association that find patterns or relationships among variables in a large database.
• Simulation, which uses probability and statistics to construct a computer model to study the impact of uncertainty on a decision.
• Multivariate statistical analysis.
Data mining: The use of analytical techniques to better understand patterns and relationships that exist in large data sets; a set of techniques for discovering patterns in a large dataset.
Prescriptive analytics: Techniques that analyze input data and yield a best course of action. A forecast or prediction, when combined with a rule, becomes a prescriptive model. Prescriptive analytics includes:
Optimization model: A mathematical model that gives the best decision, subject to the situation's constraints.
Rule-based model: A prescriptive model that is based on a rule or set of rules.
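Linear regression, the first predictive-analytics technique listed above, can be sketched in a few lines. This is a minimal least-squares fit with invented data points, not code from the course:

```python
# Minimal sketch of simple linear regression by least squares.
# The data values below are made up for illustration.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviation products / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 0.0
```

The fitted line can then be used to predict future values of y from new values of x, which is exactly the "use past data to predict the future" idea in the definition.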
Utility theory: The study of the total worth or relative desirability of a particular outcome that reflects the decision maker's attitude toward a collection of factors such as profit, loss, and risk.
Decision analysis: A technique used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events.
Simulation: The use of probability and statistics to construct a computer model to study the impact of uncertainty on the decision at hand.
Simulation optimization: The use of probability and statistics to model uncertainty, combined with optimization techniques, to find good decisions in highly complex and highly uncertain settings.
Hadoop: An open-source programming environment that supports big data processing through distributed storage and distributed processing on clusters of computers.
Big data: Any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software. IBM describes the phenomenon of big data through the four Vs:
• Volume: Because data are collected electronically, we are able to collect more of it. To be useful, these data must be stored, and this storage has led to vast quantities of data.
• Velocity: Real-time capture and analysis of data present unique challenges both in how data are stored and the speed with which those data can be analyzed for decision making.
• Variety: More complicated types of data are now available and are proving to be of great value to businesses. Text data are collected by monitoring what is being said about a company's products or services on social media platforms; audio data are collected from service calls; video data are collected by in-store video cameras and used to analyze shopping behavior. Analyzing information generated by these nontraditional sources is more complicated, in part because of the processing required to transform the data into a numerical form that can be analyzed.
• Veracity: How much uncertainty is in the data. Inconsistencies in units of measure and the lack of reliability of responses in terms of bias also increase the complexity of the data.
Data security: Protecting stored data from destructive forces or unauthorized users.
Data scientist: An analyst trained in both computer science and statistics who knows how to effectively process and analyze massive amounts of data.
Internet of Things: The technology that allows data collected from sensors in all types of machines to be sent over the Internet to repositories where it can be stored and analyzed.
Advanced analytics: Predictive and prescriptive analytics.
Three developments spurred the recent explosive growth in the use of analytical methods in business applications:
1. Technological advances (scanner technology, data collection through e-commerce, Internet social networks, and data generated from personal electronic devices) produce incredible amounts of data for businesses. Businesses want to use these data to improve the efficiency and profitability of their operations, better understand their customers, price their products more effectively, and gain a competitive advantage.
2. Ongoing research has resulted in numerous methodological developments, including advances in computational approaches to effectively handle and explore massive amounts of data, faster algorithms for optimization and simulation, and more effective approaches for visualizing data.
3. The methodological developments were paired with an explosion in computing power and storage capability. Better computing hardware, parallel computing, and cloud computing have enabled businesses to solve big problems faster and more accurately than ever before. Very recently, GPUs have enabled deep neural networks.
Decision making: A manager's responsibility: making strategic, tactical, or operational decisions.
Decision-making process:
1. Identify and define the problem.
2. Identify the stakeholders.
3. Determine the criteria that will be used to evaluate alternative solutions. (Analytical models)
4. Determine the set of alternative solutions. (Analytical models)
5. Evaluate the alternatives. (Analytical models)
6. Choose an alternative. (Analytical models)
7. Deployment.
Approaches to making decisions:
• Tradition ("we have always done it that way"): a poor approach.
• Intuition.
• Rules of thumb: can be useful if applied correctly, and can be developed from business analytics.
• Using the relevant data available: business analytics.
Business analytics: The scientific process of transforming data into insight for making better decisions. Used for data-driven or fact-based decision making, which is often seen as more objective than other alternatives. Comprises predictive and prescriptive analytics; can be problem-centric or data-centric.
Tools of business analytics can aid decision making by:
• Creating insights from data.
• Improving our ability to more accurately forecast for planning.
• Helping us quantify risk.
• Yielding better alternatives through analysis and optimization.

Observation: A set of values corresponding to a set of variables.
Population: The set of all elements of interest in a particular study.
Random variable (uncertain variable): A quantity whose values are not known with certainty.
Cross-sectional data: Data collected from several entities at the same, or approximately the same, point in time.
Time series data: Data collected over several time periods. Graphs of time series data are frequently found in business and economic publications; they help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.
Experimental study: A variable of interest is first identified; then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest.
Nonexperimental (observational) study: Makes no attempt to control the variables of interest. A survey is perhaps the most common type of observational study.
Frequency distribution: A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins. The frequency distribution is literally the count per bin.
Relative frequency distribution: A tabular summary of data showing the relative frequency for each bin. To calculate a relative frequency, divide the frequency of the bin by the total number of observations.
Percent frequency distribution: Summarizes the percent frequency of the data for each bin; multiply each relative frequency by 100. Used to provide estimates of the relative likelihoods of different values of a random variable.
Histograms: Provide information about the shape, or form, of a distribution.
Skewness: Lack of symmetry; an important characteristic of the shape of a distribution.
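The three distributions defined above differ only in scaling, which a short sketch makes concrete (the bin labels and counts here are invented):

```python
from collections import Counter

# Sketch of frequency, relative frequency, and percent frequency
# distributions for invented categorical data with bins A, B, C.
data = ["A", "B", "A", "C", "A", "B", "A", "B", "C", "A"]
n = len(data)

freq = Counter(data)                        # frequency: raw count per bin
rel = {k: v / n for k, v in freq.items()}   # relative = count / total n
pct = {k: r * 100 for k, r in rel.items()}  # percent = relative * 100

print(freq["A"], rel["A"], pct["A"])  # 5 0.5 50.0
```

Note the relative frequencies always sum to 1 and the percent frequencies to 100, regardless of the data.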
x_1 = value of variable x for the first observation; x_2 = value of variable x for the second observation; x_i = value of variable x for the ith observation.
Multimodal data: Data contain at least two modes. Bimodal data: Data contain exactly two modes.
How to calculate (sample) variance:
1. Find the mean.
2. Subtract the mean from each data point.
3. Square each result.
4. Find the sum of the squared values.
5. Divide the sum by n − 1.
The standard deviation is the square root of the variance. It is a number used to tell how measurements for a group are spread out from the average (mean), or expected value. A low standard deviation means that most of the numbers are close to the average; a high standard deviation means that the numbers are more spread out.
Coefficient of variation: A descriptive statistic that indicates how large the standard deviation is relative to the mean, expressed as a percentage. To calculate the coefficient of variation, divide the standard deviation by the mean and multiply by 100.
Percentile: The value of a variable at which a specified (approximate) percentage of observations are below that value. The pth percentile tells us the point in the data where approximately p percent of the observations have values less than the pth percentile, and approximately (100 − p) percent of the observations have values greater than it.
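The numbered variance steps above translate directly into code. This is a minimal sketch with invented data values:

```python
import math

# Sketch of the sample-variance steps listed above (divisor n - 1).
def sample_variance(values):
    mean = sum(values) / len(values)          # 1. find the mean
    deviations = [v - mean for v in values]   # 2. subtract the mean
    squared = [d ** 2 for d in deviations]    # 3. square each result
    return sum(squared) / (len(values) - 1)   # 4-5. sum, divide by n - 1

values = [2, 4, 4, 4, 5, 5, 7, 9]             # invented data
var = sample_variance(values)
std = math.sqrt(var)                          # std dev = square root of variance
cv = std / (sum(values) / len(values)) * 100  # coefficient of variation (%)
```

Here the mean is 5 and the sum of squared deviations is 32, so the sample variance is 32/7 (about 4.57) and the standard deviation is its square root.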

Quartiles: When the data are divided into four equal parts, each part contains approximately 25% of the observations, and the division points are referred to as quartiles:
• First quartile = 25th percentile.
• Second quartile = 50th percentile (also the median).
• Third quartile = 75th percentile.
The difference between the third and first quartiles is often referred to as the interquartile range, or IQR.
z-score: Measures the relative location of a value in the data set; helps to determine how far a particular value is from the mean relative to the data set's standard deviation. Often called the standardized value. To find a z-score, subtract the mean from the value and divide by the standard deviation: z = (value − mean) / standard deviation.
Empirical rule: When the distribution of data exhibits a symmetric bell shape, the empirical rule can be used to determine the percentage of data values that are within a specified number of standard deviations of the mean:
• Approximately 68% of the data values will be within 1 standard deviation.
• Approximately 95% of the data values will be within 2 standard deviations.
• Almost all the data values will be within 3 standard deviations.
Outliers: Extreme values in a data set. They can be identified using standardized values (z-scores): any data value with a z-score less than −3 or greater than +3 is an outlier. Such data values can then be reviewed to determine their accuracy and whether they belong in the data set.
To read a box plot: The minimum (the smallest number in the data set) is shown at the far left of the chart, at the end of the left "whisker." The first quartile, Q1, is the far left of the box (the right end of the left whisker). The median is shown as a line in the center of the box.
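The z-score formula and the |z| > 3 outlier rule above can be sketched together; the data values below are invented, with one deliberately extreme point:

```python
import math

# Sketch of z = (value - mean) / standard deviation, with the
# |z| > 3 outlier rule. Data values are invented; 50 is the extreme one.
def z_scores(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]

data = [10] * 4 + [11] * 6 + [12] * 6 + [13] * 3 + [50]
outliers = [v for v, z in zip(data, z_scores(data)) if abs(z) > 3]
print(outliers)  # [50]
```

A flagged value is not automatically wrong; as the notes say, it should be reviewed for accuracy before deciding whether it belongs in the data set.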
The third quartile, Q3, is shown at the far right of the box (at the left end of the right whisker); the maximum (the largest number in the data set) is shown at the far right, at the end of the right whisker.
Covariance: A descriptive measure of the linear association between two variables. To calculate covariance:
1. Calculate the average of the x values and the average of the y values.
2. Subtract the corresponding average from each x and each y value.
3. Calculate the products of the corresponding deviations (x1 with y1, and so on).
4. Find the sum of all the products.
5. Divide by n − 1.
The correlation coefficient measures the strength and direction of the linear relationship between two variables. It is not affected by the units of measurement for x and y.
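The covariance steps above, and the correlation coefficient built from them, can be sketched as follows (data values are invented; this uses the fact that the covariance of a variable with itself is its variance):

```python
import math

# Sketch of the covariance steps listed above, plus the sample
# correlation coefficient (covariance / product of standard deviations).
def covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n                 # 1. averages of x and y
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y)   # 2-3. deviations, products
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)       # 4-5. sum, divide by n - 1

def correlation(xs, ys):
    sx = math.sqrt(covariance(xs, xs))   # cov of x with itself = variance of x
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]   # invented, perfectly linear in x
print(covariance(xs, ys), correlation(xs, ys))
```

Because y here is an exact multiple of x, the correlation is 1; rescaling y (say, into different units) would change the covariance but leave the correlation unchanged, which is the unit-invariance noted above.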

