ECN377 SAS Gchart 2015 - Lecture notes SAS PDF

Course Applied Managerial Economics
Institution University of North Carolina Wilmington

UNC-Wilmington Department of Economics and Finance

ECN 377 Dr. Chris Dumas

SAS--Proc Gchart

Frequency Distributions

In SAS, Proc Gchart can be used to create Frequency Distributions to help visualize the skew and kurtosis in the values of discrete numerical variables and character/text variables. (As described later in this handout, Proc Gchart can also be used to create Histograms, which are an effective graphical technique for visualizing the skewness and the kurtosis of continuous measurement variables.)

For example, suppose we want to create charts for variable "Opinion" located in dataset02. Within Proc Gchart, the "vbar" command is used to create vertical bar charts, and the "hbar" command is used to create horizontal bar charts. By default, Proc Gchart produces a frequency chart.

For a vertical frequency chart (the default):

   Proc Gchart data=dataset02;
      vbar Opinion;
   run;

If you want a cumulative frequency, percentage, or cumulative percentage chart, use the "type=" option in the vbar command.

For a vertical percentage chart:

   Proc Gchart data=dataset02;
      vbar Opinion / type=pct;
   run;

For a vertical cumulative frequency chart:

   Proc Gchart data=dataset02;
      vbar Opinion / type=cfreq;
   run;

For a vertical cumulative percentage chart:

   Proc Gchart data=dataset02;
      vbar Opinion / type=cpct;
   run;

By default, for character/text variables, Proc Gchart will display the categories in alphabetical order. If you want to control the order in which character/text categories are displayed in the chart, use a "midpoints=" option in the vbar command. For example, with the "Opinion" variable, we want the categories ordered sa, a, i, d, sd rather than alphabetically. To achieve this, we could use the "midpoints=" option as shown here:

   Proc Gchart data=dataset02;
      vbar Opinion / midpoints= 'sa' 'a' 'i' 'd' 'sd';
   run;

Notice above that each category is enclosed separately in single quotes, and there is a space between each pair of categories. You can use the "midpoints=" option with any of the four chart types described above.

For ordinal and nominal numerical variables, use the "discrete" option to have Proc Gchart treat the numbers as character/text categories. For example, suppose we had a variable Rating that recorded survey respondents' satisfaction ratings on a scale of 1 to 10. We could use the following commands to create a vertical bar chart:

   Proc Gchart data=dataset02;
      vbar Rating / discrete;
   run;

Instead of a vertical bar chart, you may want to use a horizontal bar chart for two reasons: (1) you can fit more categories on one page on a horizontal bar chart, and (2) the horizontal bar chart command automatically produces frequency/count, cumulative frequency, percentage, and cumulative percentage values for each category. Nice! To produce a horizontal bar chart, replace the "vbar" in the Proc Gchart command with "hbar", as shown below:

   Proc Gchart data=dataset02;
      hbar Opinion / midpoints= 'sa' 'a' 'i' 'd' 'sd';
   run;

As shown above and below, the "midpoints=" and "discrete" options can be used on hbar charts.

   Proc Gchart data=dataset02;
      hbar Rating / discrete;
   run;

You don't need the "type=" option, because an hbar chart automatically gives you all the information from all four chart types.

You can include multiple vbar or hbar commands between the "Proc Gchart" and "run" commands if you want to make more than one chart.

By default, Proc Gchart ignores observations that have a missing value for the chart variable; if you want missing values shown as their own category, add the option "missing" to the vbar or hbar command.

You can combine options in a single vbar or hbar command; for example, you can use the "type=", "midpoints=", and "missing" options together.
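The points above about multiple chart commands and multiple options can be put together in one step. The following is a minimal sketch, assuming dataset02 contains both the character variable Opinion and the numeric variable Rating used in the earlier examples; the particular combination of options is only an illustration.

```sas
/* Sketch only: assumes dataset02 holds both Opinion and Rating. */
proc gchart data=dataset02;
   /* Vertical percentage chart with a custom category order */
   vbar Opinion / type=pct midpoints='sa' 'a' 'i' 'd' 'sd';
   /* Horizontal chart with one bar per distinct rating;        */
   /* "missing" adds a bar for observations with no Rating value */
   hbar Rating / discrete missing;
run;
quit;  /* optional: ends the Gchart procedure explicitly */
```

Each vbar or hbar command produces its own chart, and the options after the slash apply only to the command they appear in.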

Histograms (Proc Gchart)

The purpose of a Histogram is to show how many of the data values fall into various categories of a continuous measurement variable. The histogram is an effective graphical technique for visualizing both the skewness and the kurtosis of a continuous measurement variable. You typically make a separate histogram for each continuous numerical variable in a dataset.

When making histograms, you can use "vbar" to create a vertical histogram or "hbar" to create a horizontal histogram. You can use the "type=" option for vbar histograms to specify which type of histogram you want (count/frequency, cumulative count/frequency, percentage, or cumulative percentage), but you don't need to use the "type=" option for hbar histograms, because hbar histograms automatically give you all the information for all four types. You can include multiple vbar or hbar commands between the "Proc Gchart" and "run" commands if you want to make more than one histogram.

In SAS, you can use Proc Gchart to make histograms, but you need to tell SAS how to divide the continuous data into categories by using a "levels=" option. The "levels=" option divides the range of data values into equally-sized categories. For example, if we want our data divided into ten equally-spaced categories, we add "levels=10" to our vbar or hbar command in Proc Gchart.

Suppose we have a continuous numeric variable named Revenue. If we use Proc Gchart and specify "levels=10", SAS will divide the range of values between the smallest value of Revenue in the data set and the largest value of Revenue in the data set into 10 equally-spaced categories and then determine the number of Revenue data values that fall into each of the categories. We could use the following commands to accomplish this:

   Proc Gchart data=dataset02;
      vbar Revenue / levels=10;
   run;

TIP: When making histograms, use a larger number for levels when you have many observations in your data set, and use a smaller number of levels when you have few observations. You want to adjust the number of levels until you produce a histogram that (1) reveals the "spread" in your data but (2) doesn't result in many categories with zero observations (i.e., histogram bars with zero height).
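One way to follow the tip above is to draw the same histogram at several settings and compare the results. This is a minimal sketch, assuming dataset02 contains Revenue as in the example above; the values 5, 10, and 20 are arbitrary choices for illustration, not recommendations.

```sas
/* Sketch only: the level counts below are arbitrary trial values. */
proc gchart data=dataset02;
   vbar Revenue / levels=5;    /* coarse: few, wide categories   */
   vbar Revenue / levels=10;   /* the setting used in the text   */
   vbar Revenue / levels=20;   /* fine: many, narrow categories  */
run;
quit;
```

If the finest chart shows many bars with zero height, step back toward a smaller number of levels.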


As another alternative, you can use the "midpoints=" option as shown below to specify the first midpoint, the last midpoint, and the spacing between successive midpoints:

   Proc Gchart data=dataset02;
      vbar Revenue / midpoints = 10000 to 100000 by 5000;
   run;

With the "midpoints=" option, the following is true by default:

•The lowest midpoint consolidates all data points from negative infinity to the halfway point between the first two midpoints.
•The highest midpoint consolidates all data points from the halfway point between the last two midpoints up to infinity.
•Each of the other values in the list is the middle of a range of values, and Proc Gchart calculates the boundaries of that range.

You can use the "midpoints=" option with either vbar or hbar histograms. Use either "levels=" or "midpoints=", not both, for histograms.
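To make the default midpoint rules concrete, here is a minimal sketch that applies the same midpoint list to a horizontal histogram, with the implied ranges worked out in the comments. It assumes dataset02 contains Revenue as above. The ranges follow from the halfway-point rules; which bar receives a value that falls exactly on a boundary is not covered in this handout.

```sas
/* Sketch only: midpoints are 10000, 15000, ..., 100000 (19 bars). */
proc gchart data=dataset02;
   /* Bar at 10000:  all Revenue values below about 12500          */
   /* Bar at 15000:  Revenue values from about 12500 to 17500      */
   /* ...                                                          */
   /* Bar at 100000: all Revenue values above about 97500          */
   hbar Revenue / midpoints = 10000 to 100000 by 5000;
run;
quit;
```

Because the first and last bars absorb everything beyond the ends of the list, unusually tall end bars can be a sign that the midpoint range is too narrow for the data.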
