Chapter 3 Solutions.docx Week 2 Assignment answers PDF

Course Data Mining
Institution Pepperdine University


Chapter 3: Problem Solutions

3.1 Shipments of Household Appliances: Line Graphs. The file ApplianceShipments.jmp contains the series of quarterly shipments (in millions of dollars) of US household appliances between 1985 and 1989.

a. Create a well-formatted time plot of the data using the JMP Graph Builder.

b. Does there appear to be a quarterly pattern? For a closer view of the patterns, try zooming in on different ranges of data on the two axes.

Yes, there appears to be a quarterly pattern in the above time series. The time series plot shows a pattern that repeats each year: shipments increase in Q2 and Q3 and decrease in Q1 and Q4.

c. Create one chart with four separate lines, one for each of Q1, Q2, Q3, and Q4. This can be achieved by creating a transformed variable (right-click on Quarter in the Graph Builder, and select Date Time, Quarter). Then drag this new column to the Overlay, Wrap or Group Y zone. Does there appear to be a difference between quarters?

Yes, we can see differences between the quarters. From the above plot we can see that the shipments for quarters Q2 and Q3 are larger than in quarters Q1 and Q4. In the graph below, a transformed variable for Year was also created. This is used in the X zone.

d. Create a line graph of the series at a yearly aggregated level (i.e., the total shipments in each year). Again, you’ll need to create a transformed variable in Graph Builder.

In the graph below, the variable Year, created in the previous step, is used in the X zone. The summary statistic (on the bottom left) was changed to Sum.
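The two groupings used in parts (c) and (d) can also be sketched outside JMP. The following is a minimal Python illustration, not the JMP procedure itself; the rows are made-up values chosen only to show the arithmetic and are not the contents of ApplianceShipments.jmp, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (year, quarter, shipments) rows for illustration only;
# these are NOT the values stored in ApplianceShipments.jmp.
rows = [
    (1985, 1, 10.0), (1985, 2, 12.0), (1985, 3, 13.0), (1985, 4, 9.0),
    (1986, 1, 11.0), (1986, 2, 14.0), (1986, 3, 15.0), (1986, 4, 10.0),
]


def yearly_totals(rows):
    """Sum the quarterly shipments within each year (part d)."""
    totals = defaultdict(float)
    for year, _quarter, shipments in rows:
        totals[year] += shipments
    return dict(totals)


def series_by_quarter(rows):
    """Split the series into one (year, shipments) line per quarter (part c)."""
    series = defaultdict(list)
    for year, quarter, shipments in sorted(rows):
        series[quarter].append((year, shipments))
    return dict(series)


print(yearly_totals(rows))
print(series_by_quarter(rows)[2])
```

Each entry of `series_by_quarter` corresponds to one of the four overlaid lines, and `yearly_totals` corresponds to setting the summary statistic to Sum with Year in the X zone.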

3.2 Sales of Riding Mowers: Scatterplots. A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of income (in $1000s) and lot size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, given in the file RidingMowers.jmp.

a. Create a scatterplot of lot size versus income, color-coded by the outcome variable Ownership. Make sure to obtain a well-formatted plot. The result should be similar to Figure 9.2. Describe the potential relationship(s) of ownership to lot size and income.

It appears that owners have higher incomes and larger lot sizes.

b. Explore different methods for saving your work in JMP. Search for “saving work” in the JMP documentation or see the Using JMP section of the JMP Learning Library at jmp.com/learn. Mention two methods.

Save the script to the data table. Use the Selection Tool to copy and paste into another program. Use the Selection Tool to select the output, then use Save Selection As (or Export) to save as a JPG, PowerPoint, or another format.

3.3 Laptop Sales at a London Computer Chain: Bar Charts and Boxplots. The file LaptopSalesJanuary2008.jmp contains data for all sales of laptops at a computer chain in London in January 2008. This is a subset of the full dataset that includes data for the entire year.

a. Create a bar chart, showing the average retail price by store (Store Postcode). Adjust the y-axis scaling to magnify differences. Which store has the highest average? Which has the lowest? Use the Local Data Filter (under the red triangle > Scripts) to isolate these two stores.

From the above bar chart we can see that store postcode “N17 6QA” has the highest average retail price (495) and store postcode “W4 3PH” has the lowest (481).

Hint: Double-click on the Y-axis to change the minimum and maximum axis values, and add the reference lines.

b. To better compare retail prices across stores, create side-by-side boxplots of retail price by store. Now compare the prices in the two stores above. Does there seem to be a difference between their price distributions?

From the above side-by-side box plots we can see that there is only a slight difference between the two stores. The interquartile ranges of the two plots are approximately equal.

Hint: The data filter or local data filter can be used to display only the two stores in the graph.

3.4 Laptop Sales at a London Computer Chain: Interactive Visualization. The file LaptopSales.txt is a comma-separated file with nearly 300,000 rows. ENBIS (the European Network for Business and Industrial Statistics) provided these data as part of a contest organized in the fall of 2009.

Scenario: Imagine that you are a new analyst for a company called Acell (a company selling laptops). You have been provided with data about products and sales. You need to help the company with their business goal of planning a product strategy and pricing policies that will maximize Acell’s projected revenues in 2009. Import the data into JMP (for details on importing data into JMP, search for “import text files” in the JMP documentation or at jmp.com/learn). Check to ensure that the data and modeling types in the data table are correct for each of the variables, and answer the following questions.

NOTE: There are many ways of visualizing these data. We show below a few examples, which are by no means the only correct solution. In general, it is important to properly format and annotate the visualizations.

a. Price Questions:
i. At what price are the laptops actually selling?
ii. Does price change with time? (Hint: Make sure that the date column is recognized as such. JMP will then allow dynamic transformations and allow you to plot the data by weekly or monthly aggregates, or even by day of week.)
iii. Are prices consistent over retail outlets?
iv. How does price change with configuration?

i. The median price is $500 and the majority of laptops sell for prices between approximately $265 and $810. The histogram of price is roughly symmetric and single-peaked.

ii. A line graph shows fairly large changes over time. We show it per month, with individual lines per day of the week. When date is formatted as a date variable in JMP, the dynamic transformation can be used to transform by day, week, month, year, and more.

iii. We can see that overall the median price is similar across stores (around $500, plus or minus $50). However, there are clearly two types of stores: those with larger price ranges and slightly higher average prices, and those with smaller price ranges and lower average prices.

iv. This graph shows Price versus Configuration, colored by RAM and with different markers by battery life. Depending on your audience, this may be too much information. The data filter can be used to explore the impact of different configurations on price.

Box plots can be used to compare different configuration options. The column switcher can be used to explore the impact of the options on price, one at a time.
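The price summaries in (i) and (ii) can be approximated in code as well. The sketch below is an illustration under stated assumptions, not the JMP workflow: the rows are hypothetical (not taken from LaptopSales.txt), the function names are made up, and the 5th to 95th percentile range is an assumed stand-in for "the majority of laptops", since the answer does not say which range it uses.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median, quantiles

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical (sale date, retail price) rows for illustration only.
sales = [
    (date(2008, 1, 7), 450.0),
    (date(2008, 1, 14), 550.0),
    (date(2008, 1, 9), 500.0),
    (date(2008, 2, 4), 480.0),
    (date(2008, 2, 6), 520.0),
]


def price_summary(prices):
    """Median plus an assumed 5th-95th percentile range of the prices."""
    cuts = quantiles(prices, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {"median": median(prices), "p05": cuts[0], "p95": cuts[-1]}


def mean_price_by_month_and_weekday(sales):
    """Average price per (year, month, weekday), one line per weekday."""
    groups = defaultdict(list)
    for day, price in sales:
        groups[(day.year, day.month, WEEKDAYS[day.weekday()])].append(price)
    return {key: mean(values) for key, values in groups.items()}


print(price_summary([price for _day, price in sales]))
print(mean_price_by_month_and_weekday(sales))
```

Grouping on the parsed date is the code analogue of the hint in (ii): the aggregation only works once the date column is recognized as a date rather than as text.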

b. Location Questions:
i. Where are the stores and customers located?
ii. Which stores are selling the most?
iii. How far would customers travel to buy a laptop?
◦ Hint 1: You should be able to aggregate the data, such as by a plot of the sum or average of the prices.
◦ Hint 2: Use the coordinated highlighting between multiple visualizations in the same page, for example, select a store in one view to see the matching customers in another visualization.
◦ Hint 3: Explore the use of filters to see differences. Be sure to filter in the zoomed-out view. For example, try to use “store location” as an alternative way to dynamically compare store locations. This might be more useful to spot outlier patterns if there were 50 store locations to compare.
iv. Try an alternative way of looking at how far customers traveled. Do this by creating a new data column that computes the distance between customer and store. (For information on creating formulas in JMP, search for “Creating Formulas” in the JMP documentation or see jmp.com/learn.)

i. We don’t have latitude and longitude – if we did, we could use a map to show locations. Here, the two scatterplots correspond to the customers’ and stores’ locations – the X and Y coordinates correspond to geographic locations. Most of the customers are located in the center (downtown) and in the southeast. Most of the stores are located downtown, with a few stores scattered in the suburbs. We used colors in the stores scatterplot to denote average price. We also created a new column of sales by store, and sized the points in the stores scatterplot by these values. We can immediately see that the downtown stores sell the most expensive laptops, on average. Also, using color on the customers’ scatterplot shows us that many customers shop at a particular store.

ii. In general, stores downtown seem to sell more than stores in the suburbs. One interesting outlier in the scatter plot is the store that brings in a fair amount of money but sells a smaller number of configurations. The bar chart reveals one store that has very few sales. This might be a new store; plotting a timeline could help to confirm this hypothesis. If you select the stores with the highest revenue (these were selected below), they are all downtown and customers come mostly from the north of the town (see below).

iii. To study the distance travelled by customers, you can estimate it visually from the visualizations, or create a new data column with the distance between the store and the customer calculated using a formula. The formula would be sqrt((([OS X Customer] - [OS X Store]) ^ 2) + (([OS Y Customer] - [OS Y Store]) ^ 2)). Stores in the suburbs require customers to travel longer distances. The store corresponding to the large circle in the lower left area of the scatter plot requires the longest average travel.
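The distance column is a straight-line (Euclidean) distance between the customer and store coordinates. As a minimal sketch of the same calculation in Python, assuming the four OS X / OS Y coordinates are available as numbers (the rows and function names below are hypothetical, not from LaptopSales.txt):

```python
import math
from collections import defaultdict
from statistics import mean


def travel_distance(customer_x, customer_y, store_x, store_y):
    """Euclidean distance between a customer and a store."""
    return math.hypot(customer_x - store_x, customer_y - store_y)


# Hypothetical rows: (store, customer x, customer y, store x, store y).
rows = [
    ("Store A", 3.0, 4.0, 0.0, 0.0),
    ("Store A", 6.0, 8.0, 0.0, 0.0),
    ("Store B", 1.0, 1.0, 1.0, 3.0),
]


def mean_distance_by_store(rows):
    """Average customer travel distance for each store."""
    distances = defaultdict(list)
    for store, cx, cy, sx, sy in rows:
        distances[store].append(travel_distance(cx, cy, sx, sy))
    return {store: mean(values) for store, values in distances.items()}


print(mean_distance_by_store(rows))
```

Averaging the distance per store is one way to check the visual impression that suburban stores draw customers from farther away.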

c. Revenue Questions:
i. How does the sales volume in each store relate to Acell’s revenues?
ii. How does this depend on the configuration?

i. There are big differences between the stores. The number of sales and total revenue show similar patterns. We reinforced the slight differences in the top chart by color coding by the average retail price, showing which stores tend to sell cheaper laptops. One store with high revenue sells lower-priced laptops.

ii. There does not appear to be much difference in the distribution of configuration options between the stores. The mosaic plot, along with the column switcher allows us to explore options for the different stores.
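The comparison of sales volume and revenue in (i) comes down to a per-store aggregation. A minimal sketch, assuming each row carries a store identifier and a retail price (the rows, store names, and function name are hypothetical):

```python
from collections import defaultdict

# Hypothetical (store, retail price) rows for illustration only.
sales = [
    ("Store A", 500.0),
    ("Store A", 450.0),
    ("Store B", 600.0),
]


def store_summary(sales):
    """Number of sales, total revenue, and average price for each store."""
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for store, price in sales:
        counts[store] += 1
        revenue[store] += price
    return {
        store: {
            "sales": counts[store],
            "revenue": revenue[store],
            "avg_price": revenue[store] / counts[store],
        }
        for store in counts
    }


print(store_summary(sales))
```

A store with a high sales count and high revenue but a low average price is the kind of case the color coding by average retail price is meant to surface.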

d. Configuration Questions:
i. What are the details of each configuration? How does this relate to price?
ii. Do all stores sell all configurations?

i. One of the screen shots shown in (a) already partially answered the revenue question (we reproduce it below). The box plots show the effect of various configuration options on the retail price.

ii. The bar chart above shows that all stores except one sell most configurations. The bottom scatterplot display shows the same information in a different way – each point is a configuration sold at the store.....
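Whether every store sells every configuration can also be checked directly by collecting the set of configurations seen at each store. A minimal sketch, assuming each sale row has a store and a configuration identifier (the rows and function names are hypothetical):

```python
from collections import defaultdict

# Hypothetical (store, configuration id) rows for illustration only.
sales = [
    ("Store A", 1), ("Store A", 2), ("Store A", 3),
    ("Store B", 1), ("Store B", 3),
    ("Store C", 2),
]


def configurations_by_store(sales):
    """Set of distinct configurations sold at each store."""
    sold = defaultdict(set)
    for store, config in sales:
        sold[store].add(config)
    return dict(sold)


def missing_configurations(sales):
    """Configurations sold somewhere in the chain but not at a given store."""
    sold = configurations_by_store(sales)
    all_configs = set().union(*sold.values())
    return {store: all_configs - configs for store, configs in sold.items()}


print(missing_configurations(sales))
```

A store with a large set of missing configurations would correspond to the one exception noted in the bar chart.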

