Data Mining Homework 4 PDF

Title	Data Mining Homework 4
Course	Data Mining
Institution	Fordham University
Pages	2
File Size	126.3 KB
File Type	PDF

Description

Data Mining Homework 4 (47 Points)

1. (8 points) Chapter 5, exercise 13a on page 320.

2. (10 points) Chapter 6, exercise 2a-d on page 404.

3. (19 points) A database has 4 transactions, shown below.

TID   Date      items_bought
T100  10/15/04  {K, A, D, B}
T200  10/15/04  {D, A, C, E, B}
T300  10/19/04  {C, A, B, E}
T400  10/22/04  {B, A, D}

Assuming a minimum level of support min_sup = 60% and a minimum level of confidence min_conf = 80%:

(a) Find all frequent itemsets (not just the ones with the maximum width/length) using the Apriori algorithm. Show your work; just showing the final answer is not acceptable. For each iteration, show the candidate itemsets and the accepted frequent itemsets. You should show your work similar to the way the example was done in the PowerPoint slides. (15 points)

(b) List all of the strong association rules, along with their support and confidence values, that match the following metarule, where X is a variable representing customers and item_i denotes variables representing items (e.g., “A”, “B”, etc.).

∀X ∈ transaction, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3)

Hint: don’t worry about the fact that the statement above uses relations. The point of the metarule is to tell you to only worry about association rules of the form X ∧ Y ⇒ Z (or {X, Y} ⇒ Z if you prefer that notation). That is, you don’t need to worry about rules of the form X ⇒ Z. (4 points)
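For checking hand work on question 3, the level-wise Apriori procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the format the assignment asks for: the function names (`support`, `apriori`, `strong_rules`) are hypothetical, support is measured as the fraction of the four transactions that contain an itemset, and only rules with a two-item antecedent and a one-item consequent are generated, to match the metarule in part (b).

```python
from itertools import combinations

# Transactions copied from the table in question 3.
transactions = {
    "T100": {"K", "A", "D", "B"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUP = 0.60   # minimum support from the question
MIN_CONF = 0.80  # minimum confidence from the question


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def apriori():
    """Return {frozenset: support} for every frequent itemset."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        previous = set(level)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= MIN_SUP]
    return frequent


def strong_rules(frequent):
    """Rules of the form {X, Y} => Z that meet MIN_CONF."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) != 3:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            conf = sup / frequent[antecedent]
            if conf >= MIN_CONF:
                rules.append((tuple(sorted(antecedent)), consequent, sup, conf))
    return rules


if __name__ == "__main__":
    found = apriori()
    for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), f"support={sup:.2f}")
    for antecedent, consequent, sup, conf in strong_rules(found):
        print(f"{antecedent} => {consequent}  support={sup:.2f} confidence={conf:.2f}")
```

The script prints only the final frequent itemsets and rules. Printing `candidates` and `level` inside the loop would show the per-iteration candidate and frequent sets that part (a) asks for.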

4. (10 points) Here are several short questions on clustering.

a. List one significant commonality between clustering algorithms and instance-based learning algorithms like nearest-neighbor.

b. A decision tree can be used to generate a partitional clustering. How?

c. Will outliers have a big impact on the K-means algorithm? Why or why not?

d. Will outliers have a big impact on the DBSCAN algorithm?

e. We talked about Manhattan distance earlier in the course. If it is used in a clustering algorithm, what shape will the clusters take on?...
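As a reminder of the definition behind part (e), Manhattan (L1) distance sums the absolute coordinate differences between two points. The short Python sketch below uses hypothetical helper names (`manhattan`, `euclidean`, `l1_ring`) and an arbitrary integer grid, so the set of points at a fixed Manhattan distance from a center can be listed and compared with the Euclidean case.

```python
def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """L2 distance, included only for comparison."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def l1_ring(center, radius, span=3):
    """Integer grid points whose Manhattan distance to `center` equals `radius`.

    `span` is an assumed search window around the center, chosen for illustration.
    """
    cx, cy = center
    return sorted(
        (x, y)
        for x in range(cx - span, cx + span + 1)
        for y in range(cy - span, cy + span + 1)
        if manhattan((x, y), center) == radius
    )


if __name__ == "__main__":
    print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))
    for point in l1_ring((0, 0), 2):
        print(point)
```

Plotting the points returned by `l1_ring` for a few radii is one way to inspect the equal-distance contour that a centroid-based method would see under this metric.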