CSE4334 HW1 - First homework PDF

Title CSE4334 HW1 - First homework
Author Sagar Poudel
Course DATA MINING
Institution The University of Texas at Arlington
Pages 3
File Size 213.8 KB
File Type PDF

Description

CSE 4334-002: Data Mining
Homework 1 (100 points)
SUBMISSION DEADLINE: 03/20/2020, 11:59 PM
REQUIREMENTS: Homework must be submitted electronically through Canvas, in PDF format only. Only typed answers will be accepted, because handwritten submissions are hard to check for accuracy. If your typed file is in Microsoft Word (docx) format, please convert it to PDF before submitting.

1. Vector Space Model

Consider two sets of Facebook users: U1={A,B,C,D} and U2={E,F,G,H}. Each member of U1 is a friend of all members of U2. Table 1 shows the number of comments posted by each member of U2 to each member of U1. For instance, E has posted 75 comments in total to A's posts.

Table 2 lists the total number of friends of each member of U2. Assume each member of U2 has posted at least 1 comment on the posts of each of their friends. Assume the total number of users on Facebook is 10000.

a. (20 points) Assume i) the number of comments made by a user X on another user Y's posts denotes the significance of X to Y, and ii) if a user X comments on fewer friends' posts than another user Z, then X's comments are considered more significant than Z's. Based on these two assumptions, compute the significance of each member of U2 to each member of U1.

In solving the problem, use TF-IDF weighting and base-10 logarithms. Consider the users of U1 and U2 as documents and terms, respectively. Each user in U1 is modeled as a vector using the members of U2 as the features. In other words, a user is represented by the friends who comment on their posts.

b. (15 points) Find out which user in U1 is most similar to A. Use cosine similarity as the similarity measure.
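The TF-IDF weighting and cosine similarity described above can be sketched as follows. This is only one plausible reading of the setup: raw comment counts are used as term frequencies, and the IDF of a member of U2 is taken as log10 of the total number of users divided by that member's number of friends (since each member comments on every friend's posts). Tables 1 and 2 are not reproduced here, so every number below except E's 75 comments to A and the total of 10000 users is a hypothetical placeholder, and the names `comments`, `friends`, `idf`, `tfidf_vector`, `cosine`, and `most_similar_to` are illustrative.

```python
import math

N_USERS = 10_000  # total number of Facebook users, from the problem statement
TERMS = ["E", "F", "G", "H"]  # members of U2 act as terms (features)

# HYPOTHETICAL stand-in for Table 1: comments[doc][term] = comments by term on doc's posts.
# Only E -> A = 75 comes from the text; all other counts are made up for illustration.
comments = {
    "A": {"E": 75, "F": 10, "G": 2, "H": 5},
    "B": {"E": 150, "F": 20, "G": 4, "H": 10},
    "C": {"E": 5, "F": 60, "G": 1, "H": 40},
    "D": {"E": 20, "F": 5, "G": 30, "H": 8},
}

# HYPOTHETICAL stand-in for Table 2: total number of friends of each member of U2.
friends = {"E": 100, "F": 1000, "G": 10, "H": 1000}


def idf(term):
    """Base-10 IDF, assuming document frequency equals the term's friend count."""
    return math.log10(N_USERS / friends[term])


def tfidf_vector(user):
    """TF-IDF vector of a U1 user, assuming raw comment counts as term frequency."""
    return [comments[user][t] * idf(t) for t in TERMS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_to(target):
    """Return the other U1 user whose TF-IDF vector has the highest cosine with target's."""
    target_vec = tfidf_vector(target)
    others = [u for u in comments if u != target]
    return max(others, key=lambda u: cosine(target_vec, tfidf_vector(u)))


if __name__ == "__main__":
    for user in comments:
        print(user, [round(w, 3) for w in tfidf_vector(user)])
    print("Most similar to A (placeholder data):", most_similar_to("A"))
```

With the real tables substituted for the placeholder dictionaries, the same functions give the weights asked for in part a and the comparison asked for in part b, provided the assumed TF and IDF definitions match the ones used in class.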

2. Decision Tree

Table 3 depicts a dataset describing the weather conditions under which a person plays tennis. With PlayTennis as the class attribute, Figure 1 depicts a possible decision tree built using Table 3 as the training set.

a. (15 points) Show that the choice of Humidity at the root level is correct. Use the GINI index as the measure of purity/impurity.

b. (15 points) Show that the choice of Wind at the second level is correct. Use entropy as the measure of purity/impurity.

c. (5 points) List a tuple that the decision tree in Figure 1 classifies correctly.

d. (5 points) List a tuple for which the decision tree in Figure 1 makes a wrong prediction.
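The impurity comparisons in parts a and b can be sketched as below: an attribute is preferred when splitting on it gives the lowest weighted impurity of the resulting partitions. Table 3 is not reproduced here, so the four rows are a hypothetical toy dataset, the attribute values ("High", "Normal", "Weak", "Strong") are assumed, and the helper names `gini`, `entropy`, `weighted_impurity`, and `best_attribute` are illustrative.

```python
import math
from collections import Counter


def gini(labels):
    """GINI index of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(rows, attribute, target, impurity):
    """Impurity after splitting rows on attribute, weighted by partition size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * impurity(g) for g in groups.values())


def best_attribute(rows, attributes, target, impurity):
    """Attribute whose split yields the lowest weighted impurity."""
    return min(attributes, key=lambda a: weighted_impurity(rows, a, target, impurity))


# HYPOTHETICAL toy rows, not the contents of Table 3.
rows = [
    {"Humidity": "High", "Wind": "Weak", "PlayTennis": "No"},
    {"Humidity": "High", "Wind": "Strong", "PlayTennis": "No"},
    {"Humidity": "Normal", "Wind": "Weak", "PlayTennis": "Yes"},
    {"Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
]

if __name__ == "__main__":
    for attr in ("Humidity", "Wind"):
        print(attr,
              "gini:", weighted_impurity(rows, attr, "PlayTennis", gini),
              "entropy:", weighted_impurity(rows, attr, "PlayTennis", entropy))
```

For part b, the same comparison would be run only on the subset of rows that reach the second-level node, with `entropy` passed as the impurity function.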

3. Naïve Bayes Classifier

a. (15 points) Build a Naïve Bayes Classifier for Table 3. Explain the steps in detail and show the calculations of all probabilities that are needed in building this classifier. Use Laplace smoothing.

b. (10 points) Predict the likelihood of playing tennis for the weather condition in Table 4, using the above Naïve Bayes classifier. Show detailed calculations...
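A minimal sketch of a categorical Naïve Bayes classifier with Laplace (add-one) smoothing is given below. It assumes that smoothing is applied only to the conditional probabilities, using (count + 1) / (class count + number of distinct values of the attribute), and that class priors are left unsmoothed; the course may use a different convention. Tables 3 and 4 are not reproduced here, so the rows and the query are hypothetical, and the names `train_naive_bayes` and `predict_proba` are illustrative.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, attributes, target, alpha=1.0):
    """Estimate priors and Laplace-smoothed conditional probabilities."""
    class_counts = Counter(row[target] for row in rows)
    # Distinct values observed for each attribute (used in the smoothing denominator).
    values = {a: sorted({row[a] for row in rows}) for a in attributes}
    # value_counts[(attribute, class)][value] = number of matching training rows.
    value_counts = defaultdict(Counter)
    for row in rows:
        for a in attributes:
            value_counts[(a, row[target])][row[a]] += 1

    n = len(rows)
    priors = {c: count / n for c, count in class_counts.items()}  # unsmoothed (assumption)
    likelihoods = {}
    for a in attributes:
        k = len(values[a])
        for c, count in class_counts.items():
            for v in values[a]:
                likelihoods[(a, v, c)] = (value_counts[(a, c)][v] + alpha) / (count + alpha * k)
    return priors, likelihoods


def predict_proba(priors, likelihoods, attributes, query):
    """Normalized posterior over classes for a query with values seen in training."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for a in attributes:
            score *= likelihoods[(a, query[a], c)]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# HYPOTHETICAL toy rows and query, not the contents of Table 3 or Table 4.
rows = [
    {"Humidity": "High", "Wind": "Weak", "PlayTennis": "No"},
    {"Humidity": "High", "Wind": "Strong", "PlayTennis": "No"},
    {"Humidity": "Normal", "Wind": "Weak", "PlayTennis": "Yes"},
    {"Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Humidity": "High", "Wind": "Weak", "PlayTennis": "Yes"},
]
attributes = ["Humidity", "Wind"]
priors, likelihoods = train_naive_bayes(rows, attributes, "PlayTennis")
proba = predict_proba(priors, likelihoods, attributes, {"Humidity": "Normal", "Wind": "Strong"})

if __name__ == "__main__":
    print(priors)
    print(proba)
```

The smoothing keeps a value that never occurs with a class (here, Normal humidity with "No") from forcing the whole product to zero.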

