Data Mining- Syllabus - Spring 2015 PDF

Title Data Mining- Syllabus - Spring 2015
Author A1 Ernesto C.R. Big Data Analytics Machine Learning
Course Data Mining
Institution George Washington University
Pages 5
File Size 203.3 KB
File Type PDF


Decision Sciences Department

COURSE NUMBER: DNSC 6279

COURSE TITLE: Data Mining

COURSE DESCRIPTION: This course provides an in-depth exposure to various supervised and unsupervised data mining techniques that can be used to both discover relationships in large data sets, and build predictive models. Techniques covered include regression models, decision trees, neural networks, clustering, and association analysis.

COURSE PRE-REQS: Stochastics for Analytics I, Statistics for Analytics, or equivalent (JUD/DAD), MSBA Program Candidacy or instructor approval.

PROFESSORS: Dr. Srinivas Prasad
Office: Funger Hall, 415 D
Phone: 202-994 2078
E-mail: [email protected]
Office Hours: Th: 3:00-5:00pm and by appointment

TEACHING ASSISTANT: Ran Ji
Office: Funger Hall 415H
Email: [email protected]
Office Hours: TBA

RECOMMENDED TEXTBOOKS:
Data Mining Techniques, by Berry and Linoff
Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, by Galit Shmueli, Nitin R. Patel, Peter C. Bruce
An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

COURSE OBJECTIVES: How can organizations make better use of the increasing amounts of data they seem to be collecting? How can they convert data into information that is useful for managerial decision making? We will attempt to answer these questions by examining several data mining and data analysis methods and tools for exploring and analyzing data sets.

LEARNING OBJECTIVES:
Develop a solid foundation in Supervised and Unsupervised learning techniques
Learn how to build and assess different types of predictive models
Learn how to build models using real data using standard data mining software tools

READING ASSIGNMENTS: The student is responsible for studying and understanding all assigned materials. If reading generates questions that are not discussed in class, the student has the responsibility of addressing the instructor privately or raising the issue in a discussion section on Blackboard. Additional reading, including technical papers and on-line material, may be assigned during the course.

SOFTWARE: The course will primarily involve using SAS Enterprise Miner and R. Occasionally, we may use some other packages too (details will be made available on Blackboard).

TENTATIVE SCHEDULE:

Weeks    Date
1-2      Jan 14/21
2-3      Jan 21/28
3-4      Jan 28/Feb 4
4-5      Feb 4/11
6        Feb 18
7-8      Feb 25/Mar 4
9        Mar 11
         Mar 18
10       Mar 25
11       Apr 1
12       Apr 8
13       Apr 15
14       Apr 22
15       Finals Week (May 6)

Topics (Readings will be posted on Blackboard):
Introduction to Data Mining / Supervised and Unsupervised Learning
Data Pre-Processing / Intro to SAS Enterprise Miner
Project Team Formation / Initial Project Proposal
Building Predictive Models Regression / Stepwise / Logistic Regression
Enterprise Miner Reference: Regression Node, Predictive Modeling, Model Assessment/Lift Charts, ROC Curves
Classification/Decision Trees/Modeling Interactions Using Trees
Enterprise Miner Reference: Tree Node.
Association Analysis
Enterprise Miner Reference: Association Node
Project Proposal Presentations
Spring Break – Holiday
Neural Networks
Enterprise Miner Reference: Neural Network Node.
Neural Networks/Support Vector Machines
Clustering / Memory Based Reasoning
Enterprise Miner Reference: Clustering Node, Memory Based Reasoning Node
Text Analytics, Link Analysis
Enterprise Miner Reference: Link Analysis Node
Bayesian Belief Networks, Genetic Algorithms
Project Presentations
Final Exam

Students are expected to come to class: 1) having read and prepared to discuss the material for the current lecture; 2) having reviewed the material of the previous lectures. Participation in class discussions is expected from all students.

GRADING: The course grade will be based on team homework assignments, quizzes, a final exam, and a team project. Each grading component is described in detail below.

Quizzes
There will be several in-class quizzes, typically every week. They will be based on current and prior assigned readings and material covered in the class sessions. The lowest quiz score will be dropped.

Homework Assignments
Homework assignments will typically require the use of software, and can be done in teams of 3 to 4 students. A typical homework assignment will consist of a few problems with several parts and will be given one week before it is due. Solutions will be posted on the course web site. No late homework assignments will be accepted.

Submission guidelines for homework assignments: In preparing the submissions, please follow these guidelines:
Make sure the solutions are typed or easily readable by anyone;
Ensure a clear logical flow and mark your answers;
Print/type your name(s) on the top right hand corner of every page.

Final Exam
The final exam is individual and will be scheduled during finals week. No make-up final exam will be given.

Project
The project is designed to serve as an exercise in applying one or more of the data mining techniques covered in the course to analyze real life data sets. A primary objective is to understand the complexities that arise in mining massive, real life datasets that are often inconsistent, incomplete, and unclean. Students can use a variety of software tools to perform the analysis, but the primary toolkit that will be used is SAS Enterprise Miner. This is a semester long project, and students have the option to work in 3 or 4 person teams. The deliverables include a formal project proposal (due mid-semester), and a final report (due at the end of the semester at the time of your final project presentation - Session 14). Examples of data mining projects and datasets can be found at http://www.kdnuggets.com/competitions/index.html, http://www.dataminingcasestudies.com/ and http://kdnuggets.com/datasets/.

Grading Weights
Quizzes: 25%
Homework assignments: 25%
Final exam: 25%
Project: 25%

The final exam and all quizzes are individual. No make-up exam / quiz / homework assignment will be given.

ACADEMIC INTEGRITY: Cheating and plagiarism will not be tolerated. Any case will automatically result in loss of all the points for the assignment, and may be a reason for a failing grade and/or grounds for dismissal. In case of a group assignment, all group members will receive a zero grade. Any suspected case of cheating or plagiarism or behavior in violation of the rules of this course will be reported to the Office of Academic Integrity. Students are expected to know and understand all college policies, especially the code of academic integrity available at: http://www.gwu.edu/~ntegrity/code.html

DISABILITY SERVICES: Please contact the Disability Support Services office to establish eligibility and to coordinate reasonable accommodation. For additional information, refer to http://gwired.gwu.edu/dss/

ATTENDANCE: The George Washington University Bulletin, Graduate Programs, 2009–2010: "Regular attendance is expected. Students may be dropped from any class for undue absence…. Students are held responsible for all of the work of the courses in which they are registered, and all absences must be excused by the instructor before provision is made to make up the work missed."

CHANGES: The instructors reserve the right to make revisions to any item on this syllabus, including, but not limited to, any class policy, course outline and schedule, grading policy, tests, etc. Note that the requirements for deliverables may be clarified and expanded in class, via email, or on Blackboard. Students are expected to complete the deliverables incorporating such additions and to check email and Blackboard announcements frequently....

