
Faculty of Science

DATA2002: Data Analytics: Learning from Data
Semester 2, 2019 | 6 credit points | Mode of delivery: Normal (lecture/lab/tutorial) day | Unit type: Standard
Coordinator: Garth Tarr
Faculty of Science, School of Mathematics and Statistics

Unit description

Technological advances in science, business and engineering have given rise to a proliferation of data from all aspects of our lives. Understanding the information presented in these data is critical, as it enables informed decision making in many areas, including market intelligence and science. DATA2002 is an intermediate unit in statistics and data science, focusing on learning data analytic skills for a wide range of problems and data. How should the Australian government measure and report employment and unemployment? Can we tell the difference between decaffeinated and regular coffee? In this unit, you will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, and you will reinforce your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skills to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.

Prohibitions: STAT2012 or STAT2912 or DATA2902

Pre-requisites: [DATA1001 or DATA1901 or ENVX1001 or ENVX1002] or [MATH10X5 and MATH1115] or [MATH10X5 and STAT2011] or STAT2911 or [MATH1905 and MATH1XXX (except MATH1XX5)] or [BUSS1020 or ECMT1010 or STAT1021]

Assumed knowledge: Basic linear algebra and some coding, for example MATH1014 or MATH1002 or MATH1902, and DATA1001 or DATA1901

Unit aims

Data science is the art and science of extracting meaningful and actionable information from data. It is an emerging interdisciplinary field whose core skill sets are derived from statistics and computer science, and whose aim is to solve real-world problems in areas including (but not limited to) information technology, the biological sciences, medicine, psychology, neuroscience, the social sciences and the humanities, business and finance. This is achieved through a deep understanding and management of data to extract and generate new discipline-specific knowledge. As such, critical thinking with data is of key importance to many fields of study and has a direct impact on the opportunities available in both academia and industry. A data scientist or applied statistician will have the ability to understand problems from outside their own discipline, place the problem into a statistical framework, solve the problem through computational means, interpret the results and communicate them to clients or collaborators. The aim of this unit is to equip students with the skills to tackle a range of data science problems.

Learning outcomes

At the completion of this unit, you should be able to:

LO1. Formulate domain/context-specific questions and identify appropriate statistical analyses.
LO2. Extract and combine data from multiple data resources.
LO3. Construct, interpret and compare numerical and graphical summaries of different data types, including large and/or complex data sets.
LO4. Develop expertise in the use of a software version control system.
LO5. Identify, justify and implement an appropriate parametric or non-parametric two-sample statistical test.
LO6. Formulate, evaluate and interpret appropriate linear models to describe the relationships between multiple factors.
LO7. Perform statistical machine learning using a given classifier and create a cross-validation scheme to calculate the prediction accuracy.
LO8. Create a reproducible report to communicate outcomes using a programming language.

Graduate qualities

The graduate qualities are the qualities and skills that all University of Sydney graduates must demonstrate on successful completion of an award course. This set of qualities has been designed to equip you, as a future Sydney graduate, for the contemporary world. For more information go to sydney.edu.au/students/graduate-qualities

Study commitment

Typically, there is a minimum expectation of 1.5-2 hours of student effort per week per credit point for units of study offered over a full semester.
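Applied to this 6-credit-point unit, the study commitment guideline works out to a minimum of roughly 9 to 12 hours of student effort per week across the semester:

```latex
6 \text{ credit points} \times 1.5 \text{ h} = 9 \text{ h per week}, \qquad
6 \text{ credit points} \times 2 \text{ h} = 12 \text{ h per week}
```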

DATA2002 Data Analytics: Learning from Data, Semester 2 2019, Page 1 of 4


Teaching staff and contact details

Lecturer:

Garth Tarr, [email protected]

Administrative and professional staff:

School of Mathematics and Statistics, [email protected]

Weekly schedule

Weeks 1-3 (lectures and workshops; learning outcomes 1, 2, 3, 4, 8)
Module 1: Categorical data
• Review of data collection.
• Chi-squared test / goodness-of-fit test / Fisher's exact test.
• Odds ratios and relative risk as test statistics, and general two-way tables.
• Conditional probability; tests for homogeneity and independence.

Weeks 4-6 (lectures and workshops; learning outcomes 1, 2, 3, 4, 5, 8)
Module 2: Data from case-control studies
• Review of testing, the CLT, power and sample size calculations.
• Finding significance in small samples: discussion of non-parametric tests.
• Permutation tests and the concept of multiple testing.

Weeks 7-9 (lectures and workshops; learning outcomes 1, 2, 3, 4, 6, 8)
Module 3: Multiple factors comparison
• The concept of partitioning variability.
• One-way ANOVA and simultaneous confidence intervals.
• Two-way ANOVA for balanced designs, with a focus on describing interaction effects graphically.

Weeks 10-12 (lectures and workshops; learning outcomes 1, 2, 3, 4, 7, 8)
Module 4: Learning and prediction
• Multiple regression and prediction intervals.
• Concepts behind statistical machine learning, including both supervised and unsupervised learning.

Week 13 (lectures and workshop; learning outcomes 1, 3, 5, 6, 7)
Revision

Assessments

| Title | Category | Type | Description of assessment type | Individual or group | Length / duration | Weight (%) | Due date & time | Learning outcomes |
|---|---|---|---|---|---|---|---|---|
| Module 1 report | Submitted work | Assignment | Written report | Individual | Up to 5 pages | 5 | 2/9/2019 | 1,2,3,7,8 |
| Module 2 report | Submitted work | Assignment | Written report | Group | Up to 5 pages | 5 | 11:59pm 23/9/2019 | 1,2,3,5,7,8 |
| Module 3 report | Submitted work | Assignment | Written report | Group | Up to 5 pages | 5 | 11:59pm 21/10/2019 | 1,2,3,6,7,8 |
| Group project | Submitted work | Assignment | Presentation and executive summary report | Group | 8 min presentation and 2 page report | 20 | Presentations in week 12; executive summary due 11:59pm 4/11/2019 | 1,2,3,4,5,6,7,8 |
| Online quiz 1 | Exam | Tutorial quiz, small test or online task | Online quiz | Individual | 1 hour | 5 | 11:59pm 25/8/2019 | 1,3,7 |
| Online quiz 2 | Exam | Tutorial quiz, small test or online task | Online quiz | Individual | 1 hour | 5 | 11:59pm 13/10/2019 | 1,3,5,7 |
| Online quiz 3 | Exam | Tutorial quiz, small test or online task | Online quiz | Individual | 1 hour | 5 | 11:59pm 10/11/2019 | 1,3,6,7 |
| Final exam | Exam | Final exam | Final exam | Individual | 2 hours | 50 | Exam period | 1,3,5,6,7 |
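As a quick consistency check on the assessment table, the weights add up to 100%, split evenly between in-semester tasks and the final exam. The minimal Python sketch below uses weights transcribed by hand from the table. The dictionary is only an illustrative data structure and is not part of any unit or Canvas system.

```python
# Assessment weights (%) transcribed by hand from the assessment table.
weights = {
    "Module 1 report": 5,
    "Module 2 report": 5,
    "Module 3 report": 5,
    "Group project": 20,
    "Online quiz 1": 5,
    "Online quiz 2": 5,
    "Online quiz 3": 5,
    "Final exam": 50,
}

total = sum(weights.values())                   # all assessments combined
in_semester = total - weights["Final exam"]     # everything except the final exam

print(f"Total: {total}%")                        # Total: 100%
print(f"In-semester tasks: {in_semester}%")      # In-semester tasks: 50%
print(f"Final exam: {weights['Final exam']}%")   # Final exam: 50%
```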


Overview of assessments

Below are brief assessment details. Further information can be found on the Canvas site for this unit.

• Online quizzes: Online quizzes will be available through Canvas. You will need access to R in order to answer some of the questions.
• Module reports: For each of the first three modules you will need to submit a module report. Time will be allocated in workshops to attempt the module reports. The first module report must be completed individually; the remaining two can be completed in groups of 1, 2 or 3 students. Students need to self-assign into groups.
• Group project: The group project will consist of an 8-minute presentation and a 2-page executive summary, primarily related to content from Module 4. It will be done in groups of 4 or 5 students, randomly assigned from within your tutorial.

Readings

There is no prescribed textbook. These texts may be of use:
- An Introduction to Mathematical Statistics and Its Applications (any edition) by Richard J. Larsen and Morris L. Marx.
- R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund.
- Happy Git and GitHub for the useR by Jenny Bryan and the STAT 545 TAs.
- Modern Data Science with R by Baumer, Kaplan and Horton.

Late penalties

The Assessment Procedures 2011 provide that any written work submitted after 11:59pm on the due date will be penalised by 5% of the maximum awardable mark for each calendar day after the due date. If the assessment is submitted more than ten calendar days late, a mark of zero will be awarded.

Special consideration

A special consideration application can be made for short-term circumstances beyond your control, such as illness, injury or misadventure, which affect your preparation or performance in an assessment. If you are eligible for special consideration, you must submit an online application and supporting documents within three working days of the assessment, unless exceptional circumstances apply.

Assessment grading

The University awards common result grades, set out in the Coursework Policy 2014 (Schedule 1). As a general guide, a High distinction indicates work of an exceptional standard, a Distinction a very high standard, a Credit a good standard, and a Pass an acceptable standard.

| Result name | Mark range | Description |
|---|---|---|
| High distinction | 85-100 | Representing complete or close to complete mastery of the material. |
| Distinction | 75-84 | Representing excellence, but substantially less than complete mastery. |
| Credit | 65-74 | Representing a creditable performance that goes beyond routine knowledge and understanding, but less than excellence. |
| Pass | 50-64 | Representing at least routine knowledge and understanding over a spectrum of topics and important ideas and concepts in the course. |
| Fail | 0-49 | When you don’t meet the learning outcomes of the unit to a satisfactory standard. |
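The late-penalty rule and the grade bands can be combined in a short calculation. The Python sketch below is illustrative only, and several of its details are assumptions not stated in the outline:

* The function names `late_penalised_mark` and `grade_for` are hypothetical.
* `days_late` is assumed to be a whole number of calendar days past the due date.
* The penalised mark is assumed never to go below zero.
* Non-integer marks at band boundaries (for example 84.5) are assumed to fall in the band whose lower bound they reach.

```python
def late_penalised_mark(raw_mark: float, max_mark: float, days_late: int) -> float:
    """Apply the late-submission rule from the outline.

    Deducts 5% of the maximum awardable mark for each calendar day late,
    and awards zero if the work is more than ten calendar days late.
    Assumption (not stated in the outline): the result is floored at zero.
    """
    if days_late <= 0:
        return raw_mark  # on time: no penalty
    if days_late > 10:
        return 0.0  # more than ten calendar days late
    penalty = 0.05 * max_mark * days_late
    return max(raw_mark - penalty, 0.0)


# Lower bound of each result band on a 0-100 scale, highest first.
GRADE_BANDS = [
    (85, "High distinction"),
    (75, "Distinction"),
    (65, "Credit"),
    (50, "Pass"),
    (0, "Fail"),
]


def grade_for(mark: float) -> str:
    """Map a final mark (0-100) to its result name.

    Assumption: a non-integer mark sits in the highest band whose lower
    bound it reaches, so 84.5 counts as a Distinction.
    """
    if not 0 <= mark <= 100:
        raise ValueError("mark must be between 0 and 100")
    for lower_bound, name in GRADE_BANDS:
        if mark >= lower_bound:
            return name
    raise AssertionError("unreachable: every mark in [0, 100] matches a band")


if __name__ == "__main__":
    print(late_penalised_mark(15, 20, 2))   # 13.0 (15 minus 2 x 5% of 20)
    print(late_penalised_mark(15, 20, 12))  # 0.0 (more than ten days late)
    print(grade_for(84.5))                  # Distinction
    print(grade_for(85))                    # High distinction
```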

For more information see: sydney.edu.au/students/guide-to-grades

Educational integrity

While the University is aware that the vast majority of students and staff act ethically and honestly, it is opposed to and will not tolerate academic dishonesty or plagiarism, and will treat all allegations of dishonesty seriously. All written assignments submitted in this unit of study will be submitted to the similarity-detecting software program known as Turnitin.


Turnitin searches for matches between text in your written assessment task and text sourced from the Internet, published works and assignments that have previously been submitted to Turnitin. If such matches indicate evidence of plagiarism to your teacher, they are required to report your work for further investigation. Further information on academic honesty and the resources available to all students can be found on the Academic Integrity page of the current students’ website: sydney.edu.au/educational-integrity

Work, health and safety requirements for this unit

We are governed by the Work Health and Safety Act 2011, Work Health and Safety Regulation 2011 and Codes of Practice. Penalties for non-compliance have increased. Everyone has a responsibility for health and safety at work. The University’s Work Health and Safety policy explains the responsibilities and expectations of workers and others, and the procedures for managing WHS risks associated with University activities.

Closing the loop

• We received thoughtful and constructive feedback from the students who took DATA2002 the first time it ran, in Semester 2, 2018.
• Some comments from students regarding the best aspects of this unit: students appreciated the labs and having the opportunity to apply theory/methods to real problems, including through the end-of-module reports for continual assessment. There was positive feedback about the lecture delivery and the short in-lecture quizzes (which we have continued to build upon this year). The Ed discussion board and access to DataCamp were commented on favourably, as was the inclusion of a guest lecturer. Some students commented that they liked that the unit was well organised and the staff were supportive.
• Some comments from students regarding how the unit can be improved have been taken on board for the 2019 offering:
    o The end of the unit had too many assessments close together, and more generally the workload was considered to be too high. Students also commented that there was too much weighting on the final exam.
        § We have reduced the number of module reports required (from 4 to 3).
        § We have increased the number of online quizzes (from 1 to 3) to provide more immediate feedback during semester.
        § We have increased the weighting of the in-semester tasks relative to the final exam (the exam was 60%; in 2019 it is 50%).
    o The Carslaw computer lab does not provide a productive learning environment.
        § We have requested not to use the Carslaw lab for DATA2002.
    o Expectations and understanding around what is required in the final exam: as one student put it, they had difficulty putting “boundaries around the required content”, as additional content was alluded to, including DataCamp and extra reading, that was not examinable.
        § We will try to make it clearer what is and is not examinable in future offerings.

Links to policies and other information for students

• Student administration: sydney.edu.au/student centre
• Wellbeing and support: sydney.edu.au/students/health-wellbeing
• Study resources: sydney.edu.au/students/learning-services
• Expectations of student conduct: sydney.edu.au/student-responsibilities
• Learning and Teaching Policy: sydney.edu.au/policies/
• Academic appeals: sydney.edu.au/students/academic-appeals
• Libraries: sydney.edu.au/students/libraries

Other links

• Science student portal: canvas.sydney.edu.au/courses/7114
• School of Mathematics and Statistics student portal: https://canvas.sydney.edu.au/courses/7913
