DATA7001 DT - A1 DT PDF

Title DATA7001 DT - A1 DT
Author Max He
Course Introduction to Data Science
Institution University of Queensland
Pages 5


DATA7001

Human centred problem formulation in data science
Due 27 March 17:00, submit via Blackboard
10 Marks, Individual

In this assignment, you will apply your learning from the design thinking lecture to undertake human centred formulation for a data problem. The assignment consists of a short (approx. 2-page) report that presents the results of your investigation from a semi-structured interview with stakeholders and study of a data intensive domain, namely Learning Analytics.

Summary of the domain: In modern education systems, students work in online environments and systems to undertake a range of learning and assessment activities. The data that is collected from such systems holds rich insights into the students’ learning profiles, behaviour and outcomes. The analysis of such data is broadly referred to as Learning Analytics.

Stakeholders:
Hassan Khosravi (Project Lead)
Nick Joseph (Software Engineer)
Solmaz Abdi (Learning Analytics Designer)

The semi-structured interview with the stakeholders is available below. If needed, students can also ask further questions on the course Piazza site piazza.com/uq.edu.au/semester12020/sem12020/home between 16 March and 26 March. Where possible, you are encouraged to plan and discuss your questions with your peers before posting them on Piazza using the subject of “Interview Question for Design Thinking Assignment”.

Your report will have several sections as detailed below. Begin the report with a simple introduction that explains the purpose of the document and its contents; end the report with a short conclusion. Prepare the body of the report using the sections below:

Data profile: This section will describe the data in human or social terms: What is the data? What categories? Parameters? How much data is there? How often is it updated? How far does it go back in time? These are initial and not exhaustive example questions.

Stakeholders: Identify and describe the stakeholders. Who are the people who own the data? Who else has interests in it? Who benefits from it? How? You may use a diagram to represent stakeholder groups and place the panellists as stakeholders into this diagram along with the other stakeholders you identify.

Scenarios of use: Identify three use cases of the data. The first should be what you understand to be the typical use case: who is using the data, in what circumstances, how they are working with it, what kind of relationships they are probing, what kind of questions they can find answers to, and for whom. The second should be a little farther afield, e.g. a different type of stakeholder, a different class of question. The third should be a “fringe” scenario: a possible but atypical scenario, relating to an often overlooked use of the data, and the set of circumstances, motives and/or skill set required to pursue the data questions related to this use. For each use case, provide a rationale that links the information you obtained from the panel discussion to values / needs that you had deduced to your statement of the scenario, then identify who benefits and who may be marginalised by its answer from the data.

Limits: Identify two questions that are (just) beyond the scope of the data, but that with a little more data, or other kinds of data, or an external source of data to compare etc. could be answered. Again, for each, identify who is likely to benefit and who may be marginalised. Ensure that you provide a rationale that links the information you obtained from the panel discussion to your questions (tip: sometimes this information may come from what is not said). For these last two “limit” questions you have identified, outline how you think you would have to work with the data; what kind of data collection or analysis you would need to conduct in order to get an answer to the question. Where possible, articulate what would be sufficient to count as a conclusive answer one way or the other, versus what would be indeterminate, and why.

Assessment: Your report will be assessed on the following criteria:
- Sensibleness: How reasonable and grounded in human experience, research or evidence is the report? How much sense does it make? (2 marks)
- Scope: How complete is the report in terms of probing the properties of the data, the people and their purposes that may have interests in it? (2 marks)
- Creativity: How well does the report show imagination for how people may use the data and for what purposes? (2 marks)
- Understanding: How well does the report demonstrate an understanding of data and data science possibilities as they relate to human agendas and issues? (2 marks)
- Clarity and style: How well does the report communicate professionally and clearly? Is the presentation succinct, to the point and within page limit? (2 marks)

Interview Transcript

Find below the interview transcript for analysis, where Q refers to the interviewer and A refers to the interviewee’s responses. Each line is numbered for ease in your analysis and referencing for your own purpose.


Q: What is learning analytics and why do you use it?

A: Learning analytics (LA) is a fairly new field where the aim is to utilize data from students’ interactions within a course to model, visualise and communicate students’ engagement, performance and trajectories in online courses, which in turn can be utilized for evaluating and optimizing the course design for students. LA can also be employed for providing predictions, such as identifying at-risk students or estimating an individual student’s knowledge state on the different concepts taught in the course, and for providing a personalized learning experience for students.

Q: What data are you collecting? What are you looking for?

A: My current project focuses on the blended courses of UQ2U offered through the Edge edX platform. This is part of the partnership between The University of Queensland and edX (a massive open online course (MOOC) provider). The aim is to gather, interpret and manipulate data we get from Edge edX, as well as data obtained from Blackboard, SiNet, Echo360 and Kultura, to develop statistically grounded early interventions for students. We want to ensure that students are provided with timely and effective feedback which can facilitate their success. We also want to examine students’ interactions with the system and communicate them to the course designers and instructors, to help them evaluate the usage and effectiveness of the course contents and to enable them to adopt appropriate intervention strategies or optimise the design of the course based on this insight.

Q: How do you collect data? Who owns the data? How frequently is it being collected?

A: From edX, we get two types of data. The first type of data we get from Edge edX is called event logs, also known as clickstream data. The event logs are students’ interactions, released in the form of browser or mobile events, and the responses emitted by Edge edX to these interactions, released as server events. These data are received on a daily basis in the form of JSON (ndjson) records. For each course, we receive a separate compressed file of these JSON records ranging from hundreds of kilobytes to tens of megabytes. The other type of data we receive from Edge edX is called database data files, which include course content databases, courseware databases, forums databases, and open responses databases. Again, for each course, we receive a compressed file that ranges from hundreds of megabytes to tens of gigabytes in size. We receive this data on a weekly basis. These data are stored in AWS S3 buckets and are made accessible by edX.


Blackboard data is owned by the Pro-Vice-Chancellor (T & L) and is accessible through Reportal. ITS provides access to the Blackboard data both in a raw format and in a curated format through a platform called ADaaS (Analytics Data as a Service). We generally use ADaaS to get access to Blackboard data, as we are able to query it through our own scripts. Blackboard data includes Blackboard clickstream and also gradebook information. The SiNet data is owned by the academic service division (ASD) and planning and business intelligence (PBI), and is hosted on their data warehouses at UQ. We get access to it through Reportal. It gives us information about enrolment and demographics. I should add that Reportal is a business intelligence tool.

Q: Why is collecting this data important? Who will use the data?

A: We use data from different sources to provide multidimensional data incorporating all necessary information about students. We use this data to get an understanding of students’ performance, engagement and learning pathways within the course. For example, we want to find out how students’ interaction within the Edge edX platform is correlated with their achievements, or how students organise their time within the learning system, which can be used by course designers and instructors for optimising the design of the course. Ideally, we want to communicate the results of our analysis to all of our stakeholders working with the system so that they gain the most insight from it, but it should be provided in a way that is understandable by each group of stakeholders. We should consider that different stakeholders, such as instructors, course designers and students, have different roles, so they have different insight needs. Learners likely need a rather different presentation of analysis and model results than instructors, as both groups will need to act in different ways. The visualizations must be actionable, i.e., needed changes can be understood and implemented by key stakeholders.
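As an illustration of working with the daily event logs described in the interview (compressed files of ndjson records), the sketch below counts events per student in one such file. It is a minimal sketch under stated assumptions, not the project team's pipeline: the function name `count_events_per_user` is hypothetical, the file is assumed to be gzip-compressed (the interview only says "compressed"), and the `username` field name is an assumption about the record layout.

```python
import gzip
import json
from collections import Counter


def count_events_per_user(path):
    """Count events per user in a gzip-compressed ndjson file.

    Assumes one JSON object per line and a "username" field on each
    record; both are assumptions made for this sketch.
    """
    counts = Counter()
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting
            if not isinstance(record, dict):
                continue  # ignore JSON values that are not objects
            user = record.get("username")  # assumed field name
            if user:
                counts[user] += 1
    return counts
```

Per-student counts of this kind are one possible starting point for the engagement measures mentioned above; relating them to achievement would additionally require joining against gradebook data.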
Q: Where is it being used? Where else can it also be used?

A: One way that students’ event logs might be beneficial is that, through the event log, we can obtain the amount of time taken by students in the course to complete a given piece of content. On the other hand, course instructors provide an estimate of how much time is required to complete each piece of content. By taking the difference between the two, we can estimate whether the design of the course matches instructors’ expectations or not. If not, course designers and instructors collaborate with each other to re-design the course content to align it with the expectations. We also aggregate students’ daily activity and learning events and make them accessible to the instructors through a learning analytics dashboard developed at UQ called “Course Insight”. The aim of “Course Insight” is to merge data from different sources such as Edge edX, Blackboard, Echo360 and Kultura to create multi-dimensional data sets. Course Insight has filtering functionalities that allow course instructors to


drill down into the underlying data and investigate different indexes about a sub-group of students. Different analytical segments of Course Insight provide information about students’ demographics and educational features, time-aware statistical information regarding students’ online learning activities, statistical information about students’ assessment results, etc. It also provides an intervention functionality that makes it possible for instructors to identify and contact a specific student or a group of students regarding their performance. We also aim to implement a recommendation process in Course Insight that recommends insightful drill-down criteria to course instructors, based on the findings of recently published papers by researchers in the Institute for Teaching and Learning Innovation (ITaLI). We aim to develop a student-facing learning dashboard that makes it possible for students to observe information regarding their engagement, performance and learning path in the course through actionable visualisation, with the aim of enhancing their self-regulation skills. It is also possible to use information from students’ performance on the assessments (both formative and summative) to estimate students’ competencies in the course and update them on a real-time basis as students advance in the course. This information can be conveyed to students through visualisations that can help students better understand their own learning needs and improve self-regulation. We can also use students’ performance data to develop a recommender system that recommends personalised learning resources to students, tailored towards their learning needs.

Q: Who is disadvantaged?

A: The field of learning analytics provides a great possibility to support students and instructors at scale. However, the types of data becoming available are dramatically expanding. So, we should always make sure that learning analytics is conducted based on ethical considerations and in a responsible way that protects instructors and students...
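The course-design check mentioned in the interview (comparing the time students actually take on a piece of content with the instructor's estimate) can be sketched as follows. This is only an illustration under stated assumptions: the interview does not say how student times are aggregated, so the median is an assumption made here, and the function name `time_gaps`, the content names and all numbers are hypothetical.

```python
from statistics import median


def time_gaps(observed_minutes, expected_minutes):
    """Return (median observed time - instructor estimate) per content item.

    A positive gap means students typically take longer than expected.
    Using the median is an assumption of this sketch; the interview does
    not specify an aggregation.
    """
    gaps = {}
    for item, expected in expected_minutes.items():
        samples = observed_minutes.get(item, [])
        if not samples:
            continue  # no student data recorded for this item
        gaps[item] = median(samples) - expected
    return gaps


# Hypothetical data: minutes spent by individual students per content item.
observed = {"week1_video": [12, 15, 14, 40], "week1_quiz": [30, 35, 28]}
# Hypothetical instructor estimates in minutes.
expected = {"week1_video": 15, "week1_quiz": 20, "week2_reading": 25}

print(time_gaps(observed, expected))
```

The median is used so that a single student who leaves a page open for a long time does not dominate the comparison; a mean or a trimmed mean would be equally defensible choices.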

