
Data Analytics for Information Systems

Contents

This course provides a hands-on introduction to the essentials of data analytics and machine learning using R.

The growing ubiquity of information systems in both organizational and private consumer contexts is making large data streams available in many domains. As part of the digital transformation, knowing how to handle, analyze, and interpret these data sets is becoming an increasingly important skill set in companies, policymaking, and academic research.

The course builds on real-world data sets from information systems in the realm of consumer behavior, in particular resource consumption. Based on hands-on examples and practical challenges, we cover fundamental data analytics methods using the software environment R.

The course starts with basic concepts from descriptive and inferential statistics that are needed in the later course units, followed by an introduction to the statistics software R and RStudio. Students are then introduced to experimental design, which allows them to distinguish between correlation and causation and to critically evaluate the validity and reliability of results. A large share of the course is dedicated to regression analysis, clustering, and different classification techniques, which students apply to data sets from concrete real-world challenges. The course closes with a discussion of relevant privacy regulations as well as social concerns and ethical aspects.

In the second half of the semester, students can earn bonus points in a self-study course project by applying the skills and methods covered in the lecture and exercise sessions to the analysis of a large real-world data set.

In this course, students will gain
  • an introduction to (or a refresher on) fundamental concepts in statistics needed for various quantitative methods in data analytics
  • skills to design and use information systems to collect behavioral data
  • skills to formulate hypotheses and to perform and explain the corresponding statistical tests
  • skills to formulate, solve, and interpret linear and logistic regression analyses
  • skills to conduct clustering analyses
  • skills to set up, train, and evaluate machine learning algorithms, including k-means, regression, and support vector machines
  • programming skills in the statistics software R that allow them to perform these tasks efficiently
  • a solid understanding of the ethical issues involved in handling personal data and of the privacy regulations that apply
Recommended prerequisites: The course includes an introductory part covering essential concepts from statistics and an introduction to R. However, basic familiarity with at least one programming language is strongly recommended.
Time and room: Block course via Zoom in the first half of the semester (4×90 minutes per week). Lecture: Tuesdays 9:45–11:15 and 11:30–13:00; exercise session: Thursdays 9:45–11:15 and 11:30–13:00
Method of examination: Written examination (90 minutes)
Grading procedure: Written examination (100 %). Bonus points can be earned in a project in the second half of the semester; students who pass the exam may improve their exam grade by up to 0.7 through the project.
Module frequency: Every winter term
Workload: 50 h lecture and exercise sessions, 100 h self-study
Module duration: In WS 2020, the module will be taught in blocked sessions, mainly in the first half of the semester.
Teaching and examination language: English
Recommended reading: To be announced in class