Data Analytics for Information Systems


This course provides a hands-on introduction to the essentials of data analytics and machine learning using R.

The growing ubiquity of information systems in both organizational and private contexts makes large data streams increasingly available in various domains. Knowing how to handle, analyze, and interpret these data sets is an increasingly important skill set in companies, policymaking, and academic research.

The course builds on real-world data sets from information systems in the domain of consumer behavior, particularly resource consumption. Through hands-on examples and practical challenges, we cover fundamental data analytics methods in the software environment R.

The course starts with basic concepts from descriptive and inferential statistics that will be needed in later course units, followed by an introduction to the statistics software R and RStudio. Students will then be introduced to experimental design, which allows them to distinguish between correlation and causation and to critically evaluate the validity and reliability of results. A large share of the course is subsequently dedicated to regression analysis, clustering, and various classification techniques, which students will apply to data sets from concrete real-world challenges. The course closes with a discussion of relevant privacy regulations and of the associated social and ethical concerns.

In this course, students will acquire
  • an introduction (or refresher) to fundamental concepts in statistics needed for various quantitative methods in data analytics
  • skills to design and use information systems to collect behavioral data
  • skills to formulate hypotheses and to perform and explain the corresponding statistical tests
  • skills to formulate, solve, and interpret linear and logistic regression analyses
  • skills to conduct clustering analyses
  • skills to set up, train, and evaluate machine learning algorithms, including K-means, regression, and support vector machines
  • programming skills in the statistics software R that allow them to perform the related tasks efficiently
  • a solid understanding of the ethical issues involved in handling personal data and of the applicable privacy regulations
Recommended prerequisites: The course includes an introductory part covering essential concepts from statistics and an introduction to R. However, basic programming experience prior to the course is strongly recommended.
Time and room: Lecture: Monday, 9:45–11:15, room 0.143; exercise session: Tuesday, 8:00–9:30, room 0.143
Method of examination: Written examination (90 minutes)
Grading procedure: Written examination (100 %)
Module frequency: Each winter term
Workload: Lecture: 30 h
Exercise session: 15 h
Self-study: 105 h
Module duration: 1 semester
Teaching and examination language: English
(Recommended) reading: Will be announced in class