Data Science course

Course, BSc Computer Science, Leiden University, 2022

Data Science places data mining, machine learning and statistics in context, both experimentally and socially. To deploy data mining techniques correctly, you must be able to translate a (broadly formulated) question from a customer or a co-worker into an experimental set-up, choose the right methods, and process the data into the form those methods require. After performing your experiments, you should be able not only to evaluate the results but also to interpret them and translate them back to the original question (e.g. by visualization). Socially, data science is of great importance because the media simplify many data-driven results and statistical research, often making mistakes in the process. Thus, a lot of nonsense comes down on us and it is up to you, the data scientists of the future, to recognize, explain and correct that nonsense. This course is a combination of lectures and practical sessions, in which you take a hands-on approach to solving real-world data science problems.

Course objectives

  • You can explain the following machine learning concepts: supervised learning, unsupervised learning, classification, regression
  • You can list two advantages and two disadvantages of rule-based methods and of machine learning methods
  • You know and can explain the following experimental and statistical principles in your own words: bias, overfitting, cross-validation, high-dimensional data, sparseness, dimensionality reduction, feature extraction, class imbalance.
  • You can describe typical use cases for decision trees, kNN, SVM, Naïve Bayes and deep neural networks
  • You can explain the purpose and principles of feature extraction from semi-structured data, text data, image data and graph data.
  • You can explain the difference between engineered features and raw features, in content and in dimensionality
  • You know and can explain the use and importance of measuring the quality and reliability of human-labeled data.
  • You can give the definitions of the most important evaluation measures: Accuracy, Mean Squared Error, Precision, Recall, F1 and Mean Average Precision.
  • You know and can explain the benefits and challenges of big data.
  • You know and can explain the principles of responsible data science
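
As an illustration of the evaluation measures named in the objectives above, the following is a minimal sketch of their standard definitions in plain Python. It is not part of the official course material. The function names are hypothetical, labels are assumed to be binary with 1 as the positive class, and average precision is computed under the assumption that every relevant item appears somewhere in the ranked list.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Of all items predicted positive, the fraction that is truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """Of all truly positive items, the fraction that was predicted positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision(ranked_relevance):
    """Mean of precision@k over the ranks k at which a relevant item occurs.

    ranked_relevance is a list of 0/1 flags in ranked order (best first).
    """
    hits = 0
    precisions = []
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """Average of average_precision over several queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Made-up toy data, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]
print(accuracy(labels, predictions), precision(labels, predictions),
      recall(labels, predictions), f1(labels, predictions))
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

In practice a library such as scikit-learn would normally be used instead; writing the measures out by hand mainly helps to see why accuracy alone can mislead on class-imbalanced data, where precision and recall tell the two kinds of error apart.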

Mode of instruction

14 lectures, 2x45 minutes