BSc Computer Science and Mathematics with Industrial Experience

Year of entry: 2024

Course unit details:
Mathematics and Applications of Machine Learning

Course unit fact file
Unit code MATH36160
Credit rating 20
Unit level Level 3
Teaching period(s) Full year
Available as a free choice unit? No

Overview

Machine learning and artificial intelligence have become cornerstones of everyday life. We have self-driving cars on our streets, use large language models to turn ideas into text, and converse with digital assistants using voice recognition. In this module, we will step into the world of modern machine learning. We begin with an analytical treatment of supervised learning problems from an approximation-theoretic viewpoint. In particular, we study linear and polynomial regression, as well as k-nearest neighbour and support vector machine classification. Based on the Bayesian formulation of classification and as a generalisation of logistic regression, we then introduce and discuss deep neural networks, including their properties as function approximators. As a training methodology, we introduce gradient descent and stochastic gradient descent algorithms. The first part of the course finishes with a discussion of unsupervised techniques, especially Gaussian mixture models and k-means.
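As an illustration of the training methodology named above, the following is a minimal sketch of gradient descent and stochastic gradient descent applied to a least-squares linear regression problem. It is not taken from the course material: the function names, step sizes, and the noiseless synthetic data set (y = 1 + 2x) are assumptions chosen so that both methods recover the same known solution.

```python
import numpy as np


def gradient_descent(X, y, step_size=0.5, n_steps=500):
    """Minimise (1/2n) * ||X w - y||^2 using the full gradient in every step."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / n  # gradient averaged over all n samples
        w -= step_size * grad
    return w


def stochastic_gradient_descent(X, y, step_size=0.1, n_epochs=50, seed=0):
    """Same objective, but each update uses the gradient of a single sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # visit the samples in random order
            grad_i = (X[i] @ w - y[i]) * X[i]
            w -= step_size * grad_i
    return w


# Hypothetical, noiseless data: y = 1 + 2x with a column of ones for the intercept.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w_gd = gradient_descent(X, y)
w_sgd = stochastic_gradient_descent(X, y)
print("gradient descent:           ", w_gd)
print("stochastic gradient descent:", w_sgd)
```

Because the data are noiseless, every per-sample gradient vanishes at the true weights, so the stochastic variant can converge with a constant step size here. With noisy data, a decreasing step size would typically be needed.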

 

The second part of the course starts with a broad introduction to the predictive modelling pipeline, beginning with the design of and retrieval from databases (introducing database normal forms and SQL), in-memory data structures (such as pandas DataFrames), and basic data exploration and cleansing. The focus will then move to model selection, hyperparameter tuning, and model evaluation using the functionality of the open-source scikit-learn library. Hands-on coding experience will involve linear, polynomial, and logistic regression, k-nearest neighbours, linear and kernel support vector machines, decision trees, and different clustering techniques. Whenever possible, students will learn to code simple Python implementations of these methods from scratch, both to gain a deep algorithmic understanding of these techniques and to critically question model outputs, especially with respect to the theory studied in the first part. The lecture material will be complemented with exercises and coding assessments.
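To make the model selection, hyperparameter tuning, and evaluation step more concrete, here is a minimal sketch using scikit-learn. It is an illustration only and not part of the course material: the synthetic data set, the choice of a support vector classifier, and the parameter grid are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic binary classification problem.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Hold out a test set that is never used during hyperparameter tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling and classifier are combined so that the scaler is refitted
# on the training part of every cross-validation split.
pipeline = make_pipeline(StandardScaler(), SVC())

# Candidate hyperparameters: linear vs. kernel SVM and regularisation strength C.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1.0, 10.0],
}

# 5-fold cross-validation on the training set selects the best combination.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Final evaluation of the selected model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("selected hyperparameters:", search.best_params_)
print("test accuracy:", test_accuracy)
```

Keeping the test set out of the grid search is the design choice that matters here: the cross-validation score guides the selection, while the held-out score estimates how the selected model generalises.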

Pre/co-requisites

Unit title Unit code Requirement type Description
Numerical Analysis 1 MATH24411 Pre-Requisite Compulsory
Introduction to Programming for Physicists PHYS20161 Pre-Requisite Optional
Programming with Python MATH20621 Pre-Requisite Optional
Introduction to Programming 1 COMP16321 Pre-Requisite Optional

Aims

The unit aims to give the students a rigorous analytical introduction to machine learning methodology in semester 1 that is complemented by an application-driven, computational perspective in semester 2. 

Learning outcomes

ILO 1

Distinguish important supervised machine learning models and analyse sources of errors. Explain model selection methodology and the role of the bias-variance trade-off.

ILO 2

Develop deep neural networks starting from logistic regression and articulate the basics of universal approximation. Explain gradient descent and stochastic gradient descent and prove convergence when applied to appropriate target functions.

ILO 3

Distinguish supervised and unsupervised learning problems. Explain Gaussian mixture models and their role in clustering. Develop the expectation-maximisation algorithm as an approximate soft clustering method and the k-means algorithm as a hard clustering version of it.

ILO 4

Apply an appropriate framework in Python to handle data and to use data to select and train diverse machine learning models.  

ILO 5

Explain and implement basic machine learning models and their associated training procedure.
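As an example of the kind of from-scratch implementation referred to in the outcomes above, the following is a minimal sketch of the k-means algorithm. It is an illustration under stated assumptions, not the unit's reference implementation: the function names, the optional `initial_centres` parameter, and the two-blob data set are hypothetical. The hard assignment step plays the role of the expectation step and the mean update that of the maximisation step in the expectation-maximisation algorithm for Gaussian mixture models.

```python
import numpy as np


def assign_to_nearest(X, centres):
    """Hard assignment: index of the closest centre for every data point."""
    distances = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(X, k, initial_centres=None, n_iters=100, seed=0):
    """Alternate between hard assignments and mean updates until labels are stable."""
    rng = np.random.default_rng(seed)
    if initial_centres is None:
        # Default initialisation: k distinct data points chosen at random.
        centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centres = np.array(initial_centres, dtype=float)
    labels = assign_to_nearest(X, centres)
    for _ in range(n_iters):
        new_centres = centres.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean(axis=0)
        new_labels = assign_to_nearest(X, new_centres)
        centres = new_centres
        if np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
    return centres, labels


# Hypothetical data: two well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

centres, labels = k_means(X, k=2)
print("estimated centres:\n", centres)
```

The result of k-means depends on the initialisation and the algorithm may stop in a local minimum, which is why the sketch allows the starting centres to be passed in explicitly.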

Teaching and learning methods

3 contact hours per week. Semester 1: 2 hours of lectures, 1 hour of tutorials; Semester 2: 1 hour of lectures, 2 hours of computer labs.

Assessment methods

Method Weight
Other 10%
Written exam 45%
Written assignment (inc essay) 45%

Feedback methods

Online quiz in semester 1: Automatically marked online

Exam at the end of semester: Generic feedback supplied after exam period.

Coursework 1 in semester 2: Individual feedback / automarking

Coursework 2 in semester 2: Individual feedback / automarking

Group project in semester 2: Group feedback

Recommended reading

Bottou, Curtis, Nocedal (2018): Optimization Methods for Large-Scale Machine Learning, SIAM Review 60(2): 223-311.

Higham, Higham (2019): Deep Learning: An Introduction for Applied Mathematicians, SIAM Review 61(4): 860-891.

James, Witten, Hastie, Tibshirani, Taylor (2023): An Introduction to Statistical Learning with Applications in Python, Springer.

Mohri, Rostamizadeh, Talwalkar (2018): Foundations of Machine Learning, second edition, MIT Press.

Grus (2019): Data Science from Scratch, second edition, O’Reilly.

Estève et al. (2022). INRIA/scikit-learn-mooc: (session-3).  https://doi.org/10.5281/zenodo.7220307  
 

Study hours

Scheduled activity hours
Lectures 32
Practical classes & workshops 22
Tutorials 11
Independent study hours
Independent study 135

Teaching staff

Staff member Role
Jonas Latz Unit coordinator
Stefan Guettel Unit coordinator
