- UCAS course code: GG41
- UCAS institution code: M20
BSc Computer Science and Mathematics with Industrial Experience
Year of entry: 2024
Course unit details:
Mathematics and Applications of Machine Learning
Unit code | MATH36160 |
---|---|
Credit rating | 20 |
Unit level | Level 3 |
Teaching period(s) | Full year |
Available as a free choice unit? | No |
Overview
Machine learning and artificial intelligence have become cornerstones of everyday life. We have self-driving cars on our streets, use large language models to turn ideas into text, and converse with digital assistants using voice recognition. In this module, we will step into the world of modern machine learning. We begin with an analytical treatment of supervised learning problems from an approximation-theoretic viewpoint. In particular, we study linear and polynomial regression, as well as k-nearest neighbour and support vector machine classification. Based on the Bayesian formulation of classification and as a generalisation of logistic regression, we then introduce and discuss deep neural networks, including their properties as function approximators. As a training methodology, we introduce the gradient descent and stochastic gradient descent algorithms. The first part of the course finishes with a discussion of unsupervised techniques, especially Gaussian mixture models and k-means.
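As a rough illustration of the training methodology mentioned above, the following sketch fits a one-dimensional logistic regression model with full-batch gradient descent and with mini-batch stochastic gradient descent. It is not taken from the course material: the synthetic data, the step size `lr`, the number of epochs, and all function names are assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(w, b, xs, ys):
    """Mean negative log-likelihood for labels ys in {0, 1}."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def gradient(w, b, xs, ys):
    """Gradient of the mean log-loss with respect to (w, b)."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    n = len(xs)
    return gw / n, gb / n


def train(xs, ys, lr=0.5, epochs=200, batch_size=None, seed=0):
    """Full-batch gradient descent if batch_size is None, else mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        if batch_size is None:
            batches = [idx]
        else:
            rng.shuffle(idx)  # new random pass over the data each epoch
            batches = [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]
        for batch in batches:
            gw, gb = gradient(w, b, [xs[i] for i in batch], [ys[i] for i in batch])
            w -= lr * gw
            b -= lr * gb
    return w, b


# Assumed synthetic data: class 1 centred at +1, class 0 at -1, overlapping.
data_rng = random.Random(42)
xs = [data_rng.gauss(1.0, 1.0) for _ in range(100)]
xs += [data_rng.gauss(-1.0, 1.0) for _ in range(100)]
ys = [1] * 100 + [0] * 100

initial_loss = log_loss(0.0, 0.0, xs, ys)
w_gd, b_gd = train(xs, ys)                   # gradient descent
w_sgd, b_sgd = train(xs, ys, batch_size=10)  # stochastic gradient descent
print("initial loss:", initial_loss)
print("GD loss:     ", log_loss(w_gd, b_gd, xs, ys))
print("SGD loss:    ", log_loss(w_sgd, b_sgd, xs, ys))
```

Both variants use the same gradient formula. The only difference is whether each update sees the whole data set or a random mini-batch, which makes the stochastic iterates cheaper per step but noisier.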
The second part of the course starts with a broad introduction to the predictive modelling pipeline, beginning with the design of and retrieval from databases (introducing database normal forms and SQL), in-memory data structures (such as Pandas DataFrames), and basic data exploration and cleansing. The focus will then move to model selection, hyperparameter tuning, and model evaluation using the functionality of the open-source scikit-learn machine learning library. Hands-on coding experience will involve linear, polynomial, and logistic regression, k-nearest neighbours, linear and kernel support vector machines, decision trees, and different clustering techniques. Whenever possible, students will learn to code simple Python implementations of these methods from scratch, both to gain a deep algorithmic understanding of these techniques and to critically question model outputs, especially with respect to the theory studied in the first part. The lecture material will be complemented with exercises and coding assessments.
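To give a flavour of what a from-scratch implementation combined with hyperparameter tuning might look like, here is a minimal sketch of a k-nearest-neighbour classifier whose `k` is chosen on a held-out validation set. It deliberately uses only the Python standard library rather than scikit-learn. The synthetic data, the 60/20/20 split, the candidate values of `k`, and all function names are assumptions for illustration, not course material.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Return the majority label among the k training points nearest to x."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(eval_X, eval_y))
    return hits / len(eval_X)


# Assumed synthetic data: two Gaussian classes in the plane, 60 points each.
rng = random.Random(0)
X = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0.0, 2.5) for _ in range(60)]
y = [label for label in (0, 1) for _ in range(60)]

# Shuffle, then split into training, validation, and test sets (60/20/20).
order = list(range(len(X)))
rng.shuffle(order)
train_idx, val_idx, test_idx = order[:72], order[72:96], order[96:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_val, y_val = [X[i] for i in val_idx], [y[i] for i in val_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Hyperparameter tuning: pick the k with the best validation accuracy.
candidate_ks = [1, 3, 5, 7, 9]  # odd values avoid ties between two classes
val_scores = {k: accuracy(X_train, y_train, X_val, y_val, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: val_scores[k])

# Model evaluation: report accuracy on data not used for training or tuning.
test_score = accuracy(X_train, y_train, X_test, y_test, best_k)
print("validation scores:", val_scores)
print("best k:", best_k, "test accuracy:", test_score)
```

Keeping the test set out of the tuning loop matters: the validation accuracy of the selected `k` is optimistically biased, whereas the test accuracy is an honest estimate of performance on unseen data.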
Pre/co-requisites
Unit title | Unit code | Requirement type | Description |
---|---|---|---|
Numerical Analysis 1 | MATH24411 | Pre-Requisite | Compulsory |
Introduction to Programming for Physicists | PHYS20161 | Pre-Requisite | Optional |
Programming with Python | MATH20621 | Pre-Requisite | Optional |
Introduction to Programming 1 | COMP16321 | Pre-Requisite | Optional |
Aims
The unit aims to give the students a rigorous analytical introduction to machine learning methodology in semester 1 that is complemented by an application-driven, computational perspective in semester 2.
Learning outcomes
ILO 1
Distinguish important supervised machine learning models and analyse sources of errors. Explain model selection methodology and the role of the bias-variance trade-off.
ILO 2
Develop deep neural networks starting from logistic regression and articulate the basics of universal approximation. Explain gradient descent and stochastic gradient descent and prove convergence when applied to appropriate target functions.
ILO 3
Distinguish supervised and unsupervised learning problems. Explain Gaussian mixture models and their role in clustering. Develop the expectation-maximisation algorithm as an approximate soft clustering method and the k-means algorithm as a hard clustering version of it.
ILO 4
Apply an appropriate framework in Python to handle data and to use data to select and train diverse machine learning models.
ILO 5
Explain and implement basic machine learning models and their associated training procedures.
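As an illustration of the kind of implementation ILO 3 and ILO 5 point towards, the sketch below codes k-means (Lloyd's algorithm) from scratch. Where expectation-maximisation for a Gaussian mixture assigns each point a soft responsibility for every component, k-means assigns each point entirely to its nearest centroid and then recomputes each centroid as the mean of its points. The data, the random initialisation from data points, and the function names are assumptions for this example, not the unit's reference implementation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate hard assignments and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random data points
    labels = None
    for _ in range(iterations):
        # Assignment step (hard counterpart of the E-step): nearest centroid wins.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids would not move either
        labels = new_labels
        # Update step (counterpart of the M-step): centroid = mean of its cluster.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Assumed synthetic data: two well-separated blobs around (0, 0) and (10, 10).
data_rng = random.Random(1)
points = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(50)]
points += [(data_rng.gauss(10.0, 1.0), data_rng.gauss(10.0, 1.0)) for _ in range(50)]

centroids, labels = k_means(points, k=2)
print(sorted(centroids))
```

Each step can only lower the within-cluster sum of squared distances, so the loop terminates, although in general only at a local optimum that depends on the initialisation.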
Teaching and learning methods
3 contact hours per week. Term 1: 2 hours of lectures, 1 hour of tutorials; Term 2: 1 hour of lectures, 2 hours of computer labs.
Assessment methods
Method | Weight |
---|---|
Other | 10% |
Written exam | 45% |
Written assignment (inc essay) | 45% |
Feedback methods
Assessment | Feedback |
---|---|
Online quiz in semester 1 | Automatically marked online |
Exam at the end of semester | Generic feedback supplied after exam period |
Coursework 1 in semester 2 | Individual feedback / automarking |
Coursework 2 in semester 2 | Individual feedback / automarking |
Group project in semester 2 | Group feedback |
Recommended reading
Bottou, Curtis, Nocedal (2018): Optimization Methods for Large-Scale Machine Learning, SIAM Review 60(2): 223-311.
Higham, Higham (2019): Deep Learning: An Introduction for Applied Mathematicians, SIAM Review 61(4): 860-891.
James, Witten, Hastie, Tibshirani, Taylor (2023): An Introduction to Statistical Learning with Applications in Python, Springer.
Mohri, Rostamizadeh, Talwalkar (2018): Foundations of Machine Learning, second edition, MIT Press.
Grus (2019): Data Science from Scratch, second edition, O’Reilly.
Estève et al. (2022). INRIA/scikit-learn-mooc: (session-3). https://doi.org/10.5281/zenodo.7220307
Study hours
Scheduled activity hours | |
---|---|
Lectures | 32 |
Practical classes & workshops | 22 |
Tutorials | 11 |
Independent study hours | |
---|---|
Independent study | 135 |
Teaching staff
Staff member | Role |
---|---|
Jonas Latz | Unit coordinator |
Stefan Guettel | Unit coordinator |