# BSc Computer Science with Industrial Experience / Course details

Year of entry: 2022

## Course unit details: Mathematical Topics in Machine Learning

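| Unit code | Credit rating | Unit level | Teaching period(s) | Offered by | Available as a free choice unit? |
|---|---|---|---|---|---|
| COMP34312 | 10 | Level 3 | Semester 2 | Department of Computer Science | No |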

### Overview

Topic 1: Empirical risk minimisation and regularisation; bias/variance theory and its relation to overfitting; the probabilistic view (likelihood vs. loss), introducing exponential families.
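As a small illustration of the likelihood-vs-loss duality, the following sketch assumes a one-parameter linear model y ≈ w·x with Gaussian noise of fixed variance. Under that assumption the Gaussian negative log-likelihood is a constant plus the sum of squared errors scaled by 1/(2σ²), so it has the same minimiser as the squared-loss empirical risk. An L2 (ridge) penalty then shrinks the estimate toward zero. The data, grid and penalty strength `lam` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical toy data: y is roughly 2 * x plus noise (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.1, 3.9, 6.2, 7.8]

def squared_loss(w, lam=0.0):
    """Empirical risk (mean squared error) plus an optional L2 penalty lam * w**2."""
    risk = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return risk + lam * w ** 2

def gaussian_nll(w, sigma=1.0):
    """Negative log-likelihood of the data under the assumed model y ~ N(w * x, sigma^2)."""
    n = len(xs)
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + sse / (2 * sigma ** 2)

# Brute-force search over candidate weights w in [0, 4] with step 0.001.
grid = [i / 1000 for i in range(0, 4001)]
w_loss = min(grid, key=squared_loss)                            # empirical risk minimiser
w_mle = min(grid, key=gaussian_nll)                             # maximum-likelihood estimate
w_ridge = min(grid, key=lambda w: squared_loss(w, lam=1.0))     # regularised estimate

print(w_loss, w_mle, w_ridge)
```

Here `w_loss` and `w_mle` coincide at 1.99 (the closed-form value Σxy/Σx² = 59.7/30), while the ridge estimate is pulled down to about 1.706. A grid search is used only to keep the sketch dependency-free; the closed form would normally be preferred.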

Topic 2: Information theory: KL-divergence vs. cross-entropy, mutual information; the view of ML as compression.
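The relation between the quantities in Topic 2 can be checked numerically. Cross-entropy decomposes as H(p, q) = H(p) + KL(p‖q), and mutual information is the KL divergence between a joint distribution and the product of its marginals. The sketch below uses base-2 logarithms so that values read as bits, which matches the compression view. The distributions `p`, `q` and the joint tables are hypothetical examples.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits: expected code length when coding p-data with a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits: the extra bits paid for using q's code instead of p's."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """I(X; Y) = KL(p(x, y) || p(x) p(y)) for a joint table joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(entropy(p), cross_entropy(p, q), kl_divergence(p, q))  # 1.5 1.75 0.25
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))          # 1.0 (X determines Y)
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))      # 0.0 (independent)
```

In compression terms, an optimal code for `p` averages 1.5 bits per symbol, while a code built for `q` averages 1.75 bits on the same data. The 0.25-bit overhead is exactly KL(p‖q).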

Topic 3: Optimisation theory (calculus). Why gradient descent (GD)? What are convex and non-convex functions? How do gradients inform how we optimise a function? How can we use second-order properties? How can we prove whether a method will converge?
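The questions in Topic 3 can be made concrete with the convex quadratic f(x) = (x − 3)², a hypothetical example chosen because its curvature is constant (f''(x) = 2). For this function each GD step multiplies the error x − 3 by (1 − 2·lr). GD therefore converges exactly when 0 < lr < 1, which is the familiar lr < 2/L condition with smoothness constant L = 2, and it diverges for larger steps. A Newton step, which divides the gradient by the second derivative, reaches the minimiser of a quadratic in a single step.

```python
def f(x):
    """Convex quadratic with its minimum at x = 3."""
    return (x - 3.0) ** 2

def grad_f(x):
    """First derivative f'(x)."""
    return 2.0 * (x - 3.0)

HESS = 2.0  # f''(x) is constant here; it is also the smoothness constant L

def gradient_descent(x0, lr, steps):
    """Plain GD: x <- x - lr * f'(x); the error shrinks by a factor |1 - 2 * lr| per step."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def newton_step(x):
    """One Newton step, using second-order (curvature) information."""
    return x - grad_f(x) / HESS

print(gradient_descent(0.0, lr=0.1, steps=100))  # close to 3: factor 0.8 per step
print(gradient_descent(0.0, lr=1.1, steps=50))   # diverges: factor -1.2 per step
print(newton_step(0.0))                          # 3.0 in one step
```

The step-size threshold and the one-step Newton result are specific to quadratics. For general convex or non-convex functions, convergence arguments rely on assumptions such as smoothness or strong convexity.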

Topic 4: Dimensionality reduction (matrix algebra): “refine, denoise, and visualise your data”. Data usually has few intrinsic degrees of freedom, lying on or near a low-dimensional manifold within a high-dimensional space. This topic introduces students to matrix-algebra-intensive methods for learning feature dimensions that can aid the model-fitting process, such as PCA, spectral embedding and Fisher discriminant analysis. These methods support visualisation, denoising, and better separation of classes for classification.
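As a minimal sketch of the matrix algebra behind PCA, the code below computes the leading eigenvector of the sample covariance of some hypothetical 2-D points that lie close to the line y = x. It then projects the points onto that direction to obtain a 1-D representation. Power iteration is used here as a simple, dependency-free way to find the top eigenpair. This is an illustrative substitute for the SVD or eigen-solvers used in practice. Spectral embedding and FDA are not shown.

```python
import math

# Hypothetical 2-D points lying close to the line y = x (illustration only).
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 1.0), (2.0, 1.9)]

def covariance_matrix(pts):
    """Return the 2x2 sample covariance matrix and the mean of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    cxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    cyy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return [[cxx, cxy], [cxy, cyy]], (mx, my)

def matvec(a, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def top_eigenpair(a, iters=100):
    """Power iteration for the leading eigenvector/eigenvalue of a symmetric 2x2 matrix.
    Assumes the start vector [1, 0] is not orthogonal to the leading eigenvector."""
    v = [1.0, 0.0]
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    av = matvec(a, v)
    eigval = v[0] * av[0] + v[1] * av[1]  # Rayleigh quotient v^T A v
    return v, eigval

cov, mean = covariance_matrix(points)
pc1, lam1 = top_eigenpair(cov)
explained = lam1 / (cov[0][0] + cov[1][1])  # trace = sum of eigenvalues
# 1-D coordinates (scores) of each centred point along the first principal component.
scores = [(x - mean[0]) * pc1[0] + (y - mean[1]) * pc1[1] for x, y in points]
print(pc1, explained)
```

For these points the first component is close to (1, 1)/√2 and captures over 99% of the variance. Keeping only `scores` is the denoising and visualisation step described above: the small perpendicular deviations are discarded.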

### Pre/co-requisites

| Unit title | Unit code | Requirement type | Description |
|---|---|---|---|
| Machine Learning | COMP24112 | Pre-Requisite | Compulsory |

To enrol, students are required to have taken COMP24112.

### Aims

Machine learning has certain mathematical “building blocks” that turn up in the study of all types of models and algorithms. Specifically, these building blocks draw on techniques from probability theory, matrix algebra, and calculus.

This module aims to introduce students to these building blocks and then show them how to (1) read and correctly interpret research papers in this context, and (2) understand how novel algorithms are devised in modern ML.

There is no required coding or practical algorithm development. The module aims to be a stepping-stone toward research, either in industry or in a PhD.

### Learning outcomes

• Discuss key mathematical terms in ML, e.g. bias/variance, entropy/cross-entropy, regularisation, and the duality between the probabilistic and loss-function views of ML, along with their consequences in practical scenarios.
• Correctly manipulate and interpret mathematical expressions for the likelihood of models, entropies, and mutual information between random variables.
• Explain taught linear algebra concepts and methods, e.g. vector space/subspace, basis, linear independence, rank, inverse, orthogonality, singular value decomposition, and eigen-decomposition.
• Explain and compare the nature and advantages/disadvantages of dimensionality reduction methods, e.g. PCA, spectral embedding, and FDA, and how they make use of linear algebra concepts.
• Discuss and interpret data and concepts relating to convex and non-convex optimisation, including convergence properties and the proof techniques used to analyse stochastic gradient descent.
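To give a flavour of the stochastic gradient descent convergence results mentioned in the last outcome, the sketch below runs SGD on the hypothetical objective F(w) = mean over i of ½(w − aᵢ)², whose minimiser is the mean of the data. Each step uses the gradient of a single randomly sampled term and the diminishing step size ηₖ = 1/(k + 1). This schedule satisfies the classical Robbins-Monro conditions (Σ ηₖ = ∞, Σ ηₖ² < ∞). For this particular objective it makes the iterate exactly the running average of the sampled points. The data, seed and step count are illustrative assumptions.

```python
import random

# Hypothetical data; the minimiser of F(w) = mean_i 0.5 * (w - a_i)^2 is mean(data) = 3.0.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

def sgd(data, steps, seed=0, w0=0.0):
    """SGD with diminishing step size eta_k = 1 / (k + 1)."""
    rng = random.Random(seed)       # seeded for reproducibility
    w = w0
    for k in range(steps):
        a = rng.choice(data)        # sample one term uniformly at random
        grad = w - a                # gradient of 0.5 * (w - a)^2 with respect to w
        w -= grad / (k + 1)         # diminishing step size
    return w

print(sgd(data, steps=10000))  # close to 3.0
```

With a fixed step size, the iterate would instead keep fluctuating around the minimiser. Decaying the step size is what allows the noise from single-sample gradients to average out.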

### Teaching and learning methods

The unit consists of four major topics, each delivered in a two-week block before Easter. Each topic comprises videos to watch and readings to cover before the interactive sessions, which reinforce the videos and readings. Weekly in-class MCQs serve as both formative and summative assessments.

After Easter, a series of carefully selected classic research papers is read week by week in groups, introducing students to methods for reading and interpreting research results. Presentations of the papers consolidate the depth of understanding.

### Assessment methods

| Method | Weight |
|---|---|
| Written exam | 80% |
| Practical skills assessment | 20% |

### Feedback methods

Correct answers are discussed the following week.

### Recommended reading

Selected chapters: *Machine Learning: A Probabilistic Perspective*, by Kevin Murphy

Selected chapters: *Introduction to Probability for Data Science*, by Stanley H. Chan

### Study hours

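| Category | Activity | Hours |
|---|---|---|
| Scheduled activity hours | Assessment written exam | 2 |
| Scheduled activity hours | Lectures | 11 |
| Scheduled activity hours | Practical classes & workshops | 11 |
| Independent study hours | Independent study | 76 |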

### Teaching staff

| Staff member | Role |
|---|---|
| Gavin Brown | Unit coordinator |