
Course unit details:
Statistics & Machine Learning 2: AI, Complex Data, Computationally Intensive Statistics
Unit code | DATA70132 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 2 |
Offered by | |
Available as a free choice unit? | No |
Overview
The module is delivered as a mixture of lectures and practical sessions and has five main sections:
- Dimension reduction and feature extraction: principal components analysis, feature selection, information theory.
- Classifiers and clustering: supervised and unsupervised learning, k-means and k-nearest neighbours, agglomerative clustering and dendrograms, support vector machines, linear and quadratic discriminants, Gaussian process classification, model-based clustering, mixture models and the EM algorithm.
- Neural Networks and Deep Learning: perceptrons, back-propagation and multi-layer networks.
- Markov-chain Monte Carlo (MCMC) methods: Markov chains and their stationary distributions, likelihood-based inference using the Metropolis-Hastings algorithm, likelihood-free inference using Approximate Bayesian Computation, tests for convergence, applications to Bayesian inference.
- Special Topic: Depending on the teaching staff, one special topic will be chosen to go into near-research depth, e.g. Random Forests; Social Networks; Advanced Monte Carlo methods.
Aims
The unit aims to:
Introduce students to a selection of modern methods widely used in Data Science that can go beyond standard statistical frameworks. It builds on the foundation laid in Statistics and Machine Learning 1 and is strongly focussed on applications, aiming to train students to be informed users of existing algorithms.
Learning outcomes
Students should be able to:
- Define the key terms from each of the module’s five sections
- Understand when to apply a given learning algorithm and how to judge its success, including questions of convergence and computational performance
- Construct classifiers that capture features of already-understood data and exploit them to classify new data (supervised learning)
- Use classification algorithms to discover and exploit previously-unknown structure in data (unsupervised learning)
- Construct and train neural networks
- Use MCMC methods to estimate parameters and quantify uncertainty
- Present and justify choices of algorithm, communicating effectively with both technical and non-technical audiences
Teaching and learning methods
The five sections of this module are essentially self-contained subunits. Each consists of a series of lectures that introduce key concepts and support practical sessions in which students apply Python-based software tools to data analysis problems.
Assessment methods
Assessment task | Length | How and when feedback is provided | Weighting |
---|---|---|---|
Written exam | 1 hour | Generic feedback available to the whole cohort. | 40% |
Four written coursework assignments that will include computational exercises | 2 × 1000 words and 2 × 500 words, each plus figures | | |
Feedback methods
Feedback will be made available through Turnitin.
Recommended reading
Teaching staff