MSc Data Science (Applied Urban Analytics)
Year of entry: 2023
Course unit details:
Statistics & Machine Learning 2: AI, Complex Data, Computationally Intensive Statistics
| Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
|---|---|
| Teaching period(s) | Semester 2 |
| Available as a free choice unit? | No |
The module is delivered as a mixture of lectures and practical sessions and has five main sections:
- Dimension reduction and feature extraction: principal components analysis, feature selection, information theory.
- Classifiers and clustering: supervised and unsupervised learning, k-means and k-nearest neighbours, agglomerative clustering and dendrograms, support vector machines, linear and quadratic discriminants, Gaussian process classification, model-based clustering, mixture models and the EM algorithm.
- Neural Networks and Deep Learning: perceptrons, back-propagation and multi-layer networks.
- Markov chain Monte Carlo (MCMC) methods: Markov chains and their stationary distributions, likelihood-based inference using the Metropolis-Hastings algorithm, likelihood-free inference using Approximate Bayesian Computation, tests for convergence, applications to Bayesian inference.
- Special topic: one special topic, chosen according to the expertise of the teaching staff, is covered in near-research depth, e.g. random forests, social networks or advanced Monte Carlo methods.
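As a rough illustration of the MCMC material listed above (not taken from the course's own materials), the following Python sketch implements a random-walk Metropolis-Hastings sampler with NumPy. The standard normal target, the proposal step size of 2.4 and the burn-in of 2,000 draws are arbitrary assumptions, and the function names `log_target` and `metropolis_hastings` are hypothetical.

```python
import numpy as np

def log_target(x: float) -> float:
    """Unnormalised log-density of a standard normal (illustrative target only)."""
    return -0.5 * x * x

def metropolis_hastings(log_density, n_samples: int, x0: float = 0.0,
                        step: float = 1.0, seed: int = 0) -> np.ndarray:
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_p = x0, log_density(x0)
    for i in range(n_samples):
        # Propose a move by perturbing the current state with Gaussian noise.
        proposal = x + step * rng.standard_normal()
        log_p_prop = log_density(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces to
        # target(proposal) / target(current); compare on the log scale.
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
        # On rejection the current state is recorded again.
        samples[i] = x
    return samples

draws = metropolis_hastings(log_target, n_samples=20_000, step=2.4)
burned = draws[2_000:]  # discard the initial draws as burn-in
print(burned.mean(), burned.std())  # should be near 0 and 1 for this target
```

Because the target is only needed up to a normalising constant, the same sampler applies to an unnormalised Bayesian posterior; in practice one would also check convergence, e.g. by comparing several chains.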
The unit aims to:
Introduce students to a selection of modern methods widely used in Data Science that can go beyond standard statistical frameworks. It builds on the foundation laid in Statistics and Machine Learning 1 and is strongly focussed on applications, aiming to train students to be informed users of existing algorithms.
Students should be able to:
- Define the key terms from each of the module’s five sections
- Understand when to apply a given learning algorithm and how to judge its success, including questions of convergence and computational performance
- Construct classifiers that capture features of already-understood data and exploit them to classify new data (supervised learning)
- Use clustering algorithms to discover and exploit previously unknown structure in data (unsupervised learning)
- Construct and train neural networks
- Use MCMC methods to estimate parameters and quantify uncertainty
- Present results, justifying the choice of algorithm and communicating effectively with both technical and non-technical audiences.
Teaching and learning methods
The five sections of this module are essentially self-contained subunits. Each consists of a series of lectures that introduce key concepts and support practical sessions in which students apply Python-based software tools to data analysis problems.
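As a hedged example of the kind of Python exercise such a practical session might involve (an assumption, not actual course material), the sketch below implements Lloyd's algorithm for k-means clustering in plain NumPy on synthetic two-cluster data. The function name `kmeans`, its parameters and the synthetic data are all hypothetical.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Synthetic data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])
centroids, labels = kmeans(X, k=2)
```

Library implementations such as scikit-learn's `KMeans` add smarter initialisation (k-means++) and multiple restarts, which matter when clusters are less well separated than in this toy example.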
How and when feedback is provided
Generic feedback available to the whole cohort.
Four written coursework assignments, which include computational exercises.
Two assignments of 1,000 words and two of 500 words, each plus figures.
Feedback will be made available through Turnitin.