# MMath Mathematics and Statistics / Course details

Year of entry: 2021

## Course unit details: Multivariate Statistics

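| Unit code | Credit rating | Unit level | Teaching period(s) | Offered by | Available as a free choice unit? |
|---|---|---|---|---|---|
| MATH48061 | 15 | Level 4 | Semester 1 | Department of Mathematics | No |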

### Overview

Almost all real data – from physical, biological, and social science, as well as industry and healthcare – involves recording observations of multiple variables. This course concerns the analysis of such multivariate data, from both a theoretical and a practical viewpoint. Some techniques generalise the univariate case – for example, maximum likelihood estimation. Others are new – for example, principal component analysis.

### Pre/co-requisites

| Unit title | Unit code | Requirement type | Description |
|---|---|---|---|
| Probability 2 | MATH20701 | Pre-Requisite | Compulsory |
| Statistical Methods | MATH20802 | Pre-Requisite | Compulsory |

MATH48061 pre-requisites

Students are not permitted to take more than one of MATH38061 or MATH48061 for credit in the same undergraduate year. Students are not permitted to take MATH48061 and MATH68061 for credit in an undergraduate programme and then a postgraduate programme.

### Aims

To provide a modern overview of multivariate statistics including both the underlying mathematical theory and practical considerations.

### Learning outcomes

On successful completion of the course students will be able to:

• Work with random vectors and matrices to derive results relevant to multivariate statistical inference.
• Import multivariate data stored as plain text into statistical software, visualise the data and run the multivariate analysis techniques covered in the course on it.
• Use data or summary statistics of data to calculate sample mean vectors, variance-covariance matrices, and correlation matrices, as well as to define transformations to simplify analysis.
• Derive the principal components of data with a given covariance structure.
• Define the difference between supervised and unsupervised learning, together with an algorithm for classification of data into two classes for each case.
• Perform unbiased estimation, maximum likelihood estimation and hypothesis testing for multivariate data.
• Derive key properties of the multivariate normal distribution and apply these to the analysis of multivariate data.
• Use contingency tables to test hypotheses and estimate effect sizes for a variety of discrete multivariate models.

### Syllabus

Mathematical foundations. Revision of vectors, matrices and random variables. New material on random vectors and random matrices.

Working with data. Constructing the n × p data matrix X from a data file. Sample mean vector and covariance and correlation matrices. Unbiased estimation of population mean and variance-covariance. Transformation of data including Mahalanobis, standardisation and logarithmic transformation. Visualisation of data including histograms, scatter plots, kernel density plots and plot matrices.
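
As an illustration of the quantities named above, here is a minimal sketch assuming NumPy and a small, hypothetical data matrix (the numbers are made up and are not taken from the course materials). It computes the sample mean vector, the unbiased covariance matrix, the correlation matrix, and a Mahalanobis transformation of the data.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations of p = 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 4.0],
    [5.0, 3.0],
    [6.0, 7.0],
    [9.0, 5.0],
])
n, p = X.shape

x_bar = X.mean(axis=0)             # sample mean vector (length p)
S = np.cov(X, rowvar=False)        # unbiased covariance matrix (divisor n - 1)
R = np.corrcoef(X, rowvar=False)   # sample correlation matrix

# Mahalanobis transformation z_i = S^{-1/2} (x_i - x_bar), with the
# symmetric inverse square root built from the eigendecomposition of S.
eigvals, eigvecs = np.linalg.eigh(S)
S_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
Z = (X - x_bar) @ S_inv_sqrt

print(x_bar)
print(S)
print(np.cov(Z, rowvar=False))     # approximately the identity matrix
```

After the transformation the columns of `Z` have zero mean and identity sample covariance, which is what makes it a useful simplifying step before further analysis.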

Parametric multivariate statistics. The multivariate normal distribution, including marginal and conditional distributions. Other parametric distributions such as the multivariate log-normal, the multivariate t, and Gaussian mixtures. Maximum likelihood estimation and confidence regions for multivariate statistical models. Hypothesis testing and model selection.
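
For the conditional distributions of the multivariate normal, the standard result is that X1 given X2 = x2 is normal with mean mu1 + S12 S22^{-1} (x2 - mu2) and covariance S11 - S12 S22^{-1} S21. The sketch below, assuming NumPy, evaluates this for a hypothetical bivariate example; the function name `conditional_normal` and the parameter values are illustrative only.

```python
import numpy as np

def conditional_normal(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X[idx1] given X[idx2] = x2, for X ~ N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # Solve linear systems rather than forming the inverse of S22 explicitly.
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov

# Hypothetical bivariate example: distribution of X1 given X2 = 3.5.
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
cond_mean, cond_cov = conditional_normal(mu, Sigma, [0], [1], np.array([3.5]))

print(cond_mean)  # conditional mean of X1
print(cond_cov)   # conditional variance of X1 (does not depend on x2)
```

The marginal distribution needs no calculation at all: it is normal with the corresponding sub-vector of `mu` and sub-matrix of `Sigma`.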

Dimensional reduction. Detailed treatment of principal components analysis as well as discussion of other methods.
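
One common way to carry out principal components analysis is through the eigendecomposition of the sample covariance matrix. The following is a minimal sketch under that assumption, using NumPy and hypothetical toy data; it is not necessarily the presentation used in the course.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations of p = 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 4.0],
    [5.0, 3.0],
    [6.0, 7.0],
    [9.0, 5.0],
])

S = np.cov(X, rowvar=False)              # sample covariance matrix

# eigh returns eigenvalues in ascending order; reorder to descending
# so that the first component explains the most variance.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
loadings = eigvecs[:, order]             # columns are the principal directions

scores = (X - X.mean(axis=0)) @ loadings # principal component scores
explained = eigvals / eigvals.sum()      # proportion of variance per component

print(eigvals)
print(explained)
```

The scores are uncorrelated, and the sample variance of each score column equals the corresponding eigenvalue. Keeping only the first few columns of `scores` gives the reduced-dimension representation.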

Classification. Supervised versus unsupervised learning. Detailed treatment of discriminant analysis and k-means clustering, as well as discussion of other methods.
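
To make the unsupervised case concrete, here is a minimal sketch of k-means clustering using Lloyd's algorithm, assuming NumPy. The function name `k_means`, its parameters, and the toy data are hypothetical choices for illustration; practical implementations add better initialisation and multiple restarts.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids at k distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # New centroid = mean of its assigned points (unchanged if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids

# Hypothetical toy data: two well-separated groups of three points each.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
labels, centroids = k_means(X, k=2)
print(labels)
print(centroids)
```

No class labels are used at any point, which is what distinguishes this from a supervised method such as discriminant analysis, where labelled training data determine the classification rule.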

Discrete multivariate statistics. Discrete multivariate sampling distributions. Construction of contingency tables, and hypothesis testing for different independence and sampling null models. Effect sizes and confidence intervals.
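
As a small worked example of testing independence in a contingency table, the sketch below computes Pearson's chi-squared statistic for a hypothetical 2 × 2 table of counts, together with the odds ratio as an effect size and an approximate 95% Wald confidence interval. It assumes NumPy, applies no continuity correction, and uses a p-value formula that is valid only for one degree of freedom.

```python
import math
import numpy as np

# Hypothetical 2 x 2 table of counts (rows: group, columns: outcome).
table = np.array([[20, 10],
                  [10, 20]], dtype=float)

row_totals = table.sum(axis=1, keepdims=True)
col_totals = table.sum(axis=0, keepdims=True)
n = table.sum()

# Expected counts under the null hypothesis of independence.
expected = row_totals @ col_totals / n

# Pearson chi-squared statistic (no continuity correction).
chi2_stat = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)

# For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

# Effect size: odds ratio with an approximate 95% Wald interval on the log scale.
a, b, c, d = table.ravel()
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(chi2_stat, dof, p_value)
print(odds_ratio, (ci_low, ci_high))
```

For larger tables the statistic and degrees of freedom are computed in the same way, but the p-value then requires the general chi-squared distribution function rather than the one-degree-of-freedom shortcut used here.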

### Assessment methods

| Method | Weight |
|---|---|
| Other | 20% |
| Written exam | 80% |

• Coursework, which will involve applying methods to real data: weighting 20%
• End of semester examination: weighting 80%

### Feedback methods

Feedback will be provided throughout the course, including:

• In tutorials you will be able to ask for and receive feedback on your work and understanding.
• You can receive feedback from the lecturer in person during the office hour or at other times.
• You can receive feedback via the Forum on Blackboard.

### Recommended reading

• C. Chatfield and A. Collins. Introduction to Multivariate Analysis. Chapman & Hall / CRC Texts in Statistical Science. Taylor & Francis, 1981.

An introductory book slightly below the level of the course.

• A. C. Rencher. Multivariate Statistical Inference and Applications. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1998.

The main course text.

• Y. Bishop, S. E. Fienberg, and P. W. Holland. Discrete Multivariate Analysis: Theory and Practice. Massachusetts Institute of Technology Press, Cambridge, 1975.

Covers the discrete case.

• S. Rogers and M. Girolami. A First Course in Machine Learning. CRC Press, Boca Raton, Florida, 2nd edition, 2016.

Deals with aspects of machine learning relevant to this course.

### Study hours

| Scheduled activity hours | |
|---|---|
| Lectures | 33 |
| Tutorials | 11 |

| Independent study hours | |
|---|---|
| Independent study | 106 |

### Teaching staff

| Staff member | Role |
|---|---|
| Thomas House | Unit coordinator |