MSc Statistics / Course details

Year of entry: 2021

Course unit details: Multivariate Statistics

Unit code: MATH68061
Credit rating: 15
Unit level: FHEQ level 7 (master's degree or fourth year of an integrated master's degree)
Teaching period(s): Semester 1
Offered by: Department of Mathematics
Available as a free choice unit?: No

Overview

In practice, most data sets are multivariate: they consist of observations on several different variables for each of a number of individuals or objects. Such data sets arise in many areas of science, the social sciences and medicine, and techniques for their analysis form an important area of statistics. This course unit introduces a number of these techniques, some of which are generalisations of univariate methods, while others are completely new (e.g. principal component analysis). The main part of the course focuses on continuous multivariate data, in common with the level 3 module of the same name (MATH38061). The enhancement, i.e. the additional material in this unit, concentrates mainly on multivariate methods for discrete data.

Pre/co-requisites

Students are not permitted to take more than one of MATH38061 or MATH48061 for credit in the same or different undergraduate year.

Students are not permitted to take MATH48061 and MATH68061 for credit in an undergraduate programme and then a postgraduate programme.

Note that MATH68061 is an example of an enhanced level 3 module as it includes all the material from MATH38061.

If a student takes level 3 modules that are enhanced to produce level 6 modules on an MSc programme within the School of Mathematics, they are limited to a maximum of two such modules; no alternative arrangements are available.

Aims

To familiarise students with the ideas and methodology of certain multivariate methods together with their application in data analysis using the R statistical computing package.

Learning outcomes

On successful completion of the course students will be able to:

• Work with random vectors and matrices to derive results relevant to multivariate statistical inference.
• Import multivariate data stored as plain text into statistical software, visualise the data and run the multivariate analysis techniques covered in the course on it.
• Use data or summary statistics of data to calculate sample mean vectors, variance-covariance matrices, and correlation matrices, as well as to define transformations to simplify analysis.
• Derive the principal components of data with a given covariance structure.
• Define the difference between supervised and unsupervised learning, together with an algorithm for classification of data into two classes for each case.
• Perform unbiased estimation, maximum likelihood estimation and hypothesis testing for multivariate data.
• Derive key properties of the multivariate normal distribution and apply these to the analysis of multivariate data.
• Use contingency tables to test hypotheses and estimate effect sizes for a variety of discrete multivariate models.
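
The sample statistics named in the learning outcomes above (the sample mean vector, the variance-covariance matrix and the correlation matrix), together with the principal components of a 2 x 2 covariance matrix, can be illustrated with a short sketch. The course carries out such calculations in R; the version below is plain Python using only the standard library, and the function names and the toy data are hypothetical, not taken from the course materials. It assumes an n x p data matrix stored as a list of rows and uses the unbiased divisor n - 1 for the covariance matrix.

```python
import math

def mean_vector(X):
    """Sample mean vector: the column-wise average of an n x p data matrix."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]

def covariance_matrix(X):
    """Unbiased sample covariance matrix S, using divisor n - 1."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [
        [sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1) for k in range(p)]
        for j in range(p)
    ]

def correlation_matrix(S):
    """Correlation matrix R with entries r_jk = s_jk / sqrt(s_jj * s_kk)."""
    p = len(S)
    return [[S[j][k] / math.sqrt(S[j][j] * S[k][k]) for k in range(p)] for j in range(p)]

def principal_components_2x2(S):
    """Eigenvalues (PC variances, largest first) and unit first-PC direction of a 2 x 2 symmetric matrix."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    if b != 0:
        v = (b, lam1 - a)      # solves (S - lam1 I) v = 0 when the off-diagonal entry is non-zero
    elif a >= d:
        v = (1.0, 0.0)         # diagonal S: the first PC is the variable with the larger variance
    else:
        v = (0.0, 1.0)
    norm = math.hypot(*v)
    return (lam1, lam2), (v[0] / norm, v[1] / norm)

# Hypothetical toy data: n = 4 observations on p = 2 variables.
X = [[2, 1], [4, 3], [6, 2], [8, 6]]
xbar = mean_vector(X)                            # [5.0, 3.0]
S = covariance_matrix(X)                         # [[20/3, 14/3], [14/3, 14/3]]
R = correlation_matrix(S)                        # off-diagonal r = sqrt(0.7), about 0.837
(lam1, lam2), v1 = principal_components_2x2(S)   # lam1 about 10.44, lam2 about 0.894
```

The closed-form eigen-decomposition is used only because p = 2 keeps the sketch free of external libraries; for general p one would use a numerical eigen-solver (in R, `eigen` or `prcomp`). Here the first principal component accounts for about 92% of the total variance (lam1 divided by the trace 34/3).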

Syllabus

• Introductory ideas and basic concepts - random vectors and their distribution, linear transformations (including the Mahalanobis transformation), sample statistics and their properties, overall measures of dispersion in p-space, distances in p-space, simple graphical techniques.
• Cluster Analysis - aims, hierarchical algorithms, the dendrogram.
• Principal component analysis - definition and derivation of population PCs, sample PCs, practical considerations, geometrical properties, examples.
• The Multivariate Normal (MVN) distribution - definition, properties, conditional distributions, the Wishart and Hotelling T-squared distributions, sampling distributions of the sample mean vector and covariance matrix, maximum likelihood estimation of the mean vector and covariance matrix.
• Hypothesis testing and confidence intervals (one-sample procedures) - the generalized likelihood ratio test, tests on the mean vector, CIs for the components of the mean vector.
• Hypothesis testing and confidence intervals (two independent sample procedures) - tests on the difference between two mean vectors, testing equality of covariance matrices, CIs for the differences in the components of the mean vectors.
• Profile Analysis.
• Discriminant Analysis. [Guided Coursework]
• Techniques for discrete multivariate data, including discrete multivariate vectors, two-way contingency tables, sampling distributions, the odds ratio, testing independence, correspondence analysis, higher order contingency tables, conditional independence, and an introduction to log-linear models.
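
For the discrete-data material in the final syllabus item (two-way contingency tables, the odds ratio and testing independence), a minimal sketch of Pearson's chi-squared test of independence and the sample odds ratio may help. It is an illustration only: the course itself uses R, the function names and example counts are hypothetical, and the closed-form p-value helper is valid only for 1 degree of freedom (i.e. a 2 x 2 table). As usual, the chi-squared approximation assumes the expected counts are not too small.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: E_ij = (row total i) * (column total j) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_squared(table):
    """Pearson's X^2 = sum of (O - E)^2 / E over all cells, with (I - 1)(J - 1) degrees of freedom."""
    expected = expected_counts(table)
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

def chi_squared_1df_p_value(x2):
    """Upper-tail p-value for chi-squared on 1 df: P(X > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))."""
    return math.erfc(math.sqrt(x2 / 2))

def odds_ratio(table):
    """Sample odds ratio n11 * n22 / (n12 * n21) for a 2 x 2 table with non-zero off-diagonal cells."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

# Hypothetical 2 x 2 table of counts (rows: exposed / not exposed, columns: outcome / no outcome).
table = [[20, 10], [10, 20]]
x2, df = pearson_chi_squared(table)   # every expected count is 15, so x2 = 100/15, about 6.667, df = 1
p = chi_squared_1df_p_value(x2)       # roughly 0.01: below 0.05, so independence is rejected at the 5% level
theta = odds_ratio(table)             # 4.0
```

For larger I x J tables the same `pearson_chi_squared` function applies, but the p-value needs the general chi-squared distribution function (in R, `chisq.test` handles this directly).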

Assessment methods

Method Weight
Other 20%
Written exam 80%
• Coursework: weighting 20%
• End of semester examination: weighting 80%

Feedback methods

Feedback tutorials provide an opportunity to discuss students' work and to give feedback on their understanding. Coursework or in-class tests (where applicable) also give students an opportunity to receive feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour.

Recommended reading

• Chatfield, C. and Collins, A. J., An Introduction to Multivariate Analysis, Chapman & Hall 1983.
• Krzanowski, W. J., Principles of Multivariate Analysis: A User's Perspective, Oxford University Press 1990.
• Johnson, R. A. and Wichern, D. W., Applied Multivariate Statistical Analysis, 3rd edition, Prentice Hall 1992.
• Bishop, Y. M., Fienberg, S. E. and Holland, P. W., Discrete Multivariate Analysis, Springer 2007.

Study hours

Scheduled activity hours
Lectures 33
Tutorials 11
Independent study hours
Independent study 106

Teaching staff

Staff member Role
Thomas House Unit coordinator