Course unit details:
Multivariate Statistics
Unit code | MATH68061 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 1 |
Available as a free choice unit? | No |
Overview
In practice, most data sets are multivariate, consisting of observations on several variables for each individual or object. Such data sets arise in many fields, including science, social sciences, and medicine, making techniques for their analysis an important area of statistics. This course introduces a variety of techniques, some of which generalise univariate methods, while others introduce new approaches (e.g. principal component analysis). The course focuses on continuous multivariate data, providing students with the theoretical tools and foundational knowledge necessary for advanced machine learning studies. Students will gain practical experience using R for data analysis throughout the course.
Pre/co-requisites
Unit title | Unit code | Requirement type | Description |
---|---|---|---|
Probability and Statistics 2 | MATH27720 | Pre-Requisite | Recommended |
Students are not permitted to take more than one of MATH38061 or MATH48061 for credit in the same or different undergraduate year.
Students are not permitted to take MATH48061 and MATH68061 for credit in an undergraduate programme and then a postgraduate programme.
Note that MATH68061 is an example of an enhanced level 3 module as it includes all the material from MATH38061.
When a student has taken level 3 modules which are enhanced to produce level 6 modules on an MSc programme taken within the School of Mathematics, they are limited to a maximum of two such modules (with no alternative arrangements available otherwise).
Aims
The unit aims to:
Familiarise students with the ideas and methodology of foundational multivariate statistics, preparing them for advanced machine learning concepts. This includes their application in data analysis using the R statistical computing package.
Learning outcomes
On successful completion of the course students will be able to:
- Explain the properties of random vectors and matrices, and calculate sample statistics (mean vectors, covariance matrices, correlation matrices) with transformations to simplify analysis.
- Apply the multivariate normal distribution and related distributions (e.g. Wishart, Hotelling’s T-squared) to analyse and model multivariate data.
- Estimate parameters (mean vector, covariance matrix) from multivariate data using maximum likelihood and Bayesian methods.
- Formulate and test hypotheses about mean structures and covariance matrices, and interpret confidence intervals in multivariate contexts.
- Apply dimension reduction techniques such as PCA, CCA and correspondence analysis to summarise and simplify multivariate data.
- Classify multivariate data using clustering and discriminant analysis.
- Apply popular computing software such as R or Python to visualise, analyse, and interpret multivariate data.
Syllabus
Part A – Core concepts and inferential techniques in multivariate analysis
- Random vectors and matrices – introductory ideas and basic concepts, linear transformations, sample statistics and their properties, overall measures of dispersion in p-space, distances in p-space, simple graphical techniques
- Multivariate normal (MVN) distribution and related distributions – definition, properties, conditional distributions, the Wishart and Hotelling’s T-squared distributions, sampling distributions of the sample mean vector and covariance matrix, maximum likelihood estimation of the mean vector and covariance matrix
- Inferences about mean structures – hypothesis testing and confidence intervals (one sample and two independent sample procedures), generalised likelihood ratio test, CI for the components of the mean vector(s), other topics such as MANOVA and profile analysis
- Inferences about covariance and correlation matrices – testing covariance structure of a single covariance matrix, testing equality of covariance matrices
Part B – Advanced techniques in multivariate analysis
- Bayesian inference for MVN – including conjugate models with potential exploration of approximation techniques and Monte Carlo methods
- Dimension reduction – techniques such as principal component analysis, factor analysis, canonical correlation analysis, and correspondence analysis
- Cluster Analysis – aims, K-means and hierarchical algorithms, the dendrogram
- Discriminant Analysis – LDA, QDA
Teaching and learning methods
Teaching is composed of two hours of lectures and one tutorial class per week. Teaching materials will be made available online for reference and review.
Assessment methods
Method | Weight |
---|---|
Other | 20% |
Written exam | 80% |
- Coursework: weighting 20%
- End of semester examination: weighting 80%
Feedback methods
Feedback tutorials provide an opportunity to discuss students' work and to give feedback on their understanding. Coursework and in-class tests (where applicable) also provide an opportunity for students to receive feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour.
Recommended reading
- Chatfield, C. and Collins, A. J., An Introduction to Multivariate Analysis, Chapman & Hall 1983.
- Krzanowski, W. J., Principles of Multivariate Analysis: A User's Perspective, Oxford University Press 1990.
- Johnson, R. A. and Wichern, D. W., Applied Multivariate Statistical Analysis 3rd edition, Prentice Hall 1992.
- Bishop, Y. M., Fienberg, S. E. and Holland, P. W., Discrete Multivariate Analysis, Springer 2007.
Study hours
Scheduled activity hours | |
---|---|
Lectures | 22 |
Tutorials | 11 |
Independent study hours | |
---|---|
Independent study | 117 |
Teaching staff
Staff member | Role |
---|---|
Yuk Ka Chung | Unit coordinator |
Additional notes
The independent study hours will normally comprise the following. During each week of the taught part of the semester:
- You will normally have approximately 75-120 minutes of video content. Normally you would spend approximately 2.5-4 hrs per week studying this content independently.
- You will normally have exercise or problem sheets, on which you might spend approximately 2-2.5 hrs per week.
- There may be other tasks assigned to you on Blackboard, for example short quizzes, short-answer formative exercises or directed reading.
- In some weeks you may be preparing coursework or revising for mid-semester tests.
Together with the timetabled classes, you should be spending approximately 9 hours per week on this course unit.
The remaining independent study time comprises revision for and taking the end-of-semester assessment.
The above times are indicative only and may vary depending on the week and the course unit. More information can be found on the course unit’s Blackboard page.