- UCAS course code: G100
- UCAS institution code: M20
Course unit details:
Multivariate Statistics and Machine Learning
Unit code | MATH38161 |
---|---|
Credit rating | 10 |
Unit level | Level 3 |
Teaching period(s) | Semester 1 |
Available as a free choice unit? | No |
Overview
Multivariate statistical models and methods are essential for analysing complex-structured and possibly high-dimensional data from all areas of science and industry, ranging from biology, medicine and genetics to finance and sociology. Multivariate statistics also provides the foundation of many machine learning algorithms.
The first part of this module covers the foundations of multivariate data analysis, e.g., multivariate random variables, covariance and correlation, and multivariate regression. In addition, related approaches such as dimension reduction and latent variable models are discussed.
The second part of the course is concerned with multivariate approaches for statistical learning in supervised and unsupervised settings, including techniques from machine learning, and their application in pattern recognition, classification, and high-dimensional data analysis.
Pre/co-requisites
Unit title | Unit code | Requirement type | Description |
---|---|---|---|
Probability and Statistics 2 | MATH27720 | Pre-Requisite | Compulsory |
Desirable
Good working knowledge of the R statistical programming language.
Students are not permitted to take both MATH38161 and MATH48061 for credit in the same undergraduate year. Students who take MATH48061 for credit in an undergraduate programme are not permitted to take MATH68061 for credit in a subsequent postgraduate programme.
Aims
To familiarise students with the fundamental concepts and ideas underlying multivariate statistical data analysis methods and related supervised and unsupervised machine learning approaches for pattern recognition and classification, as well as with their practical implementation and application using the R statistical programming language.
Learning outcomes
On successful completion of the course students will be able to:
- use the programming language R for multivariate data analysis and graphical presentation
- apply dimension reduction techniques such as PCA and CCA
- perform clustering and classification using tools from both statistics and machine learning
- make good choices among available parametric and nonparametric approaches
- analyse high-dimensional data sets with suitable regularisation techniques
Syllabus
- Multivariate normal model: distributional properties; estimation of covariance and correlation matrices in both large- and small-sample settings (using likelihood and regularised/shrinkage estimation); connection with multivariate regression. [4]
- Dimension reduction and latent variable models: whitening transformations, Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), Factor Analysis (FA). [4]
- Unsupervised learning / clustering: model-based clustering (finite normal mixture models), algorithmic approaches (e.g. K-means, hierarchical clustering). [4]
- Supervised learning / classification: Diagonal, Linear, and Quadratic Discriminant Analysis (DDA, LDA, QDA) and regularised versions for high-dimensional data analysis; further approaches to classification (e.g. support vector machines). [4]
- Nonlinear and nonparametric models: splines, decision trees, random forests. [4]
Assessment methods
Method | Weight |
---|---|
Coursework | 30% |
End-of-semester written exam | 70% |
Feedback methods
Computer labs will provide an opportunity for students to try out the methods on real data and to get feedback from the instructor. Coursework projects also provide an opportunity for students to receive feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour or after class.
Recommended reading
- Härdle, W.K., and L. Simar. 2015. Applied Multivariate Statistical Analysis. Fourth edition. Springer. Available to download within UoM from https://link.springer.com/book/10.1007/978-3-662-45171-7
- Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. PDF freely available online from https://web.stanford.edu/~hastie/ElemStatLearn/
- James, G., D. Witten, T. Hastie and R. Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. Springer. PDF freely available online from http://www-bcf.usc.edu/~gareth/ISL/
Study hours
Scheduled activity hours | |
---|---|
Lectures | 11 |
Tutorials | 11 |
Independent study hours | |
---|---|
Independent study | 78 |
Teaching staff
Staff member | Role |
---|---|
Yuk Ka Chung | Unit coordinator |
Additional notes
The independent study hours will normally comprise the following. During each week of the taught part of the semester:
- You will normally have approximately 60-75 minutes of video content, and you would typically spend approximately 2-2.5 hours per week studying this content independently.
- You will normally have exercise or problem sheets, on which you might spend approximately 1.5 hours per week.
- There may be other tasks assigned to you on Blackboard, for example short quizzes or short-answer formative exercises.
- In some weeks you may be preparing coursework or revising for mid-semester tests.
Together with the timetabled classes, you should be spending approximately 6 hours per week on this course unit.
The remaining independent study time comprises revision for and taking the end-of-semester assessment.
The above times are indicative only and may vary depending on the week and the course unit. More information can be found on the course unit’s Blackboard page.