MSc Statistics / Course details

Year of entry: 2024

Course unit details:
Generalised Linear Models and Survival Analysis

Course unit fact file
Unit code MATH68052
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 2
Offered by Department of Mathematics
Available as a free choice unit? No

Overview

This course unit moves beyond linearity and normality, the two restrictive assumptions of linear regression. We study how the linear relationship between the mean response and the predictors is extended to a nonlinear one, and how the normal distribution is replaced by a general class called the exponential family of distributions. The resulting generalised linear model (GLM) includes the normal linear model as a special case. It can be used to analyse discrete data, such as binomial and Poisson counts, and categorical data, which arise frequently in biomedical and industrial applications.

Survival, or time-to-event, data are positive valued and sometimes censored. The survival and hazard functions are the central concepts, and we estimate them both nonparametrically and parametrically. For parametric estimation, a distribution more suitable than the normal, such as the Weibull, is often used. Regression problems, in which one or more covariates affect the survival or hazard function, are also covered through proportional hazards, Weibull regression and accelerated failure time models.

Pre/co-requisites

Unit title | Unit code | Requirement type | Description
Probability 2 | MATH20701 | Pre-Requisite | Recommended
Statistical Methods | MATH20802 | Pre-Requisite | Recommended

Students are not permitted to take both MATH38052 and MATH48052 for credit, whether in the same or in different undergraduate years. Students are also not permitted to take MATH48052 for credit in an undergraduate programme and then MATH68052 for credit in a postgraduate programme.

Aims

The aims of the GLM part are to cover an important aspect of modern statistical modelling in an integrated way, and to develop the properties and uses of GLMs, focusing on situations in which the response variable is discrete. The aim of the survival analysis (SA) part is to introduce some standard techniques for modelling and analysing survival data.

Learning outcomes

On successful completion of the unit students will be able to:  

  • use generalised linear models (GLMs), including logistic regression and log-linear models with a Poisson response, to analyse data with dependence on one or more explanatory variables;
  • fit lifetime distributions and use proportional hazards (PH) and accelerated failure time (AFT) models to analyse survival/lifetime data;
  • write down the fitted model, assess goodness-of-fit, test significance of parameters, compare models and use the chosen model to calculate various quantities of interest;
  • write down a GLM with factors/covariates as appropriate, state the associated assumptions and constraints, derive the likelihood equation and algorithms for model fitting;
  • define, derive and interpret the survival function, hazard rate and cumulative hazard, estimate them parametrically and nonparametrically, construct confidence intervals and test equality between groups;
  • prove that a given distribution belongs to the exponential family, work out its mean, variance and variance function, and derive the canonical link.
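
As an illustration of the last outcome, the following is a brief sketch for the Poisson distribution. It assumes the parametrisation used by McCullagh and Nelder, with natural parameter \theta, dispersion parameter \phi and functions a, b and c; the notation used in the unit itself may differ.

```latex
\begin{align*}
% General exponential family form (assumed parametrisation)
f(y;\theta,\phi) &= \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\} \\
% Poisson distribution with mean \mu, rewritten in that form
f(y;\mu) &= \frac{e^{-\mu}\mu^{y}}{y!} = \exp\left\{ y\log\mu - \mu - \log y! \right\} \\
% Matching terms
\theta &= \log\mu, \quad b(\theta) = e^{\theta}, \quad a(\phi) = 1, \quad c(y,\phi) = -\log y! \\
% Mean and variance from the derivatives of b
E(Y) &= b'(\theta) = e^{\theta} = \mu, \qquad \operatorname{Var}(Y) = a(\phi)\, b''(\theta) = \mu \\
% Variance function and canonical link
V(\mu) &= \mu, \qquad g(\mu) = \theta = \log\mu
\end{align*}
```

The canonical link is the function that maps the mean to the natural parameter, which here is the log link used in log-linear Poisson models.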


 

Syllabus

1. Introduction: response and predictors, linear models in matrix notation, (weighted) least squares estimation. Maximum likelihood estimation under the normal assumption. Limitations of the linear model. [3] 

2. The exponential family of distributions: Definition and examples. Derivation of mean and variance. Maximum likelihood estimation. [3] 

3. Generalised linear models: linear predictor, link function, canonical link, likelihood equation, the iteratively reweighted least squares algorithm, Fisher information, tests on individual parameters, deviance and scaled deviance, Pearson's chi-square. Residuals and residual plots. Examples of model fitting in R. [6]

4. Hypothesis tests for model reduction: chi-square or F-tests. Analysis of deviance examples: two factors or a factor and a covariate. [2] 

5. Logistic regression. Odds and odds ratio. LD50. [2]

6. Log-linear Poisson models with an offset. [2]

7. Contingency tables. [2]

8. Survival data. Censoring. The survival, hazard and cumulative hazard functions.  [1]

9. The Weibull and some other lifetime distributions. Fitting the Weibull distribution to survival data with or without censoring. [2] 

10. Kaplan-Meier estimate of the survival function. Nonparametric estimates of hazard and cumulative hazard functions. Confidence intervals. [3] 

11. Proportional hazards models and Cox regression: assumptions and interpretation. Partial likelihood. Weibull regression and accelerated failure time models. [4]
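
To make item 10 concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimate for right-censored data. It is written in plain Python for illustration only (the unit itself fits models in R), and the function name `kaplan_meier` and its argument layout are hypothetical. It assumes the usual convention that an observation censored at time t is still at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (t, S(t)) pairs, one per distinct event time.

    times  : observed times (event time or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    (Hypothetical helper for illustration; not taken from the unit.)
    """
    # Number of observed events at each time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Every subject leaves the risk set at their observed time
    leaving = Counter(times)

    at_risk = len(times)
    surv = 1.0
    estimate = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by (1 - d / n) at each event time
            surv *= 1.0 - d / at_risk
            estimate.append((t, surv))
        # Remove events and censored observations after time t
        at_risk -= leaving[t]
    return estimate
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` has event times 1, 2, 3 and 5, and the estimate drops to 5/6, 2/3, 4/9 and 0 at those times. Censored observations never cause a drop; they only reduce the number at risk for later event times.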

Assessment methods

Method Weight
Other 20%
Written exam 80%
  • Coursework (two-week take-home assignment): 20%
  • End-of-semester examination: 80%

Feedback methods

Feedback tutorials give students an opportunity to have their work discussed and to receive feedback on their understanding. Coursework or in-class tests (where applicable) provide a further opportunity for feedback. Students can also get feedback on their understanding directly from the lecturer, for example during the lecturer's office hour.

Recommended reading

Recommended:

  • Dobson, A. J. and Barnett, A. G., An Introduction to Generalized Linear Models, Chapman & Hall 2018.
  • McCullagh, P. and Nelder, J. A., Generalized Linear Models, Chapman & Hall 1990.
  • Moore, D. F., Applied Survival Analysis Using R, Springer 2016.

Study hours

Scheduled activity hours
Lectures 32
Tutorials 6
Independent study hours
Independent study 112

Teaching staff

Staff member Role
Jingsong Yuan Unit coordinator

Additional notes

This course unit detail provides the framework for delivery in 20/21 and may be subject to change due to any additional Covid-19 impact.  

Please see Blackboard / course unit related emails for any further updates.
