MSc Data Science (Computer Science Data Informatics)

Year of entry: 2025

Course unit details:
Statistics and Machine Learning 1: Statistical Foundations

Course unit fact file
Unit code	DATA70121
Credit rating	15
Unit level	FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s)	Semester 1
Available as a free choice unit?	No

Overview

The module combines lectures, designed to communicate key ideas in statistics and machine learning, with practical sessions in which students will apply and, in simple cases, develop tools using Python and, where appropriate, other industry-standard languages such as R.

There are five main sections:

1. Thinking probabilistically: random variables, distributions and models for data.

2. Exploratory data analysis: kinds of data, descriptive statistics and visualisation tools.

3. Statistical estimation: point estimation, bias, maximum likelihood estimates, tests of difference, confidence intervals and hypothesis testing, Bayesian estimation, prior and posterior distributions, conjugate priors.

4. Comparison and selection of models: linear regression, generalised linear regression, measures of goodness-of-fit and predictive power, comparison of models, generalisation to semi- and non-parametric approaches as well as hierarchical and spatial models, overfitting and regularisation.

5. Special Topic: Depending on the teaching staff, a special topic will be chosen to demonstrate the general concepts in more depth. A likely example is Social Networks: networks and statistical models for them, including Erdős–Rényi random graphs and exponential random graph models; network statistics, including degree distribution, homophily and transitivity.

Aims

The unit aims to:

• introduce students to the main ideas and methods of statistical approaches to data science, based on probability models, likelihoods and estimators, including such modern developments as Gaussian processes and regularisation;

• enable students to explore data and to choose, fit, interpret and critique a range of standard and advanced statistical models;

• enable students to communicate statistical analyses, in writing and in presentations, to audiences with varying levels of technical expertise.

Learning outcomes

Students should be able to:

• Explain what probabilistic models are and can do: the sorts of relationships they can capture and the sorts of understanding and predictions they can yield;

• Explain and critique statistical models;

• Perform exploratory data analyses, fit standard statistical models and prepare illuminating visualisations;

• Present the results of statistical analyses, both in writing and orally, justifying modelling choices and communicating effectively with audiences at various levels of statistical expertise.

Teaching and learning methods

Lectures will introduce key ideas from probability and statistics and explain how to use these ideas to interpret the results of, for example, regression models. Computer-based practicals will allow students to develop their software skills and to apply standard tools from R, Python or any other industry-standard language to perform statistical analyses and prepare visualisations.

Assessment methods

Method	Weight
Written exam	80%
Written assignment (including essay)	20%

Feedback methods

Feedback available via Turnitin

Teaching staff

Staff member	Role
Mark Muldoon	Unit coordinator
