# MSc Data Science (Applied Urban Analytics)

Year of entry: 2023

## Course unit details: Statistics and Machine Learning 1: Statistical Foundations

Unit code DATA70121 15 FHEQ level 7 – master's degree or fourth year of an integrated master's degree No

### Overview

The module combines lectures, designed to communicate key ideas in statistics and machine learning, with practical sessions in which students will apply, and in simple cases develop, tools in Python and, where appropriate, other industry-standard languages. There are five main sections:

• Thinking probabilistically: random variables, distributions and models for data.

• Exploratory data analysis: kinds of data, descriptive statistics and visualisation tools.

• Statistical estimation: point estimation, bias, maximum likelihood estimates, tests of difference, confidence intervals and hypothesis testing, Bayesian estimation, prior and posterior distributions, conjugate priors.

• Comparison and selection of models: linear regression, generalised linear regression, measures of goodness-of-fit and predictive power, comparison of models, generalisation to semi- and non-parametric approaches as well as hierarchical and spatial models, overfitting and regularisation.

• Special Topic: Depending on the teaching staff, one special topic will be chosen to demonstrate the general concepts in more depth. A likely example is Social Networks: networks and statistical models for them including Erdős-Rényi random graphs and exponential random graph models; network statistics including degree distribution, homophily and transitivity.
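As a flavour of the kind of small tool students might build in the practical sessions, the following is a minimal sketch of the random-graph model named in the special topic. It is not taken from the unit materials: the function names `erdos_renyi_edges` and `degree_sequence` and the values of `n` and `p` are illustrative assumptions, and it uses only the Python standard library.

```python
import random
from collections import Counter
from itertools import combinations


def erdos_renyi_edges(n, p, seed=None):
    """Sample the edge list of a G(n, p) random graph.

    Each of the n*(n-1)/2 possible undirected edges is included
    independently with probability p. (Hypothetical helper name.)
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def degree_sequence(n, edges):
    """Return a list whose i-th entry is the degree of node i."""
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# Illustrative parameter values only, not prescribed by the unit.
n, p = 200, 0.05
edges = erdos_renyi_edges(n, p, seed=1)
degrees = degree_sequence(n, edges)

mean_degree = sum(degrees) / n
degree_counts = Counter(degrees)  # empirical degree distribution

# In G(n, p) the expected degree of every node is (n - 1) * p.
print(f"edges: {len(edges)}")
print(f"mean degree: {mean_degree:.2f} (expected {(n - 1) * p:.2f})")
```

Comparing the empirical mean degree with the theoretical value `(n - 1) * p` is a simple instance of checking a probability model against simulated data.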

### Aims

The unit aims to:

1. introduce students to the main ideas and methods of statistical approaches to data science, based on probability models, likelihoods and estimators, including such modern developments as Gaussian processes and regularisation;
2. enable students to explore data and to choose, fit, interpret and critique a range of standard and advanced statistical models;
3. enable students to communicate statistical analyses, in writing and in presentations, to audiences with varying levels of technical expertise.

### Learning outcomes

Students should be able to:

• Explain what probabilistic models are and can do: the sorts of relationships they can capture and the sorts of understanding and predictions they can yield.
• Explain and critique statistical models.
• Perform exploratory data analyses, fit standard statistical models and prepare illuminating visualisations.
• Present the results of statistical analyses, both in writing and orally, justifying modelling choices and communicating effectively with audiences at various levels of statistical expertise.

### Assessment methods

| Method | Weight |
| --- | --- |
| Written exam | 40% |
| Written assignment (inc essay) | 60% |

### Recommended reading

• Sheldon Ross (2014), A First Course in Probability, 9th edition, Pearson. ISBN 9780321926678
• Thomas Haslwanter (2016), An Introduction to Statistics with Python, Springer. ISBN 9783319283159
• Simon Rogers & Mark Girolami (2017), A First Course in Machine Learning, 2nd edition, Chapman & Hall/CRC. ISBN 9781498738484
• G. James, D. Witten, T. Hastie, and R. Tibshirani (2013), An Introduction to Statistical Learning with Applications in R, Springer-Verlag, New York. ISBN 9781461471370
• T. Hastie, R. Tibshirani, and J. Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, Springer-Verlag, New York. ISBN 9780387848587
• John Tukey (1977), Exploratory Data Analysis, Addison Wesley. ISBN 0201076160
• Carl E. Rasmussen and Christopher K. I. Williams (2009), Gaussian Processes for Machine Learning, MIT Press. ISBN 026218253X

### Teaching staff

| Staff member | Role |
| --- | --- |
| Mark Muldoon | Unit coordinator |
