MSci Biology / Course details

Year of entry: 2024

Course unit details:
MSci Reproducible Data Science

Course unit fact file
Unit code BIOL33031
Credit rating 10
Unit level Level 3
Teaching period(s) Semester 1
Available as a free choice unit? No

Overview

This unit will provide students with the skills needed to engage in reproducible data science. Students will learn how to wrangle data, build data visualisations, and model their data using the open-source data science software R. Each session will be run as a combined seminar and hands-on coding workshop. Students will learn how to use a reproducible workflow to produce analyses that others can rerun and verify. They will also learn general computational skills, such as using Git and GitHub for version control and Binder for building reproducible computational environments. Graduates with data science skills are in high demand, and skills in R are particularly desirable to employers across the academic, industrial, and business sectors. This unit will provide students with a grounding in data science using R and the knowledge to build on this foundation to develop more focused skills (such as machine learning using R).

Aims

The unit aims:

      To familiarise students with the tools to engage in reproducible research and data science practices.

      To familiarise students with the principles of Reproducibility and Open Science (incl. pre-registration of experiments, open data, and open analysis) and the problems that arise from Questionable Research Practices (QRPs).

      To familiarise students with the principles of programming and analysis in R (incl. linear mixed models), and the use of R Markdown to generate reproducible analyses and presentations.

      To provide students with the experience of advanced decision-making in the application of different statistical tests to different research questions.

      To provide students working in small groups with the experience of using and programming in R for reproducible data analysis.

Syllabus

This unit will consist of six workshops, each involving a mix of seminar and hands-on programming. The six workshops are as follows:

1. Reproducibility and R
2. The Linear Model (Regression)
3. The Linear Model (ANOVA)
4. Mixed Models
5. Data Simulation and Advanced Data Visualisation
6. Reproducible Computational Environments and Presentations
 

Teaching and learning methods

Practical sessions in computer labs.

Knowledge and understanding

Demonstrate an understanding of the principles of Open Science and the need for reproducibility in research.
Develop an understanding of the logic underlying the use of programming and building statistical models in R, and the range of circumstances appropriate for their use.

Intellectual skills

Design and interpret complex statistical models using diverse approaches.

Practical skills

Acquire experience of cutting-edge data science methodologies for reproducible research.

Transferable skills and personal qualities

Problem solving.
Programming.
Data presentation.
Time management.

Employability skills

Analytical skills: Students will learn the basics of coding and building statistical models in R.
Innovation/creativity: Students will be encouraged to develop their coding skills and apply them to new research problems (including extracting meaning from large data sets).
Project management: Students will develop coding skills and solutions for all stages of the reproducible research workflow.
Written communication: Students will produce coursework using R Markdown, which combines code, output, and narrative to produce a reproducible document.

Assessment methods

Method Weight
Written assignment (inc essay) 100%

One R-based assignment produced using R Markdown worth 100%.

Feedback methods

During the hands-on coding sessions, students will receive formative feedback associated with each of the practical problems that they will be engaged with.

Recommended reading

Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly. (https://r4ds.had.co.nz)

Study hours

Scheduled activity hours
Practical classes & workshops 12
Tutorials 12
Independent study hours
Independent study 76

Teaching staff

Staff member Role
Danna Gifford Unit coordinator
