- UCAS course code: 3A48
- UCAS institution code: M20
Course unit details:
MSci Reproducible Data Science
Unit code | BIOL33031 |
---|---|
Credit rating | 10 |
Unit level | Level 3 |
Teaching period(s) | Semester 1 |
Available as a free choice unit? | No |
Overview
This unit will provide students with the skills needed to engage in reproducible data science. Students will learn how to wrangle data, build data visualisations, and model their data using R, the open-source programming language and environment for statistical computing. Each session will be run as a combined seminar and hands-on coding workshop. Students will learn how to use a reproducible workflow to generate reproducible analyses. They will also learn general computational skills, such as using git and GitHub for version control and Binder for building reproducible computational environments. Graduates with data science skills are in high demand, and R skills in particular are sought by employers across the academic, industrial, and business sectors. This unit will give students a grounding in data science using R and the knowledge to build on this foundation when developing more specialised skills (such as machine learning in R).
Aims
The unit aims:
• To familiarise students with the tools to engage in reproducible research and data science practices.
• To familiarise students with the principles of Reproducibility and Open Science (incl. pre-registration of experiments, open data, and open analysis) and the problems that arise from Questionable Research Practices (QRPs).
• To familiarise students with the principles of programming and analysis in R (incl. linear mixed models), and the use of R Markdown to generate reproducible analyses and presentations.
• To give students experience of advanced decision-making when choosing the appropriate statistical test for a given research question.
• To provide students working in small groups with the experience of using and programming in R for reproducible data analysis.
Syllabus
This unit consists of six workshops, each combining a seminar with hands-on programming. The six workshops are as follows:
1. Reproducibility and R
2. The Linear Model (Regression)
3. The Linear Model (ANOVA)
4. Mixed Models
5. Data Simulation and Advanced Data Visualisation
6. Reproducible Computational Environments and Presentations
Teaching and learning methods
Practical sessions in computer labs.
Knowledge and understanding
Demonstrate an understanding of the principles of Open Science and the need for reproducibility in research.
Develop an understanding of the logic underlying programming and building statistical models in R, and of the circumstances in which different models are appropriate.
Intellectual skills
Design and interpret complex statistical models using diverse approaches.
Practical skills
Acquire experience of cutting-edge data science methodologies for reproducible research.
Transferable skills and personal qualities
Problem solving.
Programming.
Data presentation.
Time management.
Employability skills
- Analytical skills
- students will learn the basics of coding and building statistical models in R.
- Innovation/creativity
- students will be encouraged to develop their coding skills and apply them to new research problems (including extracting meaning from large data sets).
- Project management
- students will develop coding skills and solutions for all stages of the reproducible research workflow.
- Written communication
- students will produce coursework using R Markdown which combines code, output, and narrative to produce a reproducible document.
Assessment methods
Method | Weight |
---|---|
Written assignment (inc essay) | 100% |
One R-based assignment, produced using R Markdown, worth 100%.
Feedback methods
During the hands-on coding sessions, students will receive formative feedback on each of the practical problems they work on.
Recommended reading
Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly. (https://r4ds.had.co.nz)
Study hours
Scheduled activity hours | |
---|---|
Practical classes & workshops | 12 |
Tutorials | 12 |
Independent study hours | |
---|---|
Independent study | 76 |
Teaching staff
Staff member | Role |
---|---|
Danna Gifford | Unit coordinator |