MSc Computational and Corpus Linguistics

Year of entry: 2025

Course unit details:
Foundational statistics with R

Course unit fact file
Unit code: LELA60141
Credit rating: 15
Unit level: FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s): Semester 1
Available as a free choice unit? No

Overview

This course aims to familiarize students with the basic concepts of statistics through hands-on practice and to build a foundation for more advanced studies in natural language processing. Topics covered in the course include distributions of data, basic principles of probability, describing and visualizing quantitative data, statistical modelling and interpreting quantitative data through hypothesis testing.

Aims

The unit aims to:

  • Familiarize students with basic statistical concepts and terms necessary to understand and perform quantitative research
  • Foster understanding of the principles of describing, visualizing, and interpreting data
  • Enable students to develop R programming skills needed to work with quantitative data
  • Foster organisational, evaluative and critical thinking skills necessary for conducting quantitative research
  • Provide the mathematical foundations for applying regression methods in computational linguistics

Syllabus

Week 1: Variable types and introduction to R and RStudio

Week 2: Descriptive statistics and visualisations

Week 3: Introduction to the Linear Model

Week 4: Correlation and data transformation

Week 5: Multiple regression

Week 6: Reading week  

Week 7: Regression with categorical predictors

Week 8: Interactions and nonlinear effects

Week 9: Logistic regression 

Week 10: Statistical Inference

Week 11: Mixed models 1 

Week 12: Mixed models 2
 

Teaching and learning methods

Weekly 2-hour lectures (online, asynchronous). These will introduce the theoretical and technical content for the topics covered in the seminars. Asynchronous delivery will allow students to cover the technical content at their own pace.

Five 2-hour synchronous seminars in a computer lab. The focus will be on individual and small-group computer-based activities implementing the methods described in the lectures, using RStudio. Analysis code will be provided via Blackboard for students to run on their own machines. The sessions will consist of working collectively through a series of activities in which students run the provided code, combined with exercises that students complete individually or in small groups. The instructor will circulate and provide assistance as needed. On occasion, the whole class will collaborate on a solution.


Reading assignments (beyond the main textbook), revision quizzes and additional exercises will be provided between sessions. The Blackboard Discussion Board will be used for interaction between students and instructors.

Knowledge and understanding

Students will be able to: 

  • Demonstrate understanding of the fundamentals of quantitative data analysis
  • Demonstrate knowledge of basic statistical methods
  • Recall key principles for effective description and visualisation of data
  • Compare characteristics of basic statistical models

Intellectual skills

Students will be able to: 

  • Identify appropriate descriptive and data visualization methods for different types of data
  • Choose the appropriate statistical model for the type of data under analysis
  • Reformulate a research question into a statistical hypothesis 

Practical skills

Students will be able to: 

  • Create visualizations and summaries of data
  • Fit a statistical model
  • Write computer code in R to carry out a statistical analysis, from data description to model fitting 

Transferable skills and personal qualities

Students will be able to: 

  • Explore and analyse quantitative data to extract information
  • Draw inferences about the relationships between latent variables from quantitative data
  • Generalise their quantitative analysis skills to new and unfamiliar scenarios
  • Develop time management skills by working to deadlines
     

Assessment methods

Assessment task        Formative or summative   Weighting
In-class activities    Formative                0%
Research Report        Summative                50%
Exam                   Summative                50%

Feedback methods

Research Report - Via Turnitin, within 15 working days after submission

Exam - Within 15 working days after submission

 

Recommended reading

Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Dancey, C. P. & Reidy, J. (2007). Statistics without maths for psychology. Pearson Education. 

Study hours

Scheduled activity hours
Lectures: 22
Tutorials: 10
Independent study hours
Independent study: 118

Teaching staff

Staff member             Role
Andrea Nini              Unit coordinator
Patrycja Strycharczuk    Unit coordinator
