MSc Computational and Corpus Linguistics

Year of entry: 2025

Course unit details:
Corpus Linguistics

Course unit fact file
Unit code	LELA60112
Credit rating	15
Unit level	FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s)	Semester 2
Available as a free choice unit?	No

Overview

Corpus linguistics is the study of language through the use of large (usually digital) collections of text known as corpora. This course guides students through the process of collecting data from such corpora, analysing them with advanced statistical techniques to evaluate hypotheses relevant to the field of linguistics, and writing up the results in a research paper. Students will learn about essential concepts in corpus linguistics, become familiar with the design principles of corpora, and apply their knowledge of statistics and visualisation to practical linguistic problems.
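
As a rough illustration of the workflow described above (collect counts from corpora, then test a hypothesis statistically), the following minimal Python sketch counts two competing variants ("who" vs. "whom") in two toy texts and computes a Pearson chi-square statistic for the resulting 2x2 table. The texts, the function names and the choice of variants are invented for this example and are not taken from the course materials; the counts are far too small for a meaningful test and serve only to show the mechanics.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens with a simple regex."""
    return re.findall(r"[a-z']+", text.lower())

def variant_counts(text, variant_a, variant_b):
    """Return how often each of two competing variants occurs in a text."""
    tokens = tokenize(text)
    return tokens.count(variant_a), tokens.count(variant_b)

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Toy "corpora" (hypothetical data, invented for illustration only).
corpus_formal = "To whom it may concern. The person whom we met. Who is there?"
corpus_informal = "Who did you see? Who is it for? The guy who called. Whom?"

# Rows: corpora; columns: counts of "who" and "whom".
table = [
    variant_counts(corpus_formal, "who", "whom"),
    variant_counts(corpus_informal, "who", "whom"),
]
statistic = chi_square_2x2(table)
print(table, round(statistic, 3))
```

In practice the counts would come from a full corpus, and the test (along with the regression models covered later in the unit) would normally be run in R or with an established statistics library rather than by hand.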

Pre/co-requisites

To take this unit, students must have sufficient skills and knowledge in statistics (e.g. undergraduate training in fitting and interpreting general and generalized linear models using R). The unit coordinator will decide whether a student has sufficient prior background following an initial meeting with the student.

Aims

The unit aims to:

Provide experience with testing linguistic hypotheses using corpus data
Provide opportunities to engage with linguistic concepts, such as alternations and grammatical functions, using corpora
Provide experience with applying programming skills and statistical knowledge to corpus processing and analysis
Enable students to visualise patterns in data and model predictions
Foster critical thinking in the discussion of research literature and interpretation of new findings
Develop students' ability to report on corpus-based research and to demonstrate analytical and presentation skills

Syllabus

Week 1: Introduction (outline and syllabus, definitions, potential applications, history of CL, theoretical vs. corpus linguistics)

Week 2: Corpus basics (corpus design, corpus construction, overview of available corpora, representativeness, important terminology, scientific method, relevance of hypotheses)

Introduce the linguistic topic for the final term paper. Group assignment.

Week 3: Data collection 1 (data storage, retrieving data with concordance software, online interfaces, illustrations of data collection with other software)

Seminar 1: Basic homework exercises. General remarks on writing a good paper in corpus linguistics.

Week 4: Data collection 2 (using Python to get data, regular expressions, scripting)

Week 5: Data collection 3 (using Python to get data, tagged and parsed corpora, other annotations)

Seminar 2: Discuss homework on methodology and data collection.

Week 6: Reading Week

Hand in data collection for the final term paper.

Week 7: Statistics 1 (chi-square test, R)

Seminar 3: Group presentation of background literature.

Week 8: Statistics 2 (Revision: Logistic Regression, R)

Week 9: Statistics 3 (Mixed Effects Logistic Regression, R)

Seminar 4: Literature review on the linguistic topic. Discussion of statistical notions relevant to the final paper, such as effect size measures, uncertainty, and hypothesis testing.

Week 10: Statistics 4 (Model evaluation, comparison, variable selection, R)

Week 11: Statistics 5 (Other statistical methods, R)

Seminar 5: Exercises and Q&A on the statistics discussed. Specific requirements for the term paper.

Week 12: Conclusion

Hand in final term paper.

Teaching and learning methods

Weekly 2-hour synchronous lecture. The theoretical and technical content will be followed up in the seminars.

Five 2-hour synchronous seminars.

Knowledge and understanding

Apply key concepts in corpus linguistics to specific problems
Apply statistical techniques relevant to corpus linguistics to specific problems
Describe a corpus in terms of content, size, and annotations as part of a research paper.
Identify and explain linguistic variables, understand what influences them, and test them using statistical methods.

Intellectual skills

Critically engage with the results of current corpus-based research and report on them in a research paper.
Deduce testable hypotheses from the academic literature and apply them to a research paper.
Select an appropriate statistical model and use it to explain and evaluate a quantitative hypothesis.

Practical skills

Demonstrate the ability to load, store, search, manipulate, prepare and analyse large amounts of textual data with a computer.
Apply statistical models to real-world problems
Write computer code in R and Python

Transferable skills and personal qualities

Develop time management skills by working to a deadline.
Present results in a professional manner to a specialist audience using a range of engaging media.
Demonstrate the ability to work collaboratively on a range of data and industry-related tasks in an academic setting
Retrieve, gather and organise linguistic data from various sources and use it in a research paper.

Assessment methods

Assessment Task	Formative or Summative	Weighting
Coursework	Formative	0%
Term Paper	Summative	65%
Seminar Presentation	Summative	10%
Exam	Summative	25%

Feedback methods

Oral feedback on coursework will be given in the seminars.

Written feedback will be given on the Term Paper, Seminar Presentation and Exam.

Study hours

Scheduled activity hours
Lectures	22
Seminars	10

Independent study hours
Independent study	118

Teaching staff

Staff member	Role
Richard Zimmermann	Unit coordinator
