BA English Language and Russian / Course details

Year of entry: 2024

Course unit details:
From Text to Linguistic Evidence

Course unit fact file
Unit code LELA10402
Credit rating 20
Unit level Level 1
Teaching period(s) Semester 2
Available as a free choice unit? Yes


Studying large amounts of text allows us to generate linguistic evidence based on language use, rather than on the linguist’s intuitions or prescribed ideas. Linguists create large, electronically stored collections of naturally occurring language, called ‘corpora’ (singular ‘corpus’). Corpora can include written texts of different genres, such as fiction, news and material from the internet, as well as transcriptions of spoken language. Corpus methods are applied in data analysis in linguistics and beyond, including in the social sciences, law, education and even the health sciences, as well as in tech and other data-based industries. In this module, we will focus on the large variety of corpora of English.

This unit provides a theoretical and practical introduction to corpus linguistics. You will study how corpora are designed, categorised and annotated, and get an overview of the corpora available for studying the English language. You will learn what a good corpus linguistic study involves and how to carry one out yourself. To this end, you will receive basic training in the use of specialist software such as BNCweb, Sketch Engine and AntConc. You will learn corpus tools and techniques used to study a variety of linguistic questions, and come to understand how corpus methods can be applied in linguistic disciplines such as morphosyntax, semantics, pragmatics, sociolinguistics and historical linguistics.


The module aims to:

  • Provide an introduction to corpus linguistics;
  • Provide students with a good understanding of corpus design, annotation and corpus methods;
  • Familiarise students with major corpus resources, tools and techniques for studying English;
  • Teach students the practical skills to use these tools and perform a variety of corpus analyses;
  • Develop a critical awareness of which corpus and which corpus tool can be used to answer a given linguistic question;
  • Develop a critical attitude towards the strengths and weaknesses of corpus research.


These are examples of topics that will be covered in the lectures and seminars:

  • What is a corpus
  • History of corpus linguistics
  • Overview of available corpora for the English language
  • Corpus-based research design, including turning research questions into searchable queries, selecting the right corpus and the right corpus techniques for specific questions, and the scientific method
  • Data collection from corpora, including the generation of KWIC concordances and word lists
  • Investigating differences in linguistic behaviour between two groups: the concepts of statistical significance and effect size, the chi-square test and the odds ratio
  • Collocations and related techniques
  • Corpus annotation techniques, including part-of-speech (POS) tagging
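
As an illustration of the group-comparison topic listed above, the following is a minimal Python sketch of a Pearson chi-square test (without continuity correction) and an odds ratio for a 2x2 table. The function names and the counts are invented for illustration and are not taken from the unit materials; in practice such statistics would normally be computed with dedicated corpus or statistics software.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = speaker groups, columns = variant used."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count for each cell = (row total * column total) / grand total
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def odds_ratio(a, b, c, d):
    """Odds ratio for the same table; undefined if b or c is zero."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: group 1 uses variant X 30 times and variant Y 70 times;
# group 2 uses variant X 15 times and variant Y 85 times.
chi2 = chi_square_2x2(30, 70, 15, 85)
effect = odds_ratio(30, 70, 15, 85)

# With 1 degree of freedom, the critical value at p = 0.05 is 3.841.
print(f"chi-square = {chi2:.3f}, significant at 0.05: {chi2 > 3.841}")
print(f"odds ratio = {effect:.3f}")
```

The chi-square statistic addresses statistical significance (whether the difference is likely to be due to chance), while the odds ratio is a measure of effect size (how large the difference is).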

Teaching and learning methods

One 1-hour lecture per week

A total of 2 hours of (computer) seminars per week

Optional individual consultation sessions

Lecture and supporting materials will be made available on Blackboard.

Knowledge and understanding

By the end of this course students will:

  • Understand what corpora are and how they are designed;
  • Be familiar with and able to apply a variety of corpus methods and techniques;
  • Have a good knowledge of the range of corpora and corpus tools available for the study of English.

Intellectual skills

By the end of this course students will be able to:

  • Critically evaluate the design of a particular corpus;
  • Assess the strengths and weaknesses of a corpus approach to a given problem;
  • Decide which corpus, corpus tool and technique to use to investigate a particular question;
  • Formulate research questions that are amenable to corpus research.

Practical skills

By the end of this course students will be able to:

  • Carry out linguistic investigations using a variety of corpora and corpus tools;
  • Use software to produce concordances, create word lists, perform keyword analyses, etc.;
  • Perform simple statistical tests;
  • Design a corpus study;
  • Confidently explore unknown corpora and tools, including those for languages other than English.
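
To give a rough idea of what concordancing and word-list tools do, here is a minimal Python sketch that builds a KWIC (keyword in context) concordance and a frequency-ranked word list. The tokeniser, function names and sample text are assumptions made for illustration; they do not reproduce how BNCweb, Sketch Engine or AntConc work internally.

```python
import re
from collections import Counter


def tokenize(text):
    """Very simple tokeniser: lower-case runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


# Invented sample text standing in for a corpus.
sample = "The cat sat on the mat. The dog chased the cat."
tokens = tokenize(sample)

for left, key, right in kwic(tokens, "cat", window=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12}  [{key}]  {right}")

print(word_list(tokens)[:2])
```

Aligning the keyword in a central column is what makes recurring patterns to its left and right easy to spot by eye.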

Transferable skills and personal qualities

By the end of this course students will have developed:

  • Advanced problem-solving skills;
  • New IT skills;
  • Confidence in working with new resources and techniques;
  • Essay writing skills;
  • Critical attitude towards research methods.

Assessment methods

Assessment task              Formative or summative     Weighting
2-part Practical Exercise    Formative and summative    30%
Mock Exam                    Formative                  0%




Feedback methods

Feedback method

Formative or summative

Personalised written feedback from course instructors on all submitted assignments.

Feedback from instructors during seminars and on discussion fora.



Recommended reading

The main readings will be taken from:

  • Hoffmann, S., Evert, S., Smith, N. and Lee, D. (2008) Corpus Linguistics with BNCweb: A Practical Guide. Frankfurt am Main: Peter Lang.
  • McEnery, T. and Hardie, A. (2012) Corpus Linguistics. Cambridge: Cambridge University Press.
  • McEnery, T., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.

Additional (suggested) readings will be provided week by week when necessary.

Study hours

Scheduled activity hours
Assessment written exam 2
Lectures 11
Seminars 22
Independent study hours
Independent study 165

Teaching staff

Staff member Role
Richard Zimmermann Unit coordinator
