MSc Computational and Corpus Linguistics

Year of entry: 2025

Course unit details:
Computational Linguistics 2

Course unit fact file
Unit code LELA60332
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 2
Available as a free choice unit? No

Overview

The last two decades have seen an explosion in the use of language technologies, from consumer applications such as Alexa, Google Translate and ChatGPT to behind-the-scenes use by, for example, social media, news and marketing companies. This course unit is the second of two units that are together designed to give a grounding in the field of natural language processing (NLP). It will provide hands-on training in technologies for performing syntactic analysis of sentences, translating between languages, generating texts and conducting human-machine conversation. We will focus in particular on the ways in which linguistic theory is useful in performing each of these tasks, and conversely on how decades of experience in building such systems can inform linguistic theory. Students will develop an understanding of, and experience with, tuning and deploying large language models such as GPT. The unit will build on the Python programming skills covered in LELA60341 and LELA60342.

Aims

The unit aims to:

  • Provide understanding of and opportunities to engage with the challenges of processing real language data
  • Provide students with the ability to use formal tools and machine learning approaches for processing, representing and generating sequences of words and sentences
  • Enable students to build and evaluate machine learning models for understanding and generating texts
  • Enable students to manage, document and report on text understanding and generation projects in computational linguistics

Syllabus

  1. Formal language theory and computing grammar
  2. Phrase-structure parsing
  3. Dependency parsing and semantic interpretation
  4. Recurrent neural networks for language modelling
  5. Recurrent neural networks for text classification
  6. Machine translation
  7. Transformers for text classification
  8. Language models for text generation
  9. Linguistic interpretation of large language models
  10. Real-world knowledge representation (e.g. knowledge graphs and real-world knowledge in LLMs)

Teaching and learning methods

A 1-hour asynchronous lecture each week. These lectures will introduce the theoretical and technical content of the topics covered in the seminars. Asynchronous delivery will allow students to cover the technical content at their own pace.

A 2-hour synchronous seminar in a computer lab each week. The focus will be on individual and small-group computer-based activities implementing the methods described in the lecture, using Jupyter notebooks: documents that combine text and computer code, which can be edited and run within the document. Students will access the notebooks via Blackboard and run them either on their local machine or using a free cloud computing service such as Google Colab. The sessions will consist of collectively working through the notebook, with students running the code provided, combined with exercises that students will complete individually or in small groups. The instructor will circulate and provide assistance as needed; on occasion the whole class will collaborate on a solution. As well as being an excellent pedagogical tool, Jupyter notebooks are a very widely used development environment among academic and industry researchers, and these sessions will provide valuable experience of working with them, with advanced functionality introduced as needed.

Knowledge and understanding

  • Demonstrate understanding of how computers comprehend and generate sentences and longer texts
  • Demonstrate critical understanding of the theoretical and mathematical foundations of parsing, representation learning and generation
  • Critically discuss methods for evaluating text understanding and text generation systems
  • Engage in the debates surrounding the ethics and social implications of text understanding and generation tasks

Intellectual skills

  • Critically engage with research literatures describing text understanding and generation
  • Identify open research questions in computational linguistics that text understanding and generation might contribute to, and evaluate the role that the techniques learned might play in addressing them

Practical skills

  • Write computer programs for text understanding and generation
  • Use machine learning methods for understanding and generation
  • Design natural language engineering systems for text understanding and generation
  • Conduct parsing and generation experiments

Transferable skills and personal qualities

  • Write a research report describing a text understanding or generation experiment
  • Document a text understanding or generation experiment using GitHub
  • Reflect upon the social impact of the technologies developed

Assessment methods


Weekly programming tasks: 0%
Research Report: 80%
Research Archive: 20%

Feedback methods

Oral feedback from academics and peers on the weekly programming tasks. 

Written feedback from academics on the Research report and Research Archive through Turnitin.

Recommended reading

Jurafsky, D. and Martin, J. H. (2023). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition.

Individual research papers from ACL Anthology (https://aclanthology.org/) to accompany weekly topics as appropriate.


Study hours

Scheduled activity hours
Lectures: 11
Seminars: 20
Independent study hours
Independent study: 119

Teaching staff

Dmitry Nikolaev: Unit coordinator
Colin James Bannard: Unit coordinator
