MSc Computational and Corpus Linguistics

Year of entry: 2025

Course unit details:
Computational Linguistics 2

Course unit fact file
Unit code LELA60332
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 2
Available as a free choice unit? No

Overview

The last two decades have seen an explosion in the use of language technologies, from consumer applications such as Alexa, Google Translate and ChatGPT to behind-the-scenes use by, for example, social media, news and marketing companies. This course unit is the second of two units that are together designed to give a grounding in the field of natural language processing (NLP). It will provide hands-on training in technologies for performing syntactic analysis of sentences, translating between languages, generating texts and conducting human-machine conversation. We will focus in particular on the ways in which linguistic theory is useful in performing each of these tasks, and conversely on how decades of experience in building such systems can inform linguistic theory. Students will develop an understanding of, and experience with, tuning and deploying large language models such as GPT. The unit will build on the Python programming skills covered in LELA60341 and LELA60342.

Aims

The unit aims to:

  • Provide understanding of and opportunities to engage with the challenges of processing real language data
  • Provide students with the ability to use formal tools and machine learning approaches for processing, representing and generating sequences of words and sentences
  • Enable students to build and evaluate machine learning models for understanding and generating texts
  • Enable students to manage, document and report on text understanding and generation projects in computational linguistics

Syllabus

  1. Formal language theory and computing grammar
  2. Phrase-structure parsing
  3. Dependency parsing and semantic interpretation
  4. Recurrent neural networks for language modelling
  5. Recurrent neural networks for text classification
  6. Machine translation
  7. Transformers for text classification
  8. Language models for text generation
  9. Linguistic interpretation of large language models
  10. Real-world knowledge representation (e.g. knowledge graphs and real-world knowledge in LLMs)

Teaching and learning methods

A 1-hour asynchronous lecture each week. These lectures will introduce the theoretical and technical content of the topics covered in the seminars. Asynchronous delivery will allow students to cover the technical content at their own pace.

A 2-hour synchronous seminar in a computer lab each week. The focus will be on individual and small-group computer-based activities implementing the methods described in the lecture, using Jupyter notebooks: documents that combine text and computer code, which can be edited and run within the document. Students will access the notebooks via Blackboard and run them either on their local machine or using a free cloud computing service such as Google Colab. The sessions will consist of collectively working through the notebook, with students running the code provided, combined with exercises that students will complete individually or in small groups. The instructor will circulate and provide assistance as needed; on occasion the whole class will collaborate on a solution. As well as being an excellent pedagogical tool, Jupyter notebooks are a very widely used development environment among academic and industry researchers, and these sessions will provide valuable experience of working with them, with advanced functionality introduced as needed.

Knowledge and understanding

  • Demonstrate understanding of how computers comprehend and generate sentences and longer texts
  • Demonstrate critical understanding of the theoretical and mathematical foundations of parsing, representation learning and generation
  • Critically discuss methods for evaluating text understanding and text generation systems
  • Engage in the debates surrounding the ethics and social implications of text understanding and generation tasks

Intellectual skills

  • Critically engage with research literatures describing text understanding and generation
  • Identify open research questions in computational linguistics that text understanding and generation might contribute to, and evaluate the role that the techniques learned might play in addressing them

Practical skills

  • Write computer programs for text understanding and generation
  • Use machine learning methods for understanding and generation
  • Design natural language engineering systems for text understanding and generation
  • Conduct parsing and generation experiments

Transferable skills and personal qualities

  • Write a research report describing a text understanding or generation experiment
  • Document a text understanding or generation experiment using GitHub
  • Reflect upon the social impact of the technologies developed

Assessment methods


Weekly programming tasks: 0%
Research Report: 80%
Research Archive: 20%

Feedback methods

Oral feedback from academics and peers on the weekly programming tasks. 

Written feedback from academics on the Research report and Research Archive through Turnitin.

Recommended reading

Jurafsky, D. and Martin, J. H. (2023). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition.

Individual research papers from ACL Anthology (https://aclanthology.org/) to accompany weekly topics as appropriate.


Study hours

Scheduled activity hours
Lectures: 11
Seminars: 20
Independent study hours
Independent study: 119

Teaching staff

Dmitry Nikolaev: Unit coordinator
Colin James Bannard: Unit coordinator
