MSc Computational and Corpus Linguistics

Year of entry: 2025


Course unit details:
Computational Linguistics 1

Course unit fact file
Unit code	LELA60331
Credit rating	15
Unit level	FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s)	Semester 1
Available as a free choice unit?	No

Overview

The last two decades have seen an explosion in the use of language technologies, from consumer applications such as Alexa, Google Translate and ChatGPT to behind-the-scenes use by, for example, social media, news and marketing companies. This course unit is the first of two units that together are designed to give a grounding in the field of natural language processing (NLP). It will provide hands-on training in technologies for representing words and their meanings, and for classifying sentences and texts according to, for example, topic, sentiment and speaker intention. It will also provide a grounding in the formal mathematical tools and machine learning models that underlie these technologies. It will build on and complement the Python programming skills covered in LELA60341 (Research Methods in CCL1).

Aims

The unit aims to:

Provide understanding of and opportunities to engage with the challenges of processing real language data
Provide students with the ability to use formal machine learning approaches to text classification and tagging
Enable students to build and evaluate machine learning models for classification and tagging tasks
Enable students to manage, document and report on text classification projects in computational linguistics

Syllabus

Morphological analysis (including intro to regular expressions)
N-gram language modelling and intro to part-of-speech tagging (including intro to probability theory)
Bag of words representations
Representing word meanings (including intro to linear algebra)
Naïve Bayes classification (including more on probability theory)
Logistic regression for sentiment classification
Multi-class logistic regression for intent classification
Multilayer neural networks
Word embeddings
Part-of-speech tagging and chunking
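To give a flavour of the hands-on work these topics involve, here is a minimal Python sketch that combines two syllabus items: a bag-of-words representation and a Naïve Bayes classifier with add-one (Laplace) smoothing. It is an illustrative assumption, not material taken from the unit itself. The function names, the whitespace tokenizer and the toy sentiment data are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace and count tokens (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def train_naive_bayes(docs, labels):
    """Estimate log priors and add-one-smoothed log likelihoods for each class."""
    classes = sorted(set(labels))
    vocab = set()
    class_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        bow = bag_of_words(doc)
        class_counts[label].update(bow)  # pool word counts per class
        vocab.update(bow)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(class_counts[c].values())
        # Add-one smoothing: every vocabulary word gets a pseudo-count of 1
        log_lik[c] = {
            w: math.log((class_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_lik, vocab


def predict(text, log_prior, log_lik, vocab):
    """Return the class with the highest log-posterior score; unseen words are ignored."""
    bow = bag_of_words(text)
    scores = {
        c: log_prior[c] + sum(n * log_lik[c][w] for w, n in bow.items() if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy training data for binary sentiment classification
docs = ["a great fun film", "great acting and a great plot",
        "a dull boring film", "boring plot and dull acting"]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict("great fun", *model))        # → pos
print(predict("dull and boring", *model))  # → neg
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which is why the scores are sums of logarithms rather than products of probabilities.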

Teaching and learning methods

1-hour asynchronous lecture each week, to be viewed before the seminar. These will introduce the theoretical and technical content of the topics covered in the seminars. Asynchronous delivery will allow students to work through the technical content at their own pace.

2-hour synchronous seminar in a computer lab each week. The focus will be on individual and small-group computer-based activities implementing the methods described in the lecture, using Jupyter notebooks. These are documents that combine text and computer code that can be edited and run within the document. Students will access them via Blackboard and run them either on their local machine or using a free cloud computing service such as Google Colab. Each session will involve working through a notebook together, with students running the code provided and completing exercises individually or in small groups. The instructor will circulate and provide assistance as needed. On occasion, the whole class will collaborate on a solution. As well as being an excellent pedagogical tool, Jupyter notebooks are a very widely used development environment among academic and industry researchers, so these sessions will provide valuable experience of working with them, with more advanced functionality introduced as needed.

Knowledge and understanding

Demonstrate understanding of the classification and tagging of text
Demonstrate critical understanding of the theoretical and mathematical foundations of classification and tagging
Engage in the debates surrounding the ethics and social implications of classification and tagging tasks

Intellectual skills

Critically engage with research literatures describing text classification and tagging
Identify open research questions in computational linguistics that classification and tagging might contribute to, and evaluate the role that the techniques learned might play in addressing them

Practical skills

Write computer programs for text preprocessing, text classification and tagging
Perform text classification using generalised linear models
Use machine learning methods for classification and tagging
Design natural language engineering systems for text classification

Transferable skills and personal qualities

Write a research report describing a text classification experiment
Document a text classification experiment using GitHub
Reflect upon the social impact of the technologies developed

Assessment methods

Assessment task	Formative or Summative	Weighting
Weekly Programming Tasks	Formative	0%
Research Report - 2500 words	Summative	80%
Research Archive	Summative	20%

Feedback methods

Feedback method	Formative or Summative
Oral feedback from peers and instructors within seminar sessions	Formative
Written feedback via Turnitin on Research Report and Research Archive	Summative

Study hours

Scheduled activity hours
Lectures	11
Seminars	20

Independent study hours
Independent study	119

Teaching staff

Staff member	Role
Dmitry Nikolaev	Unit coordinator
Colin James Bannard	Unit coordinator
