MSc Computational and Corpus Linguistics

Year of entry: 2025

Course unit details:
Computational Linguistics 1

Course unit fact file
Unit code LELA60331
Credit rating 15
Unit level FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s) Semester 1
Available as a free choice unit? No

Overview

The last two decades have seen an explosion in the use of language technologies, from consumer applications such as Alexa, Google Translate and ChatGPT to behind-the-scenes use by, for example, social media, news and marketing companies. This course unit is the first of two units that together are designed to give a grounding in the field of natural language processing (NLP). It will provide hands-on training in technologies for representing words and their meanings, and for classifying sentences and texts according to, for example, topic, sentiment or speaker intention. It will also provide a grounding in the formal mathematical tools and machine learning models that underlie these technologies. It will build on and complement the Python programming skills covered in LELA60341 (Research Methods in CCL1).

Aims

The unit aims to:

  • Provide understanding of and opportunities to engage with the challenges of processing real language data
  • Provide students with the ability to use formal machine learning approaches to text classification and tagging
  • Enable students to build and evaluate machine learning models for classification and tagging tasks
  • Enable students to manage, document and report on text classification projects in computational linguistics

Syllabus

  1. Morphological analysis (including intro to regular expressions)
  2. N-gram language modelling and intro to part-of-speech tagging (including intro to probability theory)
  3. Bag of words representations
  4. Representing word meanings (including intro to linear algebra)
  5. Naïve Bayes classification (including more on probability theory)
  6. Logistic regression for sentiment classification
  7. Multi-class logistic regression for intent classification
  8. Multilayer neural networks
  9. Word embeddings
  10. Part of speech tagging and chunking
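
As an informal illustration of syllabus topics 3 and 5, here is a minimal pure-Python sketch of a bag-of-words representation feeding a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It is not drawn from the unit's own notebooks; the regex tokenizer, the function names and the four-sentence training set are hypothetical, chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    # Bag of words: word order is discarded and only word counts are kept.
    return Counter(tokenize(text))


def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    classes = set(labels)
    vocab = set()
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        bow = bag_of_words(doc)
        word_counts[label].update(bow)
        vocab.update(bow)
    # log P(c): relative frequency of each class among training documents.
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    # log P(w | c) = log((count(w, c) + 1) / (total words in c + |V|)).
    log_likelihood = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(text, log_prior, log_likelihood, vocab):
    bow = bag_of_words(text)
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w, n in bow.items():
            if w in vocab:  # words unseen in training are simply ignored
                score += n * log_likelihood[c][w]
        scores[c] = score
    # Return the class with the highest log posterior (up to a constant).
    return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
docs = ["a great and fun film", "loved it, great acting",
        "boring and too long", "a dull, boring plot"]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict("great fun", *model))  # → pos
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.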

Teaching and learning methods

1 hour asynchronous lecture each week, to be viewed before the seminar. These will introduce the theoretical and technical content of the topics covered in the seminars. Asynchronous delivery will allow students to cover the technical content at their own pace.

2 hour synchronous seminar in a computer lab each week. The focus will be on individual and small-group computer-based activities implementing the methods described in the lecture, using Jupyter notebooks. These are documents that combine text and computer code that can be edited and run within the document. Students will access them via Blackboard and run them either on their local machine or using a free cloud computing service such as Google Colab. The sessions will consist of collectively working through the notebook, with students running the code provided, combined with exercises that students will complete individually or in small groups. The instructor will circulate and provide assistance as needed. On occasion the whole class will collaborate to produce a solution. As well as being an excellent pedagogical tool, Jupyter notebooks are a widely used development environment among academic and industry researchers, so these sessions will also provide valuable experience of this environment, with more advanced functionality introduced as needed.

Knowledge and understanding

  • Demonstrate understanding of the classification and tagging of text
  • Demonstrate critical understanding of the theoretical and mathematical foundations of classification and tagging
  • Engage in the debates surrounding the ethics and social implications of classification and tagging tasks

Intellectual skills

  • Critically engage with research literatures describing text classification and tagging
  • Identify open research questions in computational linguistics that classification and tagging might contribute to, and evaluate the role that the techniques learned might play in addressing them

Practical skills

  • Write computer programs for text preprocessing, text classification and tagging
  • Perform text classification using generalised linear models
  • Use machine learning methods for classification and tagging
  • Design natural language engineering systems for text classification
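
The outcome "Perform text classification using generalised linear models" refers to models such as the logistic regression of syllabus topics 6 and 7. Below is a minimal pure-Python sketch of binary logistic regression fitted by batch gradient descent on the cross-entropy loss. It is not taken from the unit's materials; the two features (hypothetical counts of positive and negative lexicon words per text), the toy data, the learning rate and the number of epochs are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fitted by batch gradient descent on cross-entropy loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = p - target  # gradient of the cross-entropy loss w.r.t. z = w.x + b
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the gradient averaged over the whole training set.
        w = [wi - lr * gw / len(X) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(x, w, b):
    # Estimated probability that x belongs to the positive class.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical features: [positive-lexicon count, negative-lexicon count].
X = [[3, 0], [2, 1], [0, 2], [1, 3]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(round(predict_proba([3, 0], w, b), 3), round(predict_proba([0, 3], w, b), 3))
```

A probability above 0.5 is read as the positive class. The multi-class case in topic 7 replaces the sigmoid with a softmax over one weight vector per class.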

Transferable skills and personal qualities

  • Write a research report describing a text classification experiment
  • Document a text classification experiment using GitHub
  • Reflect upon the social impact of the technologies developed

Assessment methods

Assessment task                  Formative or Summative   Weighting
Weekly Programming Tasks         Formative                0%
Research Report (2,500 words)    Summative                80%
Research Archive                 Summative                20%

Feedback methods

Feedback method                                                         Formative or Summative
Oral feedback from peers and instructors within seminar sessions       Formative
Written feedback via Turnitin on Research Report and Research Archive  Summative

Recommended reading

  • Jurafsky, D. and J. H. Martin (2023). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd Edition. Free online at https://web.stanford.edu/~jurafsky/slp3/
  • Individual research papers from ACL Anthology (https://aclanthology.org/) to accompany weekly topics as appropriate.

Study hours

Scheduled activity hours
Lectures 11
Seminars 20
Independent study hours
Independent study 119

Teaching staff

Staff member           Role
Dmitry Nikolaev        Unit coordinator
Colin James Bannard    Unit coordinator
