MSc Data Science (Computer Science Data Informatics)

Year of entry: 2025

Course unit details:
Transforming Text Into Meaning

Course unit fact file
Unit code: COMP64702
Credit rating: 15
Unit level: FHEQ level 7 – master's degree or fourth year of an integrated master's degree
Teaching period(s): Semester 2
Available as a free choice unit? No

Overview

Textual data contain information and meaning that are not readily accessible or searchable because of the unstructured nature of natural language. Machines are increasingly used to automate the analysis, understanding and extraction of information buried within (often large amounts of) textual data. Enabling machines to transform unstructured text automatically into meaningful information requires the study, development and application of various natural language processing (NLP) and text mining approaches.

This unit will equip students with the knowledge, skills and techniques necessary to: design state-of-the-art computational approaches for analysing the meaning of natural language; develop traditional machine learning-based and deep learning-based methods for NLP tasks in a responsible and ethical manner; and apply such methods to large-scale textual datasets drawn from various domains, leading to the extraction, if not the creation, of insights and knowledge.

Pre/co-requisites

Unit title: Topics in Machine Learning
Unit code: COMP64501
Requirement type: Pre-requisite
Description: Recommended

Students should have some prior background in machine learning.

Aims

This course unit aims to: (1) introduce students to core concepts in natural language processing (NLP); (2) enable students to develop traditional machine learning-based and state-of-the-art deep learning-based approaches for various NLP tasks; and (3) provide students with an understanding of the techniques underpinning text mining, which will facilitate the development of solutions for characterising, searching, analysing and exploiting large-scale textual data in the search for new knowledge.

Learning outcomes

1. Discuss techniques for pre-processing written natural language: sentence segmentation, tokenisation, lemmatisation and part-of-speech tagging.

2. Explain approaches to parsing and distinguish between different sentence structure representations.

3. Compare and contrast different types of language models and vector-based representations of words.

4. Design traditional and state-of-the-art approaches (e.g., transformer-based language models) to NLP tasks such as sequence classification, sequence labelling and span extraction.

5. Systematically evaluate and compare the performance of approaches to NLP tasks with the use of standard metrics and carefully selected datasets.

6. Develop and apply, as a team, various NLP methods to extract information and meaning from large-scale textual data in domains such as news, biomedicine, healthcare, law and social media.

7. Demonstrate responsible and ethical practices for developing, deploying and evaluating NLP methods, and for reporting their results.

Syllabus

1. Pre-processing of written natural language: sentence segmentation, tokenisation, lemmatisation and part-of-speech tagging.

2. Sentence structure representations: parsing.

3. Word representations: vector-based representations and language models.

4. Traditional and state-of-the-art approaches: machine learning and transformer-based models for various NLP tasks such as sequence classification, sequence labelling and span extraction.

5. Reliable data: collecting and annotating datasets and measuring their reliability.

6. Evaluation techniques: the use of standard metrics.

7. Method development and application: extracting information and meaning from large-scale textual data in various domains.

8. Responsible and ethical practices for developing, deploying and evaluating NLP methods, and for reporting their results.

Teaching and learning methods

Weekly asynchronous materials will be provided to students in the form of either pre-recorded video lectures or directed reading assignments.

Weekly synchronous lectures will consolidate learning from the asynchronous material; these combine interactive presentations by staff highlighting key concepts, Q&A sessions and formative quizzes.

Fortnightly labs will give students opportunities to ask questions, carry out hands-on exercises and work on their coursework as a group, including preparing a topic proposal, implementing a solution and drafting a report.

Online discussions via an e-learning environment will help address student queries.

Employability skills

Analytical skills
Group/team working
Innovation/creativity
Project management
Oral communication
Problem solving
Research
Written communication

Assessment methods

Written exam: 50%
Report: 20%
Project output (not dissertation): 25%
Oral assessment/presentation: 5%

Feedback methods

Weekly formative quizzes: cohort-level feedback in the weekly synchronous lectures and immediate feedback in the e-learning environment.

Coursework (topic proposal oral presentation, implementation, report): formative feedback during the fortnightly labs and individual feedback during and after marking.

Exam: cohort-level feedback after marking.

Recommended reading

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Feldman and Sanger, 2007.

An Introduction to Text Mining: Research Design, Data Collection, and Analysis, Ignatow and Mihalcea, 2017.

Speech and Language Processing (3rd ed.), Jurafsky and Martin, 2023.

Deep Learning for Natural Language Processing, Goyal et al., Apress, 2018.

Study hours

Scheduled activity hours
Assessment written exam: 2
Lectures: 10
Supervised time in studio/workshop: 10
Independent study hours
Independent study: 128

Teaching staff

Chenghua Lin: Unit coordinator

Additional notes

Independent Study Hours:
•    Pre-recorded videos and/or directed reading (20 hours)
•    Assessed coursework (50 hours)
•    Weekly revision and exam revision (58 hours)
 

Additional information about assessment:
•    Oral assessment/presentation: Oral presentation of coursework topic proposal
•    Project output: Coursework code implementation and documentation
•    Report: Short academic paper describing the coursework
 

Grouping of coursework:
•    The coursework is undertaken in self-organised groups of 3–4 students.

 
