
Course unit details:
Transforming Text Into Meaning
Unit code | COMP64702 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 2 |
Available as a free choice unit? | No |
Overview
Textual data contain information and meaning which are not readily accessible or searchable due to the unstructured nature of natural language. Machines have been increasingly used to automate the analysis, understanding and extraction of information buried within (often large amounts of) textual data. Enabling machines to automatically transform unstructured text into meaningful information requires the study, development and application of various natural language processing and text mining approaches.
This unit will equip students with the knowledge, skills and techniques necessary to: design state-of-the-art computational approaches for analysing the meaning of natural language; develop traditional machine learning-based and deep learning-based methods for NLP tasks in a responsible and ethical manner; and apply such methods to large-scale textual datasets drawn from various domains, leading to the extraction, and potentially the creation, of insights and knowledge.
Pre/co-requisites
Unit title | Unit code | Requirement type | Description |
---|---|---|---|
Topics in Machine Learning | COMP64501 | Pre-Requisite | Recommended |
Students should have some prior background in machine learning.
Aims
This course unit aims to: (1) introduce students to core concepts in natural language processing (NLP); (2) enable students to develop traditional machine learning-based and state-of-the-art deep learning-based approaches for various NLP tasks; and (3) provide students with an understanding of the techniques underpinning text mining that will facilitate the development of solutions for characterising, searching, analysing and exploiting large-scale textual data in the search for new knowledge.
Learning outcomes
1. Discuss techniques for pre-processing written natural language: sentence segmentation, tokenisation, lemmatisation and part-of-speech tagging.
2. Explain approaches to parsing and distinguish between different sentence structure representations.
3. Compare and contrast different types of language models and vector-based representations of words.
4. Design traditional and state-of-the-art approaches (e.g., transformer-based language models) to NLP tasks such as sequence classification, sequence labelling and span extraction.
5. Systematically evaluate and compare the performance of approaches to NLP tasks with the use of standard metrics and carefully selected datasets.
6. Develop and apply—as a team—various NLP methods to extract information and meaning from large-scale textual data in domains such as news, biomedicine, healthcare, law and social media.
7. Demonstrate responsible and ethical practices for developing, deploying and evaluating NLP methods, and for reporting their results.
Syllabus
1. Pre-processing of written natural language: sentence segmentation, tokenisation, lemmatisation and part-of-speech tagging.
2. Sentence structure representations: parsing.
3. Word representations: vector-based representations and language models.
4. Traditional and state-of-the-art approaches: machine learning and transformer-based models for various NLP tasks such as sequence classification, sequence labelling and span extraction.
5. Reliable data: collection and annotation of datasets and measuring their reliability.
6. Evaluation techniques: the use of standard metrics.
7. Method development and application: extracting information and meaning from large-scale textual data in various domains.
8. Responsible and ethical practices for developing, deploying and evaluating NLP methods, and for reporting their results.
Teaching and learning methods
Weekly asynchronous materials will be provided to students in the form of either pre-recorded video lectures or directed reading assignments.
Weekly synchronous lectures will consolidate learning from asynchronous material; these include a mix of interactive presentations (by staff) highlighting key concepts, Q&A sessions and formative quizzes.
Fortnightly labs will provide students with opportunities to ask questions, carry out hands-on exercises and work on their coursework as a group, including the preparation of a topic proposal, the implementation of a solution and the drafting of a report.
Online discussions via an e-learning environment will help address student queries.
Employability skills
- Analytical skills
- Group/team working
- Innovation/creativity
- Project management
- Oral communication
- Problem solving
- Research
- Written communication
Assessment methods
Method | Weight |
---|---|
Written exam | 50% |
Report | 20% |
Project output (not diss/n) | 25% |
Oral assessment/presentation | 5% |
Feedback methods
Weekly formative quizzes: cohort-level feedback in weekly synchronous lectures and immediate feedback in e-learning environment.
Coursework (Topic proposal oral presentation, Implementation, Report): feedback during formative fortnightly labs and individual feedback during/after marking.
Exam: cohort-level feedback after marking.
Recommended reading
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Feldman and Sanger, 2007.
An Introduction to Text Mining: Research Design, Data Collection, and Analysis, Ignatow and Mihalcea, 2017.
Speech and Language Processing (3rd ed.), Jurafsky and Martin, 2023.
Deep Learning for Natural Language Processing, Palash Goyal et al., Apress, 2018.
Study hours
Scheduled activity hours | |
---|---|
Assessment written exam | 2 |
Lectures | 10 |
Supervised time in studio/wksp | 10 |
Independent study hours | |
---|---|
Independent study | 128 |
Teaching staff
Staff member | Role |
---|---|
Chenghua Lin | Unit coordinator |
Additional notes
Independent Study Hours:
• Pre-recorded videos and/or directed reading (20 hours)
• Assessed coursework (50 hours)
• Weekly revision and exam revision (58 hours)
Additional information about assessment:
• Oral assessment/presentation: Oral presentation of coursework topic proposal
• Project output: Coursework code implementation and documentation
• Report: Short academic paper describing the coursework
Grouping of coursework:
• The coursework is attempted in self-organised groups of 3-4 students.