MEng Computer Systems Engineering
Year of entry: 2021
Course unit details:
|Unit level||FHEQ level 7 – master's degree or fourth year of an integrated master's degree|
|Teaching period(s)||Semester 2|
|Offered by||Department of Computer Science|
|Available as a free choice unit?||Yes|
Aims
This course unit aims to give students an understanding of the principles, issues, techniques and solutions associated with text mining, and to show how recent advances in text mining relate to innovative approaches to organising, characterising, finding and exploiting large-scale textual information in the search for new knowledge.
Learning outcomes
- To compare and contrast methods for sentence segmentation, tokenisation, part-of-speech tagging, syntactic parsing and semantic representation
- To apply techniques such as named entity recognition, entity linking, and relation and event extraction to extract information from text, drawing on lexical and semantic resources (e.g. FrameNet, VerbNet, WordNet) and terminological repositories
- To design and customise text annotation workflows, taking various annotation formats into account
- To explain how text mining supports the development of semantic search systems
- To explain the distributional hypothesis, and to compare (1) count-based and (2) compositional distributional semantic models
- To apply standard evaluation measures (e.g. kappa, precision, recall and F-score)
- To investigate methods for social media content analysis
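The distributional hypothesis holds that words occurring in similar contexts tend to have similar meanings. A count-based model puts this into practice by representing each word as counts of the words that appear within a fixed window around it, then comparing words by cosine similarity. Below is a minimal sketch of that approach. It is illustrative only, not course material: the function names, the toy corpus and the window size of 2 are assumptions chosen for the example. Practical count-based models usually reweight raw counts (for example with positive pointwise mutual information) and often reduce dimensionality. Raw counts are kept here for brevity.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Build count-based word vectors (hypothetical helper for illustration).

    For every token, count each other token that occurs within `window`
    positions of it in the same sentence (a symmetric context window).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so absent contexts contribute nothing.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (assumed), already sentence-split and tokenised on whitespace.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "stocks fell sharply today".split(),
]
vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 2))  # 0.87
print(cosine(vecs["cat"], vecs["stocks"]))         # 0.0
```

"cat" and "dog" share contexts such as "the", "drinks" and "chases", so their vectors point in similar directions. "cat" and "stocks" share no context words and score 0. Compositional models, the other family named in the learning outcomes, build representations for phrases and sentences from the vectors of their parts, which this sketch does not attempt.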
Syllabus
- Introduction: background, motivation, dealing with information overload and information overlook, unstructured vs. (semi-)structured data, evolving information needs and knowledge management issues, enhancing user experience of information provision and seeking, the business case for text mining.
- The text mining pipeline: information retrieval, information extraction and data mining.
- Fundamentals of natural language processing: linguistic foundations, levels of linguistic analysis.
- Approaches to text mining: rule-based vs. machine learning based vs. hybrid; generic vs. domain specific; domain adaptation.
- Dealing with real text: text types, document formats and conversion, character encodings, markup, low-level processes (sentence splitting, tokenisation, part-of-speech tagging, chunking).
- Information extraction: term extraction, named entity recognition, relation extraction, fact and event extraction; partial analysis vs. full analysis.
- Data mining and visualisation of results from text mining.
- Evaluation of text mining systems: evaluation measures, role of evaluation challenges, usability evaluation.
- Resources for text mining: annotated corpora, computational lexica, ontologies, computational grammars; design, construction and use issues.
- Issues in large-scale processing of text: distributed text mining, scalable text mining systems.
- A sampler of text mining applications and services; case studies.
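The evaluation measures covered in the syllabus can be sketched as follows. The sketch assumes exact matching of predicted and gold items, such as entity spans. Precision is TP / (TP + FP) and recall is TP / (TP + FN). F-score is taken here as F1, the harmonic mean 2PR / (P + R). For inter-annotator agreement the sketch uses Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The unit names only "kappa", so choosing Cohen's two-annotator variant is an assumption. The function names and example data are hypothetical.

```python
from collections import Counter


def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 over collections of items
    (e.g. (start, end, type) entity spans). Hypothetical helper."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: items in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: sum over labels of the product of marginal proportions.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined, treated here as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


gold = {(0, 10, "ORG"), (15, 25, "PER"), (30, 34, "DATE")}
predicted = {(0, 10, "ORG"), (15, 25, "LOC")}
p, r, f = precision_recall_f1(gold, predicted)
print(p, round(r, 3), round(f, 3))  # 0.5 0.333 0.4

annotator_a = ["Y", "Y", "N", "N", "Y", "N"]
annotator_b = ["Y", "N", "N", "N", "Y", "Y"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # 0.333
```

In this example, one of the two predictions matches a gold span exactly. Under exact matching, the span with correct boundaries but the wrong type counts as both a false positive and a false negative. Lenient (partial-overlap) matching is a common alternative in evaluation challenges.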
Teaching and learning methods
- 15 hours of lectures
- 15 hours of labs
- 5 hours of consultation
Transferable skills and personal qualities
- Analytical skills
- Problem solving
Assessment methods
|Written assignment (inc essay)||50%|
Feedback methods
- Oral feedback in class
- Course website
Recommended reading
The COMP61332 reading list can be found on the Department of Computer Science website for current students.
|Scheduled activity hours|
|Assessment written exam||2|
|Practical classes & workshops||20|
|Independent study hours|
Teaching staff
|John McNaught||Unit coordinator|
Course unit materials
Links to course unit teaching materials can be found on the School of Computer Science website for current students.