
Course unit details:
Text Mining
Unit code | COMP61332 |
---|---|
Credit rating | 15 |
Unit level | FHEQ level 7 – master's degree or fourth year of an integrated master's degree |
Teaching period(s) | Semester 2 |
Available as a free choice unit? | Yes |
Overview
Aims
This course unit aims to give students an understanding of the principles, issues, techniques and solutions connected with text mining, and to show how recent advances in text mining relate to innovative approaches to organising, characterising, finding and exploiting large-scale textual information in the search for new knowledge.
Learning outcomes
- To compare and contrast methods for sentence segmentation, tokenisation, part-of-speech tagging, syntactic parsing and semantic representation
- To apply techniques such as named entity recognition, entity linking, and relation and event extraction to extract information from text, while leveraging lexical and semantic resources (e.g. FrameNet, VerbNet, WordNet) and terminological repositories
- To design and customise text annotation workflows, taking into consideration various annotation formats
- To explain how text mining supports the development of semantic search systems
- To explain the distributional hypothesis, and to compare (1) count-based and (2) compositional distributional semantics models
- To apply various evaluation measures (e.g. Kappa, precision, recall and F-score)
- To investigate methods for social media content analysis
Syllabus
Introduction: background, motivation, dealing with information overload and information overlook, unstructured vs. (semi-)structured data, evolving information needs and knowledge management issues, enhancing user experience of information provision and seeking, the business case for text mining.
The text mining pipeline: information retrieval, information extraction and data mining.
Fundamentals of natural language processing: linguistic foundations, levels of linguistic analysis.
Approaches to text mining: rule-based vs. machine-learning-based vs. hybrid; generic vs. domain-specific; domain adaptation.
Dealing with real text: text types, document formats and conversion, character encodings, markup, low-level processes (sentence splitting, tokenisation, part-of-speech tagging, chunking).
Information extraction: term extraction, named entity recognition, relation extraction, fact and event extraction; partial analysis vs. full analysis.
Data mining and visualisation of results from text mining.
Evaluation of text mining systems: evaluation measures, role of evaluation challenges, usability evaluation.
Resources for text mining: annotated corpora, computational lexica, ontologies, computational grammars; design, construction and use issues.
Issues in large-scale processing of text: distributed text mining, scalable text mining systems.
A sampler of text mining applications and services; case studies.
Teaching and learning methods
Lectures
15 hours of lectures.
Laboratories
15 hours of labs.
Consultation
5 hours of consultation.
Transferable skills and personal qualities
Employability skills
- Analytical skills
- Problem solving
- Research
Assessment methods
Method | Weight |
---|---|
Written exam | 50% |
Written assignment (inc essay) | 50% |
Feedback methods
- Oral feedback in class.
- Email.
- Course Web site.
Study hours
Scheduled activity hours | |
---|---|
Assessment written exam | 2 |
Lectures | 15 |
Practical classes & workshops | 20 |
Independent study hours | |
---|---|
Independent study | 113 |
Teaching staff
Staff member | Role |
---|---|
Riza Batista-Navarro | Unit coordinator |