Data-driven Temporal Information Extraction with Applications in General and Clinical Domains

Filannino, Michele

[Thesis]. Manchester, UK: The University of Manchester; 2016.

Abstract

The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. However, Temporal Information Extraction (TIE) is a challenging task because of the number of expression types (durations, frequencies, times, dates) and their high morphological variability and ambiguity. Among existing approaches, rule-based ones are the most common, while data-driven ones are under-explored.

This thesis introduces a novel domain-independent data-driven TIE strategy. The identification strategy is based on machine learning sequence labelling classifiers using features selected through an extensive exploration. Results are further optimised using an a posteriori label-adjustment pipeline. The normalisation strategy is rule-based and builds on a pre-existing system.

The methodology has been applied to both a specific (clinical) domain and the general domain, and has been officially benchmarked at the i2b2/2012 and TempEval-3 challenges, ranking 3rd and 1st respectively. The results show the TIE task to be more challenging in the clinical domain (overall accuracy 63%) than in the general domain (overall accuracy 69%).

Finally, this thesis also presents two applications of TIE. The first introduces the concept of the temporal footprint of a Wikipedia article, and uses it to mine the life span of persons. In the second, TIE techniques are used to improve pre-existing information retrieval systems by filtering out temporally irrelevant results.

Layman's Abstract

The human brain has evolved to master, among other abilities, that of dealing with time. People are naturally able to interpret the temporal side of speech or text, and use this knowledge to work out new insights and discoveries. Making computers mimic this capability has become imperative for dealing with information overload.

Automatic temporal information analysis is a challenging task in Text Mining (TM). This analysis makes knowledge extraction faster by orders of magnitude and enhances the exhibited intelligence of pre-existing natural language-based systems.

This thesis presents a data-driven Temporal Information Extraction (TIE) methodology which improves on state-of-the-art performance for two types of data: general and clinical. In the clinical domain TIE has proved to be crucial because of its applications, for example summarisation, visualisation of patients’ clinical pathways, disease progression modelling and analysis of the effectiveness of treatments, to mention a few. Novel applications of TIE systems are also presented. In one case, by temporally analysing people’s Wikipedia pages, it is possible to predict their life span on the timeline. In the other case, temporal analysis has been shown to improve information retrieval systems by filtering out documents which are not temporally relevant to the users’ queries.

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (CDT)
Location:
Manchester, UK
Total pages:
233
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:296972
Created by:
Filannino, Michele
Created:
11th February, 2016, 10:59:11
Last modified by:
Filannino, Michele
Last modified:
1st December, 2017, 09:08:30
