AUTOMATIC IDENTIFICATION OF TEXTUAL UNCERTAINTY

Zerva, Chrysoula

[Thesis]. Manchester, UK: The University of Manchester; 2019.

Abstract

The exponential growth of published research makes it progressively harder for researchers to navigate the existing literature and find specific information, so that incorporating new knowledge becomes increasingly difficult. Text mining can aid literature exploration by processing vast document collections to extract and organise information of interest. This is particularly important in the biomedical domain, where text mining methods can extract mentions of bio-molecular reactions and automatically incorporate them into pathway and interaction networks, thus contributing to their timely curation and maintenance. However, current methods tend to ignore the context of extracted interaction mentions and treat them all as equally certain, overlooking speculative statements, hypotheses and admissions of ignorance. To address this problem, we investigate the use of textual uncertainty in biomedical literature and propose novel methods to identify the (un)certainty value of extracted statements. We study the extent to which such values, representing the author's confidence in a statement (and thus the inferred certainty of the statement itself), can be used to weight extracted knowledge more informatively. Focusing on the biomedical use case, we propose an approach to accurately identify uncertainty values for interaction mentions found in different documents. We subsequently use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction into a consolidated confidence score. Throughout this work, we validated the output of our methods against the judgement of biomedical researchers, and thus confirmed that our methodology for inferring an overall interaction score closely approximates the scores attributed by researchers.
We demonstrate the usability of textual uncertainty in the biomedical context by integrating it as a confidence filter in a pilot interactive interface that provides literature-aided pathway visualisation. We thus illustrate that, along with other literature-based confidence filters, textual uncertainty can help researchers explore and discover interactions of interest.

The aim of the thesis is to investigate the use of uncertainty in written language, with an emphasis on scientific writing. The thesis explores both the practical aspects of assessing the uncertainty of extracted statements and ranking information accordingly, and the theoretical foundations and linguistic patterns of uncertainty expressed in text. It shows that automated uncertainty identification can be a valuable tool for extracting and processing vast amounts of information from raw text, enabling more accurate and targeted acquisition and integration of new knowledge.
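The abstract describes using subjective logic to combine several uncertainty values, extracted from different documents for the same interaction, into one consolidated confidence score. As a rough illustration of that idea, the sketch below assumes Jøsang's cumulative fusion operator for binomial opinions and a single shared base rate of 0.5. The thesis may use a different fusion operator or a different way of mapping textual uncertainty to opinions; the `Opinion` class, the `cumulative_fuse` function and the example values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Binomial subjective-logic opinion where belief + disbelief + uncertainty = 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed prior probability, shared by all opinions here

    def __post_init__(self) -> None:
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"b + d + u must equal 1, got {total}")

    def projected_probability(self) -> float:
        """Collapse the opinion into a single score: P = b + a * u."""
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fuse(x: Opinion, y: Opinion) -> Opinion:
    """Cumulative fusion of two opinions that share the same base rate."""
    if x.base_rate != y.base_rate:
        raise ValueError("this sketch assumes a shared base rate")
    k = x.uncertainty + y.uncertainty - x.uncertainty * y.uncertainty
    if k == 0.0:
        # Both opinions are dogmatic (u = 0): give them equal weight and average.
        return Opinion(
            (x.belief + y.belief) / 2,
            (x.disbelief + y.disbelief) / 2,
            0.0,
            x.base_rate,
        )
    return Opinion(
        (x.belief * y.uncertainty + y.belief * x.uncertainty) / k,
        (x.disbelief * y.uncertainty + y.disbelief * x.uncertainty) / k,
        (x.uncertainty * y.uncertainty) / k,
        x.base_rate,
    )


if __name__ == "__main__":
    # Hypothetical opinions for two mentions of the same interaction:
    # one from a speculative sentence, one from a confidently stated result.
    speculative = Opinion(belief=0.4, disbelief=0.1, uncertainty=0.5)
    confident = Opinion(belief=0.7, disbelief=0.1, uncertainty=0.2)

    # Fusion is commutative, so any number of mentions can be folded together.
    fused = reduce(cumulative_fuse, [speculative, confident])
    print(f"b={fused.belief:.3f} d={fused.disbelief:.3f} u={fused.uncertainty:.3f}")
    print(f"score={fused.projected_probability():.3f}")  # score=0.800
```

Fusing independent sources this way lowers the uncertainty of the consolidated opinion (here from 0.5 and 0.2 down to about 0.167), which matches the intuition that an interaction reported in several documents deserves a more definite score than one reported once. The projected probability is one simple way to turn the fused opinion into a single number usable as a confidence filter.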

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (CDT)
Location:
Manchester, UK
Total pages:
422
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:318549
Created by:
Zerva, Chrysoula
Created:
26th February, 2019, 20:53:05
Last modified by:
Zerva, Chrysoula
Last modified:
6th March, 2019, 11:31:41