    Text mining molecular interactions and their context for studying disease

    Jamieson, Daniel

    [Thesis]. Manchester, UK: The University of Manchester; 2014.

    Abstract

    Molecular interactions enable us to understand the complexity of the human living system and how it can be exploited or malfunction to cause disease. The biomedical literature presents detailed knowledge of molecular functions and therefore represents a valuable reservoir of data for studying disease. However, extracting this data efficiently is difficult as it is spread over millions of publications in text that is not machine-readable. In this thesis we investigate how text mining can be used to automatically extract data for molecular interactions and their context relevant to disease. We focus on two globally relevant classes of disease that arise from contrasting mechanisms: pain-related diseases and diseases caused by pathogenic organisms. Using HIV-1 as a case study, we first show that text mining can be used to partially recreate a large, manually curated database of HIV-1-human molecular interactions derived from the literature. We highlight both weaknesses in the quality of the data produced by the text-mining approach and its strengths: it extracts this data rapidly, identifies instances missed in the manual curation, and has potential as a support tool. We then expand on this approach by showing how an entirely new database of protein interactions relevant to pain can be created efficiently and accurately, using text mining to generate the data and manual curation to validate the data quality. The following chapter then presents an analysis of 1,002 unique pain-related protein-protein interactions derived from this database, showing that it is of greater relevance to pain research than databases of pain interactions created from other common starting points. We highlight its value by, for example, identifying new drug repurposing opportunities and exploring differences in specific pain diseases using the contextual detail afforded by the text mining.

    Finally, we expand further on our approach to extracting molecular interactions from the literature by showing how interactions between human proteins and pathogens can be curated across pathogenic organisms. We demonstrate how these techniques can be used to expand our knowledge of human-pathogen interaction data already stored in public databases, by identifying 42 new HIV-1-human molecular interactions, 108 new interactions between pathogen species and human proteins, and 33 new human proteins that were found to interact with pathogens. Together, the results show that contextualised text mining, when supported by manual curation, can be used to extract molecular interactions for contrasting disease types in an efficient and accurate manner.

    Additional content not available electronically

    Supplementary files are provided on disk and with the relevant publications online.

    Bibliographic metadata

    Degree programme:
    PhD Bioinformatics
    Location:
    Manchester, UK
    Total pages:
    183
    Language:
    en

    Record metadata

    Manchester eScholar ID:
    uk-ac-man-scw:242593
    Created by:
    Jamieson, Daniel
    Created:
    8th December, 2014, 15:03:30
    Last modified by:
    Jamieson, Daniel
    Last modified:
    16th November, 2017, 14:24:29
