    Using large-scale text mining for a systematic reconstruction of molecular mechanisms of diseases: a case study in thyroid cancer

    Wu, Chengkun

    [Thesis]. Manchester, UK: The University of Manchester; 2014.

    Abstract

    Information about genes and pathways involved in a disease is usually 'buried' in the scientific literature, which makes systematic studies aimed at a comprehensive understanding of the disease difficult. Text mining offers a way to retrieve and extract the most relevant information from the literature, and might therefore enable data relevant to a given disease to be collected and explored systematically. This thesis aims to develop a text-mining pipeline that identifies the genes and pathways involved in a given disease and extracts their interaction patterns. As a case study, we used thyroid cancer, a cancer of increasing incidence characterised by multiple heterogeneous subtypes.

    First, although pathways are essential to many systems biology studies, they have not been a focus of text mining. To address this gap, we designed and implemented an automated method, called PathNER, for mining pathway mentions from the literature. PathNER combines soft dictionary matching with rules to detect pathway mentions in either abstracts or full texts, achieving an F1-score of 84%. When applied at large scale, PathNER identified disease-associated pathways that have been “neglected” by manually created databases, and it can also help prioritise pathways for curation.

    We also designed a disease-subtype classification method that assigns subtype labels to thyroid cancer articles; it achieved a micro-averaged F1-score of 85.9% for the primary subtypes. Using gene recognition tools and PathNER, we extracted the genes and pathways associated with the different thyroid cancer subtypes (molecular profiling). The results show that this approach can complement existing annotated databases and the most recent review articles in coverage and efficiency, providing a basis for a systematic understanding of thyroid cancer.
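    The abstract does not describe PathNER's dictionary, rules or scoring. As an illustration of the general idea only, the Python sketch below pairs one hypothetical rule (a mention ends in the word "pathway" or "pathways") with soft dictionary matching: candidate windows and dictionary entries are lower-cased, generic words from an assumed list are dropped, and the remaining token sets are compared by Jaccard similarity against an assumed threshold of 0.8. The names `find_pathways`, `GENERIC` and the toy dictionary are illustrative and are not taken from PathNER.

```python
import re
from typing import List, NamedTuple, Optional, Set

# Assumed list of generic words dropped before comparison, so that spelling
# and wording variants ("signaling"/"signalling", "cascade") do not block a match.
GENERIC = {"pathway", "pathways", "signaling", "signalling", "cascade", "the", "of"}
TOKEN = re.compile(r"[A-Za-z0-9]+")


class Mention(NamedTuple):
    text: str     # surface string found in the sentence
    entry: str    # dictionary entry it was matched to
    score: float  # Jaccard similarity of the normalised token sets


def normalise(tokens: List[str]) -> Set[str]:
    """Lower-case the tokens and drop generic pathway words."""
    return {t.lower() for t in tokens} - GENERIC


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a and b else 0.0


def find_pathways(sentence: str, dictionary: List[str],
                  threshold: float = 0.8, max_words: int = 5) -> List[Mention]:
    """Rule: a candidate ends in 'pathway(s)'. Soft match: grow the window
    leftwards by up to max_words preceding tokens and keep the best dictionary hit."""
    entries = {e: normalise(TOKEN.findall(e)) for e in dictionary}
    tokens = list(TOKEN.finditer(sentence))
    mentions: List[Mention] = []
    for i, tok in enumerate(tokens):
        if tok.group().lower() not in ("pathway", "pathways"):
            continue
        best: Optional[Mention] = None
        # Shortest window first, so ties keep the tightest span.
        for start in range(i - 1, max(0, i - max_words) - 1, -1):
            cand = normalise([t.group() for t in tokens[start:i + 1]])
            for entry, ref in entries.items():
                score = jaccard(cand, ref)
                if score >= threshold and (best is None or score > best.score):
                    span = sentence[tokens[start].start():tok.end()]
                    best = Mention(span, entry, score)
        if best is not None:
            mentions.append(best)
    return mentions


if __name__ == "__main__":
    toy_dictionary = ["MAPK signaling pathway", "PI3K-Akt signaling pathway"]
    for s in ["BRAF mutations activate the MAPK signalling pathway.",
              "Loss of PTEN activates the PI3K/Akt pathway.",
              "The Hippo pathway is not in the toy dictionary."]:
        print(find_pathways(s, toy_dictionary))
```

    Because generic words are dropped, "PI3K/Akt pathway" matches the entry "PI3K-Akt signaling pathway" even though the surface strings differ, whereas exact string lookup would miss it. A pathway absent from the dictionary, such as "Hippo pathway" in the toy example, is not found at all.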
    Finally, we extended an existing state-of-the-art text-mining system for extracting molecular events to develop PWTEES, which incorporates pathway mentions. PWTEES achieved an F1-score of 59% and can be used to assist semi-automated curation. We then applied PWTEES to more than 30,000 articles to generate a systematic profile of the molecular interactions in thyroid cancer. We integrated molecular details of pathways from curated databases and built interaction networks with genes and pathways as nodes. Subsequent network analysis offers additional, more comprehensive perspectives on the existing data.

    The main outcome of the thesis is a systematic and efficient pipeline that can identify the key pathways and genes that characterise a disease (molecular profile) and build networks of their interactions (interactome). Although we used thyroid cancer as a case study, the work presented here can be applied to other diseases to support systematic studies and assisted curation efforts.
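    The abstract states only that curated pathway details were integrated with the extracted interactions and that genes and pathways form the network nodes. The Python sketch below illustrates such a network under stated assumptions: events arrive as hypothetical (participant, event type, participant) triples, curated pathway membership arrives as a pathway-to-genes mapping, edges are undirected, and network analysis is reduced to ranking nodes by degree. The function names, event labels and example data are hypothetical and are not results from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (participant, event type, participant) as it might come out of an event
# extractor; the event type is kept for reference but not used below.
Event = Tuple[str, str, str]
NODE_KINDS = ("gene", "pathway")


def build_network(events: List[Event], node_types: Dict[str, str],
                  memberships: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Undirected gene/pathway network from text-mined events plus
    curated pathway-membership edges (pathway -> member genes)."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)

    def link(a: str, b: str) -> None:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    for source, _event_type, target in events:
        # Keep only events whose participants are both genes or pathways.
        if node_types.get(source) in NODE_KINDS and node_types.get(target) in NODE_KINDS:
            link(source, target)
    for pathway, genes in memberships.items():
        for gene in genes:
            link(pathway, gene)
    return dict(adjacency)


def hubs(adjacency: Dict[str, Set[str]], top: int = 3) -> List[Tuple[str, int]]:
    """Rank nodes by degree (distinct neighbours); ties broken alphabetically."""
    ranked = sorted(((node, len(nbrs)) for node, nbrs in adjacency.items()),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    node_types = {"BRAF": "gene", "RET": "gene", "PTEN": "gene",
                  "MAPK signaling pathway": "pathway",
                  "PI3K-Akt signaling pathway": "pathway"}
    events = [("BRAF", "Positive_regulation", "MAPK signaling pathway"),
              ("RET", "Positive_regulation", "MAPK signaling pathway"),
              ("RET", "Positive_regulation", "PI3K-Akt signaling pathway"),
              ("PTEN", "Negative_regulation", "PI3K-Akt signaling pathway")]
    memberships = {"MAPK signaling pathway": ["MAP2K1", "BRAF"]}
    print(hubs(build_network(events, node_types, memberships)))
```

    Degree is the simplest hub measure; a graph library such as NetworkX offers richer ones (for example betweenness centrality) at the cost of an extra dependency, while plain adjacency sets keep the sketch self-contained.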

    Additional content not available electronically

    CD-ROM containing supplementary data files submitted in pocket inside back cover of print version of thesis

    Bibliographic metadata

    Type of resource:
    Content type:
    Form of thesis:
    Type of submission:
    Degree programme:
    PhD DTC Systems Biology (FLS)
    Publication date:
    2014
    Location:
    Manchester, UK
    Total pages:
    130
    Thesis main supervisor(s):
    Thesis co-supervisor(s):
    Language:
    en

    Record metadata

    Manchester eScholar ID:
    uk-ac-man-scw:241289
    Created by:
    Wu, Chengkun
    Created:
    29th November, 2014, 14:27:44
    Last modified by:
    Wu, Chengkun
    Last modified:
    16th November, 2017, 14:24:14
