
    Deep Learning Uncovers Genomic Features of Cell-type and State

    Phuycharoen, Mike

    [Thesis]. Manchester, UK: The University of Manchester; 2019.

    Abstract

    Genomic and epigenomic data are being obtained experimentally at an ever-increasing rate. As datasets become easier and cheaper to collect, computational methods allowing their interpretation and integration gain in importance. This thesis addresses the problem of using omic data to identify functional elements in DNA sequences with machine learning. In particular, convolutional neural networks are used to identify binding sites of transcription factor proteins (TFs), as well as features of chromatin accessibility in a set of mouse and human cell types. Deep learning attribution methods can provide explanations for model predictions, and the performance of different approaches is evaluated. Two main systems are analysed. First, the problem of differential and cooperative TF binding is illustrated in mouse branchial arch tissues, where MEIS TFs are known to co-bind with HOX to regulate tissue-specific developmental programmes. It is shown that deep neural networks outperform other commonly used computational methods in predicting binding of HOXA2 from differential MEIS data. Novel applications for indirect regularisation with data are introduced, allowing classification of small datasets. Secondly, a short time series of chromatin accessibility is modelled after immune stimulation in human CD4+ T cells. Sequence features characteristic of different dynamic trajectories are identified. An unsupervised approach is introduced for obtaining differential features without a priori class specification, along with a semi-supervised method for removal of replicate bias from the differential metric. The methods are used in two more systems in mouse: MEF2D binding across three tissues, and OCT4 binding in embryonic stem cells. Deep learning models presented in this work show substantial improvements over k-mer counting and SVMs, and provide important motivation for further development of machine learning methods for genomic analysis.

    Bibliographic metadata

    Degree type:
    Doctor of Philosophy
    Degree programme:
    PhD Computer Science (CDT)
    Location:
    Manchester, UK
    Total pages:
    167
    Language:
    en

    Record metadata

    Manchester eScholar ID:
    uk-ac-man-scw:322862
    Created by:
    Phuycharoen, Mike
    Created:
    17th December, 2019, 14:12:51
    Last modified by:
    Phuycharoen, Mike
    Last modified:
    23rd December, 2019, 12:17:43
