A General Framework for Building Accurate and Understandable Genomic Models: A Study in Rice (Oryza Sativa)

Orhobor, Oghenejokpeme Israel

[Thesis]. Manchester, UK: The University of Manchester; 2019.

Abstract

Rapid technological advances in genotyping and sequencing technologies are driving the generation of vast amounts of genomic data. These advancements present a unique opportunity to improve our understanding of the environmental and genetic mechanisms that give rise to phenotypes. These data are technically hard to analyse because there are many attributes (often in the order of a million), and vast quantities of background knowledge are relevant. Genotype data are most commonly used in genomic models to identify genetic regions which control phenotypes and to predict the likelihood that members of a population will produce progeny with particular phenotypes. However, most of the data may be irrelevant for certain phenotypes, leading to suboptimal, difficult-to-understand models. To meet this challenge, we propose a three-stage general framework that incorporates background knowledge in its model-building processes by applying feature stability, inductive logic programming (ILP), and meta-learning. In the first stage of the framework, we identify associated markers using marker stability rather than traditional mixed models. In the second stage, we formalise the identified frequent patterns and additional background knowledge as predicates in first-order logic, and using an ILP engine we identify frequent patterns which correspond to genetic configurations that are associated with a trait. Finally, the frequent patterns identified in the previous stage are used as additional data for phenotype prediction. We demonstrate that this framework (1) significantly outperforms the state of the art in identifying associated genomic regions, (2) identifies relevant genetic configurations, and (3) improves overall phenotype prediction, using a diverse rice (Oryza sativa) population.

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (CDT)
Publication date:
Location:
Manchester, UK
Total pages:
161
Language:
en

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:318296
Created by:
Orhobor, Oghenejokpeme
Created:
7th February, 2019, 09:27:15
Last modified by:
Orhobor, Oghenejokpeme
Last modified:
8th February, 2019, 13:28:13
