Bayesian Methods for Gene Expression Analysis from High-Throughput Sequencing data

Glaus, Peter

[Thesis]. Manchester, UK: The University of Manchester; 2014.

Abstract

We study the tasks of transcript expression quantification and differential expression analysis based on data from high-throughput sequencing of the transcriptome (RNA-seq).

In an RNA-seq experiment subsequences of nucleotides are sampled from a transcriptome specimen, producing millions of short reads. The reads can be mapped to a reference to determine the set of transcripts from which they were sequenced. We can measure the expression of transcripts in the specimen by determining the amount of reads that were sequenced from individual transcripts.

In this thesis we propose a new probabilistic method for inferring the expression of transcripts from RNA-seq data. We use a generative model of the data that can account for read errors, fragment length distribution and non-uniform distribution of reads along transcripts. We apply the Bayesian inference approach, using the Gibbs sampling algorithm to sample from the posterior distribution of transcript expression. Producing the full distribution enables assessment of the uncertainty of the estimated expression levels.

We also investigate the use of alternative inference techniques for the transcript expression quantification. We apply a collapsed Variational Bayes algorithm which can provide accurate estimates of mean expression faster than the Gibbs sampling algorithm.

Building on the results from transcript expression quantification, we present a new method for the differential expression analysis. Our approach utilizes the full posterior distribution of expression from multiple replicates in order to detect significant changes in abundance between different conditions. The method can be applied to differential expression analysis of both genes and transcripts.

We use the newly proposed methods to analyse real RNA-seq data and provide evaluation of their accuracy using synthetic datasets. We demonstrate the advantages of our approach in comparisons with existing alternative approaches for expression quantification and differential expression analysis.

The methods are implemented in the BitSeq package, which is freely distributed under an open-source license. Our methods can be accessed and used by other researchers for RNA-seq data analysis.
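
The Gibbs sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the BitSeq implementation: it assumes a plain mixture model in which the probability of each read given each transcript is already known, and it ignores read errors, fragment lengths and positional bias. The function name `gibbs_expression`, its parameters and the toy data are hypothetical and chosen only for illustration.

```python
import random


def gibbs_expression(read_likelihoods, n_iter=500, burn_in=100, alpha=1.0, seed=0):
    """Toy Gibbs sampler for relative transcript expression (illustrative only).

    read_likelihoods[n][m] is an assumed, precomputed P(read n | transcript m);
    every read must have at least one non-zero entry.
    Returns posterior samples of theta, the vector of expression proportions.
    """
    rng = random.Random(seed)
    n_transcripts = len(read_likelihoods[0])
    theta = [1.0 / n_transcripts] * n_transcripts
    samples = []
    for it in range(n_iter):
        # Step 1: sample a transcript of origin for each read given theta.
        counts = [0] * n_transcripts
        for lik in read_likelihoods:
            weights = [t * l for t, l in zip(theta, lik)]
            m = rng.choices(range(n_transcripts), weights=weights)[0]
            counts[m] += 1
        # Step 2: sample theta from its Dirichlet conditional given the counts,
        # using normalised Gamma draws (symmetric Dirichlet prior, parameter alpha).
        gammas = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(gammas)
        theta = [g / total for g in gammas]
        if it >= burn_in:
            samples.append(theta)
    return samples


# Toy data: 60 reads unique to transcript 0, 20 unique to transcript 1,
# and 20 reads that map equally well to both.
reads = [[1.0, 0.0]] * 60 + [[0.0, 1.0]] * 20 + [[0.5, 0.5]] * 20
samples = gibbs_expression(reads)

# Posterior mean and standard deviation of each transcript's proportion.
mean = [sum(s[m] for s in samples) / len(samples) for m in range(2)]
std = [
    (sum((s[m] - mean[m]) ** 2 for s in samples) / len(samples)) ** 0.5
    for m in range(2)
]
print("posterior mean:", mean)
print("posterior std: ", std)
```

Keeping all post-burn-in samples, rather than a single point estimate, is what allows the spread of the posterior to be reported alongside the mean, which is the kind of uncertainty assessment the abstract refers to.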

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science
Location:
Manchester, UK
Total pages:
193
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:218689
Created by:
Glaus, Peter
Created:
4th February, 2014, 08:23:02
Last modified by:
Glaus, Peter
Last modified:
6th May, 2015, 14:08:21
