In April 2016 Manchester eScholar was replaced by the University of Manchester’s new Research Information Management System, Pure. In the autumn the University’s research outputs will be available to search and browse via a new Research Portal. Until then the University’s full publication record can be accessed via a temporary portal and the old eScholar content is available to search and browse via this archive.

The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis.

Sims AH, Smethurst GJ, Hey Y, Okoniewski MJ, Pepper SD, Howell A, Miller CJ, Clarke RB

BMC Med Genomics. 2008;1( 1).

Access to files

Full-text and supplementary files are not available from Manchester eScholar. Full-text is available externally using the following links:

Full-text held externally

Abstract

ABSTRACT: BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. RESULTS: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple mean batch-centring was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to-date (1107), from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. CONCLUSION: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets leading to new biological findings with increased statistical power.

Bibliographic metadata

Type of resource:
Content type:
Publication type:
Published date:
Journal title:
ISSN:
Volume:
1( 1)
Digital Object Identifier:
10.1186/1755-8794-1-42
Access state:
Active

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:1d32394
Created:
2nd September, 2009, 14:23:57
Last modified:
10th March, 2014, 17:15:57

Can we help?

The library chat service will be available from 11am-3pm Monday to Friday (excluding Bank Holidays). You can also email your enquiry to us.