
Learning the language of gene expression

19 Jan 2007

Researchers have taken a major step towards understanding the language of gene regulation in the fruit fly Drosophila, and they expect the technique to be rapidly applicable to understanding the effects of genome variation in humans.

The new research, published today in PLoS Computational Biology, is a major advance in using computers to detect the regions in DNA that control the activity of genes. Studies on single genes have shown that variation in gene regulation can be important in disease. The new program, called NestedMICA, allows researchers to find many regulatory regions, which will become a new focus for disease understanding.

The team, from the Wellcome Trust Sanger Institute and The University of Manchester, took slices of genome sequence from next to each Drosophila gene - where the highest concentration of regulatory signals is thought to lie - and fed them into the new computer program, which looks for patterns shared between the sequences. The search process is similar to looking for words in a sentence where the vocabulary of the language is unknown.

"Most words in the language of gene regulation can be spelled more than one way," explained Dr Thomas Down, first author on the report. "In English, you might see people writing either 'analyse' or 'analyze'. In genomes, such variation - or even bigger differences - seems to be normal.

"So we can't just count words, we need to recognize alternative spellings."
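The point about alternative spellings can be illustrated with a small sketch. This is not the NestedMICA algorithm, and the example sites, the test sequence and the score threshold below are invented purely for illustration. It assumes a handful of already-known spellings of one 'word', builds a simple position weight matrix from them, and then compares exact word counting with scoring that tolerates variation.

```python
import math

BASES = "ACGT"

# Hypothetical spellings of one regulatory "word" (invented for illustration).
KNOWN_SITES = ["TATAAA", "TATATA", "TATAAA", "TTTAAA"]

# Hypothetical stretch of sequence to search (also invented).
SEQUENCE = "GGCTATATACCGTTTAAAGG"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a position weight matrix: one dict of log2-odds scores per position."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, word):
    """Sum the per-position scores of a candidate word."""
    return sum(column[base] for column, base in zip(pwm, word))


def count_exact(sequence, word):
    """Count exact (possibly overlapping) occurrences of a single spelling."""
    width = len(word)
    return sum(
        1 for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == word
    )


def find_matches(pwm, sequence, threshold):
    """Return (position, word) pairs whose matrix score reaches the threshold."""
    width = len(pwm)
    return [
        (i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
        if score(pwm, sequence[i:i + width]) >= threshold
    ]


pwm = build_pwm(KNOWN_SITES)
print(count_exact(SEQUENCE, "TATAAA"))   # exact counting of the commonest spelling
print(find_matches(pwm, SEQUENCE, 5.0))  # scoring that accepts variant spellings
```

In this toy case, counting the commonest spelling exactly finds nothing, while the matrix picks up two variant spellings. A tool of the kind described in the report would additionally have to learn such matrices from the sequences themselves, without being given known examples.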

The team, which includes Dr Casey Bergman from Manchester's Faculty of Life Sciences, has so far found 120 'words' - distinct examples of regions that might regulate genes. About 30 of these were known from many years of studying how individual Drosophila genes are controlled, but most are novel. This is a major step towards understanding the language of gene regulation in an important model organism, and proof of principle of a new technology that will speed the study of regulatory elements in the human genome. Drosophila is a well-studied organism and shares 48% of its 14,000 genes with humans.

Research emerging in the past few months suggests that variation in the sequence of regulatory regions will affect susceptibility to many diseases. A few cases are already known - one form of thalassaemia is caused by a regulatory sequence variant - but knowledge of regulatory elements in the human genome is limited: scientists have only scratched the surface.

Systematic annotation of regulatory regions in the human genome will be very important if researchers are going to understand the effects of all sequence variation.

Dr Tim Hubbard, senior author on the report, explained: "While others have tried to identify these control regions before, they have had to try to align lots of sequences. Our new method doesn't depend on alignment, an advantage because the new program is robust to rapidly evolving sequences.

"The new method also doesn't require prior knowledge from, say, looking at known examples, and can search for hundreds of different motifs at once."

As science should, the work makes predictions that the team is testing. Using a set of excellent, publicly available data on gene activity from the University of California-Berkeley and Lawrence Berkeley National Laboratory, they have predicted what some of the newly discovered sequences might mean in the language of gene regulation.

Computer analysis can accelerate the search for important regions in genomes, but the authors emphasize that computer predictions must always be examined experimentally. The program's findings in Drosophila have been validated by comparing them against results from experimental imaging.

The results of the research, a set of Drosophila sequence motifs, are freely available from a database at the Sanger Institute. Like many tools developed at the Sanger Institute, NestedMICA is open source software, freely available for anyone to download, run and modify.


Notes for Editors:

Publication details:
Thomas Down, Casey Bergman, Jing Su, Tim Hubbard (2007) Large-scale discovery of promoter motifs in Drosophila melanogaster. PLoS Computational Biology, Article #06-PLCB-RA-0344R2

Research at the Wellcome Trust Sanger Institute was funded by the Wellcome Trust; Dr Bergman is funded by the Royal Society.

Research data:
NestedMICA software:
Hubbard lab:
Bergman lab:

Bioinformatics at Manchester:
University of Manchester:

The University of Manchester is Britain's largest single-site university with a proud history of achievement and an ambitious agenda for the future. It boasts 36,000 students, 4,500 academic and research staff and 500 degree courses. The University has an exceptional record of generating and sharing new ideas and innovations and is one of the world's top centres for biomedical research. Manchester's total expenditure on research in 2003/4 was £269.5 million which has led to a quality, breadth and volume of research activity unparalleled in the UK, as demonstrated by the results of the independent Research Assessment Exercise (RAE). Further information is available at

The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992 as the focus for UK sequencing efforts. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms such as mouse and zebrafish, and more than 90 pathogen genomes. In October 2005, new funding was awarded by the Wellcome Trust to enable the Institute to build on its world-class scientific achievements and exploit the wealth of genome data now available to answer important questions about health and disease. These programmes are built around a Faculty of more than 30 senior researchers. The Wellcome Trust Sanger Institute is based in Hinxton, Cambridge, UK.

The Wellcome Trust is the largest independent charity in the UK and the second largest medical research charity in the world. It funds innovative biomedical research, in the UK and internationally, spending around £500 million each year to support the brightest scientists with the best ideas. The Wellcome Trust supports public debate about biomedical research and its impact on health and wellbeing.

Contact details:

Don Powell
Press Officer
Wellcome Trust Sanger Institute
Hinxton, Cambs, CB10 1SA, UK
Tel +44 (0)1223 494 956
Mobile +44 (0)7753 7753 97

Aeron Haworth
Media Officer
Faculty of Life Sciences
The University of Manchester
Manchester M13 9PL UK
Tel +44 (0)161 275 8383
Mobile +44 (0)7717 881 563