Assigning roles to protein mentions: the case of transcription factors.

Yang, Hui; Keane, John; Bergman, Casey M; Nenadic, Goran

Journal of Biomedical Informatics. 2009;42:887-94.

Abstract

Transcription factors (TFs) play a crucial role in gene regulation, and providing structured and curated information about them is important for genome biology. Manual curation of TF-related data is time-consuming and always lags behind the actual knowledge available in the biomedical literature. Here we present a machine-learning text mining approach for identification and tagging of protein mentions that play a TF role in a given context to support the curation process. More precisely, the method explicitly identifies those protein mentions in text that refer to their potential TF functions. The prediction features are engineered from the results of shallow parsing and domain-specific processing (recognition of relevant appearing in phrases). A phrase-based Conditional Random Fields (CRF) model is used to capture the content and context information of candidate entities. The proposed approach for the identification of TF mentions has been tested on a set of evidence sentences from the TRANSFAC and FlyTF databases. It achieved an F-measure of around 51.5% with a precision of 62.5% under 5-fold cross-validation. The experimental results suggest that the phrase-based CRF model benefits from the flexibility to use correlated domain-specific features that describe the dependencies between TFs and other entities. To the best of our knowledge, this work is one of the first attempts to apply text-mining techniques to the task of assigning semantic roles to protein mentions.
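
The abstract quotes an F-measure and a precision but not a recall. As a rough illustration only, and assuming the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall, which the abstract does not state explicitly), the implied recall can be back-computed as sketched below. The function names are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumes a balanced F1; a weighted F-beta would give a different result.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denominator


if __name__ == "__main__":
    p, f1 = 0.625, 0.515  # figures quoted in the abstract
    r = implied_recall(p, f1)
    print(f"implied recall: {r:.3f}")
    print(f"round-trip F1: {f_measure(p, r):.3f}")
```

Under that assumption the implied recall comes out at roughly 44%. Because the abstract gives the F-measure only as "around 51.5%", this is an estimate, not a reported result.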

Bibliographic metadata

Language: eng
Place of publication: United States
Volume: 42
Pagination: 887-94
Digital Object Identifier: 10.1016/j.jbi.2009.04.001
Pubmed Identifier: 19364541
Pii Identifier: S1532-0464(09)00052-5
Access state: Active

Record metadata

Manchester eScholar ID: uk-ac-man-scw:75835
Created by: Bergman, Casey
Created: 13th January, 2010, 17:29:18
Last modified by: Bergman, Casey
Last modified: 6th March, 2016, 19:30:35
