In April 2016 Manchester eScholar was replaced by the University of Manchester’s new Research Information Management System, Pure. In the autumn the University’s research outputs will be available to search and browse via a new Research Portal. Until then the University’s full publication record can be accessed via a temporary portal and the old eScholar content is available to search and browse via this archive.

AMPLIFYING DATA CURATION EFFORTS TO IMPROVE THE QUALITY OF LIFE SCIENCE DATA

Alqasab, Mariam Saleh J

[Thesis]. Manchester, UK: The University of Manchester; 2019.

Abstract

The massive amount of data derived from the biomedical literature raises the problem of maintaining data quality. This leads biomedical database providers to curate their data, either by using tools or by hiring domain experts (known as curators). Curation is not affordable for every database, however, because it is expensive and time-consuming, especially when human experts perform it. Curation is crucial in all domains, not only in biocuration, but in the biomedical field keeping data curated can prevent harmful errors. For example, if a protein name is misspelled in a data record, a scientist may use the incorrect name throughout their experiments, causing confusion. In short, relying on uncurated data can lead to incorrect results. The importance of data curation has led many researchers to develop approaches that make the curation process faster, more reliable and more efficient.

In this thesis, we first propose a maturity model that describes the maturity levels of biomedical data curation. The model aims to help data providers identify limitations in their current curation methods and improve their curation process. It was built from information gathered from five different biomedical databases and from a survey of the biocuration literature, and did not require additional input from curators.

Second, we explore one possible approach to maximising the value obtained from human curators, IQBot, which automatically extracts information about data defects and corrections from the work that curators carry out. This information is packaged in a source-independent form, so that the owners of other databases can use it. To extract it, we compared two consecutive versions of the data records. We ran IQBot to monitor a real-world database (UniProtKB) and extract defects and defect corrections. When we compared the extracted defects and corrections with data from other databases, we found that those databases still held out-of-date data in their records.
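The version-comparison idea described in the abstract can be illustrated with a minimal Python sketch. It is not IQBot's actual implementation: the record layout (an accession mapped to a dictionary of field values), the `Correction` type and the functions `extract_corrections` and `find_stale` are all hypothetical, and the sketch treats every changed field value as a candidate defect/correction pair, which is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical record layout: accession -> {field name: value}.
Records = dict[str, dict[str, str]]


@dataclass(frozen=True)
class Correction:
    """A candidate defect (old value) paired with its correction (new value)."""
    accession: str  # assumed to be an identifier shared across databases
    field: str
    defect: str
    correction: str


def extract_corrections(old: Records, new: Records) -> list[Correction]:
    """Compare two consecutive releases and report every changed field value."""
    found = []
    for acc, old_fields in old.items():
        new_fields = new.get(acc)
        if new_fields is None:
            continue  # records deleted between releases are ignored here
        for field, old_value in old_fields.items():
            new_value = new_fields.get(field)
            # Assumption: any change to an existing value is a correction.
            if new_value is not None and new_value != old_value:
                found.append(Correction(acc, field, old_value, new_value))
    return found


def find_stale(corrections: list[Correction], other: Records) -> list[Correction]:
    """Return corrections whose superseded value still appears in another database."""
    return [
        c for c in corrections
        if other.get(c.accession, {}).get(c.field) == c.defect
    ]


if __name__ == "__main__":
    # Made-up example data, not taken from UniProtKB.
    release_1 = {"ACC001": {"protein_name": "Hemoglobin subunit alpa"}}
    release_2 = {"ACC001": {"protein_name": "Hemoglobin subunit alpha"}}
    other_db = {"ACC001": {"protein_name": "Hemoglobin subunit alpa"}}

    fixes = extract_corrections(release_1, release_2)
    print(fixes)
    print(find_stale(fixes, other_db))
```

Exact string equality keeps the sketch simple; a real system would also need to distinguish genuine corrections from other release-to-release changes, such as newly added annotations, before reporting a defect.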

Keyword(s)

Data Curation; IQ-Bot; Maturity Model

Bibliographic metadata

Type of resource:

text

Content type:

Administered thesis

Form of thesis:

Traditional

Type of submission:

Doctoral level ETD - final

Thesis title:

AMPLIFYING DATA CURATION EFFORTS TO IMPROVE THE QUALITY OF LIFE SCIENCE DATA

Degree type:

Doctor of Philosophy

Degree programme:

PhD Computer Science (CDT)

Publication date:

2019-06-07T12:55:34

Institution:

The University of Manchester

Location:

Manchester, UK

Total pages:

143

Thesis main supervisor(s):

EMBURY, SUZANNE SM

Thesis co-supervisor(s):

SAMPAIO, SANDRA SDFM

Degree grantor:

The University of Manchester

Institutional metadata

University researcher(s):

Alqasab, Mariam

Record metadata

Manchester eScholar ID:

uk-ac-man-scw:319733

Created by:

Alqasab, Mariam

Created:

7th June, 2019, 11:55:34

Last modified by:

Alqasab, Mariam

Last modified:

1st July, 2019, 14:03:16