In April 2016 Manchester eScholar was replaced by the University of Manchester’s new Research Information Management System, Pure. In the autumn the University’s research outputs will be available to search and browse via a new Research Portal. Until then the University’s full publication record can be accessed via a temporary portal and the old eScholar content is available to search and browse via this archive.

Investigating Data Quality in Question and Answer Reports

Mohamed Zaki ali, Mona Hussein

[Thesis]. Manchester, UK: The University of Manchester; 2016.

Access to files

Abstract

Data Quality (DQ) has been a long-standing concern for a number of stakeholders in a variety of domains. It has become a critically important factor for the effectiveness of organisations and individuals. Previous work on DQ methodologies have mainly focused on either the analysis of structured data or the business-process level rather than analysing the data itself. Question and Answer Reports (QAR) are gaining momentum as a way to collect responses that can be used by data analysts, for instance, in business, education or healthcare. Various stakeholders benefit from QAR such as data brokers and data providers, and in order to effectively analyse and identify the common DQ problems in these reports, the various stakeholders' perspectives should be taken into account which adds another complexity for the analysis.This thesis investigates DQ in QAR through an in-depth DQ analysis and provide solutions that can highlight potential sources and causes of problems that result in “low-quality” collected data. The thesis proposes a DQ methodology that is appropriate for the context of QAR. The methodology consists of three modules: question analysis, medium analysis and answer analysis. In addition, a Question Design Support (QuDeS) framework is introduced to operationalise the proposed methodology through the automatic identification of DQ problems. The framework includes three components: question domain-independent profiling, question domain-dependent profiling and answers profiling. The proposed framework has been instantiated to address one example of DQ issues, namely Multi-Focal Question (MFQ). We introduce MFQ as a question with multiple requirements; it asks for multiple answers. QuDeS-MFQ (the implemented instance of QuDeS framework) has implemented two components of QuDeS for MFQ identification, these are question domain-independent profiling and question domain-dependent profiling. The proposed methodology and the framework are designed, implemented and evaluated in the context of the Carbon Disclosure Project (CDP) case study. The experiments show that we can identify MFQs with 90% accuracy.This thesis also demonstrates the challenges including the lack of domain resources for domain knowledge representation, such as domain ontology, the complexity and variability of the structure of QAR, as well as the variability and ambiguity of terminology and language expressions and understanding stakeholders or users need.

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Type of submission:
Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science
Publication date:
Location:
Manchester, UK
Total pages:
209
Abstract:
Data Quality (DQ) has been a long-standing concern for a number of stakeholders in a variety of domains. It has become a critically important factor for the effectiveness of organisations and individuals. Previous work on DQ methodologies have mainly focused on either the analysis of structured data or the business-process level rather than analysing the data itself. Question and Answer Reports (QAR) are gaining momentum as a way to collect responses that can be used by data analysts, for instance, in business, education or healthcare. Various stakeholders benefit from QAR such as data brokers and data providers, and in order to effectively analyse and identify the common DQ problems in these reports, the various stakeholders' perspectives should be taken into account which adds another complexity for the analysis.This thesis investigates DQ in QAR through an in-depth DQ analysis and provide solutions that can highlight potential sources and causes of problems that result in “low-quality” collected data. The thesis proposes a DQ methodology that is appropriate for the context of QAR. The methodology consists of three modules: question analysis, medium analysis and answer analysis. In addition, a Question Design Support (QuDeS) framework is introduced to operationalise the proposed methodology through the automatic identification of DQ problems. The framework includes three components: question domain-independent profiling, question domain-dependent profiling and answers profiling. The proposed framework has been instantiated to address one example of DQ issues, namely Multi-Focal Question (MFQ). We introduce MFQ as a question with multiple requirements; it asks for multiple answers. QuDeS-MFQ (the implemented instance of QuDeS framework) has implemented two components of QuDeS for MFQ identification, these are question domain-independent profiling and question domain-dependent profiling. The proposed methodology and the framework are designed, implemented and evaluated in the context of the Carbon Disclosure Project (CDP) case study. The experiments show that we can identify MFQs with 90% accuracy.This thesis also demonstrates the challenges including the lack of domain resources for domain knowledge representation, such as domain ontology, the complexity and variability of the structure of QAR, as well as the variability and ambiguity of terminology and language expressions and understanding stakeholders or users need.
Thesis main supervisor(s):
Thesis co-supervisor(s):
Language:
en

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:302156
Created by:
Mohamed Zaki ali, Mona
Created:
7th July, 2016, 11:25:36
Last modified by:
Mohamed Zaki ali, Mona
Last modified:
1st December, 2017, 09:09:05

Can we help?

The library chat service will be available from 11am-3pm Monday to Friday (excluding Bank Holidays). You can also email your enquiry to us.