In April 2016 Manchester eScholar was replaced by the University of Manchester’s new Research Information Management System, Pure. In the autumn the University’s research outputs will be available to search and browse via a new Research Portal. Until then the University’s full publication record can be accessed via a temporary portal and the old eScholar content is available to search and browse via this archive.

      Modelling and Computing the Quality of Scientific Information on the Web of Data

      Gamble, Matthew Philip

      [Thesis]. Manchester, UK: The University of Manchester; 2014.

      Abstract

      The Web is being transformed into an open data commons, and is now the dominant point of access for information-seeking scientists. In parallel, the scientific community has been required to manage the challenges of "Big Data", characterized by its large-scale, distributed, and diverse nature. The Web of Linked Data has emerged as a platform through which the sciences can meet this challenge, allowing them to publish and reuse data in a machine-readable manner. The openness of the Web of Data is, however, a double-edged sword: on one hand it drives rapid growth in adoption, but on the other a lack of governance and quality control has led to data of varied quality and trustworthiness. The challenge scientists face, then, is not that data on the Web is universally poor, but that its quality is unknown.

      Previous research has established the notion of Quality Knowledge: latent domain knowledge possessed by expert scientists and used to make quality-based decisions. The main idea pursued in this thesis is that we can address Information Quality (IQ) issues in the Web of Data by repurposing the existing mechanisms scientists use to evaluate data. We argue that there are three distinct aspects of Quality Knowledge (objective, predictive, and subjective), defined by the information required for their assessment, and present two studies focused on the modelling and exploitation of the objective and predictive aspects. We address the objective aspect by developing the Minimum Information Model as a repurposing of Minimum Information Checklists, an increasingly prevalent type of quality knowledge employed in the Life Sciences. A more general approach to modelling the predictive aspect explores the use of Multi-Entity Bayesian Networks to tackle the characteristic uncertainty in predictive quality knowledge and the inconsistent availability of metadata in the Web of Data.

      We show that, by following our classification, we can develop techniques and infrastructure to successfully evaluate IQ that are tailored to the challenges of the Web of Data and informed by the needs of the scientific community.
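The objective aspect described in the abstract, evaluating a record against a minimum-information checklist, can be illustrated with a brief sketch. All field names and the scoring rule here are invented for illustration; this is not the thesis's actual Minimum Information Model.

```python
# Hypothetical sketch of checklist-based completeness scoring, in the
# spirit of Minimum Information Checklists. Field names are invented
# and do not come from the thesis.

def checklist_completeness(record: dict, required: set) -> float:
    """Return the fraction of required fields present with a non-empty value."""
    if not required:
        return 1.0
    present = {field for field in required if record.get(field)}
    return len(present) / len(required)

# An invented experiment record: "instrument" is absent and
# "sample_size" is empty, so 2 of 4 required fields are satisfied.
record = {"organism": "E. coli", "protocol": "PCR", "sample_size": ""}
required = {"organism", "protocol", "sample_size", "instrument"}
score = checklist_completeness(record, required)  # 0.5
```

A real checklist-driven assessment would operate over published metadata (e.g. RDF descriptions in the Web of Data) rather than an in-memory dictionary, but the same present-versus-required comparison underlies it.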

      Bibliographic metadata

      Degree type:
      Doctor of Philosophy
      Degree programme:
      PhD Computer Science
      Publication date:
      2014
      Location:
      Manchester, UK
      Total pages:
      326
      Language:
      en

        Record metadata

        Manchester eScholar ID:
        uk-ac-man-scw:224381
        Created by:
        Gamble, Matthew
        Created:
        30th April, 2014, 11:28:28
        Last modified by:
        Gamble, Matthew
        Last modified:
        1st June, 2015, 18:44:39
