A PROBABILISTIC APPROACH TO UNCERTAINTY QUANTIFICATION IN PAY-AS-YOU-GO DATA INTEGRATION

Sanchez Serrano, Fernando Rene

[Thesis]. Manchester, UK: The University of Manchester; 2019.

Abstract

The use of Web standards, compact publication guidelines, and open data initiatives have motivated many public and private organisations to publish data on the Web, giving rise to a global data space. Consuming data from heterogeneous data sources published on the Web requires integration at scale. The pay-as-you-go (PAYG) approach to data integration addresses integration at scale by relying on automatic techniques to provide candidate integrations. This heavy reliance on automatic techniques gives rise to uncertainty, which may arise in, and propagate through, every task in the life cycle of a PAYG approach, and whose effects may manifest in the quality of an automatically generated integration. Quantifying the uncertainty in the outcomes of a bootstrapped integration is therefore crucial: it can help explain the decisions made by the automatic algorithms and guide efforts to reduce that uncertainty, which can ultimately improve the quality of the integration.

In this thesis, we address the problem of quantifying the uncertainty that arises during the bootstrapping phase of PAYG in the context of Dataspaces. In particular, two approaches are proposed: (i) an approach to quantify the uncertainty in mapping generation using internal evidence; and (ii) an approach to quantify the uncertainty in the quality of an entire integration using user feedback collected in a pay-as-you-go manner.

More specifically, this thesis makes the following contributions: (i) a principled methodology for deriving degrees of belief on mappings that builds on Bayesian inference to assimilate evidence in the form of fitness scores associated with mappings during mapping generation; (ii) a novel methodology to quantify the uncertainty in the quality of an entire integration by assimilating user feedback on tuple results; and (iii) an experimental evaluation of the proposed techniques on a real-world integration scenario.
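The abstract does not state which likelihood model turns fitness scores into degrees of belief on mappings. Purely to illustrate the general idea of contribution (i), the following Python sketch applies Bayes' rule in log-odds form. It assumes, hypothetically, that a mapping's fitness scores are conditionally independent given whether the mapping is correct, and that scores of correct and incorrect mappings each follow a normal distribution with known parameters. The function names, the normal likelihoods, and the parameter values are all invented for this example and are not the thesis's method.

```python
import math


def log_normal_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of a normal distribution N(mean, std**2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sigmoid(d: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def posterior_belief(prior, scores, correct_params, incorrect_params):
    """Degree of belief that a mapping is correct, given its fitness scores.

    Hypothetical model (an assumption, not taken from the thesis):
      * `prior` is P(correct) before any evidence, with 0 < prior < 1;
      * fitness scores are conditionally independent given correctness;
      * scores of correct / incorrect mappings are normally distributed
        with (mean, std) given by `correct_params` / `incorrect_params`.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Start from the prior log-odds of the mapping being correct.
    log_odds = math.log(prior) - math.log(1.0 - prior)
    for s in scores:
        # Bayes' rule in log-odds form: add each score's log-likelihood ratio.
        log_odds += log_normal_pdf(s, *correct_params) - log_normal_pdf(s, *incorrect_params)
    # Convert the posterior log-odds back into a probability.
    return sigmoid(log_odds)
```

For example, with a prior of 0.5, hypothetical class parameters `(0.8, 0.1)` for correct and `(0.4, 0.1)` for incorrect mappings, a single fitness score of 0.8 yields `posterior_belief(0.5, [0.8], (0.8, 0.1), (0.4, 0.1))` of about 0.9997. Working in log-odds keeps the update additive per score and avoids underflow when many scores are combined.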
The experimental evaluation of the techniques contributed in this dissertation provides empirical evidence of their cost-effectiveness in quantifying the quality of a pay-as-you-go data integration, when applied in both synthetic and real-world scenarios.
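Contribution (ii) quantifies the quality of an integration by assimilating user feedback on tuple results, but the abstract does not name the underlying model. As a minimal sketch of one common way to do this, the following Python code assumes, hypothetically, that each result tuple is independently correct with an unknown probability (the integration's precision) and keeps a conjugate Beta belief over that precision, updated one feedback label at a time. The class name `PrecisionBelief`, the uniform Beta(1, 1) prior, and the choice of precision as the quality measure are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PrecisionBelief:
    """Beta(alpha, beta) belief about the precision of an integration's results.

    Hypothetical model (an assumption, not taken from the thesis): each result
    tuple is independently a true positive with unknown probability, and user
    feedback labels individual tuples as correct or incorrect.
    """

    alpha: float = 1.0  # Beta(1, 1) is a uniform prior over precision
    beta: float = 1.0

    def update(self, is_correct: bool) -> None:
        """Conjugate Beta-Bernoulli update from one item of tuple feedback."""
        if is_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        """Posterior mean estimate of precision."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        """Posterior variance; shrinks as more feedback is assimilated."""
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total * total * (total + 1.0))
```

Starting from the uniform prior, feeding in the labels `[True, True, False, True]` gives a Beta(4, 2) posterior with mean 4/6 (about 0.667), and the variance drops from 1/12 to 8/252. Because each update is a constant-time counter increment, feedback can be assimilated incrementally as it arrives, which matches the pay-as-you-go setting.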

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (Conacyt)
Location:
Manchester, UK
Total pages:
172
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:319123
Created by:
Sanchez Serrano, Fernando Rene
Created:
4th April, 2019, 14:49:27
Last modified by:
Sanchez Serrano, Fernando Rene
Last modified:
7th November, 2019, 10:03:09