
ASPECTS OF STATISTICAL DISCLOSURE CONTROL

Smith, Duncan Geoffrey

[Thesis]. Manchester, UK: The University of Manchester; 2012.

Abstract

This work evaluates statistical disclosure control risk by adopting the position of the data intruder. The underlying assertion is that risk metrics should be based on the inferences an intruder can actually make. Ideally, metrics would also account for how sensitive those inferences are, but sensitivity is subjective. A parallel theme is the knowledgeable data intruder: an intruder with the technical skills to exploit fully the information contained in released data. This also raises the issue of computational cost and the benefits of using good algorithms.

A metric for attribution risk in tabular data is presented. It addresses the fact that most risk measures for tabular data are based on the risk of identification. The metric can take into account assumed levels of intruder knowledge about the population, and it can be applied to both exact and perturbed collections of tables.

An improved implementation of the Key Variable Mapping System (Elliot et al., 2010) is presented. The problem is defined more precisely in terms of categorical variables rather than responses to survey questions. This allows much more efficient algorithms to be developed, leading to significant performance gains.

The advantages and disadvantages of alternative matching strategies are investigated, and some strategies are shown to dominate others. The cost of searching for a match is also considered, giving insight into how a knowledgeable intruder might tailor a strategy to balance the probability of a correct match against the time and effort required to find one.

A novel approach to model determination in decomposable graphical models is described. Its advantages over existing schemes are purely computational, but they allow data sets to be checked more thoroughly for disclosure risk.

It is shown that a Bayesian strategy for matching between a sample and a population offers much higher probabilities of a correct match than traditional strategies would suggest.

The Special Uniques Detection Algorithm (Elliot et al., 2002; Manning et al., 2008), which identifies risky sample counts of 1, is compared against Bayesian alternatives based on Markov chain Monte Carlo and simulated annealing. The alternatives are shown to identify risky sample uniques more accurately and at lower computational cost.
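The matching results in the abstract turn on one quantity: the probability that an intruder who links a sample record to a population record has linked the right one. The Python sketch below is illustrative only and assumes the simplest strategy, choosing uniformly among the F population records that share the sample record's key, so the chance of a correct match is 1/F. For the case where F is not observed, it adds the expectation of 1/F under an assumed Poisson model for the unseen population records (F = 1 + G, G ~ Poisson(mu)). The function names and the Poisson assumption are hypothetical choices for this sketch; it is not the Bayesian matching strategy developed in the thesis.

```python
import math
from collections import Counter


def empirical_match_probability(sample_keys, population_keys):
    """Chance of a correct match for each key that is unique in the sample.

    Assumes the intruder picks uniformly among the F population records that
    share the key, so the probability of picking the right one is 1/F.
    """
    f = Counter(sample_keys)      # sample frequency of each key
    F = Counter(population_keys)  # population frequency of each key
    return {k: 1.0 / F[k] for k, n in f.items() if n == 1 and F[k] > 0}


def expected_match_probability(mu):
    """E[1/F | f = 1] under the assumed model F = 1 + G, G ~ Poisson(mu).

    Summing e^-mu * mu^g / g! / (g + 1) over g gives (1 - e^-mu) / mu.
    """
    if mu < 0:
        raise ValueError("mu must be non-negative")
    if mu == 0:
        return 1.0  # no other population records share the key: match is certain
    return (1.0 - math.exp(-mu)) / mu


if __name__ == "__main__":
    population = ["a", "a", "b", "c", "c", "c"]
    sample = ["a", "b", "c"]
    print(empirical_match_probability(sample, population))
    print(expected_match_probability(2.0))
```

With population keys a, a, b, c, c, c and a sample a, b, c, every sample key is a sample unique and the match probabilities are 1/2, 1 and 1/3 respectively.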

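The Special Uniques Detection Algorithm is usually described in terms of minimal sample uniques (MSUs): sets of key variables on which a record is unique in the sample, no proper subset of which is unique. The brute-force Python sketch below enumerates the MSUs of a single record in a small data set, purely to make that definition concrete. It is exponential in the number of key variables and is neither SUDA's own search algorithm nor one of the Bayesian alternatives compared in the thesis. The function name `minimal_sample_uniques` and the example records are hypothetical.

```python
from itertools import combinations


def minimal_sample_uniques(records, row):
    """Return every minimal set of column indices on which records[row] is unique.

    Subsets are tried in increasing size. A subset that contains an MSU already
    found cannot be minimal, so it is skipped; otherwise it is kept if exactly
    one record (the target itself) agrees with the target on those columns.
    """
    target = records[row]
    found = []
    for size in range(1, len(target) + 1):
        for cols in combinations(range(len(target)), size):
            if any(set(m) <= set(cols) for m in found):
                continue  # a smaller unique subset exists, so not minimal
            matches = sum(
                1 for r in records if all(r[c] == target[c] for c in cols)
            )
            if matches == 1:  # only the target record has these values
                found.append(cols)
    return found


if __name__ == "__main__":
    # Hypothetical key variables: sex, age band, region.
    records = [
        ("M", "30-39", "London"),
        ("F", "30-39", "London"),
        ("M", "40-49", "Leeds"),
        ("M", "30-39", "Leeds"),
    ]
    for i in range(len(records)):
        print(i, minimal_sample_uniques(records, i))
```

Because adding variables never destroys uniqueness, skipping every superset of an MSU already found is enough to guarantee that each returned set is minimal. In the example, the first record's only MSU is sex with region, columns (0, 2).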
Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Centre for Census and Survey Research
Location:
Manchester, UK
Total pages:
246
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:156400
Created by:
Smith, Duncan
Created:
23rd February, 2012, 15:16:11
Last modified by:
Smith, Duncan
Last modified:
21st March, 2015, 19:23:13