Statistical Disclosure Control for Frequency Tables

Antal, Laszlo

[Thesis]. Manchester, UK: The University of Manchester; 2016.

Abstract

Disclosure risk assessment of statistical data, such as frequency tables, is a prerequisite for data dissemination. This thesis investigates the problem of disclosure risk assessment of frequency tables from the perspective of a statistical institute.

In the research reported here, disclosure risk is measured by a mathematical function designed for the data according to a disclosure risk scenario. Such functions are called disclosure risk measures. Using information theory, a disclosure risk measure is defined for frequency tables based on the entire population.

If the disclosure risk of a population-based frequency table is high, a statistical institute will apply a statistical disclosure control (SDC) method, possibly perturbing the table. It is known that applying any SDC method lowers the disclosure risk. However, measuring the disclosure risk of the perturbed frequency table is a difficult problem. The disclosure risk measure proposed in the first paper of the thesis is also extended to assess the disclosure risk of perturbed frequency tables. SDC methods can be applied either to the microdata from which the frequency table is generated or directly to the frequency table. These two classes of methods are called pre-tabular and post-tabular methods, respectively. It is shown that the two classes are closely related and that the proposed disclosure risk measure can account for both.

In the second paper, the disclosure risk measure is extended to assess the disclosure risk of sample-based frequency tables. Probabilistic models are used to estimate the population frequencies from the sample frequencies. These estimates can then be used in the proposed disclosure risk measures.

In the final paper of the thesis, we investigate an application: building a flexible table generator in which disclosure risk and data utility measures must be calculated on the fly. We show that the proposed disclosure risk measure and a related information loss measure are adaptable to this setting. An example implementation of the disclosure risk and data utility assessment using the proposed disclosure risk measure is given.
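The abstract says only that the population-based measure is defined using information theory. It does not give the formula. The sketch below is a hypothetical illustration of how entropy can enter such a measure. It is not the thesis's definition, and the function names and equal default weights are assumptions.

It combines three ingredients, each scaled to [0, 1]:

- the share of empty cells;
- the share of individuals who are population uniques;
- a concentration term 1 - H/log(N), where H is the Shannon entropy of the cell proportions and N is the population size.

```python
import math
from typing import Sequence, Tuple

def table_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (natural log) of the cell proportions counts / N; empty cells contribute 0."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def illustrative_risk(counts: Sequence[int],
                      weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical risk score for a population-based table (NOT the thesis's measure).

    zero_share:    fraction of empty cells; an empty cell tells an intruder that
                   nobody in the population has that combination of attributes.
    unique_share:  fraction of individuals alone in their cell (population uniques).
    concentration: 1 - H / log(N), in [0, 1]; 1 when everyone is in a single cell,
                   0 when every individual has a cell of their own.
    With non-negative weights summing to 1, the score lies in [0, 1].
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("table must contain at least one individual")
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    unique_share = sum(1 for c in counts if c == 1) / n
    concentration = 1.0 if n <= 1 else 1.0 - table_entropy(counts) / math.log(n)
    w_zero, w_unique, w_conc = weights
    return w_zero * zero_share + w_unique * unique_share + w_conc * concentration
```

For example, a table with all 20 individuals in one of four cells scores 7/12. Its three empty cells and full concentration both raise the score. A table of 5, 5, 5, 5 scores much lower. The weights are a free design choice in this sketch.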
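To make the pre-tabular versus post-tabular distinction concrete, here is a minimal sketch under stated assumptions. The abstract does not name the specific SDC methods or the information loss measure used in the thesis, so the choices below are illustrative swap-ins:

- **Pre-tabular method:** random recoding of microdata records, followed by tabulation.
- **Post-tabular method:** unbiased random rounding of cell counts.
- **Information loss:** the Hellinger distance between the original and perturbed tables, a measure commonly used in the SDC literature.

All function names are hypothetical.

```python
import math
import random
from collections import Counter

def tabulate(records, categories):
    """One-way frequency table over a fixed category list; empty cells are kept as 0."""
    counts = Counter(records)
    return [counts.get(c, 0) for c in categories]

def pre_tabular(records, categories, p, rng):
    """Pre-tabular SDC (illustrative): with probability p, replace a record's
    category by one drawn uniformly from `categories`, then tabulate.
    The number of records, and hence the table total, is unchanged."""
    perturbed = [rng.choice(categories) if rng.random() < p else r for r in records]
    return tabulate(perturbed, categories)

def post_tabular(counts, base, rng):
    """Post-tabular SDC (illustrative): unbiased random rounding of each cell
    to one of the two neighbouring multiples of `base`."""
    out = []
    for c in counts:
        lower = (c // base) * base
        # Round up with probability (c - lower) / base, so E[rounded] == c;
        # cells that are already multiples of `base` never change.
        out.append(lower + base if rng.random() < (c - lower) / base else lower)
    return out

def hellinger(original, perturbed):
    """Hellinger distance between two tables of equal length, computed on raw cell counts."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(original, perturbed)))

if __name__ == "__main__":
    rng = random.Random(42)
    cats = ["a", "b", "c", "d"]
    records = ["a"] * 7 + ["b"] * 4 + ["c"]
    orig = tabulate(records, cats)
    pre = pre_tabular(records, cats, p=0.2, rng=rng)
    post = post_tabular(orig, base=3, rng=rng)
    print(orig, pre, hellinger(orig, pre))
    print(orig, post, hellinger(orig, post))
```

Both routes end in a perturbed frequency table that can be compared cell by cell with the original. A risk measure and an information loss measure can therefore be applied to either route in the same way.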
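The abstract does not say which probabilistic models the second paper uses to estimate population frequencies from sample frequencies. As a stand-in, this sketch uses a standard Poisson model with Bernoulli sampling:

- the population count of a cell is F ~ Poisson(λ);
- each individual is sampled with probability π;
- by Poisson thinning, the unobserved part F - f is independent of the sample count f and follows Poisson(λ(1 - π)).

Two assumptions are labelled here. The plug-in estimate λ = f/π is a simplification, and more refined models, such as log-linear models, are common in the SDC literature. The function names are hypothetical.

```python
import math
from typing import Optional, Tuple

def poisson_mu(f: int, pi: float, lam: Optional[float] = None) -> float:
    """Mean of F - f given f under the assumed Poisson / Bernoulli-sampling model."""
    if lam is None:
        lam = f / pi  # naive plug-in estimate of the cell's Poisson rate (assumption)
    return lam * (1.0 - pi)

def expected_population_count(f: int, pi: float, lam: Optional[float] = None) -> float:
    """E[F | f] = f + lam * (1 - pi); with the naive lam = f / pi this equals f / pi."""
    return f + poisson_mu(f, pi, lam)

def sample_unique_risks(pi: float, lam: Optional[float] = None) -> Tuple[float, float]:
    """For a sample unique (f = 1), return (P(F = 1 | f = 1), E[1/F | f = 1])."""
    mu = poisson_mu(1, pi, lam)
    p_population_unique = math.exp(-mu)
    # E[1 / (1 + X)] for X ~ Poisson(mu) equals (1 - exp(-mu)) / mu, with limit 1 as mu -> 0
    expected_inverse = 1.0 if mu == 0 else (1.0 - math.exp(-mu)) / mu
    return p_population_unique, expected_inverse
```

With a full census (π = 1), a sample unique is certainly a population unique, and both risk values are 1. As π shrinks, the expected number of unseen population members in the cell grows, and both values fall. Estimates of this kind are what a population-based measure would consume when applied to a sample-based table.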