Predictive Variable Selection for Subgroup Identification

Turner, Emily

[Thesis]. Manchester, UK: The University of Manchester; 2017.

Abstract

The problem of exploratory subgroup identification can be broken down into three steps. The first step is to identify predictive features, the second is to identify the interesting regions on those features, and the third is to estimate the properties of the subgroup region, such as subgroup size and the predicted recovery outcome for individuals belonging to this subgroup. While most work in this field analyses the full subgroup identification procedure, we provide an in-depth examination of the first step, predictive feature identification. A feature is defined as predictive if it interacts with a treatment to affect the recovery outcome. We compare three prominent methods for exploratory subgroup identification: Virtual Twins (Foster et al. 2011), SIDES (Subgroup Identification based on Differential Effect Search, Lipkovich et al. 2011) and GUIDE (Generalised, Unbiased Interaction Detection and Estimation, Loh et al. 2015). First, we provide a theoretical interpretation of the problem of predictive variable selection and connect it with the three methods. We believe that bringing different approaches under a common analytical framework facilitates a clearer comparison between them. We show that Virtual Twins and SIDES select interesting features in a theoretically similar way, so that the essential difference between the two lies in how this selection mechanism is implemented in their respective subgroup identification procedures. Second, we undertake an experimental analysis of the three methods. To do this, we adapt each method to return a predictive variable importance measure (PVIM), which we use to rank features in order of their predictiveness. We then evaluate and compare how well each method performs at this task. Although each of Virtual Twins, SIDES and GUIDE either outputs a PVIM or requires only minor adaptations to do so, their strengths and weaknesses as PVIMs had not been explored prior to this work.
We argue that a variable ranking approach is a particularly good solution to the problem of subgroup identification. Because clinical trials often lack the power to identify predictive features with statistical significance, predictive variable scoring and ranking may be more appropriate than a full subgroup identification procedure. PVIMs enable a clinician to visualise the relative importance of each feature in a straightforward manner and to use clinical expertise to scrutinise the findings of the algorithm. Our conclusions are that Virtual Twins performs best in terms of predictive feature selection, outperforming SIDES and GUIDE on every type of data set. However, it appears to have weaknesses in distinguishing between predictive and prognostic biomarkers. Finally, we note that there is a need to provide common data sets on which new methods can be evaluated. We show that there is a tendency towards testing new subgroup identification methods on data sets that demonstrate the strengths of the algorithm and hide its weaknesses.
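
As a rough illustration of what ranking features by predictiveness can look like, the sketch below scores each feature by how much the estimated treatment effect differs between its above-median and below-median halves. This is a toy interaction contrast on simulated data, not Virtual Twins, SIDES or GUIDE, and not the procedure used in the thesis. The function names, the simulated outcome model and all parameter values are assumptions made purely for illustration.

```python
import random
from statistics import mean, median


def treatment_effect(rows):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def interaction_scores(X, t, y):
    """Toy predictive score per feature (hypothetical, for illustration only):
    absolute difference in treatment effect between the above-median and
    below-median halves of that feature."""
    scores = []
    for j in range(len(X[0])):
        cut = median(row[j] for row in X)
        high = [(row, ti, yi) for row, ti, yi in zip(X, t, y) if row[j] > cut]
        low = [(row, ti, yi) for row, ti, yi in zip(X, t, y) if row[j] <= cut]
        scores.append(abs(treatment_effect(high) - treatment_effect(low)))
    return scores


def simulate(n=2000, seed=0):
    """Simulated trial (assumed model): feature 0 is prognostic only,
    feature 1 is predictive (it modifies the treatment effect),
    feature 2 is pure noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    t = [rng.randint(0, 1) for _ in range(n)]
    y = [
        2.0 * row[0]                  # prognostic: shifts outcome in both arms
        + ti * 2.0 * (row[1] > 0)     # predictive: benefit only if feature 1 > 0
        + rng.gauss(0, 1)             # noise
        for row, ti in zip(X, t)
    ]
    return X, t, y


X, t, y = simulate()
scores = interaction_scores(X, t, y)
# Feature indices ordered from most to least predictive under this toy score.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this simulated setting the prognostic feature has a strong effect on the outcome but little effect on the score, because it shifts both treatment arms equally. Distinguishing the two kinds of feature is the difficulty the abstract notes for real methods.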

Bibliographic metadata

Degree type:
Master of Philosophy
Degree programme:
MPhil Computer Science
Location:
Manchester, UK
Total pages:
64
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:312697
Created by:
Turner, Emily
Created:
21st December, 2017, 14:52:04
Last modified by:
Turner, Emily
Last modified:
3rd January, 2018, 13:42:04
