    Understanding Differential Functioning by Gender in Mathematics Assessment

    Ong, Yoke Mooi

    [Thesis]. Manchester, UK: The University of Manchester; 2011.

    Abstract

    When examinees with the same ‘ability’ take a test, they should have an equal chance of responding correctly to an item irrespective of group membership. This principle is known as measurement invariance. A lack of invariance in item-, bundle- or test-difficulty across subgroups indicates differential functioning (DF). The aim of this study is to advance our understanding of DF by detecting, predicting and explaining the sources of DF by gender in a mathematics test. The presence of DF means that the test scores of the affected examinees may fail to provide a valid measure of their performance. A framework for investigating DF was proposed, moving from the item level to a more complex random-item level; this framework provides a basis for critiquing the limitations of existing DF methods and for exploring some advances. A dataset of 11-year-olds’ responses to a high-stakes national mathematics examination in England was used in this study. The results are reported in three papers in journal publication format. The first paper addressed the issue of understanding nonuniform differential item functioning (DIF) at the item level. Nonuniform DIF is investigated because it poses a threat when common DIF statistics, which are sensitive only to uniform DIF, indicate no significant DIF. The study differentiates two types of nonuniform DIF, namely crossing and noncrossing DIF. Two commonly used DIF detection methods, the Logistic Regression (LR) procedure and the Rasch measurement model, were used to identify crossing and noncrossing DIF. The paper concludes that items with nonuniform DIF do exist in empirical data; hence there is a need to include statistics sensitive to crossing DIF in item analysis. The second paper investigated the sources of DF via differential bundle functioning (DBF), because this can yield a substantive explanation of DF, without which we cannot know whether DF reflects ‘valid’ differences or ‘bias’. Roussos and Stout’s (1996a) multidimensionality-based DIF paradigm was used with an extension of the LR procedure to detect DBF. Three qualitatively different content areas were studied: test modality, curriculum domains and problem presentation. The paper concludes that DBF in curriculum domains may elicit construct-relevant variance, and so may indicate ‘real’ differences, whereas problem presentation and test modality arguably introduce construct-irrelevant variance and so may indicate gender bias. Finally, the third paper treated item-person responses as hierarchically nested within items. A two-level logistic model was therefore used to model random item effects, because it is argued that DF might otherwise be exaggerated and lead to invalid inferences. This paper aimed to explain DF via DBF by comparing single-level and two-level models. The DIF effects found in the single-level model were attenuated in the two-level model, and a discussion of why the two models produced different results was presented. Taken together, this thesis shows that validity arguments regarding bias should not be reduced to DF at the item level but can be analysed at three different levels.
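    As an illustration of the Logistic Regression (LR) procedure referred to above, the sketch below (not taken from the thesis) fits three nested logistic models per item: ability only, ability plus group (uniform DIF), and ability plus group plus their interaction (nonuniform DIF). The column names "score", "total" and "gender", the 0/1 coding of the group variable, and the use of the statsmodels library are assumptions made for the example only.

    import pandas as pd
    import statsmodels.api as sm
    from scipy import stats

    def lr_dif_test(df, item_col="score", ability_col="total", group_col="gender"):
        """Likelihood-ratio tests for uniform and nonuniform DIF on one item.
        Assumes df[item_col] is 0/1 item correctness, df[ability_col] is a
        matching ability measure (e.g. total or rest score) and df[group_col] is 0/1."""
        y = df[item_col]
        base = pd.DataFrame({"ability": df[ability_col]})
        with_group = base.assign(group=df[group_col])
        with_int = with_group.assign(ability_x_group=df[ability_col] * df[group_col])

        m1 = sm.Logit(y, sm.add_constant(base)).fit(disp=False)        # ability only
        m2 = sm.Logit(y, sm.add_constant(with_group)).fit(disp=False)  # + group (uniform DIF)
        m3 = sm.Logit(y, sm.add_constant(with_int)).fit(disp=False)    # + interaction (nonuniform DIF)

        return {
            # Uniform DIF: fit improvement from adding the group main effect.
            "p_uniform": stats.chi2.sf(2 * (m2.llf - m1.llf), df=1),
            # Nonuniform DIF: fit improvement from adding the ability-by-group
            # interaction. As a rough heuristic (not the thesis criterion), a
            # significant interaction whose sign opposes the group effect suggests
            # crossing DIF; same-signed effects suggest noncrossing DIF.
            "p_nonuniform": stats.chi2.sf(2 * (m3.llf - m2.llf), df=1),
            "group_coef": m3.params["group"],
            "interaction_coef": m3.params["ability_x_group"],
        }

    Applied item by item over a scored response matrix, such a function gives the pair of significance tests and the coefficient signs needed to separate uniform, crossing and noncrossing DIF in the way the first paper describes.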

    Bibliographic metadata

    Type of resource: Thesis
    Content type:
    Form of thesis:
    Type of submission:
    Degree type: Doctor of Philosophy
    Degree programme: PhD Education
    Publication date: 2011
    Location: Manchester, UK
    Total pages: 322
    Thesis main supervisor(s):
    Thesis advisor(s):
    Language: en

    Institutional metadata

    University researcher(s):
    Academic department(s):

    Record metadata

    Manchester eScholar ID: uk-ac-man-scw:102905
    Created by: Ong, Yoke
    Created: 4th January, 2011, 13:35:48
    Last modified by: Ong, Yoke
    Last modified: 10th February, 2015, 10:50:11
