
    Integrated Supervised and Unsupervised Learning Method to Predict the Outcome of Tuberculosis Treatment Course

    Rostamniakankalhori, Sharareh

    [Thesis]. Manchester, UK: The University of Manchester; 2011.

    Abstract

    Tuberculosis (TB) is an infectious disease and a global public health problem, with over 9 million new cases annually. TB treatment with patient supervision and support is an element of the global plan to stop TB designed by the World Health Organization in 2006. The plan requires prediction of the outcome of a patient's treatment course. The predicted outcome can be used to determine how intensive the services and support provided within DOTS (directly observed treatment, short course) therapy should be. No predictive model for this outcome has yet been developed, and only limited reports of the factors that influence it are available.

    To fill this gap, this thesis develops a machine learning approach to predict the outcome of a tuberculosis treatment course. Firstly, data on 6,450 Iranian TB patients under DOTS therapy were analysed to identify the significant predictors by correlation analysis. Secondly, these significant features were used to find the best classification approach among six algorithms: decision tree, Bayesian network, logistic regression, multilayer perceptron, radial basis function, and support vector machine. Thirdly, the prediction accuracy of these existing techniques was improved by proposing and developing a new integrated method of k-means clustering and classification algorithms. Finally, a cluster-based simplified decision tree (CSDT) was developed through an innovative hierarchical clustering and classification algorithm. CSDT is built by k-means partitioning and decision tree learning. This method not only improves the prediction accuracy significantly but also leads to a much simpler and more interpretable decision tree.

    The main results of this study were as follows. Firstly, seventeen significantly correlated features were found: age, sex, weight, nationality, area of residency, current stay in prison, low body weight, TB type, treatment category, length of disease, TB case type, recent TB infection, diabetic or HIV positive, and social risk factors such as history of imprisonment, IV drug usage, and unprotected sex. Secondly, a comparison of the six supervised machine learning tools on the testing set revealed that decision trees gave the best prediction accuracy (74.21%). Thirdly, on the testing set, the new integrated approach combining clustering and classification improved the prediction accuracy of all applied classifiers; the largest and smallest improvements were shown by logistic regression (10%) and support vector machine (4%) respectively. Finally, applying the proposed CSDT yielded cluster-based simplified decision trees, which reduced the size of the resulting decision tree and further improved the prediction accuracy.

    The type of data and its normal distribution gave the decision tree an opportunity to outperform the other algorithms. Pre-learning by k-means clustering, which relocates the objects and puts similar cases in the same group, can improve the classification accuracy. Because k-means partitioning and decision trees both tend to generate pure local regions, combining them can simplify the decision trees and make them more precise by creating smaller sub-trees with fewer misclassified cases. The rules extracted from these trees can serve as a knowledge base for a decision support system in further studies.
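
The general idea of clustering first and then classifying within each cluster can be illustrated with a minimal sketch. This is not the thesis's CSDT algorithm or its data: the function names (`fit_clustered`, `predict_clustered`, and the helpers) are hypothetical, the eight toy points are invented for illustration, and a depth-one decision tree (a decision stump) stands in for full decision tree learning. K-means is initialised with a deterministic farthest-first rule so the example is reproducible.

```python
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

def kmeans(points, k, iters=20):
    """Farthest-first initialisation followed by Lloyd's iterations."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y):
    """Depth-one tree: the (feature, threshold) split with fewest training errors."""
    top = majority(y)
    best_errors, best = sum(lab != top for lab in y), (None, None, top, top)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(a != l_lab for a in left) + sum(b != r_lab for b in right)
            if errors < best_errors:
                best_errors, best = errors, (f, t, l_lab, r_lab)
    return best

def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    if f is None:  # no useful split was found: predict the majority class
        return l_lab
    return l_lab if x[f] <= t else r_lab

def fit_clustered(X, y, k):
    """Partition with k-means, then fit one stump per cluster."""
    centroids = kmeans(X, k)
    assign = [nearest(x, centroids) for x in X]
    fallback = fit_stump(X, y)  # used only if a cluster ends up empty
    stumps = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]) if idx else fallback)
    return centroids, stumps

def predict_clustered(model, x):
    """Route x to its nearest centroid and use that cluster's stump."""
    centroids, stumps = model
    return predict_stump(stumps[nearest(x, centroids)], x)

def accuracy(predict_fn, X, y):
    return sum(predict_fn(x) == lab for x, lab in zip(X, y)) / len(y)

# Invented toy data: the label rule on the second feature flips between two regions.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
y = [0, 1, 0, 1, 1, 0, 1, 0]

global_stump = fit_stump(X, y)
model = fit_clustered(X, y, k=2)
global_acc = accuracy(lambda x: predict_stump(global_stump, x), X, y)
cluster_acc = accuracy(lambda x: predict_clustered(model, x), X, y)
print(global_acc, cluster_acc)
```

On this toy data a single stump classifies 4 of the 8 training points correctly, while the per-cluster stumps classify all 8, because each cluster forms a region in which one simple split suffices. These figures describe the invented example only and say nothing about the accuracies reported in the thesis.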

    Bibliographic metadata

    Type of resource:
    Content type:
    Form of thesis:
    Type of submission:
    Degree type:
    Doctor of Philosophy
    Degree programme:
    PhD in Informatics
    Publication date:
    Location:
    Manchester, UK
    Total pages:
    221
    Thesis main supervisor(s):
    Thesis advisor(s):
    Language:
    en

    Institutional metadata

    University researcher(s):

    Record metadata

    Manchester eScholar ID:
    uk-ac-man-scw:132404
    Created by:
    Rostamniakankalhori, Sharareh
    Created:
    3rd October, 2011, 11:37:47
    Last modified by:
    Rostamniakankalhori, Sharareh
    Last modified:
    2nd November, 2011, 15:21:43
