The development of a statistical software resource for medical research

Iain E. Buchan

[Thesis]. University of Liverpool; 2000.

Abstract

Medical research is often weakened by poor statistical practice, and inappropriate use of statistical computer software is part of this problem. The statistical knowledge that medical researchers require has traditionally been gained in both dedicated and ad hoc learning time, often separate from the research processes in which the statistical methods are applied. Computer software, however, can be written to flexibly support statistical practice. The work of this thesis was to explore the possibility of, and if possible, to create, a resource supporting medical researchers in statistical knowledge and calculation at the point of need.

The work was carried out over eleven years, and was directed towards the medical research community in general. Statistical and Software Engineering methods were used to produce a unified statistical computational and knowledge support resource. Mathematically and computationally robust approaches to statistical methods were continually sought from current literature. The type of evaluation undertaken was formative; this included monitoring uptake of the software and feedback from its users, comparisons with other software, reviews in peer reviewed publications, and testing of results against classical and reference data. Large-scale opportunistic feedback from users of this resource was employed in its continuous improvement.

The software resulting from the work of this thesis is provided herein as supportive evidence. Results of applying the software to classical reference data are shown in the written thesis. The scope and presentation of statistical methods are considered in a comparison of the software with common statistical software resources. This comparison showed that the software written for this thesis more closely matched statistical methods commonly used in medical research, and contained more statistical knowledge support materials. Up to October 31st 2000, uptake of the software was recorded for 5621 separate instances by individuals or institutions. The development has been self-sustaining.

Medical researchers need to have sufficient statistical understanding, just as statistical researchers need to sufficiently understand the nature of data. Statistical software tools may damage statistical practice if they distract attention from statistical goals and tasks, onto the tools themselves. The work of this thesis provides a practical computing framework supporting statistical knowledge and calculation in medical research. This work has shown that sustainable software can be engineered to improve statistical appreciation and practice in ways that are beyond the reach of traditional medical statistical education.

Bibliographic metadata

Degree type:
MD
Total pages:
322
Table of contents:
PREFACE
  Declaration
  Electronic enclosures
  Overview of thesis
  Audience
  Introduction
  Methods
  Results
  Numerical validation
  Comparisons with other resources
  Evidence of use and application to medical research
  Discussion and conclusions
ACKNOWLEDGEMENTS
CONTENTS
ABSTRACT
INTRODUCTION
  Origins of this work
  Origins of statistics
  The rise of medical applications of statistics
  A brief history of computing machines
  Computer-supported numerical reasoning in medical research
METHODS
  Software interface development
  Software platforms, languages and development tools
  Numerical precision and error
  Evaluating arithmetic expressions
  Constants
  Arithmetic functions
  Arithmetic operators
  Operator precedence
  Trigonometric functions
  Logical functions
  Counting and grouping
  Searching and translation of dates and text
  Sorting, ranking and normal scores
  Pairwise calculations
  Pairwise differences
  Pairwise means
  Pairwise slopes
  Transformations
  Logarithmic
  Logit
  Probit
  Angular
  Cumulative
  Ladder of powers
  P values and confidence intervals
  Probability distributions
  Normal
  Chi-square
  Student's t
  F (variance ratio)
  Studentized range (Q)
  Spearman's rho
  Kendall's tau
  Binomial
  Poisson
  Non-central t
  Sample sizes
  Population survey
  Paired cohort
  Independent cohort
  Matched case-control
  Independent case-control
  Unpaired t test
  Paired t test
  Survival times (two groups)
  Randomization
  Proportions (binomial)
  Single
  Paired
  Two independent
  Chi-square methods
  Two by two tables
  Two by k tables
  r by c tables
  McNemar
  Mantel-Haenszel
  Woolf
  Goodness of fit
  Exact methods for counts
  Sign test
  Fisher's exact test
  Exact confidence limits for two by two odds
  Matched pairs
  Miscellaneous methods
  Risk (prospective)
  Risk (retrospective)
  Number needed to treat
  Incidence rates
  Diagnostic tests and likelihood ratios
  Screening test errors
  Standardised mortality ratios
  Kappa agreement statistics for two raters
  Basic univariate descriptive statistics
  Valid and missing data
  Variance, standard deviation and standard error
  Skewness and kurtosis
  Geometric mean
  Median, quartiles and range
  Parametric methods
  Student's t tests
  Normal distribution tests
  Reference ranges
  Poisson confidence intervals
  Shapiro-Wilk W test
  Nonparametric methods
  Mann-Whitney
  Wilcoxon signed ranks test
  Spearman's rank correlation
  Kendall's rank correlation
  Kruskal-Wallis test
  Friedman test
  Cuzick's test for trend
  Quantile confidence interval
  Smirnov two sample test
  Homogeneity of variance
  Analysis of variance
  One way and homogeneity
  Multiple comparisons
  Two way randomized block
  Fully nested random (hierarchical)
  Latin square
  Crossover
  Agreement
  Regression and correlation
  Simple linear
  Multiple (general) linear
  Grouped linear and test for linearity
  Polynomial
  Linearized estimates
  Exponential
  Geometric
  Hyperbolic
  Probit analysis
  Logistic regression
  Principal components
  Survival analysis
  Kaplan-Meier
  Life table
  Log-rank and Wilcoxon
  Wei-Lachin
  Cox regression
  Meta-analysis
  Odds ratio
  Peto odds ratio
  Relative risk
  Risk difference
  Effect size
  Incidence rate
  Graphics
  Sustainable development and distribution
RESULTS
  Numerical validation
  Standard normal distribution
  Student's t distribution
  F (variance ratio) distribution
  Chi-square distribution
  Studentized range distribution
  Binomial distribution
  Poisson distribution
  Kendall's test statistic and tau distribution
  Hotelling's test statistic and Spearman's rho distribution
  Non-central t distribution
  Sign test
  Fisher's exact test
  Expanded Fisher's exact test
  McNemar and exact (Liddell) test
  Exact confidence limits for 2 by 2 odds
  Chi-square test (2 by 2)
  Chi-square test (2 by k)
  Chi-square test (r by c)
  Woolf chi-square statistics
  Mantel Haenszel chi-square test
  Single proportion
  Paired proportions
  Two independent proportions
  Sample sizes for paired or single sample Student t tests
  Sample sizes for unpaired two sample Student t tests
  Sample sizes for independent case-control studies
  Sample sizes for independent cohort studies
  Sample sizes for matched case-control studies
  Sample sizes for paired cohort studies
  Sample sizes for population surveys
  Risk analysis (prospective)
  Risk analysis (retrospective)
  Diagnostic test (2 by 2 table)
  Likelihood ratios (2 by k table)
  Number needed to treat
  Kappa inter-rater agreement with two raters
  Screening test errors
  Standardized mortality ratio
  Incidence rate analysis
  Basic descriptive statistics
  Student's t test for paired samples
  Student's t test for a single sample
  Student's t test for two independent samples
  F (variance ratio) test for two samples
  Normal distribution (z) test for a single sample
  Normal distribution (z) test for two independent samples
  Reference range
  Poisson confidence interval
  Shapiro-Wilk W test
  Mann-Whitney test
  Wilcoxon signed ranks test
  Kendall's rank correlation
  Spearman's rank correlation
  Nonparametric linear regression
  Cuzick's test for trend
  Smirnov two sample test
  Quantile confidence interval
  Kruskal-Wallis test
  Friedman test
  Chi-square goodness of fit test
  One way analysis of variance
  Two way randomized blocks analysis of variance
  Two way replicate randomized blocks analysis of variance
  Nested random analysis of variance
  Latin square
  Crossover
  Agreement analysis
  Simple linear regression
  Multiple/general linear regression
  Grouped regression - linearity
  Grouped regression - covariance
  Principal components analysis
  Polynomial regression
  Logistic regression
  Probit analysis
  Cox regression
  Kaplan-Meier survival estimates
  Life table
  Log-rank and Wilcoxon comparisons of survival
  Unstratified two sample example
  Stratified two sample example
  Unstratified k sample example
  Wei-Lachin test
  Odds ratio meta-analysis
  Peto odds ratio meta-analysis
  Relative risk meta-analysis
  Risk difference meta-analysis
  Effect size meta-analysis
  Incidence rate meta-analysis
  Crosstabs
  Frequencies
  Box and whisker plot
  Spread Plot
  Histogram
  Scatter Plot
  Error bar plot
  Ladder plot
  Receiver operating characteristic curve
  Normal Plot
  Population Pyramid
  Comparisons with other statistical resources
  Knowledge support
  Access to statistical methods
  Samples of interaction and output
  Orientating users
  Interacting with users over data
  Interaction with users over results
  Evidence of use and application
  Distribution of software
  Citations and reviews
DISCUSSION AND CONCLUSIONS
  Evaluation of developments against original aims
  Lessons learnt from developing the software for this thesis
  Plans for further research and development
  Conclusions
REFERENCES
APPENDIX 1

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:71360
Created by:
Buchan, Iain
Created:
27th October, 2009, 07:28:00
Last modified by:
Buchan, Iain
Last modified:
26th January, 2010, 16:22:51