
Improvements to PLS Methodology

Bissett, Alastair Campbell

[Thesis]. Manchester, UK: The University of Manchester; 2015.

Abstract

Partial Least Squares (PLS) is an important statistical technique with multiple and diverse applications, used as an effective regression method for correlated or collinear datasets or for datasets that are not full rank for other reasons. A short history of PLS is followed by a review of publications in which issues with the application of PLS have been discussed. The theoretical basis of PLS is developed from the singular value decomposition of the covariance, so that the strong links with principal components analysis and between the various PLS algorithms appear as a natural consequence.

Latent variable selection by crossvalidation, permutation and information criteria is examined. A method for plotting crossvalidation results is proposed that makes latent variable selection less ambiguous than conventional plots. Novel and practical methods are proposed to extend published methods for latent variable selection by both permutation and information criteria from univariate PLS1 models to multivariate PLS2 cases. The numerical method proposed for information criteria is also more general than the recently published algebraic methods for PLS1, as it does not assume any particular form for the PLS regression coefficients. All of these methods have been critically assessed using a number of datasets, selected specifically to represent a diverse set of dimensions and covariance structures.

Methods for simulating multivariate datasets were developed that allow control of correlation and collinearity in both regressors and responses independently. This development also allows control over the variate distributions. Statistical design of experiments was used to generate plans for the simulation that allowed the factors that influence PLS model fit and latent variable selection to be investigated. It was found that all the latent variable selection methods in the simulation tend to overfit, and the feature in the simulation that causes overfitting has been identified.
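The idea of building PLS from a singular value decomposition of the covariance can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the algorithm developed in the thesis: it assumes NumPy, takes the weight vector for each latent variable as the dominant left singular vector of the cross-covariance between the (deflated) regressors and the centred responses, and uses a hypothetical function name, `pls_svd`. It handles both a single response (PLS1) and several responses (PLS2).

```python
import numpy as np

def pls_svd(X, Y, n_components):
    """Illustrative PLS fit via repeated SVD of the cross-covariance X^T Y.

    Hypothetical helper for illustration only. Returns the regression
    coefficients B for centred data plus the column means of X and Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:              # PLS1: treat a single response as one column
        Y = Y[:, None]

    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk = X - x_mean              # regressors, deflated at each step
    Yc = Y - y_mean              # centred responses (not deflated)

    W, P, Q = [], [], []
    for _ in range(n_components):
        # Dominant left singular vector of the cross-covariance gives
        # the direction in X with maximal covariance with Y.
        U, _, _ = np.linalg.svd(Xk.T @ Yc, full_matrices=False)
        w = U[:, 0]
        t = Xk @ w               # scores for this latent variable
        tt = t @ t
        p = Xk.T @ t / tt        # X loadings
        q = Yc.T @ t / tt        # Y loadings
        Xk = Xk - np.outer(t, p) # remove the part of X explained by t
        W.append(w)
        P.append(p)
        Q.append(q)

    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # Coefficients in terms of the original centred X: B = W (P^T W)^{-1} Q^T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def pls_predict(X, B, x_mean, y_mean):
    """Predict responses from a fit returned by pls_svd."""
    return (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
```

When the number of latent variables equals the number of linearly independent columns of the regressors, the coefficients from this sketch coincide with ordinary least squares; using fewer latent variables gives the reduced-rank PLS fit, which is why choosing that number is the central model selection question.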

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Mathematical Sciences
Location:
Manchester, UK
Total pages:
252
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:261814
Created by:
Bissett, Alastair
Created:
27th March, 2015, 09:25:50
Last modified by:
Bissett, Alastair
Last modified:
16th November, 2017, 12:38:11
