TOPIC MODELLING FOR SUPPORTING SYSTEMATIC REVIEWS

Mo, Yuanhan

[Thesis]. Manchester, UK: The University of Manchester; 2016.

Abstract

Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that using machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of available machine learning methods exploit the same underlying principle: a study is modelled as a bag of words (BOW). This thesis explores the use of topic modelling methods to derive a more informative representation of studies. Latent Dirichlet Allocation (LDA), an unsupervised topic-modelling approach, is applied to identify topics in a collection of studies, and each study is then represented as a distribution over LDA topics. Additionally, the topics derived by LDA are enriched with technical multi-word terms identified by an automatic term recognition (ATR) tool. For experimentation, SVM-based classifiers are applied using either the topic-based or the BOW representation to automatically identify relevant studies. The results show that the SVM classifier identifies more relevant studies when using the LDA representation than when using the BOW representation. Moreover, this study demonstrates that the kernel functions used in the SVM achieve superior performance with LDA feature representations. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain.
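
The abstract contrasts two ways of representing a study. The following is a minimal, dependency-free Python sketch of that contrast, not the thesis's implementation: `bag_of_words` builds one count vector per study with a dimension for every vocabulary term, and `lda_gibbs` fits LDA by collapsed Gibbs sampling (one standard inference method; the abstract does not say which one was used) and returns each study's distribution over topics. The function names, whitespace tokenisation, hyperparameters (`alpha=0.1`, `beta=0.01`, 200 iterations) and the toy titles are all assumptions chosen for illustration.

```python
import random


def tokenise(doc: str) -> list[str]:
    # Assumption: lower-cased whitespace tokenisation (no stop-word removal or stemming).
    return doc.lower().split()


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """BOW baseline: one term-count vector per study, one dimension per vocabulary term."""
    tokenised = [tokenise(d) for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0] * len(vocab)
        for w in toks:
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors


def lda_gibbs(docs: list[str], num_topics: int, alpha: float = 0.1, beta: float = 0.01,
              iterations: int = 200, seed: int = 0) -> list[list[float]]:
    """Fit LDA by collapsed Gibbs sampling; return each study's topic distribution theta_d."""
    rng = random.Random(seed)
    tokenised = [tokenise(d) for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: tokens per (study, topic), per (topic, word), and per topic.
    doc_topic = [[0] * K for _ in tokenised]
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K

    # Start by assigning every token a random topic.
    assignments = []
    for d, toks in enumerate(tokenised):
        z_d = [rng.randrange(K) for _ in toks]
        for w, k in zip(toks, z_d):
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, toks in enumerate(tokenised):
            for i, w in enumerate(toks):
                v, k = word_id[w], assignments[d][i]
                # Take the token out of the counts before resampling its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimate of theta_d; each row sums to 1.
    return [[(doc_topic[d][k] + alpha) / (len(toks) + K * alpha) for k in range(K)]
            for d, toks in enumerate(tokenised)]


# Hypothetical toy "studies" (two clinical-style, two education-style titles).
docs = [
    "screening randomised controlled trial",
    "trial patients randomised dosage",
    "school pupils education policy",
    "education policy teachers school",
]
vocab, bow = bag_of_words(docs)        # 11-dimensional count vectors
theta = lda_gibbs(docs, num_topics=2)  # 2-dimensional topic distributions
for vec, dist in zip(bow, theta):
    print(vec, [round(p, 2) for p in dist])
```

Either set of vectors (`bow` or `theta`) could then be passed to an SVM, for example scikit-learn's `sklearn.svm.SVC` with `kernel="linear"` or `kernel="rbf"`, to compare the two representations in the way the abstract describes. The topic vectors have `num_topics` dimensions rather than one per vocabulary term, which is why they form a much more compact input to the classifier.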

Bibliographic metadata

Degree type:
Master of Philosophy
Degree programme:
MPhil Computer Science
Location:
Manchester, UK
Total pages:
85
Additional digital content not deposited electronically:
N/A
Non-digital content not deposited electronically:
N/A
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:300195
Created by:
Mo, Yuanhan
Created:
13th April, 2016, 13:55:36
Last modified by:
Mo, Yuanhan
Last modified:
26th May, 2016, 09:28:43