In April 2016 Manchester eScholar was replaced by the University of Manchester’s new Research Information Management System, Pure. In the autumn the University’s research outputs will be available to search and browse via a new Research Portal. Until then the University’s full publication record can be accessed via a temporary portal and the old eScholar content is available to search and browse via this archive.

Software/Hardware Co-Design and Co-Specialisation: Novel Simulation Techniques and Optimisations

Rodchenko, Andrey

[Thesis]. Manchester, UK: The University of Manchester; 2018.

Access to files

Abstract

Progress in microprocessors has moved towards increasing the number of cores, heterogeneity, and bitness of computing. Performance, programmability and energy efficiency of next generations of microprocessors will be highly dependent on efficient synchronisation, hardware virtualisation, memory subsystem utilisation and closer synergy between hardware and software. This multidisciplinary work addresses these challenges by advancing the state-of-the-art in the following three major fields of computer science: shared-memory synchronisation, computer architecture simulation, and high-level language computer architecture. Firstly, this thesis presents a study of the state-of-the-art barrier synchronisation algorithms specialised for the Intel Xeon Phi architecture. The novel proposed hybrid barrier synchronisation algorithm exploits the topology, the memory hierarchy, and other capabilities of the Intel Xeon Phi 5110P coprocessor. The showcased algorithm achieves a 3.28x lower overhead than the Intel OpenMP barrier implementation (ICC 14.0.0), thus outperforming all other known implementations. The study investigates design issues of Intel Xeon Phi with respect to barrier synchronisation. Furthermore, the thesis introduces an extensible parameterised framework for empirical evaluation of barrier synchronisation algorithms on different systems, and it is released as free software. Secondly, a novel open-source simulation platform named MaxSim is introduced. MaxSim facilitates hardware/software co-design of managed runtime environments and architectures with tagged pointers support. It has an awareness of the managed runtime environment, supports fast tagged pointers simulation on the x86-64 architecture, allows to model new hardware extensions, to perform microarchitectural profiling and to model complex software changes via a novel address-space morphing technique. MaxSim is available as free software. Finally, the work explores hardware/software co-design opportunities of managed runtime environments and architectures with tagged pointers support. It is shown how an array length can be stored in a tagged pointer and efficiently retrieved from it with the assistance of hardware extenstions in Java Virtual Machine implementations. The proposed technique resulted in up to 4% and 2% geometric mean dynamic energy reduction and up to 14% and 7% geometric mean L1 data cache loads reduction. The work also researches how tagged pointers can be used for storing type information in Java Virtual Machine implementations. In addition, novel hardware extensions to the address generation and load-store units are proposed to achieve low-overhead type information retrieval and tagged object pointers compression-decompression. The evaluation shows up to 26% and 10% geometric mean heap space savings, up to 50% and 12% geometric mean dynamic random-access memory dynamic energy reduction, and up to 49% and 3% geometric mean execution time reduction.

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Type of submission:
Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science
Publication date:
Location:
Manchester, UK
Total pages:
183
Abstract:
Progress in microprocessors has moved towards increasing the number of cores, heterogeneity, and bitness of computing. Performance, programmability and energy efficiency of next generations of microprocessors will be highly dependent on efficient synchronisation, hardware virtualisation, memory subsystem utilisation and closer synergy between hardware and software. This multidisciplinary work addresses these challenges by advancing the state-of-the-art in the following three major fields of computer science: shared-memory synchronisation, computer architecture simulation, and high-level language computer architecture. Firstly, this thesis presents a study of the state-of-the-art barrier synchronisation algorithms specialised for the Intel Xeon Phi architecture. The novel proposed hybrid barrier synchronisation algorithm exploits the topology, the memory hierarchy, and other capabilities of the Intel Xeon Phi 5110P coprocessor. The showcased algorithm achieves a 3.28x lower overhead than the Intel OpenMP barrier implementation (ICC 14.0.0), thus outperforming all other known implementations. The study investigates design issues of Intel Xeon Phi with respect to barrier synchronisation. Furthermore, the thesis introduces an extensible parameterised framework for empirical evaluation of barrier synchronisation algorithms on different systems, and it is released as free software. Secondly, a novel open-source simulation platform named MaxSim is introduced. MaxSim facilitates hardware/software co-design of managed runtime environments and architectures with tagged pointers support. It has an awareness of the managed runtime environment, supports fast tagged pointers simulation on the x86-64 architecture, allows to model new hardware extensions, to perform microarchitectural profiling and to model complex software changes via a novel address-space morphing technique. MaxSim is available as free software. Finally, the work explores hardware/software co-design opportunities of managed runtime environments and architectures with tagged pointers support. It is shown how an array length can be stored in a tagged pointer and efficiently retrieved from it with the assistance of hardware extenstions in Java Virtual Machine implementations. The proposed technique resulted in up to 4% and 2% geometric mean dynamic energy reduction and up to 14% and 7% geometric mean L1 data cache loads reduction. The work also researches how tagged pointers can be used for storing type information in Java Virtual Machine implementations. In addition, novel hardware extensions to the address generation and load-store units are proposed to achieve low-overhead type information retrieval and tagged object pointers compression-decompression. The evaluation shows up to 26% and 10% geometric mean heap space savings, up to 50% and 12% geometric mean dynamic random-access memory dynamic energy reduction, and up to 49% and 3% geometric mean execution time reduction.
Thesis main supervisor(s):
Thesis co-supervisor(s):
Language:
en

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:313062
Created by:
Rodchenko, Andrey
Created:
19th January, 2018, 14:09:15
Last modified by:
Rodchenko, Andrey
Last modified:
8th February, 2019, 13:32:20

Can we help?

The library chat service will be available from 11am-3pm Monday to Friday (excluding Bank Holidays). You can also email your enquiry to us.