Compiler and Runtime Support for Heterogeneous Programming

Clarkson, James

[Thesis]. Manchester, UK: The University of Manchester; 2019.

Abstract

Over the last decade computer architectures have changed dramatically, leaving us in a position where nearly every desktop computer, laptop, server or mobile phone has at least one multi-core processor. Realistically, these systems are also likely to include at least one many-core processor that is used for accelerating graphical applications -- called a General Purpose Graphics Processing Unit or GPGPU. By properly utilising these two different processors a software developer could achieve up to two orders of magnitude improvement in performance and/or energy efficiency. Unfortunately, these improvements in performance are often inaccessible to developers due to the combined complexity of understanding both the hardware architecture and the software needed to program it. It is this problem of inaccessibility that is explored within this Thesis, with the goal being to determine whether it is possible to develop a programming language that allows an application written using it to dynamically adapt to the system it is executing on. One of the salient issues is that a large amount of prior art is built atop a closed-world assumption: that both the code and the devices it is to execute on are known ahead of time and are fixed. This assumption is becoming increasingly unworkable due to the proliferation of heterogeneous hardware. For instance, developers can now run applications in public clouds or on mobile devices -- contexts where it is difficult to anticipate what hardware an application is executing on and also where there is a high probability that some form of hardware acceleration exists. Handling this uncertainty of not knowing what hardware is available until runtime is a fundamental problem of more statically compiled languages -- like C, C++ and FORTRAN. In these languages, the closed-world assumption is obvious: a single processor architecture is assumed so that a single binary executable can be produced.
The aim of this Thesis is to determine whether it is possible to create a programming language that is able to target hardware accelerators without requiring any closed-world assumptions to be made about either the number or the types of hardware accelerator contained within the system. Consequently, this Thesis introduces and evaluates Tornado: the first truly dynamic programming framework for modern heterogeneous architectures. The implementation of Tornado is unique as it comprises three co-designed components: (1) the Tornado API, which is designed to decouple the application code that decides on which device code should execute -- the co-ordination logic of the application -- from the code that defines the computation -- the computation logic of the application; (2) the Tornado Virtual Machine, which provides a layer of virtualisation between the application and the underlying architecture of the heterogeneous system; and (3) the Tornado Runtime System -- a dynamic optimising compiler that converts code written using the Tornado API into a format consumed by the Tornado Virtual Machine. Tornado has a number of distinguishing features that are a direct result of combining these three key components. One of these features is the optimisation of co-ordination logic by the Tornado Runtime System -- this allows Tornado to automatically minimise the cost of data movement in complex processing pipelines that span multiple devices. Another is dynamic configuration: the ability to have the Tornado Runtime System dynamically re-compile the application at runtime to use a different hardware accelerator, parallelisation scheme, or device setting. During the evaluation Tornado is tested across thirteen unique hardware accelerators: five multi-core processors, a discrete many-core accelerator, three embedded GPGPUs, and four discrete GPGPUs.
In the evaluation it is shown that a complex real-world application, called Kinect Fusion, can be written in Tornado once and executed across all of these devices. Moreover, this portable implementation written in Tornado is able to achieve a maximum speed-up of 55x on an NVIDIA Tesla K20m GPGPU. However, if a little portability can be sacrificed, more specialised code can be written that produces a speed-up of 166x on the same device. Tornado is also compared against OpenCL -- the state-of-the-art in heterogeneous programming languages -- where the specialised implementations of Kinect Fusion run 22% slower and, in the best case, experience a speed-up of 14x (although this is in an unrealistic scenario). This level of performance translates to speed-ups over the original Java application of between 18x and 150x. Finally, Tornado has been open-sourced so that the reader is able to verify the claims made by this Thesis and start writing their own hardware-accelerated Java applications -- https://github.com/beehive-lab/Tornado.
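
The separation between co-ordination logic and computation logic described in the abstract can be illustrated with a minimal plain-Java sketch. It does not use the Tornado API: every name below (`Kernel`, `Device`, `SerialDevice`, `ParallelDevice`, `select`, `run`) is hypothetical and introduced only for illustration, and the two "devices" are ordinary CPU execution strategies standing in for real accelerators. The point is only that the kernel is written once and the choice of where to run it is deferred to runtime.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Illustrative only: hypothetical names, not the Tornado API.
class CoordinationSketch {

    // Computation logic: what is computed for one index; no device is mentioned.
    interface Kernel {
        void apply(float[] in, float[] out, int i);
    }

    // Hypothetical device abstraction: how the iteration space is executed.
    interface Device {
        String name();
        void execute(Kernel kernel, float[] in, float[] out);
    }

    // Stand-in for a single-threaded target.
    static final class SerialDevice implements Device {
        public String name() { return "serial"; }
        public void execute(Kernel kernel, float[] in, float[] out) {
            for (int i = 0; i < in.length; i++) {
                kernel.apply(in, out, i);
            }
        }
    }

    // Stand-in for a parallel target; each index writes a distinct element.
    static final class ParallelDevice implements Device {
        public String name() { return "parallel"; }
        public void execute(Kernel kernel, float[] in, float[] out) {
            IntStream.range(0, in.length).parallel()
                     .forEach(i -> kernel.apply(in, out, i));
        }
    }

    // Co-ordination logic: pick a device at runtime, falling back to the
    // first available one when the preferred device is not present.
    static Device select(List<Device> available, String preferred) {
        for (Device d : available) {
            if (d.name().equals(preferred)) {
                return d;
            }
        }
        return available.get(0);
    }

    // Runs the same kernel on whichever device was selected.
    static float[] run(Device device, Kernel kernel, float[] in) {
        float[] out = new float[in.length];
        device.execute(kernel, in, out);
        return out;
    }

    public static void main(String[] args) {
        Kernel scale = (in, out, i) -> out[i] = 2.0f * in[i];
        float[] data = {1f, 2f, 3f, 4f};
        List<Device> devices = Arrays.asList(new SerialDevice(), new ParallelDevice());
        String preferred = args.length > 0 ? args[0] : "parallel";
        Device device = select(devices, preferred);
        System.out.println(device.name() + " -> " + Arrays.toString(run(device, scale, data)));
    }
}
```

Passing a different device name on the command line changes where the kernel runs without touching the kernel itself. This is only an analogy for the dynamic configuration the abstract describes: here nothing is re-compiled, and no data movement is optimised.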

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Type of submission:
Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (CDT)
Publication date:
Location:
Manchester, UK
Total pages:
267
Thesis main supervisor(s):
Thesis co-supervisor(s):
Language:
en

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:318477
Created by:
Clarkson, James
Created:
21st February, 2019, 19:56:53
Last modified by:
Clarkson, James
Last modified:
2nd March, 2020, 11:00:36