
Enabling Independent Communication for FPGAs in High Performance Computing

Lant, Joshua

[Thesis]. Manchester, UK: The University of Manchester; 2019.


Abstract

The landscape of High Performance Computing (HPC) is changing, with increasing heterogeneity, new data-intensive workloads and ever tighter system power constraints. Given these changes, there has been increased interest in the deployment of FPGA technology within HPC systems. Traditionally, FPGAs have been of limited use to the HPC community. However, there have been many architectural advances in recent years: hardened floating-point operators, on-die CPUs, greater on-chip memory capacity and increased off-chip memory bandwidth, to name but a few. These advances have brought the opportunity to more readily exploit the FPGA’s efficiency and flexibility in HPC. Unfortunately, a number of research problems must still be solved before this can happen. In this thesis we tackle one such problem: the interconnect and its relation to the system architecture. The interconnect must have several key properties in order to satisfy the demands of large, data-intensive applications and take advantage of dataflow processing for FPGA-based HPC. (i) It must allow for tight coupling between FPGA and system memory in both local and remote nodes. This is required to enhance the performance of a number of key workloads which exhibit irregular memory access patterns. (ii) It must allow the FPGA to issue and process network transactions without any CPU intervention. This is required for high-performance inter-FPGA communication and independent scaling (disaggregation) of the FPGA resources. (iii) It must maintain its key properties of scalability and reliability, which are required for HPC systems but at odds with the other primary requirements. In this thesis we present a novel Network Interface solution implemented entirely within the fabric of the FPGA, which attempts to address all of these competing factors. The Network Interface allows for a system architecture which better supports distributed FPGA processing within a shared, global address space.
It provides hardware primitives to support RDMA and shared-memory transfers over a lightweight, custom network protocol. It allows for direct inter-FPGA communication without any CPU intervention, supported by a hardware-offloaded, reliable and connectionless transport layer. The microarchitecture of the Network Interface and transport layer is detailed, along with a number of performance enhancements which reduce the latency and increase the achievable throughput of the system. We assess the consistency issues and network errors which can occur, and show how the Network Interface is able to support out-of-order packets from the network. In the latter part of the thesis we show the benefits of direct inter-FPGA communication for dataflow processing when compared with a software-based transport, and demonstrate how the expected performance of such a system can be estimated for network-bound processing.

Bibliographic metadata

Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science
Location:
Manchester, UK
Total pages:
255
Language:
en


Record metadata

Manchester eScholar ID:
uk-ac-man-scw:322768
Created by:
Lant, Joshua
Created:
10th December, 2019, 15:23:37
Last modified by:
Lant, Joshua
Last modified:
4th January, 2021, 11:39:16
