The Chimera heterogeneous computing system

Most people recognise that we live in a data-driven age. Ubiquitous data collection now touches most of our day-to-day activities, from our shopping habits to the social engagements we plan to attend, and the deluge of data keeps growing. As soon as we want to do anything with these data, the task becomes a computing problem. This is felt keenly in scientific research: many important problems in physics and astronomy are computationally bound, and many large data sets lie languishing for want of computing power.

Left: Chimera (image courtesy of Elizabeth Koehn). Right: ‘The Chimera’ Heterogeneous Computing System.

As an example, the large radio antenna array known as the Square Kilometre Array (SKA) is expected to require an exaflop/s-scale computational platform, that is, a billion billion (10^18) floating point operations per second! This single observatory alone will effectively require the projected top supercomputer of its day when the SKA comes on-line in 2020, and it is only one of many computation-intensive scientific programmes due to come on-line in the next decade.

Graph: performance trend of the global Top 500 supercomputing systems (adapted from www.top500.com, September 2012).

In contrast, conventional processor technology is reaching power and speed ‘walls’, which has pushed the industry towards a multi-core philosophy (N. Hasasneh et al.: ‘Scalable and partitionable asynchronous arbiter for micro-threaded chip multiprocessors’, Lecture Notes in Computer Science 3894, pp. 252–267 (2006)). Conventional processors are also very power-hungry: over half the lifetime cost of a conventional Beowulf supercomputing cluster is the electrical power required to run it.

One way of addressing this is to use specialised hardware accelerators. The most commonly used is the General-Purpose Graphics Processing Unit (GP-GPU), a highly multi-threaded computational device descended from the specialised co-processors used to render graphics for computer gaming and built for highly parallel dense linear algebra. Another common accelerator is the Field Programmable Gate Array (FPGA), designed to mimic application-specific integrated circuits, with unrivalled pipelining capacity, especially for logic-intensive and fixed-point calculations. Together with the conventional CPU, these three platforms have clear advantages and disadvantages, summarised in the table below, so the appropriate choice of accelerator depends heavily on the computational task.

Advantages and Disadvantages of Hardware Types

CPU
Advantages:
* Multi-tasking
* Familiar computational ‘work-horse’
Disadvantages:
* Power-hungry
* Limited number of processor cores

GP-GPU
Advantages:
* Highly parallel
* Relatively simple programming interface (e.g. C for CUDA)
Disadvantages:
* Highly rigid instruction set
* Does not handle pipelining well

FPGA
Advantages:
* Unrivalled flexibility
* Pipelining capacity
Disadvantages:
* Expensive initial outlay
* Very specialised programming interface
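
To make the GP-GPU entry in the table concrete, the sketch below shows the kind of data-parallel kernel such a device is built for, written in C for CUDA (the interface mentioned above). It is an illustrative example only and not part of the Chimera codebase: the kernel, array size and launch parameters are assumptions chosen for clarity, with each of roughly a million GPU threads performing one independent multiply-add, the basic pattern of dense linear algebra.

    // saxpy.cu -- illustrative sketch only, not part of the Chimera codebase.
    // Each GPU thread updates one element of y; this one-thread-per-element
    // pattern is the kind of dense, data-parallel arithmetic a GP-GPU excels at.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];   // one independent multiply-add per thread
    }

    int main()
    {
        const int n = 1 << 20;        // about a million elements (arbitrary size)
        size_t bytes = n * sizeof(float);

        float *x, *y;
        cudaMallocManaged(&x, bytes); // unified memory keeps the sketch short
        cudaMallocManaged(&y, bytes);
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        saxpy<<<blocks, threads>>>(n, 2.0f, x, y);  // launch ~10^6 threads at once
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);  // expect 4.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }
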
Implementation of the Chimera system

Photograph of the implementation of ‘The Chimera’ system. Because it combines three classes of hardware, we named the system after the mythical Greek beast with three different heads.

The Chimera Heterogeneous Computing System aims to combine these different hardware accelerators, leveraging the strengths of each to process different classes of parallel algorithms and pipelines. The schematic is shown in the first image (right), and the photograph accompanying this section shows the current setup.

The Chimera consists of a cluster of FPGAs and a cluster of GPUs, mediated by a high-speed backplane built around the conventional PCI Express bus resident on the CPU motherboard (to keep to commercial, off-the-shelf components).
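
As a hedged illustration of what that PCI Express mediation looks like from the programmer's side, the C for CUDA sketch below stages a buffer from CPU memory to a GPU across the bus, runs a trivial kernel, and copies the result back. The buffer size, the scaling kernel and all names are assumptions made for illustration rather than actual Chimera code.

    // pcie_staging.cu -- illustrative sketch only; all names and sizes are assumptions.
    // Shows the explicit host <-> device copies that cross the PCI Express bus when
    // the CPU hands a block of work to a GPU (an FPGA board is driven analogously).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void scale(float *buf, int n, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            buf[i] *= factor;         // stand-in for a real processing stage
    }

    int main()
    {
        const int n = 1 << 22;        // a few million samples (arbitrary size)
        size_t bytes = n * sizeof(float);

        float *h_buf = (float *)malloc(bytes);   // host buffer on the CPU motherboard
        for (int i = 0; i < n; ++i) h_buf[i] = (float)i;

        float *d_buf;
        cudaMalloc(&d_buf, bytes);               // device buffer on the accelerator

        // The data cross the PCI Express backplane here ...
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

        int threads = 256;
        scale<<<(n + threads - 1) / threads, threads>>>(d_buf, n, 0.5f);

        // ... and cross back once the accelerator has finished.
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);

        printf("h_buf[2] = %f\n", h_buf[2]);     // expect 1.0
        cudaFree(d_buf);
        free(h_buf);
        return 0;
    }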

Contact

For more information please contact:

Professor Susan Scott
50347
