HPC: turning magic into reality

C&I Issue 10, 2011

What ‘magic’ technologies would bring dramatic benefits to your R&D? How about a time machine that would enable week-long experiments to be performed in minutes? What about a magnifier that improved the resolution of your scanners/instruments by a factor of 1000? Or a tool that made large, potentially dangerous experiments safer and cheaper?

With computer modelling and simulation, today that ‘magic’ could be real. Supercomputing, or high performance computing (HPC), involves the use of the world’s most powerful computing technology – thousands of times more powerful than ordinary desktop computers or workstations.

The world’s most powerful computers today are petascale supercomputers, capable of more than 10¹⁵ calculations each second. The first supercomputer capable of over one petaflops (10¹⁵ FLOPS or Floating Point Operations/Second) was the Roadrunner machine built in the US by IBM at Los Alamos National Laboratory (LANL) in 2008; by 2010, there were 10 public petaflops supercomputers, with the fastest – Tianhe-1A in China – nearly four times faster than Roadrunner. Already, computer experts – including myself – are actively planning for the next factor of 1000 in capability – exascale supercomputers that handle 10¹⁸ calculations/s – by 2020.

Supercomputers deliver a time machine effect – simulations that would take weeks or years on normal computers can deliver results within hours on supercomputers. In research, that might mean years of advantage over your competitors. Supercomputers also provide much finer resolutions or representations of real world chemistry and physics in computer modelling. Powerful, reliable simulations of the real world, made possible by supercomputing, mean that experiments with an element of danger – for example, explosives research – can be carried out safely inside the virtual world of the computer. Similarly, repeated parameter studies or process development investigations can be performed with modelling on supercomputers more cheaply than with repeated costly plant trials.
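As a rough illustration of the scale of that effect, the sketch below compares wall-clock times for a notional simulation requiring 10¹⁸ floating point operations on machines of different sustained speeds. All of the figures are illustrative assumptions, not benchmarks of any real system.

```python
# Rough illustration of the 'time machine' effect: wall-clock time for a
# notional job of 1e18 floating point operations on machines of different
# sustained speeds. All figures are illustrative assumptions, not benchmarks.

job_flop = 1e18  # total floating point operations in a hypothetical simulation

sustained_speeds = {
    "desktop workstation (~10 gigaflops)": 1e10,
    "departmental cluster (~1 teraflops)": 1e12,
    "petascale supercomputer (~1 petaflops)": 1e15,
}

for name, flops in sustained_speeds.items():
    seconds = job_flop / flops
    print(f"{name}: {seconds / 86400:.2f} days ({seconds / 3600:.1f} hours)")
```

On these assumed speeds the same job shrinks from roughly three years on the desktop to under 20 minutes on the petascale machine – the ‘weeks to hours’ compression described above.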

Supercomputing received higher-profile media attention early this year when, in his annual State of the Union address, US president Barack Obama highlighted supercomputing as a key economic recovery activity; the example Obama cited was future energy research. In fact, many of the challenges the world faces – future energy provision, climate science, environmental protection, economically stronger manufacturing sectors, etc – increasingly depend on supercomputing for key research.

Chemistry, including related areas such as materials science, is probably the biggest consumer of resources on open science or non-military HPC facilities around the world. Indeed, even on military-focused HPC facilities, chemistry is one of the dominant sciences. This ranges from molecular dynamics – studying the behaviour and derived properties of groups of molecules – to electronic structure studies that explore the structure within molecules and derived properties, such as energy states.

The other biggest universal user of HPC is probably computational fluid dynamics (CFD) – studying the behaviour of fluids in motion. CFD includes aerodynamics, used to design more fuel-efficient aircraft and cars, or faster ones for Formula 1. Chemistry and CFD come together in climate science, which spans chemical reactions (atmosphere and ocean), CFD (again, atmosphere and ocean), thermal processes, and more. Modelling is the primary method of studying the climate, and the computational requirements of the models demand supercomputers. Indeed, many of the world’s biggest supercomputing facilities support weather prediction or climate research programmes.

It is not all just academic, military or high-end research either. Some of the most captivating uses of HPC come from the domestic manufacturing sector: using HPC to redesign the shape of Pringles crisps to stop them flying off the production lines (aerodynamics); to study the filling of bottles of household products, maximising the filling rate while avoiding toppling and wastage; or to optimise the design of nappies, sprays, batteries, toothpaste, tyres and so on.

Supercomputers achieve their performance ‘magic’ through parallelism – many processors working together on the same set of calculations. Today’s supercomputers use the same basic processors as desktop computers – some even use similar graphics processing units (GPUs) to give extra speed boosts to some applications. Hundreds or thousands of these processors are coupled together using specialist interconnects to create an integrated computing platform.

In the UK, the Engineering and Physical Sciences Research Council (EPSRC) provides a national supercomputing service on behalf of Research Councils UK. The current service, known as HECToR, consists of a 374 teraflops (10¹² FLOPS) XE6 supercomputer built by Cray and housed at the University of Edinburgh, in addition to a Computational Science and Engineering (CSE) Support Service operated by the Numerical Algorithms Group (NAG) from Oxford and Manchester.

The HECToR website provides a series of case studies, CSE reports and research highlights which show the wide range of research successfully using HECToR – materials science, oceanography, quantum Monte-Carlo simulations, atmospheric chemistry, CFD, catalytic chemistry, combustion and more.

The HECToR website also provides a series of case study reports showing how the NAG CSE service helps users to overcome the software challenges of achieving the potential performance of the largest supercomputers. Nearly all modern computers use parallel processing, from a multicore laptop to the 224,162 cores of the Jaguar machine at Oak Ridge National Laboratory: the world’s most powerful non-GPU supercomputer, and the same Cray technology as HECToR. Most of the world’s existing software has been developed for serial processing, that is, assuming a single active computational thread running through the calculations or data processing in sequence. With all major processor vendors now turning to multiple cores, rather than increased clock frequencies, as the way of delivering higher performance on everything from a desktop PC to supercomputers, large chunks of the world’s software base must be rewritten for parallel processing.
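To make the serial-versus-parallel distinction concrete, here is a minimal sketch in Python using the standard multiprocessing module. The simulate_cell work function, the problem size and the worker count are invented purely for illustration; real HPC codes typically use MPI or OpenMP rather than this approach, but the restructuring idea – independent pieces of work farmed out across cores instead of executed in sequence – is similar.

```python
# Minimal sketch of turning a serial loop into a parallel one using Python's
# standard multiprocessing module. The 'simulate_cell' work function and the
# problem size are invented purely for illustration.
from multiprocessing import Pool

def simulate_cell(i):
    # Stand-in for an expensive, independent piece of work (e.g. one grid cell).
    return sum((i + k) ** 0.5 for k in range(100_000))

def run_serial(n):
    # Serial version: a single thread of execution works through every item
    # in sequence.
    return [simulate_cell(i) for i in range(n)]

def run_parallel(n, workers=4):
    # Parallel version: the same independent items are distributed across
    # several worker processes (cores).
    with Pool(processes=workers) as pool:
        return pool.map(simulate_cell, range(n))

if __name__ == "__main__":
    serial = run_serial(64)
    parallel = run_parallel(64)
    print("results agree:", serial == parallel)
```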

While this parallel programming is reasonably well understood for two, four or even a few dozen cores, at the scale of hundreds or thousands of cores used in high performance computing, the programming becomes harder. This is why EPSRC’s HECToR service includes the CSE Support Service from NAG – to provide the specialist HPC programming expertise to help researchers get the best performance from their applications.

This complexity should not scare those interested in HPC for two major reasons. First, most research technologies/tools that are new to you seem very daunting initially – remember being introduced to the theory and practice of NMR equipment or Raman spectroscopy? But with appropriate help and support, they are soon successfully integrated into your set of research tools.

Secondly, most users of HPC do not use the world’s leading edge supercomputers – they use smaller departmental scale machines, perhaps with hundreds of processor cores rather than tens of thousands. These few-teraflops machines are cheaper to buy and operate than their petascale brethren, and importantly, because of the reduced scale, easier to program effectively. However, they still pack a sizeable punch compared to desktop PCs. In fact, even users of the fastest supercomputers routinely use departmental scale machines for much of their work, reserving their biggest and most challenging problems for the more expensive leading edge supercomputers.

So how can you get access to supercomputing power? The choices are to buy your own or get access to someone else’s. A world-leading petascale supercomputer costs maybe $100m – and at least the same again to operate and support it for a typical three-year competitive lifetime. The top supercomputers require several megawatts of electricity – roughly $1m/year per MW – and commensurate cooling infrastructure. And, if you want the first or fastest in the world, you’ll probably have to contribute to the development costs too – more millions. More realistically for most buyers, a departmental scale HPC machine will cost anything from $100k to a few $m depending on specification, with whole-life costs again a factor of two or three times the purchase price.

Academic researchers can usually get access to institutionally or nationally provided facilities with costs paid by academic funding agencies. However, access to the most powerful supercomputers for open research, such as the HECToR XE6 or the Jaguar system at ORNL, is usually through a competitive peer review process, to ensure these expensive assets are used for the highest quality research.

It is also possible to buy time on an HPC facility for commercial or academic research. This overlaps with the current marketing buzz of ‘cloud computing’. The business model, in essence, is that as and when you need HPC time, you go to your ‘on-demand’ HPC provider or cloud computing service provider and pay for what you use as you use it. This model can keep the cost of entry to HPC usage very low, as there is no large initial capital outlay, and it is also very cost-effective if your usage of HPC is ‘lumpy’, involving peaks and troughs of demand. However, your usage pattern needs to be carefully assessed, as it may still be cheaper in the long run to buy and operate your own HPC facility for some or all of your computational needs.
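As a sketch of the kind of assessment involved, the toy calculation below compares an owned departmental system against pay-per-use pricing over a three-year lifetime. Every figure is an invented placeholder, to be replaced by real quotes and your own usage estimates.

```python
# Illustrative break-even comparison between buying a departmental HPC system
# and paying for on-demand/cloud HPC time. Every figure here is an invented
# placeholder; substitute your own quotes and usage estimates.

own_capital = 500_000.0           # purchase price ($), assumption
own_running_per_year = 150_000.0  # power, cooling, support ($/year), assumption
lifetime_years = 3

on_demand_rate = 0.10             # $ per core-hour, assumption

own_total = own_capital + own_running_per_year * lifetime_years

for core_hours_per_year in (500_000, 2_000_000, 5_000_000):
    on_demand_total = on_demand_rate * core_hours_per_year * lifetime_years
    cheaper = "on-demand" if on_demand_total < own_total else "own system"
    print(f"{core_hours_per_year:>9,} core-hours/year: "
          f"own ${own_total:,.0f} vs on-demand ${on_demand_total:,.0f} -> {cheaper}")
```

Under these invented numbers, light or lumpy usage favours on-demand access, while heavy sustained usage tips the balance towards owning the machine – which is exactly why the usage pattern needs assessing first.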

There are also support providers who offer services, advice or training for buying, operating and programming HPC. In some domains, if you are lucky with your area of computational need, there are service providers who will sell you the whole modelling service, rather than just the machine time to do it yourself. This can be valuable if your in-house computational modelling expertise is not strong.

So how does all the wonder that supercomputing offers fit with the traditional world of ‘wet lab’ experiments? Simulation and modelling, especially when enhanced with HPC, can, when used appropriately, provide highly valuable and accurate predictions of real-world behaviours. But in most arenas, simulation is not yet a complete replacement for physical testing and experiments. When this debate comes up, experimentalists need to accept that experiments are only controlled representations of reality too – they have their limits, as does computer modelling. But HPC-enhanced simulations used together with physical trials and experiments, each applied according to its strengths and costs in an integrated research programme or product development cycle? That is potentially a dramatic increase in performance over either alone.

Using $100m petascale supercomputers might not figure in your research budgets or group expertise, but simulation and modelling, and the use of HPC techniques from desktop to departmental scale, are growing as the benefits spread to new users. The bottom line is that, in the future, your competitors will be using supercomputing to secure a time machine effect in their product development cycles or research programmes – so when will you?

Chicken and egg conundrum

Recently, researchers from the University of Warwick, UK, caught some press attention when they used NAG’s parallel software optimisation expertise and the Cray supercomputers to investigate egg shell formation – and thus give a partial answer to the infamous question of ‘which comes first: the chicken or the egg?’ Mark Rodger and David Quigley, in collaboration with colleagues at the University of Sheffield, used the DL_POLY3 molecular dynamics modelling application on the HECToR supercomputers to study the role of a protein called ovocleidin-17 (OC-17) in chicken egg shell formation.

Researchers knew that OC-17 must play some role in egg shell formation. The protein is found only in the mineral region of the egg – the hard part of the shell – and laboratory results showed that it appeared to influence the transformation of calcium carbonate into calcite crystals. How this process could be used to form an actual eggshell remained unclear. The researchers created a model to show how the protein bound to a calcium carbonate surface.

With the parallel I/O optimisations and performance improvement delivered by NAG experts, use of the specific model to investigate eggshell formation became tractable. Results of the simulation now show how the protein binds using two clusters of amino acid residues on two loops of the protein. This creates a chemical clamp to nano-sized particles of calcium carbonate, which encourages calcite crystallites to form. Once the crystallites are large enough to grow on their own, the protein drops off. This frees up the OC-17 to promote yet more crystallisation, facilitating the overnight creation of an eggshell – started, first, by this chicken protein.
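For a flavour of what a molecular dynamics code actually computes, here is a toy time-stepping sketch in Python: velocity Verlet integration with a simple Lennard-Jones-style pair potential. It is not DL_POLY3 and bears no relation to the Warwick eggshell model; every parameter is an arbitrary example value, and production codes handle millions of particles in parallel rather than the handful here.

```python
# Toy molecular dynamics sketch: velocity Verlet time-stepping for a handful
# of particles with a simple Lennard-Jones-style pair potential. This only
# illustrates the kind of calculation MD packages perform at vastly larger
# scale; every parameter is an arbitrary example value.
import numpy as np

def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces on each particle (no cutoffs, no PBC)."""
    forces = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r = np.linalg.norm(r_vec)
            # Magnitude of -dU/dr for U = 4*eps*((sigma/r)**12 - (sigma/r)**6)
            f_mag = 24.0 * epsilon * (2.0 * (sigma / r) ** 12 - (sigma / r) ** 6) / r
            f_vec = f_mag * r_vec / r  # repulsive when positive
            forces[i] += f_vec
            forces[j] -= f_vec
    return forces

def velocity_verlet(pos, vel, dt=1e-3, steps=1000, mass=1.0):
    """Advance positions and velocities through `steps` MD time-steps."""
    f = lj_forces(pos)
    for _ in range(steps):
        pos = pos + vel * dt + 0.5 * (f / mass) * dt ** 2
        f_new = lj_forces(pos)
        vel = vel + 0.5 * ((f + f_new) / mass) * dt
        f = f_new
    return pos, vel

if __name__ == "__main__":
    # Eight particles on a small cubic lattice, initially at rest.
    grid = [[i, j, k] for i in range(2) for j in range(2) for k in range(2)]
    positions = 1.5 * np.array(grid, dtype=float)
    velocities = np.zeros_like(positions)
    positions, velocities = velocity_verlet(positions, velocities)
    print("mean speed after 1000 steps:", np.linalg.norm(velocities, axis=1).mean())
```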

Global atmospheric aerosol model

Aerosols affect the climate by scattering and absorbing solar radiation and by affecting the properties of clouds. Aerosol ‘forcing’ of climate is one of the largest uncertainties in the quantification of climate change over the last 150 years. GLOMAP is a global atmospheric aerosol and chemistry model with a comprehensive treatment of aerosol microphysical and chemical processes. The model is being used to study the global lifecycle of aerosol and the impact of aerosol on climate. GLOMAP runs within the TOMCAT Chemical Transport Model and the UKCA aerosol-chemistry-climate model.

HPC experts from NAG, working under NAG’s Computational Science and Engineering (CSE) support service for HECToR, have restructured key parts of the GLOMAP-mode TOMCAT application and enhanced its multicore performance, allowing researchers to achieve a four-fold reduction in runtimes and thus enabling new science and higher resolutions.

Lead researcher Graham Mann at the University of Leeds, UK, also a major user of GLOMAP/TOMCAT, estimated that, when extrapolated across future research using the GLOMAP-mode TOMCAT code on HECToR and other supercomputers, the optimisations could deliver significant cost savings. The key result for researchers, however, is the reduction in CPU time per model time-step, making it possible to do new science, including higher resolutions.

Chemistry and supercomputers

Computational chemistry methods range from highly accurate to very approximate, trading computational requirements against method accuracy. Ab initio methods are based entirely on theory from first principles. Other, typically less accurate, methods are called empirical or semi-empirical because they employ experimental results, often from acceptable models of atoms or related molecules, to approximate some elements of the underlying theory.

Both ab initio and semi-empirical approaches involve approximations. These range from simplified forms of the first-principles equations that are easier or faster to solve, to approximations limiting the size of the system – for example, periodic boundary conditions – to fundamental approximations to the underlying equations that are required to achieve any solution to them at all.
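As a small illustration of one of those system-size approximations, the sketch below shows the periodic boundary condition idea: a small simulation box is treated as if tiled infinitely in all directions, and distances are measured to the nearest periodic image of each particle. The box length and coordinates are arbitrary example values.

```python
# Minimal sketch of the periodic boundary condition idea: a small simulation
# box is treated as if tiled infinitely in every direction, and distances are
# measured to the nearest periodic image. All values are arbitrary examples.
import numpy as np

def wrap(pos, box_length):
    """Map coordinates back into the primary box [0, box_length)."""
    return pos % box_length

def minimum_image_distance(a, b, box_length):
    """Distance between a and b using the nearest periodic image of b."""
    delta = a - b
    delta -= box_length * np.round(delta / box_length)
    return np.linalg.norm(delta)

if __name__ == "__main__":
    L = 10.0
    a = np.array([0.5, 0.5, 0.5])
    b = np.array([9.5, 9.5, 9.5])
    print("naive distance:        ", np.linalg.norm(a - b))            # ~15.6
    print("minimum-image distance:", minimum_image_distance(a, b, L))  # ~1.7
```

Two particles near opposite faces of the box are treated as near neighbours, which is how a few hundred or thousand atoms can stand in for bulk material.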

Understanding the way that solid industrial catalysts function, and the design of new catalysts, is an important area of research. In addition, the study of the binding of pollutants and undesirable species to the surfaces of solids, as a method of removing them from the environment, is vital. Another area of interest is defects on the surfaces of solids, as these can be highly reactive. The potentially large number of atoms involved in these studies requires massive computing resources.

Many chemical reactions take place in the liquid phase, with the reacting species being different from the solvent that constitutes the bulk of the liquid. Modelling reactions with explicit classical or quantum mechanical water molecules present can give detailed insight into the processes taking place. Other liquids of interest include liquid crystals, used commercially in display devices, which have the properties of a fluid but retain much more order of the molecules with respect to each other than a normal liquid.

Andrew Jones leads the HPC consulting and services business at Numerical Algorithms Group (NAG) based in Oxford, UK.
