The world’s fastest supercomputer encounters errors every day

Building a supercomputer is always a challenge, but building the industry’s first “exascale” class system is a challenge on another level altogether, demanding a high degree of precision in both hardware and software. Oak Ridge National Laboratory’s Frontier supercomputer appears to be struggling with exactly this: it reportedly cannot go a day without numerous hardware failures.

The world’s fastest supercomputer

Frontier uses 9,472 of AMD’s 64-core EPYC “Trento” processors, 37,888 Instinct MI250X GPUs and Hewlett Packard Enterprise’s (HPE) Slingshot interconnect (12.8 terabits/second of bandwidth). Together, this hardware promises a peak FP64 performance of 1.685 exaFLOPS. The system was built by HPE on the Cray EX architecture.

On paper, Frontier looks extremely powerful, but it falls short on stability. Although the hardware has been installed and the system set up, Frontier cannot yet be used for research because of problems believed to be hardware-related.

Authorities have full confidence in AMD

Justin Whitt, program director at Oak Ridge, said in an interview that the team is working through the hardware issues and trying to identify their causes. Whitt also said that, at this enormous scale, the average time between system failures is a matter of hours, not days.
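
To see why scale alone can push the time between failures down to hours, here is a minimal Python sketch. It assumes a simplified model in which the system fails whenever any one of its GPUs fails, and in which GPU failures are independent with a constant failure rate; under those assumptions the system-level mean time between failures (MTBF) is the per-GPU MTBF divided by the number of GPUs. The GPU count comes from the article, but the per-GPU MTBF of 200,000 hours is a hypothetical figure chosen purely for illustration, not a measured or published value, and the function name is likewise invented for this example.

```python
def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """System MTBF when any single component failure counts as a system failure.

    Assumes identical, independent components with a constant failure rate,
    so the individual failure rates simply add up.
    """
    if component_mtbf_hours <= 0 or n_components <= 0:
        raise ValueError("MTBF and component count must be positive")
    return component_mtbf_hours / n_components


GPU_COUNT = 37_888  # number of Instinct MI250X GPUs, from the article
HYPOTHETICAL_GPU_MTBF_HOURS = 200_000.0  # assumption for illustration only

mtbf = system_mtbf_hours(HYPOTHETICAL_GPU_MTBF_HOURS, GPU_COUNT)
print(f"Estimated system MTBF: {mtbf:.1f} hours")
```

With these illustrative numbers, a GPU that would run for more than two decades on its own before failing still yields a system-wide failure every few hours. The real figures for Frontier may differ considerably, since actual failures are neither perfectly independent nor limited to GPUs.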

AMD’s Instinct MI250X GPUs are thought to be at the root of Frontier’s problems, proving less reliable than expected. Officials nevertheless say they are not worried about AMD’s products, at least for now. Frontier was originally announced to come online in 2022, but given these events, only time will tell when it will enter full operation.