In the decades since Seymour Cray developed what is widely considered the world’s first supercomputer, the CDC 6600, an arms race has been waged in the high-performance computing (HPC) community. The goal: to improve performance, by any means, at any cost.
Driven by advances in computing, storage, networking, and software, the performance of pioneering systems has increased a trillion-fold since the CDC 6600 was revealed in 1964, from millions of floating-point operations per second (megaFLOPS) to quintillions (exaFLOPS).
The current holder of the crown is Frontier, a massive supercomputer based in the United States, capable of achieving 1.102 exaFLOPS on the High Performance Linpack (HPL) benchmark, although more powerful machines are suspected to be operating elsewhere, behind closed doors.
The arrival of so-called exascale supercomputers is expected to benefit nearly all sectors — from science to cybersecurity, healthcare and finance — and pave the way for powerful new AI models that would otherwise take years to train.
However, speed increases of this magnitude come at a cost: power consumption. At full tilt, Frontier consumes up to 40 megawatts of power, roughly as much as 40 million desktop computers.
Supercomputing has always been about pushing the limits of the possible. But as the need to reduce emissions becomes increasingly clear and energy prices continue to rise, the HPC industry will have to re-evaluate whether its original guiding principle is still worth following.
Performance vs. efficiency
One institution at the forefront of this issue is the University of Cambridge, which, in partnership with Dell Technologies, has developed several supercomputers with energy efficiency at the forefront of the design.
The Wilkes3, for example, occupies only 100th place in the overall performance rankings, but it sits third in the Green500, a ranking of HPC systems based on performance per watt of power consumed.
In conversation with TechRadar Pro, Dr Paul Calleja, Director of Research Computing Services at the University of Cambridge, explained that the institution is more concerned with building high-throughput, efficient machines than extremely powerful ones.
“We are not really interested in the large systems, because they are very specific point solutions. But the technologies deployed inside them are more widely applicable and will enable systems an order of magnitude slower to operate in a much more cost- and energy-efficient way,” says Dr Calleja.
“By doing so, you democratize access to computing for many more people. We are interested in using technologies designed for those big flagship systems to create much more sustainable supercomputers, for a broader audience.”
In the coming years, Dr Calleja also anticipates an increasingly aggressive drive for energy efficiency across the HPC sector and the broader data center community, where energy consumption accounts for well over 90% of costs, we’re told.
The recent fluctuations in energy prices associated with the war in Ukraine will make supercomputers significantly more expensive to run, particularly in the context of exascale computing, illustrating the importance of performance per watt.
In the context of Wilkes3, the university found a number of optimizations that helped improve efficiency. For example, by lowering the clock speed at which some components were running, depending on the workload, the team was able to achieve energy consumption reductions in the region of 20-30%.
“Within a given architecture family, clock speed has a linear relationship with performance, but a square relationship with power consumption. That is the killer,” explained Dr Calleja.
“Reducing the clock speed reduces the power draw at a much faster rate than the performance, but it also lengthens the time it takes to complete the task. So what we have to look at is not the power consumption while running, but really the energy consumed per job. There is a sweet spot.”
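To make that sweet spot concrete, here is a minimal toy model in C++. All figures are hypothetical and are not measured Wilkes3 data: it simply assumes a fixed static power floor per node plus a dynamic term that grows with the square of the clock frequency, per the rule of thumb quoted above, while runtime scales inversely with frequency. Energy per job (power multiplied by runtime) then has a clear minimum below the maximum clock.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical figures for illustration only -- not measured Wilkes3 data.
    const double p_static = 120.0;  // static/idle power per node, in watts
    const double k = 40.0;          // dynamic-power coefficient, in W per GHz^2
    const double work = 1000.0;     // job size, in "GHz-seconds" of work

    double best_f = 0.0, best_e = 1e300;
    for (double f = 1.0; f <= 3.5; f += 0.1) {   // candidate clock speeds, GHz
        double power = p_static + k * f * f;     // watts drawn at this clock
        double runtime = work / f;               // seconds to finish the job
        double energy = power * runtime;         // joules consumed per job
        if (energy < best_e) { best_e = energy; best_f = f; }
        std::printf("f=%.1f GHz  power=%6.1f W  time=%6.1f s  energy=%8.1f J\n",
                    f, power, runtime, energy);
    }
    std::printf("sweet spot near %.1f GHz (analytic minimum at sqrt(p_static/k) = %.2f GHz)\n",
                best_f, std::sqrt(p_static / k));
    return 0;
}
```

In this parameterisation, running flat out at 3.5 GHz finishes the job fastest but burns the most joules, while the energy-optimal clock sits well below the maximum and saves roughly 20% per job, in the same ballpark as the reductions described above.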
Software is king
In addition to fine-tuning hardware configurations for specific workloads, there are also a number of improvements to be made elsewhere, in the context of storage and networking, and in related disciplines such as cooling and rack design.
However, when asked where exactly he would like to see resources allocated in the quest to improve energy efficiency, Dr Calleja explained that the focus should be on software, first and foremost.
“Hardware is not the problem; it is about application efficiency. That will be the main bottleneck going forward,” he said. “Today’s exascale systems rely on GPU architectures, and the number of applications that can run efficiently at scale on GPU systems is small.”
“To really take advantage of today’s technology, we need to focus heavily on application development. The development lifecycle spans decades; the software in use today was developed 20-30 years ago, and it’s difficult when you have such long-lived code that needs to be re-engineered.”
The problem, however, is that the HPC industry is not used to thinking software-first. Historically, far more attention has been paid to the hardware, because, in the words of Dr Calleja, “it’s easy: you just buy a faster chip. You don’t have to think smart.”
“While we had Moore’s Law, with processor performance doubling every eighteen months, you didn’t have to do anything [on a software level] to increase performance. But those days are over. Now, if we want advances, we have to go back and re-engineer the software.”
That said, Dr Calleja had some praise for Intel in this regard. As the server hardware space becomes more diverse from a vendor perspective (in most respects a positive development), application compatibility is likely to become an issue, but Intel is working on a solution.
“One advantage I see for Intel is that it is investing a lot [of both funds and time] in the oneAPI ecosystem, to develop code portability across silicon types. It’s that kind of toolchain we need, to enable tomorrow’s applications to take advantage of emerging silicon,” he notes.
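As an illustration of the kind of cross-vendor portability such a toolchain targets, the sketch below is a minimal SYCL 2020 kernel of the sort the oneAPI ecosystem is built around. It is a generic example, not code from any Cambridge system, and the vector-addition workload is purely illustrative; the same source can be compiled for CPUs or GPUs from different vendors, with the runtime selecting whichever device is available.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    // Pick whatever accelerator the runtime finds (GPU, CPU, ...);
    // the same source can be recompiled for different vendors' silicon.
    sycl::queue q{sycl::default_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    const size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    {
        sycl::buffer<float, 1> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> bufC(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler &h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only, sycl::no_init);
            // Element-wise vector addition, executed on the selected device.
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
        });
    }  // buffers go out of scope here and copy the results back to the host

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
    return 0;
}
```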
Separately, Dr Calleja called for a tighter focus on “scientific need”. All too often, “things get lost in translation”, resulting in a mismatch between hardware and software architectures and the actual needs of the end user.
He says a more energetic approach to cross-industry collaboration would create a “virtuous circle” of users, service providers and vendors, which would translate into benefits from both a performance and an efficiency perspective.
A zettascale future
Inevitably, with the exascale milestone now passed, attention will turn to the next one: zettascale.
Dr Calleja said: “Zettascale is just the next flag in the ground, a totem that highlights the technologies needed to reach the next milestone in computing developments that are unobtainable today.”
“The fastest systems in the world are very expensive for what you get from them, in terms of scientific output. But they are important, because they show the art of the possible and move the industry forward.”
Whether systems capable of one zettaFLOPS, a thousand times more powerful than the current crop, can be developed in a way that aligns with sustainability goals will depend on the industry’s capacity for invention.
Performance and energy efficiency are not mutually exclusive, but a healthy dose of ingenuity will be required in each sub-discipline to deliver the necessary performance increase within an appropriate power envelope.
In theory, there is a golden ratio of performance to energy consumption, where it can be argued that the benefits to society from HPC justify the expenditure of carbon emissions.
The exact figure will remain elusive in practice, of course, but the pursuit of the idea is by definition a step in the right direction.