Supercomputing News and Discussions

weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

wjfox
Site Admin
Posts: 13579
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK

Re: Supercomputing News and Discussions

Post by wjfox »

NVIDIA's Eos supercomputer just broke its own AI training benchmark record

Wed, Nov 8, 2023, 5:00 PM GMT · 5 min read

Depending on the hardware you're using, training a large language model of any significant size can take weeks, months, even years to complete. That's no way to do business — nobody has the electricity and time to be waiting that long. On Wednesday, NVIDIA unveiled the newest iteration of its Eos supercomputer, one powered by more than 10,000 H100 Tensor Core GPUs and capable of training a 175 billion-parameter GPT-3 model on 1 billion tokens in under four minutes. That's three times faster than the previous benchmark on the MLPerf AI industry standard, which NVIDIA set just six months ago.

Eos represents an enormous amount of compute. It leverages 10,752 GPUs strung together using NVIDIA's InfiniBand networking (moving a petabyte of data a second) and 860 terabytes of high-bandwidth memory (36PB/sec aggregate bandwidth and 1.1PB/sec of interconnect bandwidth) to deliver 40 exaflops of AI processing power. The entire cloud architecture comprises 1,344 nodes, individual servers that companies can rent access to for around $37,000 a month to expand their AI capabilities without building out their own infrastructure.

In all, NVIDIA set six records in nine benchmark tests: the 3.9 minute notch for GPT-3, a 2.5 minute mark to train a Stable Diffusion model using 1,024 Hopper GPUs, a minute even to train DLRM, 55.2 seconds for RetinaNet, 46 seconds for 3D U-Net and the BERT-Large model required just 7.2 seconds to train.

https://www.engadget.com/nvidias-eos-su ... 42546.html
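
As a rough sanity check on the Eos totals quoted above, the per-node and per-GPU figures can be derived with a few divisions. This is only a sketch: the variable names are made up for illustration, decimal units are assumed (1 TB = 10^12 bytes), and the per-GPU results are simple averages of the quoted totals, not official specifications.

```python
# Totals as quoted in the article excerpt above.
TOTAL_GPUS = 10_752
TOTAL_NODES = 1_344
TOTAL_HBM_TB = 860          # terabytes of high-bandwidth memory
TOTAL_AI_EXAFLOPS = 40      # low-precision "AI" exaflops

# Assumption: decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
gpus_per_node = TOTAL_GPUS / TOTAL_NODES
hbm_gb_per_gpu = TOTAL_HBM_TB * 1e12 / TOTAL_GPUS / 1e9
ai_petaflops_per_gpu = TOTAL_AI_EXAFLOPS * 1e18 / TOTAL_GPUS / 1e15

print(f"GPUs per node:        {gpus_per_node:.0f}")
print(f"HBM per GPU:          {hbm_gb_per_gpu:.1f} GB")
print(f"AI petaflops per GPU: {ai_petaflops_per_gpu:.2f}")
```

The totals are self-consistent: they work out to 8 GPUs per node and roughly 80 GB of memory per GPU.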


Tadasuke

Re: Supercomputing News and Discussions

Post by Tadasuke »

wjfox wrote: Thu Nov 09, 2023 9:50 pm Eos represents an enormous amount of compute. It leverages 10,752 GPUs strung together using NVIDIA's InfiniBand networking (moving a petabyte of data a second) and 860 terabytes of high-bandwidth memory (36PB/sec aggregate bandwidth and 1.1PB/sec of interconnect bandwidth) to deliver 40 exaflops of AI processing power
It's just a number. I would like to see it actually bring results that would improve what matters.

Even if DLSS in video games could finally start working well, that would be something positive.

Or if our daily computers would finally be smart in a useful way, so they stop being frustrating.

Or if they managed to bring humanity some super awesome genetic engineering that works well.
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

China Stuns With New Homegrown Supercomputer Announcement
China isn't saying what kind of CPUs it's using, but it may have breached the exascale barrier with them.
By Josh Norem December 12, 2023
https://www.extremetech.com/computing/c ... nouncement
The modern world's supercomputers operate out in the open, as countries brag about their performance and enter them into standardized benchmarking competitions to prove their engineering chops. China doesn't play this game, however. Its entire supercomputer program is mostly kept secret because it's not supposed to have access to advanced technology. Despite its desire to keep its cards close to its vest, it recently announced a new supercomputer that could break the exascale barrier—all while using homegrown CPUs, which shouldn't be possible under the sanctions levied against it.

The new supercomputer is named Tianhe Xingyi, state news agency Xinhua reports (via Reuters). The release is unsurprisingly vague since China doesn't release numbers or hard info. It states only that it was built with "domestic advanced computing architecture, high-performance multi-core processors, high-speed interconnection networks, and large-scale storage." The release says that compared with Tianhe-2 (above), China has doubled many aspects of its performance. That's unsurprising, as Tianhe-2 first debuted on the Top500 list in 2013 and was the world's fastest supercomputer for several years after that, only being displaced by TaihuLight, another computer from China in 2016.
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

AMD to Build 2 New Supercomputers in Germany
One will use its new MI300, the other will be based on whatever AMD has ready in 2025.
By Josh Norem December 21, 2023
https://www.extremetech.com/computing/a ... in-germany
The proverbial paint is still wet on AMD's new Instinct MI300 chips, and yet the company has already said they're being used for a new supercomputer in Germany. AMD has announced "Exascale Supercomputing Is Coming to Stuttgart" and will build two computers: one that will upgrade an existing system to 39 PFLOPS and a future exascale machine similar to its current Frontier supercomputer. The two machines will be known as Hunter and Herder, with the former coming online in 2025 and the latter poised for a 2027 launch.

The two new supercomputers result from a new contract signed by the University of Stuttgart and Hewlett Packard Enterprise. It will see the organization upgrade its existing Hawk supercomputer and install a second system in the future at the HLRS, which is a research institute and supercomputer center in Stuttgart. The big news here is this is the first supercomputer contract for AMD's all-new MI300A chip, which combines a CPU, GPU, and high-bandwidth memory onto the same package. These data center "APUs" will go into Hawk, the center's current flagship supercomputer at 26 PFLOPS, which is nothing to sneeze at. This computer debuted at #16 on the Top500 list in 2020, so it's neither old nor slow. That said, we certainly understand the itch to upgrade a PC, so there's no shade coming from this direction.
wjfox
Site Admin
Posts: 13579
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK

Re: Supercomputing News and Discussions

Post by wjfox »

Europe plans to build the world’s fastest supercomputer in 2024

Europe will get its first exascale supercomputer next year, called JUPITER, and it should allow simulations that are currently possible only on a few machines worldwide

28 December 2023

The first exascale computer in Europe, called JUPITER, should be completed next year, and it may even become the most powerful computer in the world. It will allow experiments and simulations currently only possible on a tiny number of machines in the US and China.

Exascale machines can carry out a billion billion operations per second, an exaflop. Currently, there are – officially – only two supercomputers in the world capable of those sorts of calculations.

https://www.newscientist.com/article/23 ... r-in-2024/


Image
The exascale supercomputer JUPITER will be hosted at the Jülich Supercomputing Centre in Germany
Credit: Forschungszentrum Jülich/Sascha Kreklau
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

China Developing 1.57 Exaflop Supercomputer With China Made CPU-GPU Chip
February 14, 2024 by Brian Wang
There are reports that China has a new superchip MT-3000 processor designed by the National University of Defense Technology (NUDT). The MT-3000 has general-purpose CPU cores, control cores, and matrix accelerator cores. NUDT’s MT-3000 processor features a multi-zone structure that packs 16 general-purpose CPU cores with 96 control cores and 1,536 accelerator cores.

The MT-3000 processor reportedly achieves 11.6 FP64 TFLOPS of peak performance and demonstrates a power efficiency of 45.4 GigaFLOPS/Watt at an operational frequency of 1.20 GHz.

Tianhe-3 is a new supercomputer reported to be able to reach 1.57 ExaFLOPS on LINPACK benchmarks. It would use the MT-3000 at its core. The top US supercomputer is Frontier, with 1.102 ExaFLOPS of performance.
https://www.nextbigfuture.com/2024/02/c ... -chip.html
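
The reported MT-3000 numbers imply a per-chip power draw and a minimum chip count for Tianhe-3. The sketch below works both out. It assumes, optimistically, that every chip sustains its full peak rate in LINPACK, so the chip count and power are lower bounds rather than estimates of the real machine. The variable names are illustrative only.

```python
import math

PEAK_TFLOPS_PER_CHIP = 11.6   # FP64 peak, as reported
GFLOPS_PER_WATT = 45.4        # reported power efficiency
TARGET_EXAFLOPS = 1.57        # reported LINPACK figure for Tianhe-3

# Implied power draw of one chip at peak: flops divided by flops-per-watt.
watts_per_chip = PEAK_TFLOPS_PER_CHIP * 1e12 / (GFLOPS_PER_WATT * 1e9)

# Assumption: LINPACK performance equals peak performance. Real systems fall
# short of peak, so the true chip count would be higher than this.
min_chips = math.ceil(TARGET_EXAFLOPS * 1e18 / (PEAK_TFLOPS_PER_CHIP * 1e12))
min_power_mw = min_chips * watts_per_chip / 1e6

print(f"Implied power per chip: {watts_per_chip:.0f} W")
print(f"Minimum chips needed:   {min_chips:,}")
print(f"Minimum chip power:     {min_power_mw:.1f} MW")
```

This counts only the processors; memory, networking, storage and cooling would add to the total.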
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Nvidia Unveils Its Eos Supercomputer for AI Training
It's already ranked as the ninth-fastest supercomputer in the world.
By Josh Norem February 16, 2024
https://www.extremetech.com/computing/n ... i-training
In November of last year, Nvidia raised a few eyebrows by suddenly appearing in the 9th spot on the Top500 list of the world's fastest supercomputers with a system named Eos. Named after the Greek Goddess who opened the gates of dawn every day, Eos is Nvidia's enterprise-scale system for AI training, and the company has now released a video showing it off to the public for the first time.

Eos is essentially Nvidia's very own supercomputer that its employees get to use every day for things like AI training and playing Crysis on their lunch breaks. It comprises a cluster of 576 DGX H100 servers, and since each one features eight H100 GPUs, there's a total of 4,608 H100s linked together with its Quantum-2 InfiniBand technology. It's basically Nvidia showing off an extreme version of its DGX SuperPod design, which is AI training at an enterprise scale, which it hopes to sell to companies with huge budgets and massive AI models to train.

Nvidia describes Eos as a system that can power an "AI factory," as it's a very large-scale SuperPod DGX H100 system. The company says it is what allows it to develop its own AI breakthroughs and shows the power of Nvidia's latest technology when scaled up to ludicrous size.

The DGX H100 servers use Intel Xeon Platinum 8480C CPUs, which feature 56 cores and 112 threads. Combined with the 4,608 H100 GPUs, it offers 121 PetaFLOPS of Linpack performance, which was only good enough for 9th on the Top500, but that's more of a generic metric. When measured purely for AI training, it's easily one of the fastest systems in the world currently.
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »



Nvidia and Amazon Upgrade Project Ceiba AI Supercomputer to Blackwell


https://www.extremetech.com/computing/n ... -blackwell
The change will increase the AI performance of the upcoming machine by 6x, according to Nvidia.

Nvidia is fresh off the unveiling of its new Blackwell AI superchip, and it's wasting no time making plans to roll that hardware out. Nvidia and Amazon partnered up last year to build what was to be one of the fastest supercomputers in the world, known as Project Ceiba. Now, the companies have said Project Ceiba will get a Blackwell upgrade to make it up to six times faster than originally envisioned.

The version of Project Ceiba discussed last year was still a beast, featuring more than 16,000 H100 Hopper AI accelerators. Nvidia predicted the machine would have offered 65 exaflops of AI processing power when complete. The current leading supercomputer is the US Department of Energy's Frontier machine, which can hit 1.1 exaFLOPS with thousands of AMD Epyc CPUs and Radeon GPUs.
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Russia Is Working on a 128-Core Supercomputing Platform: Report
The country's notoriously ancient computer systems are due for an upgrade.
By Josh Norem April 22, 2024
Russia has always lagged behind the rest of the industrial world when it comes to information technology, and now sanctions from its war on Ukraine have held it back even further. Despite this situation, the country is reportedly in the early stages of deploying a new supercomputing and cloud platform that will feature up to 128 CPU cores per server cluster. It's unknown where these computer parts will be made, however, as Russia isn't known for running advanced silicon fabs.

The details about Russia's plans come from CNews, which appears to be a Russian news site. The site notes a state-owned company named Roselectronics has been developing this new computing platform called Basis using "domestic technologies." The platform is both scalable and a fusion of software and hardware. Each Basis module includes three servers with up to 128 CPU cores, along with 2TB of memory, though the architecture used for the CPUs isn't disclosed. It's unknown if it will feature a monolithic or chiplet design.
https://www.extremetech.com/computing/r ... orm-report
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Intel Aurora Supercomputer Breaks Exascale Barrier but Fails to Topple AMD Frontier
Intel's supercomputer at Argonne National Laboratory is only at 87% functionality, however.

At the recent International supercomputing conference called ISC 2024, Intel's newest Aurora supercomputer installed at Argonne National Laboratory raised a few eyebrows by finally surpassing the exascale barrier. Before this, only AMD's Frontier system had been able to achieve this level of performance. Intel also achieved what it says is the world's best performance for AI at 10.61 "AI exaflops."
https://www.extremetech.com/computing/i ... -to-topple
wjfox
Site Admin
Posts: 13579
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK

Re: Supercomputing News and Discussions

Post by wjfox »

NVIDIA Grace Hopper Ignites New Era of AI Supercomputing

May 12, 2024

Driving a fundamental shift in the high-performance computing industry toward AI-powered systems, NVIDIA today announced nine new supercomputers worldwide are using NVIDIA Grace Hopper™ Superchips to speed scientific research and discovery. Combined, the systems deliver 200 exaflops, or 200 quintillion calculations per second, of energy-efficient AI processing power.

https://nvidianews.nvidia.com/news/nvid ... rcomputing


wjfox
Site Admin
Posts: 13579
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK

Re: Supercomputing News and Discussions

Post by wjfox »

Musk’s xAI to build supercomputer facility in Memphis

by: David Royer
Updated: Jun 5, 2024 / 02:58 PM CDT

MEMPHIS, Tenn. (WREG) — xAI, the artificial intelligence company founded by Elon Musk, will build the world’s largest supercomputer in Memphis, officials announced Wednesday.

The multi-billion dollar “Gigafactory of Compute” will be the largest capital investment by a new-to-market company in Memphis history, Greater Memphis Chamber CEO Ted Townsend said.

It is expected to open sometime before the end of this calendar year, pending approval by local and state agencies.

[...]

“I think it is a defining moment for Memphis to be recognized globally,” Townsend said. “We’re going to have some of the world’s top data scientists and computational engineers that are attracted here, that are working here.”

He described the scale of the projected supercomputer like this: “If you take the two largest supercomputers in the world, and you combine them and you multiply them by four, that’s what we’re building here in Memphis.”

https://www.wkrn.com/news/tennessee-new ... n-memphis/
Tadasuke

Jupiter exascale supercomputer starts installation

Post by Tadasuke »

JUPITER will use 24,000 GH200 Grace Hopper Superchips (possibly up to 288 GB of memory, up to 10 TB/s of bandwidth, 60 TF in double precision and 2 PF for AI per GH200 Superchip), which can be used for various projects. That's over 1 exaflops in double precision (multiplying matrices, for example) and about 48 exaflops in AI (lower precision).

The article about it is here: https://www.servethehome.com/jupiter-ex ... rm-eviden/
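
The totals in this post are just the per-chip figures multiplied by the chip count. A minimal sketch, assuming the "up to" per-chip numbers quoted in the post (they are not measured values) and using a hypothetical helper name:

```python
# Per-chip figures are the "up to" numbers quoted in the post.
NUM_SUPERCHIPS = 24_000
FP64_TFLOPS_PER_CHIP = 60   # teraflops, double precision
AI_PFLOPS_PER_CHIP = 2      # petaflops, lower precision

def aggregate_exaflops(num_chips, flops_per_chip):
    """Total throughput in exaflops, assuming perfect scaling across chips."""
    return num_chips * flops_per_chip / 1e18

fp64_exaflops = aggregate_exaflops(NUM_SUPERCHIPS, FP64_TFLOPS_PER_CHIP * 1e12)
ai_exaflops = aggregate_exaflops(NUM_SUPERCHIPS, AI_PFLOPS_PER_CHIP * 1e15)

print(f"FP64: {fp64_exaflops:.2f} exaflops")
print(f"AI:   {ai_exaflops:.0f} exaflops")
```

Perfect scaling is an idealization, so a measured LINPACK result would come in below the double-precision figure computed here.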
weatheriscool
Posts: 24486
Joined: Sun May 16, 2021 6:16 pm

Re: Supercomputing News and Discussions

Post by weatheriscool »

Japan Announces Plans for a Zetta-Scale Supercomputer by 2030
It aims to be 1,000 times more powerful than the AMD-powered Frontier exascale computer, currently the fastest supercomputer in the world.
Japan has announced the successor to its legendary Fugaku supercomputer, which is currently ranked the fourth fastest computer system in the world by Top500.org. The Arm-based system is a half-exaflop-scale computer, as it can churn out 442 petaFLOPS in Linpack. Its successor will offer far more performance and go beyond even exascale (1,000 petaFLOPS) to Zetta-scale, which is 1,000 exaFLOPS.

MEXT, the country's Ministry of Education, Culture, Sports, Science and Technology, announced the next-generation supercomputer. The system will cost more than $750 million and will be active by the year 2030, according to LiveScience. This computer is simply named Fugaku Next and will be a Zetta-class system, or 1,000 times more powerful than the AMD-powered Frontier system, ranked #1 in the world at 1.2 exaFLOPS.
https://www.extremetech.com/computing/j ... er-by-2030
firestar464
Posts: 7202
Joined: Wed Oct 12, 2022 7:45 am

Re: Supercomputing News and Discussions

Post by firestar464 »

Switzerland unveils new supercomputer 'Alps', already ranked sixth in the world

https://www.euronews.com/my-europe/2024 ... -the-world
Tadasuke

regarding plans for zettaflops in 2030

Post by Tadasuke »

They are not going from 415.5 petaflops to 1 000 000 petaflops in 10 years between 2020 and 2030, unless they use some completely new paradigm of computing. The biggest hurdle is operating frequency. You can scale parallelly only so much. Also look at costs and power draw. Btw flops are often "fake", not adding real performance, just some numbers to be hyped up about.
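
To put a number on how steep that jump is, here is a small sketch of the yearly growth it would require. It assumes smooth compound growth over exactly 10 years, which is a simplification, and the variable names are illustrative.

```python
import math

START_PFLOPS = 415.5        # starting figure cited in the post (2020)
TARGET_PFLOPS = 1_000_000   # 1 zettaflops expressed in petaflops
YEARS = 10

# Assumption: performance compounds at a constant rate every year.
total_factor = TARGET_PFLOPS / START_PFLOPS
annual_factor = total_factor ** (1 / YEARS)
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"Total speedup needed:  {total_factor:,.0f}x")
print(f"Required yearly gain:  {annual_factor:.2f}x")
print(f"Implied doubling time: {doubling_time_years:.2f} years")
```

Under these assumptions, performance would have to more than double every year for a full decade.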
wjfox
Site Admin
Posts: 13579
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK

Re: regarding plans for zettaflops in 2030

Post by wjfox »

Tadasuke wrote: Thu Sep 26, 2024 9:05 am They are not going from 415.5 petaflops to 1 000 000 petaflops in 10 years between 2020 and 2030, unless they use some completely new paradigm of computing. The biggest hurdle is operating frequency. You can scale parallelly only so much. Also look at costs and power draw. Btw flops are often "fake", not adding real performance, just some numbers to be hyped up about.

Yeah, I'm going with 2036 or later.


Tadasuke

trends in supercomputers efficiency 2014-2034

Post by Tadasuke »

Performance per watt in (fp64) gigaflops has grown by 16.5x between 06/2014 and 06/2024: from 4.4 gigaflops/watt at the GSIC Center, Tokyo Institute of Technology (using the Xeon E5-2620v2, 6 cores/12 threads at ~2.3 GHz, and the Nvidia K20x) to 72.7 gigaflops per watt in the JEDI supercomputer in Germany (using the new 72-core Nvidia Grace Hopper Arm superchip).

Perhaps by 06/2034, supercomputers will get to around 1 (fp64) teraflops per watt. That would be rad, but far from perfect. This would mean that a 100 exaflops supercomputer would use 100 000 000 watts (100 megawatts). I hope people are not crazy enough to build 1 gigawatt supercomputers in the 2030s...

At 1 teraflops per watt, a 10 petaflops supercomputer becomes rather pedestrian, using only 10 kilowatts, which is alright for a small university or a small company. 10 exaflops would still be a lot even with such energy efficiency. Of course, I don't mean 10 exaops in 2-bit precision.

And that 16.5x gain in fp64 flops per watt translates to an 8-10x improvement in real performance per watt. Further gains are possible by changing software. And most supercomputers today are still around 1 petaflops.

The Aurora supercomputer at Argonne National Laboratory uses about 40 megawatts for only 1 exaflops in fp64, which is only 25 gigaflops per watt. Allegedly, after optimization, its performance is expected to exceed 2 exaflops, but that has not happened yet. 2 exaflops would bring it to only 50 gigaflops per watt, or 1/20 of a teraflops per watt, and 1/500 of a zettaflops....
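
The power figures in this post all come from one division: sustained flops divided by flops per watt. A minimal sketch of that arithmetic follows. The helper name and unit constants are made up for illustration, and it only counts the power implied by the efficiency figure, with no facility overhead.

```python
GIGA, TERA, PETA, EXA = 1e9, 1e12, 1e15, 1e18

def power_watts(flops, flops_per_watt):
    """Power needed to sustain `flops` at a given energy efficiency."""
    return flops / flops_per_watt

# Efficiency gain between the 06/2014 and 06/2024 figures in the post.
efficiency_gain = 72.7 / 4.4

# Hypothetical future machines at 1 teraflops per watt.
power_100_exaflops = power_watts(100 * EXA, 1 * TERA)
power_10_petaflops = power_watts(10 * PETA, 1 * TERA)

# Aurora at 25 gigaflops per watt for 1 exaflops.
aurora_power = power_watts(1 * EXA, 25 * GIGA)

print(f"Efficiency gain 2014-2024:  {efficiency_gain:.1f}x")
print(f"100 exaflops at 1 TF/W:     {power_100_exaflops / 1e6:.0f} MW")
print(f"10 petaflops at 1 TF/W:     {power_10_petaflops / 1e3:.0f} kW")
print(f"Aurora, 1 exaflops at 25 GF/W: {aurora_power / 1e6:.0f} MW")
```

At 25 gigaflops per watt, 1 exaflops corresponds to 40 megawatts.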
Tadasuke

Nvidia, Blackwell, Jensen, Musk

Post by Tadasuke »

In another remarkable take that is rightly going viral, NVIDIA's CEO believes that Moore's Law as we know it has ended and that, therefore, in order to extract the requisite computing power to keep pace with the compute-hungry software of the future, existing data centers will need around $1 trillion worth of GPUs in the next 4 to 5 years to modernize.

"... Just building a massive factory, liquid-cooled, energized, permitted in the short time that was done...I mean that is, like, superhuman. Yeah, there's. And, as far as I know, there's only one person in the world who could do that. You know, I mean, Elon is singular in this understanding of engineering and construction and large systems, and marshaling resources ..."

Meanwhile, as we noted in a dedicated post recently, NVIDIA's entire supply of its latest Blackwell GPUs is already sold out for the next 12 months as demand for those chips remains "insane."

It is hardly a surprise, therefore, that NVIDIA shares currently remain just shy of their existing all-time high stock price of $140.76.
the article (don't even look at comments): https://wccftech.com/nvidia-ceo-elon-mu ... h-of-gpus/

A totally and completely different situation than I had imagined. I used to think it would be all about efficiency, affordability and wide distribution of everything computing. I still hope for a new paradigm of computing and for Nvidia stock to plummet. I so hate this dumb situation. I so don't want to be relying on huge, extremely expensive server farms running on multiple nuclear reactors. This is awful. :-(