Thursday, March 21, 2024

Giant AI models and the most powerful chips: Nvidia released the powerful B200 GPU with modest power consumption

Hornbeam
4 min

Nvidia continues to ramp up its production of chips for the AI industry. The head of the American company has now announced the release of the most powerful AI chip in the world, and several modular systems based on the new GPU, called the B200, are ready as well.

What kind of chip is this and why is it needed?

Since the artificial intelligence industry is developing rapidly, the manufacturers of chips and of the modules built on them are not standing still either. One of the largest players in this market is Nvidia.

In March 2024, it unveiled new computing accelerators based on the Blackwell architecture, the successor to Hopper, which was introduced two years earlier.

The corporation stated that the new chip is designed for systems that can train truly gigantic models, no longer with billions but with trillions of parameters. This is needed for tasks such as natural language processing, multimodal applications and code generation. The name of the architecture is no surprise: it honors the American mathematician David Blackwell.

It is worth noting that the H100/H200 chips are in extremely high demand despite their price. The new chip, as far as one can judge, will be even more expensive, but it will most likely be bought in bulk all the same. This makes its characteristics worth a closer look.

The new GPU consists of two dies, fabricated on a custom version of TSMC's 4NP process (a 4 nm-class node, at TSMC's facilities, of course) and combined in a 2.5D CoWoS-L package. Interestingly, this is Nvidia's first GPU with a chiplet layout. The dies are connected by an NV-HBI link with 10 TB/s of bandwidth and operate as a single GPU. In total, the new product has 208 billion transistors. The company calls its product the engine of a new industrial revolution.

And it has every right to, since the chip's capabilities are genuinely impressive. In FP4 and FP8 calculations, the GPU delivers up to 20 and 10 Pflops respectively. Much of this comes from the new tensor cores and the second-generation Transformer Engine, which lets the precision of calculations be tuned to the task at hand, which in turn affects the speed of model training. Blackwell supports a wide range of formats, including FP4, FP6, FP8, INT8, BF16, FP16, TF32 and FP64.
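For context, here is roughly how low-precision training is driven in software today. This is a minimal sketch using NVIDIA's open-source Transformer Engine library as it exists for Hopper's first-generation engine; the layer dimensions and scaling recipe are illustrative assumptions, and Blackwell's second-generation engine extends the same idea down to FP4/FP6:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe: HYBRID uses E4M3 for the forward pass
# and E5M2 for gradients in the backward pass.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# A Transformer Engine linear layer; the sizes here are arbitrary.
layer = te.Linear(4096, 4096, bias=True)
x = torch.randn(16, 4096, device="cuda")

# Inside this context, the layer's matmuls run on the FP8 tensor cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # gradients flow through the low-precision recipe too
```

The point of the per-task tuning the article mentions is visible in the recipe object: forward activations and backward gradients can use different 8-bit formats with different dynamic ranges.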

Okay, the chip is great, but what about the accelerators?

The main product here will be the Nvidia GB200 Grace Blackwell Superchip. It combines two B200 GPUs with a central Grace Arm processor with 72 Neoverse V2 cores. The "Super" prefix in the name is no accident: the result is genuinely impressive. Performance in FP4 operations reaches 40 Pflops, while in FP8/FP6/INT8 operations the new GB200 delivers 20 Pflops.

Compared with the H100, the new product shows up to a 30-fold increase in LLM inference performance. At the same time it consumes less energy: the accelerator is reported to be 25 times more energy efficient than the previous generation.

The company will also supply GB200 NVL72 systems. This is Nvidia's own design: a server rack that includes 36 Grace Blackwell Superchips together with NVSwitch 7.2T switches. As a result, the system contains 72 B200 GPUs and 36 Grace chips, tied together by fifth-generation NVLink.

And it all runs as a single GPU with AI performance of 1.4 Eflops (FP4) and 720 Pflops (FP8). This system will be the building block for Nvidia's newest DGX SuperPOD supercomputer.
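These headline numbers are internally consistent, and the back-of-the-envelope arithmetic can be reproduced from nothing but the per-GPU figures quoted above:

```python
# Aggregate NVL72 performance from the per-B200 figures quoted above.
gpus_per_superchip = 2
superchips_per_rack = 36
b200_fp4_pflops = 20   # per-GPU FP4 throughput cited earlier
b200_fp8_pflops = 10   # per-GPU FP8 throughput cited earlier

gpus = gpus_per_superchip * superchips_per_rack   # 72 GPUs
fp4_total = gpus * b200_fp4_pflops                # 1440 Pflops
fp8_total = gpus * b200_fp8_pflops                # 720 Pflops

print(f"{gpus} GPUs -> {fp4_total / 1000:.2f} Eflops FP4, {fp8_total} Pflops FP8")
# 72 GPUs -> 1.44 Eflops FP4, 720 Pflops FP8  (matches the ~1.4 Eflops claim)
```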

But that's not all: the American company also presented server systems, primarily the HGX B100, HGX B200 and DGX B200, each of which contains eight of the new accelerators.

Nvidia says it is possible to build very large AI systems comprising 10,000 to 100,000 GB200 accelerators. These can be networked using the Nvidia Quantum-X800 InfiniBand and Spectrum-X800 Ethernet interconnects, which were announced at the same time and provide speeds of up to 800 Gbps.

At the same time, a single GB200 NVL72 system is capable of running inference on a model with 27 trillion parameters. For comparison, GPT-4, a model with which many Habr readers are familiar, is reported to have about 1.7 trillion parameters. Accordingly, in the near future we can expect even larger and more advanced models capable of surprising us with new capabilities.
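The 27-trillion-parameter figure lines up with a simple memory estimate. This is only a sketch, assuming FP4 weights (half a byte per parameter) and the publicly reported 192 GB of HBM3e per B200; the per-GPU memory figure is an assumption not stated in this article:

```python
# Rough check: does a 27T-parameter model fit in one NVL72 rack at FP4?
params = 27e12
bytes_per_param_fp4 = 0.5                          # 4-bit weights
weights_tb = params * bytes_per_param_fp4 / 1e12   # 13.5 TB of weights

gpus = 72
hbm_per_gpu_gb = 192                               # reported B200 HBM3e capacity (assumption)
rack_memory_tb = gpus * hbm_per_gpu_gb / 1000      # ~13.8 TB of HBM

print(f"weights: {weights_tb:.1f} TB, rack HBM: {rack_memory_tb:.1f} TB")
# weights: 13.5 TB, rack HBM: 13.8 TB -- the weights just barely fit,
# leaving little headroom for the KV cache and activations.
```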

In addition to Nvidia's own new products, other companies will soon present their systems based on the B200, including Aivres, ASRock Rack, ASUS, Eviden, Foxconn, GIGABYTE, Inventec, Pegatron, QCT, Wistron, Wiwynn and ZT Systems.

Naturally, large and medium-sized companies are already interested in Nvidia's new products, first among them the cloud computing providers: Amazon, Google, Microsoft and Oracle.

