AMD introduces the Instinct MI325X with 256 GB of HBM3E memory and promises a more powerful MI355X

AMD has launched another compute accelerator. It is called the Instinct MI325X and replaces the original (and quite successful) MI300X. In its presentation, AMD positions it mainly against Nvidia's H200, which was to be expected, since there is no noticeable difference between the MI325X and the MI300X, at least as far as the chip itself is concerned. The newcomer is still built on the CDNA 3 architecture, manufactured on TSMC's 5nm and 6nm processes, and has the same 304 CUs and 19,456 stream processors; even the 2100 MHz clock is identical. Compute throughput is unchanged at 2.61 PFLOPS in FP8, 1.3 PFLOPS in FP16 and 2.6 POPS in INT8, and the chip still packs 153 billion transistors.
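The quoted figures are internally consistent. A minimal sketch that works backwards from AMD's numbers; the value of 32 FP16 matrix FLOPs per stream processor per cycle is inferred from the 1.3 PFLOPS figure, not stated by AMD:

```python
# Sanity-check the quoted MI325X peak-throughput figures (same silicon as MI300X).
STREAM_PROCESSORS = 304 * 64  # 304 CUs x 64 stream processors = 19,456
CLOCK_HZ = 2100e6             # 2100 MHz clock

# 32 FP16 matrix FLOPs per stream processor per cycle is an assumption derived
# by working backwards from AMD's 1.3 PFLOPS FP16 figure; FP8 doubles it.
FLOPS_PER_SP_PER_CYCLE_FP16 = 32

fp16_pflops = STREAM_PROCESSORS * CLOCK_HZ * FLOPS_PER_SP_PER_CYCLE_FP16 / 1e15
fp8_pflops = fp16_pflops * 2  # each halving of precision doubles throughput

print(f"FP16: {fp16_pflops:.2f} PFLOPS")  # FP16: 1.31 PFLOPS
print(f"FP8:  {fp8_pflops:.2f} PFLOPS")   # FP8:  2.61 PFLOPS
```

Both results match the figures AMD quotes to two decimal places.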

So what has changed? Mainly the memory. The LLC (Last Level Cache) stays at 256 MB, but instead of 192 GB of HBM3 the card now carries 256 GB of faster HBM3E, so it can accommodate even larger LLMs. The 8192-bit memory bus remains, but the data rate rises from 5.2 GHz to 6 GHz, lifting throughput from 5.3 TB/s to 6 TB/s. The cards come as OAM modules with a PCIe 5.0 x16 interface. The dark side is a significant increase in TBP, from 750 W to 1000 W, which is quite surprising given that nothing has changed on the chip itself and only the memory is bigger and faster.
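The bandwidth figures follow directly from the bus width and the per-pin data rate. A quick check, assuming the quoted "frequency" is the effective data rate in Gbps per pin:

```python
# Peak memory bandwidth = bus width (bits) / 8 * effective data rate (transfers/s)
BUS_WIDTH_BITS = 8192  # unchanged between MI300X and MI325X

def bandwidth_tbs(data_rate_gbps: float) -> float:
    """Peak bandwidth in TB/s for the 8192-bit HBM bus."""
    return BUS_WIDTH_BITS / 8 * data_rate_gbps * 1e9 / 1e12

print(f"HBM3  @ 5.2 Gbps: {bandwidth_tbs(5.2):.1f} TB/s")  # 5.3 TB/s (MI300X)
print(f"HBM3E @ 6.0 Gbps: {bandwidth_tbs(6.0):.1f} TB/s")  # 6.1 TB/s (MI325X)
```

The 6 Gbps case actually works out to 6.1 TB/s, which AMD rounds down to the quoted 6 TB/s.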

AMD Instinct MI325X

AMD compared the performance of the new MI325X with Nvidia's H100 and claims it is 40% faster in Mixtral 8x7B inference, 30% faster in Mistral 7B and 20% faster in Meta Llama-3.1 70B. In a server with eight MI325X cards, inference is 40% faster in Llama-3.1 405B and 20% faster in the 70B version of the same model. Training is not the Instincts' strong suit; still, in a single-GPU configuration it offers 10% more training performance than the H100 in Meta Llama-2 7B and matches it in an eight-GPU configuration in Llama-2 70B. However, the fact that AMD picks different model versions and sizes for different configurations suggests that in the undisclosed combinations (and there are some) the results may not be as good. But we do not know those results.

AMD Instinct MI325X

AMD also revealed the upcoming Instinct MI355X. It will be manufactured on a 3nm process, will carry as much as 288 GB of HBM3E memory per card and will newly support the FP4 and FP6 data types. Chip performance should rise by about 80%: in FP16 from 1.3 to 2.3 PFLOPS, in FP8 from 2.6 to 4.6 PFLOPS. In FP6 and FP4, performance should reach as much as 9.2 PFLOPS per card. Memory bandwidth should increase further, to 8 TB/s.
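The projected MI355X numbers line up with two simple rules: an ~80% generational uplift, and a doubling of throughput for each halving of precision. A sketch under those assumptions (the exact uplift factor is approximate, so the results land slightly above AMD's rounded figures):

```python
# Projected MI355X throughput from the MI325X baseline (values in PFLOPS).
mi325x = {"FP16": 1.3, "FP8": 2.6}
UPLIFT = 1.8  # the stated ~80% generational increase

mi355x_fp16 = mi325x["FP16"] * UPLIFT  # ~2.3 PFLOPS (AMD quotes 2.3)
mi355x_fp8 = mi325x["FP8"] * UPLIFT    # ~4.7 PFLOPS (AMD quotes 4.6)
mi355x_fp4 = mi355x_fp8 * 2            # ~9.4 PFLOPS (AMD quotes 9.2, i.e. 2 x 4.6)
```

The small discrepancies come from AMD rounding each tier down independently; the quoted 9.2 PFLOPS for FP4/FP6 is exactly double the quoted 4.6 PFLOPS for FP8.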

Source: www.svethardware.cz