AMD has published on its website a comparison of its Instinct MI300X accelerators with the competition, namely the Nvidia DGX H100. It is not a comprehensive comparison, however, because it covers a single LLM workload: inference in the MLPerf Inference v4.1 benchmark, measured on the Llama 2 70B model. Three 8-card configurations were tested: the AMD Instinct MI300X with AMD Genoa processors, the same accelerators with the newer AMD Turin processors, and the Nvidia DGX H100 with Intel Xeon processors. In the server test, which simulates real-time deployment with strict latency limits, the MI300X system with Turin processors was the fastest, with Nvidia close behind. Interestingly, the MI300X system with the older Genoa EPYCs was slower still, which shows how much the result depends on the host processor and considerably undermines the whole accelerator comparison.
How do we know that Nvidia’s lower result is due to the Nvidia accelerators themselves and not to the Xeon processors used? What would the results look like if the Nvidia cards were also paired with the same AMD processors? Would they be better or worse? We will never know.
In the offline scenario, which batch-processes a huge number of tokens at once, Nvidia’s solution comes out as the fastest. In both scenarios, however, the differences between the servers were less than 2%, which shows that AMD can compete with the very popular solution from Nvidia.
Furthermore, AMD has shown that performance scales very well with the number of cards used. With 8 cards, the measured performance was 7.7 to 8.3 times higher than with a single card. This is a good result, because adding more compute units usually increases the communication overhead relative to the useful computation, so performance stops growing linearly with the number of units. Here, however, the number of cards is still too low for this problem to become pronounced.
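The scaling figures above can be put into perspective with a short calculation. The following is a minimal sketch, assuming that "scaling efficiency" is defined simply as the measured speedup divided by the ideal linear speedup; the function name `scaling_efficiency` is a hypothetical helper and not part of any benchmark tooling.

```python
def scaling_efficiency(speedup: float, num_cards: int) -> float:
    """Return the measured speedup as a fraction of ideal linear scaling.

    Assumption: ideal scaling means N cards deliver exactly N times
    the throughput of one card.
    """
    if num_cards <= 0:
        raise ValueError("num_cards must be positive")
    return speedup / num_cards


NUM_CARDS = 8

# Speedups quoted in the article for 8 cards versus 1 card.
for speedup in (7.7, 8.3):
    efficiency = scaling_efficiency(speedup, NUM_CARDS)
    print(f"{speedup}x on {NUM_CARDS} cards = {efficiency:.4f} of linear scaling")
```

A value below 1.0 means some throughput is lost to overhead, while a value above 1.0 means the multi-card result was better than linear.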
Although AMD showed parity with Nvidia in performance rather than a victory, it clearly has one significant advantage. The MI300X has 192 GB of HBM3 memory, which means that the entire Llama 2 70B model fits on a single card. As for the giant Llama 3.1 405B with its 405 billion parameters (in FP16 this model requires more than 1000 GB), it easily fits into a cluster of 8 Instinct MI300X cards (8×192 GB, i.e. 1536 GB), whereas 8 Nvidia H100 cards are not enough, because with 80 GB of memory per card they reach a total of only 640 GB.
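The memory arithmetic above can be checked with a few lines of code. This is only a rough sketch under stated assumptions: FP16 uses 2 bytes per parameter, 1 GB is taken as 10^9 bytes, and the weights-only estimate ignores the KV cache, activations, and other runtime overhead, so it is a lower bound and comes out below the article's figure of more than 1000 GB for Llama 3.1 405B. The helper names `weights_gb`, `total_memory_gb`, and `fits` are hypothetical.

```python
BYTES_PER_PARAM_FP16 = 2  # assumption: FP16 stores each parameter in 2 bytes


def weights_gb(params_billions: float,
               bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Weights-only footprint in GB (1 GB = 10^9 bytes), a lower bound."""
    return params_billions * bytes_per_param


def total_memory_gb(cards: int, gb_per_card: int) -> int:
    """Combined accelerator memory of a multi-card system."""
    return cards * gb_per_card


def fits(required_gb: float, cards: int, gb_per_card: int) -> bool:
    """True if the required memory fits into the combined card memory."""
    return required_gb <= total_memory_gb(cards, gb_per_card)


# Llama 2 70B on a single card (weights only).
llama2_70b = weights_gb(70)
print("Llama 2 70B weights:", llama2_70b, "GB")
print("  fits on 1x MI300X (192 GB):", fits(llama2_70b, 1, 192))
print("  fits on 1x H100 (80 GB):", fits(llama2_70b, 1, 80))

# Llama 3.1 405B on 8 cards: weights-only bound and the article's figure.
for label, required in (("weights only", weights_gb(405)),
                        ("article figure", 1000)):
    print(f"Llama 3.1 405B ({label}): {required} GB")
    print("  fits on 8x MI300X:", fits(required, 8, 192))
    print("  fits on 8x H100:", fits(required, 8, 80))
```

Under either estimate the conclusion is the same: the model fits into 8 MI300X cards but not into 8 H100 cards.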
Source: www.svethardware.cz