LightCounting’s observations from GTC 2022
by Vlad Kozlov
If you have not seen the keynote presentation from Nvidia’s CEO at GTC 2022, you should: https://www.nvidia.com/gtc/keynote/. My personal favorite is the closing video showing a “jazz” version of an AI Cluster. It is a lot of fun and it does show all the new hardware discussed at the event.
The announcements started with the new H100 GPU chip, which improves on the performance of the A100 by 6x in training and by 30x in inference. This is accomplished in part by increasing the bandwidth of connectivity between GPUs in large clusters.
The new DGX H100 system combines the HGX H100 with a ConnectX-7 NIC and supports up to ten 400G Ethernet or InfiniBand connections. Adding the total bandwidth of the NVLinks (14.4Tbps) to the 4Tbps of Ethernet/InfiniBand ports, the new DGX system can support up to 18.4Tbps of connectivity, a lot more than any other server on the market.
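The bandwidth figures above can be checked with a few lines of arithmetic. The short Python sketch below is only an illustration: the constant and function names are our own, and the only inputs are the numbers quoted in this note (14.4Tbps of NVLink bandwidth and ten 400G network ports). Values are kept in Gbps as integers to avoid rounding noise.

```python
# Back-of-the-envelope check of the DGX H100 connectivity figures quoted
# in this note. All names are illustrative, not taken from Nvidia material.

NVLINK_TOTAL_GBPS = 14_400   # 14.4 Tbps of NVLink bandwidth, as quoted above
NETWORK_PORTS = 10           # up to ten Ethernet or InfiniBand connections
PORT_SPEED_GBPS = 400        # each connection runs at 400G


def network_bandwidth_gbps(ports=NETWORK_PORTS, port_speed=PORT_SPEED_GBPS):
    """Aggregate Ethernet/InfiniBand bandwidth in Gbps."""
    return ports * port_speed


def total_bandwidth_gbps(nvlink=NVLINK_TOTAL_GBPS,
                         ports=NETWORK_PORTS,
                         port_speed=PORT_SPEED_GBPS):
    """NVLink plus network bandwidth in Gbps."""
    return nvlink + network_bandwidth_gbps(ports, port_speed)


if __name__ == "__main__":
    network = network_bandwidth_gbps()
    total = total_bandwidth_gbps()
    print(f"Network ports: {network / 1000:.1f} Tbps")
    print(f"NVLink:        {NVLINK_TOTAL_GBPS / 1000:.1f} Tbps")
    print(f"Total:         {total / 1000:.1f} Tbps")
```

Changing `PORT_SPEED_GBPS` gives a rough sense of how the network share of the total would scale with faster optics, assuming the port count and NVLink bandwidth stay fixed, which is an assumption of this sketch rather than a forecast.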
It is very likely that the next generation of Nvidia GPU-based systems will need 1.6T optics and this may be just 2 years away.
We reported in our November 2021 research note on the OCP Summit that Meta is constructing very large AI clusters based on Nvidia’s GPUs, using 200G optical connectivity. These AI clusters form a “back-end” network within Meta’s datacenters, enabling targeted advertising on Facebook, Instagram and other applications running on the “front-end” of the datacenters. Many other Cloud companies also use GPU-based clusters for targeted advertising. This is where all the money is coming from. Expect a lot more investment in AI clusters and the optics supporting them.
This information is also included in our latest report: High-Speed Ethernet Optics – March 2022.
A detailed analysis of trends in the AI hardware market is part of our December 2021 report: High-Speed Cables, Embedded and Co-Packaged Optics.
LightCounting subscribers can access the full text of this research note by logging into their accounts.