Transitioning to the NVIDIA H100
Leaving aside fancy stuff like HGX, supercomputer configurations, and advanced interconnects, slotting the H100’s simpler PCIe form factor into a four-year-old server is technically doable. We’ve done it as an experiment, after a solid month of software tweaking. But all that effort didn’t quite pay off. The bus connecting the GPUs to the CPUs was a bottleneck, and the server’s overall design constrained the much more sophisticated GPUs. Moving data around and getting it onto the GPU simply took too long, making tasks like AI training a chore by today’s standards. Had the platform supported the H100’s doubled link speed, we could have moved twice as much data both in and out.
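To see whether the bus really is the limiting factor, it helps to measure host-to-device throughput directly. Here’s a minimal sketch using PyTorch (our illustrative choice; the experiment above didn’t hinge on any particular tool) that times a pinned-memory copy onto the GPU:

```python
import torch

def h2d_bandwidth_gb_s(size_mb: int = 1024, iters: int = 20) -> float:
    """Measure host-to-device copy bandwidth in GB/s using CUDA events."""
    assert torch.cuda.is_available(), "needs a CUDA-capable GPU"
    # Pinned (page-locked) host memory allows full-speed DMA over PCIe.
    host = torch.empty(size_mb * 1024 * 1024, dtype=torch.uint8, pin_memory=True)
    dev = torch.empty_like(host, device="cuda")

    dev.copy_(host, non_blocking=True)  # warm-up
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        dev.copy_(host, non_blocking=True)
    end.record()
    torch.cuda.synchronize()

    seconds = start.elapsed_time(end) / 1000.0  # elapsed_time is in ms
    return (size_mb / 1024.0) * iters / seconds

print(f"Host-to-device bandwidth: {h2d_bandwidth_gb_s():.1f} GB/s")
```

On an older PCIe 4.0 x16 slot, a measurement like this typically lands in the mid-20s of GB/s in practice; a Gen 5 slot roughly doubles that ceiling.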
But then there’s the electronics side of things. As we chase higher speeds, precise circuitry becomes our main concern. Modern PCBs blend novel materials and intricate components, adding layers of complexity to manufacturing, operation, and programming. Designing such advanced integrated circuitry demands in-depth, complex simulations.
Switching from PCI Express Gen 4 to Gen 5 comes with its own challenges. Signal interference we once brushed off as tolerable becomes a deal-breaker. Motherboards grow more complex as layer counts rise, and the requirements for material purity and quality become more stringent.
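The bandwidth at stake is easy to quantify: PCIe doubles the per-lane transfer rate with each generation, and both Gen 4 and Gen 5 use 128b/130b encoding. A quick back-of-the-envelope calculation:

```python
# Theoretical unidirectional bandwidth of a x16 PCIe link.
# Gen 4 runs at 16 GT/s per lane, Gen 5 at 32 GT/s; both use 128b/130b encoding.
def pcie_x16_gb_s(gt_per_s: float, lanes: int = 16) -> float:
    encoding = 128 / 130                # usable payload fraction
    bits_per_s = gt_per_s * 1e9 * lanes * encoding
    return bits_per_s / 8 / 1e9         # bits per second -> GB/s

print(f"PCIe 4.0 x16: {pcie_x16_gb_s(16):.1f} GB/s")  # ~31.5 GB/s
print(f"PCIe 5.0 x16: {pcie_x16_gb_s(32):.1f} GB/s")  # ~63.0 GB/s
```

Doubling that ceiling is exactly what makes the tighter signal-integrity tolerances worth the trouble.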
Currently, we’re assembling an H100 GPU cluster using standard off-the-shelf HGX servers. Simultaneously, we’re developing our very own HGX design. To harness the H100’s full potential and let it operate at its intended frequencies and modes, we’ve gone back to the drawing board with our rack design, placing the network front and center. Equally important, though, has been developing distinct solutions for specific AI tasks.
Our rack solutions are versatile and cater to a range of high-end users. For supercomputer-grade tasks, we prioritize GPU power, rapid data delivery, and high interconnectivity. For use cases like MapReduce, it’s all about storage and CPU power. Finally, regular cloud computing demands replicated or non-replicated disks, along with constant connectivity and high redundancy.
We recognize that every data scientist or other specialist working with an AI cloud has unique requirements. Our goal is to cater to a broad spectrum of workloads, services, end users, and other specific needs, while still focusing on AI tasks, of course. We don’t believe in boundaries or in saying, “We don’t need this kind of server.”
Now that we’re exploring the potential of GPUs for inference, we’re adding entirely new nodes to our lineup.
Tailoring server solutions: training vs. inference
Our latest rack generation presents two distinct node solutions: one custom-built for training and the other optimized for inference. One notable departure from the last generation is that GPUs are now mounted inside the node, which addresses the signal loss, speed reduction, and compatibility issues we experienced when connecting GPUs externally.
The training-oriented node
Training AI models is a data-intensive process with significant input and output. Here’s a simple breakdown: data enters the system, flows through a model with billions or even trillions of parameters, training runs across interconnected servers, and at the end you’re left with a refined model.
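To make that breakdown concrete, here’s a minimal sketch of what “training across interconnected servers” looks like in code. We use PyTorch’s DistributedDataParallel purely for illustration (this section doesn’t prescribe a framework); each process holds a model replica, and gradients are averaged over the interconnect on every step:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A toy layer stands in for the real, much larger network.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):
        batch = torch.randn(32, 4096, device=f"cuda:{local_rank}")  # placeholder data
        loss = model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across every server here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nnodes=<servers> --nproc_per_node=8 train.py
```

That all-reduce inside backward() is precisely the traffic that hammers the interconnect, and it’s why the network sits front and center in our rack design.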