- The latest MLPerf benchmarks, announced today, include Training 2.1 for ML training, HPC 2.0 for large systems including supercomputers, and Tiny 1.0 for small and embedded deployments.
- Nvidia reports significant technological improvements in the most recent MLPerf Training benchmarks.
The latest set of MLPerf machine learning (ML) benchmarks from MLCommons is out today, showing how hardware and software for artificial intelligence (AI) are getting faster.
MLCommons, a vendor-neutral organization, aims to provide benchmarks and standardized testing that help assess the state of ML hardware and software. It publishes results for its various ML benchmarks under the MLPerf name multiple times a year. The MLPerf Inference results, which showed how a range of technologies have improved inference performance, were published in September.
The new MLPerf benchmarks announced today include Training 2.1 for ML training, HPC 2.0 for large systems including supercomputers, and Tiny 1.0 for small and embedded deployments.
According to David Kanter, executive director of MLCommons, “The key reason why we’re doing benchmarking is to drive transparency and measure performance. This is all predicated on the key notion that once you can actually measure something, you can start thinking about how you would improve it.”
How the MLPerf Training benchmark works
Focusing on the training benchmark, Kanter pointed out that MLPerf isn’t just about hardware but also about software.
Models in ML systems must first be trained on data in order to function. The training process benefits from accelerator hardware, as well as optimized software.
According to Kanter, the MLPerf Training benchmark begins with a predetermined dataset and a model. Organizations then train the model to reach a certain quality level. Time to train is one of the main criteria that the MLPerf Training benchmark measures.
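The time-to-train idea Kanter describes can be illustrated with a minimal Python sketch. It is not MLPerf's actual harness, and real submissions follow MLCommons' detailed rules, which this does not model. The names `time_to_train` and `make_toy_trainer` are hypothetical, and the toy trainer merely stands in for a real training epoch by raising a quality score by a fixed amount on each call.

```python
import time
from typing import Callable, Tuple


def time_to_train(
    train_one_epoch: Callable[[], float],
    target_quality: float,
    max_epochs: int = 100,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, int]:
    """Train until the quality target is met; return (elapsed_seconds, epochs_used)."""
    start = clock()
    for epoch in range(1, max_epochs + 1):
        quality = train_one_epoch()  # one pass over the fixed dataset
        if quality >= target_quality:
            # The clock stops as soon as the model reaches the required quality.
            return clock() - start, epoch
    raise RuntimeError("target quality not reached within max_epochs")


def make_toy_trainer(step: float = 0.25) -> Callable[[], float]:
    """Hypothetical stand-in for real training: quality rises by `step` per epoch."""
    quality = 0.0

    def train_one_epoch() -> float:
        nonlocal quality
        quality = min(1.0, quality + step)
        return quality

    return train_one_epoch


if __name__ == "__main__":
    elapsed, epochs = time_to_train(make_toy_trainer(), target_quality=0.75)
    print(f"reached target after {epochs} epochs in {elapsed:.6f} s")
```

Because the dataset, model, and quality target are held fixed, a shorter elapsed time can be attributed to the hardware and software stack under test, which is what makes results comparable across submissions.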
Kanter said, “When you look at the results, and this goes for any submission — whether it’s training, tiny, HPC, or inference — all of the results are submitted to say something. Part of this exercise is figuring out what that something they say is.”
The metrics can show relative performance levels and highlight how hardware and software have improved over time.
John Tran, chair of MLPerf Training at MLCommons and senior director of deep learning libraries and hardware design at Nvidia, pointed out that the most recent benchmark drew several software-only submissions.
Tran added, “I find it continually interesting how we have so many software-only submissions, and they don’t necessarily need help from the hardware vendors. I think that’s great and is showing the maturity of the benchmark and usefulness to people.”
Intel and Habana Labs advance training with Gaudi2
Jordan Plawner, senior director of AI products at Intel, also emphasized the importance of software. During the MLCommons press call, Plawner explained that ML inference and training workloads differ in both their hardware and software requirements.
Plawner declared, “Training is a distributed-workload problem. Training is more than just hardware, more than just the silicon; it’s the software, it’s also the network and running distributed-class workloads.”
In contrast, Plawner said that ML inference can be a single-node problem without the same distributed elements, which gives vendor technologies a lower barrier to entry than ML training.
In terms of performance, Intel is well represented in the latest MLPerf Training benchmarks with its Gaudi2 technology. Intel acquired Habana Labs and its Gaudi technology in 2019 for two billion dollars, and the acquisition has helped expand the company’s capabilities in recent years.
Gaudi2, which was unveiled in May, is currently Habana Labs’ most advanced silicon. The most recent Gaudi2 results improve on the initial benchmarks that Habana Labs published with the MLPerf Training update in June. Intel claims that Gaudi2 delivers a 10% improvement in time-to-train in TensorFlow for both the BERT and ResNet-50 models.
Nvidia H100 surpasses its predecessor
Nvidia reports significant technological improvements in the most recent MLPerf Training benchmarks.
Compared to the previous generation of A100-based hardware, MLPerf Training results for Nvidia’s Hopper-based H100 show considerable improvements. During an Nvidia briefing call on the MLCommons results, Dave Salvator, director of AI, benchmarking, and cloud at Nvidia, said that the H100 delivers 6.7 times the performance that the first A100 submission achieved on the same benchmarks a couple of years ago. According to Salvator, the transformer engine built into the Nvidia Hopper chip architecture plays a significant role in the H100’s performance.
Although the H100 is currently Nvidia’s top hardware for ML training, the A100’s MLPerf Training performance has also improved.
Salvator said, “The A100 continues to be a really compelling product for training, and over the last couple of years, we’ve been able to scale its performance by more than two times from software optimizations alone.”
Salvator anticipates that there will be a continual stream of performance improvements for ML training in the months and years to come, whether with new hardware or ongoing software refinements.
According to Salvator, “AI’s appetite for performance is unbounded, and we continue to need more and more performance to be able to work with growing datasets in a reasonable amount of time.”
For a number of reasons, including the fact that training is an iterative process, it is essential to be able to train a model more quickly. Data scientists frequently need to train and then retrain models to achieve the intended outcomes.
Salvator said, “That ability to train faster makes all the difference in not only being able to work with larger networks but being able to employ them faster and get them doing work for you in generating value.”