Graphcore announces results of its first MLPerf submission, with AI performance firmly in the lead

July 1, 2021, Beijing — Today Graphcore (Jianwei Technology) officially announced the results of its first MLPerf™ submission. Graphcore's products performed well, with AI performance firmly in the leading position. MLPerf is the most widely recognized comparative benchmark in the AI industry. The results show that on the Graphcore IPU-POD64, BERT trains in just over 9 minutes and ResNet-50 in 14.5 minutes, putting AI performance at supercomputer level.

The MLPerf results also allow commercially available Graphcore systems to be compared against NVIDIA's latest products, and that comparison confirms Graphcore is firmly in the lead on the performance-per-dollar metric. For customers, this important third-party test confirms that the Graphcore system not only delivers excellent next-generation AI performance, but also performs better across today's wide range of applications.

MLPerf benchmark

For its first MLPerf submission (training version 1.0), Graphcore chose to focus on the key benchmark categories of image classification and natural language processing. The MLPerf image classification benchmark uses the popular ResNet-50 version 1.5 model, trained on the ImageNet dataset to an accuracy target common to all submissions. For natural language processing, the BERT-Large model is trained on the Wikipedia dataset using a representative segment that accounts for roughly 10% of the total training compute. Graphcore's decision to submit ResNet-50 and BERT results was largely driven by customers and prospects, as these are among their most commonly used applications and models. The strong MLPerf performance further demonstrates that the Graphcore system fully meets today's AI computing requirements.

The two Graphcore systems involved in testing, the IPU-POD16 and IPU-POD64, have both been shipped to customers in production.

  • The affordable, compact 5U IPU-POD16 system is for enterprise customers just beginning to build out IPU AI compute capability. It consists of four 1U IPU-M2000s and one dual-CPU server, delivering 4 petaFLOPS of AI processing power.

  • The scale-up IPU-POD64 contains 16 IPU-M2000s and a flexible number of servers. Because the Graphcore system decouples servers from AI accelerators, customers can specify the CPU-to-IPU ratio to suit their workload; computer vision tasks, for example, tend to be more server-intensive than natural language processing. For MLPerf, the IPU-POD64 used one server in the BERT submission and four servers in the ResNet-50 submission. Each server is powered by two AMD EPYC™ CPUs.

The MLPerf benchmark has two submission divisions: open and closed. The closed division strictly requires submitters to use exactly the same model implementation and optimizer approach, including defined hyperparameter states and training epochs. The open division guarantees exactly the same model accuracy and quality as the closed division, but allows more flexible model implementations to foster innovation; it therefore enables faster implementations, better adapted to different processor capabilities and optimizer methods. For an innovative architecture like the Graphcore IPU, the open division better reflects the product's capabilities, but Graphcore chose to submit to both the open and closed divisions.

The test results demonstrate the excellent performance of the Graphcore system even in the tightly specified closed division, essentially out of the box. Even more striking are the open division results, where Graphcore was able to optimize its submissions to take full advantage of the IPU and the system's capabilities. This is closer to real-world use, in which customers continuously improve their system performance.


Performance per dollar metric

MLPerf is a well-known benchmark, but making direct comparisons is complicated in practice. Today's processors and system architectures vary widely, from relatively simple silicon chips to complex stacked dies with expensive memory. The most telling view is often performance per dollar.

Graphcore's IPU-POD16 is a 5U system with a list price of $149,995. As mentioned earlier, it consists of 4 IPU-M2000 accelerators and an industry-standard host server, with each IPU-M2000 containing 4 IPU processors. The NVIDIA DGX-A100 640GB used in MLPerf is a 6U box with a list price of around $300,000 (based on market intelligence and published dealer pricing), containing 8 A100 GPUs. The IPU-POD16 thus costs half as much. Within these systems, an IPU-M2000 is priced about the same as a single A100 80GB GPU, or, at a finer granularity, one IPU costs about a quarter of one A100.

In its MLPerf comparative analysis, Graphcore took the results from the tightly regulated closed division and normalized them by system price. For both ResNet-50 and BERT, the Graphcore system clearly provides better performance per dollar than the NVIDIA offering. For ResNet-50 training on the IPU-POD16, Graphcore's performance per dollar is 1.6x NVIDIA's; for BERT, it is 1.3x. These economics help customers achieve their AI computing goals, while the IPU's architecture, built specifically for AI, also lets the Graphcore system unlock next-generation models and techniques.
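The normalization described above can be sketched in a few lines. The list prices below are the ones quoted in this article; the training times are illustrative placeholders, not official MLPerf figures, since per-system closed-division times for this comparison are not given here.

```python
# Sketch: normalizing MLPerf closed-division training times by system list price.
# Prices come from the article; training times are hypothetical placeholders.

def perf_per_dollar(training_minutes: float, price_usd: float) -> float:
    """Higher is better: performance is taken as 1 / training time,
    then divided by the system's list price."""
    return 1.0 / (training_minutes * price_usd)

# System list prices as stated in the article.
IPU_POD16_PRICE = 149_995   # 5U: 4x IPU-M2000 + 1 dual-CPU host server
DGX_A100_PRICE = 300_000    # 6U: 8x A100 (approximate, per market intelligence)

# Hypothetical ResNet-50 closed-division training times, in minutes.
ipu_time, dgx_time = 37.1, 28.0

ratio = perf_per_dollar(ipu_time, IPU_POD16_PRICE) / perf_per_dollar(dgx_time, DGX_A100_PRICE)
print(f"IPU-POD16 perf-per-dollar vs DGX-A100: {ratio:.2f}x")
```

With these placeholder times the cheaper system comes out ahead even though it trains more slowly, which is exactly the effect the price normalization is meant to capture.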

Tao Lu, Senior Vice President and General Manager of China at Graphcore, said: "We are very proud to have achieved such an excellent result in our first MLPerf submission. This test will also bring more value to Graphcore customers, because all the improvements and optimizations we made during the preparation phase have been fed back into the Graphcore software stack. Graphcore users worldwide will benefit greatly from the MLPerf work, not just on the BERT and ResNet-50 models. We will continue to participate in MLPerf testing, including both training and inference, contributing all of Graphcore's wisdom and effort in pursuit of better performance, larger scale, and more models."
