AMD Announces World's Fastest HPC Accelerator for Scientific Research

- AMD Instinct™ MI100 accelerators revolutionize high-performance computing (HPC) and AI with industry-leading compute performance -

- First GPU accelerator with the new AMD CDNA architecture, engineered for the exascale era -

SANTA CLARA, California, USA - November 16, 2020 - AMD (NASDAQ: AMD) today announced the new AMD Instinct™ MI100 accelerator, the world's fastest HPC GPU and the first x86 server GPU to surpass the 10 teraflops (FP64) performance barrier.1 Supported by new accelerated compute platforms from Dell, Gigabyte, HPE, and Supermicro, the MI100, combined with AMD EPYC™ CPUs and the ROCm™ 4.0 open software platform, is designed to propel new discoveries ahead of the exascale era.

Built on the new AMD CDNA architecture, the AMD Instinct MI100 GPU enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD EPYC processors. The MI100 offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine learning workloads.2 With new AMD Matrix Core technology, the MI100 also delivers a nearly 7x boost in FP16 theoretical peak floating-point performance for AI training workloads compared to AMD's prior-generation accelerators.3

“Today AMD takes a major step forward in the journey toward exascale computing as we unveil the AMD Instinct MI100, the world's fastest HPC GPU,” said Brad McCredie, corporate vice president, Data Center GPU and Accelerated Processing, AMD. “Squarely targeted at the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC.”

Open Software Platform for the Exascale Era

The AMD ROCm developer software provides the foundation for exascale computing. As an open source toolset consisting of compilers, programming APIs and libraries, ROCm is used by exascale software developers to create high-performance applications. ROCm 4.0 has been optimized to deliver performance at scale for MI100-based systems. ROCm 4.0 has upgraded the compiler to be open source and unified to support both OpenMP® 5.0 and HIP. The PyTorch and TensorFlow frameworks, which have been optimized with ROCm 4.0, can now achieve higher performance with the MI100. 7, 8 ROCm 4.0 is the latest offering for HPC, ML and AI application developers, allowing them to create performance-portable software.

“We have received early access to the MI100 accelerator, and the preliminary results are very encouraging. We have typically seen significant performance boosts, up to 2-3x compared to other GPUs,” said Bronson Messer, director of science, Oak Ridge Leadership Computing Facility. “What is also important to recognize is the impact software has on performance. The fact that the ROCm open software platform and HIP developer tool are open source and work on a variety of platforms is something we have been almost obsessed with since we fielded the very first hybrid CPU/GPU system.”

Key capabilities and features of the AMD Instinct MI100 accelerator include:

  • All-New AMD CDNA Architecture – Engineered to power AMD GPUs for the exascale era and the MI100 accelerator, the AMD CDNA architecture offers exceptional performance and power efficiency.
  • Leading FP64 and FP32 Performance for HPC Workloads – Delivers 11.5 TFLOPS of peak FP64 performance and 23.1 TFLOPS of peak FP32 performance, enabling scientists and researchers across the globe to accelerate discoveries in industries including life sciences, energy, finance, academia, government, defense, and more. 1
  • All-New Matrix Core Technology for HPC and AI – Supercharged performance for a full range of single and mixed precision matrix operations, such as FP32, FP16, bFloat16, Int8 and Int4, engineered to boost the convergence of HPC and AI.
  • 2nd Gen AMD Infinity Fabric™ Technology – Instinct MI100 provides ~2x the peer-to-peer (P2P) peak I/O bandwidth over PCIe® 4.0, with up to 340 GB/s of aggregate bandwidth per card with three AMD Infinity Fabric™ Links. 4 In a server, MI100 GPUs can be configured with up to two fully connected quad GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for fast data sharing. 4
  • Ultra-Fast HBM2 Memory – Features 32GB of high-bandwidth HBM2 memory at a clock rate of 1.2 GHz and delivers an ultra-high 1.23 TB/s of memory bandwidth to support large data sets and help eliminate bottlenecks in moving data in and out of memory. 5
  • Support for Industry's Latest PCIe® Gen 4.0 – Designed with support for the latest PCIe Gen 4.0 technology, providing up to 64 GB/s of peak theoretical transport data bandwidth from CPU to GPU. 6
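The interconnect figures in the list above can be reproduced with simple arithmetic. The sketch below is illustrative only: the per-link rate of 92 GB/s and the PCIe figure of 64 GB/s come from footnote 4, while the function names and the assumption that a fully connected hive has one link per GPU pair (and that the PCIe-only case is simply 64 GB/s per GPU) are ours, chosen because they match the published totals.

```python
from math import comb

# Figures quoted in the release and its footnote 4.
IF_LINK_GBPS = 92       # GB/s per Infinity Fabric link
IF_LINKS_PER_GPU = 3    # links per MI100 card
PCIE4_GBPS = 64         # GB/s peak CPU-to-GPU over PCIe Gen 4.0


def per_card_p2p() -> int:
    """Peak GPU-to-GPU bandwidth of one card: 3 links x 92 GB/s."""
    return IF_LINKS_PER_GPU * IF_LINK_GBPS


def per_card_aggregate() -> int:
    """Aggregate card I/O: Infinity Fabric links plus the PCIe Gen4 link."""
    return per_card_p2p() + PCIE4_GBPS


def hive_p2p(gpus: int = 4) -> int:
    """Assumption: a fully connected hive has one link per GPU pair."""
    return comb(gpus, 2) * IF_LINK_GBPS


def hive_p2p_pcie_only(gpus: int = 4) -> int:
    """Assumption: without Infinity Fabric, each GPU is limited to PCIe."""
    return gpus * PCIE4_GBPS


if __name__ == "__main__":
    print("per-card P2P (GB/s):      ", per_card_p2p())
    print("per-card aggregate (GB/s):", per_card_aggregate())
    print("quad hive P2P (GB/s):     ", hive_p2p())
    print("dual hives P2P (GB/s):    ", 2 * hive_p2p())
    print("quad hive, PCIe only:     ", hive_p2p_pcie_only())
```

Under these assumptions the results line up with the 276, 340, 552 and 256 GB/s figures and the roughly 1.1 TB/s per-server total given in footnote 4.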

Available Server Solutions

AMD Instinct MI100 accelerators are expected to be available by the end of the year in systems from major OEM and ODM partners in the enterprise market, including:

Dell

“Dell EMC PowerEdge servers will support the new AMD Instinct MI100, which will enable faster insights from data. This will help our customers achieve more robust and efficient HPC and AI results rapidly,” said Ravi Pendekanti, senior vice president, PowerEdge Servers, Dell Technologies. “AMD has been a valued partner in our support for advancing innovation in the data center. The high-performance capabilities of AMD Instinct accelerators are a natural fit for our PowerEdge server AI & HPC portfolio.”

Gigabyte

“We are pleased to again work with AMD as a strategic partner offering customers server hardware for high-performance computing,” said Alan Chen, assistant vice president in NCBU, GIGABYTE. “AMD Instinct MI100 accelerators represent the next level of high-performance computing in the data center, bringing greater connectivity and data bandwidth for energy research, molecular dynamics, and deep learning training. With this new accelerator in the GIGABYTE portfolio, our customers can benefit from improved performance across a range of scientific and industrial HPC workloads.”

Hewlett Packard Enterprise (HPE)

“Customers use HPE Apollo systems for purpose-built capabilities and performance to tackle a range of complex, data-intensive workloads across high-performance computing (HPC), deep learning and analytics,” said Bill Mannel, vice president and general manager, HPC at HPE. “With the introduction of the new HPE Apollo 6500 Gen10 Plus system, we are further advancing our portfolio to improve workload performance by supporting the new AMD Instinct MI100 accelerator, which enables greater connectivity and data processing, alongside the 2nd Gen AMD EPYC™ processor. We look forward to continuing our collaboration with AMD to expand our offerings with its latest CPUs and accelerators.”

Supermicro

“We are excited that AMD is making a big impact in high-performance computing with the AMD Instinct MI100 GPU accelerator,” said Vik Malyala, senior vice president, field application engineering and business development, Supermicro. “With the combination of the compute power gained with the new CDNA architecture, along with the high memory and GPU peer-to-peer bandwidth the MI100 brings, our customers will get access to great solutions that meet their accelerated compute requirements and enterprise workloads. The AMD Instinct MI100 will be a great addition to our multi-GPU servers and our extensive portfolio of high-performance systems and server building block solutions.”

MI100 Specifications

Compute Units: 120
Stream Processors: 7680
FP64 TFLOPS (Peak): Up to 11.5
FP32 TFLOPS (Peak): Up to 23.1
FP32 Matrix TFLOPS (Peak): Up to 46.1
FP16/FP16 Matrix TFLOPS (Peak): Up to 184.6
INT4 | INT8 TOPS (Peak): Up to 184.6
bFloat16 TFLOPS (Peak): Up to 92.3
HBM2 ECC Memory: 32GB
Memory Bandwidth: Up to 1.23 TB/s
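As a rough cross-check, the peak throughput figures above follow from the compute unit count and the 1,502 MHz peak engine clock quoted in footnote 1. The sketch below is a minimal illustration, not AMD's published method: the per-compute-unit operations-per-clock values are assumptions inferred from the published totals, and all names are hypothetical.

```python
COMPUTE_UNITS = 120      # from the specification list
PEAK_CLOCK_GHZ = 1.502   # peak boost engine clock, footnote 1

# Assumed FLOPs per clock per compute unit; these values are inferred so
# that the products match the release's figures, not taken from the release.
FLOPS_PER_CLOCK_PER_CU = {
    "FP64": 64,
    "FP32": 128,
    "FP32 Matrix": 256,
    "bFloat16": 512,
    "FP16 Matrix": 1024,
}

MI50_FP16_TFLOPS = 26.5  # prior-generation figure, footnote 3


def peak_tflops(kind: str) -> float:
    """Peak theoretical TFLOPS = CUs x FLOPs/clock/CU x clock (GHz) / 1000."""
    return COMPUTE_UNITS * FLOPS_PER_CLOCK_PER_CU[kind] * PEAK_CLOCK_GHZ / 1000


def fp16_speedup_over_mi50() -> float:
    """Ratio behind the 'nearly 7x' FP16 claim."""
    return peak_tflops("FP16 Matrix") / MI50_FP16_TFLOPS


if __name__ == "__main__":
    for kind in FLOPS_PER_CLOCK_PER_CU:
        print(f"{kind:12s} {peak_tflops(kind):8.2f} TFLOPS")
    print(f"FP16 speedup over MI50: {fp16_speedup_over_mi50():.2f}x")
```

Peak theoretical numbers of this kind are upper bounds at the boost clock; sustained application performance depends on the workload and system configuration.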

About AMD

For more than 50 years, AMD has driven innovation in high-performance computing, graphics, and visualization technologies, the building blocks for gaming, immersive platforms, and the data center. Hundreds of millions of consumers, leading Fortune 500 businesses, and cutting-edge scientific research facilities around the world rely on AMD technology daily to improve how they live, work, and play. AMD employees around the world are focused on building great products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, Facebook and Twitter pages.

CAUTIONARY STATEMENT
This press release contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the features, functionality, performance, availability, timing and expected benefits of AMD products including the AMD Instinct™ MI100 accelerator, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward looking statements are commonly identified by words such as “would,” “may,” “expects,” “believes,” “plans,” “intends,” “projects” and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this press release are based on current beliefs, assumptions and expectations, speak only as of the date of this press release and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD’s control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. 
Material factors that could cause actual results to differ materially from current expectations include, without limitation, the following: Intel Corporation’s dominance of the microprocessor market and its aggressive business practices; the ability of third party manufacturers to manufacture AMD’s products on a timely basis in sufficient quantities and using competitive technologies; expected manufacturing yields for AMD’s products; the availability of essential equipment, materials or manufacturing processes; AMD’s ability to introduce products on a timely basis with features and performance levels that provide value to its customers; global economic uncertainty; the loss of a significant customer; AMD’s ability to generate revenue from its semi-custom SoC products;  the impact of the COVID-19 pandemic on AMD’s business, financial condition and results of operations; political, legal, economic risks and natural disasters; the impact of government actions and regulations such as export administration regulations, tariffs and trade protection measures; the impact of acquisitions, joint ventures and/or investments on AMD’s business, including the announced acquisition of Xilinx, and the failure to integrate acquired businesses; AMD’s ability to complete the Xilinx merger; the impact of the announcement and pendency of the Xilinx merger on AMD’s business; potential security vulnerabilities; potential IT outages, data loss, data breaches and cyber-attacks; uncertainties involving the ordering and shipment of AMD’s products; quarterly and seasonal sales patterns; the restrictions imposed by agreements governing AMD’s notes and the revolving credit facility; the competitive markets in which AMD’s products are sold; market conditions of the industries in which AMD products are sold; AMD’s reliance on third-party intellectual property to design and introduce new products in a timely manner; AMD’s reliance on third-party companies for the design, manufacture and supply of 
motherboards, software and other computer platform components; AMD’s reliance on Microsoft Corporation and other software vendors’ support to design and develop software to run on AMD’s products; AMD’s reliance on third-party distributors and add-in-board partners; the potential dilutive effect if the 2.125% Convertible Senior Notes due 2026 are converted; future impairments of goodwill and technology license purchases; AMD’s ability to attract and retain qualified personnel; AMD’s ability to generate sufficient revenue and operating cash flow or obtain external financing for research and development or other strategic investments; AMD’s indebtedness; AMD’s ability to generate sufficient cash to service its debt obligations or meet its working capital requirements; AMD’s ability to repurchase its outstanding debt in the event of a change of control; the cyclical nature of the semiconductor industry; the impact of modification or interruption of AMD’s internal business processes and information systems; compatibility of AMD’s products with some or all industry-standard software and hardware; costs related to defective products; the efficiency of AMD’s supply chain; AMD’s ability to rely on third party supply-chain logistics functions; AMD’s stock price volatility; worldwide political conditions; unfavorable currency exchange rate fluctuations; AMD’s ability to effectively control the sales of its products on the gray market; AMD’s ability to adequately protect its technology or other intellectual property; current and future claims and litigation; potential tax liabilities; and the impact of environmental laws, conflict minerals-related provisions and other laws or regulations. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s Quarterly Report on Form 10-Q for the quarter ended September 26, 2020.

©2020 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, AMD Instinct, Infinity Fabric, ROCm and combinations thereof are trademarks of Advanced Micro Devices, Inc. The OpenMP name and the OpenMP logos are registered trademarks of the OpenMP Architecture Review Board. PCIe is a registered trademark of PCI-SIG Corporation. Python is a trademark of the Python Software Foundation. PyTorch is a trademark or registered trademark of PyTorch. TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

  1. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak double precision (FP64), 46.1 TFLOPS peak single precision matrix (FP32), 23.1 TFLOPS peak single precision (FP32), and 184.6 TFLOPS peak half precision (FP16) theoretical floating-point performance. Published results on the NVIDIA Ampere A100 (40GB) GPU accelerator resulted in 9.7 TFLOPS peak double precision (FP64), 19.5 TFLOPS peak single precision (FP32), and 78 TFLOPS peak half precision (FP16) theoretical floating-point performance. Server manufacturers may vary configuration offerings yielding different results. MI100-03
  2. Calculations performed by AMD Performance Labs as of Sep 3, 2020 on the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak engine clock resulted in 46.1 TFLOPS peak theoretical single precision (FP32 Matrix) floating-point performance. The NVIDIA Ampere A100 (40GB) GPU accelerator published results are 19.5 TFLOPS peak single precision (FP32) floating-point performance. NVIDIA results found at: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf. Server manufacturers may vary configuration offerings yielding different results. MI100-01
  3. Calculations performed by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 accelerator at 1,502 MHz peak boost engine clock resulted in 184.57 TFLOPS peak theoretical half precision (FP16) and 46.14 TFLOPS peak theoretical single precision (FP32 Matrix) floating-point performance. The results calculated for Radeon Instinct™ MI50 GPU at 1,725 MHz peak engine clock resulted in 26.5 TFLOPS peak theoretical half precision (FP16) and 13.25 TFLOPS peak theoretical single precision (FP32 Matrix) floating-point performance. Server manufacturers may vary configuration offerings yielding different results. MI100-04
  4. Calculations as of SEP 18th, 2020. AMD Instinct™ MI100 built on AMD CDNA technology accelerators supporting PCIe® Gen4 providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card. AMD Instinct™ MI100 accelerators include three Infinity Fabric™ links providing up to 276 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) transport rate bandwidth performance per GPU card. Combined with PCIe Gen4 support providing an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. MI100s have three links: 92 GB/s * 3 links per GPU = 276 GB/s. Four GPU hives provide up to 552 GB/s peak theoretical P2P performance. Dual 4 GPU hives in a server provide up to 1.1 TB/s total peak theoretical direct P2P performance per server. AMD Infinity Fabric link technology not enabled: Four GPU hives provide up to 256 GB/s peak theoretical P2P performance with PCIe® 4.0. Server manufacturers may vary configuration offerings yielding different results. MI100-07
  5. Calculations by AMD Performance Labs as of Oct 5, 2020 for the AMD Instinct™ MI100 accelerator designed with AMD CDNA 7nm FinFET process technology at 1,200 MHz peak memory clock resulted in 1.2288 TB/s peak theoretical memory bandwidth performance. The results calculated for the Radeon Instinct™ MI50 GPU designed with “Vega” 7nm FinFET process technology at 1,000 MHz peak memory clock resulted in 1.024 TB/s peak theoretical memory bandwidth performance. CDNA-04
  6. Works with PCIe® Gen 4.0 and Gen 3.0 compliant motherboards. Performance may vary from motherboard to motherboard. Refer to system or motherboard provider for individual product performance and features.
  7. Testing Conducted by AMD performance labs as of October 30th, 2020, on three platforms and software versions typical for the launch dates of the Radeon Instinct MI25 (2018), MI50 (2019) and AMD Instinct MI100 GPU (2020) running the benchmark application Quicksilver. MI100 platform (2020): Gigabyte G482-Z51-00 system comprised of Dual Socket AMD EPYC™ 7702 64-Core Processor, AMD Instinct™ MI100 GPU, ROCm™ 3.10 driver, 512GB DDR4, RHEL 8.2.  MI50 platform (2019): Supermicro® SYS-4029GP-TRT2 system comprised of Dual Socket Intel Xeon® Gold® 6132, Radeon Instinct™ MI50 GPU, ROCm 2.10 driver, 256 GB DDR4, SLES15SP1. MI25 platform (2018): Supermicro SYS-4028GR-TR2 system comprised of Dual Socket Intel Xeon CPU E5-2690, Radeon Instinct™ MI25 GPU, ROCm 2.0.89 driver, 246GB DDR4 system memory, Ubuntu 16.04.5 LTS. MI100-14

  8. Testing conducted by AMD Performance Labs as of October 30th, 2020, on three platforms and software versions typical for the launch dates of the Radeon Instinct MI25 (2018), MI50 (2019) and AMD Instinct MI100 GPU (2020) running the benchmark application TensorFlow ResNet 50 FP16 batch size 128. MI100 platform (2020): Gigabyte G482-Z51-00 system comprised of Dual Socket AMD EPYC™ 7702 64-Core Processor, AMD Instinct™ MI100 GPU, ROCm™ 3.10 driver, 512GB DDR4, RHEL 8.2. MI50 platform (2019): Supermicro® SYS-4029GP-TRT2 system comprised of Dual Socket Intel Xeon® Gold® 6254, Radeon Instinct™ MI50 GPU, ROCm 3.0.6 driver, 338 GB DDR4, Ubuntu® 16.04.6 LTS. MI25 platform (2018): a Supermicro SYS-4028GR-TR2 system comprised of Dual Socket Intel Xeon CPU E5-2690, Radeon Instinct™ MI25 GPU, ROCm 2.0.89 driver, 246GB DDR4 system memory, Ubuntu 16.04.5 LTS. MI100-15
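The memory bandwidth figures in footnote 5 can be approximated from the memory clock alone. The sketch below is illustrative and rests on two assumptions that the release does not state: a 4,096-bit HBM2 memory interface on both cards and two data transfers per memory clock. The function name is hypothetical.

```python
def hbm2_bandwidth_gbps(memory_clock_mhz: int,
                        bus_width_bits: int = 4096,   # assumption, not in the release
                        transfers_per_clock: int = 2  # assumption: double data rate
                        ) -> float:
    """Peak theoretical bandwidth in GB/s from clock, bus width and data rate."""
    bits_per_microsecond = memory_clock_mhz * transfers_per_clock * bus_width_bits
    # bits -> bytes (divide by 8), then MB/s -> GB/s (divide by 1000).
    return bits_per_microsecond / 8 / 1000


if __name__ == "__main__":
    print("MI100 at 1,200 MHz:", hbm2_bandwidth_gbps(1200), "GB/s")
    print("MI50 at 1,000 MHz: ", hbm2_bandwidth_gbps(1000), "GB/s")
```

With those assumptions the results correspond to the 1.2288 TB/s and 1.024 TB/s values quoted for the MI100 and MI50.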