Hot Chips 31: IBM Power10, AMD’s AI ambitions, Intel NNP-T, Cerebras largest chip with 1.2 trillion transistors and more

Hot Chips 31, the premiere event for the biggest semiconductor vendors to highlight their latest architectural developments is held in August every year. The event this year was held at the Memorial Auditorium on the Stanford University Campus in California, from August 18-20, 2019. Since its inception it is co-sponsored by IEEE and ACM SIGARCH.

Hot Chips is amazing for the level of depth it provides on the latest technology and the upcoming releases in the IoT, firmware and hardware space. This year the list of presentations for Hot Chips was almost overwhelming with a wide range of technical disclosures on the latest chip logic innovations.

Almost all the major chip vendors and IP licensees involved in semiconductor logic designs took part: Intel, AMD, NVIDIA, Arm, Xilinx, IBM, were on the list. But companies like Google, Microsoft, Facebook and Amazon also took part. There are notable absences from the likes of Apple, who despite being on the Committee, last presented at the conference in 1994.

Day 1 kicked off with tutorials and sponsor demos. On the cloud side, Amazon AWS covered the evolution of hypervisors and the AWS infrastructure. Microsoft described its acceleration strategy with FPGAs and ASICs, with details on Project Brainwave and Project Zipline. Google covered the architecture of Google Cloud with the TPU v3 chip. And a 3-part RISC-V tutorial rounded off by afternoon, so the day was spent well with insights into the latest cloud infrastructure and processor architectures.

The detailed talks were presented on Day 2 and Day 3, below are some of the important highlights of the event:

IBM’s POWER10 Processor expected by 2021

IBM which creates families of processors to address different segments, with different models for tasks like scale-up, scale-out, and now NVLink deployments. The company is adding new custom models that use new acceleration and memory devices, and that was the focus of this year’s talk at Hot Chips. They also announced about POWER10 which is expected to come with these new enhancements in 2021, they additionally announced, core counts of POWER10 and process technology.

IBM also spoke about focusing on developing diverse memory and accelerator solutions to differentiate its product stack with heterogeneous systems.

IBM aims to reduce the number of PHYs on its chips, so now it has PCIe Gen 4 PHYs while the rest of the SERDES run with the company's own interfaces. This creates a flexible interface that can support many types of accelerators and protocols, like GPUs, ASICs, CAPI, NVLink, and OpenCAPI.

AMD wants to become a significant player in Artificial Intelligence

AMD does not have an artificial intelligence–focused chip. However, AMD CEO Lisa Su in a keynote address at Hot Chips 31 stated that the company is working toward becoming a more significant player in artificial intelligence.

Lisa stated that the company had adopted a CPU/GPU/interconnect strategy to tap artificial intelligence and HPC opportunity. She said that AMD would use all its technology in the Frontier supercomputer. The company plans to fully optimize its EYPC CPU and Radeon Instinct GPU for supercomputing. It would further enhance the system’s performance with its Infinity Fabric and unlock performance with its ROCM (Radeon Open Compute) software tools.

Unlike Intel and NVIDIA, AMD does not have a dedicated artificial intelligence chip or application-specific accelerators. Despite this, Su noted, “We’ll absolutely see AMD be a large player in AI.” AMD is considering whether to build a dedicated AI chip or not. This decision will depend on how artificial intelligence evolves.

Lisa explained that companies have been improving their CPU (central processing unit) performance by leveraging various elements. These elements are process technology, die size, TDP (thermal design power), power management, microarchitecture, and compilers. Process technology is the biggest contributor, as it boosts performance by 40%. Increasing die size also boosts performance in the double digits, but it is not cost-effective.

While AMD used microarchitecture to boost EPYC Rome server CPU IPC (instructions per cycle) by 15% in single-threaded and 23% in multi-threaded workloads. This IPC improvement is above the industry average IPC improvement of around 5%–8%.

Intel’s Nervana NNP-T and Lakefield 3D Foveros hybrid processors

Intel revealed fine-grained details about its much-anticipated Spring Crest Deep Learning Accelerators at Hot Chips 31. The Nervana Neural Network Processor for Training (NNP-T) comes with 24 processing cores and a new take on data movement that's powered by 32GB of HBM2 memory. The spacious 27 billion transistors are spread across a 688mm2 die. The NNP-T also incorporates leading-edge technology from Intel-rival TSMC.

Intel Lakefield 3D Foveros Hybrid Processors

Intel in another presentation talked about Lakefield 3D Foveros hybrid processors that are the first to come to market with Intel's new 3D chip-stacking technology. The current design consists of two dies. The lower die houses all of the typical southbridge features, like I/O connections, and is fabbed on the 22FFL process. The upper die is a 10nm CPU that features one large compute core and four smaller Atom-based 'efficiency' cores, similar to an ARM big.LITTLE processor. Intel calls this a "hybrid x86 architecture," and it could denote a fundamental shift in the company's strategy. Finally, the company stacks DRAM atop the 3D processor in a PoP (package-on-Package) implementation.

Cerebras largest chip ever with 1.2 trillion transistors

California artificial intelligence startup Cerebras Systems introduced its Cerebras Wafer Scale Engine (WSE), the world’s largest-ever chip built for neural network processing. Sean Lie the Co-Founder and Chief Hardware Architect at Cerebras Lie presented the gigantic chip ever at Hot Chips 31.

The 16nm WSE is a 46,225 mm2 silicon chip which is slightly larger than a 9.7-inch iPad. It features 1.2 trillion transistors, 400,000 AI optimized cores, 18 Gigabytes of on-chip memory, 9 petabyte/s memory bandwidth, and 100 petabyte/s fabric bandwidth. It is 56.7 times larger than the largest Nvidia graphics processing unit, which accommodates 21.1 billion transistors on a 815 mm2 silicon base.

NVIDIA’s multi-chip solution for deep neural networks accelerator

NVIDIA which announced about designing a test multi-chip solution for DNN computations at a VLSI conference last year, the company explained chip technology at Hot Chips 31 this year. It is currently a test chip which involves a multi-chip DL inference. It is designed for CNNs and has a RISC-V chip controller. It has 36 small chips, 8 Vector MACs per PE, and each chip has 12 PEs and each package has 6x6 chips.

Few other notable talks at Hot Chips 31

Microsoft unveiled its new product Hololens 2.0 silicone. It has a holographic processor and a custom silicone. The application processor runs the app, and the HPU modifies the rendered image and sends to the display.

Facebook presented details on Zion, its next generation in-memory unified training platform. Zion which is designed for Facebook sparse workloads, has a unified BFLOAT 16 format with CPU and accelerators.

Huawei spoke about its Da Vinci architecture, a single Ascend 310 which can deliver 16 TeraOPS of 8-bit integer performance, support real-time analytics across 16 channels of HD video, and consume less than 8W of power.

Xiling Versal AI engine Xilinx, the manufacturer of FPGAs, announced its new Versal AI engine last year as a way of moving FPGAs into the AI domain. This year at Hot Chips they expanded on its technology and more.

Ayar Labs, an optical chip making startup, showcased results of its work with DARPA (U.S. Department of Defense's Defense Advanced Research Projects Agency) and Intel on an FPGA chiplet integration platform.

The final talk on Day 3 ended with a presentation by Habana, they discussed about an innovative approach to scaling AI Training systems with its GAUDI AI Processor.