The new 'Fir' supercomputer at Simon Fraser University leverages advanced direct die liquid cooling, high-density NVIDIA H100 GPUs, and AMD Zen 5 CPUs to serve tens of thousands of researchers across scientific fields.
Takeaways
• The 'Fir' supercomputer at SFU is a massive, multi-million-dollar installation leveraging advanced direct die liquid cooling.
• It features high-density NVIDIA H100 GPUs and AMD Zen 5 CPUs, all directly liquid cooled without fans.
• The system's robust infrastructure supports a wide range of scientific research and prioritizes efficient power and thermal management.
The 'Fir' supercomputer at Simon Fraser University is an $82 million high-performance computing deployment featuring 640 NVIDIA H100 GPUs and 192-core AMD Zen 5 CPU nodes, designed to support a vast range of research needs. It uses highly efficient direct die liquid cooling, which raises the share of heat captured by liquid from roughly 30% to over 90% and eliminates the need for internal fans. The system integrates advanced components and infrastructure to handle immense power consumption and maintain optimal operating temperatures.
Fir Supercomputer Overview
• 00:00:05 The new 'Fir' supercomputer at Simon Fraser University is a massive deployment, featuring 165,000 CPU cores, $20 million worth of GPUs, and hundreds of terabytes of RAM, designed to serve tens of thousands of scientists and researchers in fields from AI to zoology. The system provides a rare look at a real-world deployment of direct die liquid cooling, which raises the share of heat captured by liquid from approximately 30% to over 90%, as the sketch below illustrates.
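To make the 30%-to-90% figure concrete, here is a minimal sketch of how the heat load splits between the liquid loop and the room's air handlers at each capture ratio. The 2 MW IT load is a hypothetical round number for illustration, not a figure from the source.

```python
# Sketch: how much heat the facility's air handlers must still remove
# at a given liquid-capture ratio. The 2 MW IT load is hypothetical.

def heat_split(it_load_kw: float, liquid_capture: float) -> tuple[float, float]:
    """Return (heat to liquid, heat to air) in kW for a given capture ratio."""
    to_liquid = it_load_kw * liquid_capture
    to_air = it_load_kw - to_liquid
    return to_liquid, to_air

IT_LOAD_KW = 2_000  # hypothetical 2 MW of IT equipment

for ratio in (0.30, 0.90):
    liquid, air = heat_split(IT_LOAD_KW, ratio)
    print(f"{ratio:.0%} capture: {liquid:,.0f} kW to liquid, {air:,.0f} kW left for air")
```

At 90% capture the air-side load shrinks sevenfold (from 1,400 kW to 200 kW in this example), which is what makes fanless nodes practical.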
GPU Node Architecture
• 00:01:38 The GPU partition contains 640 NVIDIA H100 80GB GPUs in total, each costing around $31,000, which necessitated substantial power and cooling infrastructure upgrades to the building. Each GPU node combines 48-core AMD EPYC Genoa CPUs and 1.152 terabytes of RAM with four NVIDIA H100 SXM5 80GB GPUs, providing 320 gigabytes of HBM3 VRAM per node at 3.36 terabytes per second of bandwidth per GPU. The high power consumption of 700 watts per GPU is managed by a direct liquid cooling system that cools every component without any fans; the sketch below works through the aggregate numbers.
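A quick back-of-the-envelope sketch of the GPU partition's aggregates, derived only from the figures quoted above (640 GPUs, 4 per node, 700 W and 80 GB each); actual node counts and power overheads may differ.

```python
# Aggregate GPU-partition math from the figures quoted above.
TOTAL_GPUS = 640
GPUS_PER_NODE = 4
GPU_POWER_W = 700        # per-GPU power draw
GPU_VRAM_GB = 80         # HBM3 per GPU
GPU_BW_TBS = 3.36        # memory bandwidth per GPU, TB/s

nodes = TOTAL_GPUS // GPUS_PER_NODE
gpu_power_mw = TOTAL_GPUS * GPU_POWER_W / 1e6   # GPUs only, excludes CPUs and support gear
total_vram_tb = TOTAL_GPUS * GPU_VRAM_GB / 1000
node_bw_tbs = GPUS_PER_NODE * GPU_BW_TBS

print(f"GPU nodes:             {nodes}")                  # 160
print(f"GPU power (GPUs only): {gpu_power_mw:.2f} MW")    # 0.45 MW
print(f"Aggregate HBM3:        {total_vram_tb:.1f} TB")   # 51.2 TB
print(f"Per-node HBM3 BW:      {node_bw_tbs:.2f} TB/s")   # 13.44 TB/s
```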
Cooling Infrastructure Details
• 00:04:20 The supercomputer features a direct liquid cooling system in which every component, from CPUs, GPUs, and VRMs to the network interfaces, SSD caddy, and even system memory, is directly liquid cooled. The cooling loops are intricately routed, with a primary loop dedicated to the GPUs and a secondary loop handling power delivery and RAM. External heat rejection is provided by evaporative cooling towers with a total capacity of 4.7 megawatts, supplemented by mechanical chillers during hot periods, all managed by a monitoring system called Kaizen that captures telemetry data to prevent issues like condensation or corrosion, a failure mode sketched below.
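One reason telemetry matters in a liquid-cooled hall is keeping coolant supply temperature above the room's dew point so cold plates never sweat. The source does not describe Kaizen's internals, so the following is a hypothetical condensation check using the standard Magnus approximation for dew point; the sensor values and safety margin are assumptions.

```python
import math

def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (deg C) via the Magnus formula."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def condensation_risk(coolant_c: float, room_c: float, room_rh_pct: float,
                      margin_c: float = 2.0) -> bool:
    """Flag risk if coolant supply temp falls within margin_c of the dew point."""
    return coolant_c <= dew_point_c(room_c, room_rh_pct) + margin_c

# Hypothetical telemetry readings, not figures from the source.
print(round(dew_point_c(24.0, 50.0), 1))    # ~12.9 deg C dew point
print(condensation_risk(15.0, 24.0, 50.0))  # False: 15 deg C supply keeps ~2 deg C headroom
```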
CPU and Specialized Nodes
• 00:13:01 Beyond GPU-intensive tasks, the 'Fir' supercomputer includes dedicated CPU compute, with each 1U chassis holding two nodes of 192 Zen 5 AMD EPYC Turin cores and 768 gigabytes of memory apiece, totaling nearly 400 cores per rack unit. Each chassis has a 200-gigabit network connection dynamically shared between its two nodes. The system also integrates specialized nodes, such as storage nodes with petabytes of NVMe and spinning-disk storage and 8TB-RAM nodes for memory-intensive jobs, alongside a single AMD MI300X node for diverse computational needs; the sketch below checks the density arithmetic.
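As a rough check on the density claims, here is a small sketch of the CPU-partition arithmetic from the per-chassis figures above. The 42U rack height and the share of the system's quoted 165,000 cores are illustrative assumptions, not figures from the source.

```python
# CPU-partition density math from the per-chassis figures above.
CORES_PER_NODE = 192
NODES_PER_1U = 2
MEM_GB_PER_NODE = 768
RACK_UNITS = 42          # assumed standard rack height, not from the source

cores_per_1u = CORES_PER_NODE * NODES_PER_1U      # 384, i.e. "nearly 400" per 1U
mem_per_1u_gb = MEM_GB_PER_NODE * NODES_PER_1U    # 1,536 GB per 1U
mem_per_core_gb = MEM_GB_PER_NODE / CORES_PER_NODE  # 4 GB per core
cores_per_rack = cores_per_1u * RACK_UNITS        # 16,128 if every U held compute
chassis_for_165k = 165_000 / cores_per_1u         # ~430 chassis if all cores were Turin

print(cores_per_1u, mem_per_1u_gb, mem_per_core_gb, cores_per_rack,
      round(chassis_for_165k))
```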