NVIDIA GB200 NVL72: PCB & System Architecture Explained

Posted: June 2026. Last Updated: June 2026. Writer: Stacy Lu.

Introduction

When NVIDIA announced the GB200 NVL72 in March 2024, it described it as “a 72-GPU liquid-cooled rack-scale system that is essentially one giant GPU.” That description is architecturally accurate in a way that matters to PCB engineers and infrastructure teams: the NVL72 does not behave like a cluster of 72 individual GPUs connected by a network. It behaves like a single accelerator with 13.5 TB of unified HBM3e memory, roughly 1.44 exaFLOPS of sparse FP4 compute, and 1,800 GB/s of bidirectional interconnect bandwidth per GPU (about 130 TB/s across the rack), all unified by an NVLink 5.0 switch fabric that occupies nine dedicated switch boards within the rack.

To build this system, NVIDIA pushed every dimension of PCB engineering to its current commercial limit. The NVSwitch 4.0 boards route more NVLink 5.0 differential pairs per board than any previously produced commercial PCB. The B200 dissipates 1,000 W per GPU (two dies in one package), which rules out air cooling at rack density and demands liquid cooling integration down to the board level. The 48 V DC bus distribution system carries 2,500 A of rack-level current through copper conductors sized to handle that load continuously for years. And the Grace CPU–B200 GPU interconnect within each Superchip provides 900 GB/s of coherent bandwidth, about seven times the PCIe Gen5 ×16 link that connects CPUs to GPUs in conventional server architectures.

This article explains the GB200 NVL72 from the silicon outward: what the architecture achieves, how each subsystem is built, and what the system demands from the PCBs that make it function.

Table of Contents

Introduction
What Is the GB200 NVL72?
The Grace Blackwell Superchip: B200 GPU + Grace CPU
B200 GPU: Blackwell Architecture and CoWoS-L Packaging
NVLink 5.0 Fabric: 72 GPUs as One Logical Accelerator
NVSwitch 4.0 Boards: The Most Complex PCBs in the Rack
Compute Trays: GB200 Superchip PCB Architecture
Power Architecture: 120 kW at 48 V
Liquid Cooling: Mandatory at 1,000 W per GPU
Unified Memory Fabric: 13.5 TB HBM3e
Networking: InfiniBand and Ethernet Scale-Out
Performance Numbers: What GB200 NVL72 Delivers
PCB Manufacturing Demands Across the Rack
GB200 NVL72 vs DGX H100: System Comparison
FAQ

What Is the GB200 NVL72?

The GB200 NVL72 is a rack-scale AI accelerator system from NVIDIA, combining 36 GB200 Grace Blackwell Superchips (each containing one Grace ARM CPU and two B200 GPUs) in a fully interconnected NVLink 5.0 fabric within a single liquid-cooled rack. At the system level, the NVL72's key specifications are:

Specification	Value
GPUs per rack	72 B200 (from 36 GB200 Superchips)
CPUs per rack	36 Grace (ARM Neoverse V2, 72 cores each)
GPU compute (FP4, sparse)	~1.44 ExaFLOPS (1,440 PetaFLOPS)
GPU compute (FP8, sparse)	~648 PetaFLOPS
GPU compute (BF16, dense)	~162 PetaFLOPS
HBM3e memory per GPU	192 GB
Total HBM3e (72 GPUs)	13,824 GB (~13.5 TB)
Total CPU memory (LPDDR5X)	36 × 480 GB = 17,280 GB (~17 TB)
Combined addressable memory	~30 TB (HBM3e + LPDDR5X, coherent)
NVLink 5.0 bandwidth per GPU	1,800 GB/s bidirectional
NVSwitch boards in rack	9
Rack power consumption	~120 kW
Cooling requirement	Mandatory direct liquid cooling (DLC)
External network	8 × 400G InfiniBand per rack
Rack form factor	Custom NVIDIA rack (~42U equivalent)

The “NVL72” designation refers to NVLink 72—72 GPUs connected by NVLink. The predecessor DGX H100 rack connected 32 GPUs (across 4 independent nodes) and required InfiniBand to communicate between the 4 nodes; any workload requiring all 32 GPUs to exchange data simultaneously was bound by InfiniBand bandwidth between nodes. The NVL72 eliminates this constraint entirely: all 72 GPUs share a single NVLink 5.0 fabric, making the rack functionally equivalent to a single accelerator for the purposes of tensor parallelism and memory allocation.
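The rack-level memory totals in the specification table follow directly from the per-device figures. The short sketch below reproduces them; the variable names are illustrative, and the "TB" values in the table appear to use a divisor of 1,024, which is an inference from the numbers rather than something the table states.

```python
# Per-device figures taken from the specification table in this section.
SUPERCHIPS = 36
GPUS = 72
CPUS = 36
HBM_PER_GPU_GB = 192
LPDDR_PER_CPU_GB = 480


def rack_memory_gb():
    """Return (HBM3e, LPDDR5X, combined) capacity for the rack in GB."""
    hbm = GPUS * HBM_PER_GPU_GB
    lpddr = CPUS * LPDDR_PER_CPU_GB
    return hbm, lpddr, hbm + lpddr


hbm_gb, lpddr_gb, total_gb = rack_memory_gb()
gpus_per_superchip = GPUS // SUPERCHIPS

print(f"GPUs per Superchip: {gpus_per_superchip}")
print(f"HBM3e:    {hbm_gb:,} GB (~{hbm_gb / 1024:.1f} TB at 1,024 GB per TB)")
print(f"LPDDR5X:  {lpddr_gb:,} GB (~{lpddr_gb / 1024:.1f} TB)")
print(f"Combined: {total_gb:,} GB (~{total_gb / 1024:.1f} TB)")
```

Note that 72 GPUs across 36 Superchips implies two B200 GPUs per Superchip.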

The Grace Blackwell Superchip: B200 GPU + Grace CPU

The GB200 (Grace Blackwell) Superchip is the fundamental computing unit of the NVL72. Each Superchip integrates two B200 GPUs and one Grace CPU on a single module, connected by NVLink-C2C (Chip-to-Chip), a 900 GB/s bidirectional, cache-coherent interconnect that operates at much higher bandwidth and lower latency than PCIe.

The Grace CPU is based on 72 ARM Neoverse V2 cores fabricated on TSMC N4, designed specifically for AI server workloads. Its key characteristics for the NVL72 architecture:

Memory: up to 480 GB LPDDR5X, providing roughly 500 GB/s of CPU memory bandwidth
NVLink-C2C: 900 GB/s coherent interconnect to the B200 GPU; CPU and GPU share a unified memory address space without explicit data copy operations between CPU and GPU memory
PCIe Gen5: ×16 PCIe Gen5 for external connectivity (NIC, storage, management)
Power: ~100 W TDP for the CPU; with two B200 GPUs at 1,000 W each, the combined GB200 Superchip TDP is roughly 2,100 W

The NVLink-C2C interconnect within the Superchip replaces the PCIe host interface that connected CPUs to GPUs in every previous server architecture. PCIe Gen5 ×16 provides ~128 GB/s; NVLink-C2C provides 900 GB/s—7× higher bandwidth. More significantly, NVLink-C2C is cache-coherent: the GPU can access CPU memory (LPDDR5X) and the CPU can access GPU memory (HBM3e) without explicit data transfer or address translation overhead. This enables new AI programming models where the 30 TB combined memory (17 TB LPDDR5X + 13.5 TB HBM3e) in the rack is addressable as a single unified pool.
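To put the bandwidth gap in concrete terms, the sketch below compares idealized transfer times over the two links using the peak figures quoted above. The 480 GB payload is a hypothetical example (one Grace CPU's full LPDDR5X capacity), and real transfers would be slower than these peak-rate estimates.

```python
# Peak link bandwidths quoted in the text, in GB/s.
PCIE_GEN5_X16_GB_S = 128
NVLINK_C2C_GB_S = 900


def transfer_seconds(size_gb, bandwidth_gb_s):
    """Idealized transfer time: payload size divided by peak bandwidth."""
    return size_gb / bandwidth_gb_s


bandwidth_ratio = NVLINK_C2C_GB_S / PCIE_GEN5_X16_GB_S

payload_gb = 480  # hypothetical payload: one CPU's LPDDR5X capacity
pcie_s = transfer_seconds(payload_gb, PCIE_GEN5_X16_GB_S)
c2c_s = transfer_seconds(payload_gb, NVLINK_C2C_GB_S)

print(f"NVLink-C2C / PCIe Gen5 x16 bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"{payload_gb} GB over PCIe Gen5 x16: {pcie_s:.2f} s")
print(f"{payload_gb} GB over NVLink-C2C:    {c2c_s:.2f} s")
```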

The GB200 Superchip module is not a standard PCB add-in card. It is a multi-chip module (MCM) that mounts to the compute tray baseboard via a high-density mezzanine connector. The compute tray baseboard provides power delivery, NVLink 5.0 routing to the NVSwitch boards, PCIe Gen5 connectivity for external I/O, and liquid cooling manifold connections.

B200 GPU: Blackwell Architecture and CoWoS-L Packaging

The B200 GPU is NVIDIA's first dual-die GPU in production. Two GB100 dies, each fabricated on TSMC's 4NP process with approximately 104 billion transistors (208 billion per GPU), are joined by TSMC's CoWoS-L (Chip-on-Wafer-on-Substrate with Local silicon interconnect) advanced packaging technology. The two dies are connected through the package's local silicon interconnect at approximately 10 TB/s of die-to-die bandwidth, fast enough that software sees the pair as a single GPU. For a detailed technical treatment of CoWoS packaging and its PCB implications, see CoWoS Packaging Explained.

From a PCB design perspective, the B200's CoWoS-L package presents as a single very large BGA component with a footprint significantly larger than the H100 SXM5 package. The B200 uses the SXM6 socket, which is physically incompatible with SXM5—an H100 baseboard cannot accept B200 GPUs without a complete baseboard redesign. The SXM6 socket has higher pin count, wider power delivery capability (for 1,000 W TDP), and supports the NVLink 5.0 interface at 200 Gb/s per lane.

The B200's 8 stacks of HBM3e memory, providing 192 GB at 8.0 TB/s aggregate bandwidth, are integrated within the CoWoS package alongside the dual GB100 dies. The PCB baseboard does not route HBM signals—all HBM interconnects are within the CoWoS package substrate. However, the PCB must supply the HBM power rails (multiple VDDQ domains, each with tight ripple specifications) through the SXM6 power delivery pins.

NVLink 5.0 Fabric: 72 GPUs as One Logical Accelerator

The NVLink 5.0 switch fabric is what transforms a collection of 72 B200 GPUs into a unified accelerator. Each B200 GPU has 18 NVLink 5.0 links. Each link runs two lanes in each direction at 200 Gb/s per lane, which gives 50 GB/s per direction or 100 GB/s bidirectional per link. The total NVLink bandwidth per GPU is therefore 18 × 100 GB/s = 1,800 GB/s bidirectional. Across 72 GPUs, the aggregate NVLink fabric bandwidth is 72 × 1,800 GB/s = approximately 130 TB/s bidirectional.
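The per-GPU and rack-level bandwidth figures can be rebuilt from the lane rate. The sketch below assumes each NVLink 5.0 link consists of two 200 Gb/s lanes in each direction; that lane count is an assumption made here to reconcile the 200 Gb/s lane rate with the 1,800 GB/s per-GPU total, and the constant names are illustrative.

```python
LANE_GBIT_S = 200          # per-lane signaling rate quoted in the text
LANES_PER_DIRECTION = 2    # assumption: two lanes each way per link
LINKS_PER_GPU = 18
GPUS = 72


def link_bandwidth_gb_s():
    """Bidirectional bandwidth of one link in GB/s (8 bits per byte)."""
    per_direction_gbit = LANE_GBIT_S * LANES_PER_DIRECTION
    return per_direction_gbit * 2 / 8


link_gb_s = link_bandwidth_gb_s()
per_gpu_gb_s = LINKS_PER_GPU * link_gb_s
fabric_tb_s = GPUS * per_gpu_gb_s / 1000

print(f"Per link: {link_gb_s:.0f} GB/s bidirectional")
print(f"Per GPU:  {per_gpu_gb_s:.0f} GB/s bidirectional")
print(f"Rack:     {fabric_tb_s:.1f} TB/s bidirectional")
```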

This fabric is implemented through nine NVSwitch 4.0 boards, each containing multiple NVSwitch 4.0 chips. Each GPU connects to all 9 NVSwitch boards (2 NVLink links per board per GPU), creating a fully non-blocking topology: any GPU can send to any other GPU at full 1,800 GB/s simultaneously without any other GPU's traffic reducing available bandwidth. In a 72-GPU all-to-all communication (the pattern used in tensor-parallel collective operations), every GPU can simultaneously send and receive at full 1,800 GB/s.
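A quick port-count check shows how the links spread across the switch boards. The sketch below assumes one switch port terminates one GPU link and uses the 72-port switch chip figure quoted in this section; the resulting chip count is a lower bound from port arithmetic only, not a statement about the actual board design.

```python
import math

GPUS = 72
LINKS_PER_GPU = 18
SWITCH_BOARDS = 9
PORTS_PER_SWITCH_CHIP = 72  # figure quoted in this section

# Each GPU spreads its links evenly across the switch boards.
links_per_gpu_per_board = LINKS_PER_GPU // SWITCH_BOARDS

# Total GPU-side links in the rack, and the share landing on one board.
total_gpu_links = GPUS * LINKS_PER_GPU
links_per_board = GPUS * links_per_gpu_per_board

# Minimum chips per board if every link needs its own switch port.
min_chips_per_board = math.ceil(links_per_board / PORTS_PER_SWITCH_CHIP)

print(f"Links per GPU per board: {links_per_gpu_per_board}")
print(f"GPU links in the rack:   {total_gpu_links}")
print(f"GPU links per board:     {links_per_board}")
print(f"Minimum chips per board: {min_chips_per_board}")
```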

Each NVSwitch 4.0 chip provides 72 NVLink 5.0 ports at 100 GB/s each, or 7.2 TB/s of aggregate bidirectional bandwidth. A switch board carrying two chips delivers 14.4 TB/s, more than double the 6.4 TB/s quoted for NVSwitch 3.0 in the H100 era. For background on NVSwitch architecture and its PCB routing implications, the NVSwitch guide provides the foundational context, and the NVLink PCB routing guide covers the signal integrity requirements of the NVLink 5.0 interface.

NVSwitch 4.0 Boards: The Most Complex PCBs in the Rack

The 9 NVSwitch 4.0 boards in the GB200 NVL72 are almost certainly the most technically complex PCBs in commercial production today. Their function is straightforward to describe—switch NVLink 5.0 traffic between all 72 GPUs—but the engineering required to implement it pushes every PCB capability simultaneously.

Each NVSwitch board routes NVLink 5.0 signals between:

All 18 compute trays (each carrying 4 B200 GPUs), providing 2 NVLink 5.0 links per GPU from this switch board
Multiple NVSwitch 4.0 chips on the board itself, which together implement the crossbar switching function

The resulting NVLink 5.0 routing density is extraordinary: a single NVSwitch board may route more than 3,000 differential pairs operating at 200 Gb/s per lane, plus power delivery for NVSwitch chips consuming approximately 400 W each. The PCB design requirements that follow from this are described in detail in the 30+ Layer HDI PCB guide, but the key specifications are:

Layer count: 32–40 layers, with any-layer HDI (ELIC) required to achieve the routing density beneath NVSwitch 4.0 BGA packages
Laminate: Panasonic Megtron 7 (Df ~0.002 at 10 GHz) or equivalent on all NVLink 5.0 signal layers; Megtron 6 on power and ground planes
Copper foil: High-VLP (HVLP, Rz < 1 μm) on all NVLink 5.0 signal layers to minimize skin-effect conductor loss at 25+ GHz
Impedance control: 100 Ω ± 5% differential for all NVLink 5.0 pairs; requires LDI imaging and ± 3% dielectric thickness control
Via stubs: < 5 mils (127 μm) residual stub on any through-hole via carrying NVLink 5.0 signals; any-layer HDI eliminates stubs entirely on microvia connections
Board dimensions: Large format (estimated 400–600 mm per side) to accommodate connections to all 18 compute trays

These NVSwitch boards require fabricators with sequential lamination capability (4–5 press cycles for any-layer HDI), UV laser drilling systems for < 75 μm diameter microvias, precision CNC backdrilling at ± 25 μm depth accuracy, and VNA measurement capability to 50+ GHz for channel coupon verification. Very few PCB fabricators worldwide have the complete combination of these capabilities qualified for production volume.
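The via stub limit above can be motivated with the standard quarter-wave resonance estimate, f = c / (4 · L · √Dk), where an unterminated stub of length L behaves as a notch filter. The sketch below is a first-order approximation only: the dielectric constant of 3.3 is an assumed value for an ultra-low-loss laminate, and the 60 mil "no backdrill" stub is a hypothetical comparison case.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum
MIL_M = 25.4e-6            # one mil in meters
ASSUMED_DK = 3.3           # assumption: laminate dielectric constant


def stub_resonance_ghz(stub_len_m, dk=ASSUMED_DK):
    """First quarter-wave resonance of an open via stub, in GHz."""
    return C_M_PER_S / (4 * stub_len_m * math.sqrt(dk)) / 1e9


f_5mil = stub_resonance_ghz(5 * MIL_M)    # stub limit quoted in the text
f_60mil = stub_resonance_ghz(60 * MIL_M)  # hypothetical un-backdrilled stub

print(f"5 mil stub resonance:  {f_5mil:.0f} GHz")
print(f"60 mil stub resonance: {f_60mil:.1f} GHz")
```

Under these assumptions the 5 mil stub resonates far above the signal band, while a long stub lands in the tens of gigahertz, inside the range that the VNA coupon measurements mentioned above are meant to cover.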

Compute Trays: GB200 Superchip PCB Architecture

The NVL72 rack contains 18 compute trays, each carrying 2 GB200 Superchips (2 Grace CPUs + 4 B200 GPUs). The compute tray baseboard is a large-format PCB (approximately 600–700 mm per long axis) that serves as the integration platform for the Superchip modules.

The compute tray baseboard must simultaneously handle:

Power delivery for 4 B200 GPUs at 1,000 W each plus 2 Grace CPUs at ~100 W each (~4,200 W total per tray); 48 V bus input to on-board VRMs generating GPU VCORE (~0.85 V at 800+ A per GPU), HBM VDDQ, CPU VCORE, and auxiliary rails
NVLink 5.0 routing from each B200 GPU to 9 NVSwitch board edge connectors; at 18 NVLink 5.0 links per GPU and 200 Gb/s per lane, these traces are the signal integrity design constraint that defines the board's layer count and material selection
PCIe Gen5 routing from each Grace CPU to external connectivity (NIC, storage); as covered in the PCIe Gen5 PCB design guide, Gen5 at 32 GT/s per lane requires low-loss laminate and backdrilling on through-hole vias
Liquid cooling integration: cold plate mounting structures for each B200 GPU package; manifold connections for the liquid cooling loop that carries ~4,000 W of heat per tray to the rack-level cooling infrastructure
Management and I/O: BMC (Baseboard Management Controller) for out-of-band management, temperature monitoring, power sequencing

The compute tray baseboard's PCB design requirements are somewhat less extreme than the NVSwitch boards (because the GPU count per board is lower), but still represent one of the most demanding commercial PCB designs outside the NVSwitch boards themselves: 24–32 layers, Megtron 7 on NVLink 5.0 signal layers, 2+N+2 or 3+N+3 HDI, and integrated liquid cooling structures as described in Thermal Management on AI Server PCBs.

Power Architecture: 120 kW at 48 V

At 120 kW total rack power, the GB200 NVL72 requires a power architecture that standard 12 V data center infrastructure cannot support. The rack uses 48 V DC bus distribution throughout, a requirement driven by fundamental electrical engineering constraints.

At 120 kW and 12 V, the rack bus current would be 120,000 W / 12 V = 10,000 A. Carrying 10,000 A through bus bars within a rack enclosure would require impractically large copper cross-sections (on the order of thousands of square millimeters at typical bus bar current densities) and generate unacceptable resistive losses in the bus distribution wiring. At 48 V, the same 120 kW requires only 2,500 A, a manageable current that can be handled with bus bar systems of practical size and with standard high-current connectors. Because resistive loss scales with the square of current, the fourfold current reduction also cuts distribution loss by a factor of 16 for the same conductor resistance.
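The current and loss comparison can be sketched numerically. The bus resistance of 0.1 mΩ below is a hypothetical value chosen only to make the I²R scaling visible; it is not a figure from NVIDIA or from any rack specification.

```python
RACK_POWER_W = 120_000
ASSUMED_BUS_RESISTANCE_OHM = 0.0001  # hypothetical 0.1 milliohm bus path


def bus_current_a(power_w, bus_voltage_v):
    """DC bus current from I = P / V."""
    return power_w / bus_voltage_v


def resistive_loss_w(power_w, bus_voltage_v, resistance_ohm):
    """Conduction loss in the bus from P_loss = I^2 * R."""
    current = bus_current_a(power_w, bus_voltage_v)
    return current ** 2 * resistance_ohm


for volts in (12, 48):
    amps = bus_current_a(RACK_POWER_W, volts)
    loss = resistive_loss_w(RACK_POWER_W, volts, ASSUMED_BUS_RESISTANCE_OHM)
    print(f"{volts} V bus: {amps:,.0f} A, {loss:,.0f} W lost in the bus")

loss_ratio = (
    resistive_loss_w(RACK_POWER_W, 12, ASSUMED_BUS_RESISTANCE_OHM)
    / resistive_loss_w(RACK_POWER_W, 48, ASSUMED_BUS_RESISTANCE_OHM)
)
print(f"Loss ratio, 12 V vs 48 V: {loss_ratio:.0f}x")
```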

The 48 V architecture cascades through the system:

Rack PDU (Power Distribution Unit): Converts facility AC (208 V or 400 V three-phase) to 48 V DC through an intermediate bus converter; efficiency target ≥ 97% (titanium efficiency class)
Tray-level distribution: 48 V is distributed to each compute tray and NVSwitch board via bus bars within the rack enclosure; each compute tray draws approximately 4,200 W from the 48 V bus for its four GPUs and two CPUs, before conversion losses
On-board VRM: Each compute tray baseboard converts 48 V to the required GPU VCORE voltage (~0.85 V) through a high-efficiency multi-phase buck converter; the step-down ratio of 56:1 (48 V to 0.85 V) is achieved in a single conversion stage in modern AI server VRM designs, avoiding the two-stage conversion used in older 12 V architectures

The PCB power delivery implications of the 48 V bus architecture are discussed in the OAM and GPU baseboard design context at OAM PCB Assembly Guide. At the GPU VCORE output stage, the design requirements are identical regardless of whether the bus input is 12 V or 48 V: the VCORE current of 800+ A per GPU requires 2–3 oz copper power planes, tight PDN impedance (< 0.1 mΩ from DC to 100 MHz at the GPU package), and extensive decoupling capacitor networks as specified in the AI Accelerator PCB Design Guide.
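The 48 V to 0.85 V conversion described above can be quantified with a few lines of arithmetic. The sketch below uses the ideal buck relationship D = Vout / Vin to show how narrow the duty cycle becomes at this ratio; real single-stage 48 V designs may use other topologies, and the 90% efficiency is an assumed placeholder, not a measured value.

```python
BUS_V = 48.0
VCORE_V = 0.85
VCORE_CURRENT_A = 800.0      # lower bound quoted in the text
ASSUMED_EFFICIENCY = 0.90    # hypothetical conversion efficiency

# Voltage step-down ratio and ideal buck duty cycle (D = Vout / Vin).
step_down_ratio = BUS_V / VCORE_V
ideal_duty_cycle = VCORE_V / BUS_V

# Power on the VCORE rail and the matching draw on the 48 V side.
vcore_power_w = VCORE_V * VCORE_CURRENT_A
input_power_w = vcore_power_w / ASSUMED_EFFICIENCY
input_current_a = input_power_w / BUS_V

print(f"Step-down ratio:  {step_down_ratio:.1f}:1")
print(f"Ideal duty cycle: {ideal_duty_cycle * 100:.2f}%")
print(f"VCORE rail power: {vcore_power_w:.0f} W at {VCORE_CURRENT_A:.0f} A")
print(f"48 V input:       {input_power_w:.0f} W, {input_current_a:.1f} A")
```

The contrast between 800 A on the output and under 20 A on the input is why the heavy copper and tight PDN impedance requirements concentrate at the VCORE stage rather than on the 48 V side.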

Liquid Cooling: Mandatory at 1,000 W per GPU

The B200 GPU's 1,000 W TDP makes air cooling technically impossible for sustained compute workloads in any practical data center environment. Air cooling at 1,000 W per GPU in a dense rack configuration would require airflow velocities and inlet temperatures that exceed standard data center CRAC (Computer Room Air Conditioning) specifications, and would produce jet-engine-level acoustic noise from the cooling fans.

The NVL72 implements mandatory direct liquid cooling (DLC) throughout the rack. The liquid cooling architecture has several layers:

Cold plates on each B200 GPU: Machined aluminum or copper cold plate assemblies make contact with the B200 package thermal interface surface through a TIM (Thermal Interface Material) layer; chilled water or facility coolant circulates through microchannels in the cold plate, absorbing GPU heat and carrying it out of the chassis
NVSwitch board cooling: NVSwitch 4.0 chips at ~400 W each also require liquid cooling; the NVSwitch boards have cold plate mounting structures integrated into their mechanical design
Grace CPU cooling: At ~100 W, the Grace CPU can be cooled by either liquid or air, but the NVL72's all-liquid-cooling design extends liquid cooling to the CPU modules as well for thermal uniformity
Rack-level coolant distribution: A manifold within the rack distributes facility-supplied chilled water (typically 18–24°C supply temperature) to each compute tray and NVSwitch board; return lines carry heated water (typically 30–40°C) back to the facility cooling plant
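The coolant flow needed for the rack-level loop follows from Q = ṁ · cp · ΔT. The sketch below assumes plain water (cp of about 4,186 J/(kg·K), density of about 0.997 kg/L), that the full 120 kW is rejected to liquid, and a 12 K rise between supply and return; all three are simplifying assumptions, and real loops often use treated water or glycol mixes with different properties.

```python
WATER_CP_J_PER_KG_K = 4186.0  # assumption: plain water
WATER_KG_PER_LITER = 0.997    # assumption: plain water near room temperature


def mass_flow_kg_s(heat_w, delta_t_k, cp=WATER_CP_J_PER_KG_K):
    """Coolant mass flow from Q = m_dot * cp * delta_T."""
    return heat_w / (cp * delta_t_k)


def liters_per_minute(mass_flow, density=WATER_KG_PER_LITER):
    """Convert mass flow in kg/s to volumetric flow in L/min."""
    return mass_flow / density * 60


rack_heat_w = 120_000  # assumes all rack power ends up in the liquid loop
delta_t_k = 12.0       # e.g. 20 C supply and 32 C return

rack_flow_kg_s = mass_flow_kg_s(rack_heat_w, delta_t_k)
rack_flow_lpm = liters_per_minute(rack_flow_kg_s)

print(f"Mass flow:       {rack_flow_kg_s:.2f} kg/s")
print(f"Volumetric flow: {rack_flow_lpm:.0f} L/min")
```

Halving the allowed temperature rise doubles the required flow, which is one reason supply and return temperature targets drive manifold and pump sizing.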

The PCB-level implications of mandatory liquid cooling are significant. Cold plate mounting requires precise mechanical integration with the PCB: mounting hole patterns must be placed without interfering with BGA package areas or critical signal routing; copper coin inserts beneath GPU packages (described in the thermal management guide) provide the conductive heat path from the package solder joint layer to the cold plate contact surface; and thermal via arrays beneath VRM components transfer VRM heat to the cold plate mounting structure rather than relying on ambient air convection.

The B200's thermal interface surface flatness requirement (< 0.1 mm across the full package area) is particularly challenging because the PCB and package both experience thermal expansion and contraction during operation. The TIM layer must maintain adequate contact area throughout the operating temperature range while accommodating these dimensional changes.

Unified Memory Fabric: 13.5 TB HBM3e

One of the GB200 NVL72's most significant architectural advantages is its unified memory fabric. The 13,824 GB (13.5 TB) of HBM3e across 72 B200 GPUs is addressable as a single coherent memory space by any GPU or CPU in the rack, without explicit data transfer or page migration operations.

This unified addressing is enabled by the combination of NVLink 5.0's cache-coherent memory access protocol and NVIDIA's NVLink address translation hardware. When a GPU needs to access data resident in another GPU's HBM3e, it issues a load to a remote HBM3e address; the NVSwitch fabric routes the memory request to the correct GPU and returns the data at NVLink bandwidth (~1,800 GB/s). The latency is higher than accessing local HBM (approximately 5–10 μs for a remote HBM3e access vs < 100 ns for local), but the bandwidth is far higher than what PCIe or InfiniBand would provide for equivalent remote memory access.

The practical consequence for AI workloads: models too large to fit in a single GPU's 192 GB of HBM3e can be loaded into the rack's 13.5 TB unified memory pool without partitioning the model across independent servers connected by InfiniBand. A 1T parameter model in BF16 precision (2 TB) fits in the NVL72's unified memory pool, enabling inference and potentially fine-tuning within a single rack without inter-rack communication. This capability positions the NVL72 as the primary platform for GPT-4-class and larger model inference in 2025–2026.
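The memory-fit claim is simple to check. The sketch below counts model weights only; KV cache, activations, and optimizer state are deliberately ignored, so it gives a necessary condition for fitting rather than a sufficient one. The dictionary and function names are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1, "fp4": 0.5}

HBM_PER_GPU_GB = 192
GPUS_PER_RACK = 72


def weights_tb(num_params, dtype):
    """Size of the model weights alone, in decimal terabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e12


def fits(num_params, dtype, pool_tb):
    """True if the weights alone fit in the given memory pool."""
    return weights_tb(num_params, dtype) <= pool_tb


single_gpu_tb = HBM_PER_GPU_GB / 1000
rack_pool_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000

one_trillion = 1e12
print(f"1T params in BF16: {weights_tb(one_trillion, 'bf16'):.1f} TB")
print(f"Fits one GPU ({single_gpu_tb:.3f} TB): "
      f"{fits(one_trillion, 'bf16', single_gpu_tb)}")
print(f"Fits the rack ({rack_pool_tb:.3f} TB): "
      f"{fits(one_trillion, 'bf16', rack_pool_tb)}")
```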

The HBM3e memory subsystem itself is entirely contained within each B200's CoWoS-L package—the PCB does not carry HBM signal routing. The PCB's responsibility for HBM is limited to supplying clean, regulated power to the HBM voltage domains (VDDQ_HBM per stack group) through the SXM6 power delivery pins, with tight noise and ripple specifications as described for H100/H200 HBM power planes in the HBM vs GDDR7 PCB layout guide.

Networking: InfiniBand and Ethernet Scale-Out

Within the NVL72 rack, GPU-to-GPU communication uses NVLink 5.0 exclusively—InfiniBand is not involved in intra-rack traffic. Between racks in a cluster, InfiniBand NDR (400 Gb/s per port) or high-speed Ethernet provides the inter-rack fabric. The NVL72 rack provides 8 × 400G InfiniBand ports per rack for cluster-scale scale-out.

At cluster scale, a 100-rack NVL72 deployment (7,200 GPUs) requires an InfiniBand fat-tree fabric with approximately 200 leaf switches and 100 spine switches, providing full bisection bandwidth across the cluster. The corresponding networking PCBs—InfiniBand NDR switch line cards using 112G PAM4 serdes lanes—must meet the signal integrity requirements described in the 112G PAM4 PCB Design guide.

NVIDIA also supports RoCE (RDMA over Converged Ethernet) deployments of NVL72 clusters using 400G or 800G Ethernet switches, enabling cloud providers that prefer Ethernet infrastructure to deploy NVL72 in their existing network fabric. The PCIe Gen5 interface on each Grace CPU provides connectivity for the InfiniBand or Ethernet NIC, consistent with the Gen5 design guidelines in the PCIe Gen5 PCB design guide.

Performance Numbers: What GB200 NVL72 Delivers

Metric	GB200 NVL72 (full rack)	DGX H100 (per node, 8 GPUs)	Ratio
FP8 compute (sparse)	648 PF	~32 PF	~20×
BF16 compute (dense)	162 PF	~7.9 PF	~20×
HBM memory capacity	13.5 TB	640 GB	~21×
HBM memory bandwidth	576 TB/s (72 × 8.0 TB/s)	26.8 TB/s (8 × 3.35 TB/s)	~21×
GPU-to-GPU bandwidth (intra)	1,800 GB/s per GPU (NVLink 5.0, non-blocking to all 72 GPUs)	900 GB/s per GPU (NVLink 4.0, within node only)	2× + full rack scale
System power	~120 kW	~10.2 kW	~12× (for 9× the GPU count)
Performance/watt (FP8, sparse)	~5.4 PF/kW	~3.1 PF/kW	~1.7× more efficient

The FP8 row compares like with like: both figures include sparsity, and the H100 value is 8 GPUs at roughly 4 PF each. On that basis the performance-per-watt gain is more modest than the raw performance increase: at roughly 1.7× better FP8 compute per kilowatt, the NVL72 still delivers more AI compute per unit of power and cooling infrastructure than equivalent H100 deployments. Together with the rack-wide NVLink fabric, this efficiency is a major reason the NVL72 is the preferred platform for both training frontier models (where compute density per rack determines training throughput) and serving large models at scale (where memory capacity determines how many model instances fit in a given infrastructure footprint).
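Several of the ratios in the comparison table can be recomputed from the per-GPU figures quoted in this article. The sketch below covers the memory, power, and GPU-count rows plus the NVL72's own FP8 compute per kilowatt; the dictionary layout is illustrative, and the inputs are the article's rounded values rather than measured data.

```python
nvl72 = {
    "gpus": 72,
    "hbm_gb_per_gpu": 192,
    "hbm_tb_s_per_gpu": 8.0,
    "power_kw": 120.0,
    "fp8_sparse_pf": 648.0,
}
dgx_h100_node = {
    "gpus": 8,
    "hbm_gb_per_gpu": 80,
    "hbm_tb_s_per_gpu": 3.35,
    "power_kw": 10.2,
}


def total(system, per_gpu_key):
    """Multiply a per-GPU figure by the GPU count of the system."""
    return system["gpus"] * system[per_gpu_key]


capacity_ratio = (total(nvl72, "hbm_gb_per_gpu")
                  / total(dgx_h100_node, "hbm_gb_per_gpu"))
bandwidth_ratio = (total(nvl72, "hbm_tb_s_per_gpu")
                   / total(dgx_h100_node, "hbm_tb_s_per_gpu"))
power_ratio = nvl72["power_kw"] / dgx_h100_node["power_kw"]
gpu_count_ratio = nvl72["gpus"] / dgx_h100_node["gpus"]
nvl72_pf_per_kw = nvl72["fp8_sparse_pf"] / nvl72["power_kw"]

print(f"HBM capacity ratio:  {capacity_ratio:.1f}x")
print(f"HBM bandwidth ratio: {bandwidth_ratio:.1f}x")
print(f"Power ratio:         {power_ratio:.1f}x")
print(f"GPU count ratio:     {gpu_count_ratio:.0f}x")
print(f"NVL72 FP8 per kW:    {nvl72_pf_per_kw:.1f} PF/kW")
```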

PCB Manufacturing Demands Across the Rack

The GB200 NVL72 contains several distinct PCB types, each with its own manufacturing requirements. The aggregate PCB manufacturing complexity of a single NVL72 rack exceeds that of any previous commercial AI server system.

NVSwitch 4.0 boards (9 per rack): As described above, these are the most demanding boards in the system. Any-layer HDI (ELIC), 32–40 layers, Megtron 7 with HVLP copper throughout NVLink signal layers, precision backdrilling or full microvia replacement on all NVLink signal vias, and large-format fabrication (> 400 mm per side). Qualified fabricators for this board type can be counted on one hand globally. For details on the via and material requirements, see the GPU PCB Manufacturing guide.

Compute tray baseboards (18 per rack): 24–32 layers, 2+N+2 or 3+N+3 HDI, hybrid Megtron 7 / Megtron 6 stackup, heavy copper (2–3 oz) power planes for 48 V–to–VCORE conversion at > 800 A per GPU. The B200 GPU BGA assembly on these boards requires the specialized reflow profiling, warpage management, and 3D X-ray inspection described in GPU Board Assembly: Manufacturing Challenges.

Grace CPU module PCBs: The Grace CPU itself is mounted on a multi-chip module substrate (not a traditional PCB in the server sense), but the interconnect between the Grace module and the compute tray baseboard involves controlled-impedance connections that must meet NVLink-C2C signal integrity requirements.

Power distribution boards: Heavy copper bus bar PCBs distributing 48 V at 2,500 A rack-level current. Primary requirements are mechanical (large format, heavy copper plating, precision cut-outs for bus bar connections) rather than high-speed signal integrity. The high-speed PCB materials guide is less relevant here; standard copper-clad laminate at 3–4 oz is the primary specification.

Management boards: BMC, sensor, and management network boards within the rack. Standard complexity (< 12 layers, FR4-class materials), but must be qualified for the high-temperature, high-humidity, and electromagnetic environment inside a 120 kW liquid-cooled AI rack.

GB200 NVL72 vs DGX H100: System Comparison

Comparison Dimension	DGX H100 (4-node rack)	GB200 NVL72
GPU architecture	Hopper (GH100, single die)	Blackwell (dual GB100 die, CoWoS-L)
GPUs per rack	32 H100 SXM5	72 B200 SXM6
GPU TDP	700 W	1,000 W
HBM per GPU	80 GB HBM3 (3.35 TB/s)	192 GB HBM3e (8.0 TB/s)
Intra-rack NVLink scope	Within node only (8 GPUs per node)	Entire rack (all 72 GPUs)
NVLink generation	NVLink 4.0 (100 Gb/s per lane)	NVLink 5.0 (200 Gb/s per lane)
NVLink bandwidth per GPU	900 GB/s (intra-node only)	1,800 GB/s (to any of 71 other GPUs)
CPU integration	Separate Intel Xeon CPUs via PCIe Gen5	Grace CPUs on the same Superchip module as the B200 GPUs via NVLink-C2C (900 GB/s)
Host interface (CPU to GPU)	PCIe Gen5 ×16 (~128 GB/s)	NVLink-C2C (900 GB/s, coherent)
Rack power	~40 kW	~120 kW
Cooling requirement	Air cooling or DLC	Mandatory DLC
NVSwitch placement	On GPU baseboard (NVSwitch 3.0)	Dedicated NVSwitch 4.0 boards (9 per rack)
GPU baseboard complexity	20–24 layers; Megtron 6E	24–32 layers (compute tray); Megtron 7
Most complex PCB	H100 HGX baseboard (20–24L)	NVSwitch 4.0 board (32–40+L, any-layer HDI)

The architectural gap between the DGX H100 rack and the GB200 NVL72 is larger than any previous NVIDIA generation transition. The H100 introduced the Transformer Engine and raised NVLink bandwidth by 50% over the A100 (600 GB/s to 900 GB/s per GPU); the NVL72 changes the fundamental unit of AI computation from the GPU node to the AI rack, enabled by a rack-wide switch fabric architecture. The PCB engineering implications are correspondingly demanding, as the NVSwitch board requirements and compute tray designs push commercial PCB manufacturing to its current limits.

FAQ

What does “NVL72” stand for?
NVL72 stands for “NVLink 72”, referring to 72 GPUs fully connected by an NVLink switch fabric. The “72” is the number of B200 GPUs in the rack. The NVL72 designation distinguishes this full-rack system from smaller GB200 configurations, such as the NVL36 (36 GPUs in half-rack configurations) and NVL2 (2 GPUs per node in a standard server).

Why is liquid cooling mandatory for the GB200 NVL72?
The B200 GPU's 1,000 W TDP cannot be adequately managed by air cooling in any practical server configuration. At 1,000 W per GPU and 72 GPUs per rack, the rack generates 72,000 W of heat from GPUs alone (before CPUs, VRMs, and other components add their contributions). Air cooling this heat load would require airflow velocities and volumes that are physically incompatible with standard 42U rack enclosures and standard data center air management. Liquid cooling (direct liquid cooling to cold plates on each GPU) is the only thermal management approach that can sustain B200 GPUs at full compute load continuously.

Can a GB200 NVL72 rack replace 4 DGX H100 racks for the same AI training workload?
For workloads that fit within the NVL72's memory capacity and can use the unified NVLink 5.0 fabric efficiently, one NVL72 rack delivers approximately 20× the BF16 compute of a single DGX H100 node and approximately 5× that of a 4-node DGX H100 rack. Against four such racks (16 nodes, 128 GPUs), the raw BF16 advantage narrows to roughly 1.3×, so the more important comparison is per-GPU or per-FLOP cost and the workload's sensitivity to intra-rack vs inter-rack GPU-to-GPU bandwidth. The NVL72 eliminates the InfiniBand bandwidth bottleneck for intra-rack collectives that limits scaling efficiency in multi-node DGX H100 training.

What PCIe generation does the GB200 use for external connectivity?
The Grace CPU in the GB200 Superchip uses PCIe Gen5 ×16 for external connectivity—connecting to InfiniBand or Ethernet NICs for cluster networking. The GPU-to-CPU interface within the Superchip is NVLink-C2C (not PCIe), which provides 900 GB/s of coherent bandwidth—far exceeding what PCIe Gen5 can offer. PCIe Gen6, used in some B200 configurations for direct host connectivity, appears on the SXM6 form factor in environments without the Grace CPU Superchip pairing.

How many NVSwitch chips are in the entire GB200 NVL72 rack?
The count follows from port arithmetic. The 72 GPUs each expose 18 NVLink links, for 1,296 links in total, and each NVSwitch 4.0 chip provides 72 ports, so 1,296 / 72 = 18 chips are enough to terminate every link in a single-stage, non-blocking topology. That works out to 2 NVSwitch 4.0 chips on each of the 9 switch boards, which matches NVIDIA's published switch tray configuration of two switch chips and 14.4 TB/s of aggregate bandwidth per tray.

When was the GB200 NVL72 released and when did volume production begin?
NVIDIA announced the GB200 and NVL72 architecture at GTC March 2024. Volume production began ramping in the second half of 2024, with major cloud providers (AWS, Azure, Google Cloud, Oracle Cloud) announcing availability of GB200 NVL72 instances in late 2024 and expanding throughout 2025. By 2026, the NVL72 is the primary platform for frontier AI training and large-model inference at hyperscale cloud providers, while the H100 remains widely deployed for cost-sensitive training and inference workloads.

Need to Manufacture PCBs for GB200 NVL72 or Next-Generation AI Rack Programs?

The GB200 NVL72 defines the current frontier of AI server PCB requirements—from NVSwitch 4.0 boards at 32–40 layers with any-layer HDI to compute tray baseboards with Megtron 7 signal layers, heavy copper 48 V power planes, and integrated liquid cooling structures. NextPCB's advanced fabrication services cover the complete requirement set: sequential HDI lamination, ultra-low-loss laminate processing, HVLP copper foil, precision backdrilling, large-format boards, BGA assembly for CoWoS packages, 3D X-ray inspection, and IPC Class 3 standards throughout.

About the Author

Stacy Lu

With extensive experience in the PCB and PCBA industry, Stacy has established herself as a professional and dedicated Key Account Manager with an outstanding reputation. She excels at deeply understanding client needs, delivering effective and high-quality communication. Renowned for her meticulousness and reliability, Stacy is skilled at resolving client issues and fully supporting their business objectives.