NVIDIA's Blackwell architecture represents the most significant generational leap in GPU design since the introduction of the Transformer Engine in Hopper. For software teams, the headline is raw performance: up to 9,000 FP8 TFLOPS per GPU with sparsity, a little more than twice the H100's FP8 peak on a like-for-like basis. For hardware engineers and PCB designers, the headline is something different: a 1,000 W thermal envelope, a dual-die CoWoS package, NVLink 5.0 at 1,800 GB/s, and PCIe Gen6. Together these force a fundamental rethink of board stackup, material selection, and thermal architecture.
This article unpacks the Blackwell architecture from the silicon outward, with a focus on what each design decision means at the PCB level.
Blackwell is NVIDIA's fifth-generation data center GPU architecture, succeeding Hopper (H100/H200). It was announced in March 2024 and entered volume production in late 2024. The architecture is named after David Harold Blackwell, an American statistician and game theorist.
The Blackwell family includes several distinct products:
| Feature | Hopper (H100) | Blackwell (B200) | Improvement |
|---|---|---|---|
| Process node | TSMC 4N | TSMC 4NP | Refined 4 nm class |
| Die configuration | 1× GH100 (80B transistors) | 2× GB100 (208B transistors total) | 2.6× transistor count |
| Packaging | Single GPU die on CoWoS-S | Dual-die CoWoS-L | Two compute dies in one 2.5D package |
| FP8 Tensor TFLOPS | ~2,000 dense (~4,000 with sparsity) | 4,500 dense (9,000 with sparsity) | ~2.3× |
| Memory type | HBM3 (80 GB) | HBM3e (192 GB) | 2.4× capacity |
| Memory bandwidth | 3.35 TB/s | 8.0 TB/s | 2.4× |
| NVLink generation | NVLink 4.0 | NVLink 5.0 | 2× bandwidth (1,800 GB/s) |
| PCIe generation | PCIe Gen5 | PCIe Gen6 | 2× per-lane throughput (PAM4) |
| TDP | 700 W | 1,000 W | +43% |
| Form factor | SXM5 | SXM6 | Larger footprint, higher pin count |
The software-visible advances include a second-generation Transformer Engine that extends mixed precision from FP8 down to FP4 and FP6, fifth-generation Tensor Cores, and a RAS (Reliability, Availability, Serviceability) Engine for in-field error correction. For PCB engineers, the critical numbers are TDP, NVLink bandwidth, PCIe generation, and package type.
The single most consequential architectural decision in Blackwell—from a PCB standpoint—is the use of two GB100 dies connected by TSMC's CoWoS-L (Chip-on-Wafer-on-Substrate with Local Silicon Interconnect) packaging technology.
At the transistor counts required for B200 performance targets, a single monolithic die would far exceed the reticle limit of current EUV lithography equipment (~858 mm²) and would yield poorly. TSMC's CoWoS-L sidesteps this by placing two separate GB100 dies, each close to the reticle limit, side by side on an interposer with embedded local silicon interconnect bridges. NVIDIA quotes 10 TB/s of die-to-die bandwidth across this link.
From the perspective of the PCB carrying the B200, the package presents as a single very large BGA component with an expanded footprint relative to SXM5. The substrate under the CoWoS assembly is itself a complex interconnect structure, and the PCB must support its mounting area, power delivery, and signal escape routing with extremely fine features.
The B200's fifth-generation Tensor Cores natively support FP8, FP16, BF16, FP32, INT4, and INT8 precisions. The headline throughput numbers:
| Precision | B200 TFLOPS (dense) | B200 TFLOPS (sparse) | H100 TFLOPS (dense) |
|---|---|---|---|
| FP8 | 4,500 | 9,000 | ~1,979 |
| FP16 / BF16 | 2,250 | 4,500 | 989 |
| FP32 | 75 | n/a | 67 |
| INT8 (TOPS) | 4,500 | 9,000 | ~1,979 |
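The relationship between the dense and sparse columns, and the like-for-like generational ratios, can be checked with a few lines of arithmetic. This is only a sketch using the dense figures from the table; the dictionary and function names are made up for illustration, and the 2× sparsity factor is inferred from the table itself.

```python
# Dense throughput figures copied from the table above (TFLOPS).
B200_DENSE = {"FP8": 4500, "FP16/BF16": 2250, "FP32": 75}
H100_DENSE = {"FP16/BF16": 989, "FP32": 67}

# The table's sparse column is exactly twice the dense column.
SPARSITY_FACTOR = 2


def sparse_peak(dense_tflops):
    """Peak throughput with structured sparsity, given the dense figure."""
    return dense_tflops * SPARSITY_FACTOR


def dense_speedup(precision):
    """B200-over-H100 ratio using dense figures on both sides."""
    return B200_DENSE[precision] / H100_DENSE[precision]


if __name__ == "__main__":
    print("B200 FP8 sparse peak:", sparse_peak(B200_DENSE["FP8"]), "TFLOPS")
    for precision in H100_DENSE:
        print(f"{precision}: {dense_speedup(precision):.2f}x dense speedup")
```

Comparing dense to dense (or sparse to sparse) avoids the inflated ratios that appear when a sparse figure for one generation is set against a dense figure for the other.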
The B200 integrates 192 GB of HBM3e across eight stacks, delivering 8.0 TB/s of memory bandwidth. HBM stacks are placed directly on the CoWoS interposer adjacent to the GB100 dies, connected via through-silicon vias (TSVs) within the HBM packages and microbumps to the interposer.
This on-package memory architecture means that the PCB does not carry HBM signals—all HBM routing occurs within the CoWoS package itself. However, the PCB must still provide the power rails that feed HBM through the package substrate, with tight ripple and transient requirements.
NVLink 5.0 doubles per-lane bandwidth over NVLink 4.0, achieving 1,800 GB/s total bidirectional bandwidth per GPU (900 GB/s in each direction). Each lane runs at 200 Gb/s, and the B200 exposes 18 links with two lanes per direction each: 18 × 2 × 200 Gb/s = 7,200 Gb/s = 900 GB/s per direction, which works out to 100 GB/s of bidirectional bandwidth per link.
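The per-GPU NVLink figure can be cross-checked from the link count and lane rate. The sketch below assumes two lanes per direction per link, which is the lane count that reconciles 18 links at 200 Gb/s with the stated 900 GB/s per direction; the function name is hypothetical.

```python
LINKS_PER_GPU = 18
LANES_PER_DIRECTION = 2   # assumption: two lanes per link in each direction
BITS_PER_BYTE = 8


def nvlink_bandwidth_gbytes(lane_rate_gbps,
                            links=LINKS_PER_GPU,
                            lanes=LANES_PER_DIRECTION):
    """Return (per-direction, bidirectional) bandwidth in GB/s."""
    per_direction = links * lanes * lane_rate_gbps / BITS_PER_BYTE
    return per_direction, 2 * per_direction


if __name__ == "__main__":
    for name, rate in (("NVLink 4.0", 100), ("NVLink 5.0", 200)):
        one_way, both_ways = nvlink_bandwidth_gbytes(rate)
        print(f"{name}: {one_way:.0f} GB/s per direction, "
              f"{both_ways:.0f} GB/s bidirectional")
```

With the link and lane counts held constant, doubling the lane rate from 100 Gb/s to 200 Gb/s is what doubles the total, which is why the channel loss budget, rather than routing density alone, becomes the limiting factor.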
At 200 Gb/s per lane, NVLink 5.0 signals are among the fastest routed on any commercial PCB today. The channel loss budget from GPU package pad to NVSwitch package pad—including PCB trace, vias, and connectors—is extremely tight. Meeting this budget requires:
Blackwell introduces PCIe Gen6, which doubles throughput over Gen5 by switching from NRZ (Non-Return-to-Zero) to PAM4 (Pulse Amplitude Modulation, 4 levels) signaling. PAM4 carries two bits per symbol, so the symbol rate stays at 32 GBaud while the transfer rate rises to 64 GT/s per lane. A ×16 link therefore carries approximately 128 GB/s in each direction, or about 256 GB/s bidirectional, before protocol overhead.
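A short calculation makes the Gen5-to-Gen6 relationship concrete. It is a sketch of raw link arithmetic only: it ignores FLIT, FEC, and encoding overhead, and the function names are illustrative rather than taken from any PCIe tool.

```python
def symbol_rate_gbaud(transfer_rate_gt_s, bits_per_symbol):
    """Symbol rate on the wire: NRZ carries 1 bit per symbol, PAM4 carries 2."""
    return transfer_rate_gt_s / bits_per_symbol


def raw_link_bandwidth_gbytes(transfer_rate_gt_s, lanes=16):
    """Return (per-direction, bidirectional) raw bandwidth in GB/s."""
    per_direction = transfer_rate_gt_s * lanes / 8
    return per_direction, 2 * per_direction


if __name__ == "__main__":
    generations = {
        "PCIe Gen5 (NRZ)": (32, 1),
        "PCIe Gen6 (PAM4)": (64, 2),
    }
    for name, (rate, bits) in generations.items():
        one_way, both_ways = raw_link_bandwidth_gbytes(rate)
        print(f"{name}: {symbol_rate_gbaud(rate, bits):.0f} GBaud, "
              f"x16 = {one_way:.0f} GB/s per direction, "
              f"{both_ways:.0f} GB/s bidirectional")
```

Both generations come out at the same symbol rate, so the fundamental frequency the board must carry is unchanged; the added difficulty comes from fitting four voltage levels into the same swing.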
PAM4 encoding significantly reduces noise margins compared to NRZ signaling at equivalent data rates. For the PCB connecting the B200 to the host CPU:
The GB200 Superchip integrates two B200 GPUs and one NVIDIA Grace CPU (based on Arm Neoverse V2 cores) on a single module, connected by a 900 GB/s NVLink-C2C (Chip-to-Chip) interconnect. This replaces the traditional PCIe host interface between CPU and GPU with a cache-coherent, low-latency memory interconnect.
Key GB200 Superchip specifications:
The GB200 module is not an individual PCB in the traditional sense—it is a multi-chip module (MCM) that mounts to a baseboard. That baseboard must handle the combined power delivery, NVLink 5.0 routing to neighboring modules, and PCIe/network connectivity, all within the constraints of a rack-optimized form factor.
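The GB200 NVL72 is a complete rack-scale system containing 36 GB200 Superchips (36 Grace CPUs + 72 B200 GPUs), linked through NVSwitch chips into a single NVLink 5.0 domain in which every GPU can reach every other GPU at full NVLink bandwidth. The system operates as a single logical GPU with 13.5 TB of HBM3e memory and about 130 TB/s of aggregate NVLink bandwidth (72 × 1,800 GB/s). NVIDIA quotes roughly 1.4 exaFLOPS of sparse FP4 compute for the rack.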
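The rack-level totals follow from the per-GPU figures given earlier in the article. The sketch below simply multiplies them out; the constant and function names are hypothetical, and the inputs are the article's per-GPU numbers rather than measured values.

```python
SUPERCHIPS_PER_RACK = 36
GPUS_PER_SUPERCHIP = 2          # 36 Superchips carry 72 GPUs in total
HBM_PER_GPU_GB = 192
NVLINK_PER_GPU_GBYTES_S = 1800  # bidirectional, per GPU


def rack_totals(superchips=SUPERCHIPS_PER_RACK):
    """Aggregate GPU count, raw HBM capacity, and NVLink bandwidth for a rack."""
    gpus = superchips * GPUS_PER_SUPERCHIP
    return {
        "gpus": gpus,
        "hbm_tb": gpus * HBM_PER_GPU_GB / 1000,
        "nvlink_tb_s": gpus * NVLINK_PER_GPU_GBYTES_S / 1000,
    }


if __name__ == "__main__":
    totals = rack_totals()
    print(f"GPUs: {totals['gpus']}")
    print(f"Raw HBM3e: {totals['hbm_tb']:.1f} TB")
    print(f"Aggregate NVLink: {totals['nvlink_tb_s']:.1f} TB/s")
```

The raw memory product comes out slightly above the 13.5 TB quoted for the system; the text does not say why, so treat the difference as unexplained rather than as an error in either figure.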
Infrastructure specifications:
The NVSwitch boards within the NVL72 rack are among the most complex PCBs manufactured for commercial deployment, routing NVLink 5.0 signals between all 72 GPUs simultaneously across a fully non-blocking fabric.
The transition from Hopper to Blackwell drives a significant increase in required PCB layer count:
| Board | H100/H200 Baseboard | B200 Baseboard | GB200 NVL72 NVSwitch Board |
|---|---|---|---|
| Typical layer count | 16–20 | 24–32 | 32–40+ |
| Primary drivers | NVLink 4.0 routing, power planes | NVLink 5.0, PCIe Gen6, higher current power planes | Fully-connected NVLink 5.0 fabric, maximum signal density |
Additional layers are needed for: dedicated NVLink 5.0 signal routing layers (which cannot share layers with power or other signal types due to crosstalk requirements); additional power planes for the expanded rail count in the B200 PDN; and HDI build-up layers for fine-pitch BGA escape routing under the SXM6 socket.
NVLink 5.0 at 200 Gb/s per lane makes material selection one of the most critical decisions in Blackwell PCB design. The insertion loss budget from GPU to NVSwitch is fixed by the NVLink 5.0 specification; the PCB laminate's dielectric loss consumes a portion of that budget that cannot be recovered.
| Laminate | Dk (at 10 GHz) | Df (at 10 GHz) | Suitable for B200? |
|---|---|---|---|
| Standard FR4 | ~4.5 | ~0.020 | No — unacceptable loss at NVLink 5.0 frequencies |
| Panasonic Megtron 6 | ~3.6 | ~0.004 | Marginal for NVLink 5.0; suitable for non-NVLink layers |
| Panasonic Megtron 7 | ~3.4 | ~0.002 | Yes — recommended for NVLink 5.0 and PCIe Gen6 layers |
| Isola Tachyon 100G | ~3.6 | ~0.0021 | Yes — suitable for NVLink 5.0 routing layers |
| Rogers 4350B | ~3.48 | ~0.0037 | Conditional — check channel budget for specific trace lengths |
| Rogers RO4450F | ~3.52 | ~0.0037 | Conditional — prepreg use; verify bonding compatibility |
Many Blackwell board designs use a hybrid stackup: Megtron 7 or Tachyon 100G on the high-speed signal layers, with lower-cost materials on power, ground, and low-speed signal layers to manage overall board cost.
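To get a feel for how much of the loss budget each laminate consumes, dielectric loss can be estimated from Dk and Df with the standard transmission-line approximation α = π·f·√Dk·Df / c (in nepers per metre). The sketch below applies it to the table values at 10 GHz. It assumes a stripline fully embedded in the dielectric, ignores conductor loss and copper roughness, and does not model how Df rises with frequency, so it is a rough comparison rather than a channel simulation.

```python
import math

C0 = 299_792_458.0               # speed of light in vacuum, m/s
NEPER_TO_DB = 20 / math.log(10)  # about 8.686 dB per neper
METRES_PER_INCH = 0.0254

# (Dk, Df) at 10 GHz, copied from the laminate table above.
LAMINATES = {
    "Standard FR4": (4.5, 0.020),
    "Panasonic Megtron 6": (3.6, 0.004),
    "Panasonic Megtron 7": (3.4, 0.002),
    "Isola Tachyon 100G": (3.6, 0.0021),
    "Rogers 4350B": (3.48, 0.0037),
}


def dielectric_loss_db_per_inch(dk, df, freq_hz):
    """Dielectric attenuation only; conductor loss is not included."""
    alpha_np_per_m = math.pi * freq_hz * math.sqrt(dk) * df / C0
    return alpha_np_per_m * NEPER_TO_DB * METRES_PER_INCH


if __name__ == "__main__":
    for name, (dk, df) in LAMINATES.items():
        loss = dielectric_loss_db_per_inch(dk, df, 10e9)
        print(f"{name:22s} {loss:.3f} dB/inch at 10 GHz")
```

Because the loss scales linearly with both frequency and Df, the gap between FR4 and the ultra-low-loss laminates widens in absolute terms at NVLink 5.0 frequencies, which is the reasoning behind reserving the premium material for the high-speed layers in a hybrid stackup.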
At 200 Gb/s per lane (NVLink 5.0) and 64 GT/s per lane (PCIe Gen6 PAM4), the following SI design rules apply to Blackwell baseboards:
The B200's 1,000 W TDP and dual-die architecture require a substantially more complex PDN than Hopper:
At 1,000 W per GPU, air cooling alone cannot maintain junction temperatures within operating limits for sustained compute workloads. Blackwell server designs universally incorporate direct liquid cooling (DLC), and the PCB must accommodate this:
The SXM6 socket and companion chips (NVSwitch, PCIe retimer, power management ICs) all use fine-pitch BGA packages. Routing signal and power escape from these packages requires HDI via structures:
| Design Parameter | Hopper (H100/H200) | Blackwell (B200) |
|---|---|---|
| Baseboard layer count | 16–20 | 24–32+ |
| Primary laminate | Megtron 6, Tachyon 100G | Megtron 7, Tachyon 100G |
| NVLink trace speed | 100 Gb/s per lane (NVLink 4.0) | 200 Gb/s per lane (NVLink 5.0) |
| Host PCIe generation | PCIe Gen5 (NRZ, 32 GT/s) | PCIe Gen6 (PAM4, 64 GT/s) |
| GPU TDP | 700 W | 1,000 W |
| Cooling requirement | Air or DLC | DLC mandatory |
| Via stub removal | Backdrilling (< 10 mil stub) | Backdrilling (< 5 mil stub) or laser via |
| Copper foil grade | Low-profile (LP) | Very-low-profile (VLP) or HVLP |
| PDN complexity | High (10–15 rails) | Very high (15–25 rails) |
| HDI type | 1+N+1 or 2+N+2 | 2+N+2 or 3+N+3; ELIC for NVSwitch boards |
| Copper coin requirement | Optional | Common / recommended |
Producing PCBs for Blackwell-based AI servers is among the most demanding work in the PCB fabrication industry. The key manufacturing requirements are:
What does “Blackwell” refer to in NVIDIA's naming scheme?
NVIDIA names its GPU architectures after scientists and mathematicians. Blackwell refers to David Harold Blackwell (1919–2010), an American statistician who made foundational contributions to game theory, probability theory, and mathematical statistics. He was the first African American inducted into the National Academy of Sciences.
Is the B200 a single chip or multiple chips?
The B200 uses two GB100 dies connected inside a TSMC CoWoS-L package. From a software perspective, the two dies present as a single GPU. NVIDIA quotes 10 TB/s for the die-to-die interconnect, which is transparent to application code.
Why does Blackwell require direct liquid cooling when Hopper supported air cooling?
The B200's 1,000 W TDP exceeds the practical limit of air cooling for sustained operation in a rack-dense AI server environment. Air cooling at 1,000 W per GPU would require airflow volumes and temperatures that are incompatible with standard data center air management. DLC removes heat more efficiently, enabling higher power density per rack.
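The scale of the cooling problem can be illustrated with the basic heat-balance relation Q = ṁ·c_p·ΔT. The sketch below estimates the coolant flow needed to carry away one GPU's heat. The 10 K coolant temperature rise and the plain-water properties are assumptions chosen for illustration, not figures from NVIDIA or from any server vendor, and real loops often use water-glycol mixtures with different properties.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), assumption: plain water
WATER_DENSITY = 1.0           # kg/L, assumption: plain water


def coolant_flow_l_per_min(heat_w, delta_t_k,
                           specific_heat=WATER_SPECIFIC_HEAT,
                           density=WATER_DENSITY):
    """Volumetric flow needed to absorb heat_w with a delta_t_k coolant rise."""
    mass_flow_kg_s = heat_w / (specific_heat * delta_t_k)
    return mass_flow_kg_s / density * 60.0


if __name__ == "__main__":
    per_gpu = coolant_flow_l_per_min(heat_w=1000, delta_t_k=10)  # assumed 10 K rise
    print(f"Per 1,000 W GPU: {per_gpu:.2f} L/min")
    print(f"72 GPUs (GPU heat only): {72 * per_gpu:.0f} L/min")
```

Under these assumptions a single GPU needs on the order of one to two litres of water per minute, a load that a cold plate handles in a compact volume and that would take a far larger volume of air to match.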
Can existing H100 server infrastructure be upgraded to B200?
No. B200 uses the SXM6 form factor (incompatible with SXM5 sockets), requires DLC infrastructure, and demands substantially different baseboard PCB designs. A transition from H100 to B200 infrastructure is a full system replacement, not a GPU card swap.
What PCIe version does B200 use for the host CPU connection?
B200 uses PCIe Gen6 (×16), which uses PAM4 signaling to achieve 64 GT/s per lane. A ×16 link carries approximately 128 GB/s in each direction, or about 256 GB/s bidirectional, which is double the throughput of PCIe Gen5. In GB200 Superchip configurations, the Grace CPU connects to the B200 GPUs via NVLink-C2C instead of PCIe, providing 900 GB/s of coherent bandwidth.
What is the difference between B200 and GB200?
The B200 is the GPU accelerator alone (in SXM6 form factor). The GB200 is a combined module pairing two B200 GPUs with one NVIDIA Grace Arm-based CPU, connected by the 900 GB/s NVLink-C2C chip-to-chip interconnect. The GB200 NVL72 is a complete rack-scale system using 36 GB200 modules.
Designing for Blackwell? NextPCB supports the full PCB manufacturing stack for B200 and GB200 infrastructure—high-layer-count fabrication, Megtron 7 and low-loss laminate processing, any-layer HDI, backdrilling, copper coin integration, and complete PCBA services.