NVIDIA Blackwell Architecture Explained: B200, GB200 & PCB Design Impact

Posted: June 2026 | Last Updated: June 2026 | Writer: Arya Li

NVIDIA's Blackwell architecture is the company's largest generational step in data center GPU design since Hopper introduced the Transformer Engine. For software teams, the headline is raw performance: up to 9,000 FP8 TFLOPS per GPU with sparsity, roughly 2.25× the H100's peak FP8 rate. For hardware engineers and PCB designers, the headline is different: a 1,000 W thermal envelope, a dual-die CoWoS package, NVLink 5.0 at 1,800 GB/s, and PCIe Gen6. Together these force a fundamental rethink of board stackup, material selection, and thermal architecture.

This article unpacks the Blackwell architecture from the silicon outward, with a focus on what each design decision means at the PCB level.

Table of Contents
Introduction
What Is the NVIDIA Blackwell Architecture?
Key Architectural Advances Over Hopper
NVIDIA B200: Specifications and Die Architecture
GB200: The Grace Blackwell Superchip
GB200 NVL72: Rack-Scale AI Infrastructure
What Blackwell Means for PCB Design
Hopper vs Blackwell: PCB Design Comparison
Manufacturing Considerations
FAQ

What Is the NVIDIA Blackwell Architecture?

Blackwell is the NVIDIA data center GPU architecture that succeeds Hopper (H100/H200) and introduces the company's fifth-generation Tensor Cores. It was announced in March 2024 and entered volume production in late 2024. The architecture is named after David Harold Blackwell, an American statistician and game theorist.

The Blackwell family includes several distinct products:

B100: Entry-level Blackwell for dense deployments, lower TDP
B200: Flagship Blackwell GPU for AI training and inference
GB200: Grace Blackwell Superchip, pairing two B200 GPUs with one NVIDIA Grace (ARM-based) CPU on a single module
GB200 NVL72: Rack-scale system with 36 GB200 Superchips (72 B200 GPUs) connected via NVLink 5.0
B200A: Cloud-optimized variant with adjusted power envelope

Key Architectural Advances Over Hopper

Feature	Hopper (H100)	Blackwell (B200)	Improvement
Process node	TSMC 4N	TSMC 4NP	Refined 4 nm class
Die configuration	1× GH100 (80B transistors)	2× GB100 (208B transistors total)	2.6× transistor count
Packaging	Single-die CoWoS-S	Dual-die CoWoS-L	Two dies in one 2.5D package
FP8 Tensor TFLOPS (with sparsity)	~4,000	9,000	~2.25×
Memory type	HBM3 (80 GB)	HBM3e (192 GB)	2.4× capacity
Memory bandwidth	3.35 TB/s	8.0 TB/s	2.4×
NVLink generation	NVLink 4.0	NVLink 5.0	2× bandwidth (1,800 GB/s)
PCIe generation	PCIe Gen5	PCIe Gen6	2× per-lane throughput (PAM4)
TDP	700 W	1,000 W	+43%
Form factor	SXM5	SXM6	Larger footprint, higher pin count

Software-visible advances include the second-generation Transformer Engine, which adds FP4 alongside FP8 mixed precision, fifth-generation Tensor Cores, and a RAS (Reliability, Availability, Serviceability) Engine for in-field error correction. For PCB engineers, the critical numbers are TDP, NVLink bandwidth, PCIe generation, and package type.

NVIDIA B200: Specifications and Die Architecture

Dual-Die Design and CoWoS-L Packaging

The single most consequential architectural decision in Blackwell—from a PCB standpoint—is the use of two GB100 dies connected by TSMC's CoWoS-L (Chip-on-Wafer-on-Substrate with Local Silicon Interconnect) packaging technology.

At the transistor counts required for B200 performance targets, a single monolithic die would need far more area than the reticle limit of current EUV lithography equipment (~858 mm²) allows, and a die that large would also yield poorly. TSMC's CoWoS-L solves this by placing two separate GB100 dies, each close to the reticle limit, side by side on an interposer with embedded local silicon interconnect (LSI) bridges. NVIDIA quotes about 10 TB/s for the resulting die-to-die link (NV-HBI), which is what lets the two dies behave as one GPU.

From the perspective of the PCB carrying the B200, the package presents as a single very large BGA component with an expanded footprint relative to SXM5. The substrate under the CoWoS assembly is itself a complex interconnect structure, and the PCB must support its mounting area, power delivery, and signal escape routing with extremely fine features.

Compute: FP8, FP16, BF16, and INT8

The B200's fifth-generation Tensor Cores natively support FP4, FP6, FP8, FP16, BF16, TF32, and INT8 precisions. The headline throughput numbers:

Precision	B200 TFLOPS (dense)	B200 TFLOPS (sparse)	H100 TFLOPS (dense)
FP8	4,500	9,000	~1,979
FP16 / BF16	2,250	4,500	989
FP32	75	—	67
INT8	4,500	9,000	~1,979

HBM3e Memory: 192 GB at 8.0 TB/s

The B200 integrates 192 GB of HBM3e across eight stacks, delivering 8.0 TB/s of memory bandwidth. HBM stacks are placed directly on the CoWoS interposer adjacent to the GB100 dies, connected via through-silicon vias (TSVs) within the HBM packages and microbumps to the interposer.

This on-package memory architecture means that the PCB does not carry HBM signals—all HBM routing occurs within the CoWoS package itself. However, the PCB must still provide the power rails that feed HBM through the package substrate, with tight ripple and transient requirements.

NVLink 5.0: 1,800 GB/s Bidirectional

NVLink 5.0 doubles per-lane bandwidth over NVLink 4.0, achieving 1,800 GB/s total bidirectional bandwidth per GPU (900 GB/s in each direction). Each lane runs at 200 Gb/s, and the B200 exposes 18 links. With two lanes per direction per link, that gives 18 × 2 × 200 Gb/s = 7,200 Gb/s, or 900 GB/s per direction.
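As a quick sanity check, the aggregate NVLink figure can be rebuilt from the lane rate. The sketch below assumes two lanes per direction per link (an assumption added here for illustration) and ignores encoding and protocol overhead, so it gives raw numbers only.

```python
def nvlink_bandwidth_gbytes(links: int = 18,
                            lanes_per_direction: int = 2,  # assumption, see lead-in
                            lane_gbps: float = 200.0) -> tuple[float, float]:
    """Return (per-direction, bidirectional) raw bandwidth in GB/s."""
    per_direction = links * lanes_per_direction * lane_gbps / 8  # bits -> bytes
    return per_direction, 2 * per_direction


nvlink5 = nvlink_bandwidth_gbytes(lane_gbps=200.0)  # NVLink 5.0 lane rate
nvlink4 = nvlink_bandwidth_gbytes(lane_gbps=100.0)  # NVLink 4.0 lane rate
print(f"NVLink 5.0: {nvlink5[0]:.0f} GB/s per direction, {nvlink5[1]:.0f} GB/s total")
print(f"NVLink 4.0: {nvlink4[0]:.0f} GB/s per direction, {nvlink4[1]:.0f} GB/s total")
```

Doubling the lane rate while holding the link and lane counts constant is what doubles the per-GPU total.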

At 200 Gb/s per lane, NVLink 5.0 signals are among the fastest routed on any commercial PCB today. The channel loss budget from GPU package pad to NVSwitch package pad—including PCB trace, vias, and connectors—is extremely tight. Meeting this budget requires:

Ultra-low-loss PCB laminates with dissipation factor (Df) < 0.002
Minimized via stub length (backdrilling to within < 5 mils of the signal layer)
Smooth copper foil (very-low-profile or ultra-low-profile grades) to reduce skin-effect losses at high frequencies
Tight differential pair impedance control (100 Ω ± 5%)
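To see why stub length matters so much, a via stub can be treated as a quarter-wave resonator. The sketch below uses the first-order estimate f = c / (4 · L · √Dk); the function name and the example stub lengths are illustrative, and real vias resonate lower than this because of pad and barrel capacitance.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in mm/ns


def stub_resonance_ghz(stub_mils: float, dk: float) -> float:
    """First-order quarter-wave resonance of a via stub, in GHz."""
    stub_mm = stub_mils * 0.0254
    return C_MM_PER_NS / (4 * stub_mm * math.sqrt(dk))


# Dk of 3.4 is taken as a representative low-loss laminate value (assumption).
for stub in (60, 10, 5):
    print(f"{stub:>3} mil stub -> ~{stub_resonance_ghz(stub, 3.4):.0f} GHz")
```

A 200 Gb/s PAM4 lane runs at 100 GBaud, so its Nyquist frequency is about 50 GHz. Under these assumptions an unbackdrilled 60 mil stub resonates inside that band, while a 5 mil stub pushes the resonance far above it.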

PCIe Gen6 Host Interface

Blackwell introduces PCIe Gen6, which doubles throughput over Gen5 by switching from NRZ (Non-Return-to-Zero) to PAM4 (Pulse Amplitude Modulation, 4 levels) signaling. PAM4 carries two bits per symbol, so the symbol rate stays at 32 GBaud while the data rate rises to 64 GT/s per lane. A ×16 link therefore moves approximately 128 GB/s in each direction, or 256 GB/s bidirectional.

PAM4 encoding significantly reduces noise margins compared to NRZ signaling at equivalent data rates. For the PCB connecting the B200 to the host CPU:

Channel insertion loss must be < 28 dB at 16 GHz (the Nyquist frequency for 64 GT/s PAM4, which runs at 32 GBaud)
Return loss and crosstalk specifications are tighter than Gen5
Forward Error Correction (FEC) is mandatory in the Gen6 specification, but FEC adds latency and PCB signal quality still directly affects bit error rate before correction
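The relationship between transfer rate, link bandwidth, and Nyquist frequency can be worked through in a few lines. This is a rough sketch: it treats 1 GT/s as 1 Gb/s per lane and ignores encoding, FLIT, and FEC overhead, and the function names are illustrative.

```python
def pcie_raw_gbytes_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Raw bandwidth in one direction, in GB/s (overheads ignored)."""
    return gt_per_s * lanes / 8


def nyquist_ghz(gt_per_s: float, bits_per_symbol: int) -> float:
    """Nyquist frequency is half the symbol rate."""
    symbol_rate_gbaud = gt_per_s / bits_per_symbol
    return symbol_rate_gbaud / 2


for name, rate, bits in (("Gen5 NRZ", 32, 1), ("Gen6 PAM4", 64, 2)):
    one_way = pcie_raw_gbytes_per_s(rate)
    print(f"{name}: {one_way:.0f} GB/s per direction, "
          f"{2 * one_way:.0f} GB/s bidirectional, "
          f"Nyquist {nyquist_ghz(rate, bits):.0f} GHz")
```

Both generations share the same Nyquist frequency. Gen6 gets its extra throughput from the second bit per symbol, which is also why its noise margin is smaller.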

GB200: The Grace Blackwell Superchip

The GB200 Superchip integrates two B200 GPUs and one NVIDIA Grace CPU (based on ARM Neoverse V2 cores) on a single module, connected by a 900 GB/s NVLink-C2C (Chip-to-Chip) interconnect. This replaces the traditional PCIe host interface between CPU and GPU with a cache-coherent, low-latency memory interconnect.

Key GB200 Superchip specifications:

Grace CPU: 72 ARM Neoverse V2 cores, 480 GB LPDDR5X memory, 128-bit memory bus
B200 GPUs: 2× B200, each with full B200 specifications (192 GB HBM3e, 9,000 FP8 TFLOPS sparse)
NVLink-C2C bandwidth: 900 GB/s bidirectional (CPU↔GPU)
Combined memory: 864 GB (480 GB LPDDR5X + 2 × 192 GB HBM3e), addressable as a unified memory space
Module TDP: up to ~2,700 W (one CPU + two GPUs)

The GB200 module is not an individual PCB in the traditional sense—it is a multi-chip module (MCM) that mounts to a baseboard. That baseboard must handle the combined power delivery, NVLink 5.0 routing to neighboring modules, and PCIe/network connectivity, all within the constraints of a rack-optimized form factor.

GB200 NVL72: Rack-Scale AI Infrastructure

The GB200 NVL72 is a complete rack-scale system containing 36 GB200 Superchips (36 Grace CPUs + 72 B200 GPUs) connected in a fully connected NVLink 5.0 fabric via NVSwitch chips. The system operates as a single logical GPU with 13.5 TB of unified HBM3e memory, about 130 TB/s of aggregate NVLink bandwidth, and roughly 1.4 exaFLOPS of sparse FP4 compute.

Infrastructure specifications:

GPUs per rack: 72 B200
Total HBM3e: 72 × 192 GB = 13,824 GB (~13.5 TB)
NVLink switches per rack: 9 NVSwitch boards
Rack power consumption: ~120 kW
Cooling: Direct liquid cooling (mandatory)
Network: 8× 400G InfiniBand per rack
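The rack-level totals follow directly from the per-GPU figures quoted in this article. A minimal sketch of that arithmetic, with illustrative names; the 1,024 divisor is used only because it reproduces the ~13.5 TB figure above:

```python
def rack_totals(gpus: int = 72,
                hbm_gb_per_gpu: int = 192,
                nvlink_gb_s_per_gpu: int = 1800) -> dict[str, float]:
    """Aggregate HBM capacity and NVLink bandwidth for one rack."""
    total_hbm_gb = gpus * hbm_gb_per_gpu
    return {
        "hbm_gb": total_hbm_gb,
        "hbm_tb": total_hbm_gb / 1024,                   # binary-style TB
        "nvlink_tb_s": gpus * nvlink_gb_s_per_gpu / 1000,
    }


totals = rack_totals()
print(f"HBM3e: {totals['hbm_gb']} GB (~{totals['hbm_tb']:.1f} TB)")
print(f"Aggregate NVLink bandwidth: {totals['nvlink_tb_s']:.1f} TB/s")
```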

The NVSwitch boards within the NVL72 rack are among the most complex PCBs manufactured for commercial deployment, routing NVLink 5.0 signals between all 72 GPUs simultaneously across a fully non-blocking fabric.

What Blackwell Means for PCB Design

Layer Count Requirements

The transition from Hopper to Blackwell drives a significant increase in required PCB layer count:

Board	H100/H200 Baseboard	B200 Baseboard	GB200 NVL72 NVSwitch Board
Typical layer count	16–20	24–32	32–40+
Primary drivers	NVLink 4.0 routing, power planes	NVLink 5.0, PCIe Gen6, higher current power planes	Fully-connected NVLink 5.0 fabric, maximum signal density

Additional layers are needed for: dedicated NVLink 5.0 signal routing layers (which cannot share layers with power or other signal types due to crosstalk requirements); additional power planes for the expanded rail count in the B200 PDN; and HDI build-up layers for fine-pitch BGA escape routing under the SXM6 socket.

Material Selection

NVLink 5.0 at 200 Gb/s per lane makes material selection one of the most critical decisions in Blackwell PCB design. The insertion loss budget from GPU to NVSwitch is fixed by the NVLink 5.0 specification; the PCB laminate's dielectric loss consumes a portion of that budget that cannot be recovered.

Laminate	Dk (at 10 GHz)	Df (at 10 GHz)	Suitable for B200?
Standard FR4	~4.5	~0.020	No — unacceptable loss at NVLink 5.0 frequencies
Panasonic Megtron 6	~3.6	~0.004	Marginal for NVLink 5.0; suitable for non-NVLink layers
Panasonic Megtron 7	~3.4	~0.002	Yes — recommended for NVLink 5.0 and PCIe Gen6 layers
Isola Tachyon 100G	~3.6	~0.0021	Yes — suitable for NVLink 5.0 routing layers
Rogers 4350B	~3.48	~0.0037	Conditional — check channel budget for specific trace lengths
Rogers RO4450F	~3.52	~0.0037	Conditional — prepreg use; verify bonding compatibility

Many Blackwell board designs use a hybrid stackup: Megtron 7 or Tachyon 100G on the high-speed signal layers, with lower-cost materials on power, ground, and low-speed signal layers to manage overall board cost.
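The effect of Dk and Df on the loss budget can be estimated with the standard dielectric attenuation formula, α = π · f · √Dk · Df / c (in nepers per metre). The sketch below applies it to the table values at 10 GHz. It is a first-order estimate only: it covers dielectric loss, not conductor or roughness loss, and it treats Dk as the effective permittivity of a stripline.

```python
import math

C_M_PER_S = 299_792_458.0
NEPER_TO_DB = 20 * math.log10(math.e)  # ~8.686 dB per neper


def dielectric_loss_db_per_inch(freq_ghz: float, dk: float, df: float) -> float:
    """First-order dielectric attenuation of a stripline, in dB per inch."""
    nepers_per_m = math.pi * freq_ghz * 1e9 * math.sqrt(dk) * df / C_M_PER_S
    return nepers_per_m * NEPER_TO_DB * 0.0254


# (Dk, Df) at 10 GHz, copied from the laminate table above.
LAMINATES = {
    "Standard FR4": (4.5, 0.020),
    "Megtron 6": (3.6, 0.004),
    "Megtron 7": (3.4, 0.002),
    "Tachyon 100G": (3.6, 0.0021),
    "Rogers 4350B": (3.48, 0.0037),
}

for name, (dk, df) in LAMINATES.items():
    loss = dielectric_loss_db_per_inch(10.0, dk, df)
    print(f"{name:<14} {loss:.3f} dB/inch at 10 GHz")
```

Because the loss scales linearly with both frequency and Df, the gap between FR4 and the ultra-low-loss laminates widens further at NVLink 5.0 frequencies.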


Signal Integrity at NVLink 5.0 and PCIe Gen6 Speeds

At 200 Gb/s per lane (NVLink 5.0) and 64 GT/s per lane (PCIe Gen6 PAM4), the following SI design rules apply to Blackwell baseboards:

Differential impedance: 100 Ω ± 5% for NVLink 5.0 pairs; 85 Ω ± 5% for PCIe Gen6
Intra-pair skew: < 5 ps (NVLink 5.0); < 3 ps (PCIe Gen6 PAM4)
Via stub length: < 10 mils after backdrilling, ideally < 5 mils on NVLink 5.0 traces
Copper foil: Very-low-profile (VLP) or hyper-very-low-profile (HVLP) copper required to reduce skin-effect loss at > 10 GHz; standard electrodeposited (ED) copper is not acceptable on NVLink 5.0 signal layers
Trace width: Typically 75–100 μm (3–4 mil) on inner signal layers, with spacing ≥ 2× trace width between adjacent differential pairs
Crosstalk: Near-end crosstalk (NEXT) < −30 dB; far-end crosstalk (FEXT) < −40 dB at 10 GHz for NVLink 5.0 channels
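The intra-pair skew limits above translate into very small length-matching tolerances. As a rough sketch, skew can be converted to trace length using the stripline propagation velocity c / √Dk; the Dk value of 3.4 is an assumption chosen to match a Megtron 7 class laminate, and the function name is illustrative.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm/ps


def skew_to_length_mm(skew_ps: float, dk: float) -> float:
    """Trace length mismatch that produces the given skew in a stripline."""
    return skew_ps * C_MM_PER_PS / math.sqrt(dk)


for label, skew in (("NVLink 5.0", 5.0), ("PCIe Gen6", 3.0)):
    mm = skew_to_length_mm(skew, dk=3.4)
    print(f"{label}: {skew} ps ~ {mm:.2f} mm ({mm / 0.0254:.0f} mil) of mismatch")
```

Under these assumptions the whole skew budget is consumed by less than a millimetre of length mismatch, before any contribution from glass-weave effects or via asymmetry.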

Power Delivery Network

The B200's 1,000 W TDP and dual-die architecture require a substantially more complex PDN than Hopper:

Primary GPU core voltage (VCORE): ~0.85–0.9 V at up to 800+ A; copper plane resistance must be < 0.2 mΩ end-to-end
HBM power rails (VDDQ_HBM): Separate regulated supplies per HBM stack group, with tight noise requirements (< 5 mV ripple)
I/O and auxiliary rails: NVLink I/O, PCIe, and management functions each require isolated regulated supplies
Total rail count: B200 baseboards typically manage 15–25 distinct power rails
Target PDN impedance: < 0.1 mΩ from DC to 100 MHz at the GPU package; some designs target < 50 μΩ for transient response
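A sub-milliohm impedance target follows from the usual rule of thumb Z_target = (V_rail × allowed ripple) / transient current step. The sketch below uses a 3% ripple allowance and a 250 A load step; both are assumptions chosen for illustration, not figures from NVIDIA.

```python
def target_impedance_mohm(v_rail: float,
                          ripple_fraction: float,
                          transient_amps: float) -> float:
    """Rule-of-thumb PDN target impedance in milliohms."""
    return v_rail * ripple_fraction / transient_amps * 1000


# Assumed operating point: 0.85 V core rail, 3% ripple, 250 A load step.
z = target_impedance_mohm(v_rail=0.85, ripple_fraction=0.03, transient_amps=250)
print(f"Target impedance: {z:.3f} mOhm")
```

With these inputs the target lands at about 0.1 mΩ. A larger load step or a tighter ripple allowance pushes it toward the 50 μΩ range mentioned above.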

Thermal Management

At 1,000 W per GPU, air cooling alone cannot maintain junction temperatures within operating limits for sustained compute workloads. Blackwell server designs universally incorporate direct liquid cooling (DLC), and the PCB must accommodate this:

Cold plate mounting area: The PCB layout must reserve mounting hole patterns and keep-out zones for cold plate hardware above the SXM6 GPU module
Thermal via arrays: Dense arrays (0.4–0.5 mm pitch) of thermal vias under VRM components and near the GPU mounting zone transfer heat to internal copper planes and subsequently to the cold plate structure
Copper coin inserts: Where direct GPU-to-cold-plate contact is not achievable, copper coins embedded in the PCB provide a high-conductivity thermal path; cavity milling tolerance is typically ± 0.05 mm for reliable coin seating
Board material Tg: Glass transition temperature ≥ 180°C required; continuous operation near 1,000 W power zones elevates local PCB temperature well above ambient, and thermal cycling accelerates delamination in lower-Tg materials
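The case for liquid cooling can be illustrated with the heat balance P = ṁ · c_p · ΔT. The sketch below compares the coolant flow needed to carry away 1,000 W with a 10 K coolant temperature rise. The fluid properties are textbook values for plain water and room-temperature air, and the 10 K rise is an assumption; real DLC loops often use water-glycol mixtures and different temperature budgets.

```python
def flow_l_per_min(power_w: float, cp_j_per_kg_k: float,
                   density_kg_per_m3: float, delta_t_k: float) -> float:
    """Volumetric coolant flow needed to absorb power_w with a delta_t_k rise."""
    mass_flow_kg_s = power_w / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_s / density_kg_per_m3 * 1000 * 60  # m^3/s -> L/min


# Approximate fluid properties near room temperature (assumed values).
water = flow_l_per_min(1000, cp_j_per_kg_k=4186, density_kg_per_m3=997, delta_t_k=10)
air = flow_l_per_min(1000, cp_j_per_kg_k=1005, density_kg_per_m3=1.2, delta_t_k=10)

print(f"Water: {water:.2f} L/min per GPU")
print(f"Air:   {air:.0f} L/min per GPU ({air / water:.0f}x the volume)")
```

Under these assumptions air needs a few thousand times the volumetric flow of water for the same heat load, which is the practical reason dense 1,000 W racks move to cold plates.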

HDI and Via Technology

The SXM6 socket and companion chips (NVSwitch, PCIe retimer, power management ICs) all use fine-pitch BGA packages. Routing signal and power escape from these packages requires HDI via structures:

Laser-drilled microvias: 75–100 μm diameter, connecting adjacent layers for BGA escape; stacked and staggered configurations used depending on available layers
Via-in-pad: Vias placed directly in BGA pads (filled with conductive or non-conductive epoxy and plated over) to maximize routing density under fine-pitch packages
Sequential lamination: Build-up HDI structures (1+N+1, 2+N+2, or any-layer) require multiple press and drill cycles; B200 designs with 28–32 layers typically use 2+N+2 or 3+N+3 HDI
ELIC (Every Layer Interconnect): Required for the most complex NVSwitch boards where via density exceeds what a 2+N+2 or 3+N+3 build-up can achieve; all layers are interconnectable via stacked filled microvias

Hopper vs Blackwell: PCB Design Comparison

Design Parameter	Hopper (H100/H200)	Blackwell (B200)
Baseboard layer count	16–20	24–32+
Primary laminate	Megtron 6, Tachyon 100G	Megtron 7, Tachyon 100G
NVLink trace speed	100 Gb/s per lane (NVLink 4.0)	200 Gb/s per lane (NVLink 5.0)
Host PCIe generation	PCIe Gen5 (NRZ, 32 GT/s)	PCIe Gen6 (PAM4, 64 GT/s)
GPU TDP	700 W	1,000 W
Cooling requirement	Air or DLC	DLC mandatory
Via stub removal	Backdrilling (< 10 mil stub)	Backdrilling (< 5 mil stub) or laser via
Copper foil grade	Low-profile (LP)	Very-low-profile (VLP) or HVLP
PDN complexity	High (10–15 rails)	Very high (15–25 rails)
HDI type	1+N+1 or 2+N+2	2+N+2 or 3+N+3; ELIC for NVSwitch boards
Copper coin requirement	Optional	Common / recommended

Manufacturing Considerations

Producing PCBs for Blackwell-based AI servers is among the most demanding work in the PCB fabrication industry. The key manufacturing requirements are:

Sequential lamination: 2+N+2 or 3+N+3 HDI requires 3–4 lamination cycles; each cycle introduces thermal stress and requires tight control of dielectric thickness and layer registration
Layer registration: ± 50 μm or better across the full board area; misregistration at this layer count degrades via alignment and controlled impedance
Backdrilling accuracy: Controlled-depth drilling to within ± 50 μm; requires CNC machines with depth feedback and board-specific drill files generated from the exact as-built stackup
Laser drilling: 75–100 μm microvia diameter; CO₂ or UV laser systems; stacked via alignment within 25 μm
Copper coin integration: Cavity milling to ± 0.05 mm; coin press and bonding; post-lamination planarity < 0.1 mm across the coin area
Surface finish: ENIG (Electroless Nickel Immersion Gold) or ENEPIG for fine-pitch BGA pads; OSP acceptable on press-fit connector areas
Electrical testing: Flying probe or bed-of-nails testing at 100% for inner-layer continuity; TDR (Time Domain Reflectometry) for controlled impedance verification on representative test coupons
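The lamination-cycle count in the sequential lamination item can be read straight off the HDI notation. A small sketch, assuming one press cycle for the core plus one per symmetric build-up step (the helper name is hypothetical, and the rule does not cover any-layer/ELIC constructions):

```python
def lamination_cycles(hdi: str) -> int:
    """Estimate press cycles for a symmetric 'k+N+k' HDI build-up."""
    left, core, right = hdi.upper().replace(" ", "").split("+")
    if core != "N" or left != right:
        raise ValueError(f"expected a symmetric 'k+N+k' structure, got {hdi!r}")
    return int(left) + 1  # one core lamination plus one per build-up step


for structure in ("1+N+1", "2+N+2", "3+N+3"):
    print(f"{structure}: {lamination_cycles(structure)} lamination cycles")
```

Each extra cycle repeats the drilling, plating, and registration steps, so yield risk grows with every build-up layer.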

FAQ

What does “Blackwell” refer to in NVIDIA's naming scheme?
NVIDIA names its GPU architectures after scientists and mathematicians. Blackwell refers to David Harold Blackwell (1919–2010), an American statistician who made foundational contributions to game theory, probability theory, and mathematical statistics. He was the first African American inducted into the National Academy of Sciences.

Is the B200 a single chip or multiple chips?
The B200 uses two GB100 dies joined in a TSMC CoWoS-L package. From a software perspective, the two dies present as a single GPU. NVIDIA quotes about 10 TB/s for the die-to-die interconnect inside the package, and the link is transparent to application code.

Why does Blackwell require direct liquid cooling when Hopper supported air cooling?
The B200's 1,000 W TDP exceeds the practical limit of air cooling for sustained operation in a rack-dense AI server environment. Air cooling at 1,000 W per GPU would require airflow volumes and temperatures that are incompatible with standard data center air management. DLC removes heat more efficiently, enabling higher power density per rack.

Can existing H100 server infrastructure be upgraded to B200?
No. B200 uses the SXM6 form factor (incompatible with SXM5 sockets), requires DLC infrastructure, and demands substantially different baseboard PCB designs. A transition from H100 to B200 infrastructure is a full system replacement, not a GPU card swap.

What PCIe version does B200 use for the host CPU connection?
B200 uses PCIe Gen6 (×16), which uses PAM4 signaling to achieve 64 GT/s per lane (approximately 128 GB/s per direction, or 256 GB/s bidirectional, for a ×16 link). This is double the throughput of PCIe Gen5. In GB200 Superchip configurations, the Grace CPU connects to the B200 GPUs via NVLink-C2C instead of PCIe, providing 900 GB/s coherent bandwidth.

What is the difference between B200 and GB200?
The B200 is the GPU accelerator alone (in SXM6 form factor). The GB200 is a combined module pairing two B200 GPUs with one NVIDIA Grace ARM-based CPU, connected by the 900 GB/s NVLink-C2C chip-to-chip interconnect. The GB200 NVL72 is a complete rack-scale system using 36 GB200 modules, for 72 GPUs in total.

Need to Manufacture AI Server PCBs?

Designing for Blackwell? NextPCB supports the full PCB manufacturing stack for B200 and GB200 infrastructure—high-layer-count fabrication, Megtron 7 and low-loss laminate processing, any-layer HDI, backdrilling, copper coin integration, and complete PCBA services.


About the Author

Arya Li, Project Manager at NextPCB.com

With extensive experience in manufacturing and international client management, Arya has guided factory visits for over 200 overseas clients, providing bilingual (English & Chinese) presentations on production processes, quality control systems, and advanced manufacturing capabilities. Her deep understanding of both the factory side and client requirements allows her to deliver professional, reliable PCB solutions efficiently. Detail-oriented and service-driven, Arya is committed to being a trusted partner for clients and showcasing the strength and expertise of the factory in the global PCB and PCBA market.
