A100 vs H100: GPU Generational Leap & PCB Stack Differences Explained

Posted: June, 2026 Writer: NextPCB - S

When NVIDIA transitioned from the Ampere generation (A100) to the Hopper generation (H100), the performance numbers made headlines: roughly 3× the training throughput, 50% more NVLink bandwidth, and a jump to HBM3 memory. What received far less attention was what those improvements required from the printed circuit boards underneath.

The A100 and H100 are not separated by a minor process shrink or a clock speed bump. They represent a genuine generational leap in semiconductor architecture—one that cascades directly into board-level engineering. Layer counts increased. Material grades changed. Signal integrity rules tightened. Power delivery complexity grew. Thermal requirements pushed several cooling configurations past what air-cooling could reliably sustain.

This article examines the A100-to-H100 transition from the bottom of the PCB stack upward, explaining why engineers designing or manufacturing H100-based infrastructure cannot simply adapt A100 board designs—they must build something fundamentally different.

Table of Contents

  1. Introduction
  2. A100 and H100: A Generation Apart
  3. Architecture Comparison
  4. Full Specification Comparison: A100 vs H100
  5. NVLink 3.0 vs NVLink 4.0: Interconnect Bandwidth Doubles
  6. HBM2e vs HBM3: Memory Architecture Evolution
  7. PCIe Gen4 vs PCIe Gen5: Host Interface Upgrade
  8. Why the PCB Stack Must Change: A100 vs H100
  9. PCB Design Comparison Table
  10. What Changes on the Manufacturing Floor
  11. Infrastructure Upgrade Path: A100 to H100
  12. FAQ

A100 and H100: A Generation Apart

The NVIDIA A100 was introduced in 2020, built on TSMC's 7 nm process node (designated N7), and represented the first major AI-focused GPU architecture since Volta. It shipped in PCIe and SXM4 form factors and became the dominant AI training accelerator for most of the 2021–2023 period.

The H100 followed in 2022, built on TSMC's 4 nm class process (designated 4N), introducing the Hopper architecture with its dedicated Transformer Engine, fourth-generation Tensor Cores with FP8 support, NVLink 4.0, and HBM3 memory. The H100 ships in PCIe (Gen5) and SXM5 form factors.

Two years separate their introductions; the PCB requirements that accompany them are separated by considerably more.


Architecture Comparison

NVIDIA Ampere (A100)

The A100 is built on the GA100 die, TSMC N7, with 54.2 billion transistors across 826 mm². Key architectural features:

  • Third-Generation Tensor Cores: Support FP64, TF32, FP16, BF16, INT8, INT4 precisions; no native FP8 support
  • Multi-Instance GPU (MIG): Hardware partitioning into up to 7 independent GPU instances for multi-tenant inference
  • NVLink 3.0: 600 GB/s bidirectional bandwidth (12 links × 50 GB/s)
  • HBM2e memory: 80 GB, 2.0 TB/s bandwidth (SXM4 configuration)
  • PCIe Gen4 ×16 host interface
  • SXM4 form factor for high-bandwidth multi-GPU configurations
  • TDP: 400 W (SXM4)

NVIDIA Hopper (H100)

The H100 is built on the GH100 die, TSMC 4N, with 80 billion transistors across 814 mm² (a smaller die than A100 at higher transistor density). Key architectural features:

  • Fourth-Generation Tensor Cores with Transformer Engine: Native FP8 support; dynamic precision switching between FP8 and FP16 within a single operation; up to 4× throughput improvement for transformer model training over A100
  • Confidential Computing: Hardware-level memory encryption and secure execution environments
  • NVLink 4.0: 900 GB/s bidirectional bandwidth (18 links × 50 GB/s)
  • HBM3 memory: 80 GB, 3.35 TB/s bandwidth (SXM5); H200 variant upgrades to HBM3e, 141 GB, 4.8 TB/s
  • PCIe Gen5 ×16 host interface
  • SXM5 form factor
  • TDP: 700 W (SXM5)

Full Specification Comparison: A100 vs H100

| Specification | A100 SXM4 | H100 SXM5 | Delta |
| --- | --- | --- | --- |
| Architecture | Ampere | Hopper | |
| Process node | TSMC N7 (7 nm class) | TSMC 4N (4 nm class) | ~1.5× transistor density |
| Die size | 826 mm² | 814 mm² | Similar area, higher density |
| Transistor count | 54.2 billion | 80 billion | +48% |
| FP16 / BF16 TFLOPS (dense) | 312 | 989 | ~3.2× |
| FP8 TFLOPS (dense) | Not supported | ~2,000 | New capability |
| FP64 Tensor Core TFLOPS | 19.5 | 67 | ~3.4× |
| Memory type | HBM2e | HBM3 | +68% bandwidth |
| Memory capacity | 80 GB | 80 GB | Equal (H200: 141 GB) |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | +68% |
| NVLink generation | NVLink 3.0 | NVLink 4.0 | +50% bandwidth |
| NVLink bandwidth | 600 GB/s bidirectional | 900 GB/s bidirectional | +50% |
| NVLink links | 12 links | 18 links | +50% |
| PCIe generation | PCIe Gen4 ×16 | PCIe Gen5 ×16 | 2× per-lane throughput |
| PCIe bandwidth (bidirectional) | ~64 GB/s | ~128 GB/s | 2× |
| TDP (SXM) | 400 W | 700 W | +75% |
| Form factor | SXM4 | SXM5 | Incompatible sockets |
| Cooling (DGX config) | Air or DLC | Air or DLC | DLC preferred at 700 W |
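
The Delta column can be checked directly against the two value columns. The Python sketch below recomputes a few of the ratios from the figures quoted in the table; the dictionary keys are names chosen for this illustration, not NVIDIA terminology.

```python
# Values copied from the specification table above.
a100 = {"transistors_b": 54.2, "die_mm2": 826, "fp16_tflops": 312,
        "mem_bw_tbs": 2.0, "nvlink_gbs": 600, "tdp_w": 400}
h100 = {"transistors_b": 80.0, "die_mm2": 814, "fp16_tflops": 989,
        "mem_bw_tbs": 3.35, "nvlink_gbs": 900, "tdp_w": 700}


def ratio(key: str) -> float:
    """Return the H100/A100 ratio for one table row."""
    return h100[key] / a100[key]


def density(gpu: dict) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return gpu["transistors_b"] * 1000 / gpu["die_mm2"]


for key in a100:
    print(f"{key:15s} x{ratio(key):.2f}")
print(f"{'density':15s} x{density(h100) / density(a100):.2f}")
```

On these figures the transistor density rises by about 1.5×, and dense FP16 throughput grows faster than TDP, which is the performance-per-watt improvement discussed in the FAQ.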

NVLink 3.0 vs NVLink 4.0: Interconnect Bandwidth Doubles

NVLink 3.0 in the A100 provides 600 GB/s total bidirectional bandwidth across 12 links, with each link carrying 50 GB/s. NVLink 4.0 in the H100 increases total bandwidth to 900 GB/s across 18 links. That 50% increase in aggregate bandwidth comes from adding six links, not from raising per-link bandwidth.

For PCB designers, the per-lane signaling rate of NVLink 4.0 is the critical parameter, not just the aggregate bandwidth number. NVLink 4.0 operates at 100 Gb/s per lane using PAM4 signaling (two bits per symbol, so a symbol rate of roughly 50 GBd and a Nyquist frequency near 25 GHz), compared to NVLink 3.0's ~50 Gb/s. This per-lane speed doubling is what demands different PCB materials and tighter signal integrity rules: the board must carry the faster lanes with adequate margin.

The 18 links of NVLink 4.0 also require more PCB routing real estate than the 12 links of NVLink 3.0. In a DGX H100 baseboard routing all-to-all connections between 8 GPUs via 4 NVSwitch chips, the total number of NVLink differential pairs to be routed increases substantially, driving higher layer counts to avoid unacceptable crosstalk between parallel traces.
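
As a rough illustration of the routing load, the sketch below works out the aggregate per-GPU bandwidth and the number of GPU-side NVLink links that terminate on one 8-GPU baseboard, using only the link counts and per-link bandwidth quoted above. The function names are invented for this example, and the count is of links, not of individual differential pairs.

```python
PER_LINK_GBS = 50   # bidirectional GB/s per link, as quoted for both generations
GPUS_PER_BOARD = 8  # DGX-style baseboard, as described in the text


def aggregate_gbs(links_per_gpu: int) -> int:
    """Aggregate bidirectional NVLink bandwidth per GPU in GB/s."""
    return links_per_gpu * PER_LINK_GBS


def board_links(links_per_gpu: int) -> int:
    """Total GPU-side NVLink links that terminate on one baseboard."""
    return links_per_gpu * GPUS_PER_BOARD


for name, links in (("NVLink 3.0 (A100)", 12), ("NVLink 4.0 (H100)", 18)):
    print(f"{name}: {aggregate_gbs(links)} GB/s per GPU, "
          f"{board_links(links)} links per 8-GPU board")
```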


HBM2e vs HBM3: Memory Architecture Evolution

Both A100 and H100 (base) ship with 80 GB of on-package HBM, but the memory technology differs significantly:

| Parameter | HBM2e (A100 SXM4) | HBM3 (H100 SXM5) | Delta |
| --- | --- | --- | --- |
| Total bandwidth | 2.0 TB/s | 3.35 TB/s | +68% |
| Per-pin data rate | 3.6 Gb/s | 6.4 Gb/s | +78% |
| Bus width per stack | 1,024 bits | 1,024 bits | Equal |
| Stack height (max) | 8 Hi | 12 Hi | +50% capacity per stack |
| Voltage | 1.2 V | 1.1 V | Lower power per bit |

From a PCB design standpoint, HBM signals are routed entirely within the CoWoS package substrate (or, in earlier A100 designs, within the SXM module itself), and do not appear on the baseboard PCB as routable signals. The PCB must, however, supply the regulated power rails that feed HBM, and the tighter voltage tolerances of HBM3 (1.1 V ± 30 mV, versus HBM2e at 1.2 V ± 40 mV) translate to tighter noise and ripple budgets on the VDDQ power planes of the H100 baseboard.
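
The per-pin rates in the table read as ceilings for each memory standard, not as the rates the shipping parts necessarily run at. A minimal sketch, assuming five active 1,024-bit stacks on both 80 GB parts (an assumption not stated in this article), back-calculates the per-pin rate implied by the quoted total bandwidth:

```python
ACTIVE_STACKS = 5  # assumption: five active HBM stacks on both 80 GB parts


def implied_pin_rate_gbps(total_bw_tbs: float, stacks: int = ACTIVE_STACKS,
                          bus_bits_per_stack: int = 1024) -> float:
    """Per-pin data rate in Gb/s implied by a total bandwidth in TB/s."""
    total_bits_per_s = total_bw_tbs * 1e12 * 8
    pins = stacks * bus_bits_per_stack
    return total_bits_per_s / pins / 1e9


a100_rate = implied_pin_rate_gbps(2.0)
h100_rate = implied_pin_rate_gbps(3.35)
print(f"A100 HBM2e implied rate: {a100_rate:.2f} Gb/s per pin")
print(f"H100 HBM3 implied rate:  {h100_rate:.2f} Gb/s per pin")
```

Under that assumption both implied rates sit below the 3.6 Gb/s and 6.4 Gb/s figures in the table, which is consistent with treating those figures as upper limits.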


PCIe Gen4 vs PCIe Gen5: Host Interface Upgrade

The A100 uses PCIe Gen4 ×16 for its host CPU interface, providing approximately 64 GB/s of bandwidth. The H100 moves to PCIe Gen5 ×16, doubling this to approximately 128 GB/s.

PCIe Gen5 runs at 32 GT/s per lane using NRZ encoding—double the 16 GT/s of Gen4. The Nyquist frequency of a Gen5 lane is 16 GHz, compared to 8 GHz for Gen4. This frequency doubling has a directly measurable impact on PCB channel requirements:

  • The end-to-end insertion loss budget is roughly 36 dB at 16 GHz for Gen5, against roughly 28 dB at 8 GHz for Gen4; because the loss of a given channel roughly doubles between those frequencies, the modest increase in budget leaves far less margin per unit of trace length
  • Via stub resonance effects that were minor at Gen4 frequencies become significant at Gen5, necessitating backdrilling on any through-hole via carrying PCIe Gen5 signals
  • Dielectric loss at 16 GHz is approximately 2× higher on a given material than at 8 GHz, since it scales roughly linearly with frequency; materials that met Gen4 requirements may not meet Gen5 requirements at equivalent trace lengths

The practical consequence: a PCIe channel routed on standard low-loss laminate that passed at Gen4 may exceed the Gen5 insertion loss budget if the same material and routing geometry are carried over to an H100 baseboard. PCIe Gen5 signal layers on H100 boards therefore need a lower-loss laminate, shorter traces, or both.
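
The Nyquist and bandwidth figures in this section follow from the per-lane transfer rate. The sketch below assumes NRZ signaling and 128b/130b line encoding for both generations, and ignores packet and protocol overhead; the function names are chosen for this example.

```python
def nyquist_ghz(rate_gtps: float) -> float:
    """Nyquist frequency for NRZ signaling: half the transfer rate."""
    return rate_gtps / 2


def x16_throughput_gbs(rate_gtps: float, lanes: int = 16) -> float:
    """Per-direction bandwidth in GB/s after 128b/130b encoding."""
    return rate_gtps * lanes * (128 / 130) / 8


for name, rate in (("Gen4", 16.0), ("Gen5", 32.0)):
    one_way = x16_throughput_gbs(rate)
    print(f"{name}: Nyquist {nyquist_ghz(rate):.0f} GHz, "
          f"{one_way:.1f} GB/s per direction, "
          f"{2 * one_way:.1f} GB/s bidirectional")
```

The bidirectional results land slightly under the rounded ~64 GB/s and ~128 GB/s figures because of the encoding overhead.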


Why the PCB Stack Must Change: A100 vs H100

Layer Count

A100 SXM4 baseboards in DGX A100 configurations typically use 14–18 PCB layers. H100 SXM5 baseboards in DGX H100 configurations typically require 16–20 layers, with some designs reaching 24 layers in configurations that integrate NVSwitch routing directly on the baseboard rather than on a separate switch board.

The layer count increase is driven by three factors acting simultaneously:

  1. More NVLink differential pairs: 18 links (H100) vs 12 links (A100) per GPU, all of which must be routed as controlled-impedance differential pairs with adequate spacing to meet crosstalk budgets; additional layers provide routing channels without crowding
  2. Stricter return path requirements: At NVLink 4.0 frequencies, every signal layer must have a solid, unbroken reference plane immediately adjacent; this forces more dedicated plane layers and reduces layer sharing between signal and reference functions
  3. Expanded power rail count: H100's higher performance and additional I/O functions require more distinct power domains, each needing dedicated or shared power planes with adequate copper area

Laminate Materials

The A100 baseboard operates with NVLink 3.0 at ~50 Gb/s per lane. Panasonic Megtron 6 (Df ~0.004 at 10 GHz) is broadly suitable for NVLink 3.0 signal routing at typical trace lengths of 10–20 cm on the baseboard.

The H100 baseboard must support NVLink 4.0 at 100 Gb/s per lane. At this speed, the channel insertion loss budget from GPU pad to NVSwitch pad becomes much tighter. Megtron 6 remains usable on some layers, but the NVLink 4.0 signal routing layers typically require Megtron 6E, Isola Tachyon 100G, or equivalent materials with Df in the 0.002–0.003 range at 10 GHz.

| Layer Function | A100 Baseboard Material | H100 Baseboard Material |
| --- | --- | --- |
| NVLink signal layers | Megtron 6 (Df ~0.004) | Megtron 6E / Tachyon 100G (Df ~0.002–0.003) |
| PCIe signal layers | Megtron 6 | Megtron 6E or better |
| Power and ground planes | Megtron 6 or standard | Megtron 6 or standard |
| Copper foil grade | Low-profile (LP) | Very-low-profile (VLP) on NVLink 4.0 layers |

Smoother copper foil (VLP vs LP) reduces the roughness-induced part of conductor loss, which grows as the skin depth shrinks at high frequencies. At NVLink 3.0 speeds, the difference between LP and VLP copper is small enough to fit within the loss budget. At NVLink 4.0 speeds (100 Gb/s per lane), the additional loss of LP copper can consume enough of the insertion loss budget to decide whether a channel passes or fails.
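
To put rough numbers on the laminate choice, the sketch below estimates dielectric loss alone using the standard low-loss approximation α ≈ π·f·Df·√Dk / c. It is a simplification under stated assumptions: a Dk of 3.4 for both materials (an illustrative value, not a datasheet figure), Df held constant over frequency, and no conductor or roughness loss, so real channel loss will be higher.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum
ASSUMED_DK = 3.4           # assumption: same dielectric constant for both laminates


def dielectric_loss_db(freq_ghz: float, df: float, length_cm: float,
                       dk: float = ASSUMED_DK) -> float:
    """Dielectric-only insertion loss in dB for a uniform trace."""
    alpha_np_per_m = math.pi * freq_ghz * 1e9 * df * math.sqrt(dk) / C_M_PER_S
    return alpha_np_per_m * 8.686 * (length_cm / 100)


TRACE_CM = 20  # upper end of the 10-20 cm range quoted in the text
for label, df, f_ghz in (
    ("Df 0.004 at 12.5 GHz", 0.004, 12.5),
    ("Df 0.004 at 25 GHz", 0.004, 25.0),
    ("Df 0.0025 at 25 GHz", 0.0025, 25.0),
):
    print(f"{label}: {dielectric_loss_db(f_ghz, df, TRACE_CM):.2f} dB")
```

In this model, doubling the frequency doubles the dielectric loss on a given material, and moving from Df 0.004 to Df 0.0025 recovers a little over a third of it.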

Signal Integrity Requirements

The transition from NVLink 3.0 (A100) to NVLink 4.0 (H100) tightens every signal integrity specification:

| SI Parameter | A100 / NVLink 3.0 | H100 / NVLink 4.0 |
| --- | --- | --- |
| Per-lane signaling rate | ~50 Gb/s | 100 Gb/s |
| Nyquist frequency | ~12.5 GHz | ~25 GHz |
| Differential impedance target | 100 Ω ± 10% | 100 Ω ± 5% |
| Intra-pair skew budget | < 10 ps | < 5 ps |
| Via stub tolerance | < 20 mils (backdrilling recommended) | < 10 mils (backdrilling required) |
| Near-end crosstalk (NEXT) | < −25 dB at 12 GHz | < −30 dB at 25 GHz |
| Far-end crosstalk (FEXT) | < −35 dB at 12 GHz | < −40 dB at 25 GHz |

The tightening of impedance tolerance from ±10% to ±5% has direct manufacturing implications: etching uniformity, dielectric thickness control, and registration accuracy all contribute to impedance variation, and the tighter spec requires closer process control throughout fabrication.
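
The skew budgets in the table can be translated into a physical length mismatch between the two traces of a differential pair. The sketch below assumes an effective dielectric constant of 3.4, which is an illustrative value; the actual figure depends on the laminate and trace geometry.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def skew_to_length_mm(skew_ps: float, dk_eff: float) -> float:
    """Trace length mismatch (mm) that produces the given skew."""
    velocity_mm_per_ps = C_MM_PER_PS / math.sqrt(dk_eff)
    return skew_ps * velocity_mm_per_ps


ASSUMED_DK_EFF = 3.4  # assumption for illustration only
for label, budget_ps in (("NVLink 3.0", 10.0), ("NVLink 4.0", 5.0)):
    mismatch = skew_to_length_mm(budget_ps, ASSUMED_DK_EFF)
    print(f"{label}: {budget_ps:.0f} ps is about {mismatch:.2f} mm of mismatch")
```

Under this assumption the 5 ps budget corresponds to well under a millimetre of mismatch, and that allowance must also absorb effects such as fiber-weave skew, not only routed length differences.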

Power Delivery Network

The 75% increase in TDP from A100 (400 W) to H100 (700 W) per GPU is the most straightforward power delivery challenge. But the H100's PDN requirements go beyond scaling the A100 design for higher current:

  • Higher current on VCORE: H100 VCORE current can exceed 500 A per GPU; A100 was in the 300–350 A range. Copper plane resistance becomes more significant at higher current, and plane thickness (2–3 oz copper) must be maintained or increased
  • More power rails: H100's expanded I/O (18 NVLink links vs 12, PCIe Gen5 vs Gen4, additional management functions) creates more distinct power domains requiring separate regulation
  • Tighter HBM power rail tolerances: As noted above, HBM3 VDDQ tolerances are tighter than HBM2e, requiring lower-noise VRM designs and more aggressive high-frequency decoupling near the GPU package
  • Target PDN impedance: H100 designs target < 0.15 mΩ from DC to 100 MHz at the GPU package; A100 designs operated with slightly more relaxed targets (~0.2 mΩ)
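
A common first-order way to arrive at a PDN target is Z = (Vdd × allowed ripple) / transient current step. The sketch below applies that rule with assumed inputs that do not come from this article or from NVIDIA: a 0.8 V core rail, 5% allowed ripple, and a load step equal to half the maximum VCORE current. It also shows how plane I²R loss scales for a hypothetical 0.1 mΩ path resistance.

```python
def target_impedance_mohm(vdd_v: float, ripple_fraction: float,
                          transient_current_a: float) -> float:
    """First-order PDN target impedance in milliohms."""
    return vdd_v * ripple_fraction / transient_current_a * 1000


def plane_loss_w(current_a: float, resistance_mohm: float) -> float:
    """Resistive (I^2 * R) loss in watts for a DC current through a plane."""
    return current_a ** 2 * resistance_mohm / 1000


VDD_V = 0.8    # assumed core voltage
RIPPLE = 0.05  # assumed 5% ripple allowance
for name, max_current_a in (("A100", 350.0), ("H100", 500.0)):
    step_a = max_current_a / 2  # assumed load step: half of maximum current
    z = target_impedance_mohm(VDD_V, RIPPLE, step_a)
    loss = plane_loss_w(max_current_a, 0.1)  # hypothetical 0.1 mOhm path
    print(f"{name}: target ~{z:.2f} mOhm, plane loss ~{loss:.1f} W")
```

With these assumed inputs the results fall in the same range as the targets quoted above, and the plane loss roughly doubles between 350 A and 500 A because it grows with the square of the current.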

Thermal Management

The TDP increase from 400 W to 700 W per GPU—a 75% increase—changes the thermal management calculus at the board level:

  • Thermal via density: More heat must be moved from the GPU mounting zone through the PCB to the cold plate or heat spreader. Thermal via arrays (0.4–0.6 mm pitch) must cover a larger area and/or use smaller pitch compared to A100 designs
  • VRM placement and heat spreading: H100 VRMs dissipate more power than A100 VRMs; VRM junction temperatures must be managed through a combination of thermal vias, copper spreading layers, and adequate airflow or liquid cooling to the VRM heatsinks
  • Board Tg requirements: The higher sustained power density near GPU mounting areas elevates local PCB temperatures; Tg ≥ 170°C is the practical minimum, with ≥ 180°C preferred
  • Cooling configuration shift: While DGX A100 supported both air-cooled and liquid-cooled configurations, the DGX H100 at 10.2 kW total system power is more commonly deployed with direct liquid cooling, and PCB layouts must accommodate cold plate mounting hardware
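
The effect of thermal via pitch can be estimated by treating each plated barrel as a copper cylinder shell and the array as parallel thermal resistors. This is a deliberately crude sketch: the 3 mm board thickness, 0.25 mm drill, 25 μm plating, 50 mm square array, and copper conductivity of 390 W/(m·K) are all assumptions for illustration, and spreading resistance, via fill, and interface materials are ignored.

```python
import math

K_COPPER_W_PER_MK = 390.0  # assumed thermal conductivity of plated copper


def via_resistance_k_per_w(board_mm: float, drill_mm: float,
                           plating_um: float) -> float:
    """Thermal resistance of one plated via barrel, conduction only."""
    r_outer_m = drill_mm / 2 / 1000
    r_inner_m = r_outer_m - plating_um * 1e-6
    area_m2 = math.pi * (r_outer_m ** 2 - r_inner_m ** 2)
    return (board_mm / 1000) / (K_COPPER_W_PER_MK * area_m2)


def via_count(side_um: int, pitch_um: int) -> int:
    """Number of vias in a square grid of the given side and pitch."""
    return (side_um // pitch_um) ** 2


single = via_resistance_k_per_w(board_mm=3.0, drill_mm=0.25, plating_um=25.0)
for pitch_um in (600, 400):
    n = via_count(50_000, pitch_um)
    print(f"pitch {pitch_um / 1000:.1f} mm: {n} vias, "
          f"array resistance ~{single / n:.3f} K/W")
```

Going from 0.6 mm to 0.4 mm pitch more than doubles the via count over the same area, which cuts the array's conduction resistance by the same factor in this model.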

Via Technology and HDI

The SXM5 socket (H100) has a higher pin count and finer pitch than SXM4 (A100), increasing BGA escape routing complexity. H100 baseboard designs more extensively use:

  • Via-in-pad plated over (VIPPO): Vias placed within BGA pads are filled with conductive or non-conductive epoxy and cap plated; this lets signals escape directly under the SXM5 footprint instead of being forced out through the peripheral rows only
  • Backdrilling on all through-hole vias carrying NVLink 4.0 or PCIe Gen5 signals: Stub removal is non-negotiable at these speeds; A100 designs could in some cases accept the stub resonance effects at NVLink 3.0 / PCIe Gen4 frequencies, but H100 designs cannot
  • Higher HDI build-up ratio: Some H100 designs move from 1+N+1 HDI (one build-up layer on each side) to 2+N+2 (two build-up layers per side) to provide the routing density needed for fine-pitch SXM5 escape and NVLink 4.0 breakout
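
The need for backdrilling can be illustrated with the quarter-wave resonance of an unterminated via stub, f ≈ c / (4·L·√Dk). The sketch below compares a hypothetical 100 mil stub, as might be left on a thick board without backdrilling, with the 20 mil and 10 mil residual stub limits from the signal integrity table. The Dk of 3.4 is an assumed value, and real resonances fall lower than this ideal estimate because of pad and barrel capacitance.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum
METERS_PER_MIL = 25.4e-6


def stub_resonance_ghz(stub_mils: float, dk: float) -> float:
    """Ideal quarter-wave resonant frequency of an open via stub in GHz."""
    stub_m = stub_mils * METERS_PER_MIL
    return C_M_PER_S / (4 * stub_m * math.sqrt(dk)) / 1e9


ASSUMED_DK = 3.4  # assumption for illustration only
for stub_mils in (100, 20, 10):
    f_res = stub_resonance_ghz(stub_mils, ASSUMED_DK)
    print(f"{stub_mils:>3} mil stub: resonance near {f_res:.0f} GHz")
```

In this idealized model the long stub resonates inside the band used by 100 Gb/s lanes, while the backdrilled residual stubs push the resonance far above it.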

PCB Design Comparison Table

| PCB Design Parameter | A100 SXM4 Baseboard | H100 SXM5 Baseboard |
| --- | --- | --- |
| Typical layer count | 14–18 | 16–24 |
| NVLink signal layers laminate | Megtron 6 (Df ~0.004) | Megtron 6E / Tachyon 100G (Df ~0.002–0.003) |
| Copper foil (NVLink layers) | Low-profile (LP) | Very-low-profile (VLP) |
| Differential impedance tolerance | 100 Ω ± 10% | 100 Ω ± 5% |
| Via backdrilling | Recommended on NVLink layers | Required on NVLink 4.0 and PCIe Gen5 layers |
| HDI type | 1+N+1 typical | 1+N+1 to 2+N+2; via-in-pad standard |
| GPU TDP | 400 W | 700 W |
| VCORE current per GPU | ~300–350 A | ~500 A+ |
| PDN target impedance | ~0.2 mΩ DC–100 MHz | < 0.15 mΩ DC–100 MHz |
| Thermal via pitch | 0.6–0.8 mm | 0.4–0.6 mm |
| Board material Tg | ≥ 150°C | ≥ 170–180°C |
| Cooling (DGX config) | Air or DLC | DLC strongly preferred |

What Changes on the Manufacturing Floor

The PCB design differences described above translate into concrete manufacturing process changes when transitioning from A100 to H100 baseboard production:

  • Material handling: Megtron 6E, Tachyon 100G, and similar ultra-low-loss laminates require different storage conditions (temperature and humidity controlled), handling protocols to prevent surface contamination, and modified press cycles compared to standard Megtron 6. Not all PCB fabricators are qualified on these materials.
  • Controlled-depth backdrilling: A100 designs that recommended but did not strictly require backdrilling become H100 designs that cannot pass signal integrity validation without it. Backdrilling requires dedicated CNC equipment with depth feedback, board-specific drill files derived from the as-built stackup measurement, and 100% depth verification on production panels.
  • Via-in-pad processing: VIPPO at the scale required for SXM5 BGA escape involves epoxy fill of every affected via, cure, planarization grinding to within 10 μm of the pad surface, and cap plating. This adds multiple process steps and requires precise process control to avoid voids or dimples that would compromise solder joint integrity.
  • Impedance control tolerance: Tightening the impedance tolerance from ±10% to ±5% requires closer control of etching uniformity (line width variation < ± 5 μm), dielectric thickness (variation < ± 3%), and layer registration. These tolerances drive selection of equipment and suppliers within the fabrication process.
  • Thermal testing: H100 boards operating at 700 W per GPU undergo more significant thermal cycling in service than A100 boards. Thermal shock testing (IPC-TM-650 2.6.7.2) and interconnect stress testing protocols are typically required at qualification, and ongoing reliability monitoring is more extensive.
  • BGA assembly: SXM5 has a larger and finer-pitch BGA than SXM4. Solder paste printing aperture design, reflow profile optimization for the larger thermal mass, and post-reflow X-ray inspection requirements all increase in complexity. Rework, if needed, is correspondingly more difficult.
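
The yield effect of the tighter impedance tolerance can be illustrated with a simple pass/fail check on test coupon measurements. The readings below are invented for this example and do not represent any real production lot; the function names are likewise hypothetical.

```python
def within_tolerance(measured_ohm: float, nominal_ohm: float = 100.0,
                     tol_pct: float = 5.0) -> bool:
    """True if a measured impedance falls inside nominal +/- tol_pct."""
    return abs(measured_ohm - nominal_ohm) <= nominal_ohm * tol_pct / 100


def pass_rate(readings: list, tol_pct: float) -> float:
    """Fraction of coupon readings that meet the given tolerance."""
    passed = sum(within_tolerance(r, tol_pct=tol_pct) for r in readings)
    return passed / len(readings)


# Hypothetical differential impedance readings in ohms, for illustration only.
coupon_readings = [96.2, 103.8, 106.1, 93.5, 100.4]

print(f"pass rate at +/-10%: {pass_rate(coupon_readings, 10.0):.0%}")
print(f"pass rate at +/-5%:  {pass_rate(coupon_readings, 5.0):.0%}")
```

The same set of readings that passes comfortably at ±10% loses two of five coupons at ±5%, which is why the tighter specification pushes process control upstream into etching and lamination.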

Infrastructure Upgrade Path: A100 to H100

Organizations running A100-based infrastructure and planning an H100 upgrade should understand the hardware compatibility boundaries:

| Component | A100 hardware reusable with H100? | Notes |
| --- | --- | --- |
| Server chassis | No (in most cases) | SXM4 and SXM5 sockets are physically incompatible; DGX H100 is a new chassis design |
| GPU baseboard PCB | No | Completely different design; SXM5 socket, NVLink 4.0 routing, PCIe Gen5, higher power delivery |
| Host CPU / motherboard | Partial | H100 is designed for a PCIe Gen5 host; Gen4 CPUs are technically functional but limit PCIe bandwidth |
| Power supply units | No (for DGX) | H100 DGX at 10.2 kW requires higher-capacity PSUs than A100 DGX at ~6.5 kW |
| Cooling infrastructure | Partial | Existing DLC loops can be reused if flow capacity is sufficient; new cold plates required for SXM5 |
| Network switches (InfiniBand) | Yes | ConnectX-7 NICs work with an existing InfiniBand fabric; full NDR bandwidth requires NDR switches, and an older HDR fabric limits links to HDR rates |
| Software stack (CUDA, drivers) | Yes | H100 is fully backward-compatible with A100 CUDA code; driver update required |

The practical conclusion: transitioning from A100 to H100 is a server-level replacement, not a component upgrade. The GPU baseboard, chassis, and power delivery infrastructure must all change. Cooling infrastructure may be partially reused if it has adequate capacity for the higher heat load. The software stack is portable.


FAQ

Is the A100 still worth deploying in 2026?
Yes, in specific contexts. The A100 remains cost-effective for fine-tuning workloads in the 7B–30B parameter range, for multi-tenant inference using MIG partitioning, and for organizations with tightly constrained budgets where the lower acquisition and infrastructure cost of A100-based systems outweighs the performance deficit relative to H100. The A100 is a mature, well-supported platform with extensive ecosystem tooling.

Can an H100 GPU be installed in an A100 server chassis?
No. The SXM5 and SXM4 sockets are physically incompatible. An H100 cannot be installed in a DGX A100 chassis without replacing the GPU baseboard PCB, which requires a chassis redesign in practice. H100 requires a purpose-built SXM5 baseboard.

How does the H100 deliver roughly 3× the training performance with only a 75% higher TDP?
The performance improvement is architecture-driven, not just power-scaling. The Transformer Engine and FP8 support in Hopper deliver step-change improvements for transformer model training that are not available on Ampere at any power level. The 75% TDP increase reflects a denser, more capable GPU die (80B vs 54B transistors) operating at higher throughput—the performance/watt ratio improved significantly from A100 to H100.

What is the main PCB material change between A100 and H100 baseboard designs?
The most critical material change is on the NVLink signal routing layers. A100 baseboards can use Panasonic Megtron 6 (Df ~0.004) for NVLink 3.0 signals. H100 NVLink 4.0 signal layers require lower-loss materials—typically Megtron 6E or Isola Tachyon 100G (Df ~0.002–0.003)—because the higher per-lane signaling rate (100 Gb/s vs ~50 Gb/s) more than doubles the dielectric loss contribution over the same trace length.

Is backdrilling required for A100 or just recommended?
For A100 NVLink 3.0 traces, backdrilling is strongly recommended but not universally required—some board designs achieve adequate signal margins without it by controlling trace lengths and via stub depths. For H100 NVLink 4.0 traces, backdrilling is required; the via stub resonance at 25 GHz (NVLink 4.0 Nyquist) falls within the signal band and degrades channel performance below spec without stub removal.

Does H100 use a different NVSwitch chip than A100?
Yes. A100 uses NVSwitch 2.0 (with NVLink 3.0). H100 uses NVSwitch 3.0 (with NVLink 4.0), which also doubles the switch chip's total bandwidth. The NVSwitch 3.0 is a larger, higher-power chip that imposes its own PCB routing and power delivery requirements on the baseboard or switch board that carries it.


Need to Manufacture AI Server PCBs?

Whether you are producing A100-generation boards for continued deployment or designing new H100-based infrastructure, NextPCB provides the high-layer-count fabrication, low-loss laminate processing, HDI, backdrilling, and PCBA capabilities that AI server boards demand.


