Building an AI GPU Cluster: Hardware Overview from PCB to Rack

Posted: June, 2026 Last Updated: June, 2026 Writer: Arya Li

Introduction

Building an AI GPU cluster is not a single engineering problem—it is five simultaneous engineering problems that must be solved consistently across silicon, PCB, chassis, rack, and network layers. A GPU that delivers 4,500 TFLOPS on a test bench produces far less useful AI throughput if the PCB baseboard cannot sustain its 1,000 W thermal envelope, the rack power distribution cannot deliver 48 V at the required current, the cooling infrastructure cannot remove heat fast enough to prevent throttling, or the network fabric cannot move gradient data between nodes faster than the GPUs can compute.

This article provides a hardware engineer's overview of what it takes to build a GPU cluster for AI training—from the PCB layer upward through server node, rack, and network fabric. It is written for engineers who need to understand the full system, not just the GPU specifications, and for procurement and program teams who need to understand why each component in the bill of materials exists and what happens when it is undersized.

Table of Contents

Introduction
The Five Hardware Layers of a GPU Cluster
Layer 1: GPU and Accelerator Selection
Layer 2: The AI Server Node
Layer 3: The PCB Stack — Baseboard, Motherboard, and NICs
Layer 4: Rack Design and Physical Infrastructure
Layer 5: Cluster Fabric — InfiniBand and Ethernet
Power Infrastructure Planning
Cooling Strategy: Air vs Liquid
Cluster Sizing: From 8 GPUs to 10,000
PCB Manufacturing Requirements for Cluster-Scale Programs
Build vs Buy: ODM, OEM, and Hyperscale Approaches
FAQ

The Five Hardware Layers of a GPU Cluster

A GPU cluster can be understood as five nested hardware layers, each defined by the primary interconnect technology that binds it together:

Layer	Scope	Primary Interconnect	Key PCB Types
1. Accelerator	GPU die + HBM on package	CoWoS interposer / HBM microbumps	GPU package substrate
2. Server node	8–16 GPUs + CPU + PSUs	NVLink 4.0/5.0 (GPU↔GPU); PCIe Gen5 (CPU↔GPU)	GPU baseboard, CPU motherboard
3. Rack	4–36 nodes + NVSwitch + networking	NVLink 5.0 switch fabric (NVL72); InfiniBand (DGX H100)	NVSwitch boards, ToR switch line cards
4. Cluster	Hundreds to thousands of racks	InfiniBand NDR/XDR; RoCE Ethernet	Spine/leaf switch line cards, NIC PCBs
5. Storage & management	Parallel file system, monitoring	InfiniBand / Ethernet storage fabric	Storage controller PCBs, management NICs

Each layer imposes requirements on the layers below it. Choosing NVIDIA B200 GPUs at Layer 1 immediately determines that the Layer 2 server node must use SXM6 sockets, liquid cooling, and 48 V bus power; that the Layer 3 rack must include NVSwitch 4.0 boards with any-layer HDI PCB technology; and that the Layer 4 cluster network must be sized for the ~130 TB/s aggregate NVLink bandwidth that each NVL72 rack makes available. Understanding these dependencies prevents the common mistake of specifying GPUs independently of infrastructure and discovering incompatibilities during procurement.

Layer 1: GPU and Accelerator Selection

The GPU generation drives every downstream hardware decision. The three active GPU platforms for AI cluster builds in 2026 are:

NVIDIA H100 SXM5 / H200 SXM5: The H100 (700 W, 80 GB HBM3, NVLink 4.0) and its H200 variant (700 W, 141 GB HBM3e, NVLink 4.0) are the incumbent standard. They use SXM5 sockets, 12 V bus power, and support both air and direct liquid cooling. The H100/H200 PCB baseboard requires 20–24 layers with Megtron 6E laminate on NVLink 4.0 signal layers. For the complete architectural comparison between H100 and the preceding A100 generation, see A100 vs H100: PCB Stack Differences Explained.

NVIDIA B200 SXM6 (GB200 NVL72): The current frontier platform (1,000 W, 192 GB HBM3e, NVLink 5.0). Requires SXM6 sockets, mandatory liquid cooling, 48 V bus power, and the most demanding PCB specifications in commercial production—24–32 layers for compute tray baseboards, 32–40+ layers for NVSwitch 4.0 boards. See NVIDIA Blackwell Architecture Explained for the full B200 design requirements.

AMD MI300X OAM: The primary non-NVIDIA option (750 W, 192 GB HBM3, Infinity Fabric). Uses the OAM form factor and OCP-standard UBB, enabling open infrastructure design. OAM UBBs require 16–22 layers with PCIe Gen5 and Infinity Fabric routing. The MI300X trade-offs versus H100 are detailed at H100 vs MI300X: NVIDIA vs AMD in the AI Accelerator War.

The selection between these platforms should be made before any PCB or infrastructure design work begins, because the choice determines socket type, form factor, power bus voltage, cooling requirement, and interconnect topology—none of which are easily changed after hardware procurement.

Layer 2: The AI Server Node

The AI server node is the fundamental deployable unit. It combines the GPU baseboard with a host CPU subsystem, power supplies, cooling hardware, storage, and network interfaces in a single chassis. A standard 8-GPU AI server node for H100 contains:

GPU baseboard: 8 × H100 SXM5 + 4 × NVSwitch 3.0; the most complex PCB in the system
CPU motherboard: 2 × AMD EPYC or Intel Xeon CPUs; 512 GB–2 TB DDR5; PCIe Gen5 connections to GPU baseboard
Power supplies: 4–8 redundant PSUs; 12 V bus at approximately 10.2 kW total for H100; 48 V bus at approximately 6.0 kW GPU power (8 × 750 W) for MI300X OAM
Network interfaces: 8 × 400G InfiniBand NICs (ConnectX-7) for inter-node fabric; 1 × 1G management NIC for BMC
Storage: 4–8 × NVMe SSD for local dataset and checkpoint caching
Cooling: Direct liquid cooling cold plates on each GPU; liquid manifold integrated with chassis

The chassis form factor (2U, 4U, 8U, or 10U) determines rack density. DGX H100 nodes are 10U; a 42U rack holds 4 nodes. Higher-density chassis designs (NVIDIA HGX H100 in 4U configurations) achieve 8 nodes per 42U rack but require higher cooling capacity per rack unit. The chassis mechanical design constrains the PCB dimensions: board width, height, mounting hole locations, and maximum component height are all defined by the chassis specification before the PCB designer begins layout.
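The rack-density arithmetic above can be sketched in a few lines. This is a minimal illustration, not a planning tool: the function name `nodes_per_rack` and the `reserved_units` parameter are hypothetical, and the 10U reserved for networking in the second call is an assumption borrowed from the HGX rack example later in this article.

```python
def nodes_per_rack(rack_units: int, node_units: int, reserved_units: int = 0) -> int:
    """Whole nodes that fit in a rack after reserving space for switches and PDUs."""
    if node_units <= 0:
        raise ValueError("node_units must be positive")
    usable_units = max(rack_units - reserved_units, 0)
    return usable_units // node_units


# 10U chassis in a 42U rack, nothing reserved.
dgx_nodes = nodes_per_rack(42, 10)
# 4U chassis in a 42U rack with 10U held back for networking (assumed figure).
hgx_nodes = nodes_per_rack(42, 4, reserved_units=10)

print(dgx_nodes, hgx_nodes)
```

With these inputs the sketch reproduces the 4-node and 8-node figures quoted above; cooling and power limits, which the function ignores, often cap density before rack height does.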

Layer 3: The PCB Stack — Baseboard, Motherboard, and NICs

A single AI server node contains four or more distinct PCB assemblies, each with different design requirements. Understanding each board's requirements allows procurement teams to source appropriately qualified fabricators and assemblers for each category.

GPU baseboard: The most demanding board in the cluster. For H100, this means 20–24 layers, hybrid Megtron 6E / Megtron 6 stackup, NVLink 4.0 routing at 100 Gb/s per lane, PDN for 8 × 700 W + 4 × ~270 W NVSwitch (~6,680 W total), HDI via technology, and large-format fabrication (up to 700 mm × 700 mm). For B200, requirements escalate to 24–32 layers, Megtron 7, NVLink 5.0 at 200 Gb/s per lane, and 1,000 W per GPU PDN. The AI Accelerator PCB Design Guide covers the full design requirements for this board category.

CPU motherboard: 12–18 layers, Megtron 6E on PCIe Gen5 signal layers, DDR5 routing, BMC management subsystem. Less demanding than the GPU baseboard, but it still requires low-loss laminate on PCIe Gen5 lanes and careful PDN design for high-core-count server CPUs. Manufacturing considerations for this category are covered at Server Motherboard PCB Manufacturing.

Network Interface Cards (NICs): Each 400G InfiniBand NIC carries a ConnectX ASIC with 112G PAM4 serdes lanes to the optical transceiver cage and PCIe Gen5 to the host CPU. NIC PCBs require 12–16 layers with Megtron 6E on 112G PAM4 signal layers. Signal integrity rules for 112G PAM4 NIC board design are detailed at 112G PAM4 PCB Design for AI Servers.

Power distribution boards: High-current bus distribution PCBs that route 12 V or 48 V from PSUs to the GPU baseboard and other subsystems. Primary requirements are heavy copper (3–4 oz on primary power planes), large format, and precision cut-outs for bus bar connections. These boards are less demanding from a signal integrity standpoint but impose specialized heavy copper fabrication requirements.
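The GPU baseboard power figure quoted above is a simple sum, and writing it out makes it easy to redo for other configurations. The sketch below is illustrative only: `baseboard_power_w` is a hypothetical helper, the H100 inputs are the TDP and approximate NVSwitch power stated in this section, and the B200 line counts GPU power alone because no switch power for that board is given here.

```python
def baseboard_power_w(gpu_count: int, gpu_tdp_w: float,
                      switch_count: int = 0, switch_power_w: float = 0.0) -> float:
    """Sum of GPU and switch power that the baseboard PDN must deliver."""
    return gpu_count * gpu_tdp_w + switch_count * switch_power_w


# H100 baseboard: 8 GPUs at 700 W plus 4 NVSwitch devices at roughly 270 W each.
h100_total_w = baseboard_power_w(8, 700, switch_count=4, switch_power_w=270)

# B200: GPU power only (switch power is not stated in this article, so it is omitted).
b200_gpu_only_w = baseboard_power_w(8, 1000)

print(f"H100 baseboard: {h100_total_w:.0f} W")
print(f"B200 GPUs only: {b200_gpu_only_w:.0f} W")
```

The H100 result matches the ~6,680 W total given for the baseboard PDN.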

Layer 4: Rack Design and Physical Infrastructure

A GPU cluster rack aggregates server nodes, networking, and power/cooling distribution hardware into a standard 19-inch, 42U enclosure. Key rack-level design decisions include:

Nodes per rack: Standard configurations are 4 × DGX H100 (10U each, 40U total) or 8 × HGX H100 (4U each, 32U total, leaving 10U for networking). The GB200 NVL72 uses a dedicated custom rack enclosure that houses 18 compute trays and 9 NVSwitch boards in a purpose-built mechanical design. Higher node density per rack increases power density, which drives cooling requirements and may require facility power circuit upgrades.

Top-of-rack (ToR) networking: Each rack requires 1–2U of ToR switch infrastructure for the InfiniBand fabric. An 8-node rack with 8 NICs per node has 64 downlinks to the ToR switch; the ToR switch must provide 64 × 400G downlinks plus sufficient uplink bandwidth to the spine layer. ToR switch PCBs are demanding designs in their own right: switch ASIC boards at 25.6–51.2 Tb/s use 112G PAM4 serdes requiring the same low-loss laminate and signal integrity discipline as NIC and GPU boards.

Power distribution units (PDUs): High-density AI racks require intelligent, metered PDUs rated for the full rack power draw (40 kW for DGX H100 rack, 120 kW for GB200 NVL72). Branch circuit breakers, power factor correction, and per-outlet current monitoring are standard requirements for AI cluster PDUs.

Cable management: A single 8-node AI rack generates hundreds of cables: NIC-to-ToR switch cables (64 × InfiniBand DAC or AOC), management ethernet cables, power cables, and liquid cooling lines (16–32 connections per rack for DLC systems). Cable management design—routing paths, bend radii, labeling, and accessibility for maintenance—directly affects the operational efficiency of the cluster over its multi-year service life.
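The ToR port budget described above follows directly from node count, NICs per node, and link speed. The following is a rough sketch under those stated figures; the `RackFabric` class and its field names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackFabric:
    """Per-rack network demand seen by the ToR switch layer."""
    nodes: int
    nics_per_node: int
    link_gbps: int

    @property
    def downlinks(self) -> int:
        # One ToR downlink (and one DAC or AOC cable) per NIC port.
        return self.nodes * self.nics_per_node

    @property
    def downlink_tbps(self) -> float:
        # Aggregate downlink bandwidth the ToR layer must terminate.
        return self.downlinks * self.link_gbps / 1000


rack = RackFabric(nodes=8, nics_per_node=8, link_gbps=400)
print(rack.downlinks, "downlinks,", rack.downlink_tbps, "Tb/s")
```

For the 8-node rack this gives 64 downlinks and 25.6 Tb/s of downlink bandwidth, which sits at the lower end of the 25.6–51.2 Tb/s switch ASIC range mentioned above, before any uplink capacity is added.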

Layer 5: Cluster Fabric — InfiniBand and Ethernet

The inter-rack network fabric determines the scaling efficiency of the cluster for distributed training workloads. Two technologies dominate in 2026:

InfiniBand NDR (400 Gb/s): NVIDIA's preferred cluster fabric for GPU clusters, providing RDMA (Remote Direct Memory Access) at 400 Gb/s per port with approximately 1–3 μs port-to-port latency. InfiniBand's RDMA capability allows GPU-to-GPU all-reduce operations across racks without CPU involvement, which is critical for maintaining high GPU utilization during distributed training. A fat-tree InfiniBand topology connects rack ToR switches to leaf switches and leaf switches to spine switches, providing close to full bisection bandwidth for all-to-all collective operations across the cluster.

RoCE (RDMA over Converged Ethernet): High-speed Ethernet with RDMA extensions, increasingly deployed at 400G or 800G in hyperscale AI clusters. RoCE provides comparable throughput to InfiniBand at somewhat higher per-collective-operation latency, leveraging the scale economies of the broader Ethernet switch ecosystem. AMD MI300X OAM clusters frequently use RoCE Ethernet fabric due to better multi-vendor compatibility compared to InfiniBand's tighter integration with NVIDIA's GPU software stack.

Fat-tree topology sizing for a 1,000-GPU cluster (125 × 8-GPU nodes) with 8 × 400G NICs per node starts from 1,000 NIC ports. At 64 × 400G downlinks + 32 × 400G uplinks per ToR switch (2:1 oversubscription), that is about 16 ToR switches (1,000 ÷ 64, rounded up) and roughly 512 uplinks into the leaf layer; the number of leaf and spine switches above them depends on switch radix and on the oversubscription ratio chosen at each tier. The PCB design requirements for these switches (112G PAM4 serdes, low-loss laminates, tight signal integrity) are among the most demanding in the networking industry and directly parallel the GPU board requirements already discussed.
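The relationship between downlinks, uplinks, and oversubscription can be made concrete with a short sketch. It assumes the node and NIC counts from this section; the helper names `total_nic_ports` and `uplinks_needed` are hypothetical, and real fabric designs also depend on switch radix and rail layout, which are not modelled here.

```python
import math


def total_nic_ports(nodes: int, nics_per_node: int) -> int:
    """Fabric endpoints: one switch port per NIC."""
    return nodes * nics_per_node


def uplinks_needed(downlinks: int, oversubscription: float) -> int:
    """Uplinks per switch for a given downlink:uplink oversubscription ratio."""
    if oversubscription <= 0:
        raise ValueError("oversubscription must be positive")
    return math.ceil(downlinks / oversubscription)


print("NIC ports:", total_nic_ports(125, 8))
for ratio in (1.0, 2.0, 4.0):
    print(f"{ratio:.0f}:1 oversubscription ->", uplinks_needed(64, ratio), "uplinks per ToR")
```

A 1:1 ratio (64 uplinks for 64 downlinks) is what full bisection bandwidth requires at the ToR tier; the 32-uplink configuration described above corresponds to 2:1.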

Power Infrastructure Planning

Power infrastructure is consistently the binding constraint in AI cluster deployment, more often than compute, memory, or networking. The key planning parameters are:

Total power draw: Multiply GPU TDP × GPU count, then apply a 1.4–1.6× multiplier for non-GPU components (CPUs, NICs, storage, networking) and PSU inefficiency. A 1,000-GPU H100 cluster (700 W per GPU × 1,000 = 700 kW GPU power) requires approximately 980–1,120 kW of total facility power at the PDU level.

Power density per rack: Standard enterprise data centers are provisioned at 5–15 kW per rack. A DGX H100 rack at 40 kW requires 3–8× the standard power provisioning per rack; a GB200 NVL72 at 120 kW requires 8–24× standard provisioning. AI-optimized data center builds are increasingly designed from the ground up for 30–150 kW per rack to accommodate GPU density.

48 V bus transition: Modern AI server designs (GB200, OAM Gen2) use 48 V DC bus distribution within the server and rack, reducing bus current by 4× compared to 12 V distribution at the same power level. Facility power infrastructure (PDUs, bus bars, power cables) must be specified for 48 V if the selected GPU platform requires it. Mixing 12 V (H100) and 48 V (B200) nodes in the same cluster requires separate power distribution infrastructure for each platform type.

Redundancy: N+1 PSU redundancy within each server node, and A+B power feed redundancy at the rack PDU level (two independent facility circuits per rack), are standard for production AI clusters where unplanned downtime is costly. The two levels cost different amounts of capacity: N+1 adds one PSU's worth of headroom per node, while A+B feeds double the installed feed capacity, because each feed must be able to carry the full rack load alone. Both must be accounted for in facility power planning.
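The planning parameters above reduce to three small calculations: facility power from GPU TDP and an overhead multiplier, bus current from power and voltage, and rack density relative to a conventional provisioning baseline. The sketch below uses the figures from this section; all function names are hypothetical, and the same 10.2 kW load is applied to both bus voltages purely to compare currents.

```python
def facility_power_kw(gpu_count: int, gpu_tdp_w: float,
                      overhead_low: float = 1.4, overhead_high: float = 1.6):
    """(low, high) facility power in kW after applying the non-GPU overhead range."""
    gpu_kw = gpu_count * gpu_tdp_w / 1000.0
    return gpu_kw * overhead_low, gpu_kw * overhead_high


def bus_current_a(power_w: float, bus_voltage_v: float) -> float:
    """DC bus current from I = P / V."""
    return power_w / bus_voltage_v


def density_multiple(rack_kw: float, baseline_kw: float) -> float:
    """How many times a conventional rack budget an AI rack consumes."""
    return rack_kw / baseline_kw


low_kw, high_kw = facility_power_kw(1000, 700)
print(f"1,000 x 700 W GPUs: {low_kw:.0f} to {high_kw:.0f} kW at the PDU level")

for volts in (12, 48):
    print(f"10.2 kW on a {volts} V bus: {bus_current_a(10_200, volts):.1f} A")

for rack_kw in (40, 120):
    lo = density_multiple(rack_kw, 15)
    hi = density_multiple(rack_kw, 5)
    print(f"{rack_kw} kW rack: {lo:.1f}x to {hi:.1f}x a 5-15 kW baseline")
```

Because resistive loss scales with the square of current, the 4× current reduction at 48 V matters more for bus bar and copper sizing than the raw ampere figures suggest.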

Cooling Strategy: Air vs Liquid

The cooling strategy for an AI cluster is determined primarily by GPU TDP. The decision framework is straightforward:

GPU TDP per chip	Viable Cooling Approach	Examples
< 400 W	Air cooling (standard CRAC/CRAH)	Inference GPUs (A10G, L40S PCIe)
400–700 W	High-flow air cooling or DLC	H100 SXM5 (some configurations)
700–1,000 W	Direct liquid cooling (strongly preferred or mandatory)	H100 SXM5 dense; B200 SXM6 (mandatory)
> 1,000 W	DLC mandatory; immersion cooling considered	Future GPU generations

Direct liquid cooling (DLC) requires facility chilled water supply and return lines to each rack, cold plate manifolds inside each server chassis, and liquid-to-liquid heat exchangers at the rack or row level. The PCB-level implications of DLC are significant: cold plate mounting structures must be accommodated in the GPU baseboard layout, thermal via arrays beneath GPU packages must transfer heat to cold plate contact surfaces, and board material T_g must be ≥ 170°C for sustained operation near 700–1,000 W heat sources. These requirements are detailed at Thermal Management on AI Server PCBs.
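The TDP-based decision table above can be expressed as a simple lookup. This is a sketch of the table only, not a thermal design rule: `cooling_for_tdp` is a hypothetical helper, and because the table's 400–700 W and 700–1,000 W rows share the 700 W boundary, the code assumes a chip at exactly 700 W falls into the more conservative liquid-cooled row.

```python
def cooling_for_tdp(tdp_w: float) -> str:
    """Map per-chip GPU TDP in watts to the cooling approach from the table."""
    if tdp_w < 0:
        raise ValueError("tdp_w must be non-negative")
    if tdp_w < 400:
        return "air cooling"
    if tdp_w < 700:  # assumption: exactly 700 W goes to the DLC row below
        return "high-flow air cooling or DLC"
    if tdp_w <= 1000:
        return "direct liquid cooling"
    return "DLC mandatory; consider immersion"


for name, tdp in (("inference GPU", 300), ("H100 SXM5", 700), ("B200 SXM6", 1000)):
    print(f"{name} ({tdp} W): {cooling_for_tdp(tdp)}")
```

In practice the choice also depends on chassis density and facility water availability, which is why the table lists some H100 configurations under both air and liquid cooling.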

Cluster Sizing: From 8 GPUs to 10,000

AI cluster sizing should be driven by the training time budget for the target model, not by GPU count as an abstract goal. A practical sizing framework:

Development and small-scale fine-tuning (8–64 GPUs): A single DGX H100 node (8 GPUs) or 2–8 nodes is sufficient for fine-tuning 7B–70B parameter models and training smaller models from scratch. Network fabric can be a simple single-switch InfiniBand or even direct copper cables for 2-node configurations. PCB infrastructure is a standard DGX node or HGX-based server; no custom PCB design required for the customer.

Production training (64–1,000 GPUs): Nodes are connected in an InfiniBand fat-tree with 2-tier switching (ToR + leaf layer). Power infrastructure requires dedicated high-density circuits per rack; cooling is typically DLC for H100/B200 nodes. At this scale, most organizations use OEM or ODM server hardware rather than custom PCB designs.

Frontier model training (1,000–100,000 GPUs): At this scale, hyperscale cloud providers and large AI labs often design custom server and rack hardware to optimize for their specific workload, scale, and facility constraints. Custom GPU baseboards, custom NVSwitch boards (for NVL72-class racks), and custom ToR switch designs become economically justified. The PCB manufacturing requirements at this scale are at the frontier of commercial capability, as analyzed in the 30+ Layer HDI PCB guide.

The table below summarizes the infrastructure requirements at each cluster size tier:

Cluster Size	Rack Count	Approx. Total Power	Network Fabric	Custom PCB Design?
8 GPUs (1 node)	0.25 (1/4 rack)	~10 kW	None required	No (OEM node)
64 GPUs (8 nodes)	2–4 racks	~80 kW	Single IB switch	No
512 GPUs (64 nodes)	16–32 racks	~650 kW	2-tier IB fat-tree	Sometimes (OEM or ODM)
8,192 GPUs (1,024 nodes)	256–512 racks	~10–20 MW	3-tier IB fat-tree	Often (custom server boards)
65,536 GPUs	2,000–4,000 racks	~80–160 MW	3-tier IB + dedicated AI fabric	Yes (full custom stack)
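The node counts and the lower-bound power figures in the table can be reproduced with a rough estimate. The sketch below assumes 8 GPUs per node and the approximately 10.2 kW per H100 node quoted earlier in this article; `cluster_estimate` is a hypothetical helper, and it deliberately leaves out network, storage, and cooling overhead, which is why the table gives ranges at larger scales.

```python
import math


def cluster_estimate(gpus: int, gpus_per_node: int = 8, node_kw: float = 10.2):
    """Return (node_count, node_power_kw) for a cluster of the given GPU count."""
    nodes = math.ceil(gpus / gpus_per_node)
    return nodes, nodes * node_kw


for gpus in (8, 64, 512, 8192, 65536):
    nodes, kw = cluster_estimate(gpus)
    print(f"{gpus:>6} GPUs: {nodes:>5} nodes, about {kw:,.0f} kW of node power")
```

Rack counts are not derived here because they depend on the per-rack power and cooling budget of the facility rather than on chassis height alone.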

PCB Manufacturing Requirements for Cluster-Scale Programs

Large GPU cluster programs create PCB manufacturing requirements that differ from standard commercial PCB production in both technical complexity and procurement scale. Several considerations apply specifically at cluster scale:

Volume and lead time: A 10,000-GPU cluster built on DGX H100 requires approximately 1,250 GPU baseboards, 2,500 NIC PCBs, 1,250 CPU motherboards, and hundreds of ToR switch line cards. Production volumes at this scale require fabricators with multi-panel capacity and assembly lines dedicated to AI server programs. Lead times for high-complexity GPU baseboards (20–24 layers, HDI, backdrilling) are typically 15–25 business days for bare boards; ensuring on-time delivery for a cluster deployment program requires procurement lead times of 10–16 weeks ahead of installation.

Quality consistency: At 1,250 GPU baseboards per cluster, a 1% board-level failure rate means roughly 12 to 13 failed boards requiring rework or replacement. IPC Class 3 fabrication, 3D X-ray inspection of all GPU and NVSwitch BGA assemblies, and 100% functional test with burn-in at full GPU compute load are non-negotiable quality requirements. The GPU Board Assembly guide details the inspection and test requirements for these assemblies.

Traceability: Data center operators increasingly require full traceability for AI server PCBs—including panel ID, fabrication lot, assembly date, and test results for each board—to enable rapid root cause analysis when field failures occur. PCB fabricators and assemblers serving cluster-scale AI programs must maintain production records that satisfy these traceability requirements.

Materials availability: At cluster scale, Megtron 7 and Tachyon 100G laminate availability can become a supply chain constraint. These materials have limited production capacity relative to standard FR4, and large cluster programs may need to work with PCB fabricators to reserve material allocations months in advance. The complete material selection framework for AI server boards is covered at High-Speed PCB Materials for AI Servers.
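The volume and failure figures above can be derived from the GPU count. The sketch below is a minimal illustration: `board_counts` and `expected_failures` are hypothetical helpers, and the value of two NIC boards per node is an assumption inferred from the 2,500 NIC PCB figure quoted for a 1,250-node build, not a statement about any specific server design.

```python
def board_counts(gpus: int, gpus_per_node: int = 8, nic_boards_per_node: int = 2) -> dict:
    """Board quantities for a cluster, one baseboard and one motherboard per node."""
    nodes = -(-gpus // gpus_per_node)  # ceiling division
    return {
        "gpu_baseboards": nodes,
        "cpu_motherboards": nodes,
        "nic_boards": nodes * nic_boards_per_node,  # assumed 2 per node
    }


def expected_failures(board_count: int, failure_rate: float) -> float:
    """Mean number of failed boards at a given per-board failure rate."""
    return board_count * failure_rate


counts = board_counts(10_000)
print(counts)
print("expected baseboard failures at 1%:",
      expected_failures(counts["gpu_baseboards"], 0.01))
```

The expected value is an average; the actual number of failed boards in any one production run will vary around it, which is one reason spare boards are usually ordered with the main lot.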

Build vs Buy: ODM, OEM, and Hyperscale Approaches

The decision between purchasing complete server systems (OEM approach), working with ODMs for semi-custom designs, or fully custom hardware design depends primarily on cluster scale and organizational engineering capacity.

OEM purchase (NVIDIA DGX, Dell PowerEdge, Supermicro): The fastest path to deployment for clusters up to a few thousand GPUs. OEM systems are pre-validated, come with vendor support, and eliminate PCB design risk. The trade-off is higher per-GPU cost (OEM margin on top of component cost) and limited ability to optimize for specific workload requirements. Suitable for most organizations that are not hyperscale cloud providers.

ODM semi-custom (Wiwynn, Quanta, Inventec, Foxconn): ODMs design server hardware to the customer's specification, typically building on reference platform designs licensed from GPU vendors. The customer specifies mechanical, thermal, and interface requirements; the ODM handles PCB design, fabrication, and assembly. This approach is used by mid-size cloud providers and AI labs building clusters in the 1,000–10,000 GPU range where OEM pricing is a material cost driver.

Full custom design (hyperscale): Google, Meta, Microsoft, and Amazon design custom AI server hardware that is purpose-optimized for their specific infrastructure—custom GPU baseboards for their exact chassis and cooling designs, custom NVSwitch boards or OAM UBBs, and custom ToR switch designs. This approach requires significant engineering investment but delivers meaningful performance per watt and cost per FLOP advantages at hyperscale. The PCB design and manufacturing capabilities required for this approach are those described throughout this series.

FAQ

What is the minimum GPU count for a useful AI training cluster?
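A single 8-GPU node (DGX H100 or equivalent HGX server) is the practical minimum for serious AI training work. With 640 GB of aggregate HBM (8 × 80 GB), an 8-GPU node can train models up to approximately 100B parameters with model parallelism (the practical ceiling depends on numeric precision and on how optimizer state is sharded or offloaded), fine-tune models that fit in the aggregate memory, and run inference on models up to 70B parameters inside one node. Note that 70B parameters at 16-bit precision is about 140 GB of weights, more than a single 80 GB GPU holds, so the model must be split across GPUs or quantized. Below 8 GPUs, the memory capacity and compute throughput limitations make most frontier AI training impractical.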

Do all GPUs in a cluster need to be the same generation?
Within a single server node, all GPU slots must be the same generation (all H100 SXM5 or all B200 SXM6—mixing is not supported). Across nodes in a cluster, different GPU generations can coexist if they connect through a common network fabric (InfiniBand or Ethernet), but mixed-generation clusters are harder to schedule efficiently and are typically avoided in production training clusters. NCCL (NVIDIA's collective communication library) supports mixed-generation collectives but may not achieve maximum efficiency across mismatched hardware.

What network bandwidth is required per GPU for efficient training?
For most large-scale LLM training workloads using data parallelism and ZeRO optimizer sharding, approximately 400 Gb/s (one 400G InfiniBand link) per 8-GPU node provides adequate inter-node bandwidth for models up to a few hundred billion parameters. Larger models requiring tensor parallelism across nodes benefit from 8 × 400G per node (one NIC per GPU). For NVL72-based clusters where intra-rack communication uses NVLink 5.0, inter-rack InfiniBand at 400G per NIC × 8 NICs per rack provides adequate bisection bandwidth for most training configurations.

How long does it take to deploy a 1,000-GPU AI cluster?
From hardware procurement to first training job, a 1,000-GPU cluster typically requires 16–28 weeks for a purpose-built data center deployment when the phases run back to back: 10–16 weeks for server hardware lead time (GPU availability is currently the primary constraint), 4–8 weeks for rack installation, cabling, and network configuration, and 2–4 weeks for software stack installation, network tuning, and acceptance testing. Cloud-based deployments (AWS, Azure, Google Cloud) can be provisioned in hours to days for smaller clusters, but dedicated capacity reservations for large clusters require advance commitment of 6–12 months.

What PCB specifications matter most for AI cluster hardware?
For GPU baseboards, the three most critical specifications are: (1) signal layer laminate Df (≤ 0.003 for NVLink 4.0; ≤ 0.002 for NVLink 5.0), which determines whether NVLink channels meet insertion loss budget; (2) layer count (20–32+ layers depending on GPU generation), which determines routing capability for NVLink, PCIe Gen5, and power delivery; and (3) power delivery PDN impedance (< 0.15 mΩ DC to 100 MHz at GPU package), which determines whether the GPU can sustain maximum compute without voltage droop-induced throttling. All three specifications must be met simultaneously, which is why GPU baseboard fabrication is concentrated among a small number of qualified tier-1 PCB manufacturers.

Need to Manufacture PCBs for AI Cluster Hardware?

GPU cluster hardware demands PCB manufacturing at the frontier of commercial capability: GPU baseboards at 20–32 layers with NVLink routing, NIC boards with 112G PAM4 serdes, ToR switch line cards with ultra-low-loss laminates, and CPU motherboards with PCIe Gen5 signal integrity. NextPCB supports the complete AI cluster PCB stack with advanced fabrication, BGA assembly, 3D X-ray inspection, and IPC Class 3 quality standards.


About the Author

Arya Li, Project Manager at NextPCB.com

With extensive experience in manufacturing and international client management, Arya has guided factory visits for over 200 overseas clients, providing bilingual (English & Chinese) presentations on production processes, quality control systems, and advanced manufacturing capabilities. Her deep understanding of both the factory side and client requirements allows her to deliver professional, reliable PCB solutions efficiently. Detail-oriented and service-driven, Arya is committed to being a trusted partner for clients and showcasing the strength and expertise of the factory in the global PCB and PCBA market.
