As artificial intelligence, deep learning, and large language models (LLMs) continue to evolve, the underlying hardware supporting these workloads has undergone a massive transformation. The traditional data center server, built primarily around Central Processing Units (CPUs), is no longer sufficient. Enter the AI GPU server: a highly specialized, incredibly dense piece of engineering designed specifically to handle parallel processing at an unprecedented scale.
For hardware engineers, system architects, and PCB designers, understanding GPU server design is critical. These systems are not just "servers with graphics cards thrown in." They are complex ecosystems featuring multiple interconnected printed circuit boards (PCBs), advanced power delivery networks handling tens of kilowatts, and aggressive cooling mechanisms to manage extreme thermal densities.
In this comprehensive guide, we will break down the complete architecture of a modern AI GPU server, exploring the various boards, slot form factors, power planes, and cooling zones. We will also dive into what makes the AI server PCB requirements uniquely challenging compared to traditional IT infrastructure.
A typical enterprise AI GPU server (such as an 8-GPU node) is a massive chassis, often occupying 4U to 8U of rack space. Unlike standard 1U/2U servers where the motherboard houses everything, an AI GPU server utilizes a modular, multi-board architecture to separate the general-purpose compute from the heavy-lifting accelerators.
The CPU motherboard acts as the brain for system management, network I/O, and storage management. It usually houses dual-socket CPUs (like AMD EPYC or Intel Xeon), system RAM (DDR5), and BMC (Baseboard Management Controller) chips. However, in a GPU server, the CPU is no longer the primary workhorse; it acts as a traffic director, feeding data to the GPUs.
The GPU baseboard is the heart of the AI GPU server. It is known as the Universal Baseboard (UBB) in Open Compute Project (OCP) standards and as the HGX baseboard in the NVIDIA ecosystem. It houses the GPUs (usually 4 or 8), the high-speed switches (such as NVSwitch or PCIe switches), and the interconnect traces. This board is where the most complex PCB engineering takes place.
To feed massive amounts of data from storage to the GPUs, and to scale out to other servers in the cluster, AI servers use dedicated switch boards. These boards manage the PCIe lanes and house Retimers to ensure signal integrity over long distances within the chassis.
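A rough way to see why retimers end up on these boards is to add up insertion loss along the path and compare it with a channel budget. The sketch below is illustrative only: the 36 dB figure is the end-to-end budget commonly quoted for PCIe Gen 5 at 16 GHz, while the per-inch trace loss, the connector loss, and the trace lengths are hypothetical placeholders rather than values from any real board.

```python
import math
from dataclasses import dataclass

# All numbers below are illustrative assumptions, not measured or datasheet values.
CHANNEL_BUDGET_DB = 36.0      # often-quoted PCIe Gen 5 end-to-end budget at 16 GHz
TRACE_LOSS_DB_PER_INCH = 1.0  # hypothetical loss of a low-loss laminate at 16 GHz
CONNECTOR_LOSS_DB = 1.5       # hypothetical loss per board-to-board connector


@dataclass
class Segment:
    """One board that the link crosses on its way from CPU to GPU."""
    name: str
    trace_inches: float
    connectors: int = 0

    def loss_db(self) -> float:
        return (self.trace_inches * TRACE_LOSS_DB_PER_INCH
                + self.connectors * CONNECTOR_LOSS_DB)


def total_loss_db(segments) -> float:
    return sum(s.loss_db() for s in segments)


def min_retimers(loss_db: float, budget_db: float = CHANNEL_BUDGET_DB) -> int:
    """Fewest retimers so that each retimed sub-link can fit within the budget.

    Assumes retimers can be placed to split the loss evenly, which real
    layouts only approximate.
    """
    return max(0, math.ceil(loss_db / budget_db) - 1)


path = [
    Segment("CPU motherboard", trace_inches=10.0, connectors=1),
    Segment("switch board", trace_inches=12.0, connectors=1),
    Segment("GPU baseboard", trace_inches=14.0, connectors=1),
]
loss = total_loss_db(path)
print(f"total loss {loss:.1f} dB -> {min_retimers(loss)} retimer(s)")
```

With these placeholder numbers the path slightly exceeds the budget, so a single retimer is enough. A real budget would also account for package loss, vias, and crosstalk margin.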
To visualize the topology of an 8-GPU AI server, we can look at the logical data flow and physical board separation. Below is a structural representation of a modern AI node:
          [ Network / Storage Fabric (InfiniBand / Ethernet) ]
                     |                            |
               [ NIC / DPU ]                [ NIC / DPU ]
                     |                            |
    ============================================================== [ CPU Motherboard ]
                 [ CPU 1 ] <--- UPI/xGMI ---> [ CPU 2 ]
                     |                            |
             [ PCIe Gen5 x16 ]            [ PCIe Gen5 x16 ]
                     |                            |
    ============================================================== [ Interconnect Layer ]
                    [ PCIe Switch / Retimer Board ]
                     |                            |
    ============================================================== [ GPU Baseboard ]
                     |                            |
    [ GPU 1 ] -- [ GPU 2 ] -- [ GPU 3 ] -- [ GPU 4 ] ... [ GPU 8 ]
        |            |            |            |
        +------------+------------+------------+-- ( High-Speed Interconnect )
               [ NVLink Switches / OAI Switches ]
As shown, the architecture is highly stratified. High-speed signals must traverse connectors and cables between the CPU board, the switch board, and the GPU baseboard, making signal integrity a primary concern for hardware designers.
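The stratified layout can also be written down as a small graph, which makes it easy to count how many board-to-board transitions a signal crosses. The following is a minimal sketch: the component names follow the diagram, but the link list is simplified and the placement of the NIC on the CPU motherboard is an assumption made for illustration.

```python
from collections import deque

# Board each component sits on. Placing the NIC on the CPU motherboard is an
# assumption for this sketch; real systems may use separate riser cards.
BOARD = {
    "NIC": "CPU motherboard",
    "CPU1": "CPU motherboard",
    "CPU2": "CPU motherboard",
    "PCIe switch": "switch board",
    "GPU1": "GPU baseboard",
    "GPU8": "GPU baseboard",
    "NVLink switch": "GPU baseboard",
}

# Simplified, undirected links taken from the diagram.
LINKS = [
    ("NIC", "CPU1"),
    ("CPU1", "CPU2"),
    ("CPU1", "PCIe switch"),
    ("CPU2", "PCIe switch"),
    ("PCIe switch", "GPU1"),
    ("PCIe switch", "GPU8"),
    ("GPU1", "NVLink switch"),
    ("GPU8", "NVLink switch"),
]


def build_adjacency(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def board_crossings(path):
    """Number of hops in the path that leave one board and enter another."""
    return sum(BOARD[a] != BOARD[b] for a, b in zip(path, path[1:]))


adj = build_adjacency(LINKS)
nic_to_gpu = shortest_path(adj, "NIC", "GPU1")
print(nic_to_gpu, "crossings:", board_crossings(nic_to_gpu))
```

Each crossing corresponds to a connector or cable, which is where most of the signal-integrity risk in the path concentrates.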
When discussing GPU server design, the physical form factor of the GPU dictates the entire board layout. While consumer GPUs use standard PCIe slots, enterprise AI GPUs utilize specialized form factors to maximize bandwidth and power delivery.
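Currently, the market is dominated by two primary modular form factors for AI accelerators: SXM, NVIDIA's proprietary socketed module used on HGX baseboards, and OAM, the OCP Accelerator Module defined by the Open Compute Project.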
While PCIe GPU servers (housing cards like the NVIDIA L40S) still exist for edge AI and inference, they are not the preferred choice for heavy AI training. A standard PCIe slot can deliver only 75W directly, so high-power cards need auxiliary power cables (such as 12VHPWR). Furthermore, PCIe cards obstruct airflow in dense configurations, and their GPU-to-GPU traffic is bottlenecked by PCIe switches, lacking the all-to-all topologies of SXM/OAM baseboards.
Power delivery is arguably the most challenging aspect of modern GPU server design. A fully populated 8-GPU baseboard with next-generation chips can draw anywhere from 8,000W to over 12,000W of power. Designing the Power Delivery Network (PDN) on the PCB to handle this without burning up or causing unacceptable voltage droop is a masterclass in electrical engineering.
Traditional servers distribute power across the motherboard at 12V. However, applying the power equation (P = V × I) and the resistive loss equation (P_loss = I²R) reveals a serious problem with 12V at high wattages.
To deliver 10,000W at 12V, the current required is roughly 833 Amps. Pushing 833A through copper planes on a PCB results in prohibitive I²R power losses and extreme heat generation. To solve this, AI GPU servers use a 48V Power Distribution Architecture.
By distributing power at 48V instead of 12V, the current is reduced by a factor of 4 (to approximately 208A), and because loss scales with the square of the current, the I²R copper losses are reduced by a factor of 16. The 48V plane is routed deep within the thick GPU baseboard to the accelerator modules.
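The 12V versus 48V comparison can be checked with a few lines of arithmetic. In this sketch the 10,000W load comes from the text, while the 0.1 milliohm distribution resistance is an assumed round number chosen only to make the losses concrete.

```python
def bus_current_a(power_w: float, bus_voltage_v: float) -> float:
    """Current needed to deliver a given power at a given bus voltage (I = P / V)."""
    return power_w / bus_voltage_v


def copper_loss_w(power_w: float, bus_voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path (P_loss = I^2 * R)."""
    current = bus_current_a(power_w, bus_voltage_v)
    return current ** 2 * resistance_ohm


LOAD_W = 10_000.0
PATH_RESISTANCE_OHM = 1e-4  # assumed 0.1 milliohm, for illustration only

for volts in (12.0, 48.0):
    amps = bus_current_a(LOAD_W, volts)
    loss = copper_loss_w(LOAD_W, volts, PATH_RESISTANCE_OHM)
    print(f"{volts:>4.0f} V bus: {amps:6.1f} A, {loss:5.1f} W lost in copper")

ratio = (copper_loss_w(LOAD_W, 12.0, PATH_RESISTANCE_OHM)
         / copper_loss_w(LOAD_W, 48.0, PATH_RESISTANCE_OHM))
print(f"loss ratio 12 V / 48 V: {ratio:.1f}")
```

The ratio of 16 does not depend on the assumed resistance. Only the absolute wattage does.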
While 48V is great for distribution, the silicon core of the GPU operates at very low voltages, often below 1.0V (e.g., 0.7V - 0.8V). This requires local Voltage Regulator Modules (VRMs) to step down the 48V to the core voltage exactly where it is needed.
Because the current at 0.8V for a 1000W GPU reaches a staggering 1,250 Amps, the VRMs must be placed as close to the GPU die as physically possible to minimize trace resistance and voltage droop. Modern AI server boards utilize Vertical Power Delivery, placing the VRMs on the underside of the PCB directly beneath the GPU socket, feeding current vertically through massive arrays of copper vias straight into the silicon.
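The same arithmetic shows why the last few millimeters of the power path matter so much. The sketch below uses the 1000W and 0.8V figures from the text. The two path resistances are assumed values meant to contrast a longer lateral route with a short vertical one, not measurements.

```python
def core_current_a(gpu_power_w: float, core_voltage_v: float) -> float:
    """Current the VRM must supply at the core rail (I = P / V)."""
    return gpu_power_w / core_voltage_v


def droop_v(current_a: float, path_resistance_ohm: float) -> float:
    """Static IR drop between the VRM output and the die (V = I * R)."""
    return current_a * path_resistance_ohm


GPU_POWER_W = 1000.0
CORE_V = 0.8

current = core_current_a(GPU_POWER_W, CORE_V)
print(f"core current: {current:.0f} A")

# Assumed path resistances in ohms, for illustration only.
for label, r_ohm in (("lateral route", 50e-6), ("vertical route", 10e-6)):
    drop = droop_v(current, r_ohm)
    print(f"{label}: {drop * 1000:.1f} mV droop ({drop / CORE_V:.1%} of the rail)")
```

This covers only the static IR drop. Transient droop during load steps also depends on loop inductance and decoupling, which this sketch ignores.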
Where there is extreme power, there is extreme heat. The thermal density of an AI GPU server can exceed 400W per square inch of silicon. Managing this requires strict thermal zoning and advanced cooling topologies.
The CPU and I/O zone, usually located at the front or middle of the chassis, houses components that draw a moderate amount of power (CPUs at 300-400W each, NICs at 50W). Standard high-velocity server fans pushing ambient air are generally sufficient for this zone.
The GPU zone is the critical thermal zone. The components here (GPUs, NVSwitches, Retimers) generate massive heat, and cooling this zone dictates the design of the server.
The heat generated by the chips also affects the PCB itself. If a PCB gets too hot, the FR4 resin can degrade, causing delamination. Furthermore, temperature shifts the dielectric constant (Dk) of the PCB material, which changes trace impedance and can degrade high-speed signal integrity. Designers use thermal vias, grids of plated holes that conduct heat away from hotspots into internal copper planes, where it spreads out. High-Tg (glass transition temperature) materials are strictly required.
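A first-order estimate of what a thermal via array contributes treats each plated barrel as a hollow copper cylinder conducting heat through the board thickness. The dimensions and the copper conductivity below are assumed typical values, and the model ignores via fill, spreading resistance, and the surrounding laminate.

```python
import math

K_COPPER_W_PER_M_K = 390.0  # approximate bulk copper; plated copper may be lower


def via_thermal_resistance(board_thickness_m: float,
                           drill_diameter_m: float,
                           plating_thickness_m: float,
                           k: float = K_COPPER_W_PER_M_K) -> float:
    """Thermal resistance (K/W) of one plated barrel: R = L / (k * A)."""
    outer = drill_diameter_m
    inner = drill_diameter_m - 2.0 * plating_thickness_m
    barrel_area = math.pi / 4.0 * (outer ** 2 - inner ** 2)
    return board_thickness_m / (k * barrel_area)


def array_thermal_resistance(single_via_k_per_w: float, count: int) -> float:
    """Identical vias conduct in parallel, so resistance divides by the count."""
    return single_via_k_per_w / count


# Assumed geometry: 3.0 mm board, 0.30 mm drill, 25 um barrel plating.
single = via_thermal_resistance(3.0e-3, 0.30e-3, 25e-6)
grid = array_thermal_resistance(single, 100)
print(f"one via: {single:.0f} K/W, 10x10 grid: {grid:.2f} K/W")
```

A single via is a poor thermal path on a thick board, which is why designers use dense grids rather than a handful of holes.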
To summarize the architectural differences, here is a comparison of a traditional CPU server versus a modern AI GPU server:
| Feature | Standard CPU Server (e.g., 2U Web Server) | AI GPU Server (e.g., 8U HGX/OAI Node) |
|---|---|---|
| Primary Compute | 1 or 2 high-core CPUs (x86 or ARM) | 4 to 8 high-performance GPUs (SXM/OAM) |
| Board Architecture | Single Main Motherboard | Multi-board: CPU Head Node + GPU Baseboard + Switch Boards |
| Interconnect | PCIe for peripherals, UPI for CPU-to-CPU | NVLink, Infinity Fabric, or Custom OAI switching (up to 1.8TB/s) |
| Power Distribution | 12V architecture | 48V distribution, Vertical PoL delivery at 0.8V |
| Typical Power Draw | 500W - 1,500W per server | 8,000W - 15,000W+ per server |
| Cooling Method | Standard axial fans, passive heatsinks | High-pressure fans, 3D Vapor Chambers, or Direct-to-Chip Liquid Cooling |
| PCB Complexity | 8 to 14 layers, standard FR4 | 24 to 30+ layers, Ultra-low loss materials, HDI microvias |
The architectural complexities described above create a massive challenge for PCB fabrication. NextPCB has observed a significant shift in manufacturing requirements for AI server boards.
To accommodate hundreds of PCIe Gen 5 lanes, NVLink traces, and massive power planes, GPU baseboards require extreme layer counts. While a standard motherboard might be 10 layers, an OAM or HGX baseboard often features 24 to 32 layers. To prevent warping and accommodate the copper weight, the PCB thickness often exceeds standard limits, ranging from 3.0mm to 4.5mm thick.
Operating signals at 64 GT/s PAM4 (PCIe Gen 6) or 112 Gbps PAM4 (next-gen NVLink-class SerDes) means standard FR4 is no longer adequate. The insertion loss at these frequencies would render the data unreadable. AI GPU servers mandate the use of Ultra-Low Loss (ULL) laminates such as Panasonic Megtron 7, Megtron 8, or specialized Rogers materials. These materials maintain a stable dielectric constant (Dk) and ultra-low dissipation factor (Df) even at high temperatures.
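To get a feel for how much the laminate matters, dielectric loss can be estimated with the common rule of thumb of about 2.3 × f(GHz) × Df × √Dk dB per inch. The sketch below applies it at the Nyquist frequency of a 112 Gbps PAM4 link. The Dk and Df numbers are assumed ballpark values for a generic FR4 and a generic ultra-low-loss laminate, not datasheet figures, and conductor loss is left out entirely.

```python
import math


def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency: half the symbol rate, where each symbol carries log2(levels) bits."""
    symbol_rate_gbaud = bit_rate_gbps / math.log2(levels)
    return symbol_rate_gbaud / 2.0


def dielectric_loss_db_per_inch(freq_ghz: float, dk: float, df: float) -> float:
    """Rule-of-thumb dielectric attenuation: ~2.3 * f[GHz] * Df * sqrt(Dk)."""
    return 2.3 * freq_ghz * df * math.sqrt(dk)


# Assumed, illustrative material properties (not from a datasheet).
MATERIALS = {
    "generic FR4": {"dk": 4.2, "df": 0.020},
    "generic ultra-low-loss": {"dk": 3.3, "df": 0.002},
}

f_nyq = nyquist_ghz(112.0, levels=4)  # PAM4 carries 2 bits per symbol
for name, props in MATERIALS.items():
    per_inch = dielectric_loss_db_per_inch(f_nyq, props["dk"], props["df"])
    print(f"{name}: {per_inch:.2f} dB/inch at {f_nyq:.0f} GHz, "
          f"{per_inch * 10:.1f} dB over 10 inches")
```

Because the loss is linear in Df, a laminate with one tenth the dissipation factor gives roughly a tenfold reduction in dielectric loss over the same trace length.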
The pin density on a modern AI GPU or Retimer BGA is staggering, with pitch sizes shrinking to 0.8mm or less. Routing out of these BGAs requires High-Density Interconnect (HDI) techniques. This includes blind and buried vias, staggered microvias, and Via-in-Pad Plated Over (VIPPO) technologies. Furthermore, to prevent signal reflection (stub effects) on high-speed traces, Backdrilling is an absolute necessity to remove the unused portions of plated through-holes.
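The reason backdrilling matters can be seen from the quarter-wave resonance of a via stub: an open stub of length L resonates near f = c / (4 × L × √Dk), and energy around that frequency is strongly attenuated. The stub lengths and Dk in this sketch are assumed example values.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def stub_resonance_ghz(stub_length_m: float, dk: float) -> float:
    """Quarter-wave resonance of an open via stub, in GHz."""
    return SPEED_OF_LIGHT_M_S / (4.0 * stub_length_m * math.sqrt(dk)) / 1e9


DK = 3.4  # assumed effective dielectric constant around the via

# Assumed stub lengths: a long stub on a thick board versus the residual
# stub left after backdrilling.
for label, length_m in (("2.00 mm stub", 2.0e-3), ("0.25 mm stub", 0.25e-3)):
    print(f"{label}: resonance near {stub_resonance_ghz(length_m, DK):.0f} GHz")
```

Under these assumptions the long stub resonates inside the band that a 28 GHz Nyquist signal occupies, while the backdrilled stub pushes the resonance far above it.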
Q: Why don't AI servers just use standard PCIe graphics cards?
A: While PCIe cards (like the RTX 4090 or datacenter L40S) are fine for basic machine learning or inference, they lack the high-bandwidth inter-GPU communication (like NVLink) required for training massive LLMs. They are also limited in power delivery and thermal exhaust efficiency compared to SXM or OAM modules.
Q: What is the purpose of Retimers on the GPU server boards?
A: High-speed signals (like PCIe Gen 5) degrade rapidly as they travel across PCB copper. Retimers are active silicon components that recover the clock and data and retransmit a clean, full-strength copy of the signal, allowing it to travel across the large distances of a server chassis (from the CPU board to the GPU baseboard) without data corruption.
Q: How do 48V power planes affect PCB manufacturing?
A: 48V planes require thicker copper weights (e.g., 2oz or 3oz copper) on internal layers to carry the current. This makes the PCB pressing and lamination process more difficult, requiring specialized resin flow management to prevent voids between the heavy copper traces.
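To put rough numbers on why heavier copper helps, the resistance of a rectangular plane section follows R = ρ × L / (W × t). The sketch below assumes about 35 µm of thickness per ounce of copper and a resistivity of about 1.72e-8 Ω·m near room temperature. The plane dimensions are hypothetical, and the 208A figure is the 48V current for a 10,000W load.

```python
COPPER_RESISTIVITY_OHM_M = 1.72e-8  # approximate value near 20 C
THICKNESS_PER_OZ_M = 35e-6          # about 35 um per oz/ft^2 of copper


def plane_resistance_ohm(length_m: float, width_m: float, copper_oz: float) -> float:
    """DC resistance of a rectangular copper plane section: R = rho * L / (W * t)."""
    thickness_m = copper_oz * THICKNESS_PER_OZ_M
    return COPPER_RESISTIVITY_OHM_M * length_m / (width_m * thickness_m)


CURRENT_A = 208.0  # approximate 48 V bus current for a 10 kW load

# Hypothetical plane section: 100 mm long, 50 mm wide.
for oz in (1.0, 2.0, 3.0):
    r_ohm = plane_resistance_ohm(0.100, 0.050, oz)
    loss_w = CURRENT_A ** 2 * r_ohm
    print(f"{oz:.0f} oz: {r_ohm * 1000:.2f} mOhm, {loss_w:.1f} W dissipated")
```

Resistance and loss fall in direct proportion to copper weight. Copper resistivity also rises with temperature, so a hot board performs worse than this room-temperature estimate.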
The architecture of an AI GPU server is a marvel of modern engineering. By separating compute tasks across modular boards, utilizing 48V power delivery networks, and implementing aggressive cooling topologies, these systems can power the future of artificial intelligence. However, translating this architecture into physical hardware places immense demands on PCB design and fabrication.
From 30-layer stacks and ultra-low loss Megtron materials to complex HDI routing and backdrilling, manufacturing AI server boards requires a fabrication partner with cutting-edge capabilities and strict quality control.
Still need help? Contact us: support@nextpcb.com