GPU Server Architecture Explained: Boards, Slots, Power Planes and Cooling Zones

Posted: June, 2026 Writer: NextPCB - S

As artificial intelligence, deep learning, and large language models (LLMs) continue to evolve, the underlying hardware supporting these workloads has undergone a massive transformation. The traditional data center server, built primarily around Central Processing Units (CPUs), is no longer sufficient. Enter the AI GPU server—a highly specialized, incredibly dense piece of engineering designed specifically to handle parallel processing at an unprecedented scale.

For hardware engineers, system architects, and PCB designers, understanding GPU server design is critical. These systems are not just "servers with graphics cards thrown in." They are complex ecosystems featuring multiple interconnected printed circuit boards (PCBs), advanced power delivery networks handling tens of kilowatts, and aggressive cooling mechanisms to manage extreme thermal densities.

In this comprehensive guide, we will break down the complete architecture of a modern AI GPU server, exploring the various boards, slot form factors, power planes, and cooling zones. We will also dive into what makes the AI server PCB requirements uniquely challenging compared to traditional IT infrastructure.


Table of Contents

  1. The Core Components of an AI GPU Server
  2. Structural Diagram: How a GPU Server is Built
  3. Deep Dive into AI Server Boards and Slots
  4. Power Planes: Fueling 10kW+ AI Nodes
  5. Cooling Zones and Thermal Management
  6. GPU Server vs. Standard CPU Server (Comparison Table)
  7. Advanced PCB Design Requirements for GPU Servers
  8. Frequently Asked Questions (FAQ)
  9. Conclusion

1. The Core Components of an AI GPU Server

A typical enterprise AI GPU server (such as an 8-GPU node) is a massive chassis, often occupying 4U to 8U of rack space. Unlike standard 1U/2U servers where the motherboard houses everything, an AI GPU server utilizes a modular, multi-board architecture to separate the general-purpose compute from the heavy-lifting accelerators.

The CPU Motherboard (Head Node)

The CPU motherboard acts as the brain for system management, network I/O, and storage management. It usually houses dual-socket CPUs (like AMD EPYC or Intel Xeon), system RAM (DDR5), and BMC (Baseboard Management Controller) chips. However, in a GPU server, the CPU is no longer the primary workhorse; it acts as a traffic director, feeding data to the GPUs.

The GPU Baseboard (Accelerator Complex)

This is the heart of the AI GPU server. It is known as the Universal Baseboard (UBB) in Open Compute Project (OCP) standards and as the HGX baseboard in the NVIDIA ecosystem. It houses the GPUs (usually 4 or 8), the high-speed switches (like NVSwitch or PCIe switches), and the interconnect traces. This board is where the most complex PCB engineering takes place.

PCIe and Network Switch Boards

To feed massive amounts of data from storage to the GPUs, and to scale out to other servers in the cluster, AI servers use dedicated switch boards. These boards manage the PCIe lanes and house Retimers to ensure signal integrity over long distances within the chassis.

2. Structural Diagram: How a GPU Server is Built

To visualize the topology of an 8-GPU AI server, we can look at the logical data flow and physical board separation. Below is a structural representation of a modern AI node:

[ Network / Storage Fabric (InfiniBand / Ethernet) ]
        |                   |
[ NIC / DPU ]       [ NIC / DPU ]
        |                   |
=================================================== [ CPU Motherboard ]
[ CPU 1 ] <--- UPI/xGMI ---> [ CPU 2 ]
    |                           |
[ PCIe Gen5 x16 ]           [ PCIe Gen5 x16 ]
    |                           |
=================================================== [ Interconnect Layer ]
[ PCIe Switch / Retimer Board ]
    |                           |
=================================================== [ GPU Baseboard ]
    |                           |
[ GPU 1 ] -- [ GPU 2 ] -- [ GPU 3 ] -- [ GPU 4 ] ... [ GPU 8 ]
    |          |            |            |
    +----------+------------+------------+-- ( High-Speed Interconnect )
           [ NVLink Switches / OAI Switches ]

As shown, the architecture is highly stratified. High-speed signals must traverse connectors and cables between the CPU board, the switch board, and the GPU baseboard, making signal integrity a primary concern for hardware designers.

3. Deep Dive into AI Server Boards and Slots

When discussing GPU server design, the physical form factor of the GPU dictates the entire board layout. While consumer GPUs use standard PCIe slots, enterprise AI GPUs utilize specialized form factors to maximize bandwidth and power delivery.

SXM vs. OAM Form Factors

Currently, the market is dominated by two primary modular form factors for AI accelerators:

  • NVIDIA SXM: A proprietary form factor used by NVIDIA for their flagship GPUs (e.g., A100, H100, B200). SXM modules connect to the baseboard using high-density mezzanine connectors. This allows for direct NVLink interconnectivity between GPUs, offering massive bandwidth (up to 900 GB/s per GPU in the Hopper generation and 1.8 TB/s per GPU in the Blackwell generation) that standard PCIe slots cannot support. For a deeper dive into the generational differences in these boards, refer to our guide on A100 vs H100 PCB stack differences.
  • OCP OAM (Open Accelerator Module): An open standard developed by the Open Compute Project, backed by companies like AMD (MI300X) and Intel (Gaudi). OAM aims to standardize the baseboard (OAI - Open Accelerator Infrastructure) so that data centers can swap out accelerators from different vendors without redesigning the entire server chassis. Learn more about the differences in our OAM vs SXM Baseboard PCB Design analysis.

The Limitations of PCIe Slots in AI

While PCIe GPU servers (housing cards like the NVIDIA L40S) still exist for edge AI and inference, they are not the preferred choice for heavy AI training. Standard PCIe Gen 5 slots can only deliver 75W directly from the slot, requiring multiple external power cables (like the 12VHPWR). Furthermore, horizontal PCIe cards block airflow in dense configurations and limit GPU-to-GPU bandwidth to PCIe switch bottlenecks, lacking the all-to-all topologies of SXM/OAM baseboards.
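To put the bandwidth gap in rough numbers, here is a minimal sketch that estimates the raw throughput of a PCIe Gen 5 x16 slot from its 32 GT/s per-lane rate and 128b/130b line encoding. The function name and the 900 GB/s comparison figure (the commonly quoted total NVLink bandwidth of an H100 SXM module) are assumptions for illustration, and protocol overhead is ignored.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, lanes, encoding_efficiency=128 / 130):
    """Approximate one-direction PCIe bandwidth in GB/s (protocol overhead ignored)."""
    return transfer_rate_gt_s * lanes * encoding_efficiency / 8.0


# PCIe Gen 5 signals at 32 GT/s per lane; a GPU slot is typically x16.
gen5_x16_one_way = pcie_bandwidth_gb_s(32.0, 16)
gen5_x16_both_ways = 2 * gen5_x16_one_way

# Assumed comparison point: total (bidirectional) NVLink bandwidth per H100 SXM GPU.
NVLINK_TOTAL_GB_S = 900.0

print(f"PCIe Gen 5 x16, one direction:   {gen5_x16_one_way:.1f} GB/s")
print(f"PCIe Gen 5 x16, both directions: {gen5_x16_both_ways:.1f} GB/s")
print(f"NVLink advantage: about {NVLINK_TOTAL_GB_S / gen5_x16_both_ways:.1f}x")
```

Even before switch contention is considered, a single x16 slot carries only a fraction of what a mezzanine-attached module can exchange with its peers.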

4. Power Planes: Fueling 10kW+ AI Nodes

Power delivery is arguably the most challenging aspect of modern GPU server design. A fully populated 8-GPU baseboard with next-generation chips can draw anywhere from 8,000W to over 12,000W of power. Designing the Power Delivery Network (PDN) on the PCB to handle this without burning up or causing unacceptable voltage droop is a masterclass in electrical engineering.

The Shift to 48V Power Architecture

Traditional servers distribute power across the motherboard at 12V. However, applying the power equation (P = V * I) and the resistive loss equation (Ploss = I² * R) reveals a massive problem with 12V at high wattages.

To deliver 10,000W at 12V, the current required is roughly 833 Amps. Pushing 833A through copper planes on a PCB results in catastrophic I²R power losses and extreme heat generation. To solve this, AI GPU servers use a 48V Power Distribution Architecture.

By stepping the rack-level power down to 48V instead of 12V, the current is reduced by a factor of 4 (approx. 208A), and because loss scales with the square of the current, the I²R copper losses are reduced by a factor of 16. The 48V plane is routed deep within the thick GPU baseboard to the accelerator modules.
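The arithmetic behind the 12V versus 48V comparison can be checked with a short sketch. The 0.1 milliohm path resistance is a hypothetical value chosen only to make the losses concrete; real plane resistance depends on copper weight and geometry, and the function name is illustrative.

```python
def distribution_current_and_loss(power_w, bus_voltage_v, path_resistance_ohm):
    """Return (current in A, I^2 * R loss in W) for a DC bus delivering power_w."""
    current_a = power_w / bus_voltage_v             # I = P / V
    loss_w = current_a ** 2 * path_resistance_ohm   # P_loss = I^2 * R
    return current_a, loss_w


POWER_W = 10_000               # load used in the text
PATH_RESISTANCE_OHM = 0.0001   # 0.1 milliohm, hypothetical distribution path

i12, loss12 = distribution_current_and_loss(POWER_W, 12.0, PATH_RESISTANCE_OHM)
i48, loss48 = distribution_current_and_loss(POWER_W, 48.0, PATH_RESISTANCE_OHM)

print(f"12 V bus: {i12:.0f} A, {loss12:.1f} W lost in copper")
print(f"48 V bus: {i48:.0f} A, {loss48:.1f} W lost in copper")
print(f"Loss ratio: {loss12 / loss48:.0f}x")
```

The ratio stays at 16 for any resistance value, because it depends only on the square of the 4x voltage step.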

Point-of-Load (PoL) and VRM Placement

While 48V is great for distribution, the silicon core of the GPU operates at very low voltages, often below 1.0V (e.g., 0.7V - 0.8V). This requires local Voltage Regulator Modules (VRMs) to step down the 48V to the core voltage exactly where it is needed.

Because the current at 0.8V for a 1000W GPU reaches a staggering 1,250 Amps, the VRMs must be placed as close to the GPU die as physically possible to minimize trace resistance and voltage droop. Modern AI server boards utilize Vertical Power Delivery, placing the VRMs on the underside of the PCB directly beneath the GPU socket, feeding current vertically through massive arrays of copper vias straight into the silicon.
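A quick calculation shows why every micro-ohm between the regulator and the die matters at these currents. In the sketch below, the two path resistances are hypothetical placeholders for a longer lateral route and a short vertical via array; they are not measurements from any real board.

```python
def core_current_a(gpu_power_w, core_voltage_v):
    """Current drawn by the GPU core rail: I = P / V."""
    return gpu_power_w / core_voltage_v


def droop_mv(current_a, path_resistance_ohm):
    """Static IR drop across the delivery path, in millivolts."""
    return current_a * path_resistance_ohm * 1000.0


CORE_VOLTAGE_V = 0.8
current = core_current_a(1000.0, CORE_VOLTAGE_V)  # 1000 W GPU from the text

# Hypothetical path resistances in micro-ohms (illustrative only).
paths = {"lateral route": 50.0, "vertical via array": 10.0}

print(f"Core current: {current:.0f} A")
for label, r_micro_ohm in paths.items():
    drop = droop_mv(current, r_micro_ohm * 1e-6)
    share = drop / (CORE_VOLTAGE_V * 1000.0) * 100.0
    print(f"{label}: {drop:.1f} mV drop ({share:.1f}% of the core rail)")
```

This only models the static resistive drop; transient droop during load steps also depends on path inductance and decoupling, which the sketch leaves out.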

5. Cooling Zones and Thermal Management

Where there is extreme power, there is extreme heat. The thermal density of an AI GPU server can exceed 400W per square inch of silicon. Managing this requires strict thermal zoning and advanced cooling topologies.

Zone 1: The CPU and Networking Zone

Usually located at the front or middle of the chassis, this zone houses components that draw a moderate amount of power (CPUs at 300-400W each, NICs at 50W). Standard high-velocity server fans pushing ambient air are generally sufficient for this zone.

Zone 2: The GPU Baseboard Zone

This is the critical thermal zone. The components here (GPUs, NVSwitches, Retimers) generate massive heat. Cooling this zone dictates the design of the server.

  • Air Cooling (3D Vapor Chambers): Up to the NVIDIA H100 generation, air cooling is still possible using massive 3U or 4U high custom vapor chamber heatsinks that sit on top of the SXM/OAM modules. High-pressure counter-rotating fans push air through these dense fin stacks.
  • Direct-to-Chip Liquid Cooling (Cold Plates): As we move to next-generation chips drawing 1000W to 1200W each (e.g., the architecture found in the Blackwell series), air cooling is reaching physical limits. Servers are transitioning to liquid cooling loops. Micro-channel cold plates are mounted directly to the GPUs and switches. Coolant is pumped through the server to a rack-level Manifold, transferring heat to a facility water loop.
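For the liquid-cooled case, the coolant flow a loop needs can be estimated from the heat balance Q = m * cp * dT. The sketch below assumes plain water (specific heat about 4186 J/(kg*K), density about 1 kg/L) and a hypothetical 10 K temperature rise across the server; real loops often run water-glycol mixtures with lower specific heat, so treat the result as a lower bound.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # plain water, approximate
WATER_DENSITY_KG_PER_L = 1.0             # plain water, approximate


def required_flow_l_per_min(heat_w, delta_t_k,
                            specific_heat=WATER_SPECIFIC_HEAT_J_PER_KG_K,
                            density=WATER_DENSITY_KG_PER_L):
    """Coolant flow needed to carry heat_w with a delta_t_k temperature rise."""
    mass_flow_kg_per_s = heat_w / (specific_heat * delta_t_k)  # from Q = m * cp * dT
    return mass_flow_kg_per_s / density * 60.0


# 10 kW node with a hypothetical 10 K inlet-to-outlet rise.
flow = required_flow_l_per_min(10_000.0, 10.0)
print(f"Required flow: {flow:.1f} L/min")
```

Halving the allowed temperature rise doubles the required flow, which is one reason manifold and pump sizing scale quickly with node power.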

Thermal Impact on PCB Layout

The heat generated by the chips impacts the PCB itself. If a PCB gets too hot, the FR4 resin can degrade, causing delamination. Furthermore, extreme heat changes the dielectric constant (Dk) of the PCB material, which can degrade high-speed signal integrity. Designers therefore use thermal vias: grids of plated holes that conduct heat away from hotspots into internal copper planes, which then spread it across the board. High-Tg (glass transition temperature) materials are strictly required.

6. GPU Server vs. Standard CPU Server

To summarize the architectural differences, here is a comparison of a traditional CPU server versus a modern AI GPU server:

| Feature | Standard CPU Server (e.g., 2U Web Server) | AI GPU Server (e.g., 8U HGX/OAI Node) |
|---|---|---|
| Primary Compute | 1 or 2 high-core CPUs (x86 or ARM) | 4 to 8 high-performance GPUs (SXM/OAM) |
| Board Architecture | Single Main Motherboard | Multi-board: CPU Head Node + GPU Baseboard + Switch Boards |
| Interconnect | PCIe for peripherals, UPI for CPU-to-CPU | NVLink, Infinity Fabric, or Custom OAI switching (up to 1.8 TB/s) |
| Power Distribution | 12V architecture | 48V distribution, Vertical PoL delivery at 0.8V |
| Typical Power Draw | 500W - 1,500W per server | 8,000W - 15,000W+ per server |
| Cooling Method | Standard axial fans, passive heatsinks | High-pressure fans, 3D Vapor Chambers, or Direct-to-Chip Liquid Cooling |
| PCB Complexity | 8 to 14 layers, standard FR4 | 24 to 30+ layers, Ultra-low loss materials, HDI microvias |

7. Advanced PCB Design Requirements for GPU Servers

The architectural complexities described above create a massive challenge for PCB fabrication. NextPCB has observed a significant shift in manufacturing requirements for AI server boards.

High Layer Counts and Board Thickness

To accommodate hundreds of PCIe Gen 5 lanes, NVLink traces, and massive power planes, GPU baseboards require extreme layer counts. While a standard motherboard might be 10 layers, an OAM or HGX baseboard often features 24 to 32 layers. To prevent warping and accommodate the copper weight, the PCB thickness often exceeds standard limits, ranging from 3.0mm to 4.5mm thick.

Ultra-Low Loss Materials (ULL)

Operating links at 64 GT/s PAM4 (PCIe Gen 6) or at 112 Gbps PAM4 (the class of SerDes used for high-speed network and GPU interconnect links) means standard FR4 fiberglass is no longer adequate. The signal loss (insertion loss) at these frequencies would render the data unreadable. AI GPU servers mandate the use of Ultra-Low Loss (ULL) laminates such as Panasonic Megtron 7, Megtron 8, or specialized Rogers materials. These materials maintain a stable dielectric constant (Dk) and ultra-low dissipation factor (Df) even at high temperatures.
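The effect of Df on loss can be estimated with the standard dielectric attenuation formula, alpha = pi * f * sqrt(Dk) * Df / c. The sketch below evaluates it at 28 GHz, the Nyquist frequency of a 112 Gbps PAM4 lane (56 GBd). The Dk and Df values are illustrative placeholders for a generic FR4 and a generic ultra-low loss laminate, not datasheet figures for any named product, and conductor loss and copper roughness are ignored.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
NEPER_TO_DB = 20.0 / math.log(10.0)  # about 8.686 dB per neper
METERS_PER_INCH = 0.0254


def dielectric_loss_db_per_inch(freq_hz, dk, df):
    """Dielectric-only attenuation of a trace, in dB per inch."""
    alpha_np_per_m = math.pi * freq_hz * math.sqrt(dk) * df / SPEED_OF_LIGHT_M_PER_S
    return alpha_np_per_m * NEPER_TO_DB * METERS_PER_INCH


NYQUIST_HZ = 28e9  # 112 Gbps PAM4 -> 56 GBd -> 28 GHz

# Illustrative laminate parameters (assumed, not taken from datasheets).
fr4 = dielectric_loss_db_per_inch(NYQUIST_HZ, dk=4.3, df=0.020)
ull = dielectric_loss_db_per_inch(NYQUIST_HZ, dk=3.3, df=0.002)

print(f"Generic FR4: {fr4:.2f} dB/inch")
print(f"Generic ULL: {ull:.2f} dB/inch")
print(f"Over 10 inches: {fr4 * 10:.1f} dB vs {ull * 10:.1f} dB")
```

Under these assumptions the dielectric loss differs by roughly an order of magnitude, which compounds quickly over the trace lengths found on a large baseboard.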

HDI and Via-in-Pad Technology

The pin density on a modern AI GPU or Retimer BGA is staggering, with pitch sizes shrinking to 0.8mm or less. Routing out of these BGAs requires High-Density Interconnect (HDI) techniques. This includes blind and buried vias, staggered microvias, and Via-in-Pad Plated Over (VIPPO) technologies. Furthermore, to prevent signal reflection (stub effects) on high-speed traces, Backdrilling is an absolute necessity to remove the unused portions of plated through-holes.
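The need for backdrilling can be illustrated with the quarter-wave resonance of a via stub, f = c / (4 * L * sqrt(Dk)): an unused stub of length L acts as a notch filter near that frequency. The stub lengths and the Dk value below are hypothetical, and the simple formula ignores pad and anti-pad capacitance, which pulls the real resonance lower.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def stub_resonance_ghz(stub_length_mm, dk):
    """Quarter-wave resonant frequency of an open via stub, in GHz."""
    stub_length_m = stub_length_mm / 1000.0
    freq_hz = SPEED_OF_LIGHT_M_PER_S / (4.0 * stub_length_m * math.sqrt(dk))
    return freq_hz / 1e9


DK = 3.3  # assumed laminate dielectric constant

# Hypothetical stub on a thick board, before and after backdrilling.
before = stub_resonance_ghz(2.5, DK)
after = stub_resonance_ghz(0.25, DK)

print(f"2.5 mm stub:  resonance near {before:.1f} GHz")
print(f"0.25 mm stub: resonance near {after:.1f} GHz")
```

On a 3.0mm to 4.5mm board, a signal that exits on an upper layer can leave a stub several millimeters long, placing the notch inside the band a multi-gigabit link occupies. Backdrilling shortens the stub and pushes the resonance far above it.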

8. Frequently Asked Questions (FAQ)

Q: Why don't AI servers just use standard PCIe graphics cards?
A: While PCIe cards (like the RTX 4090 or datacenter L40S) are fine for basic machine learning or inference, they lack the high-bandwidth inter-GPU communication (like NVLink) required for training massive LLMs. They are also limited in power delivery and thermal exhaust efficiency compared to SXM or OAM modules.

Q: What is the purpose of Retimers on the GPU server boards?
A: High-speed signals (like PCIe Gen 5) degrade rapidly as they travel across PCB copper. Retimers are active silicon components that recover, clean up, and amplify the signal, allowing it to travel across the large distances of a server chassis (from the CPU board to the GPU baseboard) without data corruption.

Q: How do 48V power planes affect PCB manufacturing?
A: 48V planes require thicker copper weights (e.g., 2oz or 3oz copper) on internal layers to carry the current. This makes the PCB pressing and lamination process more difficult, requiring specialized resin flow management to prevent voids between the heavy copper traces.
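The benefit of heavier copper follows from the sheet resistance relation R = rho * L / (W * t). The sketch below compares 1oz and 3oz copper for a hypothetical plane section 100mm long and 50mm wide carrying the roughly 208A from the 48V example. It assumes about 35 micrometers of thickness per ounce and room-temperature copper resistivity, and it ignores via perforations and heating, both of which raise real resistance.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # approximate, at 20 degrees C
THICKNESS_PER_OZ_M = 35e-6          # about 35 micrometers per oz/ft^2


def plane_resistance_ohm(length_m, width_m, copper_oz):
    """DC resistance of a solid rectangular copper plane section."""
    thickness_m = copper_oz * THICKNESS_PER_OZ_M
    return COPPER_RESISTIVITY_OHM_M * length_m / (width_m * thickness_m)


CURRENT_A = 10_000 / 48.0  # about 208 A

# Hypothetical plane section: 100 mm long, 50 mm wide.
for oz in (1, 3):
    r = plane_resistance_ohm(0.100, 0.050, oz)
    print(f"{oz} oz: {r * 1000:.2f} milliohm, {CURRENT_A ** 2 * r:.1f} W dissipated")
```

Tripling the copper weight cuts both resistance and dissipation to a third, at the cost of the lamination difficulties described above.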

9. Conclusion

The architecture of an AI GPU server is a marvel of modern engineering. By separating compute tasks across modular boards, utilizing 48V power delivery networks, and implementing aggressive cooling topologies, these systems can power the future of artificial intelligence. However, translating this architecture into physical hardware places immense demands on PCB design and fabrication.

From 30-layer stacks and ultra-low loss Megtron materials to complex HDI routing and backdrilling, manufacturing AI server boards requires a fabrication partner with cutting-edge capabilities and strict quality control.