GPU Board Assembly: Manufacturing Challenges for AI Accelerator Cards

Q: What is the most common cause of GPU BGA assembly failure?

Head-on-pillow (HoP) defects caused by package warpage during reflow are the most common root cause of latent GPU BGA failures. HoP joints often pass initial electrical test because the ball and pad are in physical contact, but they fail within the first few thermal cycles in service when the unformed metallurgical bond separates under stress. The second most common cause is solder voiding on power delivery BGA balls, which creates localized resistive hot spots under continuous high-current loading. Both defects are addressed by the combination of reflow profile optimization, nitrogen atmosphere, and vacuum reflow for power balls.

Q: Is 2D X-ray inspection sufficient for GPU board BGA verification?

No. 2D X-ray is not sufficient for GPU BGA inspection because it projects the full 3D solder joint structure onto a single plane. HoP defects, which are the most common latent failure mode, appear as a slight gap between ball and paste in a cross-sectional view—a feature that is obscured in a 2D projection by the overlapping solder of adjacent layers. 3D computed tomography (CT) X-ray is the minimum required inspection technology for GPU BGA verification; it reconstructs cross-sectional slices through the ball array that can detect HoP, excessive voiding, and subtle bridges that 2D X-ray misses.

Q: How many times can a GPU package be reworked on an AI server board?

Industry practice for AI server boards is to limit GPU package rework to one rework cycle per board location. Each rework cycle subjects the board to an additional full thermal cycle above liquidus, which cumulatively degrades the laminate, surrounding solder joints, and via plating. A second rework cycle at the same location significantly increases the risk of collateral damage to adjacent NVSwitch packages, power delivery components, and the PCB itself. Boards that require a second GPU rework are typically scrapped rather than reworked, particularly for high-value AI server programs where the cost of a field failure exceeds the cost of a board scrap.

Q: What is the difference in assembly difficulty between H100 SXM5 and B200 SXM6 boards?

B200 SXM6 boards are significantly more difficult to assemble than H100 SXM5 boards for three reasons. First, the B200's dual-die CoWoS package (two GB100 dies on a shared silicon interposer) has a larger footprint and higher thermal mass than the H100's single-die CoWoS package, making warpage and thermal management at reflow more challenging. Second, the B200's 1,000 W TDP imposes higher sustained current loads on VRM components and power delivery BGA balls, increasing the sensitivity of those joints to solder voiding. Third, the B200 board's higher layer count (24–32 layers vs 20–24 for H100) means more via-in-pad structures requiring fill and planarization, and more backdrilling operations for NVLink 5.0 and PCIe Gen6 vias.

Q: Why is nitrogen atmosphere important for GPU board reflow?

Nitrogen atmosphere (O2 < 100 ppm) during reflow limits the oxide layer that forms on solder surfaces when they are exposed to oxygen at high temperature. Solder oxide is non-wetting and resists coalescence. It contributes to both HoP defects (where the oxide on the ball surface prevents it from merging with the paste dome) and solder voiding (where the oxide on the paste surface traps volatiles below a skin rather than allowing them to escape). A thinner oxide skin lowers the energy barrier to coalescence and improves the probability that the ball and paste merge cleanly during the liquidus window. The cost of nitrogen consumption in a production reflow oven is small compared to the yield improvement it provides on high-value GPU board assemblies.

Q: What quality standard should GPU board assembly be certified to?

IPC Class 3 (High Reliability) is the applicable quality standard for GPU board assembly for AI server applications. Class 3 specifies the most stringent solder joint acceptance criteria (minimum solder fillet height, maximum void area, maximum side overhang), via barrel minimum plating thickness, and annular ring requirements. IPC-A-610 Class 3 for assembly acceptability and IPC-6012 Class 3 for fabrication define the combined quality framework. AI server programs at hyperscale cloud providers typically add their own supplementary requirements on top of IPC Class 3, particularly for BGA void acceptance (stricter than the IPC limit on power balls) and burn-in duration (longer than standard commercial practice).

Posted: June, 2026 Last Updated: June, 2026 Writer: Stacy Lu

Introduction

Assembling a GPU board for an AI accelerator is one of the most technically demanding PCB assembly tasks in commercial electronics manufacturing. The board carries components that combine extreme size (GPU packages 55–75 mm on a side), extreme pin count (5,000 to more than 15,000 BGA balls per package), extreme power density (700–1,000 W per GPU), and extreme signal speed (NVLink 5.0 at 200 Gb/s per lane, PCIe Gen5/Gen6 at the host interface). Every step of the assembly process (paste printing, component placement, reflow, inspection, test) is pushed to or beyond the limits of what standard SMT lines can reliably achieve.

The challenges are not merely operational. They are architectural: the package and board designs that maximize AI compute density per rack unit simultaneously create assembly conditions that maximize the risk of solder joint defects, warpage failures, and inspection blind spots. Understanding each challenge, why it exists, and what manufacturing solutions address it is essential for program managers, process engineers, and hardware designers working in the AI server supply chain.

This article covers the eight most significant GPU board assembly challenges in detail, with specific reference to H100 SXM5 baseboards, B200 SXM6 baseboards, and OAM module assemblies (for context on OAM-specific manufacturing, see the OAM PCB Assembly Guide). The discussion covers both the challenge mechanism and the proven manufacturing solution for each.

Table of Contents

Introduction
What Makes GPU Board Assembly Different from Standard PCB Assembly
GPU and AI Accelerator Package Types
Challenge 1: Package and Board Warpage During Reflow
Challenge 2: Fine-Pitch BGA Placement Accuracy
Challenge 3: Reflow Profile Optimization for High Thermal Mass
Challenge 4: Via-in-Pad Quality Under Large BGA Packages
Challenge 5: Solder Void Management
Challenge 6: Head-on-Pillow Defects
Challenge 7: NVSwitch and Multi-Package Boards
Challenge 8: High-Current Power Component Assembly
Inspection Strategy: AOI, 3D X-Ray, and SPI
BGA Rework on GPU Boards
Burn-In and Final Functional Test
Yield Management and Continuous Improvement
FAQ

What Makes GPU Board Assembly Different from Standard PCB Assembly

Standard PCB assembly for consumer electronics or enterprise IT equipment involves components with established assembly envelopes: packages smaller than 40 mm × 40 mm, BGA pitches of 1.0 mm or coarser, power levels measured in tens of watts, and board sizes within the 300 mm × 400 mm range of standard SMT equipment. GPU boards for AI servers violate nearly every one of these assumptions simultaneously.

Assembly Parameter	Standard Enterprise PCB	AI GPU Accelerator Board
Largest package size	25–40 mm × 25–40 mm	55–75 mm × 55–75 mm (GPU + CoWoS)
BGA ball count (GPU)	500–2,000 balls	5,000–15,000+ balls
BGA ball pitch (GPU)	1.0–1.27 mm	0.65–0.8 mm
Component TDP (continuous)	5–50 W per package	700–1,000 W per GPU package
Board size	200–400 mm typical	500–750 mm (DGX H100 baseboard)
Board layer count	8–14	20–32+
Via-in-pad requirement	Selective (specific packages)	Extensive (GPU, NVSwitch, HBM packages)
Simultaneous BGA count	2–6 large packages	12–24 large packages (8 GPUs + 4 NVSwitch + HBM)
Package warpage risk	Low to moderate	High (large multi-die CoWoS packages)
Rework difficulty	Moderate	Extremely high (large mass, fine pitch, adjacent packages)

The scale of complexity increase is not incremental. Each parameter pushes the assembly process into a regime where standard process controls are inadequate and specialized equipment, materials, and engineering expertise are required. The sections below address each challenge in turn.

GPU and AI Accelerator Package Types

The specific assembly challenges for a GPU board depend significantly on the package technology used by the accelerator die. Three package types are common in current AI server hardware.

Standard flip-chip BGA (FC-BGA): Used in earlier GPU generations (Ampere A100, some PCIe-form-factor GPUs). The GPU die is flip-chip bonded to a multi-layer organic substrate, which then interfaces to the PCB via a BGA. Assembly is challenging primarily due to package size and fine pitch, but the single-die construction limits warpage magnitude compared to multi-die alternatives.

CoWoS (Chip-on-Wafer-on-Substrate): Used in H100 and B200 (see CoWoS Packaging Explained for detailed coverage). The GPU die (or dies in B200's dual-die configuration) and HBM memory stacks are assembled on a silicon interposer, which is then bonded to an organic substrate for PCB mounting. CoWoS packages are very large (H100 SXM5: approximately 66 mm × 66 mm; B200 SXM6: larger), have high thermal mass, and exhibit significant warpage during reflow due to CTE mismatch between the silicon interposer and the organic substrate.

3D chiplet packaging (AMD MI300X-class): Multiple GPU dies (XCDs) are stacked vertically on a base interposer with HBM stacks on the same interposer. The resulting package is approximately 60 mm × 60 mm with extreme thermal mass concentrated in the stacked die area. The 3D stacking creates higher internal stress than planar CoWoS, which manifests as a package warpage profile that changes non-linearly through the reflow thermal cycle.

Assembly processes must be specifically characterized and qualified for each package type. A reflow profile optimized for H100 CoWoS will not necessarily produce acceptable results on B200 dual-die CoWoS or AMD MI300X 3D chiplets without re-optimization. Component vendors provide assembly guidelines and warpage data that serve as the starting point for process development, but final optimization requires measurement on the actual board design.

Challenge 1: Package and Board Warpage During Reflow

Package warpage is the single most consequential assembly challenge for large AI accelerator packages. All materials expand as temperature rises and contract as it falls, but different materials expand at different rates (characterized by the coefficient of thermal expansion, CTE). In a multi-material package like a CoWoS GPU, the silicon interposer and GPU die (CTE ~3 ppm/°C) expand far less than the organic substrate they are bonded to (CTE ~16–18 ppm/°C). The result is that the package warps as it heats and cools, changing from convex to concave (or vice versa) at different points in the temperature profile.
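To give a feel for the scale of the mismatch, the sketch below estimates how much more the organic substrate would grow than the silicon if both were free to expand. It is a first-order illustration only: it uses the linear relation ΔL = α · L · ΔT with the CTE values quoted above, assumes a 66 mm package edge and a 25°C to 240°C excursion, and ignores the fact that the bonded layers constrain each other (which is exactly what turns the mismatch into bending). The function names are illustrative.

```python
def free_expansion_mm(length_mm: float, cte_ppm_per_c: float, delta_t_c: float) -> float:
    """Unconstrained linear expansion: dL = alpha * L * dT (alpha given in ppm/degC)."""
    return cte_ppm_per_c * 1e-6 * length_mm * delta_t_c


def expansion_mismatch_mm(length_mm: float, cte_a_ppm: float, cte_b_ppm: float,
                          t_start_c: float, t_end_c: float) -> float:
    """Difference in free expansion between two materials over the same span."""
    delta_t = t_end_c - t_start_c
    return abs(free_expansion_mm(length_mm, cte_a_ppm, delta_t)
               - free_expansion_mm(length_mm, cte_b_ppm, delta_t))


# Assumed inputs: 66 mm edge, silicon at 3 ppm/degC, substrate at 17 ppm/degC
# (midpoint of the 16-18 range), heated from room temperature to reflow peak.
mismatch = expansion_mismatch_mm(66.0, 3.0, 17.0, 25.0, 240.0)
print(f"Free-expansion mismatch across the package edge: {mismatch:.3f} mm")
```

Under these assumptions the mismatch comes to roughly 0.2 mm across the package edge, which is large relative to a 0.65–0.8 mm ball pitch. Actual warpage depends on layer thicknesses and stiffness and has to be measured.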

The specific warpage profile of a CoWoS GPU package during reflow typically follows this pattern:

Room temperature (25°C): Package may have a slight bow from the die-attach cure stress (typically convex, with the die side bowing upward)
Preheat (25–150°C): Package flattens as the substrate expands more rapidly than the silicon interposer
Soak (150–180°C): Package may transiently become concave (die side bowing downward, pushing BGA balls into the PCB paste)
Reflow peak (~240°C, above liquidus): This is the critical moment—if the package is convex at liquidus (die side up, BGA balls separating from the PCB paste), the molten solder balls may not coalesce with the paste, creating head-on-pillow defects
Cooling: Rapid cooling locks in the warped state; if significant warpage exists as solder solidifies, the differential contraction between the package and the PCB creates solder joint stress as the assembly returns to room temperature

Solutions:

Shadow Moiré warpage measurement: In-situ measurement of package warpage at temperature using a shadow Moiré optical system identifies the warpage profile during a simulated reflow cycle and allows the reflow profile to be adjusted to minimize warpage at liquidus
Reflow profile optimization: Slowing the ramp rate through the soak zone (1–1.5°C/s rather than 2–3°C/s) allows the package to reach thermal equilibrium before reaching liquidus, reducing the warpage magnitude at the critical coalescence moment
Nitrogen atmosphere: Reduces oxidation on solder surfaces so that ball and paste coalesce more readily, which is useful when marginal warpage leaves balls and paste barely in contact
Board fixture during reflow: Stiffening frames or vacuum fixtures that hold the PCB flat through the oven reduce the board warpage component of the problem; GPU baseboards with non-uniform copper distribution (heavy power planes on one side) are particularly prone to board warpage that compounds package warpage
Underfill (post-reflow): Capillary underfill dispensed beneath the GPU package after reflow reinforces the solder joints against the board-level thermal cycling stress; not universally used on GPU baseboards but common on OAM modules where the small PCB footprint concentrates thermal cycling stress
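The ramp-rate guidance above can be checked against a recorded thermocouple trace. The following is a minimal sketch, assuming the trace is available as two equal-length lists of timestamps (seconds) and temperatures (°C); the function names and the sample data are hypothetical, and the 150–180°C window and 1.5°C/s ceiling are taken from the text.

```python
def ramp_rates(times_s, temps_c):
    """Return the ramp rate (degC/s) between each pair of consecutive samples."""
    if len(times_s) != len(temps_c):
        raise ValueError("times and temperatures must have the same length")
    rates = []
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((temps_c[i] - temps_c[i - 1]) / dt)
    return rates


def max_ramp_in_window(times_s, temps_c, low_c, high_c):
    """Largest ramp rate among segments that start and end inside [low_c, high_c]."""
    rates = ramp_rates(times_s, temps_c)
    inside = [
        rate
        for rate, start, end in zip(rates, temps_c, temps_c[1:])
        if low_c <= start <= high_c and low_c <= end <= high_c
    ]
    return max(inside) if inside else None


# Hypothetical trace from a thermocouple on a GPU package, sampled every 10 s.
times = [0, 10, 20, 30, 40]
temps = [145, 152, 164, 176, 200]
soak_max = max_ramp_in_window(times, temps, 150.0, 180.0)
print(f"Max soak-zone ramp: {soak_max:.2f} degC/s (target ceiling 1.5 degC/s)")
```

Only segments that lie entirely inside the soak window are counted, so the steeper ramp toward peak does not trigger a false alarm.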

Challenge 2: Fine-Pitch BGA Placement Accuracy

GPU packages use BGA ball pitches of 0.65–0.8 mm. At 0.65 mm pitch, the allowable placement error before solder bridging occurs is approximately 0.2–0.25 mm (30–40% of pitch). Standard high-speed pick-and-place machines achieve ± 30–50 μm placement accuracy under ideal conditions, which is adequate in principle. However, several factors reduce effective placement accuracy in GPU board assembly:

Package size and nozzle limitations: Large packages (> 50 mm) require specialized large-format nozzles. The increased lever arm from nozzle center to package corner amplifies any angular placement error; a 0.1° rotation error shifts the corner balls of a 55 mm package by roughly 0.07 mm (about 0.05 mm at the edge midpoints), which is within specification but leaves limited margin for pad registration variation
Board flatness: If the board is not perfectly flat at the placement station (which is common on large GPU baseboards with non-uniform copper distribution), the Z-axis contact between package and paste is non-uniform across the large BGA footprint, leading to variable paste compression and ball position accuracy
Fiducial quality: Vision-based alignment relies on etched copper fiducials on the board surface. On boards with heavy solder mask coverage near the fiducials, the fiducial contrast is reduced and the vision system's centroid calculation is less accurate; fiducial design should be optimized specifically for the placement machine's vision system
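The lever-arm effect of a rotation error can be worked out with basic geometry. The sketch below assumes a square package rotated about its center and computes the displacement of a point at the middle of an edge and at a corner; the function name is illustrative.

```python
import math


def rotation_offset_mm(package_side_mm: float, rotation_deg: float) -> dict:
    """Displacement of edge-midpoint and corner points for a rotation about the center.

    A point at radius r moves along a chord of length 2 * r * sin(theta / 2).
    """
    theta = math.radians(rotation_deg)
    half_side = package_side_mm / 2.0
    corner_radius = half_side * math.sqrt(2.0)
    chord_factor = 2.0 * math.sin(theta / 2.0)
    return {
        "edge_midpoint_mm": half_side * chord_factor,
        "corner_mm": corner_radius * chord_factor,
    }


offsets = rotation_offset_mm(55.0, 0.1)
print(f"Edge midpoint: {offsets['edge_midpoint_mm']:.3f} mm")
print(f"Corner:        {offsets['corner_mm']:.3f} mm")
```

For a 55 mm package and a 0.1° error, this works out to roughly 0.05 mm at the middle of an edge and roughly 0.07 mm at the corner, both well inside the 0.2–0.25 mm bridging limit but comparable to the placement machine's own ±30–50 μm accuracy.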

Solutions: Use dual-camera verification (one camera for the component BGA ball pattern, one for the PCB pad pattern) for active alignment correction immediately before placement; verify board flatness at the placement station with a height sensor and use compliant nozzles that accommodate slight board tilt; perform routine calibration of the placement head angular accuracy using a precision calibration board at the start of each production shift.

Challenge 3: Reflow Profile Optimization for High Thermal Mass

A DGX H100 baseboard carries 8 H100 GPU packages, 4 NVSwitch chips, dozens of VRM inductors, and hundreds of capacitors and resistors, all to be reflowed simultaneously in a single oven pass. The thermal mass of this assembly is enormous—the GPU packages alone account for several kilograms of silicon, interposer, and organic substrate. The challenge is that the oven must bring every solder joint on the board above the SAC305 liquidus temperature (217°C) while keeping every component below its maximum rated temperature, simultaneously, on a board where thermal mass varies by two orders of magnitude between the smallest 0201 capacitor and the largest GPU package.

The temperature differential across the board during reflow can easily be 15–25°C between the GPU packages (slow to heat) and nearby small passives (fast to heat). If the oven profile is set to bring the GPU packages to adequate peak temperature, the small passives adjacent to the GPU may experience excessive time above liquidus or excessive peak temperature. If the profile is set conservatively for the small passives, the GPU package BGA joints may not reach liquidus uniformly.

Solutions:

Thermal profiling with multiple thermocouples: Attach thermocouples to the GPU package, NVSwitch, a nearby small passive, and a location at the board edge before the first profiling run; the resulting temperature vs. time curves for each location identify the actual temperature differential and allow zone-by-zone oven adjustment to minimize it
Extended soak zone: A longer soak period at 150–180°C allows the GPU packages to thermally equilibrate with the rest of the board before liquidus, reducing the temperature differential at peak
Forced convection optimization: Modern convection reflow ovens allow independent control of the top and bottom heating zones; setting slightly higher bottom heat helps drive heat into the heavy GPU package from the PCB side, complementing the top convection heating
Board carrier thermal management: Titanium or aluminum board carriers with targeted cutouts beneath GPU package areas modify the local convection flow and thermal mass balance, improving temperature uniformity across the board
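Once multi-thermocouple traces are recorded, the two numbers that matter most for this challenge are the time each location spends above liquidus and the spread in peak temperature across locations. The sketch below shows one way to compute them, assuming each trace is a pair of lists (seconds, °C) and using linear interpolation between samples. The location names and sample values are hypothetical; the 217°C default is the SAC305 figure from the text.

```python
def time_above(times_s, temps_c, threshold_c):
    """Approximate seconds spent at or above threshold_c, interpolating linearly."""
    total = 0.0
    for i in range(1, len(times_s)):
        t0, t1 = times_s[i - 1], times_s[i]
        y0, y1 = temps_c[i - 1], temps_c[i]
        if y0 >= threshold_c and y1 >= threshold_c:
            total += t1 - t0
        elif y0 < threshold_c <= y1:  # rising through the threshold
            total += (t1 - t0) * (y1 - threshold_c) / (y1 - y0)
        elif y1 < threshold_c <= y0:  # falling through the threshold
            total += (t1 - t0) * (y0 - threshold_c) / (y0 - y1)
    return total


def summarize(traces, liquidus_c=217.0):
    """Return per-location peak and time above liquidus, plus the peak spread."""
    summary = {}
    for name, (times_s, temps_c) in traces.items():
        summary[name] = {
            "peak_c": max(temps_c),
            "tal_s": time_above(times_s, temps_c, liquidus_c),
        }
    peaks = [entry["peak_c"] for entry in summary.values()]
    return summary, max(peaks) - min(peaks)


# Hypothetical coarse traces (one sample per minute) for two thermocouple sites.
sample_times = [0, 60, 120, 180, 240, 300]
traces = {
    "gpu_package": (sample_times, [25, 120, 175, 210, 235, 180]),
    "nearby_0201": (sample_times, [25, 140, 185, 230, 250, 170]),
}
summary, peak_delta = summarize(traces)
for name, entry in summary.items():
    print(f"{name}: peak {entry['peak_c']} degC, above liquidus {entry['tal_s']:.0f} s")
print(f"Peak temperature spread: {peak_delta} degC")
```

In this made-up data the passive spends noticeably longer above liquidus than the GPU package, which is the imbalance that the extended soak and zone adjustments are meant to reduce.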

Challenge 4: Via-in-Pad Quality Under Large BGA Packages

GPU and NVSwitch BGA packages require via-in-pad (VIPPO) structures to achieve the routing density needed for BGA escape on fine-pitch packages. The via-in-pad process—filling drilled vias with epoxy, curing, planarizing by grinding, and cap-plating with copper—must produce a pad surface that is flat within ± 10 μm of the surrounding solder mask surface. If the filled via dimples below the pad (due to epoxy cure shrinkage) or protrudes above it (due to insufficient grinding), the solder paste volume on that pad is incorrect and the resulting solder joint is either volumetrically deficient or creates a bridging risk to adjacent pads.

The cumulative effect of multiple via-in-pad defects under a single GPU BGA package (which may contain thousands of via-in-pad structures) is significant. Even a 5% defect rate in via-in-pad planarity across a 10,000-ball BGA means 500 potentially compromised pads—a number large enough to cause measurable electrical defects or reliability failures.
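The arithmetic behind this point scales badly with ball count, which the short sketch below illustrates. It assumes pad defects are independent and equally likely, which real boards will not satisfy exactly (defects tend to cluster), so the output is an order-of-magnitude guide rather than a yield model. The function names are illustrative.

```python
def expected_defective_pads(pad_count: int, defect_rate: float) -> float:
    """Expected number of out-of-planarity pads under one package."""
    return pad_count * defect_rate


def prob_any_defect(pad_count: int, defect_rate: float) -> float:
    """Probability that at least one pad is defective, assuming independence."""
    return 1.0 - (1.0 - defect_rate) ** pad_count


pads = 10_000
for rate in (0.05, 0.001, 0.0001):
    expected = expected_defective_pads(pads, rate)
    p_any = prob_any_defect(pads, rate)
    print(f"rate {rate:.4%}: expected {expected:.0f} pads, P(at least one) = {p_any:.3f}")
```

Even at a per-pad rate of 0.01%, a 10,000-ball package is more likely than not to contain at least one compromised pad under these assumptions, which is why planarity is verified on every panel rather than sampled occasionally.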

Solutions: Control epoxy fill viscosity and cure temperature to minimize cure shrinkage (target < 3% volumetric shrinkage); grind to planarity using a calibrated mechanical planarizer with in-process thickness measurement; verify planarity on every panel using a white-light interferometer or profilometer scan of a representative coupon area; set cap-plating thickness to 10–15 μm to provide a small positive protrusion before solder mask that brings the finished pad to the correct level after solder mask application. For a detailed description of the VIPPO process in the context of GPU board fabrication, see How GPU PCBs Are Manufactured: From Bare Board to Final PCBA.

Challenge 5: Solder Void Management

Solder voids are gas-filled cavities within a solder joint. They form during reflow when flux volatiles, moisture, or trapped gas cannot escape from the solder as it transitions from paste to liquid to solid. In standard SMT assembly, voids are typically managed by accepting void areas up to 25% of the ball cross-section per IPC-7095. For GPU board power delivery joints—where BGA balls carry 5–10 A of continuous current—even voids below the IPC acceptance limit create localized resistive hot spots that reduce long-term reliability under thermal cycling.

The specific void risk in GPU board assembly is elevated by two factors: the large package body creates a tent over the BGA area that inhibits flux volatile escape during reflow; and the via-in-pad structures beneath power balls create additional void nucleation sites from residual epoxy outgassing.

Solutions:

Vacuum reflow: A vacuum reflow oven applies a partial vacuum (typically 5–20 mbar) during the liquidus phase; the pressure differential draws trapped gas out of the molten solder joints through the liquid solder surface. Vacuum reflow reduces void area from typical 15–25% to < 5% on power balls and is increasingly standard for AI accelerator assembly
Low-voiding solder paste: Flux formulations engineered for low residue and low void formation under large packages; flux activation temperature should be well below the liquidus temperature to ensure complete flux reaction before solder melts, minimizing residual volatile gas at liquidus
Optimized stencil aperture: Slightly reducing the aperture size for power BGA pads (to 85% of pad area rather than the standard 90–100%) reduces paste volume and the amount of flux volatile available to form voids, at the cost of marginally smaller solder joint volume
Pre-bake before reflow: Baking boards and packages at 80–100°C for 4–8 hours before reflow drives off absorbed moisture that would otherwise vaporize during reflow and contribute to void formation
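The stencil aperture trade-off can be put in numbers. The sketch below estimates printed paste volume for a round pad as aperture area times stencil thickness times a transfer efficiency. The pad diameter, stencil thickness, and transfer efficiency are illustrative assumptions, not values from any particular GPU board.

```python
import math


def paste_volume_mm3(pad_diameter_mm: float, aperture_fraction: float,
                     stencil_thickness_mm: float, transfer_efficiency: float = 1.0) -> float:
    """Approximate paste deposit volume for a round pad.

    aperture_fraction is the aperture area divided by the pad area (0.85 for 85%).
    """
    pad_area = math.pi * (pad_diameter_mm / 2.0) ** 2
    return pad_area * aperture_fraction * stencil_thickness_mm * transfer_efficiency


# Assumed values for illustration: 0.40 mm pad, 0.10 mm stencil, full transfer.
full = paste_volume_mm3(0.40, 1.00, 0.10)
reduced = paste_volume_mm3(0.40, 0.85, 0.10)
print(f"100% aperture: {full:.5f} mm^3")
print(f" 85% aperture: {reduced:.5f} mm^3 ({1 - reduced / full:.0%} less paste)")
```

Because volume scales linearly with aperture area at fixed stencil thickness, the 85% aperture deposits 15% less paste, and with it proportionally less flux volatile, at the cost of a slightly smaller joint.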

Challenge 6: Head-on-Pillow Defects

Head-on-pillow (HoP) is a solder joint defect unique to BGA assembly where the solder paste dome on the PCB pad and the solder ball on the BGA package do not coalesce during reflow. The result is a joint that passes visual inspection (the ball is in contact with the pad surface) and may even pass initial electrical test, but has no metallurgical bond between the ball and the pad and fails under the first significant thermal cycle or mechanical stress.

HoP is caused by the package lifting slightly from the board at the moment of liquidus due to warpage—separating the ball from the paste by a distance large enough that the oxide skins on the two molten solder surfaces cannot rupture and merge. The very narrow time window during which coalescence must occur (seconds at liquidus temperature) combined with the continuous warpage motion of the package during this window makes HoP the most difficult GPU assembly defect to eliminate reliably.

Solutions: The most effective HoP prevention is a combination of warpage-minimizing reflow profile (slow ramp, extended soak, controlled cooling as described in Challenge 1) and nitrogen atmosphere (which reduces oxide skin thickness on solder surfaces, lowering the energy barrier to coalescence). Secondary measures include using OSP (Organic Solderability Preservative) surface finish on PCB BGA pads rather than ENIG where possible—OSP provides a more wettable copper surface at reflow temperature than ENIG's nickel layer, reducing the coalescence energy requirement; and specifying the package vendor's solder ball alloy to have a liquidus temperature within 5°C of the paste alloy liquidus, minimizing the temperature window where one solder surface is liquid and the other is not. For HoP detection, 3D X-ray inspection using CT reconstruction is required; 2D X-ray cannot reliably distinguish a HoP joint from a correctly formed joint.

Challenge 7: NVSwitch and Multi-Package Boards

An H100 HGX baseboard carries not only 8 GPU packages but also 4 NVSwitch 3.0 chips, each approximately 35–40 mm on a side with 64 NVLink 4.0 ports. A B200 baseboard adds larger GPU packages and may include even more complex switch silicon. The assembly challenge is not simply that there are more large BGAs on the board—it is that the different packages have different warpage profiles, different optimal reflow temperatures, and different placement accuracy requirements, yet all must be assembled in a single reflow pass on a board with a single thermal profile.

NVSwitch packages typically have less severe warpage than GPU CoWoS packages because they lack the silicon interposer stack, but they are still large enough that standard placement machines require specialized nozzles and careful alignment. More significantly, the combination of GPU and NVSwitch package thermal masses on the same board creates a board-level thermal map during reflow where the eight GPU packages dominate heat absorption in their local areas while the NVSwitch packages heat more rapidly—creating differential timing in reaching liquidus across the board surface.

Solutions: Use thermal simulation (finite element analysis of the board during the reflow profile) to predict temperature distribution before the first physical profiling run; this identifies areas of the board where temperature will exceed specification before hardware is at risk. Design the reflow profile to satisfy the most demanding constraint (typically, minimum temperature at the GPU package balls > 217°C and maximum temperature at nearby 0201 passives < 260°C). For boards where the constraints cannot be simultaneously satisfied with a single profile pass, selective reflow using localized heating (laser reflow or focused IR heating) for the GPU packages is an option, though it significantly increases cycle time and equipment cost. The NVLink routing implications of NVSwitch assembly are discussed in the NVSwitch architecture guide.

Challenge 8: High-Current Power Component Assembly

GPU boards for AI servers carry VRM (Voltage Regulator Module) assemblies that convert bus voltage (12 V or 48 V) to GPU core voltage (0.85–0.9 V) at continuous currents of 400–800 A per GPU. The power inductors, switching FETs, and driver ICs in these VRMs are not especially difficult to assemble individually, but their assembly requirements conflict with the GPU BGA requirements in several ways.
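The current levels follow directly from I = P / V. The sketch below is a back-of-the-envelope illustration: the 500 W core-rail load is a hypothetical figure (only part of a GPU's TDP is drawn from the core rail), while the 0.85 V core voltage and 48 V bus are taken from the text. Conversion losses are ignored.

```python
def rail_current_a(power_w: float, voltage_v: float) -> float:
    """Current drawn on a rail delivering power_w at voltage_v (I = P / V)."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return power_w / voltage_v


# Hypothetical 500 W core-rail load at 0.85 V, versus the same GPU's
# 700 W total drawn from a 48 V bus (converter losses ignored).
core_a = rail_current_a(500.0, 0.85)
bus_a = rail_current_a(700.0, 48.0)
print(f"Core rail: {core_a:.0f} A at 0.85 V")
print(f"48 V bus:  {bus_a:.1f} A for 700 W")
```

The same power that needs under 15 A on the bus side needs several hundred amperes after conversion, which is why the VRM output stage and the power BGA balls are the joints most sensitive to voiding and resistance.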

Power inductors have large ferrite bodies that act as thermal sinks—they heat slowly and cool slowly relative to the surrounding passives, creating local temperature gradients during reflow. More critically, power inductors and FETs often require higher solder volumes than standard SMT passives (larger pads, thicker paste) while adjacent GPU BGA pads require tightly controlled paste volumes with fine-pitch stencil apertures. A single stencil cannot simultaneously be optimal for both high-volume power components and fine-pitch GPU pads; a step stencil or multiple-pass printing approach is required.

High-current PCB traces in the VRM area also require attention to assembly-induced stress. Press-fit power connectors (used for high-current bus connections on some GPU baseboards) must be inserted with precisely controlled force to prevent annular ring cracking at the hole—a failure mode that may not be visible at assembly but creates a high-resistance connection point that fails under thermal cycling. The power delivery architecture of GPU boards is discussed in depth in the AI Accelerator PCB Design Guide.

Solutions: Use step stencils with locally reduced stencil thickness in GPU BGA areas and locally increased stencil thickness in VRM inductor pad areas; this accommodates the volume requirements of both component types in a single print pass. For press-fit power connectors, use a calibrated press with force measurement and position feedback to ensure consistent insertion depth without over-force; verify annular ring integrity by cross-section inspection of a sample from each production lot.

Inspection Strategy: AOI, 3D X-Ray, and SPI

GPU board inspection cannot rely on any single inspection method. The combination of hidden BGA joints (invisible to optical inspection), large package thermal mass effects (creating defects not visible until X-ray CT), and the catastrophic cost of a latent defect reaching a data center installation requires a layered inspection strategy.

Solder Paste Inspection (SPI): 3D SPI immediately after stencil printing measures paste volume, height, area coverage, and offset on every pad. For GPU BGA pads, a paste volume deviation of more than ±15% from nominal triggers rejection before placement. This is a tighter threshold than standard SMT uses, because the tight via-in-pad tolerance means insufficient paste cannot be recovered during reflow. SPI data is trended across panels to detect gradual stencil clogging or aperture wear before it causes assembly defects.
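As an illustration of the ±15% gate, the sketch below flags pads whose measured paste volume deviates too far from nominal. It assumes the SPI export can be reduced to a mapping from pad ID to volume; the pad IDs, nominal volume, and measurements are made up for the example and do not reflect any specific SPI machine's data format.

```python
def spi_flags(measured_volumes: dict, nominal_volume: float, tolerance: float = 0.15) -> dict:
    """Return {pad_id: fractional deviation} for pads outside +/- tolerance of nominal."""
    if nominal_volume <= 0:
        raise ValueError("nominal volume must be positive")
    flagged = {}
    for pad_id, volume in measured_volumes.items():
        deviation = (volume - nominal_volume) / nominal_volume
        if abs(deviation) > tolerance:
            flagged[pad_id] = deviation
    return flagged


# Hypothetical measurements in mm^3 against a 0.0100 mm^3 nominal deposit.
measurements = {"A1": 0.0101, "A2": 0.0083, "B1": 0.0118, "B2": 0.0092}
flagged = spi_flags(measurements, nominal_volume=0.0100)
for pad_id, deviation in sorted(flagged.items()):
    print(f"{pad_id}: {deviation:+.0%} from nominal, reject before placement")
```

Keeping the signed deviation, rather than just a pass/fail flag, makes it easier to trend the data and separate stencil clogging (volumes drifting low) from smearing or over-print (volumes drifting high).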

Automated Optical Inspection (AOI): Post-reflow AOI inspects all accessible component joints and surfaces. On GPU boards, AOI is effective for: missing or misplaced passive components; solder bridges on accessible fine-pitch pads; component polarity verification; and surface finish anomalies on connector pads. AOI cannot inspect the GPU or NVSwitch BGA joints, which are completely hidden by the package body.

3D X-Ray Computed Tomography (AXI): Every GPU board undergoes 100% 3D CT X-ray inspection of all large BGA packages. The inspection parameters for GPU boards are:

Void acceptance: < 25% void area per ball cross-section; < 5% of balls may exceed 10% void area; zero tolerance for any single ball with > 50% void area
HoP detection: CT reconstruction of the solder joint profile reveals the characteristic HoP signature (ball and paste dome present but unmerged); any HoP on signal rows is cause for rejection; isolated HoP on ground balls may be accepted under engineering review
Bridge detection: Inter-ball bridges visible as continuous solder connecting adjacent pads in the CT slice images
Ball absence: Missing balls (open circuits) visible as empty pad locations in the CT image
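The void acceptance criteria above can be expressed as a small rule check. This is a sketch under the assumption that the CT software reports one void-area percentage per ball; the ball IDs, function name, and sample data are hypothetical, and the thresholds are the ones listed in the text.

```python
def evaluate_voids(void_pct_by_ball: dict, per_ball_limit: float = 25.0,
                   watch_level: float = 10.0, watch_fraction_limit: float = 0.05,
                   gross_level: float = 50.0):
    """Apply the three void rules and return (accepted, list_of_reasons)."""
    if not void_pct_by_ball:
        raise ValueError("no ball measurements supplied")
    values = list(void_pct_by_ball.values())
    reasons = []

    # Rule 1: every ball must be below the per-ball void limit.
    over_limit = [v for v in values if v >= per_ball_limit]
    if over_limit:
        reasons.append(f"{len(over_limit)} ball(s) at or above {per_ball_limit}% void area")

    # Rule 2: fewer than 5% of balls may exceed the 10% watch level.
    watch_fraction = sum(v > watch_level for v in values) / len(values)
    if watch_fraction >= watch_fraction_limit:
        reasons.append(f"{watch_fraction:.1%} of balls exceed {watch_level}% void area")

    # Rule 3: gross voids are reported separately so they stand out in the log.
    gross = [v for v in values if v > gross_level]
    if gross:
        reasons.append(f"{len(gross)} ball(s) above {gross_level}% void area")

    return (not reasons), reasons


# Hypothetical 100-ball sample: 96 balls at 3%, three at 12%, one at 8%.
good_array = {f"S{i}": 3.0 for i in range(96)}
good_array.update({"P0": 12.0, "P1": 12.0, "P2": 12.0, "P3": 8.0})
accepted, reasons = evaluate_voids(good_array)
print("accepted" if accepted else f"rejected: {reasons}")

bad_array = dict(good_array, P3=30.0)
bad_accepted, bad_reasons = evaluate_voids(bad_array)
print("accepted" if bad_accepted else f"rejected: {bad_reasons}")
```

Returning the reasons alongside the verdict supports the engineering-review path mentioned for borderline cases, since the reviewer sees which rule was tripped.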

3D CT X-ray inspection adds significant cycle time (10–30 minutes per board depending on the package count and CT resolution) and capital equipment cost, but it is non-negotiable for AI server board assembly given the value of the GPU packages and the cost of a field failure. The 2D X-ray alternative is inadequate: 2D X-ray projects all BGA layers onto a single plane, making it impossible to detect HoP joints or distinguish voids in inner ball rows from features in outer rows.

BGA Rework on GPU Boards

BGA rework on a GPU board is one of the most difficult PCB assembly operations performed in commercial electronics. The challenges include: the GPU package's large thermal mass requires more heat than standard BGA rework stations can deliver; the package is surrounded by other large packages at close proximity, limiting the local heating envelope; the fine BGA pitch (0.65–0.8 mm) means that even small misalignment during reball and replacement creates bridges; and the via-in-pad structures under the BGA may be damaged by the rework thermal cycle, potentially requiring board-level repair before the replacement package can be placed.

The rework process sequence for a GPU BGA on an AI server board:

Package removal: The site is heated locally with a hot-air or IR rework station fitted with a custom nozzle matched to the GPU package outline. Temperature at the package-board interface is ramped above liquidus while adjacent components are held below their rated maximum, and the package is lifted with a vacuum pick once the solder joints are fully molten. Removal typically takes 8–15 minutes for a large GPU package.
Site preparation: After package removal, the PCB pad array is inspected for damaged via-in-pad structures (voided or cracked epoxy fill, delaminated cap plating); damaged pads are repaired or flagged as non-functional before proceeding. Residual solder is removed from the pad array with solder wick or a rework soldering iron with a flat chisel tip, and the pad surfaces are cleaned with isopropyl alcohol to remove flux residue.
Solder paste or flux application: New solder paste is applied to the cleaned pad array using a mini-stencil cut to the GPU package footprint. Alternatively, only flux is applied if the replacement package has pre-formed solder balls (not reballed).
Package placement: The replacement package (or the reballed original package, if the defect was in the package-to-board joint rather than in the package itself) is placed using a rework station with vision alignment, whose camera system aligns the package's ball pattern to the pad pattern on the board.
Reflow: The site is reflowed locally using the rework station's top heater, with a profile that replicates the oven reflow profile at the GPU package location. A bottom pre-heater holds the PCB at 100–120°C to reduce the temperature differential between the rework site and the rest of the board.
Post-rework inspection: The reworked BGA array is inspected by 3D X-ray CT to verify joint quality, and the board is functionally tested to confirm PCIe enumeration, NVLink connectivity, and compute functionality.

Rework yield for GPU packages on AI server boards is significantly lower than initial assembly yield, and the rework process itself introduces additional thermal stress to the board and adjacent components. Most AI server board programs establish a rework policy that limits each board to a maximum of one or two GPU rework operations before the board is scrapped; repeated rework thermally degrades the laminate and solder joints of adjacent packages to an unacceptable level.
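A rework policy of this kind is easy to encode in a traveler or MES rule. The following is a minimal sketch, assuming a per-board limit of two and a per-location limit of one; both constants, the class name, and the location labels are illustrative assumptions rather than values from any specific program.

```python
from dataclasses import dataclass, field

# Assumed policy limits; each program sets its own values.
MAX_REWORKS_PER_BOARD = 2
MAX_REWORKS_PER_LOCATION = 1


@dataclass
class BoardReworkHistory:
    serial: str
    # Maps a board location (e.g. "GPU3") to the number of reworks done there.
    reworks_by_location: dict = field(default_factory=dict)

    def total_reworks(self) -> int:
        return sum(self.reworks_by_location.values())


def rework_disposition(history: BoardReworkHistory, location: str) -> str:
    """Return 'rework' if another GPU rework at `location` is allowed, else 'scrap'."""
    if history.reworks_by_location.get(location, 0) >= MAX_REWORKS_PER_LOCATION:
        return "scrap"  # this site has already seen its allowed thermal cycles
    if history.total_reworks() >= MAX_REWORKS_PER_BOARD:
        return "scrap"  # cumulative thermal stress on the board is at the limit
    return "rework"
```

Keeping the two limits separate lets a program tighten the per-location rule without changing the board-level one.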

Burn-In and Final Functional Test

GPU boards undergo extended burn-in and functional testing before shipment. The functional test sequence for an assembled HGX H100 baseboard or equivalent:

Power sequencing and rail verification: Each power rail is verified at the correct voltage and current draw within the specified sequencing window. A current spike significantly above the specification limit during any rail's power-on indicates a short circuit (solder bridge on a power BGA ball or a bridged decoupling capacitor) and triggers immediate power-off and defect localization.
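The rail check can be sketched as a simple classifier. The rail names, tolerances, and the 1.5x spike factor below are hypothetical placeholders, and the sketch omits the sequencing-window timing check; real limits come from the board's power specification.

```python
from dataclasses import dataclass


@dataclass
class RailSpec:
    name: str
    nominal_v: float
    tol_pct: float       # allowed voltage deviation, in percent
    max_inrush_a: float  # specification limit for power-on current


def check_rail(spec, measured_v, peak_a, spike_factor=1.5):
    """Classify one rail's power-on measurement (spike_factor is an assumed value)."""
    if peak_a > spec.max_inrush_a * spike_factor:
        return "SHORT_SUSPECTED"  # power off immediately and localize the defect
    low = spec.nominal_v * (1 - spec.tol_pct / 100)
    high = spec.nominal_v * (1 + spec.tol_pct / 100)
    if not (low <= measured_v <= high):
        return "VOLTAGE_OUT_OF_RANGE"
    if peak_a > spec.max_inrush_a:
        return "CURRENT_OVER_LIMIT"
    return "PASS"


def run_sequence(specs, measurements):
    """Walk rails in sequencing order and stop at the first suspected short."""
    results = []
    for spec in specs:
        volts, amps = measurements[spec.name]
        status = check_rail(spec, volts, amps)
        results.append((spec.name, status))
        if status == "SHORT_SUSPECTED":
            break  # later rails are never energized
    return results
```

Stopping at the first suspected short mirrors the immediate power-off described above, so downstream rails are not energized into a fault.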

GPU enumeration and NVLink topology: All 8 GPU packages are enumerated at their full PCIe Gen5 link speed; NVLink 4.0 topology between all GPUs and NVSwitch chips is verified. A missing GPU or a reduced-lane PCIe link indicates a solder joint failure on that GPU's PCIe signal balls. As described in the NVLink routing guide, NVLink 4.0 operates at 100 Gb/s per lane and requires correct solder joints on all NVLink signal balls across all GPU and NVSwitch packages to achieve full fabric bandwidth.
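As an illustration of the enumeration check, the sketch below compares reported link status against expectations. It assumes the host has already collected each GPU's negotiated PCIe generation and width into a dictionary, and it assumes an x16 link per GPU; how that data is gathered is outside the sketch.

```python
EXPECTED_GPU_COUNT = 8
EXPECTED_PCIE_GEN = 5
EXPECTED_PCIE_WIDTH = 16  # assumed x16 link per GPU


def find_link_faults(enumerated):
    """Return (gpu_index, fault) pairs.

    `enumerated` maps GPU index -> (pcie_gen, pcie_width) as reported by the host.
    """
    faults = []
    for idx in range(EXPECTED_GPU_COUNT):
        if idx not in enumerated:
            faults.append((idx, "missing"))
            continue
        gen, width = enumerated[idx]
        if gen < EXPECTED_PCIE_GEN:
            faults.append((idx, f"downtrained to Gen{gen}"))
        if width < EXPECTED_PCIE_WIDTH:
            faults.append((idx, f"reduced width x{width}"))
    return faults
```

Each fault names a GPU index, which points the follow-up X-ray review at that package's PCIe signal balls.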

Memory bandwidth test: All HBM stacks on all GPUs are benchmarked for bandwidth. A GPU achieving significantly less than its rated HBM bandwidth (3.35 TB/s for H100 SXM5) indicates either HBM-to-interposer connectivity issues or insufficient voltage on the HBM power rail.
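A pass/fail screen on the benchmark results might look like the sketch below. The 3.35 TB/s figure is the rated value quoted above; the 85% threshold is an assumption, since measured bandwidth normally sits somewhat below the rated peak and each program sets its own floor.

```python
RATED_HBM_TBPS = 3.35  # H100 SXM5 rated bandwidth, from the text
MIN_FRACTION = 0.85    # assumed pass threshold; not specified in the text


def flag_low_bandwidth(measured_tbps):
    """Return GPU indices whose measured HBM bandwidth falls below the floor.

    `measured_tbps` maps GPU index -> measured bandwidth in TB/s.
    """
    floor = RATED_HBM_TBPS * MIN_FRACTION
    return [gpu for gpu, bw in sorted(measured_tbps.items()) if bw < floor]
```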

AI workload throughput test: A matrix multiplication or transformer inference benchmark verifies end-to-end compute throughput. This test is sensitive to any combination of compute, memory, and interconnect defects that would degrade system-level AI performance without necessarily causing a hard failure.

Burn-in protocol: Boards are operated at 65–75°C ambient at maximum GPU compute load for 48–72 hours. IR thermography at the start of burn-in identifies any hot spots from marginal connections or thermal management failures. Boards that complete burn-in with all performance metrics within specification are released; boards that fail during burn-in are removed for failure analysis.

Yield Management and Continuous Improvement

GPU board assembly yield management requires a more rigorous approach than standard PCB assembly because the component costs are so high that even a 1% yield loss represents significant scrap value. The elements of an effective GPU board yield program include:

Defect Pareto tracking: Every failed board and every failed inspection result is logged with defect type, location on the board, and process step at time of detection. Weekly Pareto analysis identifies the defects responsible for the majority of yield loss and directs process improvement effort to the highest-impact areas.
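The weekly Pareto can be computed directly from the defect log. This is a minimal sketch, assuming each log entry is a dictionary with a `defect_type` key and using the conventional 80% cutoff; both the field name and the cutoff are assumptions.

```python
from collections import Counter


def defect_pareto(defect_log, cutoff=0.8):
    """Return (defect_type, count, cumulative_share) rows, most frequent first,
    stopping once the cumulative share of all logged defects reaches `cutoff`."""
    counts = Counter(entry["defect_type"] for entry in defect_log)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for defect, n in counts.most_common():
        cumulative += n
        rows.append((defect, n, cumulative / total))
        if cumulative / total >= cutoff:
            break
    return rows
```

The same function can be run with the log grouped by board location or process step instead of defect type to see where the dominant defects cluster.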

First-pass yield (FPY) tracking by process step: Yield is tracked separately at each inspection point (post-SPI, post-AOI, post-X-ray, post-functional test). A high post-SPI yield combined with low post-X-ray yield indicates that the defect origin is in the reflow or placement process, not the paste printing process. This decomposition allows root cause isolation without extensive destructive analysis.
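The step-level decomposition can be illustrated with a few lines of arithmetic. The sketch below assumes each inspection point reports units entering and units passing on the first attempt; the rolled throughput yield (the product of the step yields) is a standard companion metric added here for context, and the counts in the usage note are made up.

```python
def first_pass_yields(step_counts):
    """step_counts: ordered list of (step, units_in, units_passed_first_time)."""
    return [(step, passed / entered) for step, entered, passed in step_counts]


def rolled_throughput_yield(step_counts):
    """Product of the per-step first-pass yields."""
    rty = 1.0
    for _, entered, passed in step_counts:
        rty *= passed / entered
    return rty


def worst_step(step_counts):
    """Name of the inspection point with the lowest first-pass yield."""
    return min(first_pass_yields(step_counts), key=lambda row: row[1])[0]
```

For example, with hypothetical counts of 990/1000 at post-SPI, 980/990 at post-AOI, 900/980 at post-X-ray, and 891/900 at post-functional test, `worst_step` returns "post-X-ray", which points at reflow or placement rather than paste printing.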

Statistical process control (SPC): Key process parameters (paste volume Cpk from SPI, reflow peak temperature at GPU location, backdrill depth verification) are monitored with control charts; out-of-control signals trigger immediate process review before defective boards are produced. As the HDI PCB guide notes, the cumulative tolerance stack-up across 30+ layers means that small process drifts in fabrication can propagate into assembly defects; SPC on fabrication parameters is as important as SPC on assembly parameters.
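As a rough illustration of the SPC calculations, the sketch below computes Cpk from a sample of measurements and flags points outside three-sigma control limits. It uses the overall sample standard deviation as a simplification (production SPC software usually estimates sigma from within-subgroup variation), and the spec limits and baseline values are supplied by the caller.

```python
from statistics import mean, stdev


def cpk(samples, lsl, usl):
    """Process capability index from a sample and lower/upper spec limits."""
    mu, sigma = mean(samples), stdev(samples)
    return min(usl - mu, mu - lsl) / (3 * sigma)


def out_of_control(samples, baseline_mean, baseline_sigma):
    """Indices of points outside +/- 3 sigma of the baseline process."""
    ucl = baseline_mean + 3 * baseline_sigma
    lcl = baseline_mean - 3 * baseline_sigma
    return [i for i, x in enumerate(samples) if x > ucl or x < lcl]
```

The same two functions apply to any of the monitored parameters, such as paste volume from SPI or reflow peak temperature at the GPU location.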

Supplier qualification and incoming inspection: GPU packages and NVSwitch chips are high-value components. Incoming inspection verifies package marking and moisture sensitivity level (MSL) compliance, confirming that the moisture barrier bag (MBB) is intact and that floor life has not been exceeded, and includes a visual check for handling damage. Components showing evidence of improper moisture exposure (a breached MBB or a humidity indicator card showing excess humidity) are quarantined and baked before use per J-STD-033.

FAQ

What is the most common cause of GPU BGA assembly failure?
Head-on-pillow (HoP) defects caused by package warpage during reflow are the most common root cause of latent GPU BGA failures. HoP joints often pass initial electrical test because the ball and pad are in physical contact, but they fail within the first few thermal cycles in service when the unformed metallurgical bond separates under stress. The second most common cause is solder voiding on power delivery BGA balls, which creates localized resistive hot spots under continuous high-current loading. Both defects are addressed by the combination of reflow profile optimization, nitrogen atmosphere, and vacuum reflow for power balls.

Is 2D X-ray inspection sufficient for GPU board BGA verification?
No. 2D X-ray is not sufficient for GPU BGA inspection because it projects the full 3D solder joint structure onto a single plane. HoP defects, which are the most common latent failure mode, appear as a slight gap between ball and paste in a cross-sectional view—a feature that is obscured in a 2D projection by the overlapping solder of adjacent layers. 3D computed tomography (CT) X-ray is the minimum required inspection technology for GPU BGA verification; it reconstructs cross-sectional slices through the ball array that can detect HoP, excessive voiding, and subtle bridges that 2D X-ray misses.

How many times can a GPU package be reworked on an AI server board?
Industry practice for AI server boards is to limit GPU package rework to one rework cycle per board location. Each rework cycle subjects the board to an additional full thermal cycle above liquidus, which cumulatively degrades the laminate, surrounding solder joints, and via plating. A second rework cycle at the same location significantly increases the risk of collateral damage to adjacent NVSwitch packages, power delivery components, and the PCB itself. Boards that require a second GPU rework are typically scrapped rather than reworked, particularly for high-value AI server programs where the cost of a field failure exceeds the cost of a board scrap.

What is the difference in assembly difficulty between H100 SXM5 and B200 SXM6 boards?
B200 SXM6 boards are significantly more difficult to assemble than H100 SXM5 boards for three reasons. First, the B200's dual-die CoWoS package (two GB100 dies on a shared silicon interposer) has a larger footprint and higher thermal mass than the H100's single-die CoWoS package, making warpage and thermal management at reflow more challenging. Second, the B200's 1,000 W TDP imposes higher sustained current loads on VRM components and power delivery BGA balls, increasing the sensitivity of those joints to solder voiding. Third, the B200 board's higher layer count (24–32 layers vs 20–24 for H100) means more via-in-pad structures requiring fill and planarization, and more backdrilling operations for NVLink 5.0 and PCIe Gen6 vias. The Blackwell architecture's PCB implications are covered in detail at NVIDIA Blackwell Architecture Explained.

Why is nitrogen atmosphere important for GPU board reflow?
Nitrogen atmosphere (O₂ < 100 ppm) during reflow limits the oxide layer that forms on solder surfaces exposed to oxygen at high temperature. Solder oxide wets poorly and resists coalescence. It contributes to both HoP defects (where oxide on the ball surface prevents the ball from merging with the paste deposit) and solder voiding (where an oxide skin on the paste surface traps volatiles instead of letting them escape). With a thinner oxide skin, the flux can clear the surfaces more easily, which improves the probability that the ball and paste merge cleanly during the time above liquidus. The cost of nitrogen consumption in a production reflow oven is small compared to the yield improvement it provides on high-value GPU board assemblies.

What quality standard should GPU board assembly be certified to?
IPC Class 3 (High Reliability) is the applicable quality standard for GPU board assembly for AI server applications. IPC-A-610 Class 3 governs assembly acceptability and specifies the most stringent solder joint acceptance criteria (minimum solder fillet height, maximum void area, maximum side overhang); IPC-6012 Class 3 governs fabrication, including minimum via barrel plating thickness and annular ring requirements. Together they define the combined quality framework. AI server programs at hyperscale cloud providers typically add their own supplementary requirements on top of IPC Class 3, particularly for BGA void acceptance (stricter than the IPC limit on power balls) and burn-in duration (longer than standard commercial practice).

Need to Manufacture AI Accelerator GPU Boards?

GPU board assembly for H100, H200, B200, and OAM-based AI accelerators demands process expertise that goes far beyond standard SMT production. NextPCB provides advanced PCB assembly services for AI, GPU, and high-performance computing applications, featuring large-format BGA assembly, high-density package process optimization, X-ray inspection, IPC Class 3 quality control, and customized testing solutions to support demanding electronic systems.


About the Author

Stacy Lu

With extensive experience in the PCB and PCBA industry, Stacy has established herself as a professional and dedicated Key Account Manager with an outstanding reputation. She excels at deeply understanding client needs, delivering effective and high-quality communication. Renowned for her meticulousness and reliability, Stacy is skilled at resolving client issues and fully supporting their business objectives.

