Arya Li, Project Manager at NextPCB.com
Support Team
Feedback:
support@nextpcb.com
Introduction: As Artificial Intelligence (AI) models grow exponentially in size and complexity, the hardware required to train and run them faces immense physical and electrical bottlenecks. The traditional method of placing individual chips directly onto a printed circuit board (PCB) is no longer sufficient to handle the massive data bandwidth required by modern Large Language Models (LLMs). Enter CoWoS packaging (Chip-on-Wafer-on-Substrate), TSMC's industry-leading 2.5D advanced packaging technology. CoWoS has become the cornerstone of high-performance AI accelerators, including NVIDIA's H100 and the highly anticipated B200 GPUs. In this comprehensive guide, we will explore what CoWoS packaging is, why it is critical for AI hardware, the different variations of the technology, and how these massive, high-density packages fundamentally alter the requirements for downstream PCB design and assembly.
CoWoS (Chip-on-Wafer-on-Substrate) is an advanced 2.5D packaging technology developed by TSMC (Taiwan Semiconductor Manufacturing Company). In a standard monolithic chip design, all logic, memory, and I/O components are fabricated on a single piece of silicon. However, as die sizes approach the reticle limit of photolithography tools (roughly 858 mm², a 26 mm x 33 mm exposure field), manufacturing large monolithic chips becomes economically unviable because yield falls sharply as die area grows.
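To make the yield argument concrete, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The model is a textbook approximation, not TSMC's actual yield model, and the defect density `D0` below is an assumed, purely illustrative value.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Assumed defect density in defects/cm^2 (illustrative only, not a foundry figure).
D0 = 0.1

for area_mm2 in (100, 400, 858):
    y = poisson_yield(area_mm2, D0)
    print(f"{area_mm2:>4} mm^2 die: estimated yield {y:.1%}")
```

Under these assumptions, a die at the reticle limit yields less than half as often as a 100 mm² die, which is the economic pressure behind splitting a design into smaller dies.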
CoWoS packaging solves this by adopting a multi-chip module (MCM) or "chiplet" approach. In a 2.5D CoWoS architecture, multiple active silicon dies—such as a central GPU logic die and several High Bandwidth Memory (HBM) stacks—are placed side-by-side on top of a passive silicon base layer known as an interposer. The interposer contains tens of thousands of microscopic, high-density routing traces and Through-Silicon Vias (TSVs). This interposer acts as an ultra-fast communication bridge between the logic die and the memory. Finally, this entire chip-on-interposer assembly is mounted onto a complex organic package substrate, which is then soldered to the main PCB.
The "2.5D" designation reflects a hybrid structure: the chips sit side-by-side in a plane (2D), while the silicon interposer beneath them uses TSVs to route signals vertically (3D) down to the substrate. This maximizes interconnect density while avoiding the cooling problems of stacking hot dies on top of one another.
The relentless demand for AI compute power has created an architectural challenge known as the "memory wall." An AI processor can only compute data as fast as it can retrieve it from memory. Traditional memory architectures like GDDR6, routed through standard PCB traces, cannot provide the terabytes-per-second bandwidth required by generative AI models.
The reliance on CoWoS has only intensified with each GPU generation. In the A100-to-H100 generational leap, the H100 (Hopper architecture) moved to TSMC's 4N process and surrounds the massive core GPU die with up to six HBM2e or HBM3 memory stacks. Achieving a memory bandwidth of up to 3.35 TB/s is physically impossible with standard PCB routing: each HBM stack exposes a 1,024-bit interface, and the minimum trace width and spacing on even the most advanced PCBs are orders of magnitude too coarse to route that many signals in the space available.
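The bandwidth figure above follows from simple arithmetic: peak bandwidth is bus width times per-pin data rate. The sketch below works backwards from 3.35 TB/s to the per-pin rate it implies. It assumes five active 1,024-bit stacks, and the GDDR6 comparison (384-bit bus at 16 Gb/s per pin) is an assumed, typical configuration rather than a figure from this article.

```python
def bandwidth_gb_per_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gb/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8.0


def required_pin_rate_gbps(target_gb_per_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gb/s) needed to hit a target bandwidth on a given bus."""
    return target_gb_per_s * 8.0 / bus_width_bits


HBM_BITS_PER_STACK = 1024  # interface width of one HBM stack
ACTIVE_STACKS = 5          # assumption: five stacks active

hbm_bus_bits = HBM_BITS_PER_STACK * ACTIVE_STACKS
hbm_pin_rate = required_pin_rate_gbps(3350.0, hbm_bus_bits)

# Assumed GDDR6 configuration for comparison: 384-bit bus, 16 Gb/s per pin.
gddr6_bw = bandwidth_gb_per_s(384, 16.0)

print(f"HBM bus width: {hbm_bus_bits} bits")
print(f"Per-pin rate for 3.35 TB/s: {hbm_pin_rate:.2f} Gb/s")
print(f"GDDR6 (assumed config): {gddr6_bw:.0f} GB/s")
```

The point of the comparison is that HBM reaches its bandwidth with a very wide, relatively slow bus, and a bus that wide can only be routed on an interposer.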
CoWoS packaging addresses these bottlenecks in three specific ways for the H100 and the upcoming NVIDIA Blackwell B200 GPUs:
To fully grasp the complexity, let's compare TSMC's CoWoS with the conventional flip-chip BGA (FCBGA) packaging typically used for mainstream CPUs and consumer electronics.
| Feature | Standard Packaging (FCBGA) | Advanced 2.5D Packaging (CoWoS) |
|---|---|---|
| Die Configuration | Usually monolithic (single die) | Multi-die (Logic + HBM + I/O) |
| Interconnect Medium | Organic Substrate | Silicon Interposer + Organic Substrate |
| Trace Density (L/S) | ~10μm / 10μm | Sub-micron (e.g., 0.4μm / 0.4μm) |
| Memory Integration | External (DDR/GDDR on PCB) | In-package (HBM stacks on interposer) |
| Bandwidth Ceiling | ~100s of GB/s | Multi-TB/s (e.g., B200 reaches 8 TB/s) |
| Package Size | Small to Medium (< 50x50 mm) | Massive (Up to 120x120 mm and growing) |
| Cost & Yield | Low cost, high yield | Very high cost, highly constrained supply |
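The trace density row in the table translates directly into how many wires fit side by side. A rough sketch, using only the line/space (L/S) values from the table and ignoring vias, shielding, and layer count:

```python
def traces_per_mm(line_um: float, space_um: float) -> float:
    """Parallel traces that fit in 1 mm of routing width on a single layer."""
    pitch_um = line_um + space_um  # one trace plus one gap
    return 1000.0 / pitch_um


fcbga = traces_per_mm(10.0, 10.0)  # organic substrate, ~10/10 um L/S
cowos = traces_per_mm(0.4, 0.4)    # silicon interposer, ~0.4/0.4 um L/S

print(f"FCBGA substrate:   {fcbga:.0f} traces per mm per layer")
print(f"Silicon interposer: {cowos:.0f} traces per mm per layer")
print(f"Density ratio:      {cowos / fcbga:.0f}x")
```

With these numbers the interposer fits about 1,250 traces per millimetre per layer against about 50 for the organic substrate, a 25x difference.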
As the demand for AI chip packaging has evolved, TSMC has diversified the CoWoS family into three distinct variants to balance cost, performance, and maximum package size.
CoWoS-S is the classic and most widely used version of CoWoS, utilized in the NVIDIA A100 and H100 GPUs. It relies on a full-size monolithic silicon interposer placed between the active chips and the organic substrate. It offers the highest routing density and proven reliability, but an interposer larger than one reticle must be stitched together from multiple exposures (currently reaching about 3.3x the reticle size). Manufacturing such massive silicon interposers is expensive and prone to defects.
To reduce costs and improve yields for less demanding applications, CoWoS-R replaces the expensive silicon interposer with an organic interposer built from multiple redistribution layers (RDL). It uses InFO (Integrated Fan-Out) technology to route signals. While it cannot achieve the ultra-fine pitch of silicon, it is more cost-effective and provides better mechanical flexibility, reducing the risk of package warpage.
CoWoS-L represents the cutting edge and is the critical enabler for the NVIDIA Blackwell B200. Instead of a single, massive silicon interposer, CoWoS-L embeds small, dense silicon bridges (Local Silicon Interconnects, LSI) only where ultra-high-density routing is needed, such as the pathways between the GPU die and the HBM stacks or between two GPU dies. The rest of the routing is handled by less expensive RDL layers, with the bridges held in molding compound. This hybrid approach allows the overall package to scale to enormous sizes (potentially 6x the reticle limit) without the catastrophic yield losses associated with giant silicon interposers.
The sheer size and I/O density of CoWoS packages directly dictate the engineering constraints placed on the underlying PCBs. When you place an enormous AI accelerator package onto an OAM Module or a PCIe baseboard, traditional PCB rules no longer apply. According to our comprehensive AI Accelerator PCB Design Guide, engineers must adapt to several critical shifts.
A typical CoWoS package for an AI GPU has an immense BGA (Ball Grid Array) footprint, often 5,000 to 8,000 pins or more at a pitch of 1.0 mm or smaller. Escaping this density requires massive layer counts, which is why AI GPUs require 30+ layer HDI PCBs. These boards must employ blind and buried vias, stacked microvias, and Any-Layer HDI technology just to fan out the power and signal lines from the BGA pads without excessive crosstalk.
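A quick way to see why these pin counts imply such large packages is to estimate the ball field size. The sketch below assumes a fully populated square grid, which is a simplification: real packages depopulate some positions and are not necessarily square.

```python
import math


def full_grid_side(ball_count: int) -> int:
    """Balls per side of the smallest square full grid that holds ball_count balls."""
    return math.isqrt(ball_count - 1) + 1  # integer ceil(sqrt(ball_count))


def ball_field_span_mm(ball_count: int, pitch_mm: float) -> float:
    """Center-to-center distance between the outermost balls along one edge."""
    return (full_grid_side(ball_count) - 1) * pitch_mm


for pins in (5000, 8000):
    side = full_grid_side(pins)
    span = ball_field_span_mm(pins, 1.0)
    print(f"{pins} pins at 1.0 mm pitch: {side} x {side} grid, about {span:.0f} mm across")
```

Under that assumption, 8,000 pins at 1.0 mm pitch need a 90 x 90 grid spanning about 89 mm, so signals from the center of the array must cross dozens of ball rows to escape.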
CoWoS-packaged AI chips consume massive amounts of power. The NVIDIA B200 can draw up to 1,000 W per package, which at a core voltage of ~1.0 V means delivering roughly 1,000 A. That requires an exceptionally robust Power Delivery Network (PDN). PCB designers must dedicate thick copper layers (often 2 oz or heavier) solely to power and ground planes to limit voltage droop (IR drop) and mitigate thermal hotspots beneath the package. Furthermore, decoupling capacitors must be placed directly underneath the GPU package on the bottom side of the PCB, which requires precision Via-in-Pad Plated Over (VIPPO) technology.
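The need for heavy copper follows from Ohm's law. A minimal sketch under stated assumptions: copper resistivity of 1.68e-8 ohm-m (room temperature), 35 µm per ounce of copper weight, and a hypothetical path length of one "square" of plane between the regulator and the package. The plane count is also hypothetical.

```python
RHO_CU_OHM_M = 1.68e-8  # copper resistivity near 20 C (rises with temperature)
OZ_THICKNESS_M = 35e-6  # approximate foil thickness per 1 oz/ft^2


def load_current_a(power_w: float, voltage_v: float) -> float:
    """Current drawn by the load: I = P / V."""
    return power_w / voltage_v


def sheet_resistance_ohm_per_sq(copper_oz: float) -> float:
    """Sheet resistance of a copper plane: R_s = rho / thickness."""
    return RHO_CU_OHM_M / (copper_oz * OZ_THICKNESS_M)


def plane_drop_v(current_a: float, copper_oz: float,
                 squares: float, parallel_planes: int = 1) -> float:
    """IR drop across `squares` of plane, shared equally by parallel planes."""
    resistance = sheet_resistance_ohm_per_sq(copper_oz) * squares / parallel_planes
    return current_a * resistance


current = load_current_a(1000.0, 1.0)
for planes in (1, 4, 8):
    drop = plane_drop_v(current, copper_oz=2.0, squares=1.0, parallel_planes=planes)
    print(f"{planes} x 2 oz plane(s): drop of {drop * 1000:.0f} mV at {current:.0f} A")
```

With these assumptions a single 2 oz plane would lose about 240 mV, nearly a quarter of the supply, over just one square. That is why designs stack many power planes and keep the regulators as close to the package as possible.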
While HBM traffic stays inside the CoWoS package, the GPU still needs to communicate with the outside world via PCIe Gen5 (or Gen6) and ultra-fast interconnects. As discussed in our analysis of NVLink PCB Routing, pushing 112G PAM4 signals out of the package and across the board demands ultra-low-loss PCB materials. Standard FR4 is entirely inadequate; manufacturers must use premium laminates like Megtron 7, Megtron 8, or advanced Rogers high-speed materials, coupled with ultra-smooth HVLP (hyper very low profile) copper foils. At these frequencies the skin effect confines current to the outermost fraction of a micron of the conductor, so foil roughness translates directly into added conductor loss.
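To put a number on why foil smoothness matters, the sketch below computes the skin depth in copper at the Nyquist frequency of a 112 Gb/s PAM4 link, using the standard formula delta = sqrt(rho / (pi * f * mu0)). The resistivity is an assumed room-temperature value for pure copper; real foils and plating differ somewhat.

```python
import math

MU0 = 4e-7 * math.pi    # permeability of free space, H/m
RHO_CU_OHM_M = 1.68e-8  # assumed copper resistivity near 20 C


def pam4_nyquist_hz(bit_rate_bps: float) -> float:
    """PAM4 carries 2 bits per symbol; Nyquist is half the symbol rate."""
    symbol_rate = bit_rate_bps / 2
    return symbol_rate / 2


def skin_depth_m(freq_hz: float, resistivity: float = RHO_CU_OHM_M) -> float:
    """Skin depth in a non-magnetic conductor: sqrt(rho / (pi * f * mu0))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU0))


f_nyq = pam4_nyquist_hz(112e9)
delta = skin_depth_m(f_nyq)
print(f"Nyquist frequency: {f_nyq / 1e9:.0f} GHz")
print(f"Skin depth in copper: {delta * 1e6:.2f} um")
```

At 28 GHz the skin depth comes out at roughly 0.39 µm, so any surface roughness of comparable or larger scale lengthens the current path and raises loss.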
Fabricating the advanced PCBs and assembling the final PCBA for CoWoS-equipped AI accelerators pushes manufacturing tolerances to their absolute limits.
1. Package Warpage During Reflow: The most significant challenge in BGA assembly for AI accelerator cards is dealing with the sheer physical size of the CoWoS package. Because the package consists of silicon dies, a silicon interposer, and an organic substrate, each material has a different Coefficient of Thermal Expansion (CTE). During the SMT reflow oven process (peaking around 245°C - 260°C), the package tends to warp, leading to "head-in-pillow" defects, open joints, or short circuits on the outer rows of the BGA.
2. Voiding and Thermal Dissipation: With package power approaching 1,000 W, any voids in the solder joints beneath the GPU act as thermal insulators, creating localized hotspots that throttle the chip's performance. PCB assembly lines must use vacuum reflow soldering and strict automated X-ray inspection (AXI) to keep solder joint voiding well below 10%.
3. Heavy Copper and Thermal Vias: To manage heat, AI server PCBs often integrate copper coins or thousands of thermal vias directly under the BGA package. Plating these high-aspect-ratio vias reliably, without trapping chemistry or leaving thin barrel walls, requires state-of-the-art plating lines.
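The CTE mismatch behind the warpage in item 1 can be put in rough numbers with the linear expansion formula dL = alpha * L * dT. The CTE values below (about 2.6 ppm/°C for silicon, about 15 ppm/°C for an organic substrate) and the 100 mm span are assumed, typical figures, not measurements of any specific package.

```python
def free_expansion_um(length_mm: float, cte_ppm_per_c: float, delta_t_c: float) -> float:
    """Unconstrained linear expansion in micrometres: dL = alpha * L * dT."""
    length_um = length_mm * 1000.0
    return length_um * cte_ppm_per_c * 1e-6 * delta_t_c


SPAN_MM = 100.0          # assumed package edge length
DELTA_T_C = 245.0 - 25.0  # room temperature up to a 245 C reflow peak

silicon_um = free_expansion_um(SPAN_MM, 2.6, DELTA_T_C)   # assumed silicon CTE
organic_um = free_expansion_um(SPAN_MM, 15.0, DELTA_T_C)  # assumed substrate CTE

print(f"Silicon grows by about {silicon_um:.0f} um")
print(f"Organic substrate grows by about {organic_um:.0f} um")
print(f"Mismatch across the package: about {organic_um - silicon_um:.0f} um")
```

Because the layers are bonded together, they cannot expand freely, so a mismatch of a few hundred micrometres is taken up as bending. Bending of that scale matters against solder ball heights of only a few hundred micrometres.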
In 2.5D packaging (CoWoS), chips are placed side-by-side on an interposer. In 3D IC packaging (like TSMC's SoIC), active silicon dies are stacked directly on top of each other (e.g., logic on top of logic, or SRAM stacked on CPU, as seen in AMD's 3D V-Cache). 2.5D is currently preferred for massive high-power AI GPUs because placing logic next to memory is much easier to cool than stacking hot logic chips on top of each other.
The primary bottleneck in the supply chain for NVIDIA H100 and B200 GPUs is not the fabrication of the silicon wafers themselves, but the CoWoS packaging capacity. Creating the interposer and aligning the massive dies requires highly specialized equipment and cleanroom space, which takes time for TSMC to scale up.
CoWoS also makes PCB manufacturing more difficult. The massive I/O count and extreme power requirements of CoWoS packages mandate that the main carrier boards (OAM baseboards or Universal Baseboards) use very high layer counts (often 24 to 30+ layers), ultra-low-loss materials, and highly complex power delivery designs, all of which dramatically increase the difficulty of PCB fabrication and assembly.
Advanced 2.5D packaging technologies like TSMC's CoWoS are the unsung heroes of the AI revolution. By breaking the reticle limit and solving the memory bandwidth bottleneck, CoWoS enables the incredible computational leaps seen in the NVIDIA H100 and the upcoming Blackwell B200 architectures. However, this packaging innovation shifts immense complexity downstream to the PCB level. Designing and manufacturing the boards that house these colossal AI chips requires mastering 30+ layer HDI architectures, premium ultra-low-loss materials, and flawless BGA assembly processes capable of combating severe thermal warpage.
To succeed in the AI hardware space, partnering with a PCB manufacturer equipped to handle these extreme tolerances is non-negotiable.
Need to manufacture AI server PCBs capable of supporting advanced CoWoS-packaged accelerators? Get a quote from NextPCB →