The AI Bottleneck Stack

The next phase of AI will be shaped by physical and technical bottlenecks

Jun 09, 2026

The AI stack is usually described in software terms: models, data, agents, applications, evals, and workflows. But that framing misses the deeper story.

The next phase of AI will be shaped by physical and technical bottlenecks. Compute is only the first layer. Once you add more GPUs, you expose memory limits. Once you add more memory, you expose packaging limits. Once you scale clusters from thousands to tens of thousands of accelerators, you expose networking limits.

Once networking speeds move from 400G to 800G and eventually 1.6T, copper starts breaking down and optics become more important. Once racks move from 30–50 kW to 100–150 kW and beyond, power delivery and cooling become binding constraints. Once data centers scale from hundreds of megawatts to gigawatt campuses, the bottleneck leaves the building and moves into the grid.

This is the AI bottleneck stack:

Compute → memory → packaging → networking → photonics → power → cooling → grid → materials

The important point is that bottlenecks move. Solving one constraint reveals the next one.

The first wave of AI infra was compute

The world needed GPUs, accelerators, and training clusters. Nvidia became the obvious winner because it controlled the most scarce layer: high-performance AI compute plus the software ecosystem around it.

The scale is unprecedented. Frontier AI training runs now consume tens of thousands of GPUs. Some estimates suggest the largest training clusters will exceed 100,000 accelerators within the next few years. Capital expenditures from hyperscalers are expected to exceed $250 billion annually, with a growing percentage directed toward AI infrastructure. Individual AI data center campuses are increasingly planned at 500 MW to 1 GW scale, representing investments of $30 billion to $100 billion per site.

Next is memory

Compute alone is not enough. A GPU is only useful if it can access data quickly. That pushes pressure into HBM and memory bandwidth. Modern AI systems are not just compute-hungry. But they’re memory-hungry too.

A single high-end AI accelerator can require hundreds of gigabytes per second of memory bandwidth. HBM stacks now contain dozens of DRAM dies and deliver bandwidth measured in terabytes per second. Demand for HBM has grown so quickly that supply constraints have become one of the most important bottlenecks in the semiconductor industry.

Long-context models, retrieval systems, inference batching, agent workflows, and KV cache storage all increase pressure on memory capacity and bandwidth.

Then comes packaging

HBM has to sit physically close to the accelerator. That requires advanced packaging, interposers, substrates, and extremely precise integration. This is why CoWoS-like capacity, ABF substrates, glass core substrates, OSATs, inspection systems, and test equipment matter.

Advanced packaging has become one of the fastest-growing segments of semiconductor manufacturing. Industry forecasts suggest advanced packaging spending could exceed $80 billion annually by the end of the decade. The chip is no longer the product. The package is becoming the system.

At cluster scale, the next bottleneck is data movement

If thousands of accelerators cannot communicate efficiently, expensive compute sits idle. Training large models requires moving enormous amounts of data between GPUs. Every percentage point of utilization matters when clusters cost billions of dollars.

Some estimates frame networking as roughly 19–23% of AI data center capex. Approx $6-8 billion per gigawatt of AI capacity. In a 1 GW AI campus costing $40-60 billion, networking alone can represent a multi-billion-dollar infra layer.

This is where Ethernet switching, optical modules, retimers, DSPs, active electrical cables, and silicon photonics become important.

Network speeds have already moved from 100G to 400G and 800G. The industry is now preparing for 1.6T optical links. As bandwidth requirements increase, power consumption and signal integrity become major engineering challenges. The economics increasingly favor optical interconnects over traditional copper connections.

This is why companies such as Broadcom, Arista, Marvell, Astera Labs, Credo, Coherent, Lumentum, and Applied Optoelectronics sit directly inside the AI scaling problem rather than merely adjacent to it.

The next bottleneck is photonics

As clusters grow beyond tens of thousands of accelerators, moving data efficiently becomes as important as generating compute. Silicon photonics promises lower latency, higher bandwidth density, and lower power consumption compared with traditional electrical interconnects. Industry forecasts suggest the optical networking market tied to AI infra could grow into a tens-of-billions-of-dollars annual opportunity over the next decade.

For startups, this creates a large opening.

The best startup opportunities are not generic “AI infra” tools. They are wedge products that remove specific constraints.

Examples:

Software that improves GPU utilization from 50–60% toward 80–90%
Memory systems that reduce KV cache costs by 30–50%
Compilers that optimize communication patterns across thousands of accelerators
Networking observability for clusters containing 10,000–100,000 GPUs
Photonic interconnect components and silicon photonics tooling
Thermal simulation and liquid cooling control software
Power management software for high-density racks
Data center automation for energy, cooling, and failure prediction
Materials, substrates, and manufacturing tools for packaging and optics
Grid optimization software for large-scale AI campuses
Transformer, switchgear, and power electronics monitoring systems

The power layer may be the most underrated

Traditional enterprise racks consumed roughly 5–15 kW. Modern AI racks are moving toward 100–150 kW, with some future designs targeting 300 kW or more. A single AI data center can require as much electricity as a medium-sized city.

This changes power conversion, distribution, backup systems, and cooling design. Nvidia’s discussions around 800 VDC architectures point toward a future where the rack becomes an electrical system, not just a server enclosure.

Power infra spending is becoming a major component of AI deployment. Transformers, switchgear, UPS systems, generators, batteries, and power electronics are all becoming strategic assets rather than commodity purchases.

Then comes cooling

Air cooling becomes increasingly difficult as rack densities rise. Liquid cooling, direct-to-chip cooling, immersion cooling, heat exchangers, pumps, and thermal management software are becoming critical infra categories.

Cooling can represent 20–40% of total data center energy consumption depending on architecture and climate.

Then the bottleneck moves outside the data center

Transformers, substations, transmission lines, interconnection queues, turbines, batteries, fuel cells, and firm power contracts become part of the AI stack.

Many utility interconnection queues already stretch several years into the future. New transmission projects can take 5–10 years to complete. Large gas turbines often have multi-year lead times. In some regions, obtaining power has become more difficult than obtaining GPUs.

The scale is enormous. A 1 GW AI campus consumes roughly 8.76 terawatt-hours of electricity annually if fully utilized. Multiple technology companies are already discussing multi-gigawatt AI infrastructure plans. A 5 GW deployment would consume electricity comparable to that of a small country.

Finally the bottleneck reaches materials

Copper, optical fiber, rare earth elements, advanced substrates, specialty chemicals, semiconductor equipment, cooling fluids, and power electronics materials all become strategic inputs. Every layer of the stack ultimately depends on physical supply chains.

This is the real infra map for AI. A chain of constraints, each one enabling the next stage of scale.

The next decade of AI will likely involve hundreds of billions of dollars in infra spending, thousands of megawatts of new capacity, and entirely new industries built around removing bottlenecks.

The best startups will be built where the constraint is painful, technical, unavoidable, and expensive enough that customers will pay immediately.

If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:

Infra Startups

Discussion about this post

Ready for more?