We’ll be diving into the chipmaker Cerebras Systems today. It’s a private company, but it’s large enough to go public. In fact, it filed for an IPO last year, though the offering is on hold for now.
1. Why Cerebras Matters in the Bigger Picture
Cerebras Systems is a relatively young Silicon Valley chipmaker that builds the largest computer chips ever made. Each one is literally an entire silicon wafer packed with hundreds of thousands of AI-specialized cores. The company’s latest Wafer-Scale Engine (WSE-3) squeezes 4 trillion transistors and 900,000 cores onto a single piece of silicon, and it can train or run LLMs far faster than today’s most advanced GPUs.
That matters because global demand for computing power is exploding. PitchBook estimates that well over half of all venture capital deployed in the first quarter of 2025 (about $65 billion) went into AI startups. Those companies all need hardware to crunch data.
Nearly every modern infrastructure-as-a-service startup (from cloud hosting to real-time analytics) now relies either directly or indirectly on specialized AI chips supplied by a handful of vendors. Nvidia controls roughly 95% of that market today, but it cannot build chips fast enough to meet demand. And its software ecosystem can lock young companies into its orbit.
Cerebras offers a credible alternative. If its technology scales as advertised, it could ease the shortage of AI compute, cut operating costs for thousands of infra startups, and dilute Nvidia’s pricing power.
The company’s own fortunes therefore ripple outward: when Cerebras thrives, the startups that depend on affordable, plentiful compute get breathing room. If Cerebras stumbles, many of those same firms remain trapped in an overheated GPU market.
2. How Wafer-Scale Computing Works
Traditional chips are limited by the reticle, the maximum area a photolithography tool can expose in a single shot. So manufacturers etch many small chips onto one wafer, cut them apart, and wire them together later. Cerebras flips that idea on its head: it keeps the entire dinner-plate-sized wafer intact, turning it into one monolithic processor roughly the size of an iPad.
Why bother? Here are two reasons:
Memory in one place. Neural-network training is mostly a game of shuffling numbers between compute cores and memory. A single wafer holds 44 gigabytes of fast on-chip SRAM with 20 petabytes per second of memory bandwidth, eliminating most of the delays and energy losses that plague multi-chip GPU clusters (the sketch after this list puts rough numbers on the difference).
Simpler programming. Because an entire AI model can live inside one device, developers no longer split it across dozens (or thousands) of GPUs. Early users report cutting weeks off training jobs and slashing the amount of custom code they have to write.
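To make the memory argument concrete, here is a back-of-envelope sketch in Python. Every bandwidth figure is an illustrative, order-of-magnitude assumption (not a measured benchmark), and the 40 GB weight footprint is hypothetical:

```python
# Time to stream a model's weights once over different links.
# All numbers below are illustrative assumptions, not benchmarks.
weights_gb = 40  # hypothetical model weight footprint

bandwidth_gb_per_s = {
    "on-wafer SRAM (~20 PB/s)": 20_000_000,
    "single-GPU HBM (~3 TB/s)": 3_000,
    "GPU-to-GPU interconnect (~400 GB/s)": 400,
}

for link, bw in bandwidth_gb_per_s.items():
    ms = weights_gb / bw * 1_000
    print(f"{link}: {ms:.3f} ms per full pass over the weights")
```

Even with generous assumptions on the GPU side, keeping every weight in on-wafer SRAM turns the data shuffle from the dominant cost into a rounding error, which is the whole point of the design.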
This design is risky. A single defect on such a large wafer could ruin the whole chip, and a single board draws upward of 15 kilowatts. But Cerebras builds in spare cores to tolerate small flaws and sells pre-engineered cabinets with their own cooling and power systems, so buyers can install a refrigerator-sized unit in a data center much like any other rack-scale server.
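To see why spare cores rescue the economics, here is a toy yield model. Cerebras does not publish defect densities or spare-core counts, so every number below is an assumption chosen purely to illustrate the mechanism:

```python
import math

total_cores = 900_000
p_core_defective = 1e-5            # assumed per-core defect probability
spares = int(total_cores * 0.01)   # assume 1% of cores held in reserve

# The number of defective cores is approximately Poisson-distributed.
mean_defects = total_cores * p_core_defective  # 9 expected bad cores

# Without redundancy, the wafer works only if zero cores are bad.
print(f"Yield with zero tolerance: {math.exp(-mean_defects):.6f}")

# With spares, the wafer works if bad cores <= spares (Poisson CDF).
term = p_usable = math.exp(-mean_defects)
for k in range(1, spares + 1):
    term *= mean_defects / k
    p_usable += term
    if k > mean_defects and term < 1e-18:  # series has converged
        break
print(f"Yield with {spares:,} spare cores: {p_usable:.6f}")
```

Under these assumptions a wafer with no redundancy almost never works (yield around 0.01%), while a 1% spare pool pushes yield to effectively 100%. The hard engineering is in routing around the dead cores, which is exactly what Cerebras claims to do.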
For workloads that never fit cleanly on GPUs (very large models, protein folding, or million-token document searches), the speedup can reach an order of magnitude. For example, Notion now serves enterprise search for 100 million users on Cerebras hardware because queries that took seconds on GPUs complete almost instantly on a wafer.
3. Insatiable Demand for AI Compute
LLMs have grown a thousand-fold in size since 2018 and now contain hundreds of billions (or even trillions) of parameters. Each leap consumes more power, memory, and engineering time. Cerebras and its competitors estimate that the total market for AI training and inference hardware will grow from roughly $131 billion in 2024 to more than $450 billion by 2027, a roughly 50% compound annual growth rate.
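The quoted rate checks out with simple compounding, treating 2024 to 2027 as three compounding periods:

```python
# Sanity check: $131B (2024) growing to more than $450B (2027).
start, end, years = 131, 450, 3
cagr = (end / start) ** (1 / years) - 1
print(f"Implied compound annual growth rate: {cagr:.1%}")  # ~50.9%
```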
The scramble is not just about training models; serving them to real users can cost even more. Meta’s new Llama API, powered by Cerebras machines, pushes out more than 2,600 tokens per second, about 18x faster than common GPU servers, and at one-tenth the cost.
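A quick bit of arithmetic shows what those claims imply. Only the 18x and one-tenth ratios come from the paragraph above; the dollar figure is a made-up placeholder:

```python
cerebras_tps = 2_600
speedup = 18
gpu_tps = cerebras_tps / speedup        # implied GPU baseline
print(f"Implied GPU baseline: ~{gpu_tps:.0f} tokens/s")

# If a GPU provider charged $1.00 per million tokens (hypothetical),
# "one-tenth the cost" implies this price on Cerebras:
gpu_price_per_mtok = 1.00
print(f"Implied Cerebras price: ~${gpu_price_per_mtok / 10:.2f} per 1M tokens")
```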
Faster and cheaper inference lets companies roll out chatbots, personal assistants, and recommendation engines without burning through their cash. That dynamic is why so many infra startups (from vector database companies to real-time video-compression products) watch Cerebras closely. Lower compute cost widens their gross margins and improves the odds that they survive long enough to reach scale.
Yet supply remains tight. Global chip foundries are booked solid and high-bandwidth memory is in short supply. When Nvidia raises prices or allocates cards to its largest customers first, smaller startups delay product launches or pay costly cloud-rental markups. A second supplier with materially different technology, like Cerebras, adds both extra capacity and competitive pressure on pricing, which benefits the wider ecosystem.
4. Cerebras’s Business Health
Although still private, Cerebras had to reveal plenty of numbers when it filed for an IPO in late 2024. Revenue jumped from $25 million in 2022 to nearly $79 million in 2023 and already reached $136 million in the first half of 2024.
Losses are shrinking quickly, partly because its biggest customer (Abu Dhabi-based G42) prepays for hardware. G42 has committed about $1.4 billion in orders through early 2025, giving Cerebras cash to build chips and fund R&D before the revenue is fully recognized.
That single-customer concentration is the company’s weak spot: roughly 80% of sales still come from G42. But the order book is widening. In July and August 2025, Cerebras announced:
Day-zero launch-partner status for gpt-oss, OpenAI’s first open-source model, which Cerebras serves at more than 3,000 tokens per second, the fastest in the industry.
Qwen3-235B, an open-source model that runs on its inference cloud at 30x the speed of comparable closed models.
A partnership with Notion, powering real-time document search for more than 100 million end users.
Each new reference account shows potential buyers that wafer-scale is not a science project but a production-ready option. At the same time, management is raising another $1 billion in private capital, having accepted a slight delay to the IPO while US regulators finished a national-security review of G42’s stake. That fresh cash cushions the balance sheet and should cover working capital as manufacturing scales.
If Cerebras converts its backlog on schedule and lands even a handful of additional medium-sized customers, total revenue could exceed $500 million in 2025, putting the company on a path to profitability. But if G42 pulls back or regulators impose new export restrictions, the ramp slows and losses may grow again. Investors and infra startups alike need to track that dependency.
5. Competitive and Regulatory Squeeze
Cerebras’s wafer-scale engine is unique, but uniqueness alone does not guarantee adoption. Nvidia still ships the bulk of AI accelerators, and its upcoming Blackwell architecture promises dramatic memory upgrades that nibble at Cerebras’s advantage. AMD, Google, Groq, and a handful of optical-computing startups are also racing forward.
On the policy side, the U.S. government has tightened export controls on advanced chips. Because Cerebras fabricates its wafers at TSMC in Taiwan, it must obey those rules. The Treasury-led Committee on Foreign Investment in the United States (CFIUS) had been reviewing whether G42’s equity stake posed national-security concerns, one reason the IPO was on ice. But in late March 2025, Cerebras announced that it had obtained clearance from CFIUS to sell shares to G42. The company is expected to go public in Q3 2025, though the exact timeline has not been announced.
A pragmatic investor would therefore weigh Cerebras’s technology lead against the reality that a couple of missteps (e.g. yield problems at the fab, a faster Nvidia release, a sudden policy shift) could derail growth. Infra startups that pin their compute roadmaps exclusively on Cerebras equipment should have contingency plans, just as those who rely solely on Nvidia do today.
6. Why Infra Startups Should Care and How Many Are Exposed
The modern infra startup landscape covers everything that sits below an end-user application: cloud hosting, edge networks, developer platforms, data engineering pipelines, observability tools, and so on. A growing share of those companies now incorporate AI directly into their products to speed up search, route traffic intelligently, or generate code.
PitchBook data show that roughly 70% of all US venture dollars in early 2025 landed in AI-related companies. Within that tide, analysts estimate that about one-third of new infra deals involve workloads that need specialized compute beyond commodity CPUs.
Blend those estimates together and something like one in five infra startups now depends materially on access to high-end AI accelerators. Most still rent GPUs in the cloud at steep markups because they cannot secure hardware outright.
When Cerebras increases supply or undercuts inference prices (as it did with Notion and Meta’s Llama API), the knock-on effects are immediate:
Cost curves flatten. Startups that had penciled in GPU budgets see their cost of goods sold shrink, extending runway.
Time-to-market shortens. Faster model training lets smaller engineering teams iterate more quickly, which can be an existential boon in the early stages.
Investor sentiment shifts. If alternative suppliers like Cerebras prove viable, venture firms feel more comfortable funding infra companies that might otherwise be hostage to GPU scarcity.
Conversely, if Cerebras falters, those same startups remain trapped in a seller’s market controlled by Nvidia and a few hyperscale clouds. Projects that rely on sub-second inference (real-time transcription, autonomous-system control loops) may become economically infeasible.
Fundraising terms for infra startups that need compute could tighten. And second-order effects reach beyond AI: data-center builders, cooling-equipment vendors, and power-management firms all ride the same wave.
7. Key Signals to Track Over the Next 18 Months
Anyone invested in or building an infra startup should keep an eye on a handful of concrete markers:
Delivery of Condor Galaxy 3 and 4. G42 and Cerebras plan to expand their joint supercomputer network to at least 16 exaFLOPS by early 2026. On-time delivery tells us the manufacturing pipeline is healthy. Delays hint at supply-chain strain.
Customer concentration math. If G42 still represents more than half of Cerebras’s revenue by late 2025, dependency risk stays high. Look for new publicly named clients of meaningful size: cloud providers, sovereign labs, Fortune 500 companies.
Cloud-usage metrics. Monitor uptake of the Cerebras Inference Cloud. Steady press releases, such as the Notion and Qwen3 announcements, imply growing demand, but the proof is sustained utilization and repeat customers.
Regulatory news after the CFIUS review. The late-March clearance removed the biggest non-technical overhang, but a forced restructuring or new export curbs could still slow capital raises or deliveries.
Nvidia’s next launch cycle. If Blackwell or its successor narrows the memory-bandwidth gap, Cerebras may need WSE-4 sooner than planned.
A positive read-out on three or four of these signals would confirm that wafer-scale computing is carving out a durable niche. A negative trend would re-tighten the hardware bottleneck for smaller players.
8. Risks and Dependencies for the Wider Ecosystem
Supply-chain fragility. Cerebras relies on TSMC’s advanced nodes and on a small number of high-bandwidth-memory suppliers. A geopolitical shock in the Taiwan Strait or a prolonged high-bandwidth-memory shortage would hit both Cerebras and Nvidia, but wafer-scale dies are especially sensitive because a single defect puts far more silicon, and far more dollars, at risk.
Power and cooling. One CS-3 cabinet can draw more than 15 kW. Data-center builders already face scrutiny as US facilities exceed 4% of national electricity use. If local utilities cannot deliver the extra juice, infra startups renting Cerebras capacity might hit power caps long before they max out the hardware.
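For a sense of scale, here is what the cabinet-level numbers imply for a hypothetical deployment; the cabinet count, overhead factor, and power price are all assumptions:

```python
cabinets = 16                 # hypothetical deployment size
kw_per_cabinet = 15           # "more than 15 kW" per CS-3 cabinet
pue = 1.3                     # assumed cooling/overhead multiplier
usd_per_kwh = 0.08            # assumed industrial electricity price

facility_kw = cabinets * kw_per_cabinet * pue
annual_kwh = facility_kw * 24 * 365
print(f"Facility draw: {facility_kw:.0f} kW")
print(f"Annual energy: {annual_kwh / 1e6:.1f} GWh, ~${annual_kwh * usd_per_kwh / 1e6:.2f}M per year")
```

Even this modest sixteen-cabinet cluster lands in small-substation territory, which is why utility capacity is a first-order constraint rather than an afterthought.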
Capital markets contagion. Venture funding is highly correlated. If a marquee AI hardware company misses targets and markdowns follow, that negative sentiment can spill over to infra startups dependent on similar narratives. PitchBook notes that AI valuations already outpace other verticals. A single high-profile disappointment could trigger a wider re-rating.
Software inertia. Many developers know Nvidia’s CUDA toolchain, not Cerebras’s compiler. If skill shortages slow adoption, infra startups may prefer “good enough” GPUs over “best in class” wafers, reducing the market that Cerebras needs to spread its fixed costs. That in turn limits economies of scale and risks higher prices, feeding back into startup cost structures.
In short, the health of perhaps 20% of next-generation infra startups tracks closely with Cerebras’s ability to scale, diversify customers, and stay ahead technologically. This is especially true for startups whose value proposition hinges on low-latency AI or very large models. The rest of the ecosystem watches because a more competitive chip landscape indirectly sets pricing power for cloud services, colocation racks, and even venture term sheets.
9. Bottom Line
Cerebras’s wafer-scale gamble is no longer a lab curiosity. Real customers are in production, revenue is growing at triple-digit rates, and the backlog is measured in billions. The firm’s success could relieve a critical choke point in global AI infrastructure, lowering costs and speeding up development for thousands of younger companies. Yet its customer list remains narrow, and its fate still hinges on regulatory clearance and flawless manufacturing.
If you are building or funding an infra startup (e.g. a caching layer, a data-labeling platform, a domain-specific LLM service), you should watch Cerebras as you would a key supplier. Its wins expand your runway; its stumbles may force you to revise budgets and delivery schedules.
Today the odds look favorable. The technology works, new logos are arriving, and fresh capital should keep the roadmap on track. But prudence calls for a contingency plan. As in all young markets, surprises are part of the ride.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend: