<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Infra Startups]]></title><description><![CDATA[Research column to track infra startups and dissect the underlying themes]]></description><link>https://www.infrastartups.com</link><image><url>https://substackcdn.com/image/fetch/$s_!WDol!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa657fb53-ef2e-4cf6-b71f-05cdefae098b_1280x1280.png</url><title>Infra Startups</title><link>https://www.infrastartups.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 20 Jun 2026 18:16:16 GMT</lastBuildDate><atom:link href="https://www.infrastartups.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Prateek Joshi]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[infrastartups@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[infrastartups@substack.com]]></itunes:email><itunes:name><![CDATA[Prateek Joshi]]></itunes:name></itunes:owner><itunes:author><![CDATA[Prateek Joshi]]></itunes:author><googleplay:owner><![CDATA[infrastartups@substack.com]]></googleplay:owner><googleplay:email><![CDATA[infrastartups@substack.com]]></googleplay:email><googleplay:author><![CDATA[Prateek Joshi]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Advanced Packaging: The Toll Booth Between GPUs and Memory]]></title><description><![CDATA[The more valuable the chip, the more valuable the package around it becomes.]]></description><link>https://www.infrastartups.com/p/advanced-packaging-the-toll-booth</link><guid 
isPermaLink="false">https://www.infrastartups.com/p/advanced-packaging-the-toll-booth</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Sat, 13 Jun 2026 16:01:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!s8br!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you buy a standalone Nvidia H100 GPU today, you&#8217;re looking at a price tag between $25,000 and $30,000. But that incredibly expensive piece of silicon is practically a paperweight without one critical piece of infrastructure: High-Bandwidth Memory (HBM).</p><p><strong>And the only way to get data from HBM into the GPU at blistering speeds is through advanced packaging.</strong></p><p>This is a very lucrative toll booth in the semiconductor industry.</p><p>For decades, packaging was a low-margin afterthought: a cheap plastic shell to protect the logic. The advanced packaging market was valued at roughly $40 billion in 2024 and is projected to reach $111 billion by 2034. 
It has quietly become the definitive bottleneck of the generative AI boom.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s8br!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s8br!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!s8br!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!s8br!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!s8br!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s8br!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1359153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/201511447?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s8br!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!s8br!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!s8br!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!s8br!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42596c8b-35db-4b8f-a7cf-76e1ed150593_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>The CoWoS Choke Point</h2><p>You can design the fastest AI accelerator in the world. But if you can&#8217;t package it, you can&#8217;t ship it. TSMC&#8217;s CoWoS (Chip-on-Wafer-on-Substrate) technology has become the industry&#8217;s gold standard. <strong>CoWoS places the GPU and HBM side-by-side on a silicon interposer, allowing them to communicate with massive bandwidth over microscopic electrical traces.</strong></p><p>This process is so critical that TSMC&#8217;s CoWoS capacity literally dictates the pace of global AI infra buildouts. In 2023, TSMC was pumping out a mere 13,000 to 16,000 wafers per month (WPM). By the end of 2024, that ramped up to roughly 40,000 WPM. And for 2025, it&#8217;s a staggering 75,000 WPM. </p><p>Unsurprisingly, Nvidia commands an estimated 63% of this global supply. 
To absorb the overflow, traditional OSATs (outsourced semiconductor assembly and test providers) like ASE and Amkor are rapidly expanding their own 2.5D and 3D packaging lines.</p><h2>The Unsung Hero: ABF Substrates</h2><p>Beneath that silicon interposer sits the organic substrate. It&#8217;s the unsung hero routing power and data to the server motherboard. Right now, this space is dominated by <strong>Ajinomoto Build-up Film (ABF) substrates</strong>.</p><p>The ABF substrate market was valued at roughly $4.9 billion in 2024 and is expected to reach nearly $10 billion by 2031 (a CAGR of roughly 10%). It&#8217;s an incredibly consolidated niche, with the top five Asian manufacturers controlling 74% of the global market.</p><p><strong>But ABF has a fatal flaw: the &#8220;warpage wall&#8221;.</strong> As AI packages swell to massive sizes (some exceeding 100 mm x 100 mm to accommodate more HBM stacks), the organic ABF resin expands faster under heat than the silicon it carries. The package warps, cracking the microscopic solder connections.</p><h2>The Glass Core Revolution</h2><p>To break the warpage wall, the industry is transitioning to glass. Glass core substrates are the next physical frontier of advanced packaging.</p><div class="pullquote"><p><strong>Because glass has a coefficient of thermal expansion that nearly matches that of silicon, it stays far flatter under extreme heat.</strong></p></div><p>It also acts as a superior insulator, supporting sub-2-micrometer wiring. This density is crucial for next-generation architectures like Nvidia&#8217;s rumored Rubin, which could require over 50,000 I/O connections to manage its blistering memory bandwidth.</p><p>Intel has poured over $1 billion into a glass R&amp;D line in Arizona, pushing to bring glass substrates into high-volume manufacturing between 2026 and 2030. 
Meanwhile, TSMC recently outlined a 2-to-3-year timeline for its own glass-based CoPoS technology, proving that the entire ecosystem is gearing up for this material shift.</p><h3>Where do we go from here</h3><p>Moore&#8217;s Law is slowing, making it economically punishing to print massive monolithic chips. <strong>The solution is slicing silicon into chiplets and stitching them back together.</strong> As the logic itself becomes exponentially more complex, the interposers, ABF films, and glass cores that bind it all together capture progressively more of the margin.</p><p>In the AI gold rush, TSMC, Intel, and the OSATs are selling the pickaxes. And they&#8217;re building the toll roads too.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Memory Is Not One Trade]]></title><description><![CDATA[What are the chokepoints along the memory supply chain?]]></description><link>https://www.infrastartups.com/p/memory-is-not-one-trade</link><guid isPermaLink="false">https://www.infrastartups.com/p/memory-is-not-one-trade</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Wed, 10 Jun 2026 20:22:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!z6-i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The obvious way to invest in high-bandwidth memory is to buy the DRAM suppliers. 
That&#8217;s not wrong. </p><p>High bandwidth memory (HBM) is effectively a three-company market: SK Hynix, Samsung, and Micron. </p><p>Depending on the quarter, those three companies control more than 95% of global DRAM production. In HBM specifically, SK Hynix has emerged as the early leader, Micron is ramping aggressively, and Samsung is investing heavily to regain share.</p><p>The numbers explain why investors are paying attention. Industry analysts estimate the HBM market was roughly $4&#8211;5 billion in 2023 and may approach $30&#8211;40 billion annually before the decade ends. Few semiconductor categories can grow at that pace.</p><p>But the better question is: What is the supply chain trade here?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z6-i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z6-i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 424w, https://substackcdn.com/image/fetch/$s_!z6-i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 848w, https://substackcdn.com/image/fetch/$s_!z6-i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 1272w, 
https://substackcdn.com/image/fetch/$s_!z6-i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z6-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png" width="1456" height="799" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:799,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:870200,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/201506317?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z6-i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 424w, https://substackcdn.com/image/fetch/$s_!z6-i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 848w, 
https://substackcdn.com/image/fetch/$s_!z6-i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 1272w, https://substackcdn.com/image/fetch/$s_!z6-i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c13d37f-e88b-4c52-84e8-3d0aff7db7ff_1693x929.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Let&#8217;s look at the progression. Nvidia&#8217;s H100 shipped with 80GB of HBM3 and roughly 3.35TB/s of memory bandwidth. 
The H200 jumped to 141GB of HBM3e and 4.8TB/s. Nvidia&#8217;s Blackwell Ultra platform pushes to 288GB of HBM and approximately 8TB/s of bandwidth.</p><div class="callout-block" data-callout="true"><p><em><strong>From the H100 to Blackwell Ultra, that is a 3.6x increase in memory capacity (80GB to 288GB) and a 2.4x increase in bandwidth (3.35TB/s to 8TB/s) in just a few product generations.</strong></em></p></div><p>The economics are equally striking. Industry estimates suggest <strong>HBM can account for 20&#8211;30% of the bill of materials</strong> of a leading AI accelerator. A single HBM stack can cost hundreds of dollars. A fully populated AI GPU package may contain thousands of dollars&#8217; worth of memory.</p><p>The reason is simple. AI workloads are increasingly constrained by data movement.</p><p>GPUs have no trouble performing the mathematical computations. The challenge is feeding them tokens, weights, activations, embeddings, and KV cache data fast enough to keep utilization high. Every increase in model size, context window, inference volume, and agent activity pushes against the same bottleneck: memory bandwidth.</p><p>HBM is the toll booth. And the toll booth is becoming more valuable every year. So companies like Micron, Samsung, and SK Hynix matter a lot.</p><p>Micron expects HBM revenue to exceed several billion dollars annually within the next few years. SK Hynix has reportedly secured a dominant share of Nvidia&#8217;s HBM supply and has publicly discussed selling out much of its HBM capacity well in advance. Samsung remains one of the largest memory manufacturers on earth, generating tens of billions of dollars annually from memory products and investing aggressively to close the gap.</p><div class="pullquote"><p><strong>These three companies are the first-order beneficiaries of the AI memory bottleneck. But HBM is not just DRAM stacked vertically.</strong></p></div><p>Modern HBM stacks contain up to 12 memory dies connected through thousands of through-silicon vias (TSVs). 
Manufacturing requires wafer thinning, precision bonding, advanced packaging, thermal management, inspection, testing, and yield optimization.</p><p>A defect in one layer can compromise the entire stack. And even a perfect HBM stack is worthless if it cannot be integrated with the GPU. That pulls in the rest of the supply chain.</p><p><strong>Applied Materials, Lam Research, and KLA</strong> sit upstream in deposition, etch, metrology, and inspection. These companies collectively generate more than $80 billion in annual revenue because semiconductor manufacturing increasingly depends on process precision. HBM amplifies that requirement.</p><p>As stack heights increase from 8-high to 12-high and beyond, defect costs rise dramatically. Yield becomes more valuable. Inspection becomes more valuable. Process control becomes more valuable.</p><p><strong>Teradyne and Cohu benefit because memory testing becomes harder</strong> as bandwidth, density, and thermal loads increase. <strong>FormFactor</strong> matters because advanced probe cards are required to validate increasingly complex memory devices before packaging.</p><p><strong>Aehr Test Systems</strong> is interesting because burn-in and reliability testing become more important when a single AI accelerator can cost $30,000&#8211;$40,000 and a rack can exceed $1 million.</p><p><strong>Then there is packaging.</strong> This may be the most underappreciated bottleneck in AI infra. TSMC&#8217;s CoWoS advanced packaging technology has become one of the critical constraints in AI accelerator production. Industry estimates suggest CoWoS capacity has expanded several-fold since 2023, yet demand continues to outstrip supply. <strong>Amkor sits directly in this trend.</strong></p><p>Advanced packaging is no longer a low-margin back-end service. It is becoming one of the highest-value manufacturing steps in the AI supply chain. 
Without packaging capacity, GPUs cannot ship regardless of how many wafers are produced.</p><div class="callout-block" data-callout="true"><p><em><strong>This is the key observation here: When a bottleneck becomes strategic, value spreads sideways.</strong></em></p></div><p>The market starts with: &#8220;Who sells HBM?&#8221;</p><p>Then it moves to: &#8220;Who enables HBM yield?&#8221;</p><p>Then: &#8220;Who tests it?&#8221;</p><p>Then: &#8220;Who packages it?&#8221;</p><p>Then: &#8220;Who supplies the tools, materials, substrates, and inspection systems?&#8221;</p><p>This is how a single bottleneck becomes an ecosystem. For VCs, the takeaway is even more important.</p><p>Should we invest in the next DRAM company? Perhaps not. That game requires tens of billions of dollars of capital expenditure and decades of process expertise.</p><p>Instead, build around the constraint. The opportunities are in yield analytics, thermal simulation, packaging design automation, memory-aware compilers, HBM test optimization, substrate inspection, supply-chain software, and infra tools that reduce memory pressure.</p><p><strong>The most interesting startup wedge may be software that makes scarce HBM go further. </strong>A modern AI server can contain hundreds of gigabytes of HBM costing tens of thousands of dollars. Every percentage point of utilization matters.</p><ul><li><p>Better KV-cache management</p></li><li><p>Smarter memory scheduling</p></li><li><p>Compression techniques that preserve model quality</p></li><li><p>Inference routing based on memory profiles</p></li><li><p>Profiling tools that identify wasted bandwidth</p></li></ul><p>These solutions can create enormous value because they improve utilization of one of the most expensive resources in the AI stack.</p><p><strong>HBM is becoming the scarce resource inside AI infra. 
But scarcity never stops at one component.</strong></p><p>It propagates into every tool, process, machine, material, workflow, and software layer required to manufacture, test, package, and use that component efficiently.</p>]]></content:encoded></item><item><title><![CDATA[The AI Bottleneck Stack]]></title><description><![CDATA[The next phase of AI will be shaped by physical and technical bottlenecks]]></description><link>https://www.infrastartups.com/p/the-ai-bottleneck-stack</link><guid isPermaLink="false">https://www.infrastartups.com/p/the-ai-bottleneck-stack</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Tue, 09 Jun 2026 16:00:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dJSC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The AI stack is usually described in software terms: models, data, agents, applications, evals, and workflows. But that framing misses the deeper story.</p><p>The next phase of AI will be shaped by physical and technical bottlenecks. Compute is only the first layer. Once you add more GPUs, you expose memory limits. Once you add more memory, you expose packaging limits. Once you scale clusters from thousands to tens of thousands of accelerators, you expose networking limits. 
</p><p>Once networking speeds move from 400G to 800G and eventually 1.6T, copper starts breaking down and optics become more important. Once racks move from 30&#8211;50 kW to 100&#8211;150 kW and beyond, power delivery and cooling become binding constraints. Once data centers scale from hundreds of megawatts to gigawatt campuses, the bottleneck leaves the building and moves into the grid.</p><p>This is the AI bottleneck stack:</p><p><strong>Compute &#8594; memory &#8594; packaging &#8594; networking &#8594; photonics &#8594; power &#8594; cooling &#8594; grid &#8594; materials</strong></p><p>The important point is that bottlenecks move. Solving one constraint reveals the next one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dJSC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dJSC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!dJSC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 848w, https://substackcdn.com/image/fetch/$s_!dJSC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dJSC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dJSC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png" width="1200" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:943894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/200919586?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dJSC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!dJSC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 848w, 
https://substackcdn.com/image/fetch/$s_!dJSC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 1272w, https://substackcdn.com/image/fetch/$s_!dJSC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F405f15b6-88b3-4f69-9611-280356b8c209_1200x670.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><strong>The first wave of AI infra was compute</strong></h3><p>The world needed GPUs, accelerators, and training clusters. 
Nvidia became the obvious winner because it controlled the scarcest layer: high-performance AI compute plus the software ecosystem around it.</p><p>The scale is unprecedented. Frontier AI training runs now consume tens of thousands of GPUs. Some estimates suggest the largest training clusters will exceed 100,000 accelerators within the next few years. Capital expenditures from hyperscalers are expected to exceed $250 billion annually, with a growing percentage directed toward AI infrastructure. Individual AI data center campuses are increasingly planned at 500 MW to 1 GW scale, representing investments of $30 billion to $100 billion per site.</p><h3><strong>Next is memory</strong></h3><p>Compute alone is not enough. A GPU is only useful if it can access data quickly. That shifts the pressure onto HBM and memory bandwidth. Modern AI systems are not just compute-hungry. <strong>They&#8217;re memory-hungry too.</strong></p><p>A single high-end AI accelerator can require several terabytes per second of memory bandwidth. It gets there by placing multiple HBM stacks, each containing 8 to 12 DRAM dies, on the same package. Demand for HBM has grown so quickly that supply constraints have become one of the most important bottlenecks in the semiconductor industry.</p><p>Long-context models, retrieval systems, inference batching, agent workflows, and KV cache storage all increase pressure on memory capacity and bandwidth.</p><h3><strong>Then comes packaging</strong></h3><p>HBM has to sit physically close to the accelerator. That requires advanced packaging, interposers, substrates, and extremely precise integration. This is why CoWoS-like capacity, ABF substrates, glass core substrates, OSATs, inspection systems, and test equipment matter.</p><p>Advanced packaging has become one of the fastest-growing segments of semiconductor manufacturing. Industry forecasts suggest advanced packaging spending could exceed $80 billion annually by the end of the decade. 
The chip is no longer the product. The package is becoming the system.</p><h3><strong>At cluster scale, the next bottleneck is data movement</strong></h3><p>If thousands of accelerators cannot communicate efficiently, expensive compute sits idle. Training large models requires moving enormous amounts of data between GPUs. Every percentage point of utilization matters when clusters cost billions of dollars.</p><p>Some estimates frame networking as roughly 19&#8211;23% of AI data center capex, which those estimates put at approximately $6&#8211;8 billion per gigawatt of AI capacity. In a 1 GW AI campus costing $40&#8211;60 billion, networking alone can represent a multi-billion-dollar infra layer.</p><p>This is where Ethernet switching, optical modules, retimers, DSPs, active electrical cables, and silicon photonics become important.</p><p>Network speeds have already moved from 100G to 400G and 800G. The industry is now preparing for 1.6T optical links. As bandwidth requirements increase, power consumption and signal integrity become major engineering challenges. The economics increasingly favor optical interconnects over traditional copper connections.</p><p>This is why companies such as Broadcom, Arista, Marvell, Astera Labs, Credo, Coherent, Lumentum, and Applied Optoelectronics sit directly inside the AI scaling problem rather than merely adjacent to it.</p><h3><strong>The next bottleneck is photonics</strong></h3><p>As clusters grow beyond tens of thousands of accelerators, moving data efficiently becomes as important as the compute itself. Silicon photonics promises lower latency, higher bandwidth density, and lower power consumption compared with traditional electrical interconnects. Industry forecasts suggest the optical networking market tied to AI infra could grow into a tens-of-billions-of-dollars annual opportunity over the next decade.</p><h3><strong>For startups, this creates a large opening</strong></h3><p>The best startup opportunities are not generic &#8220;AI infra&#8221; tools. 
They are wedge products that remove specific constraints.</p><p>Examples:</p><ul><li><p>Software that improves GPU utilization from 50&#8211;60% toward 80&#8211;90%</p></li><li><p>Memory systems that reduce KV cache costs by 30&#8211;50%</p></li><li><p>Compilers that optimize communication patterns across thousands of accelerators</p></li><li><p>Networking observability for clusters containing 10,000&#8211;100,000 GPUs</p></li><li><p>Photonic interconnect components and silicon photonics tooling</p></li><li><p>Thermal simulation and liquid cooling control software</p></li><li><p>Power management software for high-density racks</p></li><li><p>Data center automation for energy, cooling, and failure prediction</p></li><li><p>Materials, substrates, and manufacturing tools for packaging and optics</p></li><li><p>Grid optimization software for large-scale AI campuses</p></li><li><p>Transformer, switchgear, and power electronics monitoring systems</p></li></ul><h3><strong>The power layer may be the most underrated</strong></h3><p>Traditional enterprise racks consumed roughly 5&#8211;15 kW. Modern AI racks are moving toward 100&#8211;150 kW, with some future designs targeting 300 kW or more. A single AI data center can require as much electricity as a medium-sized city.</p><p>This changes power conversion, distribution, backup systems, and cooling design. Nvidia&#8217;s discussions around 800 VDC architectures point toward a future where the rack becomes an electrical system, not just a server enclosure.</p><p>Power infra spending is becoming a major component of AI deployment. Transformers, switchgear, UPS systems, generators, batteries, and power electronics are all becoming strategic assets rather than commodity purchases.</p><h3><strong>Then comes cooling</strong></h3><p>Air cooling becomes increasingly difficult as rack densities rise. 
Liquid cooling, direct-to-chip cooling, immersion cooling, heat exchangers, pumps, and thermal management software are becoming critical infra categories. </p><p>Cooling can represent 20&#8211;40% of total data center energy consumption depending on architecture and climate.</p><h3><strong>Then the bottleneck moves outside the data center</strong></h3><p>Transformers, substations, transmission lines, interconnection queues, turbines, batteries, fuel cells, and firm power contracts become part of the AI stack.</p><p>Many utility interconnection queues already stretch several years into the future. New transmission projects can take 5&#8211;10 years to complete. Large gas turbines often have multi-year lead times. In some regions, obtaining power has become more difficult than obtaining GPUs.</p><p>The scale is enormous. A 1 GW AI campus running at full load consumes roughly 8.76 terawatt-hours of electricity annually (1 GW &#215; 8,760 hours in a year). Multiple technology companies are already discussing multi-gigawatt AI infrastructure plans. By the same arithmetic, a 5 GW deployment would consume roughly 44 terawatt-hours a year, comparable to the electricity use of a small country.</p><h3><strong>Finally, the bottleneck reaches materials</strong></h3><p>Copper, optical fiber, rare earth elements, advanced substrates, specialty chemicals, semiconductor equipment, cooling fluids, and power electronics materials all become strategic inputs. Every layer of the stack ultimately depends on physical supply chains.</p><p>This is the real infra map for AI. 
<strong>A chain of constraints, each one enabling the next stage of scale.</strong></p><p>The next decade of AI will likely involve hundreds of billions of dollars in infra spending, thousands of megawatts of new capacity, and entirely new industries built around removing bottlenecks.</p><p>The best startups will be built where the constraint is painful, technical, unavoidable, and expensive enough that customers will pay immediately.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Buy Scarcity, Not AI Exposure]]></title><description><![CDATA[AI demand concentrates into bottlenecks instead of flowing evenly across the supply chain]]></description><link>https://www.infrastartups.com/p/buy-scarcity-not-ai-exposure</link><guid isPermaLink="false">https://www.infrastartups.com/p/buy-scarcity-not-ai-exposure</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Sat, 06 Jun 2026 17:56:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!a85V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The framing &#8220;Which companies have AI exposure?&#8221; is too broad to be useful. Every software company now claims AI exposure. Every chip company has an AI story. 
Every utility, REIT, industrial supplier, and cable manufacturer can find a way to mention data centers on an earnings call.</p><p>The better question is narrower:</p><p><strong>Where is AI demand hitting a physical constraint that supply cannot quickly solve?</strong></p><p>That is where the real trade lives. In the bottlenecks. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a85V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a85V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!a85V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 848w, https://substackcdn.com/image/fetch/$s_!a85V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 1272w, https://substackcdn.com/image/fetch/$s_!a85V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a85V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png" width="1200" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1346783,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/200917535?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a85V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!a85V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 848w, https://substackcdn.com/image/fetch/$s_!a85V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 1272w, https://substackcdn.com/image/fetch/$s_!a85V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c58791f-bb88-4304-8beb-78e54ed212c1_1200x670.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The AI ecosystem is no longer defined by algorithms alone. It is increasingly defined by bottlenecks.</p><p>AI demand does not spread evenly across the technology stack. It concentrates. First it concentrated into GPUs. Then memory bandwidth. Then advanced packaging. Then networking. Then optical interconnects. Then power delivery. Then cooling. Then grid interconnection. Eventually it reaches transformers, generation capacity, critical materials, and even land with access to power.</p><p>The most important developments in AI often emerge not from what is abundant, but from what is scarce.</p><p>A bottleneck is not simply &#8220;a thing AI uses&#8221;. That definition is too loose. 
A real bottleneck has specific characteristics:</p><ul><li><p>Demand is growing faster than supply</p></li><li><p>Capacity takes years to expand (as opposed to weeks or months)</p></li><li><p>There are few viable alternatives</p></li><li><p>The layer is unavoidable for system performance</p></li><li><p>Existing solutions begin to show limits</p></li><li><p>Customers become willing to pay a premium for relief</p></li><li><p>New technologies emerge specifically to address the constraint</p></li></ul><p>This is why thinking in terms of &#8220;AI exposure&#8221; misses the point.</p><p>Many technologies can find a way to &#8220;participate&#8221; in AI, but participation is not what matters. The critical question is whether frontier AI systems can continue scaling without them.</p><p>If GPUs cannot access memory fast enough, memory bandwidth becomes the bottleneck.</p><p>If memory cannot be integrated efficiently with accelerators, advanced packaging becomes the bottleneck.</p><p>If clusters cannot move data quickly enough, networking becomes the bottleneck.</p><p>If electrical interconnects become too power-hungry or inefficient, optics becomes the bottleneck.</p><p>If racks become too dense, cooling becomes the bottleneck.</p><p>If data centers cannot secure enough power, the bottleneck shifts to transformers, substations, transmission infrastructure, and generation capacity.</p><p>The constraint keeps moving.</p><p>This is why the next set of breakout AI startups might not be foundation model companies. They may be the ones building solutions to these bottlenecks.</p><p>Some are developing new memory architectures. Others are working on photonics, optical networking, advanced cooling systems, power management, chiplet interconnects, packaging technologies, or software that improves hardware utilization. 
Entire categories of startups exist because a specific layer of the stack has become constrained.</p><p>Historically, major technology waves create new bottlenecks as they scale. The internet created networking bottlenecks. Mobile computing created battery and semiconductor bottlenecks. Cloud computing created data center bottlenecks.</p><p>AI is doing the same thing, but at a much larger scale. The key insight is that bottlenecks are dynamic. Solving one often reveals another.</p><p>A breakthrough in packaging may shift pressure to memory. A breakthrough in memory may shift pressure to networking. Better networking may expose limitations in power delivery. More power may create new cooling challenges.</p><p>As a result, the most valuable technologies are often those that remove a constraint from the system.</p><p>Sometimes those technologies come from established suppliers. Sometimes they come from startups that are still largely unknown. In both cases, the underlying logic is the same: value accrues to the layer that enables the next stage of scaling.</p><p>This perspective also changes how we think about technological progress. Instead of asking which AI model is best, ask what prevents the next generation of models from existing. Instead of asking which company mentions AI most often, ask which layer of the stack is becoming unavoidable.</p><p>By the time everyone agrees a bottleneck is critical, much of the opportunity has already been recognized. 
The challenge is identifying the next constraint before it becomes obvious.</p>]]></content:encoded></item><item><title><![CDATA[Mapping The Critical Token Path]]></title><description><![CDATA[Separating defensible structural moats from easily bypassed tooling]]></description><link>https://www.infrastartups.com/p/mapping-the-critical-token-path</link><guid isPermaLink="false">https://www.infrastartups.com/p/mapping-the-critical-token-path</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Wed, 06 May 2026 15:31:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JCk0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the AI buildout, everyone wants to be infrastructure. But most &#8220;infrastructure&#8221; is just tooling.</p><p>I used to think about this mostly through inference: do the tokens pass through your node before the user gets an answer? That is still a useful test. If every live request flows through your gateway, runtime, router, policy layer, or serving stack, you are in a strong position. </p><p>But that definition is too narrow.</p><p>Because without training, there is no inference. Without post-training, the base model is not useful. Without RL environments, agents do not learn to recover from mistakes. 
Without memory, retrieval, caching, verification, and execution, the final answer is often too slow, too expensive, too shallow, or too unreliable.</p><p>So here&#8217;s how I think about the critical token path:</p><p><strong>The critical token path is the lifecycle that creates, shapes, serves, remembers, verifies, and acts on tokens.</strong></p><p>The strongest AI infrastructure companies are load-bearing somewhere in that lifecycle. If the system routes around them, capability creation breaks, model quality degrades, inference cost spikes, reliability drops, or production delivery fails.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JCk0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JCk0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JCk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1974436,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/196614847?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JCk0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!JCk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F788bb249-66db-4460-af0e-a9cf2c5d0605_1672x941.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>The Simple Definition</h2><p>You are on the critical token path if the AI system cannot be trained, improved, served, verified, or economically operated without touching your system.</p><p>It goes beyond the live API request between the user and the model. 
It includes the upstream systems that create the model, the post-training systems that shape the model, the runtime systems that serve the model, and the verification systems that decide whether the model&#8217;s work can be trusted.</p><p>The question is dependency.</p><p>If removing you means the model cannot train, cannot improve, cannot run agents, cannot pass evals, cannot serve reliably, or cannot operate at acceptable cost, you are on the path.</p><p>If removing you only means a dashboard disappears, you are probably not.</p><h2>Where The Tokens Actually Flow</h2><p>The inference path is still the cleanest version of this framework.</p><p>Every generated token consumes compute. That makes GPUs and accelerators the obvious tollbooth: NVIDIA, AMD, TPUs, Trainium, Inferentia, and whatever specialized inference hardware comes next.</p><p>Above hardware, inference runtimes like vLLM, TensorRT-LLM, SGLang, and TGI turn raw compute into usable answers. They handle batching, memory management, KV cache optimization, quantization, scheduling, and streaming. Inference is not just &#8220;run the model&#8221;. It is the system that makes generation fast and economically viable.</p><p>Then come serving layers, gateways, and routers: Together, Fireworks, Baseten, Replicate, Modal, CoreWeave, OpenRouter. If the application calls your endpoint before it gets a model response, you are in the path.</p><p>Observability, tracing, safety, and policy become critical when they sit live inside production. A tracing proxy that sees prompts, completions, latency, cost, and errors is much closer to the token path than a dashboard after the fact. A policy gateway that every enterprise prompt and response must pass through becomes production infrastructure.</p><p>But inference is only one part of the map.</p><h2>The Pre-Training Path</h2><p>Before a model can generate tokens, it has to absorb tokens.</p><p>That makes pre-training infrastructure part of the critical token path. 
Data pipelines, data quality systems, deduplication, filtering, distributed training, GPU clusters, networking, storage, orchestration, checkpointing, and failure recovery all matter.</p><p>This is the first token factory.</p><p>The model does not emerge from the API layer. It emerges from an industrial process: collect the data, clean the data, move the data, shard the data, train across massive clusters, keep the run alive, recover from failures, save checkpoints, and push the model toward capability.</p><p>If you sit inside that process, you can be deeply load-bearing. The live user request may never touch you, but the model&#8217;s capability depends on you.</p><p>That is why training infrastructure should not be treated as separate from the token path. It is the upstream path that creates the model capable of producing tokens in the first place.</p><h2>The Post-Training Path</h2><p>Pre-training creates the base model. Post-training turns it into a useful product.</p><p>Instruction tuning, preference data, RLHF, RLAIF, reward models, eval harnesses, synthetic data, domain-specific task data, and alignment pipelines all live here. This is where raw capability becomes behavior.</p><p>For many frontier systems, this may be where the real product quality emerges. The base model has broad knowledge, but post-training teaches it how to answer, refuse, reason, use tools, follow instructions, and perform workflows.</p><p>This means post-training infrastructure is token-shaping infrastructure.</p><p>If a lab depends on your data, reward signal, eval harness, or feedback loop to produce better model behavior, you are on the path. The tokens may not pass through you during live inference, but the quality of those tokens was shaped by you.</p><p>That is a real dependency.</p><h2>The RL Environment Path</h2><p>RL environments deserve their own category.</p><p>If agents are going to become genuinely useful, they need places to practice. 
They need tasks, state, tools, rewards, failures, retries, and objective feedback. They need environments where they can make mistakes and learn from recovery.</p><p>That is what RL environments provide.</p><p>A good RL environment is a world. The model enters, acts, observes state, receives reward, and generates trajectories. Those trajectories become training signal. Over time, the model learns how to solve tasks, not just describe solutions.</p><p>This is especially important for coding agents, browser agents, robotics agents, data analysis agents, chip design agents, scientific agents, and enterprise workflow agents. In each case, the lab needs high-quality environments that produce verifiable feedback.</p><p>If RL environments become the way frontier labs generate scarce post-training data for agents, they become part of the critical token path.</p><p>Not because the final inference request flows through them, but because the agent&#8217;s ability to act was trained inside them.</p><h2>The Core Dependency Path</h2><p>Some systems are not always in the direct token stream, but you cannot generate tokens at acceptable speed or cost without them.</p><p>KV cache is the best example. Generation is memory-bound. At scale, caching determines latency, throughput, and cost. A KV cache layer may not be the visible API endpoint. But if it lets the system serve more tokens with less hardware, it becomes a serious infrastructure layer.</p><p>Databases and retrieval systems are similar. They may not produce the final token, but they supply context. In retrieval-heavy workflows, the answer quality depends on what gets pulled into the prompt.</p><p>Agent sandboxes sit in the action path. If the model has to execute code, browse the web, manipulate files, call APIs, or operate inside a secure workspace, the sandbox becomes part of the broader production loop.</p><p>These systems may carry memory, context, state, tools, or execution rather than the final output token. 
But the token still depends on them.</p><h2>The Weak Version Of AI Infra</h2><p>The weak version is anything that does not create dependency.</p><p>A prompt library, generic copilot UI, lightweight dashboard, or shallow workflow wrapper can be useful. But if the model provider can absorb it, the cloud can bundle it, or the system of record can recreate it, the moat is thin.</p><p>The question is: <strong>does the system depend on it?</strong></p><p>Can the model train without you? Can it improve without you? Can it serve without you? Can it remember without you? Can it verify without you? Can it act without you?</p><p>If the answer is yes across the board, you are not on the critical path.</p><p>You are near the action, but you are not load-bearing.</p><h2>The Physical Token Path</h2><p>The token path is not just software. It is also physical infrastructure.</p><p>Large-scale training and inference depend on networking, interconnect, optics, switches, memory bandwidth, storage, power, cooling, and rack-scale architecture. Before a token appears as text to the user, the computation may have crossed thousands of GPUs and moved across a dense physical fabric inside the data center.</p><p>This is why companies like Broadcom and Arista matter in the AI stack. They are in the physical path of AI computation.</p><p>The token looks like language at the surface. But underneath, it is electrons, photons, heat, and capital equipment.</p><p>As AI scales, the physical path becomes more important. More training means more clusters. More inference means more data centers. More reasoning means more compute per answer. More agents mean more long-running workloads. 
The software token path pulls the physical token path behind it.</p><h2>The Next Token Path Is Reasoning</h2><p>The old inference path was simple: prompt in, tokens out.</p><p>The new path is more complicated: prompt in, search, plan, write code, run tests, call tools, verify, retry, and then answer.</p><p>This is inference-time compute. The model thinks before responding.</p><p>As this becomes common, the critical token path expands again. If the model has to run code before answering, the execution environment matters. If it has to prove something, the theorem prover matters. If it has to query a simulator, the simulator matters. If it has to guarantee correctness, the verifier matters.</p><p>This is why verifiable reasoning environments are so interesting. They can become runtime infrastructure.</p><p>The model generates candidate reasoning. The verifier checks it. The system only returns the answer if it passes.</p><p>That is a new path: generation plus verification.</p><h2>Compute Brokerage As An Economic Tollbooth</h2><p>There is also an economic version of the critical path: compute brokerage.</p><p>If a platform controls GPU provisioning, scheduling, usage, reliability, and billing, it becomes a tollbooth. The token may ultimately be generated on someone else&#8217;s hardware, but the transaction flows through the broker.</p><p>RunPod is one example of this pattern.</p><p>The user does not care where the GPU physically lives. They care that the workload runs, scales, and bills correctly. 
If the platform aggregates supply, demand, provisioning, developer experience, reliability, and payment, it can become a durable control point.</p><p>This may not always be the deepest technical moat, but it can still be a meaningful infrastructure position.</p><h2>Edge Tokens Will Have Their Own Path</h2><p>Not all tokens will flow through hyperscaler clouds.</p><p>Some will be generated on phones, laptops, cars, robots, factories, medical devices, and secure enterprise environments. For these workloads, the critical path shifts from cloud serving infrastructure to local runtimes and local hardware.</p><p>This is where llama.cpp, MLX, and other local runtimes matter. They are the vLLM of the edge. They define how models are quantized, memory-mapped, executed, and streamed on constrained devices.</p><p>The token path will fragment. Some tokens will flow through frontier APIs. Some through enterprise gateways. Some through open model serving platforms. Some through local devices. Some through hybrid systems that decide dynamically where inference should run.</p><p>The question remains the same: where must the token go before the system works?</p><h2>The Investor Litmus Test</h2><p>When evaluating an AI infrastructure company, start with one question:</p><p><strong>What happens if the token bypasses you?</strong></p><p>Then expand the question across the lifecycle.</p><p>What happens if the training run bypasses you? What happens if post-training bypasses you? What happens if the RL loop bypasses you? What happens if retrieval bypasses you? What happens if the cache bypasses you? What happens if the verifier bypasses you? What happens if the serving layer bypasses you?</p><p>Does production break, quality degrade, cost spike, or reliability collapse?</p><p>Or does a dashboard disappear?</p><p>That difference is the company.</p><p>The best AI infrastructure companies become more important as models improve. More tokens mean more compute. 
More training means more data and clusters. More agents mean more environments and sandboxes. More reasoning means more verification. More enterprise deployment means more gateways, policy, and observability. More local inference means more edge runtimes.</p><p>They benefit from model progress. They do not fight it.</p><h2>The Moat Is Rerouting Cost</h2><p>The critical token path is ultimately a switching cost framework.</p><p>If replacing you requires changing a dashboard, the moat is weak. If it requires changing application code, the moat is better. If it requires changing serving infrastructure, the moat is stronger. If it requires changing training pipelines, reward loops, runtime assumptions, memory architecture, compliance policy, or data center wiring, the moat is very strong.</p><p>The best infrastructure companies are load-bearing.</p><p>They help create tokens, shape tokens, serve tokens, remember tokens, verify tokens, or act on tokens.</p><p>And rerouting around them is painful.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Agents Need Worlds: Building Verifiable Environments From Scratch]]></title><description><![CDATA[A local-first experiment in turning agent actions, failures, and recoveries into replayable training signal]]></description><link>https://www.infrastartups.com/p/agents-need-worlds-building-verifiable</link><guid isPermaLink="false">https://www.infrastartups.com/p/agents-need-worlds-building-verifiable</guid><dc:creator><![CDATA[Prateek 
Joshi]]></dc:creator><pubDate>Wed, 29 Apr 2026 18:05:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RTl7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you want to train an AI agent, a prompt is not enough. A prompt gives the agent an instruction. But real work doesn&#8217;t happen inside an instruction. </p><p>Real work happens inside an environment. There is state. There are tools. There are constraints. There are consequences. You try something, observe what changed, recover from mistakes, and eventually reach a useful outcome.</p><p>I wanted to see what it takes to build an environment from scratch. And that&#8217;s the premise behind <strong>pworlds</strong>, a tool I built to explore this simple thesis.</p><p><strong>So what is pworlds</strong>? It&#8217;s a collection of verifiable task environments for agents. It is not a chatbot wrapper, a benchmark suite, or a generic RL framework. It is closer to environment engineering for agent training. 
The goal is to create small executable worlds where agents can act, receive objective feedback, generate traces, and export training signal.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RTl7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RTl7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 424w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 848w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 1272w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RTl7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png" width="1000" height="563" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:995860,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/195898394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RTl7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 424w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 848w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 1272w, https://substackcdn.com/image/fetch/$s_!RTl7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98e5d46e-34a1-4571-8d4e-ffc3d1f3d567_1000x563.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>The Problem With Final Answers</h2><p>Most AI evaluation still over-focuses on final answers.</p><p>Did the model solve the problem?<br>Did it pass the test?<br>Did it produce the right output?</p><p>Those questions matter, but they miss the messy middle.</p><p>In real work, the process is often more informative than the answer. What did the agent try first? What failed? What changed? What feedback did it receive? How did it recover?</p><p>That is the kind of signal I wanted pworlds to capture.</p><p>A world is not just a question with an answer. A world has state. A world accepts actions. A world changes when those actions are taken. A world can grade whether progress was made. 
And if designed correctly, a world can record the full path from initial state to successful outcome.</p><p>That path is where the valuable data lives.</p><h2>The Constraint: Keep It Local And Small</h2><p>From the beginning, I forced the project to stay small.</p><p>The first version had to be local-only. No GPU requirement. No model training. No OpenAI integration. No Anthropic integration. No cloud services. No database. No Docker. No plugin-heavy architecture.</p><p>The abstractions had to stay clean, but small. The priority was CLI usability, tests, determinism, and extensibility.</p><p>Those constraints were not arbitrary. They prevented the project from prematurely becoming a platform.</p><p>Before building distributed infra, I wanted to prove the runtime pattern:</p><p>Can we make a task executable?<br>Can we expose state?<br>Can we accept actions?<br>Can we compute rewards?<br>Can we record traces?<br>Can we replay a trajectory?<br>Can we export the result as training signal?</p><p>That was the core question.</p><h2>psignal: The Runtime Substrate</h2><p>The first package I built was <strong>psignal</strong>.</p><p>psignal is the shared substrate underneath pworlds. Its job is to turn executable tasks into training signal.</p><p>The loop is straightforward:</p><p><strong>task &#8594; observe &#8594; action &#8594; transition &#8594; reward &#8594; trace &#8594; replay &#8594; export</strong></p><p>Instead of hiding state in a database or service, psignal makes every task a visible local artifact on disk. When a task is created, the system generates files like <code>psignal.yaml</code>, <code>state.json</code>, <code>trace.jsonl</code>, and a README.</p><p>The metadata lives in the YAML file. The current state lives in JSON. The trajectory is appended line by line into JSONL.</p><p>That file-backed design ended up being one of the most important decisions in the project. It made everything inspectable. You can open the folder and see what happened. 
It made the system debuggable, portable, git-friendly, and easy to explain.</p><h2>The Smallest Possible World</h2><p>The first built-in environment was intentionally tiny: a counter.</p><p>The counter starts at zero. The goal is to reach five. The valid actions are <code>+1</code>, <code>-1</code>, and <code>reset</code>.</p><p>If the agent reaches the goal, it receives a positive reward. If it takes a valid but incomplete step, it receives a small penalty. If it takes an invalid action, it receives a larger penalty.</p><p>That may sound too simple, but that is exactly why it was useful.</p><p>The counter is the smallest possible environment that still demonstrates the full runtime pattern. There is persistent state. There are explicit actions. There are valid and invalid moves. There is an objective success condition. There is a trace. There is replay. There is export.</p><p>Before proving the system could handle complex work, I wanted to prove that the loop itself was clean.</p><h2>plemma: A Theorem-Proving World</h2><p>Once psignal worked, the next question was whether this substrate could support something more meaningful than a toy counter.</p><p>That led to <strong>plemma</strong>, the first real world built on top of psignal.</p><p>plemma is a theorem-proving world. I chose theorem proving because it is a high-signal domain for objective agent interaction. A proof state is explicit. Actions are discrete. Some moves are valid. Some are invalid. Success is not a matter of taste. Either the proof completes or it does not.</p><p>The long-term direction is Lean-style theorem proving, but the first version deliberately avoided full Lean integration. Instead, plemma used a simulated tactic environment and failed gracefully if a real Lean toolchain was missing.</p><p>The initial theorem was simple: prove the identity implication, that a proposition implies itself. 
The valid path was <code>intro h</code>, then <code>exact h</code>.</p><p>That was enough to make the abstraction legible.</p><p>In the counter world, actions manipulate integer state. In the theorem world, actions manipulate proof state. Same runtime pattern, different domain.</p><p>This was the first important proof point. pworlds had gone from being just a counter demo to representing symbolic work.</p><h2>pspec: A Software Engineering World</h2><p>But theorem proving is still niche, so I wanted a second world that would be more broadly understandable.</p><p>That became <strong>pspec</strong>, a software engineering world.</p><p>pspec is a coding-task environment. The first version used a small buggy FizzBuzz function. The task had a source file, tests, metadata, state, trace, and reward function.</p><p>The workflow was simple:</p><p>Inspect the code.<br>Edit the file.<br>Run the tests.<br>Receive feedback.<br>Record the step.<br>Try again.</p><p>This is much closer to how real software work happens. You do not jump from broken code to final patch. You inspect, edit, run tests, fail, edit again, and eventually pass.</p><p>The intermediate attempts are not noise. They are the process. And the process is the training signal.</p><h2>The Important Distinction In pspec</h2><p>Building pspec revealed an important distinction.</p><p>In the counter world, the action is typed directly into the CLI. You say <code>+1</code>, and the environment updates the count.</p><p>In the theorem world, the action is also typed directly into the CLI. You say <code>intro h</code>, and the environment updates the proof state.</p><div class="callout-block" data-callout="true"><p><em><strong>But in the coding world, the meaningful action is not the CLI command. The meaningful action is the file edit.</strong></em></p></div><p>The CLI action is only <code>run-tests</code>.</p><p>That distinction matters because pspec is not a patch-application DSL. It is a local repair environment. 
A human or agent edits files in the workspace. Then pspec evaluates the current code state by running tests and recording the result.</p><p>That model feels much closer to actual agent work.</p><h2>The Trajectory Is The Data</h2><p>pspec records more than just whether the tests pass.</p><p>It can record the current source snapshot, source hash, test output, reward, completion status, and the diff between evaluated steps.</p><p>That diff is especially important. It turns the trajectory from a sequence of states into a sequence of state transitions.</p><p>So the output is not merely: &#8220;Here is the final fixed file.&#8221;</p><p>The output is: &#8220;Here is the buggy starting point. Here is the first attempted change. Here is what failed. Here is the next diff. Here is the test feedback. Here is the moment the task became correct.&#8221;</p><p>That is a much richer object for agent training. <strong>A final answer tells you what worked. A trajectory tells you how the work got done.</strong></p><h2>From Local Debugging To Training Artifacts</h2><p>After the initial pspec version, I added support for custom tasks.</p><p>You can create a task from a real Python source file and a test file or test directory. pspec copies the source and tests into a task folder, stores a hidden reset template, tracks evaluated edits, captures diffs, and can package the whole thing into a single <code>training_artifact.json</code>.</p><p>That packaging step matters operationally.</p><p>Without it, handing data to a training team means explaining which files matter, where the trace lives, how the tests relate to the source, and what the final state represents.</p><p>With packaging, the output becomes one structured artifact containing metadata, state, source files, test files, traces, and exports.</p><p>This is the bridge from local debugging to process-data generation.</p><h2>What pworlds Is Not</h2><p>pworlds is not a training platform.</p><p>It is not a model-serving platform. 
It is not a cloud orchestration system. It is not a benchmark leaderboard. It is not a full Lean integration. It is not an autonomous coding agent.</p><p>It is an early substrate for verifiable task environments.</p><p>But even in this early form, the pattern is visible. The same runtime can support integer control, theorem proving, and code repair.</p><p>Those are three very different domains, but they all fit the same loop:</p><p><strong>observe &#8594; act &#8594; transition &#8594; reward &#8594; trace &#8594; replay &#8594; export</strong></p><p>That is the core abstraction.</p><h2>The Bigger Point</h2><p>A lot of the AI world still thinks in prompts. </p><p>Better prompts. Longer prompts. More structured prompts. Prompt libraries. Prompt workflows.</p><p>Those things are useful, but they are not enough for agents that need to do real work. Real agents need environments.</p><p>A coding agent needs broken repos, tests, diffs, intermediate failures, and repair trajectories.</p><p>A theorem-proving agent needs proof states, tactic attempts, invalid moves, and verified completions.</p><p>A data-center operations agent needs simulated incidents, control actions, safety constraints, and recovery paths.</p><p>A chip-design agent needs toolchains, timing reports, constraints, compiler feedback, and objective pass/fail signals.</p><p>The domain changes, but the pattern stays the same.</p><p>Build the world. Expose the state. Let the agent act. Grade the outcome. Record the trace. Replay the trajectory. Export the signal.</p><h2>Why This Matters</h2><p>The frontier labs already have a lot of text.</p><p>What they increasingly need is high-quality process data from environments where outcomes are verifiable.</p><p>Not just answers, but attempts.<br>Not just scores, but trajectories.<br>Not just final patches, but the path from broken to working.</p><p>If we can turn high-value work into executable environments, then every attempt becomes data. 
Every failure becomes signal. Every recovery becomes part of the training distribution.</p><p>If you&#8217;ve been building or investing in this direction, I&#8217;d love to chat with you.</p>]]></content:encoded></item><item><title><![CDATA[Four Open Source Tools I Built to Dissect AI Infra]]></title><description><![CDATA[Diving into the machinery of foundation models and AI infra]]></description><link>https://www.infrastartups.com/p/four-open-source-tools-i-built-to</link><guid isPermaLink="false">https://www.infrastartups.com/p/four-open-source-tools-i-built-to</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Thu, 23 Apr 2026 19:05:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VDLx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A lot of AI infra is still discussed at the level of abstractions.</p><p>People talk about agents, reasoning, open models, serving, environments, orchestration, and post-training as if these are clean categories. In practice, they are messy systems problems. </p><p>Memory breaks. Runtime assumptions leak. Models behave differently once they are actually served. Fine-tuning is easy to talk about and much harder to operationalize on commodity hardware. 
Agent environments sound simple until you have to make them persistent, inspectable, and usable by real workflows.</p><p>Over the last stretch, I built four open source tools to get closer to the mechanics: <strong><a href="https://github.com/prateekvjoshi/pforge">pforge</a>, <a href="https://github.com/prateekvjoshi/phabitat">phabitat</a>, <a href="https://github.com/prateekvjoshi/pscope">pscope</a>, </strong>and<strong> <a href="https://github.com/prateekvjoshi/psplice">psplice</a></strong>.</p><p>This was not meant to be a tool-building exercise. It was research through construction. My goal was simple: </p><p>AI infra is moving too rapidly to reason about from the outside. So to get an actual pulse, I need to start touching the surfaces where things actually break.</p><p>Together, these tools form a kind of personal lab for open models and agents.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VDLx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VDLx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!VDLx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 848w, https://substackcdn.com/image/fetch/$s_!VDLx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 1272w, 
https://substackcdn.com/image/fetch/$s_!VDLx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VDLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de462551-1701-4c28-ac47-e0588a094337_1672x941.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1010928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/195273125?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VDLx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 424w, https://substackcdn.com/image/fetch/$s_!VDLx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 848w, 
https://substackcdn.com/image/fetch/$s_!VDLx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 1272w, https://substackcdn.com/image/fetch/$s_!VDLx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde462551-1701-4c28-ac47-e0588a094337_1672x941.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>1. 
<a href="https://github.com/prateekvjoshi/pforge">pforge</a>: shaping and serving open models on your own GPU</h2><p>pforge began with a basic question: what do we actually learn when we work with open models directly instead of only consuming them through an API?</p><p>Most people interact with models as black boxes. You send a prompt and get back text. But that hides the most important questions. How does latency change with model size and serving setup? What happens when you compare base and tuned variants side by side? How sensitive is behavior to reasoning budget, decoding settings, and fine-tuning? What can you inspect while the answer is forming?</p><p>pforge is my attempt to make those mechanics more visible.</p><p>It is a CLI for shaping and serving open models on a user&#8217;s own GPU machine. The emphasis is not just &#8220;run a model locally&#8221;. The emphasis is to compare, inspect, and experiment. You can chat with a model, compare outputs across variants, adjust reasoning budgets, and examine how behavior changes under different conditions.</p><p>What pforge taught me is that open models are not just cheaper substitutes for frontier APIs. They are research objects. Once you control the serving surface, you stop asking only &#8220;is this model good?&#8221; and start asking &#8220;under what constraints does this model stay useful?&#8221;</p><p>That is a much more infra-native question.</p><h2>2. <a href="https://github.com/prateekvjoshi/phabitat">phabitat</a>: giving every agent its own computer</h2><p>If pforge is about the model, phabitat is about the runtime around the model.</p><p>The core idea behind phabitat is: every agent should have its own persistent workspace-scoped computer. It shouldn&#8217;t just be a stateless API call or a disposable demo session. It should be a real environment with storage, logs, artifacts, and task continuity.</p><p>This matters because a lot of agent discourse still assumes that the model is the product. 
I think that view is incomplete. The useful unit is often the combination of model, runtime, permissions, workspace, memory, and inspection layer.</p><p>phabitat is a CLI for spinning up these isolated environments. A user can create a habitat, assign it a plain-English task, watch it work, inspect its outputs, and return later. The agent&#8217;s workspace persists. Its artifacts persist. Its event history persists.</p><p>Building this pushed me toward a stronger view: <strong>agent infra is really environment infra</strong>.</p><p>The difficulty is not in merely calling a model repeatedly. The hard part is giving the system durable state, bounded permissions, legible artifacts, and enough structure that a human can trust what happened.</p><p>Once you see that clearly, the market around &#8220;agents&#8221; starts to look less like a pure model story and more like a systems story.</p><h2>3. <a href="https://github.com/prateekvjoshi/pscope">pscope</a>: understanding what a machine can realistically run</h2><p>One underrated problem in open model adoption is basic fit.</p><p>People want to run open models locally, but they often don&#8217;t know what their machine can actually support. They guess. They overestimate. Or they spend hours installing things only to hit resource ceilings later.</p><p>pscope is a small tool, but it sits on an important question: what model will run best on this machine?</p><p>It scans a system and helps map hardware reality to model feasibility. That sounds operational, but it is also research-relevant. Hardware constraints shape what developers can build, test, and learn. They determine whether open models feel accessible or frustrating. They shape which parts of the ecosystem become broadly usable.</p><p>Working on pscope reinforced a simple belief. 
Infra adoption is often constrained less by raw model quality and more by setup friction plus hardware ambiguity.</p><p>In other words, discoverability of fit matters. A lot.</p><h2>4. <a href="https://github.com/prateekvjoshi/psplice">psplice</a>: model surgery, steering, and live intervention</h2><p>psplice is probably the most research-heavy of the four.</p><p>The goal is to make model intervention more practical. Load a model once, hold it in memory through a daemon, and then let the user perform operations like chatting, steering, and modifying behavior without reloading everything each time.</p><p>This tool sits closer to the layer of &#8220;how can I alter model behavior directly?&#8221; rather than &#8220;how can I wrap a product around it?&#8221;</p><p>That includes things like activation steering and head-level interventions. Even implementing the ergonomics of this forces you to confront real system details: attention implementations, VRAM persistence, daemon architecture, tensor assumptions, and the gap between a neat conceptual technique and a usable tool.</p><p>psplice taught me that model control is still early. Many ideas sound elegant in papers and rough in practice. But this is exactly why building matters. It reveals which interventions are robust, which are brittle, and which might eventually matter for real workflows.</p><h2>What these tools are really for</h2><p>On the surface, these are four separate open source projects.</p><p>Underneath, they are all attempts to answer the same research question: <strong>where is the real control surface in AI infra?</strong></p><p>Is it the model weights? The serving stack? The environment around the agent? The hardware fit layer? The intervention interface?</p><p>My current view is that the answer is not a single layer. The leverage comes from understanding the handoffs between layers.</p><p>That is why I built these. 
I want to build enough of the stack myself that my research can be grounded in contact with the machinery.</p><p>That&#8217;s the standard I want for my <em>Infra Startups</em> research column. Research should not just summarize what others built. It should leave evidence that you have wrestled with the systems yourself.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Programmable Reasoning: My Experiments With Qwen]]></title><description><![CDATA[Tinkering with the open source Qwen model and peeking inside]]></description><link>https://www.infrastartups.com/p/programmable-reasoning-my-experiments</link><guid isPermaLink="false">https://www.infrastartups.com/p/programmable-reasoning-my-experiments</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Fri, 27 Mar 2026 16:15:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tmkh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The gap between a model&#8217;s theoretical capabilities and what you can actually deploy on constrained hardware is big. And this is where the real engineering happens.</p><div class="pullquote"><p><strong>We&#8217;re familiar with reasoning, but what does it take to make it programmable? 
And how do we make it accessible to anyone?</strong> </p></div><p>To find out, I recently tinkered with the Qwen model family on a single RTX 4090 (24GB VRAM). My goal was to do everything from scratch and build two specific primitives:</p><ol><li><p><strong>Inspecting the reasoning chain:</strong> Can we expose and parse the model&#8217;s internal &#8220;chain of thought&#8221;?</p></li><li><p><strong>Rewiring the personality:</strong> Can we use rapid fine-tuning (flash-tuning) to fundamentally alter the model&#8217;s stylistic gravity on the fly?</p></li></ol><p>I wanted to make it easily accessible via command line. Here is a breakdown of the architecture, the model quirks, and the engineering realities of building something like this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tmkh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tmkh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!tmkh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 848w, https://substackcdn.com/image/fetch/$s_!tmkh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tmkh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tmkh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png" width="1200" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1068781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/192262637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tmkh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!tmkh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 848w, 
https://substackcdn.com/image/fetch/$s_!tmkh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 1272w, https://substackcdn.com/image/fetch/$s_!tmkh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8bf7a6d-8ba8-4220-91b0-1ff0623e215a_1200x670.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>Selecting the model</h3><p>Qwen 3.5 models are highly capable, but they are multimodal under the hood. 
We need vLLM to serve these models. And currently, vLLM&#8217;s LoRA implementation is broken for multimodal models. That makes dynamic tuning impossible for them.</p><p><strong>The Solution:</strong> I landed on Qwen3-1.7B. It&#8217;s a pure language model featuring a fascinating hybrid architecture (alternating dense and linear attention blocks), it&#8217;s fully LoRA-compatible, and crucially, it supports native &#8220;thinking mode&#8221;. And it&#8217;s small enough that you can focus on tinkering rather than worrying about system/memory issues.</p><h3>Objective 1: Exposing the Chain of Thought</h3><p>The first goal was to peek inside the model&#8217;s reasoning process. To do this, you have to pick weights that support thinking mode, which Qwen3-1.7B does.</p><p>To extract the internal logic, I configured vLLM with the <code>--reasoning-parser qwen3</code> flag. This cleanly intercepts the chain of thought wrapped in the <code>&lt;think&gt;...&lt;/think&gt;</code> tokens and exposes it as a distinct <code>reasoning</code> field in the streaming API delta. Instead of a black box, you get a real-time window into the model&#8217;s cognitive process before it outputs the final answer.</p><h3>Objective 2: Rewiring Personality via Flash-Tuning</h3><p>With the reasoning engine exposed, the next goal was to see if I could rapidly bend the model&#8217;s persona to my will. I set up an end-to-end QLoRA pipeline to run a Quentin Tarantino-style alignment experiment (and yes, I&#8217;m a huge fan of Quentin Tarantino).</p><p><strong>The Setup:</strong> 5 training examples. 50 steps. Rank 8 adapter.</p><p><strong>The Result:</strong> The loss plummeted from 4.6 to 0.12. And the model perfectly memorized the prompt formats, delivering highly stylized, rhythmic, and visceral responses for the training concepts.</p><p>However, when hit with a zero-shot, unseen prompt (&#8220;Describe London&#8221;), the LoRA broke down. The base pre-training dominated, and it reverted to a generic encyclopedia response. 
Five examples simply aren&#8217;t enough to generalize a stylistic syntax across the entire latent space.</p><p><strong>The Fix:</strong> I injected a strong system prompt at inference time alongside the loaded adapter. The response to &#8220;Describe London&#8221; instantly locked in and shifted into a gritty, sensory scene: <em>&#8220;London is a city that doesn&#8217;t lie. You walk down a street and someone walks up to you with a...&#8221;</em></p><p><strong>The Lesson:</strong> A LoRA adapter successfully shifts the model&#8217;s default behavior, but the system prompt acts as the anchor, locking in the style for edge cases the adapter has never explicitly seen. You need both to reliably rewire personality.</p><h3>Output</h3><p>I ran it from the terminal. For the chain of thought experiment, I asked about Paris but didn&#8217;t get a good answer. And then I did what any good Tarantino fan would do. I summoned my inner Jules Winnfield (Pulp Fiction):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cjxP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cjxP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 424w, https://substackcdn.com/image/fetch/$s_!cjxP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 848w, 
https://substackcdn.com/image/fetch/$s_!cjxP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!cjxP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cjxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png" width="1456" height="687" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:687,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1344833,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/192262637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cjxP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 424w, 
https://substackcdn.com/image/fetch/$s_!cjxP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 848w, https://substackcdn.com/image/fetch/$s_!cjxP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!cjxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78e6455d-ce61-47c3-9c6a-66cc6f0d4c6e_2557x1206.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The slightly dimmed text at the top shows how the model &#8220;thinks&#8221; before answering the question. And then you can look at the bottom for the actual answer it outputted. Very interesting to see this live!</p><p>And then I wanted to rewire the model&#8217;s personality on the fly. This is what came out:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XF1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XF1J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 424w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 848w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!XF1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png" width="1456" height="781" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:740792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/192262637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XF1J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 424w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 848w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!XF1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64c25fdb-cbe3-44c2-b5c8-c68ff006c1cd_1897x1018.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s fun to see this live in action.</p><h3>The Infra Reality</h3><p>Running both an inference server and a training pipeline on a single 24GB GPU requires strict, defensive orchestration.</p><p>A vLLM instance serving in <code>bf16</code> eats ~18GB, and QLoRA needs another 12-14GB. That adds up to 30-32GB against 24GB of VRAM, so concurrent execution is not possible. 
To manage this, I built a FastAPI orchestrator that acts as a traffic controller:</p><ul><li><p><strong>VRAM Juggling:</strong> When a <code>POST /train</code> request hits, the orchestrator gracefully kills the vLLM subprocess, freeing the VRAM.</p></li><li><p><strong>OOM Blast Shields:</strong> The training script (<code>trainer.py</code>) is never imported. It runs strictly as a subprocess. If a memory spike triggers the Linux OOM killer, it only takes down the trainer, leaving the API server alive to report the failure and restart inference.</p></li><li><p><strong>Hot-Swapping:</strong> Once training completes, the orchestrator attempts a dynamic <code>POST /v1/load_lora_adapter</code> to vLLM. If that fails, it falls back to a hard restart with the new modules loaded.</p></li></ul><h3>Building for the Edge</h3><p>Working with open-source models right now means navigating rapid library deprecations (like TRL silently renaming parameters between versions) and structural cloud limits (I had to route all virtual environments and pip caches to a persistent volume to avoid disk quota crashes).</p><p>But when the pipeline finally hums, it&#8217;s amazing to watch. A model &#8220;thinks&#8221; through a problem and then answers you in the exact voice you injected into its weights a minute earlier. 
Having programmable reasoning running purely on your own stack is incredibly powerful.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #9: VERIFIABLE REASONING]]></title><description><![CDATA[Infra for formal verification, proof carrying systems, mathematical guarantees, and more.]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-9-verifiable-reasoning</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-9-verifiable-reasoning</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Sat, 10 Jan 2026 16:37:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xge0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve spent the last decade scaling probabilistic systems that optimize for plausibility. That&#8217;s fine for content, but what about correctness?</p><p>The moment an LLM transitions from <em>suggesting</em> code to <em>executing</em> mission-critical actions (applying infra code, editing identity management policies, authorizing a payment on a payment rail), stochasticity becomes a liability. 
&#8220;Usually correct&#8221; is not a shippable spec for autonomous action.</p><p>The next wave of infra will be defined by <strong>Verifiable Reasoning</strong>: the ability to translate intent into a <strong>formal specification</strong> and receive one of two things:</p><ul><li><p>a <strong>mechanically checkable certificate</strong> (proof / witness) that a property holds, or</p></li><li><p>a <strong>counterexample</strong> that falsifies the property.</p></li></ul><p>Isn&#8217;t this just &#8220;better evals&#8221;? Not exactly. It&#8217;s the reintroduction of <strong>invariants</strong> and <strong>proof artifacts</strong> as first-class objects in the software lifecycle.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xge0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xge0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!Xge0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 848w, https://substackcdn.com/image/fetch/$s_!Xge0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Xge0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xge0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png" width="1200" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1950472,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/184106097?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xge0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 424w, https://substackcdn.com/image/fetch/$s_!Xge0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 848w, 
https://substackcdn.com/image/fetch/$s_!Xge0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 1272w, https://substackcdn.com/image/fetch/$s_!Xge0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60da91cc-b453-4bb1-92c4-00a10097465e_1200x670.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>1. 
Technical substrate: solvers as commoditized compute, specs as the scarce resource</h2><p>The primitives are industrial-grade and battle-tested. They come from two lineages:</p><ul><li><p><strong>EDA and hardware verification</strong> (Synopsys / Cadence): where verification is not optional because failure is catastrophic and expensive.</p></li><li><p><strong>Safety-critical software</strong> (AdaCore / SPARK): where specifications and proofs are part of how systems ship.</p></li></ul><p>The engines are mature:</p><ul><li><p><strong>SAT/SMT solvers</strong> (e.g. Microsoft Z3, cvc5): constraint engines for decidable fragments of logic.</p></li><li><p><strong>Model checkers</strong> (e.g. SPIN): exhaustive state exploration against temporal properties.</p></li><li><p><strong>Proof assistants</strong> (Lean 4, Coq, Isabelle/HOL): interactive theorem proving with machine-checked proof terms.</p></li><li><p><strong>Symbolic execution / static verification</strong> (e.g. KLEE): produces concrete counterexamples and coverage guarantees on program paths.</p></li></ul><p>None of this is new. What&#8217;s new is the integration pattern.</p><p>Historically, formal verification was blocked by the <strong>translation tax</strong>. Humans cannot write TLA+, SystemVerilog assertions, or Lean propositions at the speed of modern engineering. And even when they can, proofs are brittle. They break under refactors, they require experts to maintain, and they don&#8217;t fit into CI.</p><p>The unlock is that LLMs are becoming the <em>interface layer</em> for formal methods. They are becoming the compiler.</p><p>They can do <strong>auto-formalization</strong>: converting natural-language intent into specs, invariants, and proof sketches. They can also accelerate the loop that matters most in practice:</p><p>spec &#8594; attempt proof &#8594; fail &#8594; counterexample &#8594; refine spec/code &#8594; retry</p><p>That is exactly how formal methods scale. 
Not by &#8220;proving everything&#8221; but by iterating quickly toward the properties you actually care about.</p><p>This is the shift from &#8220;formal methods as a PhD thesis&#8221; to &#8220;formal methods as a CI/CD artifact&#8221;.</p><h2>2. Market architecture: three layers collapsing into a single correctness control plane</h2><p>I view the stack as three investable layers. Each is real, but the compounding value happens when they interlock.</p><h3>a) The legibility layer (governance + observability)</h3><p>This is the audit trail: drift detection, attribution, documentation, explainability. Incumbents here include <strong>Fiddler</strong>, <strong>Arthur</strong>, <strong>IBM Watson OpenScale</strong>, and open-source toolkits.</p><p>This layer is necessary but insufficient. It tells you what happened. Mission-critical systems need guarantees about what <em>cannot</em> happen. Legibility becomes powerful only when it feeds the guarantee layer with structured constraints (and later consumes the proof artifacts).</p><h3>b) The guarantee layer (verification)</h3><p>This is where mathematics touches model behavior. You see two camps:</p><ul><li><p>classical formal methods teams adapting into ML and autonomous systems (e.g. <strong>Galois</strong>, <strong>AbsInt</strong>-style approaches),</p></li><li><p>verification-first ML research stacks (e.g. 
<strong>VerifAI</strong>, solver-aided neural verification like <strong>Marabou</strong>, abstract-interpretation style bounds like DeepPoly).</p></li></ul><p>The alpha is in <strong>neuro-symbolic bridges</strong>: systems that constrain a neural network&#8217;s behavior using bounded, checkable logic.</p><p>The canonical shape looks like:</p><ul><li><p><strong>preconditions</strong> on inputs (ranges, schemas, safety envelopes),</p></li><li><p><strong>postconditions</strong> on outputs/actions (policy compliance, forbidden states),</p></li><li><p>and the <strong>proof obligation</strong> that ties them together: for every input that satisfies the preconditions, the model&#8217;s output satisfies the postconditions.</p></li></ul><p>In practice you don&#8217;t get full universality. You get bounded domains, approximations, and certificates with explicit assumptions. That&#8217;s fine. The point is that it leads to <em><strong>mechanical accountability</strong>.</em></p><h3>c) The workflow layer (enterprise assurance / control plane)</h3><p>This is where verification becomes a product, integrated into how software is shipped. <strong>Credo AI</strong> is an example of packaging governance and assurance into enterprise workflow. The wedge is whether the system can become a gate:</p><ul><li><p>does it run inside <strong>GitHub Actions</strong> and block a merge?</p></li><li><p>does it gate deployment in managed ML stacks like <strong>Amazon Bedrock</strong>?</p></li><li><p>does it generate artifacts a risk committee or auditor can consume?</p></li></ul><p>The winning products operationalize proofs the way DevOps operationalized deployments.</p><h2>3. Competitive Landscape: The &#8220;Proof&#8221; Stack</h2><h3>a) AI Mathematicians + Automated Reasoning (The &#8220;Solver&#8221; Layer)</h3><ul><li><p><strong>Symbolica</strong> &#8212; Building &#8220;structured intelligence&#8221; using category theory (categorical deep learning) rather than just transformers. 
Their flagship <strong>Agentica</strong> framework focuses on creating agents with provable correctness guarantees, effectively replacing &#8220;approximate&#8221; neural reasoning with algebraic structure.</p></li><li><p><strong>Normal Computing</strong> &#8212; Building thermodynamic computers and software to solve probabilistic reasoning problems. Their stack focuses on &#8220;energy-based models&#8221; that can reason about uncertainty and correctness more natively than standard GPUs, targeting high-stakes auditable workflows.</p></li><li><p><strong>Axiom Math</strong> &#8212; Building an AI mathematician / superintelligent reasoner as a wedge into formal reasoning + proof.</p></li><li><p><strong>Harmonic</strong> &#8212; Building mathematical reasoning systems (Aristotle) to solve Mathematics Olympiad-level problems, serving as a proxy for &#8220;reasoning reliability&#8221;.</p></li><li><p><strong>Imandra</strong> &#8212; &#8220;Reasoning-as-a-Service&#8221;. One of the few offering a cloud-accessible automated logic engine used by finance (Goldman Sachs) and defense to verify algorithms before they trade or shoot.</p></li></ul><h3>b) Formal Methods &#8220;Builders&#8221; (Verification Tooling)</h3><ul><li><p><strong>Galois</strong> &#8212; The OG deep-tech services firm. Productizing cryptographic verification (SAW) and high-assurance tooling. They essentially operate as the R&amp;D lab for the US government&#8217;s hardest verification problems.</p></li><li><p><strong>Atlas Computing</strong> &#8212; A newer entrant explicitly focused on &#8220;AI-assisted formal verification&#8221; for critical infrastructure. Their thesis is using LLMs to write the specs that traditional formal methods tools (like Z3) verify.</p></li><li><p><strong>Runtime Verification</strong> &#8212; Commercializes the <strong>K Framework</strong>. 
They are unique because they define semantics for languages (C, Java, EVM) to prove code adheres to spec.</p></li><li><p><strong>TrustInSoft</strong> &#8212; &#8220;Mathematically guaranteed&#8221; bug-free code for C/C++. They use formal methods to prove the <em>absence</em> of undefined behaviors, selling to automotive/IoT.</p></li><li><p><strong>BedRock Systems</strong> &#8212; Building a formally verified trusted computing base (Hypervisor/OS) to ensure critical systems cannot be bypassed.</p></li></ul><h3>c) Formal Verification (The &#8220;High Stakes&#8221; Sandbox)</h3><ul><li><p><strong>Certora</strong> &#8212; &#8220;Proving code works with mathematical certainty&#8221;. They provide the Certora Prover, which allows smart contract developers to write invariants (CVL) and auto-check them.</p></li><li><p><strong>Veridise</strong> &#8212; Spun out of UT Austin research. They use automated analysis (fuzzing + static analysis + formal verification) specifically for zero-knowledge circuits and smart contracts.</p></li><li><p><strong>Invariant Labs</strong> &#8212; Now partnering with Snyk. Focused on <strong>agentic security</strong> via formal guarantees. They use a &#8220;security analyzer&#8221; that imposes hard constraints on agent actions, preventing state violations regardless of prompt injections.</p></li></ul><h3>d) ProofOps and Agent Assurance (The &#8220;Control Plane&#8221;)</h3><ul><li><p><strong>Lakera</strong> &#8212; They act as a firewall for LLMs, preventing prompt injections and jailbreaks. Their database (Gandalf) is the industry standard for &#8220;how to break an LLM&#8221;, giving them a defensive moat.</p></li><li><p><strong>Protect AI</strong> &#8212; Building &#8220;MLSecOps&#8221;. 
They acquire and aggregate tools (like <strong>Laiyer AI</strong>) to scan models for vulnerabilities, verify supply chain integrity (signing models), and firewall LLM inputs/outputs.</p></li><li><p><strong>Gomboc.ai</strong> &#8212; &#8220;Deterministic Infra&#8221;. They use deterministic AI to remediate cloud infrastructure violations. Instead of just alerting, they generate the precise code fix that is mathematically guaranteed to solve the policy violation.</p></li><li><p><strong>Credo AI</strong> &#8212; The governance layer. Less about mathematical proofs and more about &#8220;audit proofs&#8221;, i.e. generating the artifacts regulators need to sign off on a model.</p></li><li><p><strong>HiddenLayer</strong> &#8212; Security for the model itself. They detect if someone is trying to steal the model weights or tamper with the inference process at runtime.</p></li><li><p><strong>Robust Intelligence</strong> (Acquired by Cisco) &#8212; The &#8220;Antivirus for AI&#8221;. Automated red-teaming to find failure modes before deployment.</p></li></ul><h3>e) The &#8220;Bridge&#8221; Layer (Static Analysis + Neuro-symbolic)</h3><ul><li><p><strong>Semgrep</strong> &#8212; &#8220;Policy as Code&#8221;. While not purely &#8220;formal&#8221;, their engine is the standard for deterministic code checks. Their new AI features allow developers to write natural language rules that compile into deterministic, greppable constraints.</p></li><li><p><strong>Qodo (formerly Codium)</strong> &#8212; Focuses on &#8220;code integrity&#8221; for AI generation. They use a mix of static analysis and test generation to verify that the code an LLM writes actually runs and passes assertions.</p></li><li><p><strong>Cleanlab</strong> &#8212; &#8220;Data Curation as Code&#8221;. They use confident learning (a statistical theory) to identify which labels in a dataset are likely incorrect, purifying the input layer for AI.</p></li></ul><h2>3. 
Distribution signal: incumbents are buying &#8220;permission to deploy&#8221;</h2><p>The most important market signal is M&amp;A by platforms that already own deployment surfaces.</p><ul><li><p><strong>Snowflake &#8594; TruEra</strong>: data platforms need model reasoning and validation to keep high-value workloads on-platform.</p></li><li><p><strong>Cisco &#8594; Robust Intelligence</strong> and <strong>F5 &#8594; CalypsoAI</strong>: infra incumbents are acquiring assurance capability as a layer that ships everywhere their stack ships.</p></li><li><p>The velocity of <strong>HiddenLayer</strong> and the emergence of vendors like <strong>Lakera</strong> show this moving from research to procurement: a &#8220;trust/correctness&#8221; budget line forming inside enterprise AI rollouts.</p></li></ul><p>This is not &#8220;AI safety&#8221; per se. This is the same playbook as observability and security. Once the control becomes mandatory, it gets bundled.</p><h2>4. The investable opportunity: Infra for mathematical correctness</h2><p>My variant perception is that value will not accrue to the solver. Solvers commoditize. The defensibility is in infra that can operationalize this work of proving correctness. 
Something that can operationalize specifications and proof artifacts inside software delivery.</p><p>The core product primitives look like:</p><ul><li><p>specification authoring + versioning (spec diffs are as important as code diffs)</p></li><li><p>incremental verification (proofs that survive refactors via modularity / compositional reasoning)</p></li><li><p>counterexample triage (turn solver outputs into developer-actionable bug reports)</p></li><li><p>&#8220;proof coverage&#8221; metrics (what properties are guaranteed, under what assumptions)</p></li><li><p>deployment gates + regression proofs (the proof becomes part of the release artifact).</p></li></ul><p>If you can make proofs cheap enough to live in the git push loop, entire classes of autonomous software become viable.</p><p>Where this unlocks mission-critical software:</p><ul><li><p><strong>Infrastructure / agents:</strong> the primitive is state-machine verification. If agents are deployed through Kubernetes and toolchains, you need invariants over tool usage (&#8220;never delete database X unless condition Y is met&#8221;). TLA+ style thinking, but productized.</p></li><li><p><strong>Finance:</strong> the primitive is constraint satisfaction (monotonicity, fairness constraints, bounded risk policies). The killer product auto-generates compliance-ready proof artifacts tied to model and data lineage.</p></li></ul><h2>Conclusion</h2><p>Mathematics is becoming infrastructure. The opportunity here is building infra that lets enterprises ship autonomous systems with <strong>explicit guarantees</strong>, <strong>auditable assumptions</strong>, and <strong>mechanically checkable artifacts</strong>.</p><p>When correctness becomes a deployable artifact (when a proof is something you can diff, version, and regression-test), we can expand the frontier of what software can be trusted to do. 
That&#8217;s where the alpha is.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #8: POST-TRAINING INFRA]]></title><description><![CDATA[Companies that are building infra products for all the post-training needs]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-8-post-training</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-8-post-training</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Sun, 28 Dec 2025 11:42:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CHdN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. What exactly is &#8220;post-training infra&#8221;? </h2><p>Post-training infra is the tooling layer that helps teams <em>improve model/agent behavior after a foundation model exists</em>. And it&#8217;s done using a mix of: supervised fine-tuning (SFT), preference tuning / RLHF-style methods, prompt/tool changes, guardrails, eval suites, and continuous monitoring in production.</p><p>As LLMs move into business-critical workflows, the bottleneck is no longer &#8220;can we run a model?&#8221; but &#8220;can we keep it correct, safe, and cost-bounded as the world changes?&#8221;. 
This requires an iterative loop as opposed to a one-off training job.</p><p>A useful mental model: pretraining gives you general capabilities whereas post-training infra turns those capabilities into <em>reliable, auditable, domain-specific behavior</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CHdN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CHdN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 424w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 848w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 1272w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CHdN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png" width="1000" height="558" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:992698,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/182760380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CHdN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 424w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 848w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 1272w, https://substackcdn.com/image/fetch/$s_!CHdN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4a9b98-bbe9-4b3a-b21e-f4b05aff9195_1000x558.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>2. The post-training loop: the actual workflow enterprises are building</h2><p>Across most teams, the loop looks like:</p><ol><li><p><strong>Instrument</strong>: capture prompts, tool calls, retrieval context, outputs, latency/cost, and user feedback.</p></li><li><p><strong>Evaluate</strong>: run offline test suites + online canaries, measure task success (not just BLEU-like metrics).</p></li><li><p><strong>Diagnose</strong>: identify failure modes e.g. 
hallucinations, refusal errors, prompt injection, tool misuse, drift.</p></li><li><p><strong>Patch</strong> quickly: prompt changes, tool routing, guardrails/validators, retrieval fixes.</p></li><li><p><strong>Escalate</strong> selectively: when patches aren&#8217;t enough, do targeted fine-tuning / preference tuning on high-value tasks.</p></li><li><p><strong>Deploy + monitor</strong>: watch regressions, cost blowups, safety issues, repeat.</p></li></ol><p>Why infra matters: each stage creates data and decision points that need to be versioned, reproducible, and connected. There is a growing trend toward &#8220;unified stacks&#8221; rather than disconnected tools, e.g. TensorZero pitching gateway + observability + eval + optimization in one.</p><h2>3. Demand drivers over the next 24 months</h2><p>Three forces look durable through 2026:</p><p><strong>i) Enterprise adoption is rising faster than &#8220;enterprise hardening&#8221;</strong><br>A large percentage of orgs are regularly using LLMs in at least one business function. But reliability poses a big challenge. This gap (&#8220;we deployed something&#8221; vs &#8220;it&#8217;s reliable and governed&#8221;) is exactly where post-training infra sells.</p><p><strong>ii) The world is moving to task-specific models, which increases tuning + evaluation needs</strong><br>Gartner predicts that by 2027, more than 50% of AI models enterprises use will be industry/function-specific. Domain specificity doesn&#8217;t come for free: it has to be built through data pipelines, eval harnesses, and fine-tuning.</p><p><strong>iii) Governance is becoming non-optional</strong><br>There&#8217;s an increasing demand for monitoring, eval evidence, audit trails, and policy enforcement. These are classic infra value props.</p><h2>4. 
Subsector map: where startups cluster and who&#8217;s pulling ahead</h2><p>There are six clusters with heavy convergence between them:</p><h3>i) Agent/app orchestration frameworks (the &#8220;runtime&#8221; layer)</h3><ul><li><p><strong>LangChain</strong> is the canonical open-source entry point. They recently raised $125M at a $1.25B valuation.</p></li><li><p><strong>LlamaIndex</strong> positions around &#8220;knowledge agents&#8221; and enterprise data interfaces. </p></li></ul><p>These frameworks become post-training companies when they add: tracing, eval harnesses, prompt/versioning, and feedback loops.</p><h3>ii) Evals + testing (the &#8220;unit tests&#8221; for AI behavior)</h3><ul><li><p><strong>Braintrust</strong> explicitly focuses on evals and &#8220;AI software engineering&#8221;. It announced a $36M Series A in Oct 2024.</p></li></ul><h3>iii) Observability + monitoring (production truth, regressions, drift)</h3><ul><li><p><strong>Arize</strong> is a leading independent vendor. 
They announced a $70M Series C on 2025-02-20 focused on evaluation and observability for LLMs/agents.</p></li><li><p><strong>Datadog</strong> launching LLM Observability is important because it signals bundling pressure from &#8220;classic observability&#8221; into AI stacks.</p></li></ul><h3>iv) Guardrails + policy enforcement (safety + reliability controls)</h3><ul><li><p><strong>Guardrails AI</strong> raised a $7.5M seed and built a hub/wrapper approach.</p></li></ul><h3>v) Fine-tuning + preference optimization tooling (make models <em>yours</em>)</h3><ul><li><p><strong>Lamini</strong> raised $25M for an enterprise AI platform.</p></li></ul><h3>vi) Closed-loop optimization stacks (unifying gateway + eval + optimization)</h3><ul><li><p><strong>TensorZero</strong> announced a $7.3M seed to build an open-source stack unifying gateway/observability/optimization/evals.</p></li></ul><p>This is a strong signal of where the market is going: fewer dashboards, more continuous improvement pipelines.</p><h2>5. Early-stage venture opportunity: where the market is still &#8220;unsolved&#8221;</h2><p>The best pockets are areas where the stack is still missing a reliable primitive. Here are four areas where a startup could win:</p><p><strong>Outcome-based evaluation (beyond LLM-as-judge)</strong><br>Enterprises care about &#8220;did the agent complete the workflow correctly?&#8221; as opposed to &#8220;did it look fluent?&#8221;. But the big challenge is instrumenting ground truth from business systems (CRM, ticketing, payments) and then turning it into automated evals. Startups that own this interface can become the system of record for AI quality.</p><p><strong>Continuous learning for agents (safe retraining loops)</strong><br>A lot of teams <em>want</em> self-improving agents, but they don&#8217;t trust the loop. The winning wedge is: gated data collection + audit trails + rollbacks + sandboxed deployments. 
This could be the next evolution after basic orchestration.</p><p><strong>Governance + compliance automation as product</strong><br>Rules are now in place that push companies to document risk controls, testing, and monitoring. The infra opportunity is software that <em>continuously produces compliance evidence</em> (test coverage, incident trails, red-team results) as a byproduct of normal operation.</p><p><strong>Data flywheels for post-training (high-quality feedback at scale)</strong><br>Post-training quality is gated by data. Partnerships like Anthropic&#8217;s use of Surge AI&#8217;s RLHF platform illustrate the demand for scalable human feedback + QC systems. Startups that productize &#8220;feedback ops&#8221; (tools, QC, workforce routing, privacy) can be critical picks-and-shovels.</p><h2>6. Business models and why pricing power is tricky</h2><p>There is real revenue traction in agent-building platforms. A simple derived check on <em>how big a single customer can be</em> under seat pricing:</p><ul><li><p>If a company has 1,600 employees and pays $40&#8211;$50 per user per month, that&#8217;s:</p><ul><li><p>Monthly: 1,600 &#215; $40 = $64,000 on the low end and 1,600 &#215; $50 = $80,000 on the high end</p></li><li><p>Annual: $64,000 &#215; 12 = $768,000 on the low end and $80,000 &#215; 12 = $960,000 on the high end</p></li></ul></li><li><p>This is attractive ARPA <em>if</em> adoption is broad and renewals hold.</p></li></ul><p>But pricing power faces two structural headwinds:</p><ul><li><p><strong>Bundling by incumbents</strong> (Datadog, cloud providers, model providers) squeezes standalone point tools.</p></li><li><p><strong>Open source defaults</strong> (LangChain, Guardrails) force vendors to monetize via enterprise controls: SSO, RBAC, audit logs, data residency, eval governance, and support.</p></li></ul><p>The likely &#8220;winning&#8221; monetization pattern is: open-core adoption &#8594; paid control plane + collaboration + compliance &#8594; usage-based expansion on monitoring/optimization.</p><h2>7. 
How does it affect other infra subsectors?</h2><p>Post-training infra doesn&#8217;t live in a vacuum. It reshapes the broader AI infra stack. Here are the most important dependencies and second-order effects:</p><p><strong>a) Serving + inference infra becomes more valuable when evaluation loops are tight.</strong><br>If teams are constantly iterating (new prompts, new adapters, new routing), they need fast, cheap experimentation environments. That pulls demand toward inference/serving startups that support canaries, model routing, and cost observability. Correlation: more eval + experimentation &#8594; more switching between models &#8594; more value in routing + caching + cost controls.</p><p><strong>b) Data infra and security vendors get pulled into the loop</strong><br>Post-training requires logging prompts and outputs, which often contain sensitive data. That creates direct dependencies on:</p><ul><li><p>data loss prevention / redaction</p></li><li><p>secure storage + retention policies</p></li><li><p>access controls and audit trails</p></li><li><p>synthetic data or privacy-preserving feedback.</p></li></ul><p>Regulatory pressure amplifies this because governance becomes an operational requirement.</p><p><strong>c) Observability incumbents will &#8220;tax&#8221; the ecosystem</strong><br>Datadog&#8217;s LLM Observability is a bundling signal: classic observability vendors can package AI monitoring into existing procurement, reducing budget for startups unless they are clearly better on model-specific workflows. Risk for startups is that feature parity arrives quickly (basic tracing, prompt logs, cost dashboards). Differentiation must move up the stack with actionable evals, automated fixes, and governance automation.</p><p><strong>d) Model providers shape the ceiling</strong><br>As frontier models improve, some failure modes disappear. But enterprises still need proofs, cost controls, and domain specificity. 
Base model progress shifts spend from &#8220;make it work at all&#8221; to &#8220;make it work reliably and cheaply&#8221;.</p><p><strong>e) Consolidation is real (platform gravity)</strong><br>CoreWeave&#8217;s acquisition of Weights &amp; Biases shows infra providers moving upstack to own the developer workflow end-to-end (train/tune/evaluate/deploy). This creates a dependency risk: early-stage tools that don&#8217;t become a platform primitive may be acquired, copied, or squeezed.</p><p><strong>A practical estimate for &#8220;how much of AI infra gets affected&#8221;:</strong> </p><p>If you define AI infra startups as serving one of <strong>six layers (compute, model serving, data, orchestration/devtools, observability/safety, and security/governance)</strong>, then post-training infra directly overlaps orchestration + observability/safety + governance, and partially overlaps serving + data. </p><p>That&#8217;s 3&#8211;5 of 6 layers touched. The exact &#8220;portion&#8221; depends on how you bucket companies, but the direction is clear: post-training loops become a central integration point that many infra startups either plug into or compete with.</p><h2>8. What to watch through 2026</h2><h3>Catalysts (positive for the sector)</h3><ul><li><p><strong>More agent deployments &#8594; more need for continuous improvement.</strong> A large chunk of agentic AI projects may be canceled by the end of 2027 because of costs, unclear value, or weak risk controls. Ironically, that is a tailwind for post-training infra, which reduces those risks. </p></li><li><p><strong>Regulatory timelines hit operational reality.</strong> Procurement starts demanding audit evidence. 
</p></li><li><p><strong>Platform consolidation continues.</strong> More &#8220;W&amp;B-style&#8221; moves by clouds, devtool incumbents, and observability platforms.</p></li></ul><h3>Failure modes (what breaks the bull case)</h3><ul><li><p><strong>Bundling crushes standalone tools</strong> before they reach scale (especially basic eval/monitoring features).</p></li><li><p><strong>&#8220;Good enough&#8221; models reduce willingness to fine-tune</strong>, pushing spend to prompting + retrieval. Fine-tuning platforms must show clear ROI.</p></li><li><p><strong>Data/legal incidents</strong> (leaks, IP disputes, privacy failures) slow deployments and raise compliance friction. This can either stall budgets or redirect them to governance-heavy vendors.</p></li></ul><h2>9. What&#8217;s the opportunity?</h2><p>The investable center of gravity is shifting from &#8220;training pipelines&#8221; to <strong>behavior pipelines</strong>. These are systems that continuously measure, correct, and harden model/agent behavior in production. The arc is: start with developer adoption, then climb into enterprise workflows by owning the feedback loop.</p><p>For early-stage venture, the best opportunities are the primitives that remain hard even as models improve:</p><ul><li><p>outcome-grounded evaluation</p></li><li><p>safe continuous learning loops</p></li><li><p>governance evidence automation</p></li><li><p>feedback/data ops at scale</p></li><li><p>cost + reliability control planes across many models</p></li></ul><p>The good thing is that the question &#8220;does post-training matter?&#8221; has been answered. It matters a lot! But <strong>who captures the value</strong>? Independent startups or bundled incumbents? 
</p><p>The next 24 months will likely reward teams that (1) become deeply embedded in production workflows and (2) generate proprietary signals (eval outcomes, failure taxonomies, policy decisions) that compound into a defensible moat.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #7: AI MATHEMATICIAN]]></title><description><![CDATA[Companies that are building AI products for mathematical reasoning]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-7-ai-mathematician</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-7-ai-mathematician</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Tue, 09 Dec 2025 16:09:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1Iod!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Snapshot: What Are &#8220;AI Mathematician Products&#8221; in 2025?</h2><p>AI mathematician products sit at the intersection of advanced LLMs, formal methods, and specialized tooling for mathematics, verification, and research. In the last 18&#8211;24 months, multiple systems have hit <strong>International Mathematical Olympiad (IMO) gold-medal performance</strong>, Putnam-level scores, and near-saturation of benchmarks like MATH and MiniF2F. 
The core thesis:</p><blockquote><p><strong>If a model can reliably do mathematics and formal proofs, it&#8217;s a proxy for general trustworthy reasoning.</strong></p></blockquote><p>This ecosystem now spans:</p><ul><li><p><strong>Big Tech &#8220;System-2&#8221; engines</strong> (OpenAI, Google DeepMind, Anthropic, xAI, Microsoft, Meta, Alibaba&#8217;s Qwen team).</p></li><li><p><strong>Formal-verification startups</strong> (Harmonic, Axiom Math, Logical Intelligence, Symbolica AI).</p></li><li><p><strong>Open-source reasoning models</strong> (DeepSeek-R1, DeepSeek-Math-V2, DeepSeek-Math, DeepSeek V3, Qwen-2.5-Math in 1.5B/7B/72B sizes, QwQ-32B, NuminaMath, NuminaMath dataset, Llama 3.1).</p></li><li><p><strong>Infra, tools, and proof languages</strong> (Lean, mathlib, Coq, Isabelle, HOL Light, MiniF2F, IMO-ProofBench / ProofBench, HOList, GPT-f).</p></li><li><p><strong>Consumer and education solvers</strong> (Photomath, WolframAlpha, Mathway).</p></li><li><p><strong>New agentic stacks</strong> (Math Inc with its Gauss agent and the strongpnt benchmark repo).</p></li></ul><p>The current phase is <strong>R&amp;D + early pilots</strong>, not scaled revenue. 
But the tech is clearly real and improving fast.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Iod!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Iod!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Iod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3231813,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/181115638?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Iod!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1Iod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b50e15-c312-417e-aa75-6111c76cbedd_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>2. 
Two Axes: Consumer Solvers vs Formal Verifiers, LLM vs Tools</h2><h3>2.1 Product categories</h3><ol><li><p><strong>Consumer / Homework Solvers</strong></p><ul><li><p>General reasoning models like <strong>o1</strong> / <strong>o1-pro</strong> (OpenAI), <strong>Gemini 1.5 / Gemini 2.0 / Gemini Deep Think</strong> (Google DeepMind), <strong>Claude 3.7</strong> and <strong>Claude 3.7 Sonnet</strong> (Anthropic), <strong>Grok 3</strong> (xAI), and <strong>Llama 3.1</strong> (Meta Llama line) now ship &#8220;Think&#8221; or extended-reasoning modes.</p></li><li><p>Camera-first apps: <strong>Photomath</strong> (Google) and <strong>Mathway</strong> (Chegg) handle K-12 to early college, often backed by engines like <strong>Gemini</strong> or other LLMs.</p></li><li><p><strong>WolframAlpha</strong> remains the OG symbolic engine, increasingly paired with LLM chat front-ends (e.g. <strong>ChatGPT</strong> + Wolfram).</p></li></ul></li><li><p><strong>Research / Formal Verifier Systems</strong></p><ul><li><p><strong>Harmonic</strong> (with its <strong>Aristotle</strong> model and Mathematical Superintelligence (MSI) vision) outputs <strong>Lean 4</strong> proofs.</p></li><li><p><strong>Axiom Math</strong> targets AI that generates <strong>new conjectures</strong> and proves them in <strong>Lean</strong> or <strong>Coq</strong>.</p></li><li><p><strong>Logical Intelligence</strong> builds <strong>language-free Energy-Based Models (EBMs)</strong> and agents like <strong>Aleph</strong> and <strong>Noa</strong> to convert code into formal statements/proofs.</p></li><li><p><strong>DeepSeek-Math-V2</strong> and <strong>DeepSeek-Math</strong> (plus <strong>DeepSeek-R1</strong> and <strong>DeepSeek V3</strong>) occupy the open-source reasoning tier with very strong math performance.</p></li><li><p><strong>Symbolica AI</strong> takes a neuro-symbolic, non-Transformer approach to structured logic.</p></li></ul></li><li><p><strong>Open-source reasoning 
specialists</strong></p><ul><li><p><strong>DeepSeek-R1</strong> and <strong>DeepSeek-Math-V2</strong>: RL-trained reasoning models with self-verification and very long test-time &#8220;internal monologue&#8221;.</p></li><li><p><strong>Qwen-2.5-Math</strong> family (1.5B, 7B, 72B) and <strong>QwQ-32B</strong>: Alibaba&#8217;s math-specialized suite. <strong>QwQ-32B</strong> is a 32B &#8220;System-2&#8221; reasoner. <strong>Qwen-2.5-Math-72B</strong> is a powerhouse solver. The 1.5B and 7B variants are laptop-friendly.</p></li><li><p><strong>NuminaMath</strong> (models and <strong>NuminaMath dataset</strong>) emphasizes <em>data quality</em>: competition-grade problems with Chain-of-Thought plus <strong>Tool-Integrated Reasoning (TIR)</strong> via Python + sympy.</p></li><li><p>Benchmarks and training infra rely heavily on <strong>MiniF2F</strong>, <strong>MATH</strong>, <strong>AIME</strong>, <strong>AI Math Olympiad (AIMO)</strong>, <strong>IMO-ProofBench / ProofBench</strong>, <strong>Putnam</strong> datasets, and formal repos like <strong>mathlib</strong> in <strong>Lean</strong>.</p></li></ul></li><li><p><strong>Math Inc and agentic stacks</strong></p><ul><li><p><strong>Math Inc</strong> (math.inc) runs the <strong>Gauss</strong> agent: a multi-tool reasoning loop over <strong>Lean</strong>, Python, and external math tools.</p></li><li><p>Their open-source <strong>strongpnt</strong> repo holds a Lean formalization of the strong Prime Number Theorem and serves as a public proof point for agent-driven formalization.</p></li></ul></li></ol><h3>2.2 Architectural patterns</h3><ul><li><p><strong>System-2 / inference-time compute</strong><br>Models like <strong>DeepSeek-R1</strong>, <strong>QwQ-32B</strong>, <strong>o1 / o1-pro</strong>, <strong>Grok 3</strong>, <strong>Gemini Deep Think</strong>, <strong>Claude 3.7 Sonnet (extended thinking)</strong> and <strong>Llama 3.1 405B</strong> run long &#8220;hidden chain-of-thought&#8221; trajectories, sampling multiple reasoning paths before emitting an 
answer.</p></li><li><p><strong>Tool-Integrated Reasoning (TIR)</strong><br>Systems such as <strong>NuminaMath</strong>, <strong>Qwen-2.5-Math-72B</strong>, <strong>Qwen-2.5-Math-7B</strong>, <strong>Qwen-2.5-Math-1.5B</strong>, <strong>Claude 3.7</strong>, <strong>Gemini</strong>, <strong>GPT-4</strong>, <strong>GPT-4o</strong>, <strong>GPT-5</strong>, <strong>Grok 3</strong>, <strong>ChatGPT</strong> and hybrid stacks (ChatGPT+<strong>WolframAlpha</strong>) explicitly generate Python (sympy, numpy) or call external tools to compute integrals, solve equations, or run simulations.</p></li><li><p><strong>Formal proof generation</strong><br><strong>Harmonic&#8217;s Aristotle</strong>, <strong>AlphaProof</strong> (Google), <strong>Logical Intelligence&#8217;s Aleph / Noa</strong>, <strong>Math Inc&#8217;s Gauss</strong>, and future offerings from <strong>Axiom Math</strong> produce <strong>Lean 4</strong>, <strong>Lean</strong>, or <strong>Coq</strong> scripts that are checked by proof assistants like <strong>Lean</strong>, <strong>Coq</strong>, <strong>Isabelle</strong>, and <strong>HOL Light</strong>, built on libraries such as <strong>mathlib</strong>.</p></li><li><p><strong>Alternative architectures</strong><br><strong>Logical Intelligence</strong> pushes <strong>energy-based models (EBMs)</strong> rather than token LLMs. <strong>Symbolica AI</strong> explores neuro-symbolic non-backprop architectures. Google&#8217;s <strong>HOList</strong> and DeepMind&#8217;s <strong>AlphaProof</strong> mix search with learned guidance.</p></li></ul><h2>3. 
Key Companies and Products</h2><h3>3.1 Big Tech engines</h3><ul><li><p><strong>Google DeepMind</strong></p><ul><li><p>Research models: the <strong>Gemini</strong> suite, including <strong>Gemini Deep Think</strong> for extended reasoning.</p></li><li><p>Formal backend: <strong>AlphaProof</strong> (neuro-symbolic Lean prover).</p></li><li><p>Owns <strong>Photomath</strong> and integrates math into <strong>Google Workspace</strong>, <strong>Bard</strong>, <strong>Google Cloud AI</strong> and potentially <strong>GCP</strong>.</p></li><li><p>Historically repurposed <strong>AlphaGo</strong>, <strong>AlphaZero</strong>, and <strong>AlphaFold</strong> techniques. <strong>AlphaProof</strong> follows that lineage.</p></li></ul></li><li><p><strong>OpenAI</strong></p><ul><li><p>Models: the <strong>GPT-5</strong> series, which is strong at reasoning.</p></li><li><p>Approach: massive inference-time compute (many parallel trajectories) plus tools and formal integrations.</p></li></ul></li><li><p><strong>Anthropic</strong></p><ul><li><p>Models: the Claude 4.5 series, featuring Opus 4.5 (most capable for complex tasks) and Sonnet 4.5 (strong reasoning, efficient for agents).</p></li></ul></li><li><p><strong>xAI</strong></p><ul><li><p><strong>Grok 3</strong> on X/Twitter: reasoning model with &#8220;Think&#8221; mode plus live access to X data.</p></li></ul></li><li><p><strong>Alibaba / Qwen team</strong></p><ul><li><p><strong>Qwen-2.5-Math</strong> (1.5B, 7B, 72B) and <strong>QwQ-32B</strong>: arguably the most versatile open math model family, spanning laptop-friendly to 72B-scale.</p></li></ul></li></ul><h3>3.2 Formal-verification startups</h3><ul><li><p><strong>Harmonic</strong></p><ul><li><p>Product: <strong>Aristotle</strong>, marketed as <strong>Mathematical Superintelligence (MSI)</strong>.</p></li><li><p>Architecture: trains entirely on synthetic Lean proofs. 
Outputs <strong>Lean 4</strong> code, checked mechanically, targeting zero hallucinations.</p></li><li><p>Benchmarks: gold-level IMO, ~90% on <strong>MiniF2F</strong>, strong scores on <strong>IMO-ProofBench / ProofBench</strong>.</p></li><li><p>API: free <strong>Aristotle Lean API</strong> to seed adoption. Roadmap towards safety-critical software (aerospace, automotive, trading, crypto).</p></li></ul></li><li><p><strong>Axiom Math</strong></p><ul><li><p>Mission: AI that not only solves problems but <strong>proposes new conjectures</strong> and proves them (Lean/Coq).</p></li><li><p>Backed by a strong team including Carina Hong and Ken Ono, with ex-Meta FAIR folks like Fran&#231;ois Charton.</p></li><li><p>Targets: cryptography, algorithms, physics, finance. Wants a self-improving AI mathematician at AGI scale.</p></li></ul></li><li><p><strong>Logical Intelligence</strong></p><ul><li><p>Products/agents: <strong>Aleph</strong> (formal proof), <strong>Noa</strong> (bug finding).</p></li><li><p>Architecture: <strong>language-free EBMs</strong>, reasoning in continuous state space rather than tokens.</p></li><li><p>Benchmarks: ~76% on a <strong>Putnam</strong> benchmark. Pilot work in crypto, national infrastructure, high-assurance systems.</p></li></ul></li><li><p><strong>Symbolica AI</strong></p><ul><li><p>Pitch: neuro-symbolic reasoning without standard backprop. 
Structured algebraic representations rather than pure token streams.</p></li><li><p>Still early/stealth, but positioned as a deep-tech alternative to Transformers.</p></li></ul></li></ul><h3>3.3 Open-source and &#8220;people&#8217;s champion&#8221; models</h3><ul><li><p><strong>DeepSeek</strong></p><ul><li><p>Models: <strong>DeepSeek-R1</strong> (RL &#8220;System-2&#8221;), <strong>DeepSeek-Math</strong>, <strong>DeepSeek-Math-V2</strong>, <strong>DeepSeek V3</strong>, and <strong>DeepSeekMath-V2 Heavy</strong>.</p></li><li><p>Training: ~500B tokens of math/code/science plus RL methods like <strong>Math-Shepherd</strong>.</p></li><li><p>Benchmarks: gold IMO, near-perfect <strong>Putnam</strong> (~118/120), top scores on <strong>AIME</strong>, <strong>MATH</strong>, and <strong>ProofBench</strong>.</p></li><li><p>Strategy: fully open weights on Hugging Face and GitHub; extremely large (up to 685B params), aiming to be the &#8220;open GPT-5 for math&#8221;.</p></li></ul></li><li><p><strong>NuminaMath</strong></p><ul><li><p>Assets: <strong>NuminaMath dataset</strong> (~1M competition-style problems with CoT and TIR annotations) and NuminaMath models (often on DeepSeek/Qwen backbones).</p></li><li><p>Strength: <strong>Tool-Integrated Reasoning</strong> via Python + sympy, explicitly solving symbolic math rather than hallucinating.</p></li></ul></li><li><p><strong>Qwen-2.5-Math and QwQ</strong></p><ul><li><p><strong>Qwen-2.5-Math-72B</strong> is a top classical solver. <strong>Qwen-2.5-Math-7B</strong> and <strong>Qwen-2.5-Math-1.5B</strong> bring high math quality to commodity hardware.</p></li><li><p><strong>QwQ-32B</strong> is a medium-sized but very strong reasoning engine for logic puzzles and proofs.</p></li></ul></li><li><p><strong>Math Inc</strong></p><ul><li><p>Agent: <strong>Gauss</strong>, orchestrating Lean + Python + external toolcalls in multi-step loops. 
Fits squarely in the agentic TIR camp.</p></li><li><p>Repo: <strong>strongpnt</strong> (GitHub), a Lean formalization of the strong Prime Number Theorem. Acts as a public reference point for AI mathematicians.</p></li></ul></li></ul><h3>3.4 Consumer and education tools</h3><ul><li><p><strong>Photomath</strong> (Google): camera-based solver, now backed by <strong>Gemini</strong> for better OCR and reasoning.</p></li><li><p><strong>Mathway</strong> (Chegg): algebra/calculus homework assistant.</p></li><li><p><strong>WolframAlpha</strong>: symbolic compute engine. The modern twist is tight integration with LLMs like <strong>ChatGPT</strong> and <strong>Claude</strong>, where Wolfram does the mathematics and the LLM handles chat/UX.</p></li></ul><p>These products are where most students and non-experts first see &#8220;AI mathematics&#8221; in the wild.</p><h2>4. Product Stack: From Models to Platforms</h2><h3>4.1 Foundation models and proof assistants</h3><p>The core stack looks like:</p><ul><li><p><strong>LLM / Reasoning engine</strong>: e.g. 
<strong>Gemini Deep Think</strong>, <strong>GPT-5</strong>, <strong>Claude 4.5</strong>, <strong>Grok 3</strong>, <strong>DeepSeek-Math-V2</strong>, <strong>Qwen-2.5-Math-72B</strong>, <strong>NuminaMath models</strong>, <strong>Llama 3.1</strong>, <strong>Axiom Math&#8217;s internal models</strong>, <strong>Logical Intelligence&#8217;s EBMs</strong>, <strong>Symbolica AI&#8217;s models</strong>.</p></li><li><p><strong>Proof assistant</strong>: <strong>Lean / Lean 4</strong>, <strong>Coq</strong>, <strong>Isabelle</strong>, <strong>HOL Light</strong>, plus libraries like <strong>mathlib</strong>.</p></li><li><p><strong>Benchmarks</strong>: <strong>MiniF2F</strong>, <strong>MATH</strong>, <strong>AIME</strong>, <strong>AI Math Olympiad (AIMO)</strong>, <strong>Putnam</strong>, <strong>IMO-ProofBench / ProofBench</strong>, and formalization targets like <strong>strongpnt</strong>.</p></li><li><p><strong>Training infrastructure</strong>: research environments like <strong>HOList</strong>, older <strong>GPT-f</strong>, RL frameworks, and synthetic-data pipelines like <strong>Math-Shepherd</strong>.</p></li></ul><h3>4.2 Platform and integration surface</h3><p>Most companies aim to evolve from &#8220;model API&#8221; to a <strong>platform</strong>:</p><ul><li><p><strong>Harmonic</strong> is turning Aristotle into an API + IDE plugin for <strong>Lean</strong>, with future integration into CI/CD and tooling (GitHub / GitLab, devops pipelines) to automatically verify critical properties.</p></li><li><p><strong>Logical Intelligence</strong> is on track to ship a general model by 2026, targeting vertical deployments in crypto, power grids, defense, and other high-assurance systems.</p></li><li><p><strong>Axiom Math</strong> envisions a research co-pilot that reads textbooks/PDFs, autoformalizes them into Lean or Coq, then explores &#8220;what if&#8221; conjectures.</p></li><li><p><strong>Math Inc</strong>&#8217;s Gauss plus <strong>strongpnt</strong> is an early example of an agent 
+ formalization loop targeted at a specific result (the strong Prime Number Theorem).</p></li><li><p><strong>DeepSeek-Math-V2</strong>, <strong>Qwen-2.5-Math</strong> and <strong>NuminaMath</strong> are being wrapped by the open-source community into local assistants, IDE extensions, and research tools.</p></li></ul><p>On the infrastructure side, cloud players like <strong>AWS</strong>, <strong>Azure</strong>, <strong>GCP</strong>, and possibly <strong>IBM</strong> could offer specialized &#8220;reasoning clouds&#8221;. Analogies can be drawn to <strong>Synopsys</strong> / <strong>Cadence</strong> (hardware verification) and <strong>Adobe</strong> for vertical, high-value software.</p><h2>5. Market, GTM, Monetization, and Unit Economics</h2><h3>5.1 Market framing</h3><ul><li><p>Near-term wedges:</p><ul><li><p>AI code tools (today&#8217;s <strong>GitHub Copilot</strong>, <strong>DeepCode</strong>, etc.) plus formal verification: ~$26B AI code tools TAM by 2030, with formal verification currently a $400M niche but attached to a $55B software testing/QA market.</p></li><li><p>Crypto and DeFi: billions lost in contract bugs &#8594; strong ROI for tools like <strong>Aleph</strong>, <strong>Noa</strong>, <strong>Aristotle</strong>, <strong>Gauss</strong>, <strong>DeepSeek-Math-V2</strong>, <strong>Qwen-2.5-Math-72B</strong>, <strong>NuminaMath</strong>.</p></li><li><p>Safety-critical software in aerospace, automotive, defense, power grids, and national infrastructure.</p></li></ul></li><li><p>Mid-term:</p><ul><li><p>AI mathematicians as R&amp;D amplifiers for quant firms (e.g. <strong>Renaissance</strong>, <strong>Two Sigma</strong>) and research labs (e.g. analogies to <strong>Isomorphic Labs</strong> and <strong>Insilico Medicine</strong> in drug discovery).</p></li></ul></li><li><p>Long-term:</p><ul><li><p>&#8220;AI reasoning cloud&#8221; as standard infra, akin to <strong>OpenCV</strong>, <strong>TensorFlow</strong>, or <strong>Stable Diffusion</strong> in their domains. 
Companies like <strong>Harmonic</strong>, <strong>Axiom Math</strong>, <strong>Logical Intelligence</strong>, <strong>DeepSeek</strong>, <strong>Math Inc</strong>, <strong>Symbolica AI</strong>, plus big labs (<strong>OpenAI</strong>, <strong>Google DeepMind</strong>, <strong>Anthropic</strong>, <strong>xAI</strong>, <strong>Meta</strong>, <strong>Alibaba/Qwen</strong>) compete for that role.</p></li></ul></li></ul><h3>5.2 GTM patterns</h3><ul><li><p><strong>Harmonic</strong>: free <strong>Aristotle API</strong> for community + top-down pilots in aerospace, automotive, finance, crypto, national security.</p></li><li><p><strong>Axiom Math</strong>: academia-heavy GTM (talks, conferences, publications) plus early design-partner engagements in trading, chip design, cryptography.</p></li><li><p><strong>Logical Intelligence</strong>: crypto audits and government/national-infrastructure pilots; narrative heavily about moving &#8220;beyond LLMs&#8221; with EBMs.</p></li><li><p><strong>DeepSeek</strong>: open-source adoption via Hugging Face/GitHub. Focus on mindshare rather than immediate revenue.</p></li><li><p><strong>Math Inc</strong>: dev-first reach with <strong>Gauss</strong> and <strong>strongpnt</strong> as open assets that others can build on.</p></li></ul><h3>5.3 Monetization and unit economics</h3><ul><li><p>Likely models:</p><ul><li><p><strong>Enterprise licenses</strong> (on-prem or VPC deployments of Aristotle, Aleph, Axiom models, Gauss-style agents).</p></li><li><p><strong>Cloud APIs</strong> with consumption-based pricing (per proof, per reasoning hour).</p></li><li><p><strong>Consulting / verification-as-a-service</strong> (e.g. 
Logical Intelligence auditing smart contracts with Aleph/Noa, Harmonic verifying autopilot code).</p></li><li><p><strong>Government contracts / grants</strong> in defense, aerospace, and infrastructure.</p></li></ul></li><li><p>Today, economics are compute-heavy and negative-margin: 685B-param models (<strong>DeepSeek-Math-V2 Heavy</strong>) and long test-time reasoning (<strong>GPT-5.1</strong>, <strong>Gemini 3</strong>) can cost hundreds or thousands of GPU-hours per hard problem.</p></li><li><p>Over time, distillation (e.g. from <strong>DeepSeek-Math-V2 Heavy</strong> to smaller derived models), better search (as in <strong>Math-Shepherd</strong>, <strong>AlphaProof</strong>), EBMs (<strong>Logical Intelligence</strong>), and caching should reduce inference cost and improve unit economics.</p></li></ul><h2>6. Moats, Risks, and Scenarios</h2><h3>6.1 Moats</h3><ul><li><p><strong>Technical / IP</strong>:</p><ul><li><p>Synthetic proof pipelines (Harmonic&#8217;s Aristotle), EBM architectures (Logical Intelligence), neuro-symbolic designs (Symbolica AI), RL methods like <strong>Math-Shepherd</strong> (DeepSeek), and agent stacks like <strong>Gauss</strong> are non-trivial to replicate.</p></li></ul></li><li><p><strong>Data and corpora</strong>:</p><ul><li><p>Large corpora, both internal and open (DeepSeek&#8217;s 500B-token corpus, NuminaMath dataset, Lean <strong>mathlib</strong>, formal proof repos like <strong>strongpnt</strong>).</p></li></ul></li><li><p><strong>Talent</strong>:</p><ul><li><p>Fields-Medalist-caliber mathematicians (Logical Intelligence), high-profile researchers (Axiom Math), and operators like Vlad Tenev, Tudor Achim, Carina Hong.</p></li></ul></li><li><p><strong>Community and ecosystem</strong>:</p><ul><li><p>Lean + mathlib, open-source usage of <strong>DeepSeek-Math-V2</strong>, or shared formalization repos like <strong>strongpnt</strong> can create community lock-in.</p></li></ul></li><li><p><strong>Integration</strong>:</p><ul><li><p>Embedding Aristotle, 
Aleph, Gauss, Qwen-2.5-Math, or DeepSeek-Math into CI pipelines, IDEs, and cloud stacks (GitHub, GitLab, Azure, GCP, AWS, etc.) raises switching costs.</p></li></ul></li><li><p><strong>Trust</strong>:</p><ul><li><p>Formal guarantees (Lean/Coq/Isabelle proofs with <strong>Aristotle</strong>, <strong>AlphaProof</strong>, <strong>Aleph</strong>, <strong>Gauss</strong>) provide a trust moat vs generic LLMs that cannot certify correctness.</p></li></ul></li></ul><h3>6.2 Risks</h3><ul><li><p><strong>Technical ceiling</strong>: scaling from contest mathematics to full research-level problems or million-line code verification may prove much harder than IMO/Putnam benchmarks suggest.</p></li><li><p><strong>Adoption friction</strong>: conservative regulators and engineers may delay trusting AI proofs. Mathematicians may resist or limit AI to &#8220;assistant&#8221; roles.</p></li><li><p><strong>Big-tech encroachment</strong>: if companies like <strong>OpenAI</strong>, <strong>Google DeepMind</strong>, <strong>Anthropic</strong>, <strong>xAI</strong>, <strong>Alibaba/Qwen</strong> bundle strong mathematics capabilities into their mainstream offerings, standalone startups must differentiate sharply.</p></li><li><p><strong>Open-source erosion</strong>: <strong>DeepSeek-Math-V2</strong>, <strong>Qwen-2.5-Math</strong>, and <strong>NuminaMath</strong> show how open source can cap proprietary pricing power.</p></li><li><p><strong>Compute constraints and funding cycles</strong>: limited access to GPUs, shifting macro, or valuation resets could stress capital-intensive players like Harmonic, Axiom Math, Logical Intelligence, DeepSeek, Math Inc, Symbolica AI.</p></li></ul><h3>6.3 Scenario sketch: 24-month horizon</h3><ul><li><p><strong>Bull case</strong>:</p><ul><li><p>Systems like Aristotle, DeepSeek-Math-V2, Axiom&#8217;s models, Aleph/Noa, or Gauss prove at least one high-profile new result or prevent a major real-world failure (e.g. 
crypto hack, aerospace bug).</p></li><li><p>Harmonic passes $20&#8211;30M ARR. Axiom Math and Logical Intelligence reach unicorn valuations. DeepSeek-Math-V2, Qwen-2.5-Math, NuminaMath, Gauss/strongpnt become standard infra for research workflows.</p></li></ul></li><li><p><strong>Base case</strong>:</p><ul><li><p>Benchmarks continue to improve (100% on IMO, stronger Putnam performance), tools embed in niche workflows (crypto audits, select aerospace projects, math research labs).</p></li><li><p>Harmonic, Axiom Math, Logical Intelligence, Math Inc, Symbolica AI each have a handful of pilots; revenues are in low-single-digit millions, valuations grow modestly.</p></li></ul></li><li><p><strong>Bear case</strong>:</p><ul><li><p>Progress plateaus at &#8220;Olympiad-level only&#8221;. Integration friction plus big-tech bundling compresses room for standalone AI mathematicians.</p></li><li><p>One or more startups pivot, merge, or exit cheaply. Open-source models like DeepSeek-Math-V2, Qwen-2.5-Math, NuminaMath, and agent stacks like Gauss dominate practical usage while formal verification remains niche.</p></li></ul></li></ul><h2>7. Takeaways</h2><ul><li><p>The field has clearly crossed a <strong>feasibility threshold</strong>: frontier AI labs are shipping AI models that are already competitive with IMO gold medallists and Putnam stars.</p></li><li><p>The <strong>open-source reasoning wave</strong> (DeepSeek-R1, DeepSeek-Math-V2, Qwen-2.5-Math, QwQ-32B, NuminaMath, Llama 3.1, strongpnt) ensures this capability won&#8217;t be limited to closed labs.</p></li><li><p><strong>Formal verification stacks</strong> (Harmonic, Axiom Math, Logical Intelligence, AlphaProof, Math Inc) are the main bet on <em>trustworthy</em> AI. 
The promise is zero hallucinations via Lean/Coq/Isabelle/HOL Light proofs.</p></li><li><p>The <strong>next 24 months</strong> are about converting these technical wins into <strong>repeatable workflows and revenue</strong>, especially in software verification, crypto, safety-critical systems, and advanced research.</p></li></ul><p>For an investor or builder, this is a classic <strong>high-risk, high-optionality</strong> subsector: expensive, technically gnarly, but with a real shot at becoming the reasoning layer beneath serious AI systems.</p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #6: AGENT RUNTIME]]></title><description><![CDATA[Companies that build environments that let AI agents actually do work]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-6-agent-runtime</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-6-agent-runtime</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Mon, 10 Nov 2025 21:42:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0cHd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. What this space is and why it suddenly matters</h2><p>Agent Runtime is an environment that lets AI agents actually do work. 
It&#8217;s the control room that plans steps, calls the right tools, remembers context, and keeps logs so humans can see what happened. An Agent Sandbox is the related safe box the agent acts inside: an isolated environment where it can run code, browse, or touch APIs without breaking things. Put together, they&#8217;re the missing layer between raw models and real enterprise workflows.</p><p>Three forces make this takeoff feel real:</p><ul><li><p><strong>Models leveled up.</strong> The newest LLMs plan multi-step tasks, call tools, and follow structured instructions.</p></li><li><p><strong>Enterprises want cognitive automation.</strong> After a decade of RPA and scripts, companies now want automation that can read, reason, and decide.</p></li><li><p><strong>Enablers arrived.</strong> Secure micro-VMs / containers, long-context memory, and open protocols (Anthropic&#8217;s MCP, Google&#8217;s Agent-to-Agent/A2A) give teams a safer and more interoperable way to wire agents to data and systems.</p></li></ul><p>Think of this as the shift from &#8220;power tools&#8221; (classic SaaS) to &#8220;coworkers&#8221; (agents) that execute tasks end-to-end with guardrails. 
That&#8217;s why the sector&#8217;s drawing capital and attention: it upgrades software from &#8220;assist&#8221; to &#8220;act&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0cHd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0cHd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0cHd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2727576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/178540308?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0cHd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!0cHd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f4fc3fa-d6e8-42e7-9898-5485980110fc_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>2. Market trajectory</h2><p>The agents market is growing from roughly <strong>$5B in 2024</strong> to about <strong>$47B by 2030</strong> if current forecasts hold. Funding has kept pace: about <strong>$8B+</strong> went into agent startups in the year through late-2024, and <strong>seed funding</strong> alone in 1H-2025 was on the order of <strong>$700M</strong>. Analysts expect <strong>a third of enterprise software</strong> to include agentic capabilities by 2028 (up from almost zero in 2024).</p><p>What that means in practical terms:</p><ul><li><p>This isn&#8217;t a &#8220;single killer app&#8221;. It&#8217;s a <strong>horizontal capability</strong> (like cloud or mobile) that seeps into IT, ops, support, finance, and engineering.</p></li><li><p>Adoption is <strong>gradual and risk-managed</strong>. 
Most teams start with human-in-the-loop, then graduate to autonomy for bounded tasks once accuracy and auditability are proven.</p></li><li><p>It&#8217;s not winner-take-all yet. Standards like <strong>MCP</strong> and <strong>A2A</strong> reduce lock-in and keep the door open for neutral platforms and open-source tools, not just cloud megasuites.</p></li></ul><h2>3. The product stack: six bricks you actually need</h2><p>When you peel back the marketing, mature agent platforms all converge on the same 6 components:</p><ol><li><p><strong>Secure execution</strong><br>Agents need isolated places to run code, browse, and call services. That usually means Linux containers or micro-VMs with strict network and filesystem policies. Startups to know:</p><ol><li><p><strong>E2B</strong>: open-source, Firecracker-style micro-VM isolation; fast cold-starts.</p></li><li><p><strong>Novita AI (Agent Sandbox)</strong>: per-second billed serverless workers for bursty agent compute.</p></li><li><p><strong>Browserbase</strong>: managed, clean browsers for reliable web automation.</p></li></ol></li><li><p><strong>Orchestration (the agent loop)</strong><br>Plan &#8594; act (use a tool) &#8594; observe &#8594; re-plan. You need a consistent way to define this loop, branch on errors, and compose sub-agents. <strong>LangChain / LangGraph</strong>, <strong>CrewAI</strong>, <strong>CUA (Computer-Use Agent)</strong>, <strong>Dust</strong> are common choices depending on how code-centric or visual you want to be.</p></li><li><p><strong>Connectors and permissions</strong><br>Agents get work done by touching APIs, SaaS apps, and internal services with clear scopes and approval rules. <strong>Composio</strong> has become the &#8220;agent connector fabric&#8221; many teams reach for. Cloud platforms ship their own registries too.</p></li></ol><ol start="4"><li><p><strong>Memory and state</strong><br>Short-term scratchpads and long-term project memory, usually backed by vector DBs or filesystems. 
Plus auto-summaries so context doesn&#8217;t blow up costs.</p></li><li><p><strong>Observability, evaluation, and guardrails</strong><br>You need step-level traces, cost and latency metrics, red-team tests, and &#8220;circuit breakers&#8221;. <strong>AgentOps</strong> (production runs, replay, costs) and <strong>Langfuse</strong> (open-source traces/evals) are becoming table stakes.</p></li></ol><ol start="6"><li><p><strong>Human interface</strong><br>Most business agents surface in chat UIs, IT portals, or IDEs. Good products make it easy to toggle autonomy, insert approvals, and explain what just happened.</p></li></ol><h2>4. Who&#8217;s competing and how to think about them</h2><p><strong>Cloud incumbents</strong> are shipping full stacks:</p><ul><li><p><strong>OpenAI/Microsoft</strong> (AgentKit + Copilots) lean into deep model integration and a vast distribution surface.</p></li><li><p><strong>AWS</strong> (Bedrock AgentCore) emphasizes isolation, identity, observability, and marketplace distribution.</p></li><li><p><strong>Google</strong> pushes open <strong>A2A</strong> to make multi-vendor agent workflows normal inside Workspace and GCP.</p></li><li><p><strong>Anthropic</strong> focuses on model safety and <strong>MCP</strong> so tools and models interoperate cleanly.</p></li></ul><p><strong>Independent startups</strong> fill critical gaps and keep the space dynamic:</p><ul><li><p><strong>Secure execution:</strong> E2B, Novita AI (Agent Sandbox), Browserbase</p></li><li><p><strong>Agent OS / orchestration:</strong> LangChain/LangGraph, CrewAI, CUA, Fixie, Dust</p></li><li><p><strong>Connectors:</strong> Composio</p></li><li><p><strong>Observability/evals:</strong> AgentOps, Langfuse</p></li><li><p><strong>Dev-env as runtime:</strong> Daytona lets agents spin up real developer workspaces with full toolchains.</p></li><li><p><strong>Marketplaces and hubs:</strong> Gumloop and MuleRun explore app stores for agents.</p></li><li><p><strong>Capability 
showcases:</strong> Prime Intellect helped popularize computer-use agents that click and type like a human.</p></li></ul><p>Expect consolidation: some of these become features inside cloud platforms. Others win as <strong>neutral layers</strong> precisely because big customers want multi-model, multi-cloud flexibility.</p><h2>5. What buyers actually use this for</h2><p><strong>Developers and startups</strong> use sandboxes/runtimes to ship agentic apps faster: prototyping with OSS, then hardening with better isolation, connectors, and monitoring.</p><p><strong>Large enterprises</strong> pick a few high-ROI use cases and expand from there. Typical first wins:</p><ul><li><p><strong>IT automation:</strong> ordering equipment, provisioning access, resetting accounts, closing tickets.</p></li><li><p><strong>Customer support:</strong> reading tickets, checking entitlements, proposing actions, and (once trusted) executing refunds or returns.</p></li><li><p><strong>Operations and finance:</strong> reconciling invoices, chasing documents, scheduling freight.</p></li><li><p><strong>Engineering productivity:</strong> write &#8594; run &#8594; test &#8594; fix loops inside an isolated code sandbox (pair this with <strong>Daytona</strong> or <strong>E2B</strong>).</p></li></ul><p>The adoption pattern is consistent: start with <strong>copilot</strong> (human approves), track success and cost, then move select workflows to <strong>autopilot</strong> with timeouts and escalation rules. The runtime matters because it encodes that discipline, not just &#8220;let the LLM run&#8221;.</p><h2>6. The economic logic: why this can be cheaper (and when it isn&#8217;t)</h2><p>A single agent task usually triggers many model calls plus tool invocations. Early Auto-GPT experiments were expensive and brittle. 
Three things flipped that story:</p><ul><li><p><strong>Smarter planning and caching</strong> cut token waste.</p></li><li><p><strong>Isolated code execution</strong> moves heavy mathematics or parsing to cheap CPU time instead of expensive tokens.</p></li><li><p><strong>Model mix-and-match</strong> runs 3.5-class models for easy steps and saves 4/5-class models for hard ones.</p></li></ul><p>When you price it the way buyers do, the question is: <strong>cost per completed task</strong> vs a human baseline. If an agent can process a support email for $0.10&#8211;$0.30 all-in where a human minute costs a few dollars, the cost model works immediately. </p><p>Where it <strong>doesn&#8217;t</strong> work yet: ambiguous tasks with high back-and-forth, long tool chains, or high error penalties. That&#8217;s why most teams still insert approvals, limits, and budgets. This is as much an economic guardrail as a safety one.</p><p>The trend line is favorable: better models, cheaper inference, and tighter runtimes steadily push <strong>cost-per-task down</strong> and <strong>success rates up</strong>. That&#8217;s the flywheel to watch.</p><h2>7. Impact on the broader infra startup landscape</h2><p>Short answer: this wave will touch <strong>most</strong> of infra. Over the next 24 months, expect <strong>60&#8211;70% of infra startups</strong> to be directly or indirectly affected. Either as beneficiaries, suppliers, or competitors. Here&#8217;s how it maps:</p><ul><li><p><strong>Direct beneficiaries (20&#8211;25%)</strong><br>Startups whose core product <em>is</em> agent runtime capability: secure sandboxes (E2B, Novita), orchestration (LangChain, CrewAI, CUA, Dust), observability/evals (AgentOps, Langfuse), connectors (Composio), and marketplaces (Gumloop, MuleRun). 
Their traction rises with each successful enterprise deployment.</p></li><li><p><strong>Adjacent pull-through (20&#8211;25%)</strong><br><strong>Data infra</strong> (vector DBs, feature stores), <strong>identity and policy</strong> (fine-grained scopes for agents), <strong>secrets/key management</strong>, <strong>audit logging</strong>, and <strong>cost monitors</strong>. Agents create persistent demand for <strong>retrieval</strong>, <strong>permissioning</strong>, and <strong>explainability</strong>. Great for neutral infra vendors. If you&#8217;re building vector search, lineage, or IAM, agents are a net tailwind.</p></li><li><p><strong>Devtool and platform reshaping (15&#8211;20%)</strong><br>Dev environments and CI/CD adapt so agents can participate as &#8220;non-human contributors&#8221;. <strong>Daytona</strong> is a clear bridge. Agents spin up real workspaces with compilers, DBs, and test harnesses. Expect git hosts, test frameworks, and build systems to expose <strong>agent-friendly</strong> APIs and policies. Winners will make &#8220;agent + human&#8221; pair programming and reviews safe and auditable.</p></li><li><p><strong>Integration/iPaaS and RPA convergence (10&#8211;15%)</strong><br>Workflows move from rigid scripts to agent-driven flows. RPA and iPaaS vendors will add LLM brains. New neutral runtimes will nibble at classic automation budgets. If you&#8217;re building modern integration layers, aligning with MCP/A2A and shipping strong observability can put you on the right side of this shift.</p></li><li><p><strong>Compute and GPU infra (5&#8211;10%)</strong><br>Agent adoption raises <strong>steady inference workloads</strong> and <strong>bursty sandbox compute</strong>. That benefits GPU scheduling, serverless containers, model gateways, and <strong>browser automation at scale</strong> (hello Browserbase). 
Efficiency startups (quantization, caching, routing) also see a lift.</p></li><li><p><strong>Potentially crowded or pressured (10&#8211;15%)</strong><br>Products that are &#8220;just an LLM wrapper&#8221; around a single workflow will feel pressure as <strong>AgentKit/AgentCore</strong> and marketplaces ship that workflow as a prefab. The defense is depth: data access, accuracy guarantees, distribution, or owning a compliance-sensitive niche.</p></li></ul><p><strong>Correlation and dependencies.</strong><br>Think of a dependency chain: <strong>models &#8594; runtimes &#8594; connectors &#8594; policy/identity &#8594; observability &#8594; data</strong>. Improvements at any layer (cheaper inference, better planning, richer connectors) ripple to the others. Infra startups that &#8220;lock&#8221; into one model vendor will carry <strong>vendor risk</strong>. Those that speak <strong>MCP/A2A</strong> and multiple models reduce it. Conversely, security incidents or prompt-injection failures at the app layer will generate demand for <strong>policy, isolation, and monitoring</strong> deeper in the stack. Another pull-through for infra.</p><h2>8. Key risks and the practical mitigations that matter</h2><ul><li><p><strong>Reliability and safety.</strong> Agents still make bad calls. Mature teams use retrieval grounding, step limits, timeouts, and human approvals on high-impact actions. Observability and evals move from &#8220;nice-to-have&#8221; to mandatory.</p></li><li><p><strong>Security and data privacy.</strong> Agents handle credentials and sensitive data. Sandboxes must strictly confine code and network. IAM scopes, secrets rotation, tamper-proof audit logs, and signed tool calls should be part of the design, not a later add-on.</p></li><li><p><strong>Prompt injection and supply-chain risk.</strong> Agents read untrusted content and may be tricked. 
Defensive patterns (content sanitization, tool call whitelists, trusted data paths) and &#8220;kill-switch&#8221; policies reduce blast radius.</p></li><li><p><strong>Regulation and governance.</strong> Expect requests for audit trails, decision explanations, and model/agent change control. Vendors with strong <strong>explainability and logging</strong> will win security and compliance reviews.</p></li><li><p><strong>Cloud squeeze.</strong> Big providers will absorb generic runtime features. Neutral players must compete on openness (multi-model/multi-cloud), UX, cost, or depth in a vertical. Aligning with <strong>standards</strong> and meeting enterprises in their VPCs are proven ways to keep a seat at the table.</p></li><li><p><strong>Unit economics drift.</strong> A long, meandering agent can burn tokens and money. Teams that enforce budgets, cache aggressively, route models by difficulty, and offload compute to sandboxes will keep cost-per-task in the green.</p></li></ul><h2>9. What to watch next </h2><p><strong>Capability jumps.</strong> If the next model wave materially improves tool-use and long-horizon planning, watch success rates rise and human approvals shrink. That opens more workflows to autonomy.</p><p><strong>Reference deployments.</strong> One marquee case study in banking, logistics, or healthcare (measured in millions saved or hours cut) will unlock follow-on budgets elsewhere. </p><p><strong>Standard adoption.</strong> Broad support for <strong>MCP</strong> and <strong>A2A</strong> would normalize multi-vendor agent meshes inside large companies. That&#8217;s a tailwind for neutral infra (connectors, policy, observability) and a constraint on lock-in strategies.</p><p><strong>Cost curves.</strong> Cheaper inference and faster cold-starts lower the &#8220;minimum viable agent&#8221;. Keep an eye on platform announcements about long-running sessions, serverless micro-VMs, and per-second billing. 
These directly change which tasks pencil out.</p><p><strong>Distribution channels.</strong> Agent marketplaces (e.g. Gumloop, MuleRun) and cloud app stores will matter more as companies move beyond pilots. Templated agents with real connectors and auditable logs will travel fastest through those channels.</p><p><strong>Consolidation.</strong> Expect acqui-hires and product fold-ins. If you&#8217;re building infra, assume your best exit path might be a cloud or enterprise platform that wants your isolation, connectors, or observability baked in.</p><h2>10. Investment stance and practical takeaways</h2><ul><li><p><strong>It&#8217;s an infra story as much as a model story.</strong> Sandboxes, runtime control planes, connectors, identity, and observability will decide whether agents stay demos or become dependable &#8220;digital workers&#8221;. That creates room for <strong>neutral infra winners</strong>, not just model vendors.</p></li><li><p><strong>Barbell strategy.</strong> One bet aligned with a major platform (for distribution and trust) and one bet that&#8217;s <strong>open, multi-model, and multi-cloud</strong> captures both worlds. In parallel, there will be <strong>category enablers</strong>: isolation (E2B, Novita), connectors (Composio), eval/ops (AgentOps, Langfuse), dev-env runtimes (Daytona).</p></li><li><p><strong>Bias to measurable workflows.</strong> IT ops, support ops, finance back-office, and code-adjacent tasks produce clean before/after metrics (success rate, handle time, cost-per-task). Those are the proving grounds that compound into wider adoption.</p></li><li><p><strong>Design for approvals, not just autonomy.</strong> The businesses that grow fastest will support a spectrum&#8212;suggest &#8594; approve &#8594; auto-execute&#8212;with rock-solid audit trails and budget controls.</p></li><li><p><strong>Plan for standards.</strong> Treat <strong>MCP/A2A</strong> as inevitabilities and build in that direction. 
You&#8217;ll be easier to buy and harder to rip out.</p></li></ul><h2>11. Bottom line for infra founders and investors</h2><p>Agent sandboxes and runtimes are graduating from experiments to infrastructure. The core idea of software that can read, decide, and act with constraints is now implementable with acceptable risk in many day-to-day workflows. The stack is clarifying, the standards are emerging, and the economics are trending in the right direction.</p><p>The effect on the broader infra universe will be <strong>wide</strong>. Roughly <strong>two-thirds</strong> of infra startups will feel it. Some directly as agent-native platforms, some as upstream suppliers (data, identity, observability), and some via pressure as clouds bundle the basics. The safest places to build and back are the <strong>boring necessities</strong> of a production agent world: isolation that never breaks, connectors that always work, policies that auditors love, and telemetry that catches issues before the CFO does.</p><p>The next 24 months are a prove-out. Watch the success rates, costs per task, and the first wave of big reference customers. If those turn the corner, this sector looks less like a trend and more like a new layer of enterprise software. 
Quiet, reliable, and everywhere.</p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #5: SEARCH API PRODUCTS]]></title><description><![CDATA[Companies that build and sell search API products to developers]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-5-search-api-products</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-5-search-api-products</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Fri, 10 Oct 2025 15:33:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2STL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Snapshot</h2><p>The core bet is that developers will increasingly <strong>buy</strong> real-time web search as a managed API instead of <strong>building</strong> it. Why? Because modern apps and AI agents need fresh, machine-readable information and citations on demand. </p><p>Prices, performance, and legal access to content are shifting quickly. A handful of independent search indexes (Brave, You.com) and SERP/API specialists (SerpApi, DataForSEO, Serper.dev) are emerging as the infrastructure layer that feeds LLMs, agents, and enterprise apps with web context. </p><p>There&#8217;s another layer in between. 
Exa is effectively the developer-first infrastructure layer that sits between the independent-index crowd (Brave, You.com) and the SERP scrapers (SerpApi, DataForSEO). It builds its own continuously refreshed web index (not just scraping Google or Bing). </p><p>Microsoft&#8217;s price hikes on the official Bing Web Search API in 2023 (e.g. S1 tier from $7 to <strong>$25 per 1,000 queries</strong> starting May 2023) drove many builders to look for alternatives. And Google still prices its own JSON API at <strong>$5 per 1,000</strong> queries with constrained use, creating an opening for developer-first vendors.</p><p><strong>Near-term catalysts include</strong>: (a) independent indexes scaling distribution via cloud marketplaces (e.g. Brave Search API on <strong>AWS Marketplace</strong>) and launching AI-grounding features (b) OpenAI&#8217;s <strong>SearchGPT</strong> prototype validating developers&#8217; demand for search-plus-answers (c) high-profile AI answer engines (Perplexity, You.com) opening or expanding APIs, pushing volume into search infra instead of consumer portals. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2STL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2STL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2STL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2STL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2STL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2STL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2511969,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/175772632?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2STL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2STL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2STL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2STL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7274b07-ea45-469b-bcf0-0dd77fba149e_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><strong>The biggest risks are</strong>: (a) content access and litigation (publishers vs. AI search providers), which may raise COGS and restrict data (b) platform dependence (Bing or browser defaults) (c) consolidation if hyperscalers bundle &#8220;good-enough&#8221; search into agent platforms. These are active issues today (e.g. lawsuits and cease-and-desists targeting Perplexity, publisher revenue-share programs emerging in response).</p><p>This is an investable infra subsector with asymmetric upside over the next 24 months, especially in <strong>independent index APIs</strong> and <strong>legal-first SERP APIs</strong> with enterprise posture. 
The winners will pair developer ergonomics (clean JSON, fast SLAs), distribution (marketplaces, model/tooling integrations), and credible content access (publisher deals, compliance).</p><h2>2. Thesis framing: what must be true</h2><p><strong>Investment question in one line:</strong> <em>Can independent search APIs become the default way developers and AI agents ground responses in fresh, verifiable web data (at attractive unit economics) before big platforms make the category a bundled feature?</em></p><p><strong>Thesis pillars (what must be true):</strong></p><ol><li><p><strong>Real-time grounding becomes mandatory</strong> for LLMs and agents. OpenAI&#8217;s <strong>SearchGPT</strong> signals that &#8220;search + answers + sources&#8221; is moving into core AI UX. Third-party APIs that are fast, citable, and cheap will see rising demand from AI builders. </p></li><li><p><strong>Independent indexes achieve escape velocity.</strong> Brave&#8217;s index (&gt;<strong>30B</strong> pages, 100M+ daily updates) and growing distribution (AWS Marketplace, AI-grounding features) show a credible non-Google/Bing path for developer-grade web data. Exa is growing too.</p></li><li><p><strong>Economics/choice favor specialists.</strong> Microsoft&#8217;s 3-10x Bing API price hikes (now <strong>$25 per 1,000</strong>) and Google&#8217;s limited, capped JSON API (still <strong>$5 per 1,000</strong>, up to 10k/day) push developers to alternative providers with predictable pricing and richer outputs.</p></li><li><p><strong>Legal access matures.</strong> Publishers and search APIs converge on revenue-share and licensing. Perplexity&#8217;s <strong>$42.5M</strong> publisher pool is an early sign of viable content economics for AI search.</p></li></ol><p><strong>Disconfirming evidence to track:</strong> If platform leaders (OpenAI, Google, Microsoft) give away a full-featured, low-cost search API or effectively embed it into model runtimes, specialist API demand could compress. 
Also, if publisher litigation materially walls off content without workable licenses, data access costs could swamp API margins.</p><h2>3. Market structure, size, and geography</h2><p><strong>Structure.</strong> Three layers matter to developers:</p><ul><li><p><strong>Index owners</strong> with developer APIs: Microsoft (Bing), Brave, You.com (increasingly an AI answer engine with enterprise tilt), plus regional engines (Baidu in China). Google&#8217;s Programmable Search remains limited and quota-capped. </p></li><li><p><strong>SERP/API specialists</strong> that fetch/parse results from many engines and verticals, exposing a clean JSON schema and compliance posture (e.g. <strong>SerpApi</strong> with legal shield, lower-cost peers like Serper.dev, DataForSEO). <strong>Exa API</strong> exposes structured, relevance-scored results that are tuned for AI agents, RAG systems, and retrieval pipelines, making it more semantic and programmatic than a traditional search API. Exa also markets itself as &#8220;search infrastructure for AI&#8221;, letting devs query the web in real time with filters for freshness, domain, and semantic similarity.</p></li><li><p><strong>Enterprise/site search</strong> APIs (Algolia, Elastic, Amazon Kendra) that index a customer&#8217;s own content and power product or knowledge-base search. They are adjacent but often bought by the same teams, and are now blending with web grounding. Algolia alone powers <strong>1.5T+ queries/year</strong> across <strong>10k+</strong> customers. </p></li></ul><p><strong>Size and trajectory.</strong> There is no canonical &#8220;Search-API TAM&#8221;, but demand proxies are strong. Brave reports <strong>&gt;1.5B</strong> searches/month in recent updates. Perplexity has reached <strong>~22M</strong> MAU and processed <strong>~780M</strong> queries in <strong>May 2025</strong>. Algolia is already at trillion-scale enterprise queries. 
Each datapoint indicates rising programmatic search volume and shifting developer spend from DIY crawl/scrape stacks to APIs.</p><p><strong>Penetration and runway.</strong> Google still commands <strong>~90%</strong> of global search, but dipped <strong>below 90%</strong> in late <strong>2024</strong> and has hovered in the high-80s in <strong>2025</strong>. A small crack that corresponds with AI-native search usage and Bing&#8217;s modest desktop gains. For developers, the takeaway is not consumer share per se but <strong>willingness to try non-Google data sources</strong> when APIs are reliable and priced fairly. </p><p><strong>Geography.</strong> In <strong>China</strong>, <strong>Baidu</strong> leads with <strong>~56&#8211;60%</strong> share across platforms, with Bing surprisingly strong on desktop. Google is negligible. Practically, China-focused devs rely on <strong>Baidu</strong> (and 360/Haosou) data and localized APIs, while Western API startups rarely operate behind the Great Firewall. For global products that serve China, <strong>provider mix</strong> (Baidu + Bing/Brave) and compliance become material. </p><h2>4. Customers, jobs to be done, and switching costs</h2><p><strong>Who buys and why.</strong> Three clusters:</p><ol><li><p><strong>AI/agent builders</strong> who need <strong>live facts + citations</strong>. Instead of running their own crawler, they call search APIs within tool-use chains to ground model answers (news, pricing, docs). OpenAI validating &#8220;search-inside-chat&#8221; accelerates this pattern across the stack. Growing Exa usage is another datapoint.</p></li><li><p><strong>Product teams</strong> at e-commerce, SaaS, and content apps who need <strong>fast, typo-tolerant, tuned search</strong> for their own catalogs and docs (Algolia et al.). At scale, better search converts directly to revenue and support deflection. 
</p></li><li><p><strong>SEO/data analytics</strong> and <strong>research ops</strong> teams who need reliable, structured SERPs at volume (rank tracking, market analysis, due diligence). SerpApi&#8217;s customer mix is now <strong>~40% AI, ~40% SEO, ~20% other</strong>, highlighting the shift from pure SEO into AI infra. </p></li></ol><p><strong>Mission-criticality.</strong> If the search step fails, agent answers degrade or hallucinate. If site search fails, revenue drops. That creates a budget line for <strong>SLA-backed</strong> APIs and motivates redundancy (e.g. Brave primary, Bing or SERP API as fallback). The Bing price shock in <strong>2023</strong> nudged teams to multi-source or switch, a real-world proof of this redundancy mindset. </p><p><strong>Switching costs.</strong> Swapping an endpoint is easy. Replicating <strong>quality tuning, synonyms, ranking rules</strong>, or <strong>JSON schemas embedded in pipelines</strong> is not. Enterprise search configs (Algolia) and AI toolchains (prompt+parser contracts) generate meaningful friction. Legal/compliance features (e.g. SerpApi&#8217;s legal shield) further raise switching costs in regulated environments.</p><h2>5. 
Product and roadmap signals</h2><p><strong>Core modules developers expect:</strong></p><ul><li><p><strong>Query endpoints</strong> that return structured results (JSON) for web/news/images/local, with <strong>location &amp; language</strong> controls, snippet payloads, and schema-enriched data.</p></li><li><p><strong>Latency and uptime</strong> SLAs and &#8220;speed tiers&#8221; for interactive UX and agent loops.</p></li><li><p><strong>Compliance and indemnity</strong> (publisher respect, legal shield, SOC 2).</p></li><li><p><strong>AI grounding features</strong> (citations, multi-snippet context, MCP/tool adapters), and <strong>integrations</strong> (LangChain, cloud marketplaces).</p></li></ul><p><strong>Independent index momentum.</strong> Brave exposes a web index of <strong>30B+ pages</strong>, claims <strong>100M+ daily</strong> updates, and recently shipped <strong>AI Grounding</strong> to anchor LLM outputs in verifiable sources. This positions the API as a turnkey &#8220;search-to-source&#8221; layer for agents. Availability on <strong>AWS Marketplace</strong> shortens procurement and signals enterprise focus. Exa has their own index as well.</p><p><strong>Answer-engine APIs.</strong> Perplexity and You.com aim to <strong>synthesize answers</strong> with sources. <strong>Exa</strong> aims to make web search directly machine-consumable for LLMs. Their consumer metrics (Perplexity&#8217;s MAU/queries) indicate product-market fit. The open question is <strong>exposing that capability as a developer API</strong> at sustainable margins. The legal/publisher front is moving. Perplexity is pairing growth with a <strong>$42.5M</strong> publisher pool to defuse access risk. </p><p><strong>SERP/API specialists.</strong> SerpApi abstracts Google/Bing/vertical SERPs into consistent JSON and offers enterprise-friendly pricing at high volume (<strong>$2.75 per 1k</strong> reserved searches) plus legal safeguards. 
This is useful when you need Google-quality outputs with engineering and legal friction removed. </p><p><strong>Enterprise/site search keeps evolving.</strong> Algolia blends keyword + vector (&#8220;neural&#8221;) approaches and remains the easiest &#8220;drop-in&#8221; for app/internal search at massive scale (1.5T+ queries), making it a common complement to web grounding: <strong>your data</strong> via Algolia + <strong>the open web</strong> via a search API.</p><h2>6. Competitive dynamics and pricing</h2><p><strong>Platform APIs vs. independents.</strong></p><ul><li><p><strong>Microsoft Bing:</strong> Official, compliant, but expensive post-2023 (e.g. S1 web search <strong>$25/1k</strong>). Good reliability. Quality lags Google in some niches. </p></li><li><p><strong>Google Programmable Search:</strong> Cheap (<strong>$5/1k</strong>) and reliable for custom/site collections, but not a full web API and capped at 10k/day. Many teams therefore layer <strong>SERP APIs</strong> or <strong>independent indexes</strong> to get web-wide coverage. </p></li><li><p><strong>Exa/Brave/You.com:</strong> Independence is the differentiator (no dependency on Big Tech indices), plus developer-ready features (index transparency, grounding). Brave&#8217;s marketplace and AI-grounding moves specifically target agent stacks. </p></li><li><p><strong>SERP APIs:</strong> SerpApi (premium, legal shield), Serper.dev/DataForSEO (aggressive price points). This tier competes on breadth of engines, JSON quality, anti-bot resilience, and price. </p></li></ul><p><strong>AI search as an encroaching competitor.</strong> OpenAI&#8217;s <strong>SearchGPT</strong> is a strategic signal: if the experience ships as a <strong>developer API</strong> or becomes bundled into model runtimes, it could absorb demand. For now it is limited, but investors should assume <strong>bundling risk</strong> in the next 24 months. </p><p><strong>Consumer share vs. 
developer demand.</strong> Google still holds <strong>~89&#8211;90%</strong> of global search. Bing ~<strong>4%</strong>. Yandex, Yahoo, DDG trail. The gap doesn&#8217;t prevent developer migration if <strong>pricing, procurement, or legal</strong> are better elsewhere. The 2024&#8211;2025 dip below 90% is symbolically important: teams are now comfortable experimenting with non-Google sources. </p><h2>7. Go-to-market, adoption, and metrics to watch</h2><p><strong>PLG with enterprise overlays.</strong> Search APIs skew <strong>self-serve</strong>: devs test free tiers, wire in JSON, and grow usage. Enterprise deals add SLAs, DPAs, and volume commits. Distribution is improved by <strong>cloud marketplaces</strong> (easier procurement; draw-down on committed cloud spend) and <strong>framework integrations</strong> (LangChain/tools). Brave&#8217;s AWS listing is a concrete example of marketplace-led enterprise GTM. </p><p><strong>Adoption proxies.</strong></p><ul><li><p><strong>Exa</strong>: Still young but growing rapidly. Thousands of devs are using it. Recently raised $85M Series B led by Benchmark.</p></li><li><p><strong>Perplexity:</strong> ~<strong>22M MAU</strong>, <strong>~120M</strong> monthly visits (as of July 2025), <strong>~780M</strong> queries in May 2025. All indicate rising appetite for AI-answer search that could translate into API usage. </p></li><li><p><strong>Brave:</strong> claims <strong>&gt;1.5B</strong> searches/month recently and index scale/cadence (30B+ pages; 100M+ daily updates) consistent with commercial-grade coverage. </p></li><li><p><strong>Algolia:</strong> <strong>1.5T+</strong> queries/year across <strong>10k+</strong> customers remains the clearest signal that &#8220;search-as-an-API&#8221; is mainstream within product teams. 
</p></li><li><p><strong>SerpApi:</strong> enterprise pricing pages and research show scale economics (<strong>$2.75/1k</strong> reserved), and a customer mix of <strong>~40% AI</strong> underscores the category&#8217;s pivot from SEO to AI infra.</p></li></ul><p><strong>Reliability and compliance.</strong> Expect <strong>99.9%-style</strong> SLAs from serious vendors. Enterprise wins will hinge on <strong>SOC 2</strong>, data protection addenda, and <strong>publisher-aware crawling</strong>. Watch for visible status histories and <strong>legal shields</strong> or revenue-share programs. Both mitigate buyer risk and will become standard. </p><p><strong>Hiring and focus.</strong> Companies like <a href="https://parallel.ai/">Parallel</a> (founded by former Twitter CEO Parag Agrawal) emphasize agent-grade research APIs. Headcount remains lean and engineering-heavy. Public comms point to millions of &#8220;research tasks/day&#8221; and benchmark-first positioning, but the bigger signal is <strong>product velocity</strong> in agent tooling.</p><h2>8. Monetization and unit economics</h2><p><strong>Pricing models.</strong></p><ul><li><p><strong>Per-query</strong> (CPM-like) is standard for web search and SERP APIs: <strong>Bing</strong> (&#8776;<strong>$25/1k</strong> on popular tiers), <strong>Brave</strong> (public materials emphasize independence &amp; marketplace procurement; list prices vary by tier), <strong>SerpApi</strong> (enterprise reserved <strong>$2.75/1k</strong> and speed add-ons), <strong>Google Programmable Search</strong> (<strong>$5/1k</strong>, capped at 10k/day).</p></li><li><p><strong>SaaS/usage</strong> for site/enterprise search (Algolia, Elastic) based on operations and records.</p></li><li><p><strong>Hybrid</strong> for answer engines (subs + ads + licensing/publisher share). Perplexity&#8217;s <strong>$42.5M</strong> publisher pool is an early, explicit content-cost line item meant to stabilize supply. 
</p></li></ul><p><strong>COGS and margins.</strong> Running a <strong>crawler + index</strong> has bandwidth/compute costs but can sustain software-like gross margins at scale. SERP APIs incur <strong>proxy/captcha</strong> costs but offset via engineering leverage and high utilization. AI answer engines face <strong>inference</strong> COGS until they lean on cheaper custom models. Hence Perplexity/You.com investments in their own models and summarization stacks. (Evidence: rapid model/version launches and product cadence across 2024&#8211;2025; vendors explicitly pitch &#8220;grounding&#8221; to reduce model-token burn). </p><p><strong>ARPU and expansion.</strong> Usage grows with app traffic and agent loops: as an e-commerce site, a support bot, or an agent platform scales, <strong>queries/customer</strong> scale too. That creates natural <strong>net-revenue expansion</strong> without more sales cycles. Enterprise contracts add overage revenue and encourage <strong>annual commits</strong> for lower unit rates (e.g. SerpApi&#8217;s reserved pricing). </p><p><strong>Seasonality.</strong> Consumer search APIs see event-driven spikes. Enterprise/site search peaks in retail Q4. But usage-based billing smooths revenue. Overages provide upside in peak months. Vendor comms on Brave Search Ads and query growth show seasonal surges. </p><h2>9. Moat, data advantage, and legal reality</h2><p><strong>Independent index &#8800; nice-to-have.</strong> Owning the index (Brave, You.com) is the defensibility wedge against platform policy changes and SERP scraping fragility. It also enables <strong>product differentiation</strong> like multi-snippet grounding, &#8220;goggles&#8221; (re-ranking), and fast freshness. For developers, this means fewer brittle dependencies and more consistent JSON across query types. </p><p><strong>Workflow lock-in.</strong> Embedded ranking rules, synonym maps, analytics, and pipelines (Algolia/Elastic) create real stickiness. 
On the web side, teams code to specific <strong>schemas</strong> and <strong>rate/latency</strong> expectations. Swapping vendors requires regression testing across critical UX. Legal coverage (SerpApi&#8217;s U.S. Legal Shield) and enterprise SLAs become part of the moat for high-risk users. </p><p><strong>Publisher alignment will define winners.</strong> Lawsuits and cease-and-desist letters against AI search providers (Dow Jones/News Corp., BBC, Britannica/Merriam-Webster) demonstrate that <strong>content access is not a free good</strong>. Startups that turn adversaries into suppliers via revenue-share or licenses will be able to scale volume without existential risk, even if near-term margins are thinner. </p><p><strong>Platform bundling risk.</strong> If OpenAI/Google ship low-cost, high-quality search endpoints <strong>inside</strong> the model runtime (or as a standard tool), third-party demand could compress. That said, developers value <strong>choice, cost control, transparency, and policy independence</strong>, all of which still argue for multi-sourcing web data (primary + fallback). </p><h2>10. What this means for infra startups</h2><p><strong>Who gets pulled in.</strong> Over the next 24 months, I expect <strong>~25&#8211;40%</strong> of infrastructure startups to be directly or indirectly affected by the rise of search APIs. The exposure comes in three ways:</p><ol><li><p><strong>Agent and orchestration stacks</strong> (tool-use frameworks, evaluators, guardrails) will <strong>standardize on search tools</strong> for grounding. When SearchGPT-style UX becomes common, every agent platform needs a search provider, a policy for citations, and often a <strong>backup</strong>. That&#8217;s a direct dependency. (Signal: OpenAI&#8217;s move with SearchGPT, Exa/Brave/others shipping MCP-style adapters.) 
</p></li><li><p><strong>Data infra and retrieval layers</strong> (vector DBs, RAG pipelines, ETL) will blend <strong>internal corpus</strong> with <strong>web augmentation</strong>. As teams move from static corpora to <strong>live</strong> answers with verifiable sources, they will route external results through their retrieval/ranking layer. Expect tighter <strong>connectors</strong> from Pinecone/Weaviate-like stacks into search APIs and more <strong>budget reallocation</strong> from &#8220;more tokens&#8221; to &#8220;better grounding&#8221;. </p></li><li><p><strong>Compliance, observability, and FinOps</strong> startups will see <strong>new budgets</strong> around content licensing, model+search cost controls, and provenance/attribution telemetry. If you must prove where an answer came from and pay the source, observability products and policy engines become critical.</p></li></ol><p><strong>Positive correlations.</strong></p><ul><li><p><strong>Inference cost declines</strong> strengthen search APIs because grounding becomes the obvious way to reduce hallucinations and <strong>trim token use</strong> (shorter prompts when you pass high-signal snippets). Brave&#8217;s &#8220;AI Grounding&#8221; is literally a productized version of this correlation. </p></li><li><p><strong>Marketplace distribution</strong> (AWS, Azure) lowers friction for enterprises to test and standardize on a search API. This historically accelerates infra adoption curves (database, logging, ML APIs). Brave&#8217;s AWS launch is a direct example. </p></li><li><p><strong>Publisher deals</strong> unlock premium sources (finance, health, news), which improves answer quality, driving <strong>higher conversion</strong> to paid tiers. Perplexity&#8217;s pool is the first at scale. Expect others to follow. 
</p></li></ul><p><strong>Risks and dependencies for infra startups.</strong></p><ul><li><p><strong>Legal and robots.txt compliance</strong>: startups embedding search must respect <strong>robots.txt</strong> and site policies, or risk collateral reputational/legal exposure if their provider is accused of scraping blocked sites. Recent BBC and News Corp actions show this is no longer theoretical. Vet your provider&#8217;s crawler compliance and indemnities. </p></li><li><p><strong>Provider concentration</strong>: relying on a single provider (e.g. just Bing) exposes you to <strong>pricing shocks</strong> (as in 2023) and availability changes. Multi-sourcing (Brave + SERP API + Bing/Google Programmable where allowed) adds resiliency. </p></li><li><p><strong>Geo constraints</strong>: if your users are in <strong>China</strong>, plan for <strong>Baidu/360</strong> integration and localized infrastructure. This may mean separate routing, filtering, and compliance processes from your global stack. </p></li></ul><p><strong>How much budget shifts here?</strong> For AI-agent startups, search can easily become <strong>10&#8211;30%</strong> of monthly variable COGS when agents do multi-hop research (because each answer can trigger tens of queries). For SaaS product teams, external web search spend is smaller. <strong>Internal search</strong> (Algolia/Elastic) remains the primary cost center, with <strong>web grounding</strong> added for specific features (e.g. a &#8220;Research&#8221; tab in a support bot).</p><p><strong>Who benefits in venture terms.</strong></p><ul><li><p><strong>Independent index APIs</strong> (Exa, Brave, You.com) with marketplace distribution, strong engineering cadence, and publisher alignment.</p></li><li><p><strong>Legal-first SERP APIs</strong> (SerpApi) where enterprises want Google-quality JSON without running a proxy farm or fighting captchas and where legal shield matters. 
</p></li><li><p><strong>Hybrid answer-engine APIs</strong> (Perplexity) if they can show <strong>measurable accuracy lift</strong> and <strong>lower blended COGS</strong> via licensing and in-house models, not just good UX.</p></li></ul><p><strong>Who could compress returns.</strong></p><ul><li><p><strong>OpenAI/Google bundling</strong>: if search becomes &#8220;free&#8221; inside a model runtime, specialists will compete on <strong>quality, compliance, and independence</strong> (e.g. sources that big models won&#8217;t touch without licenses). Developers still like choice. Being the <strong>fallback</strong> engine is a real, durable niche. </p></li></ul><h2>11. Competitive landscape: notable companies to watch</h2><p><strong>Exa (US) &#8212; independent index + agentic search + answer engine.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Independent index, search features tailored for LLMs, and <strong>rapidly growing mindshare among devs.</strong></p></li></ul><p><strong>Brave (US) &#8212; independent index + AI grounding + AWS distribution.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Independent index (<strong>30B+</strong> pages, <strong>100M+</strong> daily updates), <strong>AI Grounding</strong> features tailored for LLMs, and an <strong>AWS Marketplace</strong> listing that signals enterprise intent and procurement ease. </p></li></ul><p><strong>SerpApi (US) &#8212; legal-aware SERP API at scale.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Wide engine coverage, enterprise <strong>legal shield</strong>, and reserved pricing down to <strong>$2.75/1k</strong> searches at scale; customer mix now <strong>~40% AI</strong>. Often the fastest path to Google-quality JSON for devs. 
</p></li></ul><p><strong>You.com (US) &#8212; AI research/answer engine with enterprise tilt.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Fresh <strong>$100M Series C</strong> at <strong>$1.5B</strong> valuation (Sep 2025), ongoing shift from &#8220;consumer search challenger&#8221; to <strong>AI research agent</strong> for regulated industries. Credible team pedigree (Richard Socher). </p></li></ul><p><strong>Perplexity (US) &#8212; answer engine with publisher economics.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> High user/query growth (<strong>22M MAU</strong>, <strong>~780M</strong> queries in one month), bold GTM moves, and a <strong>$42.5M</strong> publisher pool amid lawsuits. The key watch-item is API exposure and sustainable COGS.</p></li></ul><p><strong>Parallel (US) &#8212; agent-grade deep research API (early).</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Founder brand (Parag Agrawal), benchmark-driven positioning vs. browsing tools. Still early but tuned for agent workflows.</p></li></ul><p><strong>Baidu (China) &#8212; dominant local index.</strong></p><ul><li><p><strong>What&#8217;s special:</strong> Leads China search (<strong>~56&#8211;60%</strong> share). Essential for China-market apps. Developer-facing access exists within Baidu&#8217;s cloud/AI platforms. Global devs must consider geo separation and compliance. </p></li></ul><h2>12. Risks, catalysts, and what would change the call</h2><p><strong>Key risks in the next 24 months.</strong></p><ul><li><p><strong>Content and legal:</strong> Publisher suits (BBC, Dow Jones, Britannica/Merriam-Webster) escalate, forcing expensive licensing at scale or curtailing content coverage. Vendors without publisher strategy lose reliability, and their customers inherit risk. </p></li><li><p><strong>Platform moves:</strong> OpenAI or Google ships a cheap, first-party <strong>search tool</strong> in model runtimes, compressing third-party demand. 
Or browser defaults remain tightly controlled, limiting distribution for independent engines. </p></li><li><p><strong>Price shocks:</strong> Another <strong>Bing-style</strong> pricing change pushes up customer COGS, causing churn or multi-sourcing complexity. </p></li></ul><p><strong>Catalysts.</strong></p><ul><li><p><strong>Marketplace and cloud partnerships</strong> (AWS/Azure/Bedrock agent ecosystems) that pre-wire search tools for agents. Brave&#8217;s <strong>AWS</strong> launch is a template.</p></li><li><p><strong>Publisher alignment at scale</strong> (Perplexity-like funds replicated), reducing legal friction and unlocking premium data verticals.</p></li><li><p><strong>Visible accuracy/latency wins</strong> on benchmarked agent workloads, showing that independent indexes or SERP APIs deliver <strong>better answers per token</strong> than bundled tools. Brave&#8217;s AI-grounding launch is a signpost. </p></li></ul><p><strong>What would change the call.</strong></p><ul><li><p><strong>Bear case:</strong> If search becomes &#8220;free&#8221; in LLMs and publishers successfully wall off valuable content without broad licensing, the independent API market could shrink to a niche.</p></li><li><p><strong>Bull case:</strong> If independent indexes become the <strong>de facto</strong> grounding layer for agents and publisher economics settle, this sector compounds like payments or auth did a decade ago (quiet infra with huge downstream leverage).</p></li></ul><h3>Bottom line for investors</h3><ul><li><p><strong>Where to lean in:</strong> Independent index APIs with credible <strong>distribution</strong> (marketplaces, agent frameworks), strong <strong>developer experience</strong>, and visible <strong>publisher strategy</strong>. Legal-first SERP APIs that are the <strong>enterprise default</strong> for Google-quality JSON. Enterprise/site search where <strong>vector+keyword</strong> convergence is demonstrably improving outcomes. 
</p></li><li><p><strong>Portfolio construction:</strong> Expect <strong>multi-sourcing</strong> behavior. Your winners should play nicely as primary or fallback. Emphasize vendors with <strong>clear SLAs</strong>, <strong>observability hooks</strong>, and <strong>cost controls</strong> to survive price shocks and model bundling waves.</p></li><li><p><strong>Exposure for infra startups:</strong> Plan as if <strong>25&#8211;40%</strong> of infra startups will touch search APIs directly (agents, retrieval, developer tools) or indirectly (compliance, cost, analytics). Build connectors and procurement paths accordingly, and diligence provider legality as carefully as you diligence uptime.</p></li></ul><p>If these companies can convert developer trust + content access into durable distribution before platforms fully bundle search, the next two years favor specialists. If not, expect consolidation and a smaller, compliance-heavy niche. The signals above, such as <strong>AWS listings, grounding features, publisher funds, MAU/queries growth, and pricing dispersion</strong>, are the leading indicators to watch. 
</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[RLenvironment.com - Tracking live signals from RL Repos]]></title><description><![CDATA[Built an agent + website to track 49 RL repos on GitHub and extract signals from them]]></description><link>https://www.infrastartups.com/p/rlenvironmentcom-tracking-live-signals</link><guid isPermaLink="false">https://www.infrastartups.com/p/rlenvironmentcom-tracking-live-signals</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Mon, 06 Oct 2025 20:54:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!P15x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I built an agent to analyze <strong>49 open-source RL repos</strong> and built <strong><a href="https://rlenvironment.com/">RLenvironment.com</a></strong> to show the live signals. Check it out and let me know what you think. 
Here is the latest snapshot:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P15x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P15x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 424w, https://substackcdn.com/image/fetch/$s_!P15x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 848w, https://substackcdn.com/image/fetch/$s_!P15x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!P15x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P15x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png" width="1290" height="1174" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1174,&quot;width&quot;:1290,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200679,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/175468209?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P15x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 424w, https://substackcdn.com/image/fetch/$s_!P15x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 848w, https://substackcdn.com/image/fetch/$s_!P15x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 1272w, https://substackcdn.com/image/fetch/$s_!P15x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa85d8fa7-0599-4b4f-a61d-5db13dc1c214_1290x1174.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Portfolio-level signals</h2><ul><li><p><strong>Scale and freshness:</strong> 240,832 total stars across 49 repos. <strong>24 of them were active in the last 30 days</strong> (1392 commits), <strong>10 recent releases</strong> (&#8804; 30d). 2 repos archived.</p></li><li><p><strong>Power centers:</strong> Google/DeepMind, Farama, Nvidia Isaac, OpenDILab, Meta, HF anchor a big share of stars and recent commits.</p></li><li><p><strong>Where the work is:</strong> Mix is <strong>Library/Algos &gt; Environments/Simulators &gt; Platforms/Runtimes</strong>. 
Multi-agent + robotics + offline-RL are well represented (some people would debate that offline-RL is not real RL, but that&#8217;s for another post).</p></li></ul><h2>Leaderboard: Scale and mindshare</h2><p><strong>By stars (top 10):</strong> ray-project/ray, Unity ML-Agents, HF/trl, CARLA, Stable-Baselines3, OpenAI Spinning Up, Google Dopamine, DeepMind MuJoCo, Farama Gymnasium, Tianshou. These are the &#8220;safe defaults&#8221; for integrations and community reach.</p><h2>Momentum: Shipping velocity now</h2><p><strong>Hottest by activity score (commits_30d, PRs_merged_30d, push/release recency):</strong> strong showings from <strong>huggingface/trl</strong>, <strong>ray-project/ray</strong>, <strong>google-deepmind/mujoco</strong>, <strong>google-deepmind/open_spiel</strong>, <strong>isaac-sim/IsaacLab</strong>, <strong>pytorch/rl</strong>, <strong>PrimeIntellect-ai/verifiers</strong>. These are the best near-term partnership/signal taps.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zAdw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zAdw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 424w, https://substackcdn.com/image/fetch/$s_!zAdw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 848w, 
https://substackcdn.com/image/fetch/$s_!zAdw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 1272w, https://substackcdn.com/image/fetch/$s_!zAdw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zAdw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png" width="1390" height="976" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/edef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:976,&quot;width&quot;:1390,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172948,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/175468209?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zAdw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 424w, 
https://substackcdn.com/image/fetch/$s_!zAdw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 848w, https://substackcdn.com/image/fetch/$s_!zAdw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 1272w, https://substackcdn.com/image/fetch/$s_!zAdw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedef5c22-7d76-4263-9589-a117ce2695ed_1390x976.png 1456w" sizes="100vw"></picture></div></a></figure></div><p></p><h2>Emerging movers (low stars, high momentum)</h2><p>&#8220;Under-the-radar but shipping&#8221; (stars &lt; 1k, momentum &#8805; 70): e.g. <strong>hud-evals/hud-python</strong> (rapid cadence in LLM-RL eval/envs), <strong>instadeepai/Mava</strong> (JAX MARL), plus a few niche env/runtime repos. These are prime candidates for early collabs, grants, or feature pilots.</p><h2>Areas heating up</h2><ul><li><p><strong>LLM-RL environments and verifiers:</strong> <strong>PrimeIntellect-ai/verifiers</strong>, <strong>hud-evals/hud-python</strong> show fast iteration &#8594; good proxy for RLHF/RLVR ecosystem traction.</p></li><li><p><strong>Physics and robotics:</strong> <strong>MuJoCo</strong>, <strong>IsaacLab</strong>, <strong>ManiSkill</strong> have active pipelines &#8594; strong for sim-to-real stories and embodied agents.</p></li><li><p><strong>Core libraries:</strong> <strong>TRL</strong>, <strong>PyTorch RL</strong>, <strong>Tianshou</strong>, <strong>SB3</strong> remain the practical workhorses for researchers/teams.</p></li></ul><h2>Risk and maintenance flags</h2><ul><li><p><strong>Issue backlog hotspots:</strong> very large open-issue queues in <strong>ray-project/ray</strong>, <strong>carla-simulator/carla</strong>, <strong>facebookresearch/habitat-lab</strong> &#8594; watch for maintainer bandwidth + triage pace.</p></li><li><p><strong>Staleness:</strong> a handful of well-known but <strong>stale &gt;90d</strong> repos (some with big star counts) &#8594; fine for legacy baselines, not great for new dependencies.</p></li></ul><h2>Topic coverage: Where the field is leaning</h2><p>Most frequent tags: <strong>reinforcement-learning, gym/gymnasium, robotics, multi-agent, MuJoCo, PPO/SAC/TD3, imitation/offline RL</strong>. 
Clear skew to embodied control + MARL + practical baselines.</p><div><hr></div><p>If you like this newsletter, consider sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #4: OPEN SOURCE INFRA]]></title><description><![CDATA[Built an agent to analyze 236 open source infra repos. Graphs, charts, and trends.]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-4-open-source-infra</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-4-open-source-infra</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Wed, 24 Sep 2025 17:55:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!X5Ev!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I built an agent to analyze <strong>236 open source infra repos</strong>. I wanted to extract useful signals that indicate where the world is heading. These projects collectively concentrate around a few durable themes: <strong>data infrastructure (38%)</strong>, <strong>AI tooling (30%)</strong>, <strong>databases + search (21%)</strong>, and smaller but high-signal pockets in <strong>observability</strong> and <strong>streaming</strong>.</p><p>The set is large and mature. <strong>The median project age is around 9 years</strong>, but the projects are still very active. <strong>75%</strong> pushed code in the last 30 days and <strong>81%</strong> in the last 90 days. 
Popularity signals are strong: the <strong>median number of stars is 12.5k</strong>. As seen from the graph below, breaking past the 20k-star barrier is difficult. And beyond 40k stars is rarefied air.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X5Ev!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X5Ev!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 424w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 848w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 1272w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X5Ev!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png" width="1140" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1140,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X5Ev!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 424w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 848w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 1272w, https://substackcdn.com/image/fetch/$s_!X5Ev!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11b2f087-566b-4595-a3de-415cd9a5ecb5_1140x500.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Forks and stars are highly correlated (<strong>0.81</strong>), which is consistent with communities that not only watch but also remix.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HjXg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HjXg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 424w, 
https://substackcdn.com/image/fetch/$s_!HjXg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 848w, https://substackcdn.com/image/fetch/$s_!HjXg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 1272w, https://substackcdn.com/image/fetch/$s_!HjXg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HjXg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png" width="500" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86658,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HjXg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 
424w, https://substackcdn.com/image/fetch/$s_!HjXg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 848w, https://substackcdn.com/image/fetch/$s_!HjXg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 1272w, https://substackcdn.com/image/fetch/$s_!HjXg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b2bc28-1c4a-43d0-909c-b00a8ad1a410_500x500.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" 
x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>From an investor/operator lens, three patterns stand out:</p><ol><li><p><strong>Foundation-led gravity wells:</strong> <strong>Apache</strong> is the dominant owner in the dataset (<strong>22%</strong> of repos) with <strong>85%</strong> 30-day activity. This suggests healthy stewardship and long-horizon roadmaps. Non-foundation projects skew higher on stars but slightly lower on recency of activity. It is the classic &#8220;breakout vs durability&#8221; trade-off.</p></li><li><p><strong>Permissive licensing is the default:</strong> <strong>Apache-2.0</strong> accounts for <strong>58%</strong> of identified licenses. <strong>MIT</strong> is a distant second. For commercial builds and cloud packaging, this materially reduces legal friction and widens the potential surface for enterprise adoption and revenue capture.</p></li><li><p><strong>Databases and data infra are &#8220;workhorse hot&#8221;:</strong> <strong>Database/search</strong> projects show <strong>88%</strong> 30-day activity and robust median stars (13.8k). <strong>Streaming</strong> is smaller in count but similarly active (83% in 30 days). <strong>Observability</strong> is tiny by count yet shows <strong>near-universal recent activity</strong>, signaling fast iteration and a race to product-market fit in telemetry-heavy AI / infra stacks.</p></li></ol><h3>Language and owner landscape</h3><p><strong>Top languages by count:</strong> <strong>Python (65)</strong>, <strong>Java (60)</strong>, <strong>C++ (34)</strong>, <strong>Go (18)</strong>, <strong>TypeScript (15)</strong>. The long tail includes Rust, Scala, C, Ruby, JS, and notebooks. Median stars by language (with sample-size caution) show <strong>C++</strong>, <strong>Rust</strong>, and <strong>TypeScript/Python</strong> clustering in the mid-teens to low-20k range, while <strong>Go</strong> sits a bit lower by median in this set. 
Though Go projects like Kubernetes and Ollama are outliers on the high end.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xqUL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xqUL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 424w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 848w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 1272w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xqUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png" width="1000" height="495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xqUL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 424w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 848w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 1272w, https://substackcdn.com/image/fetch/$s_!xqUL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F749bf501-218b-4086-a158-b371d18ce8a5_1000x495.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Owner concentration:</strong> <strong>Apache</strong> (53 repos) is the gravitational center. Next are small clusters around <strong>facebookresearch</strong>, <strong>elastic</strong>, <strong>google</strong>, <strong>Netflix</strong>, <strong>h2oai</strong>, <strong>tidyverse</strong>, and a handful of fast-moving, company-backed AI infra owners.</p><ul><li><p><strong>Apache cohort:</strong> median stars <strong>5.9k</strong>, median open issues <strong>218</strong>, <strong>85%</strong> pushed in the last 30 days. High maintenance cadence and roadmap continuity.</p></li><li><p><strong>Non-Apache cohort:</strong> higher median stars (<strong>13.9k</strong>) and higher open issues (<strong>364</strong>), but <strong>lower 30-day activity (73%)</strong>. 
More &#8220;breakout&#8221; but slightly more sporadic in recent activity.</p></li></ul><p>Interpretation: foundation projects optimize for <strong>stability and continuity</strong>, while non-foundation/company-backed projects skew toward <strong>velocity and viral adoption</strong> (carrying more product risk but also more upside).</p><h3>Licensing signals and commercialization readiness</h3><p>Licenses heavily favor permissive terms:</p><ul><li><p><strong>Apache-2.0 &#8776; 58%</strong> of identified licenses</p></li><li><p><strong>MIT &#8776; 8&#8211;9%</strong>, <strong>BSD variants &#8776; 4&#8211;5%</strong></p></li><li><p>Copyleft licenses are a small minority (<strong>AGPL/GPL/MPL</strong> combined: a count in the low teens)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ICUu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ICUu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 424w, https://substackcdn.com/image/fetch/$s_!ICUu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 848w, https://substackcdn.com/image/fetch/$s_!ICUu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ICUu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ICUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png" width="1029" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1029,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135994,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ICUu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 424w, https://substackcdn.com/image/fetch/$s_!ICUu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 848w, 
https://substackcdn.com/image/fetch/$s_!ICUu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 1272w, https://substackcdn.com/image/fetch/$s_!ICUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35612b5c-ed99-486b-906a-fc03e40b49d9_1029x500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Why it matters: for <strong>infra investors and founders</strong>, permissive licenses simplify cloud packaging, commercial add-ons, 
and enterprise adoption. A deep Apache-2.0 bench implies <strong>fewer legal frictions</strong> for hosting and managed services. Copyleft projects can still commercialize, but with more nuanced business models (e.g. dual-license, hosted-only value capture).</p><h3>Popularity and engagement dynamics</h3><ul><li><p><strong>Median stars</strong> &#8776; <strong>12.5k</strong> (mean &#8776; 21&#8211;22k, long-tail heavy).</p></li><li><p><strong>Forks&#8211;stars correlation</strong> is strong (<strong>0.81</strong>), i.e. projects that attract attention also tend to accumulate derivative work/extensions, which is useful for platform bets.</p></li><li><p><strong>Open issues</strong> correlate only modestly with stars/forks (<strong>0.34</strong>): bigger communities generate more surface area for maintenance and governance, which strengthens the moat if maintainers keep pace.</p></li></ul><p><strong>Age profile:</strong> median age <strong>9 years</strong>. Cohorts span a broad range, from 2008&#8211;2014 &#8220;classic&#8221; projects through 2019&#8211;2023 modern entrants. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BUtb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BUtb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 424w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 848w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 1272w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BUtb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png" width="1269" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1269,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:157084,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BUtb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 424w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 848w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 1272w, https://substackcdn.com/image/fetch/$s_!BUtb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4162ba9d-26c3-4e65-a56b-80a309cde087_1269x500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Despite age, <strong>recency is strong</strong>: <strong>75%</strong> pushed within the <strong>last 30 days</strong>, <strong>81%</strong> within <strong>90 days</strong>. 
This is notable: a materially active base across mature infra implies <strong>ongoing fit with today&#8217;s workloads</strong> (not just legacy shelfware).</p><h3>Thematic clusters </h3><p>Heuristic tags show the following splits and signals:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EmAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EmAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 424w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 848w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 1272w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EmAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png" width="1013" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0846b652-b51d-44cd-8664-081003f23b74_1013x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1013,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104490,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/174458668?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EmAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 424w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 848w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 1272w, https://substackcdn.com/image/fetch/$s_!EmAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0846b652-b51d-44cd-8664-081003f23b74_1013x500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Data infra (38% of repos)</strong>: ETL/ELT, lakes/warehouses, analytics engines.</p><ul><li><p><strong>Activity:</strong> <strong>80%</strong> pushed in 30 days.</p></li><li><p><strong>Median stars:</strong> <strong>12.7k</strong>.</p></li><li><p>Takeaway: Large, steady engine of infra demand. Many opportunities for connective tissue (metadata, cost governance, data contracts, lineage, privacy).</p></li></ul></li><li><p><strong>AI and LLM tooling (30%)</strong>: inference servers, evaluation, fine-tuning, agent frameworks.</p><ul><li><p><strong>Activity:</strong> <strong>73%</strong> in 30 days.</p></li><li><p><strong>Median stars:</strong> <strong>14.9k</strong>.</p></li><li><p>Takeaway: Strong attention, slightly more volatility. 
Integration layers around <strong>model routing, evals, caching, safety, observability</strong> look investable if coupled with usage-linked pricing.</p></li></ul></li><li><p><strong>Databases + search (21%)</strong>: transactional, analytical, vector, indexing.</p><ul><li><p><strong>Activity:</strong> <strong>88%</strong> in 30 days (highest of the major categories).</p></li><li><p><strong>Median stars:</strong> <strong>13.8k</strong>.</p></li><li><p>Takeaway: Sustained build velocity and adoption. Practical moats accrue via <strong>operational excellence</strong> (HA/backup/recovery), <strong>performance on real workloads</strong>, and <strong>cloud-native operability</strong> (autoscaling, storage tiering, predictable cost).</p></li></ul></li><li><p><strong>Streaming (5%)</strong>: Kafka/Pulsar-like patterns, queues, event backbones.</p><ul><li><p><strong>Activity:</strong> <strong>83%</strong> in 30 days.</p></li><li><p><strong>Median stars:</strong> <strong>11.5k</strong>.</p></li><li><p>Takeaway: A smaller number of projects, but sticky demand. New growth is likely at the edges (exactly-once semantics, stateful stream processing with low-latency joins, and <strong>data contracts bridging OLTP&#8594;OLAP</strong>).</p></li></ul></li><li><p><strong>Observability/telemetry (6%)</strong>: metrics, tracing, logging.</p><ul><li><p><strong>Activity:</strong> <strong>100%</strong> in 30 days (small sample).</p></li><li><p><strong>Median stars:</strong> <strong>20.4k</strong> (skewed by a few big names).</p></li><li><p>Takeaway: In the AI era, infra adds <strong>non-determinism</strong> and <strong>cost volatility</strong>, which makes telemetry essential. 
Expect consolidation around <strong>OTel-native</strong> pipelines + <strong>LLM-aware</strong> SLOs, test harnesses, and <strong>cost-to-quality</strong> guardrails.</p></li></ul></li><li><p><strong>Security/auth (1&#8211;2%)</strong>: tiny count but fully active.</p><ul><li><p>Takeaway: Greenfield for <strong>policy-as-code</strong>, secrets and key management for <strong>multi-tenant AI</strong>, evaluation/test data governance, and <strong>RAG pipeline hardening</strong>.</p></li></ul></li></ul><h3>What this implies for company building</h3><ol><li><p><strong>Pick durability.</strong> Among the large categories, the highest sustained activity is in <strong>databases</strong> (88% pushed in 30 days) <strong>and data infra</strong> (80%), with <strong>observability</strong> (100%, on a small sample) also showing intense iteration. For a newco, tighter loops on reliability, operability, and cost predictability will outrun feature-led me-toos.</p></li><li><p><strong>License choice is strategic.</strong> Given the dominance of <strong>Apache-2.0</strong>, deviating to copyleft will narrow the distribution surface. If your moat is cloud-operational (SLAs, SRE excellence, compliance), Apache/MIT keeps optionality high and makes enterprise POCs smoother.</p></li><li><p><strong>Own the boring edges.</strong> The correlation between stars and forks tells you where ecosystems are fertile, but investors should underwrite the <strong>edges that cause pain in production</strong>: schema evolution, data drift, stateful upgrades, backfills, cross-region consistency, on-disk format stability, efficient vector indexes under churn, and <strong>observability that ties cost &#8594; quality</strong> for AI pipelines.</p></li><li><p><strong>Design for platform effects.</strong> Projects that are easy to extend (plugins, connectors, storage engines, UDFs, operators) accumulate forks and integrations. 
These become <strong>compounding moats</strong> that are hard to unwind once embedded in workflows.</p></li></ol><h3>What this implies for investing</h3><ul><li><p><strong>Foundation-anchored assets</strong> (Apache et al.) are excellent <strong>ecosystem barometers</strong> and <strong>acquisition surfaces</strong>: look for teams that commercialize operational excellence around these standards with <strong>predictable TCO</strong>.</p></li><li><p><strong>Company-backed AI infra</strong> with permissive licenses can scale fast but must prove <strong>defensibility beyond model access</strong>, e.g. <strong>data adjacency</strong>, compliance, private fine-tunes, eval/safety/observability built in, and <strong>cost governance</strong>.</p></li><li><p><strong>Databases/search</strong> remain investable where teams demonstrate technical advantage on <strong>real customer workloads</strong> (tail latency, compaction, tiered storage, multi-AZ replication, crash-safe durability, workload isolation) and a clean path to <strong>managed-service margins</strong>.</p></li><li><p><strong>Telemetry/control planes</strong> for <strong>RAG/agentic systems</strong> are under-supplied. 
The high activity in observability hints at a forming standard: <strong>OTel first</strong>, <strong>LLM-aware</strong> signals (prompt/response lineage, cache hits, hallucination/eval scores), and <strong>unit-economics dashboards</strong> that tie GPU/egress cost to business outcomes.</p></li></ul><h3>Concrete opportunities to explore from the patterns</h3><ol><li><p><strong>AI Data Reliability and Cost Controls:</strong> Tools that watch <strong>embedding churn</strong>, <strong>index compaction</strong>, <strong>cache efficacy</strong>, and <strong>prompt/eval drift</strong>, and that automatically tune for <strong>cost-to-quality</strong> trade-offs.</p></li><li><p><strong>Unified Schema and Lineage for Hybrid Workloads:</strong> Bridges across OLTP&#8594;stream&#8594;OLAP with <strong>contract enforcement</strong> and <strong>automated backfills</strong>.</p></li><li><p><strong>Database-as-a-Product Ops Kits:</strong> Opinionated ops for top open-source databases (bootstrap, HA, backup/restore drills, live migrations, chaos tests, perf baselines), delivered as <strong>operator + runbooks + SRE service</strong>.</p></li><li><p><strong>Security Hardening for RAG/Agents:</strong> Policy-as-code across <strong>retrievers, tools, model endpoints</strong>, with <strong>PII/PHI/PCI</strong> controls and reproducible evals.</p></li><li><p><strong>Plugin fabrics where forks flourish:</strong> If a repo shows strong forks/stars, there&#8217;s room for a <strong>market of connectors/operators</strong> and &#8220;glue&#8221; that becomes the default choice.</p></li></ol><h3>Risks and watch-outs</h3><ul><li><p><strong>Star-driven bias:</strong> Stars overweight top-of-funnel excitement. Insist on <strong>usage telemetry, self-hosted installs, and enterprise references</strong> before over-indexing.</p></li><li><p><strong>License flips / relicensing:</strong> While rare among Apache/MIT projects, some have pivoted to source-available or dual-license models. 
Diligence the governance and contributor agreements.</p></li><li><p><strong>Maintainer bandwidth:</strong> Open issues tend to grow with popularity. Ensure there&#8217;s a bus-factor plan, e.g. multiple core maintainers, foundation backing, or a commercial steward.</p></li></ul><div><hr></div><p>If you like this newsletter, consider sharing it with one infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #3: INFERENCE CLOUD PLATFORMS]]></title><description><![CDATA[Companies to build, enable, and sell inference platforms for AI applications]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-3-inference-cloud</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-3-inference-cloud</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Wed, 17 Sep 2025 23:01:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AAiE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. What inference cloud platforms are and why they matter now</h2><p>&#8220;Inference&#8221; is the phase where trained AI models answer requests in the wild: they classify, summarize, code, chat, recommend, or generate images and video. Inference cloud platforms are the providers that run this at scale and expose it as APIs or managed endpoints. 
The group spans: </p><ul><li><p>Hyperscalers: AWS Bedrock, Azure OpenAI, Google Vertex AI</p></li><li><p>Model companies with hosted APIs: OpenAI, Anthropic, Cohere, Mistral, xAI</p></li><li><p>Specialized AI clouds: CoreWeave, Lambda, Together AI, Fireworks AI, Modal, Replicate, Anyscale</p></li><li><p>Open-source inference engines that these platforms increasingly use under the hood: NVIDIA TensorRT-LLM, vLLM, Triton, Ollama</p></li></ul><p>Two things are happening at once. First, end-user demand is rising as enterprises shift from pilots to production. Companies are redesigning workflows and are seeing cost reductions in most functions where they actually deploy AI. Second, the supply side is scaling dramatically. Microsoft guided to a record ~$30 billion of capex in the current quarter (Jul 30, 2025), and Alphabet lifted 2025 capex plans to ~$85 billion, largely for AI data centers. </p><p>Taken together, that means inference capacity (not model training alone) will drive value creation, especially over the next two years as real usage compounds into steady, billable traffic. AMD&#8217;s leadership has been explicit. 
Inference demand is set to outgrow training, with CEO Lisa Su calling out rapid acceleration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AAiE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AAiE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AAiE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png" width="768" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:823104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/173893981?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AAiE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!AAiE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aa2a46c-ff1b-41ad-ae42-5b1d7cb95503_768x512.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>2. Demand outlook: pricing is falling while usage rises</h2><p>Developers can now buy high-end model outputs for a fraction of last year&#8217;s cost. OpenAI&#8217;s GPT-4o launched on May 13, 2024 at roughly half the price of GPT-4-Turbo and with higher rate limits. Google followed with large price cuts to Gemini 1.5 Pro (announced Sep 24, 2024; effective Oct 1, 2024), and later rolled newer 2.x models into its lineup with low-cost &#8220;Flash&#8221; tiers.</p><p>Cohere and Mistral publish similarly aggressive pricing for their command-and-reasoning families. On the ultra-low-cost end, DeepSeek&#8217;s R1 reasoning API lists input at roughly $0.55 and output at roughly $2.19 per million tokens.</p><p>Lower unit prices haven&#8217;t slowed demand. 
If anything, they invite more usage and new types of applications (voice agents, video generation, background automation). Inference requests are becoming embedded in daily processes rather than arriving in sporadic &#8220;pilot&#8221; bursts.</p><h2>3. Market structure: who&#8217;s selling what</h2><p><strong>Hyperscalers:</strong><br>AWS, Microsoft, and Google anchor the managed platform tier, wrapping multiple models, safety filters, observability, and enterprise controls under one bill and SLA. AWS Bedrock and Azure OpenAI have achieved FedRAMP High in their government clouds, and Google secured FedRAMP High for selected components like Vertex AI Vector Search. This matters for regulated demand where compliance can be the gating factor.</p><p><strong>Model API companies:</strong><br>OpenAI, Anthropic, Cohere, Mistral, and xAI expose models directly and are often available through the hyperscalers too. Current public pricing pages let builders compare per-million-token rates and pick &#8220;fast-cheap&#8221; or &#8220;smart-expensive&#8221; options. </p><p><strong>Specialized AI clouds and serverless GPU providers:</strong><br>CoreWeave, Together AI, Fireworks AI, Modal, Replicate, and others focus on cost-efficient throughput, burst capacity, and developer experience. CoreWeave&#8217;s S-1 (filed Mar 3, 2025) revealed $1.92 billion of 2024 revenue, but also heavy customer concentration (Microsoft at ~62% in 2024 per S-1 analysis and press). It financed expansion with a $7.5 billion debt facility led by Blackstone and Magnetar on May 17, 2024.</p><p>Together AI raised a $305 million Series B on Feb 20, 2025 and crossed $100 million in annualized revenue around that time, per Bloomberg/Crunchbase reporting. Fireworks AI reported rapid ARR growth in 2025 and is reportedly exploring a raise at a ~$4 billion valuation. 
Modal and Replicate illustrate the &#8220;serverless GPU&#8221; model that charges per second or per GPU-hour for inference runs, with public pricing examples for L-series GPUs.</p><p><strong>Open-source inference engines:</strong><br>Under the hood, many platforms are converging on a few high-performance engines. vLLM (57k+ GitHub stars as of Sep 7, 2025) and NVIDIA&#8217;s TensorRT-LLM (with active releases throughout 2024&#8211;2025) are two of the most visible, while NVIDIA Triton remains common as a serving runtime. On the &#8220;local&#8221; side, Ollama&#8217;s explosive adoption (152k+ stars as of Sep 6&#8211;7, 2025) signals a strong DIY and edge-inference movement. This is often a precursor to enterprise demand for managed versions. </p><p><strong>China and the rest of the world:</strong><br>In China, Baidu, Alibaba (Qwen), ByteDance (Doubao), and startups like DeepSeek are pushing aggressive capability-to-cost curves. Baidu announced model upgrades and price cuts on Apr 24&#8211;25, 2025. Alibaba publishes granular Qwen API pricing. ByteDance promotes Doubao access via Volcano Engine. DeepSeek lists low per-million-token pricing for its R1 reasoning model.</p><h2>4. Economics in plain terms: what drives costs and margins</h2><p>Inference costs are driven by three levers: </p><ol><li><p>compute per request (model size, precision, and decoding strategy)</p></li><li><p>utilization (keeping GPUs busy with batching and scheduling)</p></li><li><p>data movement (egress, plus traffic between availability zones (AZs) and regions) </p></li></ol><p>Providers increasingly use FP8/FP4 kernels, paged attention, speculative decoding, and &#8220;in-flight batching&#8221; to boost throughput, which is exactly what TensorRT-LLM and vLLM are optimized for.</p><p>At the cloud-network level, egress fees and inter-AZ traffic still matter for TCO (total cost of ownership). 
In 2024, Google removed <em>exit</em> fees for customers migrating off its cloud (Jan 11&#8211;12, 2024), and AWS followed in March 2024, but normal egress still applies for day-to-day operations. Typical AWS data transfer out runs roughly $0.09/GB for the first 10 TB/month in many regions, with inter-AZ charges around $0.01/GB in each direction. At that rate, 10 TB of monthly egress costs roughly $900 before any inter-AZ traffic.</p><p><strong>Put practically: bandwidth adds up if an application streams images or video from inference outputs or moves embeddings between services.</strong></p><p>That&#8217;s why many inference platforms co-locate vector search, caches, and storage with serving to cut cross-service data charges. It&#8217;s also why some startups prefer specialized AI clouds where pricing bundles compute, storage, and networking tightly.</p><h2>5. Reliability and compliance: what enterprises actually ask for</h2><p>Large buyers care about uptime SLAs, security attestations, and government cloud options. That said, outages do happen. OpenAI reported notable incidents in 2023&#8211;2024 (including a DDoS-related disruption on Nov 8, 2023 and a service impairment on Jun 4, 2024), which buyers often cite when asking about multi-vendor failover and local fallback. </p><p>For IT leaders, this translates into a simple rule: </p><p><em><strong>pick at least two model providers (direct or via a hyperscaler) and deploy a fallback path on an open-source engine (vLLM/TensorRT-LLM) where feasible.</strong></em> </p><p>This reduces outage risk and allows cost-based routing as prices change.</p><h2>6. Competitive dynamics: price wars meet platform bundling</h2><p>Price cuts are not just marketing. They pressure everyone (especially startups) to improve GPU utilization and lower serving costs. Google&#8217;s 64% (input) and 52% (output) price cuts on Gemini 1.5 Pro in late 2024 set a precedent. OpenAI&#8217;s GPT-4o (May 13, 2024) cut prices and boosted rate limits. xAI&#8217;s Grok 4 and DeepSeek R1 added low-cost reasoning options in 2025. 
</p><p>Hyperscalers also bundle: the model call is one API, but buyers are really purchasing governance, observability, private networking, enterprise auth, and in some cases FedRAMP posture. That bundle is hard for startups to match, which is why specialized AI clouds differentiate on raw performance, GPU availability, or easy developer workflows.</p><p>CoreWeave is the bellwether for specialized AI clouds. Its S-1 showed explosive revenue growth to $1.92 billion in 2024, backed by a $7.5 billion debt facility and with roughly 62% of revenue coming from Microsoft. These numbers show how capital intensive inference clouds are and how much their fate can hinge on a few anchor customers. </p><h2>7. How this connects to infrastructure startups and how many are affected</h2><p><strong>Where the dependency lies: </strong>Even if a startup doesn&#8217;t sell an &#8220;inference API&#8221;, much of the AI infra startup stack is downstream of inference demand:</p><ul><li><p><strong>Model-serving and orchestration</strong> (BentoML/Ray Serve/Anyscale, vLLM, Triton): revenue tracks request volumes and concurrency, so it grows as inference scales. 
Anyscale&#8217;s 2025 partnerships signal a continued push toward managed Ray-based serving.</p></li><li><p><strong>Data layer</strong> (vector databases, feature stores): more inference means more embeddings, more caching, and more retrieval.</p></li><li><p><strong>Observability/security</strong> (guardrails, evals, tracing): production inference requires red-teaming, safety checks, and runtime monitoring.</p></li><li><p><strong>Networking and acceleration</strong> (NICs, smart switches, CUDA/ROCm kernels): token throughput and tail latency are network-sensitive.</p></li><li><p><strong>Edge/enterprise local</strong> (Ollama-style local serving, on-prem Triton/TensorRT-LLM): regulated workloads and cost control push some inference to customer hardware.</p></li></ul><p><strong>A rough share of affected infra startups:</strong><br>Based on public venture reports that show AI absorbing an outsized share of VC dollars in H1 2025 (EY: ~$49.2 billion into gen-AI in H1 2025, already above full-year 2024) and the visible skew toward application-layer deals, a reasonable estimate is that <strong>at least half (and plausibly 60&#8211;70%) of AI infrastructure startups</strong> have their fortunes tied to inference growth (direct revenue or adjacent data/observability spend). </p><p>This is an estimate, not a hard count, but it matches what&#8217;s seen in fund flows and market maps: most infra projects today pitch either &#8220;cheaper, faster inference&#8221; or &#8220;better pipelines and retrieval for inference.&#8221; </p><p><strong>Correlation and dependency:</strong><br>When hyperscalers raise capex and roll out enterprise controls, application teams are more likely to ship production features. That directly lifts inference calls, which in turn lifts demand for the whole downstream stack. Conversely, if a big buyer slows deployments or centralizes on one provider for cost reasons, adjacent infra (observability, vector DBs, orchestration) may see delayed projects.</p><h2>8. 
Risks over the next 24 months and what to watch</h2><p><strong>(a) Supply chain and power constraints</strong><br>GPU allocations are still tight and data-center interconnect queues are long in power-constrained regions. Big Tech&#8217;s capex is surging (Microsoft ~$30 billion this quarter, Alphabet ~$85 billion for 2025), but a lot of that converts into capacity only when land, power, and cooling are ready. Watch for delays tied to grid interconnects and specialized high-density builds. </p><p><strong>(b) Price compression</strong><br>The fall in per-token pricing is likely to continue, with new &#8220;reasoning&#8221; models (DeepSeek R1, xAI Grok 4) pushing the price of a given level of performance down further. Platforms must keep utilization high (via batching, caching, or model distillation) to avoid margin squeeze. </p><p><strong>(c) Outages and concentration</strong><br>Incidents at a single model provider can knock out large fractions of traffic for hours. The OpenAI incidents in Nov 2023 and Jun 2024 are a reminder. SRE teams will push for multi-provider routing and local fallbacks.</p><p><strong>(d) Compliance and data locality</strong><br>The good news is that FedRAMP High and similar authorizations are arriving for major services. The challenge is that sensitive workloads still need private networking, key management, and clear data-use policies. Delays in rolling out &#8220;enterprise safeguards&#8221; can stall big deals. </p><p><strong>(e) Macro and investor sentiment</strong><br>Inference clouds are capital intensive. Public market reception to specialized AI clouds has been mixed. If public comps wobble, late-stage private rounds could slow, impacting the partner ecosystem. </p><h2>9. How to think about winners</h2><p><strong>Platforms that control utilization</strong><br>Winners will squeeze more requests per GPU hour. 
The technology stack is clear: engines like TensorRT-LLM and vLLM, plus tricks like FP8/FP4 quantization and speculative decoding, drive throughput while largely preserving quality. Those gains are hard to reverse and compound every quarter.</p><p><strong>Platforms that own regulated channels</strong><br>FedRAMP High and sector certifications unlock budgets that smaller vendors can&#8217;t access quickly. The moves by AWS, Microsoft, and Google in 2024&#8211;2025 are strategic moats in the US public sector and highly regulated industries. </p><p><strong>Platforms with balanced customer bases</strong><br>Revenue concentration is a risk in this subsector. The more diversified the top customers and geographies, the sturdier the cash flows through cycles.</p><p><strong>Global low-cost challengers</strong><br>The China ecosystem (Baidu, Alibaba, DeepSeek, ByteDance) is pushing cost down dramatically. While export controls and data residency limit cross-border usage, the pricing pressure they exert globally will influence buyer expectations.</p><h2>10. Practical guidance for investors and operators</h2><p><strong>For venture investors</strong><br>Ask any infra startup how they (a) keep GPUs hot (utilization), (b) cut token compute per request (quantization, distillation, caching), and (c) minimize bandwidth charges (co-location, compression, RAG locality). You&#8217;re looking for companies able to maintain gross margins as price per token keeps sliding.</p><p>Validate multi-provider integrations. If a startup depends on one model vendor or one cloud region, treat that as concentration risk, much as you would a single large customer.</p><p>Finally, watch the &#8220;serverless GPU&#8221; abstraction layer. Modal and Replicate show that per-second billing and instant scale can beat reserved instances for bursty workloads. Adoption there could shift where &#8220;platform&#8221; margins accrue (to the servers-on-demand layer). 
</p><p><strong>For corporate buyers and product teams</strong><br>Lock in at least two model paths (via your cloud of choice plus a direct API) and a local fallback using vLLM or TensorRT-LLM for mission-critical flows. Budget for bandwidth where outputs are large (images, video) and keep RAG stores co-located with serving to avoid inter-AZ/region fees. Remember that migrating off a cloud may waive <em>exit</em> fees now, but normal egress still applies to daily operations.</p><p><strong>For founders at the application layer</strong><br>Lean into cheaper &#8220;flash&#8221; tiers for non-critical tasks and reserve expensive reasoning models for high-value steps. Many teams are carving workload graphs so that 70&#8211;90% of tokens go to low-cost models and only the &#8220;hard&#8221; paths hit premium models. That keeps unit economics sane as your user base grows.</p><h2>11. What could change the call (24-month horizon)</h2><p><strong>Positive surprises</strong></p><ul><li><p>A step-change in throughput (e.g. widespread FP4 adoption or a new serving breakthrough) that halves cost per request again. Suddenly many more use cases become profitable. Keep an eye on TensorRT-LLM and vLLM releases.</p></li><li><p>Faster regulatory certifications (FedRAMP High/DoD IL-5 for more services) unlocking pent-up demand in government and healthcare. </p></li><li><p>Public-market validation of specialized AI clouds (smoother IPO outcomes and rising multiples) that lowers the sector&#8217;s cost of capital and speeds build-outs.</p></li></ul><p><strong>Negative surprises</strong></p><ul><li><p>Power or interconnect delays slow data-center rollouts, creating capacity gaps during peak demand. The capex is committed, but lead-times can slip.</p></li><li><p>Major, prolonged outages trigger widespread buyer mandates for on-prem inference, temporarily shifting spend away from managed cloud endpoints. 
</p></li><li><p>An extended price war that compresses gross margins faster than utilization improvements can compensate. Particularly painful for smaller providers without proprietary hardware access.</p></li></ul><h2>12. How many infrastructure startups will this affect and how?</h2><p><strong>Estimated share affected:</strong><br>Given H1-2025 venture flows (~$49.2 billion into generative AI in H1 alone) and the clear skew of infra pitches toward serving, orchestration, vector/RAG, and observability, a <strong>reasonable estimate is that 60&#8211;70% of AI infrastructure startups</strong> are directly tied to inference adoption curves either as primary revenue (serving) or adjacent spend (data/observability). That range reflects uncertainty: public sources break out &#8220;AI&#8221; as a whole, not &#8220;inference infra&#8221; specifically.</p><p><strong>Correlation pathways:</strong></p><ul><li><p><strong>Capex &#8594; capacity &#8594; price:</strong> Hyperscaler capex raises capacity. More capacity tends to push per-token prices down. Lower prices increase app usage. More usage drives infra demand.</p></li><li><p><strong>Compliance unlock &#8594; big-ticket buyers:</strong> FedRAMP High or equivalent certifications unlock multi-year contracts. Once signed, these generate steady inference flows that support data and observability partners.</p></li><li><p><strong>Model competition &#8594; routing:</strong> As DeepSeek/xAI/Mistral cut costs, app teams start routing workloads by cost/quality. That forces startups to integrate multiple providers and invest in evaluation and guardrails, benefiting orchestration and tooling companies.</p></li></ul><p><strong>Key dependencies:</strong></p><ul><li><p><strong>GPU supply and scheduling tech</strong> (TensorRT-LLM, vLLM, advanced schedulers) are foundational. Without them, margins erode.</p></li><li><p><strong>Network costs</strong> determine whether RAG-heavy apps scale profitably. 
&#8220;Exit&#8221; fee waivers don&#8217;t change daily egress economics. </p></li><li><p><strong>Customer concentration</strong> (CoreWeave-Microsoft) shows platform-level fragility that can cascade to smaller partners. </p></li></ul><h2>13. Regional notes: US, Europe, Asia</h2><p><strong>United States: </strong>The US remains the center of gravity for both demand and supply. FedRAMP authorizations and record hyperscaler capex point to continued expansion.</p><p><strong>Europe: </strong>Regulatory focus on switching costs pushed clouds to waive <em>exit</em> fees, which may encourage multi-cloud inference strategies (EU Data Act and broader scrutiny played a role in the 2024 fee changes). </p><p><strong>China: </strong>A separate but fast-moving market with intense price competition (Baidu, Alibaba Qwen, DeepSeek, ByteDance). Even if cross-border use is limited, the global effect shows up in buyer expectations about what a &#8220;fair&#8221; price per million tokens should be. </p><h2>14. What to monitor (practical checklist for the next 6&#8211;24 months)</h2><ul><li><p><strong>Capex guidance:</strong> Microsoft, Alphabet, Amazon quarterly updates. If spending flattens earlier than expected, expect tighter capacity growth and slower price cuts. </p></li><li><p><strong>Price changes:</strong> OpenAI, Google, Anthropic, Mistral, Cohere, xAI, DeepSeek pricing pages. Track cuts or new &#8220;flash/reasoning&#8221; tiers.</p></li><li><p><strong>Engine releases:</strong> vLLM and TensorRT-LLM release notes. Watch for features like better batching, quantization, and scheduler upgrades that change GPU economics. </p></li><li><p><strong>Compliance milestones:</strong> New FedRAMP/IL-4/5 authorizations across Bedrock, Vertex AI, Azure OpenAI. These correlate with large RFPs in public sector and highly regulated industries.</p></li><li><p><strong>Incident history:</strong> Model/API outages on vendor status pages and developer forums. 
See if customers adopt multi-provider routing as standard. </p></li><li><p><strong>Public comps:</strong> Watch CoreWeave and any follow-ons. Stock performance and disclosures can influence late-stage private rounds and M&amp;A appetite across inference tools. </p></li></ul><h2>15. Bottom line</h2><p>Inference cloud platforms are moving from novelty to utility. Prices are trending down and capacity is trending up. And compliance gates are opening. In the next two years, the most value will accrue to platforms that do three things well: (1) keep GPUs highly utilized with advanced serving stacks (vLLM, TensorRT-LLM, Triton-style backends), (2) meet enterprise governance and compliance needs at scale, and (3) diversify customers and geographies to reduce concentration risk.</p><p>For venture investors, that creates two attractive pockets: (a) <strong>&#8220;picks and shovels&#8221;</strong> that make inference cheaper (serving engines, schedulers, compression, agentic runtimes) and (b) <strong>&#8220;adjacent infrastructure&#8221;</strong> that becomes non-optional as inference scales (vector/RAG stores, eval/observability/guardrails, privacy/security layers). For founders and buyers, the operating playbook is: multi-model routing, local fallbacks, and ruthless attention to utilization and bandwidth.</p><p>The punchline: <strong>the growth of inference will pull a majority of AI infrastructure startups along with it</strong>. The exact share is uncertain. But current capex, pricing, and adoption signals make it hard to see a different center of gravity for AI infrastructure in the near term. Keep tracking capex guidance, price sheets, engine releases, and compliance wins. 
That&#8217;s where the next two years of winners will be decided.</p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #2: AI BROWSERS]]></title><description><![CDATA[Companies that build, enable, and sell AI-native browsers]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-2-ai-browsers</link><guid isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-2-ai-browsers</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Thu, 04 Sep 2025 16:03:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!d0Je!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. Snapshot and thesis</h2><p>Browser wars are happening again! This time with AI fueling the frenzy. AI browsers blend a familiar web browser with built-in AI that can summarize pages, answer questions, auto-navigate, draft content, and increasingly act for the user. </p><p>Over the next 24 months, this category will test whether AI-first workflows can carve out share in an entrenched browser market. 
It will also test whether they can monetize beyond traditional search ads and create new attach revenue across the cloud and edge stack.</p><p><strong>Core thesis:</strong> The investable opportunity is attractive if (a) AI browsers can win distribution on mobile and desktop despite defaults, (b) they can lower inference costs using in-browser compute and edge inference, and (c) they can convert intent (and time saved) into measurable revenue (ads, subscriptions, affiliate, or payments). </p><p>The upside expands because this category sits on top of and helps drive demand for a number of infra layers: GPUs, WebGPU / ONNX Runtime Web / TensorFlow.js, model APIs (OpenAI, Anthropic, Google), edge inference platforms (Cloudflare, Akamai), and search indexes (Brave Search, Bing, Perplexity&#8217;s own). </p><p><strong>Why now:</strong> Two enabling changes have landed: (1) web-native acceleration (WebGPU, maturing WASM toolchains, early WebNN), so models can run in the browser, and (2) policy/UX shocks that open distribution (EU DMA forcing Apple to allow alternative engines in the EU, Google&#8217;s AI Overviews changing search result pages and traffic patterns). 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d0Je!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d0Je!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d0Je!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png" width="768" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:883653,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/172282812?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d0Je!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!d0Je!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff288bf-cbd6-40df-9fa3-411310724637_768x512.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>2. What counts as an &#8220;AI browser&#8221; and who are the players</h2><p>In this report, &#8220;AI browsers&#8221; will include any web browser or browser-like app where AI is a native control surface (not just an extension). That spans:</p><ul><li><p><strong>Established browsers adding AI</strong>:<br><strong>Google Chrome:</strong> AI Overviews in search; &#8220;Help me write,&#8221; &#8220;Tab Organizer&#8221;.<br><strong>Microsoft Edge:</strong> Copilot built-in<br><strong>Opera:</strong> Aria assistant via Composer, on desktop, Android, and Opera Mini. Neon, an AI-native browser slated for public rollout. 
MiniPay crypto wallet integrated.<br><strong>Brave</strong>: Leo assistant, independent Brave Search, opt-in Brave Ads with 70% revenue share to users<br><strong>Mozilla Firefox</strong>: Exploring on-device and private AI assistants. Firefox remains a distribution gate even as AI integrations evolve. Context on iOS engine policy below.</p></li><li><p><strong>AI-native challengers</strong>:<br><strong>Perplexity</strong>: Comet AI browser. Talks with OEMs for preinstalls. Consumer MAU and revenue run-rate scaled markedly in 2025. <br><strong>BrowserBase</strong>: A web browser for AI agents. They recently teamed up with Cloudflare to build an identity layer for AI agents.<br><strong>Anthropic</strong>: They recently launched Claude for Chrome.<br><strong>The Browser Company (Arc)</strong>: AI-heavy &#8220;Browse for me&#8221; style features and agentic flows. <br><strong>SigmaOS</strong> (workflow browser with AI co-pilot). <strong>Felo/Fellou</strong> and <strong>Strawberry</strong> (smaller AI browser entrants). <br><strong>You.com</strong>: AI search + browser-style app. Enterprise agents and a web search API.<br><strong>DuckDuckGo</strong>: DuckAssist summaries and private AI chat embedded in its app. </p></li><li><p><strong>Regional ecosystems</strong>:<br><strong>Baidu</strong> is folding ERNIE models into search and companion apps. <strong>Yandex</strong> upgraded its Alice assistant with YandexGPT 3 across devices. These ecosystems point to AI browsers that are tightly integrated with local search, messaging, and mini-apps. </p></li></ul><h2>3. Market size and trajectory (next 24 months)</h2><p><strong>Installed base:</strong> Browsers are the biggest consumer runtime. Statcounter shows Chrome at ~68%, Safari ~16%, Edge ~5%, with others (Brave, Opera, Firefox, Samsung Internet) making up the balance (12-month window ending July 2025). 
Those shares apply to a global internet user base of ~5.5&#8211;5.65 billion in 2024&#8211;2025.</p><p><strong>Penetration starting point:</strong> AI answers already reach users through default surfaces (Google AI Overviews, Bing/Copilot milestones such as 100 million DAU in 2023), while dedicated AI browsers are in early innings with meaningful momentum: Perplexity reported ~30 million MAU and a ~$150 million ARR-like run-rate by mid-2025. Brave discloses ~93.8 million MAU. Opera reported accelerating ad/search growth with AI-powered monetization and an AI-native Neon rollout in 2025. </p><p><strong>Sizing the near-term &#8220;AI browser&#8221; revenue pool.</strong> This is not a classic TAM. It is a stack of layers:</p><ul><li><p><strong>Ad/search monetization inside the browser</strong>: Sponsored answers, native ad formats, and affiliate/commerce. Google&#8217;s AI Overviews shift click-through patterns. Publishers report measurable volatility, which suggests spend will reallocate toward answer surfaces. </p></li><li><p><strong>Subscriptions</strong>: Perplexity&#8217;s premium, Brave Leo Premium, VPN bundles, and potential &#8220;agent&#8221; fees.</p></li><li><p><strong>Payments/commerce</strong> baked into browsers: Opera MiniPay crossing 8&#8211;9 million activated wallets and &gt;200&#8211;250 million transactions in 2025.</p></li></ul><p><strong>Back-of-envelope scenario (illustrative):</strong> If only 2% of the world&#8217;s internet users (~110 million) adopt an AI browser subscription at a blended $4/month within 24 months, that&#8217;s ~$5.3 billion annualized (110 million &#215; $4 &#215; 12). If another 300 million users generate $1/month in incremental ad/affiliate yield via AI answers and shopping flows, that&#8217;s ~$3.6 billion annualized (300 million &#215; $1 &#215; 12). These are assumptions, not forecasts. The purpose is to show order of magnitude. Actual outcomes hinge on distribution and inference cost curves.</p><p><strong>Growth drivers:</strong><br>a. 
<strong>Distribution policy changes</strong> (EU DMA, choice screens on iOS 17.4 in the EU, and potential antitrust remedies around default search contracts in the US). <br>b. <strong>Lower latency and cost</strong> thanks to WebGPU + ONNX Runtime Web/TensorFlow.js and edge inference (Cloudflare, Akamai). <br>c. <strong>New OEM channels</strong> for AI browsers on mobile (Perplexity&#8217;s Comet preinstall talks). </p><h2>4. Who uses these and why they care</h2><p>For consumers, the job to be done is &#8220;get answers, not links&#8221; plus &#8220;summarize this page, plan this trip, buy the thing, and do it fast&#8221;. AI browsers like Perplexity and Opera Aria collapse the search-read-copy-paste loop. And in Opera Mini, they do it even on low-end Android devices via lightweight UIs. </p><p>For prosumers and teams, the pitch is &#8220;do more with fewer tabs&#8221;. Arc and SigmaOS focus on workspace-style browsing with AI co-pilots to organize and draft. And You.com pushes enterprise agents and a web search API to wire AI into regulated workflows.</p><p>For privacy-sensitive users and publishers, Brave&#8217;s independent index and Leo assistant try to keep data local and links credited. DuckDuckGo keeps AI optional and private.</p><h2>5. Product and technology: how the stack fits together</h2><p><strong>Client-side acceleration.</strong> WebGPU (default in Chrome/Edge since 2023&#8211;2024) plus WASM allows meaningful on-device inference for vision, speech, and small-to-mid LLMs. Microsoft&#8217;s ONNX Runtime Web added a WebGPU execution provider in Feb 2024. TensorFlow.js continues to expand WebGPU support. Early WebNN bridges to native ML APIs. This reduces cloud spend, improves latency, and eases privacy concerns.</p><p><strong>Model options.</strong> Opera&#8217;s Aria routes to OpenAI and Google through its Composer layer (and can switch models), while Brave&#8217;s Leo supports leading closed and open models. 
Perplexity uses a blend (its own retrieval stack plus partner models) and its Comet browser aims to bring that into the navigation layer. </p><p><strong>Open-source tooling (browser-ready).</strong></p><ul><li><p><strong>WebLLM / MLC-LLM</strong> (LLMs compiled for WebGPU, in-browser quantization). </p></li><li><p><strong>Transformers.js</strong> (browser-side transformer inference with JS).</p></li><li><p><strong>llama.cpp</strong> (CPU/GPU-friendly inference, with ports to web via WASM/WebGPU). </p></li><li><p><strong>ONNX Runtime Web</strong> and <strong>TensorFlow.js</strong> (core runtime layers, increasingly WebGPU-accelerated).</p></li></ul><p><strong>Edge inference.</strong> Cloudflare&#8217;s Workers AI and AI Gateway, and Akamai&#8217;s Cloud Inference, push model serving closer to users to cut tail latency and cost. For AI browsers, that shortens round-trips for page-aware actions (summaries, &#8220;shopping compare&#8221;, &#8220;book this&#8221;) and creates an infra partnership surface (shared caches, embeddings, and guardrails at the edge). </p><p><strong>OS hardware tailwinds (indirect but relevant).</strong> Copilot+ PCs ship with NPUs. Apple Intelligence targets A17 Pro and M-series devices. While browsers mainly use the GPU via WebGPU, the hardware trend signals more capable local inference and rising user expectations for on-device AI. </p><h2>6. Distribution and competition</h2><p><strong>Defaults still matter.</strong> Safari and Chrome together exceed 80% market share worldwide. Moving users off defaults is hard. That&#8217;s why Perplexity is negotiating preinstalls and why EU choice screens (iOS 17.4) matter. For private companies, OEM deals and regional bundling can be the difference between niche and mainstream. </p><p><strong>Search stacks split the field.</strong></p><ul><li><p><strong>Google</strong>: AI Overviews change SERP layouts and upstream supply/demand. The company controls both Chrome and Search. 
</p></li><li><p><strong>Microsoft</strong>: Edge + Copilot + Bing give an integrated alternative. Bing crossed 100 million DAU when Chat launched in 2023, a psychological threshold that keeps the flywheel turning. </p></li><li><p><strong>Independents</strong>: <strong>Brave</strong> has its own index (plus Brave Search Ads), a differentiator vs meta-search. <strong>Perplexity</strong> is building index and retrieval infra and layering an agentic browser on top. <strong>BrowserBase</strong> is building web browsers for AI agents and applications.</p></li><li><p><strong>Regional</strong>: <strong>Baidu</strong> (ERNIE in search) and <strong>Yandex</strong> (Alice/YandexGPT) bundle AI across services.</p></li></ul><p><strong>Notable financial traction and signals.</strong></p><ul><li><p><strong>Perplexity</strong>: ~30 million MAU and ~$150 million run-rate as of mid-2025 (press reporting).</p></li><li><p><strong>BrowserBase</strong>: 50 million+ browser sessions in 2025, 1,000+ companies served, and 20,000+ developers signed up (company-reported).</p></li><li><p><strong>Brave</strong>: ~93.8 million MAU (company transparency page, 2025). </p></li><li><p><strong>Opera</strong>: Q2 2025 revenue +30% YoY; ad revenue +44% YoY; MiniPay &gt;9 million activated wallets and &gt;250 million transactions; AI-native Neon moving toward rollout. </p></li></ul><h2>7. Monetization, unit economics, and paths to profit</h2><p><strong>Ad and commerce yield.</strong> AI answers can capture high-intent queries (e.g. comparisons, local services, products) and monetize via native ads, affiliate, or merchant lead gen. Google&#8217;s AI Overviews already alter click flows. Some publishers report big swings, a signal that spend may reallocate toward answer units where the decision happens. For challengers, the question is whether AI answers can command CPC/CPA pricing comparable to today&#8217;s SERP ads at scale. 
</p><p><strong>Subscriptions.</strong> Perplexity&#8217;s paid plans (and Comet tiers), Brave Leo Premium, and bundles (VPN, talk, search premium) provide diversified ARPU. Subscriptions smooth out the inherently cyclical ad budgets and create a budget for inference. </p><p><strong>Revenue sharing and tokens.</strong> Brave shares 70% of ad revenue from opt-in Brave Ads with users via BAT. It&#8217;s one of the few browsers with a transparent user revenue share. For investors, this is a lever for adoption and loyalty, but it shifts margin from company to user. </p><p><strong>Inference cost curve.</strong> Cloud-only answer generation is expensive. But two trends are bending the curve:<br>a. <strong>In-browser inference</strong> offloads smaller and latency-sensitive tasks (summaries, RAG, speech) using WebGPU and WASM. ONNX Runtime Web&#8217;s WebGPU provider launched in Feb 2024. <br>b. <strong>Edge inference</strong> (Cloudflare, Akamai) trims tail latency and egress while enabling semantic caching. Fastly&#8217;s AI Accelerator (semantic caching) illustrates the caching/gateway layer that can sit in front of expensive LLM calls. </p><p><strong>Unit economics (directional).</strong> If an AI browser session involves 1&#8211;2 short model calls (RAG + summary) that can be handled locally or at the edge for pennies, and premium tasks go to GPT-4o/Claude/Gemini only when needed, then gross margins can look similar to those of ad-supported browsers, with improved attach on premium. The mix of local/edge/cloud will be the dominant driver of gross margin over the next 24 months. </p><h2>8. Dependencies, hidden connections, and infra correlations</h2><p><strong>To GPUs (and GPU-like APIs).</strong> WebGPU exposes the device GPU to the web. GPU-accelerated inference in everything from WebLLM to ONNX Runtime Web depends on it. That ties AI browser performance to Chrome/Edge release cadence (and to Metal/Vulkan/DirectX under the hood), raising the value of teams that can squeeze performance from shader kernels and quantization. 
</p><p><strong>To model vendors.</strong> Opera Aria&#8217;s Composer taps OpenAI and Google. Edge integrates Copilot (OpenAI-family models). Perplexity blends its retrieval with partner models. Contract pricing, rate limits, and safety policies at OpenAI, Google, and Anthropic directly affect the UX and gross margin of many AI browsers.</p><p><strong>To search indexes.</strong> Independence matters. Brave&#8217;s own index reduces reliance on Bing. Perplexity is investing in its own web data and partnerships (Comet + OEM). Contract changes at Bing or Google could whipsaw smaller players. </p><p><strong>To edge networks.</strong> Cloudflare&#8217;s Workers AI/Gateway and Akamai&#8217;s Cloud Inference make agentic browsing feel instantaneous and cheaper. Expect deeper commercial tie-ups (shared semantic caches, RAG stores at the edge, abuse prevention) between AI browsers and these networks.</p><p><strong>To mobile OEMs and app stores.</strong> Perplexity&#8217;s preinstall talks hint at a classic &#8220;default wars&#8221; playbook. DMA-driven choice screens on iOS in the EU open a wedge for challengers who design great first-run flows and import tools. These are leverage points for venture-backed challengers. </p><p><strong>Correlations to watch.</strong></p><ul><li><p><strong>Ad market health &lt;&gt; AI answer RPMs.</strong> If SERP budgets migrate into AI answer units, browsers that own the answer surface will capture outsized upside. </p></li><li><p><strong>WebGPU maturity &lt;&gt; local inference share.</strong> As WebGPU and WebNN mature, more workload moves on-device, improving margins and privacy. </p></li><li><p><strong>Policy changes &lt;&gt; install base churn.</strong> DMA-style changes and U.S. remedies on default contracts could shift share faster than organic marketing can. </p></li></ul><h2>9. Risks and how to read them early</h2><p><strong>Distribution lock-in.</strong> Chrome and Safari dominate. 
Even great products struggle to overcome defaults and habit. Early warning: OEM deals failing to convert, short-lived spikes post-PR, and low stickiness after first-run. </p><p><strong>Traffic and publisher backlash.</strong> AI answers that absorb demand without sending clicks risk regulatory and ecosystem pushback. That could mean lawsuits, policy proposals, and an uptick in publishers paywalling content and blocking crawlers. Publishers have already reported volatility post-AI Overviews. </p><p><strong>Safety and compliance.</strong> Browsers delivering AI answers at scale must handle hallucinations, defamation, and local content controls. Edge caches can repeat bad answers faster. Human-in-the-loop review and retrieval quality become key controls. </p><p><strong>Inference cost blowouts.</strong> If workloads stay cloud-heavy, COGS can swamp subscription ARPU. Watch the ratio of local/edge/cloud calls. Follow releases from ONNX Runtime Web/TensorFlow.js and edge providers that measurably cut cost. </p><p><strong>Mobile platform friction.</strong> Apple&#8217;s EU engine carve-out helps, but outside the EU the WebKit requirement remains. Android OEM deals can be fragile. Track Apple/WebKit changes, iOS adoption of alternative engines (EU-only), and OEM preinstall terms. </p><p><strong>Regional complexity.</strong> In China and parts of the CIS, local giants (Baidu, Tencent, Yandex) integrate AI into super-apps and search, making it hard for foreign AI browsers to grow. Follow ERNIE, Hunyuan, and DeepSeek integrations across search, messaging, and app stores. </p><h2>10. Company snapshots (what&#8217;s differentiated)</h2><p><strong>Google (Chrome + AI Overviews).</strong> Control over the browser and the ad engine with AI Overviews shifting &#8220;where decisions get made&#8221;. The risk is publisher backlash and regulatory scrutiny if traffic declines persist. </p><p><strong>Microsoft (Edge + Copilot).</strong> &#8220;Assistant in the browser&#8221; is clear. 
Crossing 100 million DAU on Bing in 2023 showed a step-change in engagement once chat arrived. The new Copilot+ PC push raises user expectations for local AI, which can complement WebGPU workloads in Edge.</p><p><strong>Opera (Aria, Neon, MiniPay).</strong> Clear AI narrative, strong emerging-market exposure via Opera Mini, and a fintech angle through MiniPay (8&#8211;9 million wallets, &gt;200&#8211;250 million transactions). Q2 2025 brought 30% revenue growth and 44% ad growth, with Neon on the horizon. Execution on Neon and MiniPay monetization are the key tells. </p><p><strong>Brave (Leo, Search, BAT).</strong> A vertically integrated, privacy-centric stack (browser + index + ads) with ~93.8 million MAU and a distinctive user revenue-share model (70% to users for Brave Ads). Watch whether Leo and premium bundles lift ARPU without undercutting ad take. </p><p><strong>Perplexity (Comet, OEM).</strong> A consumer AI brand moving down-funnel into a browser. MAU and revenue grew fast in 2025. OEM preinstalls could be a distribution unlock. Execution risks are quality at scale, cost containment, and navigating platform politics.</p><p><strong>BrowserBase.</strong> In June 2025, it closed a $40 million Series B led by Notable Capital. It launched Director (a no-code web-automation product), signaling broader demand beyond developers. The company says it has supported 50 million+ browser sessions in 2025, serves 1,000+ companies, and has 20,000+ developers signed up. It also reports 100 million+ usage minutes billed per month, hundreds of paying customers, and &#8220;millions in revenue in its first year&#8221;, plus a 17% month-over-month increase in active subscribers after a pricing tweak. 
The Stagehand TypeScript repo shows ~16.7k GitHub stars (with companion repos like the MCP server at ~2.5k and open-operator at ~1.8k).</p><p><strong>You.com / DuckDuckGo / Arc / SigmaOS / Strawberry / Felo.</strong> These round out the spectrum from enterprise agents (You.com) to private AI answers (DuckAssist) to workflow-first browsers (Arc, SigmaOS) and niche AI browsers. Traction will hinge on a wedge (enterprise compliance, private AI, or a unique workflow) and a durable acquisition channel. </p><p><strong>Regional giants (Baidu, Yandex).</strong> Tight integration with search, mini-apps, and super-apps (Alice, ERNIE) produces &#8220;AI browsers&#8221; that are really gateways into national ecosystems. Good defensive moats locally. Tough export story. </p><h2>11. Ties to the infra layer and how value flows upstream</h2><ol><li><p><strong>GPU and driver stacks.</strong> Every time an AI browser runs a local summary with WebGPU, that&#8217;s incremental demand for GPU-capable devices and the driver/runtime work behind them (DirectX 12, Vulkan, Metal). When more work happens locally, latency shrinks and conversion improves &#8212; creating a measurable ROI story that justifies GPU-capable endpoints. </p></li><li><p><strong>Model APIs and gateways.</strong> AI browsers are high-variance demand generators for OpenAI, Anthropic, and Google APIs. Edge gateways (Fastly AI Accelerator, Cloudflare AI Gateway) smooth demand with semantic caching, hydrate RAG stores, and cap spend bursts &#8212; a surprisingly material piece of the puzzle for unit economics. </p></li><li><p><strong>Edge inference/CDN.</strong> Akamai&#8217;s Cloud Inference (3x throughput, up to 2.5x lower latency claims) is built exactly for the &#8220;answer right now&#8221; patterns AI browsers need. Expect joint solutions e.g. 
shared embeddings, abuse/fraud screens.</p></li><li><p><strong>Search indexes and crawling.</strong> Brave&#8217;s independent index and Perplexity&#8217;s retrieval investments reduce dependence on Bing/Google contracts. If remedies in the U.S. limit default search payments, the browser layer becomes a contested distribution node. This raises the strategic value of owning the index. </p></li><li><p><strong>Open-source toolchains.</strong> WebLLM/MLC-LLM, Transformers.js, llama.cpp, ONNX Runtime Web, and TensorFlow.js are now production-relevant. They compress costs and unlock offline/private modes. Infra vendors that optimize for these (developer tooling, observability, safety filters) will capture spend. </p></li></ol><p><strong>Hidden connection:</strong> as AI answers shift intent capture from SERP pages to &#8220;in-browser panels&#8221;, <strong>affiliate and performance marketing networks</strong> (and the edge CDNs that carry them) become part of the infra story: rate-limiters, link-resolvers, and fraud filters move closer to the browser. Fastly&#8217;s semantic caching is a preview. Expect Cloudflare, Akamai, and even payment processors to offer &#8220;AI answer commerce&#8221; kits.</p><h2>12. Regulation and platform dynamics</h2><ul><li><p><strong>EU DMA and iOS engines.</strong> Apple opened the door to non-WebKit engines in the EU (iOS 17.4), which can shift mobile distribution for AI browsers there. Outside the EU, WebKit remains required, tempering feature parity (e.g. WebGPU capabilities) for iOS users. </p></li><li><p><strong>U.S. search remedies.</strong> Proposed measures target default search contracts and possibly Chrome-search bundling. Outcomes could reshape browser distribution economics. Timing matters for venture pacing. </p></li><li><p><strong>Publisher relations.</strong> As AI answers expand, expect more licensing partnerships (Perplexity&#8217;s publisher deals) and more traffic-sharing proposals to defuse ecosystem friction. 
</p></li></ul><h2>13. What to watch in the next 24 months</h2><p><strong>Catalysts.</strong></p><ul><li><p><strong>Perplexity Comet GA + OEM preinstalls</strong> (distribution test).</p></li><li><p><strong>Opera Neon public rollout and MiniPay monetization progress</strong> (ad/search + fintech blend). </p></li><li><p><strong>Chrome/Edge WebGPU &amp; WebNN releases</strong> that materially improve on-device LLM speeds (watch ONNX Runtime Web and TF.js releases). </p></li><li><p><strong>Google&#8217;s AI Overviews dialing</strong> (frequency, layout, ad load) and publisher response. </p></li><li><p><strong>EU/U.S. remedies on defaults</strong> and any choice-screen expansions. </p></li></ul><p><strong>Early warning indicators.</strong></p><ul><li><p><strong>COGS/ARPU gaps</strong> widening (cloud-heavy inference, no local/edge offload).</p></li><li><p><strong>Short-session stickiness</strong> deteriorating (AI panels opened but not used &gt;2&#8211;3 times per day).</p></li><li><p><strong>Publisher/legal friction</strong> rising (blocked crawlers, suits, or higher licensing costs).</p></li></ul><p><strong>What would change the call (positive).</strong></p><ul><li><p>WebGPU/WebNN improvements that enable a standard, fast, small-model local stack across mainstream devices (Chrome, Edge, Safari). </p></li><li><p>Clear OEM distribution wins (preinstall + retention), validating a non-default route to tens of millions of users. </p></li><li><p>Proven ad RPMs or conversion data from AI answer units on par with classic SERP ads, with credible attribution.</p></li></ul><p><strong>What would change the call (negative).</strong></p><ul><li><p>Strong legal remedies that limit AI answer units or impose heavy licensing costs per snippet.</p></li><li><p>Performance ceilings on iOS WebKit that keep AI features lagging, capping mobile growth outside the EU. </p></li><li><p>A plateau in WebGPU adoption or instability that undermines local inference economics. </p></li></ul><h2>14. 
Where to place venture bets</h2><p><strong>Applications (consumer and prosumer).</strong> Founders with a distribution wedge (OEM, region, or workflow) and a clear cost plan (local + edge + cloud) seem to be in prime position. Perplexity (consumer brand + OEM motion), BrowserBase (web browsers for agents), Arc/SigmaOS (workflow depth), You.com (enterprise agents with a browser-like UX), and privacy-centric stacks (Brave) each represent one of these wedges. The de-risked angle is to fund attachments (research agents, shopping copilots, or vertical modules) that live inside multiple browsers. </p><p><strong>Infrastructure products.</strong> The biggest returns may accrue to the layers that let AI browsers run cheaply and instantly:</p><ul><li><p><strong>Web runtimes</strong> (ONNX Runtime Web, TF.js-compatible optimization services, model-to-WebGPU compilers like MLC). </p></li><li><p><strong>Edge inference</strong> with semantic caching, abuse controls, and per-publisher licensing logic (Cloudflare Workers AI/AI Gateway, Akamai Cloud Inference, Fastly AI Accelerator). </p></li><li><p><strong>Search infra</strong> (crawler/index as a service for AI UIs, embeddings storage, and freshness pipelines). Brave&#8217;s independent path and Perplexity&#8217;s moves show the strategic value of index control. </p></li></ul><p><strong>Network products.</strong> Invest where <strong>answers meet commerce</strong>: API gateways that price by semantic similarity, affiliate routers tuned for AI panels, and link-resolver services that run on CDNs. Early evidence: Fastly&#8217;s semantic caching and Akamai&#8217;s claims on cost/latency improvements.</p><p><strong>Open-source leverage.</strong> Back maintainers and vendors who package WebLLM/MLC-LLM, llama.cpp, or Transformers.js for enterprise browser deployments with management, policy, and observability (the &#8220;Vercel for in-browser AI&#8221; slot). 
</p><h2>Closing thoughts</h2><p>AI browsers are credible venture targets if you believe that (1) answer-first experiences will capture high-intent moments inside the browser, (2) local/edge inference will make those experiences fast and cheap, and (3) distribution wedges (DMA choice screens, OEM deals, differentiated workflows) can overcome default gravity. </p><p>The opportunity is in the apps as well as the &#8220;hidden pipes&#8221; that let those apps feel instant, safe, and affordable at scale. Over the next 24 months, watch three signals: <strong>distribution wins, cost curves, and ad/commerce RPMs</strong>. If two of the three break right, this sector can compound and drag a lot of infra value up with it.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Startup Tracker #5 - Signals, Links, and What They Mean for the Stack]]></title><description><![CDATA[Compute, model serving, orchestration, identity, agents]]></description><link>https://www.infrastartups.com/p/startup-tracker-5-signals-links-and</link><guid isPermaLink="false">https://www.infrastartups.com/p/startup-tracker-5-signals-links-and</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Mon, 01 Sep 2025 22:09:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tSCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>1. 
Snapshot of the week</h2><p>Recent data points to steady execution rather than headline-chasing. Integrations and partnerships dominate with AWS showing up far more than other clouds. Security, compliance, and data-center realities (power and cooling) cut across many items. Product releases skew toward &#8220;make this production-grade&#8221; over novelty: faster serving, clearer SLAs, safer defaults, and easier setup. The shape of demand is practical. Customers need predictable latency, policy controls, and measurable ROI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tSCL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tSCL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tSCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png" width="768" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:694043,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/172521276?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tSCL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!tSCL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3559ae02-2f83-4c3d-9f94-38a561d14b30_768x512.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h2>2. Compute supply meets power and cooling</h2><p>Performance gains are real, but the bottleneck has shifted to watts and thermals. Groq emphasized low-latency, deterministic throughput. This is vital for voice and agent loops. </p><p>On the training and inference side, we have companies like Cerebras, SambaNova, and Lambda Labs. Their updates highlight the arms race for scale, but multiple notes tie progress back to data-center constraints: immersion cooling, rack density, and power planning. </p><p>The dependency is stark. 
Even the best model stack is gated by energy availability and thermal envelopes. <strong>Expect more vendors to publish &#8220;performance per dollar per watt&#8221;, not just tokens per second.</strong></p><p><strong>Implication:</strong> Buyers should demand SLOs that include cache hit assumptions and queue visibility. Builders should make watt-aware autoscaling and capacity forecasts first-class features.</p><h2>3. Model serving and runtimes: production over novelty</h2><p>Together AI, Fireworks AI, Anyscale, Fal AI, Modular, Baseten, Replicate, and Banana.dev all push toward &#8220;<strong>one API, many reliable backends</strong>&#8221;. The common thread is multi-model routing, fast cold starts, cost caps, and per-route safety policies, plus knobs for batch size, caching, and rate limits. The risk isn&#8217;t vendor lock-in so much as operational complexity. Platforms that hide the multi-provider mess while exposing policy-level controls are winning deals.</p><p><strong>Example:</strong> Fireworks and Together lean into scalable serving. Anyscale and Modal stress cluster-grade reliability. Fal AI simplifies deploying custom endpoints. For app teams, the new baseline is &#8220;swap models on Tuesday without breaking Friday deploys&#8221;.</p><h2>4. Data plumbing and activation: tight loops beat new data stores</h2><p>Hightouch earned recognition for activation and journey orchestration, signaling that reverse-ETL has matured into measurable value. LakeFS pushes versioned data and reproducibility. Featureform and Tecton pitch feature stores that bridge data teams and ML. Chroma DB and Activeloop show up in RAG workflows tied to documentation search and support deflection. Airbyte continues to be the connective tissue for sources.</p><p><strong>Pattern:</strong> The market rewards closed loops: source systems &#8594; cleaned entities &#8594; features/embeddings &#8594; outcomes. 
Tools that translate RAG plumbing into &#8220;fewer support tickets&#8221; or &#8220;faster onboarding&#8221; outpace generic retrieval benchmarks. Risk lives in silent failures like stale corpora, drifting chunking, and unmonitored caches.</p><h2>5. Agent reliability becomes the moat</h2><p><strong>Temporal is the quiet backbone of long-running, multi-step work</strong>. And this is exactly what agent systems need for retries, tool calls, human-in-the-loop, and checkpointing. Dagster, Prefect, and Dagger updates point the same way: idempotency, lineage, and policy as defaults. Coding agents (e.g. Cline) depend on these guarantees to avoid duplicate actions or deadlocks.</p><p><strong>Buyer checklist:</strong> Can the system recover from partial failure without babysitting? Does it store the <em>why</em> (prompts, tool calls, responses) as well as the <em>what</em> (status codes)? Can policy (PII hints, cost ceilings, VIP users) stop or reroute flows at runtime?</p><h2>6. Observability, evals, and safety: from &#8220;nice&#8221; to &#8220;blocking&#8221;</h2><p>Evidently AI shipped guidance and tooling that moves evals from notebooks into CI/CD. Superwise and Fiddler emphasize production monitoring and explainability. Arize, Comet, and Honeycomb show up where teams want drift alerts, prompt regression tests, and business metrics tied to model changes. PromptFoo remains a common choice for prompt testing. The connective tissue is measurement: changes to models, prompts, or retrieval must link to acceptance rates, NPS, and cost per interaction.</p><p><strong>Tactical move:</strong> Adopt opinionated defaults (starter test suites, coverage metrics, &#8220;fail the build&#8221; safety checks) so product and compliance can sign off without running bespoke experiments every time something changes.</p><h2>7. Security, identity, and governance now baked into design</h2><p>Security pops up in nearly every layer. 
Teleport for secure access and context, Aserto and Oso for authorization, Permit.io and Stytch for identity flows, Credo AI for governance. The dependency risk is upstream IAM and secrets. Many teams lean on cloud KMS and provider SDKs. If scopes, tokenization, or quotas change upstream, downstream systems can break in surprising ways. <strong>Treat every tool a model calls as an untrusted boundary</strong>. Standardize redaction and approval. Assume prompts and tool calls are records subject to retention.</p><p><strong>Good sign:</strong> Vendors are converging on safer context injection patterns and RBAC that travel with requests, not just services.</p><h2>8. GTM reality: integrations move deals, hyperscalers set gravity</h2><p>Integrations outperformed net-new features as deal accelerants. Hightouch&#8217;s edge is its native wiring into CRMs, ad platforms, and warehouses. As mentioned earlier, AWS appeared far more often this week than Azure or GCP, which reflects customer center of mass and marketplace pull. Vercel, Netlify, Supabase, Render, Railway, Zeet, and Fly.io each show momentum by meeting developers where they already ship.</p><p><strong>Risk:</strong> Concentration. If a major partner tweaks pricing or marketplace terms, CAC mechanics can fail overnight. 
Keep a viable &#8220;no-hyperscaler&#8221; path (self-host, on-prem-friendly, or sovereign options) especially for EU and regulated buyers.</p><h3>Closing view</h3><p>The stories link cleanly across the stack: </p><ul><li><p><strong>Compute</strong> optimizes for predictability and energy</p></li><li><p><strong>Runtimes</strong> collapse complexity behind policy</p></li><li><p><strong>Data</strong> tools turn RAG into repeatable business value</p></li><li><p><strong>Orchestration</strong> makes <strong>agents</strong> reliable</p></li><li><p><strong>Evals</strong> connect changes to outcomes</p></li><li><p><strong>Security</strong> shifts left into design</p></li><li><p><strong>Integrations</strong> with the customer&#8217;s existing platforms move the pipeline. </p></li></ul><p>For founders, the opportunity is to collapse handoffs. <strong>Ship opinionated paths from data to decision with reliability, policy, and measurement built-in</strong>. For buyers, favor vendors that publish evals, integrate natively, and can explain not just &#8220;how fast&#8221; but &#8220;how predictably, at what cost, under what controls&#8221;.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Sector Deep Dive #1: REINFORCEMENT LEARNING]]></title><description><![CDATA[Companies that build and sell Reinforcement Learning products]]></description><link>https://www.infrastartups.com/p/sector-deep-dive-1-reinforcement</link><guid 
isPermaLink="false">https://www.infrastartups.com/p/sector-deep-dive-1-reinforcement</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Thu, 28 Aug 2025 16:03:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!u6fi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>1. The big picture you might not expect</h3><p>Reinforcement Learning (RL) is often treated like a moonshot: amazing demos but not very dependable in production. But the past few years tell a different story. There have been a handful of very practical deployments and a maturing toolchain around simulators, data, and post-training. And they&#8217;re quietly turning RL into a repeatable product category for operations, robotics, and model alignment. </p><p>Here are the unexpected parts that matter over the next 12&#8211;24 months:</p><ul><li><p><strong>RL has already paid real, measurable bills in production.</strong> DeepMind&#8217;s control system for Google data centers cut cooling energy by <strong>up to 40%</strong> when first deployed on human-in-the-loop settings in <strong>July 2016</strong>. And <strong>from August 2018,</strong> Google ran a fully autonomous controller that delivered ongoing energy savings across multiple sites. Both deployments were publicly documented by the teams involved, which is rare for infra case studies.</p></li><li><p><strong>Instruction-following AI exists because RL works at scale.</strong> OpenAI&#8217;s InstructGPT paper (posted <strong>March 4, 2022</strong>) showed human evaluators preferred outputs from a <strong>1.3B</strong> parameter RLHF-tuned model over the original <strong>175B</strong> GPT-3. 
That single result is why almost every modern model stack includes RLHF (Reinforcement Learning from Human Feedback) or a cousin such as DPO (Direct Preference Optimization) or GRPO (Group Relative Policy Optimization) in production alignment.</p></li><li><p><strong>Simulation is the hidden kingmaker.</strong> Microsoft&#8217;s Project Bonsai and <strong>Siemens</strong> demonstrated <strong>&gt;30x</strong> faster CNC auto-calibration in <strong>May 2018</strong>, with a domain expert (not an ML specialist) building the agent. Today Bonsai plugs into <strong>MathWorks Simulink</strong> and <strong>AnyLogic</strong>, letting teams train safely in simulation and ship to factory lines with fewer surprises. </p></li><li><p><strong>Grocery and 3PL logistics are RL&#8217;s commercial wedge.</strong> <strong>Ocado Group</strong> bought <strong>Kindred Systems</strong> and <strong>Haddington Dynamics</strong> for <strong>$262M</strong> and <strong>$25M</strong>, respectively, in <strong>November 2020</strong>, explicitly citing deep RL in Kindred&#8217;s picking approach. <strong>Covariant</strong> is live with 3PL <strong>Radial</strong>, and <strong>Symbotic</strong> expanded its Walmart partnership on <strong>January 16, 2025</strong> by acquiring Walmart&#8217;s robotics unit for <strong>$200M</strong> and securing a <strong>$520M</strong> development program. This is evidence that large retailers will keep writing large checks for autonomy that improves unit economics.</p></li><li><p><strong>Education-to-enterprise funnels are changing shape.</strong> <strong>AWS</strong> is retiring the centrally hosted DeepRacer League after <strong>2024</strong>. The service remains in the console through <strong>December 2025</strong> and transitions to an <strong>AWS Solution</strong> you can run in your own account. Expect &#8220;inside-the-enterprise&#8221; leagues connected to internal simulators and proprietary data. 
</p></li><li><p><strong>The product stack is global.</strong> The most credible RL product proof points span the <strong>US</strong> (Microsoft, AWS, Covariant, Symbotic), <strong>UK</strong> (DeepMind, Ocado), <strong>Germany</strong> (BioNTech&#8217;s <strong>January 2023</strong> acquisition of <strong>InstaDeep</strong>), and <strong>China</strong> (Baidu&#8217;s RL work in robotics and autonomy). This matters because buyers often prefer local support and local compliance expertise for factory deployments. </p></li><li><p><strong>Capital markets haven&#8217;t given up on RL.</strong> <strong>JPMorgan</strong>&#8217;s LOXM project (publicly discussed <strong>2017&#8211;2018</strong>) used policy learning for execution with guardrails and supervision. Expect more &#8220;bandits + rules + audits&#8221; than end-to-end black-box RL in finance. </p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u6fi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u6fi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!u6fi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!u6fi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 1272w, 
https://substackcdn.com/image/fetch/$s_!u6fi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u6fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png" width="768" height="512" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:919284,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/171920628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u6fi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!u6fi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!u6fi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 
1272w, https://substackcdn.com/image/fetch/$s_!u6fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F058d3991-403a-4a0f-8a1a-eac54e7c9735_768x512.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>2. Where RL is actually delivering value (and why it&#8217;s defensible)</h3><p><strong>Industrial control / energy efficiency.</strong> Datacenter cooling is a canonical example because the objectives are simple (keep PUE low, avoid thermal excursions) but the dynamics are messy. 
DeepMind showed <strong>up to 40%</strong> cooling energy reduction (2016) followed by an autonomous controller saving roughly <strong>30%</strong> across multiple Google sites (2018). This shows RL can run <strong>24/7</strong> provided you bound the action space and keep human oversight for safety. The buyer value here is repeatable OPEX savings, not marginal accuracy points.</p><p><strong>Precision calibration and tuning.</strong> Siemens and Microsoft&#8217;s Project Bonsai demonstrated a <strong>&gt;30x</strong> speed-up for CNC calibration in <strong>2018</strong>. One axis calibrated in <strong>~13 seconds</strong> while matching expert precision, and the system was built by a subject-matter expert using &#8220;machine teaching&#8221; rather than a research team writing algorithms. Longevity matters: Bonsai is now integrated with <strong>Simulink</strong> and <strong>AnyLogic</strong>, making sim-to-plant workflows more accessible to control engineers.</p><p><strong>Warehouses and last-mile retail logistics.</strong> Logistics-grade picking demands adaptable perception and control. <strong>Kindred</strong> (now part of <strong>Ocado</strong>) leaned on deep RL for dexterous piece-picking. <strong>Ocado</strong> keeps emphasizing RL across its &#8220;on-grid pick&#8221; automation. <strong>Covariant</strong> landed deployments with <strong>Radial</strong> (announced <strong>February 10, 2023</strong>). <strong>Symbotic</strong> posted <strong>$1.822B</strong> FY-2024 revenue and deepened Walmart ties in <strong>January 2025</strong>, adding hundreds of accelerated pickup &amp; delivery centers (APDs) to its roadmap. This is an ecosystem signal that warehouses will standardize on platforms where RL can be an embedded component.</p><p><strong>Biotech and complex optimization.</strong> <strong>BioNTech</strong> agreed to acquire <strong>InstaDeep</strong> in <strong>January 2023</strong> (deal up to <strong>~&#8364;562M</strong>). 
While much of the public narrative focused on discovery, the near-term operational wins are often in scheduling, experiment design, and supply-chain optimization. These are classic RL-friendly problems with constrained action spaces and strong simulators.</p><p><strong>Quant/execution.</strong> <strong>JPMorgan</strong>&#8217;s LOXM (first reported <strong>July 2017</strong>) used RL concepts to improve trade execution. The design takeaway for startups is not &#8220;ship end-to-end RL&#8221;, but <strong>wrap RL with supervision, audit logs, and rule-based safety</strong>. That&#8217;s how you pass risk committees.</p><h3>3. Who buys, why they sign, and what convinces them</h3><p><strong>Plant managers and control engineers</strong> buy when you can prove <strong>bounded exploration</strong> (no runaway actuators) and <strong>short time-to-value</strong>. The <strong>Siemens + Bonsai</strong> result lands because it collapsed calibration time to seconds on some axes without sacrificing precision, and it did so with a domain expert building the agent rather than a research lab parachuting in. That makes RL feel like a tool, not a science project.</p><p><strong>Logistics and e-commerce operations leaders</strong> buy steadier throughput and fewer mis-picks from systems that integrate cleanly with WMS/ERP. Kindred&#8217;s history with <strong>Gap</strong> and <strong>American Eagle</strong> pre-acquisition showed real merchant tolerance for RL-powered picking. <strong>Covariant</strong>&#8217;s <strong>Radial</strong> rollout demonstrates that 3PLs (who standardize across many sites) are willing to pick a platform and expand. <strong>Symbotic</strong>&#8217;s Walmart expansion underscores that once a retailer standardizes, follow-on scope (like APDs) can be large and fast. 
</p><p><strong>CIO/CTO buyers in regulated industries</strong> will ask for <strong>verification</strong> (what happens under edge cases?), <strong>observability</strong> (what did the policy see and do?), and <strong>roll-back</strong> (how do we safely disable and revert?). Vendors need to bundle formal verification or &#8220;safe RL&#8221; claims with simulator-backed testing, using something like <strong>NVIDIA Isaac Sim</strong> for robotics or <strong>AnyLogic</strong> for discrete-event systems. Vendors that do so get a smoother reception when pilots transition to production.</p><p><strong>Data science and LLM platform teams</strong> buy RLHF-style post-training to make models useful to end users. The <strong>InstructGPT</strong> result (a 1.3B-parameter model beating a 175B-parameter one on human preference) remains a watershed that budget owners still cite when defending RLHF spend. </p><h3>4. The product stack behind successful RL deployments</h3><p><strong>Simulators and digital twins.</strong> You don&#8217;t let a learning controller &#8220;trial-and-error&#8221; on a live kiln or warehouse unless it has practiced extensively in a high-fidelity simulator. That&#8217;s why connectors and toolchains matter:</p><ul><li><p><strong>MathWorks Simulink + Microsoft Project Bonsai</strong> (announced <strong>May 19, 2020</strong>) allows control engineers to reuse existing Simulink models as training environments.</p></li><li><p><strong>AnyLogic + Project Bonsai</strong> (announced <strong>July 14, 2020</strong>) supplies an official connector and wrapper for quick simulator hookups, which is good for factories and logistics networks modeled in discrete-event or agent-based styles.</p></li><li><p><strong>NVIDIA Isaac Sim</strong> provides physics-accurate robot simulation to train and test RL policies before touching the real arm. 
</p></li></ul><p><strong>Foundational RL libraries:</strong></p><ul><li><p><strong>Ray RLlib</strong> (Anyscale) remains a widely used distributed RL library that &#8220;just works&#8221; at cluster scale. </p></li><li><p><strong>Unity ML-Agents</strong> bridges game-quality 3D simulation with RL for robotics and control. </p></li><li><p><strong>Gymnasium</strong> (community successor to <strong>OpenAI Gym</strong>) standardized the environment API used across the ecosystem. </p></li><li><p><strong>TF-Agents</strong> (Google&#8217;s TensorFlow team) is still useful where TensorFlow is entrenched. </p></li><li><p><strong>Intel Coach</strong> is an older but illustrative example of chip-vendor RL tooling from Intel&#8217;s AI lab. </p></li></ul><p><strong>Cloud RL services:</strong></p><ul><li><p><strong>Microsoft Project Bonsai</strong> (Bonsai acquired <strong>June 2018</strong>, public preview <strong>May 2020</strong>) focuses on &#8220;machine teaching&#8221; for subject-matter experts and integrates with leading simulators. </p></li><li><p><strong>AWS SageMaker RL</strong> (announced <strong>November 28, 2018</strong>) offers managed RL containers and RLEstimator. It integrates toolkits like RLlib and Intel Coach and supports commercial/custom environments.</p></li></ul><h3>5. Libraries and frameworks: how to choose (fast)</h3><p>Teams keep asking the same question: &#8220;Which RL/RLHF stack do we pick, and when?&#8221; Here&#8217;s a practical guide grounded in currently maintained projects and their stated capabilities:</p><ul><li><p><strong>Need something quick that fits Hugging Face? &#8594; Use TRL.</strong> <br>Hugging Face&#8217;s <strong>TRL</strong> library has ready-made trainers (PPO, DPO, GRPO) and copy-paste examples that work with the HF model ecosystem. It&#8217;s the fastest way to get an RLHF loop running without standing up lots of infra. </p></li><li><p><strong>Training very large models, from one GPU up to massive clusters? 
&#8594; Use NVIDIA NeMo-RL or ByteDance&#8217;s verl.</strong><br><strong>NeMo-RL</strong> targets production-scale RLHF for LLMs (100B-class model claims in docs/marketing) and integrates with NeMo&#8217;s distributed training stack. <strong>verl</strong> (from ByteDance/Volcengine) is an open-source RLHF system designed for speed and scale. If you&#8217;re already in NVIDIA land, NeMo-RL is the natural fit. If you want a lean OSS stack that scales, <strong>verl</strong> is a strong option. </p></li><li><p><strong>Already run a Megatron/vLLM/Ray-style cluster and want a full RLHF setup? &#8594; Use Alibaba ROLL or Zhipu/THUDM SLiME.</strong><br><strong>ROLL</strong> (Alibaba) focuses on high-throughput RLHF for big GPU fleets. <strong>SLiME</strong> (THUDM/Zhipu AI ecosystem) explicitly connects <strong>Megatron</strong> training with <strong>SGLang</strong> serving for scaled RLHF. Both target production post-training on large clusters. (If you&#8217;re deeply on Ray/vLLM, <strong>OpenRLHF</strong> is also built exactly for that combo.) </p></li><li><p><strong>Training across many separate or partly untrusted machines? &#8594; Use prime-rl.</strong><br><strong>prime-rl</strong> is a fully asynchronous, distributed RL/RLHF system designed for flaky or heterogeneous clusters. Its authors used it to train the <strong>INTELLECT-2</strong> model. If your infra looks like a federation of nodes rather than a tidy HPC cluster, this is built for you. </p></li><li><p><strong>Want RL for tool-using chatbots/agents right now? &#8594; Try SkyRL or OpenPipe/ART.</strong><br><strong>SkyRL</strong> (Sky Computing/UC Berkeley contributors) includes an &#8220;agent gym&#8221; for long-horizon tool use and evaluation. <strong>ART</strong> (from OpenPipe) focuses on reliable <strong>GRPO</strong> training for agents, with practical recipes rather than a heavy platform. These are aimed squarely at agentic tasks, not only static benchmarks. 
</p></li><li><p><strong>Mostly doing instruction-tuning (not full RL)? &#8594; Use AI2&#8217;s Open-Instruct.</strong><br><strong>Open-Instruct</strong> from the Allen Institute (AI2) is a clean, simple codebase for instruction/post-training pipelines. It&#8217;s great when you don&#8217;t need RL loops. </p></li><li><p><strong>Just want the core DPO/PPO bits in plain PyTorch? &#8594; Use torchtune.</strong><br><strong>torchtune</strong> (Meta) ships PyTorch-native recipes and losses for PPO, DPO, and GRPO without the extra layers of large frameworks. This is useful for teams that prefer minimal abstractions. </p></li><li><p><strong>Need GRPO plus built-in environments and eval tools? &#8594; Use willccbb/verifiers.</strong><br><strong>verifiers</strong> is a modular GRPO/DPO training/eval toolkit that works with Hugging Face&#8217;s Trainer and can plug into other stacks like <strong>prime-rl</strong>. Good for standing up an end-to-end loop with credible evaluation. </p></li><li><p><strong>Reproducing frontier &#8220;reasoning&#8221; agents (e.g. R1/O-series-style research)? &#8594; Use agentica-project/rLLM.</strong><br><strong>rLLM</strong> is an academic, all-in-one framework to train LLM agents with RL, maintained by the <strong>Agentica Project</strong> (with <strong>UC Berkeley/Sky Computing</strong> involvement). Choose this when you need research faithfulness and multi-env support more than a polished enterprise UX. </p></li></ul><p>Two closing notes on the framework landscape: (a) you can mix and match, e.g. simulate in <strong>Isaac Sim</strong> or <strong>AnyLogic</strong>, train with <strong>NeMo-RL</strong> or <strong>ROLL</strong>, align with <strong>TRL</strong> or <strong>torchtune</strong>, and serve with <strong>vLLM/SGLang</strong>; (b) expect consolidation: winners will be those that meet infra teams where they are (Kubernetes, Slurm, on-prem GPU pools) and play nicely with existing observability/logging.</p><h3>6. 
The companies and ecosystems you&#8217;ll keep hearing about</h3><p><strong>Cloud platforms and labs:</strong></p><ul><li><p><strong>Microsoft</strong> (<strong>Project Bonsai</strong>): Deep integrations with <strong>Simulink</strong> and <strong>AnyLogic</strong> keep it attractive for industrial control. </p></li><li><p><strong>AWS</strong>: <strong>SageMaker RL</strong> (since <strong>2018</strong>) and the <strong>DeepRacer</strong> education funnel (league retired after <strong>2024</strong>, service available through <strong>December 2025</strong>, new <strong>AWS Solution</strong> form).</p></li><li><p><strong>Google DeepMind</strong>: The data-center cooling results (<strong>2016/2018</strong>) remain the go-to reference for &#8220;RL in critical infrastructure&#8221;. </p></li><li><p><strong>OpenAI</strong>: InstructGPT (<strong>March 4, 2022</strong>) codified RLHF as the default alignment step in modern LLM pipelines. </p></li><li><p><strong>Baidu</strong>: Active RL/autonomy research (e.g. RL for robotics and traffic signal control, Apollo RL platform papers), signaling ongoing investment on the China side of the market. </p></li><li><p><strong>IBM / Intel / Salesforce</strong>: Ecosystem contributors (Intel&#8217;s <strong>Coach</strong> RL library; Salesforce Research has historically released performance-minded RL tools).</p></li></ul><p><strong>Simulation products / RL Environment / RL-as-a-Service:</strong></p><ul><li><p><strong>MathWorks (Simulink)</strong> and <strong>AnyLogic</strong>: Official connectors to <strong>Project Bonsai</strong> that give enterprises a friendly entry point. </p></li><li><p><strong>NVIDIA Isaac Sim</strong>: Physics-accurate sim for robot policy training and validation. 
</p></li><li><p><strong>Unity ML-Agents</strong>, <strong>Ray RLlib (Anyscale)</strong>, <strong>Gymnasium/Gym</strong>, <strong>TF-Agents</strong>: The open-source backbone for many RL stacks.</p></li><li><p>Companies like <strong>Applied Compute, Veris AI, Kaizen, Mechanize, and Osmosis</strong> are providing RL infra and services to let customers infuse RL into their products.</p></li></ul><p><strong>Robotics and logistics:</strong></p><ul><li><p><strong>Ocado Group</strong>: As mentioned earlier, they acquired <strong>Kindred Systems</strong> (<strong>$262M</strong>) and <strong>Haddington Dynamics</strong> (<strong>$25M</strong>) in <strong>Nov 2020</strong>. Repeatedly calls out deep RL for picking.</p></li><li><p><strong>Kindred Systems</strong>: Piece-picking. Past customers include <strong>Gap</strong> and <strong>American Eagle</strong>. </p></li><li><p><strong>Haddington Dynamics</strong>: Low-cost dexterous arms. Acquired for <strong>$25M</strong>. </p></li><li><p><strong>Covariant</strong>: Deployed with <strong>Radial</strong> in <strong>2023</strong>. Strong 3PL fit. </p></li><li><p><strong>Micropsi Industries</strong>: &#8220;MIRAI&#8221; product adapts to variability in tasks like cable assembly. RL-style learning under the hood. </p></li><li><p><strong>OSARO</strong>: Picking and depalletizing software with learning-based control. </p></li><li><p><strong>Vicarious</strong>: Acquired by Alphabet&#8217;s <strong>Intrinsic</strong> in <strong>2022</strong>, pointing to consolidation of manipulation/learning talent.</p></li><li><p><strong>Symbotic</strong>: Public warehouse-automation bellwether. <strong>$1.8B</strong> FY-2024 revenue and a <strong>Jan 16, 2025</strong> deal to acquire Walmart&#8217;s robotics unit for <strong>$200M</strong>, paired with a <strong>$520M</strong> development program covering <strong>400</strong> APDs over time. 
</p></li></ul><p><strong>Healthcare and biotech:</strong></p><ul><li><p><strong>BioNTech</strong>: Acquired <strong>InstaDeep</strong> (<strong>Jan 10, 2023</strong>) to bring advanced AI (including RL) in-house for discovery and operations.</p></li></ul><p><strong>Finance:</strong></p><ul><li><p><strong>JPMorgan</strong>: LOXM execution agent (reported <strong>2017</strong>), an early example of RL-style policy learning with controls in a high-stakes domain. <a href="https://www.ft.com/content/16b8ffb6-7161-11e7-aca6-c6bd07df1a3c?utm_source=chatgpt.com">Financial Times</a></p></li></ul><p><strong>Smaller companies</strong>:</p><ul><li><p><strong>BeChained</strong> (industrial energy optimization), <strong>Predictiva</strong> (trading), <strong>Telemus AI</strong> (RL training/eval tools), <strong>PLAIF</strong> (ROS-to-KEBA control demos), <strong>Surge AI</strong> (data labeling used in RLHF pipelines). Each points at niche opportunities in energy, finance, robotics control, and data operations.</p></li><li><p>Ecosystem names appearing as adopters/partners include <strong>Gap</strong>, <strong>American Eagle</strong>, <strong>Walmart</strong>, and <strong>Radial</strong>.</p></li></ul><h3>7. Go-to-market patterns, risks, and moats you can actually underwrite</h3><p><strong>The sim-first deployment loop is a moat.</strong> If your RL product depends on high-fidelity simulators and digital twins, integration depth with <strong>Simulink</strong>, <strong>AnyLogic</strong>, or <strong>Isaac Sim</strong> becomes a practical switching cost. Once a control policy is validated against a company&#8217;s &#8220;digital plant&#8221;, ripping it out is painful. 
This is especially true if you&#8217;ve also instrumented observability, roll-back, and safety checkers around the policy.</p><p><strong>Education channels are moving in-house.</strong> <strong>AWS DeepRacer</strong> seeded hundreds of thousands of learners, but the league&#8217;s retirement after <strong>2024</strong> and the shift to an <strong>AWS Solution</strong> in <strong>2025</strong> signal a new model: companies will run their own &#8220;leagues&#8221;, tie them to internal simulators and datasets, and keep IP in-house. Vendors who support that motion (private clouds, custom tracks, enterprise SSO) will win training budgets and later production work.</p><p><strong>Data and alignment work is sticky.</strong> Because most LLM stacks now include RLHF (thanks to <strong>InstructGPT&#8217;s</strong> result), any vendor who supplies reliable feedback data (e.g. labeling platforms such as <strong>Surge AI</strong>) or dependable reward/eval tooling (e.g. <strong>verifiers</strong>) can become embedded in model-lifecycle operations. This is an emerging moat that doesn&#8217;t look like &#8220;classic SaaS&#8221;, but behaves like it in practice.</p><p><strong>Consolidation is a feature, not a bug.</strong> <strong>Ocado&#8217;s</strong> purchases of <strong>Kindred</strong> and <strong>Haddington</strong> and <strong>Intrinsic&#8217;s</strong> acquisition of <strong>Vicarious</strong> show that large buyers prefer packaged stacks with talent attached. For startups that only provide specific tools, that means the exit path is often &#8220;get three logos, prove reliability, and get acquired&#8221;. If you want to swing for a long-run IPO, you need to grow beyond a pure-play offering and build a full-stack one.</p><p><strong>Geography matters.</strong> Local compliance and support remain a gating factor for factory and logistics deployments. 
The center of gravity is multinational (<strong>US</strong>, <strong>UK</strong>, <strong>Germany</strong>, <strong>China</strong>), so startups that find the right regional system integration partners (or ride with Microsoft/AWS channel programs) will scale faster than pure-direct sellers. </p><h3>8. How to connect the dots to infra startups (dependencies, correlations, and &#8220;gotchas&#8221;)</h3><p><strong>GPU supply and cluster managers &#8594; which RLHF framework wins.</strong> If your customer already runs <strong>Megatron</strong> for pretraining and <strong>SGLang/vLLM</strong> for serving, <strong>SLiME</strong> or <strong>ROLL</strong> will feel native. If they live in <strong>NeMo</strong> land, <strong>NeMo-RL</strong> wins by default. If they want total flexibility or untrusted nodes, <strong>prime-rl</strong> unlocks federated training. The point is that infra choices decide the RLHF tool <strong>before</strong> a modeler opens a notebook. </p><p><strong>Simulator availability &#8596; sales velocity.</strong> The fastest deployments happen where the buyer already maintains trusted simulators (Simulink for control, AnyLogic for operations, Isaac Sim for robots). If a prospect cannot simulate, your sales cycle includes a modeling project. Time-to-value stretches out and your gross margin takes a hit.</p><p><strong>Data labeling and evaluation &#8594; sticky, recurring services.</strong> RLHF needs high-quality preference data and robust evaluations. That creates a repeat services layer (often billed on volume or seats) that compounds over time and raises switching costs. This is subtle, but powerful. The <strong>verifiers</strong> framework codifies evals. Data vendors like <strong>Surge AI</strong> are common in RLHF case studies.</p><p><strong>Retail automation deals ripple through the stack.</strong> The <strong>Symbotic&#8211;Walmart</strong> expansion isn&#8217;t just a warehouse story. 
It pulls in upstream component vendors (arms, vision systems), software (WMS integration), and sometimes nearby last-mile tech (APDs). Startups supplying perception, grasp planning, or scheduling can ride these waves even if they aren&#8217;t the &#8220;prime&#8221; vendor.</p><p><strong>Safety and audit features are not optional.</strong> Particularly in finance and heavy industry, buyers will demand logs, simulators for &#8220;what if&#8221; replays, and override circuits. LOXM&#8217;s early disclosures and the widespread use of guardrails in enterprise LLM deployments show that RL succeeds commercially when paired with simple, explainable controls.</p><h3>9. Risks, surprises, and what to watch in the next 24 months</h3><p><strong>Sim-to-real gaps can bite.</strong> Even with good models, differences between simulation and reality can cause regressions. The mitigation is boring but effective: domain randomization, staged rollouts, and layered safety constraints. Vendors with proven simulator connectors (Simulink/AnyLogic/Isaac) and robust A/B failovers have an edge.</p><p><strong>Vendor stability and consolidation risk.</strong> If your RL vendor gets acquired (e.g. <strong>Vicarious</strong> &#8594; <strong>Intrinsic</strong> in <strong>2022</strong>) or pivots, your roadmap may change overnight. Large buyers like <strong>Ocado</strong> handle this by buying the capability outright. If you&#8217;re an investor, favor startups that integrate with the buyer&#8217;s existing simulators and control stack. This reduces &#8220;platform hostage&#8221; risk at renewal.</p><p><strong>Education channels are moving away from centrally hosted showcases.</strong> With <strong>DeepRacer</strong>&#8217;s league ending after <strong>2024</strong>, teams will need new ways to upskill engineers. That could slow top-of-funnel unless vendors provide simple, self-hosted training kits and enterprise competitions. 
The flip side: internal leagues may produce <strong>more</strong> deployable pilots because they&#8217;re built on company models and data from day one. </p><p><strong>Regulatory and safety scrutiny.</strong> As RL touches physical systems and financial execution, expect more audit requirements. Startups that package policy introspection and &#8220;explainable controls&#8221; will find compliance less of a throttle.</p><p><strong>Catalysts to watch near term:</strong></p><ul><li><p><strong>Walmart&#8211;Symbotic</strong> APD deployments moving from design to rollout. Watch for first-site go-lives and backlog updates. </p></li><li><p>Deeper <strong>Simulink</strong>/<strong>AnyLogic</strong> integrations (connectors, templates) that shorten time-to-pilot for industrial buyers.</p></li><li><p><strong>NeMo-RL</strong>/<strong>ROLL</strong>/<strong>SLiME</strong> performance wins on big clusters and better agent stacks (e.g. <strong>SkyRL</strong>, <strong>ART</strong>) proving stable tool use over long horizons. </p></li><li><p>Internal &#8220;leagues&#8221; at F500s replacing <strong>DeepRacer</strong> as a talent funnel. 
</p></li></ul><p><strong>Names you&#8217;ll keep encountering (complete coverage of earlier mentions):</strong></p><ul><li><p>Platforms/labs: <strong>Microsoft (Project Bonsai)</strong>, <strong>AWS (SageMaker RL / DeepRacer)</strong>, <strong>Google DeepMind</strong>, <strong>OpenAI</strong>, <strong>Baidu</strong>, <strong>IBM</strong>, <strong>Intel</strong>, <strong>Salesforce</strong>.</p></li><li><p>RL infra products: <strong>MathWorks (Simulink)</strong>, <strong>AnyLogic</strong>, <strong>NVIDIA Isaac Sim</strong>, <strong>Unity ML-Agents</strong>, <strong>Ray RLlib (Anyscale)</strong>, <strong>Gymnasium/Gym</strong>, <strong>TF-Agents</strong>, <strong>Intel Coach</strong>, <strong>Applied Compute, Veris AI, Kaizen, Mechanize, Osmosis</strong>.</p></li><li><p>Robotics/logistics: <strong>Ocado Group</strong>, <strong>Kindred Systems</strong>, <strong>Haddington Dynamics</strong>, <strong>Covariant</strong>, <strong>Radial</strong>, <strong>Micropsi Industries</strong>, <strong>OSARO</strong>, <strong>Vicarious (Intrinsic)</strong>, <strong>Symbotic</strong>, plus adopters <strong>Gap</strong>, <strong>American Eagle</strong>, <strong>Walmart</strong>.</p></li><li><p>Healthcare/biotech: <strong>BioNTech</strong>, <strong>InstaDeep</strong>.</p></li><li><p>Finance: <strong>JPMorgan</strong>.</p></li><li><p>Startups/tools: <strong>BeChained</strong>, <strong>Predictiva</strong>, <strong>Telemus AI</strong>, <strong>PLAIF</strong>, <strong>Surge AI</strong>.</p></li><li><p>New RL/RLHF stacks: <strong>TRL</strong>, <strong>NVIDIA NeMo-RL</strong>, <strong>ByteDance/volcengine verl</strong>, <strong>Alibaba ROLL</strong>, <strong>Zhipu/THUDM SLiME</strong>, <strong>prime-rl</strong>, <strong>SkyRL</strong>, <strong>OpenPipe/ART</strong>, <strong>AI2 Open-Instruct</strong>, <strong>torchtune</strong>, <strong>willccbb/verifiers</strong>, <strong>agentica-project/rLLM</strong>.</p></li></ul><p>If you&#8217;re tracking this sector for venture, the investable themes 
over the next two years are:</p><ol><li><p><strong>Sim-tied RL for operations</strong> where you can measure OPEX savings quickly (HVAC, calibration, scheduling).</p></li><li><p><strong>Robot manipulation/picking stacks</strong> that demonstrate site-to-site generalization and clean WMS/ERP hooks.</p></li><li><p><strong>RLHF infrastructure</strong> (data, eval, training frameworks) that meets infra teams where they are (Kubernetes/Slurm, Megatron, NeMo, vLLM/SGLang) and ships with the guardrails enterprises demand.</p></li></ol><h3>10. Why these pieces fit together</h3><p>Think of RL as <strong>&#8220;learned control&#8221;</strong> rather than &#8220;AI magic&#8221;. In a factory or warehouse, you already have sensors and actuators. </p><p>The missing piece is a <strong>policy</strong> that maximizes a <strong>goal</strong> while avoiding <strong>bad states</strong>.</p><ul><li><p>A policy is a rule for choosing which action to take in each situation.</p></li><li><p>A goal is something like energy savings, picks per hour, or calibration accuracy.</p></li><li><p>A bad state is something like overheating, collisions, or mis-picks.</p></li></ul><p>Three things make RL <strong>commercially usable now</strong>:</p><ol><li><p><strong>Simulation first, deployment later.</strong> When you can train policies in a digital twin (Simulink, AnyLogic, Isaac Sim), you sidestep most of the risk. That&#8217;s why the Siemens + Bonsai story resonates: a domain expert could encode the task and use a platform to do the heavy lifting.</p></li><li><p><strong>Tooling that meets infra where it lives.</strong> In LLM land, RLHF stacks like <strong>TRL</strong>, <strong>NeMo-RL</strong>, <strong>verl</strong>, <strong>ROLL</strong>, <strong>SLiME</strong>, and <strong>prime-rl</strong> now align with the way infra teams actually run workloads (on Kubernetes, Slurm, or tightly packed DGX pods). 
Many of these stacks come with sane defaults and recipes so teams can spend more time on <strong>what</strong> to optimize and less on <strong>how</strong> to wire up the training loop. </p></li><li><p><strong>Proof that buyers will pay when outcomes are clear.</strong> Energy bills and pick-rates are easy to measure. A pilot that shows <strong>30&#8211;40%</strong> energy savings or steady throughput uplift writes its own business case. That&#8217;s why <strong>DeepMind / Google</strong>, <strong>Ocado / Kindred</strong>, <strong>Covariant / Radial</strong>, and <strong>Symbotic / Walmart</strong> matter.</p></li></ol><h3>What this means for venture</h3><ul><li><p><strong>Don&#8217;t fund research projects. Fund &#8220;boring excellence&#8221;.</strong> The winners aren&#8217;t the flashiest algorithms. They&#8217;re the teams who make deployments predictable and safe, with rock-solid simulator hooks and guardrails.</p></li><li><p><strong>Back the &#8220;glue&#8221; layers.</strong> Evaluation suites (e.g. <strong>verifiers</strong>), high-quality preference data vendors, and integration connectors are under-invested and deeply sticky.</p></li><li><p><strong>Assume consolidation.</strong> If a startup gets three industrial logos and shows solid uptime, it&#8217;s a candidate for acquisition (as <strong>Vicarious</strong> and <strong>Kindred/Haddington</strong> show). Invest with that outcome in mind. </p></li><li><p><strong>Expect internal leagues to replace showcase programs.</strong> As <strong>DeepRacer</strong> transitions, enterprises will &#8220;own&#8221; their RL training funnels. That&#8217;s a place for startups to sell hosted competitions, simulator content, and analytics dashboards behind the firewall. </p></li></ul><h3>Closing thought</h3><p>Reinforcement learning stopped being a lab toy the moment it started saving money in data centers and picking real items in warehouses. The next two years won&#8217;t be about one grand breakthrough. 
They&#8217;ll be about <strong>repeatable, simulator-backed deployments</strong> across factories and logistics, and <strong>RLHF stacks</strong> that make large models actually helpful. If you invest in the pieces that make those two motions <strong>boring and reliable</strong>, you&#8217;re investing where the value will quietly compound.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Startup Tracker #4 - What moved, why it matters]]></title><description><![CDATA[Multimodal features, agent workflow tooling, and model quality evaluation utilities]]></description><link>https://www.infrastartups.com/p/startup-tracker-4-what-moved-why</link><guid isPermaLink="false">https://www.infrastartups.com/p/startup-tracker-4-what-moved-why</guid><dc:creator><![CDATA[Prateek Joshi]]></dc:creator><pubDate>Mon, 25 Aug 2025 19:23:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!X7Uw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>1. Snapshot of the week</h3><p>The center of gravity was product shipping. About a third of updates were new releases or major version bumps. The heaviest clustering was around multimodal features, agent workflow tooling, and model-quality evaluation utilities. 
Partnerships and small capital moves also featured, but the bigger story is that infra vendors are packing more end-to-end capability into their stacks: retrieval, agents, evals, and deployment are increasingly bundled rather than bought separately.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X7Uw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X7Uw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X7Uw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png" width="768" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:694382,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.infrastartups.com/i/171913532?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X7Uw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 424w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 848w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 1272w, https://substackcdn.com/image/fetch/$s_!X7Uw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9a87ef2-fe74-4e69-9788-509ee964fb20_768x512.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>2. The shift to &#8220;agentic&#8221; stacks</h3><p>Multiple companies advanced agent and workflow automation. <strong>Together AI</strong>&#8217;s updates emphasized multi-step agents that compose tools, retrieval, and model calls to handle complex tasks end-to-end. <strong>Buildkite</strong> highlighted AI agents inside CI, triaging failures and suggesting fixes rather than just failing builds. </p><p><strong>The pattern is consistent: more systems are moving from &#8220;assistive interface&#8221; to &#8220;closed-loop executor&#8221;</strong>. This increases demand on orchestration, sandboxing, and audit trails. The connection is that as agents act, observability and controls move from &#8220;nice to have&#8221; to &#8220;ship-blocker&#8221;.</p><h3>3. 
Cost and latency: practical wins over theoretical speed</h3><p>Several releases focused on reducing inference bills and making response times predictable. <strong>Groq</strong>&#8217;s push on prompt caching is emblematic: cache the static prefix, pay only for the new tokens, and you cut cost and tail latency for chat UIs and code assistants. </p><p>That theme shows up elsewhere too &#8212; runtime-level optimizations, smarter batching, and memory-aware serving. The dependency to watch is hardware supply. Even clever serving tricks still rely on GPU availability and scheduling, which continue to shape roadmaps and pricing.</p><h3>4. RAG is growing up (quietly)</h3><p>Retrieval isn&#8217;t grabbing headlines anymore, but it&#8217;s getting sturdier. Several updates blended vector retrieval with higher-quality indexing and guardrails. Teams that once shipped &#8220;RAG v0&#8221; are now focused on document chunking strategies, embedding refresh cadence, and permissions-aware search. <strong>Together AI, Seldon</strong>, and others referenced improvements in retrieval and embeddings alongside workflow features. </p><p>The correlation this week: when an agent feature shipped, a retrieval or embedding upgrade often shipped with it. This is evidence that practical agents still hinge on grounded context, not just bigger prompts.</p><h3>5. The safety, evals, and governance layer is consolidating</h3><p>Model-quality and red-team tooling kept pace with the agent push. <strong>Evidently AI</strong> refreshed guidance on classification metrics and LLM evaluation. <strong>PromptFoo</strong> rolled out moderation tooling and highlighted a recent funding round focused on safety features. </p><p>The connection is direct: as more apps perform actions (not merely answer questions), teams need reproducible evals, jailbreak resistance, and change-management for prompts and policies. 
Risk is migrating from &#8220;bad answer&#8221; to &#8220;bad action&#8221;, so evals are moving from offline dashboards into pre-deployment gates and run-time guardrails.</p><h3>6. Data platforms are asserting their role in AI</h3><p>Warehouse-native and lake-native players continued to lean into AI data workflows. <strong>Hightouch</strong> emphasized identity and activation primitives that sit on the warehouse rather than siphoning data into another tool. <strong>LakeFS</strong> underscored versioning and branch-and-merge patterns for data, treating training and evaluation sets more like code. <strong>MotherDuck</strong> kept pushing easy analytics on top of DuckDB for teams that want small, fast pipelines without heavy infra. </p><p><strong>The dependency thread</strong>: successful AI launches increasingly depend on three mundane but critical data capabilities: lineage, time-travel/versioning, and permissioning mapped to business entities.</p><h3>7. Multimodal moves go from demos to workflows</h3><p>Several launches centered on image/video generation and editing, plus speech/vision add-ons that plug into existing apps. <strong>Fal AI</strong> expanded image-editing and multimodal inference options. We also saw more &#8220;instant model libraries&#8221; for creative tasks that can be wired into production without heavy ops. </p><p><strong>The correlation to watch</strong>: multimodal features often arrived packaged with either a runtime optimization (to keep costs in check) or an agent/workflow wrapper to make them usable in real processes (not just in a demo).</p><h3>8. Partnerships and certifications: selling to the real world</h3><p>A noticeable share of updates were integrations and certifications: net-new connectors into developer platforms, plus security and biometric credentials. <strong>Paravision</strong>&#8217;s recent recognition on the security/compliance front fits a broader pattern: buyers are asking for proof. 
</p><p><strong>PromptFoo</strong>&#8217;s moderation focus and new funding reinforced the &#8220;compliance story as a growth vector.&#8221; Partnerships also signal distribution strategy: <strong>Z.ai</strong> highlighted collaborations and cost positioning in a crowded market. <strong>Netlify</strong> updated its CLI and runtime packages that many AI front-ends rely on. </p><p><strong>The dependency chain here is commercial</strong>: integrations unlock budgets and certifications unlock regulated accounts.</p><h3>9. Capital flows: smaller checks, nearer to product</h3><p>There were funding notes, but fewer megadeals. Announcements skewed toward teams that can show immediate product or workflow impact. <strong>PromptFoo</strong>&#8217;s raise for safety tooling is a good example: the money is following concrete, near-term pain (moderation, jailbreak defense, evals), not speculative long-horizon bets. </p><p><strong>Temporal</strong>&#8217;s inclusion in investor shortlists underscores that orchestration remains an investable wedge, especially when it controls meaningful production traffic. </p><p><strong>The takeaway</strong>: capital is favoring infra that shortens time-to-value inside existing stacks &#8212; security, evals, orchestration, and cost controls.</p><h3>10. How this week maps to the infra stack</h3><ul><li><p><strong>Silicon and runtime</strong>: The demand signal favors cost/latency features (prompt caching, batching, quantization). Dependence on GPU supply remains the risk amplifier. Vendors that abstract hardware variability win trust when shortages or price spikes hit.</p></li><li><p><strong>Inference platforms</strong>: The winners are bundling retrieval, evals, and agent orchestration so developers don&#8217;t have to stitch together multiple tools. <strong>Together AI</strong> exemplifies the &#8220;full loop&#8221; motion. 
<strong>Groq</strong> leans into a performance/cost identity.</p></li><li><p><strong>Data layer</strong>: Warehouse/lake alignment is paying off. <strong>Hightouch</strong> and <strong>LakeFS</strong> show how identity resolution, lineage, and versioning become first-class for AI work. This reduces &#8220;shadow data stores&#8221; and keeps governance attached to the source of truth.</p></li><li><p><strong>RAG and search</strong>: Better embeddings and policy-aware retrieval are quietly raising answer quality. The dependency is permissioning: if RAG can&#8217;t respect row- and column-level access, it stalls in enterprise pilots.</p></li><li><p><strong>Agents and orchestration</strong>: <strong>Buildkite</strong>&#8217;s agentic CI and <strong>Together AI</strong>&#8217;s multi-step flows put pressure on reliability, sandboxing, and auditability. Systems that can explain <em>why</em> an action occurred (not just that it did) will pass procurement faster.</p></li><li><p><strong>Safety / evals</strong>: <strong>PromptFoo</strong> and <strong>Evidently AI</strong> signal a shift from &#8220;after-the-fact&#8221; dashboards to <em>gates</em> in the path to production. Expect eval suites to look more like unit tests: cheap, frequent, and blocking when they fail.</p></li><li><p><strong>Security and compliance</strong>: Certifications and moderation are becoming revenue features. <strong>Paravision</strong>&#8217;s momentum illustrates that regulated buyers care as much about proofs and logs as they do about model specs.</p></li></ul><h3>11. Correlations, risks, and dependencies to watch</h3><ul><li><p><strong>Correlation</strong>: New agent features often shipped alongside retrieval upgrades and eval tooling. That triad (agents + RAG + evals) showed up together repeatedly. It&#8217;s a sign that &#8220;usable agents&#8221; require context and quality checks by default.</p></li><li><p><strong>Correlation</strong>: Multimodal releases frequently paired with runtime optimizations. 
When cost per call is visible to end users (e.g. creative tools), performance engineering becomes a product feature, not just an infra concern.</p></li><li><p><strong>Risk</strong>: <strong>Hardware supply and pricing.</strong> Even with caching and quantization, workloads depend on GPU availability. Sudden scarcity or price changes ripple through every layer above.</p></li><li><p><strong>Risk</strong>: <strong>Eval/guardrail drift.</strong> As prompts and models evolve, evals can silently go stale. Teams that don&#8217;t treat evals as code (versioned, reviewed, and diffed) will ship regressions.</p></li><li><p><strong>Risk</strong>: <strong>Data governance debt.</strong> Without lineage and permissions tied to the warehouse/lake, RAG and agents will leak data or get blocked by IT. The fix is slow, and companies that short-cut it will pay later.</p></li><li><p><strong>Dependency</strong>: <strong>Distribution through integrations.</strong> Many launches are really <em>routes to market</em> &#8212; CLI updates, connectors, SDKs. These are fragile: when a key platform changes APIs, roadmaps slip.</p></li></ul><h3>12. What this means for the next quarter</h3><ol><li><p><strong>Bundle the loop.</strong> The market is rewarding platforms that ship retrieval, agents, evals, and deployment as a coherent loop. Fragmented toolchains will face longer sales cycles and higher churn.</p></li><li><p><strong>Ship cost controls as features.</strong> Caching, batching, and policy-based routing should be visible in the product, not buried in docs. Buyers now ask &#8220;how do you keep my bill predictable?&#8221; on the first call.</p></li><li><p><strong>Make governance boring.</strong> Identity-aligned data access, lineage, and versioning should be one-click, not a consulting project. 
This is where warehouse-native players like <strong>Hightouch</strong> and data-versioning tools such as <strong>LakeFS</strong> are pulling ahead.</p></li><li><p><strong>Treat evals like tests.</strong> Bake <strong>PromptFoo/Evidently</strong>-style checks into CI and pre-prod gates. If agents act, you need &#8220;red lines&#8221; that block deploys on safety or quality regressions.</p></li><li><p><strong>Certify early.</strong> Security credentials and vertical certifications are functioning as growth levers. <strong>Paravision</strong>&#8217;s traction is a reminder that compliance unlocks budgets that features alone can&#8217;t.</p></li></ol><h3>Bottom line</h3><p>This week&#8217;s activity shows infra moving from &#8220;pieces you assemble&#8221; to &#8220;loops you run&#8221;. The strongest updates connect agents with grounded retrieval, observable execution, and predictable cost. Where those connections are tight, adoption accelerates. Where they&#8217;re loose (governance, eval drift, and hardware dependence), risk compounds.</p><div><hr></div><p>If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.infrastartups.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.infrastartups.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>