Sector Deep Dive #6: AGENT RUNTIME
Companies that build environments that let AI agents actually do work
1. What this space is and why it suddenly matters
An agent runtime is the environment that lets AI agents actually do work. It’s the control room that plans steps, calls the right tools, remembers context, and keeps logs so humans can see what happened. An agent sandbox is the related concept: a safe box the agent acts inside to run code, browse, or touch APIs without breaking things. Put together, they’re the missing layer between raw models and real enterprise workflows.
Three forces make this takeoff feel real:
Models leveled up. The newest LLMs plan multi-step tasks, call tools, and follow structured instructions.
Enterprises want cognitive automation. After a decade of RPA and scripts, companies now want automation that can read, reason, and decide.
Enablers arrived. Secure micro-VMs / containers, long-context memory, and open protocols (Anthropic’s MCP, Google’s Agent-to-Agent/A2A) give teams a safer and more interoperable way to wire agents to data and systems.
Think of this as the shift from “power tools” (classic SaaS) to “coworkers” (agents) that execute tasks end-to-end with guardrails. That’s why the sector’s drawing capital and attention: it upgrades software from “assist” to “act”.
2. Market trajectory
The agents market is growing from roughly $5B in 2024 to about $47B by 2030, if current forecasts hold. Funding has kept pace: $8B+ went into agent startups in the year through late 2024, and seed funding alone in the first half of 2025 was on the order of $700M. Analysts expect a third of enterprise software to include agentic capabilities by 2028 (up from almost zero in 2024).
What that means in practical terms:
This isn’t a “single killer app”. It’s a horizontal capability (like cloud or mobile) that seeps into IT, ops, support, finance, and engineering.
Adoption is gradual and risk-managed. Most teams start with human-in-the-loop, then graduate to autonomy for bounded tasks once accuracy and auditability are proven.
It’s not winner-take-all yet. Standards like MCP and A2A reduce lock-in and keep the door open for neutral platforms and open-source tools, not just cloud megasuites.
3. The product stack: six bricks you actually need
When you peel back the marketing, mature agent platforms all converge on the same six components:
Secure execution
Agents need isolated places to run code, browse, and call services. That usually means Linux containers or micro-VMs with strict network and filesystem policies. Startups to know:
E2B: open-source, Firecracker-style micro-VM isolation; fast cold-starts.
Novita AI (Agent Sandbox): per-second billed serverless workers for bursty agent compute.
Browserbase: managed, clean browsers for reliable web automation.
Orchestration (the agent loop)
Plan → act (use a tool) → observe → re-plan. You need a consistent way to define this loop, branch on errors, and compose sub-agents; a minimal sketch of the loop follows this list. LangChain / LangGraph, CrewAI, CUA (Computer-Use Agent), and Dust are common choices depending on how code-centric or visual you want to be.
Connectors and permissions
Agents get work done by touching APIs, SaaS apps, and internal services with clear scopes and approval rules. Composio has become the “agent connector fabric” many teams reach for. Cloud platforms ship their own registries too.
Memory and state
Short-term scratchpads and long-term project memory, usually backed by vector DBs or filesystems, plus auto-summaries so context doesn’t blow up costs.
Observability, evaluation, and guardrails
You need step-level traces, cost and latency metrics, red-team tests, and “circuit breakers”. AgentOps (production runs, replay, costs) and Langfuse (open-source traces/evals) are becoming table stakes.
Human interface
Most business agents surface in chat UIs, IT portals, or IDEs. Good products make it easy to toggle autonomy, insert approvals, and explain what just happened.
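To make the orchestration brick concrete, here is a minimal, framework-agnostic sketch of the plan → act → observe → re-plan loop referenced above. The call_model and run_tool callables are hypothetical placeholders for a real model client and a sandboxed tool runner; the point is the loop shape, the hard step limit, and the step-level trace, not any particular library.

```python
# Minimal agent loop sketch: plan -> act -> observe -> re-plan.
# call_model and run_tool are hypothetical placeholders for a real model
# client and a sandboxed tool runner; only the loop structure matters here.
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    goal: str
    max_steps: int = 10                          # hard step limit (a basic circuit breaker)
    trace: list = field(default_factory=list)    # step-level log for auditing and replay


def run_agent(run: AgentRun, call_model, run_tool) -> str:
    context = f"Goal: {run.goal}"
    for step in range(run.max_steps):
        # Plan: the model returns either {"final": "..."} or {"action": ..., "input": ...}.
        decision = call_model(context)
        run.trace.append({"step": step, "decision": decision})
        if "final" in decision:
            return decision["final"]             # the agent believes it is done
        try:
            observation = run_tool(decision["action"], decision["input"])   # act
        except Exception as err:
            observation = f"tool error: {err}"   # branch on errors instead of crashing
        run.trace.append({"step": step, "observation": observation})
        context += f"\n{decision['action']} -> {observation}"               # observe, then re-plan
    return "stopped: step limit reached"         # escalate to a human rather than loop forever
```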
4. Who’s competing and how to think about them
Cloud incumbents are shipping full stacks:
OpenAI/Microsoft (AgentKit + Copilots) lean into deep model integration and a vast distribution surface.
AWS (Bedrock AgentCore) emphasizes isolation, identity, observability, and marketplace distribution.
Google pushes open A2A to make multi-vendor agent workflows normal inside Workspace and GCP.
Anthropic focuses on model safety and MCP so tools and models interoperate cleanly.
Independent startups fill critical gaps and keep the space dynamic:
Secure execution: E2B, Novita AI (Agent Sandbox), Browserbase
Agent OS / orchestration: LangChain/LangGraph, CrewAI, CUA, Fixie, Dust
Connectors: Composio
Observability/evals: AgentOps, Langfuse
Dev-env as runtime: Daytona lets agents spin up real developer workspaces with full toolchains.
Marketplaces and hubs: Gumloop and MuleRun are exploring app stores for agents.
Capability showcases: Prime Intellect helped popularize computer-use agents that click and type like a human.
Expect consolidation: some of these become features inside cloud platforms. Others win as neutral layers precisely because big customers want multi-model, multi-cloud flexibility.
5. What buyers actually use this for
Developers and startups use sandboxes/runtimes to ship agentic apps faster: they prototype with OSS, then harden with better isolation, connectors, and monitoring.
Large enterprises pick a few high-ROI use cases and expand from there. Typical first wins:
IT automation: ordering equipment, provisioning access, resetting accounts, closing tickets.
Customer support: reading tickets, checking entitlements, proposing actions, and (once trusted) executing refunds or returns.
Operations and finance: reconciling invoices, chasing documents, scheduling freight.
Engineering productivity: write → run → test → fix loops inside an isolated code sandbox (pair this with Daytona or E2B).
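For the engineering-productivity item above, the write → run → test → fix loop is simple to sketch. The snippet below uses a plain local subprocess with a timeout as a stand-in for a proper isolated sandbox (E2B’s and Daytona’s real APIs differ), and generate_fix is a hypothetical model call that returns candidate code.

```python
# Write -> run -> test -> fix loop, using a local subprocess as a stand-in
# for a real isolated sandbox. A subprocess is NOT isolation; a production
# runtime would execute this inside a container or micro-VM.
import os
import subprocess
import sys
import tempfile


def run_candidate(code: str, timeout_s: int = 10):
    """Run candidate code in a separate process with a timeout; return (ok, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, timeout=timeout_s)
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"
    finally:
        os.unlink(path)


def write_run_test_fix(task: str, generate_fix, max_attempts: int = 3) -> str:
    """generate_fix(task, feedback) is a hypothetical model call that returns code."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate_fix(task, feedback)   # write (or rewrite) the candidate
        ok, output = run_candidate(code)      # run and test it in the sandbox
        if ok:
            return code                       # passing candidate; hand to a human for review
        feedback = output                     # feed the failure back into the next attempt
    raise RuntimeError("no passing candidate within the attempt budget")
```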
The adoption pattern is consistent: start with copilot (human approves), track success and cost, then move select workflows to autopilot with timeouts and escalation rules. The runtime matters because it encodes that discipline, instead of just letting the LLM run.
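One way a runtime can encode that discipline is a small policy gate that decides, per action, whether to auto-execute, ask a human, or escalate. The action names, budget check, and deny-by-default stance below are illustrative assumptions, not any vendor’s schema.

```python
# Illustrative policy gate for the copilot -> autopilot progression.
# Action names, budgets, and categories are assumptions, not a standard.
AUTO_APPROVED = {"read_ticket", "lookup_order"}         # bounded, low-impact actions
NEEDS_HUMAN = {"issue_refund", "change_entitlement"}    # high-impact actions


def decide(action: str, cost_so_far: float, budget: float) -> str:
    if cost_so_far > budget:
        return "escalate"        # economic guardrail: stop and page a human
    if action in AUTO_APPROVED:
        return "auto_execute"    # autopilot for proven, bounded tasks
    if action in NEEDS_HUMAN:
        return "ask_approval"    # copilot mode: a human approves before execution
    return "refuse"              # unknown actions are denied by default
```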
6. The economic logic: why this can be cheaper (and when it isn’t)
A single agent task usually triggers many model calls plus tool invocations. Early Auto-GPT experiments were expensive and brittle. Three things flipped that story:
Smarter planning and caching cut token waste.
Isolated code execution moves heavy mathematics or parsing to cheap CPU time instead of expensive tokens.
Model mix-and-match runs 3.5-class models for easy steps and saves 4/5-class models for hard ones.
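That last point, routing by difficulty, is easy to sketch. The model names and the heuristic below are assumptions, not a recommended policy.

```python
# Route each step to a model tier by difficulty. Model names and the
# heuristic are illustrative assumptions.
CHEAP_MODEL = "small-fast-model"      # placeholder name
STRONG_MODEL = "frontier-model"       # placeholder name


def pick_model(step: dict) -> str:
    """Send planning-heavy, high-impact, or very long steps to the strong model."""
    hard = (
        step.get("requires_planning", False)
        or step.get("high_impact", False)
        or len(step.get("prompt", "")) > 4_000
    )
    return STRONG_MODEL if hard else CHEAP_MODEL
```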
When you price it the way buyers do, the question is: cost per completed task vs a human baseline. If an agent can process a support email for $0.10–$0.30 all-in where a human minute costs a few dollars, the cost model works immediately.
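A back-of-envelope version of that comparison, with every number an illustrative assumption rather than a quoted price:

```python
# Back-of-envelope cost per completed task. Every number is an illustrative
# assumption, not a quote from any provider.
PRICE_PER_1K_TOKENS = 0.005       # blended input/output token price (assumed)
TOKENS_PER_MODEL_CALL = 2_000     # average tokens per call (assumed)
MODEL_CALLS_PER_TASK = 15         # planning + tool-use turns (assumed)
TOOL_AND_SANDBOX_COST = 0.02      # CPU time, API calls, browser minutes (assumed)
SUCCESS_RATE = 0.85               # failed attempts still cost money

token_cost = MODEL_CALLS_PER_TASK * TOKENS_PER_MODEL_CALL / 1_000 * PRICE_PER_1K_TOKENS
cost_per_completed_task = (token_cost + TOOL_AND_SANDBOX_COST) / SUCCESS_RATE

HUMAN_COST_PER_MINUTE = 2.0       # loaded labor cost (assumed)
HUMAN_MINUTES_PER_TASK = 4        # average handle time (assumed)
human_cost_per_task = HUMAN_COST_PER_MINUTE * HUMAN_MINUTES_PER_TASK

print(f"agent ${cost_per_completed_task:.2f}/task vs human ${human_cost_per_task:.2f}/task")
# With these assumptions: roughly $0.20 per completed agent task vs $8.00 of human time.
```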
Where it doesn’t work yet: ambiguous tasks with high back-and-forth, long tool chains, or high error penalties. That’s why most teams still insert approvals, limits, and budgets. This is as much an economic guardrail as a safety one.
The trend line is favorable: better models, cheaper inference, and tighter runtimes steadily push cost-per-task down and success rates up. That’s the flywheel to watch.
7. Impact on the broader infra startup landscape
Short answer: this wave will touch most of infra. Over the next 24 months, expect 60–70% of infra startups to be directly or indirectly affected, whether as beneficiaries, suppliers, or competitors. Here’s how it maps:
Direct beneficiaries (20–25%)
Startups whose core product is agent runtime capability: secure sandboxes (E2B, Novita), orchestration (LangChain, CrewAI, CUA, Dust), observability/evals (AgentOps, Langfuse), connectors (Composio), and marketplaces (Gumloop, MuleRun). Their traction rises with each successful enterprise deployment.
Adjacent pull-through (20–25%)
Data infra (vector DBs, feature stores), identity and policy (fine-grained scopes for agents), secrets/key management, audit logging, and cost monitors. Agents create persistent demand for retrieval, permissioning, and explainability. Great for neutral infra vendors: if you’re building vector search, lineage, or IAM, agents are a net tailwind.
Devtool and platform reshaping (15–20%)
Dev environments and CI/CD adapt so agents can participate as “non-human contributors”. Daytona is a clear bridge: agents spin up real workspaces with compilers, DBs, and test harnesses. Expect git hosts, test frameworks, and build systems to expose agent-friendly APIs and policies. Winners will make “agent + human” pair programming and reviews safe and auditable.
Integration/iPaaS and RPA convergence (10–15%)
Workflows move from rigid scripts to agent-driven flows. RPA and iPaaS vendors will add LLM brains. New neutral runtimes will nibble at classic automation budgets. If you’re building modern integration layers, aligning with MCP/A2A and shipping strong observability can put you on the right side of this shift.
Compute and GPU infra (5–10%)
Agent adoption raises steady inference workloads and bursty sandbox compute. That benefits GPU scheduling, serverless containers, model gateways, and browser automation at scale (hello Browserbase). Efficiency startups (quantization, caching, routing) also see a lift.
Potentially crowded or pressured (10–15%)
Products that are “just an LLM wrapper” around a single workflow will feel pressure as AgentKit/AgentCore and marketplaces ship that workflow as a prefab. The defense is depth: data access, accuracy guarantees, distribution, or owning a compliance-sensitive niche.
Correlation and dependencies.
Think of a dependency chain: models → runtimes → connectors → policy/identity → observability → data. Improvements at any layer (cheaper inference, better planning, richer connectors) ripple to the others. Infra startups that lock into one model vendor will carry vendor risk; those that speak MCP/A2A and multiple models reduce it. Conversely, security incidents or prompt-injection failures at the app layer will generate demand for policy, isolation, and monitoring deeper in the stack, which is yet another pull-through for infra.
8. Key risks and the practical mitigations that matter
Reliability and safety. Agents still make bad calls. Mature teams use retrieval grounding, step limits, timeouts, and human approvals on high-impact actions. Observability and evals move from “nice-to-have” to mandatory.
Security and data privacy. Agents handle credentials and sensitive data. Sandboxes must strictly confine code and network. IAM scopes, secrets rotation, tamper-proof audit logs, and signed tool calls should be part of the design, not a later add-on.
Prompt injection and supply-chain risk. Agents read untrusted content and may be tricked. Defensive patterns (content sanitization, tool-call whitelists, trusted data paths) and “kill-switch” policies reduce blast radius; a minimal allowlist sketch appears after this list of risks.
Regulation and governance. Expect requests for audit trails, decision explanations, and model/agent change control. Vendors with strong explainability and logging will win security and compliance reviews.
Cloud squeeze. Big providers will absorb generic runtime features. Neutral players must compete on openness (multi-model/multi-cloud), UX, cost, or depth in a vertical. Aligning with standards and meeting enterprises in their VPCs are proven ways to keep a seat at the table.
Unit economics drift. A long, meandering agent can burn tokens and money. Teams that enforce budgets, cache aggressively, route models by difficulty, and offload compute to sandboxes will keep cost-per-task in the green.
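Circling back to the prompt-injection mitigation above: a deny-by-default tool-call allowlist is one of the simplest guardrails to implement. The agent names, tool names, and the “untrusted content never triggers writes” rule below are illustrative assumptions, not a real product’s policy schema.

```python
# Deny-by-default tool-call allowlist, scoped per agent. Names and rules
# here are hypothetical examples.
ALLOWED_TOOLS = {
    "support_agent": {"read_ticket", "lookup_order", "draft_reply"},
    "finance_agent": {"read_invoice", "match_po"},
}
WRITE_TOOLS = {"draft_reply"}    # anything that changes state gets extra scrutiny


def check_tool_call(agent: str, tool: str, content_is_untrusted: bool) -> bool:
    """Return True only if this agent may call this tool right now."""
    if tool not in ALLOWED_TOOLS.get(agent, set()):
        return False             # not on the allowlist: denied
    if content_is_untrusted and tool in WRITE_TOOLS:
        return False             # untrusted input never triggers state changes directly
    return True
```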
9. What to watch next
Capability jumps. If the next model wave materially improves tool-use and long-horizon planning, watch success rates rise and human approvals shrink. That opens more workflows to autonomy.
Reference deployments. One marquee case study in banking, logistics, or healthcare (measured in millions saved or hours cut) will unlock follow-on budgets elsewhere.
Standard adoption. Broad support for MCP and A2A would normalize multi-vendor agent meshes inside large companies. That’s a tailwind for neutral infra (connectors, policy, observability) and a constraint on lock-in strategies.
Cost curves. Cheaper inference and faster cold-starts lower the “minimum viable agent”. Keep an eye on platform announcements about long-running sessions, serverless micro-VMs, and per-second billing. These directly change which tasks pencil out.
Distribution channels. Agent marketplaces (e.g. Gumloop, MuleRun) and cloud app stores will matter more as companies move beyond pilots. Templated agents with real connectors and auditable logs will travel fastest through those channels.
Consolidation. Expect acqui-hires and product fold-ins. If you’re building infra, assume your best exit path might be a cloud or enterprise platform that wants your isolation, connectors, or observability baked in.
10. Investment stance and practical takeaways
It’s an infra story as much as a model story. Sandboxes, runtime control planes, connectors, identity, and observability will decide whether agents stay demos or become dependable “digital workers”. That creates room for neutral infra winners, not just model vendors.
Barbell strategy. Pairing one bet aligned with a major platform (for distribution and trust) with one that’s open, multi-model, and multi-cloud captures both worlds. In parallel, category enablers are worth backing: isolation (E2B, Novita), connectors (Composio), eval/ops (AgentOps, Langfuse), dev-env runtimes (Daytona).
Bias to measurable workflows. IT ops, support ops, finance back-office, and code-adjacent tasks produce clean before/after metrics (success rate, handle time, cost-per-task). Those are the proving grounds that compound into wider adoption.
Design for approvals, not just autonomy. The businesses that grow fastest will support a spectrum—suggest → approve → auto-execute—with rock-solid audit trails and budget controls.
Plan for standards. Treat MCP/A2A as inevitabilities and build in that direction. You’ll be easier to buy and harder to rip out.
11. Bottom line for infra founders and investors
Agent sandboxes and runtimes are graduating from experiments to infrastructure. The core idea of software that can read, decide, and act with constraints is now implementable with acceptable risk in many day-to-day workflows. The stack is clarifying, the standards are emerging, and the economics are trending in the right direction.
The effect on the broader infra universe will be wide. Roughly two-thirds of infra startups will feel it. Some directly as agent-native platforms, some as upstream suppliers (data, identity, observability), and some via pressure as clouds bundle the basics. The safest places to build and back are the boring necessities of a production agent world: isolation that never breaks, connectors that always work, policies that auditors love, and telemetry that catches issues before the CFO does.
The next 24 months are a prove-out. Watch the success rates, costs per task, and the first wave of big reference customers. If those turn the corner, this sector looks less like a trend and more like a new layer of enterprise software. Quiet, reliable, and everywhere.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend:


