Startup Tracker #3 - What this week's infra data reveals
Agent-ready retrieval, serverless inference, and production guardrails
Let’s dig into the infra startup moves this week.
1. What the data is telling us
A quick read of the data shows four strong signals:
Search and retrieval is the busiest lane. 34% of the startup summaries touch on search/RAG: agent-ready web/search APIs (Tavily, Exa), enterprise retrieval layers (Shaped AI), video understanding (Twelve Labs), and knowledge graph–style enrichment.
Partnerships and integrations are everywhere. 31% of the startups highlight partner motions or platform integrations: Snowflake/Databricks connectors, Nvidia or Groq support, or listings in cloud marketplaces. This “integration-first” GTM is now standard for infra.
Cost/FinOps, observability, and governance are no longer nice-to-haves. 29% of the startups reference cost/efficiency. 27% discuss monitoring/eval. 19% touch security/compliance. As more teams deploy agents and LLM features, they need to watch spend, uptime, and risk in real time.
Open source remains the wedge. About 36% of companies mention open source. You see it in compute/data engines (ClickHouse, DuckDB via MotherDuck), MLOps (Seldon, ZenML), and workflow/runtime stacks (Anyscale/Ray, Lightning AI’s ecosystem). OSS is how teams earn developer trust and build bottoms-up distribution.
Other readouts worth noting from the file:
28% mention “agents”. Agent orchestration and agent-safe infrastructure are becoming distinct buying categories.
32% mention funding; just 5% explicitly mention hiring. That’s an efficiency mindset consistent with the post-2023 AI infra cycle.
Companies typically straddle two infra layers (median = 2), reinforcing that “full stacks” are still more aspiration than reality.
2. The integration-first go-to-market
Startups are winning attention by meeting developers where they already are (inside clouds, data clouds, and model APIs) rather than asking them to adopt a greenfield platform.
Data and model ecosystems. Hightouch (data activation) shows how data movers plug AI in at the last mile. PlanetScale (serverless MySQL) and ClickHouse (OLAP) emphasize connectors and compatibility. MotherDuck pairs DuckDB with easy sharing. On the model side, Anthropic/OpenAI mentions appear alongside vendors like Baseten and Modal, reflecting a pattern: “bring your own API key and we’ll handle infra”.
Runtime and serving. Anyscale (Ray), Modal (serverless jobs/GPUs), and Baseten (model serving) all lean into “inference plumbing that slots into your stack”. These products tend to ship first-class bindings for LangChain/LlamaIndex and publish Terraform/Helm assets as table stakes.
Hardware gravity. Mentions of Nvidia and Groq are frequent. Lambda Labs’ presence underscores how GPU supply still drives architecture choices and vendor selection. Even when a company isn’t a “GPU company”, it markets compatibility and performance on a specific chip or accelerator.
Why it matters: Integrations reduce friction in evaluation and procurement, shorten time-to-value, and unlock co-marketing. The catch is dependence: when your roadmap is gated by upstream APIs / cloud quotas / marketplace rules, your velocity and gross margins are partially out of your control.
Example connections:
Baseten and Modal both make it trivial to stand up stateful model endpoints and background workers with minimal devops. They win because they drop directly into Python projects and let teams switch between OpenAI/Anthropic and open source models without a platform migration (see the sketch after these examples).
Hightouch meets analytics teams inside Snowflake/Databricks, then layers AI enrichment. This is “AI where your warehouse lives” and not “ship data to our AI platform”.
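To make that provider-swap pattern concrete, here’s a minimal sketch in plain Python. The backend functions are hypothetical stand-ins, not any vendor’s actual SDK; the point is the single registry and call site.

```python
import os

# Hypothetical stand-ins for real SDK calls (OpenAI, Anthropic, or an
# open source model behind a serving platform). Only the shape matters.
def call_openai(prompt: str) -> str:
    return f"[openai] {prompt}"

def call_anthropic(prompt: str) -> str:
    return f"[anthropic] {prompt}"

def call_open_model(prompt: str) -> str:
    return f"[open-model] {prompt}"

# One registry, one call site: swapping providers becomes a config
# change rather than a platform migration.
BACKENDS = {
    "openai": call_openai,
    "anthropic": call_anthropic,
    "open": call_open_model,
}

def complete(prompt: str) -> str:
    provider = os.environ.get("MODEL_PROVIDER", "openai")
    return BACKENDS[provider](prompt)

print(complete("Summarize this week's infra news."))
```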
3. Agentic workflows are hardening into an infra layer
Roughly 28% of the startups reference agents. The pattern is consistent: teams first experiment with a copilot, hit reliability/latency/cost walls, and then go shopping for infra that makes multi-step, tool-using agents predictable.
The emerging stack looks like this:
Orchestration/frameworks: LangChain, LlamaIndex, and purpose-built orchestrators (you’ll see agent wording around Anyscale/Ray, Baseten workflows, and Modal flows).
Search/RAG backplanes: Tavily and Exa for web search. Kumo and Shaped AI for domain-specific retrieval. Twelve Labs for video search.
Guardrails and eval: PromptFoo for prompt/eval regression. Fiddler AI and Arize for production monitoring of quality/drift/safety.
Policy and identity: Aserto (authorization), plus increasing mentions of SOC2/GDPR/HIPAA alignment for enterprise use.
Example connections:
A product pipeline built on Modal (tasks), Tavily (search), PromptFoo (eval), and Arize (live monitoring) gives a team a realistic “agent SLO”. That’s a de-risked agent loop without building heavy platform code.
Risks/dependencies: Orchestrators depend on reliable search APIs and model latency guarantees. If a model vendor changes API behavior or rate limits, your agent reliability degrades. Startups that cache aggressively or can swap models at runtime without losing behavior will have an edge.
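A minimal sketch of that “cache aggressively, swap at runtime” posture, with invented model names and a naive cache key; real systems would key on model version and normalized prompts, and meter retries.

```python
import hashlib

CACHE: dict = {}

def cache_key(model: str, prompt: str) -> str:
    # Naive key; production systems would normalize prompts and include
    # the model version and tool state.
    return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; the primary "fails" to simulate a
    # rate limit or outage.
    if model == "primary-model":
        raise TimeoutError("simulated rate limit")
    return f"[{model}] answer"

def complete_with_fallback(prompt: str, models=("primary-model", "fallback-model")) -> str:
    for model in models:
        key = cache_key(model, prompt)
        if key in CACHE:        # aggressive caching: repeated calls are free
            return CACHE[key]
        try:
            result = call_model(model, prompt)
        except Exception:
            continue            # degrade to the next backend instead of failing the loop
        CACHE[key] = result
        return result
    raise RuntimeError("all model backends failed")

print(complete_with_fallback("plan the next tool call"))
```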
4. Search and retrieval is becoming core infra
This week’s busiest theme is retrieval, not just “add a vector DB”. It’s domain-aware search, freshness, entity linking, and multimodal cues.
Web and real-time search APIs: Tavily and Exa are leaning into agent-ready web search with rate limits, citations, and source controls. This reduces prompt glue work and makes agent actions explainable.
Specialized retrieval: Twelve Labs does video understanding/search. We also see companies positioning around enterprise semantic search with connectors to internal repos, Slack, Confluence, and data lakes.
Vector isn’t the whole story: Traditional stores (Postgres/ClickHouse) recur in the file alongside vector capabilities. Teams prefer hybrid retrieval (keyword + semantic) and reranking over “vector-only” systems.
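To make “hybrid retrieval” concrete, here’s a toy sketch that blends a keyword score with a cosine score and keeps a shortlist for reranking. The hashed bag-of-words “embedding” and the 50/50 blend weight are illustrative assumptions, not anyone’s production recipe.

```python
import math
from collections import Counter

DOCS = {
    "d1": "postgres hybrid search with embeddings and reranking",
    "d2": "vector database for semantic retrieval",
    "d3": "clickhouse analytics for logs",
}

def keyword_score(query: str, doc: str) -> float:
    # Overlap-based stand-in for BM25-style lexical scoring.
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(min(q[t], d[t]) for t in q) / max(len(q), 1)

def embed(text: str):
    # Toy "embedding": a hashed bag of words. A real system would call
    # an embedding model; only the interface matters here.
    v = [0.0] * 16
    for t in text.split():
        v[hash(t) % 16] += 1.0
    return v

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query: str, alpha: float = 0.5, k: int = 2):
    # Blend lexical and semantic scores, keep a shortlist; in production
    # a reranker would re-order this shortlist before the agent sees it.
    qv = embed(query)
    scored = [
        (alpha * keyword_score(query, doc) + (1 - alpha) * cosine(qv, embed(doc)), doc_id)
        for doc_id, doc in DOCS.items()
    ]
    return sorted(scored, reverse=True)[:k]

print(hybrid_search("hybrid search postgres"))
```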
Why it matters: Most useful agents are “retrieve-reason-act” loops. If retrieval is flaky, your agent is flaky. Retrieval vendors that publish quality/latency SLOs and offer clear cost controls will capture budget that used to belong to internal search teams.
Example connections:
Pair Exa with ClickHouse (fast storage/analytics) or MotherDuck (analytics + sharing) to build an internal news/search console that’s enterprise-grade without a big search team.
Kumo can sit above existing data to expose predictions to agents without forcing a new warehouse migration.
5. Open source as the default wedge
About 36% of the companies frame an open source angle: an MIT/Apache reference build, a community operator, or an SDK. This is strongest in:
Serving/runtime: Anyscale (Ray), Lightning AI (PyTorch Lightning and friends), and Seldon (Seldon Core) put production knobs around open tooling.
Data engines: ClickHouse and DuckDB (via MotherDuck) are classic examples of open source engines with commercial clouds. LakeFS (data versioning) and Supabase (Postgres BaaS) follow the same pattern.
MLOps pipelines: ZenML open-sources orchestration recipes for training/eval. Seldon publishes model deployment primitives that are enterprise-hardened in the paid edition.
Why it matters: Open source is still the fastest route to developer love and bottom-up adoption. The model that’s working is “batteries-included open source software for single-team use, plus a hosted product with policy/SAML/observability baked in”.
Risks: License drift and cloud competition. If an adjacent cloud vendor can host your open source with a thinner margin structure or better discounting, your paid tier needs real enterprise-only features (governance, isolation, SLAs) to defend itself.
6. Compute economics are the product
Mentions of hardware vendors are common. The practical pattern: AI infra is priced by tokens and milliseconds, so runtime and hardware efficiency is product differentiation.
Alternative accelerators: Groq shows up often. Its LPU-based inference emphasizes deterministic latency and high tokens/sec for chat/coding workloads. Startups that add first-class Groq support are signaling “we care about speed and cost”.
GPU cloud pragmatism: Lambda Labs is top of mind for teams that need capacity without accepting hyperscaler lock-in. This pressure also explains the popularity of serverless runtimes (Modal, Baseten) that can bin-pack GPU work and eliminate idle time.
Data throughput/storage: Weka appears in storage-intensive contexts. Infra that feeds GPUs fast (and cheaply) becomes a competitive moat for training and high-throughput inference.
Why it matters: Inference platforms win or lose on cost curves and tail latency, not just features. Expect more vendors to publish cost/latency dashboards and to auto-route workloads across Nvidia, Groq, and CPU/AVX backends to keep SLAs and margins.
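Here’s a sketch of what auto-routing on cost and latency can look like. The per-backend prices and p95 numbers are invented for the sketch, not vendor quotes; the design choice is that the SLO filters first and price breaks ties.

```python
# Illustrative backend profiles: price per 1K tokens and observed p95
# latency. These numbers are invented for the sketch, not vendor quotes.
BACKENDS = {
    "gpu-a":   {"usd_per_1k_tokens": 0.60, "p95_ms": 900},
    "lpu-b":   {"usd_per_1k_tokens": 0.80, "p95_ms": 250},
    "cpu-avx": {"usd_per_1k_tokens": 0.10, "p95_ms": 4000},
}

def route(latency_slo_ms: int) -> str:
    # SLO filters first, price breaks ties: among backends that meet the
    # latency target, pick the cheapest one.
    eligible = {n: p for n, p in BACKENDS.items() if p["p95_ms"] <= latency_slo_ms}
    if not eligible:
        raise RuntimeError("no backend meets the SLO; shed load or renegotiate")
    return min(eligible, key=lambda n: eligible[n]["usd_per_1k_tokens"])

print(route(1000))  # latency-tolerant batch job -> cheapest eligible backend (gpu-a)
print(route(300))   # interactive chat -> fast backend despite higher price (lpu-b)
```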
Risk: Supply concentration. If your SLOs are tuned to a single silicon vendor, procurement shocks become product incidents.
7. Monitoring, evaluation, and governance get first-class budgets
As teams move from demos to production, they buy controls:
Observability and eval: Fiddler AI and Arize are the canonical examples in this week’s data, covering model quality tracking, drift detection, feature attribution, and experiment comparison. Tools like PromptFoo push evals earlier in the lifecycle with test suites you can run in CI (a minimal gate is sketched after this list).
Security and policy: Aserto (authorization) and privacy-forward data players (Tonic AI, Mostly AI, Parallel Domain for simulated or synthetic data) help enterprises ship AI without moving PII into unmanaged systems.
Ops for k8s/on-prem: Rafay and Spectro Cloud show how regulated sectors keep agents in private clusters or at the edge. When CIOs say “air-gapped AI”, this is the pattern they buy.
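To ground “evals gate deploys”, here’s a tool-agnostic sketch of a CI gate; PromptFoo and similar suites do a richer version of this. The test cases, the stub model, and the 90% threshold are all placeholders.

```python
import sys

# Placeholder test cases: (prompt, substring the answer must contain).
CASES = [
    ("What is the refund window?", "30 days"),
    ("Which regions do we host in?", "eu-west"),
]

def model(prompt: str) -> str:
    # Stand-in for the model (or prompt version) under test.
    return "Refunds are accepted within 30 days."

def run_suite(threshold: float = 0.9) -> None:
    passed = sum(expected in model(prompt) for prompt, expected in CASES)
    rate = passed / len(CASES)
    print(f"eval pass rate: {rate:.0%}")
    if rate < threshold:
        sys.exit(1)  # non-zero exit fails the CI job and blocks the rollout

run_suite()
```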
Why it matters: These purchases reduce organizational risk: vendor lock-in risk (by enabling model swapping), regulatory risk (by making flows explainable), and finance risk (by tying spend to outcomes).
Risk: Tool sprawl. If eval, monitoring, and policy each live in separate tools, buyers push back and consolidate. Vendors that integrate well into Buildkite/Harness and observability backbones will fare better.
8. Layer-by-layer implications: who benefits, who’s exposed
Compute and hardware
Winners: serverless compute that can auto-select the cheapest/fastest backend for each model class (Modal, Baseten), GPU clouds that publish predictable queues (Lambda Labs), and alt-silicon with credible software stacks (Groq).
Exposed: single-vendor-only runtimes and any platform that can’t show unit-economics improvements quarter over quarter.
Model providers
Winners: vendors with strong policy controls and tooling hooks (Anthropic’s Claude Code-style tools, OpenAI function-calling) that make agent loops simpler.
Exposed: API-only vendors without enterprise-grade governance or regional hosting options.
Inference platforms
Winners: platforms that treat cost controls, caching, and canarying as first-class features and that publish “speed under load” as a product.
Exposed: “model hosting” without workflows, evals, or rollbacks.
Data infrastructure
Winners: hybrid retrieval patterns that combine Postgres/ClickHouse features, embeddings, and rerankers. Data-versioning (LakeFS) that gives auditors a clean chain of custody. Friendly SQL-forward products (Supabase, Turso) that let agents read/write safely.
Exposed: vector-only systems that don’t support hybrid search or enterprise security.
Orchestration and agents
Winners: opinionated, production-grade workflows that treat tools as versioned APIs, with timeouts, retries, and SLOs.
Exposed: notebooks as production.
Observability and evaluation
Winners: tools that bridge pre-prod eval suites (PromptFoo style) and prod monitoring (Fiddler/Arize) so teams share metrics, not screenshots.
Exposed: black-box dashboards without hooks into CI/CD or ticketing.
Security and governance
Winners: policy engines that sit in the hot path without adding material latency (Aserto), and synthetic data that demonstrably reduces compliance scope (Tonic/Mostly).
Exposed: vendors promising “secure by design” without proofs, logs, or SOC2-ready controls.
Edge/on-prem
Winners: Kubernetes management (Rafay, Spectro Cloud) with curated model catalogs for air-gapped installs.
Exposed: cloud-only offerings in healthcare, defense, and critical infrastructure.
9. Correlations and what they imply
A few notable co-occurrences jump out in the dataset:
Cost + integrations travel together. Companies that talk about FinOps also talk about partnerships. This makes sense: the fastest path to lower costs is often swapping models/runtimes based on price/perf, which requires deep integrations.
Agents + observability are tightly linked. Teams that deploy agents quickly discover they need eval/monitoring to keep incident tickets down. This supports the “agent SLO” thesis: a budget holder will pay to guarantee reliability, not just to add reasoning.
Search + observability co-occur. Retrieval quality is fragile under domain drift. Buyers are starting to ask for quality dashboards, not just precision/recall claims in a PDF.
Practical takeaway for founders: If you sell an agentic or retrieval-heavy product, ship integrations, SLOs, and cost controls in the first 90 days. If you sell an open source wedge, publish a clean enterprise demarcation (policy, SSO, RBAC, audit) and resist feature leakage into the free tier.
10. Funding vs hiring: what it says about the next 6 months
Mentions of funding are materially more common than hiring. Within funding mentions, the rate is highest for model providers and compute (roughly half of companies touching those layers also talk about new capital), and lowest in core data infrastructure. That’s consistent with market behavior: tokens and milliseconds get funded quickly, but durable data platforms raise on slower cycles and enterprise proofs.
Implications:
Expect more serverless inference and search APIs to raise in the near term. They have obvious growth levers via integrations.
Data infrastructure founders should assume more diligence on procurement, security, and total cost. Lean into hybrid retrieval and SQL-forward UX to expand the buyer set.
11. What could derail these trends
Vendor concentration. Nvidia, a few hyperscalers, and two model API leaders hold a lot of power. Price or policy changes can damage downstream gross margins. Mitigate with true multi-backend routing and pre-negotiated capacity.
Benchmark and eval drift. Retrieval benchmarks are easy to game, and eval suites may not reflect messy production. Bake in per-customer test sets and quality telemetry, and close the loop to prompt/model changes.
Regulatory pull-forward. As agentic systems move into healthcare/financial workflows, privacy and autonomy rules tighten. Vendors that can’t provide auditability, data residency, or deterministic behaviors will get boxed out.
Tool sprawl fatigue. Buyers will resist stitching together five tools for one use case. Lean into opinionated defaults and publish a canonical “reference architecture” with two or three vendors, not eight.
12. How this affects each layer of infra in practice
For compute vendors (Lambda Labs, Groq): publish transparent queues/prices and ecosystem guides (“How to run vLLM/TensorRT-LLM here”). Partner tightly with serving platforms to become the default backend they auto-select.
For inference/serving platforms (Baseten, Modal, Anyscale, Lightning AI): treat routing + caching + eval + rollbacks as a single story. Make it trivial to go from a PromptFoo test passing in CI to a guarded rollout behind feature flags.
For data platforms (MotherDuck, ClickHouse, PlanetScale, Supabase, Turso, ParadeDB): own hybrid retrieval patterns and permissioning. Agents increasingly need safe writes (not just reads). Make row-level security and reversible migrations “agent-proof”.
For search/RAG APIs (Tavily, Exa, Twelve Labs): ship SLAs, source traceability, and per-customer cost caps. If an agent loops forever, your API bill shouldn’t spiral (a budget-guard sketch follows this list).
For observability/eval (Fiddler AI, Arize, PromptFoo): integrate with Buildkite/Harness so evals gate deploys. Offer budgets and drift alerts in plain English, not just charts.
For governance and synthetic data (Aserto, Tonic, Mostly, Parallel Domain): market the compliance delta: what audits get easier if the customer adopts you. For agents in enterprises, “pass the audit” is the buying trigger.
For edge/on-prem (Rafay, Spectro Cloud): package a curated model catalog with signed images, air-gapped updates, and policy-by-default. The value is “your agents run inside your cluster”, not “k8s but with AI”.
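As promised above, a budget-guard sketch for the “agent loops forever” failure mode. The flat per-call price and the cap are illustrative; real metering would read the provider’s usage accounting.

```python
class BudgetExceeded(RuntimeError):
    pass

class CostGuard:
    """Per-customer spend cap for an agent loop. The flat per-call price
    and the cap are illustrative; real metering would read the provider's
    usage accounting."""

    def __init__(self, cap_usd: float, usd_per_call: float = 0.01):
        self.cap_usd = cap_usd
        self.usd_per_call = usd_per_call
        self.spent = 0.0

    def charge(self) -> None:
        self.spent += self.usd_per_call
        if self.spent > self.cap_usd:
            raise BudgetExceeded(f"spent ${self.spent:.2f} > cap ${self.cap_usd:.2f}")

guard = CostGuard(cap_usd=0.05)
try:
    for step in range(1000):  # a runaway agent loop
        guard.charge()        # every tool or search call books cost first
except BudgetExceeded as exc:
    print(f"halted at step {step}: {exc}")
```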
Bringing it all together
This week’s data points to an infra market that is consolidating around agent-ready retrieval, serverless inference with real cost controls, and production-grade guardrails.
Founders who win will make integration friction effectively zero, turn latency and cost into product features, and expose observable, testable quality across the entire agent loop. Investors should expect capital to chase those patterns first. And core data platforms win by enabling hybrid retrieval, safe writes, and clean governance rather than by selling “a vector DB” alone.
Companies like Tavily, Exa, Twelve Labs (retrieval), Baseten, Modal, Anyscale, Lightning AI (serving/runtime), MotherDuck, ClickHouse, PlanetScale, Supabase, Turso (data), Fiddler AI, Arize, PromptFoo (eval/observability), Tonic, Mostly, Parallel Domain, Aserto (privacy/policy) illustrate how these pieces combine into reliable, economical agent systems. The common thread is practical: ship fast by integrating deeply, then earn enterprise trust with controls.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 infra-curious friend: