Sector Deep Dive #5: SEARCH API PRODUCTS
Companies that build and sell search API products to developers
1. Snapshot
The core bet is that developers will increasingly buy real-time web search as a managed API instead of building it. Why? Because modern apps and AI agents need fresh, machine-readable information and citations on demand.
Prices, performance, and legal access to content are shifting quickly. A handful of independent search indexes (Brave, You.com) and SERP/API specialists (SerpApi, DataForSEO, Serper.dev) are emerging as the infrastructure layer that feeds LLMs, agents, and enterprise apps with web context.
There’s another layer in between. Exa is effectively the developer-first infrastructure layer that sits between the independent-index crowd (Brave, You.com) and the SERP scrapers (SerpApi, DataForSEO). It builds its own continuously refreshed web index (not just scraping Google or Bing).
Microsoft’s price hikes on the official Bing Web Search API in 2023 (e.g. S1 tier from $7 to $25 per 1,000 queries starting May 2023) drove many builders to look for alternatives. Google, meanwhile, still prices its own JSON API at $5 per 1,000 queries with a capped quota, creating an opening for developer-first vendors.
Near-term catalysts include: (a) independent indexes scaling distribution via cloud marketplaces (e.g. Brave Search API on AWS Marketplace) and launching AI-grounding features; (b) OpenAI’s SearchGPT prototype validating developer demand for search-plus-answers; and (c) high-profile AI answer engines (Perplexity, You.com) opening or expanding APIs, pushing volume into search infra instead of consumer portals.
The biggest risks are: (a) content access and litigation (publishers vs. AI search providers), which may raise COGS and restrict data; (b) platform dependence (Bing or browser defaults); and (c) consolidation if hyperscalers bundle “good-enough” search into agent platforms. These are active issues today (e.g. lawsuits and cease-and-desists targeting Perplexity, with publisher revenue-share programs emerging in response).
This is an investable infra subsector with asymmetric upside over the next 24 months, especially in independent index APIs and legal-first SERP APIs with enterprise posture. The winners will pair developer ergonomics (clean JSON, fast SLAs), distribution (marketplaces, model/tooling integrations), and credible content access (publisher deals, compliance).
2. Thesis framing: what must be true
Investment question in one line: Can independent search APIs become the default way developers and AI agents ground responses in fresh, verifiable web data (at attractive unit economics) before big platforms make the category a bundled feature?
Thesis pillars (what must be true):
Real-time grounding becomes mandatory for LLMs and agents. OpenAI’s SearchGPT signals that “search + answers + sources” is moving into core AI UX. Third-party APIs that are fast, citable, and cheap will see rising demand from AI builders.
Independent indexes achieve escape velocity. Brave’s index (>30B pages, 100M+ daily updates) and growing distribution (AWS Marketplace, AI-grounding features) show a credible non-Google/Bing path for developer-grade web data. Exa is growing too.
Economics/choice favor specialists. Microsoft’s 3-10x Bing API price hikes (now $25 per 1,000) and Google’s limited, capped JSON API (still $5 per 1,000, up to 10k/day) push developers to alternative providers with predictable pricing and richer outputs.
Legal access matures. Publishers and search APIs converge on revenue-share and licensing. Perplexity’s $42.5M publisher pool is an early sign of viable content economics for AI search.
Disconfirming evidence to track: If platform leaders (OpenAI, Google, Microsoft) give away a full-featured, low-cost search API or effectively embed it into model runtimes, specialist API demand could compress. Also if publisher litigation materially walls off content without workable licenses, data access costs could swamp API margins.
3. Market structure, size, and geography
Structure. Three layers matter to developers:
Index owners with developer APIs: Microsoft (Bing), Brave, You.com (increasingly an AI answer engine with enterprise tilt), plus regional engines (Baidu in China). Google’s Programmable Search remains limited and quota-capped.
SERP/API specialists that fetch/parse results from many engines and verticals, exposing a clean JSON schema and compliance posture (e.g. SerpApi with legal shield, lower-cost peers like Serper.dev, DataForSEO). The Exa API exposes structured, relevance-scored results tuned for AI agents, RAG systems, and retrieval pipelines, which makes it more semantic and programmatic than a traditional search API. Exa also markets itself as “search infrastructure for AI”, letting devs query the web in real time with filters for freshness, domain, and semantic similarity.
Enterprise/site search APIs (Algolia, Elastic, Amazon Kendra) that index a customer’s own content and power product or knowledge-base search. Adjacent but often bought by the same teams and now blending with web grounding. Algolia alone powers 1.5T+ queries/year across 10k+ customers.
Size and trajectory. There is no canonical “Search-API TAM”, but demand proxies are strong. Brave reports >1.5B searches/month in recent updates. Perplexity has reached ~22M MAU and processed ~780M queries in May 2025. Algolia is already at trillion-scale enterprise queries. Each datapoint indicates rising programmatic search volume and shifting developer spend from DIY crawl/scrape stacks to APIs.
Penetration and runway. Google still commands roughly 90% of global search, but it dipped below 90% in late 2024 and has hovered in the high-80s in 2025. That is a small crack, and it coincides with AI-native search usage and Bing’s modest desktop gains. For developers, the takeaway is not consumer share per se but willingness to try non-Google data sources when APIs are reliable and priced fairly.
Geography. In China, Baidu leads with ~56–60% share across platforms, with Bing surprisingly strong on desktop. Google is negligible. Practically, China-focused devs rely on Baidu (and 360/Haosou) data and localized APIs, while Western API startups rarely operate behind the Great Firewall. For global products that serve China, provider mix (Baidu + Bing/Brave) and compliance become material.
4. Customers, jobs to be done, and switching costs
Who buys and why. Three clusters:
AI/agent builders who need live facts + citations. Instead of running their own crawler, they call search APIs within tool-use chains to ground model answers (news, pricing, docs). OpenAI validating “search-inside-chat” accelerates this pattern across the stack. Growing Exa usage is another datapoint.
Product teams at e-commerce, SaaS, and content apps who need fast, typo-tolerant, tuned search for their own catalogs and docs (Algolia et al.). At scale, better search converts directly to revenue and support deflection.
SEO/data analytics and research ops teams who need reliable, structured SERPs at volume (rank tracking, market analysis, due diligence). SerpApi’s customer mix is now ~40% AI, ~40% SEO, ~20% other, highlighting the shift from pure SEO into AI infra.
Mission-criticality. If the search step fails, agent answers degrade or hallucinate. If site search fails, revenue drops. That creates a budget line for SLA-backed APIs and motivates redundancy (e.g. Brave primary, Bing or SERP API as fallback). The Bing price shock in 2023 nudged teams to multi-source or switch, a real-world proof of this redundancy mindset.
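The primary-plus-fallback pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's SDK: `SearchResult`, `search_with_fallback`, and the two stub providers are hypothetical names, and in a real system each provider would wrap an actual HTTP client with timeouts and retries.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


# A provider is any callable that takes a query and returns results.
# Here they are stand-ins for real vendor API wrappers (assumption).
Provider = Callable[[str], List[SearchResult]]


def search_with_fallback(
    query: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, List[SearchResult]]:
    """Try providers in order; return (name, results) from the first that succeeds."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            results = provider(query)
        except Exception as exc:  # e.g. timeout, quota exhausted, 5xx
            errors.append(f"{name}: {exc}")
            continue
        if results:  # an empty result set counts as a miss; keep going
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all search providers failed: " + "; ".join(errors))


def flaky_primary(query: str) -> List[SearchResult]:
    """Hypothetical primary that always times out, to exercise the fallback."""
    raise TimeoutError("primary timed out")


def stub_fallback(query: str) -> List[SearchResult]:
    """Hypothetical fallback that returns one canned result."""
    return [SearchResult("Example", "https://example.com", f"Stub result for {query!r}")]


used, hits = search_with_fallback(
    "bing api pricing", [("primary", flaky_primary), ("fallback", stub_fallback)]
)
print(used, len(hits))
```

Keeping providers behind one callable signature is what makes the ordering a configuration choice rather than a code change, which is the practical meaning of multi-sourcing here.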
Switching costs. Swapping an endpoint is easy. Replicating quality tuning, synonyms, ranking rules, or JSON schemas embedded in pipelines is not. Enterprise search configs (Algolia) and AI toolchains (prompt+parser contracts) generate meaningful friction. Legal/compliance features (e.g. SerpApi’s legal shield) further raise switching costs in regulated environments.
5. Product and roadmap signals
Core modules developers expect:
Query endpoints that return structured results (JSON) for web/news/images/local, with location & language controls, snippet payloads, and schema-enriched data.
Latency and uptime SLAs and “speed tiers” for interactive UX and agent loops.
Compliance and indemnity (publisher respect, legal shield, SOC 2).
AI grounding features (citations, multi-snippet context, MCP/tool adapters), and integrations (LangChain, cloud marketplaces).
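To make the module list concrete, here is a rough sketch of what a normalized result and a citation-ready context string might look like on the consumer side. The `Hit` fields and the `to_grounding_context` helper are illustrative assumptions, not any provider's actual schema; real payloads differ by vendor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hit:
    """One search result in a hypothetical normalized shape."""
    title: str
    url: str
    snippets: List[str] = field(default_factory=list)
    published: Optional[str] = None  # ISO date, if the provider supplies one


def to_grounding_context(hits: List[Hit], max_snippets_per_hit: int = 2) -> str:
    """Render hits as numbered, citable context lines for an LLM prompt."""
    lines: List[str] = []
    for i, hit in enumerate(hits, start=1):
        lines.append(f"[{i}] {hit.title} ({hit.url})")
        for snippet in hit.snippets[:max_snippets_per_hit]:
            lines.append(f"    - {snippet}")
    return "\n".join(lines)


example_hits = [
    Hit("Example page", "https://example.com", ["first snippet", "second snippet", "third"]),
]
print(to_grounding_context(example_hits))
```

Numbering the sources lets the model cite `[1]`, `[2]` in its answer, and capping snippets per hit keeps prompt size predictable.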
Independent index momentum. Brave exposes a web index of 30B+ pages, claims 100M+ daily updates, and recently shipped AI Grounding to anchor LLM outputs in verifiable sources. This positions the API as a turnkey “search-to-source” layer for agents. Availability on AWS Marketplace shortens procurement and signals enterprise focus. Exa maintains its own index as well.
Answer-engine APIs. Perplexity and You.com aim to synthesize answers with sources. Exa aims to make web search directly machine-consumable for LLMs. Perplexity’s consumer metrics (MAU/queries) indicate product-market fit. The open question is exposing that capability as a developer API at sustainable margins. The legal/publisher front is moving. Perplexity is pairing growth with a $42.5M publisher pool to defuse access risk.
SERP/API specialists. SerpApi abstracts Google/Bing/vertical SERPs into consistent JSON and offers enterprise-friendly pricing at high volume ($2.75 per 1k reserved searches) plus legal safeguards. This is useful when you need Google-quality outputs with engineering and legal friction removed.
Enterprise/site search keeps evolving. Algolia blends keyword + vector (“neural”) approaches and remains the easiest “drop-in” for app/internal search at massive scale (1.5T+ queries), making it a common complement to web grounding: your data via Algolia + the open web via a search API.
6. Competitive dynamics and pricing
Platform APIs vs. independents.
Microsoft Bing: Official, compliant, but expensive post-2023 (e.g. S1 web search $25/1k). Good reliability. Quality lags Google in some niches.
Google Programmable Search: Cheap ($5/1k) and reliable for custom/site collections, but not a full web API and capped at 10k/day. Many teams therefore layer SERP APIs or independent indexes to get web-wide coverage.
Exa/Brave/You.com: Independence is the differentiator (no dependency on Big Tech indices), plus developer-ready features (index transparency, grounding). Brave’s marketplace and AI-grounding moves specifically target agent stacks.
SERP APIs: SerpApi (premium, legal shield), Serper.dev/DataForSEO (aggressive price points). This tier competes on breadth of engines, JSON quality, anti-bot resilience, and price.
AI search as an encroaching competitor. OpenAI’s SearchGPT is a strategic signal: if the experience ships as a developer API or becomes bundled into model runtimes, it could absorb demand. For now it is limited, but investors should assume bundling risk in the next 24 months.
Consumer share vs. developer demand. Google still holds ~89–90% of global search. Bing ~4%. Yandex, Yahoo, DDG trail. The gap doesn’t prevent developer migration if pricing, procurement, or legal are better elsewhere. The 2024–2025 dip below 90% is symbolically important: teams are now comfortable experimenting with non-Google sources.
7. Go-to-market, adoption, and metrics to watch
PLG with enterprise overlays. Search APIs skew self-serve: devs test free tiers, wire in JSON, and grow usage. Enterprise deals add SLAs, DPAs, and volume commits. Distribution is improved by cloud marketplaces (easier procurement; draw-down on committed cloud spend) and framework integrations (LangChain/tools). Brave’s AWS listing is a concrete example of marketplace-led enterprise GTM.
Adoption proxies.
Exa: Still young but growing rapidly. Thousands of devs are using it. Recently raised $85M Series B led by Benchmark.
Perplexity: ~22M MAU, ~120M monthly visits (as of July 2025), ~780M queries in May 2025. All indicate rising appetite for AI-answer search that could translate into API usage.
Brave: claims >1.5B searches/month recently and index scale/cadence (30B+ pages; 100M+ daily updates) consistent with commercial-grade coverage.
Algolia: 1.5T+ queries/year across 10k+ customers remains the clearest signal that “search-as-an-API” is mainstream within product teams.
SerpApi: enterprise pricing pages and research show scale economics ($2.75/1k reserved searches), and a customer mix that is now ~40% AI underscores the category’s pivot from SEO to AI infra.
Reliability and compliance. Expect 99.9%-style SLAs from serious vendors. Enterprise wins will hinge on SOC 2, data protection addenda, and publisher-aware crawling. Watch for visible status histories and legal shields or revenue-share programs. Both mitigate buyer risk and will become standard.
Hiring and focus. Companies like Parallel (founded by former Twitter CEO Parag Agrawal) emphasize agent-grade research APIs. Headcount remains lean and engineering-heavy. Public comms point to millions of “research tasks/day” and benchmark-first positioning, but the bigger signal is product velocity in agent tooling.
8. Monetization and unit economics
Pricing models.
Per-query (CPM-like) is standard for web search and SERP APIs: Bing (≈$25/1k on popular tiers), Brave (public materials emphasize independence & marketplace procurement, list prices vary by tier), SerpApi (enterprise reserved $2.75/1k and speed add-ons), Google Programmable Search ($5/1k, 10k/day).
SaaS/usage for site/enterprise search (Algolia, Elastic) based on operations and records.
Hybrid for answer engines (subs + ads + licensing/publisher share). Perplexity’s $42.5M publisher pool is an early, explicit content-cost line item meant to stabilize supply.
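The per-query prices above are easier to compare as monthly spend. The sketch below uses only the list prices and the daily cap quoted in this piece; the provider keys and the `monthly_cost` helper are hypothetical, real bills depend on tier, commit size, and add-ons, and SerpApi's reserved rate in particular assumes a high-volume commitment.

```python
import math

# Per-1,000-query list prices quoted in this piece (USD).
PRICE_PER_1K = {
    "bing_s1": 25.00,
    "google_programmable": 5.00,
    "serpapi_reserved": 2.75,
}

# Daily caps quoted in this piece; providers not listed have no cap stated here.
DAILY_CAP = {"google_programmable": 10_000}


def monthly_cost(provider: str, queries_per_day: int, days: int = 30) -> float:
    """Return monthly spend in USD, or math.inf if the daily cap is exceeded."""
    cap = DAILY_CAP.get(provider)
    if cap is not None and queries_per_day > cap:
        return math.inf  # volume is not servable on this provider at all
    return queries_per_day * days * PRICE_PER_1K[provider] / 1000


for name in PRICE_PER_1K:
    print(name, monthly_cost(name, queries_per_day=50_000))
```

At 50,000 queries a day over a 30-day month, this works out to about $37,500 on Bing S1 versus about $4,125 at the reserved SerpApi rate, while Google Programmable Search is out of range because of its 10k/day cap.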
COGS and margins. Running a crawler + index has bandwidth/compute costs but can sustain software-like gross margins at scale. SERP APIs incur proxy/captcha costs but offset them via engineering leverage and high utilization. AI answer engines face inference COGS until they lean on cheaper custom models, which is why Perplexity and You.com are investing in their own models and summarization stacks. (Evidence: rapid model/version launches and product cadence across 2024–2025; vendors explicitly pitch “grounding” to reduce model-token burn.)
ARPU and expansion. Usage grows with app traffic and agent loops: as an e-commerce site, a support bot, or an agent platform scales, queries/customer scale too. That creates natural net-revenue expansion without more sales cycles. Enterprise contracts add overage revenue and encourage annual commits for lower unit rates (e.g. SerpApi’s reserved pricing).
Seasonality. Consumer search APIs see event-driven spikes. Enterprise/site search peaks in retail Q4. But usage-based billing smooths revenue. Overages provide upside in peak months. Vendor comms on Brave Search Ads and query growth show seasonal surges.
9. Moat, data advantage, and legal reality
Independent index ≠ nice-to-have. Owning the index (Brave, You.com) is the defensibility wedge against platform policy changes and SERP scraping fragility. It also enables product differentiation like multi-snippet grounding, “goggles” (re-ranking), and fast freshness. For developers, this means fewer brittle dependencies and more consistent JSON across query types.
Workflow lock-in. Embedded ranking rules, synonym maps, analytics, and pipelines (Algolia/Elastic) create real stickiness. On the web side, teams code to specific schemas and rate/latency expectations. Swapping vendors requires regression testing across critical UX. Legal coverage (SerpApi’s U.S. Legal Shield) and enterprise SLAs become part of the moat for high-risk users.
Publisher alignment will define winners. Lawsuits and cease-and-desists against AI search providers (Dow Jones/News Corp., BBC, Britannica/Merriam-Webster) demonstrate that content access is not a free good. Startups that turn adversaries into suppliers via revenue-share or licenses will be able to scale volume without existential risk, even if near-term margins are thinner.
Platform bundling risk. If OpenAI/Google ship low-cost, high-quality search endpoints inside the model runtime (or as a standard tool), third-party demand could compress. That said, developers value choice, cost control, transparency, and policy independence, all of which still argue for multi-sourcing web data (primary + fallback).
10. What this means for infra startups
Who gets pulled in. Over the next 24 months, I expect ~25–40% of infrastructure startups to be directly or indirectly affected by the rise of Search APIs. The exposure comes in three ways:
Agent and orchestration stacks (tool-use frameworks, evaluators, guardrails) will standardize on search tools for grounding. When SearchGPT-style UX becomes common, every agent platform needs a search provider, a policy for citations, and often a backup provider. That’s a direct dependency. (Signal: OpenAI’s move with SearchGPT, Exa/Brave/others shipping MCP-style adapters.)
Data infra and retrieval layers (vector DBs, RAG pipelines, ETL) will blend internal corpus with web augmentation. As teams move from static corpora to live answers with verifiable sources, they will route external results through their retrieval/ranking layer. Expect tighter connectors from Pinecone/Weaviate-like stacks into search APIs and more budget reallocation from “more tokens” to “better grounding”.
Compliance, observability, and FinOps startups will see new budgets around content licensing, model+search cost controls, and provenance/attribution telemetry. If you must prove where an answer came from and pay the source, observability products and policy engines become critical.
Positive correlations.
Inference cost declines strengthen search APIs because grounding becomes the obvious way to reduce hallucinations and trim token use (shorter prompts when you pass high-signal snippets). Brave’s “AI Grounding” is literally a productized version of this correlation.
Marketplace distribution (AWS, Azure) lowers friction for enterprises to test and standardize on a search API. This historically accelerates infra adoption curves (database, logging, ML APIs). Brave’s AWS launch is a direct example.
Publisher deals unlock premium sources (finance, health, news), which improves answer quality, driving higher conversion to paid tiers. Perplexity’s pool is the first at scale. Expect others to follow.
Risks and dependencies for infra startups.
Legal and robots.txt compliance: startups embedding search must respect robots.txt and site policies, or risk collateral reputational/legal exposure if their provider is accused of scraping blocked sites. Recent BBC and News Corp actions show this is no longer theoretical. Vet your provider’s crawler compliance and indemnities.
Provider concentration: relying on a single provider (e.g. just Bing) exposes you to pricing shocks (as in 2023) and availability changes. Multi-sourcing (Brave + SERP API + Bing/Google Programmable where allowed) adds resiliency.
Geo constraints: if your users are in China, plan for Baidu/360 integration and localized infrastructure. This may mean separate routing, filtering, and compliance processes from your global stack.
How much budget shifts here? For AI-agent startups, search can easily become 10–30% of monthly variable COGS when agents do multi-hop research (because each answer can trigger tens of queries). For SaaS product teams, external web search spend is smaller. Internal search (Algolia/Elastic) remains the primary cost center, with web grounding added for specific features (e.g. a “Research” tab in a support bot).
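A quick way to sanity-check the 10–30% range is to compute search's share of per-answer variable cost. The function below is a back-of-the-envelope sketch; the example inputs (20 queries per answer, $5 per 1,000 queries, $0.40 of other variable cost per answer such as inference) are illustrative assumptions, not measured figures.

```python
def search_share_of_cogs(
    queries_per_answer: float,
    price_per_1k_queries: float,
    other_cost_per_answer: float,
) -> float:
    """Fraction of per-answer variable cost that goes to search API calls."""
    search_cost = queries_per_answer * price_per_1k_queries / 1000
    return search_cost / (search_cost + other_cost_per_answer)


# Hypothetical multi-hop agent: 20 queries per answer at $5 per 1,000 queries,
# plus $0.40 of other variable cost (mostly inference) per answer.
share = search_share_of_cogs(
    queries_per_answer=20,
    price_per_1k_queries=5.00,
    other_cost_per_answer=0.40,
)
print(f"search share of variable COGS: {share:.0%}")
```

Under those assumptions search is $0.10 of a $0.50 answer, or 20%. The share climbs quickly as hop count rises or as inference gets cheaper, which is why per-query price matters more for agent workloads than for single-query apps.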
Who benefits in venture terms.
Independent index APIs (Exa, Brave, You.com) with marketplace distribution, strong engineering cadence, and publisher alignment.
Legal-first SERP APIs (SerpApi) where enterprises want Google-quality JSON without running a proxy farm or fighting captchas and where legal shield matters.
Hybrid answer-engine APIs (Perplexity) if they can show measurable accuracy lift and lower blended COGS via licensing and in-house models, not just good UX.
Who could compress returns.
OpenAI/Google bundling: if search becomes “free” inside a model runtime, specialists will compete on quality, compliance, and independence (e.g. sources that big models won’t touch without licenses). Developers still like choice. Being the fallback engine is a real, durable niche.
11. Competitive landscape: notable companies to watch
Exa (US) — independent index + agentic search + answer engine
What’s special: Independent index, search features tailored for LLMs, and gaining rapid mindshare among devs.
Brave (US) — independent index + AI grounding + AWS distribution.
What’s special: Independent index (30B+ pages, 100M+ daily updates), AI Grounding features tailored for LLMs, and AWS Marketplace listing. Signals enterprise intent and procurement ease.
SerpApi (US) — legal-aware SERP API at scale.
What’s special: Wide engine coverage, enterprise legal shield, and reserve pricing down to $2.75/1k searches at scale; customer mix now ~40% AI. Often the fastest path to Google-quality JSON for devs.
You.com (US) — AI research/answer engine with enterprise tilt.
What’s special: Fresh $100M Series C at $1.5B valuation (Sep 2025), ongoing shift from “consumer search challenger” to AI research agent for regulated industries. Credible team pedigree (Richard Socher).
Perplexity (US) — answer engine with publisher economics.
What’s special: High user/query growth (22M MAU, ~780M queries in one month), bold GTM moves, and a $42.5M publisher pool amid lawsuits. The key watch-item is API exposure and sustainable COGS.
Parallel (US) — agent-grade deep research API (early).
What’s special: Founder brand (Parag Agrawal), benchmark-driven positioning vs. browsing tools. Still early but tuned for agent workflows.
Baidu (China) — dominant local index.
What’s special: Leads China search (~56–60% share). Essential for China-market apps. Developer-facing access exists within Baidu’s cloud/AI platforms. Global devs must consider geo separation and compliance.
12. Risks, catalysts, and what would change the call
Key risks in the next 24 months.
Content and legal: Publisher suits (BBC, Dow Jones, Britannica/Merriam-Webster) escalate, forcing expensive licensing at scale or curtailing content coverage. Vendors without publisher strategy lose reliability, and their customers inherit risk.
Platform moves: OpenAI or Google ships a cheap, first-party search tool in model runtimes, compressing third-party demand. Or browser defaults remain tightly controlled, limiting distribution for independent engines.
Price shocks: Another Bing-style pricing change pushes up customer COGS, causing churn or multi-sourcing complexity.
Catalysts.
Marketplace and cloud partnerships (AWS/Azure/Bedrock agent ecosystems) that pre-wire search tools for agents. Brave’s AWS launch is a template.
Publisher alignment at scale (Perplexity-like funds replicated), reducing legal friction and unlocking premium data verticals.
Visible accuracy/latency wins on benchmarked agent workloads, showing that independent indexes or SERP APIs deliver better answers per token than bundled tools. Brave’s AI-grounding launch is a signpost.
What would change the call.
Bear case: If search becomes “free” in LLMs and publishers successfully wall off valuable content without broad licensing, the independent API market could shrink to a niche.
Bull case: If independent indexes become the de-facto grounding layer for agents and publisher economics settle, this sector compounds like payments or auth did a decade ago (quiet infra with huge downstream leverage).
Bottom line for investors
Where to lean in: Independent index APIs with credible distribution (marketplaces, agent frameworks), strong developer experience, and visible publisher strategy. Legal-first SERP APIs that are the enterprise default for Google-quality JSON. Enterprise/site search where vector+keyword convergence is demonstrably improving outcomes.
Portfolio construction: Expect multi-sourcing behavior. Your winners should play nicely as primary or fallback. Emphasize vendors with clear SLAs, observability hooks, and cost controls to survive price shocks and model bundling waves.
Exposure for infra startups: Plan as if 25–40% of infra startups will touch search APIs directly (agents, retrieval, developer tools) or indirectly (compliance, cost, analytics). Build connectors and procurement paths accordingly, and diligence provider legality as carefully as you diligence uptime.
If these companies can convert developer trust + content access into durable distribution before platforms fully bundle search, the next two years favor specialists. If not, expect consolidation and a smaller, compliance-heavy niche. The signals above, such as AWS listings, grounding features, publisher funds, MAU/queries growth, and pricing dispersion, are the leading indicators to watch.