API Transport in 2025: REST, GraphQL, gRPC, and the Streaming Reality
The web used to be a polite letter exchange: you write a request, you get a reply. Then the machines started talking back mid-sentence, asking for tools, streaming thoughts, singing voice packets over UDP. Choosing an API style now is less about ideology and more about choreography: who leads, who follows, who improvises, and how to keep time when the band is half human, half transformer.
TL;DR
- REST stays the public, cacheable, CDN-friendly backbone. Pair it with OpenAPI and (for AI) structured outputs to turn models into predictable machines.
- GraphQL shines when product UIs need flexible slices of data; use APQ and (where needed) subscriptions over SSE/WebSocket.
- gRPC rules service-to-service calls: strong contracts, streaming, deadlines, and clear error semantics over HTTP/2; use Connect/gRPC-Web at the browser edge.
- tRPC is the pragmatic choice for a TypeScript-everywhere team shipping internal/front-end-back-end apps fast; export OpenAPI if outsiders must consume it.
- SSE is the simplest path for token streaming from LLMs; widely used by OpenAI/Anthropic stacks. WebSocket is for genuine two-way, low-latency interaction (collab editing, multiplayer, model "realtime" agents).
The new constraint: AI in the request path
Three behaviors dominate 2025 architectures:
- Streaming responses. Users expect words (and audio) to arrive mid-thought. Most major LLM APIs stream via Server-Sent Events (SSE); voice agents often use WebRTC/WebSocket for full-duplex media. Your transport must handle partial results, backpressure, and cancellation.
- Tool use / function calling. Models emit typed calls; you execute them and feed results back. JSON Schema–backed structured outputs and tool schemas make this robust.
- Strict SLAs between services. LLM pipelines fan out: embeddings here, retrieval there, re-ranking elsewhere. Deadlines/timeouts and typed error codes keep chains honest—this is where gRPC's semantics help.
Hold those three in your head; they're the metronome for the rest of the score.
Project sizing (be precise, not grandiose)
- Small: 1–3 devs, ≤ 9 months, ≤ ~10k LOC, one UI + a couple services, no public API.
- Medium: 3–12 devs, 9–36 months, 10k–75k LOC, multiple clients, a few internal services, maybe partner integrations.
- Large: 12–50+ devs, multi-year, 75k–250k+ LOC, polyglot, public APIs/SDKs, regulated or revenue-critical.
Most "large" efforts are actually medium; complexity inflation is real. Choose the simplest style that still scales to your horizon.
REST: the durable perimeter
When it sings: public APIs, cacheable reads, broad client compatibility, governance, and long lifetimes.
What it gives you:
- HTTP semantics & caching: verbs, status codes, content negotiation; CDN/edge wins on GET with RFC 9110/9111.
- Observability & policy: mature gateways, rate limits, OAuth flows, retries.
- Structured outputs: expose "function-calling" style endpoints with JSON Schema contracts so your model clients hit deterministic shapes. (OpenAI/Azure ship first-class structured outputs.)
- Streaming tokens: keep the request classic REST, but return text/event-stream and push deltas via SSE; this is the de facto pattern across providers.
- Small: CRUD + a /chat/stream SSE endpoint.
- Medium: partner APIs; public webhooks; cache-heavy resources.
- Large: the contract you want third parties to live with a decade from now.
- JSON Patch (RFC 6902) & JSON Merge Patch (RFC 7396): Cut payloads and conflicts for model/config updates; essential when AI agents tweak resources frequently.
- Hypermedia/Formats (JSON:API, HAL, OData): Standardize shapes, pagination, and linking so agents and tools don't relearn every API's quirks. Especially helpful for AI "function calling" against many services.
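Those patch formats reward a concrete look. Below is a minimal sketch of the RFC 7396 JSON Merge Patch algorithm — the kind of partial update an AI agent might send to a config resource. The Json type and the config shape are illustrative, not from any particular API:

```typescript
// RFC 7396 JSON Merge Patch: objects merge recursively, null removes a member,
// anything else (including arrays) replaces the target wholesale.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function mergePatch(target: Json, patch: Json): Json {
  // Non-object patches (arrays, scalars, null) replace the target outright.
  if (patch === null || typeof patch !== "object" || Array.isArray(patch)) {
    return patch;
  }
  // Start from the existing object, or an empty one if the target wasn't an object.
  const base: { [k: string]: Json } =
    target !== null && typeof target === "object" && !Array.isArray(target)
      ? { ...(target as { [k: string]: Json }) }
      : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete base[key]; // null means "remove this member"
    } else {
      base[key] = mergePatch(base[key] ?? null, value); // recurse into nested objects
    }
  }
  return base;
}

// Example: an agent bumps temperature and removes a deprecated flag.
const config: Json = { model: "gpt-x", temperature: 0.2, legacyMode: true };
const patched = mergePatch(config, { temperature: 0.7, legacyMode: null });
// patched: { model: "gpt-x", temperature: 0.7 }
```

Compared with JSON Patch's operation list, Merge Patch trades expressiveness (no array surgery, no test ops) for a payload that looks like the resource itself — which is exactly what makes it easy for a model to emit correctly.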
GraphQL: product-shaped data, one round-trip
When it sings: complex UIs needing slices across domains; fewer round trips; schema-governed evolution.
What it gives you:
- A typed schema the client queries flexibly; not tied to storage.
- Subscriptions & streaming fields via WebSocket or increasingly GraphQL over SSE.
- APQ (Automatic Persisted Queries) to enable GET + cache at the edge; models can be taught to pick from a safelisted catalog of operations (less jailbreak, more guardrails).
- Transport choice: for incremental results (@defer, subscriptions), SSE is simple and HTTP-native; keep WebSocket where you truly need duplex interaction.
- Small: only if you have multiple clients with divergent views; otherwise REST is faster to ship.
- Medium: UI aggregation layer; stitch microservices; subscriptions for live widgets.
- Large: supergraphs/federation with APQ, caching, and policy at the router.
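APQ is simpler than it sounds: the client sends a SHA-256 hash of the operation so repeat queries can travel as cacheable GETs against a safelisted catalog. A sketch of the Apollo-style extension payload, assuming Node's crypto module; the query text is illustrative:

```typescript
// Automatic Persisted Queries (APQ), Apollo-style: the client hashes the
// operation text and sends the hash; after the first round-trip the server
// knows the query, so later requests can omit the body entirely.
import { createHash } from "node:crypto";

function apqExtensions(query: string) {
  return {
    persistedQuery: {
      version: 1,
      sha256Hash: createHash("sha256").update(query).digest("hex"),
    },
  };
}

const query = "query Me { me { id name } }";
const ext = apqExtensions(query);
// First request: POST { query, extensions: ext }.
// Subsequent requests: GET with only the hash in the extensions parameter,
// which is what lets a CDN cache the response at the edge.
```

The same hash doubles as a safelist key: a router that only executes known hashes gives model-driven clients a fixed menu of operations instead of free-form query access.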
gRPC: the spine of your service mesh
When it sings: inter-service calls with strict contracts, streaming, and tight SLAs.
What it gives you:
- Protocol Buffers schemas and generated clients; HTTP/2 transport with multiplexing; unary + client/server/bidi streaming.
- Deadlines, cancellation, status codes baked in—crucial when an LLM pipeline chains several hops.
- Native gRPC over HTTP/2 isn't directly browser-friendly; use gRPC-Web or a Connect server to bridge (and you can still stream).
- Perfect for tool backends (retrievers, vector searchers, rerankers) where you need deadlines and typed errors to keep the orchestra together under latency budgets.
- Small: usually overkill unless you're already in Proto land.
- Medium: internal services and data planes.
- Large: the default east-west protocol; expose REST/GraphQL at the edge.
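Deadline propagation is the part teams skip and then regret. Here's a toy sketch of the budget math — each downstream hop inherits the remaining time minus a safety margin, so one slow retriever can't silently eat the whole request. The function names and margin value are hypothetical, not gRPC API:

```typescript
// In gRPC, deadlines are absolute timestamps propagated across hops; each
// service computes what's left before calling the next one.
function remainingBudgetMs(
  deadlineEpochMs: number,
  nowEpochMs: number,
  marginMs = 50, // safety margin for network + serialization overhead
): number {
  const remaining = deadlineEpochMs - nowEpochMs - marginMs;
  if (remaining <= 0) {
    // gRPC surfaces this as DEADLINE_EXCEEDED before the call is even made,
    // which is exactly what you want: fail fast instead of queuing doomed work.
    throw new Error("DEADLINE_EXCEEDED");
  }
  return remaining;
}

// A 2s overall budget: after 1.2s spent on retrieval upstream, the rerank
// hop gets 750ms (2000 - 1200 - 50), not a fresh 2s of its own.
const start = 10_000;                       // hypothetical epoch ms
const deadline = start + 2_000;
const budgetForRerank = remainingBudgetMs(deadline, start + 1_200);
```

The key property is that the deadline is absolute and shared, not per-hop: re-issuing a fresh timeout at every service is how tail latency compounds into user-visible stalls.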
tRPC: TypeScript end-to-end velocity
When it sings: one team, TypeScript on both sides, you want types without extra machinery.
What it gives you:
- Inferred types from server to client; minimal boilerplate; HTTP under the hood.
- Not polyglot; tight coupling to TS; not ideal as a public API unless you export OpenAPI (community plugins exist).
- End-to-end types are catnip for copilots: change the server shape, the compiler (and the model) guides the refactor. For external model consumers, publish OpenAPI so non-TS stacks can interop.
- Small: fastest path for TS teams (e.g., Next.js apps).
- Medium: internal dashboards/devtools; graduate to REST/GraphQL when partners arrive.
- Large: rarely; mirror contracts into OpenAPI/GraphQL if you keep it.
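To see why end-to-end inference is such a velocity win, here's a toy illustration of the idea in plain TypeScript — this is not tRPC's actual API, just the type-level trick it builds on:

```typescript
// The server's procedure map is a plain object of functions...
const router = {
  greet: (input: { name: string }) => `Hello, ${input.name}!`,
  add: (input: { a: number; b: number }) => input.a + input.b,
};

// ...and the "client" derives its call signatures from that object's type,
// so no schema file or codegen step sits between server and caller.
function makeClient<R extends Record<string, (input: any) => any>>(r: R) {
  return {
    call<K extends keyof R>(proc: K, input: Parameters<R[K]>[0]): ReturnType<R[K]> {
      return r[proc](input);
    },
  };
}

const client = makeClient(router);
const sum = client.call("add", { a: 2, b: 3 }); // inferred as number
// Rename `add` or change its input shape on the server, and this line stops
// compiling — the refactor is driven by the type checker, not by grep.
```

tRPC wraps this same inference in HTTP transport, input validation, and batching; the compile-time feedback loop above is the part that makes it feel fast.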
SSE: the underrated workhorse of AI UX
What it is:
- A single HTTP connection streaming text/event-stream to the browser's EventSource. One-way server → client; auto-reconnect; proxy-friendly.
- Token streaming is natural on SSE, and many providers document it explicitly (including on managed clouds).
- Small/Medium/Large: whenever you need incremental outputs without two-way chatter (chat, logs, progress events). Reserve WS for true duplex.
- Cheap, debuggable server → client streams, and a natural fit for LLM token-by-token output. Reach for WebSocket only when you genuinely need full duplex.
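The wire format is simple enough to sketch whole: an event is one or more field: value lines terminated by a blank line. A minimal encoder/decoder pair for token deltas — event names and payload shape here are illustrative, not any vendor's schema:

```typescript
// text/event-stream framing: `event:` names the event type, `data:` carries
// the payload (repeated for multi-line data), and a blank line ends the event.
function encodeSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  // Multi-line payloads become repeated `data:` lines, rejoined on the client.
  for (const part of data.split("\n")) lines.push(`data: ${part}`);
  return lines.join("\n") + "\n\n"; // blank line = event boundary
}

// Simplified decoder for frames produced by the encoder above (the real spec
// also allows `data:` with no space, `id:`, `retry:`, and comment lines).
function decodeSseData(frame: string): string {
  return frame
    .split("\n")
    .filter((l) => l.startsWith("data: "))
    .map((l) => l.slice("data: ".length))
    .join("\n");
}

// Streaming a token delta, roughly as an LLM provider might:
const frame = encodeSseEvent(JSON.stringify({ delta: "Hel" }), "token");
// frame === 'event: token\ndata: {"delta":"Hel"}\n\n'
```

In the browser, EventSource does the decoding for you and reconnects automatically; that built-in reconnect (plus the Last-Event-ID header) is a large part of why SSE beats hand-rolled WebSocket streaming for one-way token flows.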
WebSocket: when you truly need two-way
What it is:
- A full-duplex pipe defined by RFC 6455, with browser APIs on MDN; great for interactive apps and custom protocols.
- For voice/vision agents or collaborative tools, you'll end up with WebSocket or WebRTC (OpenAI/Azure support both paths in realtime).
- Small: chats, cursors, whiteboards that need client→server events.
- Medium/Large: stateful collab or realtime control channels; otherwise prefer SSE.
Modern additions: the streaming and eventing layer
WebTransport (HTTP/3/QUIC)
What it is: Bidirectional streams and unreliable datagrams to a server, no head-of-line blocking.
AI-era use: Ideal for low-latency token and telemetry streaming when WebSockets hit proxy or multiplexing snags. Useful for LLM token streams, live model control, and multi-stream UI updates.
Use for: Medium/Large when you need true low-latency multi-stream transport and can control both client and server.
WebRTC Data Channels
What it is: Peer-to-peer, reliable or partial-reliability data; perfect for voice/vision agents, multiplayer, or local-first AI tools.
AI-era use: WebRTC is how new realtime AI APIs deliver voice-in/voice-out. Also great for distributed AI inference (peer-to-peer model routing).
Use for: Small/Medium/Large voice agents, collaborative editing, multiplayer, or any scenario where peer-to-peer reduces server costs.
Web Push + Notifications API
What it is: For re-engagement and out-of-band nudges (e.g., "your long-running batch LLM job finished"). Works even when the app isn't open.
Use for: Medium/Large background job notifications, model training completion alerts.
Messaging for services, agents, and data planes
MQTT (incl. MQTT over WebSockets)
What it is: Ultra-light pub/sub; excellent for IoT devices that feed edge AI or subscribe to model outputs. Browser-friendly via WebSockets.
Use for: Small/Medium IoT + AI edge deployments; Large when you need millions of lightweight connected devices.
NATS (+ JetStream)
What it is: Simple, very fast messaging with optional durable streams and exactly-once patterns; a nice backbone for microservices plus AI agents that need request/reply and pub/sub.
Use for: Medium internal event bus; Large as service mesh backbone when you need simplicity over Kafka's complexity.
Apache Kafka (Event Streaming)
What it is: Durable, ordered event logs for high-throughput pipelines; great for model telemetry, feature stores, and emitting/consuming AI events at scale.
Use for: Large event-driven architectures, ML feature stores, audit logs, CDC pipelines feeding AI models.
AMQP 1.0
What it is: Open, interoperable enterprise messaging; robust routing, transactions, and reliability. Useful where AI events must traverse heterogeneous systems (banks, gov, airlines).
Use for: Large regulated industries needing guaranteed delivery and transactional semantics.
CloudEvents (CNCF)
What it is: A common envelope for events across transports (HTTP, Kafka, NATS). Helps standardize AI event payloads across your estate.
Use for: Medium/Large when you're running multi-transport event architectures and need interop.
AsyncAPI
What it is: The OpenAPI of evented systems; document and govern your streaming/messaging APIs (Kafka, MQTT, NATS, AMQP).
Use for: Medium/Large governance, client SDK generation, API catalogs for event-driven systems.
RPC alternatives & bridges
Connect RPC (by Buf)
What it is: A pragmatic, HTTP/1.1-friendly sibling of gRPC with first-class JSON + Protobuf, simple CORS, and great DX—handy for web + mobile talking to the same AI backends.
Use for: Small/Medium when you want gRPC semantics without the browser/proxy pain; Large as a simpler alternative to gRPC-Web.
gRPC-JSON Transcoding (Envoy)
What it is: Expose REST/JSON over your gRPC backends so AI toolchains and low-code clients can call them easily.
Use for: Medium/Large when you want to maintain gRPC internally but expose REST externally.
Cap'n Proto / Thrift
What it is: Niche but potent: zero-copy, capability-based RPC (Cap'n Proto) or mature IDL-first RPC (Thrift).
Use for: Large super-low latency between AI microservices where nanoseconds matter.
AI-specific protocols worth knowing
Model Context Protocol (MCP)
What it is: An open, JSON-RPC–based protocol to connect models with tools, data, and IDEs. If you're thinking about "AI calling your APIs," MCP is the vendor-neutral story.
Use for: Small/Medium/Large when building AI agent systems that need standardized tool/data connectors.
Structured Outputs / JSON Schema-constrained responses
What it is: Treat LLMs like well-typed APIs; crucial for tool use and safe automation.
Use for: All sizes when exposing APIs to AI agents—enforce schemas to prevent hallucinated fields.
Realtime AI over WebRTC/WebSocket
What it is: Many LLM vendors now offer WebRTC/WebSocket realtime APIs for voice/vision agents.
Use for: Medium/Large voice assistants, live transcription, vision analysis with streaming results.
Bridge patterns worth knowing (2025)
- Connect RPC: single codebase that speaks gRPC, gRPC-Web, and its own HTTP-friendly protocol (JSON or Protobuf; streaming too). It's a practical way to expose the same service to browsers and backends without Envoy gymnastics.
- GraphQL over SSE: many servers now implement SSE for subscriptions (simpler ops, HTTP-native). Keep WS where you need duplex.
- APQ for GraphQL: enable GET + hashes for edge caching and safelisting.
Decision guide by project size
Small (1–3 devs, ≤9 months)
Default:
- REST + SSE (OpenAPI + structured outputs; one streaming endpoint).
- TS full-stack? tRPC + SSE for speed; export OpenAPI if someone outside your repo must call you.
- Skip GraphQL unless you truly have multiple divergent clients.
- Skip gRPC unless you already own Proto.
Medium (3–12 devs, 9–36 months)
Front-ends:
- GraphQL for UI flexibility; turn on APQ; use SSE for subscriptions/streaming.
- gRPC between services; set deadlines; map errors to status codes.
- NATS or Kafka for async event bus.
- SSE for token streams.
- WebSocket only if clients must send live events.
- AsyncAPI if you have event-driven services.
- CloudEvents for event envelope standardization.
Large (12–50+, multi-year)
Perimeter:
- REST (versioned, cacheable) with structured outputs for AI clients; publish SDKs.
- GraphQL supergraph with APQ; subscriptions over SSE where possible, WS where necessary.
- gRPC east-west (streaming where it helps); bridge to browsers with Connect/gRPC-Web.
- Kafka as event backbone for ML pipelines, feature stores, audit logs.
- WebRTC/WS control channels; SSE for text stream fallbacks.
- API Extractor for TypeScript public surface.
- AsyncAPI for event catalogs.
- CloudEvents for cross-transport event interop.
Common traps (and how AI changes the math)
- Using WebSocket "just because it's realtime." If the client rarely talks back, pick SSE; it's simpler, more CDN/proxy friendly, and aligns with LLM token flows.
- GraphQL without a cache story. Turn on APQ and consider response caching where semantics allow; otherwise you'll push everything to client caches and wonder why the edge sits idle.
- Public tRPC. Great inside a TS monorepo; for partners, generate OpenAPI (or provide REST/GraphQL) so non-TS clients and AI agents can integrate sanely.
- Ignoring deadlines in LLM pipelines. gRPC (or Connect) makes timeouts/cancellation first-class; propagate them or watch tail latency explode.
- Mixing too many transports without contracts. Pick 2-3 primary styles and bridge carefully. Use OpenAPI, AsyncAPI, and Protobuf to maintain sanity.
Quick chooser (AI-centric)
- "I need a public API partners can cache at the edge, and my AI clients need JSON schemas." → REST + OpenAPI + Structured Outputs; SSE for streaming.
- "My product UI pulls bespoke shapes across several backends." → GraphQL (+ APQ, SSE subscriptions).
- "This is an internal, latency-sensitive data plane with fan-out and retries." → gRPC (deadlines, status codes, streaming).
- "We're a TS shop shipping a web app fast." → tRPC (export OpenAPI if anyone else will call it).
- "It's a chat/console/answer stream." → SSE.
- "It's collaborative and bidirectional or voice." → WebSocket/WebRTC.
- "We need event-driven microservices at scale." → Kafka (large), NATS (medium), with AsyncAPI docs and CloudEvents envelopes.
- "AI agents need to call our internal tools." → REST with JSON Schema + MCP, or gRPC with structured tool definitions.
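To ground that last bullet: a tool exposed to an agent is just a JSON Schema plus a gate before execution. A minimal sketch — the tool and field names are made up, and real validation should use a full JSON Schema library rather than this required-field check:

```typescript
// A tool definition in the JSON-Schema style most function-calling APIs use:
// the schema is the contract the model must satisfy to invoke the tool.
const searchTool = {
  name: "search_docs",
  description: "Search internal documentation",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string" },
      limit: { type: "number" },
    },
    required: ["query"],
  },
} as const;

// Returns the required fields the model omitted; a non-empty result means
// reject the call and re-prompt instead of executing with garbage input.
function checkRequired(
  schema: { required?: readonly string[] },
  args: Record<string, unknown>,
): string[] {
  return (schema.required ?? []).filter((field) => !(field in args));
}

const missing = checkRequired(searchTool.parameters, { limit: 5 });
// missing === ["query"] → bounce the call back to the model
```

With provider-side structured outputs enforcing the same schema at generation time, this server-side gate becomes a backstop rather than the first line of defense — but you want both.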
Conclusion: The orchestra keeps time
In 2025, choosing API transport isn't a single decision—it's orchestrating a stack where different instruments play different parts.
REST anchors your public surface. GraphQL aggregates for product UIs. gRPC connects services under SLA. SSE streams tokens from models. WebSocket handles true duplex. Kafka/NATS broker events. CloudEvents standardize envelopes. AsyncAPI documents the flows.
The teams that ship fastest aren't the ones picking "the best" transport. They're the ones who:
- Size their project honestly (Small/Medium/Large)
- Pick 2-3 primary styles and bridge them cleanly
- Treat streaming and tool-calling as first-class concerns
- Use schemas (OpenAPI, Protobuf, JSON Schema, AsyncAPI) to keep humans and AI honest
AI didn't make REST obsolete. It made hybrid architectures mandatory. The machines are in the loop now, and they need contracts they can trust.
Ship the choreography. Keep time. Let the band play.
---
About the Author: Odd-Arild Meling has built API layers for telecom switches, payment gateways, multiplayer games, and LLM-powered agents. He's debugged gRPC deadlines at 3 AM, celebrated SSE reconnection logic that saved Black Friday, and watched WebSocket connections multiply from dozens to millions. Currently architecting edge-first systems at Gothar where REST, GraphQL, gRPC, and SSE all ship to production—because reality doesn't pick favorites.