The Livestreaming Trilemma: HLS, WebRTC, MoQ

Live video has always been a trilemma: pick two of scale, latency, and cost. Does the new IETF protocol called Media over QUIC finally let you have all three?

A new livestreaming protocol? Really?

There’s a new multimedia protocol for livestreaming under development. It’s called Media over QUIC — MoQ for short — and it’s quietly attracting the kind of attention that makes engineers at Meta, Cisco, Google, and Akamai show up to IETF working group meetings. Streaming startups are racing to ship the first production deployments. People are genuinely excited.

Your reasonable first reaction is “…another one?” We already have HLS, RTMP, WebRTC, SRT, and a dozen lesser variants. Every one of them is battle-tested, in production at major streaming platforms, carrying real video to real users right this second. So why are people suddenly happy about yet another livestreaming protocol that — at first glance — looks like it’s solving a problem already solved?

Because it isn’t solving a problem already solved. MoQ isn’t a slightly faster HLS or a slightly more scalable WebRTC. It’s a protocol that learns from both and tries to give you the best of each at once — and as we’ll see, that turns out to matter quite a lot. In this article I’ll walk you through why, and at the end I’ll show you how to try it yourself in about ten minutes.

A short history of “can I watch this online?”

The story starts in the late 2000s. People wanted to watch live events on the internet (football games, news broadcasts, their favourite streamer going live), and someone had to invent the plumbing. The first widely used answer was RTMP, the protocol behind Adobe Flash video. It worked, but it depended on a browser plugin whose days were numbered, and it needed dedicated streaming servers that didn’t scale gracefully past a certain point. So the industry went looking for something else. The idea that eventually won was almost embarrassingly simple: instead of inventing some exotic real-time protocol, just chop the live stream into tiny files and serve them over normal HTTP, the same way every web page on Earth is delivered.

That idea became HLS (from Apple), with DASH as the open-standard cousin. It worked beautifully. CDNs already knew how to deliver files to millions of users at once, and HLS got to ride that infrastructure for free. Suddenly anyone with a laptop could watch a live football game from anywhere in the world. Everyone was happy.
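To make “chop the stream into tiny files” concrete: a live HLS stream is a growing set of segment files plus a small text playlist (an .m3u8 file) that the packager rewrites each time a new segment lands, and that players re-download to discover it. Here is a minimal TypeScript sketch of building such a sliding-window playlist. The tag names come from the HLS specification (RFC 8216); the function, the file names, the 6-second segments and the three-segment window are illustrative assumptions.

```typescript
// One chunk of the live stream, already written to storage.
interface Segment {
  uri: string;      // where the CDN fetches this chunk, e.g. "seg42.ts"
  duration: number; // seconds of media in the chunk
}

// Build a sliding-window live playlist from every segment written so far.
// firstSequence is the media sequence number of segments[0].
function liveMediaPlaylist(segments: Segment[], firstSequence: number, windowSize: number): string {
  const visible = segments.slice(-Math.max(1, windowSize));
  if (visible.length === 0) throw new Error("no segments written yet");
  // Sequence number of the oldest segment still listed in the window.
  const mediaSequence = firstSequence + (segments.length - visible.length);
  // Ceiling of the longest segment satisfies the rule that every
  // EXTINF duration, rounded to an integer, is <= the target duration.
  const targetDuration = Math.ceil(Math.max(...visible.map((s) => s.duration)));
  const lines = [
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    `#EXT-X-TARGETDURATION:${targetDuration}`,
    `#EXT-X-MEDIA-SEQUENCE:${mediaSequence}`,
  ];
  for (const s of visible) lines.push(`#EXTINF:${s.duration.toFixed(3)},`, s.uri);
  // No #EXT-X-ENDLIST tag: the stream is still live, so players keep re-polling this file.
  return lines.join("\n") + "\n";
}

// Five 6-second segments written so far; the playlist lists only the newest three.
const written = [0, 1, 2, 3, 4].map((n) => ({ uri: `seg${n}.ts`, duration: 6 }));
console.log(liveMediaPlaylist(written, 0, 3));
```

Every one of those segment files is an ordinary static object, which is exactly what lets a CDN cache and serve it.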

For a while. Then people started noticing the cracks.

The first crack was latency. The stream was late — ten to thirty seconds behind the camera, depending on the player’s buffer. During the World Cup final, your neighbour upstairs would scream “GOAAAAL!” through the ceiling, and your TV would catch up twenty seconds later. The moment had been spoiled. The second crack was related: two viewers on different networks would drift several seconds apart from each other, so reacting in real time with a friend was a coin-flip.
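Where do those seconds come from? A rough model: a segment can only be listed once it is complete, and the HLS specification (RFC 8216) tells players not to start less than three target durations from the end of the playlist. The sketch below simply adds those terms up. It is a back-of-envelope estimate under stated assumptions; in particular, the 2-second overhead for encoding, upload, CDN and decoding is a placeholder, not a measurement.

```typescript
// Back-of-envelope model of classic HLS latency. Simplified: ignores
// playlist polling jitter, rebuffering, and player-specific tuning.
interface HlsLatencyInput {
  segmentSeconds: number;   // target duration of each segment
  holdBackSegments: number; // how many segments behind the playlist end the player starts
  overheadSeconds: number;  // assumed encode + upload + CDN + decode time
}

function estimateHlsLatency(i: HlsLatencyInput): { best: number; worst: number } {
  const holdBack = i.holdBackSegments * i.segmentSeconds;
  // Best case: a segment was just published, so the playlist ends at "now".
  const best = holdBack + i.overheadSeconds;
  // Worst case: the segment being recorded is almost done but not yet listed.
  const worst = best + i.segmentSeconds;
  return { best, worst };
}

// 6-second segments, three-segment hold-back, 2 s of assumed overhead.
console.log(estimateHlsLatency({ segmentSeconds: 6, holdBackSegments: 3, overheadSeconds: 2 }));
// → { best: 20, worst: 26 }
```

With 2-second segments the same model gives 8 to 10 seconds, which is why shrinking segments (and, in LL-HLS, publishing partial segments) was the first lever people pulled.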

So people optimised. Low-Latency HLS and CMAF chunked transfer squeezed the same machinery harder and got latency down to a theoretical two seconds — in practice expect around five. Synchronisation got tighter but never reliable. Everyone was happy again. For a while.

Then the use cases evolved. People started wanting things HLS had never been designed for: online betting, where a five-second lag means the line has already moved; live shopping, where the bidder ahead of you sees the item first; watch-parties, where everyone reacts in sync; concerts and live sports, where the audience needs to feel like a single room. All of these need real sub-second latency and real synchronisation — well under what LL-HLS could ever deliver.

Luckily, a standard that fit the bill already existed: WebRTC. It hadn’t been designed for one-to-many livestreaming — it was built for video calls — but it had the latency (often under 200 milliseconds) and the synchronisation everyone wanted. So people pressed it into service. And it worked.

Sort of. Because WebRTC’s low latency came at a steep price: it lost the thing that made HLS scale so well. To understand what that price actually is, we need to look at both a little more carefully.

Two protocols, two economies

The reason HLS and WebRTC sit on opposite ends of the latency–scale axis isn’t accidental. Each one’s strengths come from what it was designed for, and so do its limits.

HLS scales cheaply because it delivers video the same way CDNs deliver everything else on the web: as static files. This is the architecture under Twitch, YouTube Live, and most live news broadcasts. Once a chunk is written, the CDN caches it at the edge and serves it to every viewer who asks for it, just like a logo or a JavaScript bundle, so the marginal cost of one more viewer is essentially zero. But that same simplicity comes at a price: latency. Production HLS deployments typically run at 10–30 seconds end-to-end, with LL-HLS bringing that down to roughly 3–5 seconds.

HLS delivery — one origin server fanned out through CDN servers to viewers over HTTP
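The “marginal cost is essentially zero” claim comes down to cache hits. A deliberately simplified sketch of one CDN edge node follows; the class and URL are hypothetical, and real CDNs add expiry, request coalescing and tiered caches on top. However many viewers ask for the same segment, the origin is asked once.

```typescript
// Simplified CDN edge node: the first request for a URL goes to the origin,
// every later request for the same URL is answered from the edge's cache.
class EdgeCache {
  private readonly cache = new Map<string, Uint8Array>();
  originFetches = 0;

  constructor(private readonly fetchFromOrigin: (url: string) => Uint8Array) {}

  get(url: string): Uint8Array {
    const cached = this.cache.get(url);
    if (cached !== undefined) return cached; // cache hit: no origin traffic
    this.originFetches++;                    // cache miss: one trip to the origin
    const body = this.fetchFromOrigin(url);
    this.cache.set(url, body);
    return body;
  }
}

// 10,000 viewers watching the same live segment through one edge.
const edge = new EdgeCache((url) => new TextEncoder().encode(`bytes of ${url}`));
for (let viewer = 0; viewer < 10_000; viewer++) edge.get("/live/seg42.ts");
console.log(edge.originFetches); // → 1
```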

WebRTC goes the other way. It was designed for video calls and conferences where the only thing that matters is latency under ~200 ms, so a conversation feels like a conversation.

To get there, WebRTC takes the central server out of the media path. Instead of every client pulling frames from an origin, two peers open a direct, bidirectional connection and push media to each other in real time; a signalling server only helps them find each other. The cost of that design is complexity. Opening a direct connection between two arbitrary machines on the internet is genuinely hard, with NATs, firewalls, varying network paths, and codec negotiation all in the way.

And then there’s the livestreaming problem. A streamer with ten thousand viewers obviously can’t open ten thousand peer-to-peer connections. So the industry reached for SFUs (Selective Forwarding Units), servers originally built for video conferencing, where they let each participant upload their media once instead of separately to every other peer. The SFU pretends to be a peer: the streamer connects to it as if it were a single viewer, and the SFU forwards the media to every actual viewer, again posing as a peer on their end.
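The arithmetic behind “can’t open ten thousand peer-to-peer connections” fits in a few lines. The 3 Mbps per-copy bitrate is an assumed example figure.

```typescript
// How many copies of their stream a participant must upload.
// Mesh: one copy per other peer. SFU: a single copy to the server.
function uploadsPerParticipant(participants: number, topology: "mesh" | "sfu"): number {
  return topology === "mesh" ? participants - 1 : 1;
}

const viewers = 10_000;
const bitrateMbps = 3; // assumed per-copy video bitrate

// Treat the streamer plus every viewer as one big "call".
const meshCopies = uploadsPerParticipant(viewers + 1, "mesh");
const sfuCopies = uploadsPerParticipant(viewers + 1, "sfu");

console.log(`mesh uplink: ${meshCopies * bitrateMbps} Mbps`); // → mesh uplink: 30000 Mbps
console.log(`SFU uplink: ${sfuCopies * bitrateMbps} Mbps`);   // → SFU uplink: 3 Mbps

// The fan-out has not disappeared: the SFU still sends one copy per viewer.
const sfuEgressCopies = viewers;
```

Note what the SFU changes and what it doesn’t: the streamer’s uplink drops to a single copy, but someone still has to send ten thousand copies downstream, and that someone is now the SFU.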

The trouble is what an SFU has to do per viewer. Unlike a CDN edge, which serves a cached file blindly to whoever asks, an SFU holds an encrypted, stateful peer connection with every viewer, parses each RTP packet to decide what’s safe to forward, and reacts to each viewer’s bandwidth in real time. That’s CPU- and memory-heavy, and it scales by adding more SFUs — usually cascaded into custom, vendor-specific networks rather than the commodity CDN backbone HLS rides on for free. So adding viewers means provisioning more bespoke servers, not just paying a few more cents of CDN egress.
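To make “reacts to each viewer’s bandwidth in real time” concrete: a common SFU technique is simulcast, where the publisher sends a few encodings at different bitrates and the SFU forwards, per viewer, the highest one that fits that viewer’s current bandwidth estimate. A sketch of that per-viewer decision follows. The layer names, bitrates and 85% headroom are assumptions for illustration; production SFUs also smooth their estimates and switch layers only at keyframes.

```typescript
// Assumed simulcast ladder, ordered from lowest to highest bitrate.
const layers = [
  { rid: "q", kbps: 300 },   // quarter resolution
  { rid: "h", kbps: 1_200 }, // half resolution
  { rid: "f", kbps: 3_500 }, // full resolution
] as const;

// Pick the highest layer that fits within the viewer's estimate minus
// some headroom; fall back to the lowest layer otherwise.
function pickLayer(estimatedKbps: number, headroom = 0.85): string {
  const budget = estimatedKbps * headroom;
  let chosen: string = layers[0].rid;
  for (const layer of layers) {
    if (layer.kbps <= budget) chosen = layer.rid;
  }
  return chosen;
}

// Per-viewer state the SFU must keep, and re-evaluate as feedback arrives.
const estimates = new Map<string, number>([
  ["viewer-1", 5_000],
  ["viewer-2", 2_000],
  ["viewer-3", 900],
]);
const forwarding = new Map(Array.from(estimates, ([id, kbps]) => [id, pickLayer(kbps)] as const));
// forwarding: viewer-1 → "f", viewer-2 → "h", viewer-3 → "q"
```

A CDN edge does none of this: it has no per-viewer estimate to track and no per-viewer forwarding decision to make.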

As a result, WebRTC delivers streams at the lowest possible latency — but at the cost of scalability and price.

So pick one

Until now, that’s been the choice. HLS for cost and scale. WebRTC for synchronisation and interaction. Anything that needs both ends up as a compromise.

How MoQ dissolves the tradeoff

At its core, MoQ is a way to move live media using a publish/subscribe model: a publisher exposes a stream, a subscriber asks for what it needs, and a relay in the middle makes the exchange scale.

It isn’t a codec, a player, or a magic replacement for HLS and WebRTC at what they already do well. It’s the plumbing for everything those two can’t do together — a way to build live media systems for products that today have to choose between sub-second responsiveness and large audiences.

It’s being standardised at the IETF, the body behind HTTP and TLS, in a working group whose participants include engineers from Meta, Google, Cisco, and Akamai. It’s still a draft, but a real one, with running implementations you can use today.

How MoQ works

MoQ is built on QUIC — the modern transport protocol underneath HTTP/3. Publishers push a stream under a named path (something like my-room/alice); subscribers ask the network for the paths they want. They don’t even need to know the exact paths in advance — subscribing to a namespace prefix (e.g. my-room) tells the relay to announce any publishers that appear under it. The two never talk to each other directly.
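Namespace discovery can be sketched as a small index on the relay. This is a hypothetical in-memory model, not the API of any MoQ library, and matching on whole path segments (so my-room covers my-room/alice but not my-roomba/carol) is an assumption of the sketch.

```typescript
// True if `prefix` matches `path` on whole "/"-separated segments.
function underPrefix(prefix: string, path: string): boolean {
  const want = prefix.split("/").filter((s) => s.length > 0);
  const have = path.split("/").filter((s) => s.length > 0);
  return want.length <= have.length && want.every((seg, i) => seg === have[i]);
}

// Hypothetical relay-side index of live broadcasts and prefix watchers.
class AnnouncementIndex {
  private readonly live = new Set<string>();
  private readonly watchers: { prefix: string; onAnnounce: (path: string) => void }[] = [];

  // A publisher goes live: notify every watcher whose prefix covers it.
  publish(path: string): void {
    this.live.add(path);
    for (const w of this.watchers) {
      if (underPrefix(w.prefix, path)) w.onAnnounce(path);
    }
  }

  // A subscriber asks for a prefix: replay what is already live, then keep watching.
  watch(prefix: string, onAnnounce: (path: string) => void): void {
    this.watchers.push({ prefix, onAnnounce });
    for (const path of this.live) {
      if (underPrefix(prefix, path)) onAnnounce(path);
    }
  }
}

const index = new AnnouncementIndex();
const seen: string[] = [];
index.publish("my-room/alice");          // live before anyone watches
index.watch("my-room", (p) => seen.push(p));
index.publish("my-room/bob");            // announced as it appears
index.publish("my-roomba/carol");        // different room, not announced
console.log(seen); // → [ 'my-room/alice', 'my-room/bob' ]
```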

Instead, both sides connect to a relay. Unlike WebRTC’s SFU workaround, the relay is a first-class part of the protocol, not bolted on as an afterthought. A MoQ relay receives a stream once and fans it out to every subscriber, and multiple relays can chain together.

This gives you both halves of the old tradeoff at once. Like a CDN, one source serves many viewers — so scale is cheap. Unlike a CDN, where viewers pull files from the edge after the publisher has written them, the relay pushes data to subscribers the moment the publisher sends it — allowing latency to stay under a second.

MoQ delivery — relays fan a QUIC stream out to viewers, and relays can chain together
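The relay behaviour described above, as a toy in-memory model: each relay subscribes upstream once per track, however many local subscribers it has, and pushes each frame to them the moment it arrives. The Relay class is hypothetical and leaves out everything a production relay needs (QUIC sessions, priorities, buffering, authorization).

```typescript
type Frame = Uint8Array;
type Listener = (frame: Frame) => void;

// Toy relay: one upstream subscription per track, fanned out to local subscribers.
class Relay {
  private readonly subscribers = new Map<string, Listener[]>();

  // `upstream` is the relay (or origin) this one receives tracks from, if any.
  constructor(private readonly upstream?: Relay) {}

  subscribe(track: string, listener: Listener): void {
    const existing = this.subscribers.get(track);
    if (existing) {
      existing.push(listener); // already fed from upstream: just add a listener
      return;
    }
    this.subscribers.set(track, [listener]);
    // First local subscriber for this track: subscribe upstream exactly once.
    this.upstream?.subscribe(track, (frame) => this.deliver(track, frame));
  }

  // Called by a publisher, or by the upstream relay. Frames are pushed
  // immediately; nothing waits for a file or segment to be completed.
  deliver(track: string, frame: Frame): void {
    for (const listener of this.subscribers.get(track) ?? []) listener(frame);
  }

  subscriberCount(track: string): number {
    return this.subscribers.get(track)?.length ?? 0;
  }
}

// One origin, two chained edge relays, 1,000 viewers on each edge.
const origin = new Relay();
const edges = [new Relay(origin), new Relay(origin)];
let framesDelivered = 0;
for (const edge of edges) {
  for (let v = 0; v < 1_000; v++) edge.subscribe("my-room/alice", () => { framesDelivered++; });
}

// The publisher sends three frames to the origin, once each.
for (let i = 0; i < 3; i++) origin.deliver("my-room/alice", new Uint8Array([i]));

console.log(origin.subscriberCount("my-room/alice")); // → 2 (one per edge relay)
console.log(framesDelivered);                          // → 6000 (3 frames × 2,000 viewers)
```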

The relay’s other superpower is that it doesn’t need to understand what’s flowing through it. MoQ is intentionally layered: the transport just routes named tracks of bytes, and a thin media layer on top describes what those bytes mean. To the relay, an HD camera feed, a screen share, a chat channel, and a game-controller state stream all look the same. The same MoQ infrastructure can carry a concert, a multiplayer game session, and a live shopping stream without code changes, because all the application logic lives at the publisher and subscriber, not in the network.
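What “doesn’t need to understand” means in practice: in the sketch below, a chat message and a game-controller state are turned into bytes by the application, and a single relay function forwards both through exactly the same path. The track names, the JSON chat format and the 4-byte controller layout are hypothetical application choices; the relay never inspects them.

```typescript
// Relay's view: a track name plus opaque payloads. It never decodes them.
const forwarded = new Map<string, Uint8Array[]>();
function relayForward(track: string, payload: Uint8Array): void {
  const queue = forwarded.get(track) ?? [];
  queue.push(payload);
  forwarded.set(track, queue);
}

// Application layer: publisher and subscriber agree on what the bytes mean.
interface ChatMessage { from: string; text: string; }
const encodeChat = (m: ChatMessage): Uint8Array => new TextEncoder().encode(JSON.stringify(m));
const decodeChat = (b: Uint8Array): ChatMessage => JSON.parse(new TextDecoder().decode(b)) as ChatMessage;

// Hypothetical 4-byte layout: 16-bit button bitmask, then two signed 8-bit stick axes.
interface ControllerState { buttons: number; stickX: number; stickY: number; }
function encodeController(s: ControllerState): Uint8Array {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint16(0, s.buttons);
  view.setInt8(2, s.stickX);
  view.setInt8(3, s.stickY);
  return new Uint8Array(view.buffer);
}
function decodeController(b: Uint8Array): ControllerState {
  const view = new DataView(b.buffer, b.byteOffset, b.byteLength);
  return { buttons: view.getUint16(0), stickX: view.getInt8(2), stickY: view.getInt8(3) };
}

// Same relay code path for both kinds of data.
relayForward("my-room/alice/chat", encodeChat({ from: "alice", text: "GOAL!" }));
relayForward("my-room/alice/controller", encodeController({ buttons: 0b101, stickX: -40, stickY: 127 }));

// Subscriber side: decode with the agreed format.
const chat = decodeChat(forwarded.get("my-room/alice/chat")![0]);
const pad = decodeController(forwarded.get("my-room/alice/controller")![0]);
console.log(chat.text, pad.stickX); // → GOAL! -40
```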

What MoQ gives you

Taken together, here’s what that design opens up:

Sub-second latency at CDN scale. The headline benefit — and the one neither HLS nor WebRTC can deliver alone today.
Custom tracks alongside media. MoQ doesn’t only carry video and audio. Controller inputs from a cloud-gaming player, game state, ad-insertion cues, live captions, drone or robot telemetry — any named stream of bytes can ride the same connection as the media, with the same delivery guarantees. What today often spans three or four stacks (HLS for video, WebRTC for voice, REST for state, WebSockets for events) can fold into a single MoQ connection.
Per-viewer latency choice. Different viewers of the same stream can pick different tradeoffs — minimum latency vs. smoothest playback — without the publisher having to branch or re-encode.
No vendor lock-in. Any compliant MoQ client can connect to any compliant MoQ relay, unlike WebRTC SFUs, whose signalling and server-to-server cascading are vendor-specific.
Resilient delivery on bad networks. QUIC handles packet loss without head-of-line blocking — a dropped audio packet doesn’t pause your video frames. Recovery is faster than TCP, which matters most on mobile and flaky WiFi.
Namespace discovery. Subscribe to a path prefix (e.g. my-room) and the relay tells you which publishers are live underneath. Builds multi-participant rooms without knowing every user’s stream path in advance.
Browser-native delivery. WebTransport (the QUIC-based API in modern browsers) is the primary transport, with WebSocket as a fallback where WebTransport isn’t available. No plugins, no proprietary SDK.
Live and on-demand in one stack. The same primitives serve both, so you don’t need a separate pipeline for replays.
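The per-viewer latency choice can be sketched as a subscriber-side policy. The sketch assumes each group starts with a keyframe, as is typical when video is mapped onto MoQ groups, so a viewer who has fallen behind can drop whole groups and resume cleanly at a newer one. The Group shape and the two policy names are hypothetical.

```typescript
// A group: an independently decodable run of frames, assumed to start with a keyframe.
interface Group {
  sequence: number;
  frames: string[]; // stand-ins for encoded frames
}

// Two hypothetical viewer policies for the same published track.
type Policy =
  | { kind: "realtime"; maxBufferedGroups: number } // stay near live, skip if behind
  | { kind: "smooth" };                              // play everything, accept delay

// Decide which buffered groups to play next. Groups are ordered oldest to newest.
function groupsToPlay(buffered: Group[], policy: Policy): Group[] {
  if (policy.kind === "smooth") return buffered;
  const keep = Math.max(1, policy.maxBufferedGroups);
  // Too far behind: drop the oldest groups and jump ahead to the newest keyframe(s).
  return buffered.length > keep ? buffered.slice(-keep) : buffered;
}

// After a network hiccup, five groups have piled up for both viewers.
const backlog: Group[] = [1, 2, 3, 4, 5].map((sequence) => ({ sequence, frames: [`key-${sequence}`] }));

const live = groupsToPlay(backlog, { kind: "realtime", maxBufferedGroups: 1 });
const smooth = groupsToPlay(backlog, { kind: "smooth" });
console.log(live.map((g) => g.sequence));   // → [ 5 ]
console.log(smooth.map((g) => g.sequence)); // → [ 1, 2, 3, 4, 5 ]
```

The publisher encodes once; the difference between the two viewers lives entirely on the subscribing side.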

The honest comparison

Capability	HLS	WebRTC	MoQ
Minimum latency	> 2 s	< 1 s	< 1 s
Built-in scalability	✅	❌	✅
Works in the browser	✅	✅	✅
Native scale-out mechanism	✅	❌	✅
Transport	HTTP	UDP	QUIC
Designed for one-to-many	✅	❌	✅
Designed for interactive	❌	✅	✅

What this actually unlocks

Easing the scale-vs-latency tradeoff opens up room for product categories that today exist mostly in compromised form — built on HLS and apologising for the lag, or built on WebRTC and capping the audience.

The headline win is the obvious one: sub-second latency at scale — combining what HLS does well with what WebRTC does well. But that isn’t the most interesting part of MoQ.

The more interesting part is that MoQ is generic. The same protocol, the same relay, and largely the same client code can power applications that today need entirely different stacks. Within a single stream, different viewers can pick their own latency-quality tradeoff: fresher but choppier, or smoother but slightly delayed. Across applications, the same MoQ infrastructure can carry a chunked, buffered VOD experience for one product and a sub-second live shopping or online gaming session for another. That is three or four protocols’ worth of plumbing consolidated into one stack.

That genericness extends to the media itself. Most live protocols pin you to a short list of codecs and profiles, with rules about what frame types and encoding tricks are allowed. MoQ just routes named bytes, so you pick the codec, the GOP structure, and whatever encoding choices fit your use case.

None of this is meant to replace HLS or WebRTC at what they already do well. HLS is still the right answer when you need the widest reach at the lowest cost per viewer. WebRTC is still the right answer for tight one-on-one or small-room calls. MoQ shines in the gap between them — wherever you need both interactivity and scale, or where a product would otherwise glue several protocols together to do one job.

Try MoQ today, in about ten minutes

MoQ is still a draft standard, but it is not theoretical — running implementations exist, and you can use one right now.

Disclosure: I work on Fishjam — a hosted streaming infrastructure that runs production MoQ relays. The fastest way to feel what MoQ is actually like is to point a publisher and a subscriber at it.

If you’d rather see MoQ working before touching any code, sign in to Fishjam and open the built-in demo — publish from your browser and watch the result, no setup required.

When you’re ready to wire it into your own app, Fishjam ships a Sandbox API that hands out publisher and subscriber tokens, so no backend is needed. From a blank JavaScript project (the snippet runs in the browser, since it captures your camera):

npm install @moq/lite @moq/publish

import * as Moq from "@moq/lite";
import * as Publish from "@moq/publish";

// Replace with your Fishjam ID and a sandbox publisher token.
const relay = new URL("https://relay.fishjam.io/<your-id>?jwt=<token>");
// Open a MoQ session with the relay (top-level await requires an ES module).
const connection = await Moq.Connection.connect(relay);

// Capture the local camera (requires the user's camera permission).
const camera = new Publish.Source.Camera({ enabled: true });
// Publish the camera feed on the relay as a broadcast named <your-stream>.
new Publish.Broadcast({
  connection,
  name: Moq.Path.from("<your-stream>"),
  enabled: true,
  video: { source: camera.source, hd: { enabled: true } },
});

A camera feed is now live on the relay. The full walkthrough for the MoQ example lives at fishjam.swmansion.com/docs/tutorials/moq, and the general docs are at fishjam.swmansion.com/docs. You’ll need a Fishjam account for your Fishjam ID and the management token — sign up and try it out for free.

The neighbour upstairs no longer has to spoil the goal.

Stepping through the MoQ door