
Inline control channel

DLE/STX framing multiplexes raw terminal bytes and structured control frames onto one WebSocket.

The most important low-level component in uterm is the control-channel framing in control_channel.py. Unlike systems that use a second WebSocket (or HTTP side-channel) for metadata, uterm multiplexes everything into one stream.

Figure: Browser, TermHub, and Worker exchange raw bytes and inline DLE/STX control frames on the same WebSocket.

Wire format

  • DLE (0x10) is the escape character.
  • Data: raw terminal bytes have their DLE bytes doubled (0x10 0x10) on the wire.
  • Control frames: DLE STX (0x10 0x02), an 8-hex-digit length field, a colon, then the JSON payload, i.e. DLE STX [8-hex length] : [JSON].

That’s it. The same parser runs in Python (server) and TypeScript (browser).
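To make the wire format concrete, here is a minimal encoder sketch in Python. It is not the code from control_channel.py. The function names encode_data and encode_control are hypothetical, and it assumes the length field counts the UTF-8 bytes of the JSON body and is written in lowercase hex with no spaces around the colon; the text above does not pin those details down.

```python
import json

DLE = 0x10  # escape byte, as given in the wire format
STX = 0x02  # ASCII STX, marks the start of a control frame


def encode_data(raw: bytes) -> bytes:
    """Escape raw terminal bytes by doubling every DLE (hypothetical helper)."""
    return raw.replace(bytes([DLE]), bytes([DLE, DLE]))


def encode_control(payload: dict) -> bytes:
    """Build DLE STX [8-hex length] : [JSON] for one message (hypothetical helper)."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Assumption: the length is the byte count of the JSON body,
    # zero-padded to 8 lowercase hex digits.
    header = bytes([DLE, STX]) + f"{len(body):08x}".encode("ascii") + b":"
    return header + body
```

Because a DLE in terminal data is always doubled, a DLE followed by STX can only mean the start of a control frame, which is what lets both kinds of traffic share one stream.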

Why a single stream

Multi-channel proxies eventually race: a resize control frame and the bytes written after the resize arrive out of order, the snapshot drifts from the visible buffer, and presence updates flicker. Inlining the control frames makes ordering trivial: the parser sees each frame at the exact position in the stream where the producer emitted it.
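The ordering guarantee can be illustrated with a small incremental decoder. This is a sketch under assumptions, not the parser shipped with uterm: the FrameParser class and its feed method are hypothetical names, the length field is assumed to count the bytes of the JSON body, and there are assumed to be no spaces around the colon. It returns data and control events in exactly the order they appear on the wire, even when a frame is split across WebSocket messages.

```python
import json
from typing import List, Tuple, Union

DLE, STX = 0x10, 0x02
HEADER_LEN = 2 + 8 + 1  # DLE STX, 8 hex digits, ':'

Event = Tuple[str, Union[bytes, dict]]


class FrameParser:
    """Hypothetical incremental decoder: feed chunks in, get ordered events out."""

    def __init__(self) -> None:
        self._buf = bytearray()  # bytes not yet fully decoded

    def feed(self, chunk: bytes) -> List[Event]:
        self._buf += chunk
        buf, events, data, i = self._buf, [], bytearray(), 0
        while i < len(buf):
            if buf[i] != DLE:
                data.append(buf[i])  # ordinary terminal byte
                i += 1
                continue
            if i + 1 >= len(buf):
                break  # lone DLE at the end: wait for the next chunk
            if buf[i + 1] == DLE:
                data.append(DLE)  # doubled DLE is one literal 0x10
                i += 2
                continue
            if buf[i + 1] != STX:
                raise ValueError("unexpected byte after DLE")
            if i + HEADER_LEN > len(buf):
                break  # header incomplete
            length = int(bytes(buf[i + 2:i + 10]), 16)
            if buf[i + 10] != ord(":"):
                raise ValueError("missing ':' after length")
            end = i + HEADER_LEN + length
            if end > len(buf):
                break  # payload incomplete
            if data:
                events.append(("data", bytes(data)))  # flush data seen before the frame
                data = bytearray()
            events.append(("control", json.loads(bytes(buf[i + HEADER_LEN:end]))))
            i = end
        if data:
            events.append(("data", bytes(data)))
        del self._buf[:i]  # keep only the undecoded tail
        return events
```

Feeding it terminal output, then a resize frame, then more output yields a data event, a control event, and a data event in that order, so a consumer can apply the resize before rendering the bytes that follow it.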

What rides on it

  • Resize and heartbeat
  • Hijack state, lease ownership, role announcements
  • Presence (join/leave, adjective-animal identity, HSL color)
  • Annotations placed by humans or AI agents
  • Chat lines in the DeckMux channel
  • Screen snapshots (when a participant joins mid-session)

Every one of those is just a JSON payload in a DLE STX frame; raw terminal data resumes at the byte after the payload.
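Since every message kind shares one envelope, a receiver can route decoded payloads with a small lookup table. The sketch below is illustrative only: the "type" key, the "cols" and "rows" fields, and the handles and dispatch helpers are all hypothetical and are not taken from uterm's actual schema.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]
HANDLERS: Dict[str, Handler] = {}  # message kind -> handler


def handles(kind: str) -> Callable[[Handler], Handler]:
    """Register a handler for one message kind (hypothetical decorator)."""
    def register(fn: Handler) -> Handler:
        HANDLERS[kind] = fn
        return fn
    return register


@handles("resize")
def on_resize(msg: dict) -> str:
    # Assumed fields; a real handler would resize the terminal here.
    return f"resize to {msg['cols']}x{msg['rows']}"


@handles("heartbeat")
def on_heartbeat(msg: dict) -> str:
    return "heartbeat"


def dispatch(msg: dict) -> str:
    """Route one decoded control payload by its assumed 'type' key."""
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        return f"ignored {msg.get('type')!r}"  # unknown kinds are skipped
    return handler(msg)
```

Skipping unknown kinds rather than raising is a design choice in this sketch: it means a new message type could be added without changing the framing or breaking a receiver that does not yet handle it.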