watwa.re

emweb

A while ago I migrated some group chats from Twitter to Signal, for a myriad of reasons. One of the things lost in that particular fire was the ability to view tweets embedded directly in the chat feed. This tickled an old frustration I have with the web:

The interwovenness of the web has a stark boundary: the HTML document itself. Only a tiny number of too-big-to-ignore properties, namely BigSocial and uToob, have the privilege of being auto-embedded directly into all sorts of sites.

What if...

Imagine if posting a link anywhere embedded whatever resource hides behind that link on the original site. Let's call that an emweb.
What could an embed look like in a real app? A prime use case for me is sending polls on various platforms. This is a simple webapp for polling:

https://powl.vercel.app/id/39k80qwag97grwvk4se5h2799jgy8s0

So far, so boring. But what if I could post the link to that poll in an app, and that app would embed the poll directly into the post? Here is a fork of the Mastodon client Elk with some small changes[1], et voilà:

https://elk-emweb.vercel.app/c.im/@gregor/111882067786038107

Note that Elk does not know how to embed my polling app specifically; it merely knows how to check whether a posted link supports emweb and, if it does, upgrade that link to an embed.
You could go forth and add emweb to your collaborative music player, multiplayer game, todo list, delivery tracking, or whatever app you're building, and any toot linking to it would be viewable as an embed in my Elk fork. No asking for my permission necessary.

(How did I embed Elk in this post? That's right: it's a double emweb burger.)

If that sounds irresponsible to you, imagine what browsers sounded like to the pre-internet folk[2]: You type in the address of another computer to download AND RUN WHATEVER APPLICATION THEY ARE SERVING YOU?! Back in my day we would have called that BIG YOLO energy[3].

The power of links has been noticed, and thus seized, by The Social Chat’n’Networks: they either banish posts that link to their enemies or disable links entirely[4], unless those links are directly monetizable. The mouthpieces of those companies will cite UX and privacy to justify these choices, which is plausible at times but always too opportunistic not to be read as corporate weaselism.

The realities that make me a libertarian are the ones in which new challengers can appear and amass enough velocity to supersede stale incumbents. The realities that make me an anti-capitalist[5] are the Thielian ones, where the incumbents have become monopolists with moats dug deep enough to render all innovation useless, absorbable, and ultimately extinguishable. These exact realities also happen to be the ones that make me a technologist: I believe new information systems allow us to play new meta-games instead of remaining stuck in the current one.

I believe independent communication platforms have an opportunity here: by joining forces, they can enable experiences that established companies could not justify to their shareholders.

emweb-draft-v0.1

In its current state, emweb is a napkin-sized protocol with two roles: guests (the sites being embedded) and hosts (the apps embedding them).

Guest

A guest can declare itself embeddable by serving a /.well-known/emweb.json manifest with an "Access-Control-Allow-Origin: *" header, following this schema:

name: string;
sources: Array<string | { from: string; to: string }>;

Sources listed as plain strings are directly embeddable, whereas objects map paths from one pattern (from) to another (to). Both forms can use URLPattern syntax.
Example: from: "/posts/:id", to: "/embed/:id" turns /posts/42 into /embed/42.
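For illustration, here is a manifest written as a typed TypeScript object and serialized to the JSON a guest would serve at /.well-known/emweb.json. The EmwebManifest type name, the app name, and all paths are hypothetical; the object only shows both kinds of source entries side by side.

```typescript
// Manifest shape from the draft above; the type name is made up.
interface EmwebManifest {
  name: string;
  sources: Array<string | { from: string; to: string }>;
}

// Hypothetical manifest for a polling app; the name and paths are invented.
const manifest: EmwebManifest = {
  name: "Example Polls",
  sources: [
    // Plain string: matching pages are framed as they are.
    "/polls/:id",
    // Mapping: a link to /posts/42 is framed as /embed/42.
    { from: "/posts/:id", to: "/embed/:id" },
  ],
};

// The JSON body served at /.well-known/emweb.json.
const manifestJson = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```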

Sources (the plain string, or the to-part of an object) must be embeddable in an iframe from anywhere, which requires either leaving the frame-ancestors CSP directive unset or setting it to a wildcard.
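A minimal sketch of the headers a guest could send, assuming its frame-friendly pages live under a hypothetical /embed/ prefix. guestHeaders is a made-up helper that could be plugged into any server: the manifest gets the CORS header, and the embed routes get a wildcard frame-ancestors directive. If you leave frame-ancestors unset instead, make sure those routes do not send X-Frame-Options: DENY or SAMEORIGIN either, since browsers fall back to that header when the directive is absent.

```typescript
// Returns the response headers a guest could send for a request path.
// Hypothetical helper; the /embed/ prefix is an assumed frame-friendly route.
function guestHeaders(pathname: string): Record<string, string> {
  if (pathname === "/.well-known/emweb.json") {
    return {
      "Content-Type": "application/json",
      // Lets any host fetch the manifest straight from the browser.
      "Access-Control-Allow-Origin": "*",
    };
  }
  if (pathname.startsWith("/embed/")) {
    return {
      "Content-Type": "text/html; charset=utf-8",
      // Allows any site to place this page in an iframe.
      "Content-Security-Policy": "frame-ancestors *",
    };
  }
  // Regular pages keep whatever framing policy the app already uses.
  return { "Content-Type": "text/html; charset=utf-8" };
}
```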

(optional) Communicating size

Guests can choose to report their size to the host by calling Window.postMessage on their window.parent. The message has this schema:

type: "emweb:resize"
width: number
height: number

@emweb/bus is a small library that helps with that. It attaches a ResizeObserver to the given element and continuously posts the resize message to the parent window.

import { postResizeChanges } from "@emweb/bus";

// contentElement: the element whose size should be reported to the host
const cleanup = postResizeChanges(contentElement);
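The following is not the actual @emweb/bus source, just a sketch of what a helper like postResizeChanges might do, under a few assumptions: it rounds sizes up to whole pixels, skips sizes that did not change, and posts with targetOrigin "*" because a guest cannot know who embeds it (and its size is not sensitive). The observer constructor is passed in so the function can run outside a browser; in a page you would pass the built-in ResizeObserver and window.parent. The name reportSize is made up.

```typescript
// Message shape from the draft above.
interface ResizeMessage {
  type: "emweb:resize";
  width: number;
  height: number;
}

// Structural stand-ins for window.parent and ResizeObserver, so the sketch
// does not depend on DOM typings and can run outside a browser.
interface ParentWindowLike {
  postMessage(message: unknown, targetOrigin: string): void;
}
type SizeEntry = { contentRect: { width: number; height: number } };
interface ObserverLike {
  observe(target: unknown): void;
  disconnect(): void;
}
type ObserverConstructor = new (callback: (entries: SizeEntry[]) => void) => ObserverLike;

// Hypothetical helper: observes `element` and posts its size to `parent`.
// Returns a function that stops observing.
function reportSize(element: unknown, parent: ParentWindowLike, Observer: ObserverConstructor): () => void {
  let lastWidth = -1;
  let lastHeight = -1;
  const observer = new Observer((entries) => {
    const entry = entries[entries.length - 1];
    if (!entry) return;
    const width = Math.ceil(entry.contentRect.width);
    const height = Math.ceil(entry.contentRect.height);
    // Only post when the rounded size actually changed.
    if (width === lastWidth && height === lastHeight) return;
    lastWidth = width;
    lastHeight = height;
    const message: ResizeMessage = { type: "emweb:resize", width, height };
    // "*" because the guest does not know which origin embeds it.
    parent.postMessage(message, "*");
  });
  observer.observe(element);
  return () => observer.disconnect();
}

// In a browser:
// const stop = reportSize(document.body, window.parent, ResizeObserver);
```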

Host

A host checks whether a link supports emweb by fetching the guest's manifest from the link's origin. The link is deemed embeddable if a manifest exists and the link matches one of its source patterns.
An embeddable link is then used as an iframe src, either verbatim or mapped according to the manifest.
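To make the host's check concrete, here is a sketch of resolving a posted link into an iframe src. It is not the @emweb/host implementation: it supports only a simplified subset of URLPattern syntax (literal path segments plus :name parameters, matched against the pathname), and it takes the fetch function as a parameter, an assumption made so it can be tested without a network. matchPath, fillPath, resolveWithManifest, and resolveFrameSrc are made-up names.

```typescript
// Manifest shape from the draft above.
interface EmwebManifest {
  name: string;
  sources: Array<string | { from: string; to: string }>;
}

// Matches a pathname such as "/posts/42" against a pattern such as
// "/posts/:id" and returns the captured parameters, or null on mismatch.
// Simplified stand-in for URLPattern: literal and ":name" segments only.
function matchPath(pattern: string, pathname: string): Record<string, string> | null {
  const patternParts = pattern.split("/");
  const pathParts = pathname.split("/");
  if (patternParts.length !== pathParts.length) return null;
  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i] ?? "";
    const actual = pathParts[i] ?? "";
    if (expected.startsWith(":")) {
      if (actual === "") return null; // a parameter must not be empty
      params[expected.slice(1)] = actual;
    } else if (expected !== actual) {
      return null;
    }
  }
  return params;
}

// Substitutes captured parameters into a pattern such as "/embed/:id".
function fillPath(pattern: string, params: Record<string, string>): string {
  return pattern
    .split("/")
    .map((part) => (part.startsWith(":") ? params[part.slice(1)] ?? part : part))
    .join("/");
}

// Returns the iframe src for `link`, or null if no source pattern matches.
function resolveWithManifest(link: string, manifest: EmwebManifest): string | null {
  const url = new URL(link);
  for (const source of manifest.sources) {
    const from = typeof source === "string" ? source : source.from;
    const params = matchPath(from, url.pathname);
    if (params === null) continue;
    if (typeof source === "string") return url.href; // embed verbatim
    return new URL(fillPath(source.to, params), url.origin).href; // mapped
  }
  return null;
}

// Fetches the manifest from the link's origin and resolves the link.
async function resolveFrameSrc(
  link: string,
  fetchJson: (url: string) => Promise<unknown>,
): Promise<string | null> {
  try {
    const manifestUrl = new URL("/.well-known/emweb.json", link).href;
    const manifest = (await fetchJson(manifestUrl)) as EmwebManifest | null;
    if (!manifest || !Array.isArray(manifest.sources)) return null;
    return resolveWithManifest(link, manifest);
  } catch {
    return null; // no manifest or unreachable: treat the link as a plain link
  }
}

// In a browser, fetchJson could wrap fetch:
// (u) => fetch(u).then((r) => { if (!r.ok) throw new Error(String(r.status)); return r.json(); })
```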

(optional) Adjusting size

Hosts can listen for resize messages from the guest via the Window message event and adjust the space they allot to the frame.
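Because any window can post messages, a host should check where a message came from before resizing anything. A minimal sketch, assuming the host knows the origin of the iframe src and holds a reference to the iframe: accept only well-formed emweb:resize messages from that origin and from that frame's window, then apply the size. isResizeMessage and handleFrameMessage are hypothetical names, and the structural types stand in for the DOM's MessageEvent and HTMLIFrameElement.

```typescript
// Message shape from the draft above.
interface ResizeMessage {
  type: "emweb:resize";
  width: number;
  height: number;
}

// Validates untrusted message data against the draft's schema.
function isResizeMessage(data: unknown): data is ResizeMessage {
  if (typeof data !== "object" || data === null) return false;
  const { type, width, height } = data as { type?: unknown; width?: unknown; height?: unknown };
  return (
    type === "emweb:resize" &&
    typeof width === "number" && Number.isFinite(width) && width >= 0 &&
    typeof height === "number" && Number.isFinite(height) && height >= 0
  );
}

// Structural stand-ins for MessageEvent and the iframe element.
interface IncomingFrameMessage {
  origin: string;
  source: unknown;
  data: unknown;
}
interface FrameLike {
  contentWindow: unknown;
  style: { width: string; height: string };
}

// Applies a resize only if the message is well-formed and was sent by
// `frame`, whose src lives on `expectedOrigin`. Returns whether it applied.
function handleFrameMessage(event: IncomingFrameMessage, frame: FrameLike, expectedOrigin: string): boolean {
  if (event.origin !== expectedOrigin) return false;
  if (event.source !== frame.contentWindow) return false;
  const data = event.data;
  if (!isResizeMessage(data)) return false;
  frame.style.width = `${data.width}px`;
  frame.style.height = `${data.height}px`;
  return true;
}

// In a browser (iframe is the element, src its resolved src):
// const origin = new URL(src).origin;
// window.addEventListener("message", (e) => handleFrameMessage(e, iframe, origin));
```

Whether a host should honor the guest's width at all, or only its height, is a layout decision; clamping both to the space the host actually has is probably wise.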

To help with this, one can use @emweb/host, like so:

import { fetchFrameSrc, onWindowMessage } from "@emweb/host";

// put the URL you want to embed here
const url = "https://shd.is/s/b8agf9";

// fetchFrameSrc has to fetch the guest's manifest, so await its result
const src = await fetchFrameSrc(url);
// src can now be used as the src attribute of an iframe

// You can use the onWindowMessage function to listen
// to messages from the embedded page's iframe
onWindowMessage(url, {
  onResize(width, height) {
    console.log("iframe size", width, height);
  },
});

To make life even simpler for my fellow React-icians[6], there is @emweb/react:

import { Embed } from "@emweb/react";

// this already handles resizing
<Embed url="https://shd.is/s/b8agf9" />;

This one also falls back to oEmbed (which I talk about in the addendum) if emweb is not supported.

Things that need figuring out

Trust and Tracking

Wait, am I trying to put a thousand tracking pixels in all our apps? Uhm, I wish my answer could be a clear "No!".

By necessity, this will invite bad actors, as the web and e-mail did before it. That ancestry might be instructive here: how did we overcome trackers?
The answer is manifold and contains the phrase "we did not really, did we?". The countermeasures I believe in are tracker-blocking browser extensions with crowd-sourced filter lists, as well as regulation with hefty fines.

Secure platforms like Matrix might want to default to opt-in, with configurable exclusions for trusted private conversations.

Authentication

The polling example above has a flaw that can be fatal in more adversarial scenarios: there is no authentication in place to ensure the embedded poll only reaches its target audience (e.g. the members of a private group chat).

The protocol could be extended to flag authenticated routes in the manifest, and then require the host and guest apps to establish a connection (e.g. via a token exchange) before the embed is used. That connection could then be used to vet users as they interact with the embed.

If this reads vague and confused, it is because I am confused. I am not sure how to solve this scenario neatly yet. Very much RFC[7].

Shifty Layouts

For embeds to look neat, they should be appropriately sized. What that means should be up to the guest (i.e. the frame content), and the host should accommodate.

In practice, the host lays itself out, reserves some initial space for the frame, and then shifts whenever the guest has loaded and communicates its size.

That can lead to all kinds of shifty layout behavior, which GOOG's DevRel army wants us to eliminate (for good reason).

Is there a more constrained API that can check these two opposing boxes?
Maybe the host should only honor the first resize message if it arrives within 1s * CONNECTION_QUALITY_FACTOR. Subsequent resize messages could still be allowed, but only after the user interacts with the frame, somewhat analogous to how trusted events work.
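The heuristic above can be sketched as a small gate the host consults before applying each resize. Everything here is an assumption for illustration: the 1s base window, how CONNECTION_QUALITY_FACTOR is obtained (passed in as a plain number), and what counts as user interaction (the host calls onUserInteraction itself, e.g. when focus moves into the frame). createResizeGate is a made-up name, and the clock is injectable so the logic can be tested.

```typescript
interface ResizeGate {
  // Returns true if a resize message arriving now should be applied.
  shouldApply(): boolean;
  // Called by the host when the user interacts with the frame.
  onUserInteraction(): void;
}

// Hypothetical gate for the heuristic described above:
// - the first resize is applied only if it arrives within
//   baseWindowMs * connectionQualityFactor of the gate's creation;
// - later resizes are applied only once the user has interacted.
function createResizeGate(
  connectionQualityFactor: number,
  now: () => number = Date.now,
  baseWindowMs = 1000,
): ResizeGate {
  const deadline = now() + baseWindowMs * connectionQualityFactor;
  let seenFirst = false;
  let userInteracted = false;
  return {
    shouldApply() {
      if (!seenFirst) {
        seenFirst = true;
        if (now() <= deadline) return true;
      }
      return userInteracted;
    },
    onUserInteraction() {
      userInteracted = true;
    },
  };
}
```

In this sketch an interaction unlocks all later resizes; a stricter variant could let that permission expire after a short time, similar to the browser's transient user activation.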

Use Cases

I have some ideas for use cases, but surely the world has more. I would like to hear them, and collaborate on implementations.

More client support would also be great; Matrix and Bluesky are particularly on my mind. You can find my fork of the Matrix client Cinny deployed here, where I'm also using #emweb-playground:matrix.org as a playground.
Native implementations utilizing web views would also be an interesting investigation.

Addendum: Why not oEmbed?

oEmbed served as an inspiration, and I used it as a reference point while prototyping emweb, for better and for worse.

The main reasons that motivated me to forgo oEmbed are:

  • CORS is optional, so browsers may not be able to resolve embeds by themselves without the support of a privileged HTTP client (e.g. a server)
  • discovering whether a site supports it is centralized by default
  • the frame size must be known beforehand

There is also the lesser but not unimportant reason that oEmbed requires more steps to be implemented.

Here is an overview of the differences:

| | oEmbed | emweb |
|---|---|---|
| Config location | centralized, with optional discovery | /.well-known/emweb.json |
| Config fetchable by browsers | optional | required |
| Config format | YAML or JSON | JSON |
| Request endpoint | must support Consumer Request and Provider Response (JSON or XML), on top of frame-friendly routes for iframes | can re-use existing routes or map to new frame-friendly routes |
| CORS | optional | required |
| Resource types | photo, video, link, rich | iframe src |
| Example: iframe | 1. get cacheable config; 2. run consumer request to get provider response; 3. render+fetch iframe | 1. get cacheable config; 2. render+fetch iframe |
| Example: photo | 1. get cacheable config; 2. run consumer request to get provider response; 3. render+fetch img | 1. get cacheable config; 2. render+fetch iframe; 3. iframe renders+fetches img |
| Upsides | various content types; known photo/video dimensions | rather simple; consumable without a server |
| Downsides | supporting all providers requires parsing JSON and XML; HTML sanitization necessary to render rich resources | iframe will relayout after render (layout shift); providers should embed client script to handle size changes |

In the example rows, each step is one HTTP request.

Footnotes

  1. Elk is written in Vue, for which I do not have a lib yet; with the React lib, the change could have been a single line (and the manifest file).

  2. To my IT-Sec friends, browsers still sound pretty irresponsible. Chrome having about as many lines of code as the Linux kernel and all that.

  3. Tbf back then it was much more website than webapp, but that spectrum blurred soon after. Nevertheless, even with "only parsing websites", do keep in mind that early browsers were not the high-value, (somewhat-)formally verified targets that they are today, which is to say: The waterline has risen.

  5. ...not to mention unpriced externalities, or as laypeople call them: fucking dire consequences.

  6. Reactees? Reactulars? Reactonians? Reactors? Copilot auto-completes this with "Reactors it is."

  7. Request For Comments