How to Work with Protocols and Get Started with ActivityPub

This article outlines some of the preparation needed to code against a protocol and ends by looking at ActivityPub as an example. We also dip into my previous set of posts on a hypothetical decentralized social media model.
Protocols appear all through computing. If you’ve studied computing academically, you will remember the OSI seven-layer model, which puts TCP/IP and HTTP into context for the internet as a whole. However, I’m more interested in the practical task of getting to work with a new protocol quickly.
As the terms protocol and API sometimes swim around each other, I’ll define what we normally mean by them for good measure. It is perfectly reasonable to suggest that a protocol can provide an API, but in that case, the protocol came first.
Imagine you were connected with someone in the next room only by a small pipe and could communicate with them only by rolling colored balls down it. Your first observation might be “but what if both of us try rolling balls down the pipe at the same time?” and that is why a protocol is first seen as an engineering problem.
The balls in this case, of course, represent a system message. A protocol may communicate in a variety of formats, with JSON being a prime example. Hence to read and write messages, the appropriate library is needed to convert from the message format to a system object.
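As a minimal sketch of that conversion step, here is a round trip between JSON text and a system object in Python. The BallMessage class and its color and count fields are invented purely for illustration; a real protocol would dictate its own fields.

```python
import json
from dataclasses import dataclass


@dataclass
class BallMessage:
    """Hypothetical system object for one message sent down the pipe."""
    color: str
    count: int


def decode(raw: str) -> BallMessage:
    """Convert the wire format (JSON text) into a system object."""
    data = json.loads(raw)
    return BallMessage(color=data["color"], count=int(data["count"]))


def encode(message: BallMessage) -> str:
    """Convert a system object back into the wire format."""
    return json.dumps({"color": message.color, "count": message.count})
```

The point is only that reading and writing sit behind two small functions, so the rest of the system never touches raw JSON.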
A protocol usually defines the communication within a bigger system, specifically the messages passed between two sub-systems. In most cases, the implementation details of those systems are a separate issue — often left for third-party libraries to complete. This is essentially designing the pipes first.
By contrast, an API turns a software system or library into more of a vending machine, which receives instructions from brightly colored buttons and then emits any output carefully, neatly — when and where you expect it.
Let’s get back to the balls. Naturally, you might not be at the pipe when your pipe mate wishes to communicate, so you will need an “attention” ball. So maybe rolling down a single white ball means “I’m about to send some information.” After a message has been rolled down the pipe, some form of acknowledgment from the other pipe mate is also useful.
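To make the attention-and-acknowledgment idea concrete, here is a toy receiver written as a small state machine. The white ball as the attention signal comes from the description above; the black ball that ends a message and the "ack" reply are assumptions of mine, added so the sketch has a defined finish.

```python
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()
    RECEIVING = auto()


class Receiver:
    """Toy receiver at one end of the pipe."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.buffer: List[str] = []
        self.messages: List[List[str]] = []

    def on_ball(self, color: str) -> Optional[str]:
        """Handle one ball; return "ack" once a whole message has arrived."""
        if self.state is State.IDLE:
            if color == "white":  # attention ball: a message is about to start
                self.state = State.RECEIVING
            return None  # any other ball outside a message is ignored
        if color == "black":  # hypothetical end-of-message ball
            self.messages.append(self.buffer)
            self.buffer = []
            self.state = State.IDLE
            return "ack"
        self.buffer.append(color)
        return None
```

Even this tiny version shows why the rules must be agreed in advance: a ball that arrives before the attention ball is simply lost.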
HTTP may be the best-known protocol at the moment. It doesn’t really know what the systems connected to the pipes are doing, so it has to strain to be comprehensive. A clue to this can be seen in the numerous error codes that are occasionally incorrectly implemented. The error code 404 is so well known it is literally a meme, but what does it mean?
The server itself was found, but the server was not able to retrieve the requested page.
Notice, it does not mean that anything actually went wrong, per se. Defining that is for the systems behind the pipes to agree on. Similarly, the generally returned status code 200 (or ‘OK’) has some more specific varieties — for example, if a request was successful and resulted in a new resource being created, a 201 should be returned. When you implement a protocol, you might end up being responsible for handling a wider variety of outcomes than you initially envisaged.
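A rough sketch of what handling that wider variety of outcomes might look like on the receiving side, using Python's standard http.HTTPStatus constants. The describe function and its wording are my own illustration, not part of any library.

```python
from http import HTTPStatus


def describe(status_code: int) -> str:
    """Map an HTTP status code onto what it tells the caller."""
    if status_code == HTTPStatus.CREATED:  # 201
        return "success, and a new resource was created"
    if 200 <= status_code < 300:
        return "success"
    if status_code == HTTPStatus.NOT_FOUND:  # 404
        return "server reached, but nothing was found at that address"
    if 400 <= status_code < 500:
        return "the request was rejected as a client-side problem"
    if 500 <= status_code < 600:
        return "the server failed while handling the request"
    return "outside the outcomes this sketch handles"
```

Checking the specific codes before the general ranges is deliberate: a 201 is also a 2xx, so the order of the tests decides which answer wins.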
When you wake up in your room (this is beginning to sound like some type of illegal rendition) you may find that there are balls on the floor, and you don’t know when or in what order they came out of the pipe. As you look at the balls rolling around, the need for message queues and/or message brokers in a real system should be obvious. Although I don’t think the popular RabbitMQ will ever have “keep your colored signal balls in order” on its website, this is what it would help with in this case.
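To illustrate the ordering problem only, here is an in-memory stand-in that releases messages in order even when they arrive shuffled. It assumes the sender numbers each message from zero, which is my assumption and not something the pipe analogy provides; it is not RabbitMQ's API, just the idea in miniature.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Holds out-of-order messages until the gap before them is filled."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []
        self._next = 0  # sequence number we are waiting for

    def push(self, seq: int, body: str) -> List[str]:
        """Accept one message; return every message now ready, in order."""
        heapq.heappush(self._heap, (seq, body))
        ready: List[str] = []
        while self._heap and self._heap[0][0] <= self._next:
            number, text = heapq.heappop(self._heap)
            if number == self._next:
                ready.append(text)
                self._next += 1
            # number < self._next means a stale duplicate, so it is dropped
        return ready
```

For example, pushing message 1 first returns nothing, and pushing message 0 afterwards releases both in the right order.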
ActivityPub
Leaving aside our increasingly strange roommates, let’s finally look at a real example of a popular protocol, ActivityPub.
The ActivityPub protocol is a decentralized social networking protocol based upon ActivityStreams. It provides a client to server API for creating, updating and deleting content, as well as a federated server to server API for delivering notifications and content.
I’ll use my previous posts on a decentralized social media model as a springboard to go through the ActivityPub protocol, and assess the considerations needed to implement it from a cold reading of the specification.
The first question is, what are my balls made of? Well, there are lots of URLs and mentions of JSON, so it is likely the designers are leaning on some understanding of HTTP. The pipes seem to be referred to as ActivityStreams. The URLs are best thought of as endpoints.
It seems that our two separated pipe mates are now called actors. They have an inbox and an outbox, which feel like a kind of queue. Looking at my project, we had an identity file for each tweeter — ActivityPub has something similar that describes the various endpoints and ActivityStreams.
This is the record for Alyssa P. Hacker, as seen in the specification:
```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Person",
  "id": "https://social.example/alyssa/",
  "name": "Alyssa P. Hacker",
  "preferredUsername": "alyssa",
  "summary": "Lisp enthusiast hailing from MIT",
  "inbox": "https://social.example/alyssa/inbox/",
  "outbox": "https://social.example/alyssa/outbox/",
  "followers": "https://social.example/alyssa/followers/",
  "following": "https://social.example/alyssa/following/",
  "liked": "https://social.example/alyssa/liked/"
}
```
What is interesting is the different types of endpoints that are defined for different activities.
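As a small sketch, this is how Alyssa's record could be read in Python to pull out just the endpoint URLs. The ENDPOINT_FIELDS tuple and the endpoints helper are names I made up; the field names themselves come straight from the record.

```python
import json

ACTOR_JSON = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Person",
  "id": "https://social.example/alyssa/",
  "name": "Alyssa P. Hacker",
  "preferredUsername": "alyssa",
  "summary": "Lisp enthusiast hailing from MIT",
  "inbox": "https://social.example/alyssa/inbox/",
  "outbox": "https://social.example/alyssa/outbox/",
  "followers": "https://social.example/alyssa/followers/",
  "following": "https://social.example/alyssa/following/",
  "liked": "https://social.example/alyssa/liked/"
}
"""

# Fields in the actor record whose values are endpoint URLs.
ENDPOINT_FIELDS = ("inbox", "outbox", "followers", "following", "liked")


def endpoints(actor: dict) -> dict:
    """Return only the endpoint URLs that this actor record declares."""
    return {name: actor[name] for name in ENDPOINT_FIELDS if name in actor}


alyssa = json.loads(ACTOR_JSON)
print(endpoints(alyssa)["inbox"])
```

Skipping absent fields rather than failing is a choice made for the sketch; whether a missing endpoint should be an error depends on what the protocol requires.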
In our project model, we used JSON to store a “tweet”, so let’s compare Alan’s tweet:
```json
{
  "Text": "Hi there, anyone listening?",
  "Replyto": 0,
  "Time": 1668435369
}
```
with the ActivityPub example with Alyssa posting to Ben:
```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "to": ["https://chatty.example/ben/"],
  "attributedTo": "https://social.example/alyssa/",
  "content": "Say, did you finish reading that book I lent you?"
}
```
By reading the JSON, we can see immediately that the “to” field can be a collection, so we can post to multiple actors. It also means the code processing it needs to handle that. We also note that this is a direct message, from Alyssa to Ben, and not explicitly part of a conversation (as with Alan, who is just replying to a message-id). This message is designed to travel first from Alyssa to her server, and then from her server to Ben’s server, to land finally in his inbox.
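Here is a minimal sketch of coping with a “to” field that may hold several recipients. As I read ActivityStreams, such a property can also appear as a single value instead of an array, so the hypothetical recipients helper below normalizes both shapes into a list.

```python
NOTE = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "to": ["https://chatty.example/ben/"],
    "attributedTo": "https://social.example/alyssa/",
    "content": "Say, did you finish reading that book I lent you?",
}


def recipients(message: dict) -> list:
    """Return the "to" field as a list, whatever shape it arrived in."""
    to = message.get("to", [])
    if isinstance(to, str):  # a single recipient given as a bare URL
        return [to]
    return list(to)


for address in recipients(NOTE):
    print("deliver to", address)
```

Normalizing once at the edge means the delivery code can always loop, and never has to ask whether it was handed one address or many.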
There is also a “type” field, which has a value “Note” that we need to be aware of. While JSON thinks of this as a string, we know it is very likely to be a noun that we need to translate into something more atomic. For example, in C# perhaps we would have:
```csharp
enum ActivityType {Note, Create, Like, Article, Person, Collection}
```
Those other terms are just pulled in from perusing the rest of the specification. These nouns are very useful for setting the context for the rest of the message.
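The same translation can be sketched in Python, this time including the step from the JSON string to the enumerated value. The parse_type helper is a name I made up, and returning None for an unrecognized type is just one possible policy.

```python
from enum import Enum
from typing import Optional


class ActivityType(Enum):
    NOTE = "Note"
    CREATE = "Create"
    LIKE = "Like"
    ARTICLE = "Article"
    PERSON = "Person"
    COLLECTION = "Collection"


def parse_type(message: dict) -> Optional[ActivityType]:
    """Translate the "type" string into an enum member, or None if unknown."""
    try:
        return ActivityType(message.get("type"))
    except ValueError:  # missing or unrecognized type
        return None
```

Deciding early what to do with a type you have never seen matters, because a federated system will eventually send you one.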
There are other things we need to note now, because we may need to understand them in detail later. For example, the definition of an “activity.” Is it defining something that will affect the implementation? Or is it more of a human expression of the design?
We also need to assess which parts of the message are mandatory — this will help us detect errors, as well as help with creating the first MVP (minimum viable product). If it isn’t necessary, perhaps we can leave it out.
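A minimal sketch of such a check follows. The set of required fields below is a placeholder I chose for illustration; the real list has to come from a careful reading of the specification.

```python
# Placeholder list for this sketch, not taken from the specification.
REQUIRED_NOTE_FIELDS = ("type", "attributedTo", "content")


def missing_fields(message: dict, required=REQUIRED_NOTE_FIELDS) -> list:
    """Return the names of required fields that the message lacks."""
    return [name for name in required if name not in message]


incoming = {"type": "Note", "content": "Hello"}
problems = missing_fields(incoming)
if problems:
    print("rejecting message, missing:", ", ".join(problems))
```

Returning the list of missing names, rather than a plain true or false, gives the error message something useful to say.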
Another interesting comparison: unlike Alan’s tweet, the ActivityPub example carries no timestamp. (ActivityStreams does define an optional “published” property, but nothing in the example sets it.) That is fine, as long as we remember to pass that responsibility on elsewhere. And this is the final place to rest our attention: what possible problem areas does ActivityPub expect the implementer to deal with? It seems that de-duplication is definitely a problem for the implementer.
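Both responsibilities can be sketched in a toy inbox that stamps each message on arrival and drops repeats. It assumes incoming messages carry an “id” to key on, which the short example above does not show, and the Inbox class with its received_at field is my own invention.

```python
import time


class Inbox:
    """Toy inbox that timestamps arrivals and ignores repeated ids."""

    def __init__(self, clock=time.time) -> None:
        self._clock = clock  # injectable so that tests can fix the time
        self._seen = set()
        self.items = []

    def deliver(self, message: dict) -> bool:
        """Store the message; return False if its id was already seen."""
        key = message.get("id")
        if key is not None:
            if key in self._seen:
                return False
            self._seen.add(key)
        self.items.append({"received_at": self._clock(), "message": message})
        return True
```

Messages without an id are always stored here, since there is nothing to compare them by; a real implementation would need a firmer rule.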
Even if you are going to use or choose a pre-built library to handle implementation details, you still need to be aware of overall issues to have any hope of configuring it well, and running it efficiently.
Summary
Here is a summary of the type of things to be aware of when working with protocols, some of which we came across above. Working behind the pipes is a good challenge for an experienced developer.
- You will need to be familiar with the data format that the messages will be using, and the appropriate conversion libraries.
- Protocols demand a little more understanding of engineering concepts, as opposed to the clean call/response development that APIs use.
- Working with protocols requires a bigger sense of what is going on in the system, as it is the developer’s responsibility to fully comprehend the information sent and received.
- Extending the implementation for your project may require a careful examination of what the protocol doesn’t deal with (yet) but what you may need to incorporate somehow.