How to Boost Mastodon Server Performance with Redis

A default Mastodon installation is fine for a tiny server, but it takes work to make it scale. Here’s what to know before you get started.
Feb 2nd, 2023 8:00am

Featured image: “Mastodon” by Jim Griffin is in the public domain.

Rapid growth in users and activity has tested the scalability of many Mastodon servers, and has stressed their administrators. According to a TechCrunch interview with Mastodon’s creator, Eugen Rochko, Mastodon grew to 2.5 million monthly active users across 8,600 different servers in the wake of the mass exodus from Twitter. Those numbers were from December 2022, and the adoption is continuing to grow.

How rapid was that growth and what did it do to the servers? The chart below showing the job queue over time from one large instance should give you a good idea. The upper line is the number of jobs processed, and the lower line is the number of jobs that failed.

Figure 1: Effect of Twitter exodus on a Mastodon job queue. Courtesy Nora Tindall

Redis open source (Redis OSS) is part of the Mastodon tech stack. Anyone who wants to implement a Mastodon server or improve its performance should learn how best to configure Redis elements along with other settings. In this article, we summarize Mastodon’s architecture, explain where Redis fits in and point out potential chokepoints. We help you to start tuning your own instance and identify the first actions to take to address scalability issues.

What Is Mastodon?

Let’s start with a short technical review.

Mastodon describes itself as “a free, open source social network server based on ActivityPub where users can follow friends and discover new ones. On Mastodon, users can publish anything they want: links, pictures, text, video. All Mastodon servers are interoperable as a federated network (users on one server can seamlessly communicate with users from another one, including non-Mastodon software that implements ActivityPub).”

ActivityPub (repo) is a W3C-recommended decentralized social networking protocol based on the Activity Streams 2.0 data format, which is a model for representing potential and completed activities using JSON. ActivityPub provides a client-to-server API for creating, updating and deleting content, as well as a federated server-to-server API for delivering notifications and content.
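To make the data format concrete, here is a minimal sketch of an Activity Streams 2.0 “Create” activity wrapping a “Note” object, the shape used when a post is delivered to a follower’s inbox. The server name, user and IDs below are made up for illustration:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://example.social/users/alice/statuses/1/activity",
  "type": "Create",
  "actor": "https://example.social/users/alice",
  "to": ["https://www.w3.org/ns/activitystreams#Public"],
  "object": {
    "id": "https://example.social/users/alice/statuses/1",
    "type": "Note",
    "content": "Hello, Fediverse!",
    "published": "2023-02-02T08:00:00Z"
  }
}
```

In the server-to-server API, federation amounts to POSTing payloads like this one to the inbox endpoints of followers on other instances.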

Mastodon and the Fediverse

The federated network of servers tied together with the ActivityPub, OStatus, Zot! and diaspora* protocols is called the Fediverse. Servers on the Fediverse, called “instances,” federate with other instances so that the user experience is that of an integrated social network. Individual instances manage their own operations and security.

Mastodon is one of over 20 kinds of servers that implement ActivityPub. As of December 2022, there were 21,501 servers in the Fediverse, according to Fediverse Observer.

Figure 2: The Fediverse. Courtesy Per Axbom. CC BY-SA.

The Mastodon Architecture

Mastodon is a Ruby on Rails (RoR) application with a React.js frontend, and it follows the standard practices of those frameworks. To run Mastodon, you need Ruby, Node.js, PostgreSQL, Redis and an SMTP server; Sidekiq, the background job processor, is distributed as a Ruby gem. Adding a few additional services, such as NGINX and Cloudflare, can improve Mastodon’s scalability and resistance to DDoS attacks.

The architecture diagram below is somewhat oversimplified. PostgreSQL is the database that holds users and posts, among other items. Sidekiq is a background job system for Ruby and Rails. Redis is the in-memory database that caches for PostgreSQL (omitted from the diagram) and holds the Sidekiq job queues (included in the diagram). File storage is usually held in an Amazon S3 bucket or the equivalent; storing external files on local disks is problematic for multiple reasons, and storing them on NFS turns out to be a disaster waiting to happen.
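That stack maps fairly directly onto service definitions. The trimmed-down sketch below, in docker-compose style, shows the moving parts and what each talks to; the image tags and layout are illustrative, not Mastodon’s official compose file:

```yaml
version: "3"
services:
  db:
    image: postgres:14-alpine   # users, posts, follows and other relational data
  redis:
    image: redis:7-alpine       # Sidekiq job queues plus the Rails cache
  web:
    image: tootsuite/mastodon
    command: bundle exec puma -C config/puma.rb   # Rails web tier (Puma)
    depends_on: [db, redis]
  streaming:
    image: tootsuite/mastodon
    command: node ./streaming   # WebSocket streaming API (Node.js)
    depends_on: [db, redis]
  sidekiq:
    image: tootsuite/mastodon
    command: bundle exec sidekiq   # background jobs: federation, mailers, etc.
    depends_on: [db, redis]
```

Object storage (S3 or equivalent) sits alongside these services and is omitted here, as it is from the diagram.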

Figure 3: Overview of Mastodon Architecture. Courtesy Software Mill.

What Is Redis?

Redis is a NoSQL in-memory data structure store that can persist on disk. It can function as a database, cache and message broker. Redis has built-in replication, Lua scripting, least recently used (LRU) eviction, transactions and different levels of on-disk persistence. It provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
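Those eviction and persistence options matter for Mastodon, because a Redis used as a cache and a Redis used as a job store want opposite settings. A minimal sketch of the two configurations (memory sizes are placeholders):

```
# redis-cache.conf — fronting PostgreSQL; the data is disposable
maxmemory 2gb
maxmemory-policy allkeys-lru   # evict least recently used keys when full
save ""                        # disable RDB snapshots
appendonly no                  # no AOF; losing the cache is harmless

# redis-queue.conf — backing Sidekiq; queued jobs must survive a restart
maxmemory-policy noeviction    # never silently drop queued jobs
appendonly yes                 # append-only-file persistence
appendfsync everysec           # fsync the AOF once per second
```

This distinction comes back later in the tuning recommendations, which call for separate cache and persistent Redis instances.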

The Redis data model is key-value, but many kinds of values are supported: strings, lists, sets, sorted sets, hashes, streams, HyperLogLogs and bitmaps. Redis also supports geospatial indexes with radius queries and streams.

Redis OSS has plenty of power, but Redis Enterprise adds features for speed, reliability and flexibility, as well as a cloud database as a service. Redis Enterprise scales linearly to hundreds of millions of operations per second, has active-active global distribution with local latency, offers Redis on Flash to support large datasets at the infrastructure cost of a disk-based database, and provides 99.99% uptime based on built-in durability and single-digit-seconds failover. All this is provided while keeping database latency under one millisecond.

How to Install Mastodon

You can install Mastodon from source on a Debian 11 or Ubuntu 20.04 system, whether on bare metal or a cloud instance, as long as you have root privileges. It’s a lengthy process, but doing it manually may give you the most control over, and knowledge of, what you install.

You can also install Mastodon from the marketplace on many cloud providers, including DigitalOcean, Linode, AWS and so on; install Mastodon on Docker or Kubernetes, including the cloud providers’ own Kubernetes; or lease an instance from a Mastodon-hosting provider, such as Masto.host, Fedi.monster or Cloudplane. Many but not all Mastodon-hosting providers are closed to new instances as of this writing; they should eventually open up again.

A number of people have posted about their experience installing and running their own Mastodon instances. You can find a lot of these reports by searching for “my own Mastodon server” or “personal Mastodon instance.”

Mastodon Performance Chokepoints (and What to Do about Them)

One of the more lucid accounts comes from Nora Tindall. To summarize her conclusions, the “default Mastodon configuration is broken. It’s fine for a tiny instance on a tiny server, but once you start to grow, you must scale it up.”

Where are the chokepoints? According to Tindall, the big ones are database resources (give PostgreSQL half the machine’s RAM), Sidekiq queues (separate them) and database connections (make sure there are enough to handle the web server, the Sidekiq queues and streaming; Tindall suggests 200 total database connections).
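In a Mastodon deployment these knobs are environment variables, typically set in `.env.production` and in the Sidekiq service definitions. A sketch using Tindall-style numbers (the values are illustrative; size them so that web, Sidekiq and streaming together stay under your PostgreSQL connection limit):

```
# .env.production (fragment) — values are illustrative
WEB_CONCURRENCY=4         # Puma worker processes
MAX_THREADS=10            # threads per Puma worker; also sizes its DB pool
STREAMING_CLUSTER_NUM=4   # streaming API worker processes
DB_POOL=10                # per-process database pool (set per Sidekiq process)
```

Roughly, total PostgreSQL connections ≈ WEB_CONCURRENCY × MAX_THREADS, plus the DB_POOL of each Sidekiq process, plus the streaming workers’ pools, which is how you arrive at a budget like 200.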

Another useful set of tuning tips comes from Hazel Weakly:

  • Tune NGINX by increasing its worker_rlimit_nofile and worker_connections values.
  • Raise the PostgreSQL max_connections, but don’t go nuts. Consider 512 the upper limit.
  • Consider a database pool such as pgbouncer. This may allow you to avoid PostgreSQL read replicas.
  • Hazel quotes Nora’s recommendations for DB_POOL, MAX_THREADS, WEB_CONCURRENCY and STREAMING_CLUSTER_NUM.
  • Use S3 or the equivalent for object storage, not local disks and especially not the Network File System (NFS).
  • For the default, push and pull Sidekiq queues: Set DB_POOL to 10 and set -c to the value of $DB_POOL.
  • For the ingress and scheduler Sidekiq queues: Set DB_POOL to 5 and set -c to the value of $DB_POOL.
  • For the mailer Sidekiq queue: Set DB_POOL to 1 and set -c to the value of $DB_POOL.
  • The ingress queue is very CPU-bound and sucks up a whole CPU core with very few threads. Be prepared to spin up multiple processes for the ingress queue. Set the DB_POOL to 10.
  • Consider moving Puma (the Mastodon web server) and Sidekiq to their own machines, and if needed, add more of these behind NGINX for load balancing.
  • Run Sidekiq against a Redis instance that is not configured as a cache but as a persistent store, and scale that with Redis Sentinel (not with Redis Cluster, because the Sidekiq queue has changing keys).
  • Run the Redis instance fronting PostgreSQL as a cache, and scale that with Redis Cluster. Yes, this means you need at least two Redis instances (if you’re using Redis OSS).
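Several of those tips translate directly into service configuration. Below is a sketch of per-queue Sidekiq processes as systemd units; the unit names and paths are illustrative, the queue names are Mastodon’s standard ones, and the concurrency flag (-c) is matched to DB_POOL as the list recommends:

```
# Illustrative systemd unit fragments, one Sidekiq process per queue group.

# mastodon-sidekiq-default.service
[Service]
Environment="DB_POOL=10"
ExecStart=/usr/bin/bundle exec sidekiq -c 10 -q default -q push -q pull

# mastodon-sidekiq-ingress.service (run several copies if ingress saturates a core)
[Service]
Environment="DB_POOL=5"
ExecStart=/usr/bin/bundle exec sidekiq -c 5 -q ingress

# mastodon-sidekiq-scheduler.service
[Service]
Environment="DB_POOL=5"
ExecStart=/usr/bin/bundle exec sidekiq -c 5 -q scheduler

# mastodon-sidekiq-mailers.service
[Service]
Environment="DB_POOL=1"
ExecStart=/usr/bin/bundle exec sidekiq -c 1 -q mailers
```

Because each unit is its own process, you can scale a hot queue (such as ingress) by adding processes without touching the others.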

Tentative Conclusions

As you’ve seen, there’s a lot that can be done to scale Mastodon to handle increased traffic. Next time we will explore how and when to scale Mastodon even more, using larger Redis RAM allocations and by employing Redis Enterprise.

In the meantime, you can try Redis Enterprise for yourself to learn more about its powerful features.
