It’s Go Time: Stream 2.0 Ditches the Pokey Python in Favor of the Faster GoLang
Ever stop to consider how your Twitter or Instagram feed populates, or how YouTube personalizes video recommendations just for you? It’s all in the feed. From the conspicuous — real-time interactive flow like Instagram and Twitter and every social network site, ever — to the passively functional like news, weather or even retail sales, the underlying task is the same: the aggregation and automatic, intelligent dissemination of information. While the concept is simple enough, building a scalable, reliable feed can be a surprisingly complex and resource-draining undertaking.
Enter Stream, an API for building, scaling and personalizing feeds. Stream provides an activity feed platform for developers to implement newsfeeds, instead of building functionality from scratch. Launched in 2014, Stream’s highly scalable open source feed framework currently powers feeds for over 300 million users. The company has accelerated nonstop, landing in the TechStars incubator program less than a year after inception, and going from 0 to 23 employees while opening offices on two continents in the two years after that. In the process, the Stream platform also began to outgrow its original Python-based architecture.
TNS recently caught up with Stream co-founder and CEO Thierry Schellenbach to talk about pioneering “feed as a service” in a microservices-driven, cloud-based industry, and the decision to transition the Stream platform from Python to Go for Stream 2.0.
What gave you the idea to build a startup around newsfeeds?
My first startup was building a social network aimed at fashion that grew to a couple million users. The feed was an essential component of the business and over time it just fell apart. It was a huge engineering challenge, and we went looking around for solutions, but all we could find was white papers. They offered really good information, but you had to roll your own solution. There was nothing available off the shelf.
So when I left that organization, this problem set was on my mind. Twenty percent of all apps have a feed, and when you get to a couple million members, it gets to be a ridiculously tough issue. Scalability is everything.
Further, more than 500,000 apps out there are still building and maintaining their own feed technology. Every time, all these companies go through the same struggle and reinvent the same technology. We saw how microservices were becoming Lego pieces you could snap together as parts of your application, and we saw we could provide this particular component: enabling developers to build feeds in days instead of months, powered by machine learning so feeds are effortlessly, intelligently aggregated and re-ranked based on relevancy.
The original framework was built on Python, but for Stream 2.0 you switched to Go. What drove that decision?
For many kinds of app, the performance of the programming language you use doesn’t matter a whole lot; it’s just there as the glue between the app and the database. But if, like us, you are an API provider powering feed infrastructure for 500 companies and over 300 million end users, performance differences really start to matter. Python is a great language, but for some use cases, like ranking, aggregation, serialization and deserialization, its performance is, well, pretty sluggish. We had been optimizing Cassandra, PostgreSQL, Redis, etc. for years, but eventually, we just reached the limit.
With our API, our customers are basically building the next Facebook, the next Instagram, so they need features like re-ranking and aggregating the feed, and as soon as the data becomes a bit larger, that becomes exponentially more difficult. Our data from Cassandra would take 1 millisecond on the backend, but then transport to our customers via Python was taking 20 milliseconds. That is so much more time than our underlying infrastructure.
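To make the serialization point concrete, here is a minimal sketch, not Stream’s code, that times how long Python’s standard `json` module takes to encode and decode one page of feed activities. The activity fields, the page size, and the helper names `make_activities` and `time_serialization` are all assumptions made for illustration; actual numbers depend on the machine and the payload.

```python
import json
import time


def make_activities(n):
    """Build n hypothetical feed activities (field names are illustrative only)."""
    return [
        {
            "id": i,
            "actor": f"user:{i % 50}",
            "verb": "post",
            "object": f"photo:{i}",
            "tags": ["snowboarding", "winter"],
        }
        for i in range(n)
    ]


def time_serialization(activities, repeats=100):
    """Return the average seconds spent on one JSON encode plus decode of the page."""
    start = time.perf_counter()
    for _ in range(repeats):
        payload = json.dumps(activities)  # serialize the page
        json.loads(payload)               # deserialize it again
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    page = make_activities(1000)
    avg = time_serialization(page)
    print(f"average JSON round trip for {len(page)} activities: {avg * 1000:.2f} ms")
```

Comparing a measurement like this against the database read time is one way to see whether the language layer, rather than storage, dominates a request.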
Sentry (the open source error tracking software company) ran into similar problems with Python. They ended up using Rust for some of their endpoints and embedding those in Python, which is one option. But for Stream, serialization, reading from Cassandra, and other performance problems were embedded all throughout the Python code, which pointed to moving entirely to a new language. A common solution is to rewrite certain targeted components in C or Rust, anything Python can use. But that only works if the problem is one tiny part, say, image sizing. For us, though, it was too widespread.
We started looking at options. Algolia’s search as a service is very interesting; they’re a startup backed by Y Combinator. We looked extensively at C++, then Elixir, a programming language getting a lot of traction right now. And we looked at Scala and the Akka framework. And we looked at Go.
Compared to Scala and C++, Go is a more modern language, a fairly simple language and so it’s easier to train engineers in Go. Which is important when there is only one person on your team with experience in that language! And there are a lot of engineers who WANT to switch to Go, so it becomes a talent recruiting tool as well.
Scala and Akka together are fast, but also extremely complex. We felt that would slow us down too much. We are still a tiny company, so we need to balance the performance of the API with what we can afford to spend on development. Go really hit the sweet spot there: it’s very performant, but also easy to use, and onboarding is pretty straightforward.
Elixir was the final serious competitor. And it is very interesting. But the ecosystem is still too young, with not enough libraries for the things we use, and also when we benchmarked it Go was just faster. We will never make that switch again because changing your language is something you should never do more than once. But companies looking for another language in the future should definitely consider Elixir.
It was just seven months ago we decided to move the next generation of the platform, Stream 2.0, to Go, and we just completed the migration. It has been a major boon for customers, with performance and stability both greatly improved.
Python is still in use for some aspects of Stream, though?
So our overall site itself and all our machine learning are still powered by Python. It has excellent machine learning libraries available. If you look at feeds where machine learning comes in (for example, to reorder and rerank the feed so it is relevant to an individual user, something we do for our enterprise customers), Python is still a very good choice.
What’s next on the horizon?
Well, right now most of our research is dedicated to making sure Stream keeps working for larger and larger apps.
For the future, I am very interested in ever more subtle refinements in machine learning.
Right now setting up machine learning is still pretty complex. We started providing our own machine learning with Stream, but even with a dedicated department, there are so many moving pieces to track: all the data in the system, how people engage with that data. Machine learning basically boilerplates things that need to happen. It’s pretty unpolished right now, both the technology and the ways we are still figuring out to apply it. Right now a lot of apps just throw content at you. Or there is the danger of machine learning being optimized for engagement without looking at other things. Because the controversial material is the most engaging, it will optimize for controversial, and that is a cliff you don’t want to walk off of.
So, evolving unobtrusive but ever more useful ways to apply machine learning. For example, we use Google Cloud for their Vision API, so when a user uploads an activity to Stream and there’s an image in there, we use Google to figure out what is in the image and then feed that subject into the machine learning. Previously we were looking at hashtags or app mentions, but now, in the next generation, we can use visual keys to understand what a post is about. So if someone uploads a photo about snowboarding, or if you’re engaging with a photo about snowboarding, now we know that’s something you’re interested in.
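The idea of combining hashtags with image-derived subjects can be sketched roughly as follows. This is a hypothetical illustration, not Stream’s pipeline: `image_labels` stands in for whatever an image-labeling service returned, no real API is called, and the function names and the simple counting scheme are assumptions.

```python
from collections import Counter


def extract_topics(activity):
    """Collect topic signals from an activity's hashtags and image labels.

    Both keys are hypothetical; `image_labels` represents labels that an
    image-labeling service produced for the uploaded picture.
    """
    topics = {tag.lower().lstrip("#") for tag in activity.get("hashtags", [])}
    topics |= {label.lower() for label in activity.get("image_labels", [])}
    return topics


def update_interests(profile, activity, weight=1):
    """Add each topic of an engaged-with activity to the user's interest counts."""
    for topic in extract_topics(activity):
        profile[topic] += weight
    return profile


if __name__ == "__main__":
    profile = Counter()
    # A photo with no hashtags still contributes topics through its labels.
    photo = {"hashtags": [], "image_labels": ["Snowboarding", "Snow"]}
    update_interests(profile, photo)
    print(profile.most_common())
```

In this sketch a photo with no hashtags at all still registers an interest in snowboarding, which is the gain the visual signal is meant to provide; a real system would presumably feed such counts into a ranking model rather than use them directly.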
I believe there are very useful applications for machine learning on the way: job searching, house hunting. LinkedIn has a huge team working on their feed, bigger than our entire company, trying to get to these refinements. Take something actually useful like real estate: when you’re searching for a house, it’s not just whatever you specify you are searching for but what you click on, and what that leads to. It learns more subtly from how you yourself are organically searching.
So, yes, that’s what we are coming to. Artificial intelligence applied to something useful like hunting for jobs or housing. For more than showing you pictures of cats.
Google Cloud is a sponsor of The New Stack.