Engineering is all about tradeoffs, and Tom Bartel, Trivago Team Lead Interface Platform, does an excellent job illustrating the tradeoffs that led to the platform rewrite in his recent blog post. It was a massive undertaking, especially since nothing was inherently “wrong” or broken with the codebase.
What was working? Well, the site at large was. Users were enjoying full functionality on the web and mobile. The engineering team was trained, and many of its members enjoyed their day-to-day work.
What “wasn’t working”? Melody was homegrown, so it wasn’t widespread. The ecosystem was small, documentation was limited, and the engineers’ go-tos (i.e. Google and Stack Overflow) were of little or no help. There were at most two core maintainers, with at least one on call at all times. It was challenging to onboard new employees, and some voiced concerns that they were learning and nurturing non-transferable skills.
Double down on Melody? Allocate resources to modernize the framework, update and add quality documentation, and train engineers to maintain it? Or stare down …
…The Blank Page
It’s not a new project but it is a new project. Since the effort, internally called the Web Application Rewrite Project (WARP), is a complete rewrite and not a refactor, all the new project questions arise:
- Libraries: which are most appealing for utilities, date calculation, etc…?
- CSS Files: how to organize?
- Application state: how will this get maintained?
- Event transmission: how will this take place?
- HTML pages: will this get statically pre-generated?
- Structure: for URLs and pages.
- Application initialization: how will it work?
- Component APIs: What will the design look like?
With so many decisions and the team working remotely (remote collaboration was still considered new back when this project started in April of 2020), the Trivago engineers implemented an incredibly methodical, pragmatic approach to tackling the tough engineering questions and ultimately reaching the decisions.
- Decision document: this document collected and organized the engineers’ relevant facts and viewpoints.
- Decision meeting: a place for discussion of viewpoints leading ultimately to the decision.
- Decision owner: curates the document, prepares the decision meeting and makes sure a decision is formed.
Some decisions were easy to arrive at while others were hard-won. Some made it from document to testing, while others were revised when the implementation didn’t meet the original expectations.
It was while implementing another decision, one that felt too complicated to some developers, that Trivago engineers decided to move forward with Next.js and React.
A great piece of advice coming from the Trivago trial-and-error rewrite process is to commit to decisions but keep an open mind and course-correct when necessary. Even decisions made with the best knowledge and intentions may yield new insights during implementation.
Once the rewrite was fully functional and useful to the user, it was exposed to the real world and tested with dashboards, checks, and comparisons serving as guides for the engineers to see what needed attention.
- User Interaction: Does this differ between products? If so, was the cause bugs or something else?
- Revenue: Does the new application forward to booking sites at the same rate?
- Search Types: Trivago automatically adjusts search types for better results. For example, if a search is too narrow, Trivago will expand the parameters and add results. A difference in listings between the two applications was a direct indicator that something was off and prompted further investigation.
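The kind of comparison described in the list above can be sketched in a few lines of TypeScript. This is only an illustration, not Trivago's actual tooling: the `AppMetrics` shape, the `forwardingRate` and `needsInvestigation` helpers, and the 5% tolerance are all assumptions made up for this example.

```typescript
// Hypothetical metrics for one application over the same time window.
interface AppMetrics {
  sessions: number;  // user sessions observed
  clickouts: number; // sessions forwarded to a booking site
}

// Share of sessions that were forwarded to a booking site.
function forwardingRate(m: AppMetrics): number {
  return m.sessions === 0 ? 0 : m.clickouts / m.sessions;
}

// Flags the new application when its forwarding rate deviates from the
// old one by more than the given relative tolerance (assumed default: 5%).
function needsInvestigation(
  oldApp: AppMetrics,
  newApp: AppMetrics,
  tolerance: number = 0.05
): boolean {
  const oldRate = forwardingRate(oldApp);
  const newRate = forwardingRate(newApp);
  if (oldRate === 0) {
    return newRate !== 0;
  }
  return Math.abs(newRate - oldRate) / oldRate > tolerance;
}
```

A check like this only says that the two applications differ. Finding out whether the cause is a bug or something else still takes manual investigation.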
It took several months of engineering but finally, the switch was flipped and all user traffic went to the new application.
Rewards Were Reaped!
Startup time depends heavily on the size of the code shipped by Trivago. Since the engineers rely heavily on and give back to open-source libraries such as Next.js, Preact, and react-use, they watch code size closely.
The new product reduced page weight from 2.1 MB to 1.7 MB (19%) for theme pages and from 4.1 MB to 2.6 MB (37%) for result pages. Turning the single-page application into multiple pages and using the automatic code-splitting feature of Next.js turned out to be very beneficial.
As a result, Trivago now runs more smoothly on weaker hardware. Android 6 devices, which make up about 0.5% of all Android clients, are the weakest hardware that uses the Trivago application. Their testing shows the application runs smoothly on Android 6.
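The reduction percentages follow directly from the before and after page weights. As a quick check, here is a minimal TypeScript sketch; the `percentReduction` helper is a name invented for this example.

```typescript
// Relative size reduction, rounded to the nearest whole percent.
function percentReduction(beforeMb: number, afterMb: number): number {
  return Math.round(((beforeMb - afterMb) / beforeMb) * 100);
}

console.log(percentReduction(2.1, 1.7)); // theme pages: prints 19
console.log(percentReduction(4.1, 2.6)); // result pages: prints 37
```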
This ties back neatly to why the rewrite started in the first place. The codebase is cleaner, and quality documentation is widely available, both internally and across a global ecosystem that can be searched on Google and Stack Overflow. New developers have an easier time onboarding, and there is more of a sense of familiarity and of developing transferable skills.
While there is no definitive proof of faster development, the monthly merged pull requests are higher in the new codebase (see chart above) with the same number of engineers as were working on the older codebase.
There are now ten releases per day, compared to the previous two. With a little more cleanup, the remaining legacy systems will be turned off, freeing up more resources for the new application.
Overall, there was a cost to this big rewrite, and there were quite a few struggles. Revenue was lost, though this coincided with the travel slowdown of 2020. The engineering team grew immensely in terms of engineering skills as well as soft skills.
With all the setbacks considered, the project was definitely a success.