Frustration Mounts over Python 3 Migrations
On Jan. 1 this year, the Python 2 codebase was frozen. From that date, there will be no further backports to Python 2, leaving the language and its runtime effectively obsolete. Thus ends approximately 13 years “of parallel maintenance of Python 2 and 3 by the core development team for the reference interpreter,” explained an FAQ by core developer Nick Coghlan.
That final version of Python 2 is now passing through its beta and Release Candidate phase, with the very last production release of Python 2.7.18 expected in April of 2020.
Most everyone in the Python community agrees that Python needed an overhaul, not least for the much-needed and long-overdue Unicode support. But many with perfectly good working Python 2 code are nonetheless feeling the frustration. And the real story of that transition is scattered around the web, in the stories of the real-world developers who are leading the effort.
Tales from the Transition
Some of the frustration was neatly summarized last week by Chris Siebenmann, whose Twitter profile identifies him as an “overcommitted sysadmin.”
“Functioning code that you don’t have to maintain and that just works is an asset; it sits there, doing a valuable job, and requires no work,” Siebenmann wrote in a blog post. “Code that you have to do significant work on just so that it doesn’t break (not to add any features) is a liability; you have to do work and inject risk and you get nothing for it.”
But his position didn’t get a lot of sympathy from Phil Rhodes, the cloud/AI architect who founded Fogbeam Labs, a maker of open-source enterprise software. Rhodes made the opposing case on Hacker News: “People have known that they needed to start moving away from Python 2 for around 12 years. If you’re panicking now because Python 2 is about to go away, it’s hard not to ask ‘What were you waiting for?’”
Everyone's insistence on getting rid of Python 2 is magically transforming all of this perfectly functional and useful Python 2 code we have from an asset to a liability. You can imagine how I feel about that.
— Chris Siebenmann (@thatcks) March 3, 2020
The big firms are facing migration challenges as well. LinkedIn senior software engineer Barry Warsaw, who is also a member of the Python Steering Council and has been a core Python developer since the mid-’90s, described in a blog post LinkedIn’s lengthy “multiquarter effort” to complete its codebase’s transition, “after approximately two quarters of planning and two quarters of execution.”
The development team began by graphing dependencies to prioritize the work, identifying 75 “foundational” repositories, and then created “bilingual” versions of their internal libraries (which could run on either Python 2 or Python 3).
Citing the work of multiple teams and departments, Warsaw wrote that “In total, the effort entailed the migration of about 550 code repositories (libraries, applications, and services).” Besides user-facing services, LinkedIn also uses Python internally for its CI/CD framework, command-line interfaces and deployment and data science tools, in a non-monolithic sprawl Warsaw describes as “hundreds of independent microservices and tools, and dozens of supporting libraries, all owned by independent teams in separate repositories.” (The New Stack also covered other LinkedIn Python migrations in 2017).
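The dependency-first ordering Warsaw describes can be illustrated with a topological sort. This is only a minimal sketch, not LinkedIn’s actual tooling: the repository names are hypothetical, and it assumes Python 3.9 or later for the standard library’s graphlib module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical repositories, each mapped to the repositories it depends on.
depends_on = {
    "web-app": {"http-lib", "config-lib"},
    "http-lib": {"core-utils"},
    "config-lib": {"core-utils"},
    "core-utils": set(),
}

# static_order() yields each repository only after all of its dependencies,
# so the "foundational" libraries come out first and applications last.
migration_order = list(TopologicalSorter(depends_on).static_order())
print(migration_order)
```

Porting in this order means a library’s dependencies already run on Python 3 by the time the library itself is ported.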
But not everyone felt so cheerful. In January, developer Wayne Rowcliffe posted on Hacker News that his workplace was finally “on the brink of completing the transition” to Python 3.
“The end result of this is that I just spent a good chunk of last week reviewing a pull request with 70,000 lines of changes, which was one of the final in a series of ~10k line pull requests that came in through the fall. All of this was the heroic effort of one of my coworkers who had the unenviable task of combing through our entire codebase to determine ‘This is unicode. This is bytes. Here is an API boundary where we need to encode/decode.’ etc.
“It was a nightmare of effort that I’m glad to have behind us.”
In a later comment, he distilled their strategy down to a single sentence. “[W]e ran the linter and test suite until things passed. It’s just when you have a million lines of code that takes quite a while.”
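The “this is unicode, this is bytes” sorting that Rowcliffe describes comes down to decoding at input boundaries and encoding at output boundaries. The sketch below is illustrative only: the helper names are hypothetical, and UTF-8 is an assumed encoding, not something taken from his codebase.

```python
# Hypothetical helpers marking an API boundary; UTF-8 is an assumption.

def read_name(raw: bytes) -> str:
    """Input boundary: bytes from a file or socket become text."""
    return raw.decode("utf-8")


def write_name(name: str) -> bytes:
    """Output boundary: text becomes bytes again."""
    return name.encode("utf-8")


raw = b"Zo\xc3\xab"            # UTF-8 bytes as they might arrive from outside
name = read_name(raw)          # text (str) inside the program
greeting = "Hello, " + name    # str + str works
payload = write_name(greeting)

# Python 3 refuses to mix the two types implicitly, which is why every
# boundary has to be found and made explicit during a port.
try:
    "Hello, " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Python 2 would often let the mixed concatenation through silently, so code that worked there can fail on Python 3 until each boundary is made explicit.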
Another critique came from San Francisco-based developer Gregory Szorc, maintainer of the Mercurial revision-control tool. Mercurial has strong ties to the Python community — Python had been using Mercurial for its repository but switched to Git in 2016. But in a blog post, Szorc described what he’d learned as Mercurial worked its way toward achieving Python 3 support on Nov. 5, 2019 — and offered a few criticisms of bottlenecks that he says came from within the Python language.
“The project’s late start at a Python 3 port can be significantly attributed to Python 2.4 and 2.5 compatibility holding us back,” he wrote. “[W]e dropped support for Python 2.6. This significantly reduced the complexity of supporting Python 3, as there was tons of functionality in Python 2.7 that made it easier to target both Python 2 and 3 and now our hands were untied to utilize it.”
But surprisingly, Szorc says there were even some changes between versions of Python 3 “that we had to wallpaper over as well,” and this came up in the “home stretch” in mid-2019. While Python 3.7 had the fewest failures, “We had to spend extra effort to get Python 3.5 and 3.6 working as well as 3.7. Same for 3.8.”
Soon the big day came for Mercurial to make the switch. On its continuous integration system (using AWS DynamoDB, S3, and EC2 spot instances for job execution), Szorc began testing on Python versions 3.5, 3.6, 3.7 and 3.8 on Linux (as well as Python 3.7 on Windows). But though the port passed all its tests, Szorc remained cautious: “I view shipping as only a milestone — arguably the most important one — in a longer journey. There’s still a lot of work to do … Our users will likely be finding a long tail of miscellaneous bugs on Python 3 for years.”
And there are still “a handful of known issues” on Windows.
The experience left Szorc with some sour feelings. “As much as I have historically loved Python — from the language to the welcoming community — I am still struggling to understand how Python could manage to inflict so much hardship on the community by choosing the transition plan that they did,” Szorc wrote. “I believe Python’s choices represent a terrific example of what not to do when managing a large project or ecosystem. Maintainers of other largely-deployed systems would benefit from taking the time to understand and reflect on Python’s missteps…”
“The initial approach of Python 3 mirrors a folly that many developers and projects make: attempting a rewrite instead of performing incremental evolution. For established projects, large scale rewrites often go poorly. And Python 3 is no exception.”
Szorc’s blog post attracted over 400 upvotes and 339 comments when it turned up on Hacker News, including one from a New York City software engineer who disagreed. “I’ve been involved in multiple non-trivial libraries and frameworks that supported both python2 and python3 for many years with the same codebase… and it really wasn’t anything like this… Yes, you pretty much had to wait for python-3.4 to be released and for python-2.6 to be mostly retired in favor of python-2.7. Then, starting in early 2014, it was pretty straightforward to make a clean codebase compatible with python-2.7 and python-3.4+.”
One student at the University of Hawaii argued there was too much faith in 2to3, a migration tool that attempted to translate Python 2 code to Python 3 automatically. “[T]he transition plan for 2-to-3 just didn’t work,” they complained in a response. “They thought everyone would run 2to3 in a big bang, and then we’d all switch over to 3 in a few years.
“Instead it dragged out over a decade because in reality, we needed to write code that was compatible with both 2 and 3… until enough things were on 3 to drop 2 support.”
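Code that is “compatible with both 2 and 3” typically hides the differences behind small shims. The following is a minimal sketch under that assumption. The names PY2, text_type, to_text and iteritems are illustrative, in the spirit of compatibility libraries such as six, and are not taken from any project mentioned here.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
else:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return text on either interpreter, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def iteritems(mapping):
    """Iterate over (key, value) pairs without building a list on Python 2."""
    return mapping.iteritems() if PY2 else iter(mapping.items())


pairs = sorted(iteritems({"b": 2, "a": 1}))
label = to_text(b"caf\xc3\xa9")
```

Once Python 2 support is dropped, shims like these can be deleted and replaced with plain Python 3 calls.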
And some questioned what had been gained for the effort. Brian Davis, an electrical engineer in Washington and self-described “old school web dev,” shared his thoughts after converting “a number” of smaller projects to Python 3. “Changes to string handling tripped me up and the changes to relative imports took some thinking. But the biggest frustration was the nagging question: Why am I doing this?”
In some ways, Python 2 isn’t really gone. For instance, there is Tauthon, a backward-compatible fork of the Python 2.7.17 interpreter with new syntax, and libraries backported from Python 3.x. Tauthon can run Python 2.7 code and C-extensions along with some of the new features from Python 3.x.
Development tool company ActiveState is offering commercial support for both Python 2 and its standard libraries, as well as the third-party open source packages listed in the Python Package Index.
Nonetheless, most developers are now living in a Python 3 world. But in the end, maybe it was the unsung effort to get there that led to the 400+ anonymous upvotes for the blog post by Szorc. “The effort required to port to Python 3 was staggering,” his blog post recalls. “For Mercurial, Python 3 introduces a ton of problems and doesn’t really solve many.”
“To call the Python 3 transition disruptive and distracting for the project would be an understatement,” he wrote. “As a project maintainer, it’s natural to ask what we could have accomplished if we weren’t forced to carry out this sideshow.”
Feature image by skeeze from Pixabay.