Python creator Guido van Rossum vividly remembers what he’d been doing right before joining Microsoft last November. After working at Dropbox for seven years, “I was really done there… I thought, ‘Oh, I’ll go travel with my wife, and we’ll go on lots of bike trips, and we’ll get together with friends.’ And then the pandemic happened, and life was pretty limited!” he recalled in a Q&A session with Microsoft principal cloud advocate manager, Francesca Lazzeri.
In this interview, he explained his latest work: making CPython run faster, a project he announced at the virtual PyCon US 2021 conference (the second to be held online), during a special two-day “Python Language Summit.”
“I got bored sitting at home while retired. I applied at Microsoft and got hired. I was given freedom to pick a project. I chose to go back to my roots,” he said.
CPython is, of course, the heart of the Python programming language: its default implementation, which both compiles and interprets Python code. So what’s the story behind van Rossum’s decision to form a Microsoft-funded team to finally try making it faster?
Not Machine Learning
Turbocharging CPython was not his original intention for joining Microsoft.
For the first few months, “I actually just oriented myself in the company, taking meetings with random people, almost, who are doing interesting stuff with Python: in the context of machine learning, of course; Azure; notebooks; Excel — you name it!” He looked back on it as an almost systematic search for which project he’d focus on.
He soon realized machine learning was not the path forward for him.
van Rossum recognized machine learning as “a very big field” whose growth has contributed to the growth of Python. And he believes the machine learning field “has become more successful because of Python… It’s a very nice symbiosis.” But van Rossum then admitted that he has ignored machine learning throughout his life. “I know exactly how to build a web server in Python or a web client or something doing something with databases or user interfaces. I have no idea how to begin writing a framework for machine learning, or even how to do a simple machine learning application.”
While he’s attempted tutorials and talked to “all sorts of smart people in the machine learning field at Microsoft… I realized it would be a mistake to try to contribute to that field,” because, as he sees it, “you have to spend three or four years on the Ph.D. in the field, and then maybe you can contribute something useful. I don’t know if that’s your experience, but that’s what it felt like to me!”
“And so in the end, I decided I would go back to my roots and collect a team of people and start working on making Python faster,” he said.
But it’s a project very near and dear to his heart — and for a couple of reasons. “I’m very excited we’re going to contribute as core developers and as Microsoft employees or contractors, directly to CPython.” ZDNet notes that five Python core developers already work for Microsoft — besides van Rossum, there’s also Eric Snow, Brett Cannon, Steve Dower, and Barry Warsaw. Van Rossum’s team has already contributed “a few small things” to Python 3.10 — but it went into beta just before the launch of their new push for a speedier Python. “So now we feel we have about a year to prove that we can actually move the needle on Python performance, and 3.11 will be much faster than 3.10.”
What’s the Plan?
van Rossum acknowledges that other groups are already working on the same problem, including Cinder, which is Instagram’s performance-oriented fork of CPython, as well as the similar Pyston project, and Pyjion, which is hoping to fit CPython with a C API so that ultimately it can be connected to just-in-time compilers. “So making Python faster is sort of suddenly back on the front page of the news, I would say, and I hope that with my team, I will be able to contribute some to that field. I do know something about that area…”
One slide in his talk ends with a bullet point saying “This is Microsoft’s way of giving back to Python.”
In his talk, van Rossum also elaborated on one specific effort to speed up Python, which he called “the ‘Shannon plan’.” Posted to GitHub last October by developer Mark Shannon, it aims to speed up CPython five-fold over the next four releases using techniques like an adaptive interpreter, improvements to the runtime, and targeted just-in-time compilers for small regions of code. And apparently, that’s the plan that filled van Rossum’s need for a project, with Mark Shannon now forming a part of his Faster Python team.
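To get a feel for what “five-fold over the next four releases” asks of each individual release, here is a quick back-of-the-envelope calculation. It assumes the gains compound multiplicatively and are spread evenly across the releases. That assumption is made here purely for illustration; the 1.5x figure below falls out of the arithmetic and is not quoted from the talk.

```python
# Back-of-the-envelope arithmetic only; the even split across releases
# is an assumption made for illustration.
target_speedup = 5.0  # overall goal mentioned above
releases = 4          # number of releases over which to reach it

# Speedups compound multiplicatively, so the per-release factor
# is the fourth root of the overall target.
per_release = target_speedup ** (1 / releases)
print(f"needed per release: about {per_release:.2f}x")

# Conversely, a steady 1.5x per release compounds to slightly more than 5x.
compounded = 1.5 ** releases
print(f"1.5x per release, compounded over {releases} releases: {compounded}x")
```

In other words, each release would need to be roughly one and a half times as fast as the one before it.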
The team also includes Eric Snow, recently mentioned in a Microsoft developer blog post. It touts Snow’s experiments in exposing CPython’s support for multiple interpreters in the same process (known as “sub-interpreters”) as a way of trying to tap into the power of multithreaded performance. van Rossum expanded on the idea later in his Q&A, calling it “a feature that we’ve sort of had for a long time, but it had various issues, and slowly we seem to be migrating to a version of multiple sub-interpreters where there’s no shared data between the sub-interpreters at all, except that they all live in the same process, and you can switch between them very efficiently.
“And at that point, once each sub-interpreter has its own global interpreter lock, we will have a different approach to sort of using all the cores that you need, because you can just spin off worker threads, in a sense — except worker sub-interpreters. And that’s probably a more effective model than multiple threads with shared everything, like we have in Java or C++.”
They’ll also explore Shannon’s idea of an adaptive bytecode interpreter, as well as other areas for speed improvements, including the layout of frame stacks, exception handling, and the compiler. Python Enhancement Proposal 659 offers details on plans for a performance-improving adaptive interpreter “that speculatively specializes on the types or values it is currently operating on” (over very small regions). How much will this improve performance? “It is hard to come up with meaningful numbers,” explains the proposal, “as it depends very much on the benchmarks and on work that has not yet happened. Extensive experimentation suggests speedups of up to 50%. Even if the speedup were only 25%, this would still be a worthwhile enhancement.”
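The idea of an interpreter that “speculatively specializes on the types or values it is currently operating on” can be illustrated with a toy model. The sketch below is not CPython’s implementation: the `AdaptiveAdd` class, its `WARMUP` threshold, and its counters are all hypothetical names invented for this example. It only shows the general pattern of watching operand types, switching to a type-specific fast path once a pattern emerges, and falling back when a guard fails. (In pure Python the “fast path” is not actually faster; the point is the control flow.)

```python
import operator

WARMUP = 8  # hypothetical threshold, not a value taken from CPython


class AdaptiveAdd:
    """Toy 'add' instruction that specializes itself for int operands."""

    def __init__(self):
        self.specialized = False  # True while the int-only path is active
        self.int_hits = 0         # consecutive (int, int) observations
        self.deopts = 0           # how often a speculation turned out wrong

    def __call__(self, left, right):
        if self.specialized:
            return self._int_only(left, right)
        return self._generic(left, right)

    def _generic(self, left, right):
        # Slow path: handles any operand types and records what it sees.
        if type(left) is int and type(right) is int:
            self.int_hits += 1
            if self.int_hits >= WARMUP:
                self.specialized = True  # speculate that ints keep coming
        else:
            self.int_hits = 0
        return operator.add(left, right)

    def _int_only(self, left, right):
        # Fast path: one cheap guard, then the type-specific operation.
        if type(left) is int and type(right) is int:
            return int.__add__(left, right)
        # Guard failed: undo the speculation and use the generic path.
        self.deopts += 1
        self.int_hits = 0
        self.specialized = False
        return self._generic(left, right)


add = AdaptiveAdd()
total = 0
for i in range(20):
    total = add(total, i)            # specializes after WARMUP int additions
was_specialized = add.specialized

joined = add("py", "thon")           # guard fails, instruction de-optimizes
print(total, was_specialized, joined, add.specialized, add.deopts)
```

The design choice worth noticing is that a wrong guess costs only a guard check and a fallback, which is why specializing “over very small regions” can be done speculatively.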
InfoWorld notes that they’re saving the best for last: the idea of just-in-time compilation. “In his talk, van Rossum suggested that such plans would be considered after Python 3.11, because it made sense to first obtain whatever performance improvements could be had with more targeted changes first.”
Will they be able to double the speed of CPython? “We’re far from certain…” warned one slide, while adding the team was “optimistic and curious.” (A later slide acknowledges that to achieve a five-times-faster performance, “we’ll have to be creative.”)
But beyond what they’re doing, van Rossum also emphasizes how they’ll do it. In his talk, van Rossum promised the Microsoft-funded team would do fully open collaborations with Python’s core developers. And under a bullet point headlined “Everything open source,” he stated explicitly that all the code repositories (as well as all the surrounding discussion) will be open to the world. van Rossum added that “We’ll take care of maintenance and support too,” as part of an ongoing effort to bring smooth incremental changes to CPython. (“No long-lived forks/branches, no surprise 6,000-line pull requests.”)