PyPI Strives to Pull Itself Out of Trouble
The Python Package Index (PyPI) is the most popular software repository for the Python programming language. It’s also a mess. Earlier this year, the FortiGuard team discovered zero-day malware in three PyPI packages called “colorslib,” “httpslib,” and “libhttps.” Before that, 2022 closed with PyTorch-nightly on Linux being poisoned with a fake dependency. More recently, PyPI had to stop new user registrations and project creations because of a flood of malicious users. PyPI isn’t the only one to notice the user trouble. The Python Software Foundation (PSF) received three subpoenas for PyPI user data. What is going on here!?
The root problem is that Python is used extensively across many projects, and PyPI, with a full-time staff of two people and relatively little automation, simply doesn’t have the resources to secure its code repository. It’s trivial to place malware in PyPI. Rubbing salt into the wound, ChatGPT and other generative AI tools have made it child’s play to create malicious code.
Users are also cavalier about using PyPI code. Package managers, such as Python’s default manager pip, use PyPI as their default source for packages and their dependencies. If you don’t look closely at what you’re installing, you won’t see malware coming until it’s too late.
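One way to look more closely at what you install is to pin the cryptographic hash of each package file you have reviewed and refuse anything that doesn't match. The following is a minimal sketch of that idea, not pip's own implementation; the function names `sha256_of` and `matches_pinned_hash` are made up for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, pinned_hex: str) -> bool:
    """True only if the file's digest equals the digest pinned earlier.

    `pinned_hex` is assumed to be a hash you recorded after reviewing
    a known-good copy of the package file.
    """
    return hmac.compare_digest(sha256_of(path), pinned_hex.strip().lower())
```

pip offers a built-in hash-checking mode along the same lines: with `--require-hashes`, it rejects any requirement whose downloaded file doesn't match a hash listed in the requirements file.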
As Pete Morgan, Co-founder and CSO at Phylum, a software supply chain security company, said, “The dynamics between software, security, and business are changing. The purpose of package managers like PyPI is to provide a platform for developers to share their code. But it is a company’s decision to allow that code from a stranger on the Internet to be used in the applications they build for profit. So who is responsible for ensuring the code is secure? The volunteer maintainers at PyPI are doing their best. … Or should the business be taking more responsibility for protecting its developers when using open source code?”
As PyPI’s administrators put it when they suspended registrations, “The volume of malicious users and malicious projects being created on the index in the past week has outpaced our ability to respond to it in a timely fashion, especially with multiple PyPI administrators on leave.” PyPI is now letting new users in again, but the administrators are still close to being overwhelmed.
Were some of the malicious users the targets of the US DoJ subpoenas? We don’t know. Neither does the PSF. What we do know is that the PSF remains “committed to the freedom, security, and privacy of our users.”
So, the PSF is adopting new data retention and disclosure policies. Specifically, PyPI, going forward, is reducing how it retains and uses Internet Protocol (IP) addresses. But, while that’s all well and good, it won’t help keep out criminal hackers.
To make it harder for the crooks of code, PyPI will require every account that maintains any project or organization on PyPI to enable Two-Factor Authentication (2FA) by the end of 2023.
In other words, ordinary users will still be able to use PyPI without 2FA… for now. Eventually, as on GitHub, all users will be required to use 2FA.
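For readers unfamiliar with what the second factor looks like in practice, one common form is the time-based one-time password (TOTP) shown by authenticator apps. Below is a minimal sketch of the standard TOTP calculation from RFC 6238, using only Python's standard library. It is an illustration of the algorithm, not PyPI's code; the function name `totp` and its defaults are assumptions chosen for the example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant)."""
    if at_time is None:
        at_time = time.time()
    # Number of `step`-second intervals since the Unix epoch.
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte
    # picks a 4-byte window; the top bit of that window is masked off.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the code depends on both a shared secret and the current time, a stolen password alone is not enough to log in.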
This isn’t the first time PyPI has tried to get top developers to use 2FA. In July of 2022, PyPI held a security key giveaway and began mandating 2FA for the top 1% of projects on PyPI by download count. Earlier this year, PyPI introduced “Trusted Publishing.” This uses the OpenID Connect (OIDC) standard to exchange short-lived identity tokens between a trusted third-party service and PyPI.
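To make "short-lived" concrete: OIDC identity tokens are JSON Web Tokens whose payload carries an `exp` (expiry) claim, so a leaked token soon stops being useful. The sketch below only decodes that payload and checks the expiry. It deliberately does not verify the token's signature, which any real verifier must do, and it is not PyPI's implementation; `decode_claims` and `is_expired` are hypothetical helper names.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """True once the current time has reached the token's `exp` claim."""
    if now is None:
        now = time.time()
    return now >= claims["exp"]
```

A long-lived API token stored in a CI system stays dangerous until someone revokes it; a token that expires within minutes limits the damage by design.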
As PyPI administrator Donald Stufft put it, “There are some people who believe that efforts to improve supply chain security benefits only corporate or business users, and that individual developers should not be asked to take on an uncompensated burden for their benefit.
“We believe this is shortsighted.
“A compromise in the supply chain can be used to attack individual developers the same as it is able to attack corporate and business users. In fact, we believe that individual developers are in a more vulnerable position than corporate and business users.”
Besides, Stufft also noted, “The workload to support end users relies heavily on a very small group of volunteers. When a user account report is seen by our trusted admins, we have to take time to properly investigate. These are often reported as an emergency, red-alert-level urgency. By mandating 2FA for project maintainers, the likelihood of account takeovers drops significantly, reserving the emergency status for truly extraordinary circumstances. Account recovery becomes part of normal routine support efforts instead of admin-level urgency.”
Will this be enough? Probably not. As Stufft commented on Reddit, PyPI “was first created back in 2002 or 2003 …, and was sort of designed as a weekend hack project to showcase an idea to bring a package repository to Python.” It’s still close to its hackish roots. In short, PyPI comes with an enormous amount of technical debt.
Hopefully, adding a dedicated PyPI Safety and Security Engineer role will help. Frankly, PyPI can use all the help it can get. It’s a vital project that is all too vulnerable to attacks.