Python Package Repository Struggles to Deal with Typosquatting
Ten rogue packages with misspelled names intentionally chosen to trick users have recently been found on the Python Package Index (PyPI), the main repository for community-contributed Python components. This is the latest in a string of typosquatting attacks discovered on open-source software repositories over the past few years.
The rogue Python packages were removed after the PyPI maintainers were alerted by Slovakia’s national Computer Security Incident Response Team (CSIRT). However, the PyPI administrators have been warned about typosquatting risks in the past by various security researchers and it’s only now that they’ve started working on a more permanent blacklisting solution.
“Copies of several well-known Python packages were published under slightly modified names in the official Python package repository PyPI (prominent example includes urllib vs. urrlib3, bzip vs. bzip2, etc.),” SK-CSIRT said in a security advisory last week. “These packages contain the exact same code as their upstream package thus their functionality is the same, but the installation script, setup.py, is modified to include a malicious (but relatively benign) code.”
According to SK-CSIRT, the rogue packages were:
- acqusition (uploaded 2017-06-03 01:58:01, impersonates acquisition)
- apidev-coop (uploaded 2017-06-03 05:16:08, impersonates apidev-coop_cms)
- bzip (uploaded 2017-06-04 07:08:05, impersonates bz2file)
- crypt (uploaded 2017-06-03 08:03:14, impersonates crypto)
- django-server (uploaded 2017-06-02 08:22:23, impersonates django-server-guardian-api)
- pwd (uploaded 2017-06-02 13:12:33, impersonates pwdhash)
- setup-tools (uploaded 2017-06-02 08:54:44, impersonates setuptools)
- telnet (uploaded 2017-06-02 15:35:05, impersonates telnetsrvlib)
- urlib3 (uploaded 2017-06-02 07:09:29, impersonates urllib3)
- urllib (uploaded 2017-06-02 07:03:37, impersonates urllib3)
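One way to act on this list is to compare it with what is installed in a given Python environment. The following is a minimal sketch, not an official SK-CSIRT or PyPI tool. It assumes Python 3.8 or later (for `importlib.metadata`), and the helper names `normalize`, `find_rogue` and `installed_distributions` are made up for illustration. A match only means the package deserves a closer look, since a name can be reassigned to a legitimate project after a rogue upload is removed.

```python
import re
from importlib import metadata

# Names copied from the SK-CSIRT list above.
ROGUE_NAMES = {
    "acqusition", "apidev-coop", "bzip", "crypt", "django-server",
    "pwd", "setup-tools", "telnet", "urlib3", "urllib",
}


def normalize(name):
    """Lowercase a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_rogue(installed_names, rogue=ROGUE_NAMES):
    """Return the sorted, normalized names that appear in the rogue list."""
    rogue_norm = {normalize(n) for n in rogue}
    return sorted({normalize(n) for n in installed_names} & rogue_norm)


def installed_distributions():
    """List the names of distributions installed in the current environment."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    for hit in find_rogue(installed_distributions()):
        print("Installed package matches a reported rogue name:", hit)
```

The check works on distribution names, not importable modules, so the standard library's own `urllib` module does not trigger a match.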
There is evidence that at least some users installed these packages between June and September, because they reported installation errors online. The errors occurred because the code added to the rogue packages was incompatible with Python 3.x.
When executed successfully, the malicious code gathered information like the name and version of the fake package, the username of the user who installed the package and the computer’s hostname. This information was encrypted with a password and sent to an external server.
It’s not clear if this action had any malicious intent, but attackers often use environment fingerprinting techniques to identify interesting targets in preparation for more serious attacks.
The packages might also simply be part of someone’s research project, though the obfuscation of extracted data is somewhat suspicious. It wouldn’t be the first time researchers have tested typosquatting attacks on PyPI or other component repositories.
In 2016, a computer science student named Nikolai Tschacher uploaded around 200 packages with names close to those of popular components on several repositories, including PyPI. He included a notification system that reported back when the packages were installed and counted 15,221 unique installations for the rogue Python packages.
A software developer named Steve Stagg had a different idea: He registered packages on PyPI that had the same names as modules from Python’s standard library. Users don’t need to download and install these modules from PyPI because they’re already present on their systems if they have Python, but many of them apparently attempt to do so anyway.
Stagg’s packages were downloaded around 244,000 times over several months. In May, he said that he tried to contact the PyPI administrators, but didn’t receive any response.
Two other researchers named Benjamin Bach and Hanno Böck recently redid Stagg’s experiment with standard library names, but this had no connection to the packages found by SK-CSIRT. The two researchers registered the names of 128 standard Python library packages on PyPI and recorded 7,556 download attempts over three days.
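The standard-library confusion these experiments exploited can be caught before running an install command. The sketch below is an assumption-laden illustration, not something pip or PyPI does: it relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later, and falls back to a tiny hand-written set (three names taken from the list above) on older interpreters. The function name `shadows_stdlib` is hypothetical.

```python
import sys

# Fallback for interpreters older than Python 3.10. Deliberately tiny and
# illustrative: only names that appear in the SK-CSIRT list above.
_FALLBACK_STDLIB = frozenset({"urllib", "pwd", "crypt"})


def shadows_stdlib(package_name):
    """Return True if the requested name matches a standard library module."""
    stdlib = getattr(sys, "stdlib_module_names", _FALLBACK_STDLIB)
    # Module names use underscores where project names often use hyphens.
    return package_name.replace("-", "_").lower() in stdlib


if __name__ == "__main__":
    for requested in ("urllib", "requests"):
        if shadows_stdlib(requested):
            print(requested, "ships with Python; installing it from PyPI is likely a mistake")
        else:
            print(requested, "is not a standard library module name")
```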
Following these new reports, the PyPI maintainers, who are volunteers, started working on a blacklisting tool. It’s unclear whether this will be used for reactive or proactive responses, and some people have already expressed concern about blacklisting packages in bulk. The developers of pip, the command-line tool that downloads and installs packages from PyPI, don’t seem too keen on tackling this issue on their end.
It’s hard to justify why the packages used by Tschacher in his 2016 report weren’t blacklisted, Böck told me. “And it’s also shocking that PyPI, a project so crucial to the whole Python ecosystem, relies entirely on volunteers.”
PyPI has over 117,000 packages, but according to Böck, there are probably between 50 and 100 very popular packages that most Python developers use regularly. He believes that a solution could be to warn users about potential risks when they attempt to install any other package aside from those popular “confirmed” packages.
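Böck's suggestion could look roughly like the following. This is a sketch of the idea only, not a feature of pip or PyPI. The `CONFIRMED` list is a hypothetical placeholder (a real one would presumably be built from download statistics), the `check_install_request` function and its 0.8 similarity cutoff are assumptions, and the name comparison uses `difflib` from the standard library rather than any method described by the researchers.

```python
import difflib

# Hypothetical allowlist of "confirmed" popular packages.
CONFIRMED = ["urllib3", "setuptools", "requests", "six", "bz2file"]


def check_install_request(name, confirmed=CONFIRMED, cutoff=0.8):
    """Classify a requested package name.

    Returns ("ok", None) for a confirmed package, ("lookalike", closest)
    when the name closely resembles a confirmed package, and
    ("unconfirmed", None) for everything else.
    """
    name = name.lower()
    if name in confirmed:
        return ("ok", None)
    close = difflib.get_close_matches(name, confirmed, n=1, cutoff=cutoff)
    if close:
        return ("lookalike", close[0])
    return ("unconfirmed", None)


if __name__ == "__main__":
    for requested in ("urllib3", "urlib3", "setup-tools", "flask"):
        status, closest = check_install_request(requested)
        if status == "lookalike":
            print(requested + ": did you mean " + closest + "?")
        elif status == "unconfirmed":
            print(requested + ": not on the confirmed list, install with care")
        else:
            print(requested + ": confirmed")
```

A lookalike warning is the more useful of the two signals, because with only 50 to 100 confirmed names, most legitimate packages would land in the "unconfirmed" bucket.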
Package typosquatting is not only a problem for PyPI. Similar attacks have affected repositories for other programming languages as well. However, in the case of Python, the risks might be higher because every package downloaded from PyPI and installed via pip has a setup.py file containing code that will be executed by the Python interpreter.
This means that a malicious package can lead to a serious system compromise. Tschacher noted in his 2016 report that around 40 percent of the users who downloaded and executed his rogue packages did so with administrative privileges.
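Because setup.py is ordinary Python, it can be inspected before it is ever run. The sketch below parses an install script with the standard `ast` module and reports imports that are unusual for packaging code. The `SUSPICIOUS_MODULES` set and the function names are assumptions made for illustration, and the approach is easy to evade with dynamic imports or `exec`, so it is a rough first filter at best. Note that `ast.parse` raises `SyntaxError` on code that is not valid for the running interpreter, which would include Python 2-only scripts under Python 3.

```python
import ast

# Illustrative, non-exhaustive set of modules that an install script
# rarely needs. This list is an assumption, not an established standard.
SUSPICIOUS_MODULES = {
    "socket", "urllib", "urllib2", "httplib", "http",
    "requests", "subprocess", "base64", "getpass",
}


def imported_modules(source):
    """Return the top-level module names imported anywhere in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def flag_setup_script(source):
    """Return the sorted suspicious modules that the script imports."""
    return sorted(imported_modules(source) & SUSPICIOUS_MODULES)


if __name__ == "__main__":
    # A made-up install script, parsed as text and never executed.
    sample = (
        "from setuptools import setup\n"
        "import socket, getpass\n"
        "setup(name='example')\n"
    )
    print(flag_setup_script(sample))
```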