Spam is a problem almost as old as the internet itself, but generally it has not been much of an issue in software development tooling. npm, however, has become so successful that it is now a new vector for web spam. The incident differs from the left-pad disruption last year, in which a package that many others depended on was taken offline by a disgruntled developer.
Because every package in the npm registry also gets an automatically generated web page, complete with the source readme file published verbatim, spammers have been uploading empty registry items containing nothing more than readmes advertising things like game cheats and hacks. They take advantage of npm’s automatic search engine optimization and the good Google juice already inherent in the platform.
The problem has gotten so bad that a few people have emailed npm staff demanding the game cheats they paid for, evidently after falling for a phishing site they discovered through npm’s web presence, Silverio said. In response, npm has been working with an outside security scanning firm that uses machine learning to analyze package entries in the registry.
At the start of the year, the company looked at ways the security audit results could be acted upon automatically, and did six months of testing to guard against false positives. Unfortunately, on Jan. 6, an npm employee tasked with acting upon these scan results used an internal tool that nuked a huge swath of code from a single, quite prolific developer, nicknamed Floatdrop.
npm has tools in place to fix such incidents, and within 30 minutes most of the packages were back online. Nine of the deleted packages, however, remained offline for a few hours. Those nine were the most popular of those deleted.
“Most of the packages were restored quickly, because the restoration was a matter of unsetting the deleted tombstones in our database, while also restoring package data tarballs and package metadata documents,” npm’s own summary of the event noted. “However, during the time between discovery and restoration, other npm users published a number of new packages that used the names of deleted packages. We locked this down once we discovered it, but cleaning up the overpublished packages and inspecting their contents took additional time.”
As many developers were using npm at the time, the community quickly noticed that some essential packages could not be found in the registry. Socially minded developers in the community then quickly registered the vacated package names and alerted npm Inc. to the missing files. Those who registered the missing packages were squatting purposefully to prevent malicious users from taking over the popular bits of code.
That squatting, however, made it impossible to restore the code from backup without a lot of manual work, which accounts for the hours of downtime for those nine packages.
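The restoration problem can be illustrated with a minimal sketch. It assumes a toy in-memory registry with a tombstone flag; the class names, method names and package names here are hypothetical and are not npm's actual tooling or schema. It shows only why a restore is trivial when a name is still free, and why it needs human review once someone else has published under that name.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    owner: str
    tarball: bytes
    deleted: bool = False  # tombstone flag: data is kept but hidden


class NameTakenError(Exception):
    """Raised when an operation would clobber a live package of the same name."""


@dataclass
class Registry:
    # Hypothetical toy model, not npm's real data layout.
    live: dict = field(default_factory=dict)        # name -> Package
    tombstoned: dict = field(default_factory=dict)  # name -> Package

    def publish(self, name, owner, tarball):
        if name in self.live:
            raise NameTakenError(name)
        self.live[name] = Package(owner, tarball)

    def soft_delete(self, name):
        # Deleting only sets the tombstone; the tarball is retained.
        pkg = self.live.pop(name)
        pkg.deleted = True
        self.tombstoned[name] = pkg

    def restore(self, name):
        # Refuse to restore over a package published in the meantime,
        # and leave the tombstoned copy in place for manual review.
        if name in self.live:
            raise NameTakenError(name)
        pkg = self.tombstoned.pop(name)
        pkg.deleted = False
        self.live[name] = pkg


registry = Registry()
registry.publish("pkg-a", "alice", b"a-1.0.0")
registry.publish("pkg-b", "alice", b"b-1.0.0")

# An accidental mass deletion tombstones both packages.
registry.soft_delete("pkg-a")
registry.soft_delete("pkg-b")

# Before the restore runs, another user claims one of the freed names.
registry.publish("pkg-b", "bob", b"placeholder")

needs_manual_review = []
for name in list(registry.tombstoned):
    try:
        registry.restore(name)
    except NameTakenError:
        needs_manual_review.append(name)

print(needs_manual_review)
```

In this sketch the untouched name is restored by simply clearing its tombstone, while the overpublished one is set aside for inspection, which mirrors the split between the packages that came back within minutes and the nine that took hours.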
Ready for Automation?
Still, said Silverio, in the end, all of the policies, procedures and tools in place at npm ensured a rapid response and correction of the problem.
As for the automated security scanning and moderation, Silverio said she’s now fairly certain it is not yet ready for prime time.
“There are certain things where the signal is strong enough, where I think we might act on it. The game cheats are easy to recognize, and they’re connected to other spam,” said Silverio. Still, unattended automated spam culling is unlikely to be implemented broadly any time soon.
“This is my moment where I’ve realized we’ve reached the point in any service when a dedicated anti-spam person has to be there. I will need one person to start with, and a team eventually working for eternity,” said Silverio.
Created in 2009 as an open source project for sharing packaged modules of code, npm currently hosts around 475,000 software projects.