The Python Package Index (PyPI) is among the reasons that the Python programming language has become a lingua franca of modern software development and data science. It’s a software registry for Python packages (files containing code) that makes it free and easy for software developers to reuse Python code, which boosts productivity. Anyone can upload code for anyone to download. For some, it’s downright utopian.

But reusing code, particularly someone else’s code, has risks. Not only are there technical risks, but there are documented security risks: our analysis indicates that at least 55 malicious packages have been reported and removed from PyPI. If one had the misfortune of downloading them, these packages did nasty things such as stealing credentials or recording keystrokes. To trick users into downloading these packages, attackers often prey on user typos and user confusion via typosquatting, or mimicking popular package names. For instance, one attacker previously created a package named ‘colourama’ to trick speakers of British English intending to download ‘colorama.’

IQT Labs therefore recently engaged in an exploratory research effort to scan PyPI for typosquatting packages. The resulting command line tool, pypi-scan, identifies PyPI packages with similar names or similar package metadata relative to the most downloaded packages or a package of your choice. Below we describe the tool’s uses, the tool itself, including its strengths and limitations, and relate our discovery of a malicious package found with the tool.

There are two types of pypi-scan users: hunters and defenders.

A hunter could be an information security researcher or a PyPI administrator scanning PyPI for typosquatters, especially typosquatters on the most downloaded packages. For the hunter, pypi-scan outputs a list of potential typosquatters for each package. Potential typosquatters with suspicious metadata are further flagged. Figure 1 is an example of terminal output associated with this mode.

Figure 1. Screenshot of pypi-scan Terminal Output for Hunter Mode

Those interested in the “defender” use case might also be interested in Amazon software engineer Matt Bullock’s pypi-parker, which “parks” an empty package on PyPI in a namespace chosen by the defender to protect PyPI users from particular typosquatting packages.

Written in Python, pypi-scan loops over all package names in PyPI, checking if each package’s name is suspiciously close to any of the most downloaded packages or to a package name selected by the user. “Suspiciously close,” for the purposes of pypi-scan, has multiple meanings. First and foremost, the edit distance (Levenshtein distance) between the two package names might be less than a user-defined threshold. For instance, pypi-scan would flag the package ‘colourama’ as typosquatting on the legitimate package ‘colorama,’ given that only one letter separates these names. pypi-scan also checks whether an attacker has switched the order of the words in a package name, say ‘nmap-python’ vs. ‘python-nmap.’ Finally, pypi-scan also searches for the existence of homophones, packages with names that are spelled differently but that sound the same.

All suspicious packages then undergo a second check related to package metadata, such as package description and package author. pypi-scan compares the metadata of the suspicious package to the metadata of the legitimate package, assessing whether there is any copied metadata and alerting the pypi-scan user if so. Copied metadata indicates that the typosquatter might be trying to camouflage itself as the legitimate package.

The Strengths and Limitations of pypi-scan

pypi-scan excels at finding misspelling attacks, a type of typosquatting attack in which attackers rely on a user misspelling a package name.
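The two name-based checks described in this article (an edit distance below a threshold, and a swapped word order) can be illustrated with a short Python sketch. This is not pypi-scan's actual code: the function names `levenshtein`, `is_order_swap`, and `find_suspicious`, the default `threshold` of 2, and the small in-memory list of names are all assumptions made for illustration. A real scan would fetch the full list of package names from PyPI.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def is_order_swap(candidate: str, target: str) -> bool:
    """True if both names share the same hyphen-separated words in a different order."""
    candidate_parts = candidate.split("-")
    target_parts = target.split("-")
    return (
        candidate != target
        and len(candidate_parts) > 1
        and sorted(candidate_parts) == sorted(target_parts)
    )


def find_suspicious(target: str, all_names: list[str], threshold: int = 2) -> list[str]:
    """Return names whose edit distance to target is below threshold, or that swap its word order.

    `threshold` is a hypothetical stand-in for the user-defined threshold.
    """
    flagged = []
    for name in all_names:
        if name == target:
            continue  # the legitimate package is not a typosquatter of itself
        if levenshtein(name, target) < threshold or is_order_swap(name, target):
            flagged.append(name)
    return flagged


# Hypothetical sample of registry names, used only for this illustration.
sample_names = ["colorama", "colourama", "requests", "nmap-python"]
print(find_suspicious("colorama", sample_names))     # ['colourama']
print(find_suspicious("python-nmap", sample_names))  # ['nmap-python']
```

The homophone and metadata checks are left out of this sketch because the article does not say how they are implemented.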