JavaScript bugs galore in the Node.js ecosystem – detected automatically – Naked Security

Here is an interesting article from the recent USENIX 2022 conference: Mining Node.js Vulnerabilities via Object Dependence Graph and Query.

We’re going to cheat a bit here by not digging deeply into the underlying research presented by the paper’s authors (some mathematics and a working knowledge of operational-semantics notation are desirable when reading it), which is a method for static source code analysis that they call ODGEN, short for Object Dependence Graph generator.

Instead, we want to focus on the implications of what they were able to discover in the JavaScript Node Package Manager (NPM) ecosystem, largely automatically, using their ODGEN tools in real life.

An important fact here is, as we mentioned above, that their tools are for so-called static analysis.

This is where you aim to examine the source code for probable (or actual) coding errors and security flaws without running it at all.

Testing code by actually running it is a different sort of process, one that usually takes longer to set up and longer to carry out.

As you can imagine, however, what is called dynamic analysis – actually building the software so that you can run it and expose it to real data in a controlled way – usually yields much deeper results, and is much more likely to expose mysterious and dangerous bugs than simply “watching it carefully and guessing how it works”.

But dynamic analysis is not only time-consuming, it is also difficult to do well.

By that, we really mean that dynamic software testing is very easy to do badly, even if you spend ages on it, because it’s easy to end up with an enormous number of tests that aren’t as varied as you thought, and that your software is almost certain to pass no matter what. Dynamic software testing sometimes ends up like a teacher setting the same exam questions year after year, so that students who focus on practising “past papers” end up doing just as well as students who truly understand the subject.

A motley web of supply chain dependencies

In today’s huge software source code ecosystems, of which global open source repositories such as NPM, PyPI, PHP Packagist and RubyGems are well-known examples, many software products rely on large collections of other people’s software packages, forming a complex and interlocking web of supply chain dependencies.

Implicit in those dependencies, as you can imagine, is a dependency on the dynamic test suites provided by each underlying package – and those individual tests generally do not (indeed, cannot) take into account how all the packages will interact when combined to form your own unique application.

So, while static analysis alone isn’t exactly adequate, it’s still a great starting point for scanning software repositories for glaring holes, not least because static analysis can be done “offline”.

In particular, you can regularly and systematically analyze all the source code packages you use, without needing to integrate them into running programs, and without needing to devise credible test scripts that exercise those programs in a variety of realistic ways.

You can even scan entire software repositories, including packages you may never need to use, to shake out code (or to identify authors) that you’re not inclined to trust, even before trying the software.

Even better, some types of static analysis can be used to search all of your software for bugs caused by similar programming errors that you just found through dynamic analysis (or were reported through a bug bounty system) in a single part of a single software product.

For example, imagine a real-world bug report that came in from the wild, based on a specific place in your code where you used a coding style that caused a use-after-free memory error.

A use-after-free is when you’re certain you’re done with a particular block of memory and hand it back so it can be used elsewhere, but then forget it’s no longer yours and carry on using it anyway. It’s a bit like absent-mindedly driving home to your old address months after moving, purely out of habit, and wondering why there’s an unfamiliar car in the driveway.

If someone has copied and pasted this buggy code into other software components in your company’s repository, you might be able to find it with a text search, assuming the overall code structure has been retained and the comments and variable names haven’t been changed too much.

But if other programmers merely followed the same coding idiom, perhaps even rewriting the faulty code in a different programming language (so that, in the jargon, it was lexically different)…

…then searching for text would be almost useless.

Wouldn’t it be handy if you could statically search your entire code base for latent programming errors, based not on text strings but on features such as code flow and data dependencies?

Well, in the USENIX article we’re discussing here, the authors attempted to create a static analysis tool that combines a number of different code features into a compact representation showing “how code transforms its inputs into outputs, and what other parts of the code happen to influence the results”.

The process is based on the object dependency graphs mentioned above.

Greatly simplified, the idea is to label the source code statically so that you can tell which combinations of code and data (objects) in use at any given moment could affect the objects used later on.
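As a toy illustration of the idea (our own hand-rolled sketch, not the paper’s actual algorithm), imagine tagging each value with the set of sources it was derived from, so that a later check can ask whether untrusted data reaches a sensitive operation:

```javascript
// Toy taint-tracking sketch (hypothetical, not ODGEN itself): each value
// carries a record of the sources it was derived from.
function tainted(value, sources) {
  return { value, sources };
}

// Data arriving from an untrusted source is labelled accordingly...
const userInput = tainted("notes.txt; id", ["http-request"]);

// ...and derived values inherit the labels of their inputs.
const command = tainted("cat " + userInput.value, userInput.sources);

// A checker can now ask: does untrusted data flow into this command?
console.log(command.sources.includes("http-request"));  // true
```

A real object dependence graph records far more than this (scopes, prototypes, branches and so on), but the source-to-sink question it helps answer is essentially the one sketched here.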

Then it should be possible to search for known-bad code behaviors – code smells, in the jargon – without actually needing to test the software in a live run, and without needing to rely solely on text matching in the source.

In other words, you might be able to detect whether coder A produced a bug similar to one you just found in coder B’s work, whether A literally copied B’s code, followed B’s erroneous advice, or simply fell into the same bad coding habits as B.

Basically, good static code analysis, despite never monitoring the software running in real life, can help to identify bad programming early on, before you inject into your own project bugs that might be subtle or rare enough never to show up in real life, even under extensive and rigorous live testing.

And that’s the story we wanted to tell you from the start.

300,000 packages processed

The authors of the paper applied their ODGEN system to 300,000 JavaScript packages from the NPM repository to filter out those that their system believed might contain vulnerabilities.

Of these, they kept only packages with more than 1000 weekly downloads (it seems they didn’t have time to process all the results), and determined by further review which packages they thought contained an exploitable bug.

Among these, they discovered 180 harmful security bugs, including 80 command injection vulnerabilities (where untrusted data can be sneaked into system commands to produce undesirable results, often including remote code execution) and 14 other code execution bugs.
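To show the shape of the most common bug class they found, here’s a minimal, hypothetical command injection example in Node.js (the function name and inputs are ours, not from the paper):

```javascript
// Hypothetical command-injection pattern: untrusted input is interpolated
// straight into a shell command string, so shell metacharacters in the
// input become part of the command itself.
function buildCommand(userFilename) {
  return `cat ${userFilename}`;   // UNSAFE: no escaping or validation
}

// Benign input behaves as expected...
console.log(buildCommand("notes.txt"));       // "cat notes.txt"

// ...but hostile input smuggles in a second command.
console.log(buildCommand("notes.txt; id"));   // "cat notes.txt; id"
```

If a string like that is handed to a shell-spawning call such as Node’s child_process.exec(), the attacker’s extra command runs too; passing arguments as an array to child_process.execFile() avoids shell parsing altogether.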

Of these, 27 were eventually given CVE numbers, recognizing them as “official” security vulnerabilities.

Unfortunately, all of these CVEs date from 2019 and 2020, because the practical part of the work in this paper was done more than two years ago, but only written up now.

Nevertheless, even if you work in a less rarefied environment than academics seem to (for most active cybersecurity practitioners, fighting today’s cybercriminals means finishing any research as quickly as you can so that you can put it to use right away)…

…if you’re looking for research topics to combat supply chain attacks in today’s large-scale software repositories, don’t overlook static code analysis.

Life in the old dog yet

Static analysis has fallen out of favor in recent years, not least because popular dynamic languages like JavaScript make static processing extremely difficult.

For example, a JavaScript variable might be a number at one point, then have a text string perfectly legally (albeit incorrectly) “added” to it, thereby turning it into a text string, and could later end up as yet another type of object entirely.
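A tiny sketch of the kind of type drift we mean – all of this is ordinary, legal JavaScript:

```javascript
// One variable, three types over its lifetime - all perfectly legal.
let x = 2;                 // a number
x = x + "1";               // number + string coerces to the string "21"
console.log(typeof x, x);  // prints: string 21
x = { previous: x };       // now it's an object wrapping the string
console.log(typeof x);     // prints: object
```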

And a dynamically generated string of text can magically transform into a new JavaScript program, compiled and executed at runtime, introducing behavior (and bugs) that didn’t even exist when the static analysis was carried out.
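For instance, in this deliberately trivial, hypothetical example, the code that actually runs never appears in any source file a static tool could examine:

```javascript
// A string assembled at runtime turns into executable code via eval(),
// so its behavior doesn't exist in any source file a static tool can see.
const op = "+";                // imagine this value arrived over the network
const program = `40 ${op} 2`;  // "source code" that exists only in memory
const result = eval(program);  // compiled and run on the spot
console.log(result);           // 42
```

This is exactly why eval() and its relatives are treated as high-risk operations by code scanners: the scanner sees only the scaffolding, never the program that gets built inside it.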

But this article suggests that, even for dynamic languages, regular static analysis of the repositories you rely on can still help tremendously.

Static tools can not only find latent bugs in the code you already use, even in JavaScript, but also help you judge the underlying quality of the code in any packages you plan to adopt.


This podcast features Chester Wisniewski, senior researcher at Sophos, and it’s packed with useful and practical advice for dealing with supply chain attacks, based on lessons we can learn from giant attacks of the past, such as Kaseya and SolarWinds.

If no audio player appears above, you can listen directly on Soundcloud.
You can also read the entire podcast as a full transcript.
