New “Trojan Source” Technique Allows Hackers to Hide Source Code Vulnerabilities
A new class of vulnerabilities could be exploited by threat actors to inject visually deceptive malware in a way that is semantically permissible but alters the logic defined by the source code, effectively opening the door to more first-party and supply chain risks.
Dubbed “Trojan Source attacks,” the technique “exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers,” Cambridge University researchers Nicholas Boucher and Ross Anderson said in a newly published paper.
Compilers are programs that translate high-level, human-readable source code into lower-level representations such as assembly language, object code, or machine code, which can then be executed by the operating system.
At its core, the problem lies in Unicode’s bidirectional (or Bidi) algorithm, which supports both left-to-right (e.g., English) and right-to-left (e.g., Arabic or Hebrew) languages. It also provides so-called bidirectional overrides, which allow left-to-right words to be written inside a right-to-left sentence, or vice versa, so that text with a different reading direction can be embedded within larger blocks of text.
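To make the Bidi controls concrete, the short Python sketch below lists the explicit directional formatting characters (embeddings, overrides, isolates, and their terminators) together with the Bidi class and name that Python’s standard `unicodedata` module reports for each. It then builds a string containing a right-to-left override to show that the stored code points stay in logical order even though a Bidi-aware renderer reorders them on screen. The variable names are illustrative only.

```python
import unicodedata

# Explicit directional formatting characters from the Unicode
# Bidirectional Algorithm: embeddings, overrides, isolates, terminators.
BIDI_CONTROLS = [
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE RLE PDF LRO RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI RLI FSI PDI
]

for ch in BIDI_CONTROLS:
    # bidirectional() returns the Bidi class, e.g. "RLO"; name() the full name.
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<4} {unicodedata.name(ch)}")

# Logical order vs. display order: "abc", then RIGHT-TO-LEFT OVERRIDE,
# then "def", then POP DIRECTIONAL FORMATTING. A Bidi-aware renderer
# typically shows "abcfed", but the code points remain in logical order.
text = "abc\u202edef\u202c"
print([f"U+{ord(c):04X}" for c in text])
```

Because the control characters themselves are invisible, the only reliable way to see them is to inspect code points, as the last line does.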
While the output of a compiler is expected to correctly implement the source code supplied to it, the discrepancies created by inserting Unicode Bidi override characters into comments and strings can yield syntactically valid source code in which the display order of the characters presents logic that diverges from the real logic.
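The key point is that a compiler or interpreter tokenizes characters in logical order and ignores how they are displayed. As a minimal illustration in Python (the variable name `is_admin` and the snippet are hypothetical), the sketch below hides Bidi controls inside a comment that contains text resembling an assignment, then asks the standard `tokenize` module how it classifies the line. How the line actually renders depends on the editor, so no particular on-screen form is claimed here.

```python
import io
import tokenize

RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# Hypothetical source line: the text after "#" looks like code, and the
# hidden Bidi controls are meant to disguise where the comment begins.
source = f"is_admin = False  #{RLO}{LRI} is_admin = True{PDI}\n"

# The tokenizer works on logical order: everything after "#" is one COMMENT.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Executing the line confirms the "assignment" inside the comment never runs.
ns = {}
exec(source, ns)
print(ns["is_admin"])
```

Whatever a reviewer sees on screen, only the first assignment is real code.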
In other words, rather than deliberately introducing logical bugs, the attack targets the encoding of source code files to create targeted vulnerabilities. It visually reorders tokens so that the code still renders in a perfectly acceptable way, while the compiler processes it differently, radically changing the flow of the program, for example by making a comment appear as if it were code.
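One way such a change in program flow can play out is a “stretched string,” where Bidi controls inside a string literal can make part of the literal look like a trailing comment. The Python sketch below is an illustrative construction, not code taken from the researchers’ paper. It builds the source with escape sequences so that the controls stay visible in this listing, then executes it. The names `access_level`, `malicious_source`, and `result` are hypothetical.

```python
RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# Hypothetical snippet. In a Bidi-aware editor the controls can make
# '# Check if admin' look like a comment after the condition; in reality
# that text, and the controls, are part of the string being compared.
malicious_source = (
    'access_level = "user"\n'
    f'if access_level != "user{RLO} {LRI}# Check if admin{PDI} {LRI}":\n'
    '    result = "admin branch taken"\n'
    'else:\n'
    '    result = "user branch taken"\n'
)

namespace = {}
exec(compile(malicious_source, "<trojan>", "exec"), namespace)

# "user" never equals the longer hidden literal, so the privileged branch runs.
print(namespace["result"])
```

A reviewer who reads the rendered line as a comparison against `"user"` would expect the other branch. The interpreter, working on logical order, compares against a longer string and takes the privileged path.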
“In effect, we anagram program A into program B,” the researchers surmised. “If the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected.”
Such adversarial encodings can have a serious impact on the supply chain, the researchers warn, when invisible vulnerabilities injected into open-source software make their way downstream, potentially affecting all users of that software. Worse still, Trojan Source attacks can become more severe if an attacker uses homoglyphs to redefine pre-existing functions in an upstream package and invoke them from a victim program.
By replacing Latin letters with lookalike characters from other Unicode scripts (for example, changing the Latin “H” to the Cyrillic “Н”), a malicious actor can create a homoglyph function that looks visually identical to the original but actually contains malicious code, which could then be added to an open-source project without attracting much attention. Such an attack could be devastating when applied to a common function available through a dependency or imported library, the paper notes.
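The homoglyph trick works because identifiers that look identical can consist of different code points. The Python sketch below, using hypothetical function names, defines a genuine `sayHello` and an impostor whose “H” is the Cyrillic capital letter EN (U+041D). Both coexist as separate functions. The `scripts_in` helper is an assumed, deliberately crude heuristic that flags identifiers mixing scripts by reading the first word of each character’s Unicode name. Production tools rely on the Unicode Script property and confusable data instead.

```python
import unicodedata

latin = "sayHello"           # genuine name, all Latin letters
lookalike = "say\u041dello"  # Cyrillic capital EN in place of Latin "H"

# Two visually identical identifiers defining two different functions.
source = (
    f"def {latin}():\n    return 'genuine'\n"
    f"def {lookalike}():\n    return 'impostor'\n"
)
ns = {}
exec(source, ns)
print(latin == lookalike)               # the names are different strings
print(ns[latin](), ns[lookalike]())     # and refer to different functions


def scripts_in(identifier):
    """Crude heuristic: the set of script prefixes of the letters' Unicode names."""
    return {unicodedata.name(ch).split()[0] for ch in identifier if ch.isalpha()}


for name in (latin, lookalike):
    scripts = scripts_in(name)
    flag = "MIXED SCRIPTS" if len(scripts) > 1 else "ok"
    print(repr(name), sorted(scripts), flag)
```

Python’s NFKC normalization of identifiers does not merge letters across scripts, so the two definitions do not collide.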
“The fact that the Trojan Source vulnerability affects almost all computer languages makes it a rare opportunity for a system-wide and ecologically valid cross-platform and cross-vendor comparison of responses,” the researchers noted. “As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organizations that participate in a software supply chain to implement defenses.”
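As one simple line of defense, a repository check or pre-commit hook could refuse source files that contain Bidi control characters, or at least report them for review. The Python sketch below is a hypothetical scanner (the function name `find_bidi_controls` and its output format are assumptions, not an existing tool). It reports the line, column, and character name of each explicit directional formatting character it finds. It intentionally flags every occurrence rather than trying to decide whether a given use is legitimate, and it does not cover implicit marks such as U+200E and U+200F.

```python
import unicodedata

# Explicit directional formatting characters: LRE, RLE, PDF, LRO, RLO,
# LRI, RLI, FSI, PDI. Implicit marks (U+200E, U+200F) are not included.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(text):
    """Return (line, column, character name) for each Bidi control in text.

    Line and column numbers are 1-based.
    """
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((lineno, col, unicodedata.name(ch)))
    return findings


# Hypothetical file contents with two hidden controls on line 2.
sample = 'x = 1\ns = "ok\u202e\u2066"  # hidden\n'
for lineno, col, name in find_bidi_controls(sample):
    print(f"line {lineno}, col {col}: {name}")
```

A stricter variant could fail the build whenever the list is non-empty, accepting occasional false positives in files that legitimately contain right-to-left text.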