Microsoft Copilot programming aid leaks secrets and ignores licenses?
We didn’t steal your code, it was the AI!
Microsoft Copilot is a very interesting tool, with a bit of a larcenous side. It is a programming aid that turns natural language into working code across a variety of languages, with particularly strong support for the likes of Python, JavaScript, and TypeScript. Microsoft built it on OpenAI's Codex model, trained on billions of lines of code, to pull off this trick. The problem seems to be that nobody ever taught it to read licenses.
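To illustrate the workflow (a hypothetical example, not actual Copilot output): the developer types a natural-language prompt as a comment or a function signature, and the assistant proposes a body to match.

```python
# A developer writes a plain-English prompt as a comment...
# "compute the nth Fibonacci number iteratively"

def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# ...and the assistant suggests the implementation above.
print(fibonacci(10))  # → 55
```

The catch, of course, is that the suggested body may be lifted nearly verbatim from someone's repository.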
The code that Microsoft Copilot was trained on came from various open-source software repositories, and since Microsoft bought GitHub in 2018, it’s easy to guess which repository much of the code came from. There is one small glitch, however, which has caused what could be a pretty big lawsuit to kick off.
Much of the code it trained on, and reproduces liberally when translating natural language into code, is covered by the GPL, Apache, MIT, and other open-source licenses. These licenses require attribution to the original author when the code is reused; Microsoft Copilot provides none, even when the reproduced excerpts run longer than 150 characters. To make matters worse, some of the code it ingested contained secrets that had been pushed to public repositories but were never intended for general consumption.
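Secrets end up in training data because credentials get committed to public repos by accident, and a model trained on that code can regurgitate them. A minimal sketch of the kind of pattern-matching scan that can flag likely secrets before they are committed (the regexes here are illustrative only; real scanners such as GitHub's secret scanning use far more precise, provider-specific signatures):

```python
import re

# Illustrative patterns only -- not the signatures any real scanner uses.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key": re.compile(
        r"api[_-]?key\s*[=:]\s*['\"][A-Za-z0-9]{20,}['\"]", re.IGNORECASE
    ),
}

def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (label, matched token) pairs for suspicious strings in text."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((label, match))
    return hits

# AWS's own documentation uses this dummy key for examples.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(find_secrets(sample))  # → [('AWS access key', 'AKIAIOSFODNN7EXAMPLE')]
```

A pre-commit hook running a check like this is cheap insurance, though it only catches secrets before they leak, not ones a model has already memorized.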
It will be interesting to see how this plays out.