Allegations of Copyright Infringement by GitHub Copilot

by Legal PDF, September 7th, 2023

Too Long; Didn't Read

GitHub Copilot, an AI tool powered by OpenAI's Codex engine, faces accusations of copyright infringement for generating code that closely resembles copyrighted educational material. This controversy arises due to Copilot's inability to understand code or recognize license terms, potentially leaving users unaware of their legal obligations. Explore how Copilot's code generation process works and its impact on programmers' practices.

DOE v. GitHub (original complaint) Court Filing, retrieved on November 3, 2022, is part of HackerNoon’s Legal PDF Series. This is part 16 of 37.

VII. FACTUAL ALLEGATIONS

C. Copilot Outputs Copyrighted Materials Without Following the Terms of the Applicable Licenses


66. GitHub Copilot works in a similar way to OpenAI Codex. As mentioned above, a modified version of Codex is used as the engine that powers Copilot.


67. Copilot is installed by the end user as an extension to various code editors, including Microsoft’s Visual Studio and VS Code. As the user types into the editor, their code is uploaded in real time to Microsoft’s Azure cloud platform, where it becomes a prompt for Copilot.


68. When we give Copilot the same prompt discussed above in Paragraph 48, “function isEven(n) {”, it interprets the prompt as the beginning of a function written in the JavaScript language that will test whether a number is even, same as Codex.


69. However, the Output of Copilot in response to the prompt differs from that of Codex, namely:


function isEven(n) {
    return n % 2 === 0;
}


70. This function is much closer to what a human programmer might use as compared to Codex’s older, inaccurate offering. It handles all values and types of “n” correctly. It does not cause a stack overflow for larger values of “n” like the Codex Output.
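To make the behavior described in Paragraph 70 concrete, the following minimal sketch runs the Copilot Output quoted above against a few inputs. The sample inputs are illustrative choices, not taken from the filing. Because the function uses a single modulus operation and no recursion, the size of “n” does not affect how deep the call stack grows.

```javascript
// The Copilot Output as quoted in Paragraph 69.
function isEven(n) {
    return n % 2 === 0;
}

// Illustrative inputs (assumed for this sketch, not from the filing).
console.log(isEven(4));        // true
console.log(isEven(7));        // false
console.log(isEven(-2));       // true
console.log(isEven(1000000));  // true, computed in one step with no recursion
```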


71. Copilot’s Output, like Codex’s, is derived from existing code, namely sample code that appears in the online book Mastering JS, written by Valeri Karpov.[10] Like Codex’s Output, Copilot’s is also based upon copyrighted educational material. Mastering JS is a set of educational exercises for programmers. Like Eloquent JavaScript, there are many copies of Karpov’s exercise stored in public repositories on GitHub. Programmers working through Mastering JS store their answers there.


72. If Copilot is prompted with the name of a function that will test whether a number is prime (that is, a number that can only be evenly divided by 1 and itself), namely “function isPrime(n) {”, it returns:


function isPrime(n) {
    if (n < 2) {
        return false;
    }
    for (let i = 2; i < n; i++) {
        if (n % i === 0) {
            return false;
        }
    }
    return true;
}


73. Though this function will work, it contains an error often made by beginner programmers that makes it much slower than it could be. Namely, the loop in the middle, which checks possible divisors, does not need to check every divisor smaller than “n,” only the divisors up to and including the square root of “n”. As with Codex, Copilot has no understanding of how the code works. It knows that more functions called “isPrime” contain the portion that checks for all divisors smaller than “n”, so that is what it offers. It does not return what it “thinks” is best, it returns what it has seen the most. It is not writing, it is reproducing (i.e., copying).
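The faster approach mentioned in Paragraph 73 can be sketched as follows. This is an illustration only and does not appear in the filing; the function names `isPrimeAllDivisors` and `isPrimeSqrtBound` are hypothetical labels chosen for this example. Note that the bound must include the square root itself (`i * i <= n`), otherwise perfect squares such as 9 or 25 would be wrongly reported as prime.

```javascript
// The Copilot Output as quoted in Paragraph 72, renamed for comparison.
function isPrimeAllDivisors(n) {
    if (n < 2) {
        return false;
    }
    for (let i = 2; i < n; i++) {
        if (n % i === 0) {
            return false;
        }
    }
    return true;
}

// Sketch of the square-root bound: any composite n has a divisor i
// with i * i <= n, so the loop can stop once i * i exceeds n.
function isPrimeSqrtBound(n) {
    if (n < 2) {
        return false;
    }
    for (let i = 2; i * i <= n; i++) {
        if (n % i === 0) {
            return false;
        }
    }
    return true;
}

// Check that both versions give the same answer for 0 through 1000.
let agree = true;
for (let n = 0; n <= 1000; n++) {
    if (isPrimeAllDivisors(n) !== isPrimeSqrtBound(n)) {
        agree = false;
    }
}
console.log(agree); // true
```

For a prime “n”, the first version tests roughly “n” candidate divisors, while the second tests roughly the square root of “n”.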


74. Like the other examples above, and most of Copilot’s Output, this output is nearly a verbatim copy of copyrighted code. In this case, it is substantially similar to the “isPrime” function in the book Think JavaScript by Matthew X. Curinga et al.,[11] which is:


function isPrime(n) {
    if (n < 2) {
        return false;
    }
    for (let i = 2; i < n; i++) {
        if (n % i === 0) {
            return false;
        }
    }
    return true;
}


75. As with the other examples above, the source of Copilot’s Output is a programming textbook. Also like the books the other examples were taken from, there are many copies of Curinga’s code stored in public repositories on GitHub where programmers who are working through Curinga’s book keep copies of their answers.


76. The material in Curinga’s book is made available under the GNU Free Documentation License. Although this is not one of the Suggested Licenses, it contains similar attribution provisions, namely that “You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License.”[12]


77. As with Codex, Copilot does not provide the end user any attribution of the original author of the code, nor anything about their license requirements. There is no way for the Copilot user to know that they must provide attribution, copyright notice, nor a copy of the license’s text. And with regard to the GNU Free Documentation License, Copilot users would not be aware that they are limited in what conditions they can place on the use of derivative works they make using this copyrighted code. Had the Copilot user found this code in a public GitHub repository or a copy of the book it was originally published in, they would find the GNU Free Documentation License at the same time and be aware of its terms. Copilot finds that code for the user but excises the license terms, copyright notice, and attribution. This practice allows its users to assume that the code can be used without restriction. It cannot.




[10] https://masteringjs.io/tutorials/fundamentals/modulus/.


[11] https://matt.curinga.com/think-js/#solving-problems-with-for-loops.


[12] https://matt.curinga.com/think-js/#gnu-free-documentation-license.





About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case, 3:22-cv-06823-KAW, retrieved on September 5, 2023, from Storage.Courtlistener, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.