pleroma.debian.social

If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus

@lcamtuf actual answer: of course you do, it’s prima facie a derivative work, same as if you had rewritten the program by hand.

@lcamtuf

This case law exists in the U.S.

There are two cases (or arguably three if you include Sega v. SNK).

Here's what you really care about:
🅰️ Any author of code is judged on their own use of the existing code, so reverse engineering used to work like this: one engineer wrote down, line by line, in plain English, what the code did; then a second person, working only from that description, wrote new code to accomplish the same task. Things have changed, but the principle that you can't literally harvest existing code still stands.
🅱️ You own the resulting code but can't copyright it... so you can't profit from it in the same way.

@lcamtuf The Supreme Court's recent declination to overturn or review this ruling: https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf It holds that things created by AI are neither "derivative works" nor "original works" and are not eligible for copyright protection. So no, you don't need to abide by the previous license. No one does. And if someone reverse-engineers your code, the DMCA doesn't apply either (it isn't copyrighted).

@lcamtuf Someone or something else has done the work, not you. So whoever creates the work owns the work.

@lcamtuf shakes my fist at theseus

@lcamtuf OpenAI already gets all upset if someone uses their AI to train a different AI. If the whole technocrap brotherhood weren't built around hypocrisy, the slop factory owners would be on the side of "no, you can't do this".

@kevinr @lcamtuf with the possible extra step that you can’t claim any copyright in your derivative work.

@kevinr @lcamtuf And if you ask it to write a detailed spec based on its implementation, and then separately to write an implementation of that spec?

https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/

@bgalehouse @lcamtuf @kevinr

Assuming you used the original source code to derive the detailed spec, then yes, that too is a derivative work.

The "viral" nature of that sort of license has bothered me for a long time. It's always been simultaneously overly far reaching and impossible to realistically enforce.

@lcamtuf @bgalehouse @kevinr

But here's an interesting question:

If you do not execute the code, did you accept the license? Does simply reading it closely enough to write a spec bind you to that license? That seems a bit too much.

@tbortels @lcamtuf @bgalehouse @kevinr Copyright bound licenses work by exempting you from the blanket and default prohibition on copying.

So if you copy a work that carries copyright restrictions under copyright law, using the license is your only way of not infringing the law. It doesn't matter whether you "accept" it or not.

If you are not copying, the license is irrelevant.

@ahltorp @tbortels @lcamtuf @bgalehouse @kevinr and indeed there are arguments that simply “reading” is not copying, same as reading a book, even if via a web site. But getting your AI to “read” it is probably a different matter.

@ChuckMcManis @lcamtuf say if Disney were to produce an entire movie with AI, could you share copies freely with your pals?

@astolk @ChuckMcManis @lcamtuf I'd say yes, except that I believe Disney have form for getting copyright law changed in their favour if needed. So Disney may be a bad example.

@lcamtuf
If that works there's plenty of closed source code I'd like to open..

@lcamtuf
It is the problem of software patents. No need for an AI: if a human writes new software that does exactly the same thing as a piece of free software, is it the same software?

@lcamtuf we will know

@lcamtuf Bravo.

@lcamtuf This story is one of Aislop's Fables.

@revk @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr Idk about AI, but I've heard more than once that when people reimplement as free software something that is non-free but was either leaked or is source-available, they completely restrict themselves from even looking at the original. They use only what any user would know, plus some reverse engineering. So I assumed it's actually legally unsafe to taint yourself with the original code and let it potentially influence you.

@rustynail @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr Hmm, there is another consequence to this.

If this is a derivative work, which I expect it is, it causes problems for someone who has, in fact, manually coded an alternative to a copyrighted work (without reading the original code, etc.). Someone can now allege that it was produced with AI and is therefore a derivative work. The code no longer needs to actually follow the original to attract that accusation.

Arrg!

@lcamtuf The licence goes from «copyleft» to «sloppyleft».

@ahltorp @bgalehouse @revk @lcamtuf @kevinr @rustynail

AI is a weird case, as you could assert - probably correctly - that the original code may be part of its training corpus. Was that training a GPL violation? It's a stretch. Was its training a copyright violation? Or was the AI (or rather its owners) exercising their GPL license rights? Or was it fair use under regular copyright?

Who knows?

It's a hot mess is what it is.

This is all so far outside the original reckoning of "it'd be nice if the bookbinder down the street didn't profit off of my work until I had a chance to profit off of it first" that it's not surprising it's a mess.

@lcamtuf We know that if you ask the CS department of the University of California, Berkeley, and BSDi to rewrite the entirety of AT&T's Unix, the result does not need to abide by AT&T's original license.

BSD is prima facie a derivative work of AT&T Unix, not developed using a clean room approach, but instead carefully audited to remove all AT&T copyright and trade secret interests.

By the time Theseus' ship was ready, Linux had left the harbor.

@lcamtuf
Pretty sure the output is unlicensable public domain, if we accept that the original license has been washed off.

this is where the liberal/individualist "license choice" approach fails and you just have to consider problems on a broader societal level

doesn't matter the license, AI extremely unethical regardless and this just isn't okay

RE: https://infosec.exchange/@lcamtuf/116180274283878105

@lcamtuf "AI" is just an algorithm running on a deterministic system (with a PRNG, but still deterministic).

So the output is naturally derived from its inputs and thus subject to the licences thereof.

@bgalehouse @kevinr @tbortels @lcamtuf the licence allows you to do things you're not allowed to do without it, and makes that permission conditional. So, no way around that.

@tbortels if you do not accept the license, you do not have any right to use the code. It’s "all rights reserved" then. @lcamtuf @bgalehouse @kevinr

@lcamtuf @ArneBab @kevinr @bgalehouse

"Use" isn't part of the GPL. And "all rights reserved" means normal copyright law, not "you get no rights at all".

The GPL defines "modify" and "propagate" as the activities it burdens. If I modify the code and propagate it, I have a legal burden under the license. Otherwise, I don't.

IANAL, but I don't think reading the code and re-implementing a work-alike without incorporating the original code is "modify" - it's "replace".

I understand that's where "clean rooms" come into play, but that always felt like splitting hairs and giving copyright too much power - it's about physical books, not ideas. The farther we move from the original intent, the weaker a strong copyright stance becomes.

I think you could make an argument that reading code to understand its interfaces, explicitly declining to accept any license, and then implementing compatible code is well within the normal copyright definition of "fair use" - or should be, if we aren't all copyright lawyers. More importantly, it's healthy for society and the art. If I can read a book under copyright and write a detailed book report, I should be able to read provided source code and do the same. To the extent that we've strayed from that, the legal system has failed and needs correction.

@tbortels @lcamtuf @ArneBab @kevinr @bgalehouse "clean room" is for actual humans, not for algorithmic transforms

@lcamtuf the slop of the Zeus?

@ChuckMcManis @lcamtuf The LLM output may not be copyrightable, but does publishing this output violate the license of the source code that it is replicating in functionality?

@drahardja @lcamtuf It cannot violate the license. The license is tied to a specific copyrighted work. If you want a master's course on this, look up the Oracle v. Google case, where Oracle claimed the Java license forbade you from making work-alike software (according to the US courts, it cannot do that). This is similar to 'clean rooms', where one person tells another what the software should do and they write it from scratch to do that. 1/2

@drahardja

"Clean rooms" were upheld in the late 80s and early 90s with the BIOS replication efforts. There have been many cases where work-alike software has been challenged (see the "look and feel" cases from the 80s), but it has always come down to the work itself. This case allows an LLM to read the source code and create a bug-for-bug-compatible version which neither violates the license nor is itself copyrightable. 2/2

@lcamtuf

@ChuckMcManis @lcamtuf I agree that “clean room” implementations are long deemed legal (I owned a Compaq PC clone), but the original poster’s claim is that you can feed existing source code into an LLM and have it spit out a reimplementation. That “feels” like a derived work to me, but is it?

@lcamtuf @ChuckMcManis Isn’t that what you meant? Or did I misunderstand?

@revk
Reading is, indeed, not copying, and you are allowed to do that within copyright (hence the name; it's not 'readingright')

But reading and then writing something similar, while not exactly copying, is close enough that it's usually considered 'plagiarism'.
@ahltorp @tbortels @lcamtuf @bgalehouse @kevinr

@wouter @revk Plagiarism is about presenting the work as your own, not where it came from. It can be loosely connected to copyright because of either legal or license demands for attribution, but even taking something that is public domain and presenting it as your own is plagiarism.

Where copyright is about the relation to people a work comes from, I would say plagiarism is about the relation to people you present a work to.

@wouter @ahltorp @revk

I think "fanfiction" is closer.

Although if it's close enough to be identical in all major respects, a "remake" is also accurate.

Having said that: software is weird because it does something. If I write code that faithfully implements an API - I haven't stolen or plagiarized or anything - I followed the spec.

If that spec was supposed to be secret or proprietary - then open sourcing the code was a bad idea, and the soul of GPL is to force sharing.

Anyway - hot mess. As things tend to get when you try to force other human beings, legally or otherwise, to do what you want them to do.

@lcamtuf technical debt stemming from vibe coding: "slop of Damocles"

@lcamtuf I would say yes. Either way. They deserve the credit.