

I said I was focusing on copyleft, cool that you ignored the entire post though. 😑




I think the reimplementation stuff is a separate question because the argument for it working looks a lot stronger, and because it doesn’t have anything to do with the source material having LLM output in it. Also if this method holds as legally valid, it’s going to be easier to just do that than justify copying code directly (which would probably have to only be copies of the explicitly generated parts of the code, requiring figuring out how to replace the rest), which means it won’t matter whether some portion of it was generated.
Is it a separate question, though?
Both works are copyrighted, one is just copyrighted as “all rights reserved” (our leaked commercial code) and the rest is licensed as LGPL. We’re putting both pieces of code inside the LLM and then asking the LLM to make a new version.
What makes the action of leaking different from the act of putting it on the web? Rights are reserved in either case.
If they aren’t entirely generated, you can’t make a full fork, and why would a partial fork be useful?
Well, people are contributing to copyleft codebases expecting that when people build on their work, the resulting derivative works are also licensed in the same way. You don’t need to fork for the value to be lost. People expected virality to be part of their contribution, and clearly the new derivative works are partially non-copyleft.
Beyond that, the more of the codebase is LLM-produced, the less of it is protected by the copyleft license, until we have a ship of Theseus situation where the codebase is available, but no longer copyleft. That is clearly not what was intended by e.g. the GPL. Just look at the Stallman quote in the post.


ianal but does it even work like that? Is there any specific reason to think it does? I don’t believe you really get credit for purity and fairness vibes in the legal system. Same goes for the idea that code where it is ambiguous whether it is AI output could be considered public domain, seems kind of implausible, is there actually any reason to think the law works that way? If it did, then any copyrighted work not accompanied by proof of human authorship would be at risk, uncharacteristic for a system focused on giving big copyright holders what they want without trouble.
I’m mostly just playing along with your thought experiment. As I said, we know that projects are already accepting LLM code into projects that are nominally copyleft.
There is no way, leaks happen, big tech companies have massive influence, a situation where their code falls into the public domain as soon as the public gets their hands on it just isn’t realistic.
If that is the case, is chardet 7.0.0 a derivative work of chardet, or is it a public domain LLM work? The whole LLM project is fraught with questions like these, but it seems that the vendors at least are counting on not copying leaked software and instead copying open source code that is publicly hosted.
Why is it okay to strip copyright from open source works but not from leaked closed source works?
We know that Disney is suing to protect its works - if it is true that LLM outputs are transformative, they should lose, as should any vendor whose leaked code was “transformed” by an LLM.


Making use of the non-copyrightability of AI output to copy code in otherwise unauthorized ways does not seem like a straightforward or legally safe thing to do. That’s especially the case because high profile proprietary software projects also make heavy use of AI, it doesn’t seem likely the courts will support a legal precedent that strips those projects of copyright and allow anyone to use them for whatever.
I think what may happen in practice could be worse: if we can’t tell whether some code is the work of a human, but the project accepts AI code, and we forgo the analysis of whether each part was produced by a human, the entire project may be deemed public domain, perhaps after a certain date (when LLM contributions were welcomed).
Beyond that, by integrating LLM code into those projects, the projects are signalling assent for their works to be consumed by LLMs for infringement of the whole work - not just the LLM-produced portions - and it is hard to be doctrinaire about adherence to the open source license when the maintainers themselves are violating it.
We may see a future where copyrights for works become more like trademarks - if you don’t make any attempt to protect your work from piracy, you may simply lose the right to contest its theft.
Obviously, it is as you say: today the courts may smile upon a GPL project whose code a commercial vendor copied and released as their own without sharing alike. But if the vendor instead says that they fed the work into their LLM and produced a copy without protections (as chardet has done), the courts might be less willing to afford the project copyright protections if the project itself was using the same copyright-stripping technology to strip others’ work and claim protections over the copied result.
Besides which, “authored by Claude” seems like a pretty easy way to find public domain code, and as Malus presents, the only code that may ultimately be protected is closed source code - you can’t copy it if you don’t have the source.
The question of “people may try to pass off LLM code as their own” is a nice diversion, but ancillary to the existing situation where projects are incorporating public domain code as licensed. We can start there before we start worrying about fraud.


I don’t really think we need to go down the copyfraud path to see that AI code damages copyleft projects no matter what - we know that some projects are already accepting AI generated code, and they don’t ask you to hide it - it is all in the open.


Yes, exactly.


Closed source Chromium sounds like fun.


Ooops, I posted a reply to someone earlier and got it right (and forgot this one). Thanks for the heads up (fixed now)!


This is a federation issue.


Interestingly, I just interviewed the Waterfox developer, who actually references Oblivious HTTP and his interest in developing this into a paid feature for Waterfox.


I added a section to my post with some additional comment.
I began thinking of privacy because Mozilla was clearly thinking of it when designing this feature, but I don’t think they really thought it through.
People’s browsers are visiting pages that they never intended to. If a random extension did that, you would say that it was violating your privacy. The browser does it, and you get people defending it as “optional”. Yes, but the user never installed a malware extension that leaks their privacy. It is the browser itself doing it, delivered in an automated update.
If you don’t think this is a privacy issue, why doesn’t the next version of Firefox just visit every link on every page that I visit, so that when I hover over a link, I can get a link preview immediately, without needing to wait? That would save me some real time and effort!


As opposed to the case where you don’t have a link preview, and you click on a website to see what it contains, and they get your IP. The author seems to think Mozilla should have protected our privacy by having someone act as the proxy for the request. Because involving a third party that receives all these requests and does work for us for free is absolutely how we protect our privacy.
But that is exactly what Mozilla is telling us – trust us.
Why was the feature added if my browser is going to browse to the page anyway? What is the value add? I was looking for some way for it to make sense - ah right, it could be a privacy preserving feature - I can preview the link and verify whether I want to visit it before I actually visit it. But that isn’t how it works.
Yes, a feature clearly designed for pushing onto that juicy “people with mobility impairments” userbase.
Love that you ignore all of the people who are currently seeing the popups and not understanding why.


Can you explain how they might be more beneficial than simply visiting the link and clicking back if it isn’t what you wanted? Sincerely curious.


It transforms the contribution to no longer be “share alike”.
Not really, when you push immature alternatives while ignoring a real choice. Seems more like you are supporting monopoly by ensuring that actual competitors get ignored - along with even smaller vendors.
“Look, don’t use LibreOffice instead of Microsoft Word, what you really want is VIM!”
You are saying there is all of this wasted money, but as soon as you are asked for evidence, it is all “I’m not a tax auditor”. Defend your claims!
They are both worse than Gecko, a platform you wish to die.
Sorry, you aren’t a tax auditor, but you are out here making claims. Try defending them?
Thanks for letting us know to discount what you say – if you prefer monopoly over choice, we’re really not having the same conversation.
I know what copyleft licenses are about, that was covered in the post - if you read it. If you are saying that you are making long comments without reading the post, great I guess, but not super interesting (to me).
I’m not really interested in getting into an argument around license choice because I wasn’t advocating for any particular license (like you seem to be).