DeVault: GitHub Copilot and open source laundering
source link: https://lwn.net/Articles/898772/
DeVault: GitHub Copilot and open source laundering
Posted Jun 23, 2022 18:31 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
1. It's not clear to me whether this claim is actually correct. A model is ultimately "just" a big bag of statistical information, and I honestly don't know whether (US) copyright law attaches to such things in the first place, but I'm skeptical (see e.g. Feist v. Rural).
2. It's not relevant. What matters is whether the output of the model is a derivative work of the original, which is a completely different legal question. Derivative works are not subject to some sort of magical "transitive property" that requires the model to also be a derivative work; you can argue that the output is derivative while taking no position on the status of the model itself. Similarly, you could argue that the output is *not* derivative, again taking no position on the model. The status of the model is not relevant to the question, unless you're going to allege an AGPL** violation.
* The kernel of truth here is that, in practice, clean-room engineering is often a good idea for the avoidance of legal risk. But there's nothing in either the GPL or the copyright statute that says you have to do it. Because that would be stupid. Imagine if novelists couldn't read books without running into copyright issues.
** The AGPL is the only widely-used license whose obligations attach on creation of a derivative work, rather than on distribution of that work. As far as I know, GitHub has no intention of distributing the model itself to anyone, so if you want to sue GitHub just for creating the model, you'd have to claim an AGPL violation specifically.
DeVault: GitHub Copilot and open source laundering
Posted Jun 23, 2022 19:26 UTC (Thu) by ballombe (subscriber, #9523) [Link]
Maybe there is some specific request that leads Copilot to return the whole body of some GPL files, for example by looking for certain patterns that occur in only a single piece of software.
That would strengthen the case.
DeVault: GitHub Copilot and open source laundering
Posted Jun 23, 2022 19:52 UTC (Thu) by Gaelan (subscriber, #145108) [Link]
Amusingly, it also autocompleted a BSD license onto that code.
DeVault: GitHub Copilot and open source laundering
Posted Jun 25, 2022 1:00 UTC (Sat) by gerdesj (subscriber, #5446) [Link]
Quite. Also, many putative authorities on the GPL seem to forget that there are many legal systems. If you are going to dive in and be authoritative on the GPL then you really should present an argument that works for all legal systems that the GPL attempts to work within. Quite a job!
Legal is as legal does: Some legal systems have a concept of "reasonable" or what a "reasonable" person would do, and I think that is what the GPL is riffing off. There's also the concept of being able to "quietly enjoy [something]". I'm a Brit, so my local legal system informs my knowledge here. Not all legal systems work like that.
I think it is fair to say that we all have strange ideas about how the GPL works. There's no need to call out end users.
DeVault: GitHub Copilot and open source laundering
Posted Jun 25, 2022 11:47 UTC (Sat) by Wol (subscriber, #4433) [Link]
And far too many authorities read what they want to see, not what's actually there. I've sure been guilty of that. I think my knowledge of the GPL now is pretty good, precisely because I've had plenty of people call me out on my mistakes.
How many "experts" have NOT been through that learning experience? The majority of them?
Cheers,
Wol
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 3:22 UTC (Sun) by gdt (subscriber, #6284) [Link]
The deeper point about differing copyright laws is not so much interpretation of copyright licenses, but the distinction between "fair use" and "fair dealing".
Copilot claims its actions are fair use, and therefore the license is irrelevant. However, in fair dealing jurisdictions Copilot's use of the program source must either comply with the copyright license or fall under one of the black-letter allowed uses listed in the fair dealing exceptions of that jurisdiction's copyright law.
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 9:40 UTC (Sun) by gspr (subscriber, #91542) [Link]
Surely it does apply in one extreme, namely that of a really good model! If I take a copyrighted picture and create a model that very accurately reproduces said picture, I don't think it's very unlikely that my model runs afoul of the original's copyright.
In the other extreme—that of a really terrible model—it probably doesn't, but we probably shouldn't write off the models in-between those extremes.
DeVault: GitHub Copilot and open source laundering
Posted Jun 23, 2022 20:28 UTC (Thu) by mrugiero (guest, #153040) [Link]
Impartiality is for the judge, not for the litigants.
DeVault: GitHub Copilot and open source laundering
Posted Jun 23, 2022 20:58 UTC (Thu) by mpldr (subscriber, #154861) [Link]
There's nothing wrong with that (quite the opposite), but it's not helpful if what you want is a legal review. You may get some interesting points from them, sure, but that's not exactly helpful when trying to find out what the law actually is (let alone that deciding this is a court's job).
DeVault: GitHub Copilot and open source laundering
Posted Jun 24, 2022 2:33 UTC (Fri) by scientes (subscriber, #83068) [Link]
/Almost not sarcastic
DeVault: GitHub Copilot and open source laundering
Posted Jun 24, 2022 11:27 UTC (Fri) by flussence (subscriber, #85566) [Link]
The GPL2 didn't make nVidia *or* AMD play nice with Linux (key phrase: "preferred form for modification"), the GPL3 didn't stop TiVoization (they trivially routed around it; especially Apple), the AGPL3 didn't stop SaaS vendor lock-in (instead they weaponised it against each other), and no current or future iteration of it will stop Microsoft committing automated for-profit piracy at global scale like is happening here.
The only thing the GPL *is*, clearly, is weak DRM powered by magical thinking and a bunch of weird elitist old men clinging to a power fantasy dreamed up half a century ago, which they refuse to grow out of. People who try to actually play by the stated rules get a worse experience, corporations engage in automated piracy with neural networks with little to no legal repercussions, or often just old-fashioned copyright infringement if they're peddling white-label ARM devices, and any attempts to resist this toothless status quo from within the system get you ostracised. Most users of GPLed software have never known and will never know it exists, never mind read or understand it, and if they did they wouldn't be able to meaningfully exercise the rights granted under it. But it sure makes some people feel smug about themselves on their moral high horse.
I feel like at this point the only way to stop this cancer of trillionaires strip-mining the creative output of individuals is to stop giving away any legal rights to that work in the first place. Make the code utterly radioactive to anyone who takes license texts seriously, especially corporate lawyers: All Rights Reserved, free for personal use only, the software shall be used for good not evil, and with a written threat to DMCA anyone found uploading to github or any other platform of similar size and motive. Piracy is going to happen anyway, but we can still choose who feels safe and comfortable doing it.
Thought experiment: if you train a neural network on the text of the GPL itself and coax output from the machine that superficially resembles the input but with manually chosen tweaks that change its meaning, are you exempt from the copyright header in the original, as MSFT seems to think it is? If so, that's the final nail in the coffin for software copyright as a whole; the words of the legal document and the colour of the bits don't mean anything any more.
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 0:07 UTC (Sun) by salimma (subscriber, #34460) [Link]
Any specific example for AMD here? They seem to be much better citizens when it comes to the GPL, at least compared to nVidia (and even nVidia is finally open sourcing kernel drivers)
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 0:23 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 10:00 UTC (Sun) by flussence (subscriber, #85566) [Link]
Rumour has it that AMD middle management wanted the FOSS option they were due to announce to be kept slightly inferior to fglrx for Reasons, and having an independent effort that didn't depend on firmware blobs or the decrepit x86-only int10 VBIOS like that did was severely embarrassing them.
Not many people may remember this now, but the R200/(reverse-engineered)R300 driver also used to be blob-free. Strangely that stopped being the case after AMD took over, even though it was feature-complete.
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 18:49 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
Because it made no sense to reimplement the critical power management and link training code multiple times, instead of doing it once in AtomBIOS.
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 19:40 UTC (Sun) by mjg59 (subscriber, #23239) [Link]
The DRM side of the R2/3/400 driver always required a firmware blob - but for a long time it was just embedded inside the kernel driver, so wasn't user visible. My recollection (which seems to be supported by the driver, but it's been a long time since I looked at this properly so I could be wrong) is that a bunch of the 2D acceleration in that driver depended on DRM, so effectively the 2D driver also had a blob dependency if you wanted it to work properly.
The difference between -radeonhd and -ati as far as reliance on firmware goes was that the defined interface to various pieces of card functionality was to execute interpreted scripts present in the card flash. These scripts didn't do anything that the driver couldn't, so you could absolutely reimplement that functionality in the driver - the problem is that card vendors used these scripts as a way to abstract hardware differences (eg, using RAM from different vendors with different timing constraints), and ignoring Atom would mean having to have card-specific data in the driver before that card would work correctly. A hybrid approach is to use Atom for data but not for code, but even then there are still risks due to the fact that the defined interface is the scripts and not the data tables. A card vendor could modify the way the script interpreted the tables (or even hardcode stuff directly into the script) and again you'd need card-specific knowledge to avoid that. -radeonhd spent a while trying to avoid executing any Atom code, but effectively relied on it anyway - it couldn't program the card from cold and so depended on the system firmware having executed the scripts before it ran. In any case, support for executing Atom code (including running the ASIC init function) was added to -radeonhd by September of 2007.
Looking at the initial commits to support r500 in the -ati driver, I think the only time it would ever call int10 is if the card was entirely uninitialised. -radeonhd would do exactly the same if it was configured without support for doing Atom-based init.
DeVault: GitHub Copilot and open source laundering
Posted Jun 26, 2022 20:16 UTC (Sun) by sjj (guest, #2020) [Link]