DeVault: GitHub Copilot and open source laundering

source link: https://lwn.net/Articles/898772/

DeVault: GitHub Copilot and open source laundering

Posted Jun 23, 2022 18:31 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Emphasis on "legal" team, not engineering team. A lot of engineers have very strange ideas about how the GPL works. For example, some engineers (presumably not including DeVault) think that just reading GPL'd code causes it to magically "infect" any code you write after that point, which is not even close to being true.* The GPL attaches to derivative works. "Derivative work" is a legal term of art, and the license does not attempt to offer a definition for it, because it's defined by the underlying copyright law. That's why you need a team of lawyers, not a team of engineers, to evaluate something like this. In particular, I think DeVault's emphasis on the model itself being a derivative work may be an unnecessary distraction, because:

1. It's not clear to me whether this claim is actually correct. A model is ultimately "just" a big bag of statistical information, and I honestly don't know whether (US) copyright law attaches to such things in the first place, but I'm skeptical (see e.g. Feist v. Rural).
2. It's not relevant. What matters is whether the output of the model is a derivative work of the original, which is a completely different legal question. Derivative works are not subject to some sort of magical "transitive property" that requires the model to also be a derivative work; you can argue that the output is derivative while taking no position on the status of the model itself. Similarly, you could argue that the output is *not* derivative, again taking no position on the model. The status of the model is not relevant to the question, unless you're going to allege an AGPL** violation.

* The kernel of truth here is that, in practice, clean-room engineering is often a good idea for the avoidance of legal risk. But there's nothing in either the GPL or the copyright statute that says you have to do it. Because that would be stupid. Imagine if novelists couldn't read books without running into copyright issues.
** The AGPL is the only widely-used license whose obligations attach on creation of a derivative work, rather than on distribution of that work. As far as I know, GitHub has no intention of distributing the model itself to anyone, so if you want to sue GitHub just for creating the model, you'd have to claim an AGPL violation specifically.

DeVault: GitHub Copilot and open source laundering

Posted Jun 23, 2022 19:26 UTC (Thu) by ballombe (subscriber, #9523) [Link]

The legal issue will probably depend on the specific data you can extract from Copilot. Obviously GitHub will not help you find out, and lawyers alone might not be sufficient.

Maybe there is some specific request that leads Copilot to return the whole body of some GPL file. You could find one, for example, by looking for certain patterns that occur in only a single piece of software.
That would strengthen the case.
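
A rough sketch of how one might measure that, assuming you already have the model's output and the suspected GPL source as plain strings. The function names and the 8-token window size are hypothetical choices for illustration, not anything Copilot or GitHub provides:

```python
def shingles(text, n=8):
    """Return the set of n-token windows in text (tokens split on whitespace)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(candidate, reference, n=8):
    """Fraction of the candidate's n-token windows that also occur in reference.

    A value near 1.0 suggests the candidate is largely a verbatim copy;
    a value near 0.0 suggests little literal overlap at this window size.
    """
    cand = shingles(candidate, n)
    if not cand:
        # Candidate is shorter than one window, so there is nothing to compare.
        return 0.0
    return len(cand & shingles(reference, n)) / len(cand)
```

A high ratio only shows literal copying; whether that makes the output a derivative work is still a legal question, not something this number answers.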

DeVault: GitHub Copilot and open source laundering

Posted Jun 23, 2022 19:52 UTC (Thu) by Gaelan (subscriber, #145108) [Link]

Someone got Copilot to generate the fast inverse square root function from Quake III (which is GPL'd), "what the fuck?" comment and all: https://twitter.com/mitsuhiko/status/1410886329924194309

Amusingly, it also autocompleted a BSD license onto that code.

DeVault: GitHub Copilot and open source laundering

Posted Jun 25, 2022 1:00 UTC (Sat) by gerdesj (subscriber, #5446) [Link]

"A lot of engineers have very strange ideas about how the GPL works."

Quite. Also, many putative authorities on the GPL seem to forget that there are many legal systems. If you are going to dive in and be authoritative on the GPL then you really should present an argument that works for all legal systems that the GPL attempts to work within. Quite a job!

Legal is as legal does: Some legal systems have a concept of "reasonable" or what a "reasonable" person would do and I think that is what the GPL is riffing off. There's also the concept of being able to "quietly enjoy [something]". I'm a Brit, so my local legal system informs my knowledge here. Not all legal systems work like that.

I think it is fair to say that we all have strange ideas about how the GPL works. There's no need to call out end users.

DeVault: GitHub Copilot and open source laundering

Posted Jun 25, 2022 11:47 UTC (Sat) by Wol (subscriber, #4433) [Link]

> Quite. Also, many putative authorities on the GPL seem to forget that there are many legal systems.

And far too many authorities read what they want to see, not what's actually there. I've sure been guilty of that. I think my knowledge of the GPL now is pretty good, precisely because I've had plenty of people call me out on my mistakes.

How many "experts" have NOT been through that learning experience? The majority of them?

Cheers,
Wol

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 3:22 UTC (Sun) by gdt (subscriber, #6284) [Link]

The deeper point about differing copyright laws is not so much interpretation of copyright licenses, but the distinction between "fair use" and "fair dealing".

Copilot claims its actions are fair use, and therefore the license is irrelevant. However, in fair dealing jurisdictions, Copilot's use of the program source must either comply with the copyright license or fall under one of the black-letter allowed uses listed in the fair dealing exceptions of that jurisdiction's copyright law.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 9:40 UTC (Sun) by gspr (subscriber, #91542) [Link]

> It's not clear to me whether this claim is actually correct. A model is ultimately "just" a big bag of statistical information, and I honestly don't know whether (US) copyright law attaches to such things in the first place, but I'm skeptical (see e.g. Feist v. Rural).

Surely it does apply in one extreme, namely that of a really good model! If I take a copyrighted picture and create a model that very accurately reproduces said picture, I don't think it's very unlikely that my model runs afoul of the original's copyright.

In the other extreme—that of a really terrible model—it probably doesn't, but we probably shouldn't write off the models in-between those extremes.

DeVault: GitHub Copilot and open source laundering

Posted Jun 23, 2022 20:28 UTC (Thu) by mrugiero (guest, #153040) [Link]

And in the case of the FSF, their role is to actively try to be partial in favor of free software. By intent, not by accident. If they can get a win for FLOSS then they have accomplished their stated mission.
Impartiality is for the judge, not for the litigants.

DeVault: GitHub Copilot and open source laundering

Posted Jun 23, 2022 20:58 UTC (Thu) by mpldr (subscriber, #154861) [Link]

> [the FSF's] role is to actively try to be partial in favor of free software

There's nothing wrong with that – quite the opposite – but it's not helpful if what you want is a legal review. You may get some interesting points from them, sure; but it's not exactly helpful when trying to find out what is actually law (let alone that this is a court's job).

DeVault: GitHub Copilot and open source laundering

Posted Jun 24, 2022 2:33 UTC (Fri) by scientes (subscriber, #83068) [Link]

In the common law system the courts' job is to write law.

/Almost not sarcastic

DeVault: GitHub Copilot and open source laundering

Posted Jun 24, 2022 11:27 UTC (Fri) by flussence (subscriber, #85566) [Link]

The FSF isn't going to do anything useful at all — the purpose of a system is what it does.

The GPL2 didn't make nVidia *or* AMD play nice with Linux (key phrase: "preferred form for modification"), the GPL3 didn't stop TiVoization (they trivially routed around it; especially Apple), the AGPL3 didn't stop SaaS vendor lock-in (instead they weaponised it against each other), and no current or future iteration of it will stop Microsoft committing automated for-profit piracy at global scale like is happening here.

The only thing the GPL *is*, clearly, is weak DRM powered by magical thinking and a bunch of weird elitist old men clinging to a power fantasy dreamed up half a century ago, which they refuse to grow out of. People who try to actually play by the stated rules get a worse experience, corporations engage in automated piracy with neural networks with little to no legal repercussions, or often just old-fashioned copyright infringement if they're peddling white-label ARM devices, and any attempts to resist this toothless status quo from within the system get you ostracised. Most users of GPLed software have never and will never know it exists, never mind read or understand it, and if they did wouldn't be able to meaningfully exercise rights granted under it. But it sure makes some people feel smug about themselves on their moral high horse.

I feel like at this point the only way to stop this cancer of trillionaires strip-mining the creative output of individuals is to stop giving away any legal rights to that work in the first place. Make the code utterly radioactive to anyone who takes license texts seriously, especially corporate lawyers: All Rights Reserved, free for personal use only, the software shall be used for good not evil, and with a written threat to DMCA anyone found uploading to github or any other platform of similar size and motive. Piracy is going to happen anyway, but we can still choose who feels safe and comfortable doing it.

Thought experiment: if you train a neural network on the text of the GPL itself and coax output from the machine that superficially resembles the input but with manually chosen tweaks that change its meaning, are you exempt from the copyright header in the original, as MSFT seems to think it is? If so, that's the final nail in the coffin for software copyright as a whole; the words of the legal document and the colour of the bits don't mean anything any more.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 0:07 UTC (Sun) by salimma (subscriber, #34460) [Link]

> The GPL2 didn't make nVidia *or* AMD play nice with Linux

Any specific example for AMD here? They seem to be much better citizens when it comes to the GPL, at least compared to nVidia (and even nVidia is finally open sourcing kernel drivers)

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 0:23 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

ATI had closed the driver for R300 video cards, even though the driver for R200 was open. They opened it up only after being acquired by AMD.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 10:00 UTC (Sun) by flussence (subscriber, #85566) [Link]

I'm still a bit distrustful of them after what they did to undermine the xf86-video-radeonhd project.

Rumour has it that AMD middle management wanted the FOSS option they were due to announce to be kept slightly inferior to fglrx for Reasons, and having an independent effort that didn't depend on firmware blobs or the decrepit x86-only int10 VBIOS like that did was severely embarrassing them.

Not many people may remember this now, but the R200/(reverse-engineered)R300 driver also used to be blob-free. Strangely that stopped being the case after AMD took over, even though it was feature-complete.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 18:49 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> Not many people may remember this now, but the R200/(reverse-engineered)R300 driver also used to be blob-free. Strangely that stopped being the case after AMD took over, even though it was feature-complete.

Because it made no sense to reimplement the critical power management and link training code multiple times, instead of doing it once in AtomBIOS.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 19:40 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

> the R200/(reverse-engineered)R300 driver also used to be blob-free

The DRM side of the R2/3/400 driver always required a firmware blob - but for a long time it was just embedded inside the kernel driver, so wasn't user visible. My recollection (which seems to be supported by the driver, but it's been a long time since I looked at this properly so I could be wrong) is that a bunch of the 2D acceleration in that driver depended on DRM, so effectively the 2D driver also had a blob dependency if you wanted it to work properly.

The difference between -radeonhd and -ati as far as reliance on firmware goes was that the defined interface to various pieces of card functionality was to execute interpreted scripts present in the card flash. These scripts didn't do anything that the driver couldn't, so you could absolutely reimplement that functionality in the driver - the problem is that card vendors used these scripts as a way to abstract hardware differences (eg, using RAM from different vendors with different timing constraints), and ignoring Atom would mean having to have card-specific data in the driver before that card would work correctly. A hybrid approach is to use Atom for data but not for code, but even then there are still risks due to the fact that the defined interface is the scripts and not the data tables. A card vendor could modify the way the script interpreted the tables (or even hardcode stuff directly into the script) and again you'd need card-specific knowledge to avoid that. -radeonhd spent a while trying to avoid executing any Atom code, but effectively relied on it anyway - it couldn't program the card from cold and so depended on the system firmware having executed the scripts before it ran. In any case, support for executing Atom code (including running the ASIC init function) was added to -radeonhd by September of 2007.

Looking at the initial commits to support r500 in the -ati driver, I think the only time it would ever call int10 is if the card was entirely uninitialised. -radeonhd would do exactly the same if it was configured without support for doing Atom-based init.

DeVault: GitHub Copilot and open source laundering

Posted Jun 26, 2022 20:16 UTC (Sun) by sjj (guest, #2020) [Link]

Isn't 20 years long enough to carry a grudge? Those middle managers moved on a long time ago.
