

[2305.00118] Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
source link: https://arxiv.org/abs/2305.00118

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
In this work, we carry out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query. We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; we show that models perform much better on memorized books than on non-memorized books for downstream tasks. We argue that this supports a case for open models whose training data is known.
Comments: EMNLP 2023 camera-ready (16 pages, 4 figures)
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2305.00118 [cs.CL] (or arXiv:2305.00118v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2305.00118
Submission history
From: Kent Chang
[v1] Fri, 28 Apr 2023 22:35:03 UTC (6,906 KB)
[v2] Fri, 20 Oct 2023 21:23:21 UTC (44 KB)