Kweku Kwegyir-Aggrey
kwekuka.bsky.social
ml, stats, and low notes; currently phd-ing @ brown cs
Reposted by Kweku Kwegyir-Aggrey
Llama 3.1 70B contains copies of nearly the entirety of some books; Harry Potter is just one of them. I don’t know whether this means it’s an infringing copy. But the first question to answer is whether it’s a copy in the first place, and our new results suggest that it is:

arxiv.org/abs/2505.12546
Extracting memorized pieces of (copyrighted) books from open-weight language models
Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expr...
May 21, 2025 at 11:20 AM