Ben Kirwin
@bkirwi.bsky.social
just setting up my twttr
congrats! last time we caught up you were i think just acquiring a much smaller electric boat... cool to hear you've been Scaling Up. is the cat in the water already?
May 3, 2025 at 9:52 PM
afaict you either need to argue that i've infringed by producing a copy of an article that i've never seen; or that the model creator infringed, and the model does "contain" a copy of the article in some sense, even though the model is definitely not "just" a copy of those inputs...
April 7, 2025 at 3:32 AM
suppose: the nyt example was for an open-weights model like llama; i get the model and recover an nyt article from it, like they demonstrated in court. i now have an illegal copy; where's the copy from?
April 7, 2025 at 3:30 AM
sure, happy to leave it here, and ultimately this is something a judge will decide as you say! but i will drop a last thought at the end here anyways since i already typed it up...
April 7, 2025 at 3:27 AM
sorry if i'm being pedantic! but this kind of hair-split is the sort of thing the law cares about and i think the article is a little fuzzy on... 😅
April 7, 2025 at 2:59 AM
is it? in a section with a summary like "it’s still critical that training not involve copying", it seems relevant that quite a bit of copying happens in practice, and that it's hard to prevent.
April 7, 2025 at 2:54 AM
for sure, but "my system only copies a small percentage of (the ~entire internet)" and "i wish my system did not copy data so often" are not arguments that copying is not happening...
April 6, 2025 at 12:57 AM
good news! they shared the prompts: nytco-assets.nytimes.com/2023/12/Laws...
April 5, 2025 at 8:33 PM
and if they do it in public it can be copyright infringement!
April 5, 2025 at 5:12 PM
and i don't find the article's treatment of this super convincing: it agrees that all models do this, says it's bad, and then ignores it in the conclusions...
April 5, 2025 at 5:11 PM
i was happy to see you share this article; i think it's more right than most things written on this topic! but eg. when the nyt can get a model to spit out its articles nearly word for word, i think there's a pretty clear argument that a copy has been made and distributed...
April 5, 2025 at 4:46 PM
for ~all common models, it's quite easy to get an llm to spit out portions of its training data verbatim... hard to argue that distributing those models is not distributing that data in a legal sense!
April 5, 2025 at 4:36 PM
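(Not part of the thread, but a minimal sketch of the kind of verbatim check described in the post above, assuming the Hugging Face transformers library and an open-weights model; the model name and article prefix are placeholders, not details from the posts or the NYT filing.)

# Sketch: probe whether an open-weights model continues a known text verbatim.
# Placeholder model and prefix; the NYT exhibits used long article prefixes as prompts.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical choice of open-weights model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = "The opening sentences of a known article go here..."  # placeholder text
inputs = tokenizer(prefix, return_tensors="pt")

# Greedy decoding: memorized passages tend to surface without sampling noise.
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
continuation = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])

# Compare the generated continuation against the real article to measure verbatim overlap.
print(continuation)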
thanks for sharing! read the vulnerability report from citizenlab... looks like the issue was in the keyboard, and citizenlab still recommend using signal. (with all the security settings turned on!)
March 8, 2025 at 5:07 PM
oh hey congrats! i remember you were taking another swing at this - glad to see it over the line
November 18, 2024 at 3:30 AM