bergelsonlab
@bergelsonlab.bsky.social
Official account for the Bergelson Lab at Harvard, sporadically maintained by the PI:).
Just a lab, trying to figure out how babies learn language, somehow caught in the crosshairs of gov't admin battles.
@glupyan.bsky.social link's broken, but i'm thinking of e.g. www.zerospeech.com, where less text-documented languages are akin to babies. but e.g. training models on CHILDES audio vs. text is a totally differently successful enterprise, so it is in principle a roadblock for now, i think?
The Zero Resource Speech Benchmark (series)
www.zerospeech.com
November 10, 2025 at 7:25 PM
are you separating tokenizing from segmenting? hard for whom? bc getting words from raw audio (or sign) is still pretty rough going for our best ASR systems in anything approaching naturalistic cases (& ofc it takes babies months for phonotactics, up to years for harder, rarer morpho stuff)
November 10, 2025 at 4:41 PM
stop biasing the sample elena
November 10, 2025 at 4:30 PM
I'll be very curious to look back in a decade at this scientific moment. And despite my grumps above, I do think a lot of really interesting insights will one day come out of this chapter of cognitive science. #CogSciSky
November 10, 2025 at 4:30 PM
3) a more meta-point. i was a bit surprised not to hear mention of the 'costs' of working w/LLMs. everyone knows fMRI is expensive so let's be choosy in how we scan, but all these (environmentally crushing & ethically fraught) LLMs are still totally open season. we're not 'paying'...yet.
November 10, 2025 at 4:30 PM
2) it feels circular to take the products of the human linguistic system & ask if its structure could be learned w/o it. The model vs. human link-up feels very evocative of @davidpoeppel.bsky.social's points about aligning the "parts lists" of cognition vs. neurobiology, but subbing LLMs for neuro 3/4
November 10, 2025 at 4:30 PM
1) 'baby-like' LMs are just capped at a smaller # of words (e.g. 10 million in #BabyLM).
But starting w/tokenized text solves a huge part of the problem: figuring out where the words begin/end in the first place (over time). Baby input doesn't come pre-chewed. (cf. zero-resource folks, Dupoux etc.) 2/4
November 10, 2025 at 4:30 PM
*wept 😭
October 27, 2025 at 12:08 PM