naoyukikandaslp.bsky.social
Ah, no, TS3-Codec was trained with 10-second audio segments, while BigCodec-S was trained with 2.5-second audio segments (Section 4.5). This was a somewhat tricky (and perhaps debatable) part of the configuration, and we did our best to tune the hyperparameters within the constraints of GPU memory.
December 3, 2024 at 6:18 AM
Thanks! To the extent that we checked, yes. The important point is limiting the attention window.
December 3, 2024 at 6:04 AM