ZDi
zdi1908.bsky.social
🇦🇷 I make machines (and myself) learn. Backpropagation, C++ enthusiast.
Currently doing ML speech synthesis (and general DL) research @ my bedroom

🌐: https://zdisket.github.io/page.html
🖋: https://zdtech.substack.com/
Tesla FSD when I ask it to drive me to Will Stancil's house
July 12, 2025 at 4:56 AM
After switching the encoder to a pretrained ResNet18, freezing layers, and training for 1 epoch, my model can (kind of) drive, having learned from 81k frames across 44 laps of me playing
May 29, 2025 at 3:04 AM
I'm supposed to be writing a technical report, but I can't stop testing out my music LSTM (tech demo for my approach to language modeling audio). Only 18M parameters btw
May 24, 2025 at 3:30 AM
Audio language modeling has always involved people training models to vector-quantize (VQ) audio directly. But what if we quantized mel spectrograms instead, trained a vocoder like iSTFTNet to turn them back into audio, and then trained our AR prior on the mel spectrogram indices?
We can language model 44.1 kHz audio with a single 1k codebook.
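
A rough sketch of the quantization step, to make the idea concrete: each mel frame gets swapped for the index of its nearest codebook vector, and those indices are the tokens the AR prior would be trained on. Everything specific here is an assumption for illustration, not taken from the post: the names (`nearest_code`, `quantize`, `dequantize`), the tiny `N_MELS`, reading "1k" as 1024 entries, one token per frame, and a random codebook standing in for a learned one. How the real quantizer is trained isn't stated.

```python
import math
import random


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame` (squared L2)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(mel_frames, codebook):
    """Map each mel frame to one discrete token (its nearest codebook index)."""
    return [nearest_code(frame, codebook) for frame in mel_frames]


def dequantize(tokens, codebook):
    """Look the tokens back up; a vocoder would consume these quantized frames."""
    return [codebook[t] for t in tokens]


# Hypothetical sizes, chosen only to keep the example small.
N_MELS = 8            # real mel spectrograms use far more bins
CODEBOOK_SIZE = 1024  # assumption: "1k" read as 1024 entries

rng = random.Random(0)
# Random stand-in for a learned codebook.
codebook = [[rng.gauss(0, 1) for _ in range(N_MELS)] for _ in range(CODEBOOK_SIZE)]
# Random stand-in for five mel spectrogram frames.
mel_frames = [[rng.gauss(0, 1) for _ in range(N_MELS)] for _ in range(5)]

tokens = quantize(mel_frames, codebook)          # what the AR prior would model
quantized_frames = dequantize(tokens, codebook)  # what the vocoder would see

# With a single codebook, each token carries log2(CODEBOOK_SIZE) bits.
bits_per_token = math.log2(CODEBOOK_SIZE)
```

Under these assumptions the prior only ever sees one integer per frame, so the sequence is as long as the spectrogram has frames, and the vocoder carries the job of getting from quantized mels back to a waveform.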
May 3, 2025 at 8:10 PM