Dohyeon Kim
dohyeondk.bsky.social
Husband, dad, Software Engineer @ YouVersion
Recently, I reconsidered the whole approach and came up with a different solution:

Use WhisperX to get SRT output with accurate timestamps but not-so-accurate text, then ask Gemini to "proofread" the subtitles using both the SRT and the audio as input. It works so well.
December 5, 2025 at 1:23 AM
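The thread doesn't say how the proofread output is checked. One plausible safeguard, sketched here in Python as an assumption rather than the author's actual code, is to confirm that Gemini changed only the text: parse both SRT files and require the same number of cues with identical timestamps, so WhisperX's timing survives the proofreading pass. `parse_srt`, `timings_preserved`, and `Cue` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches one SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; raise ValueError on a malformed block."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        # Each cue needs a sequence number, a timing line, and at least one text line.
        if len(lines) < 3 or not lines[0].strip().isdigit():
            raise ValueError(f"malformed cue: {block!r}")
        m = TIMING.fullmatch(lines[1].strip())
        if m is None:
            raise ValueError(f"bad timing line: {lines[1]!r}")
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        if end < start:
            raise ValueError(f"cue ends before it starts: {lines[1]!r}")
        cues.append(Cue(start, end, "\n".join(lines[2:])))
    return cues


def timings_preserved(whisperx_srt: str, proofread_srt: str) -> bool:
    """True if the proofread SRT keeps every cue and timestamp, i.e. only the text changed."""
    try:
        before, after = parse_srt(whisperx_srt), parse_srt(proofread_srt)
    except ValueError:
        # Invalid SRT from the proofreading step counts as a failed check.
        return False
    return len(before) == len(after) and all(
        (a.start_ms, a.end_ms) == (b.start_ms, b.end_ms) for a, b in zip(before, after)
    )
```

For example, `timings_preserved(raw, fixed)` is `True` when `fixed` only corrects typos in `raw`'s cue text, and `False` if any cue was added, dropped, or retimed. A failed check could trigger another proofreading request.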
Since then, I've been tweaking the pipeline to make it faster and more accurate: splitting the audio into smaller chunks and stitching the results back together, and validating the SRT output and retrying when it's invalid.

It was working okay, but never well enough.
December 5, 2025 at 1:23 AM
The breakthrough came when ChatGPT became good enough to translate English subtitles into other languages with acceptable quality (about two years ago).

The second breakthrough came when Gemini became good enough to take audio as input and produce translated subtitles directly (about this time last year).
December 5, 2025 at 1:23 AM
I needed to create subtitles for video content in multiple languages. Manual transcription services were expensive, and basic speech-to-text made too many errors.
December 5, 2025 at 1:23 AM
Oh, that’s a nice library. Great work! I have a feeling Apple might end up Sherlocking it…
March 18, 2025 at 7:24 PM