Ghetto Turing
ghettoturing.bsky.social
Armchair Apple Analyst, research in NLP/CV on documents
Am I the only one who doesn’t mind doing this?
December 27, 2024 at 11:32 AM
The problem is obvious with GPT-4o voice mode. I’m pretty sure it has some programmatic VAD (voice activity detection) that triggers it to stop responding, but that works horribly in noisy environments.
December 27, 2024 at 9:48 AM
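To make the noisy-environment point concrete, here is a rough sketch of what a purely programmatic, energy-threshold VAD could look like. This is only an assumption for illustration: nothing here reflects how GPT-4o voice mode actually works, and the names `rms`, `energy_vad`, `make_noise_frame` and the threshold value are all made up.

```python
import math
import random


def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frames, threshold=0.1):
    """Hypothetical VAD: flag a frame as 'speech' if its RMS exceeds a fixed threshold."""
    return [rms(f) > threshold for f in frames]


def make_noise_frame(amplitude, rng, n=160):
    """Synthetic background noise: uniform samples in [-amplitude, amplitude]."""
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


rng = random.Random(0)
# Nobody is speaking in either case; only the background level differs.
quiet_room = [make_noise_frame(0.01, rng) for _ in range(5)]
noisy_room = [make_noise_frame(0.5, rng) for _ in range(5)]

print(energy_vad(quiet_room))  # background stays below the threshold
print(energy_vad(noisy_room))  # background alone crosses the threshold
```

With a fixed threshold, loud background noise is indistinguishable from the user talking, so a system that stops responding whenever this fires would cut itself off constantly in a noisy room.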
Totally right, you’ll probably need more of a “sound-to-text” model to be able to handle a lot of non-speech stuff as well, such as silences, coughing, etc. That model should then also be able to judge when a smarter model should intervene/reply.
November 30, 2024 at 11:03 AM
Yeah, though I think this can be solved with LLMs. Perhaps harder is looking forward, because you need to analyze the previous context to know whether to wait for a new query or reply to the one before. But all this could be solved with a bit of data, I think.
November 30, 2024 at 10:39 AM