#speechRecognition
Drax: Speech Recognition with Discrete Flow Matching

Telegram AI Digest
#ai #news #speechrecognition
Drax: Speech Recognition with Discrete Flow Matching
huggingface.co
November 9, 2025 at 9:27 PM
Drax: Speech Recognition with Discrete Flow Matching

Telegram AI Digest
#ai #news #speechrecognition
Drax: Speech Recognition with Discrete Flow Matching
huggingface.co
November 9, 2025 at 9:18 PM
Some nice aspects are:
- flexible choice of VectorStore
- own model routing
- multi-language support using multiple stores
- flexible authentication options
- SSL cert support
- swappable backend, if needed
- SpeechRecognition

Plus, I wrote a customer-branded WebUI
November 3, 2025 at 7:11 PM
By comparing original clinical consultation transcripts against generated documentation, CertifAI® from Lexacom achieves 99.99% accuracy in eliminating hallucinations.

Read more on our website > Solutions. Link in bio!

#nhs #AIinhealthcare #healthtech #workflow #speechrecognition #digitaldictation
October 31, 2025 at 9:26 AM
JMIR Formative Res: Preprocessing Large-Scale Conversational Datasets: A Framework and Its Application to Behavioral Health Transcripts #AI #DataScience #MachineLearning #SpeechRecognition #DataPreprocessing
Preprocessing Large-Scale Conversational Datasets: A Framework and Its Application to Behavioral Health Transcripts
Background: The rise of AI and accessible audio equipment has led to a proliferation of recorded-conversation transcript datasets across various fields. However, automatic mass recording and transcription often produce noisy, unstructured data. First, these datasets naturally include unintended recordings, such as hallway conversations, background noise, and media (e.g., TV programs, radio, phone calls). Second, automatic speech recognition (ASR) and speaker diarization errors can result in misidentified words, speaker misattributions, and other transcription inaccuracies. As a result, large conversational transcript datasets require careful preprocessing and filtering to ensure their research utility. This challenge is particularly relevant in behavioral health contexts (e.g., therapy, treatment, counselling): while these transcripts offer valuable insights into patient-provider interactions, therapeutic techniques, and client progress, they must accurately represent the conversations to support meaningful research.

Objective: We present a framework for preprocessing and filtering large datasets of conversational transcripts and apply it to a dataset of behavioral health transcripts from community mental health clinics across the United States. Within this framework we explore tools to efficiently filter non-sessions – transcripts of recordings in these clinics that do not reflect a behavioral treatment session but instead capture unrelated conversations or background noise.

Methods: Our framework integrates basic feature extraction, human annotation, and advanced applications of large language models (LLMs). We begin by mapping transcription errors and assessing the distribution of sessions and non-sessions. Next, we identify key features and analyze how outliers help characterize the type of transcript. Notably, we use LLM perplexity as a measure of comprehensibility to assess transcript noise levels. Finally, we use zero-shot LLM prompting to classify transcripts as sessions or non-sessions, validating LLM decisions against expert annotations. Throughout, we prioritize data security by selecting tools that preserve anonymity and minimize the risk of data breaches.

Results: Our findings demonstrated that basic statistical outliers, such as speaking rate, are associated with transcription errors and are observed more frequently in non-sessions than in sessions. Specifically, LLM perplexity can flag fragmented and non-verbal segments and is generally lower in sessions (permutation test mean difference = -258, p
dlvr.it
October 24, 2025 at 7:54 PM
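The perplexity-based filter described in the abstract above can be illustrated with a toy model. The paper uses LLM perplexity; the sketch below substitutes a self-trained, add-one-smoothed unigram model (the function names and the threshold are my own illustrative choices, not from the paper) just to show the mechanism: fragmented, high-entropy text has a flatter word distribution and scores higher than repetitive conversational text.

```python
import math
from collections import Counter

def unigram_perplexity(transcript, smoothing=1.0):
    """Toy stand-in for an LLM perplexity filter: score a transcript
    under a unigram model fit on the transcript itself, with add-one
    (Laplace) smoothing. Lower perplexity = more repetitive/predictable."""
    words = transcript.lower().split()
    counts = Counter(words)
    total, vocab = len(words), len(counts)
    log_prob = sum(
        math.log((counts[w] + smoothing) / (total + smoothing * vocab))
        for w in words
    )
    # Perplexity is the exponentiated negative mean log-probability.
    return math.exp(-log_prob / total)

def looks_like_non_session(transcript, threshold=50.0):
    """Crude filter: flag transcripts whose perplexity exceeds a threshold."""
    return unigram_perplexity(transcript) > threshold
```

In this degenerate setup, a transcript of all-distinct words gets perplexity equal to its vocabulary size, while a fully repetitive one approaches 1, mirroring the paper's finding that non-session noise tends toward higher perplexity.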
Played a bit with the SpeechRecognition API 🤩

Here’s my playground: codepen.io/leaverou/pen...

Safari claims to support it, but I couldn't get it to recognize any non-English language. Can you?

Also, it seems *exceedingly* slow. Like 5-6 seconds from the moment you stop speaking.
SpeechRecognition demo
...
codepen.io
October 24, 2025 at 3:08 PM
NVIDIA, Microsoft, ElevenLabs Top New Automatic Speech Recognition Leaderboard

"Hugging Face has teamed up with NVIDIA, Mistral AI, and the University of Cambridge to launch the Open ASR Leaderboard.

Read More : slator.com/nvidia-micro...

#NVIDIA #Microsoft #ElevenLabs #SpeechRecognition #AS
NVIDIA, Microsoft, ElevenLabs Top New Automatic Speech Recognition Leaderboard
Hugging Face, NVIDIA, Mistral AI, and the University of Cambridge launch the Open ASR Leaderboard, a public benchmark for ASR.
https://slator.com/nvidia-microsoft-elevenlabs-top-automatic-speech-recognition-leaderboard/"
October 24, 2025 at 9:35 AM
Browser Speech Input & Output Buttons All sorts of inputs have little microphone buttons within them that you can press to talk instead of type. Honestly, I worry my daughter will never learn t...

#The #Beat #JavaScript #SpeechRecognition #SpeechSynthesis

frontendmasters.com
October 21, 2025 at 3:28 AM
What's currently a good way to generate transcripts from long video/audio files? I tried using the speechrecognition python library and crashed my computer 😂
October 17, 2025 at 9:58 PM
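A likely culprit in the crash above is loading the entire recording into memory at once. A minimal sketch of a workaround, assuming the `SpeechRecognition` package (`pip install SpeechRecognition`): transcribe in fixed-length chunks using `Recognizer.record`'s `offset` and `duration` parameters. `chunk_spans`, `transcribe_in_chunks`, and the 60-second chunk length are my own illustrative choices, not library APIs.

```python
def chunk_spans(total_seconds, chunk_seconds=60.0):
    """Yield (offset, duration) pairs covering the whole recording."""
    offset = 0.0
    while offset < total_seconds:
        yield offset, min(chunk_seconds, total_seconds - offset)
        offset += chunk_seconds

def transcribe_in_chunks(path, total_seconds, chunk_seconds=60.0):
    """Transcribe a long WAV/AIFF/FLAC file one chunk at a time so only
    chunk_seconds of audio is ever held in memory."""
    import speech_recognition as sr  # imported lazily: optional dependency
    recognizer = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_spans(total_seconds, chunk_seconds):
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        pieces.append(recognizer.recognize_google(audio))
    return " ".join(pieces)
```

Note the trade-off: chunk boundaries can split words, so overlapping chunks or a silence-based splitter may transcribe more cleanly; for very long files a local Whisper-style model is another common route.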
Users reported mixed results with the accent guesser: surprising misclassifications alongside accurate hits. This shows the tool's current limits, especially for unique speech patterns or multilingual backgrounds. Accent ID is complex! #SpeechRecognition 2/6
October 16, 2025 at 1:00 AM
It's day 1 of the @bestpracticeshow.bsky.social here at the NEC in Birmingham!

📍 Join us on stand C42, meet our team, and discover how our latest platform offers powerful support to all primary care staff.

#primarycare #GPs #nhs #speechrecognition #workflows #digitaldictation #nhsdigital #nhsAI
October 8, 2025 at 12:06 PM
Drax's discrete flow matching model, submitted on 5 Oct 2025, enables parallel decoding for speech recognition with accuracy on par with leading models while lowering latency. https://getnews.me/drax-discrete-flow-matching-model-improves-speech-recognition/ #drax #speechrecognition #asr
October 8, 2025 at 8:35 AM
SA‑Whisper extends Whisper to transcribe overlapping speech with speaker tags, achieving lower word error rates on the LibriMix benchmark via joint decoding. Read more: https://getnews.me/speaker-attributed-whisper-model-improves-multi-talker-speech-recognition/ #speechrecognition #whispermodel
October 8, 2025 at 7:51 AM
Two speaker‑agnostic activity streams replace per‑speaker activity inputs, letting an ASR model handle multi‑talker speech while significantly cutting runtime on the AMI and ICSI meeting datasets. Read more: https://getnews.me/speaker-agnostic-activity-streams-reduce-costs-for-multi-talker-asr/ #multitalkerasr #speechrecognition
October 8, 2025 at 7:47 AM
Researchers released a syllable‑level unsupervised speech recognition model that cuts character error rate by 40% relative on the LibriSpeech benchmark and also works on Mandarin. https://getnews.me/syllable-level-unsupervised-speech-recognition-cuts-error-rates/ #speechrecognition #unsupervised
October 7, 2025 at 5:46 PM
📢 Lexacom joins the NHS SBS framework

We’re proud to be named as a supplier on NHS Shared Business Services’ Digital Dictation, Speech Recognition and Outsourced Transcription 2 Framework.

#nhs #nhsSBS #nhsAI #digitaldictation #speechrecognition #healthcareinnovation
October 7, 2025 at 10:17 AM
Mouse Sensors Can Pick Up Speech From Surface Vibrations, Researchers Show #Technology #EmergingTechnologies #Other #SpeechRecognition #EmergingTech #SurfaceVibrations
Mouse Sensors Can Pick Up Speech From Surface Vibrations, Researchers Show
All the technology news you can handle in a single feed
puretech.news
October 6, 2025 at 12:30 AM
Spiralformer reduces token emission latency by 21.6% on Librispeech and 7.0% on CSJ while keeping accuracy, and the paper was accepted to the 2025 IEEE ASRU workshop. Read more: https://getnews.me/spiralformer-low-latency-encoder-for-streaming-speech-recognition/ #spiralformer #speechrecognition
October 3, 2025 at 5:37 AM
AISHELL6‑Whisper provides 30 hours of Mandarin whispered speech with facial video; the AVSR baseline reaches 4.13% CER on whispered speech and 1.11% on normal speech. Read more: https://getnews.me/new-mandarin-audio-visual-whisper-dataset-advances-speech-recognition/ #aishell6whisper #speechrecognition
October 1, 2025 at 5:41 AM