#speechRecognition
Drax: Speech Recognition with Discrete Flow Matching

Telegram AI Digest
#ai #news #speechrecognition
Drax: Speech Recognition with Discrete Flow Matching
huggingface.co
November 9, 2025 at 9:27 PM
Drax: Speech Recognition with Discrete Flow Matching

Telegram AI Digest
#ai #news #speechrecognition
Drax: Speech Recognition with Discrete Flow Matching
huggingface.co
November 9, 2025 at 9:18 PM
Some nice aspects are:
- flexible choice of VectorStore
- own model routing
- multi-language support using multiple stores
- flexible authentication options
- SSL cert support
- swappable backend, if needed
- SpeechRecognition

Plus, I wrote a customer-branded WebUI
November 3, 2025 at 7:11 PM
By comparing original clinical consultation transcripts against generated documentation, CertifAI® from Lexacom achieves 99.99% accuracy in eliminating hallucinations.

Read more on our website > Solutions. Link in bio!

#nhs #AIinhealthcare #healthtech #workflow #speechrecognition #digitaldictation
October 31, 2025 at 9:26 AM
JMIR Formative Res: Preprocessing Large-Scale Conversational Datasets: A Framework and Its Application to Behavioral Health Transcripts #AI #DataScience #MachineLearning #SpeechRecognition #DataPreprocessing
Preprocessing Large-Scale Conversational Datasets: A Framework and Its Application to Behavioral Health Transcripts
Background: The rise of AI and accessible audio equipment has led to a proliferation of recorded-conversation transcript datasets across various fields. However, automatic mass recording and transcription often produce noisy, unstructured data. First, these datasets naturally include unintended recordings, such as hallway conversations, background noise, and media (e.g., TV programs, radio, phone calls). Second, automatic speech recognition (ASR) and speaker diarization errors can result in misidentified words, speaker misattributions, and other transcription inaccuracies. As a result, large conversational transcript datasets require careful preprocessing and filtering to ensure their research utility. This challenge is particularly relevant in behavioral health contexts (e.g., therapy, treatment, counselling): while these transcripts offer valuable insights into patient-provider interactions, therapeutic techniques, and client progress, they must accurately represent the conversations to support meaningful research.

Objective: We present a framework for preprocessing and filtering large datasets of conversational transcripts and apply it to a dataset of behavioral health transcripts from community mental health clinics across the United States. Within this framework we explore tools to efficiently filter non-sessions – transcripts of recordings in these clinics that do not reflect a behavioral treatment session but instead capture unrelated conversations or background noise.

Methods: Our framework integrates basic feature extraction, human annotation, and advanced applications of large language models (LLMs). We begin by mapping transcription errors and assessing the distribution of sessions and non-sessions. Next, we identify key features and analyze how outliers help characterize the type of transcript. Notably, we use LLM perplexity as a measure of comprehensibility to assess transcript noise levels. Finally, we use zero-shot LLM prompting to classify transcripts as sessions or non-sessions, validating LLM decisions against expert annotations. Throughout, we prioritize data security by selecting tools that preserve anonymity and minimize the risk of data breaches.

Results: Our findings demonstrated that basic statistical outliers, such as speaking rate, are associated with transcription errors and are observed more frequently in non-sessions than in sessions. Specifically, LLM perplexity can flag fragmented and non-verbal segments and is generally lower in sessions (permutation test mean difference = -258, p
dlvr.it
October 24, 2025 at 7:54 PM
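The perplexity-based filter described in the abstract above can be illustrated with a toy model. The paper uses LLM perplexity; the sketch below substitutes a self-trained, add-one-smoothed unigram model (the function names and the threshold are my own illustrative choices, not from the paper) just to show the mechanism: fragmented, high-entropy text has a flatter word distribution and scores higher than repetitive conversational text.

```python
import math
from collections import Counter

def unigram_perplexity(transcript, smoothing=1.0):
    """Toy stand-in for an LLM perplexity filter: score a transcript
    under a unigram model fit on the transcript itself, with add-one
    (Laplace) smoothing. Lower perplexity = more repetitive/predictable."""
    words = transcript.lower().split()
    counts = Counter(words)
    total, vocab = len(words), len(counts)
    log_prob = sum(
        math.log((counts[w] + smoothing) / (total + smoothing * vocab))
        for w in words
    )
    # Perplexity is the exponentiated negative mean log-probability.
    return math.exp(-log_prob / total)

def looks_like_non_session(transcript, threshold=50.0):
    """Crude filter: flag transcripts whose perplexity exceeds a threshold."""
    return unigram_perplexity(transcript) > threshold
```

In this degenerate setup, a transcript of all-distinct words gets perplexity equal to its vocabulary size, while a fully repetitive one approaches 1, mirroring the paper's finding that non-session noise tends toward higher perplexity.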
Played a bit with the SpeechRecognition API 🤩

Here’s my playground: codepen.io/leaverou/pen...

Safari claims to support it, but I couldn't get it to recognize any non-English language. Can you?

Also, it seems *exceedingly* slow. Like 5-6 seconds from the moment you stop speaking.
SpeechRecognition demo
...
codepen.io
October 24, 2025 at 3:08 PM
NVIDIA, Microsoft, ElevenLabs Top New Automatic Speech Recognition Leaderboard

"Hugging Face has teamed up with NVIDIA, Mistral AI, and the University of Cambridge to launch the Open ASR Leaderboard.

Read More : slator.com/nvidia-micro...

#NVIDIA #Microsoft #ElevenLabs #SpeechRecognition #AS
NVIDIA, Microsoft, ElevenLabs Top New Automatic Speech Recognition Leaderboard
Hugging Face, NVIDIA, Mistral AI, and the University of Cambridge launch the Open ASR Leaderboard, a public benchmark for ASR.
https://slator.com/nvidia-microsoft-elevenlabs-top-automatic-speech-recognition-leaderboard/"
October 24, 2025 at 9:35 AM
Browser Speech Input & Output Buttons All sorts of inputs have little microphone buttons within them that you can press to talk instead of type. Honestly, I worry my daughter will never learn t...

#The #Beat #JavaScript #SpeechRecognition #SpeechSynthesis

frontendmasters.com
October 21, 2025 at 3:28 AM
What's currently a good way to generate transcripts from long video/audio files? I tried using the speechrecognition python library and crashed my computer 😂
October 17, 2025 at 9:58 PM
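A likely culprit in the crash above is loading the entire recording into memory at once. A minimal sketch of a workaround, assuming the `SpeechRecognition` package (`pip install SpeechRecognition`): transcribe in fixed-length chunks using `Recognizer.record`'s `offset` and `duration` parameters. `chunk_spans`, `transcribe_in_chunks`, and the 60-second chunk length are my own illustrative choices, not library APIs.

```python
def chunk_spans(total_seconds, chunk_seconds=60.0):
    """Yield (offset, duration) pairs covering the whole recording."""
    offset = 0.0
    while offset < total_seconds:
        yield offset, min(chunk_seconds, total_seconds - offset)
        offset += chunk_seconds

def transcribe_in_chunks(path, total_seconds, chunk_seconds=60.0):
    """Transcribe a long WAV/AIFF/FLAC file one chunk at a time so only
    chunk_seconds of audio is ever held in memory."""
    import speech_recognition as sr  # imported lazily: optional dependency
    recognizer = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_spans(total_seconds, chunk_seconds):
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        pieces.append(recognizer.recognize_google(audio))
    return " ".join(pieces)
```

Note the trade-off: chunk boundaries can split words, so overlapping chunks or a silence-based splitter may transcribe more cleanly; for very long files a local Whisper-style model is another common route.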
Users reported mixed results with the accent guesser: surprising misclassifications alongside accurate hits. This shows the tool's current limits, especially for unique speech patterns or multilingual backgrounds. Accent ID is complex! #SpeechRecognition 2/6
October 16, 2025 at 1:00 AM
It's day 1 of the @bestpracticeshow.bsky.social here at the NEC in Birmingham!

📍 Join us on stand C42, meet our team, and discover how our latest platform offers powerful support to all primary care staff.

#primarycare #GPs #nhs #speechrecognition #workflows #digitaldictation #nhsdigital #nhsAI
October 8, 2025 at 12:06 PM
Drax's discrete flow matching model, submitted on 5 Oct 2025, enables parallel decoding for speech recognition with accuracy on par with leading models while lowering latency. https://getnews.me/drax-discrete-flow-matching-model-improves-speech-recognition/ #drax #speechrecognition #asr
October 8, 2025 at 8:35 AM
SA‑Whisper extends Whisper to transcribe overlapping speech with speaker tags, achieving lower word error rates on the LibriMix benchmark via joint decoding. Read more: https://getnews.me/speaker-attributed-whisper-model-improves-multi-talker-speech-recognition/ #speechrecognition #whispermodel
October 8, 2025 at 7:51 AM
Two speaker‑agnostic activity streams replace per‑speaker activity inputs, letting an ASR model handle multi‑talker speech while significantly cutting runtime on the AMI and ICSI meeting datasets. Read more: https://getnews.me/speaker-agnostic-activity-streams-reduce-costs-for-multi-talker-asr/ #multitalkerasr #speechrecognition
October 8, 2025 at 7:47 AM
Researchers released a syllable‑level unsupervised speech recognition model that cuts character error rate by 40% relative on the LibriSpeech benchmark and also works on Mandarin. https://getnews.me/syllable-level-unsupervised-speech-recognition-cuts-error-rates/ #speechrecognition #unsupervised
October 7, 2025 at 5:46 PM
📢 Lexacom joins the NHS SBS framework

We’re proud to be named as a supplier on NHS Shared Business Services’ Digital Dictation, Speech Recognition and Outsourced Transcription 2 Framework.

#nhs #nhsSBS #nhsAI #digitaldictation #speechrecognition #healthcareinnovation
October 7, 2025 at 10:17 AM
Mouse Sensors Can Pick Up Speech From Surface Vibrations, Researchers Show #Technology #EmergingTechnologies #Other #SpeechRecognition #EmergingTech #SurfaceVibrations
Mouse Sensors Can Pick Up Speech From Surface Vibrations, Researchers Show
All the technology news you can handle in a single feed
puretech.news
October 6, 2025 at 12:30 AM
Spiralformer reduces token emission latency by 21.6% on Librispeech and 7.0% on CSJ while keeping accuracy, and the paper was accepted to the 2025 IEEE ASRU workshop. Read more: https://getnews.me/spiralformer-low-latency-encoder-for-streaming-speech-recognition/ #spiralformer #speechrecognition
October 3, 2025 at 5:37 AM
AISHELL6‑Whisper provides 30 hours of Mandarin whispered speech with facial video; the AVSR baseline reaches 4.13% CER on whispered speech and 1.11% on normal speech. Read more: https://getnews.me/new-mandarin-audio-visual-whisper-dataset-advances-speech-recognition/ #aishell6whisper #speechrecognition
October 1, 2025 at 5:41 AM