Lightnews — Scholar-powered news

Reposted by Daan van Esch

Verena Blaschke

@verenablaschke.bsky.social

VarDial 2026 will be colocated with @eaclmeeting.bsky.social! We're looking forward to your papers on NLP for similar languages, varieties and dialects :)

Deadline: Dec 19 (Jan 2 for pre-reviewed ARR papers)
sites.google.com/view/vardial...

VarDial @ EACL 2026, with important dates (see next post for text version).
Photo CC-0.

October 21, 2025 at 10:36 AM

Reposted by Daan van Esch

Morris Alper

@malper.bsky.social

The ConlangCrafter pipeline harnesses an LLM to generate a description of a constructed language and self-refines it in the process. We decompose language creation into phonology, grammar, and lexicon, and then translate sentences while constructing new grammar points as needed.

October 11, 2025 at 5:35 AM

Daan van Esch

@daanvanesch.nl

Great to see this highly multilingual model: 1,000+ languages!

EPFL School of Computer and Communication Sciences @icepfl.bsky.social · Sep 2

EPFL, ETH Zurich & CSCS just released Apertus, Switzerland’s first fully open-source large language model.
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.

Read more: actu.epfl.ch/news/apertus...

September 3, 2025 at 6:17 AM

Reposted by Daan van Esch

Jeff Dean

@jeffdean.bsky.social

AI efficiency is important. The median Gemini Apps text prompt in May 2025 used 0.24 Wh of energy (<9 seconds of TV watching) & 0.26 mL (~5 drops) of water. Over 12 months, we reduced the energy footprint of a median text prompt 33x, while improving quality:
cloud.google.com/blog/product...

August 21, 2025 at 1:39 PM

Reposted by Daan van Esch

Interspeech 2026

@interspeech.bsky.social

⏳ Just 1 week to go! 🎉
#Interspeech2025 kicks off next week in Rotterdam, the Netherlands 🗣️🌍

We can’t wait to welcome everyone for a week full of talks, posters, workshops & networking.

📅 See you soon!

Comment below, are you joining? 🥰

August 10, 2025 at 11:14 AM

Reposted by Daan van Esch

Computational Linguistics in the Netherlands

@clin35-2025.bsky.social

🥳 Happy to open up the registrations for the CLIN conference! You can find more information here: clin35.ccl.kuleuven.be/registration The website has also been updated with more information for the presenters, with a programme, and with information about the venue. See you soon at #CLIN35!

August 8, 2025 at 1:14 PM

Reposted by Daan van Esch

Leonie Weissweiler

@weissweiler.bsky.social

Hi #NLP community, I'm urgently looking for an emergency reviewer for the ARR Linguistic Theories track. The paper investigates and measures orthography across many languages. Please shoot me a quick email if you can review!

June 21, 2025 at 10:34 AM

Reposted by Daan van Esch

Marianne de Heer Kloots

@mdhk.net

The @interspeech.bsky.social early registration deadline is coming up in a few days!

Want to learn how to analyze the inner workings of speech processing models? 🔍 Check out the programme for our tutorial:
interpretingdl.github.io/speech-inter... & sign up through the conference registration form!

Interpretability Techniques for Speech Models — Tutorial @ Interspeech 2025

interpretingdl.github.io

June 13, 2025 at 5:18 AM

Reposted by Daan van Esch

Catherine Arnett

@catherinearnett.bsky.social

One of the biggest obstacles to improving language technologies for low-resource languages is the lack of data. To address this, we need better language identification tools. So, we're organizing a shared task on Language Identification for Web Data! #NLP #NLProc

June 9, 2025 at 3:44 PM

Reposted by Daan van Esch

Maureen de Seyssel

@maureendeseyssel.bsky.social

Now that @interspeech.bsky.social registration is open, time for some shameless promo!

Sign-up and join our Interspeech tutorial: Speech Technology Meets Early Language Acquisition: How Interdisciplinary Efforts Benefit Both Fields. 🗣️👶

www.interspeech2025.org/tutorials

⬇️ (1/2)

May 27, 2025 at 4:14 PM

Reposted by Daan van Esch

Odette Scharenborg

@odettes.bsky.social

And you can now register as well!

Don't hesitate, but sign up for @interspeech.bsky.social Interspeech 2025 now through www.interspeech2025.org/registration and be part of the largest speech science and technology conference in the world!

May 23, 2025 at 2:09 PM

Reposted by Daan van Esch

Miryam de Lhoneux

@mdlhx.bsky.social

Interested in multilingual tokenization in #NLP? Lisa Beinborn and I are hiring!

PhD candidate position in Göttingen, Germany: www.uni-goettingen.de/de/644546.ht...

PostDoc position in Leuven, Belgium:
www.kuleuven.be/personeel/jo...

Deadline 6th of June

Job postings OBP - Georg-August-Universität Göttingen

Web pages of Georg-August-Universität Göttingen

www.uni-goettingen.de

May 16, 2025 at 8:23 AM

Reposted by Daan van Esch

Jeff Dean

@jeffdean.bsky.social

Want to check out the source for the "AlexNet" paper? Google has made the code from Krizhevsky, Sutskever and Hinton's seminal "ImageNet Classification with Deep Convolutional Neural Networks" paper open source, in partnership with the Computer History Museum.

computerhistory.org/press-releas...

March 20, 2025 at 9:02 PM

Reposted by Daan van Esch

Sung Kim

@sungkim.bsky.social

How I’ve run major projects by Ben Kuhn

www.benkuhn.net/pjm/

March 17, 2025 at 3:15 AM

Reposted by Daan van Esch

Jeff Dean

@jeffdean.bsky.social

Introducing our Gemma 3 open models, the most capable models that you can run on a single GPU or TPU. Multimodal, multilingual, 128k context length, and exceeds quality of other open models that are an order of magnitude larger in terms of hardware footprint. 🎉

blog.google/technology/d...

Introducing Gemma 3: The most capable model you can run on a single GPU or TPU

Today, we're introducing Gemma 3, our most capable, portable and responsible open model yet.

blog.google

March 13, 2025 at 2:55 PM

Reposted by Daan van Esch

Steren

@steren.fr

Tris, product lead for Gemma, on stage in Paris to introduce Gemma 3.
140 languages, multi-modal, best single-GPU model.

March 12, 2025 at 9:46 AM

Reposted by Daan van Esch

Steren

@steren.fr

Introducing Gemma 3. The most capable model you can run on a single GPU. Cloud Run offers 1 GPU per instance, it is a perfect fit. Deploy it in one simple command.

Blog: cloud.google.com/blog/product...
Tutorial: cloud.google.com/run/docs/tut...

March 12, 2025 at 7:49 AM

Reposted by Daan van Esch

Miriam Posner

@miriamposner.com

OK, every year I try to explain to my students how LLMs work, and every year I have to do a big trawl for good resources and activities. Here's this year's haul of *introductory* materials. (In-class activities + visualizations, not so much readings.)

March 6, 2025 at 6:42 PM

Reposted by Daan van Esch

The National

@scotnational.bsky.social

NEW: Gaelic language broadcasting will receive a £1.8 million funding boost to build on the success of BBC Alba’s crime thriller An t-Eilean

Kate Forbes announces £1.8m for Gaelic broadcasting after success of crime thriller

www.thenational.scot

February 28, 2025 at 8:15 AM

Reposted by Daan van Esch

Interspeech 2026

@interspeech.bsky.social

🌍🎙️ Call for Participation – Multilingual Speech AI Challenge! 🤖🔊

Join our #Interspeech2025 workshop on Multilingual Conversational Speech Language Models! 🏆

💡 Tasks:
📌 Multilingual ASR 📝
📌 Speaker diarization + recognition 🎙️

🚀 Push the boundaries of speech AI!
🔗 www.nexdata.ai/competition

interspeech2025.org satellite Workshop on Multilingual Conversational Speech Language Model
nexdata.ai/competition

February 28, 2025 at 4:12 PM

Reposted by Daan van Esch

Tom Kocmi

@kocmitom.bsky.social

Guess what? The jubilee 🎉 20th iteration of WMT General MT 🎉 is here, and we want you to participate - as the entry barrier to make an impact is so low!

This isn’t just any repeat. We’ve kept what worked, removed what was outdated, and introduced many exciting new twists! Among the key changes are:

February 20, 2025 at 9:31 PM

Reposted by Daan van Esch

Our World in Data

@ourworldindata.org

Internet use has grown rapidly but unevenly across Asia's largest countries

A graph titled "Internet usage has surged in Asia's four most populous countries" shows the percentage of the population that used the Internet in the last three months across four countries: China, India, Indonesia, and Pakistan.

- In China, the percentage increased from 2% in 2000 to 77% in 2023, with a steadily rising line.
- India shows a rise from 1% in 2000 to 43% in 2023, with a gradual upward trend.
- Indonesia's internet usage jumped from 1% in 2000 to 69% in 2023, following a similar growth pattern.
- Pakistan also increased its usage from 1% in 2000 to 33% in 2023, showcasing an upward trend.

At the bottom, there is a note indicating the data source is the International Telecommunication Union via the World Bank, along with additional information that India's latest data is from 2020 and Pakistan's is from 2022. The graphic has a Creative Commons BY attribution.

February 20, 2025 at 7:39 PM

Reposted by Daan van Esch

Wolfgang Behr / 畢鶚 (氒/厥/攸)

@behrwolf.bsky.social

Interested in studying ancient, extinct and endangered languages from Akkadian and Aramaic to Xhosa and Zulu on the beautiful island of San Servolo in Venice this summer? Check out the fantastic programme by the Université d’été en Langues de l’Orient of UNIL here: www.unil.ch/unil/fr/home...

Langues de l’Orient - UNIL

French courses during the holidays, plus Summer and Winter schools to bring you up to speed, acquire transversal, multidisciplinary skills, and build expertise in core subjects.

www.unil.ch

February 19, 2025 at 2:38 PM

Reposted by Daan van Esch

iseeaswell.bsky.social

@iseeaswell.bsky.social

😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...

February 19, 2025 at 5:36 PM

Reposted by Daan van Esch

Abdoulaye Diack

@diack.bsky.social

PaliGemma 2 mix is out! This model can now handle short/long captioning, OCR, image Q&A, object detection, and segmentation. Available in 3B, 10B, and 28B parameter sizes and 224px/448px resolutions. Frameworks: Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.

goo.gle/4i1jOOU

Introducing PaliGemma 2 mix: A vision-language model for multiple tasks - Google Developers Blog

PaliGemma 2 mix, Google’s new vision-language model, solves tasks like image captioning, OCR, object detection, and segmentation.

goo.gle

February 19, 2025 at 5:51 PM
