Gorka Urbizu Garmendia
@gorkaurbizu.bsky.social
Izal, Euskal Herria.
(Basque Country)

Researcher at orainlp.bsky.social (PhD)
#NLP: pretraining LMs & low-resource & tokenization

🇵🇲 🇪🇸 🏴󠁧󠁢󠁥󠁮󠁧󠁿 🔜 🇫🇷 🇳🇴
toka/el/he
🍉
Reposted by Gorka Urbizu Garmendia
This is how San Mamés bid farewell to the Palestinian national football team.
#ofizialtasuna #FreePalestine
November 15, 2025 at 10:12 PM
Reposted by Gorka Urbizu Garmendia
The Basque national team comfortably beat Palestine at San Mamés: 3-0. Calls for official status and in support of Palestine rang out on the pitch and throughout the day.
https://b.eus/81af64...
November 15, 2025 at 9:52 PM
Reposted by Gorka Urbizu Garmendia
“The story behind the Basque Country-Palestine soccer match”
50,000 fans are expected to attend a historic game on Saturday, November 15, in Bilbao.
By El País in English

english.elpais.com/sports/2025-...
November 14, 2025 at 6:59 AM
Reposted by Gorka Urbizu Garmendia
These people really do use Black Mirror as a source of inspiration.
Nightmarish idea for a startup tbh
November 13, 2025 at 9:48 PM
Reposted by Gorka Urbizu Garmendia
📻 Ixak Sarasua, researcher at #OraiNLP, was on eitb.bsky.social's #NortekoFerrokarrilla radio show, testing #Kimu, the new Basque-language (#euskara) chatbot (#txatbot) that can be installed on your own servers

🎧 Listen⤵️
🔗 eitb.eus/N_9VjAzv/

And request access❗️👉 at kimu.orai.eus
November 13, 2025 at 1:23 PM
Reposted by Gorka Urbizu Garmendia
🎯 Guillermo Roa: lawyers run the US, and engineers run China. “China, the US and decarbonization”

🗞️🔗 zientzia.eus/artikul...
November 12, 2025 at 12:08 PM
Reposted by Gorka Urbizu Garmendia
Why do all languages have words for ‘this’ and ‘that’?

Researchers studied more than 1,000 speakers of 29 languages to see how they use demonstratives—words that show where something is in relation to the person talking (“this cat”, “that dog”).
November 11, 2025 at 8:03 PM
Reposted by Gorka Urbizu Garmendia
Maltorres intensifies
November 11, 2025 at 3:35 PM
Reposted by Gorka Urbizu Garmendia
🎓 @gorkaurbizu.bsky.social, researcher from #OraiNLP, presented his work at 5TH MULTILINGUAL REPRESENTATION LEARNING #MRL WORKSHOP 2025
#MRL2025

💡 Sub-1B Language Models for #Low_Resource #Languages: Training Strategies and Insights for #Basque
#LLMs

🔗 aclanthology.org/2025.mrl-mai...
Sub-1B Language Models for Low-Resource Languages: Training Strategies and Insights for Basque
Gorka Urbizu, Ander Corral, Xabier Saralegi, Iñaki San Vicente. Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025). 2025.
aclanthology.org
November 10, 2025 at 7:33 AM
Reposted by Gorka Urbizu Garmendia
Join us in hall C3 at posters 137-168 for our in-person poster session. Join us online on Zoom via Underline for our virtual poster session!
November 9, 2025 at 3:00 AM
Tomorrow, I'll be presenting (virtually) our research from @orainlp.bsky.social on pre-training SLMs for low-resource languages as a poster during the @mrl-workshop.bsky.social.

Come check it out!

📝 aclanthology.org/2025.mrl-mai...
November 8, 2025 at 11:20 AM
Reposted by Gorka Urbizu Garmendia
Thrilled to release Gaperon, an open LLM suite for French, English and Coding 🧀

We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data

(TLDR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
November 7, 2025 at 9:11 PM
Reposted by Gorka Urbizu Garmendia
The sea takes away, and the sea gives. What once sank to the bottom has come to the surface: the whaling ship, and its memory. Albaola Itsas Kultur Faktoria has launched the whaling ship ‘San Juan’, a replica of the one that sank in Canada in the 16th century👇
https://b.eus/2760d4...
November 8, 2025 at 7:27 AM
Reposted by Gorka Urbizu Garmendia
AI could end scarcity, end humanity - or boost trend growth by 0.2 percentage points
November 7, 2025 at 1:24 PM
Reposted by Gorka Urbizu Garmendia
‼️YouTube caves to Trump and deletes more than 700 videos of human rights violations in Palestine.

📺As the platform admitted to ‘The Intercept’, the removal of the accounts and videos is a direct consequence of the sanctions imposed by the US.
elsal.to/44908
YouTube caves to Trump and deletes more than 700 videos of human rights violations in Palestine
Under pressure from the Trump Administration, the platform removes the accounts of three Palestinian human rights groups and, with them, hundreds of images documenting abuses by Isr...
elsal.to
November 7, 2025 at 10:13 AM
Reposted by Gorka Urbizu Garmendia
I thought people were supposed to pay AI companies for their service, not have AI companies pay others to use them?
Snap announces a deal to distribute Perplexity's search engine to Snapchat users; Perplexity will pay Snap $400M through a combination of cash and equity (Bloomberg)

November 6, 2025 at 1:20 PM
Reposted by Gorka Urbizu Garmendia
November 5, 2025 at 10:37 AM
Reposted by Gorka Urbizu Garmendia
🔴 EXCLUSIVE
The Spanish government opens 2 migrant prisons in Mauritania. The construction work was carried out by the cooperation agency FIAP (Min. of Foreign Affairs). Both detention centers have cribs for babies.
Via @elsaltodiario.com
🧵 THREAD ⬇️
www.elsaltodiario.com/fronteras/go...
The Spanish Government opens two migrant prisons in Mauritania
Both detention centers were built by the Spanish cooperation agency FIAP, part of the Ministry of Foreign Affairs, and set aside space and even cribs so as to also deprive of liberty mi...
www.elsaltodiario.com
November 5, 2025 at 8:47 AM
Reposted by Gorka Urbizu Garmendia
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Reposted by Gorka Urbizu Garmendia
I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!
September 25, 2025 at 3:14 PM
Reposted by Gorka Urbizu Garmendia
🎓Gorka Urbizu Garmendia @gorkaurbizu.bsky.social, researcher at #OraiNLP, will defend his doctoral thesis today at @ehu.eus

👏👏Good luck, Gorka! You have done excellent work!

ℹ️Thesis supervisors: Xabier Saralegi Urizar of @orainlp.bsky.social and Aitor Soroa Etxabe of @hitz-zentroa.bsky.social

#AA #LLM
October 20, 2025 at 8:18 AM
I'm defending my PhD thesis on Monday...🫠
October 17, 2025 at 6:13 PM
Reposted by Gorka Urbizu Garmendia
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data

We extend this effort to 45 new languages!
October 15, 2025 at 10:53 AM