Onur Keleş
@onurkeles.bsky.social
PhD student in Linguistics, Research Assistant at Bogazici University | Interested in the visual-gestural modality, quantitative linguistics, and natural language processing
We hope this benchmark sparks deeper collaboration between sign language linguistics and multimodal AI, highlighting signed languages as a rich testbed for visual grounding and embodiment.
October 15, 2025 at 1:45 PM
Even the best model (Gemini 2.5 Pro) identified only 17/96 signs (~18%), far below the human baseline (40/96 for hearing non-signers). Models also favor static iconic objects over dynamic iconic actions, unlike humans, revealing a key gap between visual AI and embodied cognition. ❌
October 15, 2025 at 1:45 PM
We evaluated 13 VLMs (3 closed-source). Larger models (GPT-5, Gemini 2.5 Pro, Qwen2.5-VL 72B) showed moderate correlation with human iconicity judgments and mirrored some human phonological difficulty patterns, e.g., handshape harder than location.
October 15, 2025 at 1:45 PM
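A minimal sketch of the kind of rank correlation reported above, assuming two aligned lists of per-sign ratings (the numbers below are placeholders, not benchmark data):

```python
# Correlate VLM iconicity ratings with human judgments (placeholder values).
from scipy.stats import spearmanr

human_ratings = [6.2, 3.1, 5.4, 2.0, 4.8]  # hypothetical mean human iconicity rating per sign
model_ratings = [5.0, 2.5, 4.9, 3.2, 4.1]  # hypothetical VLM rating for the same signs

rho, p = spearmanr(human_ratings, model_ratings)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```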
The benchmark has three complementary tasks (prompt sketch after this post):

1️⃣ Phonological form prediction – predicting handshape, location, etc.
2️⃣ Transparency – inferring meaning from visual form.
3️⃣ Graded iconicity – rating how much a sign looks like what it means.
October 15, 2025 at 1:45 PM
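A rough sketch of how the three tasks could be posed to a model; query_vlm is a hypothetical helper standing in for whichever VLM API is used, and the prompt wording is illustrative, not the benchmark's actual prompts:

```python
# Illustrative prompts for the three benchmark tasks (not the official wording).
PROMPTS = {
    "phonological_form": "Watch the sign video and name the handshape and location used.",
    "transparency": "Watch the sign video and guess what the sign means.",
    "graded_iconicity": "Rate from 1 to 7 how much this sign looks like what it means.",
}

def evaluate_sign(video_path, query_vlm):
    """Run all three tasks on one sign video; query_vlm is a hypothetical VLM call."""
    return {task: query_vlm(video_path, prompt) for task, prompt in PROMPTS.items()}
```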
We introduce the Visual Iconicity Challenge, a benchmark testing whether Vision–Language Models (VLMs) can recognize iconicity, i.e., the visual resemblance between form and meaning, using signs from the Sign Language of the Netherlands (NGT).
October 15, 2025 at 1:45 PM
Our attention analysis revealed that the GPT-2 models often shifted attention toward semantically plausible but syntactically incorrect noun phrases in role-reversed sentences. LLaMA-3 maintained more stable attention patterns, suggesting syntactic but less human-like processing.
May 2, 2025 at 12:45 PM
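A minimal sketch of this kind of attention inspection with Hugging Face Transformers; the checkpoint name is a stand-in (the study used Turkish GPT-2 models), and averaging heads in the last layer is just one simple way to summarize attention:

```python
# Inspect where the final token attends, averaged over heads in the last layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute a Turkish GPT-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True).eval()

inputs = tokenizer("The man bit the dog.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1][0].mean(dim=0)   # average over heads
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, weight in zip(tokens, last_layer[-1].tolist()):
    print(f"{tok:>10s}  {weight:.3f}")
```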
We then tested 3 Turkish LLMs (GPT-2-Base, GPT-2-Large, LLaMA-3) on the same stimuli, measuring surprisal and attention patterns. GPT-2-Large surprisal significantly predicted human reading times at critical regions, while LLaMA-3 surprisal did not.
May 2, 2025 at 12:45 PM
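A minimal sketch of per-token surprisal from a causal LM, the measure compared against reading times above; the checkpoint name is a placeholder for the Turkish models used in the paper:

```python
# Per-token surprisal in bits: -log2 P(token_t | tokens_<t).
import math
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the Turkish GPT-2 / LLaMA checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def token_surprisals(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)   # predictions for tokens 2..n
    targets = ids[0, 1:]
    nats = -log_probs[torch.arange(targets.size(0)), targets]
    bits = (nats / math.log(2)).tolist()
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits))

for tok, s in token_surprisals("The man bit the dog."):
    print(f"{tok:>10s}  {s:5.2f} bits")
```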
Despite Turkish having explicit morphosyntactic features like accusative case marking and the agentive postposition "tarafından" (by), participants still made interpretation errors 25% of the time for implausible but grammatical sentences, confirming good-enough parsing effects.
May 2, 2025 at 12:45 PM
We conducted a self-paced reading experiment with native Turkish speakers processing sentences with reversed thematic roles (e.g., "the man bit the dog" instead of "the dog bit the man"), testing if Turkish morphosyntactic marking prevents good-enough parsing.
🔗 aclanthology.org/2025.cmcl-1....
When Men Bite Dogs: Testing Good-Enough Parsing in Turkish with Humans and Large Language Models
Onur Keleş, Nazik Dinctopal Deniz. Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics. 2025.
aclanthology.org
May 2, 2025 at 12:45 PM
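One common way to test whether model surprisal predicts reading times in such a self-paced reading study is a mixed-effects regression; a hedged sketch, assuming a data frame with per-region reading times, surprisal values, and participant identifiers (file and column names are hypothetical):

```python
# Mixed-effects model: region reading time ~ surprisal, with by-participant intercepts.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("spr_data.csv")  # hypothetical file with columns rt, surprisal, participant
m = smf.mixedlm("rt ~ surprisal", df, groups=df["participant"])
print(m.fit().summary())
```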
Thanks!
March 27, 2025 at 12:52 PM
As referents become more accessible, signs undergo phonetic and kinematic reduction (shorter duration, smaller hand movement, & narrower signing space). Native deaf signers also retell events faster than late deaf signers.

I’ll present it at HSP 2025 on March 27.

Repo: github.com/kelesonur/MA...
GitHub - kelesonur/MA_Thesis
github.com
March 11, 2025 at 5:02 AM
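A rough sketch of the kinematic reduction measures mentioned in the post above (duration, hand movement, signing space), computed from dominant-hand wrist keypoints; the input format and the exact measures used in the thesis are assumptions here:

```python
# Simple kinematic measures from a (n_frames, 2) array of wrist coordinates.
import numpy as np

def kinematic_measures(wrist_xy, fps=25.0):
    duration = len(wrist_xy) / fps                                   # sign duration (s)
    path = np.linalg.norm(np.diff(wrist_xy, axis=0), axis=1).sum()   # total hand movement
    extent = wrist_xy.max(axis=0) - wrist_xy.min(axis=0)             # width/height of space used
    return {"duration_s": duration,
            "path_length": float(path),
            "signing_space_area": float(extent.prod())}
```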