🔹 We formalize input quality in reading as mutual information.
🔹 We link it to measurable human behavior.
🔹 We show multimodal LLMs can model this effect quantitatively.
Bottom-up information matters — and now we can measure how much it matters.
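For readers who want the formal core, a minimal sketch of the quantity involved (notation ours, not necessarily the paper's exact symbols): input quality is the mutual information between the visual signal V and the word identity W.

```latex
% Minimal sketch (notation ours): input quality as the mutual information
% between the visual signal V and the word identity W.
I(W; V) = H(W) - H(W \mid V)
```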
Using fine-tuned Qwen2.5-VL and TransOCR, we estimated the MI between images and word identity.
MI systematically drops: Full > Upper > Lower — perfectly mirroring human reading patterns! 🤯
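For the curious, a minimal sketch of how a plug-in MI estimate of this kind could be computed from a model's posterior over word identities; the function names and the averaging scheme are our assumptions for illustration, not the paper's exact pipeline.

```python
import math

def estimate_mi(examples, model_posterior, prior):
    """Plug-in estimate of I(W; V) = H(W) - H(W | V).

    examples        : list of (image, true_word) pairs from one condition
                      (full / upper-half / lower-half visible)
    model_posterior : function(image) -> dict word -> P(word | image),
                      e.g. from a fine-tuned VLM / OCR model (hypothetical API)
    prior           : dict word -> P(word), marginal over word identities
    """
    # Marginal entropy H(W) from the prior over candidate words
    h_w = -sum(p * math.log2(p) for p in prior.values() if p > 0)

    # Conditional entropy H(W | V), averaged over images
    h_w_given_v = 0.0
    for image, _ in examples:
        posterior = model_posterior(image)
        h_w_given_v -= sum(p * math.log2(p) for p in posterior.values() if p > 0)
    h_w_given_v /= len(examples)

    return h_w - h_w_given_v  # higher MI = more informative visual input
```

Comparing this estimate across the full, upper-half, and lower-half conditions is what would produce the Full > Upper > Lower ordering.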
Reading times show a clear pattern:
Full visible < Upper visible < Lower visible in both English & Chinese.
👉 Upper halves are more informative (and easier to read).
Higher mutual information → fewer samples → faster reading.
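One illustrative way to see the link (a back-of-envelope simplification of ours, not the paper's derivation): if each sample of the visual input delivers about I(W; V) bits and roughly H(W) bits are needed to settle on the word, then

```latex
% Back-of-envelope sketch (our simplification, not the paper's derivation)
\mathbb{E}[\text{samples}] \approx \frac{H(W)}{I(W; V)},
\qquad \text{reading time} \propto \mathbb{E}[\text{samples}]
```

so as I(W; V) rises, the expected number of samples, and with it reading time, falls.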
To test the theory, we created a reading experiment using the MoTR (Mouse-Tracking-for-Reading) paradigm 🖱️📖
We ran the study in both English and Chinese.
When bottom-up input is noisy (e.g., words are partially occluded), comprehension becomes harder and slower.