@silingao.bsky.social
Thanks to my internship advisors Emmanuel Abbe and Samy Bengio at Apple, and my PhD advisor @abosselut.bsky.social at @icepfl.bsky.social for supervising this project!
Paper: arxiv.org/abs/2506.07751
June 23, 2025 at 2:32 PM
On perturbation benchmarks of grade-school mathematics (GSM-Symbolic & GSM-Plus), AbstRaL nearly eliminates the performance drop caused by varying the input numbers, and also significantly mitigates the interference from distracting conditions added to the perturbed test samples.
June 23, 2025 at 2:32 PM
Results across various seed LLMs, including the Mathstral, Llama3 and Qwen2.5 series, consistently show that AbstRaL reliably improves reasoning robustness, especially to shifts in the input conditions of existing test samples that may have been leaked through data contamination.
June 23, 2025 at 2:32 PM
To address the weaknesses of in-context learning and supervised fine-tuning, AbstRaL uses reinforcement learning (RL) with a new set of rewards that closely guide how the model constructs the abstraction in its generation, effectively improving the faithfulness of its abstract reasoning.
June 23, 2025 at 2:32 PM
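To make the reward design concrete, here is a minimal Python sketch of what such abstraction-guiding rewards could look like; the answer tags, weights, and helper names are illustrative assumptions, not the paper's exact formulation:

```python
import re

def format_reward(generation: str) -> float:
    """1.0 if the output contains a symbolic answer expression in <answer> tags."""
    return 1.0 if re.search(r"<answer>.*?[A-Za-z_].*?</answer>", generation) else 0.0

def correctness_reward(generation: str, bindings: dict, gold: float) -> float:
    """Re-instantiate the abstract answer with the original input values
    and check that it evaluates to the gold answer."""
    match = re.search(r"<answer>(.*?)</answer>", generation)
    if not match:
        return 0.0
    try:
        value = eval(match.group(1), {"__builtins__": {}}, dict(bindings))
        return 1.0 if abs(value - gold) < 1e-6 else 0.0
    except Exception:
        return 0.0

def total_reward(generation, bindings, gold, w_fmt=0.2, w_ans=0.8):
    """Weighted combination used as the RL training signal."""
    return w_fmt * format_reward(generation) + w_ans * correctness_reward(generation, bindings, gold)

# Example: an abstraction that adds the two input quantities.
print(total_reward("... <answer>x0 + x1</answer>", {"x0": 3, "x1": 5}, 8.0))  # 1.0
```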
AbstRaL adopts a granularly-decomposed abstract reasoning (GranulAR) schema, which enables LLMs to gradually construct the problem abstraction within a fine-grained reasoning chain, building on their pre-learned chain-of-thought and Socratic problem-decomposition strategies.
June 23, 2025 at 2:32 PM
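As a toy illustration of what a granularly-decomposed abstract reasoning chain might look like (the format below is an assumption based on this description, not the paper's exact schema):

```python
# A GSM-style problem with its input numbers abstracted into symbols.
problem = "Lily has x0 apples and buys x1 bags of x2 apples each. How many apples does she have?"

# A granular chain interleaving Socratic sub-questions with symbolic steps.
granular_chain = [
    ("How many apples does Lily start with?", "x0"),
    ("How many apples are in the new bags?", "x1 * x2"),
    ("How many apples does she have in total?", "x0 + x1 * x2"),
]
answer_expr = granular_chain[-1][1]

# Re-instantiating the abstraction with the original values recovers the
# concrete answer, and any perturbed instantiation is handled the same way.
print(eval(answer_expr, {"x0": 3, "x1": 2, "x2": 5}))  # 13
```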
Instead of expensively creating more synthetic data to “instantiate” variations of problems, our approach learns to “abstract” reasoning problems. This not only helps counteract distribution shifts but also facilitates the connection to symbolic tools for deriving solutions.
June 23, 2025 at 2:32 PM
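For instance, once a problem is abstracted into symbolic form, a computer algebra system can derive and re-instantiate the solution; a minimal sketch with SymPy follows (the example relation is illustrative):

```python
import sympy as sp

# Symbols for the abstracted inputs and the unknown answer.
x0, x1, ans = sp.symbols("x0 x1 ans")

# Abstract relation extracted from the reasoning chain, e.g. ans = x0 * x1.
solution = sp.solve(sp.Eq(ans, x0 * x1), ans)[0]

# Any numeric instantiation now yields the answer without re-running
# the natural-language reasoning chain.
print(solution.subs({x0: 12, x1: 4}))  # 48
```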
Thanks to my advisor @abosselut.bsky.social for supervising this project, and collaborators Li Mi, @smamooler.bsky.social, @smontariol.bsky.social, Sheryl Mathew and Sony for their support!
Paper: arxiv.org/abs/2503.20871
Project Page: silin159.github.io/Vina-Bench/
EPFL NLP Lab: nlp.epfl.ch
April 1, 2025 at 9:09 AM
Our study of several test cases shows that visual narratives generated by VLMs still suffer from obvious inconsistency flaws, even when augmented with knowledge constraints, calling for future work on more robust visual narrative generators.
April 1, 2025 at 9:09 AM
We also find that how well the knowledge constraints align with the input textual narrative correlates positively with how well the output visual narrative does, highlighting the value of planning intermediate constraints to promote faithful visual narrative generation.
April 1, 2025 at 9:09 AM
Our human evaluation (on five typical aspects) supports the results of our automatic evaluation. Moreover, compared to traditional metrics based on CLIP similarity (CLIP-I and CLIP-T), our proposed alignment and consistency metrics correlate better with human judgments.
April 1, 2025 at 9:09 AM
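For reference, here is a minimal sketch of the traditional CLIP-T baseline mentioned above, which scores image-caption similarity with an off-the-shelf CLIP (the checkpoint choice is an assumption); CLIP-I analogously compares embeddings of adjacent panel images:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_t(image, caption: str) -> float:
    """Cosine similarity between one panel image and its caption."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```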
Our evaluation of three VLMs shows that learning with VinaBench constraints consistently improves visual narrative consistency and alignment with the input text. However, the generated visual narratives still fall behind the gold references, leaving substantial room for improvement.
April 1, 2025 at 9:09 AM
Based on VinaBench constraints, we propose VQA-based metrics that closely evaluate the consistency of visual narratives and their alignment with the input text. Our metrics avoid skewing evaluation toward irrelevant details in the gold references, and explicitly check for inconsistency flaws.
April 1, 2025 at 9:09 AM
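In general form, a constraint-grounded VQA metric can be sketched as below; the scoring scheme and names are assumptions about the recipe, not VinaBench's exact implementation:

```python
from typing import Callable, List

def vqa_constraint_score(
    images: List["PIL.Image.Image"],
    questions_per_image: List[List[str]],
    vqa_model: Callable,
) -> float:
    """Fraction of constraint-derived yes/no questions that the VQA model
    answers 'yes' for the corresponding generated image."""
    total, satisfied = 0, 0
    for image, questions in zip(images, questions_per_image):
        for question in questions:
            total += 1
            if vqa_model(image, question).strip().lower().startswith("yes"):
                satisfied += 1
    return satisfied / max(total, 1)
```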
We prompt a hybrid of VLMs and LLMs to annotate the VinaBench knowledge constraints. Our expert study verifies that these annotations are reliable, with high acceptance rates for all types of constraint labels and a fairly low rate of disagreement between experts.
April 1, 2025 at 9:09 AM
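A rough sketch of such a hybrid annotation pipeline (prompts and model wrappers are placeholders, not the released code):

```python
from typing import Callable, List

def annotate_constraints(panels: List, captions: List[str],
                         vlm: Callable, llm: Callable) -> str:
    """vlm(image, prompt) -> str and llm(prompt) -> str are assumed wrappers
    around any captioning VLM and instruction-following LLM."""
    # Step 1: a VLM grounds each panel in a textual description.
    descriptions = [vlm(img, "Describe the characters, setting and actions.") for img in panels]
    # Step 2: an LLM distills constraints from the paired text and descriptions.
    prompt = (
        "Given a story and per-panel descriptions, list (1) static and dynamic "
        "discourse features and (2) entity links between textual mentions and "
        "their visual appearances.\n"
        f"Story: {captions}\nPanels: {descriptions}"
    )
    return llm(prompt)
```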
VinaBench augments existing visual-textual narrative pairs with discourse and commonsense knowledge constraints. The former traces static and dynamic features of the narrative process, while the latter consists of entity links that bridge the gap between how entities are mentioned in text and shown in images.
April 1, 2025 at 9:09 AM
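One plausible way to represent these two constraint types in code (field names are illustrative assumptions, not VinaBench's released schema):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DiscourseConstraints:
    # Static features that should persist across panels, e.g. a character's look.
    static: Dict[str, str] = field(default_factory=dict)   # {"Mia": "red coat, short hair"}
    # Dynamic features that evolve with the story, e.g. scene and time of day.
    dynamic: List[str] = field(default_factory=list)       # ["kitchen, morning", ...]

@dataclass
class CommonsenseLinks:
    # Entity links bridging textual mentions and their visual manifestations.
    entity_links: Dict[str, str] = field(default_factory=dict)  # {"the doctor": "woman in a white coat"}

@dataclass
class VinaSample:
    narrative_text: List[str]  # one caption per panel
    discourse: DiscourseConstraints
    commonsense: CommonsenseLinks
```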