Erik Arakelyan
@kirekara.bsky.social
Researcher @Nvidia | PhD from @CopeNLU | Formerly doing magic at @Amazon Alexa AI and @ARM. ML MSc graduate from @UCL. Research is the name of the game. ᓚᘏᗢ

http://osoblanco.github.io
By comparing relations in code with those in search traces, we measure emergent hallucinations and unused relations, highlighting areas of sub-optimal reasoning. We also assess the uniqueness of emergent facts per inference hop, indicating the extent of problem-space exploration.
November 8, 2024 at 2:19 PM
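A minimal sketch of the relation comparison described in the post above: relations that appear in the search trace but were never defined in the code are flagged as hallucinated, and defined relations that never appear in the trace are flagged as unused. The relation names here are hypothetical; in FLARE they would be parsed from the generated Prolog program and the simulated search trace.

```python
def compare_relations(code_relations: set[str], trace_relations: set[str]):
    """Contrast relations defined in the code with those used in the search trace."""
    hallucinated = trace_relations - code_relations  # used in search, never defined
    unused = code_relations - trace_relations        # defined, never explored
    return hallucinated, unused

# Hypothetical example: relations extracted from a program and its trace.
code = {"parent", "grandparent", "sibling"}
trace = {"parent", "grandparent", "cousin"}

hallucinated, unused = compare_relations(code, trace)
print(sorted(hallucinated))  # ['cousin']  -> emergent hallucination
print(sorted(unused))        # ['sibling'] -> unused relation
```

Simple set differences suffice here because only the *identity* of a relation matters for this diagnostic, not how often it fires.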
We find a strong correlation between the faithfulness of the search with respect to the code and model performance, across all of the models.
November 8, 2024 at 2:18 PM
Using FLARE also allows evaluating the faithfulness of the completed search w.r.t. the defined facts, relations, and search logic (taken from Prolog). We simply compare (via ROUGE-Lsum) the simulated search with the actual code execution when available.
November 8, 2024 at 2:17 PM
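The comparison above can be illustrated with a simplified token-level ROUGE-L F1 over the two traces (the paper uses ROUGE-Lsum; in practice one would use a library such as rouge-score). The traces below are hypothetical examples, not taken from the paper.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(simulated: str, reference: str) -> float:
    """Token-level ROUGE-L F1 between a simulated and an executed trace."""
    sim, ref = simulated.split(), reference.split()
    lcs = lcs_len(sim, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(sim), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical traces: the model's simulated search vs. the real Prolog run.
sim_trace = "call parent(tom, X) X = bob call parent(bob, Y) Y = ann"
exec_trace = "call parent(tom, X) X = bob call parent(bob, Y) fail"
print(round(rouge_l_f1(sim_trace, exec_trace), 3))  # 0.818
```

A score near 1.0 means the simulated search closely mirrors the real execution; divergence signals unfaithful reasoning steps.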
The method boosts the performance of various LLMs across scales (8B -> 100B+) compared to CoT and Faithful CoT on a range of mathematical, multi-hop, and relation-inference tasks.
November 8, 2024 at 2:16 PM
The LLM formalizes the task in Prolog as facts, relations, and search logic, then simulates an exhaustive search by iteratively exploring the problem space with backtracking.
November 8, 2024 at 2:15 PM
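A toy illustration of the idea (not the paper's implementation): facts and a relation are written down Prolog-style, and an exhaustive backtracking search enumerates every binding while logging each hop. The family-tree facts are hypothetical.

```python
# Prolog-style facts: parent(tom, bob). parent(bob, ann). parent(bob, liz).
FACTS = {("parent", "tom", "bob"), ("parent", "bob", "ann"),
         ("parent", "bob", "liz")}

def parent(x, y, log):
    """Try every parent/2 fact; yield bindings consistent with x and y (None = unbound)."""
    for rel, a, b in sorted(FACTS):
        if rel != "parent":
            continue
        if (x is None or x == a) and (y is None or y == b):
            log.append(f"parent({a}, {b})")
            yield a, b
    log.append("backtrack")  # this choice point is exhausted

def grandparent(x, z, log):
    # grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    for a, y in parent(x, None, log):
        for _, c in parent(y, z, log):
            yield a, c

log = []
answers = list(grandparent("tom", None, log))
print(answers)  # [('tom', 'ann'), ('tom', 'liz')]
```

Generators make the backtracking explicit: each `yield` is a choice point, and exhausting a generator corresponds to Prolog backtracking to the previous goal, so the full problem space is explored rather than a single greedy path.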
👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE💫uses Logic Prog with Exhaustive Simulated Search to achieve this.🧵
@pminervini.bsky.social, Patrick Lewis, Pat Verga and @iaugenstein.bsky.social

arxiv.org/abs/2410.11900
November 8, 2024 at 2:13 PM