Kaiser Sun
@kaiserwholearns.bsky.social
Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS. they/them 🏳️‍🌈 #NLProc #NLP Crossposting on X.
Congrats and welcome to the DMV area!!!
June 17, 2025 at 2:45 AM
🛠️ Interested in how your LLM behaves under knowledge conflict? We released the code to generate diagnostic data for your own LLM.
@mdredze @loadingfan
8/8
June 16, 2025 at 12:02 PM
🔗 Takeaways for practitioners
1. Check for knowledge conflict before prompting (a minimal check is sketched after this post).
2. Provide explicit explanations to guide the model toward following the context.
3. Monitor hallucinations even when context is supplied.
7/8
June 16, 2025 at 12:02 PM
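A minimal sketch of takeaway #1, assuming a generic `ask_model` wrapper around whatever LLM API you use; the helper names and the exact-match comparison are illustrative, not the released code.

```python
def ask_model(prompt: str) -> str:
    """Placeholder for your LLM call (chat API, local model, etc.)."""
    raise NotImplementedError("plug in your model client here")

def normalize(ans: str) -> str:
    # Crude normalization; swap in token-F1 or an NLI check for real use.
    return " ".join(ans.lower().strip().rstrip(".").split())

def has_knowledge_conflict(question: str, context_answer: str) -> bool:
    """Ask the model closed-book and compare against the answer the
    retrieved context supports; a mismatch signals parametric conflict."""
    closed_book = ask_model(f"Answer concisely: {question}")
    return normalize(closed_book) != normalize(context_answer)
```

If a conflict is flagged, takeaways #2 and #3 kick in: add explicit guidance to the prompt and watch the output for leakage from memory.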
📏 Implications:
⚡ When using an LLM as a judge, its parametric knowledge could lead to incorrect judgments :(
⚡ Retrieval systems need mechanisms to detect and resolve contradictions, not just shove text into the prompt (a toy check is sketched below). 6/8
June 16, 2025 at 12:02 PM
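A toy version of that contradiction check, using an off-the-shelf NLI model via Hugging Face `transformers`; the model choice and threshold are my assumptions, not something the thread prescribes.

```python
from transformers import pipeline

# roberta-large-mnli labels pairs as CONTRADICTION / NEUTRAL / ENTAILMENT.
nli = pipeline("text-classification", model="roberta-large-mnli")

def contradicts(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """True if the NLI model confidently labels the pair a contradiction,
    e.g., a retrieved passage vs. the model's closed-book answer."""
    out = nli({"text": premise, "text_pair": hypothesis})[0]
    return out["label"] == "CONTRADICTION" and out["score"] >= threshold

# contradicts("The tower is 300 m tall.", "The tower is 50 m tall.")  # likely True
```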
🧠 Key finding #3:
“Just give them more explanation?” Providing rationales helps—it pushes models to lean more on the context—but it still can’t fully silence the stubborn parametric knowledge. 5/8
June 16, 2025 at 12:02 PM
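One way to operationalize the rationale idea from this finding: prepend an explicit reason for trusting the document. The wording below is illustrative; the paper's actual templates may differ.

```python
RATIONALE_TEMPLATE = """\
The document below is the authoritative source for this task. If it
disagrees with anything you remember, follow the document and justify
your answer using only the document.

Document:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    # Rationale-augmented prompt; pair with a leakage check on the output.
    return RATIONALE_TEMPLATE.format(context=context, question=question)
```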
⚖️ Key finding #2:
Unsurprisingly, LLMs prefer their own memories. Even when we explicitly instruct them to rely on the provided document, traces of the “wrong” internal belief keep leaking into answers. 4/8
June 16, 2025 at 12:02 PM
⚠️ Key finding #1:
If the task doesn’t require external knowledge (e.g., pure copy), conflict barely matters. However, as soon as knowledge is needed, accuracy tanks when context and memory disagree.
3/8
June 16, 2025 at 12:02 PM
🛠️ We create diagnostic data that…
- Agrees or conflicts with the model's knowledge
- Contradicts it at varying levels of plausibility (a toy construction follows below)
- Covers tasks requiring different levels of knowledge
2/8
June 16, 2025 at 12:02 PM
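A toy illustration of that construction, with hypothetical facts and entity lists (the actual generator is the released code mentioned in post 8/8): conflicts are injected by swapping the object entity, where same-type swaps are plausible and type-violating swaps are implausible.

```python
import random

FACT = ("Marie Curie", "won the Nobel Prize in", "Physics")
SAME_TYPE = ["Chemistry", "Medicine", "Literature"]  # plausible alternatives

def make_context(conflict=None):
    """Build a context that agrees with the fact, or contradicts it at a
    chosen plausibility level."""
    subj, rel, obj = FACT
    if conflict == "plausible":      # same-type swap: hard to spot
        obj = random.choice(SAME_TYPE)
    elif conflict == "implausible":  # type-violating swap: easy to spot
        obj = "Basketball"
    return f"{subj} {rel} {obj}."

# Tasks then vary in how much knowledge they demand, e.g.:
#   copy: "Repeat the document verbatim."    (no external knowledge)
#   QA:   "Which prize did Marie Curie win?" (knowledge required)
```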
aclanthology.org
May 6, 2025 at 11:27 PM
It was quite encouraging to find that many friends share my concern about "minor details" obstructing us from drawing reliable conclusions. I really hope we can all provide well-documented experimental details and value the so-called "engineering contributions" more.
May 6, 2025 at 11:25 PM
Reposted by Kaiser Sun
with reasonable freedom, depending on the scale/focus of the business.

Case in point, we are looking to expand the research/foundation models team at Orby AI and are looking for highly motivated researchers and ML/Research engineers. Please reach out if you're interested in learning more!
/fin
January 8, 2025 at 7:39 PM
Agree. OTOH it might be helpful as a way to receive reports and doubts. One user reported that the authors of a paper I was reviewing violated the anonymity policy by posting their submission in public.
November 20, 2024 at 10:48 PM
🙋
November 20, 2024 at 6:57 AM