Dustin Wright
@dustinbwright.com
Postdoc @ University of Copenhagen (CopeNLU) | Making the world's knowledge reliable and accessible w/ ML + NLP | Former UMSI, AI2, IBM Research, UCSD | https://dustinbwright.com
Oh, this is super neat! It's also nice that there's more evidence here about the negative impact of model size. I think I mentioned this at ACL, but I'm also super interested in looking at the relationships between the training data and the results we get.
October 13, 2025 at 6:38 PM
And finally, this work was done with amazing colleagues!
Sarah Masud, Jared Moore, @srishtiy.bsky.social, @mariaa.bsky.social, Peter Ebert Christensen, Chan Young Park, and @iaugenstein.bsky.social
10/10
October 13, 2025 at 11:25 AM
🛣️ The methodology can be used in future work to study epistemic diversity for arbitrary topics, downstream tasks, and real-world use cases with open-ended plain-text LLM outputs. This lets researchers answer questions about which, whose, and how much knowledge LLMs are representing.
9/10
October 13, 2025 at 11:25 AM
📏 To measure diversity, we use a statistically grounded measure commonly used in ecology to quantify species diversity, which lets us fairly compare the relative diversity of models across different settings.
8/10
October 13, 2025 at 11:25 AM
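The post doesn't name the exact measure, but one family of statistically grounded diversity measures from ecology is the Hill numbers (the "effective number of species"). As a hedged illustration only (the paper may use a different estimator), here is how a Hill number of order q could be computed over the frequencies of claim clusters:

```python
import math
from collections import Counter

def hill_diversity(cluster_labels, q=1.0):
    """Effective number of distinct claim clusters (Hill number of order q).

    q = 0 gives cluster richness, q = 1 the exponential of Shannon entropy,
    q = 2 the inverse Simpson index; higher q downweights rare clusters.
    """
    counts = Counter(cluster_labels)
    n = sum(counts.values())
    p = [c / n for c in counts.values()]
    if math.isclose(q, 1.0):
        # The q = 1 case is the limit: exp of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))
```

For perfectly even clusters the Hill number equals the number of clusters, which is what makes it a natural "effective count" for comparing models.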
🪛 Approach: we propose a new methodology that samples plain-text LLM outputs using 200 prompt variations from real chats across 155 topics, decomposes each output into individual claims, and clusters those claims based on entailment.
7/10
October 13, 2025 at 11:25 AM
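A minimal sketch of the claim-clustering step, assuming claims are grouped when they mutually entail a cluster's representative claim. The `entails` predicate stands in for an NLI model, and this greedy scheme is an illustration rather than necessarily the paper's exact procedure:

```python
def cluster_claims(claims, entails):
    """Greedily group claims: a claim joins the first cluster whose
    representative (first member) it mutually entails; otherwise it
    starts a new cluster."""
    clusters = []
    for claim in claims:
        for cluster in clusters:
            rep = cluster[0]
            if entails(claim, rep) and entails(rep, claim):
                cluster.append(claim)
                break
        else:  # no existing cluster matched
            clusters.append([claim])
    return clusters

# Toy stand-in for an NLI entailment model: case-insensitive equality.
toy_entails = lambda a, b: a.lower() == b.lower()
```

With a real NLI model in place of `toy_entails`, the number of resulting clusters is then what a diversity measure would be computed over.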
🌍 There are gaps in country-specific knowledge. When matching claims to English and local-language Wikipedia, no local language is statistically significantly more represented than English, while English-language knowledge is statistically significantly more represented for 5 of 8 countries.
6/10
October 13, 2025 at 11:25 AM
🏗️ Model size has a counterintuitive negative impact on diversity: smaller models tend to be more diverse.
🔎 RAG has a positive impact on diversity, indicating its usefulness in making LLM outputs more diverse. However, the gains from RAG are not equal across topics about different countries.
5/10
October 13, 2025 at 11:25 AM
📈 Knowledge in LLMs across 3 of 4 model families has *expanded* since 2023 ✅; however, their absolute diversity is quite low compared to a very modest traditional search baseline 👎
4/10
October 13, 2025 at 11:25 AM
👍 To assess this risk, we set out to measure the extent to which LLMs are homogeneous in terms of the *real-world claims* they generate. We perform a large study across 27 LLMs and 2 generation settings, with different model versions and sizes. In a nutshell, our findings are:
3/10
October 13, 2025 at 11:25 AM
🤔 A lot of people are using LLMs, yet their outputs are not very diverse. What does this mean for the future of knowledge? Many speculate that overreliance on LLMs will lead to "knowledge collapse", where the diversity of human knowledge is narrowed by reliance on homogeneous LLMs.
2/10
October 13, 2025 at 11:25 AM
📜 Preprint: arxiv.org/abs/2502.14409
📊 Data: huggingface.co/datasets/dwr...
💻 Code: github.com/dwright37/un...
Unstructured Evidence Attribution for Long Context Query Focused Summarization
August 25, 2025 at 11:42 AM
🦾 We demonstrate across 5 LLMs and 4 datasets that LLMs adapted with SUnsET generate more relevant and factually consistent evidence, extract evidence from more diverse locations in their context, and produce more relevant and consistent summaries than baselines.
August 25, 2025 at 11:42 AM
🔎 We show that, for existing large language models, evidence is often copied incorrectly and "lost in the middle". To help with this task, we create the Summaries with Unstructured Evidence Text dataset (☀️SUnsET☀️), a synthetic dataset for training models to perform unstructured evidence citation.
August 25, 2025 at 11:42 AM
💡 When automatically generated summaries cite supporting evidence, they normally cite evidence at a fixed granularity, e.g., individual sentences or whole documents. Our work proposes extracting spans of *any* length as more relevant and consistent evidence for long-context, query-focused summaries.
August 25, 2025 at 11:42 AM