Dylan Sam
dsam99.bsky.social
Machine Learning PhD Student at CMU | Student Researcher at Google | dsam99.github.io
A very interesting paper with insights into understanding when and why synthetic data (although imperfect and biased) can boost the performance of statistical inference!! 📈
💡Can we trust synthetic data for statistical inference?

We show that synthetic data (e.g., LLM simulations) can significantly improve the performance of inference tasks. The key intuition lies in the interactions between the moment residuals of synthetic data and those of real data.
October 10, 2025 at 5:44 PM
Reposted by Dylan Sam
LLM self-improvement has critical implications for synthetic data, post-training, and test-time inference. To understand LLMs' true capability for self-improvement, we perform large-scale experiments with multiple families of LLMs, tasks, and mechanisms. Here is what we found: (1/9)
December 6, 2024 at 6:02 PM