P. Razavi
p-razavi.bsky.social
Dad. Psych PhD. Research Scientist (Psychometrics, measurement).

Blog: Medium.com/@pooyar
If you're interested in learning more and plan to attend the #NCME conference in Denver next week, we’d love to see you at our coordinated paper session, “Approaches to Optimizing a Personalized Learning System,” on Friday, April 25, from 11:30 AM to 1:00 PM. (🧵9/9)
arxiv.org/abs/2504.08804
Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms
Estimating item difficulty through field-testing is often resource-intensive and time-consuming. As such, there is strong motivation to develop methods that can predict item difficulty at scale using ...
April 17, 2025 at 2:35 AM
We are excited about the potential of these methods to support more efficient item development in education. In the preprint, we provide a seven-step workflow for testing professionals who want to implement a similar item difficulty estimation approach with their own item pools. (🧵8/9)
April 17, 2025 at 2:35 AM
The feature-based approach presumably benefits from the language model’s extraction of multiple cognitive and linguistic dimensions that an ensemble tree-based algorithm then “learns” to weight in ways that maximize prediction accuracy. (🧵7/9)
April 17, 2025 at 2:35 AM
The modest performance of direct LLM estimates in some instances, and the more robust performance of feature-based methods, hint that LLMs can add value, but that this value is maximized when the model is “nudged” or structured via psychometric frameworks. (🧵6/9)
April 17, 2025 at 2:34 AM
The results are promising, especially for the feature-based approach, which performed considerably better than both the dummy regressor benchmarks and the direct estimation approach. (🧵5/9)
April 17, 2025 at 2:34 AM
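One way to read "better than the dummy regressor benchmarks": a dummy regressor ignores item content and predicts the same value (typically the training-set mean) for every item, so a content-based model only adds value if its error is lower. The sketch below assumes a mean-predicting baseline and RMSE as the metric; the preprint's exact benchmark strategy and metrics may differ, and the numbers in the usage lines are made up. The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted difficulties."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def dummy_mean_predictions(y_train, n_test):
    """Content-blind baseline: every test item gets the training-set mean."""
    return [mean(y_train)] * n_test


def compare_to_dummy(y_train, y_test, model_pred):
    """Report model RMSE next to the dummy baseline and the relative reduction."""
    dummy = rmse(y_test, dummy_mean_predictions(y_train, len(y_test)))
    model = rmse(y_test, model_pred)
    return {"dummy_rmse": dummy, "model_rmse": model, "rmse_reduction": 1 - model / dummy}


# Made-up difficulties, purely to show the call.
y_train = [0.0, 1.0, 2.0]
y_test = [0.0, 2.0]
print(compare_to_dummy(y_train, y_test, model_pred=[0.5, 1.5]))
```

scikit-learn's `DummyRegressor` (default `strategy="mean"`) plays the same role as `dummy_mean_predictions` if you are already working in that library.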
In the second approach, we use the LLM to extract cognitive and linguistic features from each item. We then train tree-based machine learning models (i.e., random forest and gradient boosting machines) to estimate item difficulty from those features. (🧵4/9)
April 17, 2025 at 2:34 AM
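A minimal sketch of the feature-based idea, assuming the LLM has already returned numeric features for each item. This is a from-scratch least-squares gradient boosting model built from depth-1 trees (stumps), standing in for the random forest and gradient boosting machines named above; the preprint's libraries, hyperparameters, and features are not given in this thread. The feature names (`num_steps`, `vocab_level`, `word_count`), the toy data, and all helper names are hypothetical. Summing each split's error reduction per feature gives a rough view of how the ensemble "learns" to weight the extracted features.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical stand-ins for LLM-extracted item features (not the preprint's).
FEATURES = ["num_steps", "vocab_level", "word_count"]


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one feature."""
    feature: int
    threshold: float
    left_value: float   # used when x[feature] <= threshold
    right_value: float  # used when x[feature] > threshold
    gain: float         # squared-error reduction achieved by this split

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, residuals):
    """Choose the feature/threshold split that best fits the current residuals."""
    total = sse(residuals)
    best = Stump(0, float("inf"), mean(residuals), mean(residuals), 0.0)  # no-split fallback
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            gain = total - sse(left) - sse(right)
            if gain > best.gain:
                best = Stump(j, t, mean(left), mean(right), gain)
    return best


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting: each new stump is fit to the residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(row) for p, row in zip(preds, X)]
    return {"base": base, "lr": learning_rate, "stumps": stumps}


def predict(model, x):
    return model["base"] + model["lr"] * sum(s.predict(x) for s in model["stumps"])


def importances(model):
    """Total split gain per feature: a rough view of how features are weighted."""
    totals = {name: 0.0 for name in FEATURES}
    for s in model["stumps"]:
        if s.gain > 0:
            totals[FEATURES[s.feature]] += s.gain
    return totals


# Toy data invented for illustration: difficulty depends only on num_steps.
X = [
    [1, 3, 40], [1, 1, 25], [2, 2, 50], [2, 4, 30],
    [3, 1, 45], [3, 3, 20], [4, 2, 35], [4, 4, 55],
]
y = [-1.2, -1.2, -0.4, -0.4, 0.4, 0.4, 1.2, 1.2]

model = fit_gbm(X, y)
print(round(predict(model, [1, 2, 30]), 2), round(predict(model, [4, 3, 40]), 2))
print(importances(model))
```

In practice you would likely use a library implementation such as scikit-learn's `GradientBoostingRegressor` or `RandomForestRegressor` (both in `sklearn.ensemble`), which grow deeper trees and expose a `feature_importances_` attribute. A random forest differs from this sketch by averaging many deep trees fit to bootstrap samples rather than adding shallow trees one after another.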
In the first approach, we use a direct estimation method that prompts the LLM to assign a single difficulty rating to each item based on qualitatively informed criteria. (🧵3/9)
April 17, 2025 at 2:34 AM
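A minimal sketch of the direct-estimation step, assuming a 1 to 5 rating scale and a rubric written here for illustration (the preprint's qualitatively informed criteria are not reproduced in this thread). It only builds the prompt and validates the reply; the LLM call itself is left to whatever client you use, since no specific API is named here. `RUBRIC`, `build_direct_prompt`, and `parse_rating` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical rubric for illustration; not the criteria used in the preprint.
RUBRIC = (
    "Rate the difficulty of the following K-5 assessment item on a 1-5 scale "
    "(1 = very easy, 3 = moderate, 5 = very hard). Consider the number of "
    "solution steps, vocabulary demands, and reading load. "
    "Reply with a single integer only."
)


def build_direct_prompt(item_text: str, grade: int) -> str:
    """Assemble one prompt asking the LLM for a single difficulty rating."""
    return f"{RUBRIC}\n\nGrade: {grade}\nItem:\n{item_text}\n\nRating:"


def parse_rating(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Take the first integer in the reply; reject anything outside [low, high]."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if low <= value <= high else None


prompt = build_direct_prompt("Maya has 3 apples and buys 4 more. How many now?", grade=1)
# The prompt would be sent to an LLM here; its text reply is then parsed:
print(parse_rating("4"), parse_rating("Rating: 7"), parse_rating("unsure"))
```

Returning `None` for unparseable or out-of-range replies keeps bad generations out of the difficulty estimates instead of silently coercing them.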
Field-testing assessment items to estimate difficulty can be both costly and time-consuming. In this research, we evaluate two LLM-based approaches to predict item difficulty for K-5 mathematics and reading assessments based on item content. (🧵2/9)
April 17, 2025 at 2:34 AM
February 28, 2025 at 10:05 PM
Congrats 👏🏽🎉. Very well-deserved! 😊
February 28, 2025 at 4:59 PM