Ike
@ikeo.bsky.social
PhD Candidate @Purdue University. Research at the intersection of Human-AI Interaction, HRI, and Computational Human Values.
Read our paper here: arxiv.org/pdf/2411.11937. The dataset can be found here: github.com/hv-rsrch/val...
December 16, 2024 at 1:28 PM
3) Our analysis also uncovered instances where unethical responses were selected as the preferred option within the pairwise data used to train reward models. See our paper for other insights. We are very grateful to everyone who dropped by for the great questions and discussions. Let's stay connected.
December 16, 2024 at 1:26 PM
2) Our transformer-based model achieved 80% accuracy in predicting the human values embedded in RLHF preferences. This allows researchers and AI practitioners to easily adopt our framework into their LLM and RLHF pipelines to better understand the distribution of values in their RLHF datasets.
December 16, 2024 at 1:21 PM
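The post above describes plugging a value-prediction model into an RLHF workflow. Below is a minimal sketch of what that could look like, assuming a Hugging Face text-classification model; the model name, the toy preference pair, and the aggregation logic are illustrative placeholders and not the authors' released code (see the linked repo for that).

```python
# Sketch only: scoring RLHF preference pairs with a generic value classifier.
# "PLACEHOLDER/value-classifier" is a hypothetical model id -- substitute the
# classifier from the linked repository.
from collections import Counter
from transformers import pipeline

value_clf = pipeline("text-classification", model="PLACEHOLDER/value-classifier")

# Toy preference data in the usual (chosen, rejected) RLHF format.
preference_pairs = [
    {"chosen": "Here is a summary of the article you asked about...",
     "rejected": "I won't help with that."},
]

# Tally the predicted value label for each side of every pair.
value_counts = Counter()
for pair in preference_pairs:
    for side in ("chosen", "rejected"):
        pred = value_clf(pair[side])[0]  # e.g. {"label": "...", "score": ...}
        value_counts[(side, pred["label"])] += 1

# Inspect how predicted values are distributed across chosen vs. rejected responses.
for (side, label), count in value_counts.most_common():
    print(f"{side:8s} {label:25s} {count}")
```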
1) Information-utility values (information seeking, wisdom/knowledge) were the most dominant human values in the preference examples. In contrast, prosocial values (animal rights, tolerance, etc.) were significantly underrepresented, revealing an imbalance in the values encoded into LLMs.
December 16, 2024 at 1:19 PM