Ike
@ikeo.bsky.social
PhD Candidate @Purdue University. Research at the intersection of Human-AI Interaction, HRI, and Computational Human Values.
Read our paper here: arxiv.org/pdf/2411.11937. The dataset can be found here: github.com/hv-rsrch/val...
December 16, 2024 at 1:28 PM
3) Our analysis also uncovered instances where unethical responses were selected as the preferred option within the pairwise data used to train reward models. See our paper for other insights. We are very grateful to everyone who dropped by for the great questions and discussions. Let's stay connected.
December 16, 2024 at 1:26 PM
2) Our transformer-based model achieved 80% accuracy in predicting the human values embedded in RLHF preferences. This allows researchers and AI practitioners to easily adopt our framework into their LLM and RLHF pipelines to better understand the distribution of values in their RLHF datasets.
December 16, 2024 at 1:21 PM
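The post above describes plugging a value-prediction model into an RLHF workflow. Below is a minimal sketch of what that could look like, assuming a Hugging Face text-classification model; the model name, the toy preference pair, and the aggregation logic are illustrative placeholders and not the authors' released code (see the linked repo for that).

```python
# Sketch only: scoring RLHF preference pairs with a generic value classifier.
# "PLACEHOLDER/value-classifier" is a hypothetical model id -- substitute the
# classifier from the linked repository.
from collections import Counter
from transformers import pipeline

value_clf = pipeline("text-classification", model="PLACEHOLDER/value-classifier")

# Toy preference data in the usual (chosen, rejected) RLHF format.
preference_pairs = [
    {"chosen": "Here is a summary of the article you asked about...",
     "rejected": "I won't help with that."},
]

# Tally the predicted value label for each side of every pair.
value_counts = Counter()
for pair in preference_pairs:
    for side in ("chosen", "rejected"):
        pred = value_clf(pair[side])[0]  # e.g. {"label": "...", "score": ...}
        value_counts[(side, pred["label"])] += 1

# Inspect how predicted values are distributed across chosen vs. rejected responses.
for (side, label), count in value_counts.most_common():
    print(f"{side:8s} {label:25s} {count}")
```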
1) Information-utility values (information seeking, wisdom/knowledge) were the most dominant human values in the preference examples. In contrast, prosocial values (animal rights, tolerance, etc.) were significantly underrepresented, revealing an imbalance in the values encoded into LLMs.
December 16, 2024 at 1:19 PM