Christina
christinabaek.bsky.social
Christina
@christinabaek.bsky.social
PhD at CMU / robust ML
I’m imagining a simpler setup where words are each a single token long and examples are each a random list of 15 words. If pretrained models already encode the notion of offensive, I bet one iteration of DPO with the right hyperparameter can solve this task.
November 22, 2024 at 7:41 PM