nlpdaily.bsky.social
rationales from the I-->OR models. Finally, they also run a human evaluation. Read more about their results here: aclanthology.org/2022.emnlp-m...
February 4, 2025 at 9:21 PM
T5 on e-SNLI and FLUTE. To evaluate performance on their dataset, they report the average of BERTScore and BLEURT. They also measure rationale quality: they first train IR-->O and I-->O models on e-SNLI and FLUTE, then compute the test accuracy of both models using predicted
⬇️
February 4, 2025 at 9:20 PM
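The two evaluation steps described in the thread can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' code: BERTScore and BLEURT values and model label predictions are taken as precomputed lists (no real metric library is called), the combined score averages the two metrics' corpus-level means, and rationale quality is reported as IR-->O accuracy minus I-->O accuracy (a common definition; the thread itself only says both accuracies are computed). All function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def combined_automatic_score(bertscores: Sequence[float], bleurt_scores: Sequence[float]) -> float:
    """Average of the mean BERTScore and the mean BLEURT score.

    Assumption: both lists hold one precomputed score per test example.
    """
    if len(bertscores) != len(bleurt_scores) or not bertscores:
        raise ValueError("expected two non-empty score lists of equal length")
    return (mean(bertscores) + mean(bleurt_scores)) / 2


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("expected non-empty, equal-length label lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def rationale_quality(ir_to_o_preds: Sequence[str], i_to_o_preds: Sequence[str], gold: Sequence[str]) -> float:
    """Assumed definition: accuracy gain from giving the IR-->O model the
    predicted rationale, relative to the I-->O model that sees input only."""
    return accuracy(ir_to_o_preds, gold) - accuracy(i_to_o_preds, gold)


if __name__ == "__main__":
    # Hypothetical toy labels for four test examples.
    gold = ["entailment", "contradiction", "entailment", "neutral"]
    ir_to_o = ["entailment", "contradiction", "entailment", "entailment"]  # 3/4 correct
    i_to_o = ["entailment", "neutral", "neutral", "neutral"]  # 2/4 correct
    print(rationale_quality(ir_to_o, i_to_o, gold))
    print(combined_automatic_score([0.9, 0.8], [0.5, 0.6]))
```

A positive rationale-quality value means the predicted rationales help a model recover the correct label beyond what the input alone provides; a value near zero suggests the rationales add little. Note that BLEURT is not bounded to the same range as BERTScore, so a plain average mixes scales.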