to my coauthors @CarstenEickhoff and @ABH878 from the Health NLP Lab Tübingen for their amazing mentorship!
🔗 Read the full paper: openreview.net/forum?id=sbm...
💻 Code & steering-vector datasets: github.com/JoschkaCBrau...
A hybrid approach that combines steering vectors with prompt engineering achieves the best balance between effective control and high-quality summaries at moderate steering strengths.
High steering strengths (|λ| > 2) increase control over targeted properties but significantly degrade fluency, diversity & faithfulness of generated summaries, aligning with prior findings that stronger steering degrades model performance.
• Steering vectors effectively control topical focus, sentiment & readability (stronger λ → larger effects)
• Steering alone can’t induce toxicity in safety-tuned Llama 3; only a combination of steering and prompting yields toxic summaries
We apply steering vectors (Panickssery et al. '24) during summarization on NEWTS (Bahrainian et al. '22).
We evaluate how steering affects:
• Target properties
• Intrinsic quality (fluency & diversity)
• Extrinsic quality (faithfulness to human reference summary)
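For readers new to steering vectors, here is a minimal sketch of the general idea, not the code from the linked repository. It assumes the contrastive recipe: the vector is the mean difference between activations for two contrasting prompt sets, and it is added to a hidden state scaled by the steering strength λ. Toy lists stand in for real model activations, and all function names (`mean_vector`, `steering_vector`, `apply_steering`) are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Mean activation difference between the two contrasting sets (assumed recipe)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden: Vector, vector: Vector, lam: float) -> Vector:
    """Shift a hidden state along the steering vector, scaled by strength lam."""
    return [h + lam * v for h, v in zip(hidden, vector)]


# Toy activations standing in for a real model's hidden states.
pos = [[1.0, 2.0], [3.0, 4.0]]   # e.g. prompts showing the target property
neg = [[0.0, 1.0], [2.0, 1.0]]   # e.g. prompts showing the opposite property

v = steering_vector(pos, neg)                     # [1.0, 2.0]
steered = apply_steering([0.5, 0.5], v, lam=2.0)  # [2.5, 4.5]
print(v, steered)
```

With λ = 0 the hidden state is unchanged, and a negative λ pushes it away from the target property instead of toward it.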
Can we adaptively control topical focus, sentiment, readability, and toxicity without degrading summary quality?
Prior work has mostly evaluated steering vectors in multiple-choice settings, reporting unreliable effect sizes (Tan et al. '24).