A hybrid approach, combining steering vectors with prompt engineering, achieves the best balance between effective control and high-quality summaries at moderate steering strengths.
High steering strengths (|λ| > 2) increase control over targeted properties but significantly degrade the fluency, diversity & faithfulness of generated summaries, consistent with prior findings that stronger steering degrades model performance.
• Steering vectors effectively control topical focus, sentiment & readability (stronger λ → larger effects)
• Steering alone can’t induce toxicity in safety-tuned Llama 3; only a combination of steering and prompting yields toxic summaries
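The findings above do not state how the steering vectors are built or injected. A common formulation, assumed here and not confirmed by the source, adds a scaled vector λ·v to a layer's hidden activations, where v is the difference between mean activations on inputs that do and do not exhibit the target property (e.g. positive vs. negative sentiment). The minimal sketch below uses plain Python lists; the helper names `mean_difference_steering_vector` and `apply_steering` are hypothetical, and a real setup would apply this inside the model's forward pass rather than to standalone vectors.

```python
from typing import List, Sequence

Vector = List[float]


def mean_difference_steering_vector(
    pos_acts: Sequence[Vector], neg_acts: Sequence[Vector]
) -> Vector:
    """Hypothetical helper: v = mean(pos activations) - mean(neg activations).

    Assumes all activation vectors come from the same layer and share one dimension.
    """
    dim = len(pos_acts[0])
    mean_pos = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    mean_neg = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def apply_steering(hidden: Vector, steering: Vector, lam: float) -> Vector:
    """Hypothetical helper: h' = h + lam * v.

    Positive lam pushes the activation toward the target property,
    negative lam pushes it away; larger |lam| means a larger shift.
    """
    if len(hidden) != len(steering):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + lam * v for h, v in zip(hidden, steering)]


if __name__ == "__main__":
    # Toy 2-d activations standing in for real layer outputs.
    pos = [[1.0, 2.0], [3.0, 4.0]]
    neg = [[0.0, 1.0], [2.0, 1.0]]
    v = mean_difference_steering_vector(pos, neg)
    print(v)                                    # [1.0, 2.0]
    print(apply_steering([0.5, 0.5], v, 2.0))   # [2.5, 4.5]
    print(apply_steering([0.5, 0.5], v, -1.0))  # [-0.5, -1.5]
```

Under this formulation, λ scales the shift linearly, so |λ| > 2 moves activations more than twice as far along v as λ = 1; the hybrid setup would pair a moderate λ with a prompt that also requests the target property.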