Inventor of Post Engineering for AI
Researcher / Engineer
(AI, Cyber Security, Software, Infrastructure)
X: https://x.com/hajimetwi3
GitHub: https://hajimetwi3.github.io/hajimetwi3/
- Attacks on human reviewers of AI (i.e., content moderators) via guardrails
- Attacks on AI reviewers via guardrails
- Guardrail saturation attacks (Saturation Attacks), including:
  - AI-review-based guardrail saturation attacks
  - Human-review-based guardrail saturation attacks
hajimetwi3.github.io/post-enginee...