Norman Mu
normanmu.com
Norman Mu
@normanmu.com
AI Safety @ xAI | AI robustness, PhD @ UC Berkeley | normanmu.com
Reasoning (o3-mini and R1) seems highly effective for more retrieval-bottlenecked prompts (i.e. forgetting relevant guardrails), less so for adversarial inputs/prompt injections. Definitely an exciting direction to explore further.
February 19, 2025 at 6:06 AM
Standard training techniques like good data curation, SFT -> DPO, work reasonably well, and the pass/fail nature of guardrail adherence enables the use of tricks like classifier-free guidance/contrastive decoding to further improve performance
February 19, 2025 at 6:06 AM
RealGuardrails is our new dataset to 1) evaluate system prompt robustness on realistic prompts scraped from the ChatGPT store, and 2) evaluate methods for improving open weight models like Llama 3
February 19, 2025 at 6:06 AM
I learned about how legal journals work earlier this year and have wondered if it could work for ML/AI: reviewing becomes a way for students to distinguish themselves rather than a chore
December 13, 2024 at 7:12 PM
mindboggling that bytedance is 1) suing the author for damages and sabotage and 2) keeping their name on the paper/award without retracting it
December 11, 2024 at 8:31 AM
Going off FLOP/s and power, looks like these are very roughly 3/4 of an H100?
December 3, 2024 at 8:58 PM
top loss curves do look sketchy but idk how bad this is for an RL task. bottom curves don't obviously asymptote but they also shouldn't with learning rate decay (which the open source release seems to use? github.com/google-resea...)
December 1, 2024 at 9:12 AM
my takeaway from Vighnesh's writeup: google's main argument about under-training/lack of pre-training isn't super convincing. convergence of train loss is not always desirable. early stopping can help, and their own paper shows mixed results on importance of pre-training
December 1, 2024 at 9:03 AM
I’d like to join!
January 15, 2024 at 2:55 AM