Here is the reading list:
• learning from human preferences (PPO, DPO, SimPO, CPO, RRHF, ORPO, CTO)
• real-world LLMs (Llama-3, Aya, Arena's)
• efficient LLMs (MoMa, LoRA, QLoRA, LESS)