Latest and Best ML Training & Fine-Tuning Blog Articles

Why SFT isn't enough and how DPO and GRPO fix it

6.50/10 | Medium | LLM Fine-Tuning and Alignment
🔧 DPO (Direct Preference Optimization), GRPO (Group Relative Policy Optimization), PPO (Proximal Policy Optimization), LoRA, QLoRA, vLLM, SGLang, LMDeploy