OpenAI’s Day 2 of 12 days: Reinforcement Fine-Tuning | OKKY