Metr, an organization OpenAI frequently partners with to probe the capabilities of its models and evaluate them for safety, suggests that it was given little time to test the company’s powerful new releases, o3 and o4-mini. In a blog post published Wednesday, Metr writes that its red teaming of o3 and o4-mini was “conducted in […]