Meta released an agentic testing environment, Agents Research Environment, and a new benchmark called Gaia2 to measure ...
Leaked documents give a snapshot of how Scale AI contractors test and train Meta's AI. The documents include examples of prompts that testers should reject, such as requests to roleplay characters from the novel "Lolita." But ...