In the study, Nvidia and Microsoft outlined three distinct kinds of blind goal-directed (BGD) activity they often saw in AI agents: the agents show a lack of contextual reasoning, tend to make assumptions and incorrect decisions when prompts are ambiguous, and pursue contradictory or infeasible goals to the user’s detriment. To study these three types of BGD, the researchers developed a benchmark of 90 tasks called Blind-Act and tested nine different LLMs, including several of OpenAI’s GPT models, Meta’s Llama 3.2, and two of Anthropic’s Claude models.
[Nvidia and Microsoft Researchers Say AI Agents Don