Sonnet 4.5 is now SOTA on GAIA

1 min read · Oct 1, 2025

Sonnet 4.5 is now SOTA on GAIA with 74.6% overall accuracy using the HAL Generalist Agent scaffold.

GAIA is a benchmark that stresses general AI assistant skills like browsing, tool use, multi-step reasoning, and multimodality.

Sonnet 4.5 beats Opus 4.1 and GPT-5 Medium while being cheaper to run.

What is even more impressive is how that performance is achieved. Most models shine on level 1 tasks and lose ground as the tasks get harder. In contrast, Sonnet 4.5 stays comparatively even from level 1 to level 3:

  • Level 1: 81% (bread-and-butter reasoning)
  • Level 2: 72% (integrating tools, moderately complex)
  • Level 3: 69% (the really hard stuff: multi-hop reasoning + tool choreography)

This suggests Sonnet 4.5 is better at tool-conditioned reasoning and long-horizon problem solving than models whose accuracy falls off sharply at the higher levels.
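
As a rough sanity check on the per-level figures, here is a minimal sketch that combines them into a question-weighted overall score and measures the gap between the easiest and hardest level. The per-level question counts (53 / 86 / 26) are an assumption based on the public GAIA validation split; the post itself does not state them, and the function names are purely illustrative.

```python
# Per-level accuracies as reported in the post (rounded percentages).
level_accuracy = {1: 0.81, 2: 0.72, 3: 0.69}

# ASSUMPTION: questions per level in the GAIA validation split
# (53 / 86 / 26, 165 in total). These counts are not given in the post.
level_counts = {1: 53, 2: 86, 3: 26}


def overall_accuracy(acc, counts):
    """Question-weighted average of the per-level accuracies."""
    total = sum(counts.values())
    return sum(acc[lvl] * counts[lvl] for lvl in counts) / total


def level_spread(acc):
    """Gap between the best and worst level, in percentage points."""
    return (max(acc.values()) - min(acc.values())) * 100


if __name__ == "__main__":
    print(f"overall ~ {overall_accuracy(level_accuracy, level_counts):.1%}")
    print(f"spread  = {level_spread(level_accuracy):.0f} points")
```

If the assumed counts hold, the weighted average lands near 74.4%, in line with the reported 74.6% once you allow for rounding in the per-level numbers. The spread between level 1 and level 3 is 12 points.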

Check out the full leaderboard:

Written by FS Ndzomga

Engineer passionate about data science, startups, philosophy and French literature. Built lycee.ai, discute.co and rimbaud.ai. Open for consulting gigs.