Anthropic Upgrades Claude AI Web Search Tools With 11% Accuracy Boost

Caroline Bishop
Feb 17, 2026 18:34

Claude’s new dynamic filtering feature cuts input tokens by 24% while improving search accuracy. Opus 4.6 hits 61.6% on the BrowseComp benchmark.

Anthropic has rolled out a major upgrade to Claude’s web search capabilities, with the AI assistant now writing and executing code on the fly to filter search results before processing them. The improvement delivers an average 11% accuracy gain while consuming 24% fewer input tokens, according to the company’s internal benchmarks.

The update, launched alongside Claude Opus 4.6 and Sonnet 4.6, addresses a persistent problem in AI-powered web search: context window bloat. Traditional search tools pull entire HTML files into memory, much of it irrelevant noise that degrades response quality and burns through tokens.

How Dynamic Filtering Works

Rather than reasoning over raw HTML dumps, Claude now dynamically generates code to post-process query results. The system keeps relevant data and discards the rest before anything hits the context window. Think of it as the AI building its own custom search scraper in real time.
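To make the idea concrete, here is a minimal sketch of the kind of post-processing code such a system might write. It is an illustration only, not Anthropic's implementation: the `TextExtractor` class, the `filter_page` function, and the keyword-matching rule are all hypothetical, and the sample page is invented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text blocks and skips script/style/nav/footer content."""

    SKIP = {"script", "style", "nav", "footer"}
    BLOCK = {"p", "li", "h1", "h2", "h3", "td"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.flush()  # stray text before a block becomes its own block

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def flush(self):
        # Collapse whitespace and store the buffered text as one block.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            self.blocks.append(text)


def filter_page(html, keywords):
    """Return only the text blocks that mention at least one keyword."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    wanted = [k.lower() for k in keywords]
    return [b for b in parser.blocks if any(k in b.lower() for k in wanted)]


# Invented sample page: one relevant paragraph surrounded by noise.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
    "<body><nav><a href='/'>Home</a></nav>"
    "<p>Opus 4.6 scored 61.6% on BrowseComp.</p>"
    "<p>Subscribe to our newsletter!</p></body></html>"
)

print(filter_page(SAMPLE_HTML, ["BrowseComp"]))
```

Only the filtered blocks would then be passed on to the model, which is where the token savings would come from in a setup like this.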

Anthropic tested the approach on two industry benchmarks. On BrowseComp, which measures an agent’s ability to find deliberately hard-to-find information across multiple websites, Opus 4.6 jumped from 45.3% to 61.6% accuracy. Sonnet 4.6 climbed from 33.3% to 46.6%.

DeepsearchQA, which tests systematic multi-step research with many correct answers, showed similar gains. Opus 4.6’s F1 score rose from 69.8% to 77.3%, while Sonnet 4.6 improved from 52.6% to 59.4%.
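The headline figure of roughly 11% is consistent with a simple mean of the four percentage-point gains reported above. Treating the headline number that way is an assumption, since the article does not say how it was derived, and it mixes accuracy with F1 scores. A quick check:

```python
# Before/after scores as reported in the article (percent).
SCORES = {
    ("BrowseComp", "Opus 4.6"): (45.3, 61.6),
    ("BrowseComp", "Sonnet 4.6"): (33.3, 46.6),
    ("DeepsearchQA", "Opus 4.6"): (69.8, 77.3),
    ("DeepsearchQA", "Sonnet 4.6"): (52.6, 59.4),
}

# Gain in percentage points for each benchmark/model pair.
gains = {key: after - before for key, (before, after) in SCORES.items()}
mean_gain = sum(gains.values()) / len(gains)

for (benchmark, model), gain in gains.items():
    print(f"{benchmark:13s} {model:11s} +{gain:.1f} points")
print(f"Mean gain: {mean_gain:.1f} points")
```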

Real-World Validation

Quora’s Poe platform, which serves millions of users across 200+ AI models, has already tested the upgrade internally. “The model behaves like an actual researcher, writing Python to parse, filter, and cross-reference results rather than reasoning over raw HTML in context,” said Gareth Jones, the company’s Product and Research Lead. Quora found Opus 4.6 with dynamic filtering achieved the highest accuracy against other frontier models on their internal evaluations.

Token Economics Get Complicated

Cost implications vary by use case. Price-weighted tokens decreased for Sonnet 4.6 across both benchmarks, but actually increased for Opus 4.6, because the more powerful model sometimes writes more complex filtering code. Anthropic recommends developers benchmark against their specific query patterns before deployment.
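A rough sketch of why fewer input tokens do not guarantee a lower bill: output tokens are typically priced higher than input tokens, so extra generated filtering code can outweigh the input savings. Every number below is a made-up placeholder (the token counts and the per-million-token prices are not Anthropic's figures), and the `Usage` and `price_weighted_cost` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def price_weighted_cost(usage, input_price, output_price):
    """Cost in dollars, given prices per million tokens."""
    total = usage.input_tokens * input_price + usage.output_tokens * output_price
    return total / 1_000_000


# Placeholder prices per million tokens (assumption, not a real price list).
INPUT_PRICE = 5.0
OUTPUT_PRICE = 25.0

baseline = Usage(input_tokens=100_000, output_tokens=2_000)
# Both filtered runs use 24% fewer input tokens than the baseline.
light_filtering = Usage(input_tokens=76_000, output_tokens=3_000)
heavy_filtering = Usage(input_tokens=76_000, output_tokens=8_000)

for name, usage in [
    ("baseline", baseline),
    ("light filtering code", light_filtering),
    ("heavy filtering code", heavy_filtering),
]:
    cost = price_weighted_cost(usage, INPUT_PRICE, OUTPUT_PRICE)
    print(f"{name:22s} ${cost:.3f}")
```

With these placeholder values, the run that writes a little filtering code costs less than the baseline, while the run that writes a lot costs more, which is why measuring against your own query mix matters.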

Dynamic filtering ships enabled by default for the new web search and web fetch tools on the Claude API. The company also graduated several related tools to general availability: code execution sandboxes, persistent memory across conversations, programmatic tool calling, and dynamic tool discovery.

For developers building search-heavy applications, such as research assistants, citation verification tools, or competitive intelligence bots, the upgrade could meaningfully cut operational costs while improving output quality. The API documentation is live now on Claude’s developer platform.
