Storyline

New architectures and benchmarks advance large language model agent reasoning and evaluation

Recent research introduces new architectures and benchmarks aimed at making large language model (LLM) agents reason more efficiently and reliably.

Evidence preview

arXiv cs.CL RSS
arxiv.org