Storyline

New methods reveal challenges and solutions in AI model behavior and citation reliability

Recent research highlights significant problems with hallucinated and non-resolving citation URLs generated by large language models and deep research agents, with hallucination rates of 3% to 13% and non-resolving rates of up to 18%.

Evidence trail (top sources)
Top sources (1 domain). Domains are deduped. Counts indicate coverage, not truth.
1 top source shown
limited source diversity in top sources
Score total: 1.21
Momentum 24h: 2
Posts: 2
Origins: 2
Source types: 2
Duplicate ratio: 0%
Why now
  • Increasing use of deep research agents amplifies the impact of citation hallucinations.
  • Growing complexity of fine-tuned models demands scalable auditing methods.
  • Open-source tools and novel methods are now available to address these challenges.
Why it matters
  • Citation reliability is critical for trustworthiness of AI-generated research and claims.
  • Detecting hidden model behaviors enhances AI safety and interpretability without needing reference data.
  • Tools like urlhealth enable automated correction and validation of AI outputs.
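To make the "non-resolving rate" concrete, here is a minimal sketch of how such a figure could be measured for a list of cited URLs. It uses only the Python standard library and does not reproduce urlhealth's actual interface, which this brief does not describe. The names `http_status`, `resolves`, and `non_resolving_rate` are hypothetical, and treating any 2xx or 3xx response as "resolving" is an assumption. The check only shows whether a URL answers today; it cannot tell a hallucinated URL from one that once existed and has since disappeared.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was obtained.

    Assumption: a HEAD request is enough. Some servers reject HEAD,
    so a production check may need to fall back to GET.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def resolves(status: Optional[int]) -> bool:
    """Assumption: any 2xx or 3xx status counts as a resolving URL."""
    return status is not None and 200 <= status < 400


def non_resolving_rate(
    urls: Iterable[str],
    status_of: Callable[[str], Optional[int]] = http_status,
) -> float:
    """Fraction of urls that do not resolve (0.0 for an empty list)."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    failures = sum(1 for url in url_list if not resolves(status_of(url)))
    return failures / len(url_list)
```

Passing a different `status_of` function lets the rate be computed from cached results or in tests without touching the network.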
Continuity snapshot
  • Trend status: insufficient_history.
  • Continuity stage: emerging_confirmed.
  • Current status: open.
  • 2 current source-linked posts are attached to this storyline.
All evidence
Reddit discussion on reference model-free behavioral discovery in AuditBench (via Reddit)
Top publishers (this list)
  • arxiv.org (1)
  • Reddit discussion on reference model-free behavioral discovery in AuditBench (via Reddit) (1)
Top origin domains (this list)
  • Unknown (2)