AI Citation Verification for Researchers: How to Catch Hallucinated Sources Before Your Reviewer Does
Why AI-drafted papers are leaking fake citations into peer review — and a practical Zotero-based workflow to verify every reference in minutes, not hours.
A PhD candidate we work with sent us their literature review last winter. Thirty-two citations, neatly formatted, every entry sitting in their Zotero library. A reviewer flagged one of them — a paper that didn't exist. The DOI resolved to nothing, the authors had never co-published, the journal volume was real but the article wasn't. The candidate hadn't invented it. Their AI assistant had, six months earlier, and the citation had been in their Zotero ever since.
This is the new failure mode in academic writing. The model drafts confidently, the citation looks plausible, and unless someone opens the source PDF and checks the claim against the page, the bad reference rides through the entire workflow. Courts have seen it. Medical journals have seen it. Theses are starting to see it. AI citation verification is no longer a nice-to-have — it's the load-bearing step between an AI-assisted draft and a defensible publication.
Why Manual Verification Stops Scaling
The classic workflow is straightforward and does not survive contact with real deadlines. Open the cited PDF. Search for the keyword from the claim. Read the surrounding paragraph. Decide whether the source actually supports the claim, partially supports it, or contradicts it. Repeat thirty times.
At about five minutes per reference on a clean PDF, that's two and a half hours for a single chapter — assuming you don't get lost down a tangential argument in section 4. In practice, three things go wrong:
- Skim-and-trust under deadline. If the title looks right and the abstract matches, the citation gets a pass it didn't earn.
- Partial support coded as full support. The source says the effect was observed in mice; the draft claims it generalises to humans. The citation makes it past the eye.
- Secondary citations never get checked. "As Smith et al. (2021) note, Jones (2018) found…" — the Jones paper rarely gets opened.
Anyone who has supervised a thesis has seen all three. And anyone who has used an AI assistant to draft a literature review has produced all three without noticing.
What "Good Enough" Verification Actually Looks Like
Before recommending any tool, here are the criteria we'd hold a verification workflow to — vendor-neutral, in the order they matter:
- It reads the claim and the source PDF, not just the citation metadata. A DOI-matcher tells you the paper exists (the sketch below this list shows how cheap that check is). That's not the question.
- It distinguishes supported, partially supported, and unsupported. Binary verdicts hide the most common failure mode: the half-right citation.
- It returns a confidence score and a quoted excerpt. You should be able to see why the verifier reached its verdict, not just the verdict.
- It catches model disagreement and flags it for human review. A single LLM can be confidently wrong. Two LLMs disagreeing is the signal you need to look harder.
- It keeps your sources on infrastructure you trust. For most European researchers that means FADP or GDPR data processing, not a US consumer cloud that retains your unpublished manuscript for training.
These five hold up whether you build a verifier in-house, glue one together from open-source parts, or buy something off the shelf.
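The first criterion is the one most tools under-deliver on, so it's worth seeing how little a bare existence check actually buys. Here's a minimal sketch in Python, assuming the `requests` package; it asks the public Crossref API whether a DOI is registered, and nothing more:

```python
import requests

def doi_exists(doi: str) -> bool:
    """Ask Crossref whether this DOI is registered.

    Existence is all this proves. It says nothing about whether
    the paper actually supports the claim that cites it.
    """
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks for a contact address in the User-Agent.
        headers={"User-Agent": "citation-check/0.1 (mailto:you@example.org)"},
        timeout=10,
    )
    return resp.status_code == 200

if __name__ == "__main__":
    # Swap in a DOI from your own reference list.
    print(doi_exists("10.xxxx/replace-with-a-real-doi"))
```

This would have caught the fabricated reference in the opening anecdote. It would have waved the half-right citation straight through, which is why criteria two through four exist.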
The Zotero-Native Workflow We Built
We hit this problem hard enough on client work — and on our own internal writing — that we shipped a tool for it. It's called Acurio, and it sits directly inside the Zotero workflow most researchers are already running.
The flow is three steps:
- Export your DOCX with embedded Zotero citations from Word, Google Docs, or LibreOffice (the standard Zotero plugin handles this; the sketch after this list shows what those embedded citations look like inside the file).
- Drop in your sources — the PDFs, BibTeX, and RIS files from your Zotero library. Acurio reads them in place.
- Read the report. Each citation comes back colour-coded as supported, partially supported, or unsupported, with a confidence score and a quoted excerpt from the source.
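If you're wondering what "embedded Zotero citations" means concretely: a DOCX is a zip archive of XML parts, and the Zotero plugin stores each citation as a Word field whose instruction text starts with `ADDIN ZOTERO_ITEM CSL_CITATION`, followed by a CSL JSON payload. A minimal, standard-library-only sketch that pulls those payloads out:

```python
import json
import re
import zipfile
from html import unescape

def extract_zotero_citations(docx_path: str) -> list[dict]:
    """Pull the CSL JSON payloads the Zotero Word plugin embeds
    as field codes in a .docx."""
    with zipfile.ZipFile(docx_path) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")

    # Field instructions live in <w:instrText> runs; Word may split
    # a single instruction across several runs, so join them first.
    runs = re.findall(r"<w:instrText[^>]*>(.*?)</w:instrText>", xml, re.DOTALL)
    instructions = unescape("".join(runs))

    decoder = json.JSONDecoder()
    citations = []
    for marker in re.finditer(r"ADDIN ZOTERO_ITEM CSL_CITATION", instructions):
        start = instructions.find("{", marker.end())
        if start == -1:
            continue
        try:
            payload, _ = decoder.raw_decode(instructions[start:])
        except json.JSONDecodeError:
            continue
        citations.append(payload)
    return citations

# Each payload's "citationItems" entry carries the cited items'
# metadata, including any DOIs you could feed into an existence
# check like the sketch in the previous section.
```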
When two of the verifying models disagree about a citation, Acurio doesn't average them. It queues the disputed reference for an overnight second opinion by a different model stack and surfaces the new verdict the next morning. That's the failure-mode mitigation we'd recommend regardless of which tool you use — model disagreement is a signal, not noise.
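Acurio's internals aren't the point here, but the escalation rule itself is simple enough to write down. Here's a sketch of the logic with hypothetical names throughout: a tri-state verdict plus confidence and excerpt, per the checklist above, and a rule that escalates rather than averages:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(str, Enum):
    SUPPORTED = "supported"
    PARTIAL = "partially supported"
    UNSUPPORTED = "unsupported"

@dataclass
class ModelVerdict:
    model: str         # which LLM produced this verdict
    verdict: Verdict
    confidence: float  # 0.0 to 1.0
    excerpt: str       # quoted passage from the source PDF

def needs_human_review(a: ModelVerdict, b: ModelVerdict,
                       min_confidence: float = 0.7) -> bool:
    """Disagreement between models, or low confidence from either,
    escalates the citation instead of averaging the verdicts away."""
    return (a.verdict != b.verdict
            or min(a.confidence, b.confidence) < min_confidence)
```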
A few details that matter for the academic audience specifically:
- Swiss data processing under FADP and GDPR. Your unpublished manuscript and your source PDFs stay on infrastructure that complies with European data law. See how it verifies Zotero citations for the data-handling specifics.
- DOCX in, DOCX out. The analysis report is a Word document you can hand to a supervisor or co-author without translating between tools.
- Works with the standard Zotero Word plugin (v6 and v7). No new reference manager to learn.
If you've already invested in Zotero — and most working researchers have — this is the lowest-friction path from "I have an AI-drafted chapter" to "I know which citations to trust."
The Cost / Time Math
The numbers we run for clients evaluating this:
| | Manual baseline | Acurio |
|---|---|---|
| Time per 30-citation chapter | ~2.5 hours | ~5 minutes of your time, plus model runtime |
| Cost | Your hourly rate × 2.5 | CHF 10/month for 20 analyses + 3 overnight second opinions |
| Catches partial-support failures | Only if you read carefully | Yes, with quoted excerpts |
| Catches fabricated DOIs | Yes, if you click every one | Yes |
| Audit trail for your supervisor | The PDFs on your desk | DOCX report you can hand over |
Month-to-month billing, one-click cancellation. For a single PhD chapter, the time saved pays for a year of the tool. Pricing and signup live on the Acurio site.
Limitations — What It Will Not Do
In the spirit of our other practitioner posts, here's the honest section.
It is a triage tool, not a free pass. A "supported" verdict from a multi-LLM verifier means the source plausibly backs the claim. It does not mean the source is the best source, the most current source, or the source your field expects you to cite. That judgment still belongs to you.
Non-standard citation methods are less reliable. Acurio works best with the standard Zotero Word plugin (v6/v7). Manually typed citations, Better BibTeX exports with unusual templates, and citations inserted via third-party plugins are supported but with lower confidence.
Image-only PDFs need OCR first. If your source is a scanned PDF without a text layer, run it through OCR before uploading. The verifier reads text, not pictures of text.
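A thirty-second way to check before you upload, sketched with the `pypdf` package (the `ocrmypdf` CLI is one common way to add the missing text layer):

```python
from pypdf import PdfReader

def has_text_layer(pdf_path: str, pages_to_sample: int = 3) -> bool:
    """Heuristic: if the first few pages yield no extractable text,
    the PDF is almost certainly a scan that needs OCR first."""
    reader = PdfReader(pdf_path)
    for page in list(reader.pages)[:pages_to_sample]:
        if (page.extract_text() or "").strip():
            return True
    return False

# "source.pdf" is a stand-in for one of your own library PDFs.
if not has_text_layer("source.pdf"):
    # One option: ocrmypdf source.pdf source_ocr.pdf
    print("Image-only PDF: run it through OCR before uploading.")
```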
It does not write the citation. Acurio verifies what you (or your AI assistant) wrote. It doesn't choose sources, doesn't suggest better ones, doesn't compose the prose. That's still your job — which, frankly, is how it should be.
Where to Start Tomorrow
Pick the next chapter or manuscript you're drafting. Verify the first twenty citations manually, the way you always have. Time it. Then drop the same chapter into Acurio and compare the two reports side by side. If Acurio catches a partial-support failure or a fabricated reference that you missed, you've already justified the subscription. If it doesn't, you've at least audited your own process and can move faster on the next chapter with confidence.
The point of automated citation verification isn't to replace your judgment. It's to make sure your judgment is applied to the citations that actually deserve it.
For individual researchers, Try Acurio is the fastest path. For institutions or research groups thinking about rolling this out across a department — alongside the broader AI governance questions we cover in our ChatGPT governance checklist — book a free AI Potenzial-Check and we'll walk through it together.
The fake-citation problem isn't going away. The reviewers are getting better at spotting it. Get ahead of them.
