The cost of a hallucinated citation
When a lawyer cites a case that doesn't exist, the model isn't the one who loses the client.
- Filed
- 2026-04-08
- Read
- 4 min
- Tags
- ai · legaltech · reliability
There is a genre of news story, now maybe three years old, in which a lawyer submits a brief with citations to cases that do not exist. The cases have plausible names, plausible courts, plausible dates. They are plausible enough that the lawyer did not check. The opposing counsel checks, or the judge checks, and the story ends with the lawyer in front of a disciplinary committee explaining how ChatGPT works.
The reaction in legaltech circles is usually one of two things. The first is embarrassment on behalf of the profession: shouldn't a lawyer know to check? The second is technical dismissal: that's a solved problem now, the new models hallucinate less.
Both reactions miss the real lesson, which is this: the bar for a legal AI tool is not "hallucinates less than the average model." The bar is "cannot fabricate a citation, structurally, even if the model tries."
Those are different systems.
The reliability gap
A consumer AI tool can produce an answer that is mostly right. Mostly right is usable for brainstorming, first drafts, and rubber-ducking. A lawyer operating under client obligations cannot use mostly right. The failure mode of a fabricated citation is not that it annoys the user. The failure mode is that it ends up in a court filing, and the lawyer who signed the filing is personally responsible, and the model is not.
The gap between mostly right and cannot fabricate is roughly the gap between "training a better model" and "building a better system around the model." Every serious legal AI product I have seen lives in the second category. The interesting engineering is in the scaffolding, not the LLM.
What the scaffolding looks like
Four patterns, in increasing order of seriousness.
Grounded generation. The model never generates a citation from its own knowledge. It generates text that references a citation by ID, and the ID is looked up in a verified corpus. If the ID doesn't exist, the reference is replaced with a visible gap — not a silent removal — and the user sees the gap. The model's job is to write; the corpus's job is to be true.
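The lookup step can be sketched in a few lines. This is a minimal illustration, not a production design: the corpus, the `[cite:ID]` marker format, and the gap wording are all hypothetical.

```python
import re

# Hypothetical verified corpus: citation ID -> full citation string.
# In a real system this would be a database of checked case law.
CORPUS = {
    "us-1973-roe": "Roe v. Wade, 410 U.S. 113 (1973)",
    "us-1966-miranda": "Miranda v. Arizona, 384 U.S. 436 (1966)",
}

# The model emits references by ID only, never free-text citations.
CITE_REF = re.compile(r"\[cite:([a-z0-9-]+)\]")

def resolve_citations(draft: str) -> str:
    """Replace [cite:ID] markers with verified citations.

    Unknown IDs become a visible gap, never a silent removal:
    the reviewing lawyer must see that something is missing.
    """
    def replace(match: re.Match) -> str:
        cite_id = match.group(1)
        if cite_id in CORPUS:
            return CORPUS[cite_id]
        return f"[CITATION NEEDED: unverified id '{cite_id}']"
    return CITE_REF.sub(replace, draft)
```

The key property is that a fabricated ID cannot survive: it either resolves against the verified corpus or surfaces as a loud placeholder in the draft.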
Structured refusal. The prompts and the system instructions make the model refuse to produce a claim that it cannot anchor to a source. This is the opposite of the consumer instinct to be helpful. A legal AI that refuses to give you a case name, rather than inventing one, is a legal AI that respects the user's professional obligations.
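Refusal can also be enforced outside the prompt, as a guard on the model's output. A rough sketch, assuming a hypothetical `[src:ID]` anchor format the model is instructed to attach: anything that looks like a case citation but carries no anchor is swapped for a refusal before it reaches the user.

```python
import re

# Heuristic for things that look like case citations: "Name v. Name".
CASE_LIKE = re.compile(r"\b[A-Z][\w.]* v\. [A-Z][\w.]*")

# Hypothetical source anchor the model is told to attach: [src:ID].
ANCHOR = re.compile(r"\[src:[a-z0-9-]+\]")

REFUSAL = ("I can't provide a case citation for that point without a "
           "verified source. Please locate the source in the corpus first.")

def guard(answer: str) -> str:
    """Pass the answer through only if every case-like reference is
    accompanied by at least one source anchor; otherwise return a
    structured refusal instead of a possibly invented case."""
    if CASE_LIKE.search(answer) and not ANCHOR.search(answer):
        return REFUSAL
    return answer
```

The guard is deliberately conservative: it would rather refuse a real citation the model happened to know than let an unanchored one through, which is exactly the trade a legal tool should make.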
Source-first interfaces. The primary user experience is not "ask a question, get an answer." It is "find the source, then draft around it." The AI is a drafting assistant working over a cited body of material the user has selected, not an oracle being asked what the law says. The user stays in the loop at the point where the law enters the draft.
Auditable output. Every generated paragraph carries, behind it, a machine-readable provenance trail: which source it's citing, which prompt produced it, which model version, which run. When a senior partner asks "where did this paragraph come from," the answer is not "ChatGPT." It is a URL, a timestamp, and a reproducible prompt.
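The trail itself can be as simple as a structured record stored alongside each paragraph. A sketch; every field name, URL, and version string below is illustrative, not a real schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class Provenance:
    """Machine-readable answer to 'where did this paragraph come from'."""
    source_id: str      # citation ID in the verified corpus
    source_url: str     # link a human reviewer can follow
    prompt: str         # exact prompt that produced the text
    model_version: str  # pinned model identifier
    run_id: str         # identifies the generation run
    timestamp: str      # ISO 8601, when it was generated

# Hypothetical record for one generated paragraph.
record = Provenance(
    source_id="us-1966-miranda",
    source_url="https://example.com/cases/us-1966-miranda",  # hypothetical
    prompt="Summarize the holding of [cite:us-1966-miranda].",
    model_version="model-2026-03",  # hypothetical pinned version
    run_id="run-0001",
    timestamp="2026-04-08T10:00:00Z",
)

# Serialized and stored with the paragraph so an auditor can reproduce it.
print(json.dumps(asdict(record), indent=2))
```

Freezing the record and pinning the model version matter: the point of the trail is that the same inputs can be replayed later, when someone is asking hard questions.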
None of these patterns require a better model. All of them require a team that understands that the user is not the only person who will be held accountable for the output.
The deeper lesson
There is a category error at the heart of most legal AI conversations. The question "is the model good enough yet?" treats the model as the product. In a serious legal tool, the model is a component. The product is the system that uses the model, and the reliability of that system is determined by the weakest link, which is almost never the model itself.
A junior developer can evaluate a model. A senior engineer builds a system that is structurally incapable of the failure the junior developer is testing for. That is a different skill, and it is the one that matters in this domain.
The goal is not to make the model stop hallucinating. The goal is to make the system unable to ship a hallucination to the user — even when the model does.
If you build a legaltech product with that goal from day one, the class of news stories about lawyers citing fake cases stops being about you. If you build it with "the model is good enough now" as your guiding star, sooner or later your name ends up in the article.