When your AI is confidently wrong
You asked a serious question. The answer came back fluent, organized, and confident. Then you checked one detail — and it was wrong. Or the citation didn't exist. Or two paragraphs quietly contradicted each other.
The unsettling part isn't the error. It's that the wrong answer sounded exactly like the right ones do. Once you've seen that, you can't unsee it: every confident answer is now a question mark, and you're back to checking everything by hand — which is the work you were trying to save.
Why this keeps happening
AI language models are built to produce plausible text, not verified conclusions. The confidence in the prose is a writing style, not a measurement. The model doesn't track where each statement came from, doesn't weigh whether the source was any good, and doesn't tell you when the honest answer is "the evidence is mixed." When sources disagree, it tends to pick one and narrate it smoothly — and you can't see that a choice was even made.
No prompt fixes this, because the problem isn't the prompt. The model has no ledger of evidence behind its words.
What you actually need
Not an AI that sounds more careful — an AI that's accountable to a ledger:
- Every claim attributed to a named source, with a stated reliability.
- Contradictions surfaced and shown, not smoothed over.
- Confidence as a number derived from the evidence — including an honest "uncertain."
- A trail you can walk from the conclusion back to who said what, when.
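To make the idea of a ledger concrete, here is a minimal sketch of the four requirements above as a data structure. It is an illustration only, not Arbiter's actual schema: the `Claim` class, its field names, and the `trail` and `contradictions` helpers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """One ledger entry: who said what, when (hypothetical fields)."""
    statement: str
    source: str          # a named source, never "the model"
    reliability: float   # stated reliability, 0.0 to 1.0
    supports: bool       # True if the source affirms the statement
    reported_on: date


def trail(ledger, statement):
    """Walk back from a statement to its entries, oldest first."""
    entries = [c for c in ledger if c.statement == statement]
    return sorted(entries, key=lambda c: c.reported_on)


def contradictions(ledger):
    """List statements that one source affirms and another disputes."""
    stances = {}
    for c in ledger:
        stances.setdefault(c.statement, set()).add(c.supports)
    return sorted(s for s, seen in stances.items() if len(seen) == 2)
```

The point of the design is that a disagreement is a property of the data, so it can be listed rather than narrated away.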
What Arbiter does about it
Housecarl Arbiter is a reasoning engine that sits alongside your AI. Connect it to Claude, ChatGPT, or any assistant that supports the Model Context Protocol (MCP), and when your question involves real sources and real stakes, the assistant hands the evidence to Arbiter instead of improvising. Arbiter weighs each source by reliability and recency, refuses to double-count repetition, keeps disagreements visible, and returns a verdict with the reasoning on the record.
Arbiter doesn't make the model smarter — it makes the answer checkable. Fabricated confidence has nowhere to hide when every conclusion has to trace back to a source you can see. And when the evidence genuinely doesn't settle the question, Arbiter says so, with a number, instead of guessing in a confident voice.
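One way to picture "a number derived from the evidence" is the sketch below. It is not Arbiter's algorithm, which this page does not specify. It assumes a simple scheme: each source counts once (its most recent statement), its weight is its reliability halved for every `half_life_days` of age, and the result is the supporting share of total weight. The function name, the tuple layout, and the one-year half-life are all assumptions made for illustration.

```python
from datetime import date


def confidence(evidence, today, half_life_days=365.0):
    """Share of weighted evidence that supports a claim, from 0.0 to 1.0.

    evidence: iterable of (source, reliability, supports, reported_on).
    Returns 0.5 ("uncertain") when there is no evidence at all.
    """
    # Count each source once: keep only its most recent statement,
    # so repetition by the same source adds no extra weight.
    latest = {}
    for source, reliability, supports, reported_on in evidence:
        kept = latest.get(source)
        if kept is None or reported_on > kept[2]:
            latest[source] = (reliability, supports, reported_on)

    support = against = 0.0
    for reliability, supports, reported_on in latest.values():
        age_days = max((today - reported_on).days, 0)
        # Assumed recency rule: weight halves every half_life_days.
        weight = reliability * 0.5 ** (age_days / half_life_days)
        if supports:
            support += weight
        else:
            against += weight

    total = support + against
    return 0.5 if total == 0 else support / total
```

Under these assumptions, two equally reliable, equally recent sources that disagree yield 0.5 no matter how often either one repeats itself, which is the honest "uncertain" rather than a confident guess.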
See it on a real case: a full fraud investigation, worked end to end.
Common questions
Does Arbiter stop my AI from hallucinating?
No. Arbiter doesn't change the model, and nothing outside the model can. What it changes is what you act on. Verdicts come from sources you can see and weights you can inspect, not from the model's prose. If a claim isn't backed by a source in the ledger, it isn't in the verdict.
Which AI tools does it work with?
Claude (web, mobile, Desktop, Claude Code), ChatGPT, Cursor, Windsurf — anything that speaks the Model Context Protocol — plus a plain REST API for your own software. The five-minute Claude setup needs no code and works on Claude's free plan.
Do I have to structure the evidence myself?
No. Ask your assistant in plain language and it extracts the sources and claims for you. If you're building your own pipeline, the extraction prompt automates that step.
What does it cost?
It starts free. Paid plans run from about thirty to a few hundred dollars a month depending on volume — current numbers on the pricing page. If a wrong answer has ever cost you a day of rework — or a decision — the arithmetic is short.
Make your AI show its work. Connect Arbiter to Claude in five minutes, or open the console — free, no card required.