If you’ve used ChatGPT for more than five minutes, you’ve probably seen it: a confident answer that feels right… and then you check a source and it’s just not true. Wrong dates. Fake citations. Invented product features. Sometimes it’s subtle, like a statistic that’s off by an order of magnitude, and sometimes it’s hilariously bold, like citing a “2019 Harvard study” that does not exist.
People call this “hallucination,” which is a nice, fuzzy word for something we should be blunt about: the model is guessing. It’s predicting plausible text, not looking up the truth.
This post is a field guide for fact-checking AI outputs. Not in a doom-y way. More like: if you want to use ChatGPT for real tasks (tech writing, research summaries, customer emails, code, policy notes), here’s how to tell when it’s making stuff up and what to do about it.
Why ChatGPT makes things up (in plain English)
Large language models are trained to predict the next token (word-ish chunk) given context. During training, they learn patterns that correlate with “good answers” in the data, but they’re not inherently connected to a database of verified facts.
So when you ask, “What’s the latest NH housing bill about?” the model tries to generate something that sounds like an answer someone might write. If the training data is incomplete, outdated, or contradictory, it will still produce output. Silence isn’t its default mode.
A few common drivers of hallucination:
- Missing context: You didn’t specify the jurisdiction, timeframe, or definition. The model fills in gaps.
- Ambiguous questions: “Is X safe?” Safe for what? In what dose? In what environment?
- Pressure to be helpful: Many prompts implicitly reward confident completion. “Give me 10 sources” becomes “invent 10 sources.”
- Out-of-distribution requests: Niche local details, very new events, private company info—things that aren’t in the model’s training.
The key mindset shift: treat the model like a talented intern who writes quickly, speaks confidently, and sometimes whiffs completely.
The biggest red flags (a quick checklist)
When I’m scanning an AI answer, I look for these warning signs:
- Specific numbers with no provenance: “a 37% improvement” or “$12.4B market size” with no source is a big blinking light.
- Citations that look real but don’t behave like real citations: generic journal titles (“International Journal of Advanced AI Research”), missing DOI, no author list, vague dates.
- Overly neat timelines: if it gives crisp historical sequences for complex events without any uncertainty, be suspicious.
- Name-dropping that doesn’t match your memory: wrong agency names, mis-titled bills, “Senator So-and-so” in the wrong state.
- Inconsistent details across paragraphs: it says Model A is open-source, then later says it’s proprietary.
- Confidence language with zero qualifiers: “It is proven that…”, “Researchers agree…”, “always/never…” are usually tells.
- Phantom features in tools and APIs: especially in fast-moving ecosystems. If it claims an endpoint exists, double-check the docs.
You don’t need all seven. Two is enough for me to flip into verification mode.
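If you like automating the first pass, two of these tells (unsourced numbers and zero-qualifier confidence language) are mechanical enough for a toy scanner. This is a sketch, not a real fact-checker; the regex and the phrase list are my own illustrative choices, not anything rigorous.

```python
import re

# Phrases that often signal unearned certainty (illustrative list, not exhaustive).
CONFIDENCE_TELLS = ["it is proven", "researchers agree", "always", "never"]

# Rough pattern for specific figures: "37%", "$12.4B", "$5M", and the like.
NUMBER_PATTERN = re.compile(r"\$?\d[\d,.]*\s*%|\$\d[\d,.]*\s*[MB]?")

def red_flags(text: str) -> list[str]:
    """Return a list of red-flag labels found in an AI answer."""
    flags = []
    lowered = text.lower()
    # A specific figure with no nearby hint of sourcing is suspicious.
    if NUMBER_PATTERN.search(text) and "source" not in lowered and "http" not in lowered:
        flags.append("specific number with no provenance")
    for phrase in CONFIDENCE_TELLS:
        if phrase in lowered:
            flags.append(f"confidence language: {phrase!r}")
    return flags
```

A hit doesn’t mean the claim is false; it means that sentence goes on your verification list.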
A practical workflow: the “Claim → Source → Confirm” loop
Here’s a workflow we’ve used in meetups and in day jobs. It’s simple on purpose.

Step 1: Extract the atomic claims
Take the answer and break it into checkable pieces. Atomic claims are facts you can verify without interpretation.
Example AI output: “New Hampshire passed HB 123 in 2023 requiring all public schools to teach AI literacy, funded by a $5M grant.”
Atomic claims:
- NH passed HB 123
- It happened in 2023
- It requires public schools to teach AI literacy
- There is a $5M grant funding it
Now you know what to verify. You’ll also often realize: wow, that’s a lot of claims packed into one sentence.
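If you want to track these checks somewhere other than your head, a tiny data structure is enough. The four claims below come from the example sentence; the `Claim` class itself is just my own sketch, not any standard tool.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str                 # one atomic, checkable statement
    verified: bool = False    # flipped only after you find a source
    sources: list = field(default_factory=list)  # URLs/citations you actually checked

# The example sentence decomposes into four atomic claims:
claims = [
    Claim("NH passed HB 123"),
    Claim("It happened in 2023"),
    Claim("It requires public schools to teach AI literacy"),
    Claim("There is a $5M grant funding it"),
]

def unverified(claims: list) -> list:
    """The to-do list: everything still awaiting a source."""
    return [c.text for c in claims if not c.verified]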
Step 2: Ask the model for sources and a confidence rating per claim
This is a good prompt pattern:
“List each factual claim you made as bullet points. For each, provide: (1) your confidence (high/medium/low), (2) what kind of source would verify it, and (3) any assumptions you made.”
Two nice things happen:
- The model sometimes admits it was guessing.
- It tells you what to look for (bill text, agency report, academic paper), which speeds up your checking.
Don’t treat its “confidence” as truth. Treat it as triage. Still helpful.
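One concrete way to use that triage: check the low-confidence claims first, since that’s where the model itself suspects it was guessing. The helper below is a sketch; the confidence labels are whatever the model handed back.

```python
# Triage order: the shakiest claims get checked first.
PRIORITY = {"low": 0, "medium": 1, "high": 2}

def triage(claims_with_confidence: list[tuple[str, str]]) -> list[str]:
    """Sort (claim, confidence) pairs so low-confidence claims come first."""
    return [claim for claim, conf in
            sorted(claims_with_confidence, key=lambda pair: PRIORITY[pair[1]])]
```

So `triage([("NH passed HB 123", "high"), ("There is a $5M grant", "low")])` puts the grant figure at the front of your checking queue.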
Step 3: Verify using primary sources first
For anything that matters—legal, medical, financial, safety—go primary:
- Laws/regulations: official state/federal legislature sites, government PDFs, register notices.
- Academic claims: the actual paper, not a blog summary. Look for DOI, author list, publication venue.
- Product features: vendor docs, release notes, GitHub repos.
- Statistics: original dataset (BLS, Census, CDC), methodology notes.
Secondary sources (news articles, blog posts) are useful, but they sometimes repeat errors, and the model can easily mirror that same error.
Step 4: Cross-check with a second independent source
One source can be wrong, outdated, or misread. A second source reduces your chance of repeating a mistake.
A simple rule: if it’s a number, a quote, or a legal requirement, I want two sources. If you can’t find two, say so.
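The two-source rule is mechanical enough to encode. This is a sketch using the three categories named above (numbers, quotes, legal requirements); the keyword list is my own rough heuristic, and the naive substring matching will have false positives.

```python
import re

def needs_two_sources(claim: str) -> bool:
    """Numbers, quotes, and legal requirements get the two-source treatment."""
    has_number = bool(re.search(r"\d", claim))
    has_quote = '"' in claim or "\u201c" in claim  # straight or curly quote
    # Naive substring match; good enough for a sketch, not for production.
    legal_words = ("requires", "prohibits", "shall", "law", "bill", "statute")
    is_legal = any(word in claim.lower() for word in legal_words)
    return has_number or has_quote or is_legal

def verdict(claim: str, source_count: int) -> str:
    if needs_two_sources(claim) and source_count < 2:
        return "unverified: say so in the final write-up"
    return "ok to publish with citations"
```

The point isn’t the code; it’s that “do I need a second source?” should be a rule you apply, not a vibe.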
Step 5: Rewrite the final answer with citations and uncertainty
This is the part people skip. Don’t just correct the error silently—fix the style of the output so it doesn’t pretend certainty.
Good:
- “As of Jan 2026, the NH Legislature site shows…”
- “I couldn’t confirm the $5M figure; sources disagree (X says…, Y says…).”
That’s not “weak writing.” It’s honest writing.
Fast fact-checking prompts you can steal
When you’re in a hurry, these help.
1) Force it to separate facts from suggestions
“Split your answer into three sections: (A) verified facts you are confident are correct, (B) things that are likely but unverified, (C) open questions.”
2) Ask for falsifiers (my favorite)
“What evidence would prove your answer wrong? List 5 specific falsifiable checks.”
3) Source-first mode
“Before answering, ask me 3 clarifying questions. Then provide an answer with citations. If you can’t cite, say ‘no reliable source found’.”
4) Quote audit
“Highlight any quotes you included. For each quote, give an exact URL and the surrounding paragraph for context.”
If it can’t, treat the quote as fabricated until proven otherwise.
A mini tutorial: catching fake citations in 90 seconds
Fake citations have patterns. Here’s a quick audit you can do without being a librarian.
- Search the exact paper title in quotes. If nothing comes up except AI-generated pages or low-quality scrape sites, that’s a bad sign.
- Search the author plus a key phrase. Real papers leave trails: Google Scholar, publisher pages, university profiles.
- Check the journal or conference. Does it exist? Is it credible? Predatory journals are a whole other mess.
- Look for a DOI. Many legitimate papers have DOIs, and a real DOI resolves. If the model gives a DOI that doesn’t resolve, nope.
- Watch for “citation salad”: the model mashes together real author names with fake titles and real venues. That’s the sneaky kind.
Using AI safely: what it’s great for vs. risky for
We don’t have to treat ChatGPT like a liar all the time. It’s genuinely useful, just in the right roles.
Great for (low risk, high value):
- Brainstorming approaches
- Summarizing text you provide (meeting notes, a paper you paste in)
- Generating templates (checklists, email drafts, SQL skeletons)
- Explaining concepts at different levels (with your review)
Risky for (needs verification):
- Legal requirements and compliance
- Medical or safety guidance
- Financial/tax advice
- Anything with precise numbers, dates, or quotes
- “What’s new” in a fast-moving product/API
If you’re a business leader reading this: the practical policy is not “ban AI,” it’s “require citations and human review for high-stakes outputs.” That’s it. That’s the whole trick.
A simple team norm for NH AI Meetup folks (and your workplace)
Try this rule when you’re sharing AI-generated info in Slack or in a doc:
- If it’s a fact that could embarrass you later, attach a link.
And if you can’t find a link:
- Mark it as unverified.
People think this slows teams down. It doesn’t. It prevents those painful meetings where everyone’s arguing about a number that came from… nowhere.
Closing thought: treat fluency as a risk signal
The weirdest part of modern AI is that the best-sounding answers can be the most dangerous. Fluency creates trust, and trust makes us lazy about checking.
So yeah, use ChatGPT. Use it a lot. Just keep the habit: extract claims, demand sources, verify the important bits, and rewrite with honesty. If we do that, we get the speed without the faceplants.
If you want to practice this live, bring a suspicious AI answer to the next NH AI Meetup and we’ll do a group “hallucination audit.” It’s fun in a nerdy way, and you’ll leave with a sharper bullshit detector. Guaranteed-ish.
