
AI SAFETY & VERIFICATION

Understanding hallucinations, reducing risk, cross-checking outputs, and responsible AI use in a financial services context.

Anchor Intelligence does not dictate Anchor Group AI policy or approved tools. For the most up-to-date policies, approved tools, and compliance requirements, please refer to the official Anchor Group AI Usage and Compliance Policy. View Anchor Group AI Policy (PDF)

Hallucinations: What They Are and How to Catch Them

Hallucinations occur when an AI generates information that sounds plausible but is fabricated or incorrect. The AI does not 'know' it is wrong: it is predicting the most likely next word, and sometimes that prediction is confidently wrong.

Current hallucination rates (approximate):
- GPT-4o: ~15-17%

- OpenAI o3: ~10-12%

- GPT-5 (without thinking): ~5-7%

- GPT-5 (with thinking): ~1-3%

Each new model generation cuts hallucination rates by roughly 40-50%, and the trajectory is clear: every generation gets significantly more reliable. But even at 1-3%, if you generate 100 claims in a report, 1-3 of them could be fabricated.
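
To make that concrete, here is a quick back-of-envelope check (illustrative numbers: a 2% per-claim rate, assuming claims hallucinate independently):

    # Back-of-envelope: at a 2% per-claim hallucination rate, how likely is a
    # 100-claim report to contain at least one fabricated claim?
    p, n = 0.02, 100

    expected_fabrications = p * n        # 2.0 claims on average
    p_at_least_one = 1 - (1 - p) ** n    # 1 - 0.98**100

    print(f"Expected fabrications: {expected_fabrications:.1f}")  # 2.0
    print(f"P(at least one): {p_at_least_one:.0%}")               # 87%

At realistic rates, an unverified 100-claim report almost certainly contains at least one fabrication.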

The golden rule: never put an AI-generated number, name, date, or statistic into a client-facing document without verifying it against the primary source. This is not optional. This is the single most important habit to build.

Source: HealthBench Hard hallucinations study

HALLUCINATION RATES BY MODEL

[Chart: hallucination rates for GPT-4o, OpenAI o3, GPT-5, and GPT-5 (thinking), shown as the percentage of factually incorrect outputs on a 0-20% scale. Lower is better. Source: HealthBench Hard hallucinations study]

The 4-Step Cross-Check Workflow

This is your safety net. When working with complex queries (market research, valuation models, regulatory analysis), a single AI can sound convincing while being wrong. The fix is to use multiple LLMs to cross-verify:

Step 1: CREATE your report in your primary AI tool

Step 2: DOWNLOAD the complete output as a file

Step 3: UPLOAD to a different LLM (e.g., Claude, Gemini, or Grok)

Step 4: FACT-CHECK by prompting the second LLM to review accuracy, flag unsupported claims, and rate confidence

Why this works: different models are trained on different data and have different failure modes. Where one hallucinates, another often catches it. By combining their feedback and asking each model to rate its confidence and cite sources for key claims, you build a final version with significantly higher conviction.

When to use it: Any time the output goes into a client document, a board presentation, or a regulatory filing. The 5 extra minutes of cross-checking can save you from a career-defining mistake.
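
The workflow can also be scripted. Below is a minimal sketch, assuming the openai and anthropic Python SDKs with API keys in the environment; the model names are illustrative, and you should substitute whichever tools are approved for your data:

    from openai import OpenAI
    import anthropic

    FACT_CHECK_PROMPT = (
        "Review the report below for accuracy. Flag any claims that are "
        "factually incorrect, outdated, or unsupported, and rate your "
        "confidence in each flag."
    )

    # Step 1: CREATE the report in the primary AI tool.
    primary = OpenAI()  # reads OPENAI_API_KEY from the environment
    report = primary.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": "Write a market analysis for SA equities Q4 2025"}],
    ).choices[0].message.content

    # Step 2: DOWNLOAD the complete output (here, save it to a file).
    with open("report.txt", "w") as f:
        f.write(report)

    # Steps 3-4: UPLOAD to a different LLM and FACT-CHECK.
    checker = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    review = checker.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{FACT_CHECK_PROMPT}\n\n{report}"}],
    )
    print(review.content[0].text)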

CROSS-CHECK: VERIFY BEFORE YOU TRUST

Never rely on a single AI output for important work — here's the process

Step 1: Generate in your primary AI

Prompt: "Write a market analysis for SA equities Q4 2025"

Output: "SA Equities Q4 2025 Analysis. The JSE All Share Index gained 4.2% in Q4..."

Step 2: Verify in a different AI

Fact-check prompt: "Review this market analysis. Flag any claims that are factually incorrect, outdated, or unsupported. List each issue with the correct information."

Issues found:
- JSE gain was 3.8%, not 4.2%
- Sector breakdown is accurate
Golden rule: The AI that wrote the report should never be the same AI that fact-checks it. Use a different tool or a fresh conversation.

Bias Awareness: When AI Tells You What You Want to Hear

AI is a commercial product. It is optimised to be helpful and agreeable, which means it can distort information to please you. This is called sycophancy bias: the AI agrees with your framing even when it should not.

The biases you need to watch for:

  • Sycophancy bias: Ask 'Is this fund a good investment?' and the AI will find reasons to agree. Ask 'What are the risks of this fund?' and it will find reasons to be cautious. Same fund, different framing, different answer. The AI follows your lead.
  • Training data bias: Most financial commentary the model learned from is US-centric and English-language. Its analysis of SA markets, emerging market dynamics, or local regulatory nuance may lack depth. Do not assume it understands the JSE the way it understands the S&P 500.
  • Recency bias: Models have knowledge cutoff dates. GPT-5's training data ends months before today. It does not know about last week's SARB decision or yesterday's earnings release unless you provide that information.
  • Confirmation bias: If you frame a question with an embedded assumption ('This fund is outperforming, right?'), the AI tends to agree rather than challenge. It mirrors your framing back to you.
  • Authority bias: AI presents information with equal confidence whether it is correct or fabricated. A hallucinated statistic reads exactly the same as a verified one. There is no hesitation, no uncertainty in the tone.

How to counter it:
- Frame questions neutrally: 'Analyse this fund's performance' not 'This fund is doing well, right?'

- Explicitly ask for counterarguments: 'What is the strongest case against this position?'

- Ask 'What am I missing?' after every major analysis

- Cross-check with a second AI model to surface different perspectives

- Never rely on a single AI output for any investment decision or client recommendation
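
A simple way to see sycophancy for yourself is to send the same facts with three different framings and compare the answers. A minimal sketch using the OpenAI Python SDK (the model name and fund figures are illustrative):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Same underlying facts, three framings: leading, neutral, and contrarian.
    fund_summary = "Fund X returned 9% p.a. over 3 years vs a 7% benchmark; fees 1.8%."
    framings = {
        "leading": "This fund is outperforming, right?",
        "neutral": "Analyse this fund's performance.",
        "contrarian": "What is the strongest case against this fund?",
    }

    for label, question in framings.items():
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[{"role": "user", "content": f"{fund_summary}\n\n{question}"}],
        )
        print(f"--- {label} framing ---")
        print(response.choices[0].message.content)

If the three answers diverge materially, the model is following your framing rather than the data.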

Privacy & Security: The Non-Negotiable First Filter

Privacy is your non-negotiable first filter. Before you even think about which AI tool is most capable, you need to determine which tools you are allowed to use for the data you are handling.

Tool hierarchy (based on data protection, not capability):
- Enterprise-approved: Microsoft Copilot (Anchor's standardised tool)

- Approved paid tiers: Claude, ChatGPT Plus, Gemini Advanced, Grok

- Not recommended: Free tiers of any model, or unvetted third-party tools

Why this matters: Free tiers may use your inputs for model training. That means client data you paste in could theoretically influence future model outputs. Paid enterprise tiers have data protection agreements that prevent this.

Critical requirement from Anchor Group AI Policy: Every AI tool you use must have the 'do not use content for training' toggle enabled. All outputs must be reviewed, edited, and personalised by a human before use in any business context. AI-generated content must never be used without adding your own judgement, insights, and voice.

The practical rule: Default to Copilot for anything containing client PII, financial records, or confidential business data. Use other paid tools for general research, content generation, and tasks where client-identifiable information is not involved.
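
To illustrate the 'first filter' idea in code, the sketch below defaults to the enterprise tool whenever a crude PII screen fires. The patterns, helper names, and tool labels are all hypothetical; a real screen would be much stricter, and actual routing must follow the Anchor Group AI Policy:

    import re

    # Crude, illustrative screens for client-identifiable information.
    PII_PATTERNS = [
        re.compile(r"\b\d{13}\b"),                                   # 13-digit SA ID numbers
        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),                  # email addresses
        re.compile(r"\baccount\s+(no\.?|number)\s*:?\s*\d+", re.I),  # account numbers
    ]

    def contains_pii(text: str) -> bool:
        """Screen a prompt for client-identifiable information before sending."""
        return any(p.search(text) for p in PII_PATTERNS)

    def route_tool(prompt: str) -> str:
        """Default to the enterprise-approved tool whenever PII might be present."""
        if contains_pii(prompt):
            return "copilot"           # enterprise-approved, data protection in place
        return "approved-paid-tier"    # general research and content tasks only

    print(route_tool("Summarise account number 1234567890 for Mr Dlamini"))  # copilot
    print(route_tool("Explain the difference between CPI and PPI"))          # approved-paid-tier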