Incremental intelligence: The case for AI as an intern, not a partner
When it comes to AI, accuracy matters, and leaders should make their AI models 'earn' privileges before they allow them to help make decisions
If you hire a graduate, you don't start them out working on merger strategy or a regulator briefing. You give them bounded work, check everything they touch and then expand scope slowly and only after they prove they're reliable. AI deserves the same treatment within modern firms.
Why an intern, not a partner?
AI is phenomenal at work involving volume - such as reading at scale, summarising and cross-referencing - but it still guesses. That's fine for a first draft, but not for decisions that carry licence, safety, reputational or market risk.
In August, the Australian Financial Review reported that a government-commissioned report had cited non-existent sources and an apparently invented legal quote - errors that were later linked to the use of generative AI. The public lesson is clear: if you can't prove accuracy and review, you will bear the consequences.
A roadmap for incremental progress (and what constitutes promotion)
Treating automation as an accelerant, not a substitute, is the line smart teams are drawing. What's more, they're setting a 'probation period' in which scope is limited, outputs are tested and misses are catalogued.
Within this mindset, AI is managed in the same way as a talented junior, and promotion gates are not bureaucracy; they're protection that ensures it's fast where speed helps, cautious where judgment matters and auditable all the way through.
Keep in mind that unlike talented juniors, who generally become more reliable as they learn, AI systems can sometimes 'backslide'. A change in data, an upgrade, or a shift in context can unexpectedly and rapidly degrade performance. This means AI requires ongoing review and revalidation, not just initial training.
Stage 1: Data extraction (read, label, file)
When you start relying on AI, begin with narrow, checkable work and make it prove accuracy and reliability. Add functionality slowly, and set quality bars it must clear before you ask it to do more.
Start by giving the system a narrow brief: read large volumes of material, pull out the facts, classify and tag them, and possibly build useful registers. The wins here include speed and consistency - big piles of content become structured data so people can spend time on judgment, not hunting for fields.
When it earns a step up: Only after it consistently hits a high accuracy bar (think mid-90s on your own held-out test set). Recheck every month for drift and switch off auto-publish if the accuracy score drops by more than a couple of points. Make every value traceable back to a source snippet; if you can't show where a field came from, you haven't saved time - you've taken on risk. Regulators care about evidence, not velocity.
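A minimal sketch of how that monthly gate might be scripted is below. The 95 per cent bar, the two-point drift tolerance and the function name are illustrative assumptions, not a prescribed implementation.

```python
# Illustrative promotion gate for Stage 1 extraction (thresholds are assumptions).

PROMOTION_BAR = 0.95      # 'mid-90s' accuracy on your own held-out test set
DRIFT_TOLERANCE = 0.02    # pull auto-publish if accuracy falls more than ~2 points

def review_extraction_model(current_accuracy: float, baseline_accuracy: float) -> dict:
    """Run this against a held-out test set on a monthly cadence."""
    eligible_for_promotion = current_accuracy >= PROMOTION_BAR
    drifted = (baseline_accuracy - current_accuracy) > DRIFT_TOLERANCE
    return {
        "eligible_for_promotion": eligible_for_promotion,
        "auto_publish_enabled": eligible_for_promotion and not drifted,
        "requires_revalidation": drifted,
    }

# Example: the model scored 96% at sign-off but only 93% this month.
print(review_extraction_model(current_accuracy=0.93, baseline_accuracy=0.96))
# -> {'eligible_for_promotion': False, 'auto_publish_enabled': False, 'requires_revalidation': True}
```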
Stage 2: Pattern spotting (from summaries to signals)
Once your tool's extraction is dependable, you can allow it to do something smarter: group similar complaints, flag inconsistent language, compare changes across markets and surface anomalies in KPIs or control data. This is all about early warning and triage so the right issues get human attention immediately.
When it earns a step up: Only after a reasonable period of back-testing shows those signals line up with outcomes that matter - such as losses and churn. Importantly, you should regularly try to break the system on purpose: feed in exclusions and clashing rules, and note where it stumbles so you can add even more guardrails. Keep a light human review in the loop and track how often specialists agree with the system's call, because in complex domains outputs can look fine while missing something important. Your goal is transparency you can explain and defend - an important loop in the governance process.
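One way to keep score on that specialist-agreement measure is sketched below; the record structure and the example bar are assumptions for illustration, not a fixed method.

```python
# Illustrative reviewer-agreement tracker for Stage 2 signals.
# The record fields and the notion of a 90% agreement bar are assumptions.

from dataclasses import dataclass

@dataclass
class SignalReview:
    signal_id: str
    system_call: str       # e.g. "escalate" or "ignore"
    specialist_call: str   # what the human reviewer decided

def agreement_rate(reviews: list[SignalReview]) -> float:
    """Share of signals where the specialist agreed with the system's call."""
    if not reviews:
        return 0.0
    agreed = sum(1 for r in reviews if r.system_call == r.specialist_call)
    return agreed / len(reviews)

reviews = [
    SignalReview("S-001", "escalate", "escalate"),
    SignalReview("S-002", "ignore", "escalate"),   # the kind of miss worth cataloguing
    SignalReview("S-003", "escalate", "escalate"),
]
print(f"Specialist agreement: {agreement_rate(reviews):.0%}")  # 67% - below an illustrative 90% bar
```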
Stage 3: Predictive alerts (signals to next-best actions)
At this point the system can start nudging action: 'This supplier trend looks risky', 'This control may fail', 'This filing date is at risk' or 'Re-price this client now'. Done well, this can shorten reaction time and make escalation more consistent.
Keep it honest: Maintain a decision log for each alert - inputs, rationale and the human decision taken. Set a 'noise budget' so that false alarms don't swamp the team - and if more than roughly one in ten alerts are being dismissed, dial the sensitivity down. Review outcomes quarterly: did alerts reduce incidents, cost or time to resolution? If not, you should scale them back. Alerts change behaviour and behaviour carries risk, so you should be able to explain why you acted - or chose not to - based on what the system flagged.
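As a rough sketch, the decision log and noise budget can be as simple as the following. The field names and helper functions are assumptions; the one-in-ten threshold follows the rule of thumb above.

```python
# Illustrative decision log and noise budget for Stage 3 alerts.

from datetime import datetime, timezone

NOISE_BUDGET = 0.10  # if more than ~1 in 10 alerts are dismissed, dial sensitivity down

decision_log: list[dict] = []

def record_alert_decision(alert_id: str, inputs: dict, rationale: str, human_decision: str) -> None:
    """Append an auditable record: what the system saw, why it flagged it, what a person did."""
    decision_log.append({
        "alert_id": alert_id,
        "inputs": inputs,
        "rationale": rationale,
        "human_decision": human_decision,   # e.g. "acted" or "dismissed"
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })

def over_noise_budget(log: list[dict]) -> bool:
    """True when the dismissal rate exceeds the noise budget."""
    if not log:
        return False
    dismissed = sum(1 for entry in log if entry["human_decision"] == "dismissed")
    return dismissed / len(log) > NOISE_BUDGET

record_alert_decision("A-101", {"supplier": "X"}, "late-delivery trend", "acted")
record_alert_decision("A-102", {"control": "C-7"}, "threshold breach", "dismissed")
print(over_noise_budget(decision_log))  # True: 1 of 2 dismissed, so reduce sensitivity
```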
The overriding principle should be: if you can't defend it in a review, don't ship it.
What to do on Monday
If you've already progressed your AI model through the 'ranks' and are wondering how you can quickly set clear boundaries and add simple checks that will keep you audit-ready, consider these general guidelines:
Firm up your scope. Write down what your AI can do today, what it cannot do and who owns the decision when it's wrong.
Show your work. Add sources, rationale and reviewer stamps to every AI-informed output.
Run regular drills. Every quarter, try to break your riskiest AI-managed workflows and share what you learned with the team.
Grant privileges incrementally. Once the model meets the bar for 90 days, allow it to do a bit more - but if quality dips, pull it back.
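To make that last rule concrete, here is a minimal sketch of a 90-day privilege gate; the scope names, the quality signal and the step-back behaviour are hypothetical.

```python
# Illustrative 90-day privilege gate (scope names and quality signal are assumptions).

def review_privileges(days_meeting_bar: int, quality_dipped: bool, current_scope: list[str]) -> list[str]:
    """Expand scope only after 90 days at the bar; pull it back on any quality dip."""
    if quality_dipped:
        return current_scope[:-1]  # step back one privilege
    if days_meeting_bar >= 90:
        return current_scope + ["next_bounded_task"]  # hypothetical next privilege
    return current_scope

print(review_privileges(days_meeting_bar=95, quality_dipped=False, current_scope=["extract", "classify"]))
# -> ['extract', 'classify', 'next_bounded_task']
```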
AI will make you faster, but speed without evidence is tomorrow's problem - and sometimes tomorrow's headline. Treat it like a smart intern: let it help, make it earn your trust, promote it only when it passes real tests, and record what it did and why. That's how you keep the upside of automation without handing your reputation over to a black box.