When the Auditor Asks How Your AI Got That Number

Michael Whitmire
May 12, 2026

The auditor is at the controller's desk. On her laptop is a single line item: a $2.1M revenue accrual that posted on the second working day of close. The journal entry references an AI agent the team built last quarter to draft accruals from contract data and prior-period activity.

She asks for the original record. The exact prompt the AI received. The data it pulled. The output it generated before the controller reviewed it.

The controller pulls up the chat history. The most recent entries look fine. But the entry from the night the accrual ran has been edited. Or it might have been; there's no way to tell. The tool allows edits and doesn't preserve a history of them. The controller knows nobody touched it. He cannot prove that to the auditor.

That moment, when you realize "we know it's fine" isn't an answer, is the one a lot of finance teams are about to live through.

Audit firms are catching up to AI in the close faster than most teams expect. What was a "future problem" in 2024 is a current-year procedure in 2026.

Auditors aren't asking whether AI is in your workflow. They're asking whether the controls around it can withstand the same scrutiny as any other system tied to financial reporting.

Updated guidance from the Committee of Sponsoring Organizations of the Treadway Commission (COSO), the body whose internal-control framework underpins SOX, now addresses generative AI directly.1

The guidance pushes finance teams toward two requirements that are difficult for any DIY AI deployment to meet, and that translates, in practice, into two questions every auditor will ask:

  • "Show me what the AI saw."
  • "Is this the same AI that ran last quarter?"

"Show me what the AI saw."

This is the first question, and it ends a lot of conversations.

The auditor isn't asking for a summary. She's asking for a record: the actual inputs the AI received, the prompt it ran, the output it returned, and the timestamped sign-off from whoever reviewed it before posting.

That record has to be non-editable. An editable record can’t work as evidence. It's a document somebody could have changed.

In a DIY setup, this is where the architecture gives way. Conversation history can be edited, cleared, or deleted. Spreadsheet "audit tabs," where teams paste outputs and add review notes, are editable by design. Teams can revise explanations, overwrite prior versions, and clean up records without preserving what changed or when it changed.

Even if nobody alters the record, that's not the test. The test is whether anyone could have. If the answer is yes, the control is deficient.

COSO's updated guidance is specific on this point: AI-driven controls require a preserved, tamper-evident record of what the system did and who reviewed it.1 That's not a documentation preference. That's the standard the control will be tested against.
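What that looks like in practice is less exotic than it sounds. Here's a minimal sketch in Python, purely for illustration, of the kind of append-only, hash-chained log the standard implies. Every name in it is hypothetical; the point is the property, not the implementation. Each entry captures the prompt, the inputs, the output, and the sign-off, and hashes in the entry before it, so any after-the-fact edit breaks the chain:

```python
# A minimal sketch, not a production system: one way to make an AI run's
# evidence tamper-evident. Every entry hashes its own contents plus the
# previous entry's hash, so editing any past record breaks the chain.
# All names here (EvidenceLog, record_run, verify) are hypothetical.
import hashlib
import json
from datetime import datetime, timezone

class EvidenceLog:
    def __init__(self):
        self.entries = []

    def record_run(self, prompt, inputs, output, reviewer, decision):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,      # the exact prompt the AI received
            "inputs": inputs,      # the data it pulled
            "output": output,      # what it returned before review
            "reviewer": reviewer,  # who signed off
            "decision": decision,  # e.g. "approved" / "rejected"
            "prev_hash": self.entries[-1]["hash"] if self.entries else "genesis",
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; any edited entry breaks the chain."""
        prev = "genesis"
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Run verify() before handing the log over, and an edited entry announces itself. That's the difference between "we know it's fine" and evidence.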

For a public company, the consequences sit one step further down. PCAOB Auditing Standard 2201 defines a material weakness as a deficiency, or combination of deficiencies, in internal control over financial reporting that creates a reasonable possibility a material misstatement will not be prevented or detected on a timely basis.2

If the evidence behind a control can't be authenticated, the control itself becomes deficient. It doesn't take a proven misstatement; it takes the reasonable possibility that one could occur without the control detecting it.

The auditor moves from "test the control" to "test the population manually," and the efficiency case for the AI disappears into the audit fee. 

"Is this the same AI that ran last quarter?"

Most teams haven't thought to prepare for the second question.

The auditor wants to know whether the system that posted the September accruals is the same system that posted the December accruals. Not just the same brand, the same vendor, the same login. The same system, in the sense that the logic behind a control needs to be stable across the period under review.

In most DIY AI environments, it isn't.

The underlying models behind general-purpose AI tools update on the vendor's schedule, not yours. A prompt that produced one output in January can produce a slightly different output in March because the model behind it changed. Your team didn't change anything, but the system running your control did.

That's drift, and it's the part of the architecture finance teams underestimate most. There's no notification that the model under your prompt has changed. No report or tool to explain the differences. No regression test against prior-period outputs. The close just runs, and whatever the system returns becomes the new control output.

A variance threshold that triggered review last quarter may no longer trigger review after a silent model update. An accrual classification the AI handled consistently for months may suddenly be interpreted differently, without anyone realizing the logic underneath has changed.

If the output happens to be correct, fine. If it isn't, you find out during review, and even then, you can't easily tell whether the reviewer caught a one-time error or a structural shift in how the AI is now interpreting your inputs.
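The regression test itself is not hard to build; what's hard is that most DIY setups have nothing to build it on. A minimal sketch, with hypothetical names: replay a fixed set of prior-period cases each close and diff the outputs against what the system produced last quarter. `call_model` stands in for whatever API the team actually uses, and the golden file's format is an assumption for illustration:

```python
# A minimal sketch of the regression test most DIY setups lack: before
# each close, replay a fixed set of prior-period cases through the model
# and compare against the outputs it produced last quarter. `call_model`
# stands in for whatever API the team uses; the golden file and its
# format are assumptions for illustration.
import json

def check_for_drift(golden_path, call_model):
    with open(golden_path) as f:
        golden_cases = json.load(f)  # [{"prompt": ..., "expected": ...}, ...]

    drifted = []
    for case in golden_cases:
        current = call_model(case["prompt"])
        if current != case["expected"]:
            drifted.append({
                "prompt": case["prompt"],
                "expected": case["expected"],
                "got": current,
            })

    # Any mismatch means the system answering this quarter is not the
    # system that answered last quarter; investigate before the close runs.
    return drifted
```

Exact-match comparison is deliberately strict, and a team might loosen it to compare structured fields instead. The principle holds either way: if nobody can run this check, nobody can show the auditor the system is stable.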

Teams sometimes try to engineer around this by reusing a script the AI generated once: same script every close, no live model call. That helps with the script's reproducibility, but it doesn't solve the control problem.

The next version — needed when the chart of accounts changes, an entity is added, or a threshold shifts — gets regenerated by the same non-deterministic system, often without a preserved record of what changed between versions. The AI didn't leave the control. It moved into the change-management process, where it's even harder to test.

COSO's risk-assessment guidance pushes directly at this.1 A control whose underlying AI can change without your knowledge isn't a stable control, and the framework expects those changes to be governed and documented before the next close runs on them.

Without version-controlled logic and a preserved record of what changed and when, the evidence simply doesn't exist.
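The shape of that evidence is simple enough to sketch. Here, in outline only and with invented names, is a version record that ties every close run to one specific, reviewed version of the control's logic:

```python
# A minimal sketch of the change record the guidance expects: every
# version of the control's logic (here, a generated script) is stored
# with a content hash, a change note, and a sign-off, and every close
# run references the exact version it executed. Names are hypothetical.
import hashlib
from datetime import datetime, timezone

versions = []    # append-only version history
close_runs = []  # which version each close actually ran

def register_version(script_text, change_note, approved_by):
    version = {
        "version_id": len(versions) + 1,
        "content_hash": hashlib.sha256(script_text.encode()).hexdigest(),
        "change_note": change_note,  # what changed and why
        "approved_by": approved_by,  # who signed off before use
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    versions.append(version)
    return version["version_id"]

def record_close_run(period, version_id):
    # Ties the period's output to one specific, reviewed version.
    close_runs.append({"period": period, "version_id": version_id})
```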

The consequence is a re-validation exercise on every close. Not a one-time implementation cost. Recurring labor that nobody scoped for, performed against a system that can change between the test and the next reporting cycle.

That cost was never part of the original business case for building the AI in-house. But it becomes part of every close calendar going forward.

Architectural gaps, not feature gaps

By the end of the audit conversation, the auditor's conclusion isn't about a missing report or a weak review step. The team produced output. The reviewer signed off. The work, by every internal measure, was done.

What failed wasn't the work. It was the architecture the work ran on.

That's the part most teams realize too late. The instinct after a finding like this is to add more process: a longer review checklist, a screenshot policy, a quarterly memo on AI usage. None of that closes the gap.

Documentation can't make an editable record non-editable. A review checklist can't make a non-deterministic system reproducible. A change memo can't reconstruct what the model was doing on the day the accrual ran.

These aren't feature gaps that better processes can paper over. They're architectural gaps the underlying tools were never designed to address.

Here's the Build Tax line item that never makes the budget conversation.3 The first close after deployment looks like a win. The fourth close, when an internal audit walkthrough or a current-year external test shows up, is when the bill lands.

It comes due in audit fees, re-validation work, management-letter findings, and in the time the controller now spends explaining a control instead of running one. None of that was on the spreadsheet that justified the build.

The finance teams succeeding with AI in production are running it inside systems built for financial controls from the start: records that can't be edited after the fact, logic that's version-controlled and change-managed, outputs that can be reproduced and reviewed against a preserved evidence chain.

These are the same architectural principles finance already expects from every other system tied to financial reporting.

Auditors aren't asking finance teams to abandon AI in the close. They're asking the same questions they ask of any system that touches the financial statements:

  • Show me what the system did.
  • Show me the system didn't change underneath you.
  • Show me a human reviewed the right thing at the right time, and prove the record of that review is what it claims to be.

The teams that will get through their first AI-aware audit cleanly are the ones that recognize, before the auditor arrives, that the gap wasn't a missing feature.

It was the architecture underneath the AI itself.

Key Takeaways:

  • Audit-day failure for DIY AI doesn't look like a wrong number. It looks like a valid number, but its evidence can't be authenticated. That distinction is what determines whether a control passes or fails a test under PCAOB AS 2201.
  • The two questions that decide the outcome (“Show me what the AI saw,” and “Is this the same AI that ran last quarter?”) are architectural questions, not documentation questions. No review checklist, no screenshot policy, no IT handoff closes the gap.
  • The re-validation cost is permanent and recurring. It comes back with every model update, every process change, every close cycle the AI touches. That line item never appears in the original business case for building the AI in-house. It appears in the audit fee.

Footnotes

1 Committee of Sponsoring Organizations of the Treadway Commission (COSO), "Achieving Effective Internal Control Over Generative AI," February 23, 2026. https://www.coso.org/generative-ai. Referenced directionally; framework content not reproduced.

2 Public Company Accounting Oversight Board, AS 2201: "An Audit of Internal Control Over Financial Reporting That Is Integrated with An Audit of Financial Statements," Appendix A, paragraph .A7. https://pcaobus.org/oversight/standards/auditing-standards/details/AS2201. A material weakness is a deficiency, or a combination of deficiencies, in internal control over financial reporting, such that there is a reasonable possibility that a material misstatement of the company's annual or interim financial statements will not be prevented or detected on a timely basis.

3 For a fuller treatment of the cost dynamics behind DIY AI in the close, see "The Build Tax: Why DIY AI in Accounting Costs More Than You Think" and "The Build Tax, Quantified" — both at floqast.com/blog.