"Where do these AI agents actually live, and who owns the review process when an auditor comes to test the control?"
Controllers and CFOs ask me this constantly. Nobody has a clean answer — not because it's technically complicated, but because most of these deployments weren't built with audit season in mind.
There's a gap between an AI agent that produces the right output and one that you can defend to an auditor. The first builds in a sprint. The second can show what the AI saw, what logic it ran, and prove that none of it has changed since the last close.
COSO released new guidance on generative AI and internal controls in February 2026.[1] For teams running DIY AI agents in accounting workflows, the guidance creates two requirements that are difficult to meet without purpose-built infrastructure: a complete, tamper-evident audit trail for every AI-driven output, and documented change management over the models and logic that produce them.
In practice, those two requirements translate into three questions an auditor will ask. Audit firms are catching up faster than most finance teams expect. Auditors are starting to ask about AI use in the close as a current-year test. If you built for efficiency and didn't build for evidence, that's the gap they'll find.
When an auditor arrives to test an AI-driven accrual or journal entry workflow, the first thing they want is a record: what data did the model receive, what logic did it apply, and who reviewed the output before it was posted? That record needs to be non-editable. An editable record isn't evidence of a control; it's a document someone could have modified.
This is where most DIY setups come apart. General-purpose AI tools store session history that can be cleared, modified, or deleted. Spreadsheet "audit tabs" (where teams paste AI outputs and add annotations) are editable by design. There's no mechanism preventing someone from cleaning up the record before the auditor arrives. Even if nobody does, there's no way to prove nobody did. That's exactly what auditors are trained to look for.
COSO's 2026 guidance is specific on this point. Effective monitoring of AI-driven processes requires a complete audit trail — capturing prompts, inputs, outputs, model and configuration versions, and evidence of human review — sufficient to reconstruct what the AI acted on and show that the control functioned as designed.[1] For public company accountants, this matters beyond best practice: a control that can't demonstrate this linkage may not survive PCAOB AS 2201 scrutiny, which defines a material weakness as a deficiency that creates a reasonable possibility that a material misstatement of the company's annual or interim financial statements will not be prevented or detected on a timely basis.[2]
Audit-ready AI keeps the record locked, timestamped, tied to the specific input, and attached to whoever signed off. That's what makes it evidence.
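To make that concrete, here's a minimal sketch of an append-only, hash-chained record in Python. The field names are illustrative, not any particular platform's schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def append_audit_record(log: list, inputs: dict, logic_version: str,
                        output: dict, reviewer: str) -> dict:
    """Append a record whose hash chains to the previous record."""
    prev_hash = log[-1]["record_hash"] if log else "genesis"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,                # what the AI saw
        "logic_version": logic_version,  # which approved logic ran
        "output": output,                # what it produced
        "reviewer": reviewer,            # who signed off before posting
        "prev_hash": prev_hash,          # ties this record to the last one
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record
```

The point of the chain is simple: editing or deleting any earlier record breaks every hash after it, which is what lets the log stand as evidence rather than as a document someone could have cleaned up.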
The second question catches most teams off guard: is the system running your close still the one you validated?
General-purpose AI models drift. A prompt that worked in January may behave differently in March because the underlying model has been updated. Your team changed nothing, but the system running your controls did. In most DIY environments, that happens without notification or testing. The close just runs on a different system than the one you validated.
COSO's risk assessment guidance requires organizations to manage this kind of model change.[1] The goal is to know when these models change, verify that the change didn't affect your outputs, and document that the control still holds before the next close runs. Without version-controlled logic, that documentation doesn't exist. Without it, you can't show that the workflow is a stable, governed control.
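Here is a minimal sketch of what that monitoring can look like: pin the model identifier the control was validated against, and halt the close until any change has been re-validated and documented. `get_provider_model_id` is a hypothetical stand-in for however your vendor exposes the current model version:

```python
# Validated model version, recorded when the control was last tested.
VALIDATED_MODEL_ID = "model-2026-01-15"  # illustrative identifier

def assert_model_unchanged(get_provider_model_id) -> None:
    """Halt the workflow if the underlying model has silently changed."""
    current = get_provider_model_id()  # hypothetical vendor lookup
    if current != VALIDATED_MODEL_ID:
        raise RuntimeError(
            f"Model changed: validated against {VALIDATED_MODEL_ID}, "
            f"provider now reports {current}. Re-validate and document "
            f"the control before the next close runs."
        )
```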
The non-determinism problem makes this harder. Unlike deterministic code, general-purpose AI can return different outputs for identical inputs. You can't prove completeness and accuracy on something you can't reproduce. This turns every close cycle into a re-validation exercise. Not a one-time build cost, but a recurring operations burden.
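One way to make that re-validation repeatable is a golden-case replay: re-run inputs whose outputs accounting has already approved and flag any divergence for review. A minimal sketch, where `run_workflow` is a hypothetical stand-in for the AI-assisted step:

```python
# Inputs paired with outputs that accounting has already approved.
GOLDEN_CASES = [
    {"inputs": {"invoice_total": 1200.00, "service_months": 12},
     "approved_output": {"monthly_accrual": 100.00}},
]

def revalidate(run_workflow) -> list:
    """Replay approved cases; any divergence needs accounting review."""
    failures = []
    for case in GOLDEN_CASES:
        actual = run_workflow(case["inputs"])  # hypothetical workflow call
        if actual != case["approved_output"]:
            failures.append({"case": case, "actual": actual})
    return failures  # empty list means the control still holds
```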
Process changes compound it. A new entity or a chart-of-accounts update typically means regenerating the script in a DIY build, which reopens every auditability and determinism question from scratch. Finance teams in enterprises with shared AI infrastructure often find their budgets deprioritized when consumption spikes across the business. The re-validation labor on each change is the cost that never appears in the original business case, and one that my previous post in this series didn't capture in its TCO model.
The workaround most teams land on when the compliance burden gets uncomfortable: hand the script to IT.
If IT manages the code, does the audit obligation go with it?
It doesn't. IT can own the code, but they cannot tell you whether a model update changed how your revenue accruals are being calculated. Compliance responsibility remains with the finance organization because it is accountable for the accuracy of the financial statements, regardless of who maintains the tooling.
The COSO requirements don't transfer with the code. Version control documentation has to exist. Change management has to produce audit evidence. Re-validation after any model or process change requires someone who can evaluate whether the accounting output is actually correct — not just whether the code ran without errors. Moving the script to IT shifts maintenance, but accountability stays put.
It also recreates the key-person dependency problem in a different department. IT is now maintaining AI logic they didn't write, for accounting workflows they don't perform, with no clear owner for the correctness of the output. When something goes wrong during the close (and something always goes wrong during close), neither team has clear authority over the fix.
COSO doesn't issue certifications. But the controls that make AI-driven accounting workflows auditable consistently resolve the two requirements above into the same architectural pattern. Here's how auditors test them:

| Control | How the auditor tests it |
| --- | --- |
| Immutable audit trail | Pull the record for a sampled entry: inputs, logic version, output, and reviewer sign-off, with confirmation that none of it can be edited after the fact |
| Version-controlled logic | Match the logic that ran to a specific approved version and review the change documentation for anything that moved between closes |
| Deterministic execution | Re-run the workflow on identical inputs and confirm identical outputs |
| Evidence of human review | Confirm a named reviewer signed off before the entry posted |
Purpose-built platforms include these by design. DIY approaches leave teams adding compliance after the fact, typically using the same manual processes that created the audit gap in the first place.
The deterministic execution row is worth pausing on. Effective AI controls in accounting don't call a live model every time the workflow runs. The AI generates the logic at build time; once validated, that logic runs as deterministic code. Same inputs, same outputs, every cycle, tied to a specific approved version. That's reproducibility in an audit context, and it's exactly what auditors are testing when they pull the record.
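A minimal sketch of the pattern, with `accrue_straight_line` standing in for logic the AI generated and accounting validated. In practice the approved hash would be persisted in the audit record at validation time rather than computed alongside the code:

```python
import hashlib
import inspect

def accrue_straight_line(contract_value: float, months: int) -> float:
    """Deterministic logic: AI-generated at build time, then validated."""
    return round(contract_value / months, 2)

# Fingerprint of the validated logic. In production this would be loaded
# from the audit record written when accounting signed off.
APPROVED_HASH = hashlib.sha256(
    inspect.getsource(accrue_straight_line).encode()
).hexdigest()

def run_control(contract_value: float, months: int) -> float:
    """Run the approved logic, refusing if the code no longer matches."""
    current = hashlib.sha256(
        inspect.getsource(accrue_straight_line).encode()
    ).hexdigest()
    if current != APPROVED_HASH:
        raise RuntimeError("Logic changed since validation; re-approve first.")
    return accrue_straight_line(contract_value, months)
```

Same inputs, same outputs, every cycle, and a hard stop the moment the running code stops matching the version that was approved.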
Keeping that property intact as models change is far more of an accounting problem than a technical one. Deciding whether a change to the underlying AI affects the correctness of your accruals requires someone who understands both the model and the workflow. Most DIY builds don't account for that combination. That's what determines whether the controls hold after the next model update, not just at launch.
When the controls are in place, the audit conversation changes. The Controller produces a versioned record that includes what logic ran, when it was validated, and who signed off. The auditor closes the test and moves on.
Key Takeaways:

- An AI workflow is audit-ready only when it produces an immutable, timestamped record of inputs, logic, outputs, and reviewer sign-off.
- General-purpose models drift. Without version-controlled logic and documented re-validation, the system running your close isn't the one you validated.
- Handing the script to IT transfers maintenance, not accountability. Compliance responsibility stays with finance.
- Validated logic that runs as deterministic code, tied to an approved version, is what turns an audit test into a quick close-out.