What the EU AI Act, SOX and HIPAA actually demand from an AI agent audit log – the exact fields, retention windows, the attribution trap, and a schema you can ship.
- What an AI agent audit log must actually capture
- What the EU AI Act, SOX and HIPAA each require
- The attribution problem: why your agent audit log lies by default
- A concrete AI agent audit log schema you can ship
- Retention, storage tiers and tamper-evidence
- Builder’s take
- Frequently asked questions
- What is an AI agent audit log?
- Does the EU AI Act require specific audit log fields for AI agents?
- How long must I retain AI agent audit logs?
- Why can’t I just use my existing service account to log agent actions?
- Has the EU AI Act high-risk logging deadline changed for 2026?
- How do I make agent audit logs tamper-evident?
- Primary sources
What an AI agent audit log must actually capture
A compliance-grade AI agent audit log must capture five things that ordinary application logs leave out: the full decision context, every tool call with its parameters and response, the policy-evaluation record, the data-flow lineage, and every human-intervention point. A line that says agent_42 called send_payment is not an audit trail – it is a breadcrumb. When a regulator or your own incident team asks why the agent did that, you need to reconstruct the state that produced the decision, not just the side effect.
This matters because agents differ from deterministic software in one critical way: they decide at runtime. The same prompt and tools can produce different actions on different runs, so the audit record has to preserve the inputs that drove the branch. Help Net Security‘s April 2026 analysis of agent logging makes the point bluntly – most current systems record that something happened but not the governance applied in the process, and PII that entered context through a tool call often cannot be traced back to its source.
The table below maps each required field to the question it answers in an investigation. Treat the right-hand column as your acceptance test: if you cannot answer the question from the log alone, the field is incomplete.

A simple gate before you ship: take any single completed agent run, hand only its audit records to an engineer who did not build the system, and ask them to explain in plain language why the agent took each external action. If they cannot, your decision-context or policy-evaluation capture has a hole – fix it before the field gives you a false sense of coverage.
| Field group | What it contains | Investigation question it answers |
|---|---|---|
| Decision context | System prompt, accumulated message history, model + version, retrieved documents, reasoning/plan state | Why did the agent choose this action over the alternatives? |
| Tool calls | Tool name, full input parameters, raw response, latency, success/error | What did the agent actually do to external systems, and what came back? |
| Policy evaluation | Which policy fired, the inputs evaluated, the allow/deny verdict, the rule version | What governance was applied, and did anything get blocked or escalated? |
| Data-flow lineage | Source of each data element, transformations, where it ended up (output, downstream tool) | Can you trace this PII from the user request to the response that exposed it? |
| Human intervention | Approval/override events, reviewer identity, timestamp, decision rationale | Who was in the loop, and did a human approve, modify, or halt the action? |
What the EU AI Act, SOX and HIPAA each require
No single regulation hands you a field list – you derive the schema by stacking three sources: the EU AI Act’s automatic-logging and human-oversight mandate, SOX’s roughly seven-year audit-document retention for financial records, and HIPAA’s six-year retention plus per-access audit controls. Where they overlap, the strictest rule wins; where they diverge, you log the union.
The EU AI Act is the most agent-specific. Article 12 requires high-risk systems to technically allow for the automatic recording of events over the lifetime of the system – automatic, not manual, and across the whole lifetime rather than just the current release. Article 12(2) frames the purpose around three goals: spotting risk situations or substantial modifications, supporting post-market monitoring, and enabling deployer operational monitoring. As Help Net Security notes, the Act deliberately does not prescribe specific fields or a format – only those purposes – which means the schema is your responsibility to design and defend. Article 14 adds the human-oversight dimension: systems must let humans monitor, interpret, intervene in, and halt the AI, which is precisely why human-intervention points belong in the log.
One timing nuance you must get right in 2026: under the Digital Omnibus political agreement reached on 7 May 2026, the high-risk obligations for Annex III standalone systems were provisionally deferred from August 2026 to 2 December 2027, and Annex I embedded systems to 2 August 2028 (Gibson Dunn). That agreement is provisional – it only takes legal effect on formal adoption and Official Journal publication – and Article 50 transparency obligations still bite on 2 August 2026. Build to the original requirements now; the delay buys preparation time, not a pass.
SOX and HIPAA supply the retention spine. SOX practice points to retaining audit documentation for at least seven years for financial-reporting records. HIPAA’s §164.316 requires six years from creation or last-effective date, and §164.312(b) audit controls require logging every access event – view, create, modify, delete, export – with user identity, timestamp, and action (IS Partners). A publicly traded healthcare company subject to both defaults to the longer SOX window.
The 7 May 2026 Omnibus agreement is provisional until published in the Official Journal, and retention interpretations vary by sector and member state. Treat the dates and year-counts here as a planning baseline, confirm against the enacted text and your regulators, and design retention to be reconfigurable rather than hard-coded.
| Regime | Core logging hook | Retention floor | Agent-specific catch |
|---|---|---|---|
| EU AI Act | Art. 12 automatic event logging over system lifetime; Art. 14 human oversight | Min. 6 months for auto-generated logs (Art. 26(6)); align with sector rules | No prescribed fields – you design and justify the schema |
| SOX | Audit trail for financial reporting + executive accountability | ~7 years for audit documentation | Agent actions affecting financial records need attributable identity |
| HIPAA | §164.312(b) audit controls on every ePHI access | 6 years (§164.316) from creation/last-effective | Data-flow lineage must trace ePHI through tool calls |
The attribution problem: why your agent audit log lies by default
The most dangerous defect in an AI agent audit log is attribution: agents typically inherit a shared service-account credential, so the log records the service account’s identity, not which agent – or which user – actually drove the action. If three agents share one credential and one is hijacked by prompt injection, you have lost attribution for all three. Your beautiful seven-year archive becomes seven years of unattributable events.
The fix is to carry identity end to end rather than collapsing it at the credential boundary. AI agents are a subclass of non-human identity, but a stricter one – they chain tool calls and can be steered at runtime – so a static API key shared across a fleet is exactly the wrong primitive. The current best practice is cryptographic workload identity (SPIFFE/SPIRE-issued SVIDs or OIDC-federated tokens) to identify the agent, combined with delegated authority to capture the human on whose behalf it acts.
For the human delegation chain, OAuth 2.0 Token Exchange (RFC 8693) is the mechanism that survives an audit. The act claim records the acting party while preserving the original subject, so the token itself encodes user U authorized agent A to act on their behalf. Your log then records both. The honest caveat: multi-hop delegation – Agent A spawns B which calls C – is where this strains. RFC 8693 can nest act claims, but prior-actor claims in those chains are informational only, so for deep chains you must log each hop’s delegation explicitly rather than trusting the token to carry the whole history.
Every audit record should answer four identity questions without joining to another system: which human initiated the workflow, which agent (with cryptographic identity) executed the step, under what delegated scope, and which tool/service received the call. If any answer is ‘the shared service account’, you have an attribution gap to close before retention even matters.
“A seven-year archive of actions you cannot attribute to a specific agent or user is not compliance evidence – it is seven years of plausible deniability for whoever broke in.”
Surya Koritala, founder of Cyntr
A concrete AI agent audit log schema you can ship
A workable schema is a single structured event per agent step that nests the five required field groups, aligns its names with the OpenTelemetry GenAI semantic conventions, and carries its own integrity hash. Aligning with OTel matters because it is the emerging standard: agent identity maps to gen_ai.agent.id, the operation to gen_ai.operation.name (values like invoke_agent or execute_tool), and tool data flows through gen_ai.tool.name and the input/output message attributes (OpenTelemetry). Note one default that bites compliance teams: OTel instrumentations are told not to capture prompt or completion content by default, to avoid PII leaks – so content capture for your decision context is an explicit, governed opt-in, not something you get for free.
The record below is intentionally verbose. Each top-level key corresponds to a field group from the first section, every external effect is attributable, the policy verdict is first-class rather than a buried debug line, and integrity.prev_hash chains records so a deletion or edit breaks the chain visibly. The retention_class field lets your storage tier route the same event to both a 90-day hot index and a 7-year immutable archive.
{
"schema_version": "1.0",
"event_id": "01J9Z8...ULID",
"timestamp": "2026-05-30T14:22:07.412Z",
"retention_class": "compliance-7y", // also indexed to debug-90d
"identity": { // attribution, not a shared account
"agent_id": "gen_ai.agent.id=billing-agent-7",
"agent_credential": "spiffe://corp/ns/agents/billing-agent",
"on_behalf_of_user": "u_88421", // RFC 8693 subject
"delegation_act_chain": ["u_88421", "orchestrator", "billing-agent-7"],
"session_id": "gen_ai.conversation.id=conv_5f1"
},
"operation": "gen_ai.operation.name=execute_tool",
"decision_context": { // explicit opt-in content capture
"model": "frontier-model-x-2026-04",
"system_prompt_hash": "sha256:9af3...", // hash if prompt is sensitive
"input_messages_ref": "s3://ctx/conv_5f1/step12.jsonl",
"plan_step": "Issue refund for disputed charge #DC-7781"
},
"tool_call": {
"tool_name": "gen_ai.tool.name=issue_refund",
"parameters": { "charge_id": "DC-7781", "amount_cents": 4200, "currency": "USD" },
"response": { "status": "ok", "refund_id": "rf_0091" },
"latency_ms": 318
},
"policy_evaluation": { // governance is first-class
"engine": "opa/rego",
"rule": "refunds.under_5000_auto_approve",
"rule_version": "2026-05-12",
"inputs": { "amount_cents": 4200, "user_tier": "verified" },
"verdict": "allow",
"escalated": false
},
"data_lineage": [ // trace PII to its source
{ "element": "charge_id", "source": "tool:lookup_charge", "sink": "tool:issue_refund" }
],
"human_intervention": { // Art. 14 oversight points
"required": false,
"reviewer_id": null,
"action": null // approve | modify | halt
},
"integrity": { // signed outside agent control
"prev_hash": "sha256:1c0b...",
"record_hash": "sha256:7e44...",
"signed_by": "audit-signer-kms"
}
}
Step 1: Emit from the orchestration layer, not the model SDK
Hook the audit emit into your agent runtime’s tool-dispatch and policy-check path – the one place that sees the model decision, the policy verdict, and the tool result together. Emitting from the model SDK alone gives you tokens in and tokens out but loses the governance and data-flow context. In Cyntr we treat every tool dispatch as a transaction boundary that produces exactly one audit record.Step 2: Stamp attributable identity onto every record
Resolve the agent’s cryptographic identity (SPIFFE SVID or OIDC token) and the delegated user from the RFC 8693 token at dispatch time, and write both into theidentity block. Never let the record fall back to the shared service account. For multi-hop chains, write each hop explicitly rather than trusting nested act claims to be authoritative.Step 3: Capture content deliberately, with PII controls
Because OTel and good privacy practice default to not logging prompt/completion content, make decision-context capture an explicit, policy-gated choice. Where the raw context is sensitive, store a hash plus a reference to a separately access-controlled blob, so you can prove what was in context without scattering PII across your log index.Step 4: Hash-chain and sign outside the agent’s reach
Computerecord_hash over the canonicalized record, include the previous record’s hash, and sign with a key the agent process cannot access (a KMS or a sidecar signer). This is what converts logs from ‘editable application output’ into tamper-evident evidence – the property Help Net Security flags as the difference between an audit trail and a liability.Step 5: Route to two retention tiers from one event
Useretention_class to fan the same event into a hot store (30-90 days, full content, fast queries for debugging) and a cold immutable store (6-7+ years, often hashed/compacted, WORM-backed for SOX/HIPAA). One emit, two lifecycles – so debugging speed and legal retention never fight over the same bucket.Retention, storage tiers and tamper-evidence
6 months
EU AI Act minimum
Floor for retaining automatically generated high-risk logs (Art. 26(6)); sector rules often require far longer
6 years
HIPAA retention
§164.316, from creation or last-effective date – whichever is later
~7 years
SOX audit docs
Common practice for financial-reporting audit documentation; strictest rule wins when regimes overlap
30-90 days
Hot debug tier
Typical internal window for full-content, fast-query operational logs before tiering to cold storage
Build the agent audit log as evidence, not telemetry
Split retention into two tiers driven by purpose: a 30-90 day hot tier for debugging and incident response, and a 6-year-to-7-year-plus cold, immutable tier for external compliance. The hot tier optimizes for query speed and full content so engineers can reconstruct a bad run today; the cold tier optimizes for immutability and cost so you can satisfy a SOX or HIPAA request years later. Conflating them is how teams end up either paying hot-storage prices for a seven-year archive or letting a 90-day debug TTL silently delete legally required evidence.
Tamper-evidence is the property that gives the cold tier its value. Article 12 does not explicitly mandate tamper-proofing, but as Help Net Security argues, if your logs can be silently altered and you cannot show otherwise, their evidentiary value is zero. The practical pattern is hash-chaining each record to its predecessor and writing to write-once-read-many (WORM) or append-only storage, with signing keys held outside the agent’s blast radius. HIPAA’s audit-control and integrity expectations push in the same direction.
One more decision teams underestimate: retention is also a deletion obligation. The EU AI Act sets a six-month floor for automatically generated logs, but data-protection rules can require you to purge or minimize personal data sooner than your seven-year financial-records clock. Keeping the structured audit metadata (who, what, verdict, hashes) for the full compliance window while expiring the raw content blob earlier is usually the way to satisfy both – which is exactly why the schema separates content references from the record itself.
Builder’s take
I run Cyntr, an agent orchestration runtime, and the single most expensive mistake I see teams make is treating the agent audit log as an afterthought you bolt on the week before an audit. By then the decision context is gone – you logged that a tool fired, but not the reasoning state, the policy verdict, or who the action was really on behalf of. You cannot reconstruct what you never captured.
- Emit the audit record from the orchestration layer, not the model SDK. The runtime is the only place that sees the full decision context, every tool call, the policy verdict, and the human-intervention point in one coherent event.
- Solve attribution before you solve retention. If five agents share one service account, your seven-year archive is just five years of ‘something happened’ – useless in an investigation. Wire per-action delegation (RFC 8693 act claims) first.
- Sign records outside the agent’s blast radius. A log the agent can silently rewrite has zero evidentiary value. Hash-chain entries and push signatures to an append-only store the agent cannot reach.
- Split your retention: 30-90 days hot for debugging, 6 years to 7+ years cold and immutable for compliance. Do not pay hot-storage prices to satisfy SOX, and do not let a debug TTL quietly delete your legal evidence.
Frequently asked questions
What is an AI agent audit log?
An AI agent audit log is a structured, tamper-evident record of everything an autonomous agent did and why. Unlike ordinary application logs, a compliance-grade agent audit log captures the full decision context, every tool call with parameters and responses, the policy-evaluation verdict, data-flow lineage, and any human-intervention points – enough to reconstruct and defend each action years later.
Does the EU AI Act require specific audit log fields for AI agents?
No. EU AI Act Article 12 requires high-risk systems to automatically record events over the system’s lifetime for three purposes – risk detection, post-market monitoring, and deployer operational monitoring – but it deliberately does not prescribe a format or specific fields. You design the schema and must be able to justify that it serves those purposes, which is why teams derive fields by stacking the Act with SOX, HIPAA, and sector rules.
How long must I retain AI agent audit logs?
It depends on which regimes apply. The EU AI Act sets a six-month floor for automatically generated high-risk logs, HIPAA requires six years under §164.316, and SOX practice points to roughly seven years for financial-reporting audit documentation. When multiple regimes apply, the strictest retention period generally controls, so many teams default to seven-plus years in immutable cold storage.
Why can’t I just use my existing service account to log agent actions?
Because a shared service account destroys attribution. If several agents share one credential, the log records the account, not which agent or user actually drove the action – so a single compromised agent makes every action under that account unattributable. The fix is cryptographic workload identity (such as SPIFFE/SPIRE) for the agent plus OAuth 2.0 Token Exchange act claims to record the human on whose behalf it acted.
Has the EU AI Act high-risk logging deadline changed for 2026?
Yes, provisionally. A Digital Omnibus political agreement reached on 7 May 2026 deferred Annex III standalone high-risk obligations from August 2026 to 2 December 2027, and Annex I embedded systems to 2 August 2028. The change is provisional until published in the Official Journal, and Article 50 transparency obligations still apply from 2 August 2026, so teams should keep building to the original requirements.
How do I make agent audit logs tamper-evident?
Hash-chain each record to the previous one and write to append-only or write-once-read-many (WORM) storage, with signing keys held outside the agent’s process so a compromised agent cannot rewrite history. Article 12 does not explicitly mandate this, but logs that can be silently altered have little evidentiary value – so cryptographic integrity is what turns telemetry into defensible audit evidence.
Primary sources
- Article 12: Record-Keeping — EU Artificial Intelligence Act (artificialintelligenceact.eu)
- EU AI Act Omnibus Agreement – Postponed High-Risk Deadlines — Gibson Dunn
- What the EU AI Act requires for AI agent logging — Help Net Security
- Semantic Conventions for GenAI agent and framework spans — OpenTelemetry
- On-Behalf-Of authentication for AI agents: scoped, auditable delegation — Scalekit
- Should HIPAA Audit Logs be Kept for 6 Years? — IS Partners
Last updated: May 31, 2026. Related: Identity Provenance.