AI systems can now execute arbitrary tasks autonomously — running code, invoking external APIs, and making decisions without direct human oversight.
This creates a foundational trust problem: when an agent acts independently, how do you know the results are accurate, repeatable, and untampered with? For regulated or mission-critical environments, these questions demand concrete answers.
The EU AI Act requires traceability and tamper-evident logging for high-risk AI systems. Yet most agent workflows still rely on standard log entries or short-lived records that can be easily forged or altered by malicious actors or faulty system components.
As one industry expert observed, AI systems can generate code faster than any team can review it, making new approaches to validating programmatically generated outputs essential.
Trust must be built on a new base-layer approach that:
- Binds data and code together through cryptographic means
- Ensures deterministic processing on every run
- Provides an unalterable history of all actions taken
Content-addressed artifacts
Immutability is central to verifiability. All code and models an agent uses should be linked to a cryptographic hash, treating tools, skills, and prompts as content-addressed artifacts with Content IDs (CIDs).
Any modification creates a new CID, instantly breaking downstream references and making unauthorized changes immediately detectable.
An agent's full identity — including model versions, library versions, and skill definitions — can be expressed as a set of hashes or signatures, so any attempt to load a malicious code module fails immediately on hash mismatch.
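As a minimal sketch of this pinning scheme (the names `cid`, `manifest`, and `load_artifact` are invented for illustration, not any particular framework's API), a loader can refuse any artifact whose bytes no longer match the pinned hash:

```python
import hashlib

def cid(data: bytes) -> str:
    """Content ID: a SHA-256 digest over the artifact's exact bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# The agent's identity is pinned as a manifest of artifact CIDs.
skill_code = b"def summarize(text): return text[:100]"
manifest = {"skill/summarize": cid(skill_code)}

def load_artifact(name: str, data: bytes) -> bytes:
    """Refuse to load any artifact whose hash differs from the pinned CID."""
    if cid(data) != manifest[name]:
        raise ValueError(f"hash mismatch for {name}: refusing to load")
    return data

load_artifact("skill/summarize", skill_code)   # original bytes load cleanly
tampered = skill_code + b"\nimport os  # injected line changes the CID"
# load_artifact("skill/summarize", tampered)   # raises ValueError
```

Because the CID is derived from the bytes themselves, even a one-character modification produces a different hash and the load fails closed.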
ContextSubstrate puts this into practice by documenting each agent run as an immutable context package tied to a SHA-256 hash.
Every input, parameter, interim step, and output is stored in a single content-addressable bundle with a unique context URI (e.g., ctx://sha256:...).
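ContextSubstrate's actual bundle format is not reproduced here, but the general shape is easy to sketch: serialize the run into one canonical byte string and derive the context URI from its hash (the function name `package_run` and the field layout are assumptions for this illustration):

```python
import hashlib
import json

def package_run(inputs: dict, params: dict, steps: list, outputs: dict):
    """Serialize a run into one canonical bundle and derive its context URI."""
    # sort_keys + fixed separators make the serialization canonical,
    # so the same run always hashes to the same URI.
    bundle = json.dumps(
        {"inputs": inputs, "params": params, "steps": steps, "outputs": outputs},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    uri = "ctx://sha256:" + hashlib.sha256(bundle).hexdigest()
    return uri, bundle

uri, bundle = package_run(
    inputs={"prompt": "summarize the report"},
    params={"model": "m-1.0", "seed": 42, "temperature": 0},
    steps=["retrieved doc", "drafted summary"],
    outputs={"summary": "..."},
)
```

Anyone holding the bundle can recompute the hash and confirm it matches the URI; anyone holding only the URI can detect a substituted bundle.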
Deterministic and reproducible inference
Content-addressing fixes what code runs; determinism ensures it produces the same result every time. LLM inference is non-deterministic in practice, but recent research shows this is not an inherent constraint:
- Karvonen et al. found that using fixed random seeds and sampling parameters produced identical tokens in approximately 98% of cases across repeated runs.
- EigenAI demonstrated true bit-for-bit deterministic inference on GPUs by carefully controlling the execution environment and removing all sources of non-determinism, achieving identical output byte streams on every run.
EigenAI paired this with a blockchain-style cryptographic log — encrypting and recording all requests and responses on an immutable ledger.
Verification then reduces to a simple hash comparison of the output, giving every model prediction a self-contained proof of correctness.
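The hash-comparison pattern can be shown with a toy stand-in for a model (a seeded random sampler, not real inference): the prover publishes the output hash, and the verifier replays with the same seed and compares digests.

```python
import hashlib
import random

def run_model(seed: int, steps: int = 8) -> str:
    """Toy stand-in for deterministic inference: a pinned seed fixes every sample."""
    rng = random.Random(seed)
    return "".join(rng.choice("abcdef") for _ in range(steps))

# Prover: publish the output hash alongside the result.
output = run_model(seed=42)
commitment = hashlib.sha256(output.encode()).hexdigest()

# Verifier: replay with the same seed and check the digest matches.
replayed = run_model(seed=42)
assert hashlib.sha256(replayed.encode()).hexdigest() == commitment
```

The hard part in practice is making `run_model` deterministic for a real GPU inference stack; once it is, verification really is just this one comparison.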
Where full determinism is not achievable, reproducibility commitments offer a practical alternative.
An agent declares that its results will be reproducible within an acceptable variance bound, and a verifier can later confirm this by replaying the run with the same seed, prompt, and model configuration.
Code generation tasks tend to be fully repeatable; more variable outputs can be assessed using semantic equivalence comparisons or thresholded edit distance.

Run-time isolation and sandboxing
Reproducibility addresses the integrity of outputs; isolation constrains what an agent can do in the first place. As NVIDIA's AI Red Team notes, AI coding agents often execute commands with the user's full system privileges, vastly expanding the attack surface. A compromised or errant agent could:
- Write to critical system files
- Exfiltrate sensitive data
- Spawn rogue processes
The practical guidance is to treat all agent tool-calling as untrusted code execution. Key mandatory controls include:
- Blocking all unapproved network egress to prevent unauthorized external connections or data exfiltration
- Confining file-system writes to a designated workspace, disallowing access to sensitive paths such as ~/.zshrc or .gitconfig
- Dropping root privileges and applying kernel-level isolation via secure runtimes like gVisor or Firecracker microVMs, OS sandboxing tools such as SELinux or macOS Seatbelt, or eBPF/seccomp filters
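The workspace-confinement control can also be enforced at the application layer as defense in depth (a complement to, not a substitute for, the kernel-level isolation above). A minimal sketch, with the names `WORKSPACE` and `safe_write` invented for illustration:

```python
from pathlib import Path

WORKSPACE = Path("/tmp/agent-workspace").resolve()

def safe_write(relative_path: str, data: bytes) -> None:
    """Allow writes only inside the designated workspace directory."""
    # resolve() collapses ../ components and symlinks before the check.
    target = (WORKSPACE / relative_path).resolve()
    if not target.is_relative_to(WORKSPACE):
        raise PermissionError(f"write outside workspace blocked: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)

safe_write("out/result.txt", b"ok")              # allowed
# safe_write("../../home/user/.zshrc", b"...")   # raises PermissionError
```

Resolving the path before checking it is the key detail: a naive prefix check on the raw string is trivially bypassed with `../` sequences or symlinks.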
WebAssembly (Wasm) offers a compelling lightweight option: a portable bytecode sandbox with no system calls by design.
Agent code compiled to Wasm can only access explicitly granted host functions, eliminating the shared-kernel risks of traditional containers. Combined with memory and time limits, Wasm provides a powerful execution environment for generated scripts and tools.
The principle holds: autonomy should be earned through demonstrated trustworthiness, not granted by default.

Tamper-resistant logging and proof bundles
Isolation and determinism control what agents do; logging provides accountability for what they did. Standard logs lack cryptographic linkage, meaning entries can be removed or altered without detection.
A better solution is an append-only, Merkle-chain audit trail where each log entry's hash is chained to the previous one — any deletion or modification breaks the chain immediately.
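The linear hash chain is the minimal form of this idea, and it fits in a few lines (the class name `HashChainLog` is invented for this sketch; a production ledger would add Merkle-tree batching and signatures):

```python
import hashlib
import json

class HashChainLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._head = self.GENESIS

    def append(self, event: dict) -> str:
        # Each entry's hash covers both the event and the previous hash,
        # chaining the whole history together.
        payload = json.dumps({"prev": self._head, "event": event}, sort_keys=True)
        entry_hash = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"prev": self._head, "event": event, "hash": entry_hash})
        self._head = entry_hash
        return entry_hash

    def verify(self) -> bool:
        """Recompute every link; any edit or deletion breaks the chain."""
        prev = self.GENESIS
        for e in self.entries:
            payload = json.dumps({"prev": prev, "event": e["event"]}, sort_keys=True)
            if e["prev"] != prev or hashlib.sha256(payload.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Altering any historical event changes its hash, which no longer matches the `prev` field of the next entry, so `verify()` fails for the whole suffix of the log.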
Zhou et al.'s Verifiable Interaction Ledger takes this further: every agent-tool transaction is hashed and signed by both parties, so no entry can be secretly added or modified.
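The bilateral-signing idea can be sketched with HMACs standing in for real signatures (the keys and function names below are illustrative; an actual ledger like the one Zhou et al. describe would use asymmetric signatures such as Ed25519 so the parties cannot forge each other's endorsement):

```python
import hashlib
import hmac
import json

AGENT_KEY = b"agent-secret-key"  # illustrative symmetric keys only; a real
TOOL_KEY = b"tool-secret-key"    # system would use per-party asymmetric keys

def record_transaction(call: dict) -> dict:
    """Hash the transaction, then have both parties sign the digest."""
    digest = hashlib.sha256(json.dumps(call, sort_keys=True).encode()).digest()
    return {
        "call": call,
        "agent_sig": hmac.new(AGENT_KEY, digest, "sha256").hexdigest(),
        "tool_sig": hmac.new(TOOL_KEY, digest, "sha256").hexdigest(),
    }

def verify_transaction(entry: dict) -> bool:
    """An entry is valid only if both signatures match the recomputed digest."""
    digest = hashlib.sha256(json.dumps(entry["call"], sort_keys=True).encode()).digest()
    return (
        hmac.compare_digest(entry["agent_sig"], hmac.new(AGENT_KEY, digest, "sha256").hexdigest())
        and hmac.compare_digest(entry["tool_sig"], hmac.new(TOOL_KEY, digest, "sha256").hexdigest())
    )
```

Because each side holds its own key, neither the agent nor the tool can unilaterally insert or rewrite an entry that the other appears to have endorsed.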
Conclusion: toward a trustworthy agent ecosystem
Verifiable execution applies established techniques — content hashing, reproducible builds, and sandbox confinement — to LLM agents, creating a multi-layered trust framework where:
- Agents are tied to specific code sets via digitally signed certificates
- Models run deterministically under fixed random seed conditions
- Every step occurs within a hardened, isolated sandbox
- All interactions are recorded in a tamper-evident hash chain
The result is full auditability: any party can replay the sequence of hashes and verify that an agent's actions were consistent with the original intent and declared policy.

The momentum behind this approach is real.
Academic work — including the VET and Genupixel frameworks — has formally characterized chainable verification. Commercial SDKs are beginning to emerge, and regulatory pressure from the EU AI Act is pushing organizations to demonstrate tamper-resistant logs and reproducibility for high-risk AI uses.
The black-box era of agentic AI is coming to an end. It will be replaced by a paradigm where every autonomous decision carries a verifiable proof of integrity — from content-addressed code to digitally signed audit trails.
As AI agents take on more of our digital work, this verification layer will be the essential safeguard against error, manipulation, and loss of confidence.