NVIDIA just made something very real.

With NemoClaw, the conversation around AI agents shifted. We are no longer only talking about whether agents matter. We are now talking about how to run them safely in the real world.

That is a major change.

Sandboxed execution. Guarded inference. Policy-aware routing. Controlled environments. These are the kinds of things that have to exist before agents can be trusted with meaningful work.

NemoClaw pushes that world forward.

But the moment secure execution becomes real, another question shows up right behind it:

How do you prove what the agent actually did?

That is the problem we have been building around at Substr8.


The gap after the sandbox

A secure runtime answers one class of question.

It tells you:

  • what the agent was allowed to do
  • what environment it ran inside
  • what policies constrained it
  • which systems, models, or networks it could reach

Those are important questions.

But they are not the same as:

  • what actually happened during the run
  • which tools were invoked
  • what the results were
  • what changed
  • whether the evidence can be verified independently

That is a different layer.

And we think it is going to become unavoidable.

Because once agents move beyond chat and into real execution, every serious team will eventually hit the same wall:

“It ran in a secure environment” is not the same as “here is the receipt.”


The missing layer is proof

At Substr8, we call that layer RunProof.

The simplest way to explain it is this:

NemoClaw secures execution. RunProof proves execution.

That distinction matters.

A sandbox helps constrain an agent. A proof helps explain and verify its behavior.

What we think agents need is not just observability, and not logs hidden in a vendor dashboard.

They need portable, cryptographically verifiable receipts.

Something that says:

  • this is what the runtime saw
  • this is what the agent did
  • this is the chain of events
  • this is the root hash
  • this is the signature
  • this is how to verify it

Not trust by screenshot. Not trust by dashboard. Trust by verification.
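To make the shape of such a receipt concrete, here is a minimal sketch as a Python dict. Every field name and value here is an illustrative assumption, not the actual RunProof schema.

```python
# Hypothetical receipt shape; all field names are illustrative assumptions,
# not the real RunProof format.
receipt = {
    "runtime": {"mode": "embedded-cli", "env": "sandbox"},  # what the runtime saw
    "events": [                                             # what the agent did, in order
        {"type": "tool_invocation", "tool": "search"},
        {"type": "tool_result", "tool": "search", "ok": True},
        {"type": "run_completion"},
    ],
    "root_hash": "sha256:<commitment over the event chain>",
    "signature": "<binds the root hash to the runtime's signing key>",
    "verify": {                                             # how to check it
        "method": "recompute-root-and-check-signature",
        "public_key": "<published verification key>",
    },
}

# Portability is the point: anyone holding this object (plus the public key)
# can re-derive the root hash from the events and check the signature,
# without access to the platform that produced it.
assert set(receipt) == {"runtime", "events", "root_hash", "signature", "verify"}
```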


What we just shipped

We set ourselves a very specific goal:

Could we generate a native proof from an OpenClaw-based runtime running inside NemoClaw / OpenShell?

Not a mockup. Not a one-off wrapper pretending to be a product. A real proof path.

The answer is now yes.

We built a native plugin path that captures execution events from inside the runtime and forwards them into RunProof.

That gives us a real flow:

plugin capture → RunProof API → receipt generation → verification

That matters because it turns the proof layer from an idea into a working system attached to live execution.
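The "receipt generation → verification" half of that flow can be sketched mechanically. This is a toy version under stated assumptions: a flat SHA-256 chain instead of a Merkle tree, and an HMAC signature instead of the asymmetric scheme a portable receipt would realistically use. None of these function names are the actual RunProof API.

```python
import hashlib
import hmac
import json

def event_hash(event: dict) -> str:
    # Hash each event over a canonical (sorted-key, compact) JSON encoding.
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def root_hash(event_hashes: list[str]) -> str:
    # Simplest possible commitment: hash the concatenated event hashes.
    # (A real system might use a Merkle tree for partial verification.)
    return hashlib.sha256("".join(event_hashes).encode()).hexdigest()

def sign(root: str, key: bytes) -> str:
    # Illustrative HMAC signature; a production system would likely use
    # asymmetric keys so third parties can verify without the signing key.
    return hmac.new(key, root.encode(), hashlib.sha256).hexdigest()

def verify(receipt: dict, key: bytes) -> bool:
    # Recompute everything from the raw events and compare to the receipt.
    root = root_hash([event_hash(e) for e in receipt["events"]])
    return root == receipt["root_hash"] and hmac.compare_digest(
        sign(root, key), receipt["signature"])

# Build a toy receipt and verify it independently of the producer.
events = [
    {"type": "tool_invocation", "tool": "search", "args": {"q": "status"}},
    {"type": "tool_result", "tool": "search", "ok": True},
    {"type": "run_completion"},
]
key = b"demo-signing-key"
root = root_hash([event_hash(e) for e in events])
receipt = {"events": events, "root_hash": root, "signature": sign(root, key)}

assert verify(receipt, key)
```

Tampering with any event changes its hash, which changes the root hash, which invalidates the signature; that is what closes the loop from capture to verification.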


What the current proof path actually captures

One of the more interesting things we learned is that not all runtime surfaces are equal.

In embedded CLI mode, some hooks fire and some do not.

What we can capture today, natively, is what we call execution-proof.

That includes:

  • prompt/environment build
  • tool invocation
  • tool result capture
  • run completion
  • receipt generation
  • verification

What embedded mode does not yet give us is the full message envelope. The inbound and outbound message hooks only appear when the interaction moves through a gateway or channel path.

So we are naming that difference clearly.

Execution-proof

Available in embedded CLI mode and gateway/channel mode.

This proves the agent did work:

  • tools invoked
  • results returned
  • execution context captured
  • receipt finalized
  • proof verified

Full-run-proof

Available via gateway/channel mode.

This adds:

  • input message envelope
  • output message envelope
  • richer conversational lineage

We think it is better to be precise about that distinction than to blur it.


Why this matters now

NemoClaw is important because it validates the secure runtime category.

But that is not the end of the story. It is the beginning of a new one.

Once agents are running in controlled environments, the next wave of questions becomes inevitable:

  • what did the agent actually do?
  • what was the exact chain of events?
  • what policy governed it?
  • can the evidence be inspected later?
  • can someone outside the platform verify it?

That is why we think the stack is splitting into distinct layers:

  1. secure runtime
  2. policy enforcement
  3. portable proof
  4. independent verification

The industry is moving quickly on the first two.

RunProof is our contribution to the next two.


What this taught us architecturally

There is a deeper product lesson here too.

The proof layer should not be hardwired into one runtime or one deployment shape. It should sit beside runtimes in a clean, composable way.

The architecture that is emerging for us looks like this:

  • a thin runtime plugin inside the execution environment
  • a canonical observation envelope emitted from that runtime
  • an external RunProof service that turns those observations into proof artifacts

That separation matters.

The runtime should observe and emit. The proof layer should canonicalize, hash, sign, verify, and later compose.

That is how you avoid building a one-off integration. That is how you start building a proof standard.
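One way to picture that canonical observation envelope is a small, frozen record with a deterministic encoding. This is a sketch under assumed field names; the real schema may differ. The property that matters is that two semantically equal observations serialize to identical bytes, so the hashes and signatures computed downstream always agree.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Observation:
    """Hypothetical canonical envelope emitted by the runtime plugin."""
    run_id: str    # which run this event belongs to
    seq: int       # position in the event chain
    kind: str      # e.g. "tool_invocation", "tool_result", "run_completion"
    payload: dict  # event-specific data, as observed by the runtime

def canonical_bytes(obs: Observation) -> bytes:
    # Canonical encoding: sorted keys, no insignificant whitespace.
    # Without this, the proof layer could not reproduce hashes reliably.
    return json.dumps(asdict(obs), sort_keys=True,
                      separators=(",", ":")).encode()

a = Observation("run-1", 0, "tool_invocation", {"tool": "search"})
b = Observation("run-1", 0, "tool_invocation", {"tool": "search"})
assert canonical_bytes(a) == canonical_bytes(b)
```

The runtime plugin only constructs and emits these envelopes; canonicalizing, hashing, signing, and verifying all live on the other side of that boundary.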

The adapter was the proof-of-concept.

The native plugin path is the platform play.


What is live today

We have now packaged the native proof path enough that it is usable and visible.

The docs are live: the proof path and the verify endpoint are documented, and the golden path is defined.

Today, that golden path is:

install plugin → run demo → manually finalize → verify

There is one clear caveat:

In embedded CLI mode, finalization is manual in v1 until an automatic terminal trigger is stabilized.

That is not a proof validity problem. It is a lifecycle/UX problem. The proof loop itself is already closed.


What comes next

The next immediate milestone is full-run-proof v1 through the gateway/channel path.

That will add:

  • input message envelope
  • output message envelope
  • richer run fidelity
  • cleaner automatic finalization

After that, we move into the broader protocol surface we have been designing toward:

  • inter-agent handoff proofs
  • DAG composition
  • state transition proofs
  • append-only ledger continuity

But the order matters.

You do not build a real protocol by starting with the graph.

You close one proof loop honestly. Then you compose. Then you scale the proof model outward.

That is where we are now.


The bigger point

NemoClaw helps make secure execution real.

That is a big deal.

But if agents are going to do real work, secure execution alone is not enough.

They need evidence. They need receipts. They need a way to explain what happened without asking everyone to trust the platform blindly.

That is the lane we are in at Substr8.

NemoClaw secures execution. RunProof proves it.

And we think that distinction is only going to get more important from here.


If AI agents are going to do real work, they need receipts.