The pattern only works if the tool enforces the OTP - i.e. the CLI doesn't perform the dangerous action until it receives the OTP through a path the agent can't spoof. Even when the flow is "tool returns 'ask the user for OTP', agent relays that to the user, agent passes whatever the user types back into the tool", the security lives in the tool's implementation: it must verify the OTP (e.g. server-side, or via a channel that bypasses the agent, as stavros described) and only then execute. The all-caps message is then UX for the human and a hint to the agent, not the actual gate. So "does it actually require an OTP?" is the right question: if the tool code doesn't block on a real OTP check, it's hope, not a security model.

The other approach is to not give the agent access to the thing that needs protecting. Run the agent in an isolated environment - sandbox, VM, separate machine - so it never has the ability to email-blast or nuke your files in the first place. Then you're not depending on the agent to obey the prompt, or on the human to be present for every dangerous call. Human-in-the-loop (or OTP-in-the-loop) is a reasonable layer when the agent has broad access; isolation is the layer that takes the host out of the blast radius.

We're building https://islo.dev for that: agents run in isolation and the host is out of scope, so you can let them run without approval prompts and still sleep at night.
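
To make the first point concrete, here's a minimal sketch of a tool that enforces the OTP itself rather than trusting the agent. Everything in it is hypothetical and only for illustration: the `OtpGate` class, the `send_out_of_band` callback (standing in for SMS, a push notification, or any channel the agent can't read), and the 120-second expiry. It is not how any particular CLI does it.

```python
import hmac
import secrets
import time
from typing import Callable, Optional, Tuple


class OtpGate:
    """Hypothetical gate: the dangerous action runs only after a valid OTP."""

    def __init__(self, send_out_of_band: Callable[[str], None],
                 ttl_seconds: float = 120.0) -> None:
        # Assumption: this callback reaches the human directly and is
        # invisible to the agent (SMS, push notification, etc.).
        self._send = send_out_of_band
        self._ttl = ttl_seconds
        self._pending: Optional[Tuple[str, float]] = None

    def request(self) -> str:
        """Issue a fresh 6-digit code; the agent only ever sees the message."""
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending = (code, time.monotonic() + self._ttl)
        self._send(code)
        return "OTP REQUIRED: ask the user for the code they just received."

    def execute(self, otp: str, action: Callable[[], str]) -> str:
        """Run `action` only if `otp` matches the pending, unexpired code."""
        # Clear the pending code first so every code is single-use,
        # even when the attempt fails.
        pending, self._pending = self._pending, None
        if pending is None:
            raise PermissionError("no OTP was requested")
        code, expires_at = pending
        if time.monotonic() > expires_at:
            raise PermissionError("OTP expired")
        # Constant-time comparison of the two byte strings.
        if not hmac.compare_digest(code.encode(), otp.encode()):
            raise PermissionError("wrong OTP")
        return action()
```

The string returned by `request()` is just the hint to the agent; the gate is the check in `execute()`. An agent that ignores the hint, or makes up a code, gets a `PermissionError` and the action never runs.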