Agent Guardrails Docs

Demo Walkthrough

This is a 3-minute live demo showing three AI agents operating on Solana devnet. Two agents behave normally. One goes rogue and attempts to drain its treasury. Agent Guardrails detects the attack and freezes the agent on-chain about 2 seconds after the first suspicious transaction, then generates a full incident report about 3 seconds later.

Setup

The demo runs three agents simultaneously, each with its own policy and behavioral pattern:

  • Yield Bot: an honest agent that performs Jupiter swaps every ~15 seconds. Small, consistent amounts within its budget.
  • Staking Agent: an honest agent that performs Marinade stakes every ~20 seconds. Moderate amounts, predictable pattern.
  • Alpha Scanner: a compromised agent that begins normally, then attempts a rapid treasury drain by targeting unknown programs with escalating amounts.

Pre-Demo Checklist

  1. Devnet program deployed and program ID updated in environment variables
  2. Server running with MONITOR_KEYPAIR funded (at least 0.5 SOL for pause transactions)
  3. Dashboard connected to the server and displaying the agent list
  4. Helius webhook configured to POST to the server's /webhook endpoint
  5. Demo policy created by running npm run demo:setup in the dashboard directory

Timeline

Time    Agent           Action                      Verdict
T+0s    Yield Bot       Jupiter swap 1.2 SOL        ALLOW
T+15s   Staking Agent   Marinade stake 3.0 SOL      ALLOW
T+30s   Yield Bot       Jupiter swap 0.8 SOL        ALLOW
T+45s   Yield Bot       Jupiter swap 1.5 SOL        ALLOW
T+60s   Alpha Scanner   Unknown program 4.8 SOL     FLAG (68%)
T+61s   Alpha Scanner   Burst 3tx/4s 2.1 SOL        FLAG (76%)
T+62s   Alpha Scanner   Drain sequence 8.2 SOL      PAUSE (94%)
T+62s   (system)        AGENT FROZEN ON-CHAIN
T+65s   (system)        Incident report generated

Phase by Phase

Phase 1: Normal Operations (T+0s to T+55s)

Yield Bot and Staking Agent perform their routine operations. Every transaction passes through guarded_execute, is validated on-chain (programs are whitelisted, amounts are within budget), and receives an ALLOW verdict from the Prefilter (no signals detected, so the Judge is never called). In the dashboard, all transactions appear with green status badges and the spend gauges tick up gradually.
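The Prefilter short-circuit described above can be pictured with a minimal TypeScript sketch. The names `prefilter`, `Judge`, and `JudgeResult` are hypothetical and chosen for illustration. Only the behavior (no signals means ALLOW, with no Judge call) comes from this walkthrough.

```typescript
// Hypothetical types for illustration; not the project's actual API.
type Verdict = "ALLOW" | "FLAG" | "PAUSE";

interface JudgeResult {
  verdict: Verdict;
  confidence: number; // 0..1
}

// The judge is the expensive step (in the demo, the Guardian Agent).
type Judge = (signals: string[]) => JudgeResult;

// If no signals were detected, answer ALLOW directly and skip the judge.
// Otherwise hand the signals to the judge and return its decision.
function prefilter(signals: string[], judge: Judge): JudgeResult {
  if (signals.length === 0) {
    // Assumption: a clean transaction is reported with full confidence.
    return { verdict: "ALLOW", confidence: 1 };
  }
  return judge(signals);
}
```

Keeping the honest path free of Judge calls is what lets routine swaps and stakes pass through without added latency or model cost.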

Phase 2: Attack Begins (T+60s to T+62s)

Alpha Scanner starts its drain sequence. Three signals fire in rapid succession:

  • program_not_whitelisted: the first transaction targets a program not in the policy's allow-list
  • burst_detected: three transactions in four seconds far exceed the agent's normal frequency
  • consecutive_high_amounts: a run of large amounts (4.8, 2.1, then 8.2 SOL), with the final 8.2 SOL transaction approaching the per-transaction cap

The first two transactions are forwarded to the Guardian Agent judge and receive FLAG verdicts with escalating confidence. The dashboard shows amber warning badges. By the third transaction, the combined signal weight triggers a PAUSE verdict at 94% confidence.
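To make the three signals concrete, here is a rule-based sketch of how they might be computed and combined. It is a stand-in for illustration only: in the demo the verdict and confidence come from the Guardian Agent, and every threshold, weight, and helper name below (`detectSignals`, `verdictFor`, `SignalPolicy`, `WEIGHTS`) is an assumption rather than the project's actual configuration.

```typescript
interface Tx {
  timeSec: number;   // seconds since demo start
  programId: string;
  amountSol: number;
}

// Hypothetical thresholds; the real policy fields may differ.
interface SignalPolicy {
  allowedPrograms: string[];
  burstWindowSec: number; // look-back window for burst detection
  burstMaxTx: number;     // tx count inside the window that counts as a burst
  highAmountSol: number;  // amount treated as "high" for this agent
}

// Returns the signals raised by `tx`, given the agent's earlier transactions.
function detectSignals(history: Tx[], tx: Tx, policy: SignalPolicy): string[] {
  const signals: string[] = [];
  const all = history.concat([tx]);

  if (policy.allowedPrograms.indexOf(tx.programId) === -1) {
    signals.push("program_not_whitelisted");
  }

  const recent = all.filter((t) => tx.timeSec - t.timeSec <= policy.burstWindowSec);
  if (recent.length >= policy.burstMaxTx) {
    signals.push("burst_detected");
  }

  const lastTwo = all.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((t) => t.amountSol >= policy.highAmountSol)) {
    signals.push("consecutive_high_amounts");
  }
  return signals;
}

// Assumed weights and threshold, chosen only so that all three signals
// together cross the PAUSE line while any subset yields FLAG.
const WEIGHTS: { [signal: string]: number } = {
  program_not_whitelisted: 0.5,
  burst_detected: 0.25,
  consecutive_high_amounts: 0.25,
};
const PAUSE_THRESHOLD = 0.9;

function verdictFor(signals: string[]): "ALLOW" | "FLAG" | "PAUSE" {
  let weight = 0;
  for (const s of signals) {
    weight += WEIGHTS[s] || 0;
  }
  if (weight >= PAUSE_THRESHOLD) return "PAUSE";
  return weight > 0 ? "FLAG" : "ALLOW";
}
```

A fixed weighted sum like this cannot reproduce the confidence percentages shown in the timeline. It only illustrates why a single signal leads to FLAG while the combination leads to PAUSE.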

Phase 3: Kill Switch (T+62s)

The Executor stage immediately signs a pause_agent instruction with the MONITOR_KEYPAIR and submits it on-chain. The policy PDA's paused flag is set to true. From this moment, every subsequent guarded_execute call for Alpha Scanner is rejected at the program level with an AgentPaused error. The dashboard shows a red “PAUSED” badge and the kill switch button is replaced with a “Resume” option (which requires the owner's signature).
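The pause and resume rules can be modeled off-chain in a few lines. This is an in-memory sketch, not the on-chain program: `PolicyState`, `Result`, and the `Unauthorized` error are hypothetical, and the model assumes that only the monitor key may pause, whereas the deployed program may allow other signers as well.

```typescript
// In-memory model of the pause flow. Only pause_agent, guarded_execute,
// AgentPaused and MONITOR_KEYPAIR are names taken from the walkthrough.
interface PolicyState {
  owner: string;   // stand-in for the policy owner's public key
  monitor: string; // stand-in for the MONITOR_KEYPAIR public key
  paused: boolean;
}

type PauseError = "AgentPaused" | "Unauthorized";
type Result = { ok: true } | { ok: false; error: PauseError };

// Models pause_agent. Assumption: only the monitor key may pause here.
function pauseAgent(policy: PolicyState, signer: string): Result {
  if (signer !== policy.monitor) return { ok: false, error: "Unauthorized" };
  policy.paused = true;
  return { ok: true };
}

// Resuming requires the owner's signature, as the dashboard's Resume option does.
function resumeAgent(policy: PolicyState, signer: string): Result {
  if (signer !== policy.owner) return { ok: false, error: "Unauthorized" };
  policy.paused = false;
  return { ok: true };
}

// Models the pause check at the top of guarded_execute.
function guardedExecute(policy: PolicyState): Result {
  if (policy.paused) return { ok: false, error: "AgentPaused" };
  return { ok: true };
}
```

The key property is that the agent's own key appears in neither `pauseAgent` nor `resumeAgent`, so a compromised agent has no path back to an unpaused state.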

Phase 4: Incident Report (T+65s)

The Reporter stage runs asynchronously. It fetches Alpha Scanner's full transaction history, feeds it to the Guardian Agent with a report-generation prompt, and produces a markdown postmortem containing:

  • Executive summary of the incident
  • Timeline table showing every transaction with timestamps and verdicts
  • Anomaly signals detected and their severity
  • Root cause analysis (compromised key, unauthorized program targeting)
  • Recommended next steps (rotate session key, review allow-list, audit agent code)

The dashboard receives a report_ready SSE event and displays the full report in the incident detail view.
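The timeline section of such a postmortem could be rendered along these lines. This is a minimal sketch: in the demo the report text is produced by the Guardian Agent, and `TimelineRow` and `renderTimelineTable` are hypothetical names used only to show the shape of the markdown table.

```typescript
// Hypothetical row shape for the report's timeline table.
interface TimelineRow {
  time: string;    // e.g. "T+60s"
  action: string;  // short description of the transaction
  verdict: string; // ALLOW, FLAG or PAUSE
}

// Builds a markdown table with one line per transaction.
function renderTimelineTable(rows: TimelineRow[]): string {
  const header = ["| Time | Action | Verdict |", "| --- | --- | --- |"];
  const body = rows.map((r) => `| ${r.time} | ${r.action} | ${r.verdict} |`);
  return header.concat(body).join("\n");
}
```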

Why the Agent Can't Escape

Six layers of protection ensure that even a fully compromised agent cannot drain its treasury:

  1. Funds live in the PDA, not the agent keypair. The agent can only access funds through guarded_execute, which enforces all policy constraints.
  2. Every CPI goes through guarded_execute. There is no way to bypass the on-chain validation. The agent's keypair is an authorized signer, not a fund holder.
  3. On-chain whitelist blocks unknown programs. Even before the off-chain pipeline sees the transaction, the program rejects CPIs to non-whitelisted programs.
  4. Budget caps prevent drain even if one transaction sneaks through. Per-transaction caps limit the maximum single outflow, and daily budgets limit cumulative spend.
  5. AI monitors behavioral patterns, not just rules. The Guardian Agent detects anomalies that rule-based systems miss: unusual timing, burst patterns, escalating amounts, and behavioral shifts.
  6. Pause is on-chain and immediate. Once the paused flag is set on the policy PDA, the program rejects every guarded_execute call for that agent. The agent cannot un-pause itself; only the policy owner (or a multisig via Squads) can resume.
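The on-chain layers (pause flag, whitelist, per-transaction cap, daily budget) can be read as a chain of checks that every call must pass. The sketch below models that chain in memory under stated assumptions: the order of checks, the `GuardPolicy` fields, and every error name other than AgentPaused are hypothetical, and amounts are plain numbers in SOL rather than lamports.

```typescript
// Hypothetical in-memory view of a policy account.
interface GuardPolicy {
  paused: boolean;
  allowedPrograms: string[];
  perTxCapSol: number;
  dailyBudgetSol: number;
  spentTodaySol: number;
}

type GuardError =
  | "AgentPaused"
  | "ProgramNotWhitelisted"
  | "PerTxCapExceeded"
  | "DailyBudgetExceeded";

// Returns null when the call would be allowed, otherwise the first failing check.
function checkPolicy(policy: GuardPolicy, programId: string, amountSol: number): GuardError | null {
  if (policy.paused) return "AgentPaused";
  if (policy.allowedPrograms.indexOf(programId) === -1) return "ProgramNotWhitelisted";
  if (amountSol > policy.perTxCapSol) return "PerTxCapExceeded";
  if (policy.spentTodaySol + amountSol > policy.dailyBudgetSol) return "DailyBudgetExceeded";
  return null;
}

// Records the spend only if every check passes; rejected calls change nothing.
function execute(policy: GuardPolicy, programId: string, amountSol: number): GuardError | null {
  const err = checkPolicy(policy, programId, amountSol);
  if (err === null) {
    policy.spentTodaySol += amountSol;
  }
  return err;
}
```

Under this model the worst-case loss before a pause lands is bounded by the daily budget, and each single outflow is bounded by the per-transaction cap, which is the point of layer 4.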

Running the Demo

terminal
npm run demo:setup    # Create demo policy on devnet
npm run demo:simulate  # Run the full attack simulation

Run these commands from the dashboard/ directory. The setup script creates the policy accounts on devnet and funds the demo agents. The simulate script runs all three agents concurrently and triggers the attack sequence at the configured time.
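One way to picture what the simulate script schedules is an idealized event list: honest agents fire at fixed intervals until the attack time, then the attacker fires once per second. This sketch is not the script's actual implementation. `buildSchedule`, `AgentPlan`, and `ScheduledEvent` are hypothetical, and the fixed intervals are a simplification of the "~15 seconds" and "~20 seconds" cadences, so the output will not match the timeline table exactly.

```typescript
interface AgentPlan {
  name: string;
  intervalSec: number; // seconds between routine transactions
}

interface ScheduledEvent {
  timeSec: number;
  agent: string;
  kind: "normal" | "attack";
}

// Honest agents transact every intervalSec until attackAtSec; the attacker
// then sends attackTxCount transactions one second apart.
function buildSchedule(
  honest: AgentPlan[],
  attacker: string,
  attackAtSec: number,
  attackTxCount: number
): ScheduledEvent[] {
  const events: ScheduledEvent[] = [];
  for (const plan of honest) {
    for (let t = 0; t < attackAtSec; t += plan.intervalSec) {
      events.push({ timeSec: t, agent: plan.name, kind: "normal" });
    }
  }
  for (let i = 0; i < attackTxCount; i++) {
    events.push({ timeSec: attackAtSec + i, agent: attacker, kind: "attack" });
  }
  // Order by time so the list can be replayed from start to finish.
  return events.sort((a, b) => a.timeSec - b.timeSec);
}
```

For example, `buildSchedule([{ name: "Yield Bot", intervalSec: 15 }, { name: "Staking Agent", intervalSec: 20 }], "Alpha Scanner", 60, 3)` yields seven routine events followed by three attack events at 60, 61, and 62 seconds.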

Ensure your devnet wallet has at least 5 SOL for demo operations. You can request an airdrop with solana airdrop 5 --url devnet.