Agent Guardrails Docs

Monitoring Pipeline

Every agent transaction passes through a 5-stage server-side pipeline that detects anomalies, evaluates threats with an AI judge, and can freeze a compromised agent on-chain, with a target of under 3 seconds from first flag to pause. Stages run in sequence for each transaction, while separate transactions are processed in parallel.

Pipeline Overview

Webhook -> Ingest -> Prefilter -> Judge -> Executor -> Reporter

The webhook is the trigger rather than a stage; the five stages run from Ingest through Reporter. Each stage has a single responsibility and passes its output to the next. Transactions that raise no Prefilter signals are auto-allowed and skip the Judge entirely, which keeps costs low during normal operation.
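
The routing described above can be sketched as a small orchestration function. This is only an illustration: the `Txn` and `StageHandlers` types, the handler names, and the `ALLOW` verdict label are assumptions made for the example, not the pipeline's actual interfaces.

```typescript
// Hypothetical shapes for illustration; the real pipeline's types may differ.
type VerdictLabel = "ALLOW" | "FLAG" | "PAUSE"; // "ALLOW" is an assumed label

interface Txn {
  signature: string;
}

interface StageHandlers {
  prefilter: (txn: Txn) => string[]; // names of the signals that fired
  judge: (txn: Txn, signals: string[]) => Promise<VerdictLabel>;
  execute: (txn: Txn) => Promise<void>; // incident row + on-chain pause
  report: (txn: Txn) => Promise<void>; // postmortem generation
}

async function routeTransaction(txn: Txn, h: StageHandlers): Promise<VerdictLabel> {
  const signals = h.prefilter(txn);

  // Prefilter skip: no signals means no judge call and no cost.
  if (signals.length === 0) return "ALLOW";

  const verdict = await h.judge(txn, signals);
  if (verdict === "PAUSE") {
    await h.execute(txn);
    // Fire-and-forget: the report must not delay or fail the pipeline.
    void h.report(txn).catch(() => {});
  }
  return verdict;
}
```

Passing the stages in as handlers keeps each one independently testable, which matches the single-responsibility split described above.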

Stage 1: Ingest

The Ingest stage receives Helius webhook payloads and transforms them into structured transaction records.

  • Webhook parsing — validates the HELIUS_WEBHOOK_SECRET header, extracts the transaction signature, and parses the enriched transaction data.
  • Instruction detection — identifies guarded_execute calls by matching the Anchor 8-byte discriminator against known instruction hashes.
  • Event extraction — pulls GuardedExecute event data from the transaction logs, including the agent pubkey, target program, amount, and instruction data hash.
  • Database row creation — inserts a GuardedTxn row in Postgres with status pending and the extracted fields.
  • SSE emission — pushes a new_transaction event to all connected dashboard clients with the full transaction payload.
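
As a rough sketch of the first two steps: Anchor derives an instruction discriminator from the first 8 bytes of `sha256("global:<instruction_name>")`, so detection can be a prefix comparison on the raw instruction data. The function names below are hypothetical, and the secret check assumes the header carries the secret value verbatim, which the source does not spell out.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// First 8 bytes of sha256("global:<name>") is Anchor's instruction discriminator.
function anchorDiscriminator(instructionName: string): Buffer {
  return createHash("sha256")
    .update(`global:${instructionName}`)
    .digest()
    .subarray(0, 8);
}

const GUARDED_EXECUTE = anchorDiscriminator("guarded_execute");

// True when the raw instruction data starts with the guarded_execute discriminator.
function isGuardedExecute(instructionData: Uint8Array): boolean {
  if (instructionData.length < 8) return false;
  return Buffer.from(instructionData.subarray(0, 8)).equals(GUARDED_EXECUTE);
}

// Constant-time comparison of the received header value with the configured secret.
// Assumption: the header contains the secret itself, not a signature over the body.
function isAuthorized(headerValue: string | undefined, secret: string): boolean {
  if (headerValue === undefined) return false;
  const received = Buffer.from(headerValue);
  const expected = Buffer.from(secret);
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The length check before `timingSafeEqual` is needed because that function throws when the two buffers differ in length.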

Stage 2: Prefilter

The Prefilter runs 16 cheap rule-based and statistical checks, listed below, against the transaction and the agent's recent history. If no signals fire, the transaction is auto-allowed (a prefilter skip) and incurs no judge call. Otherwise, the fired signals are attached to the transaction and it moves to the Judge.

Signal                    Condition                                              Severity
------------------------  -----------------------------------------------------  --------
policy_inactive           Policy is paused or expired                            critical
program_not_whitelisted   Target program not in allow-list                       critical
cold_start                Agent has fewer than 5 historical transactions         low
burst_detected            >= 10 transactions in 60 seconds                       high
elevated_frequency        3–9 transactions in 60 seconds                         medium
amount_exceeds_cap        Amount exceeds per-transaction cap (>100%)             critical
high_amount               Amount is 80–100% of per-transaction cap               medium
budget_exceeded           Daily spend exceeds budget (>100%)                     critical
budget_nearly_exhausted   Daily spend is 80–100% of budget                       medium
session_expiring          Session key expires in less than 10 minutes            low
anomaly_score_elevated    Statistical anomaly score above threshold              medium
outside_active_hours      Transaction more than 3 hours from median active time  low
hourly_spend_spike        Last hour spend exceeds 50% of daily budget            high
consecutive_high_amounts  >= 3 consecutive transactions above 80% of cap         high
high_failure_rate         More than 30% of recent transactions failed            medium
max_single_txn_high       Single transaction exceeds 90% of per-transaction cap  high
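
To make the thresholds concrete, here is a minimal sketch of three of the check families (frequency, per-transaction cap, daily budget). The `PrefilterInput` shape and function name are assumptions for illustration, and the sketch assumes the timestamps and daily spend already include the current transaction.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface Signal {
  name: string;
  severity: Severity;
}

// Hypothetical input shape; field names are not taken from the real codebase.
interface PrefilterInput {
  amount: number; // current transaction amount
  perTxCap: number; // policy per-transaction cap
  dailySpend: number; // today's spend, assumed to include this transaction
  dailyBudget: number; // policy daily budget
  recentTimestampsMs: number[]; // agent's transaction times, incl. the current one
  nowMs: number;
}

function prefilter(input: PrefilterInput): Signal[] {
  const signals: Signal[] = [];

  // Frequency: count transactions inside the trailing 60-second window.
  const inLastMinute = input.recentTimestampsMs.filter(
    (t) => t <= input.nowMs && input.nowMs - t < 60000,
  ).length;
  if (inLastMinute >= 10) signals.push({ name: "burst_detected", severity: "high" });
  else if (inLastMinute >= 3) signals.push({ name: "elevated_frequency", severity: "medium" });

  // Amount relative to the per-transaction cap.
  const capRatio = input.amount / input.perTxCap;
  if (capRatio > 1) signals.push({ name: "amount_exceeds_cap", severity: "critical" });
  else if (capRatio >= 0.8) signals.push({ name: "high_amount", severity: "medium" });

  // Daily spend relative to the daily budget.
  const budgetRatio = input.dailySpend / input.dailyBudget;
  if (budgetRatio > 1) signals.push({ name: "budget_exceeded", severity: "critical" });
  else if (budgetRatio >= 0.8) signals.push({ name: "budget_nearly_exhausted", severity: "medium" });

  return signals;
}
```

An empty result corresponds to the prefilter skip; any non-empty result would be attached to the transaction and forwarded to the Judge.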

Stage 3: Judge

The Guardian Agent evaluates flagged transactions using an LLM with a structured system prompt. It receives the full context of the transaction and the agent's recent behavior, and returns a structured verdict.

Context provided to the judge:

  • Policy limits (allow-list, per-tx cap, daily budget, session window)
  • Current transaction details (program, amount, instruction data)
  • Last 20 transactions for this agent
  • Baseline statistics (average amount, typical frequency, active hours)
  • SpendTracker snapshot (current daily spend, transaction count)
  • Prefilter signals that triggered the judge call

The judge returns a structured JSON verdict:

verdict.json
{
  "verdict": "PAUSE",
  "confidence": 94,
  "reasoning": "Burst of 3 transactions in 4 seconds targeting unknown program with escalating amounts suggests automated drain sequence",
  "signals": [
    "program_not_whitelisted",
    "burst_detected",
    "consecutive_high_amounts"
  ]
}
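
Because the verdict comes from an LLM, it is worth validating before acting on it. The following is a sketch of such a check, assuming confidence is on a 0 to 100 scale (as the example suggests) and that `ALLOW`, `FLAG`, and `PAUSE` are the possible verdict values; only `FLAG` and `PAUSE` appear in this document, so `ALLOW` is a guess.

```typescript
type VerdictLabel = "ALLOW" | "FLAG" | "PAUSE"; // "ALLOW" is an assumed label

interface JudgeVerdict {
  verdict: VerdictLabel;
  confidence: number; // assumed range: 0 to 100
  reasoning: string;
  signals: string[];
}

// Returns a typed verdict, or null when the raw text is not a well-formed verdict.
function parseVerdict(raw: string): JudgeVerdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const { verdict, confidence, reasoning, signals } = data as Record<string, unknown>;
  const labels = ["ALLOW", "FLAG", "PAUSE"];

  if (typeof verdict !== "string" || labels.indexOf(verdict) === -1) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 100) return null;
  if (typeof reasoning !== "string") return null;
  if (!Array.isArray(signals) || !signals.every((s) => typeof s === "string")) return null;

  return {
    verdict: verdict as VerdictLabel,
    confidence,
    reasoning,
    signals: signals as string[],
  };
}
```

A `null` result could be treated the same way as an LLM error, so malformed output never reaches the Executor.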

Fallback behavior: if the LLM call times out or returns an error, the pipeline falls back to rule-based verdicts. Burst signals default to FLAG with 60% confidence; all other signals default to FLAG with 50% confidence. This ensures the pipeline never blocks on an LLM failure.
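
One way to express this fallback is to race the judge call against a timer and substitute a rule-based verdict on any failure. The sketch below makes several assumptions: the timeout value is chosen by the caller, `burst_detected` is the only signal treated as a burst signal, and the reasoning text is a placeholder.

```typescript
interface Verdict {
  verdict: "ALLOW" | "FLAG" | "PAUSE"; // "ALLOW" is an assumed label
  confidence: number;
  reasoning: string;
  signals: string[];
}

// Rule-based verdict used when the LLM is unavailable.
// Assumption: only "burst_detected" counts as a burst signal.
function fallbackVerdict(signals: string[]): Verdict {
  const isBurst = signals.indexOf("burst_detected") !== -1;
  return {
    verdict: "FLAG",
    confidence: isBurst ? 60 : 50,
    reasoning: "Rule-based fallback: judge unavailable", // placeholder wording
    signals,
  };
}

// Runs the judge with a deadline; any error or timeout yields the fallback.
async function judgeWithFallback(
  callJudge: () => Promise<Verdict>,
  signals: string[],
  timeoutMs: number,
): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("judge timeout")), timeoutMs);
  });
  try {
    return await Promise.race([callJudge(), deadline]);
  } catch {
    return fallbackVerdict(signals);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that the fallback never returns `PAUSE`, so under this reading an LLM outage degrades to flagging rather than to automatic on-chain pauses.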

Stage 4: Executor

When the verdict is PAUSE, the Executor stage takes immediate action:

  1. Creates an Incident row in Postgres with the verdict, signals, and the triggering transaction.
  2. Signs a pause_agent instruction using the server's MONITOR_KEYPAIR and submits it to the Solana network.
  3. On success, the on-chain policy's paused flag is set to true. All subsequent guarded_execute calls for this agent will be rejected at the program level.
  4. Emits an agent_paused SSE event to all connected dashboard clients.

The on-chain transaction uses a retry strategy of 3 attempts with exponential backoff (1s, 2s, 4s). If all retries fail, the incident is marked as pause_failed and an alert is logged for manual intervention.
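
The retry behavior can be sketched as follows. The source wording is ambiguous, so this example assumes one initial submission followed by up to three retries, waiting 1s, 2s, and 4s before each; it could also be read as three attempts in total. The `submit` callback and the `PauseOutcome` type are hypothetical stand-ins for the real signing and submission code.

```typescript
type PauseOutcome =
  | { status: "paused"; signature: string }
  | { status: "pause_failed"; error: string };

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// submit() is expected to send the pause_agent transaction and resolve with its signature.
async function submitPauseWithRetry(
  submit: () => Promise<string>,
  retryDelaysMs: number[] = [1000, 2000, 4000],
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<PauseOutcome> {
  let lastError = "unknown error";

  // A leading 0 represents the initial attempt, which starts immediately.
  for (const waitMs of [0, ...retryDelaysMs]) {
    if (waitMs > 0) await sleep(waitMs);
    try {
      return { status: "paused", signature: await submit() };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }

  // All attempts failed: the caller marks the incident and raises an alert.
  return { status: "pause_failed", error: lastError };
}
```

Injecting `sleep` keeps the function testable without real delays.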

Stage 5: Reporter

The Reporter stage runs asynchronously (fire-and-forget) after the Executor. It generates a detailed postmortem for every incident.

  • Fetches the agent's last 24 hours of transaction history and all associated verdicts.
  • Sends the full context to the Guardian Agent with a report-generation prompt.
  • Generates a markdown postmortem containing: executive summary, timeline table of events, anomaly signals detected, root cause analysis, and recommended next steps.
  • Updates the Incident.fullReport field in Postgres.
  • Emits a report_ready SSE event so the dashboard can display the report without polling.

Latency

Target: less than 3 seconds from first flag to on-chain pause. The Prefilter adds approximately 5ms, the Judge call typically completes in 800–1500ms, and the on-chain transaction confirms within 400–800ms, for a total of roughly 1.2–2.3 seconds when the first submission succeeds.
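
The budget can be checked by summing the per-stage ranges quoted above. This sketch uses only those figures; it leaves out Ingest and database time, which the source does not quantify, and assumes the on-chain pause succeeds on the first submission.

```typescript
interface StageLatency {
  name: string;
  minMs: number;
  maxMs: number;
}

// Figures quoted in the latency paragraph.
const stageLatencies: StageLatency[] = [
  { name: "prefilter", minMs: 5, maxMs: 5 },
  { name: "judge", minMs: 800, maxMs: 1500 },
  { name: "onChainConfirm", minMs: 400, maxMs: 800 },
];

const TARGET_MS = 3000;

const bestCaseMs = stageLatencies.reduce((sum, s) => sum + s.minMs, 0);
const worstCaseMs = stageLatencies.reduce((sum, s) => sum + s.maxMs, 0);
const headroomMs = TARGET_MS - worstCaseMs;

console.log(bestCaseMs, worstCaseMs, headroomMs); // 1205 2305 695
```

With about 695ms of headroom at the upper end of the typical range, a retry that waits 1s before resubmitting would likely push that case past the 3-second target.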

Cost Estimates

Each Guardian Agent judge call costs approximately $0.0014 (~600 input tokens + ~150 output tokens at current API pricing).

With the Prefilter skipping roughly 70% of transactions (normal operations produce no signals), the daily cost for a typical agent is:

  • 1,000 daily transactions
  • ~300 forwarded to the Judge (30% signal rate)
  • ~300 x $0.0014 = $0.42 per day

High-volume agents processing 10,000 daily transactions would forward about 3,000 to the Judge at the same signal rate, for a cost of around $4.20/day, still well below the value protected by the guardrails.
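
The same arithmetic can be written as a small helper for estimating other volumes. The function name is hypothetical, and the defaults are the 30% signal rate and $0.0014 per call quoted above; actual figures depend on the agent's behavior and current API pricing.

```typescript
// Estimated daily judge cost in USD: transactions × signal rate × cost per call.
function dailyJudgeCostUsd(
  dailyTransactions: number,
  signalRate: number = 0.3, // share of transactions forwarded to the Judge
  costPerCallUsd: number = 0.0014, // approximate cost of one judge call
): number {
  return dailyTransactions * signalRate * costPerCallUsd;
}

console.log(dailyJudgeCostUsd(1000).toFixed(2)); // "0.42"
console.log(dailyJudgeCostUsd(10000).toFixed(2)); // "4.20"
```

Cost scales linearly with both volume and signal rate, so a noisier agent with a higher signal rate costs proportionally more at the same transaction count.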