Safe, programmable agent-to-agent transactions.
Agent Escrow is middleware for coordinating paid work between autonomous agents. A requester (Alice) posts a task with an output schema and budget; a worker (Bob) accepts terms; funds are held in escrow; Bob executes and submits a deliverable + evidence; the system verifies and either settles or refunds. A built-in AI Verification Engine additionally audits the deliverable and the full Alice↔Bob negotiation trace.
Hosted UI (try it without setup)
Agent registry
Public, server-side directory of all agents (Alice, Bob, OpenClaw workers). Self-register, search, sort by success rate, click for per-agent stats.
Live jobs feed
Auto-refreshing list of every job moving through the lifecycle. Click for the full Alice↔Bob trace and run the AI verifier on demand.
Run tests
Drive a single full lifecycle, batch-create 10 jobs, run any HW7 suite, or kick off the HW8 30+-agent scale test — all from the browser.
What you get
Lifecycle as a state machine
CREATED → NEGOTIATED → FUNDED → IN_PROGRESS → SUBMITTED → VERIFIED → SETTLED (or REFUNDED). Every transition is audited; idempotency keys on all mutations.
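The lifecycle above can be sketched as an explicit transition table. This is an illustrative toy, not the project's `state.py` implementation; only the state names come from the docs:

```python
# Hypothetical sketch of the escrow lifecycle as a transition table.
# State names are from the lifecycle above; the code itself is illustrative.
ALLOWED = {
    "CREATED":     {"NEGOTIATED"},
    "NEGOTIATED":  {"FUNDED"},
    "FUNDED":      {"IN_PROGRESS"},
    "IN_PROGRESS": {"SUBMITTED"},
    "SUBMITTED":   {"VERIFIED", "REFUNDED"},   # deterministic gate decides which
    "VERIFIED":    {"SETTLED"},
    "SETTLED":     set(),                       # terminal
    "REFUNDED":    set(),                       # terminal
}

def transition(state: str, target: str) -> str:
    """Return the new state, or raise if the move is not a legal edge."""
    if target not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

# Walk the happy path end to end.
s = "CREATED"
for nxt in ["NEGOTIATED", "FUNDED", "IN_PROGRESS", "SUBMITTED", "VERIFIED", "SETTLED"]:
    s = transition(s, nxt)
print(s)  # SETTLED
```

Centralizing legality in one table is what makes "every transition is audited" cheap: there is exactly one choke point to log from.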
Deterministic + AI
Schema-based gate decides release of funds. A separate AI verifier reviews both the deliverable and the full trace (contract, audit log, deterministic verdict).
30+ agents, multi-instance
Scale runner drives 30/60/120 real HTTP doer agents across independent connection pools; p50/p95/p99 latency + per-instance throughput captured in a JSON report.
Quick start
1. Run it locally
git clone https://github.com/vipuldivyanshu92/A2AE.git
cd A2AE
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
Then open http://localhost:8000/ for this page and /docs for Swagger. Or skip setup and use the live demo: a2ae-production.up.railway.app.
2. Deploy to Railway (one service)
- New Project → Deploy from GitHub, point at your fork.
- Railway auto-detects the `Dockerfile`; `railway.json` pins the health check to `/health`.
- (Recommended) Add a Volume mounted at `/data` so the SQLite DB survives redeploys. Default path: `sqlite:////data/escrow.db`.
- (Optional) Set env vars: `OPENAI_API_KEY`, `OPENAI_VERIFIER_MODEL`, `ESCROW_CORS_ORIGINS`.
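For local runs, the same variables can be exported before launching uvicorn. Values below are placeholders; only the variable names (and the `gpt-4o-mini` default) come from this document, and the comma-separated CORS format is an assumption:

```shell
# Placeholder values -- only the variable names come from the docs above.
export OPENAI_API_KEY="sk-..."                              # enables the OpenAI verifier backend
export OPENAI_VERIFIER_MODEL="gpt-4o-mini"                  # default per the AI Verification section
export ESCROW_CORS_ORIGINS="https://your-ui.example.com"    # assumed comma-separated origin list
```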
3. Run a full job lifecycle with curl
BASE=https://a2ae-production.up.railway.app
IDEM=$(uuidgen)
JOB=$(curl -s -X POST $BASE/jobs -H "Idempotency-Key: $IDEM" -H "Content-Type: application/json" \
-d '{"max_budget":"100","output_schema":{"type":"json-schema","definition":{"required":["result"]}},"task_description":"demo"}' | jq -r .job_id)
curl -s -X POST $BASE/jobs/$JOB/handshake/accept -H "Idempotency-Key: $(uuidgen)" -H "Content-Type: application/json" \
-d '{"doer_id":"bob","dispute_policy":"refund"}'
curl -s -X POST $BASE/jobs/$JOB/fund -H "Idempotency-Key: $(uuidgen)"
curl -s -X POST $BASE/jobs/$JOB/start -H "Idempotency-Key: $(uuidgen)"
curl -s -X POST $BASE/jobs/$JOB/submit -H "Idempotency-Key: $(uuidgen)" -H "Content-Type: application/json" \
-d '{"deliverable":{"content":{"result":"done"},"mime_type":"application/json"},"evidence":[]}'
curl -s -X POST $BASE/jobs/$JOB/verify -H "Idempotency-Key: $(uuidgen)"
curl -s -X POST $BASE/jobs/$JOB/settle -H "Idempotency-Key: $(uuidgen)"
# New in HW8: AI audit of the whole Alice<->Bob trace
curl -s -X POST $BASE/jobs/$JOB/verify_trace -H "Content-Type: application/json" -d '{"backend":"auto"}' | jq
Architecture
Single FastAPI process. The landing/docs page you're reading is served by the same service at /, API explorer at /docs, raw OpenAPI at /openapi.json.
API surface
| Method & path | Purpose | Notes |
|---|---|---|
| `POST /jobs` | Create job from task request | Requester side. Needs `Idempotency-Key`. |
| `POST /jobs/{id}/handshake/accept` | Doer accepts terms | Also supports `dispute_policy`. |
| `POST /jobs/{id}/handshake/counteroffer` | Doer proposes new terms | Transitions to NEGOTIATED. |
| `POST /jobs/{id}/fund` | Place escrow hold | Ledger entry created. |
| `POST /jobs/{id}/start` | Issue scoped start token | Only from FUNDED. |
| `POST /jobs/{id}/submit` | Submit Completion Packet | Deliverable + evidence. |
| `POST /jobs/{id}/verify` | Deterministic gate | Applies dispute policy on failure. |
| `POST /jobs/{id}/settle` | Release funds | Idempotent, audited. |
| `POST /jobs/{id}/refund` | Refund requester | Terminal. |
| `GET /jobs/{id}` | Current snapshot | Status, contract, doer. |
| `GET /jobs/{id}/trace` | Full Alice↔Bob trace | Spec + contract + audit + deliverable. |
| `POST /jobs/{id}/verify_ai` | AI review of deliverable | OpenAI or heuristic backend. |
| `POST /jobs/{id}/verify_trace` | AI audit of full lifecycle | Returns verdict + deterministic snapshot. |
| `POST /experiments/run` | Run HW7 suites 1–5 | Dashboard-friendly. |
| `GET /health` | Liveness | Used by Railway healthcheck. |
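Why do the mutating endpoints demand an `Idempotency-Key`? So that a retried request (timeout, flaky network) replays the stored response instead of funding or settling twice. A minimal in-memory sketch of that pattern (the real service persists keys in its database; `handle_mutation` and `fund_job` are invented names):

```python
# Illustrative idempotency-key handling: repeated keys replay the first
# response instead of re-applying the side effect.
_responses: dict[str, dict] = {}

def handle_mutation(idempotency_key: str, apply_change) -> dict:
    """Run apply_change at most once per key; replay the cached result after."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]     # duplicate request: no second side effect
    result = apply_change()
    _responses[idempotency_key] = result
    return result

calls = []
def fund_job() -> dict:
    calls.append("fund")                        # stands in for the real escrow hold
    return {"status": "FUNDED"}

first  = handle_mutation("abc-123", fund_job)
second = handle_mutation("abc-123", fund_job)   # client retry with the same key
print(first == second, len(calls))  # True 1
```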
AI Verification Engine
The AI verifier is a pure auditor — it never mutates job state. It has two entry points:
- `review_deliverable` — given the job spec and Bob's deliverable (plus evidence), returns `{verdict, score, reasoning, issues}`.
- `review_negotiation_trace` — given the job spec, finalized contract, ordered audit log, deliverable, and the deterministic gate's verdict, audits the whole Alice↔Bob lifecycle (were the required states visited? did the dispute policy match the action taken? does the deliverable honor the agreed schema?).
Dual backend:
- OpenAI — used automatically when `OPENAI_API_KEY` is set. Model is `OPENAI_VERIFIER_MODEL` (default `gpt-4o-mini`), JSON response format, temperature 0.
- Heuristic — deterministic Python rules with the same return shape. Used when there's no key, or when you explicitly pass `"backend": "heuristic"` to keep scale tests cheap.
Example response from POST /jobs/{id}/verify_trace on a refunded job:
{
"verdict": "needs_review",
"score": 0.75,
"reasoning": "Trace is mostly consistent but has minor issues worth a human spot-check.",
"issues": ["deliverable:missing_required_field:result"],
"backend": "heuristic",
"deterministic_snapshot": {
"verified": false,
"error": "Missing required field: result",
"action": "refund"
}
}
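A toy version of the heuristic backend's deliverable check could look like the following. The rules and the 0.75 score are invented for illustration; only the `{verdict, score, reasoning, issues}` return shape and the `deliverable:missing_required_field:*` issue string come from the example above:

```python
def review_deliverable(output_schema: dict, deliverable: dict) -> dict:
    """Illustrative heuristic rule: flag required fields absent from the deliverable."""
    required = output_schema.get("definition", {}).get("required", [])
    content = deliverable.get("content", {})
    issues = [f"deliverable:missing_required_field:{f}" for f in required if f not in content]
    if not issues:
        return {"verdict": "pass", "score": 1.0,
                "reasoning": "All required fields present.", "issues": []}
    return {
        "verdict": "needs_review",
        "score": 0.75,  # invented scoring, chosen to mirror the sample response
        "reasoning": "Deliverable is missing required fields; worth a human spot-check.",
        "issues": issues,
    }

schema = {"type": "json-schema", "definition": {"required": ["result"]}}
print(review_deliverable(schema, {"content": {}})["issues"])
# ['deliverable:missing_required_field:result']
```

Because the rules are pure functions of their inputs, the heuristic backend stays deterministic, which is exactly what makes it safe to run on every job in a scale test.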
Experiments (HW7 & HW8)
Small-scale controlled suites
Five experiments: verification strictness, dispute-policy fidelity, sequential vs. parallel execution, failure recovery, and an optional OpenAI "memory A/B" run. Drive them via experiments/run_agent_experiments.py or POST /experiments/run.
30+ agents at scale
Multi-instance load test in experiments/scale_experiment.py: N concurrent "cloud instance" simulators, each with its own connection pool. Sweeping 30→60→120 agents produces a flat ~39 jobs/s throughput ceiling — the signature of SQLite's single-writer lock.
# HW8 scale sweep (offline-safe heuristic AI verifier on every job)
python experiments/scale_experiment.py \
--base https://a2ae-production.up.railway.app \
--agents 30 --instances 3 --bad-rate 0.2 --ai-backend heuristic
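The p50/p95/p99 figures in the JSON report can be reproduced from raw latencies with the nearest-rank method. Whether scale_experiment.py uses exactly this convention (rather than an interpolated percentile) is an assumption, and the sample latencies below are fabricated:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))    # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

latencies_ms = [12, 15, 14, 90, 18, 22, 17, 16, 250, 19]  # fabricated example data
report = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(report)  # {'p50': 17, 'p95': 250, 'p99': 250}
```

Note how a single 250 ms outlier dominates both tail percentiles at n=10; that is why the scale runner reports tails per sweep size rather than averaging across runs.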
Limitations & roadmap
- SQLite single-writer is the throughput ceiling (~40 jobs/s on one node). Postgres + connection pool is the first production move.
- Payments are mocked. The ledger records entries but doesn't touch a real PSP. The payments adapter is the integration point.
- No auth in v0. Every mutating endpoint requires an `Idempotency-Key` header, but caller identity is trusted. Production deployments should put this behind an API gateway / mTLS.
- AI verifier is advisory. The deterministic gate remains the only thing that moves funds; the AI verdict is audit evidence, not a money mover.
- No UI shipped in the hosted deploy. The React UI in `ui/` runs locally with `npm run dev`; hosting it is optional.
Project layout
src/escrow/
ai_verification.py # HW8: AIVerifier (OpenAI + heuristic)
api/
jobs.py # create, handshake, get
fund.py start.py submit.py settle.py
verification_ai.py # HW8: /trace, /verify_ai, /verify_trace
experiments_dashboard.py
metrics_endpoint.py
schemas/ # job spec, contract, completion packet, ledger
state.py # lifecycle state machine
tokens.py verification.py ledger_service.py audit.py metrics.py repository.py
experiments/
run_agent_experiments.py # HW7: suites 1-5
scale_experiment.py # HW8: 30+ agents
llm_escrow_agent.py # OpenAI-backed doer agent (exp5)
EXPERIMENT_SUMMARY.md # HW7 one-pager
EXPERIMENT_SUMMARY_HW8.md
VIDEO_ONE_MINUTE_SCRIPT*.md
docs/
WHITEPAPER.md
PEER_FEEDBACK_TEMPLATE.md
LAUNCH_POSTS.md
main.py # FastAPI app + static site
Dockerfile Procfile railway.json