AI SYSTEMS — ACTIVE DEVELOPMENT
Autonomous Bug-Bounty Pipeline
Catches a silent scope change within 15 minutes and starts the deep audit with no human in the loop.
AT A GLANCE
- STATUS: ACTIVE DEVELOPMENT
- TIMELINE: 2026-05 to 2026-05 · 2 DAYS
- LANGUAGES: Python
- CATEGORY: AI SYSTEMS
OUTCOME
First live run drove 55 hypotheses to closure and produced 3 triaged findings (one Medium, two Low), including 634 USDC.e stranded in a deployed contract.
METRICS
M.01 — PLATFORMS MONITORED
Cantina, Code4rena, Sherlock, HackerOne — one source adapter each, polled every 15 minutes
M.02 — ALERT CHANNELS
ntfy, Telegram, email, GitHub issue, Twilio voice — fire in parallel and escalate until acknowledged
M.03 — HYPOTHESES CLOSED ON FIRST ENGAGEMENT
Each struck with quoted-code rationale or confirmed by a passing Foundry proof-of-concept
M.04 — FINDINGS TRIAGED ON PLATFORM
One Medium and two Low severity — all received triage scores on Cantina
THE CASE, CHAPTER BY CHAPTER
CH.01
Outcome
First live engagement: 55 hypotheses closed, 3 findings triaged on the platform. The pipeline ran end-to-end on a public Cantina program covering four Solidity contract suites. Every hypothesis was either struck with a quoted-code rationale or confirmed by a passing Foundry proof-of-concept on a local fork of the chain. One of the Low findings identified 634 USDC.e stranded in a deployed contract.
CH.02
What it does
Watches four bounty platforms and starts the deep audit the moment scope changes, with no human in the loop until findings exist. It polls every 15 minutes and detects silently added contracts by hashing each program's assets, rewards, and rules and comparing the result with the hash stored from the previous poll. On a trigger, it scaffolds a structured workspace and launches a swarm of AI workers that read the code adversarially and draft findings.
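The hashing step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `scope_hash`, the field layout, and the sample values are all assumptions. The idea is to serialize the scope-relevant fields canonically so that only a real change alters the digest.

```python
import hashlib
import json


def scope_hash(assets, rewards, rules):
    """Return a stable SHA-256 digest of a program's scope-relevant fields.

    Hypothetical helper: the field names and normalization are assumptions.
    """
    canonical = json.dumps(
        {
            # Sort assets so a reordered listing does not look like a change.
            "assets": sorted(assets),
            "rewards": rewards,
            "rules": rules.strip(),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up sample data for illustration only.
rules = "No mainnet testing."
before = scope_hash(["Vault.sol", "Router.sol"], {"high": 50000}, rules)
reordered = scope_hash(["Router.sol", "Vault.sol"], {"high": 50000}, rules)
after = scope_hash(["Vault.sol", "Router.sol", "Bridge.sol"], {"high": 50000}, rules)
```

Here `before` and `reordered` match, while `after` differs because a contract was added, which is the case that would fire an alert.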
CH.03
Why it is faster than a human
It runs the work no person can keep up with: nine attack surfaces audited in parallel, hundreds of hypotheses tracked at once. Each worker owns one surface — reentrancy, signatures, arithmetic, oracle, access control, deployment archaeology, cross-contract, fuzzing, web — in its own isolated copy of the code. A supervisor nudges stalled workers, waits out API limits and resumes, and never lets a worker submit on its own.
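One way to give each worker an isolated copy is one git worktree per attack surface. The sketch below only builds the `git worktree add` commands; the directory layout, the `audit/<surface>` branch naming, and the function name are assumptions made for illustration.

```python
from pathlib import Path

# The nine surfaces named above, written as slugs (the slugs are an assumption).
SURFACES = [
    "reentrancy", "signatures", "arithmetic", "oracle", "access-control",
    "deployment-archaeology", "cross-contract", "fuzzing", "web",
]


def worktree_commands(repo, workspace, base_ref="HEAD"):
    """Build one `git worktree add` command per attack surface."""
    commands = []
    for surface in SURFACES:
        path = Path(workspace) / "worktrees" / surface
        branch = f"audit/{surface}"  # hypothetical branch naming scheme
        commands.append(
            ["git", "-C", str(repo), "worktree", "add",
             "-b", branch, str(path), base_ref]
        )
    return commands
```

Each command could then be passed to `subprocess.run(cmd, check=True)`. Because every worker writes to its own directory and branch, parallel workers cannot overwrite each other's files.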
CH.04
How the operator stays in control
Nothing reaches a platform without a human clicking submit; every proof-of-concept runs on a safe local copy of the chain. The operator gets a phone call that re-dials until acknowledged, reviews the drafted findings, and decides what to submit. Any worker that tries to submit is flagged loudly and stopped.
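The re-dial behaviour reduces to a small decision made on each poll pass. The sketch below is an assumption-laden illustration: the `Alert` fields, the `should_redial` name, and the 15-minute interval are hypothetical, chosen only to match the polling cadence described earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; the real schema may differ."""
    created_at: datetime
    last_call_at: Optional[datetime] = None
    acknowledged: bool = False


def should_redial(alert, now, interval=timedelta(minutes=15)):
    """Decide whether to place another voice call for this alert."""
    if alert.acknowledged:
        return False  # the operator has responded; stop escalating
    if alert.last_call_at is None:
        return True  # never called yet
    return now - alert.last_call_at >= interval


now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
alert = Alert(created_at=now)
first_call_due = should_redial(alert, now)
```

Keeping the decision a pure function of the stored record and the current time means a restarted process resumes escalation from where it left off.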
CH.05
Speed of delivery
The whole pipeline was built and tested in two days — 19 commits, then a live engagement. It is fully testable offline: a simulated Claude stands in for the real one, so the test suite never touches the network or spends API money.
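Offline testing of this kind usually relies on injecting the model runner as a dependency. The following sketch shows the pattern under stated assumptions: `fake_claude`, `run_worker`, and the event shape are invented for illustration and do not reproduce the real CLI's stream format.

```python
import json
from typing import Callable, Iterable, List


def fake_claude(prompt: str) -> Iterable[str]:
    """Stand-in runner: yields canned JSON lines, with no network or API cost.

    The event fields here are hypothetical, not the real stream schema.
    """
    yield json.dumps({"type": "assistant", "text": f"Reviewing: {prompt}"})
    yield json.dumps({"type": "result", "text": "BLOCK COMPLETE"})


def run_worker(prompt: str, runner: Callable[[str], Iterable[str]]) -> List[dict]:
    """Collect parsed events from whichever runner is injected."""
    return [json.loads(line) for line in runner(prompt)]


events = run_worker("audit the oracle surface", fake_claude)
```

In production the same `run_worker` would receive a runner that spawns the real process, so the supervisor logic under test is identical in both cases.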
THE BUILD, WEEK BY WEEK
19 COMMITS — IN 2 DAYS — AVG 10/DAY
FEATURES
| Feature | What it does |
|---|---|
| Scope-change detection | Hashes every program's assets, reward tiers, and rules, so a quietly-added in-scope contract fires an alert even with no platform announcement. |
| Five-channel alert escalation | A trigger fires ntfy, Telegram, email, a GitHub issue, and a Twilio voice call in parallel, re-dialing until the operator acknowledges. |
| Worker swarm with per-block ownership | Each AI worker owns one attack surface in its own git worktree, so parallel workers can never overwrite each other's work. |
| Supervisor nag loop | Streams each worker's output so it can nudge stalled workers back on task, sleep through API usage limits and resume the same session, and flag any submission attempt. |
| Structured audit workspace | Every engagement gets a generated workspace: scope, a live hypothesis tracker, known issues pulled from prior audit PDFs, and proof-of-concept and findings folders. |
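The parallel, best-effort alert fan-out from the table above can be sketched with a thread pool. This is an illustration only: the signature, the `channels` mapping of name to callable, and the return shape are assumptions, not the project's actual `fan_out()`.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(channels, message):
    """Send `message` on every channel in parallel.

    Returns {channel_name: None on success, or the error repr on failure}.
    """
    def send(item):
        name, sender = item
        try:
            sender(message)
            return name, None
        except Exception as exc:  # best-effort: record the failure and carry on
            return name, repr(exc)

    with ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        return dict(pool.map(send, channels.items()))


# Demo with stand-in senders; real senders would call ntfy, Telegram, and so on.
sent = []


def broken(_message):
    raise RuntimeError("channel down")


results = fan_out(
    {"ntfy": sent.append, "email": broken, "telegram": sent.append},
    "scope changed",
)
```

The failing email sender is reported in `results` while the other two channels still deliver, which is the property that matters when one provider is down.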
ARCHITECTURE
| Path | Role |
|---|---|
| watchdog/poll.py | Main poll loop: iterates all source adapters, diffs against SQLite state, fires notify and launch on events. Supports --seed (silent first-run ingest), --loop (daemon), --dry-run. |
| watchdog/sources/ | One file per platform: cantina.py, code4rena.py, sherlock.py, hackerone.py. Each implements Source.fetch() returning a normalized Program list. ETag-conditional requests, exponential backoff on 429/503. |
| watchdog/state.py | SQLite-backed state: seen programs keyed by (source, id), scope hashes, alert records with escalation timestamps, ETag cache. |
| watchdog/notify.py | fan_out() fires all configured channels in parallel (best-effort — one failure never blocks others). escalate() re-calls Twilio and checks GitHub issue closed-state each poll pass. |
| orchestrator/launch.py | On-event scaffold: resolves scope repos, renders workspace templates, creates per-worker git worktrees, spawns the overseer detached. |
| orchestrator/overseer.py | Supervisor-of-supervisors: partitions checklist into blocks, ensures K worktrees, spawns one supervisor process per block, monitors and restarts crashed supervisors, writes SWARM_STATUS.md each pass. |
| orchestrator/supervisor.py | Per-worker nag loop: runs claude -p in stream-JSON mode, scans output for completion/idle/quota/submission signals, resumes or nags the worker forward, logs cost.log, writes .SWEEP-* and .SUBMISSION-ATTEMPT-* markers. |
| templates/program/ | Jinja2 cookiecutter for a new programs/<slug>/ workspace: SCOPE.md, STATUS.md, CLAUDE.md, KNOWN_ISSUES.md, checklist.md (9 attack-surface blocks). |
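The diff against SQLite state described for the poll loop might look roughly like this. The table schema, the function names, and the event labels are assumptions for illustration; only the `(source, id)` key and the silent seeding behaviour come from the description above.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the state database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS programs ("
        " source TEXT NOT NULL, id TEXT NOT NULL, scope_hash TEXT NOT NULL,"
        " PRIMARY KEY (source, id))"
    )
    return conn


def observe(conn, source, program_id, scope_hash, seed=False):
    """Record a program; return 'new', 'changed', or 'unchanged'.

    With seed=True the row is stored but None is returned, so no alert fires.
    """
    row = conn.execute(
        "SELECT scope_hash FROM programs WHERE source = ? AND id = ?",
        (source, program_id),
    ).fetchone()
    if row is None:
        event = "new"
    elif row[0] != scope_hash:
        event = "changed"
    else:
        event = "unchanged"
    if event != "unchanged":
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO programs (source, id, scope_hash)"
                " VALUES (?, ?, ?)",
                (source, program_id, scope_hash),
            )
    return None if seed else event
```

A poll pass would call `observe` once per fetched program and hand any `new` or `changed` result to the notify and launch steps.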
SKILLS DEMONSTRATED
AI agent orchestration at scale · Multi-agent swarm design with parallel isolation · Autonomous monitoring and alerting pipelines · Integrating multiple third-party platform APIs · Multi-channel escalating notifications · Crash-safe state management (SQLite) · Smart-contract security tooling (Solidity, Foundry) · Offline-testable systems (mock transports, fake process shims) · Git worktree-based parallel isolation
THE CODE, MAPPED
SHOWING THE 350 MOST CONNECTED OF 620 NODES · 1,268 EDGES · 65 COMMUNITIES — EXTRACTED FROM THE CODEBASE BY TREE-SITTER
