

AI SYSTEMS · ACTIVE DEVELOPMENT

Autonomous Bug-Bounty Pipeline

Catches a silent scope change within 15 minutes and starts the deep audit with no human in the loop.

by Danylo Pravda

≈ 3 min read


AT A GLANCE

BUG-BOUNTY-HUNTING

STATUS: ACTIVE DEVELOPMENT
TIMELINE: 2026-05 · 2 DAYS
LANGUAGES: Python
CATEGORY: AI SYSTEMS

OUTCOME

First live run drove 55 hypotheses to closure and produced 3 triaged findings (one Medium, two Low); one of them pinned 634 USDC.e stranded in a deployed contract.

METRICS

M.01 — PLATFORMS MONITORED

4 · Cantina, Code4rena, Sherlock, and HackerOne; one source adapter each, polled every 15 minutes

M.02 — ALERT CHANNELS

5 · ntfy, Telegram, email, GitHub issue, and Twilio voice; all fire in parallel and escalate until acknowledged

M.03 — HYPOTHESES CLOSED ON FIRST ENGAGEMENT

55 · Each hypothesis was struck with quoted-code rationale or confirmed by a passing Foundry proof-of-concept

M.04 — FINDINGS TRIAGED ON PLATFORM

3 · One Medium and two Low severity; all received triage scores on Cantina

M.05 — PYTHON LINES OF CODE

6.2k · Pipeline source across watchdog/, orchestrator/, and tests/, built and tested in two days

CH.01

Outcome

First live engagement: 55 hypotheses closed, 3 findings triaged on the platform. The pipeline ran end-to-end on a public Cantina program covering four Solidity contract suites. Every hypothesis was either struck with quoted-code rationale or confirmed with a passing Foundry proof-of-concept on a local blockchain fork. Three findings were submitted and received triage scores: one Medium and two Low severity. One Low finding pinned 634 USDC.e stranded in a deployed contract.

CH.02

What it does

Watches four bounty platforms and starts the deep audit within one 15-minute poll of a scope change, with no human in the loop until findings exist. Public bug-bounty programs run on a first-to-report model. A scope change quietly made to a live program is worth more than racing the whole field on a fresh launch, but catching it within minutes and immediately running a deep adversarial audit requires infrastructure that does not exist off the shelf.

The watchdog polls Cantina, Code4rena, Sherlock, and HackerOne through their public APIs. For each program it computes a scope hash over asset names, reward tiers, and rules, so a silently added contract fires a SCOPE CHANGED event even when the platform makes no announcement. On a trigger it fans alerts out across five channels in parallel: ntfy push, Telegram, email (SMTP), a GitHub issue (which doubles as the acknowledgment channel), and a Twilio voice call that re-dials every 15 minutes until the operator closes the issue.
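Below is a minimal sketch of the scope hash. It assumes a normalized `Program` record with asset, reward-tier, and rule fields; the record shape and field names are hypothetical. Serializing to canonical JSON, with sorted keys and a sorted asset list, means cosmetic reordering on a platform page does not raise an alert. Any real change to a hashed field does.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Program:
    # Hypothetical normalized record; the pipeline's real Program type may differ.
    source: str
    id: str
    assets: list[str] = field(default_factory=list)
    reward_tiers: dict[str, int] = field(default_factory=dict)
    rules: str = ""


def scope_hash(p: Program) -> str:
    """SHA-256 over the scope-relevant fields only, serialized canonically."""
    canonical = json.dumps(
        {
            "assets": sorted(p.assets),  # reordering assets on the page is not a change
            "reward_tiers": p.reward_tiers,
            "rules": p.rules.strip(),  # ignore leading/trailing whitespace churn
        },
        sort_keys=True,  # stable key order, so equal scopes hash equally
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def scope_changed(stored_hash: str | None, p: Program) -> bool:
    # A program seen for the first time has no stored hash: it is new, not changed.
    return stored_hash is not None and stored_hash != scope_hash(p)
```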

The orchestrator receives the event, renders a structured workspace (scope, known-issues pulled from prior audit PDFs, attack-surface checklist), creates one git worktree per checklist block so workers never share a checkout, then spawns one supervisor per worker. Each supervisor runs claude -p in headless stream-JSON mode, scans output for completion claims, idle timeouts, API usage limits, and any attempt to submit, and responds appropriately: nag the worker back to the next unchecked item, sleep to the quota reset time and resume the same session, or flag a submission attempt loudly for the operator.
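Here is a sketch of how the supervisor might scan output and respond. The event shape it parses is an assumption about `claude -p --output-format stream-json` output: assistant messages carrying text and tool_use blocks, followed by a final result event. Every regex and threshold is a hypothetical placeholder that would need tuning against real transcripts. The resume command relies on the CLI's `--resume <session-id>` flag, so confirm flag names and combinations against `claude --help` for the installed version.

```python
from __future__ import annotations

import json
import re
from enum import Enum


class Signal(Enum):
    NONE = "none"
    COMPLETION_CLAIM = "completion_claim"
    QUOTA = "quota"
    SUBMISSION_ATTEMPT = "submission_attempt"


# Hypothetical patterns; tune against real transcripts.
COMPLETION_RE = re.compile(r"\b(all (items|blocks) (are )?(done|complete)|audit complete)\b", re.I)
QUOTA_RE = re.compile(r"usage limit reached", re.I)
SUBMIT_RE = re.compile(r"submi(t|ssion)", re.I)
PLATFORM_RE = re.compile(r"cantina|code4rena|sherlock|hackerone", re.I)


def classify(line: str) -> Signal:
    """Classify one stream-JSON line. Assumed shapes:
    {"type": "assistant", "message": {"content": [text / tool_use blocks]}}
    {"type": "result", "result": "<final text>"}
    """
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return Signal.NONE
    if not isinstance(event, dict):
        return Signal.NONE
    prose: list[str] = []
    tool_inputs: list[str] = []
    if event.get("type") == "assistant":
        for block in event.get("message", {}).get("content", []):
            if block.get("type") == "text":
                prose.append(block.get("text", ""))
            elif block.get("type") == "tool_use":
                tool_inputs.append(json.dumps(block.get("input", {})))
    elif event.get("type") == "result":
        prose.append(str(event.get("result", "")))
    # Most severe first. Only tool calls count as submission attempts, so a
    # worker that merely writes "I will not submit to Cantina" is not flagged.
    calls = "\n".join(tool_inputs)
    if SUBMIT_RE.search(calls) and PLATFORM_RE.search(calls):
        return Signal.SUBMISSION_ATTEMPT
    text = "\n".join(prose)
    if QUOTA_RE.search(text):
        return Signal.QUOTA
    if COMPLETION_RE.search(text):
        return Signal.COMPLETION_CLAIM
    return Signal.NONE


def is_idle(last_event_at: float, now: float, timeout_s: float = 600.0) -> bool:
    # Idle is time-based: no stream events for `timeout_s` seconds (value hypothetical).
    return now - last_event_at >= timeout_s


def seconds_until_reset(reset_epoch: float, now: float, margin_s: float = 60.0) -> float:
    # How long to sleep on QUOTA; obtaining reset_epoch is not shown here.
    return max(0.0, reset_epoch - now) + margin_s


def resume_cmd(session_id: str, nudge: str) -> list[str]:
    # Resume the same session so the worker keeps its context.
    return ["claude", "-p", nudge, "--resume", session_id,
            "--output-format", "stream-json", "--verbose"]
```

A completion claim gets a nag toward the next unchecked item. A quota signal sleeps for `seconds_until_reset(...)` and then runs `resume_cmd(...)`. A submission attempt is flagged for the operator before anything else happens.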

CH.03

Why it is faster than a human

It does work no single person can keep up with: nine attack surfaces audited in parallel, hundreds of hypotheses tracked at once. The checklist that drives the workers covers nine blocks: reentrancy and callback surfaces, signature replay, arithmetic and accounting invariants, oracle trust, access-control archaeology, deployment archaeology (bytecode-vs-source diffs and silent-fix analysis across git tags), cross-contract emergent behavior, invariant fuzzing, and web/app scope.
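One way to lay out the nine blocks so that each worker receives exactly one is to render `checklist.md` with one heading per block and then split the file on those headings. This is a sketch only. The `## Block N: <title>` heading format is a hypothetical convention, and the pipeline renders the real file from a Jinja2 template rather than with this string code.

```python
from __future__ import annotations

import re

# The nine blocks named above; the heading format below is hypothetical.
BLOCKS = [
    "Reentrancy and callback surfaces",
    "Signature replay",
    "Arithmetic and accounting invariants",
    "Oracle trust",
    "Access-control archaeology",
    "Deployment archaeology",
    "Cross-contract emergent behavior",
    "Invariant fuzzing",
    "Web/app scope",
]
HEADING_RE = re.compile(r"^## Block (\d+): .*$", re.M)


def render_checklist(items: dict[str, list[str]]) -> str:
    """One '## Block N: <title>' section per block, each item as an unchecked box."""
    parts: list[str] = []
    for n, title in enumerate(BLOCKS, start=1):
        parts.append(f"## Block {n}: {title}\n")
        parts.extend(f"- [ ] {item}\n" for item in items.get(title, []))
        parts.append("\n")
    return "".join(parts)


def partition(checklist_md: str) -> dict[int, str]:
    """Split checklist.md into {block number: section text}, one section per worker."""
    matches = list(HEADING_RE.finditer(checklist_md))
    sections: dict[int, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(checklist_md)
        sections[int(m.group(1))] = checklist_md[m.start():end].strip()
    return sections
```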

Each worker owns exactly one block in its own isolated copy of the code. Workers append every hypothesis and verdict to a live STATUS.md, and that file is how the supervisor measures progress between nags. The overseer tracks all worker processes, restarts any that crash, and writes a SWARM_STATUS.md each pass so the operator has a single dashboard for the entire swarm.
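Here is a sketch of a single overseer pass. It assumes supervisors are `subprocess.Popen` objects, of which only `pid`, `returncode`, and `poll()` are used. It also assumes a non-zero exit code means a crash, while exit code 0 means the block has finished. The dashboard format is hypothetical.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Proc(Protocol):
    # The subset of subprocess.Popen the overseer relies on.
    pid: int
    returncode: int | None

    def poll(self) -> int | None: ...


def overseer_pass(procs: dict[str, Proc], spawn: Callable[[str], Proc]) -> list[str]:
    """Restart every supervisor that exited non-zero; return the restarted block names."""
    restarted: list[str] = []
    for block, proc in list(procs.items()):
        rc = proc.poll()  # None while still running
        if rc is not None and rc != 0:  # crashed; exit code 0 means the block finished
            procs[block] = spawn(block)
            restarted.append(block)
    return restarted


def render_swarm_status(procs: dict[str, Proc]) -> str:
    """Body for SWARM_STATUS.md: one line per block."""
    lines = ["# SWARM_STATUS", ""]
    for block, proc in sorted(procs.items()):
        rc = proc.poll()
        state = "running" if rc is None else f"exited ({rc})"
        lines.append(f"- {block}: pid {proc.pid}, {state}")
    return "\n".join(lines) + "\n"
```

In its loop, the overseer would call `overseer_pass(procs, spawn)` and then write `render_swarm_status(procs)` to SWARM_STATUS.md. Here `spawn` is whatever launches a supervisor process for one block.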

CH.04

How the operator stays in control

Nothing reaches a platform without a human clicking submit, and every proof-of-concept runs on a safe local copy of the chain. A worker that attempts to reach a submission endpoint is flagged with a .SUBMISSION-ATTEMPT-* marker and the operator is notified immediately. The operator gets a phone call that re-dials until acknowledged, reviews the drafted findings in the workspace, and decides what to submit.
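The re-dial rule can be sketched as one step that runs on every poll pass. Closing the GitHub issue is the acknowledgment signal. The `Alert` record and the injected `issue_closed` and `call` functions are hypothetical stand-ins for the stored alert row and the GitHub and Twilio clients.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

REDIAL_INTERVAL_S = 15 * 60  # re-dial every 15 minutes


@dataclass
class Alert:
    # Hypothetical alert record; the pipeline keeps alert records in SQLite.
    issue_number: int
    last_call_at: float
    acknowledged: bool = False


def escalate(
    alert: Alert,
    now: float,
    issue_closed: Callable[[int], bool],
    call: Callable[[], None],
) -> str:
    """One escalation step: stop once the issue is closed, otherwise re-dial on schedule."""
    if alert.acknowledged:
        return "done"
    if issue_closed(alert.issue_number):
        alert.acknowledged = True
        return "acknowledged"
    if now - alert.last_call_at >= REDIAL_INTERVAL_S:
        call()  # if this raises, last_call_at is not updated, so the next pass tries again
        alert.last_call_at = now
        return "redialed"
    return "waiting"
```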

The test suite exercises these safeguards with no network access: httpx.MockTransport stands in for all HTTP, a scripted _fake_claude.py shim emits programmed stream-JSON output for supervisor testing, and throwaway SQLite databases live under tmp_path. With zero API spend, the full pipeline runs offline before any real engagement.
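The fake-process side of that setup needs only the standard library. The shim below is a hypothetical reconstruction of the idea behind `_fake_claude.py`, not its actual contents. It replays scripted stream-JSON events, so code that reads a worker's stdout can be tested without invoking the model. HTTP calls are covered separately by httpx.MockTransport, which is not shown.

```python
from __future__ import annotations

import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical shim: replays the events passed as its first argument, one JSON
# object per line, instead of calling the real model.
FAKE_CLAUDE = """\
import json, sys
for event in json.loads(sys.argv[1]):
    print(json.dumps(event), flush=True)
"""


def read_events(cmd: list[str]) -> list[dict]:
    """The code under test: run a worker command and parse its stdout as stream-JSON."""
    events: list[dict] = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            if line.strip():
                events.append(json.loads(line))
    return events


def replay(script: list[dict]) -> list[dict]:
    """Run read_events() against the shim from a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shim = Path(tmp) / "_fake_claude.py"
        shim.write_text(FAKE_CLAUDE, encoding="utf-8")
        # Where the supervisor would build ["claude", "-p", ...], the test swaps in the shim.
        return read_events([sys.executable, str(shim), json.dumps(script)])
```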

CH.05

Speed of delivery

The whole pipeline was built and tested in two days across 19 commits, then taken straight into a live engagement. It comprises 6.2k lines of Python across watchdog/, orchestrator/, and tests/. Adding a new platform requires writing exactly one file (a source adapter implementing a single fetch() method) and touching nothing else in the poll loop, notify fan-out, or worker scaffold.
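This sketch shows the one-file-per-platform contract. It assumes adapters register themselves in a module-level registry when imported. The registry, `register()`, and the `Program` fields are hypothetical; only the single `fetch()` method comes from the description above. The poll loop iterates over the registry and never names a platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Program:
    # Hypothetical normalized record shared by every adapter.
    source: str
    id: str
    name: str


class Source(Protocol):
    name: str

    def fetch(self) -> list[Program]: ...


SOURCES: dict[str, Source] = {}


def register(source: Source) -> Source:
    """Adapters call this at import time, so the poll loop itself never changes."""
    SOURCES[source.name] = source
    return source


def poll_all() -> list[Program]:
    # The poll loop only knows the Source interface, never a specific platform.
    programs: list[Program] = []
    for source in SOURCES.values():
        programs.extend(source.fetch())
    return programs


# A new platform is this much code (hypothetical adapter).
class ExampleSource:
    name = "example"

    def fetch(self) -> list[Program]:
        # A real adapter would call the platform API here and normalize the response.
        return [Program(self.name, "p1", "Example Program")]


register(ExampleSource())
```

Registering at import time leaves the poll loop untouched. Something still has to import each adapter module, though, for example the sources package's `__init__.py` or a scan of `watchdog/sources/`.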

20 COMMITS — IN 34 DAYS — AVG 1/DAY

LANGUAGE	SHARE	LINES
Docs	38.4%	18.1K
Logs	35.4%	16.7K
Python	13.2%	6.2K
Solidity	10.4%	4.9K
Shell	1.2%	554
Config	0.7%	347
Other	0.7%	315

GRAPH · THE 350 MOST CONNECTED OF 620 NODES · 1,268 EDGES · 65 COMMUNITIES · EXTRACTED FROM THE CODEBASE BY TREE-SITTER

FEATURES

Scope-change detection	Hashes every program's assets, reward tiers, and rules, so a quietly-added in-scope contract fires an alert even with no platform announcement.
Five-channel alert escalation	A trigger fires ntfy, Telegram, email, a GitHub issue, and a Twilio voice call in parallel, re-dialing until the operator acknowledges.
Worker swarm with per-block ownership	Each AI worker owns one attack surface in its own git worktree, so parallel workers can never overwrite each other's work.
Supervisor nag loop	Streams each worker's output to nudge stalls back on task, sleep through API usage limits and resume the same session, and flag any submission attempt.
Structured audit workspace	Every engagement gets a generated workspace: scope, a live hypothesis tracker, known issues pulled from prior audit PDFs, and proof-of-concept and findings folders.
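The parallel, best-effort fan-out behind five-channel alert escalation can be sketched with a thread pool. Each channel is a hypothetical callable that takes the message text. A channel that raises is logged and reported as failed, without blocking or cancelling the others.

```python
from __future__ import annotations

import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Channel = Callable[[str], None]  # hypothetical: one send function per channel
log = logging.getLogger("notify")


def fan_out(channels: dict[str, Channel], message: str) -> dict[str, bool]:
    """Send `message` on every channel concurrently; return {channel: succeeded}."""
    if not channels:
        return {}  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = {name: pool.submit(send, message) for name, send in channels.items()}
    # Leaving the `with` block waits for every channel to finish.
    results: dict[str, bool] = {}
    for name, future in futures.items():
        try:
            future.result()  # re-raises the channel's exception, if any
            results[name] = True
        except Exception:
            log.exception("alert channel %s failed", name)  # best-effort: keep going
            results[name] = False
    return results
```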

ARCHITECTURE

watchdog/poll.py	Main poll loop: iterates all source adapters, diffs against SQLite state, fires notify and launch on events. Supports --seed (silent first-run ingest), --loop (daemon), --dry-run.
watchdog/sources/	One file per platform: cantina.py, code4rena.py, sherlock.py, hackerone.py. Each implements Source.fetch() returning a normalized Program list. ETag-conditional requests, exponential backoff on 429/503.
watchdog/state.py	SQLite-backed state: seen programs keyed by (source, id), scope hashes, alert records with escalation timestamps, ETag cache.
watchdog/notify.py	fan_out() fires all configured channels in parallel (best-effort, so one failure never blocks others). escalate() re-calls Twilio and checks GitHub issue closed-state each poll pass.
orchestrator/launch.py	On-event scaffold: resolves scope repos, renders workspace templates, creates per-worker git worktrees, spawns the overseer detached.
orchestrator/overseer.py	Supervisor-of-supervisors: partitions the checklist into K blocks, ensures one worktree per block, spawns one supervisor process per block, monitors and restarts crashed supervisors, and writes SWARM_STATUS.md each pass.
orchestrator/supervisor.py	Per-worker nag loop: runs claude -p in stream-JSON mode, scans output for completion/idle/quota/submission signals, resumes or nags the worker forward, logs cost to cost.log, and writes .SWEEP-* and .SUBMISSION-ATTEMPT-* markers.
templates/program/	Jinja2 templates (cookiecutter-style) for a new programs/<slug>/ workspace: SCOPE.md, STATUS.md, CLAUDE.md, KNOWN_ISSUES.md, and checklist.md (9 attack-surface blocks).
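Here is a minimal sketch of the state table described for watchdog/state.py, using the standard-library `sqlite3` module. Only the `(source, id)` key and the stored scope hash come from the description; the column names and the `record()` helper are hypothetical. Each program is committed in its own transaction, so a crash mid-pass keeps every row already written.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS programs (
    source     TEXT NOT NULL,
    id         TEXT NOT NULL,
    scope_hash TEXT NOT NULL,
    first_seen TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (source, id)
);
"""


def open_state(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, source: str, pid: str, new_hash: str) -> str:
    """Upsert one program; return 'new', 'changed', or 'same'."""
    with conn:  # commits on normal exit (including return), rolls back on exception
        row = conn.execute(
            "SELECT scope_hash FROM programs WHERE source = ? AND id = ?", (source, pid)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO programs (source, id, scope_hash) VALUES (?, ?, ?)",
                (source, pid, new_hash),
            )
            return "new"
        if row[0] != new_hash:
            conn.execute(
                "UPDATE programs SET scope_hash = ? WHERE source = ? AND id = ?",
                (new_hash, source, pid),
            )
            return "changed"
        return "same"
```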

STACK

LANG: Python

FX: pytest · httpx · Jinja2 · Foundry (Solidity PoCs)

INFRA: SQLite · GitHub Actions · Windows Task Scheduler · Twilio · Telegram Bot API · ntfy

AI: Claude (claude -p headless, Max subscription) · Claude Code (orchestration and development)

SKILLS DEMONSTRATED

AI agent orchestration at scale · Multi-agent swarm design with parallel isolation · Autonomous monitoring and alerting pipelines · Integrating multiple third-party platform APIs · Multi-channel escalating notifications · Crash-safe state management (SQLite) · Smart-contract security tooling (Solidity, Foundry) · Offline-testable systems (mock transports, fake process shims) · Git worktree-based parallel isolation
