CLIENT WORK · ACTIVE DEVELOPMENT

AI Image Compositor for CRE Marketing

Turns 2–3 days of manual AI image editing into one supervised batch, sharper every project.

by Danylo Pravda

≈ 3 min read

AT A GLANCE · TROPHY-AI-IMAGE

STATUS: ACTIVE DEVELOPMENT
TIMELINE: 2026-06 — 2026-06 · 7 DAYS
LANGUAGES: TypeScript / Python
CATEGORY: CLIENT WORK

409 COMMITS IN 7 DAYS

OUTCOME

That 2–3 days now fits in one supervised sitting, prompt know-how is captured for reuse, and a real end-to-end run cost $0.235 against OpenAI's live image model.

METRICS

409

M.01 — COMMITS IN 7 DAYS

Full compositor built from contracts to canvas in a single week. Deliberately window-scoped: the project has 764 commits today, and this measures the first seven days.

39k

M.02 — TYPESCRIPT SOURCE LINES

Core engine, contracts, providers, and UI (TypeScript only, .ts files). The project kept going after the week the commit figure above measures, so this is current rather than week-one.

12.6k

M.03 — TSX UI LINES

React Flow canvas, mask editors, review surface. Current, like the .ts figure above.

$0.235

M.04 — REAL SPEND TO VALIDATE THE FULL PIPELINE

One end-to-end run on live gpt-image-2 proved the pipeline, spend cap, and retry loop.

2–3 days → 1 sitting

M.05 — IMAGE WORK COMPRESSED PER PROJECT

Manual AI image work that consumed 2–3 days now fits one supervised batch run.

CH.01

The problem

A NYC commercial real estate marketing agency was losing 2–3 days per project to image editing done by hand, one picture at a time.

Every office-building website needs photo work before launch: empty offices staged with furniture, scaffolding removed from exteriors, rough architect renderings polished, people added at ground level for scale. The client was doing all of it manually, 5–10 prompt iterations per image in the OpenAI playground, with no batch, no variant history, and no record of what worked. Every project started from scratch.

The output quality from gpt-image-2 was already proven. The bottleneck was the system around it: manual, slow, and incapable of compounding. Each new project inherited nothing from the last.

CH.02

What it does

A node-based image compositor batches the entire edit queue in one supervised sitting instead of grinding through each image by hand.

It is the architecture a Nuke or Photoshop compositor would use, built as a local-first TypeScript application for exactly this domain. Three parts work together:

A headless execution engine validates and executes typed node graphs. Every node is a function, and the graph is the provenance record. The engine enforces a hard spend cap, writes an append-only run journal before every paid API call, and resumes mid-batch after a crash without re-billing completed work.

A React Flow canvas UI lets the operator wire nodes visually (Image, Generate, Mask, Merge, Export), then run the whole project at once. A Konva mask editor lets the operator paint exactly which regions the model should touch. A review surface shows source versus result side by side, with swipe compare and star ratings.

An offline Claude layer reads the client brief and photos and authors the entire pipeline itself: analyzing each image's content, selecting the right generation preset from the learned rulebook, injecting the building's positioning clause into prompts, and writing out a ready-to-run batch job. The operator opens it on the canvas, reviews, and runs rather than building from scratch. After each session, Claude reads the operator's ratings and writes distilled lessons back to a living rulebook, so the next project's first attempt starts sharper.
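To make the first of the three parts concrete, here is a minimal sketch of how a typed node graph could be validated and executed. It is not the project's engine: the `GraphNode` shape, the `topoOrder` and `runGraph` names, and the use of plain strings as node outputs are all assumptions made for illustration. Only the node kinds come from the canvas description above.

```typescript
// Hypothetical node kinds, mirroring the names on the canvas.
type NodeKind = "Image" | "Generate" | "Mask" | "Merge" | "Export";

interface GraphNode {
  id: string;
  kind: NodeKind;
  inputs: string[]; // ids of upstream nodes, in port order
}

// Each node is a function from its inputs' outputs to its own output.
// Outputs are plain strings here; a real engine would pass image references.
type NodeFn = (inputs: string[]) => string;

// Validation: ids must be unique, every input must exist, and the graph
// must be acyclic. Returns ids ordered so each node follows its inputs.
function topoOrder(nodes: GraphNode[]): string[] {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) {
    if (byId.has(n.id)) throw new Error(`duplicate node id: ${n.id}`);
    byId.set(n.id, n);
  }
  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];
  const visit = (id: string): void => {
    const node = byId.get(id);
    if (node === undefined) throw new Error(`unknown node: ${id}`);
    const seen = state.get(id);
    if (seen === "done") return;
    if (seen === "visiting") throw new Error(`cycle through node: ${id}`);
    state.set(id, "visiting");
    for (const dep of node.inputs) visit(dep);
    state.set(id, "done");
    order.push(id);
  };
  for (const n of nodes) visit(n.id);
  return order;
}

// Executes nodes in dependency order and returns every node's output.
function runGraph(
  nodes: GraphNode[],
  fns: Record<NodeKind, NodeFn>,
): Map<string, string> {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);
  const outputs = new Map<string, string>();
  for (const id of topoOrder(nodes)) {
    const node = byId.get(id) as GraphNode;
    const inputs = node.inputs.map((dep) => outputs.get(dep) as string);
    outputs.set(id, fns[node.kind](inputs));
  }
  return outputs;
}
```

Because each output is derived only from its declared inputs, the graph itself records how every result was produced, which is the sense in which a graph can double as a provenance record.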

CH.03

Pixel-identical output

The AI changes only the masked region. Every other pixel comes back byte-for-byte identical to the source.

The Merge step registers the AI edit back onto the original using computer-vision feature matching (ORB/AKAZE) and composites through the mask in one pass. A TypeScript 4-DOF solver handles the transform math. opencv-js proved incompatible with the test harness, so the CV pipeline ships as an out-of-process Python sidecar. A confidence valve degrades gracefully to manual placement rather than forcing a bad alignment.
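The source does not say which four degrees of freedom the solver fits. A common choice, assumed here, is a similarity transform: uniform scale, rotation, and a 2D translation. The sketch below is a closed-form least-squares fit over matched point pairs. The `solveSimilarity` name and the `Point` and `Similarity` types are hypothetical, and the outlier rejection that a real feature-matching pipeline needs is left out.

```typescript
interface Point { x: number; y: number }

// dst = [a -b; b a] * src + (tx, ty), where a = s*cos(theta), b = s*sin(theta).
interface Similarity { a: number; b: number; tx: number; ty: number }

// Least-squares similarity fit from matched point pairs (assumed inliers).
function solveSimilarity(src: Point[], dst: Point[]): Similarity {
  if (src.length !== dst.length || src.length < 2) {
    throw new Error("need at least two matched point pairs");
  }
  const n = src.length;
  const mean = (ps: Point[]): Point => ({
    x: ps.reduce((s, p) => s + p.x, 0) / n,
    y: ps.reduce((s, p) => s + p.y, 0) / n,
  });
  const ms = mean(src);
  const md = mean(dst);

  let dot = 0;   // sum of centered src . dst
  let cross = 0; // sum of centered src x dst
  let norm = 0;  // sum of squared centered src lengths
  for (let i = 0; i < n; i++) {
    const sx = src[i].x - ms.x;
    const sy = src[i].y - ms.y;
    const dx = dst[i].x - md.x;
    const dy = dst[i].y - md.y;
    dot += sx * dx + sy * dy;
    cross += sx * dy - sy * dx;
    norm += sx * sx + sy * sy;
  }
  if (norm === 0) throw new Error("degenerate input: source points coincide");

  const a = dot / norm;
  const b = cross / norm;
  // Translation maps the source centroid onto the destination centroid.
  return {
    a,
    b,
    tx: md.x - (a * ms.x - b * ms.y),
    ty: md.y - (b * ms.x + a * ms.y),
  };
}

function applySimilarity(t: Similarity, p: Point): Point {
  return { x: t.a * p.x - t.b * p.y + t.tx, y: t.b * p.x + t.a * p.y + t.ty };
}

// Recover the human-readable parameters from the fitted matrix.
function describe(t: Similarity): { scale: number; rotationRad: number } {
  return { scale: Math.hypot(t.a, t.b), rotationRad: Math.atan2(t.b, t.a) };
}
```

Restricting the fit to four parameters rules out shear and perspective. That suits an edit that should come back as a scaled, rotated, or shifted copy of the crop that was sent. A low-quality fit is the kind of signal a confidence valve could use to fall back to manual placement.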

This solves the core failure of masked AI edits: the untouched region distorting. The unmasked region is not "close to identical." It is byte-identical, because the composite-back step never re-encodes regions the model didn't touch.
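The byte-identical guarantee is easiest to see in code. The sketch below assumes raw, already-aligned pixel buffers of equal size and a hard-edged mask with one byte per pixel. The `compositeThroughMask` name and buffer layout are illustrative rather than the project's actual Merge implementation.

```typescript
// Composite `edited` onto `source` wherever `mask` is non-zero.
// `source` and `edited` are interleaved pixel buffers (RGBA by default);
// `mask` holds one byte per pixel. Pixels with mask 0 are never written,
// so they stay exact copies of the source bytes.
function compositeThroughMask(
  source: Uint8Array,
  edited: Uint8Array,
  mask: Uint8Array,
  channels: number = 4,
): Uint8Array {
  if (source.length !== edited.length) {
    throw new Error("source and edited buffers differ in size");
  }
  if (mask.length * channels !== source.length) {
    throw new Error("mask does not match the pixel count");
  }
  const out = source.slice(); // exact copy of the original bytes
  for (let px = 0; px < mask.length; px++) {
    if (mask[px] === 0) continue; // untouched region: leave source bytes alone
    const base = px * channels;
    for (let c = 0; c < channels; c++) {
      out[base + c] = edited[base + c];
    }
  }
  return out;
}
```

The output starts as a copy of the source and only masked pixels are overwritten, so no rounding or re-encoding step ever touches the unmasked bytes. A feathered mask would blend only where the mask value is above zero and would preserve the same property outside it.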

CH.04

Safe to run unattended

A hard spend cap plus an append-only journal means a crash mid-batch resumes with zero re-billing.

Before every paid API call, the engine writes the request to disk. On restart it skips completed work via idempotency keys. Three replay tiers keep the full pipeline testable without any API spend: ReplayStrict runs the entire path byte-identical in CI with no key, ReplayResume handles crash recovery, and Record mode captures live fixtures for later use.
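The journal-then-call ordering can be sketched as follows. Everything here is a simplification under stated assumptions: the journal is an in-memory list of JSON lines rather than a file on disk, the provider call is synchronous, the idempotency key is an FNV-1a hash of hypothetical request fields, and the names `PaidRequest`, `RunJournal`, and `runBatch` are invented for the example.

```typescript
interface PaidRequest { nodeId: string; prompt: string; estimatedCostUsd: number }
interface ProviderResult { resultRef: string; costUsd: number }

type JournalEvent =
  | { type: "requested"; key: string; request: PaidRequest }
  | { type: "completed"; key: string; resultRef: string; costUsd: number };

// FNV-1a over the identifying fields; a stand-in for a real content hash.
function idempotencyKey(req: PaidRequest): string {
  const text = `${req.nodeId}\n${req.prompt}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Append-only: events are only ever pushed, never edited or removed.
class RunJournal {
  readonly lines: string[] = [];
  append(event: JournalEvent): void {
    this.lines.push(JSON.stringify(event));
  }
  events(): JournalEvent[] {
    return this.lines.map((line) => JSON.parse(line) as JournalEvent);
  }
}

function runBatch(
  requests: PaidRequest[],
  journal: RunJournal,
  capUsd: number,
  callProvider: (req: PaidRequest) => ProviderResult,
): { results: Map<string, string>; spentUsd: number } {
  // Rebuild state from the journal so a restart knows what was already paid for.
  const done = new Map<string, string>();
  let spentUsd = 0;
  for (const event of journal.events()) {
    if (event.type === "completed") {
      done.set(event.key, event.resultRef);
      spentUsd += event.costUsd;
    }
  }
  const results = new Map<string, string>();
  for (const req of requests) {
    const key = idempotencyKey(req);
    const cached = done.get(key);
    if (cached !== undefined) {
      results.set(req.nodeId, cached); // completed earlier: no second bill
      continue;
    }
    if (spentUsd + req.estimatedCostUsd > capUsd) {
      throw new Error(`spend cap of $${capUsd} would be exceeded at ${req.nodeId}`);
    }
    journal.append({ type: "requested", key, request: req }); // journal first
    const { resultRef, costUsd } = callProvider(req);          // then pay
    journal.append({ type: "completed", key, resultRef, costUsd });
    done.set(key, resultRef);
    spentUsd += costUsd;
    results.set(req.nodeId, resultRef);
  }
  return { results, spentUsd };
}
```

In this sketch a request that was journaled but never completed is simply retried on resume. How the real engine reconciles that in-flight case is not described in the source.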

An opt-in full-auto tier can generate, judge, and retry on its own, bounded by a dollar cap, a time limit, and a kill switch, but human taste stays in control by default. The operator opts in to automated accept only when a prompt pattern has already earned it through rated sessions.
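A generate, judge, and retry loop with those three bounds might look like the sketch below. The `autoLoop` function, the `Bounds` fields, and the numeric judge score are assumptions for illustration. The clock and kill switch are injected as plain functions so the loop can be exercised without real time passing or real spend.

```typescript
interface Attempt { score: number; costUsd: number }

interface Bounds {
  maxUsd: number;      // dollar cap for the whole loop
  maxMs: number;       // wall-clock budget
  estimateUsd: number; // assumed worst-case cost of one more attempt
  acceptScore: number; // judge score at which the loop stops
}

type StopReason = "accepted" | "dollar-cap" | "time-limit" | "killed";

interface LoopResult {
  reason: StopReason;
  attempts: number;
  spentUsd: number;
  best: Attempt | null;
}

function autoLoop(
  generateAndJudge: (attempt: number) => Attempt,
  bounds: Bounds,
  now: () => number,
  killed: () => boolean,
): LoopResult {
  const start = now();
  let attempts = 0;
  let spentUsd = 0;
  let best: Attempt | null = null;
  const stop = (reason: StopReason): LoopResult => ({ reason, attempts, spentUsd, best });

  for (;;) {
    // All three bounds are checked before money is spent, never after.
    if (killed()) return stop("killed");
    if (now() - start >= bounds.maxMs) return stop("time-limit");
    if (spentUsd + bounds.estimateUsd > bounds.maxUsd) return stop("dollar-cap");

    const attempt = generateAndJudge(attempts);
    attempts += 1;
    spentUsd += attempt.costUsd;
    if (best === null || attempt.score > best.score) best = attempt;
    if (attempt.score >= bounds.acceptScore) return stop("accepted");
  }
}
```

Every exit path returns the best attempt so far. An operator who stops the loop early, or whose budget runs out, still gets something to review.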

CH.05

Built and proven

409 commits in one focused build week, validated end-to-end on live OpenAI for $0.235.

The full edit path (source, Generate, Mask, Merge, Export, with the unmasked region byte-identical) ran against real gpt-image-2, confirming the spend cap and retry loop in production for a fraction of a dollar. The learning harvester aggregates ratings across projects into a JSONL corpus that feeds the rulebook distillation step, so the compounding is real: each project improves the baseline for the next.
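As a rough picture of the harvesting step, the sketch below groups rating records by preset and emits one JSON object per line. The record fields (`project`, `preset`, `stars`) and the summary shape are hypothetical. The source says only that ratings are aggregated across projects into a JSONL corpus.

```typescript
interface RatingRecord { project: string; preset: string; stars: number }

interface PresetSummary {
  preset: string;
  samples: number;
  meanStars: number;
  projects: string[];
}

// Aggregates ratings per preset and serializes them as JSONL:
// one self-contained JSON object per line, sorted by preset name.
function harvestToJsonl(records: RatingRecord[]): string {
  const groups = new Map<string, RatingRecord[]>();
  for (const record of records) {
    const group = groups.get(record.preset);
    if (group === undefined) groups.set(record.preset, [record]);
    else group.push(record);
  }
  const lines: string[] = [];
  for (const preset of Array.from(groups.keys()).sort()) {
    const group = groups.get(preset) as RatingRecord[];
    const summary: PresetSummary = {
      preset,
      samples: group.length,
      meanStars: group.reduce((sum, r) => sum + r.stars, 0) / group.length,
      projects: Array.from(new Set(group.map((r) => r.project))).sort(),
    };
    lines.push(JSON.stringify(summary));
  }
  return lines.join("\n");
}
```

JSONL suits this kind of corpus because each project's harvest can be appended line by line and a downstream reader can stream it without loading the whole file.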

The codebase is a five-package TypeScript monorepo (contracts / engine / providers / app / ui) plus a Python computer-vision sidecar: 39k lines of TypeScript across the core engine, contracts, and providers, and 12.6k lines of TSX for the React Flow canvas, mask editors, and review surface. Contracts define every data boundary as Zod schemas with emitted JSON Schemas for cross-language use. Keyless replay paths stay green in CI without ever spending.

763 COMMITS IN 55 DAYS · AVG 14/DAY

TypeScript	58.1%	55.2K
Docs	23.0%	21.8K
Python	7.5%	7.1K
JavaScript	6.6%	6.3K
Config	4.3%	4.1K
CSS	0.3%	259
Other	0.3%	224

SHOWING THE 350 MOST CONNECTED OF 2,796 NODES · 4,644 EDGES · 244 COMMUNITIES — EXTRACTED FROM THE CODEBASE BY TREE-SITTER

FEATURES

Node graph execution engine	A headless TypeScript library that runs a whole project's edit queue in one pass, with a hard spend cap and crash-safe resume baked in.
Pixel-identical composite-back	Computer-vision alignment puts the AI edit back onto the source so every region outside the mask is byte-for-byte identical to the original photo.
Claude authors the pipeline	Claude reads the client brief and photos and emits a ready-to-run batch job, so the operator reviews and runs instead of building each step by hand.
Compounding prompt knowledge	After each session, Claude reads the operator's ratings and writes the lessons back to a living rulebook, so the next project's first attempt starts sharper.
Testable without spending a cent	Three replay tiers let the full pipeline run byte-identical in CI with no API key, so tests never cost a cent against OpenAI.
Optional full-auto loop	An opt-in tier lets the system generate, judge, and retry on its own, bounded by a dollar cap, a time limit, and a kill switch. Off by default so human taste stays in control.
An invite door, funded and capped by the owner	A guest with an invite runs the real machine on their own photo from the browser, picking which treatment to apply. The owner sets that invite's generation budget when minting it, hard-capped in the ledger, so a stranger can try the product without anyone risking an open spend. Identity is Clerk; admin is a verified email, not a shared passphrase.
Readable by other agents, not just people	A 42-tool MCP read plane exposes the job server, so an agent can inspect runs and state directly instead of scraping a UI. The dashboard also runs a per-service readiness probe and says plainly when something is not available, rather than showing a hopeful zero.

ARCHITECTURE

aie-contracts	Zod schemas for every data boundary: graph, run-journal-event, CAS layout, provenance sidecar, cost estimate, job API, run-error taxonomy. Emits JSON Schemas for cross-language use.
aie-engine	Pure headless library: graph validation, DAG executor (cell fan-out, spend cap, journal-before-paid-call, backoff/idempotency), content-addressed store, atomic writes, cv2-sidecar auto-align, Stage-2b primitives (upscale, remove-background, comment).
aie-providers	OpenAI provider behind a spend gate. Three replay tiers: ReplayStrict (CI, byte-identical, no key), ReplayResume (crash recovery), Record (live + fixture capture).
aie-app	Fastify composition root: POST /runs job model, re-subscribable SSE, RunStore, provider factory, CLI, debug console.
aie-ui	React + React Flow node canvas (typed ports, Blender-style IDs), Konva mask/crop editors, review surface (contact sheet, swipe compare, star ratings, export).
cv2 sidecar (Python)	Out-of-process computer vision: ORB/AKAZE feature matching + warp + NCC metrics. The TypeScript 4-DOF solver handles the transform math. opencv-js proved tsx/vitest-incompatible, so the CV pipeline runs as a spawned subprocess.
Offline Claude Code layer	Claude Opus (offline) authors node graphs from client brief + images, curates the prompt preset library, and runs the learning pass, reading provenance records and writing distilled lessons to ai-prompting.md.
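The three replay tiers listed for aie-providers can be pictured as one provider wrapper with a mode switch. The sketch below is an interpretation, not the project's code. The exact semantics of ReplayResume (serve recorded responses, call live only for what is missing) are an assumption, and `makeProvider`, the `Fixtures` map, and string-valued responses are invented for the example.

```typescript
type ReplayMode = "ReplayStrict" | "ReplayResume" | "Record";

// Request key -> recorded response. A real store would live on disk.
type Fixtures = Map<string, string>;

type Provider = (key: string) => string;

function makeProvider(mode: ReplayMode, fixtures: Fixtures, live?: Provider): Provider {
  return (key: string): string => {
    const recorded = fixtures.get(key);

    if (mode === "ReplayStrict") {
      // CI path: never calls out, so it needs no API key and cannot spend.
      if (recorded === undefined) {
        throw new Error(`no fixture for ${key}; strict replay never goes live`);
      }
      return recorded;
    }

    if (live === undefined) {
      throw new Error(`${mode} needs a live provider`);
    }

    if (mode === "ReplayResume" && recorded !== undefined) {
      // Crash recovery (assumed behavior): reuse what was already paid for.
      return recorded;
    }

    const response = live(key);
    if (mode === "Record") {
      // Capture the live response so later strict replays can serve it.
      fixtures.set(key, response);
    }
    return response;
  };
}
```

Keeping all three modes behind the same function signature means the engine code that calls the provider does not need to know whether a run is live, resumed, or replayed.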

STACK

LANG: TypeScript · Python

FX: React · React Flow · Konva · Vite · Fastify · Zustand · Vitest · Playwright · Zod

INFRA: Node.js 22+ · npm workspaces · sharp (libvips) · OpenCV (cv2 sidecar) · Local-first / Vercel-deployable

AI: OpenAI gpt-image-2 · Claude Opus (offline pipeline authoring + learning pass) · Claude Code (orchestration)

SKILLS DEMONSTRATED

Turning a manual workflow into a batch system · Capping and proving AI API spend in production · Crash-safe pipelines that resume without re-billing · Computer vision for pixel-exact image compositing · Letting Claude author and improve the workflow itself · Drag-and-drop visual tooling for non-technical operators · Full-stack TypeScript delivery for a paying client
