MASTER PLAN — THE SYSTEM — 2026-06-16WORK IN PROGRESS

The complete agentic operating system I'm building on Windows — full plan and execution

A Windows-native build plan for a complete agentic operating system that turns my real automation work into content, funnels, and clients. Every tool is verified for Windows 11 as of mid-2026 and wired to what I already run — Claude Code, a knowledge graph, a live members funnel on Neon and Vercel, a 14,099-post corpus — with mermaid architecture, honest cost economics, and a phased build that earns autonomy before it takes it.

≈ 15 min read

VIEW MARKDOWNOPEN IN CHATGPT ↗OPEN IN CLAUDE ↗

The complete agentic operating system I'm building on Windows — full plan and execution

This is the master plan, and it supersedes an earlier write-up that collapsed seventeen detailed builds into "just run n8n." That was wrong — but the correction needs its own honesty. When you actually verify the field's "agentic OS" tools, they split cleanly in two: a real, open-source core (Claude Code, Hermes, OpenClaw — all genuinely installable, all now running natively on Windows) and a paid-community branding layer on top of it (the "Pantheon" of personas, the overnight "dreaming" loop, "Mission Control" dashboards) that is sold through subscription communities and is not part of any official codebase. This plan adopts the real core, replicates the good ideas from the branding layer with tools I already own, and runs the whole thing on the machine I actually sit at: Windows 11. Every tool below was checked for Windows support and current status in June 2026. The faithful tool-by-tool reference behind every choice is attached: the complete agentic-OS reference.

CONTENTS

CH.01

The loop and the money

One closed loop turns my real automation work into clients. Five stages, each feeding the next:

flowchart LR
  S["1 SENSE"] --> T["2 STRATEGIZE"]
  T --> A["3 ACT"]
  A --> O["4 OBSERVE"]
  O --> L["5 LEARN"]
  L --> S

SENSE — read competitors honestly (winners and failures) and my own world: the 14,099-post automation corpus I already mined, plus the cheap social-intelligence funnel I already built.
STRATEGIZE — turn one real project into an optimized post-series and a funnel that points at the service.
ACT — gated auto-posting and replies, inside a safe envelope, every outbound action approved from my phone.
OBSERVE — a "what happened in the last 4 hours?" digest, computed from logs, narrated by the model — never invented.
LEARN — the system compares its own results to its baselines and rewrites the memory that feeds STRATEGIZE.

Money hierarchy: clients first; subscribers are the asset; views are fuel; ad-revenue is a rounding error. Content is build-in-public of real work — one artifact is portfolio, proof, and distribution at once. The difference from the old plan: this loop is run by a complete agentic OS that remembers, schedules, controls, and improves itself — not a single workflow canvas.

CH.02

The machine: a Windows-native host

The field's builds assume a Mac — a MacBook or always-on Mac Mini, Homebrew, ~/.config, desktop cron. I run Windows 11, so every layer has to translate. The good news from the research: the core tools are now first-class on Windows. The honest news: the "always-on PC as a 24/7 server" story has real holes, and I should plan around them rather than pretend they don't exist.

flowchart TB
  subgraph PC["Always-on Windows 11 PC"]
    CC["Claude Code (native, PowerShell)"]
    TS["Task Scheduler (PC awake)"]
    subgraph WSL["WSL2 + Docker Engine + systemd"]
      SVC["n8n / Postgres / services"]
    end
  end
  CC --> SVC
  TS --> CC
  CLOUD["Claude Routines (cloud, PC asleep)"] -.-> GH["GitHub repo"]
  GH -.-> CC
  CC --> TG["Telegram (control from phone)"]
  TG --> CC

The translation, tool by tool — every row verified against primary docs in June 2026:

Field default (Mac-centric)	Windows-native equivalent	Verified status
Claude Code on macOS/Linux	Claude Code native on Windows 11 (`irm https://claude.ai/install.ps1 \| iex`, or `winget install Anthropic.ClaudeCode`)	Native, no WSL2 required. `claude -p` headless works. Sandboxing is the one gap — it needs WSL2
Homebrew	winget	`winget install -e --id OpenJS.NodeJS.LTS` · `Docker.DockerDesktop` · `Obsidian.Obsidian` · `Python.Python.3.13` · `Git.Git`
Desktop cron	Windows Task Scheduler	Native; "run whether user is logged on or not" + missed-run catch-up. Env vars need a wrapper script; a named account (not SYSTEM) for user-installed CLIs
Always-on Mac Mini	Always-on Windows 11 PC	Works, with caveats (below)
n8n via Homebrew/pm2	Docker Engine inside WSL2 with systemd	`pm2 startup` is broken on Windows; Docker Engine-in-WSL2 (`systemctl enable --now docker`) is the dependable always-on path
bash / zsh	PowerShell (+ Git Bash when installed)	Native PowerShell tool in Claude Code; Git Bash unlocks the Bash tool

The honest gate on the host. Docker Desktop on Windows does not run headless — it needs an interactive login to start containers (Docker's own roadmap issue for this is still open). WSL2 is updated through the Microsoft Store, and that update kills running instances without warning. Windows forced restarts can still fire outside an 18-hour Active-Hours window. A home PC realistically lands around 95–98% uptime, and running it 24/7 adds real electricity cost. So the plan is layered: the PC-awake tier runs interactive and scheduled work (Task Scheduler + Docker Engine in WSL2), and the truly always-on, machine-asleep tier runs in the cloud (Claude Routines), with a cheap Linux VPS (Hetzner from ~€3.49/mo) as the honest fallback if I ever need a service that genuinely cannot blink. I am not going to pretend a desktop is a datacenter.

CH.03

What I already have vs. what I need

The reason this is buildable in weeks, not months: most of the OS already exists in this repo and the systems around it. The job is mostly wiring, not inventing.

OS layer	My actual asset	Status
Core runtime	Claude Code + `.claude/skills` + pre-commit hooks + `CLAUDE.md` discipline gates	Have — this site runs on it
Code memory	graphify (298 nodes / 597 edges / 15 communities on this repo; the real payoff is on btc-bot, 728k lines)	Have — `graphify-out/` is committed
Semantic memory	`CLAUDE.md` + `docs/STATE.md` spine; a self-improving wiki is the gap	Partial — add the LLM-wiki pattern
Control surface	Next.js 16 + Vercel + Neon + Auth.js v5 — the live `/notes` members funnel and `/dashboard` cabinet	Have — the dashboard foundation already ships
SENSE data	The 14,099-post / 7,526-account corpus + the Apify discovery→enrich→score funnel	Have — base rates already de-overfitted
Cheap inference	A pool of free NVIDIA NIM models — `minimaxai/minimax-m3`, `google/diffusiongemma-26b-a4b-it`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `mistralai/mistral-medium-3.5-128b` (separate free key per model, one concurrent worker each)	Have — free-tier per key; pooled for throughput + multi-model coverage (see Tooling)
Channels	Telegram bot as the two-way approval surface	Build — small, well-understood
Cadence	`/loop` (have) + Task Scheduler (build) + Cloud Routines (build)	Partial
Self-improvement	Self-updating `CLAUDE.md`, per-skill `learnings.md`, an overnight digest	Build — the compounding layer

Two honest notes on my own assets. First, graphify earns its keep on btc-bot, not on this site — the site's code is small enough that graphify itself prints "you may not need a graph." I'll point it at the 728k-line trading system, where querying summaries instead of re-reading source is the real token win. Second, the control dashboard should be a Next.js page on my existing Vercel + Neon + Auth.js stack — I already built password-gated, OAuth-backed surfaces for the members funnel — not the field's Obsidian-plugin dashboard, whose embedded-terminal plugins have open, unresolved Windows 11 bugs as of May 2026.

CH.04

The seven layers, on Windows

The target architecture, with each layer mapped to a Windows-native tool and, where possible, to something I already run.

flowchart TB
  subgraph CORE["Core — the brain"]
    CCW["Claude Code on Windows"]
  end
  subgraph MEM["Memory"]
    CMD["CLAUDE.md + skills + learnings.md"]
    WIKI["LLM wiki (Obsidian, Karpathy pattern)"]
    GRAPH["graphify code graph"]
  end
  subgraph CONN["Connections — the hands"]
    CLI["CLI-first + direct REST"]
    MCP["MCP: Context7 + Tool Search"]
    TGRAM["Telegram control"]
  end
  subgraph CAD["Cadence — the heartbeat"]
    LOOP["/loop (session)"]
    TASK["Task Scheduler (PC awake)"]
    ROUT["Cloud Routines (PC asleep)"]
  end
  subgraph CTRL["Control — the cockpit"]
    DASH["Next.js dashboard on Vercel + Neon"]
  end
  CCW --> CMD
  CCW --> GRAPH
  CCW --> CLI
  CCW --> MCP
  LOOP --> CCW
  TASK --> CCW
  ROUT --> CCW
  CCW --> DASH
  DASH --> TGRAM

1 — Core runtime (the brain). Claude Code, native on Windows, is the OS. It reads CLAUDE.md, runs skills, calls tools, schedules work, and runs headless behind buttons. The same plain-folder setup is portable — it would run unchanged in Codex or Cursor — but Claude Code is the home base. Per-task model economics is the lever the field calls "personas": I route the work by model tier.

2 — Memory (so it stops forgetting). Three real stores. Semantic: CLAUDE.md plus a self-improving LLM wiki in Obsidian, built on Karpathy's published pattern — a raw/ folder of sources and a wiki/ folder the model owns, navigated by an index.md rather than vector search (no embeddings needed at hundreds-of-pages scale). Procedural: the Agent Skills standard, each skill self-improving via a learnings.md read before every run. Code: graphify, on btc-bot, where querying god nodes and summaries instead of re-reading source cuts tokens hard One shared vault path so Claude Code and the wiki read one universal memory.

3 — Connections (the hands). Cheapest-correct wins: CLI-first and direct REST over MCP where a CLI exists, because piping a CLI through the bash tool is far leaner than loading a fat MCP schema Where a standard connector wins, MCP: Context7 for version-correct library docs (npx ctx7 setup --claude), and Anthropic's Tool Search, now auto-enabled in Claude Code, which defers tool schemas and collapses the system-tool overhead from ~15k tokens to under 1k. Telegram is the two-way control and approval channel on my phone.

4 — Cadence (the heartbeat) — three tiers. /loop for session-scoped work (minutes to ~3 days); Windows Task Scheduler for persistent jobs while the PC is awake; Claude Cloud Routines for work that must run while the machine is off — the morning brief, the 4-hour digest. Routines clone the repo into a 4 vCPU / 16 GB cloud box, run a saved prompt, and push a branch back; the minimum interval is 1 hour, and they draw on the same subscription quota. Every autonomous post still passes a Telegram approval tap.

5 — Control surface (the cockpit). A Next.js dashboard on my existing Vercel + Neon + Auth.js stack — not a bolt-on. Mission Control (the active goal and the me-vs-agent split), live AI spend, the memory and schedule panels, and skill buttons that run Claude Code headless. I already ship OAuth-gated, server-rendered surfaces for the members funnel; the dashboard is the same machinery pointed inward.

6 — Self-improvement (why it compounds). An overnight digest: read the day's session history, find patterns and unused capabilities, emit a morning brief tied to my goals. Combined with a self-updating CLAUDE.md and per-skill learnings.md, the OS gets a little better while I sleep — this is the LEARN stage applied to the system itself, on top of LEARN for the content.

CH.05

What the field research validates (and what I'm adopting)

A 95-post teardown of the field's loudest agentic-OS seller — kept as private competitor research — confirms the spine of this plan and sharpens a few ideas worth adopting verbatim:

Everything around those ideas in the source — the revenue boasts, the listicle volume, the "free, no API keys" tool claims — is marketing scaffolding, not method. The architecture and the cost-engineering are the parts worth taking; the funnel theatre isn't.

CH.06

Tooling decisions (verified) — adopt, defer, skip

This is where the research changes the plan. Every tool below was checked for real Windows-11 support and current status in June 2026, and rated honestly.

Tool	Windows-11 reality (verified)	Decision	Why
Claude Code	Native (Win 10 1809+/11); `winget` or PowerShell install; `-p` headless works; sandboxing needs WSL2	Adopt — core	Already the OS this site runs on
Hermes (Nous Research)	Real, MIT, native Windows desktop app since v0.16.0 (June 2026); only the dashboard's embedded terminal needs WSL2	Adopt — base	The open-source agent is a core layer of the stack (the research / long-running-workhorse tier), adopted now — not deferred. Only the paid-community "Pantheon / dreaming / Mission Control" branding ($59/mo Skool) is skipped; the codebase itself is base.
OpenClaw	Real, MIT, native Windows Hub app — but CVE-2026-32922 (CVSS 9.9) + ~135k internet-exposed instances, 63% unauthenticated	Skip	A control plane with that exposure record is not going on my machine; the capability isn't worth the blast radius
n8n	Runs stable on the 24/7 Windows PC three ways: Docker Engine in WSL2 (`restart: unless-stopped`, truly headless); Docker Desktop (fine on an always-on auto-login box); or native npm (`npm i -g n8n`) run as a service via nssm or Task-Scheduler-at-logon. Only `pm2 startup` is genuinely broken on Windows.	Adopt	It works on Windows — use it for deterministic canvas flows + the 400+ app connectors (webhooks, schedules, OAuth integrations) where a visual graph beats code; Claude Code skills still own the code-side logic.
NVIDIA NIM (free pool)	Hosted API works from Windows; a pool of 5 free models (minimax-m3 / diffusiongemma-26b / kimi-k2.6 / glm-5.1 / mistral-medium-3.5), ~40 RPM per model-key → run one concurrent worker per model for N× throughput; self-host containers are Linux-only	Adopt	Free bulk inference; per-model ~40 RPM is real, so pool the model-keys (one worker each) and keep gemma as the universal fallback. Proven in the mining pipeline (281 posts distilled across the pool). deepseek-v4-pro was dropped — its endpoint times out.
Context7 + Tool Search	Both GA and cross-platform; Tool Search auto-on in Claude Code	Adopt	Live docs + big context savings, near-zero setup
Obsidian	Obsidian + CustomJS/Dataview fine on Windows; the Terminal/Shell-Commands plugins have open Windows bugs	Adopt — base	Obsidian is the base memory / knowledge layer — local-first, plain-markdown, hugely adopted; the self-improving LLM wiki (Karpathy pattern) lives here and every agent reads it for context. Only the buggy embedded-terminal plugin dashboard is skipped — build that on my own stack.
Telegram + Railway	Telegram Bot API is pure HTTPS (cross-platform); Railway Hobby ~$5/mo for a tiny always-on bot	Adopt	Approve/reject via inline buttons is the firewall
Granola (meeting ingest)	Has a native Windows app + REST API since 2025 (not Mac-only)	Optional	Only if meeting capture becomes part of SENSE

The headline corrections: Hermes is real and runs on Windows, but most of what the videos sell under its name is a configuration layer I can reproduce myself — so it's a defer, not a must-buy. OpenClaw is a hard skip on security grounds. And the field's nicest demo — the Obsidian dashboard with an embedded terminal — is exactly the piece that's flaky on Windows, so I route around it onto the Next.js stack I already run.

CH.07

The phased build

Build green, validate before autonomy. The gate after Phase 1 is load-bearing: the machine is earned, not assumed.

flowchart LR
  P0["Phase 0 — OS skeleton"] --> P1["Phase 1 — SENSE + STRATEGIZE (by hand)"]
  P1 --> GATE{"Beats baseline?"}
  GATE -- no --> P1
  GATE -- yes --> P2["Phase 2 — memory that grows"]
  P2 --> P3["Phase 3 — dashboard + model routing"]
  P3 --> P4["Phase 4 — OBSERVE + overnight digest"]
  P4 --> P5["Phase 5 — ACT, gated"]
  P5 --> P6["Phase 6 — LEARN closes the loop"]

Phase 0 — the OS skeleton. Harden what I have: CLAUDE.md + identity files; the skills library; hooks; graphify on btc-bot; the Obsidian LLM-wiki scaffold. Stand it on the Windows PC with Claude Code native, Docker Engine in WSL2, and one Task Scheduler job. Done when: one end-to-end "hello" — a scheduled task pulls data → a skill processes it → a Telegram tap writes a result.
Phase 1 — SENSE + STRATEGIZE, by hand. The content-intelligence skill (honest, follower- normalized base rates, n≥30) + the project→campaign generator. Post manually for two weeks and measure against baseline. Gate: if it doesn't beat baseline, fix the strategy — build no autonomy.
Phase 2 — memory that grows. Wire the LLM wiki + per-skill learnings.md + the shared vault path. Done when: a second STRATEGIZE run visibly drops a losing pattern it learned about itself.
Phase 3 — dashboard + model routing. Stand up the Next.js Mission-Control dashboard on Vercel/Neon; implement the per-task model routing. Done when: the dashboard shows live state and a headless button runs a skill.
Phase 4 — OBSERVE + the overnight digest. The "what happened in 4h?" Telegram digest (SQL counts; the model only narrates) + the morning brief. Done when: a 4-hour digest reconciles exactly with the logs and a morning brief lands tied to my goals.
Phase 5 — ACT, gated. Autonomous reply-drafting to high-signal niche posts, every post through the Telegram approval tap, inside a safe envelope with a hard daily cap. Only after Phase 1 passed.
Phase 6 — LEARN closes the loop. Weekly, the system compares its results to benchmarks and rewrites the memory that feeds STRATEGIZE; the overnight digest rewrites the OS itself.

CH.08

Honest gates, real stakes, and cost

Validate the content strategy by hand before any autonomy (the Phase 1 gate). This is the one rule that, if skipped, wastes everything downstream.
Permission scoping is the real stakes. A credential on the ring means the action can fire regardless of instructions — the field's 150k-inbox cautionary tale, and OpenClaw's 63%-of-instances- unauthenticated record, are the same lesson twice. Gate every outbound post behind the Telegram tap; never automate likes, follows, or DMs.
Cost discipline. Verified Claude API pricing (June 2026): Opus 4.8 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5 per million tokens in/out; cache reads are ~0.1× and writes ~1.25×. The OS leans on prompt caching, the /compact cadence, and the model split.
Security. Verify every package before install (Claude Code itself had two patched CVEs in 2025–26; OpenClaw's record is worse); treat hostile post text as prompt injection; keep secrets out of prompts and in environment variables; the Telegram approval tap is the firewall.
Reliability honesty. The Windows PC is the convenient host, not a guaranteed one. The 24/7, machine-asleep tier belongs in Claude Routines or on a small VPS — and the plan says so up front rather than discovering it in production.

Experimental and in motion. The complete tool-by-tool reference behind every choice here — including everything the n8n-only write-up dropped — is the attached agentic-OS reference, the faithful synthesis of all 17 field builds. The Windows-specific status, costs, and security findings above were verified against primary sources in June 2026.

agentic-osautomationai-systemswindowsgrowth

DISCUSSION

No comments yet — start the conversation.