MASTER PLAN — THE SYSTEM — 2026-06-16WORK IN PROGRESS
The complete agentic operating system I'm building on Windows — full plan and execution
A Windows-native build plan for a complete agentic operating system that turns my real automation work into content, funnels, and clients. Every tool is verified for Windows 11 as of mid-2026 and wired to what I already run — Claude Code, a knowledge graph, a live members funnel on Neon and Vercel, a 14,099-post corpus — with mermaid architecture, honest cost economics, and a phased build that earns autonomy before it takes it.
≈ 15 min read

This is the master plan, and it supersedes an earlier write-up that collapsed seventeen detailed builds into "just run n8n." That was wrong — but the correction needs its own honesty. When you actually verify the field's "agentic OS" tools, they split cleanly in two: a real, open-source core (Claude Code, Hermes, OpenClaw — all genuinely installable, all now running natively on Windows) and a paid-community branding layer on top of it (the "Pantheon" of personas, the overnight "dreaming" loop, "Mission Control" dashboards) that is sold through subscription communities and is not part of any official codebase. This plan adopts the real core, replicates the good ideas from the branding layer with tools I already own, and runs the whole thing on the machine I actually sit at: Windows 11. Every tool below was checked for Windows support and current status in June 2026. The faithful tool-by-tool reference behind every choice is attached: the complete agentic-OS reference.
CONTENTS
CH.01
The loop and the money
One closed loop turns my real automation work into clients. Five stages, each feeding the next:
flowchart LR
S["1 SENSE"] --> T["2 STRATEGIZE"]
T --> A["3 ACT"]
A --> O["4 OBSERVE"]
O --> L["5 LEARN"]
L --> S
- SENSE — read competitors honestly (winners and failures) and my own world: the 14,099-post automation corpus I already mined, plus the cheap social-intelligence funnel I already built.
- STRATEGIZE — turn one real project into an optimized post-series and a funnel that points at the service.
- ACT — gated auto-posting and replies, inside a safe envelope, every outbound action approved from my phone.
- OBSERVE — a "what happened in the last 4 hours?" digest, computed from logs, narrated by the model — never invented.
- LEARN — the system compares its own results to its baselines and rewrites the memory that feeds STRATEGIZE.
Money hierarchy: clients first; subscribers are the asset; views are fuel; ad-revenue is a rounding error. Content is build-in-public of real work — one artifact is portfolio, proof, and distribution at once. The difference from the old plan: this loop is run by a complete agentic OS that remembers, schedules, controls, and improves itself — not a single workflow canvas.
CH.02
The machine: a Windows-native host
The field's builds assume a Mac — a MacBook or always-on Mac Mini, Homebrew, ~/.config, desktop
cron. I run Windows 11, so every layer has to translate. The good news from the research: the
core tools are now first-class on Windows. The honest news: the "always-on PC as a 24/7 server"
story has real holes, and I should plan around them rather than pretend they don't exist.
flowchart TB
subgraph PC["Always-on Windows 11 PC"]
CC["Claude Code (native, PowerShell)"]
TS["Task Scheduler (PC awake)"]
subgraph WSL["WSL2 + Docker Engine + systemd"]
SVC["n8n / Postgres / services"]
end
end
CC --> SVC
TS --> CC
CLOUD["Claude Routines (cloud, PC asleep)"] -.-> GH["GitHub repo"]
GH -.-> CC
CC --> TG["Telegram (control from phone)"]
TG --> CC
The translation, tool by tool — every row verified against primary docs in June 2026:
| Field default (Mac-centric) | Windows-native equivalent | Verified status |
|---|---|---|
| Claude Code on macOS/Linux | Claude Code native on Windows 11 (irm https://claude.ai/install.ps1 | iex, or winget install Anthropic.ClaudeCode) |
Native, no WSL2 required. claude -p headless works. Sandboxing is the one gap — it needs WSL2 |
| Homebrew | winget | winget install -e --id OpenJS.NodeJS.LTS · Docker.DockerDesktop · Obsidian.Obsidian · Python.Python.3.13 · Git.Git |
| Desktop cron | Windows Task Scheduler | Native; "run whether user is logged on or not" + missed-run catch-up. Env vars need a wrapper script; a named account (not SYSTEM) for user-installed CLIs |
| Always-on Mac Mini | Always-on Windows 11 PC | Works, with caveats (below) |
| n8n via Homebrew/pm2 | Docker Engine inside WSL2 with systemd | pm2 startup is broken on Windows; Docker Engine-in-WSL2 (systemctl enable --now docker) is the dependable always-on path |
| bash / zsh | PowerShell (+ Git Bash when installed) | Native PowerShell tool in Claude Code; Git Bash unlocks the Bash tool |
The honest gate on the host. Docker Desktop on Windows does not run headless — it needs an interactive login to start containers (Docker's own roadmap issue for this is still open). WSL2 is updated through the Microsoft Store, and that update kills running instances without warning. Windows forced restarts can still fire outside an 18-hour Active-Hours window. A home PC realistically lands around 95–98% uptime, and running it 24/7 adds real electricity cost. So the plan is layered: the PC-awake tier runs interactive and scheduled work (Task Scheduler + Docker Engine in WSL2), and the truly always-on, machine-asleep tier runs in the cloud (Claude Routines), with a cheap Linux VPS (Hetzner from ~€3.49/mo) as the honest fallback if I ever need a service that genuinely cannot blink. I am not going to pretend a desktop is a datacenter.
CH.03
What I already have vs. what I need
The reason this is buildable in weeks, not months: most of the OS already exists in this repo and the systems around it. The job is mostly wiring, not inventing.
| OS layer | My actual asset | Status |
|---|---|---|
| Core runtime | Claude Code + .claude/skills + pre-commit hooks + CLAUDE.md discipline gates |
Have — this site runs on it |
| Code memory | graphify (298 nodes / 597 edges / 15 communities on this repo; the real payoff is on btc-bot, 728k lines) | Have — graphify-out/ is committed |
| Semantic memory | CLAUDE.md + docs/STATE.md spine; a self-improving wiki is the gap |
Partial — add the LLM-wiki pattern |
| Control surface | Next.js 16 + Vercel + Neon + Auth.js v5 — the live /notes members funnel and /dashboard cabinet |
Have — the dashboard foundation already ships |
| SENSE data | The 14,099-post / 7,526-account corpus + the Apify discovery→enrich→score funnel | Have — base rates already de-overfitted |
| Cheap inference | A pool of free NVIDIA NIM models — minimaxai/minimax-m3, google/diffusiongemma-26b-a4b-it, moonshotai/kimi-k2.6, z-ai/glm-5.1, mistralai/mistral-medium-3.5-128b (separate free key per model, one concurrent worker each) |
Have — free-tier per key; pooled for throughput + multi-model coverage (see Tooling) |
| Channels | Telegram bot as the two-way approval surface | Build — small, well-understood |
| Cadence | /loop (have) + Task Scheduler (build) + Cloud Routines (build) |
Partial |
| Self-improvement | Self-updating CLAUDE.md, per-skill learnings.md, an overnight digest |
Build — the compounding layer |
Two honest notes on my own assets. First, graphify earns its keep on btc-bot, not on this site — the site's code is small enough that graphify itself prints "you may not need a graph." I'll point it at the 728k-line trading system, where querying summaries instead of re-reading source is the real token win. Second, the control dashboard should be a Next.js page on my existing Vercel + Neon + Auth.js stack — I already built password-gated, OAuth-backed surfaces for the members funnel — not the field's Obsidian-plugin dashboard, whose embedded-terminal plugins have open, unresolved Windows 11 bugs as of May 2026.
CH.04
The seven layers, on Windows
The target architecture, with each layer mapped to a Windows-native tool and, where possible, to something I already run.
flowchart TB
subgraph CORE["Core — the brain"]
CCW["Claude Code on Windows"]
end
subgraph MEM["Memory"]
CMD["CLAUDE.md + skills + learnings.md"]
WIKI["LLM wiki (Obsidian, Karpathy pattern)"]
GRAPH["graphify code graph"]
end
subgraph CONN["Connections — the hands"]
CLI["CLI-first + direct REST"]
MCP["MCP: Context7 + Tool Search"]
TGRAM["Telegram control"]
end
subgraph CAD["Cadence — the heartbeat"]
LOOP["/loop (session)"]
TASK["Task Scheduler (PC awake)"]
ROUT["Cloud Routines (PC asleep)"]
end
subgraph CTRL["Control — the cockpit"]
DASH["Next.js dashboard on Vercel + Neon"]
end
CCW --> CMD
CCW --> GRAPH
CCW --> CLI
CCW --> MCP
LOOP --> CCW
TASK --> CCW
ROUT --> CCW
CCW --> DASH
DASH --> TGRAM
1 — Core runtime (the brain). Claude Code, native on Windows, is the OS. It reads CLAUDE.md,
runs skills, calls tools, schedules work, and runs headless behind buttons. The same plain-folder
setup is portable — it would run unchanged in Codex or Cursor — but Claude Code is the home base.
Per-task model economics is the lever the field calls "personas": I route the work by model tier.
2 — Memory (so it stops forgetting). Three real stores. Semantic: CLAUDE.md plus a
self-improving LLM wiki in Obsidian, built on Karpathy's published pattern — a raw/ folder of
sources and a wiki/ folder the model owns, navigated by an index.md rather than vector search
(no embeddings needed at hundreds-of-pages scale). Procedural: the Agent Skills standard, each
skill self-improving via a learnings.md read before every run. Code: graphify, on btc-bot, where
querying god nodes and summaries instead of re-reading source cuts tokens hard
One shared vault path so Claude Code and the wiki read one universal memory.
3 — Connections (the hands). Cheapest-correct wins: CLI-first and direct REST over MCP where
a CLI exists, because piping a CLI through the bash tool is far leaner than loading a fat MCP schema
Where a standard connector wins, MCP: Context7 for version-correct library docs
(npx ctx7 setup --claude), and Anthropic's Tool Search, now auto-enabled in Claude Code, which
defers tool schemas and collapses the system-tool overhead from ~15k tokens to under 1k. Telegram
is the two-way control and approval channel on my phone.
4 — Cadence (the heartbeat) — three tiers. /loop for session-scoped work (minutes to ~3 days);
Windows Task Scheduler for persistent jobs while the PC is awake; Claude Cloud Routines for
work that must run while the machine is off — the morning brief, the 4-hour digest. Routines clone
the repo into a 4 vCPU / 16 GB cloud box, run a saved prompt, and push a branch back; the minimum
interval is 1 hour, and they draw on the same subscription quota. Every autonomous post still passes
a Telegram approval tap.
5 — Control surface (the cockpit). A Next.js dashboard on my existing Vercel + Neon + Auth.js stack — not a bolt-on. Mission Control (the active goal and the me-vs-agent split), live AI spend, the memory and schedule panels, and skill buttons that run Claude Code headless. I already ship OAuth-gated, server-rendered surfaces for the members funnel; the dashboard is the same machinery pointed inward.
6 — Self-improvement (why it compounds). An overnight digest: read the day's session history,
find patterns and unused capabilities, emit a morning brief tied to my goals. Combined with a
self-updating CLAUDE.md and per-skill learnings.md, the OS gets a little better while I sleep —
this is the LEARN stage applied to the system itself, on top of LEARN for the content.
CH.05
What the field research validates (and what I'm adopting)
A 95-post teardown of the field's loudest agentic-OS seller — kept as private competitor research — confirms the spine of this plan and sharpens a few ideas worth adopting verbatim:
- The memory / "Self" layer is the differentiator, not the agents. The single most-repeated,
most-defensible point in the whole field: context is the biggest driver of output quality — without a
persistent, business-specific memory, agents produce generic work; with it, output is in your voice and
facts. That is exactly Layer 2 above — the lesson is to treat the Obsidian LLM-wiki as the load-bearing
layer, not an afterthought, and to make every agent read it on every run. Worth adding alongside the
semantic/procedural/code stores: an episodic memory — an append-only
memory.mdof decisions, Git-backed nightly, refactored quarterly. - "Own the harness, rent the model." Harnesses last years; models cycle every six months. Build on
open, swappable agent shells (Claude Code, Hermes) and treat the model as a hot-swappable part — which is
precisely why the model layer here is the free NVIDIA pool + Opus only at the top, re-rankable as
ratings shift. The routing rule generalizes cleanly: a free "good-enough" tier carries the ~80% of
volume work, a frontier model touches only the ~20% that needs it (here: the NVIDIA pool for bulk, Opus
for synthesis — the same discipline the
/mine-corpuspipeline encodes). - A 4-layer mental model — Intelligence (brain) / Execution (hands) / Research (long-running workhorse) / Self (memory) — maps onto the layers above and is a clean way to decide where a job belongs: Claude Code = Intelligence, Hermes = Research/workhorse, the CLI/MCP connectors = Execution, Obsidian = Self.
- Reliability over features when picking agent tools. Demos are cherry-picked; real use is messier. A blunt, keepable heuristic: if a tool needs debugging on >20% of tasks, switch — favour the boring, reliable option for bulk work and reserve the flashy one for the narrow case it's genuinely best at.
- Local-first is a positioning lever, not only a privacy choice. "Data never leaves the machine" is a sellable premium to regulated buyers (legal / medical / finance) — a pricing angle for the offer, not just an architecture note.
Everything around those ideas in the source — the revenue boasts, the listicle volume, the "free, no API keys" tool claims — is marketing scaffolding, not method. The architecture and the cost-engineering are the parts worth taking; the funnel theatre isn't.
CH.06
Tooling decisions (verified) — adopt, defer, skip
This is where the research changes the plan. Every tool below was checked for real Windows-11 support and current status in June 2026, and rated honestly.
| Tool | Windows-11 reality (verified) | Decision | Why |
|---|---|---|---|
| Claude Code | Native (Win 10 1809+/11); winget or PowerShell install; -p headless works; sandboxing needs WSL2 |
Adopt — core | Already the OS this site runs on |
| Hermes (Nous Research) | Real, MIT, native Windows desktop app since v0.16.0 (June 2026); only the dashboard's embedded terminal needs WSL2 | Adopt — base | The open-source agent is a core layer of the stack (the research / long-running-workhorse tier), adopted now — not deferred. Only the paid-community "Pantheon / dreaming / Mission Control" branding ($59/mo Skool) is skipped; the codebase itself is base. |
| OpenClaw | Real, MIT, native Windows Hub app — but CVE-2026-32922 (CVSS 9.9) + ~135k internet-exposed instances, 63% unauthenticated | Skip | A control plane with that exposure record is not going on my machine; the capability isn't worth the blast radius |
| n8n | Runs stable on the 24/7 Windows PC three ways: Docker Engine in WSL2 (restart: unless-stopped, truly headless); Docker Desktop (fine on an always-on auto-login box); or native npm (npm i -g n8n) run as a service via nssm or Task-Scheduler-at-logon. Only pm2 startup is genuinely broken on Windows. |
Adopt | It works on Windows — use it for deterministic canvas flows + the 400+ app connectors (webhooks, schedules, OAuth integrations) where a visual graph beats code; Claude Code skills still own the code-side logic. |
| NVIDIA NIM (free pool) | Hosted API works from Windows; a pool of 5 free models (minimax-m3 / diffusiongemma-26b / kimi-k2.6 / glm-5.1 / mistral-medium-3.5), ~40 RPM per model-key → run one concurrent worker per model for N× throughput; self-host containers are Linux-only | Adopt | Free bulk inference; per-model ~40 RPM is real, so pool the model-keys (one worker each) and keep gemma as the universal fallback. Proven in the mining pipeline (281 posts distilled across the pool). deepseek-v4-pro was dropped — its endpoint times out. |
| Context7 + Tool Search | Both GA and cross-platform; Tool Search auto-on in Claude Code | Adopt | Live docs + big context savings, near-zero setup |
| Obsidian | Obsidian + CustomJS/Dataview fine on Windows; the Terminal/Shell-Commands plugins have open Windows bugs | Adopt — base | Obsidian is the base memory / knowledge layer — local-first, plain-markdown, hugely adopted; the self-improving LLM wiki (Karpathy pattern) lives here and every agent reads it for context. Only the buggy embedded-terminal plugin dashboard is skipped — build that on my own stack. |
| Telegram + Railway | Telegram Bot API is pure HTTPS (cross-platform); Railway Hobby ~$5/mo for a tiny always-on bot | Adopt | Approve/reject via inline buttons is the firewall |
| Granola (meeting ingest) | Has a native Windows app + REST API since 2025 (not Mac-only) | Optional | Only if meeting capture becomes part of SENSE |
The headline corrections: Hermes is real and runs on Windows, but most of what the videos sell under its name is a configuration layer I can reproduce myself — so it's a defer, not a must-buy. OpenClaw is a hard skip on security grounds. And the field's nicest demo — the Obsidian dashboard with an embedded terminal — is exactly the piece that's flaky on Windows, so I route around it onto the Next.js stack I already run.
CH.07
The phased build
Build green, validate before autonomy. The gate after Phase 1 is load-bearing: the machine is earned, not assumed.
flowchart LR
P0["Phase 0 — OS skeleton"] --> P1["Phase 1 — SENSE + STRATEGIZE (by hand)"]
P1 --> GATE{"Beats baseline?"}
GATE -- no --> P1
GATE -- yes --> P2["Phase 2 — memory that grows"]
P2 --> P3["Phase 3 — dashboard + model routing"]
P3 --> P4["Phase 4 — OBSERVE + overnight digest"]
P4 --> P5["Phase 5 — ACT, gated"]
P5 --> P6["Phase 6 — LEARN closes the loop"]
- Phase 0 — the OS skeleton. Harden what I have:
CLAUDE.md+ identity files; the skills library; hooks; graphify on btc-bot; the Obsidian LLM-wiki scaffold. Stand it on the Windows PC with Claude Code native, Docker Engine in WSL2, and one Task Scheduler job. Done when: one end-to-end "hello" — a scheduled task pulls data → a skill processes it → a Telegram tap writes a result. - Phase 1 — SENSE + STRATEGIZE, by hand. The content-intelligence skill (honest, follower- normalized base rates, n≥30) + the project→campaign generator. Post manually for two weeks and measure against baseline. Gate: if it doesn't beat baseline, fix the strategy — build no autonomy.
- Phase 2 — memory that grows. Wire the LLM wiki + per-skill
learnings.md+ the shared vault path. Done when: a second STRATEGIZE run visibly drops a losing pattern it learned about itself. - Phase 3 — dashboard + model routing. Stand up the Next.js Mission-Control dashboard on Vercel/Neon; implement the per-task model routing. Done when: the dashboard shows live state and a headless button runs a skill.
- Phase 4 — OBSERVE + the overnight digest. The "what happened in 4h?" Telegram digest (SQL counts; the model only narrates) + the morning brief. Done when: a 4-hour digest reconciles exactly with the logs and a morning brief lands tied to my goals.
- Phase 5 — ACT, gated. Autonomous reply-drafting to high-signal niche posts, every post through the Telegram approval tap, inside a safe envelope with a hard daily cap. Only after Phase 1 passed.
- Phase 6 — LEARN closes the loop. Weekly, the system compares its results to benchmarks and rewrites the memory that feeds STRATEGIZE; the overnight digest rewrites the OS itself.
CH.08
Honest gates, real stakes, and cost
- Validate the content strategy by hand before any autonomy (the Phase 1 gate). This is the one rule that, if skipped, wastes everything downstream.
- Permission scoping is the real stakes. A credential on the ring means the action can fire regardless of instructions — the field's 150k-inbox cautionary tale, and OpenClaw's 63%-of-instances- unauthenticated record, are the same lesson twice. Gate every outbound post behind the Telegram tap; never automate likes, follows, or DMs.
- Cost discipline. Verified Claude API pricing (June 2026): Opus 4.8 $5/$25, Sonnet 4.6 $3/$15,
Haiku 4.5 $1/$5 per million tokens in/out; cache reads are ~0.1× and writes ~1.25×. The OS leans on
prompt caching, the
/compactcadence, and the model split. - Security. Verify every package before install (Claude Code itself had two patched CVEs in 2025–26; OpenClaw's record is worse); treat hostile post text as prompt injection; keep secrets out of prompts and in environment variables; the Telegram approval tap is the firewall.
- Reliability honesty. The Windows PC is the convenient host, not a guaranteed one. The 24/7, machine-asleep tier belongs in Claude Routines or on a small VPS — and the plan says so up front rather than discovering it in production.
Experimental and in motion. The complete tool-by-tool reference behind every choice here — including everything the n8n-only write-up dropped — is the attached agentic-OS reference, the faithful synthesis of all 17 field builds. The Windows-specific status, costs, and security findings above were verified against primary sources in June 2026.
No comments yet — start the conversation.
Sign in to join the discussion — it's free.