# The complete agentic operating system I'm building on Windows — full plan and execution

> A Windows-native build plan for a complete agentic operating system that turns my real automation work into content, funnels, and clients. Every tool is verified for Windows 11 as of mid-2026 and wired to what I already run — Claude Code, a knowledge graph, a live members funnel on Neon and Vercel, a 14,099-post corpus — with mermaid architecture, honest cost economics, and a phased build that earns autonomy before it takes it.
>
> https://pravda.systems/notes/windows-agentic-os-build-plan · 2026-06-16

This is the master plan, and it supersedes an earlier write-up that collapsed seventeen detailed
builds into "just run n8n." That was wrong — but the correction needs its own honesty. When you
actually verify the field's "agentic OS" tools, they split cleanly in two: a **real, open-source
core** (Claude Code, Hermes, OpenClaw — all genuinely installable, all now running natively on
Windows) and a **paid-community branding layer** on top of it (the "Pantheon" of personas, the
overnight "dreaming" loop, "Mission Control" dashboards) that is sold through subscription
communities and is *not* part of any official codebase. This plan adopts the real core, replicates
the good ideas from the branding layer with tools I already own, and runs the whole thing on the
machine I actually sit at: **Windows 11**. Every tool below was checked for Windows support and
current status in June 2026. The faithful tool-by-tool reference behind every choice is attached:
[the complete agentic-OS reference](_private/agentic-os-reference.md).

## The loop and the money

One closed loop turns my real automation work into clients. Five stages, each feeding the next:

```mermaid
flowchart LR
  S["1 SENSE"] --> T["2 STRATEGIZE"]
  T --> A["3 ACT"]
  A --> O["4 OBSERVE"]
  O --> L["5 LEARN"]
  L --> S
```

- **SENSE** — read competitors honestly (winners *and* failures) and my own world: the 14,099-post
  automation corpus I already mined, plus the cheap social-intelligence funnel I already built.
- **STRATEGIZE** — turn one real project into an optimized post-series and a funnel that points at
  the service.
- **ACT** — gated auto-posting and replies, inside a safe envelope, every outbound action approved
  from my phone.
- **OBSERVE** — a "what happened in the last 4 hours?" digest, computed from logs, narrated by the
  model — never invented.
- **LEARN** — the system compares its own results to its baselines and rewrites the memory that
  feeds STRATEGIZE.

**Money hierarchy:** clients first; subscribers are the asset; views are fuel; ad-revenue is a
rounding error. Content is build-in-public of real work — one artifact is portfolio, proof, and
distribution at once. The difference from the old plan: this loop is run by a *complete* agentic OS
that remembers, schedules, controls, and improves itself — not a single workflow canvas.

## The machine: a Windows-native host

The field's builds assume a Mac — a MacBook or always-on Mac Mini, Homebrew, `~/.config`, desktop
cron. I run **Windows 11**, so every layer has to translate. The good news from the research: the
core tools are now first-class on Windows. The honest news: the "always-on PC as a 24/7 server"
story has real holes, and I should plan around them rather than pretend they don't exist.

```mermaid
flowchart TB
  subgraph PC["Always-on Windows 11 PC"]
    CC["Claude Code (native, PowerShell)"]
    TS["Task Scheduler (PC awake)"]
    subgraph WSL["WSL2 + Docker Engine + systemd"]
      SVC["n8n / Postgres / services"]
    end
  end
  CC --> SVC
  TS --> CC
  CLOUD["Claude Routines (cloud, PC asleep)"] -.-> GH["GitHub repo"]
  GH -.-> CC
  CC --> TG["Telegram (control from phone)"]
  TG --> CC
```

The translation, tool by tool — every row verified against primary docs in June 2026:

| Field default (Mac-centric) | Windows-native equivalent | Verified status |
|---|---|---|
| Claude Code on macOS/Linux | **Claude Code native on Windows 11** (`irm https://claude.ai/install.ps1 \| iex`, or `winget install Anthropic.ClaudeCode`) | Native, no WSL2 required. `claude -p` headless works. Sandboxing is the one gap — it needs WSL2 |
| Homebrew | **winget** | `winget install -e --id OpenJS.NodeJS.LTS` · `Docker.DockerDesktop` · `Obsidian.Obsidian` · `Python.Python.3.13` · `Git.Git` |
| Desktop cron | **Windows Task Scheduler** | Native; "run whether user is logged on or not" + missed-run catch-up. Env vars need a wrapper script; a named account (not SYSTEM) for user-installed CLIs |
| Always-on Mac Mini | **Always-on Windows 11 PC** | Works, with caveats (below) |
| n8n via Homebrew/pm2 | **Docker Engine inside WSL2 with systemd** | `pm2 startup` is broken on Windows; Docker Engine-in-WSL2 (`systemctl enable --now docker`) is the dependable always-on path |
| bash / zsh | **PowerShell** (+ Git Bash when installed) | Native PowerShell tool in Claude Code; Git Bash unlocks the Bash tool |

**The honest gate on the host.** Docker *Desktop* on Windows does **not** run headless — it needs an
interactive login to start containers (Docker's own roadmap issue for this is still open). WSL2 is
updated through the Microsoft Store, and that update **kills running instances without warning**.
Windows forced restarts can still fire outside an 18-hour Active-Hours window. A home PC realistically
lands around 95–98% uptime, and running it 24/7 adds real electricity cost. So the plan is layered:
the **PC-awake** tier runs interactive and scheduled work (Task Scheduler + Docker Engine in WSL2),
and the truly **always-on, machine-asleep** tier runs in the cloud (Claude Routines), with a cheap
Linux VPS (Hetzner from ~€3.49/mo) as the honest fallback if I ever need a service that genuinely
cannot blink. I am not going to pretend a desktop is a datacenter.

## What I already have vs. what I need

The reason this is buildable in weeks, not months: most of the OS already exists in this repo and
the systems around it. The job is mostly *wiring*, not *inventing*.

| OS layer | My actual asset | Status |
|---|---|---|
| Core runtime | Claude Code + `.claude/skills` + pre-commit hooks + `CLAUDE.md` discipline gates | **Have** — this site runs on it |
| Code memory | **graphify** (298 nodes / 597 edges / 15 communities on this repo; the real payoff is on btc-bot, 728k lines) | **Have** — `graphify-out/` is committed |
| Semantic memory | `CLAUDE.md` + `docs/STATE.md` spine; a self-improving wiki is the gap | **Partial** — add the LLM-wiki pattern |
| Control surface | Next.js 16 + Vercel + Neon + Auth.js v5 — the live `/notes` members funnel and `/dashboard` cabinet | **Have** — the dashboard foundation already ships |
| SENSE data | The 14,099-post / 7,526-account corpus + the Apify discovery→enrich→score funnel | **Have** — base rates already de-overfitted |
| Cheap inference | A **pool of free NVIDIA NIM models** — `minimaxai/minimax-m3`, `google/diffusiongemma-26b-a4b-it`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `mistralai/mistral-medium-3.5-128b` (separate free key per model, one concurrent worker each) | **Have** — free-tier per key; pooled for throughput + multi-model coverage (see Tooling) |
| Channels | Telegram bot as the two-way approval surface | **Build** — small, well-understood |
| Cadence | `/loop` (have) + Task Scheduler (build) + Cloud Routines (build) | **Partial** |
| Self-improvement | Self-updating `CLAUDE.md`, per-skill `learnings.md`, an overnight digest | **Build** — the compounding layer |

Two honest notes on my own assets. First, **graphify earns its keep on btc-bot, not on this site** —
the site's code is small enough that graphify itself prints "you may not need a graph." I'll point it
at the 728k-line trading system, where querying summaries instead of re-reading source is the real
token win. Second, the control dashboard should be a **Next.js page on my existing Vercel + Neon +
Auth.js stack** — I already built password-gated, OAuth-backed surfaces for the members funnel — not
the field's Obsidian-plugin dashboard, whose embedded-terminal plugins have open, unresolved
Windows 11 bugs as of May 2026.

## The seven layers, on Windows

The target architecture, with each layer mapped to a Windows-native tool and, where possible, to
something I already run.

```mermaid
flowchart TB
  subgraph CORE["Core — the brain"]
    CCW["Claude Code on Windows"]
  end
  subgraph MEM["Memory"]
    CMD["CLAUDE.md + skills + learnings.md"]
    WIKI["LLM wiki (Obsidian, Karpathy pattern)"]
    GRAPH["graphify code graph"]
  end
  subgraph CONN["Connections — the hands"]
    CLI["CLI-first + direct REST"]
    MCP["MCP: Context7 + Tool Search"]
    TGRAM["Telegram control"]
  end
  subgraph CAD["Cadence — the heartbeat"]
    LOOP["/loop (session)"]
    TASK["Task Scheduler (PC awake)"]
    ROUT["Cloud Routines (PC asleep)"]
  end
  subgraph CTRL["Control — the cockpit"]
    DASH["Next.js dashboard on Vercel + Neon"]
  end
  CCW --> CMD
  CCW --> GRAPH
  CCW --> CLI
  CCW --> MCP
  LOOP --> CCW
  TASK --> CCW
  ROUT --> CCW
  CCW --> DASH
  DASH --> TGRAM
```

**1 — Core runtime (the brain).** Claude Code, native on Windows, is the OS. It reads `CLAUDE.md`,
runs skills, calls tools, schedules work, and runs headless behind buttons. The same plain-folder
setup is portable — it would run unchanged in Codex or Cursor — but Claude Code is the home base.
Per-task model economics is the lever the field calls "personas": I route the work by model tier.
*(reserved for members — sign in free at pravda.systems)*

**2 — Memory (so it stops forgetting).** Three real stores. *Semantic:* `CLAUDE.md` plus a
self-improving **LLM wiki** in Obsidian, built on Karpathy's published pattern — a `raw/` folder of
sources and a `wiki/` folder the model owns, navigated by an `index.md` rather than vector search
(no embeddings needed at hundreds-of-pages scale). *Procedural:* the Agent Skills standard, each
skill self-improving via a `learnings.md` read before every run. *Code:* graphify, on btc-bot, where
querying god nodes and summaries instead of re-reading source cuts tokens hard
*(reserved for members — sign in free at pravda.systems)*
One shared vault path so Claude Code and the wiki read one universal memory.

**3 — Connections (the hands).** Cheapest-correct wins: **CLI-first and direct REST** over MCP where
a CLI exists, because piping a CLI through the bash tool is far leaner than loading a fat MCP schema
*(reserved for members — sign in free at pravda.systems)*
Where a standard connector wins, MCP: **Context7** for version-correct library docs
(`npx ctx7 setup --claude`), and Anthropic's **Tool Search**, now auto-enabled in Claude Code, which
defers tool schemas and collapses the system-tool overhead from ~15k tokens to under 1k. **Telegram**
is the two-way control and approval channel on my phone.

**4 — Cadence (the heartbeat) — three tiers.** `/loop` for session-scoped work (minutes to ~3 days);
**Windows Task Scheduler** for persistent jobs while the PC is awake; **Claude Cloud Routines** for
work that must run while the machine is *off* — the morning brief, the 4-hour digest. Routines clone
the repo into a 4 vCPU / 16 GB cloud box, run a saved prompt, and push a branch back; the minimum
interval is 1 hour, and they draw on the same subscription quota. Every autonomous post still passes
a Telegram approval tap.

**5 — Control surface (the cockpit).** A **Next.js dashboard on my existing Vercel + Neon + Auth.js
stack** — not a bolt-on. Mission Control (the active goal and the me-vs-agent split), live AI spend,
the memory and schedule panels, and skill buttons that run Claude Code headless. I already ship
OAuth-gated, server-rendered surfaces for the members funnel; the dashboard is the same machinery
pointed inward.

**6 — Self-improvement (why it compounds).** An overnight **digest**: read the day's session history,
find patterns and unused capabilities, emit a morning brief tied to my goals. Combined with a
self-updating `CLAUDE.md` and per-skill `learnings.md`, the OS gets a little better while I sleep —
this is the LEARN stage applied to the *system itself*, on top of LEARN for the content.

## What the field research validates (and what I'm adopting)

A 95-post teardown of the field's loudest agentic-OS seller — kept as private competitor research —
confirms the spine of this plan and sharpens a few ideas worth adopting verbatim:

- **The memory / "Self" layer is the differentiator, not the agents.** The single most-repeated,
  most-defensible point in the whole field: *context is the biggest driver of output quality — without a
  persistent, business-specific memory, agents produce generic work; with it, output is in your voice and
  facts.* That is exactly Layer 2 above — the lesson is to treat the Obsidian LLM-wiki as the load-bearing
  layer, not an afterthought, and to make every agent read it on every run. Worth adding alongside the
  semantic/procedural/code stores: an **episodic** memory — an append-only `memory.md` of decisions,
  Git-backed nightly, refactored quarterly.
- **"Own the harness, rent the model."** *Harnesses last years; models cycle every six months.* Build on
  open, swappable agent shells (Claude Code, Hermes) and treat the model as a hot-swappable part — which is
  precisely why the model layer here is the **free NVIDIA pool + Opus only at the top**, re-rankable as
  ratings shift. The routing rule generalizes cleanly: a free "good-enough" tier carries the ~80% of
  volume work, a frontier model touches only the ~20% that needs it (here: the NVIDIA pool for bulk, Opus
  for synthesis — the same discipline the `/mine-corpus` pipeline encodes).
- **A 4-layer mental model** — Intelligence (brain) / Execution (hands) / Research (long-running workhorse)
  / Self (memory) — maps onto the layers above and is a clean way to decide where a job belongs: Claude
  Code = Intelligence, Hermes = Research/workhorse, the CLI/MCP connectors = Execution, Obsidian = Self.
- **Reliability over features when picking agent tools.** Demos are cherry-picked; real use is messier. A
  blunt, keepable heuristic: if a tool needs debugging on >20% of tasks, switch — favour the boring,
  reliable option for bulk work and reserve the flashy one for the narrow case it's genuinely best at.
- **Local-first is a positioning lever, not only a privacy choice.** "Data never leaves the machine" is a
  sellable premium to regulated buyers (legal / medical / finance) — a pricing angle for the offer, not
  just an architecture note.

Everything *around* those ideas in the source — the revenue boasts, the listicle volume, the "free, no
API keys" tool claims — is marketing scaffolding, not method. The architecture and the cost-engineering
are the parts worth taking; the funnel theatre isn't.

## Tooling decisions (verified) — adopt, defer, skip

This is where the research changes the plan. Every tool below was checked for real Windows-11 support
and current status in June 2026, and rated honestly.

| Tool | Windows-11 reality (verified) | Decision | Why |
|---|---|---|---|
| **Claude Code** | Native (Win 10 1809+/11); `winget` or PowerShell install; `-p` headless works; sandboxing needs WSL2 | **Adopt — core** | Already the OS this site runs on |
| **Hermes (Nous Research)** | Real, MIT, native Windows desktop app since v0.16.0 (June 2026); only the dashboard's embedded terminal needs WSL2 | **Adopt — base** | The open-source agent is a **core layer of the stack** (the research / long-running-workhorse tier), adopted now — not deferred. Only the paid-community "Pantheon / dreaming / Mission Control" *branding* ($59/mo Skool) is skipped; the codebase itself is base. |
| **OpenClaw** | Real, MIT, native Windows Hub app — **but** CVE-2026-32922 (CVSS 9.9) + ~135k internet-exposed instances, 63% unauthenticated | **Skip** | A control plane with that exposure record is not going on my machine; the capability isn't worth the blast radius |
| **n8n** | Runs stable on the 24/7 Windows PC three ways: **Docker Engine in WSL2** (`restart: unless-stopped`, truly headless); **Docker Desktop** (fine on an always-on auto-login box); or **native npm** (`npm i -g n8n`) run as a service via **nssm** or Task-Scheduler-at-logon. Only `pm2 startup` is genuinely broken on Windows. | **Adopt** | It works on Windows — use it for deterministic canvas flows + the 400+ app connectors (webhooks, schedules, OAuth integrations) where a visual graph beats code; Claude Code skills still own the code-side logic. |
| **NVIDIA NIM (free pool)** | Hosted API works from Windows; a **pool of 5 free models** (minimax-m3 / diffusiongemma-26b / kimi-k2.6 / glm-5.1 / mistral-medium-3.5), ~40 RPM **per model-key** → run one concurrent worker per model for N× throughput; self-host containers are Linux-only | **Adopt** | Free bulk inference; per-model ~40 RPM is real, so pool the model-keys (one worker each) and keep gemma as the universal fallback. Proven in the mining pipeline (281 posts distilled across the pool). deepseek-v4-pro was dropped — its endpoint times out. |
| **Context7 + Tool Search** | Both GA and cross-platform; Tool Search auto-on in Claude Code | **Adopt** | Live docs + big context savings, near-zero setup |
| **Obsidian** | Obsidian + CustomJS/Dataview fine on Windows; the Terminal/Shell-Commands plugins have open Windows bugs | **Adopt — base** | Obsidian is the **base memory / knowledge layer** — local-first, plain-markdown, hugely adopted; the self-improving LLM wiki (Karpathy pattern) lives here and every agent reads it for context. Only the buggy embedded-terminal plugin *dashboard* is skipped — build that on my own stack. |
| **Telegram + Railway** | Telegram Bot API is pure HTTPS (cross-platform); Railway Hobby ~$5/mo for a tiny always-on bot | **Adopt** | Approve/reject via inline buttons is the firewall |
| **Granola (meeting ingest)** | Has a native Windows app + REST API since 2025 (not Mac-only) | **Optional** | Only if meeting capture becomes part of SENSE |

The headline corrections: **Hermes is real and runs on Windows**, but most of what the videos sell
under its name is a configuration layer I can reproduce myself — so it's a *defer*, not a *must-buy*.
**OpenClaw is a hard skip on security grounds.** And the field's nicest demo — the Obsidian dashboard
with an embedded terminal — is exactly the piece that's flaky on Windows, so I route around it onto
the Next.js stack I already run.

## The phased build

Build green, validate before autonomy. The gate after Phase 1 is load-bearing: the machine is earned,
not assumed.

```mermaid
flowchart LR
  P0["Phase 0 — OS skeleton"] --> P1["Phase 1 — SENSE + STRATEGIZE (by hand)"]
  P1 --> GATE{"Beats baseline?"}
  GATE -- no --> P1
  GATE -- yes --> P2["Phase 2 — memory that grows"]
  P2 --> P3["Phase 3 — dashboard + model routing"]
  P3 --> P4["Phase 4 — OBSERVE + overnight digest"]
  P4 --> P5["Phase 5 — ACT, gated"]
  P5 --> P6["Phase 6 — LEARN closes the loop"]
```

- **Phase 0 — the OS skeleton.** Harden what I have: `CLAUDE.md` + identity files; the skills library;
  hooks; graphify on btc-bot; the Obsidian LLM-wiki scaffold. Stand it on the Windows PC with Claude
  Code native, Docker Engine in WSL2, and one Task Scheduler job. *Done when:* one end-to-end "hello"
  — a scheduled task pulls data → a skill processes it → a Telegram tap writes a result.
- **Phase 1 — SENSE + STRATEGIZE, by hand.** The content-intelligence skill (honest, follower-
  normalized base rates, n≥30) + the project→campaign generator. **Post manually for two weeks and
  measure against baseline.** *Gate:* if it doesn't beat baseline, fix the strategy — build no
  autonomy.
- **Phase 2 — memory that grows.** Wire the LLM wiki + per-skill `learnings.md` + the shared vault
  path. *Done when:* a second STRATEGIZE run visibly drops a losing pattern it learned about itself.
- **Phase 3 — dashboard + model routing.** Stand up the Next.js Mission-Control dashboard on
  Vercel/Neon; implement the per-task model routing. *Done when:* the dashboard shows live state and a
  headless button runs a skill.
- **Phase 4 — OBSERVE + the overnight digest.** The "what happened in 4h?" Telegram digest (SQL
  counts; the model only narrates) + the morning brief. *Done when:* a 4-hour digest reconciles
  exactly with the logs and a morning brief lands tied to my goals.
- **Phase 5 — ACT, gated.** Autonomous reply-drafting to high-signal niche posts, every post through
  the Telegram approval tap, inside a safe envelope with a hard daily cap. *Only after Phase 1 passed.*
- **Phase 6 — LEARN closes the loop.** Weekly, the system compares its results to benchmarks and
  rewrites the memory that feeds STRATEGIZE; the overnight digest rewrites the OS itself.

## Honest gates, real stakes, and cost

- **Validate the content strategy by hand before any autonomy** (the Phase 1 gate). This is the one
  rule that, if skipped, wastes everything downstream.
- **Permission scoping is the real stakes.** A credential on the ring means the action *can* fire
  regardless of instructions — the field's 150k-inbox cautionary tale, and OpenClaw's 63%-of-instances-
  unauthenticated record, are the same lesson twice. Gate every outbound post behind the Telegram tap;
  never automate likes, follows, or DMs.
- **Cost discipline.** Verified Claude API pricing (June 2026): Opus 4.8 $5/$25, Sonnet 4.6 $3/$15,
  Haiku 4.5 $1/$5 per million tokens in/out; cache reads are ~0.1× and writes ~1.25×. The OS leans on
  prompt caching, the `/compact` cadence, and the model split.
  *(reserved for members — sign in free at pravda.systems)*
- **Security.** Verify every package before install (Claude Code itself had two patched CVEs in
  2025–26; OpenClaw's record is worse); treat hostile post text as prompt injection; keep secrets out
  of prompts and in environment variables; the Telegram approval tap is the firewall.
- **Reliability honesty.** The Windows PC is the convenient host, not a guaranteed one. The 24/7,
  machine-asleep tier belongs in Claude Routines or on a small VPS — and the plan says so up front
  rather than discovering it in production.

*Experimental and in motion. The complete tool-by-tool reference behind every choice here — including
everything the n8n-only write-up dropped — is the attached
[agentic-OS reference](_private/agentic-os-reference.md), the faithful synthesis of all 17 field
builds. The Windows-specific status, costs, and security findings above were verified against primary
sources in June 2026.*