# The complete agentic operating system I'm building on Windows: full plan and execution

> A solo automation engineer's full plan for an agentic operating system on Windows 11: a five-stage SENSE-to-LEARN content-growth loop built on Claude Code, Obsidian memory, a free NVIDIA model pool, Telegram approvals, and three scheduler tiers, with every tool verified and every honest caveat named.
>
> https://pravda.systems/blog/windows-agentic-os-build-plan · 2026-06-18

It's tempting to boil the field's "agentic OS" down to three words: *just run n8n*. Seventeen detailed builds compressed into a single workflow canvas. Tidy. Also wrong. The honest version is messier, because the mistake isn't laziness, it's trusting the demos. When you stop watching the highlight reels and actually install the field's "agentic OS" tools, they split cleanly in two: a **real, open-source core** ([Claude Code](https://claude.com/claude-code), Hermes, OpenClaw, all genuinely installable, all now running natively on Windows) and a **paid-community branding layer** bolted on top (the "Pantheon" of personas, the overnight "dreaming" loop, the "Mission Control" dashboards), sold through subscription communities and absent from any official codebase. This plan takes the real core, rebuilds the good ideas from the branding layer with tools I already own, and runs the whole thing on the machine I actually sit at: **Windows 11**. Every tool below was checked for Windows support and current status in June 2026. The faithful tool-by-tool reference behind every choice is attached: [the complete agentic-OS reference](_private/agentic-os-reference.md).

## What does the loop actually do, and where's the money?

**One closed loop turns real automation work into clients, five stages, each feeding the next. The money order never changes: clients first, everything else downstream.**

```mermaid
flowchart LR
  S["1 SENSE"] --> T["2 STRATEGIZE"]
  T --> A["3 ACT"]
  A --> O["4 OBSERVE"]
  O --> L["5 LEARN"]
  L --> S
```

| Stage | What it does |
|---|---|
| **1 · SENSE** | Read competitors honestly, winners *and* failures, plus my own world: the 14,099-post automation corpus I already mined, and the cheap social-intelligence funnel I already built. |
| **2 · STRATEGIZE** | Turn one real project into an optimized post-series and a funnel that points at the service. |
| **3 · ACT** | Gated auto-posting and replies inside a safe envelope, every outbound action approved from my phone. |
| **4 · OBSERVE** | A "what happened in the last 4 hours?" digest, computed from logs and narrated by the model, never invented. |
| **5 · LEARN** | Compare results to baselines, then rewrite the memory that feeds STRATEGIZE. |

The order of the money matters more than any single stage:

> **Clients first. Subscribers are the asset. Views are fuel. Ad-revenue is a rounding error.**

Content is build-in-public of real work, so a single artifact is portfolio, proof, and distribution at once. The difference isn't a fancier diagram. **It's that the loop is run by a *complete* agentic OS that remembers, schedules, controls, and improves itself, not a single workflow canvas.**

## Can this really run on Windows, or is it a Mac-only fantasy?

**The field assumes a Mac. I run Windows 11, and the verified answer is that the core tools are now first-class here, but the "24/7 server" story has real holes I plan around instead of pretending away.**

Every field build quietly assumes a MacBook or an always-on Mac Mini: [Homebrew](https://brew.sh), `~/.config`, desktop cron. On Windows 11, each layer has to translate. The good news is that the core tools are now genuinely native. The honest news is that running a home PC as a always-on server has gaps, and I'd rather design around them than discover them in production.

```mermaid
flowchart TB
  subgraph PC["Always-on Windows 11 PC"]
    CC["Claude Code (native, PowerShell)"]
    TS["Task Scheduler (PC awake)"]
    subgraph WSL["WSL2 + Docker Engine + systemd"]
      SVC["n8n / Postgres / services"]
    end
  end
  CC --> SVC
  TS --> CC
  CLOUD["Claude Routines (cloud, PC asleep)"] -.-> GH["GitHub repo"]
  GH -.-> CC
  CC --> TG["Telegram (control from phone)"]
  TG --> CC
```

The translation, tool by tool, every row verified against primary docs in June 2026:

| Field default (Mac-centric) | Windows-native equivalent | Verified status |
|---|---|---|
| Claude Code on macOS/Linux | **Claude Code native on Windows 11** (`irm https://claude.ai/install.ps1 \| iex`, or `winget install Anthropic.ClaudeCode`) | Native, no WSL2 required. `claude -p` headless works. Sandboxing is the one gap, it needs WSL2 |
| Homebrew | **winget** | `winget install -e --id OpenJS.NodeJS.LTS` · `Docker.DockerDesktop` · `Obsidian.Obsidian` · `Python.Python.3.13` · `Git.Git` |
| Desktop cron | **Windows Task Scheduler** | Native. "run whether user is logged on or not" + missed-run catch-up. Env vars need a wrapper script, a named account (not SYSTEM) for user-installed CLIs |
| Always-on Mac Mini | **Always-on Windows 11 PC** | Works, with caveats (below) |
| n8n via Homebrew/pm2 | **Docker Engine inside WSL2 with systemd** | `pm2 startup` is broken on Windows. Docker Engine-in-WSL2 (`systemctl enable --now docker`) is the dependable always-on path |
| bash / zsh | **PowerShell** (+ Git Bash when installed) | Native PowerShell tool in Claude Code. Git Bash unlocks the Bash tool |

Here's the part the demos skip. [Docker](https://www.docker.com) *Desktop* on Windows does **not** run headless. It needs an interactive login to start containers, and Docker's own roadmap issue for this is still open. WSL2 updates through the Microsoft Store, and that update **kills running instances without warning**. Windows forced restarts can still fire outside an 18-hour Active-Hours window. A home PC realistically lands around **95–98% uptime**, and running it 24/7 burns real electricity. So the plan is layered: the **PC-awake** tier runs interactive and scheduled work (Task Scheduler + Docker Engine in WSL2), and the truly **always-on, machine-asleep** tier runs in the cloud (Claude Routines), with a cheap [Hetzner](https://www.hetzner.com) Linux VPS (from ~€3.49/mo) as the honest fallback if I ever need a service that genuinely cannot blink.

> I am not going to pretend a desktop is a datacenter.

## What do I already have, and what do I still need to build?

**Most of this OS already exists in my repo and the systems around it. The job is mostly wiring, not inventing, which is why it's a matter of weeks, not months.**

| OS layer | My actual asset | Status |
|---|---|---|
| Core runtime | Claude Code + `.claude/skills` + pre-commit hooks + `CLAUDE.md` discipline gates | **Have**: this site runs on it |
| Code memory | **graphify** (298 nodes / 597 edges / 15 communities on this repo. The real payoff is on btc-bot, 728k lines) | **Have**: `graphify-out/` is committed |
| Semantic memory | `CLAUDE.md` + `docs/STATE.md` spine. A self-improving wiki is the gap | **Partial**: add the LLM-wiki pattern |
| Control surface | [Next.js](https://nextjs.org) 16 + [Vercel](https://vercel.com) + [Neon](https://neon.tech) + [Auth.js](https://authjs.dev) v5, the live `/notes` members funnel and `/dashboard` cabinet | **Have**: the dashboard foundation already ships |
| SENSE data | The 14,099-post / 7,526-account corpus + the Apify discovery→enrich→score funnel | **Have**: base rates already de-overfitted |
| Cheap inference | A **pool of free NVIDIA NIM models**: `minimaxai/minimax-m3`, `google/diffusiongemma-26b-a4b-it`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `mistralai/mistral-medium-3.5-128b` (separate free key per model, one concurrent worker each) | **Have**: free-tier per key, pooled for throughput + multi-model coverage |
| Channels | Telegram bot as the two-way approval surface | **Build**: small, well-understood |
| Cadence | `/loop` (have) + Task Scheduler (build) + Cloud Routines (build) | **Partial** |
| Self-improvement | Self-updating `CLAUDE.md`, per-skill `learnings.md`, an overnight digest | **Build**: the compounding layer |

Two honest notes on my own assets. First, **graphify earns its keep on btc-bot, not on this site**. The site's code is small enough that graphify itself prints "you may not need a graph." I'll point it at the 728k-line trading system, where querying summaries instead of re-reading source is the real token win. Second, the control dashboard should be a **Next.js page on my existing Vercel + Neon + Auth.js stack**, I already built password-gated, OAuth-backed surfaces for the members funnel, not the field's [Obsidian](https://obsidian.md)-plugin dashboard, whose embedded-terminal plugins have open, unresolved Windows 11 bugs as of May 2026.

## What are the layers, and how do they map to Windows?

**Six layers, each mapped to a Windows-native tool and, wherever possible, to something I already run.**

```mermaid
flowchart TB
  subgraph CORE["Core — the brain"]
    CCW["Claude Code on Windows"]
  end
  subgraph MEM["Memory"]
    CMD["CLAUDE.md + skills + learnings.md"]
    WIKI["LLM wiki (Obsidian, Karpathy pattern)"]
    GRAPH["graphify code graph"]
  end
  subgraph CONN["Connections — the hands"]
    CLI["CLI-first + direct REST"]
    MCP["MCP: Context7 + Tool Search"]
    TGRAM["Telegram control"]
  end
  subgraph CAD["Cadence — the heartbeat"]
    LOOP["/loop (session)"]
    TASK["Task Scheduler (PC awake)"]
    ROUT["Cloud Routines (PC asleep)"]
  end
  subgraph CTRL["Control — the cockpit"]
    DASH["Next.js dashboard on Vercel + Neon"]
  end
  CCW --> CMD
  CCW --> GRAPH
  CCW --> CLI
  CCW --> MCP
  LOOP --> CCW
  TASK --> CCW
  ROUT --> CCW
  CCW --> DASH
  DASH --> TGRAM
```

**1: Core runtime (the brain).** Claude Code, native on Windows, *is* the OS. It reads `CLAUDE.md`, runs skills, calls tools, schedules work, and runs headless behind buttons. The same plain-folder setup is portable. It would run unchanged in Codex or [Cursor](https://cursor.com), but Claude Code is home base. Per-task model economics is the lever the field dresses up as "personas": I just route the work by model tier. *(reserved for members — sign in free at pravda.systems)*

**2: Memory (so it stops forgetting).** Three real stores. *Semantic:* `CLAUDE.md` plus a self-improving **LLM wiki** in Obsidian, built on the pattern [Andrej Karpathy](https://x.com/karpathy) published, a `raw/` folder of sources and a `wiki/` folder the model owns, navigated by an `index.md` rather than vector search (no embeddings needed at hundreds-of-pages scale). *Procedural:* the Agent Skills standard, each skill self-improving via a `learnings.md` read before every run. *Code:* graphify, on btc-bot, where querying god nodes and summaries instead of re-reading source cuts tokens hard *(reserved for members — sign in free at pravda.systems)* One shared vault path so Claude Code and the wiki read one universal memory.

**3: Connections (the hands).** Cheapest-correct wins: **CLI-first and direct REST** over MCP wherever a CLI exists, because piping a CLI through the bash tool is far leaner than loading a fat MCP schema *(reserved for members — sign in free at pravda.systems)* Where a standard connector wins, MCP earns it: **Context7** for version-correct library docs (`npx ctx7 setup --claude`), and Anthropic's **Tool Search**, now auto-enabled in Claude Code, which defers tool schemas and collapses system-tool overhead from ~15k tokens to under 1k. [Telegram](https://telegram.org) is the two-way control and approval channel on my phone.

**4: Cadence (the heartbeat), three tiers.** `/loop` for session-scoped work (minutes to ~3 days), **Windows Task Scheduler** for persistent jobs while the PC is awake, **Claude Cloud Routines** for work that must run while the machine is *off*: the morning brief, the 4-hour digest. Routines clone the repo into a 4 vCPU / 16 GB cloud box, run a saved prompt, and push a branch back. The minimum interval is 1 hour, and they draw on the same subscription quota. Every autonomous post still passes a Telegram approval tap.

**5: Control surface (the cockpit).** A **Next.js dashboard on my existing Vercel + Neon + Auth.js stack**, not a bolt-on. Mission Control (the active goal and the me-vs-agent split), live AI spend, the memory and schedule panels, and skill buttons that run Claude Code headless. I already ship OAuth-gated, server-rendered surfaces for the members funnel. The dashboard is the same machinery pointed inward.

**6: Self-improvement (why it compounds).** An overnight **digest**: read the day's session history, find patterns and unused capabilities, emit a morning brief tied to my goals. Combined with a self-updating `CLAUDE.md` and per-skill `learnings.md`, the OS gets a little better while I sleep. This is the LEARN stage turned on the *system itself*, on top of LEARN for the content.

## What does the field actually get right (and what's just marketing)?

**The field gets one big thing right: the memory layer is the real differentiator, not the agents. Almost everything else loud about it is marketing.**

A 95-post teardown of the field's loudest agentic-OS seller, kept as private competitor research, confirms the spine of this plan and sharpens a few ideas worth taking verbatim.

- **The memory / "Self" layer is the differentiator, not the agents.** The single most-repeated, most-defensible claim in the whole field: *context is the biggest driver of output quality. Without a persistent, business-specific memory, agents produce generic work, but with it, the output is in your voice and facts.* That's exactly Layer 2. The lesson is to treat the Obsidian LLM-wiki as the load-bearing layer, not an afterthought, and to make every agent read it on every run. Worth adding alongside the semantic/procedural/code stores: an **episodic** memory, an append-only `memory.md` of decisions, Git-backed nightly, refactored quarterly.
- **"Own the harness, rent the model."** *Harnesses last years, models cycle every six months.* Build on open, swappable agent shells (Claude Code, Hermes) and treat the model as a hot-swappable part, which is precisely why the model layer here is the **free NVIDIA pool + Opus only at the top**, re-rankable as ratings shift. The routing rule generalizes cleanly: a free "good-enough" tier carries the ~80% of volume work, and a frontier model touches only the ~20% that needs it, the same discipline the `/mine-corpus` pipeline already encodes.
- **A clean 4-layer mental model** for deciding where any job belongs:

| Layer | Role | My tool |
|---|---|---|
| Intelligence | the brain | Claude Code |
| Execution | the hands | CLI / MCP connectors |
| Research | the long-running workhorse | Hermes |
| Self | memory | Obsidian |

- **Reliability over features when picking agent tools.** Demos are cherry-picked, real use is messier. A blunt, keepable heuristic: **if a tool needs debugging on more than 20% of tasks, switch.** Favour the boring, reliable option for bulk work, reserve the flashy one for the narrow case it's genuinely best at.
- **Local-first is a positioning lever, not only a privacy choice.** "Data never leaves the machine" is a sellable premium to regulated buyers, legal, medical, finance, and that's a pricing angle for the offer, not just an architecture note.

Everything *around* those ideas in the source (the revenue boasts, the listicle volume, the "free, no API keys" tool claims) is scaffolding, not method. **The architecture and the cost-engineering are the parts worth taking. The funnel theatre isn't.**

## Which tools make the cut, adopt, defer, or skip?

**Adopt the boring, verified core, skip the one tool with a 9.9-severity CVE, route around the demo that's flaky on Windows.** Every tool below was checked for real Windows-11 support and current status in June 2026, and rated honestly.

| Tool | Windows-11 reality (verified) | Decision | Why |
|---|---|---|---|
| **Claude Code** | Native (Win 10 1809+/11), `winget` or PowerShell install, `-p` headless works, sandboxing needs WSL2 | **Adopt: core** | Already the OS this site runs on |
| **Hermes** ([Nous Research](https://nousresearch.com)) | Real, MIT, native Windows desktop app since v0.16.0 (June 2026), only the dashboard's embedded terminal needs WSL2 | **Adopt: base** | The open-source agent is the research / long-running-workhorse tier, adopted now. Only the paid "Pantheon / dreaming / Mission Control" *branding* ($59/mo Skool) is skipped |
| **OpenClaw** | Real, MIT, native Windows Hub app, **but** CVE-2026-32922 (CVSS 9.9) + ~135k internet-exposed instances, 63% unauthenticated | **Skip** | A control plane with that exposure record is not going on my machine, the capability isn't worth the blast radius |
| **[n8n](https://n8n.io)** | Stable on a 24/7 Windows PC three ways: **Docker Engine in WSL2** (`restart: unless-stopped`, truly headless), **Docker Desktop** (fine on an always-on auto-login box), or **native npm** (`npm i -g n8n`) run as a service via **nssm** or Task-Scheduler-at-logon. Only `pm2 startup` is genuinely broken | **Adopt** | It works on Windows. Use it for deterministic canvas flows + the 400+ app connectors (webhooks, schedules, OAuth) where a visual graph beats code, Claude Code skills still own the code-side logic |
| **[NVIDIA NIM](https://build.nvidia.com)** (free pool) | Hosted API works from Windows, a pool of 5 free models (minimax-m3 / diffusiongemma-26b / kimi-k2.6 / glm-5.1 / mistral-medium-3.5), ~40 RPM **per model-key** → one worker per model for N× throughput, self-host containers are Linux-only | **Adopt** | Free bulk inference, pool the model-keys, keep gemma as the universal fallback. Proven in the mining pipeline (281 posts distilled across the pool). deepseek-v4-pro was dropped, its endpoint times out |
| **Context7 + Tool Search** | Both GA and cross-platform, Tool Search auto-on in Claude Code | **Adopt** | Live docs + big context savings, near-zero setup |
| **Obsidian** | Obsidian + CustomJS/Dataview fine on Windows, the Terminal/Shell-Commands plugins have open Windows bugs | **Adopt: base** | The base memory layer, local-first, plain-markdown, hugely adopted, the self-improving LLM wiki lives here and every agent reads it. Only the buggy embedded-terminal *dashboard* is skipped |
| **Telegram + [Railway](https://railway.app)** | Telegram Bot API is pure HTTPS (cross-platform), Railway Hobby ~$5/mo for a tiny always-on bot | **Adopt** | Approve/reject via inline buttons is the firewall |
| **[Granola](https://www.granola.ai)** (meeting ingest) | Native Windows app + REST API since 2025 (not Mac-only) | **Optional** | Only if meeting capture becomes part of SENSE |

Three headline corrections fall out of that table. **Hermes the codebase is real and runs on Windows**, adopt it as base, but most of what the videos sell *under its name* is a configuration layer I can reproduce myself, so that marketed layer is a defer, not a must-buy. **OpenClaw is a hard skip on security grounds.** And the field's nicest demo (the Obsidian dashboard with an embedded terminal) is exactly the piece that's flaky on Windows, so I route around it onto the Next.js stack I already run.

## How do I build it without fooling myself?

**Build green, validate before autonomy. The gate after Phase 1 is load-bearing. The machine is earned, not assumed.**

```mermaid
flowchart LR
  P0["Phase 0 — OS skeleton"] --> P1["Phase 1 — SENSE + STRATEGIZE (by hand)"]
  P1 --> GATE{"Beats baseline?"}
  GATE -- no --> P1
  GATE -- yes --> P2["Phase 2 — memory that grows"]
  P2 --> P3["Phase 3 — dashboard + model routing"]
  P3 --> P4["Phase 4 — OBSERVE + overnight digest"]
  P4 --> P5["Phase 5 — ACT, gated"]
  P5 --> P6["Phase 6 — LEARN closes the loop"]
```

1. **Phase 0: the OS skeleton.** Harden what I have: `CLAUDE.md` + identity files, the skills library, hooks, graphify on btc-bot, the Obsidian LLM-wiki scaffold. Stand it on the Windows PC with Claude Code native, Docker Engine in WSL2, and one Task Scheduler job. **Done when:** one end-to-end "hello": a scheduled task pulls data → a skill processes it → a Telegram tap writes a result.
2. **Phase 1: SENSE + STRATEGIZE, by hand.** The content-intelligence skill (honest, follower-normalized base rates, n≥30) + the project→campaign generator. **Post manually for two weeks and measure against baseline.** **Gate:** if it doesn't beat baseline, fix the strategy. Build no autonomy.
3. **Phase 2: memory that grows.** Wire the LLM wiki + per-skill `learnings.md` + the shared vault path. **Done when:** a second STRATEGIZE run visibly drops a losing pattern it learned about itself.
4. **Phase 3: dashboard + model routing.** Stand up the Next.js Mission-Control dashboard on Vercel/Neon. Implement the per-task model routing. **Done when:** the dashboard shows live state and a headless button runs a skill.
5. **Phase 4: OBSERVE + the overnight digest.** The "what happened in 4h?" Telegram digest (SQL counts, the model only narrates) + the morning brief. **Done when:** a 4-hour digest reconciles exactly with the logs and a morning brief lands tied to my goals.
6. **Phase 5: ACT, gated.** Autonomous reply-drafting to high-signal niche posts, every post through the Telegram approval tap, inside a safe envelope with a hard daily cap. *Only after Phase 1 passed.*
7. **Phase 6: LEARN closes the loop.** Weekly, the system compares its results to benchmarks and rewrites the memory that feeds STRATEGIZE. The overnight digest rewrites the OS itself.

## What could go wrong, and what does it cost?

**Three things can sink this: skipping the by-hand validation, handing an agent a credential it can fire unsupervised, and letting Opus touch routine work. Each has a named fix.**

**Validate the content strategy by hand before any autonomy** (the Phase 1 gate). This is the one rule that, if skipped, wastes everything downstream. No exceptions for being in a hurry.

**Permission scoping is the real stakes.** A credential on the ring means the action *can* fire regardless of what the prompt says.

> The field's 150k-inbox cautionary tale and OpenClaw's 63%-of-instances-unauthenticated record are the same lesson twice. Gate every outbound post behind the Telegram tap. Never automate likes, follows, or DMs.

**Cost discipline.** Verified Claude API pricing, June 2026 (per million tokens):

| Model | Input | Output |
|---|---|---|
| Opus 4.8 | $5 | $25 |
| Sonnet 4.6 | $3 | $15 |
| Haiku 4.5 | $1 | $5 |

Cache reads run ~0.1× and writes ~1.25×. The OS leans on prompt caching, the `/compact` cadence, and the model split. *(reserved for members — sign in free at pravda.systems)*

**Security.** Verify every package before install, as Claude Code itself had two patched CVEs in 2025–26, and OpenClaw's record is worse. Treat hostile post text as prompt injection. Keep secrets out of prompts and in environment variables. The Telegram approval tap is the firewall, not a formality.

**Reliability honesty.** The Windows PC is the convenient host, not a guaranteed one. The 24/7, machine-asleep tier belongs in Claude Routines or on a small VPS, and the plan says so up front, rather than discovering it in production.

*Experimental and in motion. The complete tool-by-tool reference behind every choice here (including everything the n8n-only write-up dropped) is the attached [agentic-OS reference](_private/agentic-os-reference.md), the faithful synthesis of all 17 field builds. The Windows-specific status, costs, and security findings above were verified against primary sources in June 2026.*