RL Agent Skeleton

A short hub for the agent / RL teaching pages. If you only have ten minutes, this is the route I’d recommend:

  1. Blank RL Agent Template: a minimal Agent class and the “First runnable system” section (a Gymnasium loop). It shows the shape of an agent before any library hides it.

  2. learn() progression: on the same page, the “Evolving learn()” section grows learn() from a memory tally to a tabular Q update to a neural batch sketch. Same hook, three different bodies.

  3. PyTorch DQN Agent Walkthrough: a minimal DQN-style learn() written with PyTorch, shown in full context.
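The three learn() bodies from item 2 can be sketched side by side without any dependencies. This is a minimal sketch under stated assumptions, not the code from those pages: the class and function names, the `(state, action, reward, next_state, done)` transition tuple, and the `alpha`/`gamma` defaults are all illustrative. The third body stops at computing DQN-style targets from a stand-in `q_values` function; in a real DQN-style learn(), a PyTorch network and a gradient step would take over from there.

```python
from collections import Counter, defaultdict

# Assumed transition shape for every learn() below:
# (state, action, reward, next_state, done)


class TallyLearner:
    """Body 1 (hypothetical): learn() only counts what happened."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, state, action, reward, next_state, done):
        self.counts[(state, action)] += 1


class TabularQLearner:
    """Body 2 (hypothetical): one-step tabular Q-learning update."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha = alpha
        self.gamma = gamma

    def learn(self, state, action, reward, next_state, done):
        # Bootstrap from the best next action unless the episode ended.
        future = 0.0 if done else max(self.q[next_state])
        target = reward + self.gamma * future
        # Move the current estimate a fraction alpha toward the target.
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def batch_targets(batch, q_values, gamma=0.9):
    """Body 3 (sketch): DQN-style targets for a sampled minibatch.

    q_values(state) stands in for a target network and returns one value
    per action. A real learn() would regress the online network's
    Q(state, action) toward these targets with a gradient step.
    """
    targets = []
    for state, action, reward, next_state, done in batch:
        future = 0.0 if done else max(q_values(next_state))
        targets.append(reward + gamma * future)
    return targets
```

Only the body changes between the three; the transition each one receives stays the same, which is what keeps the hook stable while the implementation gets more capable.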

The same conceptual categories show up in larger RL systems — environment interaction, policy, memory, exploration, learning. These pages are minimal teaching examples, not a full training stack. The point isn’t to compete on benchmark scores. The point is to make the slots visible before the implementations get clever, so when a serious RL system breaks later, you can still tell which slot broke.
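The five slots can be made explicit in a skeleton like the one below. It is a hypothetical, stdlib-only sketch: `ToyCorridorEnv` is an invented stand-in that only mirrors the return shapes of Gymnasium’s `reset()` (`obs, info`) and `step()` (`obs, reward, terminated, truncated, info`), so the loop runs without installing anything. `learn()` is left empty on purpose, since that is the slot the learn() progression fills.

```python
import random
from collections import deque


class ToyCorridorEnv:
    """Hypothetical stand-in env with Gymnasium-shaped reset()/step() returns.

    Position starts at 0; action 1 moves right, action 0 moves left (floored
    at 0). Reaching `length` terminates with reward 1.0; `max_steps` truncates.
    """

    def __init__(self, length=4, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None, options=None):
        # seed/options are accepted to mirror the signature but unused here.
        self.pos, self.t = 0, 0
        return self.pos, {}

    def step(self, action):
        self.t += 1
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos >= self.length
        truncated = self.t >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


class Agent:
    def __init__(self, n_actions, epsilon=0.1, memory_size=1000, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon                    # exploration slot
        self.memory = deque(maxlen=memory_size)   # memory slot
        self.values = {}                          # what learn() would update
        self.rng = random.Random(seed)

    def policy(self, obs):
        # Policy slot: greedy over stored values (ties go to the lowest action).
        row = self.values.get(obs, [0.0] * self.n_actions)
        return max(range(self.n_actions), key=lambda a: row[a])

    def act(self, obs):
        # Exploration wraps the policy: epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.policy(obs)

    def remember(self, transition):
        self.memory.append(transition)

    def learn(self):
        # Learning slot: intentionally blank in this skeleton.
        pass


def run_episode(env, agent):
    # Environment-interaction slot: observe -> act -> store -> learn, per step.
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent.act(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        agent.remember((obs, action, reward, next_obs, terminated))
        agent.learn()
        total += reward
        obs = next_obs
        done = terminated or truncated
    return total
```

The stored flag is `terminated` rather than `terminated or truncated`, so a time-limit cutoff is not recorded as a true terminal state. The dict-based `values` also assumes hashable observations; CartPole’s array observations would need discretizing (or a network) before this lookup works.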

Who this is for

You’ve run a Gymnasium (CartPole-style) loop before and you’re okay reading Python. You don’t need a production RL framework — these pages deliberately stay small. If you’re only wiring an API-backed “agent,” the shape here still helps: the same slots (what is observed, what is stored, what is updated, what is executed) reappear in tool-using systems, just with different implementations.

Pick a page by goal

Goal: where to start

  - See the smallest Agent + env loop and evolve learn() in place: Blank RL Agent Template

  - See the same ideas with a real learn() body (targets, replay sketch, PyTorch): PyTorch DQN Agent Walkthrough

  - Map this teaching stack to the memory, evaluation, and runtime work on the site: Projects overview -> Obversary-OS, Evaluation Systems

After the skeleton

Once the slots are visible, the debugging question stops being “the network is bad” and becomes “which slot diverged?”: bad observations, stale memory, wrong exploration pressure, or a learning update that doesn’t match the policy you thought you trained. That’s the same observability instinct behind structured failure traces and the wider evaluation lane, just applied to RL-shaped systems.
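One way to make “which slot diverged?” concrete is a per-slot health check run over a recent window of training data. Everything here is a hypothetical sketch: the `diagnose` name, its inputs (memory age measured as learning updates since a transition was collected), and every threshold are assumptions, and each check covers only one obvious symptom per slot (for exploration, only the too-little direction).

```python
import math
from collections import Counter


def diagnose(observations, obs_low, obs_high, memory_ages, max_age,
             actions, td_errors, collapse_share=0.95, blowup_factor=10.0):
    """Return the names of slots whose (hypothetical) health check fails."""
    suspects = []

    # Observation slot: non-finite or out-of-range features.
    for obs in observations:
        if any(not math.isfinite(x) or not lo <= x <= hi
               for x, lo, hi in zip(obs, obs_low, obs_high)):
            suspects.append("observations")
            break

    # Memory slot: most of the buffer was collected by a much older policy.
    if memory_ages and sum(a > max_age for a in memory_ages) > len(memory_ages) / 2:
        suspects.append("memory")

    # Exploration slot: behaviour has collapsed onto a single action.
    if actions:
        top_share = Counter(actions).most_common(1)[0][1] / len(actions)
        if top_share > collapse_share:
            suspects.append("exploration")

    # Learning slot: TD errors went non-finite or grew sharply over the window.
    if td_errors:
        if not all(math.isfinite(e) for e in td_errors):
            suspects.append("learning")
        else:
            half = len(td_errors) // 2
            early = td_errors[:half] or td_errors
            late = td_errors[half:]
            early_mag = sum(abs(e) for e in early) / len(early)
            late_mag = sum(abs(e) for e in late) / len(late)
            if late_mag > blowup_factor * max(early_mag, 1e-8):
                suspects.append("learning")

    return suspects
```

The thresholds would need tuning per environment; the useful part is that the report names a slot rather than “the network.”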

Companion code direction

If you want the runnable code separately, a minimal RL agent repository would be laid out roughly like this:

src/minimal_rl_agent/
scripts/train_cartpole.py
scripts/evaluate_cartpole.py

The website you’re reading is the documentation layer (the public /docs/ site). Runnable RL code, meaning training scripts and a real package layout, belongs in a separate codebase, not inside this website repo. I keep them apart on purpose so the deployed site stays small and the code stays versioned in the repository where it actually runs.