Configuration

config/langgraph-config.yaml is the runtime config, loaded at server boot by graph/config.py::LangGraphConfig.from_yaml(). All fields have defaults, so the YAML only needs to list the values you are overriding.

Template vs. live file. The repo tracks config/langgraph-config.example.yaml (the shipped template, with defaults + comments). The live config/langgraph-config.yaml is untracked — it's per-deployment state, written by the setup wizard / settings drawer. On first run the server copies the template into place (config_io.ensure_live_config), so edits never dirty a tracked file. Secrets are split out further into config/secrets.yaml (see Secrets).

Full example

yaml

model:
  provider: openai
  name: protolabs/reasoning
  api_base: http://gateway:4000/v1
  api_key: ""
  temperature: 0.2
  max_tokens: 32768
  max_iterations: 50

subagents:
  researcher:
    enabled: true
    tools:
      - current_time
      - web_search
      - fetch_url
      - memory_recall
      - memory_list
    max_turns: 40

middleware:
  knowledge: true
  audit: true
  memory: true
  scheduler: true

knowledge:
  db_path: /sandbox/knowledge/agent.db
  embed_model: nomic-embed-text
  top_k: 5

`model`

Key	Default	What
`provider`	`openai`	LangChain LLM provider. The template's `graph/llm.py` only uses `openai` (via LiteLLM gateway).
`name`	`protolabs/reasoning`	Gateway alias or direct model name.
`api_base`	`http://gateway:4000/v1`	OpenAI-compatible endpoint.
`api_key`	`""`	Secret — not stored here. Managed in the untracked `config/secrets.yaml` (see Secrets); falls back to the `OPENAI_API_KEY` env var.
`temperature`	`0.2`	Sampling temperature.
`max_tokens`	`32768`	Per-call output cap. 32k headroom for the Qwen models we run.
`max_iterations`	`50`	Upper bound on tool-call loops per task.
`favorites`	`[]`	Pinned go-to models for the chat `/model` quick-switch — the inline picker offers these, in this order, instead of the gateway's full list. Manage (add/remove/reorder) in Settings ▸ Model ▸ Favorite models. Empty = `/model` shows every gateway model with a hint to pin favorites.
`request_timeout`	`120.0`	Per-call gateway timeout (seconds) — bounds a hung/slow gateway so a turn fails cleanly.
`max_retries`	`2`	Transient-retry cap on the LLM client (→ `llm_max_retries`).
`top_p`	(unset)	Nucleus sampling. Standard OpenAI param; sent only when set.
`presence_penalty`	(unset)	Standard OpenAI param; sent only when set.
`top_k`	`-1`	Top-k sampling. Rides `extra_body` (vLLM-style gateways). `-1`/negative = gateway default.
`repetition_penalty`	(unset)	Rides `extra_body`; sent only when set.
`chat_template_kwargs`	(unset)	Dict passed via `extra_body` to the vLLM renderer, e.g. `{preserve_thinking: true}` to keep historical `<think>`/`<scratch_pad>` blocks across turns.

All sampling params are optional — omit to use the gateway / model-card defaults. temperature, max_tokens, top_p, and presence_penalty are standard OpenAI fields; top_k, repetition_penalty, and chat_template_kwargs are sent via extra_body for vLLM-compatible gateways.
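
The split between standard fields and extra_body can be pictured with a short sketch. This is not the project's graph/llm.py: `build_sampling_kwargs` is a hypothetical helper, and treating a negative `top_k` as "omit the field" is an assumption drawn from the table's "gateway default" wording.

```python
from typing import Any

# Field names come from the table above; the grouping helper is hypothetical.
STANDARD_FIELDS = ("temperature", "max_tokens", "top_p", "presence_penalty")
EXTRA_BODY_FIELDS = ("top_k", "repetition_penalty", "chat_template_kwargs")


def build_sampling_kwargs(model_cfg: dict[str, Any]) -> dict[str, Any]:
    """Return client kwargs, sending each sampling param only when it is set."""
    kwargs: dict[str, Any] = {}
    extra_body: dict[str, Any] = {}
    for key in STANDARD_FIELDS:
        if model_cfg.get(key) is not None:
            kwargs[key] = model_cfg[key]
    for key in EXTRA_BODY_FIELDS:
        value = model_cfg.get(key)
        if value is None:
            continue
        # Assumption: a negative top_k means "gateway default", so it is omitted.
        if key == "top_k" and value < 0:
            continue
        extra_body[key] = value
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs
```

With only `temperature` and `repetition_penalty` set, the result carries `temperature` at the top level and `repetition_penalty` under `extra_body`; nothing else is sent.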

Secrets

Two core fields are secrets and are never written to the tracked config YAML: the model api_key and the A2A auth.token. (Plugins may declare more — e.g. discord.bot_token, google.client_secret — which are routed and stripped the same way via a dynamic secret_paths(); ADR 0019.) The setup wizard and settings drawer persist them to an untracked sibling file, config/secrets.yaml (gitignored, dockerignored, written 0600):

yaml

# config/secrets.yaml — never committed
model:
  api_key: sk-...
auth:
  token: bearer-...
  federation_token: fed-...   # optional (ADR 0066) — semi-trusted peers get THIS, not `token`

LangGraphConfig.from_yaml overlays this file on top of the main config at load time. Precedence for each secret: secrets.yaml → main YAML value → env var (OPENAI_API_KEY / A2A_AUTH_TOKEN). Env-injected deployments (e.g. infisical run) therefore work unchanged: just leave secrets.yaml absent.

Every config save also strips any secret keys the main YAML might still carry, so a checkout converges to secret-free. The strip relocates and never drops (#1645): an inline value that secrets.yaml doesn't already hold (e.g. a hand-seeded model.api_key on a fresh instance with no secrets file yet) is written to the overlay in the same save; an existing overlay value is never overwritten by a stale inline copy; and if the overlay write fails, the key stays inline rather than being lost.

The /api/config endpoint redacts both fields to ""; runtime status reports only whether a key is set (model.api_key_configured), never the value.
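
The secret precedence chain (secrets.yaml, then main YAML, then env var) can be sketched as a small lookup. The helper names `lookup` and `resolve_secret` are hypothetical and the real loader in graph/config.py may be organized differently; the sketch also assumes an empty string counts as "not set".

```python
from typing import Any, Mapping, Optional


def lookup(tree: Mapping[str, Any], dotted: str) -> Optional[str]:
    """Walk a nested mapping by dotted path; None for missing or empty values."""
    node: Any = tree
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node or None


def resolve_secret(dotted: str, env_var: str, secrets: Mapping[str, Any],
                   main: Mapping[str, Any], env: Mapping[str, str]) -> Optional[str]:
    """First non-empty value wins: secrets.yaml, then main YAML, then the env var."""
    return lookup(secrets, dotted) or lookup(main, dotted) or env.get(env_var) or None
```

Called with parsed YAML dicts and `os.environ`, `resolve_secret("model.api_key", "OPENAI_API_KEY", ...)` returns the overlay value when one exists and only falls through to the environment when both files leave the key empty.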

External secrets manager (ADR 0080)

Instead of hand-maintaining env vars (or wrapping the process in infisical run), the server can pull secrets from Infisical itself and export them as env vars — at boot, on every config reload, and on a refresh interval, so rotation lands without a restart. Enable it with the secrets_manager section (or Settings ▸ Secrets manager, which includes a connection test and a sync-now button):

yaml

secrets_manager:
  enabled: true
  provider: infisical
  host: https://us.infisical.com   # or your self-hosted URL
  project_id: "<project id>"
  environment: prod                # the dev sandbox instance can point at `dev`
  path: /
  recursive: true
  refresh_seconds: 300             # 0 = fetch only at boot / config reload
  required: false                  # true = refuse to boot when the manager is unreachable
  override_env: false              # true = manager values beat pre-existing env vars

Semantics — deliberately boring:

Fetched values land in the env fallback tier. The precedence above is unchanged: secrets.yaml → main YAML → env; the manager merely populates env. An env var you exported yourself always shadows the manager (flip override_env for rotation-wins), and only vars the hydrator set itself are ever updated or removed on refresh.
Bootstrap credentials stay local — the universal-auth machine identity (client_id / client_secret) is the one pair that can't come from the manager. It resolves like any core secret: secrets_manager.client_id/client_secret in secrets.yaml (where the Settings UI stores them, never echoed back) → main YAML → the INFISICAL_CLIENT_ID / INFISICAL_CLIENT_SECRET env vars. The recommended posture: secrets.yaml holds only the machine identity; every other credential lives in the manager. Fetched values can never overwrite the bootstrap pair or PROTOAGENT_* instance identity.
Failures don't take the boot down — a fetch error logs one warning and the server continues with whatever the env already has; set required: true to fail fast instead. PROTOAGENT_NO_SECRETS_HYDRATE=1 disables hydration entirely (debugging escape hatch). Nothing is ever cached to disk.
Operator surface: GET /api/secrets/status (last outcome + which env vars are manager-owned — names, never values), POST /api/secrets/sync, POST /api/secrets/test.
Scope the machine identity least-privilege in Infisical (that project, that environment, read-only). Values are also registered with the audit-log redaction layer, so a manager-sourced credential echoed by a tool is scrubbed by exact match.
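
The env-tier rules above (operator-exported vars shadow the manager, only hydrator-owned vars are updated or removed, the bootstrap pair and PROTOAGENT_* are never touched) can be sketched as one refresh step. `hydrate` and `is_protected` are hypothetical names, and `EXAMPLE_TOKEN` in the usage note is an invented variable; the real hydrator is not shown in this document.

```python
BOOTSTRAP = frozenset({"INFISICAL_CLIENT_ID", "INFISICAL_CLIENT_SECRET"})


def is_protected(name: str) -> bool:
    """Bootstrap credentials and instance identity are never overwritten."""
    return name in BOOTSTRAP or name.startswith("PROTOAGENT_")


def hydrate(env: dict, fetched: dict, owned: set, override_env: bool = False) -> set:
    """Apply manager values to env in place; return the names the manager now owns."""
    still_owned = set()
    for name, value in fetched.items():
        if is_protected(name):
            continue
        set_by_operator = name in env and name not in owned
        if set_by_operator and not override_env:
            continue  # a var the operator exported shadows the manager
        env[name] = value
        still_owned.add(name)
    # Vars the hydrator set earlier but the manager no longer returns are removed.
    for name in owned - still_owned:
        env.pop(name, None)
    return still_owned
```

Calling `hydrate` again with the previous return value as `owned` models a refresh: a rotated `EXAMPLE_TOKEN` is updated in place, while an `OPENAI_API_KEY` the operator exported by hand is left alone unless `override_env` is true.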

Federation token — operator vs peer (ADR 0066)

A deployment that hands its /a2a endpoint to semi-trusted A2A peers (a fleet hub, a partner agent) can issue them a second credential instead of the operator bearer: set auth.federation_token (secret; env fallback A2A_FEDERATION_TOKEN). A request authenticated with the federation token reaches only the /a2a + /v1 consumer surfaces. It is denied the whole /api operator surface (plugin install/enable = host code-exec, config/SOUL rewrite, subagent runs, the operator goal set-path) with a 403. The operator bearer keeps full access. Leaving federation_token blank is single-token mode (unchanged behavior): every bearer holder is the operator. After setting it, rotate existing peers onto the federation token; until a peer is rotated, it still holds the operator credential.
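
A minimal sketch of the two-tier check, assuming a plain prefix match on the request path. `authorize` is a hypothetical function, and returning 401 for an unrecognized bearer is an assumption (the text above only specifies the 403).

```python
import hmac

CONSUMER_PREFIXES = ("/a2a", "/v1")  # surfaces a federation token may reach


def _same(a: str, b: str) -> bool:
    """Constant-time comparison to avoid leaking token contents via timing."""
    return hmac.compare_digest(a.encode(), b.encode())


def authorize(bearer: str, path: str, operator_token: str, federation_token: str = "") -> int:
    """Return an HTTP status: 200 allowed, 403 above the peer ceiling, 401 unknown."""
    if operator_token and _same(bearer, operator_token):
        return 200  # the operator bearer keeps full access
    if federation_token and _same(bearer, federation_token):
        on_consumer_surface = any(
            path == prefix or path.startswith(prefix + "/")
            for prefix in CONSUMER_PREFIXES
        )
        return 200 if on_consumer_surface else 403
    return 401  # assumption: unknown bearers are rejected as unauthenticated
```

With a blank `federation_token` the second branch never matches, which mirrors single-token mode: only the operator bearer authenticates.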

This is the R1 path ceiling behind the goal trust-gate: the dangerous goal verifiers (command/test/ci, data+expr) are refused from a /goal chat message for everyone (Phase 1), and the operator sets them through the operator-tier POST /api/goals endpoint — safe precisely because the /api ceiling confines that endpoint to the operator.

`subagents`

One entry per subagent name. Each entry matches a SubagentConfig in graph/subagents/config.py and a SubagentDef field in LangGraphConfig.

Key	Default	What
`enabled`	`true`	If false, the subagent is still registered but dispatches return "disabled" errors.
`tools`	`[]`	Allowlist. Tool names not listed here are invisible to this subagent.
`max_turns`	`30`	Recursion cap.

Two subagents-block keys govern fan-out via the task_batch tool (concurrent delegation):

Key	Default	What
`max_concurrency`	`4`	Cap on in-flight subagents per `task_batch` call (protects the gateway + context budget).
`output_truncate`	`6000`	Per-subagent returned-text cap (chars) under `task_batch`, so a wide fan-out can't blow the parent context. Single `task` is unbounded.
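
How the two caps interact can be sketched with a semaphore. `run_batch` and `run_one` are hypothetical stand-ins for the task_batch tool and a single subagent call; the real tool is not shown in this document.

```python
import asyncio
from typing import Awaitable, Callable


async def run_batch(
    prompts: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
    output_truncate: int = 6000,
) -> list[str]:
    """Run one subagent call per prompt, at most max_concurrency at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt: str) -> str:
        async with gate:  # waits here while max_concurrency calls are in flight
            text = await run_one(prompt)
        return text[:output_truncate]  # cap what flows back to the parent

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

The worst-case text returned to the parent is `len(prompts) * output_truncate` characters, which is why a wide fan-out stays bounded while a single `task` does not.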

yaml

subagents:
  max_concurrency: 4
  output_truncate: 6000
  researcher:
    enabled: true
    tools: [...]

Adding a new subagent name to the YAML requires matching entries in graph/subagents/config.py::SUBAGENT_REGISTRY, graph/config.py::LangGraphConfig, and the from_yaml() loop. See Configure subagents.

`middleware`

Key	Default	What
`knowledge`	`true`	Inject retrieved knowledge into state before LLM calls. Backed by the bundled `KnowledgeStore` (sqlite + FTS5). Set `false` for a stateless agent.
`audit`	`true`	Append every tool call to `/sandbox/audit/audit.jsonl`.
`memory`	`true`	Persist a reasoning-stripped session summary at terminal turn / session end (read back as `<prior_sessions>` by the knowledge middleware).
`scheduler`	`true`	Wire the bundled scheduler backend (local sqlite). Drops the `schedule_task` / `list_schedules` / `cancel_schedule` tools from the agent loop when `false`. Has the same effect as `SCHEDULER_DISABLED=1` — but `middleware.scheduler: false` is the canonical opt-out (drawer/wizard editable, survives restarts), while the env var is a runtime escape hatch for fleet operators who can't edit YAML in the moment.
`enforcement`	`false`	Opt-in safety gate that blocks tool calls before they execute (see `enforcement` block below). YAML / code seam — not surfaced in the console, since it's a no-op until a deny list, rate limit, or predicate is configured.

`enforcement`

Optional pre-execution gate (graph/middleware/enforcement.py). Only read when middleware.enforcement: true. Blocked calls return a ToolMessage explaining the denial (the model reads it and adapts) instead of running the tool. Forks needing richer policy (scope/cost/etc.) can attach a predicate(tool_name, args) -> reason|None in code.

This is intentionally a fork seam, not a console feature: the bare middleware.enforcement toggle is hidden from the operator settings UI (it does nothing until you add a disallowed_tools list, rate_limits, or a predicate here), so configure it in YAML / code.

yaml

middleware:
  enforcement: true
enforcement:
  disallowed_tools: [fetch_url]          # exact names never allowed
  rate_limits:
    web_search: { max: 20, window_seconds: 60 }

Key	Default	What
`disallowed_tools`	`[]`	Tool names that are always blocked.
`rate_limits`	`{}`	Per-tool sliding-window limit: `{max, window_seconds}`.
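
The deny list and sliding-window limit can be sketched as follows. `EnforcementGate` is a hypothetical class written for illustration, not the contents of graph/middleware/enforcement.py; the injectable `clock` is an assumption added so the window can be exercised deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class EnforcementGate:
    """Deny list plus per-tool sliding-window rate limit (illustrative only)."""

    def __init__(self, disallowed_tools=(), rate_limits=None,
                 clock: Callable[[], float] = time.monotonic):
        self.disallowed = set(disallowed_tools)
        self.rate_limits = rate_limits or {}
        self.clock = clock
        self.calls = defaultdict(deque)  # tool name -> timestamps of allowed calls

    def check(self, tool_name: str) -> Optional[str]:
        """Return a denial reason, or None when the call may proceed."""
        if tool_name in self.disallowed:
            return f"{tool_name} is disallowed"
        limit = self.rate_limits.get(tool_name)
        if limit is None:
            return None
        now = self.clock()
        window = self.calls[tool_name]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] >= limit["window_seconds"]:
            window.popleft()
        if len(window) >= limit["max"]:
            return f"{tool_name} exceeded {limit['max']} calls per {limit['window_seconds']}s"
        window.append(now)
        return None
```

A non-None result is the kind of reason a blocked call could carry back to the model in its ToolMessage.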

`prompt_cache`

PromptCacheMiddleware (graph/middleware/prompt_cache.py) does two things at the model-call boundary: (1) delivers the volatile knowledge/skills/hot-memory context that KnowledgeMiddleware produces — create_agent builds a static system prompt and doesn't read the context state key, so this is what actually gets that context to the model; (2) sets Anthropic cache_control on the stable system-prompt prefix, with the volatile context placed after the breakpoint so it never invalidates the cached prefix.

Caching is gated to Anthropic-family models (safe no-op elsewhere); context delivery happens regardless, so the middleware is always wired.
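
The ordering that keeps the cached prefix stable can be sketched as a list of system content blocks. `build_system_blocks` is a hypothetical helper; the block shape follows Anthropic's documented `cache_control` field, but how PromptCacheMiddleware actually assembles the request is not shown here.

```python
def build_system_blocks(stable_prefix: str, volatile_context: str, ttl: str = "5m") -> list[dict]:
    """Stable prefix first (cache breakpoint on it), volatile context after."""
    blocks = [
        {
            "type": "text",
            "text": stable_prefix,
            # The breakpoint caches everything up to and including this block.
            "cache_control": {"type": "ephemeral", "ttl": ttl},
        }
    ]
    if volatile_context:
        # Placed after the breakpoint, so changes here leave the cached prefix valid.
        blocks.append({"type": "text", "text": volatile_context})
    return blocks
```

If the volatile knowledge/skills context were prepended instead, every turn would change the bytes before the breakpoint and the cache would miss each time.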

yaml

prompt_cache:
  enabled: true     # caching half (delivery is unconditional)
  ttl: "5m"         # "5m" ephemeral, or "1h" persistent (agent turns exceed 5m)
  force: false      # cache even when the model name doesn't look Anthropic
                    # (use when your gateway alias hides a Claude model)
  warm:             # cache-warming heartbeat (off by default)
    enabled: false
    interval_seconds: 3300   # 55m — just under the "1h" tier

Key	Default	What
`enabled`	`true`	Apply `cache_control` (Anthropic). No-op on non-Anthropic models.
`ttl`	`"5m"`	Cache tier: `5m` (ephemeral) or `1h` (persistent).
`force`	`false`	Bypass the Anthropic-name heuristic (opaque gateway aliases).
`warm.enabled`	`false`	Run a background heartbeat (`graph/cache_warmer.py`) that periodically reproduces the cached system prefix so the first request after an idle gap hits a warm cache instead of a full miss.
`warm.interval_seconds`	`3300`	Heartbeat period. Set just under `ttl` (default 55m for the `1h` tier).

When to enable warm: sporadic but latency-sensitive traffic on the 1h tier — the ~1-token ping per interval is cheap relative to a cold miss on a multi-thousand-token prefix while a user waits. Leave it off for steady traffic (the cache stays warm on its own — warming is then pure cost) and on non-Anthropic models (nothing to warm; the warmer no-ops at start unless force is set). It runs as its own asyncio task (started/stopped with the server), not through the scheduler — the scheduler fires full agent turns, the wrong primitive for a keep-alive.

`compaction`

Wires LangChain's SummarizationMiddleware to summarize old history near the context limit. This enables long-horizon runs; without it, run length is only capped via max_iterations. Opt-in.

yaml

compaction:
  enabled: true
  trigger: "fraction:0.8"   # or "tokens:120000" / "messages:80"
  keep_messages: 20          # most-recent messages kept verbatim
  model: ""                  # blank = summarize with the main model; or a cheaper one
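
One way to read the three trigger forms is sketched below. `parse_trigger` and `should_compact` are hypothetical helpers, and the assumption that `fraction` is measured against the model's context limit comes from the "near the context limit" wording above, not from the middleware's source.

```python
def parse_trigger(spec: str) -> tuple[str, float]:
    """Split 'kind:value' into a validated (kind, number) pair."""
    kind, sep, raw = spec.partition(":")
    if not sep or kind not in {"fraction", "tokens", "messages"}:
        raise ValueError(f"unrecognized trigger: {spec!r}")
    value = float(raw) if kind == "fraction" else int(raw)
    if kind == "fraction" and not 0 < value <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return kind, value


def should_compact(spec: str, used_tokens: int, message_count: int, context_limit: int) -> bool:
    """Decide whether history has grown past the configured trigger."""
    kind, value = parse_trigger(spec)
    if kind == "fraction":
        return used_tokens >= value * context_limit
    if kind == "tokens":
        return used_tokens >= value
    return message_count >= value
```

Under this reading, `fraction:0.8` on a 128,000-token context fires at 102,400 tokens of history.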

`execute_code`

Opt-in programmatic tool calling (tools/execute_code.py). Adds an execute_code tool: the model writes one Python script that calls several tools, loops/filters/composes their results in code, and print()s only the final answer — collapsing a long tool-call chain into a single turn (the model reads just the stdout, not every intermediate payload).

The script runs in a child process with a scrubbed environment (only PATH + the bridge fds — no gateway keys / auth tokens) and a hard timeout. Tools are invoked back in the parent over an fd-based RPC bridge, so they run with the parent's credentials, audit, and trace context; the child only orchestrates. Inside the script, tools are reached via an injected tools object (tools.web_search(query=...)). The execute_code tool never exposes itself, so scripts can't recurse.
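
The child-process half of that design can be sketched with the standard library. `run_script` is a hypothetical helper that covers only the scrubbed environment, the timeout, and output truncation; the fd-based RPC bridge and the injected `tools` object are omitted, and the sketch assumes a POSIX host where a PATH-only environment is enough to start Python.

```python
import os
import subprocess
import sys


def run_script(script: str, timeout: float = 30.0, output_truncate: int = 6000) -> str:
    """Run model-authored code in a child process with a scrubbed environment."""
    # Only PATH survives; gateway keys and auth tokens in os.environ are not inherited.
    clean_env = {"PATH": os.environ.get("PATH", "")}
    try:
        done = subprocess.run(
            [sys.executable, "-c", script],
            env=clean_env,
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this elapses
        )
    except subprocess.TimeoutExpired:
        return f"error: script exceeded {timeout}s"
    if done.returncode != 0:
        return ("error: " + done.stderr)[:output_truncate]
    return done.stdout[:output_truncate]
```

A script that prints `os.environ.get("OPENAI_API_KEY")` sees nothing even when the parent has the key exported, which is the point of the scrub.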

yaml

execute_code:
  enabled: false           # OFF by default — runs model-authored code
  timeout: 30.0            # seconds before the child process is killed
  tools: []                # allowlist; empty = all tools except execute_code
  output_truncate: 6000    # cap on returned stdout (chars)

Key	Default	What
`enabled`	`false`	Register the `execute_code` tool.
`timeout`	`30.0`	Wall-clock limit; the child is killed past it.
`tools`	`[]`	Tool-name allowlist exposed to scripts (empty = all but `execute_code`).
`output_truncate`	`6000`	Max returned stdout chars.

Security: subprocess + env-scrub + timeout is isolation, not a true sandbox — the child can still touch the filesystem and network as the server user. Enable only for trusted-model output or inside a hardened container (seccomp / read-only FS / network policy). Narrow tools to the minimum the workload needs.

`tools`

Deferred tools — progressive tool disclosure for high tool counts (ADR 0005). When enabled, only a small base set + a search_tools meta-tool are shown to the model each turn; the rest stay bound (callable) but their schemas are withheld until the agent calls search_tools to load them. This cuts the per-turn tool-schema footprint and improves selection accuracy once you routinely exceed ~15 tools.

yaml

tools:
  disabled: []              # tool names to DROP (the operator's denylist)
  deferred:
    enabled: false          # OFF by default — the full tool set is shown
    keep: []                # always-on tool names; empty = built-in base

Key	Default	What
`disabled`	`[]`	Tool names to drop from the agent at graph build — covers the fully assembled set: core, plugin, MCP, the delegation tools, and the filesystem tools (so `disabled: [run_command]` removes shell access for this agent). Live-reloadable — in the console, every row at Settings ▸ Capabilities ▸ Tools carries an on/off switch that edits this list (a toggled-off tool stays listed, dimmed, so it can be re-enabled). Plugins still ADD tools on top (see Plugins). (ADR 0005)
`deferred.enabled`	`false`	Withhold most tool schemas; expose them via `search_tools`.
`deferred.keep`	`[]`	Tool names always shown. Empty → built-in base (keyless core + `task`/`task_batch`/`run_workflow`/`save_workflow` + `search_tools`). `search_tools` is always kept regardless.

Every tool remains executable even while deferred — create_agent registers all executors; deferral only trims what the model sees per turn. The agent loads tools by calling search_tools("github pull request"); matches stay available for the rest of the thread. Leave off unless you have a large catalog (e.g. a chatty MCP server) — for a handful of tools it adds a discovery hop for no benefit.

`telemetry`

Local per-turn cost/latency rollup (ADR 0006). One row per terminal A2A turn (tokens incl. cache, USD cost, duration, LLM/tool call counts), queryable at /api/telemetry/summary + /api/telemetry/recent.

yaml

telemetry:
  enabled: true                 # one cheap write per turn
  db_path: /sandbox/telemetry.db

Key	Default	What
`enabled`	`true`	Write a per-turn row at terminal time. `false` → no store; endpoints return `{enabled:false}`.
`db_path`	`/sandbox/telemetry.db`	SQLite path; `/sandbox`→`~/.protoagent` fallback, instance-scoped (ADR 0004).

`filesystem`

Fenced multi-project filesystem toolset (ADR 0007): a generic primitive that gives the agent read/write/list/search + fenced command execution over a registry of project directories. It is ON by default, fenced to a default workspace dir when no explicit projects are set (override with PROTOAGENT_WORKSPACE). This is the capability a forked operator (e.g. "Roxy") composes into a multi-project manager; see the operator-fork guide.

yaml

filesystem:
  enabled: true                  # ON by default
  allow_run: true                # run_command available (ON); HITL-gated below — false = never built
  run_requires_approval: true    # each run_command pauses for operator approval
  bypass_allowed: true           # false = /bypass can't skip the approval gate
  projects:
    - { name: orbis, path: /Users/kj/dev/ORBIS, write: false }   # read-only monitor
    - { name: pixelgen, path: /Users/kj/dev/pixelgen, write: true }

Key	Default	What
`enabled`	`true`	Expose the fs tools (`list_projects`/`read_file`/`list_dir`/`find_files`/`search_files`/`write_file`/`edit_file`/`delete_file`). Off → no fs tools.
`allow_run`	`true`	Also expose `run_command` (fenced `cwd`, but arbitrary argv — dual-use, like `execute_code`). `false` is the per-agent kill switch: the tool is never built, so the model can't see or call it.
`run_requires_approval`	`true`	Each `run_command` call pauses for HITL operator approval (A2A `input-required`). Drop to `false` to let commands run unattended.
`bypass_allowed`	`true`	Permit the per-tab `/bypass` chat toggle to skip the approval gate. `false` = approvals enforced regardless of caller-supplied metadata.
`projects`	`[]`	Managed workspaces: `{name, path, write, no_delete}`. Empty falls back to a default `workspace` dir (so the tools are usable out of the box). Every path is fenced under a project root (`..`/symlink escapes refused); `write:false` makes a project read-only; `write:true` + `no_delete:true` is read-write-no-delete (create/edit, never delete — the third Cowork mount mode); invalid paths are skipped.

The four toggles are editable per agent in the console via the Shell & filesystem chip on Settings ▸ Capabilities ▸ Tools (hot-reload — a save rebuilds the graph). tools.disabled: [run_command] (above) is an equivalent per-tool route — in the console, that's the run_command row switch in the same panel's Filesystem group.

Security: the project roots are the hard fence — every tool resolves paths under a root and refuses escapes; write_file/edit_file need write:true; delete_file additionally needs no_delete:false and always pauses for approval (a permanent-delete floor the /bypass toggle can't skip); the agent's own repo is not a project unless you add it. All mutations are audited. See ADR 0007 §4 and ADR 0083 D5.
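
The root fence can be sketched with pathlib. `resolve_in_project` is a hypothetical helper showing one common way to refuse `..` and symlink escapes; the toolset's real checks are described in ADR 0007 and may differ.

```python
from pathlib import Path


def resolve_in_project(root: str, relative: str) -> Path:
    """Resolve a caller-supplied path under a project root, refusing escapes."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks, so both escape routes
    # surface as a final path that lies outside the root.
    candidate = (root_path / relative).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"{relative!r} escapes project root")
    return candidate
```

An absolute input such as `/etc/passwd` replaces the root when joined, so it fails the same containment check as `../outside.txt`.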

`egress`

Deny-by-default outbound-host allowlist (ADR 0008) enforced in fetch_url — the tool where the model picks an arbitrary host (the in-process exfiltration / SSRF vector). Also the single source of truth the OpenShell network policy is generated from (scripts/gen_openshell_policy.py). Editable in the console at Settings ▸ Box ▸ Network (host-scoped, hot-reloads).

yaml

egress:
  allowed_hosts:
    - api.proto-labs.ai
    - "*.github.com"      # wildcard: apex + any subdomain

Key	Default	What
`allowed_hosts`	`[]`	Hosts `fetch_url` may reach. Empty = permissive (off, with a built-in SSRF guard still blocking private / loopback / cloud-metadata addresses). When set, any other host is denied. `*.host` matches subdomains + apex; case-insensitive, port-agnostic. Hot-reloads.

When the allowlist is set, your configured model gateway (model.api_base) host is permitted automatically — you don't have to list it, and the connection-test / "Get models" probes for a custom base URL won't be blocked. (With an empty allowlist this auto-add is a no-op; adding one host there would flip the guard into deny-by-default for every other host.) Covers fetch_url only; execute_code/run_command process-level egress is fenced by running under OpenShell (see Sandboxing & egress).
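
The matching rules (wildcard covers apex and subdomains, case-insensitive, port-agnostic, empty list permissive) can be sketched as below. `host_allowed` is a hypothetical function; it leaves out the built-in SSRF guard and the automatic model.api_base exemption.

```python
from urllib.parse import urlsplit


def host_allowed(url: str, allowed_hosts: list[str]) -> bool:
    """Check a URL's host against the allowlist; an empty list means permissive."""
    if not allowed_hosts:
        return True  # the SSRF guard (not sketched here) still applies
    host = (urlsplit(url).hostname or "").lower()  # hostname excludes the port
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            apex = pattern[2:]
            # Match the apex itself or any subdomain, but not a lookalike suffix.
            if host == apex or host.endswith("." + apex):
                return True
        elif host == pattern:
            return True
    return False
```

The `"." + apex` suffix check is what keeps `evilgithub.com` from matching `*.github.com`.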

`security`

Opt-in CIDR allowlist for the outbound A2A destinations the agent POSTs to: push-notification callbacks (caller-supplied webhook URLs) and peer_consult (PEER_<HANDLE>_URL). Empty/unset leaves the default behavior in place: callbacks keep their default private-IP denylist (a2a_stores), and peer_consult is unrestricted.

yaml

security:
  callback_allowlist:
    - 100.64.0.0/10   # tailnet
    - 10.0.0.0/8      # private fleet

Key	Default	What
`callback_allowlist`	`[]`	CIDRs an outbound callback / peer destination may resolve into. Empty = off. When set it becomes the policy: a destination is allowed iff every resolved IP is inside a listed range (overrides the default callback denylist, so you can permit a specific internal/tailnet range; everything else is rejected). Hot-reloads.
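
The "every resolved IP inside a listed range" rule can be sketched with the standard ipaddress module. `destination_allowed` is a hypothetical function that assumes DNS resolution has already happened; refusing a destination that resolved to no addresses is an assumption.

```python
import ipaddress


def destination_allowed(resolved_ips: list[str], callback_allowlist: list[str]) -> bool:
    """Allow a destination only when every resolved IP is inside a listed CIDR."""
    networks = [ipaddress.ip_network(cidr) for cidr in callback_allowlist]
    if not networks:
        raise ValueError("allowlist is empty; the default policy applies instead")
    addresses = [ipaddress.ip_address(ip) for ip in resolved_ips]
    # One address outside the ranges is enough to reject the whole destination.
    return bool(addresses) and all(
        any(addr in net for net in networks) for addr in addresses
    )
```

Requiring every address, rather than any, matters for hostnames that resolve to several IPs: a name with one tailnet address and one public address is rejected.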

`routing`

Wires LangChain's ModelFallbackMiddleware: on a primary-model error, retry on each fallback model (same gateway) in order. Opt-in (empty = no fallback). aux_model is a separate, optional cheap/fast alias for non-reasoning calls.

yaml

routing:
  fallback_models: [claude-haiku-4-5, gpt-5]
  aux_model: ""        # cheap/fast alias for summarization, goal-verify, subagent delegation

Key	Default	What
`fallback_models`	`[]`	Models to retry on a primary-model error, in order (same gateway). Empty = no fallback.
`aux_model`	`""`	Single cheap/fast alias for non-reasoning calls (compaction summarizer, goal verifier, subagent delegation). Blank = everything runs on the main model; each path's own override still wins.

`goal`

Goal mode (graph/goals/) lets you give the agent a testable outcome it self-drives toward. After each terminal turn (the agent stops with a final answer), the goal's verifier decides whether it's met; if not, the agent is re-invoked with a continuation prompt — carrying the verifier's evidence and the running plan the agent records with the update_goal_plan tool — until the verifier passes, the iteration budget runs out (exhausted), or the goal is flagged unachievable (a no-progress streak, or the agent calling the abandon_goal tool). Unlike a pure-LLM "are we done?" check, completion is backed by a real verifier.

The machinery is wired when enabled, but no goal is active until one is set via the /goal control message (works over A2A / the React console / OpenAI-compat) or the /api/goals/{session_id} endpoints. State is persisted per session under GOAL_PATH → /sandbox/goals → ~/.protoagent/goals.

yaml

goal:
  enabled: true            # machinery available; no goal active until set
  max_iterations: 8        # continuation budget per goal
  no_progress_limit: 3     # identical verifier evidence N times -> unachievable
  eval_model: ""           # blank = main model (llm verifier / fuzzy goals)
  verify_timeout: 120      # seconds for command/test/ci verifiers

Key	Default	What
`enabled`	`true`	Wire goal mode. No goal runs until set.
`max_iterations`	`8`	Max continuation turns before a goal is `exhausted`.
`no_progress_limit`	`3`	Same verifier reason+evidence this many times in a row → `unachievable`.
`eval_model`	`""`	Model for the `llm` verifier (blank = main model).
`verify_timeout`	`120`	Wall-clock seconds for `command`/`test`/`ci` verifiers.
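
The three terminal outcomes can be sketched as a loop. `drive_goal`, `run_turn`, and `verify` are hypothetical names; the sketch omits the abandon_goal tool and the running plan, and it assumes max_iterations counts continuation turns as the table above states.

```python
from typing import Callable, Tuple


def drive_goal(
    run_turn: Callable[[str], None],
    verify: Callable[[], Tuple[bool, str]],
    max_iterations: int = 8,
    no_progress_limit: int = 3,
) -> str:
    """Return 'met', 'unachievable', or 'exhausted'."""
    last_evidence, streak, continuations = None, 0, 0
    while True:
        met, evidence = verify()  # runs after each terminal turn
        if met:
            return "met"
        # Count how many times in a row the verifier reported the same evidence.
        streak = streak + 1 if evidence == last_evidence else 1
        last_evidence = evidence
        if streak >= no_progress_limit:
            return "unachievable"
        if continuations >= max_iterations:
            return "exhausted"
        run_turn(evidence)  # continuation prompt carries the verifier's evidence
        continuations += 1
```

With the defaults, a verifier that keeps returning identical evidence ends the goal as unachievable on its third check, well before the eight-turn budget is spent.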

Setting a goal — /goal <text> (fuzzy, llm-verified) or a JSON spec:

/goal {"condition": "unit tests pass", "verifier": {"type": "test", "command": "python -m pytest -q"}}

/goal shows status; /goal clear (aliases: stop, off, cancel, reset, none) clears it.

Verifier types (verifier.type): command (exit 0 = met), test (command + surfaces the runner summary), ci (gh pr checks <pr> or latest run on branch), data (a file contains substring, or an expr over parsed JSON as data), llm (transcript judgment — fuzzy fallback).

Security: command/test/ci verifiers execute on the server host. Setting a goal is an operator action — only accept goal specs from trusted input. See Goal mode.

`watches`

Watches (ADR 0067, graph/watches/) are standing tripwires — a condition polled on a cadence, out-of-band, that resumes the agent when it trips. See Watches.

yaml

watches:
  enabled: false           # default OFF — the watch TOOLS are not bound

Key	Default	What
`enabled`	`false`	Bind the `create_watch` / `list_watches` / `clear_watch` tools.

Three things this flag deliberately does not do, because the distinction matters when you toggle it on a running agent:

It is a tool-availability flag only. Turning it off never deletes or mutates stored watch state, and the background watch poller is untouched — existing watches keep polling and keep firing their on_met hooks. You are removing the agent's ability to create and manage watches, not the watches themselves.
The tools ride inside the goal-enabled tool group, so goal.enabled must also be true for them to appear. watches.enabled: true with goal mode off binds nothing.
It defaults off while the feature settles (#2020). Before this flag existed the tools were always bound, so an agent upgrading from ≤0.105.2 that relies on them must now opt in explicitly.

`knowledge`

Only read when middleware.knowledge is true.

Key	Default	What
`db_path`	`/sandbox/knowledge/agent.db`	SQLite file path. Falls back to `~/.protoagent/knowledge/agent.db` automatically when the configured path isn't writable (e.g. running locally without `/sandbox`). Override at runtime with `KNOWLEDGE_DB_PATH`.
`scope`	`""` (→ `scoped`)	Tier (ADR 0041): `scoped` (private, default) · `shared` (the whole store is the host-level commons) · `layered` (read commons ∪ private, write private, operator-`promote`d). The commons lives at `commons.path`/knowledge.db and is host-level + un-scoped (every agent reading the same `commons.path` shares it). A shared/layered fleet must share one `embed_model` — the commons is stamped with it and a mismatched agent serves the commons tier FTS5-only (no incompatible-vector fusion).
`embeddings`	`false`	Opt-in hybrid `HybridKnowledgeStore` (FTS5 keyword + vector similarity, RRF-fused); off = keyword-only FTS5. Off by default so a fresh install never depends on a gateway embedding route (#1681).
`embed_model`	`qwen3-embedding`	Gateway embedding model used when `embeddings` is on — must be a model your gateway serves (not the chat model).
`facts`	`true`	Extract semantic facts during the conversation-harvest pass.
`top_k`	`10`	RAG hits auto-injected into the prompt per turn.

The bundled store is keyword-only FTS5 by default; once your gateway serves embed_model, opt in with embeddings: true for hybrid search — keyword fused with vector similarity (RRF), with an embedding circuit breaker that falls back to FTS5 on an outage. One chunks table; the domain column distinguishes operator-set notes (memory_ingest), always-on hot facts (hot), episodic summaries stored by conversation_harvest (conversation), and extracted facts (fact).
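
Reciprocal rank fusion itself is small enough to sketch. `rrf_fuse` is a hypothetical function, and the constant `k=60` is the value commonly used for RRF, assumed here rather than taken from HybridKnowledgeStore.

```python
from collections import defaultdict


def rrf_fuse(keyword_hits: list[str], vector_hits: list[str],
             k: int = 60, top_k: int = 10) -> list[str]:
    """Fuse two ranked lists of chunk ids; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in (keyword_hits, vector_hits):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to the id for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]
```

Because only ranks are used, the FTS5 and vector scores never need to share a scale, and passing an empty `vector_hits` list degrades to plain keyword order, much like the circuit-breaker fallback described above.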

Hot memory — chunks stored under domain='hot' are always-on: KnowledgeMiddleware injects them into context every turn (vs. retrieved-on-relevance), re-read each turn so a freshly-added hot fact is seen immediately. Set one with memory_ingest(content, domain="hot") for facts the agent should never forget (operator preferences, standing constraints).

`skills`

Human-authored skills in the AgentSkills SKILL.md format — a folder with YAML frontmatter (name + description) and a markdown body. Loaded from disk into an FTS5 index on boot; KnowledgeMiddleware lists the index (name + summary) as an always-on <available_skills> block and the agent loads a skill's full body on demand via load_skill (progressive disclosure, ADR 0060).

Key	Default	What
`enabled`	`true`	Load `SKILL.md` skills and list the `<available_skills>` index.
`db_path`	`/sandbox/skills.db`	FTS5 index path. Falls back to `~/.protoagent/skills.db` when the configured path isn't writable.
`top_k`	`5`	Max skills listed in the always-on `<available_skills>` index per turn (the rest stay reachable via `list_skills`; any one's body loads on demand via `load_skill`).
`dir`	`""`	Optional override for the writable skills root. Default: `<config-dir>/skills` (where `<config-dir>` honors `PROTOAGENT_CONFIG_DIR`).

Skills load from two roots: bundled (config/skills/, shipped) and writable (<config-dir>/skills/, your drop-ins). A writable skill overrides a bundled one with the same name. GET /api/runtime/status reports skills.count. See the Skills guide for authoring.

`a2a`

Your fork's A2A agent card identity — the advertised skills and description a caller sees. Declare them here (or contribute card skills from a plugin via register_a2a_skill) instead of editing server/a2a.py. Distinct from skills above: those are disk SKILL.md procedural memory retrieved at inference; these are what the card advertises. Omit both keys and the template ships one free-text chat placeholder so a fresh clone stays callable. The card name follows identity.name / AGENT_NAME.

yaml

a2a:
  description: "Acme Bot — turns support tickets into triaged, drafted replies."
  skills:
    - id: triage_ticket
      name: Triage Ticket
      description: Classify a support ticket and draft a reply.
      tags: [support]
      examples: ["triage ticket #1234"]
      # Optional structured output — enforced + emitted as a typed DataPart (#476):
      # result_mime: application/vnd.protolabs.triage-v1+json
      # output_schema: { type: object, properties: { ... }, required: [ ... ] }

Key	Default	What
`description`	template placeholder	The agent card's `description`.
`skills`	one `chat` placeholder	Advertised `AgentSkill`s (`id`/`name`/`description` + optional `tags`/`examples`). A skill declaring `result_mime` + `output_schema` returns schema-enforced structured output as a typed DataPart (#476); the MIME is advertised in its `output_modes`.
`require_routable_url`	`false`	When `true`, refuse to boot if the card would advertise a loopback URL (e.g. `A2A_PUBLIC_URL` unset on a deployed agent → silently unreachable to remote callers). Off by default — local/desktop runs should advertise loopback.
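To make the `require_routable_url` guard concrete, here is a rough sketch of a loopback check using only the standard library. `advertises_loopback` and `check_card_url` are hypothetical names, and the template's actual boot check may use different rules (for example, it may resolve DNS names, which this sketch does not).

```python
import ipaddress
from urllib.parse import urlsplit


def advertises_loopback(url: str) -> bool:
    """True when the URL's host is localhost or a loopback IP literal."""
    host = urlsplit(url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: a DNS name, which this sketch does not resolve.
        return False


def check_card_url(url: str, require_routable_url: bool = False) -> None:
    """Raise at boot if the guard is on and the card URL is loopback."""
    if require_routable_url and advertises_loopback(url):
        raise RuntimeError(f"agent card would advertise a loopback URL: {url}")
```

With the guard left at its default of `false`, `check_card_url` never raises, which matches the local/desktop case where a loopback URL is the right thing to advertise.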

`mcp`

Connect external Model Context Protocol servers; their tools become agent tools (namespaced <server>__<tool>). Off by default — adding a server is the opt-in. Built on langchain-mcp-adapters.

Key	Default	What
`enabled`	`false`	Connect the configured servers and expose their tools.
`timeout_seconds`	`20`	Per-server discovery timeout. A slow/unreachable server is skipped, never fatal.
`denylist`	`[]`	Namespaced tool names to drop (e.g. `filesystem__write_file`).
`servers`	`[]`	List of `{name, transport, …}`. `stdio` → `command`/`args`/`env`/`cwd`; `streamable_http`/`sse` → `url`/`headers`. Per-server: `enabled: false` skips connecting it (lazy); `tools: {include: [...], exclude: [...]}` filters which of its tools bind.

Per-server tools.include is an allowlist: only the listed tools bind, which is the fix for a server whose large catalog floods context. exclude drops tools from the remainder; when a tool appears in both lists, include wins. The global denylist is the cross-server hard block. Both match the bare or namespaced tool name. See ADR 0005 on tool pollution.
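The filtering order can be sketched as a single predicate. This is an illustration under stated assumptions, not the template's code: `tool_binds` is a hypothetical helper, the denylist is assumed to match bare names as well as namespaced ones, and "include wins" is read as "a tool named in both lists still binds".

```python
def tool_binds(server, tool, include=(), exclude=(), denylist=()):
    """Decide whether one MCP tool is exposed to the agent."""
    namespaced = f"{server}__{tool}"  # <server>__<tool>
    names = {tool, namespaced}        # filters may use either form
    # 1. The global denylist is a hard block across all servers.
    if names & set(denylist):
        return False
    # 2. A non-empty include is an allowlist; it also beats exclude.
    if include:
        return bool(names & set(include))
    # 3. Otherwise exclude drops tools from the full catalog.
    return not (names & set(exclude))


# Hypothetical server entry mirroring the YAML shape described above.
server = {"name": "filesystem", "tools": {"include": ["read_file", "list_dir"]}}
catalog = ["read_file", "write_file", "list_dir"]
bound = [
    f"{server['name']}__{t}"
    for t in catalog
    if tool_binds(server["name"], t, include=server["tools"]["include"])
]
```

Here `bound` holds the two allowlisted tools under their namespaced names, and `write_file` never reaches the agent's context.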

Servers are discovered at startup/reload. GET /api/runtime/status reports mcp.servers and mcp.tool_count. See the MCP guide and examples/mcp/echo_server.py.

`checkpoint`

The conversation-history checkpointer (durable chat memory across restarts) and its pruning/harvest knobs.

yaml

checkpoint:
  db_path: /sandbox/checkpoints.db   # blank = in-memory (history lost on restart)
  keep_per_thread: 5
  max_age_days: 30
  prune_interval_hours: 6
  harvest_enabled: true

Key	Default	What
`db_path`	`/sandbox/checkpoints.db`	SQLite path (`/sandbox`→`~/.protoagent` fallback, instance-scoped). Blank → in-memory (chat history doesn't survive a restart).
`keep_per_thread`	`5`	How many checkpoints to retain per conversation thread.
`max_age_days`	`30`	Drop checkpoints older than this.
`prune_interval_hours`	`6`	How often the background pruner runs.
`harvest_enabled`	`true`	On thread retire, harvest its history into the knowledge store before purging.
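As a rough illustration of how `keep_per_thread` and `max_age_days` could interact, the sketch below selects checkpoints to prune from an in-memory list. `select_prunable` is a hypothetical helper. It assumes a checkpoint must pass both limits to survive; the real pruner may combine them differently (for instance, by always keeping a thread's newest checkpoint).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def select_prunable(checkpoints, now, keep_per_thread=5, max_age_days=30):
    """checkpoints: iterable of (thread_id, checkpoint_id, created_at).

    Returns the sorted ids that fall outside either retention limit.
    """
    cutoff = now - timedelta(days=max_age_days)
    by_thread = defaultdict(list)
    for thread_id, checkpoint_id, created_at in checkpoints:
        by_thread[thread_id].append((created_at, checkpoint_id))

    prunable = []
    for rows in by_thread.values():
        rows.sort(reverse=True)  # newest first within the thread
        for rank, (created_at, checkpoint_id) in enumerate(rows):
            # Assumption: exceeding either limit is enough to prune.
            if rank >= keep_per_thread or created_at < cutoff:
                prunable.append(checkpoint_id)
    return sorted(prunable)
```

Passing `now` in explicitly, as a timezone-aware `datetime`, keeps the selection deterministic and easy to test.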

`background`

Background subagent jobs (task(run_in_background=true), ADR 0050) and how their results are delivered (ADR 0070).

yaml

background:
  auto_resume: true

Key	Default	What
`auto_resume`	`true`	When a background job finishes, immediately run a turn in the session that spawned it — its `<task-notification>` drains into that turn and the agent briefs the operator. `false` restores pull-only delivery (the report waits for the session's next manual turn; the ADR 0050 Activity idle-wake covers autonomous reaction instead). Never fires for canceled jobs, incognito-spawned jobs, or jobs spawned from another background turn.
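The delivery rule for `auto_resume` reduces to a small predicate. The sketch below restates the table entry in code; `FinishedJob` and `should_auto_resume` are hypothetical names, not the template's types.

```python
from dataclasses import dataclass


@dataclass
class FinishedJob:
    canceled: bool = False
    incognito: bool = False                 # spawned from an incognito session
    spawned_from_background: bool = False   # spawned by another background turn


def should_auto_resume(job: FinishedJob, auto_resume: bool = True) -> bool:
    """True when a finished job should trigger a turn in its spawning session."""
    if not auto_resume:
        return False  # pull-only: the report waits for the next manual turn
    return not (job.canceled or job.incognito or job.spawned_from_background)
```

Excluding jobs spawned from another background turn means a resumed turn cannot set off a chain of further automatic turns.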

`workflows`

Declarative multi-step recipes over subagents (ADR 0002) — the run_workflow / save_workflow tools.

yaml

workflows:
  enabled: true
  dir: /sandbox/workflows   # writable recipe root

Key	Default	What
`enabled`	`true`	Expose `run_workflow` / `save_workflow` and load `*.yaml` recipes.
`dir`	`/sandbox/workflows`	Writable recipe root (`/sandbox`→`~/.protoagent` fallback). Bundled recipes also load from `workflows/`.

`plugins`

Drop-in plugins (manifest + register()) that contribute tools, bundled skills, FastAPI routes, background surfaces, subagents, and managed MCP servers (ADR 0018/0019). They run in-process with the agent's privileges, so a third-party plugin is disabled by default — only enable plugins you trust. (First-party bundled plugins like discord/google ship enabled: true in their own manifest.)

Key	Default	What
`enabled`	`[]`	Plugin `id`s to load. A plugin also loads if its own manifest has `enabled: true`.
`disabled`	`[]`	Plugin `id`s to force OFF even when their manifest says `enabled: true` — the way a fork drops a bundled first-party plugin (e.g. `discord`, `google`) without deleting its directory or editing core.
`dir`	`""`	Override the writable plugins root (default `<config-dir>/plugins`).
`sources.allow`	`[]`	Optional allowlist of host/org globs for git-URL installs (e.g. `[github.com/yourorg/*]`); empty = any URL (gated install). (ADR 0027.)
`update_policy`	`{}`	Opt-in background auto-updates, keyed by plugin `id`. Each value is `{track, when}`: a non-empty `track` arms the plugin (the ref comes from `plugins.lock`); `when` is `idle` (default — defer while a chat turn is/was just in flight) or `always`. A SHA-pinned plugin is never auto-updated. Empty = manual-only (the default). (#1720; see the Plugins guide.)
`autoupdate_interval_hours`	`6`	Cadence of the auto-update sweep in hours; `0` disables the loop. Only plugins in `update_policy` are ever touched.
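The interplay of `update_policy`, SHA pinning, and `autoupdate_interval_hours` can be sketched as one decision function. This is an illustrative sketch under assumptions: `should_auto_update` is a hypothetical helper, the `track` value used is a placeholder, and an unrecognized `when` is treated as `idle`.

```python
def should_auto_update(plugin_id, update_policy, *, sha_pinned, chat_busy,
                       interval_hours=6):
    """Decide whether one sweep may update one plugin."""
    if interval_hours == 0:
        return False  # the sweep loop is disabled entirely
    policy = update_policy.get(plugin_id) or {}
    if not policy.get("track"):
        return False  # not armed: manual-only
    if sha_pinned:
        return False  # a SHA-pinned plugin is never auto-updated
    if policy.get("when", "idle") == "always":
        return True
    # "idle" (and, by assumption, anything unrecognized): defer while chatting.
    return not chat_busy


# "default" is a placeholder track value, not one taken from the docs.
policy = {
    "discord": {"track": "default", "when": "always"},
    "hello": {"track": "default"},
}
```

With this policy, `discord` updates even while a chat turn is in flight, `hello` waits for an idle moment, and any plugin missing from the mapping is never touched.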

Plugins load from two roots: bundled (plugins/, e.g. hello, discord, google, plugin-devkit) and writable (<config-dir>/plugins/, where git-URL installs land). A writable plugin overrides a bundled one with the same id. Plugin tools that shadow a core/MCP tool are skipped. GET /api/runtime/status reports plugins[] (id, enabled, loaded, tools, skills, views, and routes/surfaces/subagents counts). Plugins are installable from a git URL (python -m server plugin install <url>; see Install & publish plugins), and a repo can ship tools, subagents, skills, workflows, and console views. See the Plugins guide.

Plugin-declared config sections (ADR 0019)

A plugin can claim a top-level config section and declare its keys/secrets/Settings in its manifest (config_section / config defaults / secrets / settings). The section is resolved (manifest defaults ⊕ YAML ⊕ secrets overlay) into config.plugin_config["<section>"] and surfaced as a Settings group — with no edit to config.py / config_io.py / settings_schema.py. This is where the Discord config now lives:

yaml

discord:                 # claimed by plugins/discord/ — NOT a core config field
  enabled: false
  admin_ids: []
  # bot_token → secrets.yaml (plugin-declared secret)

A plugin section colliding with a reserved built-in (model, mcp, plugins, …) is ignored. Plugin secrets (e.g. discord.bot_token) route to secrets.yaml dynamically — see Secrets above. External plugins (e.g. a Google or Slack integration installed from its own repo) claim their own sections the same way.
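The resolution order described above (manifest defaults, then YAML, then the secrets overlay) behaves like a layered merge. Here is a minimal sketch assuming a shallow merge where later layers win and a secrets mapping keyed by section name. `resolve_plugin_section` and `RESERVED` are hypothetical, and the reserved set lists only the names given above.

```python
# Only the reserved names spelled out in the docs; the full list is elided there.
RESERVED = {"model", "mcp", "plugins"}


def resolve_plugin_section(section, manifest_defaults, yaml_config, secrets):
    """Return the merged settings for a plugin section, or None if ignored."""
    if section in RESERVED:
        return None  # collides with a built-in section, so it is ignored
    resolved = dict(manifest_defaults)                 # 1. manifest defaults
    resolved.update(yaml_config.get(section) or {})    # 2. YAML overrides
    resolved.update(secrets.get(section) or {})        # 3. secrets overlay
    return resolved


discord = resolve_plugin_section(
    "discord",
    manifest_defaults={"enabled": False, "admin_ids": []},
    yaml_config={"discord": {"admin_ids": ["123"]}},
    secrets={"discord": {"bot_token": "example-token"}},
)
```

In this example `discord` ends up with the manifest's `enabled: False`, the YAML's `admin_ids`, and the `bot_token` from the secrets layer, which is the shape a plugin would then read from config.plugin_config["discord"].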

Scheduler

Scheduler enable/disable is YAML-controlled (middleware.scheduler above) so the drawer can flip it without a restart. Backend selection and runtime knobs (which backend, where to write the SQLite database, where to publish, etc.) are env-driven so the same container image can run under either backend without a rebuild. See Schedule future work for the full guide.

Env var	Default	What
`SCHEDULER_DB_DIR`	`/sandbox/scheduler`	Parent directory for `<agent_name>/jobs.db`. Falls back to `~/.protoagent/scheduler/<agent_name>/jobs.db` when unwritable.
`SCHEDULER_INVOKE_URL`	`http://127.0.0.1:<active_port>`	Local backend: where to POST `message/send` when a job fires. Override only if the agent's A2A endpoint isn't on localhost.
`SCHEDULER_DISABLED`	unset	Runtime escape hatch — set to `1` / `true` to drop the scheduler tools entirely without editing YAML. `middleware.scheduler: false` is the canonical opt-out.
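How the YAML switch and the env vars combine can be sketched as follows. Both helpers are hypothetical, and treating `SCHEDULER_DISABLED` case-insensitively is an assumption; the table only names `1` and `true`.

```python
from pathlib import PurePosixPath


def scheduler_enabled(middleware_scheduler: bool, environ) -> bool:
    """The YAML flag is the canonical switch; the env var can only turn it off."""
    flag = environ.get("SCHEDULER_DISABLED", "").strip().lower()
    return middleware_scheduler and flag not in {"1", "true"}


def jobs_db_candidates(agent_name: str, environ, home: str):
    """Return the preferred jobs.db path followed by its fallback."""
    primary = PurePosixPath(environ.get("SCHEDULER_DB_DIR", "/sandbox/scheduler"))
    fallback = PurePosixPath(home) / ".protoagent" / "scheduler"
    # The caller would use the first path whose parent directory is writable.
    return [
        primary / agent_name / "jobs.db",
        fallback / agent_name / "jobs.db",
    ]
```

Passing the environment in as a mapping (for example `os.environ`) keeps both helpers free of global state, so the same logic can be exercised against any backend configuration.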
