ACP Thread Bound Agents
Overview
This plan defines how OpenClaw should support ACP coding agents in thread-capable channels (Discord first) with production-level lifecycle and recovery. Related document: Target user experience:- a user spawns or focuses an ACP session into a thread
- user messages in that thread route to the bound ACP session
- agent output streams back to the same thread persona
- session can be persistent or one shot with explicit cleanup controls
Decision summary
Long term recommendation is a hybrid architecture:- OpenClaw core owns ACP control plane concerns
- session identity and metadata
- thread binding and routing decisions
- delivery invariants and duplicate suppression
- lifecycle cleanup and recovery semantics
- ACP runtime backend is pluggable
- first backend is an acpx-backed plugin service
- runtime does ACP transport, queueing, cancel, reconnect
North-star architecture (holy grail)
Treat ACP as a first-class control plane in OpenClaw, with pluggable runtime adapters. Non-negotiable invariants:- every ACP thread binding references a valid ACP session record
- every ACP session has explicit lifecycle state (
creating,idle,running,cancelling,closed,error) - every ACP run has explicit run state (
queued,running,completed,failed,cancelled) - spawn, bind, and initial enqueue are atomic
- command retries are idempotent (no duplicate runs or duplicate Discord outputs)
- bound-thread channel output is a projection of ACP run events, never ad-hoc side effects
AcpSessionManageris the single ACP writer and orchestrator- manager lives in gateway process first; can be moved to a dedicated sidecar later behind the same interface
- per ACP session key, manager owns one in-memory actor (serialized command execution)
- adapters (
acpx, future backends) are transport/runtime implementations only
- move ACP control-plane state to a dedicated SQLite store (WAL mode) under OpenClaw state dir
- keep
SessionEntry.acpas compatibility projection during migration, not source-of-truth - store ACP events append-only to support replay, crash recovery, and deterministic delivery
Delivery strategy (bridge to holy-grail)
- short-term bridge
- keep current thread binding mechanics and existing ACP config surface
- fix metadata-gap bugs and route ACP turns through a single core ACP branch
- add idempotency keys and fail-closed routing checks immediately
- long-term cutover
- move ACP source-of-truth to control-plane DB + actors
- make bound-thread delivery purely event-projection based
- remove legacy fallback behavior that depends on opportunistic session-entry metadata
Why not pure plugin only
Current plugin hooks are not sufficient for end to end ACP session routing without core changes.- inbound routing from thread binding resolves to a session key in core dispatch first
- message hooks are fire-and-forget and cannot short-circuit the main reply path
- plugin commands are good for control operations but not for replacing core per-turn dispatch flow
- ACP runtime can be pluginized
- ACP routing branch must exist in core
Existing foundation to reuse
Already implemented and should remain canonical:- thread binding target supports
subagentandacp - inbound thread routing override resolves by binding before normal dispatch
- outbound thread identity via webhook in reply delivery
/focusand/unfocusflow with ACP target compatibility- persistent binding store with restore on startup
- unbind lifecycle on archive, delete, unfocus, reset, and delete
Architecture
Boundary model
Core (must be in OpenClaw core):- ACP session-mode dispatch branch in the reply pipeline
- delivery arbitration to avoid parent plus thread duplication
- ACP control-plane persistence (with
SessionEntry.acpcompatibility projection during migration) - lifecycle unbind and runtime detach semantics tied to session reset/delete
- ACP runtime worker supervision
- acpx process invocation and event parsing
- ACP command handlers (
/acp ...) and operator UX - backend-specific config defaults and diagnostics
Runtime ownership model
- one gateway process owns ACP orchestration state
- ACP execution runs in supervised child processes via acpx backend
- process strategy is long lived per active ACP session key, not per message
Core runtime contract
Add a core ACP runtime contract so routing code does not depend on CLI details and can switch backends without changing dispatch logic:- first backend:
AcpxRuntimeshipped as a plugin service - core resolves runtime via registry and fails with explicit operator error when no ACP runtime backend is available
Control-plane data model and persistence
Long-term source-of-truth is a dedicated ACP SQLite database (WAL mode), for transactional updates and crash-safe recovery:acp_sessionssession_key(pk),backend,agent,mode,cwd,state,created_at,updated_at,last_error
acp_runsrun_id(pk),session_key(fk),state,requester_message_id,idempotency_key,started_at,ended_at,error_code,error_message
acp_bindingsbinding_key(pk),thread_id,channel_id,account_id,session_key(fk),expires_at,bound_at
acp_eventsevent_id(pk),run_id(fk),seq,kind,payload_json,created_at
acp_delivery_checkpointrun_id(pk/fk),last_event_seq,last_discord_message_id,updated_at
acp_idempotencyscope,idempotency_key,result_json,created_at, unique(scope, idempotency_key)
- keep
SessionEntry.acpas a compatibility projection during migration - process ids and sockets stay in memory only
- durable lifecycle and run status live in ACP DB, not generic session JSON
- if runtime owner dies, gateway rehydrates from ACP DB and resumes from checkpoints
Routing and delivery
Inbound:- keep current thread binding lookup as first routing step
- if bound target is ACP session, route to ACP runtime branch instead of
getReplyFromConfig - explicit
/acp steercommand usesmode: "steer"
- ACP event stream is normalized to OpenClaw reply chunks
- delivery target is resolved through existing bound destination path
- when a bound thread is active for that session turn, parent channel completion is suppressed
- stream partial output with coalescing window
- configurable min interval and max chunk bytes to stay under Discord rate limits
- final message always emitted on completion or failure
State machines and transaction boundaries
Session state machine:creating -> idle -> running -> idlerunning -> cancelling -> idle | erroridle -> closederror -> idle | closed
queued -> running -> completedrunning -> failed | cancelledqueued -> cancelled
- spawn transaction
- create ACP session row
- create/update ACP thread binding row
- enqueue initial run row
- close transaction
- mark session closed
- delete/expire binding rows
- write final close event
- cancel transaction
- mark target run cancelling/cancelled with idempotency key
Per-session actor model
AcpSessionManager runs one actor per ACP session key:
- actor mailbox serializes
submit,cancel,close, andstreamside effects - actor owns runtime handle hydration and runtime adapter process lifecycle for that session
- actor writes run events in-order (
seq) before any Discord delivery - actor updates delivery checkpoints after successful outbound send
Idempotency and delivery projection
All external ACP actions must carry idempotency keys:- spawn idempotency key
- prompt/steer idempotency key
- cancel idempotency key
- close idempotency key
- Discord messages are derived from
acp_eventsplusacp_delivery_checkpoint - retries resume from checkpoint without re-sending already delivered chunks
- final reply emission is exactly-once per run from projection logic
Recovery and self-healing
On gateway start:- load non-terminal ACP sessions (
creating,idle,running,cancelling,error) - recreate actors lazily on first inbound event or eagerly under configured cap
- reconcile any
runningruns missing heartbeats and markfailedor recover via adapter
- if binding exists but ACP session is missing, fail closed with explicit stale-binding message
- optionally auto-unbind stale binding after operator-safe validation
- never silently route stale ACP bindings to normal LLM path
Lifecycle and safety
Supported operations:- cancel current run:
/acp cancel - unbind thread:
/unfocus - close ACP session:
/acp close - auto close idle sessions by effective TTL
- effective TTL is minimum of
- global/session TTL
- Discord thread binding TTL
- ACP runtime owner TTL
- allowlist ACP agents by name
- restrict workspace roots for ACP sessions
- env allowlist passthrough
- max concurrent ACP sessions per account and globally
- bounded restart backoff for runtime crashes
Config surface
Core keys:acp.enabledacp.dispatch.enabled(independent ACP routing kill switch)acp.backend(defaultacpx)acp.defaultAgentacp.allowedAgents[]acp.maxConcurrentSessionsacp.stream.coalesceIdleMsacp.stream.maxChunkCharsacp.runtime.ttlMinutesacp.controlPlane.store(sqlitedefault)acp.controlPlane.storePathacp.controlPlane.recovery.eagerActorsacp.controlPlane.recovery.reconcileRunningAfterMsacp.controlPlane.checkpoint.flushEveryEventsacp.controlPlane.checkpoint.flushEveryMsacp.idempotency.ttlHourschannels.discord.threadBindings.spawnAcpSessions
- backend command/path overrides
- backend env allowlist
- backend per-agent presets
- backend startup/stop timeouts
- backend max inflight runs per session
Implementation specification
Control-plane modules (new)
Add dedicated ACP control-plane modules in core:src/acp/control-plane/manager.ts- owns ACP actors, lifecycle transitions, command serialization
src/acp/control-plane/store.ts- SQLite schema management, transactions, query helpers
src/acp/control-plane/events.ts- typed ACP event definitions and serialization
src/acp/control-plane/checkpoint.ts- durable delivery checkpoints and replay cursors
src/acp/control-plane/idempotency.ts- idempotency key reservation and response replay
src/acp/control-plane/recovery.ts- boot-time reconciliation and actor rehydrate plan
src/acp/runtime/session-meta.ts- remains temporarily for projection into
SessionEntry.acp - must stop being source-of-truth after migration cutover
- remains temporarily for projection into
Required invariants (must enforce in code)
- ACP session creation and thread bind are atomic (single transaction)
- there is at most one active run per ACP session actor at a time
- event
seqis strictly increasing per run - delivery checkpoint never advances past last committed event
- idempotency replay returns previous success payload for duplicate command keys
- stale/missing ACP metadata cannot route into normal non-ACP reply path
Core touchpoints
Core files to change:src/auto-reply/reply/dispatch-from-config.ts- ACP branch calls
AcpSessionManager.submitand event-projection delivery - remove direct ACP fallback that bypasses control-plane invariants
- ACP branch calls
src/auto-reply/reply/inbound-context.ts(or nearest normalized context boundary)- expose normalized routing keys and idempotency seeds for ACP control plane
src/config/sessions/types.ts- keep
SessionEntry.acpas projection-only compatibility field
- keep
src/gateway/server-methods/sessions.ts- reset/delete/archive must call ACP manager close/unbind transaction path
src/infra/outbound/bound-delivery-router.ts- enforce fail-closed destination behavior for ACP bound session turns
src/discord/monitor/thread-bindings.ts- add ACP stale-binding validation helpers wired to control-plane lookups
src/auto-reply/reply/commands-acp.ts- route spawn/cancel/close/steer through ACP manager APIs
src/agents/acp-spawn.ts- stop ad-hoc metadata writes; call ACP manager spawn transaction
src/plugin-sdk/**and plugin runtime bridge- expose ACP backend registration and health semantics cleanly
src/discord/monitor/message-handler.preflight.ts- keep thread binding override behavior as the canonical session-key resolver
ACP runtime registry API
Add a core registry module:src/acp/runtime/registry.ts
requireAcpRuntimeBackendthrows a typed ACP backend missing error when unavailable- plugin service registers backend on
startand unregisters onstop - runtime lookups are read-only and process-local
acpx runtime plugin contract (implementation detail)
For the first production backend (extensions/acpx), OpenClaw and acpx are
connected with a strict command contract:
- backend id:
acpx - plugin service id:
acpx-runtime - runtime handle encoding:
runtimeSessionName = acpx:v1:<base64url(json)> - encoded payload fields:
name(acpx named session; uses OpenClawsessionKey)agent(acpx agent command)cwd(session workspace root)mode(persistent | oneshot)
- ensure session:
acpx --format json --json-strict --cwd <cwd> <agent> sessions ensure --name <name>
- prompt turn:
acpx --format json --json-strict --cwd <cwd> <agent> prompt --session <name> --file -
- cancel:
acpx --format json --json-strict --cwd <cwd> <agent> cancel --session <name>
- close:
acpx --format json --json-strict --cwd <cwd> <agent> sessions close <name>
- OpenClaw consumes ndjson events from
acpx --format json --json-strict text=>text_delta/outputthought=>text_delta/thoughttool_call=>tool_calldone=>doneerror=>error
Session schema patch
PatchSessionEntry in src/config/sessions/types.ts:
SessionEntry.acp?: SessionAcpMeta
- phase A: dual-write (
acpprojection + ACP SQLite source-of-truth) - phase B: read-primary from ACP SQLite, fallback-read from legacy
SessionEntry.acp - phase C: migration command backfills missing ACP rows from valid legacy entries
- phase D: remove fallback-read and keep projection optional for UX only
- legacy fields (
cliSessionIds,claudeCliSessionId) remain untouched
Error contract
Add stable ACP error codes and user-facing messages:ACP_BACKEND_MISSING- message:
ACP runtime backend is not configured. Install and enable the acpx runtime plugin.
- message:
ACP_BACKEND_UNAVAILABLE- message:
ACP runtime backend is currently unavailable. Try again in a moment.
- message:
ACP_SESSION_INIT_FAILED- message:
Could not initialize ACP session runtime.
- message:
ACP_TURN_FAILED- message:
ACP turn failed before completion.
- message:
- return actionable user-safe message in-thread
- log detailed backend/system error only in runtime logs
- never silently fall back to normal LLM path when ACP routing was explicitly selected
Duplicate delivery arbitration
Single routing rule for ACP bound turns:- if an active thread binding exists for the target ACP session and requester context, deliver only to that bound thread
- do not also send to parent channel for the same turn
- if bound destination selection is ambiguous, fail closed with explicit error (no implicit parent fallback)
- if no active binding exists, use normal session destination behavior
Observability and operational readiness
Required metrics:- ACP spawn success/failure count by backend and error code
- ACP run latency percentiles (queue wait, runtime turn time, delivery projection time)
- ACP actor restart count and restart reason
- stale-binding detection count
- idempotency replay hit rate
- Discord delivery retry and rate-limit counters
- structured logs keyed by
sessionKey,runId,backend,threadId,idempotencyKey - explicit state transition logs for session and run state machines
- adapter command logs with redaction-safe arguments and exit summary
/acp sessionsincludes state, active run, last error, and binding status/acp doctor(or equivalent) validates backend registration, store health, and stale bindings
Config precedence and effective values
ACP enablement precedence:- account override:
channels.discord.accounts.<id>.threadBindings.spawnAcpSessions - channel override:
channels.discord.threadBindings.spawnAcpSessions - global ACP gate:
acp.enabled - dispatch gate:
acp.dispatch.enabled - backend availability: registered backend for
acp.backend
- when ACP is configured (
acp.enabled=true,acp.dispatch.enabled=true, oracp.backend=acpx), plugin auto-enable marksplugins.entries.acpx.enabled=trueunless denylisted or explicitly disabled
min(session ttl, discord thread binding ttl, acp runtime ttl)
Test map
Unit tests:src/acp/runtime/registry.test.ts(new)src/auto-reply/reply/dispatch-from-config.acp.test.ts(new)src/infra/outbound/bound-delivery-router.test.ts(extend ACP fail-closed cases)src/config/sessions/types.test.tsor nearest session-store tests (ACP metadata persistence)
src/discord/monitor/reply-delivery.test.ts(bound ACP delivery target behavior)src/discord/monitor/message-handler.preflight*.test.ts(bound ACP session-key routing continuity)- acpx plugin runtime tests in backend package (service register/start/stop + event normalization)
src/gateway/server.sessions.gateway-server-sessions-a.e2e.test.ts(extend ACP reset/delete lifecycle coverage)- ACP thread turn roundtrip e2e for spawn, message, stream, cancel, unfocus, restart recovery
Rollout guard
Add independent ACP dispatch kill switch:acp.dispatch.enableddefaultfalsefor first release- when disabled:
- ACP spawn/focus control commands may still bind sessions
- ACP dispatch path does not activate
- user receives explicit message that ACP dispatch is disabled by policy
- after canary validation, default can be flipped to
truein a later release
Command and UX plan
New commands
/acp spawn <agent-id> [--mode persistent|oneshot] [--thread auto|here|off]/acp cancel [session]/acp steer <instruction>/acp close [session]/acp sessions
Existing command compatibility
/focus <sessionKey>continues to support ACP targets/unfocuskeeps current semantics/session idleand/session max-agereplace the old TTL override
Phased rollout
Phase 0 ADR and schema freeze
- ship ADR for ACP control-plane ownership and adapter boundaries
- freeze DB schema (
acp_sessions,acp_runs,acp_bindings,acp_events,acp_delivery_checkpoint,acp_idempotency) - define stable ACP error codes, event contract, and state-transition guards
Phase 1 Control-plane foundation in core
- implement
AcpSessionManagerand per-session actor runtime - implement ACP SQLite store and transaction helpers
- implement idempotency store and replay helpers
- implement event append + delivery checkpoint modules
- wire spawn/cancel/close APIs to manager with transactional guarantees
Phase 2 Core routing and lifecycle integration
- route thread-bound ACP turns from dispatch pipeline into ACP manager
- enforce fail-closed routing when ACP binding/session invariants fail
- integrate reset/delete/archive/unfocus lifecycle with ACP close/unbind transactions
- add stale-binding detection and optional auto-unbind policy
Phase 3 acpx backend adapter/plugin
- implement
acpxadapter against runtime contract (ensureSession,submit,stream,cancel,close) - add backend health checks and startup/teardown registration
- normalize acpx ndjson events into ACP runtime events
- enforce backend timeouts, process supervision, and restart/backoff policy
Phase 4 Delivery projection and channel UX (Discord first)
- implement event-driven channel projection with checkpoint resume (Discord first)
- coalesce streaming chunks with rate-limit aware flush policy
- guarantee exactly-once final completion message per run
- ship
/acp spawn,/acp cancel,/acp steer,/acp close,/acp sessions
Phase 5 Migration and cutover
- introduce dual-write to
SessionEntry.acpprojection plus ACP SQLite source-of-truth - add migration utility for legacy ACP metadata rows
- flip read path to ACP SQLite primary
- remove legacy fallback routing that depends on missing
SessionEntry.acp
Phase 6 Hardening, SLOs, and scale limits
- enforce concurrency limits (global/account/session), queue policies, and timeout budgets
- add full telemetry, dashboards, and alert thresholds
- chaos-test crash recovery and duplicate-delivery suppression
- publish runbook for backend outage, DB corruption, and stale-binding remediation
Full implementation checklist
- core control-plane modules and tests
- DB migrations and rollback plan
- ACP manager API integration across dispatch and commands
- adapter registration interface in plugin runtime bridge
- acpx adapter implementation and tests
- thread-capable channel delivery projection logic with checkpoint replay (Discord first)
- lifecycle hooks for reset/delete/archive/unfocus
- stale-binding detector and operator-facing diagnostics
- config validation and precedence tests for all new ACP keys
- operational docs and troubleshooting runbook
Test plan
Unit tests:- ACP DB transaction boundaries (spawn/bind/enqueue atomicity, cancel, close)
- ACP state-machine transition guards for sessions and runs
- idempotency reservation/replay semantics across all ACP commands
- per-session actor serialization and queue ordering
- acpx event parser and chunk coalescer
- runtime supervisor restart and backoff policy
- config precedence and effective TTL calculation
- core ACP routing branch selection and fail-closed behavior when backend/session is invalid
- fake ACP adapter process for deterministic streaming and cancel behavior
- ACP manager + dispatch integration with transactional persistence
- thread-bound inbound routing to ACP session key
- thread-bound outbound delivery suppresses parent channel duplication
- checkpoint replay recovers after delivery failure and resumes from last event
- plugin service registration and teardown of ACP runtime backend
- spawn ACP with thread, exchange multi-turn prompts, unfocus
- gateway restart with persisted ACP DB and bindings, then continue same session
- concurrent ACP sessions in multiple threads have no cross-talk
- duplicate command retries (same idempotency key) do not create duplicate runs or replies
- stale-binding scenario yields explicit error and optional auto-clean behavior
Risks and mitigations
- Duplicate deliveries during transition
- Mitigation: single destination resolver and idempotent event checkpoint
- Runtime process churn under load
- Mitigation: long lived per session owners + concurrency caps + backoff
- Plugin absent or misconfigured
- Mitigation: explicit operator-facing error and fail-closed ACP routing (no implicit fallback to normal session path)
- Config confusion between subagent and ACP gates
- Mitigation: explicit ACP keys and command feedback that includes effective policy source
- Control-plane store corruption or migration bugs
- Mitigation: WAL mode, backup/restore hooks, migration smoke tests, and read-only fallback diagnostics
- Actor deadlocks or mailbox starvation
- Mitigation: watchdog timers, actor health probes, and bounded mailbox depth with rejection telemetry
Acceptance checklist
- ACP session spawn can create or bind a thread in a supported channel adapter (currently Discord)
- all thread messages route to bound ACP session only
- ACP outputs appear in the same thread identity with streaming or batches
- no duplicate output in parent channel for bound turns
- spawn+bind+initial enqueue are atomic in persistent store
- ACP command retries are idempotent and do not duplicate runs or outputs
- cancel, close, unfocus, archive, reset, and delete perform deterministic cleanup
- crash restart preserves mapping and resumes multi turn continuity
- concurrent thread bound ACP sessions work independently
- ACP backend missing state produces clear actionable error
- stale bindings are detected and surfaced explicitly (with optional safe auto-clean)
- control-plane metrics and diagnostics are available for operators
- new unit, integration, and e2e coverage passes
Addendum: targeted refactors for current implementation (status)
These are non-blocking follow-ups to keep the ACP path maintainable after the current feature set lands.1) Centralize ACP dispatch policy evaluation (completed)
- implemented via shared ACP policy helpers in
src/acp/policy.ts - dispatch, ACP command lifecycle handlers, and ACP spawn path now consume shared policy logic
2) Split ACP command handler by subcommand domain (completed)
src/auto-reply/reply/commands-acp.tsis now a thin router- subcommand behavior is split into:
src/auto-reply/reply/commands-acp/lifecycle.tssrc/auto-reply/reply/commands-acp/runtime-options.tssrc/auto-reply/reply/commands-acp/diagnostics.ts- shared helpers in
src/auto-reply/reply/commands-acp/shared.ts
3) Split ACP session manager by responsibility (completed)
- manager is split into:
src/acp/control-plane/manager.ts(public facade + singleton)src/acp/control-plane/manager.core.ts(manager implementation)src/acp/control-plane/manager.types.ts(manager types/deps)src/acp/control-plane/manager.utils.ts(normalization + helper functions)
4) Optional acpx runtime adapter cleanup
extensions/acpx/src/runtime.tscan be split into:- process execution/supervision
- ndjson event parsing/normalization
- runtime API surface (
submit,cancel,close, etc.) - improves testability and makes backend behavior easier to audit