PTY and Process Supervision Plan
1. Problem and goal
We need one reliable lifecycle for long-running command execution across:
- `exec` foreground runs
- `exec` background runs
- `process` follow-up actions (`poll`, `log`, `send-keys`, `paste`, `submit`, `kill`, `remove`)
- CLI agent runner subprocesses
2. Scope and boundaries
- Keep implementation internal in `src/process/supervisor`.
- Do not create a new package for this.
- Keep compatibility with current behavior where practical.
- Do not broaden scope to terminal replay or tmux-style session persistence.
3. Implemented in this branch
Supervisor baseline already present
- Supervisor module is in place under `src/process/supervisor/*`.
- Exec runtime and CLI runner are already routed through supervisor spawn and wait.
- Registry finalization is idempotent.
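As an illustration of what idempotent finalization means here, the following is a minimal sketch, not the code in `src/process/supervisor/registry.ts`. The names `RunRegistry`, `RunRecord`, and `FinalizeInput` are hypothetical. It shows the first finalize call winning and later calls leaving the record unchanged.

```typescript
// Hypothetical shapes for illustration only; the real registry may differ.
type RunState = "running" | "exited";

interface RunRecord {
  runId: string;
  state: RunState;
  exitCode: number | null;
  reason?: string;
}

interface FinalizeInput {
  exitCode: number | null;
  reason: string;
}

class RunRegistry {
  private readonly records = new Map<string, RunRecord>();

  add(runId: string): RunRecord {
    const record: RunRecord = { runId, state: "running", exitCode: null };
    this.records.set(runId, record);
    return record;
  }

  // The first call records the outcome. Any later call (for example a timeout
  // racing with a normal exit) returns the stored record without changing it.
  finalize(
    runId: string,
    input: FinalizeInput,
  ): { record: RunRecord; firstFinalize: boolean } | null {
    const record = this.records.get(runId);
    if (!record) return null;
    if (record.state === "exited") return { record, firstFinalize: false };
    record.state = "exited";
    record.exitCode = input.exitCode;
    record.reason = input.reason;
    return { record, firstFinalize: true };
  }
}
```

With this shape, exit handlers, timeout handlers, and manual cancellation can all call `finalize` without coordinating with each other.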
This pass completed
- Explicit PTY command contract
  - `SpawnInput` is now a discriminated union in `src/process/supervisor/types.ts`.
  - PTY runs require `ptyCommand` instead of reusing generic `argv`.
  - Supervisor no longer rebuilds PTY command strings from argv joins in `src/process/supervisor/supervisor.ts`.
  - Exec runtime now passes `ptyCommand` directly in `src/agents/bash-tools.exec-runtime.ts`.
- Process layer type decoupling
  - Supervisor types no longer import `SessionStdin` from agents.
  - Process-local stdin contract lives in `src/process/supervisor/types.ts` (`ManagedRunStdin`).
  - Adapters now depend only on process-level types:
    - `src/process/supervisor/adapters/child.ts`
    - `src/process/supervisor/adapters/pty.ts`
- Process tool lifecycle ownership improvement
  - `src/agents/bash-tools.process.ts` now requests cancellation through supervisor first.
  - `process kill/remove` now use process-tree fallback termination when supervisor lookup misses.
  - `remove` keeps deterministic remove behavior by dropping running session entries immediately after termination is requested.
- Single source watchdog defaults
  - Added shared defaults in `src/agents/cli-watchdog-defaults.ts`.
  - `src/agents/cli-backends.ts` consumes the shared defaults.
  - `src/agents/cli-runner/reliability.ts` consumes the same shared defaults.
- Dead helper cleanup
  - Removed unused `killSession` helper path from `src/agents/bash-tools.shared.ts`.
- Direct supervisor path tests added
  - Added `src/agents/bash-tools.process.supervisor.test.ts` to cover kill and remove routing through supervisor cancellation.
- Reliability gap fixes completed
  - `src/agents/bash-tools.process.ts` now falls back to real OS-level process termination when supervisor lookup misses.
  - `src/process/supervisor/adapters/child.ts` now uses process-tree termination semantics for default cancel/timeout kill paths.
  - Added shared process-tree utility in `src/process/kill-tree.ts`.
- PTY contract edge-case coverage added
  - Added `src/process/supervisor/supervisor.pty-command.test.ts` for verbatim PTY command forwarding and empty-command rejection.
  - Added `src/process/supervisor/adapters/child.test.ts` for process-tree kill behavior in child adapter cancellation.
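To make "process-tree termination" concrete, here is a minimal sketch of the idea: signal a process and every descendant, not only the direct child. It is not the contents of `src/process/kill-tree.ts`. The names `ProcessInfo`, `collectTree`, `killTree`, and the injected `kill` callback are assumptions made so the logic stays testable without spawning real processes. An actual implementation might instead rely on process groups or a platform tool.

```typescript
// Hypothetical snapshot row of the OS process table.
interface ProcessInfo {
  pid: number;
  ppid: number;
}

type Signal = "SIGTERM" | "SIGKILL";
type KillFn = (pid: number, signal: Signal) => void;

// Returns rootPid plus all descendants in post-order, so that children are
// signalled before their parents and cannot be re-parented mid-walk.
function collectTree(rootPid: number, table: ProcessInfo[]): number[] {
  const childrenOf = new Map<number, number[]>();
  for (const { pid, ppid } of table) {
    const list = childrenOf.get(ppid) ?? [];
    list.push(pid);
    childrenOf.set(ppid, list);
  }
  const order: number[] = [];
  const seen = new Set<number>();
  const visit = (pid: number): void => {
    if (seen.has(pid)) return; // guard against malformed tables with cycles
    seen.add(pid);
    for (const child of childrenOf.get(pid) ?? []) visit(child);
    order.push(pid);
  };
  visit(rootPid);
  return order;
}

// Signals every process in the tree and returns the PIDs that were signalled.
// A throwing kill (for example the process already exited) is skipped.
function killTree(
  rootPid: number,
  table: ProcessInfo[],
  kill: KillFn,
  signal: Signal = "SIGKILL",
): number[] {
  const signalled: number[] = [];
  for (const pid of collectTree(rootPid, table)) {
    try {
      kill(pid, signal);
      signalled.push(pid);
    } catch {
      // Process is already gone; nothing left to terminate.
    }
  }
  return signalled;
}
```

Injecting `kill` keeps the traversal deterministic under test. In a Node.js runtime the callback could delegate to `process.kill(pid, signal)`.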
4. Remaining gaps and decisions
Reliability status
The two required reliability gaps for this pass are now closed:
- `process kill/remove` now has a real OS termination fallback when supervisor lookup misses.
- child cancel/timeout now uses process-tree kill semantics for default kill path.
- Regression tests were added for both behaviors.
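The supervisor-first cancellation with OS-level fallback described above can be sketched as follows. This is an illustrative outline, not the code in `src/agents/bash-tools.process.ts`. `SupervisorLike`, `ProcessSession`, `terminateSession`, and `removeSession` are hypothetical names, and the assumption that `cancel` returns `false` on a lookup miss is ours.

```typescript
// Hypothetical supervisor surface: cancel() returns false when the run is unknown.
interface SupervisorLike {
  cancel(runId: string, reason: string): boolean;
}

// Hypothetical session entry tracked by the process tool.
interface ProcessSession {
  id: string;
  pid?: number;
}

type TerminationPath = "supervisor" | "kill-tree" | "none";

// Ask the supervisor first; fall back to process-tree termination on a miss.
function terminateSession(
  session: ProcessSession,
  supervisor: SupervisorLike,
  killTree: (pid: number) => void,
): TerminationPath {
  if (supervisor.cancel(session.id, "manual-cancel")) return "supervisor";
  if (session.pid !== undefined) {
    killTree(session.pid);
    return "kill-tree";
  }
  return "none";
}

// remove: request termination, then drop the entry immediately rather than
// waiting for the process to exit, so the result of remove is deterministic.
function removeSession(
  sessions: Map<string, ProcessSession>,
  id: string,
  supervisor: SupervisorLike,
  killTree: (pid: number) => void,
): TerminationPath {
  const session = sessions.get(id);
  if (!session) return "none";
  const path = terminateSession(session, supervisor, killTree);
  sessions.delete(id);
  return path;
}
```

Returning the path taken makes it straightforward for a regression test to assert which branch handled the termination.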
Durability and startup reconciliation
Restart behavior is now explicitly defined as in-memory lifecycle only.
- `reconcileOrphans()` remains a no-op in `src/process/supervisor/supervisor.ts` by design.
- Active runs are not recovered after process restart.
- This boundary is intentional for this implementation pass to avoid partial persistence risks.
Maintainability follow-ups
- `runExecProcess` in `src/agents/bash-tools.exec-runtime.ts` still handles multiple responsibilities and can be split into focused helpers in a follow-up.
5. Implementation plan
The implementation pass for required reliability and contract items is complete.
Completed:
- `process kill/remove` fallback real termination
- process-tree cancellation for child adapter default kill path
- regression tests for fallback kill and child adapter kill path
- PTY command edge-case tests under explicit `ptyCommand`
- explicit in-memory restart boundary with `reconcileOrphans()` no-op by design
Optional follow-up (not done in this pass; see Maintainability follow-ups in section 4):
- split `runExecProcess` into focused helpers with no behavior drift
6. File map
Process supervisor
- `src/process/supervisor/types.ts` updated with discriminated spawn input and process local stdin contract.
- `src/process/supervisor/supervisor.ts` updated to use explicit `ptyCommand`.
- `src/process/supervisor/adapters/child.ts` and `src/process/supervisor/adapters/pty.ts` decoupled from agent types.
- `src/process/supervisor/registry.ts` idempotent finalize unchanged and retained.
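For readers unfamiliar with the pattern, a discriminated spawn input with an explicit PTY command might look roughly like the sketch below. Only `SpawnInput`, `ptyCommand`, and `argv` come from this plan. The `mode` discriminant, the other field names, and `resolveCommand` are assumptions for illustration and may not match `src/process/supervisor/types.ts`.

```typescript
// Fields shared by both variants (hypothetical).
interface SpawnBase {
  cwd?: string;
  timeoutMs?: number;
}

// Child-process runs carry an argv array.
interface ChildSpawnInput extends SpawnBase {
  mode: "child"; // assumed discriminant name
  argv: string[];
}

// PTY runs carry the exact command string; there is no argv to re-join.
interface PtySpawnInput extends SpawnBase {
  mode: "pty";
  ptyCommand: string;
}

type SpawnInput = ChildSpawnInput | PtySpawnInput;

type ResolvedCommand =
  | { kind: "argv"; argv: string[] }
  | { kind: "pty"; command: string };

// Narrowing on the discriminant makes it a compile-time error to read
// ptyCommand on a child run or argv on a PTY run.
function resolveCommand(input: SpawnInput): ResolvedCommand {
  if (input.mode === "pty") {
    if (input.ptyCommand.trim() === "") {
      throw new Error("PTY command cannot be empty");
    }
    // Forward verbatim: no quoting, splitting, or joining.
    return { kind: "pty", command: input.ptyCommand };
  }
  if (input.argv.length === 0) {
    throw new Error("argv cannot be empty");
  }
  return { kind: "argv", argv: input.argv };
}
```

The point of the union is that quoting decisions stay with the caller that built the command string, so the supervisor cannot silently alter it.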
Exec and process integration
- `src/agents/bash-tools.exec-runtime.ts` updated to pass PTY command explicitly and keep fallback path.
- `src/agents/bash-tools.process.ts` updated to cancel via supervisor with real process-tree fallback termination.
- `src/agents/bash-tools.shared.ts` removed direct kill helper path.
CLI reliability
- `src/agents/cli-watchdog-defaults.ts` added as shared baseline.
- `src/agents/cli-backends.ts` and `src/agents/cli-runner/reliability.ts` now consume same defaults.
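The single-source pattern for watchdog defaults can be sketched as one frozen constant read by two consumers. Everything below is hypothetical: the field names, the function names, and especially the numeric values, which are placeholders and not the repository's real settings.

```typescript
// Hypothetical shape of the shared defaults.
interface WatchdogDefaults {
  noOutputTimeoutMs: number;
  minTimeoutMs: number;
  maxTimeoutMs: number;
}

// Placeholder values for illustration only.
const CLI_WATCHDOG_DEFAULTS: Readonly<WatchdogDefaults> = Object.freeze({
  noOutputTimeoutMs: 60_000,
  minTimeoutMs: 1_000,
  maxTimeoutMs: 600_000,
});

// Consumer 1 (backend config): start from the shared defaults, apply overrides.
function resolveBackendWatchdog(
  overrides: Partial<WatchdogDefaults> = {},
): WatchdogDefaults {
  return { ...CLI_WATCHDOG_DEFAULTS, ...overrides };
}

// Consumer 2 (runner reliability): clamp a requested timeout to the same bounds.
function clampNoOutputTimeout(requestedMs?: number): number {
  const { noOutputTimeoutMs, minTimeoutMs, maxTimeoutMs } = CLI_WATCHDOG_DEFAULTS;
  const value = requestedMs ?? noOutputTimeoutMs;
  return Math.min(maxTimeoutMs, Math.max(minTimeoutMs, value));
}
```

Because both consumers read the same constant, changing a default in one place cannot leave the other with a stale copy.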
7. Validation run in this pass
Unit tests:
- `pnpm vitest src/process/supervisor/registry.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.pty-command.test.ts`
- `pnpm vitest src/process/supervisor/adapters/child.test.ts`
- `pnpm vitest src/agents/cli-backends.test.ts`
- `pnpm vitest src/agents/bash-tools.exec.pty-cleanup.test.ts`
- `pnpm vitest src/agents/bash-tools.process.poll-timeout.test.ts`
- `pnpm vitest src/agents/bash-tools.process.supervisor.test.ts`
- `pnpm vitest src/process/exec.test.ts`
- `pnpm vitest src/agents/cli-runner.test.ts`
- `pnpm vitest run src/agents/bash-tools.exec.pty-fallback.test.ts src/agents/bash-tools.exec.background-abort.test.ts src/agents/bash-tools.process.send-keys.test.ts`
- Use `pnpm build` (and `pnpm check` for full lint/docs gate) in this repo. Older notes that mention `pnpm tsgo` are obsolete.
8. Operational guarantees preserved
- Exec env hardening behavior is unchanged.
- Approval and allowlist flow is unchanged.
- Output sanitization and output caps are unchanged.
- PTY adapter still guarantees wait settlement on forced kill and listener disposal.
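The last guarantee, that a forced kill always settles `wait` and disposes listeners, can be pictured with the sketch below. It is an assumption-laden outline, not the code in `src/process/supervisor/adapters/pty.ts`. `PtyLike`, `ListenerHandle`, `PtyRun`, and `RunExit` are hypothetical names, and the `onExit` shape is only loosely modeled on listener-handle style PTY libraries.

```typescript
interface RunExit {
  reason: "exit" | "forced-kill";
  exitCode: number | null;
}

// Hypothetical handle returned when subscribing to a PTY event.
interface ListenerHandle {
  dispose(): void;
}

// Hypothetical minimal PTY surface used by this sketch.
interface PtyLike {
  onExit(listener: (exitCode: number) => void): ListenerHandle;
  kill(signal?: string): void;
}

class PtyRun {
  private readonly pty: PtyLike;
  private readonly exitListener: ListenerHandle;
  private readonly waitPromise: Promise<RunExit>;
  private resolveWait: (exit: RunExit) => void = () => {};
  private exit: RunExit | undefined;

  constructor(pty: PtyLike) {
    this.pty = pty;
    this.waitPromise = new Promise<RunExit>((resolve) => {
      this.resolveWait = resolve;
    });
    this.exitListener = pty.onExit((exitCode) =>
      this.settle({ reason: "exit", exitCode }),
    );
  }

  // Resolves exactly once, on natural exit or on forced kill.
  wait(): Promise<RunExit> {
    return this.waitPromise;
  }

  // Synchronous snapshot of the outcome, undefined while still running.
  get settled(): RunExit | undefined {
    return this.exit;
  }

  // Settle even if kill() throws or the exit event never arrives.
  forceKill(): void {
    try {
      this.pty.kill("SIGKILL");
    } catch {
      // The process may already be gone; settlement must still happen.
    }
    this.settle({ reason: "forced-kill", exitCode: null });
  }

  private settle(exit: RunExit): void {
    if (this.exit) return; // first outcome wins
    this.exit = exit;
    this.exitListener.dispose();
    this.resolveWait(exit);
  }
}
```

Settling inside `forceKill` rather than waiting for the exit event is what prevents a caller from hanging on `wait()` when the PTY never reports an exit.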
9. Definition of done
- Supervisor is lifecycle owner for managed runs.
- PTY spawn uses explicit command contract with no argv reconstruction.
- Process layer has no type dependency on agent layer for supervisor stdin contracts.
- Watchdog defaults are single source.
- Targeted unit and e2e tests remain green.
- Restart durability boundary is explicitly documented or fully implemented.
10. Summary
The branch now has a coherent and safer supervision shape:
- explicit PTY contract
- cleaner process layering
- supervisor-driven cancellation path for process operations
- real fallback termination when supervisor lookup misses
- process-tree cancellation for child-run default kill paths
- unified watchdog defaults
- explicit in-memory restart boundary (no orphan reconciliation across restart in this pass)