Enhanced Graceful Shutdown (Drain In-Flight Runs) #80

Open
opened 2026-02-23 10:07:36 +00:00 by ottomata · 0 comments
Owner

Context

Phase 1.9 implemented basic HTTP drain. This issue extends the shutdown sequence so that the process also waits for in-flight job runs to complete before exiting.

Tasks

  • In main.go shutdown sequence, after server.Shutdown(ctx), add:
    1. scheduler.Stop() — stop accepting new scheduled runs
    2. watcher.Stop() — stop accepting new file-triggered runs
    3. pool.Drain() — wait for all in-flight runs to complete (max 5 min timeout)
    4. pool.Close() — shut down pool
    5. db.Close() — close DB pool
  • The drain timeout defaults to 5 min (300 s) and is configurable via SHUTDOWN_DRAIN_TIMEOUT_SECS
  • When the drain starts, log how many in-flight runs are still pending: "Waiting for 3 runs to complete..."
  • If the drain timeout is exceeded, cancel the contexts of the remaining runs and log a warning: "Drain timeout exceeded, 1 runs forcefully terminated"
  • Update Phase 1.9 issue/code to reflect this extended sequence
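
The drain tasks above could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual pool: `runPool`, `Start`, `drainTimeout` and `demo` are hypothetical names, and `Drain` is given a timeout parameter and a return value here even though the issue writes it as `pool.Drain()`. The sketch also assumes every run honours cancellation of its context.

```go
package main

import (
	"context"
	"log"
	"os"
	"strconv"
	"sync"
	"time"
)

// drainTimeout converts the raw SHUTDOWN_DRAIN_TIMEOUT_SECS value into a
// duration, falling back to the 5-minute default on empty or invalid input.
func drainTimeout(raw string) time.Duration {
	secs, err := strconv.Atoi(raw)
	if err != nil || secs <= 0 {
		return 5 * time.Minute
	}
	return time.Duration(secs) * time.Second
}

// runPool is a hypothetical stand-in for the project's worker pool.
type runPool struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	nextID  int
	cancels map[int]context.CancelFunc // one entry per in-flight run
}

func newRunPool() *runPool {
	return &runPool{cancels: make(map[int]context.CancelFunc)}
}

// Start launches run in a goroutine with its own cancellable context.
func (p *runPool) Start(run func(ctx context.Context)) {
	ctx, cancel := context.WithCancel(context.Background())
	p.mu.Lock()
	id := p.nextID
	p.nextID++
	p.cancels[id] = cancel
	p.mu.Unlock()

	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		run(ctx)
		p.mu.Lock()
		delete(p.cancels, id) // the run is no longer in flight
		p.mu.Unlock()
		cancel()
	}()
}

// Drain waits up to timeout for in-flight runs. On timeout it cancels the
// contexts of the remaining runs and returns how many were cancelled.
func (p *runPool) Drain(timeout time.Duration) int {
	p.mu.Lock()
	log.Printf("Waiting for %d runs to complete...", len(p.cancels))
	p.mu.Unlock()

	done := make(chan struct{})
	go func() { p.wg.Wait(); close(done) }()

	select {
	case <-done:
		return 0
	case <-time.After(timeout):
	}

	p.mu.Lock()
	forced := len(p.cancels)
	for _, cancel := range p.cancels {
		cancel()
	}
	p.mu.Unlock()
	log.Printf("Drain timeout exceeded, %d runs forcefully terminated", forced)
	return forced
}

// demo drains one pool whose run finishes at once and one whose run blocks
// until cancelled, returning the forced-termination count for each.
func demo() (clean, stuck int) {
	fast := newRunPool()
	fast.Start(func(ctx context.Context) {})
	clean = fast.Drain(time.Second)

	slow := newRunPool()
	slow.Start(func(ctx context.Context) { <-ctx.Done() })
	stuck = slow.Drain(50 * time.Millisecond)
	return clean, stuck
}

func main() {
	timeout := drainTimeout(os.Getenv("SHUTDOWN_DRAIN_TIMEOUT_SECS"))
	log.Printf("drain timeout: %s", timeout)
	clean, stuck := demo()
	log.Printf("force-cancelled: fast pool %d, slow pool %d", clean, stuck)
}
```

Giving each run its own `context.CancelFunc` lets the pool cancel only the runs still pending at the deadline. Returning the count makes it easy to log and to assert on in tests.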

Acceptance Criteria

  • SIGTERM waits for in-flight runs up to the configured timeout
  • After the timeout, remaining runs are force-cancelled
  • Scheduler and file watcher stop accepting new work before drain begins
  • Full shutdown sequence is logged step by step
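
One possible way to satisfy the ordering and step-by-step logging criteria is to express the sequence as an ordered list of named steps. The sketch below is an assumption-laden illustration: the `step` type and `runShutdown` helper are hypothetical, and the step bodies are stubs standing in for the real `server.Shutdown(ctx)`, `scheduler.Stop()`, `watcher.Stop()`, `pool.Drain()`, `pool.Close()` and `db.Close()` calls named in the issue.

```go
package main

import "log"

// step is one stage of the shutdown sequence (hypothetical helper type).
type step struct {
	name string
	run  func() error
}

// runShutdown executes the steps in order, logging each one. A failing step
// is logged and recorded, but the remaining steps still run so that later
// resources are released anyway. It returns the names of the failed steps.
func runShutdown(steps []step) (failed []string) {
	for i, s := range steps {
		log.Printf("shutdown %d/%d: %s", i+1, len(steps), s.name)
		if err := s.run(); err != nil {
			log.Printf("shutdown %d/%d: %s failed: %v", i+1, len(steps), s.name, err)
			failed = append(failed, s.name)
		}
	}
	return failed
}

func main() {
	// Stub standing in for the real calls; replace with the actual components.
	ok := func() error { return nil }

	steps := []step{
		{"server.Shutdown", ok}, // stop serving HTTP
		{"scheduler.Stop", ok},  // no new scheduled runs
		{"watcher.Stop", ok},    // no new file-triggered runs
		{"pool.Drain", ok},      // wait for in-flight runs
		{"pool.Close", ok},      // shut down pool
		{"db.Close", ok},        // close DB pool
	}
	if failed := runShutdown(steps); len(failed) > 0 {
		log.Printf("shutdown finished with errors in: %v", failed)
	}
}
```

In a real `main.go` this would run only after SIGTERM is received, for example once a context created with `signal.NotifyContext` is cancelled. Keeping the order in one slice makes it straightforward to check that the scheduler and watcher stop before the drain begins.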
ottomata added this to the Phase 8 project 2026-02-23 10:09:19 +00:00
Reference
ottomata/acsm#80