Runbook

wakeplaned serve --db /var/lib/wakeplane/wakeplane.db --addr :8080

Environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| WAKEPLANE_DB | ./wakeplane.db | SQLite database path |
| WAKEPLANE_ADDR | :8080 | Listen address |
| WAKEPLANE_LOG_LEVEL | info | Log level: debug, info, warn, error |
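
When debugging a misconfigured start, it can help to print the settings a process would inherit. The helper below is a minimal sketch: `effective_config` is a hypothetical name, it assumes each variable falls back to its listed default when unset, and the log-level check is this sketch's own validation rather than documented daemon behavior. How flags such as --db interact with the variables is not covered.

```python
import os

# Defaults copied from the environment variable table.
DEFAULTS = {
    "WAKEPLANE_DB": "./wakeplane.db",
    "WAKEPLANE_ADDR": ":8080",
    "WAKEPLANE_LOG_LEVEL": "info",
}
VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}


def effective_config(env=None):
    """Return the settings a process would see, assuming defaults apply when unset."""
    env = os.environ if env is None else env
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    level = config["WAKEPLANE_LOG_LEVEL"]
    if level not in VALID_LOG_LEVELS:
        # Flag anything outside the documented set of levels.
        raise ValueError(f"unsupported log level: {level!r}")
    return config
```

Calling `effective_config()` with no argument reads the current process environment.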

Confirm the daemon is healthy:

curl http://localhost:8080/health # {"status":"ok"}
curl http://localhost:8080/ready # {"status":"ok"} — loops initialized

| Endpoint | Probe | Returns ok when |
| --- | --- | --- |
| /health | Liveness | Process is running |
| /ready | Readiness | Database open, planner and dispatcher loops initialized |
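
A deploy script usually wants to block until the readiness probe passes. The following is a minimal sketch, assuming the /ready endpoint returns the JSON body shown in the curl example; `fetch_status` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary.

```python
import json
import time
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the "status" field from a probe endpoint, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP errors, malformed JSON, or a non-object body.
        return None


def wait_until_ready(base_url, attempts=30, delay=1.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Poll <base_url>/ready until it reports "ok" or the attempts run out."""
    for attempt in range(attempts):
        if fetch(base_url + "/ready") == "ok":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:8080")` returns True once the daemon reports ready and False after roughly 30 seconds otherwise. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running daemon.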

Normal drain sequence emits structured log lines:

level=info msg="shutting down" phase=run_loop
level=info msg="run loop exited"
level=info msg="dispatcher drained" active_workers=0
level=info msg="store closed"

If shutdown stalls, look for:

level=warn msg="dispatcher shutdown stalled" active_workers=N

This indicates non-cooperative executors, meaning workers that have not exited after shutdown was requested. Extend the shutdown timeout or rely on process supervision to terminate them.
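
To check a captured shutdown log automatically, the lines can be parsed as key=value pairs. This is a sketch that assumes the log format is exactly as in the samples above; `parse_logfmt` and `drain_status` are hypothetical helpers, not part of Wakeplane.

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def drain_status(lines):
    """Classify a shutdown from its log lines: 'clean', 'stalled', or 'incomplete'."""
    status = "incomplete"
    for line in lines:
        msg = parse_logfmt(line).get("msg")
        if msg == "dispatcher shutdown stalled":
            return "stalled"
        if msg == "store closed":
            # The last line of the normal drain sequence.
            status = "clean"
    return status
```

A result of "incomplete" means neither the final "store closed" line nor the stall warning was seen, which suggests the log was cut off or the process was killed mid-drain.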

Metrics are exposed at /metrics in Prometheus text format.

| Metric | Alert threshold | Action |
| --- | --- | --- |
| wakeplane_due_runs | > 100 sustained | Dispatcher may be stalled or overloaded |
| wakeplane_running_runs | > expected concurrency | Check for stuck runs |
| wakeplane_dead_letter_runs_total | > 0 | Investigate failed runs |
| wakeplane_expired_claims_total | growing | Workers may be crashing during execution |
| wakeplane_planner_tick_duration_seconds | > 1s | Database pressure |
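
Without a full Prometheus setup, a single scrape can still be checked against the two fixed thresholds in the table. The sketch below assumes the metrics appear as plain samples in the text format; `parse_metrics` and `check_alerts` are hypothetical helpers. It sums samples that share a metric name, does not handle label values containing a closing brace, and cannot judge "sustained" or "growing" conditions, which need several scrapes over time.

```python
import re

# metric_name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_metrics(text):
    """Parse Prometheus text format into {metric_name: summed_value}."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            value = float(raw)
        except ValueError:
            continue
        totals[name] = totals.get(name, 0.0) + value
    return totals


def check_alerts(totals, due_limit=100):
    """Return alert messages for the fixed thresholds in the metrics table."""
    alerts = []
    if totals.get("wakeplane_due_runs", 0) > due_limit:
        alerts.append("wakeplane_due_runs: dispatcher may be stalled or overloaded")
    if totals.get("wakeplane_dead_letter_runs_total", 0) > 0:
        alerts.append("wakeplane_dead_letter_runs_total: investigate failed runs")
    return alerts
```

The input would typically be the body returned by `curl http://localhost:8080/metrics`.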

Cause: Executor died without completing. Lease expired.
Detection: wakeplane run list --status running shows old runs. wakeplane_expired_claims_total growing.
Recovery: Expired leases are automatically returned to pending by the planner on the next tick.

Cause: Dispatcher claimed a run but did not start the executor.
Detection: wakeplane run list --status claimed shows old runs.
Recovery: Expired claimed leases are also returned to pending automatically.
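
The automatic recovery described for both running and claimed runs can be pictured with a small model. This is an illustration only, not Wakeplane's implementation: the `Run` class, its field names, and `reclaim_expired` are all hypothetical, and the real planner works against the SQLite store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Run:
    id: str
    status: str  # e.g. "pending", "claimed", "running"
    lease_expires_at: Optional[datetime] = None


def reclaim_expired(runs: List[Run], now: datetime) -> List[str]:
    """Return claimed or running runs with an expired lease to pending."""
    reclaimed = []
    for run in runs:
        leased = run.status in ("claimed", "running")
        if leased and run.lease_expires_at is not None and run.lease_expires_at <= now:
            run.status = "pending"
            run.lease_expires_at = None
            reclaimed.append(run.id)
    return reclaimed
```

In this model, a run whose lease is still in the future is left alone, which is why a healthy long-running job is not reclaimed.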

Cause: Runs are exhausting their retry budget.
Detection: wakeplane_dead_letter_runs_total increasing.
Recovery: Investigate run receipts. Fix the underlying target. Optionally re-trigger manually.

Cause: Dispatcher is not keeping up with the planner.
Detection: wakeplane_due_runs increasing steadily.
Recovery: Check dispatcher logs for errors. Check database write latency. Consider reducing concurrency or increasing --tick-interval.

Cause: Multiple writers attempting concurrent SQLite writes.
Detection: "database is locked" errors appear in the logs.
Recovery: Ensure only one wakeplaned process is running against the same database file. Wakeplane uses SetMaxOpenConns(1), which limits a single process to one database connection, but multi-process access is unsupported.
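
One way to enforce the single-process rule from a launcher script is an advisory lock file. The sketch below is an assumption about how an operator might wrap the daemon, not something Wakeplane provides; `acquire_instance_lock` and the lock path are hypothetical, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import os


def acquire_instance_lock(lock_path):
    """Take an exclusive, non-blocking lock; return the open fd, or None if held."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another holder already has the lock.
        os.close(fd)
        return None
    return fd
```

A launcher would call this before starting wakeplaned, refuse to start if it returns None, and keep the returned descriptor open for the lifetime of the daemon. The lock is released when the descriptor is closed or the process exits.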

Cause: Schedule is paused, spec is invalid, or timezone is wrong.
Detection: wakeplane schedule get <name> — check state, next_run_at, and timezone.
Recovery: Correct the schedule spec or timezone. Resume if paused.

# Online backup (safe while daemon is running)
sqlite3 /var/lib/wakeplane/wakeplane.db ".backup /var/lib/wakeplane/wakeplane.bak"
# Verify
sqlite3 /var/lib/wakeplane/wakeplane.bak "PRAGMA integrity_check;"
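
The same backup and verification can be scripted with Python's standard `sqlite3` module, whose `Connection.backup` method uses SQLite's online backup API. This is a minimal sketch; `backup_and_verify` is a hypothetical helper name, and the paths in the usage note are the ones from the commands above.

```python
import sqlite3


def backup_and_verify(src_path, dest_path):
    """Copy src to dest via the online backup API, then integrity-check the copy."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
        rows = dest.execute("PRAGMA integrity_check;").fetchall()
    finally:
        dest.close()
        src.close()
    # A healthy database reports a single row containing "ok".
    return rows == [("ok",)]
```

For example, `backup_and_verify("/var/lib/wakeplane/wakeplane.db", "/var/lib/wakeplane/wakeplane.bak")` returns True when the copy passes the integrity check.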