Heartbeat and Status Monitoring
Heartbeat monitoring keeps runtime targets observable.
An environment is considered healthy when it reports within the expected heartbeat window.
Status Semantics
- online: heartbeat received within threshold
- offline: heartbeat stale or missing
Cloud targets are generally continuously available; custom targets depend on agent health and network stability.
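
The online/offline rule above can be sketched as a small function. This is a minimal illustration, not the platform's implementation: the `classify_status` name and the 90-second `HEARTBEAT_THRESHOLD` are assumptions chosen for the example, so substitute the heartbeat window your deployment actually uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed value for illustration only; the real heartbeat window is deployment-specific.
HEARTBEAT_THRESHOLD = timedelta(seconds=90)


def classify_status(
    last_heartbeat: Optional[datetime],
    now: datetime,
    threshold: timedelta = HEARTBEAT_THRESHOLD,
) -> str:
    """Return "online" if a heartbeat arrived within the threshold, else "offline"."""
    if last_heartbeat is None:
        # No heartbeat has ever been recorded: treat as missing.
        return "offline"
    age = now - last_heartbeat
    return "online" if age <= threshold else "offline"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(classify_status(now - timedelta(seconds=30), now))
    print(classify_status(None, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the rule easy to test and avoids mixing timezone-aware and naive timestamps by accident.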
Monitoring Targets
Track these indicators:
- last heartbeat timestamp
- status transition frequency (flapping)
- sync latency
- workflow execution failure rate by environment
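
Of these indicators, flapping is the least obvious to compute. One possible approach, sketched below, is to count status changes among the samples that fall inside a recent time window. The `count_transitions` helper and the `(timestamp, status)` sample format are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# A sample is (timestamp, status), where status is "online" or "offline".
Sample = Tuple[datetime, str]


def count_transitions(samples: List[Sample], window_end: datetime, window: timedelta) -> int:
    """Count status changes among samples in (window_end - window, window_end]."""
    window_start = window_end - window
    ordered = sorted(samples, key=lambda sample: sample[0])
    recent = [status for ts, status in ordered if window_start < ts <= window_end]
    # Compare each status with its successor; every difference is one transition.
    return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
```

A high count over a short window suggests an unstable agent or network path rather than a clean outage.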
Recommended Alert Rules
- alert on offline duration > agreed SLO
- alert on frequent online/offline transitions
- alert when sync fails repeatedly
- alert when execution queue stalls
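
The rules above could be evaluated against a per-environment snapshot roughly as follows. Everything here is an illustrative assumption: the `Snapshot` and `Thresholds` fields, the default threshold values, and the alert names are not taken from the platform and should be replaced with your own SLOs and metric sources.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical per-environment metrics gathered by your monitoring stack."""
    offline_seconds: float          # how long the environment has been offline
    transitions_in_window: int      # online/offline changes in the recent window
    consecutive_sync_failures: int  # sync attempts failed in a row
    queued_jobs: int                # executions waiting in the queue
    queue_idle_seconds: float       # time since the queue last made progress


@dataclass
class Thresholds:
    """Assumed defaults for illustration; tune to your agreed SLOs."""
    offline_slo_seconds: float = 300
    max_transitions: int = 4
    max_sync_failures: int = 3
    queue_stall_seconds: float = 600


def evaluate_alerts(snap: Snapshot, limits: Optional[Thresholds] = None) -> List[str]:
    """Return the names of all alert rules that the snapshot violates."""
    limits = limits or Thresholds()
    alerts = []
    if snap.offline_seconds > limits.offline_slo_seconds:
        alerts.append("offline_beyond_slo")
    if snap.transitions_in_window > limits.max_transitions:
        alerts.append("flapping")
    if snap.consecutive_sync_failures >= limits.max_sync_failures:
        alerts.append("sync_failing")
    # A queue only counts as stalled if work is waiting and nothing has moved.
    if snap.queued_jobs > 0 and snap.queue_idle_seconds > limits.queue_stall_seconds:
        alerts.append("queue_stalled")
    return alerts
```

Returning every violated rule, instead of stopping at the first, lets the alerting layer decide how to group or deduplicate notifications.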
Operational Response Playbook
- Confirm container/agent process state
- Confirm network path to platform API
- Confirm token validity and permission scope
- Confirm host resource availability
- Restart agent if needed and verify recovery
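
The playbook reads naturally as an ordered checklist, so one way to automate the diagnostic part is to run the checks in sequence and report the first one that fails. The sketch below assumes hypothetical check functions. `can_reach` uses a plain TCP connection as a rough stand-in for "network path to platform API", and the host name in the usage comment is a placeholder, not a real endpoint.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A check is a (label, function) pair; the function returns True when the check passes.
Check = Tuple[str, Callable[[], bool]]


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the label of the first failure, or None."""
    for label, check in checks:
        if not check():
            return label
    return None


# Example wiring (placeholder host; supply your own process and token checks):
# checks = [
#     ("network path to platform API", lambda: can_reach("platform.example.com")),
# ]
# print(first_failing_check(checks))
```

Running the checks in playbook order means the reported failure is the earliest step that needs attention, which is usually the right place to start before restarting the agent.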
