Heartbeat and Status Monitoring

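Heartbeat monitoring keeps runtime targets observable: each target reports at a regular interval, so a late or missing report is an early sign of trouble.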

An environment is considered healthy when it reports within the expected heartbeat window.


Status Semantics

  • online: the most recent heartbeat arrived within the threshold
  • offline: the most recent heartbeat is older than the threshold, or none has been received

Cloud targets are generally continuously available; custom targets depend on agent health and network stability.
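
The two statuses can be derived from the last heartbeat timestamp alone. The following is a minimal sketch, not the platform's implementation: the function name classify_status and the 90-second threshold are assumptions chosen for illustration, and the real heartbeat window depends on your configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed value for illustration; the real heartbeat window is platform-specific.
HEARTBEAT_THRESHOLD = timedelta(seconds=90)


def classify_status(
    last_heartbeat: Optional[datetime],
    now: datetime,
    threshold: timedelta = HEARTBEAT_THRESHOLD,
) -> str:
    """Return "online" if a heartbeat arrived within the threshold, else "offline"."""
    if last_heartbeat is None:
        return "offline"  # no heartbeat has ever been received
    if now - last_heartbeat <= threshold:
        return "online"
    return "offline"  # heartbeat is stale


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(classify_status(current - timedelta(seconds=30), current))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test.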


Monitoring Targets

Track these indicators:

  • last heartbeat timestamp
  • status transition frequency (flapping)
  • sync latency
  • workflow execution failure rate by environment
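
Status transition frequency can be measured by counting status changes inside a sliding time window. The sketch below assumes status samples are available as chronologically ordered (timestamp, status) pairs; the function names, the 10-minute window, and the limit of 3 transitions are illustrative assumptions, not platform defaults.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

StatusSample = Tuple[datetime, str]  # (observation time, "online" or "offline")


def count_transitions(
    samples: List[StatusSample], now: datetime, window: timedelta
) -> int:
    """Count status changes among samples observed within `window` before `now`.

    `samples` must be in chronological order.
    """
    recent = [status for ts, status in samples if now - window <= ts <= now]
    return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)


def is_flapping(
    samples: List[StatusSample],
    now: datetime,
    window: timedelta = timedelta(minutes=10),  # assumed window
    max_transitions: int = 3,                   # assumed limit
) -> bool:
    """Return True when the target changed status more often than allowed."""
    return count_transitions(samples, now, window) > max_transitions
```

A target that alternates between online and offline every minute would exceed this limit within a few minutes, while a target that goes offline once and stays offline would not.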


Recommended Alert Rules

  • alert when offline duration exceeds the agreed SLO
  • alert when online/offline transitions become frequent (flapping)
  • alert when sync fails repeatedly
  • alert when execution queue stalls
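
The four rules above could be evaluated against a per-environment snapshot along these lines. This is a sketch under stated assumptions: the EnvironmentSnapshot and AlertThresholds types, their field names, the alert labels, and every default threshold are hypothetical and should be replaced with values from your own SLOs.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class EnvironmentSnapshot:
    """Hypothetical summary of one environment's current monitoring state."""
    offline_for: timedelta          # zero while the environment is online
    transitions_in_window: int      # online/offline changes in the recent window
    consecutive_sync_failures: int
    pending_jobs: int               # jobs waiting in the execution queue
    queue_idle_for: timedelta       # time since the queue last made progress


@dataclass(frozen=True)
class AlertThresholds:
    """Assumed defaults for illustration only."""
    offline_slo: timedelta = timedelta(minutes=5)
    max_transitions: int = 3
    max_sync_failures: int = 3
    queue_stall: timedelta = timedelta(minutes=10)


def evaluate_alerts(
    snap: EnvironmentSnapshot, limits: AlertThresholds = AlertThresholds()
) -> List[str]:
    """Return the names of all alert rules that currently fire."""
    alerts = []
    if snap.offline_for > limits.offline_slo:
        alerts.append("offline_exceeds_slo")
    if snap.transitions_in_window > limits.max_transitions:
        alerts.append("flapping")
    if snap.consecutive_sync_failures >= limits.max_sync_failures:
        alerts.append("sync_failing_repeatedly")
    # A queue counts as stalled only if work is waiting and nothing has progressed.
    if snap.pending_jobs > 0 and snap.queue_idle_for > limits.queue_stall:
        alerts.append("queue_stalled")
    return alerts
```

Treating an idle queue with no pending jobs as healthy avoids false alarms during quiet periods.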


Operational Response Playbook

  1. Confirm container/agent process state
  2. Confirm network path to platform API
  3. Confirm token validity and permission scope
  4. Confirm host resource availability
  5. Restart agent if needed and verify recovery
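
The diagnostic steps above can be run as an ordered sequence that stops at the first failure, which points to where investigation should start. The sketch below shows only the ordering logic: run_playbook is a hypothetical helper, and each check is a callable you supply (for example, one that inspects the agent process or tests the network path), since the actual commands depend on your deployment.

```python
from typing import Callable, List, Optional, Tuple

# Each check pairs a label with a function that returns True when the step passes.
Check = Tuple[str, Callable[[], bool]]


def run_playbook(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the label of the first failing check.

    Returns None when every check passes. Later checks are skipped after a
    failure, mirroring the top-to-bottom order of the playbook.
    """
    for label, check in checks:
        if not check():
            return label
    return None
```

If the helper returns None but the environment is still offline, step 5 (restarting the agent and verifying that heartbeats resume) is the remaining action.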
