Operating model for agent fleets: monitoring, observability, on-call, incident response, drift detection, cost control, and continuous improvement.
Stand up observability and alerting for agent behavior
Manage cost, latency, and quality as first-class SLOs
Run an agent ops cadence: reviews, postmortems, retros
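
One way to picture cost, latency, and quality as first-class SLOs is a single check that runs over a window of agent runs and reports which targets were breached. The sketch below is illustrative only: the names AgentRun, SloTargets, and evaluate_slos, the choice of mean cost, p95 latency, and eval pass rate as indicators, and every threshold value are assumptions, not part of any particular platform.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class AgentRun:
    """One completed agent run (hypothetical record shape)."""
    cost_usd: float
    latency_s: float
    passed_eval: bool  # result of whatever quality check the team uses


@dataclass
class SloTargets:
    """Assumed targets; real values depend on the workload."""
    max_mean_cost_usd: float
    max_p95_latency_s: float
    min_pass_rate: float


def p95(values: list[float]) -> float:
    """95th percentile; falls back to the single value for a one-item window."""
    if len(values) == 1:
        return values[0]
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20, method="inclusive")[18]


def evaluate_slos(runs: list[AgentRun], targets: SloTargets) -> list[str]:
    """Return the names of breached SLOs, in the order cost, latency, quality."""
    if not runs:
        raise ValueError("cannot evaluate SLOs over an empty window")

    mean_cost = sum(r.cost_usd for r in runs) / len(runs)
    latency_p95 = p95([r.latency_s for r in runs])
    pass_rate = sum(r.passed_eval for r in runs) / len(runs)

    breaches = []
    if mean_cost > targets.max_mean_cost_usd:
        breaches.append("cost")
    if latency_p95 > targets.max_p95_latency_s:
        breaches.append("latency")
    if pass_rate < targets.min_pass_rate:
        breaches.append("quality")
    return breaches


if __name__ == "__main__":
    # Nine fast, passing runs and one slow, failing run.
    window = [AgentRun(0.10, 2.0, True)] * 9 + [AgentRun(0.10, 30.0, False)]
    targets = SloTargets(max_mean_cost_usd=0.25, max_p95_latency_s=10.0, min_pass_rate=0.95)
    print(evaluate_slos(window, targets))
```

Treating all three indicators in one function keeps them on equal footing: an alert or a weekly review can consume the same list of breaches rather than tracking cost in one place and quality in another.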