Put an upper bound on the delay so that a permanently broken endpoint does not stall workers for hours between attempts, and pair backoff with a maximum-attempts ceiling so retries eventually stop.
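As a minimal sketch of a capped delay, assuming exponential backoff (the text does not name a specific schedule): the function name `backoff_delay` and the defaults `base=0.5` and `cap=30.0` seconds are illustrative choices, not values from any particular library.

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return the delay in seconds before retry number `attempt` (0-based).

    The delay doubles with each attempt but never exceeds `cap`.
    `base` and `cap` are assumed defaults; tune them for your system.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so very large attempt numbers cannot overflow a float.
    exponent = min(attempt, 62)
    return min(cap, base * (2 ** exponent))


if __name__ == "__main__":
    for n in range(8):
        print(n, backoff_delay(n))
```

With these defaults the delay reaches the 30-second cap on the seventh attempt and stays there, so the wait between attempts stays bounded no matter how long the endpoint is down.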
Stop after N tries and surface the failure to operators with context. Include the last error code, the truncated body, and the correlation ID so the first responder does not have to reproduce the whole incident from scratch.
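One way to carry that context is to raise a dedicated exception once the attempt ceiling is hit. The sketch below is hypothetical throughout: `RetriesExhausted`, `call_with_retries`, the 200-character truncation limit, and the rule that any status of 500 or above is retryable are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple

MAX_BODY_CHARS = 200  # assumed truncation limit for the body kept in the error


class RetriesExhausted(Exception):
    """Raised when every attempt failed; carries context for the responder."""

    def __init__(self, attempts: int, status: int, body: str, correlation_id: str):
        self.attempts = attempts
        self.status = status
        self.body = body[:MAX_BODY_CHARS]  # keep only a truncated body
        self.correlation_id = correlation_id
        super().__init__(
            f"gave up after {attempts} attempts: status={status} "
            f"correlation_id={correlation_id} body={self.body!r}"
        )


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    correlation_id: str,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a status below 500 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:  # assumption: only 5xx responses are worth retrying
            return status, body
        if attempt < max_attempts - 1:
            # Capped exponential delay between attempts (assumed schedule).
            sleep(min(30.0, 0.5 * 2 ** attempt))
    raise RetriesExhausted(max_attempts, status, body, correlation_id)
```

Passing `sleep` as a parameter is a design choice that keeps the function testable without real waiting; a caller would normally log or alert on the caught `RetriesExhausted` message.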
Send poison messages to a dead-letter queue (DLQ) for manual inspection. Monitor DLQ depth: a flat line at zero might mean nothing is failing, or it might mean dead-lettering or its alerts are misconfigured; a rising line means systemic failure.
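The routing and the depth check can be sketched with an in-memory stand-in for a real broker. Everything here is an assumption for illustration: the `DeadLetterQueue` class, the `consume` helper, and the "strictly increasing over the last three samples" rule in `depth_is_rising` are not taken from any specific queueing system.

```python
from collections import deque
from typing import Callable, Iterable, List


class DeadLetterQueue:
    """In-memory stand-in for a broker's dead-letter queue."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, message: str, reason: str) -> None:
        # Store the failure reason next to the message for manual inspection.
        self._items.append({"message": message, "reason": reason})

    def depth(self) -> int:
        return len(self._items)


def consume(
    messages: Iterable[str],
    handler: Callable[[str], None],
    dlq: DeadLetterQueue,
    max_attempts: int = 3,
) -> None:
    """Try each message up to `max_attempts` times, then dead-letter it."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for message in messages:
        last_error = None
        for _ in range(max_attempts):
            try:
                handler(message)
                break  # handled successfully; move on to the next message
            except Exception as exc:
                last_error = exc
        else:
            # The loop never hit `break`, so every attempt failed.
            dlq.put(message, repr(last_error))


def depth_is_rising(samples: List[int], window: int = 3) -> bool:
    """True if the last `window` depth samples are strictly increasing."""
    recent = samples[-window:]
    return len(recent) == window and all(b > a for a, b in zip(recent, recent[1:]))
```

In a real deployment the depth samples would come from the broker's metrics, and a separate check would confirm that the metric is actually being reported so that a flat zero can be trusted.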