Health and Recovery for ACS¶
Maintaining high availability and having a robust recovery plan for Azure Communication Services.
graph TD
ServiceHealth[Service Health Monitor] --> Alert[Health Alert]
Alert --> Incident[Incident Response]
Incident --> Recovery[Recovery Strategies]
Recovery --> Failover[Regional Failover]
Recovery --> Backup[Backup Channels]
Recovery --> Retry[Retry Policies] Service Health Monitoring¶
ACS is a global service with data residency in specific regions. Monitor the following for health:
- Service Health Dashboard: Check for Azure-wide outages.
- Resource Health: Check for specific ACS resource issues.
- Diagnostic Settings: Monitor logs for delivery failures.
Incident Response Procedures¶
If a communication channel fails:
- Identify the impacted channel (SMS, Email, Chat, Calling).
- Check the Azure Service Health Dashboard for outages.
- Check your diagnostic logs for API errors or delivery failures.
- Notify your stakeholders and switch to backup communication methods.
Failover Strategies¶
To ensure high availability, consider the following strategies:
- Regional Redundancy: Use multiple ACS resources in different regions.
- Application-Level Failover: Implement logic in your application to switch between ACS resources.
- Backup Communication Channels: Have a backup channel (e.g., if Email fails, send SMS).
Backup Communication Channels¶
| Primary Channel | Backup Channel |
|---|---|
| SMS | Email or Push Notifications |
| SMS or Voice Call | |
| Chat | SMS or Voice Call |
| Calling | PSTN or Chat |