# Observability Operations
This guide covers production observability operations for Container Apps using Log Analytics, Application Insights, and distributed tracing.
## Signals and Alerting Architecture
```mermaid
flowchart TD
    METRICS[Metrics: Requests, ResponseTime, CPU, Memory, Replicas] --> METRIC_ALERTS[Metric alert rules]
    LOGS[Log Analytics queries:<br/>ContainerAppConsoleLogs_CL<br/>ContainerAppSystemLogs_CL] --> LOG_ALERTS[Log search alert rules]
    METRIC_ALERTS --> AG[Action Group]
    LOG_ALERTS --> AG
    AG --> EMAIL[Email]
    AG --> TEAMS[Teams]
    AG --> WEBHOOK[Webhook]
```

## Prerequisites
- Log Analytics workspace connected to the Container Apps environment
- Application Insights configured for application telemetry
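As a sketch, a CPU saturation metric alert wired to an action group might look like the following; the alert name, the 80% (0.8 core in nanocores) threshold, and both resource ID placeholders are illustrative assumptions, not values from this guide:

```shell
# Metric alert on Container App CPU usage, routed to an existing action group.
# Thresholds and names are illustrative; adjust to your SLOs.
az monitor metrics alert create \
  --name "alert-myapp-cpu" \
  --resource-group "$RG" \
  --scopes "<container-app-resource-id>" \
  --condition "avg UsageNanoCores > 800000000" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "<action-group-resource-id>"
```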
## Log Analytics Operations

Identify the workspace connected to the environment:
```shell
az containerapp env show \
  --name "$ENVIRONMENT_NAME" \
  --resource-group "$RG" \
  --query "properties.appLogsConfiguration" \
  --output json
```
Example output (PII scrubbed):
```json
{
  "destination": "log-analytics",
  "logAnalyticsConfiguration": {
    "customerId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  }
}
```
Run a KQL query for recent errors:
```shell
az monitor log-analytics query \
  --workspace "<log-analytics-workspace-id>" \
  --analytics-query "ContainerAppConsoleLogs_CL | where ContainerAppName_s == '$APP_NAME' | where Log_s contains 'ERROR' | limit 50" \
  --output table
```
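System logs can be queried the same way for platform-side events such as restarts and failed probes; the `Log_s` keyword filters below are illustrative and may need adjusting to what your workload actually emits:

```shell
az monitor log-analytics query \
  --workspace "<log-analytics-workspace-id>" \
  --analytics-query "ContainerAppSystemLogs_CL | where ContainerAppName_s == '$APP_NAME' | where Log_s contains 'restart' or Log_s contains 'Unhealthy' | project TimeGenerated, Log_s | limit 50" \
  --output table
```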
## Application Insights Operations

Summarize request telemetry for the app by result code over five-minute bins:
```shell
az monitor app-insights query \
  --app "<app-insights-name>" \
  --resource-group "$RG" \
  --analytics-query "requests | where cloud_RoleName == '$APP_NAME' | summarize count() by resultCode, bin(timestamp, 5m)" \
  --output table
```
Use container logs directly during active incidents:
```shell
az containerapp logs show \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --type console \
  --follow false
```
Track replica and revision health as platform signals:
```shell
az containerapp revision list \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --output json

az containerapp replica list \
  --name "$APP_NAME" \
  --resource-group "$RG" \
  --revision "ca-myapp--0000001" \
  --output json
```
Example revision list output (PII scrubbed):

```json
[
  {
    "name": "ca-myapp--0000001",
    "active": true,
    "trafficWeight": 100,
    "replicas": 1,
    "healthState": "Healthy",
    "runningState": "Running"
  }
]
```

Example replica list output (PII scrubbed):

```json
[
  {
    "name": "ca-myapp--0000001-646779b4c5-bhc2v",
    "properties": {
      "containers": [
        {
          "name": "ca-myapp",
          "ready": true,
          "restartCount": 0,
          "runningState": "Running"
        }
      ],
      "runningState": "Running"
    }
  }
]
```
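During an incident it can help to reduce that JSON to a per-container verdict. The sketch below assumes the replica list output (same field names as the example above) has been saved to `replicas.json`; the file contents here are just the scrubbed example re-used as test data:

```shell
# Assumed input: `az containerapp replica list ... --output json > replicas.json`.
cat > replicas.json <<'EOF'
[{"name": "ca-myapp--0000001-646779b4c5-bhc2v",
  "properties": {"containers": [{"name": "ca-myapp", "ready": true, "restartCount": 0}]}}]
EOF

python3 - <<'EOF'
import json

with open("replicas.json") as f:
    replicas = json.load(f)

# Flag any container that is not ready or has restarted at least once.
for replica in replicas:
    for container in replica["properties"]["containers"]:
        healthy = container["ready"] and container["restartCount"] == 0
        status = "OK" if healthy else "INVESTIGATE"
        print(f'{replica["name"]}/{container["name"]}: {status}')
EOF
```

Containers printed as `INVESTIGATE` are the ones worth cross-checking against the system logs and restart counts above.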
## Distributed Tracing Operations
Confirm trace context propagation across services by querying end-to-end operation IDs in Application Insights.
```shell
az monitor app-insights query \
  --app "<app-insights-name>" \
  --resource-group "$RG" \
  --analytics-query "dependencies | where cloud_RoleName == '$APP_NAME' | project timestamp, operation_Id, target, resultCode | limit 20" \
  --output table
```
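To inspect one end-to-end operation, a single `operation_Id` from the list above can be expanded across the requests and dependencies tables; this union query is a sketch, and `<operation-id>` is a placeholder for an ID taken from your own telemetry:

```shell
az monitor app-insights query \
  --app "<app-insights-name>" \
  --resource-group "$RG" \
  --analytics-query "requests | where operation_Id == '<operation-id>' | union (dependencies | where operation_Id == '<operation-id>') | project timestamp, itemType, name, resultCode | order by timestamp asc" \
  --output table
```

If the union shows requests from only one service, trace context is likely being dropped at a hop.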
## Verification Steps
Check that logs and traces are flowing within expected delay windows.
```shell
az monitor app-insights component show \
  --app "<app-insights-name>" \
  --resource-group "$RG" \
  --output json
```
Example output (PII masked):
```json
{
  "id": "/subscriptions/<subscription-id>/resourceGroups/rg-myapp/providers/microsoft.insights/components/<app-insights-name>",
  "name": "<app-insights-name>",
  "provisioningState": "Succeeded"
}
```
## Observability Decision Matrix
| Signal Type | Best For | Query Surface | Typical Alert Latency |
|---|---|---|---|
| Platform metrics | Fast saturation and availability detection | Azure Monitor metrics | 1-2 minutes |
| Console/system logs | Detailed failure context and root cause hints | Log Analytics (KQL) | 2-5 minutes |
| Distributed traces | Cross-service request path analysis | Application Insights | 2-5 minutes |
**Pair every alert with an investigation query.** For each alert rule, store a companion KQL query that responders can run immediately. This shortens MTTR by removing first-response guesswork.
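A log search alert built with `az monitor scheduled-query create` keeps the query and the alert together; this is a sketch in which the alert name, the 10-error threshold, the `ErrorLogs` placeholder name, and both resource IDs are illustrative, and exact flag spellings may vary by CLI version (check `az monitor scheduled-query create --help`):

```shell
az monitor scheduled-query create \
  --name "alert-myapp-errors" \
  --resource-group "$RG" \
  --scopes "<log-analytics-workspace-resource-id>" \
  --condition "count 'ErrorLogs' > 10" \
  --condition-query ErrorLogs="ContainerAppConsoleLogs_CL | where ContainerAppName_s == '$APP_NAME' | where Log_s contains 'ERROR'" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --action-groups "<action-group-resource-id>"
```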
**Avoid unbounded log volume.** Excessive debug logging can increase ingestion cost and hide actionable events. Use structured JSON logs and severity controls in production.
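As an illustrative sketch of what "structured JSON logs" means in practice, emitting one JSON object per line lets KQL filter by severity (e.g. `| where parse_json(Log_s).level == "error"`); the field names here are assumptions, not a required schema:

```shell
# Emit one structured JSON log record per event on stdout, which Container Apps
# forwards to Log Analytics as the raw string in Log_s.
python3 - <<'EOF'
import json
import sys
from datetime import datetime, timezone

def log(level, message, **fields):
    record = {"ts": datetime.now(timezone.utc).isoformat(),
              "level": level, "message": message, **fields}
    sys.stdout.write(json.dumps(record) + "\n")

log("info", "order accepted", order_id="ord-123")
log("error", "payment failed", order_id="ord-123", reason="timeout")
EOF
```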
## Telemetry Freshness Workflow
```mermaid
sequenceDiagram
    participant App as Container App
    participant LA as Log Analytics
    participant AI as Application Insights
    participant Ops as On-call Engineer
    App->>LA: Console and system logs
    App->>AI: Traces and requests
    Ops->>LA: KQL error and restart checks
    Ops->>AI: End-to-end trace validation
    Ops->>Ops: Decide rollback or mitigation
```

## Troubleshooting
### No logs in workspace
- Confirm environment log configuration points to the expected workspace.
- Check regional alignment between app, environment, and workspace.
- Validate IAM permissions for querying telemetry resources.
### Missing distributed traces
- Verify OpenTelemetry exporter endpoint and connection string settings.
- Ensure incoming requests include trace context headers.
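One way to check the second point is to send a request with an explicit W3C `traceparent` header and then look for that trace ID in Application Insights; `$APP_FQDN` and the hex IDs below are illustrative (the format is `version-traceid-spanid-flags`):

```shell
# Probe the app with a known trace context; if the resulting spans do not
# appear under this trace ID, context propagation is being dropped.
curl -s "https://$APP_FQDN/api/health" \
  -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```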
## Advanced Topics
- Define SLO-based alerts (latency, error rate, saturation).
- Build dashboards combining infra metrics and app traces.
- Use sampling strategies to balance fidelity and telemetry cost.
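As a starting point for SLO-based error-rate alerting, a query like the following surfaces five-minute windows that breach a target; the 1% threshold is an illustrative objective, not a recommendation:

```shell
az monitor app-insights query \
  --app "<app-insights-name>" \
  --resource-group "$RG" \
  --analytics-query "requests | where cloud_RoleName == '$APP_NAME' | summarize errorRate = 100.0 * countif(success == false) / count() by bin(timestamp, 5m) | where errorRate > 1.0" \
  --output table
```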