Observability in Azure Container Apps¶
Azure Container Apps provides several built-in observability features that help you monitor and diagnose the state of your application throughout its lifecycle.
Data Flow Diagram¶
```mermaid
graph TD
    App[Container App] -->|Stdout/Stderr| ConsoleLogs[Container Console Logs]
    App -->|System Events| SystemLogs[System Logs]
    ConsoleLogs -->|Diagnostic Settings| LAW[Log Analytics Workspace]
    SystemLogs -->|Diagnostic Settings| LAW
    App -->|Push| Metrics[Azure Monitor Metrics]
    LAW -->|Queries| KQL[KQL Queries]
```

Log Types¶
Container Apps separates logs into two main categories:
- Console Logs: These are the `stdout` and `stderr` streams from your containers. They are stored in the `ContainerAppConsoleLogs_CL` table.
- System Logs: These logs are generated by the Container Apps service itself (e.g., scaling events, deployment status). They are stored in the `ContainerAppSystemLogs_CL` table.
Treat these two streams as different evidence sources instead of as interchangeable logs:
- Console logs answer application questions
- Did the code throw an exception?
- Did dependency calls time out?
- Did the container start with the expected configuration?
- System logs answer platform questions
- Did a new revision fail readiness or provisioning?
- Did a scale rule trigger a replica change?
- Did secret resolution, ingress, or environment configuration fail before the app even started?
During incidents, start with system logs when the symptom began right after deployment, scale activity, or secret rotation. Start with console logs when the symptom is request failures, error bursts, or a single revision behaving differently from the rest.
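The routing rule above can be sketched as a small helper. This is a minimal illustration; the symptom labels are informal categories chosen for this sketch, not Azure terminology:

```python
# Sketch: route an incident symptom to the log table to query first.
# The symptom labels below are illustrative, not an official taxonomy.

SYSTEM_FIRST = {"post-deployment", "scale-activity", "secret-rotation"}
CONSOLE_FIRST = {"request-failures", "error-burst", "single-revision-anomaly"}

def first_log_table(symptom: str) -> str:
    """Return the Log Analytics table to check first for a given symptom."""
    if symptom in SYSTEM_FIRST:
        return "ContainerAppSystemLogs_CL"
    if symptom in CONSOLE_FIRST:
        return "ContainerAppConsoleLogs_CL"
    # Unknown symptom: rule out platform causes first, then move to the app.
    return "ContainerAppSystemLogs_CL"

print(first_log_table("post-deployment"))  # ContainerAppSystemLogs_CL
print(first_log_table("error-burst"))      # ContainerAppConsoleLogs_CL
```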
Scaling Metrics¶
Container Apps can scale based on various metrics:
- HTTP Requests: Scale based on the number of concurrent HTTP requests per replica.
- CPU Utilization: Scale when average CPU usage exceeds a threshold.
- Memory Utilization: Scale when average memory usage exceeds a threshold.
- Custom Metrics: Scale based on external triggers, such as Azure Service Bus queue length, through KEDA scalers.
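These rule types map onto the `scale` section of a Container App template. The fragment below is a minimal sketch; the rule names, thresholds, and queue name are invented examples, and a real Service Bus rule also needs authentication configuration:

```yaml
# Illustrative scale section of a Container App template (values are examples).
properties:
  template:
    scale:
      minReplicas: 1
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "50"   # scale out above 50 concurrent requests per replica
        - name: queue-rule
          custom:
            type: azure-servicebus       # KEDA scaler type
            metadata:
              queueName: orders          # hypothetical queue
              messageCount: "100"        # target messages per replica
```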
Configuration Examples¶
Viewing Streamed Logs via CLI¶
To view live logs from a specific container app, use the `az containerapp logs show` command.

```bash
az containerapp logs show \
  --resource-group "my-resource-group" \
  --name "my-container-app" \
  --follow true \
  --format text
```
KQL Query Examples¶
Search Console Logs for Errors¶
Identify application-level errors by searching the console logs.
```kusto
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(1h)
| where Log_s contains "error" or Log_s contains "exception"
| project TimeGenerated, ContainerName_s, Log_s
| order by TimeGenerated desc
```
Monitor Scaling Events¶
Track when and why your container app scaled.
```kusto
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(24h)
| where Type_s == "Scaling"
| project TimeGenerated, Reason_s, Log_s
| order by TimeGenerated desc
```
Find Revision Rollout Problems¶
```kusto
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(24h)
| where Log_s has_any ("revision", "deployment", "provisioning", "failed")
| project TimeGenerated, RevisionName_s, ReplicaName_s, Log_s
| order by TimeGenerated desc
```
Identify Noisy Replicas¶
```kusto
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(1h)
| summarize LogLines=count() by ContainerAppName_s, RevisionName_s, ReplicaName_s
| order by LogLines desc
```
Sample output:
```text
ContainerAppName_s  RevisionName_s     ReplicaName_s              LogLines
------------------  -----------------  -------------------------  --------
payments-api        payments-api--r12  payments-api--r12-6c9d7c9  3120
payments-api        payments-api--r12  payments-api--r12-8m4s1j2  2988
```
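The `summarize LogLines=count() by ...` aggregation above can be mirrored in plain Python, which helps when inspecting exported log records offline. A minimal sketch, assuming records carry the `_CL` table column names shown in the query:

```python
from collections import Counter

# Sketch: count log lines per (app, revision, replica), mirroring the KQL
# `summarize LogLines=count() by ...` clause. Records are invented examples.
records = [
    {"ContainerAppName_s": "payments-api", "RevisionName_s": "payments-api--r12", "ReplicaName_s": "r12-a"},
    {"ContainerAppName_s": "payments-api", "RevisionName_s": "payments-api--r12", "ReplicaName_s": "r12-a"},
    {"ContainerAppName_s": "payments-api", "RevisionName_s": "payments-api--r12", "ReplicaName_s": "r12-b"},
]

counts = Counter(
    (r["ContainerAppName_s"], r["RevisionName_s"], r["ReplicaName_s"]) for r in records
)

# most_common() sorts descending, like `order by LogLines desc`.
for key, lines in counts.most_common():
    print(*key, lines)
```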
Monitoring Baseline¶
Container Apps monitoring should cover four areas together:
- Ingress and request behavior
- Request volume and latency
- HTTP error spikes
- Revision health
- New revision startup failures
- Replica churn after configuration changes
- Scaling behavior
- KEDA-triggered scale-out and scale-in events
- Saturation before new replicas are added
- Application logs
- Dependency failures
- Crash loops
- Configuration or secret resolution errors
CLI Workflow¶
Show Container App configuration¶
Inspect the current configuration and running status with `az containerapp show`.

```bash
az containerapp show \
  --resource-group "my-resource-group" \
  --name "my-container-app"
```

Sample output (truncated):

```json
{
  "name": "my-container-app",
  "properties": {
    "configuration": {
      "ingress": {
        "external": true,
        "targetPort": 8080
      }
    },
    "latestReadyRevisionName": "my-container-app--r12",
    "runningStatus": "Running"
  }
}
```
View recent revisions¶
```bash
az containerapp revision list \
  --resource-group "my-resource-group" \
  --name "my-container-app" \
  --output table
```
Sample output:
```text
Name                   Active    Replicas    CreatedTime
---------------------  --------  ----------  -------------------------
my-container-app--r12  True      3           2026-04-06T00:22:00+00:00
my-container-app--r11  False     0           2026-04-05T18:10:00+00:00
```
Query logs from the workspace¶
```bash
az monitor log-analytics query \
  --workspace "law-monitoring-prod" \
  --analytics-query "ContainerAppSystemLogs_CL | where TimeGenerated > ago(15m) | take 5" \
  --output table
```
Diagnostic Settings Strategy¶
Container Apps observability is strongest when you deliberately route both log categories to the same Log Analytics workspace that your alerts and workbooks use.
Recommended categories to enable¶
- Always enable:
    - `ContainerAppConsoleLogs`
    - `ContainerAppSystemLogs`
- Keep metrics enabled:
    - `AllMetrics`
This split keeps application evidence and platform evidence queryable without mixing ownership boundaries. Application teams usually act on console logs, while platform teams investigate system logs for revision rollout, scaler, and infrastructure issues.
Create diagnostic settings for both log streams¶
```bash
az monitor diagnostic-settings create \
  --name "diag-containerapp-observability" \
  --resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.App/containerApps/my-container-app" \
  --workspace "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/law-monitoring-prod" \
  --logs '[
    {
      "category": "ContainerAppConsoleLogs",
      "enabled": true
    },
    {
      "category": "ContainerAppSystemLogs",
      "enabled": true
    }
  ]' \
  --metrics '[
    {
      "category": "AllMetrics",
      "enabled": true
    }
  ]'
```
Validate enabled categories¶
```bash
az monitor diagnostic-settings show \
  --name "diag-containerapp-observability" \
  --resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.App/containerApps/my-container-app"
```
Sample output:
```json
{
  "logs": [
    {
      "category": "ContainerAppConsoleLogs",
      "enabled": true
    },
    {
      "category": "ContainerAppSystemLogs",
      "enabled": true
    }
  ],
  "metrics": [
    {
      "category": "AllMetrics",
      "enabled": true
    }
  ],
  "workspaceId": "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/law-monitoring-prod"
}
```
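The validation step can be scripted so a pipeline fails fast when a category is missing. A minimal sketch, assuming the JSON shape returned by `az monitor diagnostic-settings show` as shown above:

```python
import json

# Sketch: verify both log categories are enabled in the diagnostic-settings
# JSON. The `raw` payload below is a trimmed example of the CLI output.
raw = """
{
  "logs": [
    {"category": "ContainerAppConsoleLogs", "enabled": true},
    {"category": "ContainerAppSystemLogs", "enabled": true}
  ],
  "metrics": [
    {"category": "AllMetrics", "enabled": true}
  ]
}
"""

settings = json.loads(raw)
enabled_logs = {log["category"] for log in settings["logs"] if log["enabled"]}
required = {"ContainerAppConsoleLogs", "ContainerAppSystemLogs"}
missing = required - enabled_logs
print("missing log categories:", sorted(missing))  # prints: missing log categories: []
```

In practice you would pipe the real CLI output into this check instead of the embedded sample.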
Additional KQL for Scale and Dapr Analysis¶
Separate platform failures from app failures¶
Use this query when users report errors after a deployment and you need to determine whether the fault started in the platform or inside the container.
```kusto
let PlatformSignals =
    ContainerAppSystemLogs_CL
    | where TimeGenerated > ago(2h)
    | summarize SystemEvents=count() by bin(TimeGenerated, 10m), ContainerAppName_s;
let AppSignals =
    ContainerAppConsoleLogs_CL
    | where TimeGenerated > ago(2h)
    | where Log_s has_any ("ERROR", "Exception", "timeout", "connection refused")
    | summarize ConsoleErrors=count() by bin(TimeGenerated, 10m), ContainerAppName_s;
PlatformSignals
| join kind=fullouter AppSignals on TimeGenerated, ContainerAppName_s
| project TimeGenerated, ContainerAppName_s, SystemEvents=coalesce(SystemEvents, 0), ConsoleErrors=coalesce(ConsoleErrors, 0)
| order by TimeGenerated desc
```
Sample output:
| TimeGenerated | ContainerAppName_s | SystemEvents | ConsoleErrors | Interpretation |
|---|---|---|---|---|
| 2026-04-06T01:10:00Z | payments-api | 14 | 0 | Platform activity spiked without matching app errors; check revision rollout or scaling. |
| 2026-04-06T01:20:00Z | payments-api | 2 | 37 | App-level failure pattern; inspect code, secrets, or dependencies in console logs. |
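The interpretation column follows a simple heuristic that can be expressed directly. A sketch with illustrative thresholds (tune them per app; they are not platform defaults):

```python
def classify(system_events: int, console_errors: int) -> str:
    """Rough triage heuristic mirroring the joined query's interpretation.

    Thresholds are illustrative assumptions, not Azure-defined rules.
    """
    if console_errors > 0 and console_errors >= system_events:
        return "app"       # inspect code, secrets, or dependencies in console logs
    if system_events > 0:
        return "platform"  # check revision rollout or scaling in system logs
    return "quiet"         # no signal in either stream for this window

print(classify(14, 0))   # platform
print(classify(2, 37))   # app
```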
Track scale decisions and replica churn¶
This query surfaces whether scale activity is concentrated on one app or revision.
```kusto
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(24h)
| where Log_s has_any ("scale", "scaler", "replica", "KEDA")
| summarize ScaleEvents=count(), LatestEvent=max(TimeGenerated) by ContainerAppName_s, RevisionName_s, Reason_s
| order by ScaleEvents desc
```
Sample output:
| ContainerAppName_s | RevisionName_s | Reason_s | ScaleEvents | LatestEvent | Interpretation |
|---|---|---|---|---|---|
| payments-api | payments-api--r12 | HTTP concurrency | 26 | 2026-04-06T01:22:00Z | Normal scale activity under rising request load. |
| order-worker | order-worker--r08 | Service Bus scaler | 18 | 2026-04-06T01:18:00Z | Queue-driven workload is actively scaling. Validate backlog drain rate. |
| email-sender | email-sender--r03 | Revision provisioning | 9 | 2026-04-06T00:54:00Z | Replica churn is tied to revision issues, not customer demand. |
Inspect Dapr sidecar warnings¶
If your environment uses Dapr, service invocation and pub/sub failures frequently appear as sidecar or component-related messages in system and console logs.
```kusto
union isfuzzy=true
    (ContainerAppSystemLogs_CL | project TimeGenerated, ContainerAppName_s, RevisionName_s, Source="system", Message=Log_s),
    (ContainerAppConsoleLogs_CL | project TimeGenerated, ContainerAppName_s, RevisionName_s, Source="console", Message=Log_s)
| where TimeGenerated > ago(6h)
| where Message has_any ("dapr", "sidecar", "pubsub", "state store", "service invocation")
| order by TimeGenerated desc
```
Sample output:
| TimeGenerated | ContainerAppName_s | RevisionName_s | Source | Message | Interpretation |
|---|---|---|---|---|---|
| 2026-04-06T00:48:00Z | checkout-api | checkout-api--r09 | system | Dapr sidecar failed readiness probe for component pubsub-orders | Dapr sidecar health blocked the revision from stabilizing. |
| 2026-04-06T00:49:00Z | checkout-api | checkout-api--r09 | console | error publishing event via dapr pubsub component | Application code sees the same Dapr issue; check component config and secrets. |
Scaling Metrics and Decision Monitoring¶
Azure Container Apps emits platform metrics to Azure Monitor even when the most detailed explanation of a scale event comes from system logs. Use metrics for thresholds and trends, then use system logs to explain why scaling happened.
Metrics to pin first¶
- Replica count
- Detect whether the app is stuck at min replicas or oscillating rapidly.
- Requests / requests per replica
- Validate whether scale-out kept up with load.
- CPU percentage and memory working set
- Confirm resource pressure before blaming the scaler.
- Ingress response status distribution
- Correlate scale activity with 4xx or 5xx bursts.
Metric check from CLI¶
```bash
az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.App/containerApps/my-container-app" \
  --metric "Requests" "Replicas" \
  --interval "PT5M" \
  --aggregation "Average" "Maximum"
```
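Once you have paired samples of requests and replica counts, requests per replica makes scaler lag visible. A sketch with invented 5-minute samples and an illustrative saturation threshold:

```python
# Sketch: compute requests per replica from paired metric samples to spot
# saturation before scale-out. Sample values and the 1000 req/replica
# threshold are invented for illustration.
samples = [
    # (timestamp, requests in window, average replicas)
    ("01:00", 1500, 3),
    ("01:05", 4200, 3),  # load tripled before replica count changed
    ("01:10", 4300, 6),
]

flagged = []
for ts, requests, replicas in samples:
    per_replica = requests / replicas
    if per_replica > 1000:
        flagged.append(ts)
    print(f"{ts}: {per_replica:.0f} req/replica")

print("possible scaler lag at:", flagged)  # prints: possible scaler lag at: ['01:05']
```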
Dapr Integration Observability¶
When Dapr is enabled for Container Apps, add a separate investigation path for the sidecar and component layer:
- Sidecar health
- Review system logs for readiness or startup failures.
- Component reachability
- Look for secret, state store, or pub/sub initialization errors.
- App-to-sidecar interaction
- Search console logs for connection-refused, timeout, or component-not-found messages.
- Business symptom correlation
- Confirm whether retries, duplicate deliveries, or missing events align with Dapr warnings.
If Dapr is not enabled, keep the workbook tabs ready but hidden; the same guide can still be reused across mixed environments.
Practical Alert Examples¶
Alert on revision provisioning failures¶
```bash
az monitor scheduled-query create \
  --name "aca-revision-failed" \
  --resource-group "my-resource-group" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/law-monitoring-prod" \
  --condition "count 'ContainerAppSystemLogs_CL | where TimeGenerated > ago(5m) | where Log_s has_any (\"revision failed\", \"provisioning failed\")' > 0" \
  --description "Azure Container Apps revision provisioning failed" \
  --evaluation-frequency "5m" \
  --window-size "5m" \
  --severity 2 \
  --action-groups "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-app-oncall"
```
Alert on application error bursts in console logs¶
```bash
az monitor scheduled-query create \
  --name "aca-console-errors" \
  --resource-group "my-resource-group" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/law-monitoring-prod" \
  --condition "count 'ContainerAppConsoleLogs_CL | where TimeGenerated > ago(5m) | where Log_s has_any (\"ERROR\", \"Exception\")' > 30" \
  --description "Container App console logs contain elevated error volume" \
  --evaluation-frequency "5m" \
  --window-size "5m" \
  --severity 2 \
  --action-groups "/subscriptions/<subscription-id>/resourceGroups/my-resource-group/providers/Microsoft.Insights/actionGroups/ag-app-oncall"
```
Triage Workflow¶
- Check latest revision status
- Did a new revision just roll out?
- Read system logs
- Are there provisioning, secret, or scaler errors?
- Read console logs
- Are dependencies timing out or is the app crashing?
- Review scaling events
- Did demand exceed the current replica count?
- Validate ingress
- Are HTTP failures concentrated on one revision?
Operational Tips¶
- Keep revision names visible in dashboards so you can compare pre-deployment and post-deployment behavior.
- Separate alerts for system logs and console logs; they signal different owners and different fixes.
- Streaming logs are useful for live investigation, but workspace queries are better for pattern detection and alerting.
- For event-driven workloads, pair Container Apps logs with the upstream trigger metric or queue depth metric.
Workbook Suggestions¶
- Revision timeline with deployment markers
- Replica count and scaling event trend
- Console error rate by revision
- Top messages in system logs for the last 24 hours
- Correlation view between ingress errors and scaling activity
Dashboard and Workbook Recommendations¶
Use the built-in Azure Monitor workbook experience as the primary operator surface, then pin a few high-signal charts to a shared dashboard for NOC or on-call use.
Workbook layout¶
- Overview tab
- Current revision and latest ready revision
- Active replica count
- Request volume and failed request rate
- Scale behavior tab
- Replica count over time
- System log count for scaling reasons
- Requests per replica during peak windows
- Deployment health tab
- Revision rollout events
- Provisioning failures
- Secret or configuration resolution messages
- Dapr tab
- Dapr sidecar warnings
- Component-related failures
- Retry and timeout messages from console logs
Dashboard pins¶
- Pin Replicas and Requests side by side.
- Pin a log tile for `ContainerAppSystemLogs_CL` where `Log_s` contains `failed`.
- Pin a log tile for `ContainerAppConsoleLogs_CL` where `Log_s` contains `Exception`.
- Pin a deployment marker chart so rollout-related incidents are visible without opening Logs.
Common Pitfalls¶
Mistake 1: Alerting only on console logs¶
What happens: Revision provisioning and scaler issues are missed until the app is already unhealthy.
Correction: Create separate alerts for `ContainerAppSystemLogs_CL` and `ContainerAppConsoleLogs_CL`, because platform failures often occur before the application emits any message.
Mistake 2: Using streaming logs as the main historical source¶
What happens: Operators can observe a live issue, but they cannot trend it, correlate it, or alert on it later.
Correction: Use log streaming only for active troubleshooting. Persist the same evidence in Log Analytics through diagnostic settings.
Mistake 3: Ignoring scale decisions when latency increases¶
What happens: Teams blame code changes even though the real issue is scaler lag, min replica limits, or external trigger delay.
Correction: Review replica metrics and scaling-related system logs before concluding that a deployment introduced the slowdown.
Cost Considerations¶
Container Apps costs usually grow from log volume faster than from metrics volume.
- Console log estimate
- A busy API writing 1 KB per request for 500,000 requests per day can ingest roughly 500 MB/day before indexing overhead.
- System log estimate
- System logs are usually much smaller, often tens of MB/day per app, but they become valuable during scale churn or failed revisions.
- Optimization tips
- Reduce chatty info-level application logging in production.
- Keep stack traces for errors, but avoid emitting full request payloads.
- Use metrics for continuous scale trend visualization instead of writing custom heartbeat logs.
- Send Dapr troubleshooting logs only to the needed workspace and control retention according to incident response needs.
When ingestion cost rises unexpectedly, query the noisiest revisions first and determine whether one rollout, one scaler, or one Dapr component introduced the volume increase.
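The console-log estimate above is simple arithmetic worth scripting for capacity reviews. A sketch using the document's numbers plus an assumed per-GB ingestion price (illustrative only; check current Azure Monitor pricing for your region):

```python
# Sketch: estimate Log Analytics ingestion from request volume.
# The price per GB is an assumption for illustration, not a quoted Azure rate.
requests_per_day = 500_000
bytes_per_request = 1024   # ~1 KB of console log per request
price_per_gb = 2.30        # assumed ingestion price, USD/GB

daily_mb = requests_per_day * bytes_per_request / (1024 ** 2)
monthly_gb = daily_mb * 30 / 1024

print(f"~{daily_mb:.0f} MB/day, ~{monthly_gb:.1f} GB/month, "
      f"~${monthly_gb * price_per_gb:.2f}/month at the assumed rate")
```

At 1 KB per request, 500,000 requests/day works out to roughly 488 MB/day, consistent with the "roughly 500 MB/day" estimate above before indexing overhead.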