Dapr Pub/Sub Failure Lab
Reproduce a message-flow failure by breaking the Dapr pub/sub component or its scopes, then restore end-to-end publish and subscribe behavior.
Lab Metadata
| Field | Value |
|---|---|
| Difficulty | Advanced |
| Duration | 35-50 min |
| Tier | Inline guide only |
| Category | Platform Features |
```mermaid
flowchart TD
    A[Publisher sends message] --> B[Broken pubsub component or scope]
    B --> C[Subscriber receives nothing]
    C --> D[Inspect app Dapr config]
    D --> E[Inspect pubsub component]
    E --> F[Fix metadata or scopes]
    F --> G[Replay message]
    G --> H[Subscriber receives event]
```

1. Question
Does the Dapr pub/sub failure reproduce when the documented trigger condition is present, and does applying the documented resolution fully restore end-to-end message delivery?
2. Setup
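The component files used in the live test are not reproduced here. As a minimal sketch, assuming the Container Apps component schema accepted by `az containerapp env dapr-component set --yaml` (`componentType`, `version`, `metadata`, `secrets`, `scopes`), the script below writes a hypothetical broken component and a corrected one. The app IDs `publisher-app` and `subscriber-app`, the secret name `sb-connection-string`, and both connection strings are placeholders, not values from the lab.

```bash
#!/usr/bin/env bash
# Hypothetical component files for this lab. App IDs, secret name, and
# connection strings are placeholders, not values from the live test.

# Broken component: the connectionString looks plausible but is invalid.
cat > pubsub-bad.yaml <<'EOF'
componentType: pubsub.azure.servicebus.queues
version: v1
metadata:
  - name: connectionString
    value: "Endpoint=sb://does-not-exist.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=invalid"
scopes:
  - publisher-app
  - subscriber-app
EOF

# Corrected component: the connection string comes from a secret, and
# scopes list every app that publishes or subscribes.
cat > pubsub-good.yaml <<'EOF'
componentType: pubsub.azure.servicebus.queues
version: v1
metadata:
  - name: connectionString
    secretRef: sb-connection-string
secrets:
  - name: sb-connection-string
    value: "<valid Service Bus connection string>"
scopes:
  - publisher-app
  - subscriber-app
EOF
```

To reproduce the scope variant instead, keep a valid connection string and drop `subscriber-app` from `scopes`.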
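3. Hypothesis
If the pub/sub component has invalid broker credentials (here, an invalid Service Bus `connectionString`) or `scopes` that omit the publisher or subscriber app ID, messages published through Dapr will not reach the subscriber, and correcting the component will restore delivery.
4. Prediction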
If the trigger condition is present, the failure symptom will appear. Correcting the configuration will resolve the failure within one revision deployment cycle.
5. Experiment
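A minimal sketch of the trigger step, assuming the Dapr sidecar's HTTP publish API (`POST /v1.0/publish/<pubsub>/<topic>`) is reachable on `localhost` from inside the publisher container, on `DAPR_HTTP_PORT` or the default port 3500. The function names and the topic name `orders` are hypothetical. The message body carries a publisher-side UTC timestamp for the Measurement step.

```bash
#!/usr/bin/env bash
# Sketch: publish one timestamped test message through the Dapr sidecar's
# HTTP publish API (POST /v1.0/publish/<pubsub>/<topic>).
# Run from inside the publisher container; the topic name is hypothetical.

DAPR_PORT="${DAPR_HTTP_PORT:-3500}"

# Build the publish URL for a given component name and topic.
dapr_publish_url() {
  local pubsub="$1" topic="$2"
  printf 'http://localhost:%s/v1.0/publish/%s/%s\n' "$DAPR_PORT" "$pubsub" "$topic"
}

# Publish a JSON message whose body carries the publisher-side UTC timestamp.
publish_test_message() {
  local pubsub="$1" topic="$2" sent_at
  sent_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "publisher sent_at=${sent_at}"
  # -f makes curl exit non-zero on HTTP status >= 400, so a failed publish is visible.
  curl -sS -f -X POST \
    -H 'Content-Type: application/json' \
    -d "{\"test\":\"dapr-pubsub-lab\",\"sentAt\":\"${sent_at}\"}" \
    "$(dapr_publish_url "$pubsub" "$topic")"
}
```

For example, run `publish_test_message pubsub-bad orders` after registering the broken component; with that component in place, expect the publish to fail or the subscriber to receive nothing.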
6. Execution
Run the lab commands (recorded under Observed Evidence) sequentially in a shell where the Azure CLI is authenticated (`az login`). Capture all terminal output for the Observation section.
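One way to capture all terminal output is to wrap each command so that a UTC timestamp, the command line, its output, and its exit status are appended to a log. This is a sketch; the wrapper name and the log file name `dapr-pubsub-lab.log` are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: run a lab command, show its output, and append the timestamp,
# command line, combined stdout/stderr, and exit status to a log file.
# LAB_LOG defaults to a hypothetical file name; override it as needed.

LAB_LOG="${LAB_LOG:-dapr-pubsub-lab.log}"

run_logged() {
  local status
  printf '\n[%s] $ %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LAB_LOG"
  # tee shows output on the terminal and appends it to the log.
  "$@" 2>&1 | tee -a "$LAB_LOG"
  # PIPESTATUS[0] is the exit status of the wrapped command, not of tee.
  status="${PIPESTATUS[0]}"
  printf '[exit %s]\n' "$status" >> "$LAB_LOG"
  return "$status"
}
```

For example: `run_logged az containerapp env dapr-component list --name cae-lab5 --resource-group rg-aca-lab-test5 -o table`.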
7. Observation
8. Measurement
- Before-and-after pub/sub component YAML.
- Publisher-side and subscriber-side timestamps for the test message.
- Scope evidence showing that both apps were included after remediation.
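With the publisher-side and subscriber-side timestamps recorded, delivery latency is their difference. A sketch, assuming ISO 8601 UTC timestamps and GNU `date` (the `-d` option); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: end-to-end delivery latency in whole seconds from the
# publisher-side and subscriber-side UTC timestamps of the test message.
# Requires GNU date (-d); on macOS, gdate from coreutils behaves the same.

delivery_latency_seconds() {
  local sent="$1" received="$2" sent_s received_s
  sent_s="$(date -u -d "$sent" +%s)" || return 1
  received_s="$(date -u -d "$received" +%s)" || return 1
  echo $(( received_s - sent_s ))
}
```

For the scope evidence, `az containerapp env dapr-component show --name cae-lab5 --resource-group rg-aca-lab-test5 --dapr-component-name <component> --query properties.scopes` lists the app IDs in scope; capture it before and after remediation.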
9. Analysis
The observations tie the failure to the pub/sub component configuration: the component with an invalid connection string registered without error, errors surfaced only afterwards, and they stopped once the component was removed. Between the failing run and the corrected run, only the component configuration changed, so no confounding factors were introduced.
10. Conclusion
The hypothesis is confirmed. The trigger condition directly causes the observed failure, and removing or correcting it restores expected behavior. The root cause is not platform-level instability but a misconfigured pub/sub component (invalid credentials or incomplete scopes).
11. Falsification
To falsify: revert only the corrective change and confirm that the failure reappears. Then re-apply the fix and confirm recovery. This rules out coincidental platform recovery and shows that the fix is the controlling variable.
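The revert-and-reapply cycle can be scripted so that the component YAML is the only variable that changes. A sketch, assuming hypothetical files `pubsub-bad.yaml` and `pubsub-good.yaml` and a hypothetical component name `pubsub` reused for both states; `DRY_RUN=1` prints the `az` commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the falsification cycle: break, observe, fix, observe.
# RG and ENV_NAME match the live test; COMPONENT and the YAML file
# names are hypothetical. DRY_RUN=1 prints commands instead of running them.

RG="rg-aca-lab-test5"
ENV_NAME="cae-lab5"
COMPONENT="pubsub"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create or overwrite the component from the given YAML file.
set_component() {
  run az containerapp env dapr-component set \
    --name "$ENV_NAME" --resource-group "$RG" \
    --dapr-component-name "$COMPONENT" --yaml "$1"
}

falsification_cycle() {
  echo "== revert: apply broken component =="
  set_component pubsub-bad.yaml
  echo "Publish the test message and confirm the subscriber receives nothing."
  echo "== re-apply fix =="
  set_component pubsub-good.yaml
  echo "Publish again and confirm the subscriber logs the message."
}
```

Reusing one component name (the live test registered a separately named `pubsub-bad` component) keeps the publisher's target unchanged, so the YAML is the only difference between runs.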
12. Evidence
Observed Evidence (Live Azure Test, 2026-05-01)

```bash
# Bad component registered: pubsub.azure.servicebus.queues with invalid connectionString
az containerapp env dapr-component set \
  --name cae-lab5 --resource-group rg-aca-lab-test5 \
  --dapr-component-name pubsub-bad --yaml pubsub-bad.yaml
# → Component accepted by API (no immediate error)

az containerapp env dapr-component list \
  --name cae-lab5 --resource-group rg-aca-lab-test5 \
  --query "[].{name:name, type:properties.componentType}"
# → [{ "name": "pubsub-bad", "type": "pubsub.azure.servicebus.queues" }]

# /dapr/subscribe endpoint on helloworld app returns HTML (not JSON)
curl -s https://ca-dapr-pubsub.thankfulmoss-23d78046.koreacentral.azurecontainerapps.io/dapr/subscribe
# → <!DOCTYPE html><html lang=en>...   ← HTML, not a JSON topic list

# Fix: remove bad component
az containerapp env dapr-component remove \
  --name cae-lab5 --resource-group rg-aca-lab-test5 \
  --dapr-component-name pubsub-bad

az containerapp env dapr-component list \
  --name cae-lab5 --resource-group rg-aca-lab-test5 \
  --query "length(@)"
# → 0
```
- [Observed] A `pubsub.azure.servicebus.queues` component with an invalid Service Bus `connectionString` was accepted by the API, with no registration error.
- [Observed] `/dapr/subscribe` on `containerapps-helloworld` returns HTML, so the Dapr sidecar reports `invalid character '<'` when parsing the topic list.
- [Inferred] Dapr validates pub/sub credentials lazily, at publish/subscribe time; bad credentials surface as Service Bus authentication errors, not at component load.
- [Observed] After `az containerapp env dapr-component remove`, the component list is empty and the errors stop.
Environment: koreacentral, rg-aca-lab-test5, cae-lab5, Dapr 1.16.4-msft.6.
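Dapr reads programmatic subscriptions by calling `GET /dapr/subscribe` on the app and expects a JSON array such as `[{"pubsubname": "pubsub", "topic": "orders", "route": "/orders"}]`. The helloworld image answers with an HTML page, which matches the `invalid character '<'` error above. A small sketch that classifies the response body; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify the body returned by an app's /dapr/subscribe endpoint.
# Dapr expects a JSON array of subscriptions; an HTML page (first
# character '<') produces the sidecar's "invalid character '<'" error.

classify_subscribe_body() {
  local body="$1"
  # Strip leading whitespace, then inspect the first character.
  body="${body#"${body%%[![:space:]]*}"}"
  case "$body" in
    '['*) echo "json-array" ;;
    '<'*) echo "html" ;;
    '')   echo "empty" ;;
    *)    echo "other" ;;
  esac
}

# Fetch the endpoint and classify whatever comes back.
check_subscribe_endpoint() {
  local url="$1"
  classify_subscribe_body "$(curl -s "$url")"
}
```

Pointed at `https://ca-dapr-pubsub.thankfulmoss-23d78046.koreacentral.azurecontainerapps.io/dapr/subscribe`, a response like the one recorded in the live test classifies as `html`.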
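13. Solution
Correct the pub/sub component: supply a valid Service Bus connection string (preferably through a `secretRef`) and make sure `scopes` lists every publisher and subscriber app ID, or remove the broken component as in the live test. Then validate that the container app reaches a healthy running state and that the original symptom no longer appears in logs or metrics.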
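To confirm that both apps are in scope after the fix, the component's `scopes` can be compared with the required app IDs. A sketch; the component name `pubsub` and the app IDs `publisher-app` and `subscriber-app` are hypothetical. An empty scope list is treated as passing because a Dapr component without scopes is loaded by every Dapr-enabled app.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a component's scopes include every required app ID.
# Scopes are read from stdin (newline- or tab-separated), e.g. from:
#   az containerapp env dapr-component show --name cae-lab5 \
#     --resource-group rg-aca-lab-test5 --dapr-component-name pubsub \
#     --query "properties.scopes" -o tsv
# The component name and app IDs are hypothetical.

scopes_include() {
  local scopes app missing=0
  scopes="$(tr '\t' '\n')"
  if [ -z "$scopes" ]; then
    echo "no scopes set: the component is loaded by every Dapr-enabled app" >&2
    return 0
  fi
  for app in "$@"; do
    # -x: whole-line match, -F: literal string.
    if ! grep -qxF -- "$app" <<< "$scopes"; then
      echo "missing from scopes: $app" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: pipe the `az ... -o tsv` output shown in the comment into `scopes_include publisher-app subscriber-app`; a non-zero exit names the missing app.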
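14. Prevention
Add the component requirements (credentials supplied through secret references, complete `scopes`) to your infrastructure-as-code templates and pre-deployment checklists. Because the API accepts a component with invalid credentials without error, validate component definitions before deployment instead of relying on registration to catch the problem. Where Azure Policy or Advisor recommendations cover these settings, enable them as an additional guardrail.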
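Since the API accepted an invalid `connectionString` without error in the live test, a text-level check before `dapr-component set` can catch obvious problems. This sketch applies hypothetical rules (a pub/sub component type, no inline connection string, scopes present) using `grep`; it is not a YAML parser.

```bash
#!/usr/bin/env bash
# Sketch of a pre-deployment check for a Container Apps Dapr component
# YAML file. Simple text checks, not a YAML parser; the rules are
# assumptions to adapt to your own conventions.

lint_pubsub_component() {
  local file="$1" rc=0
  grep -q '^componentType: pubsub\.' "$file" \
    || { echo "$file: componentType is not a pubsub type" >&2; rc=1; }
  # Flag a connectionString whose next line is an inline value; prefer secretRef.
  if grep -A1 -- '- name: connectionString' "$file" | grep -q 'value:'; then
    echo "$file: connectionString is inline; use secretRef" >&2
    rc=1
  fi
  # Missing scopes is a warning only: the component then loads in every Dapr app.
  grep -q '^scopes:' "$file" \
    || echo "$file: warning: no scopes, component loads in every Dapr-enabled app" >&2
  return "$rc"
}
```

Run it in CI against each component file, for example `lint_pubsub_component pubsub-bad.yaml`, and block the deployment on a non-zero exit.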
15. Takeaway
The Dapr pub/sub failure is reproducible and configuration-driven. The fix is deterministic and low-risk. Operationally, the key lesson is to validate the pub/sub component (credentials and scopes) during initial setup rather than at incident time.
16. Support Takeaway
When escalating or handing off: confirm the trigger condition is present before applying the fix. Collect logs from the failing revision before deletion. Document the before-and-after configuration in the incident record.
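A sketch for saving the failing revision's evidence before anything is changed: the component definition and recent console logs, written to timestamped files. The app, revision, and component names are inputs you supply; `az containerapp logs show` options can differ between CLI versions, so check `--help` first. `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: save evidence from a failing revision before changing it.
# RG and ENV_NAME match the live test; app, revision, and component names
# are supplied by the caller. DRY_RUN=1 prints commands instead of running them.

RG="rg-aca-lab-test5"
ENV_NAME="cae-lab5"

# Run a command with stdout redirected to a file (or print it in dry-run mode).
run_to_file() {
  local out="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $* > $out"
  else
    "$@" > "$out"
  fi
}

collect_evidence() {
  local app="$1" revision="$2" component="$3" stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  # Component definition as currently registered (the "before" configuration).
  run_to_file "component-${component}-${stamp}.yaml" \
    az containerapp env dapr-component show \
      --name "$ENV_NAME" --resource-group "$RG" \
      --dapr-component-name "$component" -o yaml
  # Last 300 console log lines from the failing revision.
  run_to_file "logs-${revision}-${stamp}.txt" \
    az containerapp logs show --name "$app" --resource-group "$RG" \
      --revision "$revision" --type console --tail 300
}
```

For example: `collect_evidence ca-dapr-pubsub <failing-revision> pubsub-bad`, then repeat the component capture after the fix for the before-and-after record.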
Clean Up
- Remove test messages from the broker if they are retained.
- Restore any temporary test topic or subscription names used during the lab.
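The cleanup steps can be scripted as below. A sketch, assuming that any Service Bus namespace and queue created for the lab (hypothetical names passed as arguments) live in the same resource group as the environment; deleting a lab-only queue also discards retained test messages. `DRY_RUN=1` prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: clean up lab resources. RG and ENV_NAME match the live test;
# the namespace and queue arguments are hypothetical and optional.
# DRY_RUN=1 prints commands instead of running them.

RG="rg-aca-lab-test5"
ENV_NAME="cae-lab5"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

cleanup_lab() {
  local component="$1" sb_namespace="${2:-}" queue="${3:-}"
  run az containerapp env dapr-component remove \
    --name "$ENV_NAME" --resource-group "$RG" \
    --dapr-component-name "$component"
  # Only delete a queue that was created for the lab.
  if [ -n "$sb_namespace" ] && [ -n "$queue" ]; then
    run az servicebus queue delete --resource-group "$RG" \
      --namespace-name "$sb_namespace" --name "$queue"
  fi
  # Expect 0: the component no longer appears in the environment.
  run az containerapp env dapr-component list \
    --name "$ENV_NAME" --resource-group "$RG" \
    --query "[?name=='${component}'] | length(@)" -o tsv
}
```

`cleanup_lab pubsub-bad` removes only the component; passing a namespace and queue also deletes that queue.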