Common Azure Functions Anti-Patterns
This reference consolidates high-impact anti-patterns that repeatedly cause Azure Functions incidents across hosting, triggers, security, deployment, and operations. Use it during design review, pre-production checks, and incident retrospectives.
Use with operational runbooks
Pair this page with Troubleshooting Playbooks to translate each anti-pattern into detection and recovery actions.
Why This Matters
| Anti-pattern | Category | Severity | Fix link |
|---|---|---|---|
| Choosing plan by name, not workload | Hosting | High | Fix |
| Stateful functions | Runtime model | High | Fix |
| Non-idempotent message processing | Triggers/Reliability | High | Fix |
| Unbounded scale | Scaling/Cost | High | Fix |
| Connection string secrets in app settings | Security | High | Fix |
| Function keys as primary security | Security/API | High | Fix |
| Blob trigger on Flex without Event Grid | Trigger compatibility | High | Fix |
| Long-running synchronous HTTP | Execution model | Medium | Fix |
| Subnet too small for scale-out | Networking | High | Fix |
| Missing DNS zone link for private endpoints | Networking | High | Fix |
| App Insights without sampling | Monitoring/Cost | Medium | Fix |
| Ignoring poison/dead-letter messages | Reliability | High | Fix |
| Shared storage account for many apps | Storage/Scale | Medium | Fix |
| Cold start surprise on Consumption | Hosting/SLO | Medium | Fix |
| Mutable deployment | Deployment safety | High | Fix |
flowchart LR
A[Wrong hosting plan selection] --> B[Trigger and networking mismatch]
B --> C[Scale and listener instability]
C --> D[Retries and backlog growth]
D --> E[Poison messages or data delay]
E --> F[Incident with cost spike and SLO breach]
Recommended Practices
flowchart TD
A[Start: symptom or design concern] --> B{Main symptom category}
B -->|Cost spike| C{Telemetry-driven?}
C -->|Yes| C1[Check App Insights sampling and storage sharing]
C -->|No| C2[Check unbounded scale and wrong plan baseline]
B -->|"Latency/SLO breach"| D{Cold-path or long request?}
D -->|Cold-path| D1[Check plan fit and cold-start assumptions]
D -->|Long request| D2[Check long-running synchronous HTTP]
B -->|Data inconsistency| E{Duplicate or lost messages?}
E -->|Duplicate| E1[Check non-idempotent processing]
E -->|"Lost/Delayed"| E2[Check poison handling ownership]
B -->|Connectivity failure| F{Private networking enabled?}
F -->|Yes| F1[Check subnet size and DNS zone links]
F -->|No| F2[Check trigger compatibility and storage contention]
Common Mistakes / Anti-Patterns
Anti-pattern: Choosing plan by name, not workload
What: Selecting Premium or Dedicated because it appears "enterprise" without validating trigger profile, networking, and latency needs.
Why it hurts: Teams pay baseline cost or inherit constraints that do not match runtime behavior, then re-platform under pressure.
Fix: Choose plan from workload characteristics: trigger volume, private access requirement, cold-start tolerance, and timeout profile.
| Field | Detail |
|---|---|
| What | Plan is chosen by perceived tier label instead of measured workload needs |
| Impact | Baseline overspend, latency surprises, and forced re-platforming |
| Fix | Select plan by trigger profile, networking constraints, and SLO |
| Severity | High |
Best practice
Review Platform: Hosting together with Operations: Cost Optimization before approving a plan choice.
Anti-pattern: Stateful functions
What: Storing cross-invocation state in memory or local disk and assuming it survives scale events.
Why it hurts: Instance recycling and scale-out create inconsistent behavior and data loss risks.
Fix: Keep handlers stateless; externalize state to durable stores and pass correlation IDs explicitly.
| Field | Detail |
|---|---|
| What | Invocation state is stored in memory or local disk |
| Impact | Data loss risk and inconsistent behavior across scale/restart events |
| Fix | Persist state externally and treat each invocation as independent |
| Severity | High |
Best practice
Treat each invocation as independent and recoverable; use external persistence for checkpoints and progress.
Anti-pattern: Non-idempotent message processing
What: Queue/Event Hub/Service Bus handlers apply side effects without duplicate protection.
Why it hurts: At-least-once delivery and retries can create duplicate writes, duplicate notifications, or double charges.
Fix: Implement idempotency keys, deduplication checks, and safe retry semantics.
| Field | Detail |
|---|---|
| What | Message handlers execute side effects without duplicate protection |
| Impact | Duplicate writes, repeated notifications, and billing errors |
| Fix | Add idempotency keys and deduplication-safe retries |
| Severity | High |
Best practice
Align retry policy with idempotent handler logic and poison workflow ownership in Troubleshooting Playbooks.
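As a minimal sketch of the idempotency-key approach, the handler below claims a message ID before applying a side effect and releases the claim if the side effect fails, so a retry can try again. `InMemoryDedupStore`, `handle_message`, and `apply_side_effect` are hypothetical names for illustration; the in-memory set stands in for a durable store with an atomic claim operation, which a real multi-instance deployment would need.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryDedupStore:
    """Illustrative stand-in for a durable dedup store (assumption).

    A set in process memory is NOT shared across instances; a real store
    would need an atomic "insert if absent" operation.
    """
    _seen: set = field(default_factory=set)

    def claim(self, key: str) -> bool:
        """Return True if the key was unseen and is now claimed."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def release(self, key: str) -> None:
        """Drop a claim so a later retry can process the message."""
        self._seen.discard(key)


def handle_message(message_id: str, payload: dict, store, apply_side_effect) -> bool:
    """Apply the side effect at most once per successfully processed message_id.

    Returns True if the side effect ran, False if the delivery was a duplicate.
    """
    if not store.claim(message_id):
        return False  # duplicate delivery: skip the side effect
    try:
        apply_side_effect(payload)
    except Exception:
        store.release(message_id)  # allow the platform retry to redo the work
        raise
    return True
```

The claim and the side effect are not a single transaction here, so a crash between them can still lose or repeat work; where the downstream system accepts an idempotency key directly, passing the message ID through is usually the safer option.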
Anti-pattern: Unbounded scale
What: Leaving scale limits unconstrained for high-throughput triggers.
Why it hurts: Sudden fan-out can throttle dependencies, increase retries, and amplify spend.
Fix: Set functionAppScaleLimit, or the plan-equivalent maximum instance count, to a value derived from downstream capacity.
| Field | Detail |
|---|---|
| What | Scale-out is unconstrained for queue/event workloads |
| Impact | Runaway spend and downstream throttling cascades |
| Fix | Define max instances and trigger concurrency from dependency budgets |
| Severity | High |
Best practice
Define scale limit and trigger concurrency together; see Platform: Scaling and Best Practices: Cost Optimization.
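One way to derive a maximum instance count from a dependency budget is simple arithmetic, sketched below. The function name, the requests-per-second framing, and the headroom factor are assumptions for illustration; real per-instance throughput should come from load testing.

```python
import math


def max_instances(downstream_rps_budget: float,
                  per_instance_rps: float,
                  headroom: float = 0.8) -> int:
    """Estimate an instance cap that keeps load under a downstream budget.

    downstream_rps_budget: requests/second the dependency can absorb.
    per_instance_rps: measured requests/second one instance can generate.
    headroom: fraction of the budget this app is allowed to consume.
    """
    if downstream_rps_budget <= 0 or per_instance_rps <= 0:
        raise ValueError("budget and per-instance rate must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    allowed = downstream_rps_budget * headroom
    # Never return less than one instance, even for tiny budgets.
    return max(1, math.floor(allowed / per_instance_rps))
```

For example, a dependency budget of 2000 requests per second, 50 requests per second per instance, and 75% headroom gives `max_instances(2000, 50, 0.75) == 30`.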
Anti-pattern: Connection string secrets in app settings
What: Embedding credentials directly in application settings for runtime dependencies.
Why it hurts: Secret sprawl increases rotation failures and leak surface.
Fix: Prefer managed identity and Key Vault references; minimize plaintext secrets.
| Field | Detail |
|---|---|
| What | Credentials are embedded directly in app settings |
| Impact | Secret sprawl, weak rotation hygiene, and expanded leak surface |
| Fix | Use managed identity and Key Vault references |
| Severity | High |
Best practice
Use identity-first access paths and restrict remaining secret exposure to least privilege scopes.
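A review gate for this anti-pattern could be sketched as a small scan over exported app settings. The marker list is a rough heuristic and an assumption, not an exhaustive secret detector; values that use the `@Microsoft.KeyVault(...)` reference syntax are treated as acceptable.

```python
import re

# Matches App Service / Functions Key Vault reference values.
KEY_VAULT_REFERENCE = re.compile(r"^@Microsoft\.KeyVault\(.+\)$")

# Heuristic substrings that often indicate an embedded credential (assumption).
SECRET_MARKERS = ("accountkey=", "sharedaccesskey=", "password=", "pwd=")


def find_plaintext_secrets(app_settings: dict[str, str]) -> list[str]:
    """Return the names of settings that look like plaintext credentials."""
    flagged = []
    for name, value in app_settings.items():
        if KEY_VAULT_REFERENCE.match(value):
            continue  # resolved from Key Vault at runtime, not stored inline
        lowered = value.lower()
        if any(marker in lowered for marker in SECRET_MARKERS):
            flagged.append(name)
    return sorted(flagged)
```

Run against a settings export, it flags entries such as a storage connection string containing `AccountKey=` while leaving Key Vault references and identity-based settings (for example an account name with no key) alone.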
Anti-pattern: Function keys as primary security
What: Relying on function or host keys as full API security boundary.
Why it hurts: Keys are coarse-grained shared secrets and do not provide robust user/service identity controls.
Fix: Place proper authentication and authorization in front (Easy Auth, APIM, JWT validation).
| Field | Detail |
|---|---|
| What | Function keys are treated as full identity and authorization model |
| Impact | Coarse shared-secret model and weak access governance |
| Fix | Enforce identity-aware authn/authz at platform or API layer |
| Severity | High |
Best practice
Keep function keys for operational access control, not end-user identity enforcement.
Anti-pattern: Blob trigger on Flex Consumption without Event Grid
What: Carrying polling-based Blob trigger assumptions over to Flex Consumption (FC1).
Why it hurts: The Blob trigger does not fire as expected on Flex Consumption unless Event Grid-based integration is used.
Fix: Configure Event Grid-sourced blob triggering for FC1 workloads.
| Field | Detail |
|---|---|
| What | Polling blob trigger assumptions are used on Flex Consumption |
| Impact | Events are missed or delayed because trigger model is incompatible |
| Fix | Use Event Grid-based blob triggering on FC1 |
| Severity | High |
Best practice
Validate trigger compatibility before migration and confirm event subscription health after deployment.
Anti-pattern: Long-running synchronous HTTP
What: Holding HTTP requests open for minutes while doing heavy processing.
Why it hurts: Client timeouts, gateway limits, and poor user experience increase failures.
Fix: Use async HTTP pattern (accept + status endpoint), queue handoff, or Durable Functions orchestration.
| Field | Detail |
|---|---|
| What | HTTP request stays open while long compute executes |
| Impact | Timeout failures and poor user/client reliability |
| Fix | Shift to async request pattern with queue/orchestration back-end |
| Severity | Medium |
Best practice
Keep HTTP triggers thin and move long-running work to queue or orchestration pipelines.
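The accept-plus-status-endpoint pattern can be sketched without any framework, as below. `AsyncJobService`, the `/api/status/{id}` route, and the status names are hypothetical; the dictionary and deque stand in for a durable job store and a real queue, since in-process state would not survive scale events.

```python
import uuid
from collections import deque


class AsyncJobService:
    """Framework-free sketch of the async HTTP request pattern."""

    def __init__(self):
        self.jobs = {}        # stand-in for a durable job store (assumption)
        self.queue = deque()  # stand-in for a storage or Service Bus queue

    def accept(self, payload):
        """HTTP entry point: enqueue the work and return 202 immediately."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "Accepted", "result": None}
        self.queue.append((job_id, payload))
        return 202, {"Location": f"/api/status/{job_id}"}, {"id": job_id}

    def process_next(self, do_work):
        """Queue-triggered worker: run the long task outside the request."""
        job_id, payload = self.queue.popleft()
        job = self.jobs[job_id]
        job["status"] = "Running"
        try:
            job["result"] = do_work(payload)
            job["status"] = "Completed"
        except Exception as exc:
            job["status"] = "Failed"
            job["result"] = str(exc)

    def status(self, job_id):
        """Status endpoint: report progress without blocking."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        return 200, {"id": job_id, **job}
```

The client receives the 202 response right away and polls the `Location` URL until the status is `Completed` or `Failed`, which keeps the HTTP request short regardless of how long the work takes.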
Anti-pattern: Subnet too small for scale-out
What: Premium/Flex networking configured on tiny subnets (for example /28 or /29) with no growth headroom.
Why it hurts: Scale-out stalls due to IP exhaustion and private connectivity failures under load.
Fix: Reserve subnet capacity for target peak instances and platform overhead before production.
| Field | Detail |
|---|---|
| What | Networking subnets are undersized for expected scale-out |
| Impact | IP exhaustion blocks scale and breaks private connectivity |
| Fix | Allocate subnet with headroom for peak scale and platform needs |
| Severity | High |
Best practice
Include subnet capacity checks in pre-production load test gates.
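The capacity check can be sketched with the standard `ipaddress` module. Azure reserves five addresses in every subnet; the one-address-per-instance model and the `overhead` parameter are simplifying assumptions, so confirm actual consumption for the plan in use.

```python
import ipaddress

# Azure reserves 5 IP addresses in each subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return the number of addresses left after Azure's reservations."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(0, network.num_addresses - AZURE_RESERVED_PER_SUBNET)


def subnet_fits(cidr: str, peak_instances: int, overhead: int = 0) -> bool:
    """Check whether a subnet can hold the target peak plus overhead.

    Assumes one IP per instance (assumption); `overhead` covers extra
    addresses consumed during scale or upgrade operations.
    """
    return usable_addresses(cidr) >= peak_instances + overhead
```

Under these assumptions a /29 leaves 3 usable addresses and a /28 leaves 11, while a /26 leaves 59, which shows how little room the small ranges give a scale-out target of even a few dozen instances.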
Anti-pattern: Missing DNS zone link for private endpoints
What: Creating private endpoints without correct private DNS zone links to the app VNet.
Why it hurts: Name resolution fails intermittently or permanently, appearing as random connection errors.
Fix: Link required private DNS zones and validate resolution from function runtime subnet.
| Field | Detail |
|---|---|
| What | Private endpoints exist without correct private DNS zone links |
| Impact | Intermittent or persistent name-resolution failures |
| Fix | Link DNS zones and validate name resolution from runtime subnet |
| Severity | High |
Best practice
Add DNS validation to deployment smoke tests for every private dependency.
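A DNS smoke test might look like the sketch below, which checks that a dependency hostname resolves only to private addresses when run from inside the integrated network. The function name and the injectable `resolver` parameter are illustrative choices; `is_private` also accepts loopback and other non-public ranges, so treat a pass as a necessary check rather than proof of a correct zone link.

```python
import ipaddress
import socket


def resolves_privately(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if every resolved address for hostname is private.

    `resolver` is injectable so the check can be unit tested without DNS.
    """
    try:
        infos = resolver(hostname, 443)
    except socket.gaierror:
        return False  # name did not resolve at all
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address. Strip any IPv6 scope suffix.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

Called with a private-endpoint-backed hostname from the function runtime subnet, a `False` result suggests the name is resolving to a public address or not resolving, which is the typical signature of a missing zone link.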
Anti-pattern: App Insights without sampling
What: Collecting all telemetry items at production traffic volumes.
Why it hurts: Ingestion costs grow rapidly and can exceed compute spend.
Fix: Enable sampling and preserve critical telemetry types (especially exceptions).
| Field | Detail |
|---|---|
| What | All telemetry is ingested at production volume without sampling |
| Impact | Observability ingestion cost grows faster than compute |
| Fix | Enable sampling and keep high-value telemetry unsampled |
| Severity | Medium |
Best practice
Configure sampling from day one and revisit after major traffic changes using Best Practices: Cost Optimization.
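The sampling settings live in `host.json` under `logging.applicationInsights.samplingSettings`. The helper below is a sketch that renders that fragment with chosen values; the function name and the example rate are assumptions, and the right telemetry budget depends on traffic and cost targets.

```python
import json

# Telemetry type names accepted by excludedTypes in host.json.
ALLOWED_TYPES = {"Dependency", "Event", "Exception", "PageView", "Request", "Trace"}


def sampling_host_json(max_items_per_second: int, unsampled_types: list[str]) -> str:
    """Render a host.json document with adaptive sampling enabled.

    unsampled_types are excluded from sampling, so they are always kept.
    """
    unknown = set(unsampled_types) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unknown telemetry types: {sorted(unknown)}")
    config = {
        "version": "2.0",
        "logging": {
            "applicationInsights": {
                "samplingSettings": {
                    "isEnabled": True,
                    "maxTelemetryItemsPerSecond": max_items_per_second,
                    # Semicolon-delimited list of types that bypass sampling.
                    "excludedTypes": ";".join(unsampled_types),
                }
            }
        },
    }
    return json.dumps(config, indent=2)
```

Calling `sampling_host_json(20, ["Exception"])` produces a document that samples most telemetry while keeping every exception, matching the guidance to preserve critical telemetry types.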
Anti-pattern: Ignoring poison messages
What: No alerting or operational ownership for poison/dead-letter queues.
Why it hurts: Failed messages accumulate silently and create delayed data loss incidents.
Fix: Alert on poison queue depth/age, define replay workflow, and assign owner.
| Field | Detail |
|---|---|
| What | Poison/dead-letter messages have no alerting or owner |
| Impact | Silent backlog and delayed data-loss incidents |
| Fix | Add alerts, replay runbook, and explicit operational ownership |
| Severity | High |
Best practice
Track poison handling as an SLO with explicit response time targets in Troubleshooting Playbooks.
Anti-pattern: Shared storage account across many function apps
What: Multiple unrelated apps use one AzureWebJobsStorage account.
Why it hurts: Transaction contention, throttling, and noisy-neighbor failure coupling increase.
Fix: Isolate host storage accounts by workload criticality and throughput profile.
| Field | Detail |
|---|---|
| What | Many unrelated function apps share one host storage account |
| Impact | Transaction contention and noisy-neighbor failures |
| Fix | Split host storage by criticality and throughput characteristics |
| Severity | Medium |
Best practice
Treat host storage as control-plane critical infrastructure, not a general shared utility account.
Anti-pattern: Cold start surprise
What: Production launch on Consumption without validating scale-to-zero startup latency impact.
Why it hurts: First-request latency breaches SLO and can cascade into retries/timeouts upstream.
Fix: Set stakeholder expectations, choose Flex/Premium where needed, and design async entry patterns.
| Field | Detail |
|---|---|
| What | Consumption cold-start latency is not validated before launch |
| Impact | First-hit latency breaches and upstream retry amplification |
| Fix | Validate cold path, adjust plan, and use async entry where needed |
| Severity | Medium |
Best practice
Include cold-start behavior in load and user-journey tests before go-live.
Anti-pattern: Mutable deployment
What: Deploying by modifying files in-place rather than mounting immutable artifacts.
Why it hurts: Version drift and partial file states create hard-to-reproduce production failures.
Fix: Use run-from-package (where supported), artifact versioning, and controlled rollback.
| Field | Detail |
|---|---|
| What | Production code is modified in-place during deployment |
| Impact | Version drift and partial-state rollout failures |
| Fix | Deploy immutable artifacts with versioned rollback strategy |
| Severity | High |
Best practice
Apply Best Practices: Deployment for plan-specific immutable release and rollback design.
Validation Checklist
Use this anti-pattern checklist at three control points:
- Architecture/design review.
- Pre-production readiness gate.
- Post-incident corrective action review.
Why this page is cross-cutting
Most Azure Functions incidents are multi-factor: hosting mismatch + trigger semantics + missing operational controls. Reviewing anti-patterns across domains catches these chained failures early.
See Also
- Best Practices Index
- Best Practices: Deployment
- Best Practices: Cost Optimization
- Platform: Hosting
- Platform: Scaling
- Troubleshooting Playbooks