Common Anti-Patterns

This document catalogs high-impact App Service anti-patterns that frequently cause outages, security incidents, cost spikes, and difficult troubleshooting cycles. Use it as a review checklist during design, architecture review, and release readiness gates.

How to Use This Guide

Apply this guide at three points:

During architecture and platform design reviews
Before production go-live
During post-incident corrective action planning

Anti-patterns are preventable

Most recurring App Service incidents are caused by a small set of known anti-patterns that can be eliminated early.

Anti-Pattern Detection Flow

flowchart TD
    A[Design or operational change proposed] --> B{Touches production workload?}
    B -- No --> C[Apply standard quality checks]
    B -- Yes --> D[Run anti-pattern review checklist]
    D --> E{Any anti-pattern found?}
    E -- No --> F[Proceed with controlled rollout]
    E -- Yes --> G[Map risk severity and blast radius]
    G --> H[Apply recommended alternative pattern]
    H --> I[Retest and validate]
    I --> J{Residual risk acceptable?}
    J -- Yes --> F
    J -- No --> K[Escalate architecture review]

Anti-Pattern Catalog

Use the following table as a policy baseline.

Category	Pattern	Why It's Bad	What To Do Instead
Configuration	Storing secrets in app settings without Key Vault integration	Secrets are harder to rotate, easier to expose accidentally, and harder to audit	Use Key Vault references with managed identity and a secret rotation runbook
Configuration	Ignoring slot sticky settings	Swap can move environment-specific config into production and break dependencies	Mark app settings and connection strings as slot settings where required
Configuration	Keeping debug mode enabled in production	Increases attack surface, leaks internals, and adds unnecessary overhead	Use production-safe configuration profile with debug disabled
Configuration	Relying on mutable local file writes for state	App restarts, scale-out, or rehydration can lose local state	Externalize state to durable services (Storage, database, cache)
Deployment	Using FTP deployment for production	Manual and non-repeatable deployments, weak traceability, and increased human error	Use CI/CD with versioned artifacts and deployment slots
Deployment	Deploying directly to production slot	No safe validation stage and higher downtime risk during release	Deploy to staging slot, warm-up, validate, then swap
Deployment	Enabling server-side builds inconsistently across releases	Non-deterministic outputs and rollback complexity	Build once in CI, publish immutable artifacts, keep deployment deterministic
Deployment	No rollback plan or tested runbook	Recovery is slow and error-prone under pressure	Document and test swap-back and previous-artifact restore procedures
Networking	Binding app process to 127.0.0.1 instead of 0.0.0.0	App container/process may be unreachable from App Service front-end	Bind to 0.0.0.0 and expected platform port
Networking	Connection-per-request outbound pattern	Causes excessive socket churn and SNAT exhaustion under load	Use connection pooling, keep-alive, and dependency SDK reuse
Networking	Assuming DNS and egress are always stable, with no retry policy	Transient failures become user-facing errors	Add timeouts, retries with jitter, and circuit breaker behavior
Networking	Not planning VNet integration and outbound dependency paths	Leads to late-stage connectivity failures and hard troubleshooting	Design egress paths early and validate dependency reachability pre-go-live
Security	Running production on B1 for critical workloads	Limited scale, weaker isolation characteristics, and insufficient resilience margin	Use production-appropriate Standard/Premium SKU based on SLO and load profile
Security	Using broad-scoped service principals without least privilege	Compromise impact is amplified and audit posture weakens	Use managed identities and granular RBAC at minimum required scope
Security	Not enforcing HTTPS and secure transport defaults	Increases risk of data exposure and mixed-mode security gaps	Enforce HTTPS-only, modern TLS policy, and secure cookie headers
Performance	Single instance in production	Any recycle or failure causes downtime and user-visible errors	Run a minimum of two instances for high availability
Performance	Disabling Always On for production web apps	Cold starts and delayed readiness impact latency and reliability	Enable Always On for continuously serving production apps
Performance	Overusing ARR affinity for stateful sessions	Uneven load and hot instances reduce effective scaling	Externalize session state and disable affinity for stateless services
Monitoring	Not configuring health checks	Platform cannot reliably remove unhealthy instances from rotation	Configure health check path and verify dependency-aware readiness
Monitoring	Not enabling diagnostic logging and App Insights	Limited observability increases mean time to detect and repair	Enable logs, traces, and metric alerts with retention policy
Monitoring	Alerting only on CPU and ignoring latency/error rate	Misses user-impact incidents where CPU appears normal	Add SLO-aligned alerts for p95 latency, error rate, and dependency failures
Monitoring	No deployment annotation in telemetry	Hard to correlate incidents with release events	Emit deployment markers and release metadata into monitoring system

High-Risk Anti-Patterns to Eliminate First

If you cannot fix everything immediately, prioritize these first:

Single instance in production
No health check configuration
Secrets outside Key Vault references
Direct-to-production deployments without slot validation
Connection-per-request outbound calls causing SNAT pressure

Top-five anti-patterns drive disproportionate incidents

Eliminating these five patterns usually produces the biggest reliability and security gains in the shortest time.

Portal view: Diagnose and solve problems

The Diagnose and solve problems blade works as an after-the-fact anti-pattern detection gate: it surfaces the symptoms left behind by the patterns this guide aims to eliminate upfront. In the view shown, the Risk alerts panel reports two critical Availability alerts, which is the platform's own confirmation that this app is hitting at least two anti-patterns from the catalog above. Clicking through shows which ones; the most common are single-instance availability risk, a missing health check, and SNAT or timeout patterns flagged by the platform.

The seven Troubleshooting categories map onto the sections of the anti-pattern table:

Availability and Performance reveals single-instance and missing-health-check anti-patterns
Configuration and Management surfaces slot-setting and EasyAuth misconfigurations
Networking exposes SNAT and outbound dependency anti-patterns
Risk Assessments is the closest platform equivalent to a pre-go-live anti-pattern review

Treat any non-zero Risk alerts count as a release blocker, not as ambient noise.

Category Deep Dive

Configuration Anti-Patterns

Common symptoms:

Unexpected behavior after slot swap
Environment mismatch between staging and production
Leaked credentials in logs or scripts

Remediation baseline:

Configuration inventory with ownership
Slot setting review in each release checklist
Secret source policy (Key Vault by default)

Deployment Anti-Patterns

Common symptoms:

Deployments succeed but app fails after startup
Rollback takes too long due to missing artifact provenance
Frequent hotfixes with unclear change history

Remediation baseline:

Immutable artifact promotion model
Slot-based validation and controlled swap
Rollback rehearsal before high-risk changes
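
One way to back the immutable artifact promotion model is a digest check: record the artifact's SHA-256 when CI builds it, and refuse to promote anything that does not match. The following is a minimal sketch, assuming the artifact is a file on local disk. The function names `artifact_digest` and `verify_before_promote` are hypothetical and not part of any App Service tooling.

```python
import hashlib
import hmac
from pathlib import Path


def artifact_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a build artifact, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_before_promote(path: Path, expected_digest: str) -> None:
    """Raise ValueError if the artifact differs from the digest recorded in CI."""
    actual = artifact_digest(path)
    # Constant-time comparison; hex digests are compared in lowercase.
    if not hmac.compare_digest(actual, expected_digest.lower()):
        raise ValueError(
            f"artifact {path.name} has digest {actual}, "
            f"expected {expected_digest}"
        )
```

Storing the digest next to the release record also helps rollback, because the previous artifact can be verified the same way before it is restored.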

Networking Anti-Patterns

Common symptoms:

Intermittent 5xx under moderate load
Dependency connection timeouts at scale
Inconsistent behavior across instances

Remediation baseline:

Reuse outbound connections
Add explicit timeout and retry budgets
Validate network design with load tests
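
The "explicit timeout and retry budgets" item can be sketched as a small wrapper that retries transient failures with exponential backoff and full jitter, and gives up once an overall time budget would be exceeded. This is an illustrative sketch using only the standard library. The name `call_with_retry`, its defaults, and the choice of retryable exceptions are assumptions, and it does not implement a circuit breaker. The per-call timeout is expected to be set inside `operation` itself, for example on the socket or SDK client.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    budget_seconds: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with full-jitter backoff."""
    deadline = time.monotonic() + budget_seconds
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            # Full jitter: wait a random time up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            if time.monotonic() + delay > deadline:
                raise  # waiting again would exceed the retry budget
            sleep(delay)
    # Only reached when max_attempts is less than 1.
    raise ValueError("max_attempts must be at least 1")
```

Jitter matters at scale: without it, many instances that fail together also retry together, which recreates the load spike that caused the failure.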

Security Anti-Patterns

Common symptoms:

Secrets copied into multiple systems
Over-permissioned identities
Drift between intended and actual TLS/security config

Remediation baseline:

Managed identity everywhere possible
Key Vault reference policy
Security baseline validation in CI/CD

Performance Anti-Patterns

Common symptoms:

Tail latency spikes during routine operations
Scale-out fails to improve user experience
High variability between instances

Remediation baseline:

Minimum two instances in production
Autoscale tied to meaningful metrics
Session/state design aligned to horizontal scaling

Monitoring Anti-Patterns

Common symptoms:

Incidents discovered by users first
Unclear root cause due to missing telemetry
Slow post-incident analysis

Remediation baseline:

Health, error, and latency dashboards
Actionable alert thresholds with ownership
Deployment and incident timeline correlation
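
A health dashboard needs a health signal to plot. The sketch below shows one possible shape for a dependency-aware health endpoint, using only the Python standard library. The `/healthz` path, the `make_health_server` helper, and the check names are assumptions for illustration. A real app would expose the equivalent route in its own web framework and point the configured health check path at it.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict


def make_health_server(
    checks: Dict[str, Callable[[], bool]],
    host: str = "0.0.0.0",  # listen on all interfaces, not only loopback
    port: int = 8000,
) -> ThreadingHTTPServer:
    """Build a server whose /healthz returns 200 only if every check passes."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/healthz":
                self.send_error(404)
                return
            results = {}
            for name, check in checks.items():
                try:
                    results[name] = bool(check())
                except Exception:
                    # A check that raises counts as a failed dependency.
                    results[name] = False
            healthy = all(results.values())
            body = json.dumps({"healthy": healthy, "checks": results}).encode()
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep probe traffic out of stderr in this sketch

    return ThreadingHTTPServer((host, port), Handler)
```

Calling `make_health_server({"db": check_db}).serve_forever()` would start it, where `check_db` is a hypothetical function returning `True` when the database responds. Keep the checks cheap and bounded by timeouts, since the endpoint is probed repeatedly.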

Governance Pattern

Use an anti-pattern review gate in architecture and change workflows:

Design review checklist must include this document
Production change approval requires anti-pattern attestation
Exceptions require documented risk acceptance and expiry date

Operational Review Checklist

Before production release, validate:

No critical anti-pattern remains unresolved
All high-risk exceptions have mitigation owners
Deployment, monitoring, and rollback controls are tested
Runbooks match current architecture

# Example: list app settings so they can be reviewed
az webapp config appsettings list \
    --resource-group "$RG" \
    --name "$APP_NAME"
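
The listing command above emits a JSON array of objects with `name`, `value`, and `slotSetting` fields, so part of the review can be automated. The sketch below flags settings whose names look like credentials but whose values are not Key Vault references, which take the form `@Microsoft.KeyVault(...)`. The name heuristic in `SECRET_NAME` and the function name `audit_app_settings` are assumptions; adjust the pattern to your own naming conventions.

```python
import re
from typing import Dict, List

# Assumed naming heuristic for settings that probably hold credentials.
SECRET_NAME = re.compile(
    r"(secret|password|passwd|token|apikey|api_key|connectionstring)",
    re.IGNORECASE,
)
# Key Vault references are written as @Microsoft.KeyVault(...).
KEY_VAULT_REFERENCE = re.compile(r"^@Microsoft\.KeyVault\(.+\)$")


def audit_app_settings(settings: List[Dict]) -> List[str]:
    """Return findings for secret-like settings that are not Key Vault references."""
    findings = []
    for item in settings:
        name = item.get("name", "")
        value = item.get("value") or ""  # value may be missing or null
        if SECRET_NAME.search(name) and not KEY_VAULT_REFERENCE.match(value):
            findings.append(
                f"{name}: secret-like name but value is not a Key Vault reference"
            )
    return findings
```

To use it, parse the command's output with `json.load` and pass the resulting list to `audit_app_settings`. A non-empty result is a reasonable signal to block the release until the setting is moved to Key Vault. The findings contain setting names only, never values, so they are safe to log.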
