Hybrid Connectivity Issues¶
1. Summary¶
Hybrid failures across VPN or ExpressRoute usually come from tunnel health, BGP route learning, prefix advertisement, or MTU and provider-path issues.
mermaid graph TD A[Cloud to on-prem path] --> B{Tunnel up?} B -->|No| C[IKE / IPSec / provider issue] B -->|Yes| D{BGP session established?} D -->|No| E[ASN / peer IP / routing issue] D -->|Yes| F{Expected routes learned?} F -->|No| G[Prefix advertisement issue] F -->|Yes| H[MTU / policy / target issue]
2. Common Misreadings¶
- "Tunnel up means routing is healthy."
- "All missing prefixes are Azure route bugs."
- "Packet loss on hybrid paths is always a VPN problem."
3. Competing Hypotheses¶
- H1: Tunnel is down because of IKE/IPSec mismatch or provider issue.
- H2: BGP session is down or misconfigured.
- H3: Required prefixes are not being advertised or learned.
- H4: MTU, MSS, or asymmetric policy/path issues affect only some traffic.
4. What to Check First¶
| Validation point | View | Expected good signal |
|---|---|---|
| Tunnel health | Gateway connection status | Connected |
| Route learning | Effective routes and BGP tables | On-prem prefixes present |
| BGP state | Connection diagnostics | Established session |
| Path continuity | Connection troubleshoot / traceroute | Probe reaches on-prem target |
5. Evidence to Collect¶
- Gateway connection status and timestamps.
- BGP peer state, ASN, and learned routes.
- Local Network Gateway prefixes or on-prem advertisements.
- Traceroute / Connection Monitor results across the hybrid path.
- MTU / fragmentation evidence for problematic workloads.
6. Validation¶
| Hypothesis | Signals that support | Signals that weaken |
|---|---|---|
| H1 Tunnel down | disconnected status, IKE/IPSec errors | tunnel stable and connected |
| H2 BGP down | BGP not established, wrong ASN or peer IP | BGP established normally |
| H3 Missing routes | required prefixes absent from learned routes | prefixes present and active |
| H4 MTU / asymmetry | partial failures, fragmentation, one-way issues | clean symmetric path and no frag signs |
7. Root Cause Patterns¶
- Phase 1/2 settings drifted between Azure and on-prem device.
- Wrong BGP ASN or peer IP prevented route learning.
- Static prefixes in the Local Network Gateway were stale.
- MTU or asymmetric return path caused partial connectivity and retransmits.
8. Immediate Mitigations¶
- Restore tunnel health before changing downstream policies.
- Correct BGP peer settings and advertised prefixes.
- Update Local Network Gateway entries for static-route scenarios.
- Reduce MTU or clamp MSS when fragmentation is suspected.
9. Prevention¶
- Keep hybrid connection parameters version-controlled and documented.
- Validate learned routes after every gateway or on-prem change.
- Continuously monitor tunnel and BGP state for critical prefixes.
See Also¶
- Peering and Routing Issues
- Hybrid Connectivity Basics
- VPN and ExpressRoute Basics
- Hybrid Connectivity Best Practices