Node Not Ready

1. Summary

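A node in the NotReady state is a capacity and reliability risk: the scheduler places no new pods on it, and pods already running there may be evicted if it does not recover. Likely causes are an unhealthy kubelet, CNI problems, resource pressure, or Azure VM-level issues.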

flowchart TD
    A[Symptom] --> B[Hypotheses]
    B --> C[Evidence]
    C --> D[Disprove weak paths]
    D --> E[Mitigation]

2. Common Misreadings

  • The first visible symptom is the root cause.
  • Restarting the pod proves the issue is fixed.
  • If only one namespace is affected, the rest of the cluster is healthy.

3. Competing Hypotheses

  • H1: Kubelet or node services are unhealthy.
  • H2: Disk, memory, or PID pressure caused readiness degradation.
  • H3: CNI or DNS components on the node failed.
  • H4: Underlying VM or network resource issues exist in Azure.

4. What to Check First

kubectl get nodes
kubectl describe node <node-name>
kubectl get pods -n kube-system -o wide
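
On a large cluster the node list can be long. As a minimal sketch, the helper below filters the output of kubectl get nodes down to the nodes whose STATUS does not start with Ready. The function name not_ready_nodes is hypothetical, and the sketch assumes the default column layout (NAME, STATUS, ...).

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Matches "NotReady" and "NotReady,SchedulingDisabled"; skips
# "Ready" and "Ready,SchedulingDisabled".
# Assumes the default `kubectl get nodes` columns: NAME STATUS ROLES AGE VERSION.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (against a live cluster):
#   not_ready_nodes
```

Each name it prints can be passed to kubectl describe node for the detailed view.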

5. Evidence to Collect

  • Node conditions and taints.
  • Recent events tied to the node.
  • kube-system pod health on the affected node.
  • Azure VMSS instance or NIC status if the issue persists.
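
The evidence listed above could be gathered with a few small helpers, sketched below. The function names are hypothetical; the kubectl and az flags are standard, but the placeholders in the commented az commands (<resource-group>, <cluster-name>, <node-resource-group>, <vmss-name>) must be filled in for your environment.

```shell
# Each helper takes the node name as its first argument.

# Node conditions as "type<TAB>status<TAB>reason", one per line.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Taints as "key=value:effect", one per line (empty if the node has none).
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}{"="}{.value}{":"}{.effect}{"\n"}{end}'
}

# Recent events whose involved object is this node, oldest first.
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# kube-system pods scheduled on this node.
node_system_pods() {
  kubectl get pods -n kube-system -o wide --field-selector "spec.nodeName=$1"
}

# If the issue persists, check the VMSS instances from the Azure side:
#   az aks show -g <resource-group> -n <cluster-name> --query nodeResourceGroup -o tsv
#   az vmss list-instances -g <node-resource-group> -n <vmss-name> -o table
```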

6. Validation and Disproof by Hypothesis

  • If MemoryPressure, DiskPressure, or PIDPressure is True on the node, resource exhaustion (H2) is more likely than API auth issues.
  • If only one node in one pool is affected, compare it to healthy nodes in the same pool.
  • If all nodes in a pool degrade together, inspect pool-wide image or network changes.
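
The checks above can be sketched as two helpers: one reports which pressure conditions are currently True on a node, and one lists every node in a pool so the affected node can be compared with healthy peers. The function names are hypothetical, and pool_nodes assumes the nodes carry the agentpool label that AKS applies to node pool members.

```shell
# Print the pressure conditions (MemoryPressure, DiskPressure, PIDPressure)
# that are currently True on the node given as $1. Prints nothing if none are.
active_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}' \
    | awk -F= '$1 ~ /Pressure$/ && $2 == "True" { print $1 }'
}

# List all nodes in the pool given as $1 for a side-by-side comparison.
# Assumes the AKS "agentpool" node label.
pool_nodes() {
  kubectl get nodes -l "agentpool=$1" -o wide
}

# Usage (against a live cluster):
#   active_pressure <node-name>
#   pool_nodes <pool-name>
```

An empty result from active_pressure weakens the resource-pressure hypothesis; several NotReady nodes in the pool_nodes output point toward a pool-wide cause.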

7. Likely Root Cause Patterns

  • Resource pressure from runaway workloads.
  • CNI/daemonset failure after upgrade.
  • VMSS instance issues or subnet-level networking trouble.
  • Node image drift or failed extension updates.

8. Immediate Mitigations

  • Cordon and drain if the node is unstable.
  • Scale the pool out if capacity is tight.
  • Repair or replace the node if it does not recover quickly.
  • Validate daemonset health after recovery.
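
A hedged sketch of these mitigations as shell helpers follows. The function names and the 300-second drain timeout are illustrative assumptions, not values from this playbook; adjust the drain flags to your workloads, since --delete-emptydir-data discards emptyDir contents on the drained node.

```shell
# Cordon the node ($1), then drain it. Daemonset pods are left in place.
isolate_node() {
  kubectl cordon "$1" &&
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# Scale an AKS node pool.
# $1 = resource group, $2 = cluster name, $3 = pool name, $4 = target node count
scale_pool() {
  az aks nodepool scale --resource-group "$1" --cluster-name "$2" \
    --name "$3" --node-count "$4"
}

# After recovery: print kube-system daemonsets whose READY count differs
# from DESIRED. Assumes the default columns: NAME DESIRED CURRENT READY ...
daemonsets_not_ready() {
  kubectl get daemonsets -n kube-system --no-headers | awk '$2 != $4 { print $1 }'
}
```

An empty result from daemonsets_not_ready after the node rejoins suggests the system daemonsets have recovered.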

9. Prevention

  • Alert on node conditions before workloads are impacted.
  • Keep daemonsets and node images current.
  • Review pool isolation for noisy workloads.
