Skip to content

Resource Health (Heartbeat Checks)

Monitor the heartbeat signal of virtual machines and on-premises servers that have the Log Analytics agent or Azure Monitor agent installed. Frequent heartbeats indicate that the agent is healthy and communicating with the platform.

Scenario

You want to quickly see which virtual machines haven't reported a heartbeat in the last 15 minutes, which might indicate the VM is down or the agent has stopped functioning.

KQL Query

Heartbeat
| where TimeGenerated > ago(24h)
| summarize 
    LastHeartbeat = max(TimeGenerated) 
    by Computer, OSType, ComputerEnvironment
| extend 
    TimeSinceLastHeartbeat = now() - LastHeartbeat
| where TimeSinceLastHeartbeat > 15m
| order by TimeSinceLastHeartbeat desc

Data Flow

graph TD
    A[Heartbeat table] --> B[Filter by 24h]
    B --> C[Summarize last timestamp by Computer]
    C --> D[Calculate time since last heartbeat]
    D --> E[Filter for gap > 15m]
    E --> F[Order by longest outage]

Sample Output

Computer OSType ComputerEnvironment LastHeartbeat TimeSinceLastHeartbeat
vm-prod-db-01 Linux Azure 2024-03-24 09:15 01:30:00.000
vm-dev-web-02 Windows Azure 2024-03-24 10:40 00:25:15.000
onprem-srv-05 Linux Non-Azure 2024-03-24 10:55 00:18:45.000

How to Read This

Focus on the TimeSinceLastHeartbeat. Large gaps for Azure virtual machines should be cross-referenced with the Resource Health dashboard in the portal. For on-premises servers, check for network connectivity or local agent status.

Limitations

  • Heartbeat frequency depends on the agent configuration (typically every 1 minute).
  • Lack of a heartbeat doesn't always mean the VM is down; it could be a transient networking issue.
  • This query assumes that the agents have successfully reported at least one heartbeat in the last 24 hours.

Common Variations

Recently healthy resources

Heartbeat
| where TimeGenerated > ago(15m)
| summarize LastHeartbeat = max(TimeGenerated) by Computer, OSType
| order by LastHeartbeat desc

Health grouped by environment

Heartbeat
| where TimeGenerated > ago(24h)
| summarize ResourceCount = dcount(Computer), LastHeartbeat = max(TimeGenerated) by ComputerEnvironment, OSType
| order by ResourceCount desc

Interpretation Guide

Pattern Indicates Action
One machine has long heartbeat gap Host-specific issue Check VM power state, AMA agent, and network
Many Azure VMs stale together Shared platform or routing issue Check workspace ingestion and regional incidents
Only Non-Azure machines stale Hybrid connectivity issue Check Arc, proxy, and firewall path

For the full investigation workflow, see No Data in Workspace.

See Also

Sources