Overview¶
Azure Monitor provides a full stack monitoring solution for your applications and infrastructure. This guide provides a structured approach to implementing observability using native Azure tools.
Core Capabilities¶
Azure Monitor collects and analyzes telemetry from your cloud and on-premises environments. It helps you understand how your applications perform and proactively identifies issues affecting them and the resources they depend on.
graph TD
A[Sources] --> B[Azure Monitor]
B --> C[Insights]
B --> D[Visualize]
B --> E[Analyze]
B --> F[Respond]
B --> G[Integrate]
subgraph "Data Platform"
B1[(Metrics)]
B2[(Logs)]
B3[(Traces)]
B4[(Changes)]
end Guide Scope¶
This guide covers the implementation and operational aspects of Azure Monitor. It focuses on:
- Platform Configuration: Setting up Log Analytics workspaces, Data Collection Rules, and Managed Identities.
- Service Monitoring: Specific strategies for AKS, App Service, Functions, and Virtual Machines.
- Operational Excellence: Alerting strategies, dashboarding, and cost management.
- Advanced Troubleshooting: Using Kusto Query Language (KQL) and specialized playbooks.
Target Audience¶
- Platform Engineers: Responsible for the shared monitoring infrastructure and governance.
- Developers: Implementing application-level instrumentation and tracing.
- SRE/Operations: Managing alerts, responding to incidents, and ensuring system reliability.
- Architects: Designing resilient systems with observability in mind.
Structure¶
The guide is organized into logical sections that follow the monitoring lifecycle:
- Start Here: Orientation, role-based learning paths, and repository navigation.
- Platform: Core Azure Monitor architecture, data platform, workspace design foundations, and security concepts.
- Best Practices: Recommended patterns for retention, alerting, access control, tagging, and cost optimization.
- Service Guides: Tailored monitoring implementations for App Service, Container Apps, Functions, AKS, and virtual machines.
- Operations: Repeatable day-2 runbooks for workspaces, diagnostics, alert rules, dashboards, exports, and cost control.
- Troubleshooting: Decision trees, evidence mapping, KQL query packs, and incident playbooks.
- Reference: Quick lookup material for CLI commands, KQL syntax, diagnostic tables, and platform limits.