Data Platform¶

The Azure Monitor data platform combines specialized stores for metrics, logs, traces, and curated application telemetry. The design goal is not to force every signal into one engine, but to keep each signal in the storage model that best matches its latency, scale, correlation, and cost requirements.

Architecture Overview¶

The Azure Monitor data platform is easier to understand when it is separated into collection, routing, storage, query, and action layers.

flowchart TD
    SRC[Telemetry sources] --> COL[Collection mechanisms]
    COL --> ROUTE[Routing and transformation]
    ROUTE --> MET[(Metrics store)]
    ROUTE --> LOG[(Log Analytics workspace)]
    ROUTE --> APP[(Workspace-based Application Insights data)]
    MET --> CONSUME1[Metrics Explorer / Grafana / metric alerts]
    LOG --> CONSUME2[Log Analytics / workbooks / log alerts]
    APP --> CONSUME3[Application Insights experiences]
    LOG --> EXPORT[Export / Event Hubs / Storage]

The data platform has four practical responsibilities.

1. Accept telemetry from many producers
   - Azure resources emit platform metrics automatically.
   - Azure resources emit resource logs when diagnostic settings are configured.
   - Azure Monitor Agent and the Logs Ingestion API emit workspace-bound log streams.
   - Application Insights and Azure Monitor OpenTelemetry emit application telemetry.
2. Normalize data into supported streams
   - Metrics are stored as time-series values with optional dimensions.
   - Logs are stored as rows in Kusto-backed tables.
   - Application telemetry becomes queryable in application-specific tables such as requests, dependencies, and exceptions.
3. Expose the right query model
   - Metrics use metric namespaces, aggregations, intervals, and dimensions.
   - Logs use Kusto Query Language, which supports filtering, parsing, joins, and summarization.
4. Support response and export
   - Alerts, autoscale, workbooks, dashboards, and partner tools consume the stored data.
   - Export patterns move selected data to Event Hubs or Storage when operational or compliance workflows need downstream copies.

A production architecture review should always answer three questions.

1. Which signals must be near real time?
2. Which signals must retain rich context for investigation?
3. Which signals must be archived or shared outside Azure Monitor?

Data platform building blocks¶

Building block	Primary purpose	Typical data
Metrics store	Low-latency time series	CPU, latency, request count, availability
Log Analytics workspace	Investigative and operational log analytics	Resource logs, guest logs, application traces
Application Insights experience layer	Curated APM views on top of workspace-backed telemetry	Requests, dependencies, exceptions, availability
DCR and ingestion services	Route and transform logs before storage	AMA streams, custom logs, filtered records
Alerting layer	Evaluate conditions and trigger actions	Metric thresholds, KQL conditions, activity events

Core Concepts¶

Signal type decides the storage engine¶

Microsoft Learn consistently separates metrics, logs, and traces because the trade-offs are different. Metrics are optimized for speed and repeated evaluation. Logs are optimized for flexible analysis. Traces are specialized records that preserve operation context across services.

Metrics¶

Metrics are numerical measurements with a timestamp, metric name, namespace, and optional dimensions. Platform metrics are collected automatically for many Azure resources. Some scenarios also support custom metrics.

Use metrics for:
- Fast alert evaluation.
- Visualization of trends over short to medium periods.
- Autoscale and threshold-based automation.
- Dimension-based slicing such as by status code, instance, or node.

Be careful with metrics when:
- You need per-event detail.
- You need joins across resources.
- You need to parse payloads or preserve long text values.
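
As a rough illustration of the metrics query model (metric name, aggregation, interval, time range), the sketch below wraps `az monitor metrics list` in a small helper. The `run` wrapper, the `DRY_RUN` variable, and the function name `cpu_average` are assumptions made for this example, not part of Azure CLI. With `DRY_RUN=1` (the default here) the command is printed for review instead of being sent to Azure; the printed form drops shell quoting, so it is for reading rather than copy-paste.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# cpu_average RESOURCE_ID INTERVAL
# Requests the Average aggregation of "Percentage CPU" over the last hour,
# bucketed by INTERVAL (for example PT5M).
cpu_average() {
    run az monitor metrics list \
        --resource "$1" \
        --metric "Percentage CPU" \
        --aggregation Average \
        --interval "$2" \
        --offset 1h \
        --output table
}

# Example call; replace the placeholder with a real resource ID.
cpu_average "${RESOURCE_ID:-<resource-id>}" PT5M
```

Setting `DRY_RUN=0` runs the same call against the resource. The metric name assumes a virtual machine; other resource types expose different metric names.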

Logs¶

Logs are records with many columns and flexible schemas. The Kusto engine makes them suitable for correlation and investigation.

Use logs for:
- Root-cause analysis.
- Correlation across subscriptions, workspaces, or resource types.
- Parsing semi-structured data.
- Long-term operational review.
- Compliance-oriented retention and export patterns.

Be careful with logs when:
- You need very low alert latency.
- The only condition is a simple threshold on a platform metric.
- You have no plan to control ingestion volume.

Traces and application tables¶

Application Insights telemetry is still part of the Azure Monitor data platform. Requests, dependencies, exceptions, traces, page views, and browser timings are stored as structured tables in a workspace-based model. That means the telemetry is both application-centric and query-centric.

Use traces when:
- You need to follow one transaction through many services.
- You need dependency timing, correlation IDs, and failure context.
- You need performance analysis from the user transaction perspective rather than only resource health.
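
Following one transaction through several services usually comes down to filtering the application tables on a shared operation ID. The sketch below only builds the KQL text. It assumes the workspace-based schema, where the tables are named `AppRequests`, `AppDependencies`, and `AppExceptions` and the correlation column is `OperationId`. The function name `operation_trace_kql` is hypothetical.

```shell
# operation_trace_kql OPERATION_ID
# Prints a KQL query that lists every request, dependency call, and exception
# sharing one operation ID, oldest first. Table and column names assume the
# workspace-based Application Insights schema.
operation_trace_kql() {
    cat <<EOF
union AppRequests, AppDependencies, AppExceptions
| where OperationId == "$1"
| project TimeGenerated, Type, AppRoleName, OperationId
| order by TimeGenerated asc
EOF
}

# Example call with a placeholder ID.
operation_trace_kql "<operation-id>"
```

The resulting text can be passed to `az monitor log-analytics query` through `--analytics-query "$(operation_trace_kql "$OPERATION_ID")"`, where `OPERATION_ID` is a variable you supply.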

CLI example: inspect workspace data platform settings¶

az monitor log-analytics workspace show \
    --resource-group "$RG" \
    --workspace-name "$WORKSPACE_NAME" \
    --query "{name:name,location:location,sku:sku.name,retentionInDays:retentionInDays,features:features,workspaceCapping:workspaceCapping}" \
    --output json

Example output:

{
  "features": {
    "disableLocalAuth": false,
    "enableDataExport": true,
    "immediatePurgeDataOn30Days": false
  },
  "location": "koreacentral",
  "name": "law-prod-observability",
  "retentionInDays": 30,
  "sku": "PerGB2018",
  "workspaceCapping": {
    "dailyQuotaGb": -1,
    "dataIngestionStatus": "RespectQuota"
  }
}

This output shows the workspace as a data platform boundary. Retention, local authentication posture, and capping behavior all affect how logs are consumed and governed.
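
When reading the `workspaceCapping` block, note that a `dailyQuotaGb` of `-1` means no daily cap is configured. The helper below is a small sketch that turns the raw value into a readable statement; the function name `describe_daily_cap` is an assumption for illustration.

```shell
# describe_daily_cap QUOTA_GB
# Interprets the workspaceCapping.dailyQuotaGb value from the workspace JSON.
# A value of -1 (sometimes rendered as -1.0) means no daily cap is configured.
describe_daily_cap() {
    case "$1" in
        -1|-1.0) echo "no daily cap" ;;
        *)       echo "daily cap: $1 GB" ;;
    esac
}

describe_daily_cap -1    # prints: no daily cap
describe_daily_cap 5     # prints: daily cap: 5 GB
```

The input can come from the same `az monitor log-analytics workspace show` call by narrowing `--query` to `workspaceCapping.dailyQuotaGb` and using `--output tsv`.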

Data collection path decides schema and cost¶

The second design concept is that collection method shapes the resulting data. A resource log routed with diagnostic settings is not the same as a guest log collected with AMA. A transformed DCR stream is not the same as a raw application telemetry stream.

Common collection paths¶

Platform metrics path
- Source: Azure resource provider.
- Configuration: Usually automatic, sometimes with dimension or alert selection later.
- Store: Metrics store.
Diagnostic settings path
- Source: Supported Azure resource logs and exportable metrics.
- Configuration: Diagnostic setting on the resource.
- Store: Workspace, Storage, Event Hubs, or partner destination.
AMA plus DCR path
- Source: Guest OS and custom text logs.
- Configuration: Data collection rule plus resource association.
- Store: Workspace and supported metrics destinations.
Application Insights path
- Source: Instrumented application.
- Configuration: Connection string, SDK/OpenTelemetry, sampling, workspace linkage.
- Store: Workspace-based application tables.

The same monitored service may use several paths. For example, an App Service application can emit platform metrics, resource logs, and application telemetry at the same time. Each path answers a different operational question.

CLI example: inspect metric definitions available in the metrics store¶

az monitor metrics list-definitions \
    --resource "$RESOURCE_ID" \
    --output table

Example output:

Name                          Unit       Primary Aggregation Type    Dimensions
----------------------------  ---------  --------------------------  -----------------------------
Percentage CPU                Percent    Average, Minimum, Maximum   VMName
Network In Total              Bytes      Total                       VMName
Network Out Total             Bytes      Total                       VMName
Disk Read Bytes               Bytes      Total                       LUN
Disk Write Bytes              Bytes      Total                       LUN

This is a pure metrics-platform view. If the signal you need is already here, you usually do not need to recreate it through log ingestion.

Table shape matters more than many teams expect¶

Log Analytics does not behave like a generic file bucket. Data lands in tables, and those table choices strongly influence KQL queries, retention decisions, and export patterns.

Common workspace table styles¶

Table style	Example	What it means
Azure-native application tables	`requests`, `dependencies`, `exceptions`	Curated application telemetry schema (named `AppRequests`, `AppDependencies`, and `AppExceptions` in the workspace schema)
Legacy/shared diagnostics table	`AzureDiagnostics`	Multi-resource table with broad schema
Resource-specific tables	`AppServiceHTTPLogs`, `StorageBlobLogs`	More structured resource provider output
Agent tables	`Perf`, `Heartbeat`, `Syslog`, `Event`	Guest OS and infrastructure visibility
Custom tables	`CustomAppAudit_CL`	Logs from custom ingestion or connectors

Choose resource-specific tables when supported because they are easier to query and often more predictable than highly generalized tables.

CLI example: query a workspace table to verify actual landing shape¶

az monitor log-analytics query \
    --workspace "$WORKSPACE_ID" \
    --analytics-query "Heartbeat | where TimeGenerated > ago(15m) | summarize LastSeen=max(TimeGenerated) by Computer, OSType" \
    --output table

Example output:

Computer        OSType    LastSeen
--------------  --------  ------------------------
vm-app-01       Linux     2026-04-05T08:21:37Z
vm-app-02       Linux     2026-04-05T08:21:12Z
vm-jump-01      Windows   2026-04-05T08:20:54Z

This validates that data is not only configured, but actually landing in a specific table with expected columns.

Data platform boundaries are architectural boundaries¶

A workspace is often treated as only a storage target. In practice it is also a boundary for:
- RBAC and query access.
- Regional placement.
- Interactive retention settings.
- Daily cap safety controls.
- Data export and networking behavior.

Likewise, the metrics store is a boundary for:
- Dimension cardinality.
- Aggregation model.
- Retention window.
- Alert latency expectations.

Data Flow¶

The Azure Monitor data platform uses several concurrent data flows. Understanding them helps explain why one signal appears in seconds while another appears after a delay.

Resource metric flow¶

1. The Azure resource provider emits a metric data point.
2. The Azure Monitor metrics service stores it in the resource's metric namespace.
3. Metrics Explorer, Grafana, autoscale, and metric alerts read the value.

This is the lowest-friction path. It requires the least configuration and is usually the first operational signal available.

Resource log flow¶

1. The resource produces a log event in a supported category.
2. Diagnostic settings decide which categories and destinations are enabled.
3. The log lands in a workspace table, Storage account, Event Hubs namespace, or partner sink.
4. KQL queries, workbooks, and log alerts consume it.

This path is category- and destination-dependent. Enabling everything without a purpose creates noisy workspaces and unnecessary spend.

AMA and DCR flow¶

1. Azure Monitor Agent runs on a machine or Arc-enabled server.
2. The DCR declares sources, streams, transformations, and destinations.
3. The agent sends records to Azure Monitor ingestion endpoints.
4. Azure Monitor writes the records to target tables or metrics destinations.

This path is flexible and is the preferred modern model for guest collection.

Application telemetry flow¶

1. Application code or auto-instrumentation emits telemetry.
2. The ingestion service enriches and processes the telemetry.
3. Workspace-backed tables store the data.
4. Application Insights experiences and KQL analytics use the result.

Sampling and enrichment decisions affect both cost and diagnostic fidelity.

Flow comparison¶

Flow	Best for	Main latency profile	Main design control
Platform metrics	Threshold alerting and trend charts	Low	Scope, aggregation, dimensions
Diagnostic settings	Resource provider detail	Medium	Category and destination
AMA plus DCR	Guest and custom collection	Medium	Streams, transformation, destination
App Insights	APM and traces	Medium, with separate live features	Instrumentation and sampling

Data platform sequencing diagram¶

sequenceDiagram
    participant S as Source
    participant C as Collector
    participant P as Processing
    participant D as Data store
    participant U as User or alert
    S->>C: Emit telemetry
    C->>P: Route selected signal
    P->>D: Store metric or log record
    D->>U: Query, visualize, alert, export

Operational checkpoints in the flow¶

When data is missing, validate in this order.

1. Source emitting
   - Is the application instrumented?
   - Does the resource support the category?
2. Collection configured
   - Is the diagnostic setting enabled?
   - Is the DCR associated with the resource?
3. Destination correct
   - Did data go to the expected workspace?
   - Is the region supported for the feature?
4. Consumption correct
   - Are you querying the right table and time range?
   - Are you using the correct metric namespace and dimension filters?
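
The ordering matters because a failure in an earlier stage makes later checks meaningless. A minimal sketch of that triage logic follows. It assumes each stage has already been checked (manually or by script) and reduced to `ok` or `fail`; the function name `first_failed_stage` is hypothetical.

```shell
# first_failed_stage SOURCE COLLECTION DESTINATION CONSUMPTION
# Each argument is "ok" or "fail" (the result of a manual or scripted check).
# Prints the earliest stage that is not "ok", following the order above,
# or "none" when every stage passed.
first_failed_stage() {
    for stage in source:"$1" collection:"$2" destination:"$3" consumption:"$4"; do
        if [ "${stage#*:}" != "ok" ]; then
            echo "${stage%%:*}"
            return 0
        fi
    done
    echo "none"
}

first_failed_stage ok ok fail ok   # prints: destination
```

A missing argument is treated as a failure, so an unchecked stage is never silently skipped.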

Integration Points¶

The Azure Monitor data platform integrates with surrounding Azure services rather than replacing them.

Resource providers and diagnostic settings¶

Many Azure resource providers support metrics automatically and logs through diagnostic settings. This is the standard platform integration point for Azure-native services.

Application Insights and OpenTelemetry¶

Application Insights is a first-class consumer of the same data platform. The integration point is application instrumentation, not diagnostic settings.

Managed Prometheus and Grafana¶

Azure Monitor managed service for Prometheus stores Prometheus metrics for supported Kubernetes scenarios. Grafana can query Azure Monitor metrics, logs, and Prometheus data as visualization sources.

Event Hubs and Storage¶

Diagnostic settings and workspace export make the platform interoperable with SIEM, lakehouse, or long-term archival pipelines.

RBAC and Microsoft Entra ID¶

Workspace access, resource-context queries, and automation all depend on Microsoft Entra ID identities and Azure RBAC assignments.

Configuration Options¶

Configuration choices define behavior more than the storage engines do.

Core options to review¶

Area	Options to review	Why it matters
Workspace	SKU, retention, daily cap, networking, local auth	Cost, governance, access
Diagnostic settings	Categories, destination, per-resource rollout	Ingestion volume and usefulness
DCR	Sources, transformKql, destinations, associations	Filtering before storage
Application Insights	Workspace-based mode, sampling, connection string	App observability cost and fidelity
Alert rules	Signal type, evaluation frequency, window	Detection speed and reliability
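
For the alert rule row, the sketch below shows where the evaluation frequency and window appear on a metric alert created with `az monitor metrics alert create`. The alert name, the 80 percent threshold, the `run` wrapper, and the `DRY_RUN` variable are illustrative assumptions. With `DRY_RUN=1` (the default here) the command is only printed, and the printed form drops shell quoting.

```shell
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# create_cpu_alert RESOURCE_GROUP RESOURCE_ID
# Evaluates average CPU over a 5-minute window, checked every minute.
create_cpu_alert() {
    run az monitor metrics alert create \
        --name "cpu-high" \
        --resource-group "$1" \
        --scopes "$2" \
        --condition "avg Percentage CPU > 80" \
        --window-size 5m \
        --evaluation-frequency 1m \
        --description "Average CPU above 80 percent"
}

# Example call; replace the placeholders with real values.
create_cpu_alert "${RG:-<resource-group>}" "${RESOURCE_ID:-<resource-id>}"
```

A shorter window and higher frequency detect problems sooner but react more readily to brief spikes, which is the trade-off the table summarizes as detection speed and reliability.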

CLI example: create a resource diagnostic setting¶

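# --workspace takes the workspace resource ID (or name) here,
# not the workspace GUID used by `az monitor log-analytics query`.
az monitor diagnostic-settings create \
    --name "send-to-law-prod" \
    --resource "$RESOURCE_ID" \
    --workspace "$WORKSPACE_RESOURCE_ID" \
    --logs '[{"category":"AuditEvent","enabled":true}]' \
    --metrics '[{"category":"AllMetrics","enabled":true}]' \
    --output json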

Example output:

{
  "id": "/subscriptions/<subscription-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.KeyVault/vaults/kv-prod-01/providers/microsoft.insights/diagnosticSettings/send-to-law-prod",
  "logs": [
    {
      "category": "AuditEvent",
      "enabled": true
    }
  ],
  "metrics": [
    {
      "category": "AllMetrics",
      "enabled": true
    }
  ],
  "name": "send-to-law-prod",
  "workspaceId": "/subscriptions/<subscription-id>/resourceGroups/rg-monitoring-prod/providers/Microsoft.OperationalInsights/workspaces/law-prod-observability"
}
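
Because ingestion volume follows the categories you enable, it can help to build the `--logs` value from an explicit list instead of hand-editing JSON. The helper below is a sketch under that assumption. `logs_json` is a hypothetical name, and category names are assumed to be plain identifiers that need no JSON escaping.

```shell
# logs_json CATEGORY...
# Prints a JSON array suitable for the --logs argument, enabling only the
# categories named on the command line. Names are inserted verbatim.
logs_json() {
    out=""
    for category in "$@"; do
        out="${out:+$out,}{\"category\":\"$category\",\"enabled\":true}"
    done
    printf '[%s]\n' "$out"
}

logs_json AuditEvent
# prints: [{"category":"AuditEvent","enabled":true}]
```

The output can be passed as `--logs "$(logs_json AuditEvent)"`. Valid category names differ per resource type, so check what the target resource supports before enabling them.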

CLI example: inspect workspace table availability indirectly with a query¶

az monitor log-analytics query \
    --workspace "$WORKSPACE_ID" \
    --analytics-query "search * | where TimeGenerated > ago(10m) | summarize Records=count() by \$table | top 5 by Records desc" \
    --output table

Example output:

$Table                Records
--------------------  -------
Heartbeat             1542
AzureActivity          218
Perf                   176
requests                91
AppServiceHTTPLogs      44

This is useful after onboarding a new environment because it confirms where the largest current telemetry volumes are landing.

Pricing Considerations¶

Azure Monitor pricing is not one number. It is the sum of usage patterns across the data platform.

Major pricing factors¶

- Log ingestion volume into workspaces.
- Retention beyond included periods.
- Data restoration or archive use where applicable.
- Application Insights telemetry volume and sampling posture.
- Alert rule count and type.
- Export and downstream service cost.

Microsoft Learn pricing guidance also separates native platform metrics from billed custom metrics and Prometheus-related data paths, so cost modeling should treat metrics and logs as different products instead of one blended line item.

Cost-aware design guidance¶

- Send high-value fast-moving health signals to metrics and metric alerts first.
- Use logs for context-rich investigation, not as a default destination for every possible event.
- Use DCR transformations or selective categories to reduce low-value records before they are stored.
- Sample application telemetry deliberately instead of sampling blindly.
- Separate compliance archive requirements from interactive analytics requirements.
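
Deciding what to filter or sample is easier once you know which tables carry the billable volume. The sketch below builds a KQL query over the workspace `Usage` table for that purpose. The function name `billable_volume_kql` is hypothetical, and the query is one possible shape rather than an official report.

```shell
# billable_volume_kql DAYS
# Prints a KQL query over the Usage table that totals billable ingestion per
# data type for the last DAYS days. Quantity is reported in MB, so it is
# divided by 1000 to show GB.
billable_volume_kql() {
    cat <<EOF
Usage
| where TimeGenerated > ago(${1}d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| sort by IngestedGB desc
EOF
}

# Example: query text covering the last 7 days.
billable_volume_kql 7
```

The text can be run with `az monitor log-analytics query --workspace "$WORKSPACE_ID" --analytics-query "$(billable_volume_kql 7)"`. The data types at the top of the result are the first candidates for category selection, DCR transformation, or sampling.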

Anti-patterns that inflate cost¶

- Enabling all diagnostic categories on all resources without ownership.
- Retaining verbose trace data indefinitely in the primary workspace.
- Building dashboard queries that scan very large tables for simple health checks.
- Duplicating the same logs into several analytics systems without a retention reason.

Limitations and Quotas¶

Always confirm current quota pages on Microsoft Learn before final design sign-off. The most important practical constraints are architectural, not just numeric.

Important constraints¶

- Metrics and logs have different retention and latency characteristics.
- Not every resource provider supports resource-specific tables or export of all signals.
- DCR transformations support many, but not unlimited, transformation patterns.
- Application telemetry fidelity depends on instrumentation quality and sampling.
- Cross-workspace queries are powerful, but they add complexity to saved queries and alerts.
- Daily caps protect spend but can reduce visibility if used as a normal operating control.

Platform design implications¶

Constraint	Impact	Response
Metrics are summarized by interval	You cannot recover per-event context from a metric point	Use logs for forensic detail
Log cost depends on volume	Verbose sources can become expensive	Filter at source or DCR where possible
Query power encourages wide scans	Expensive analytics can slow operations	Standardize efficient KQL patterns
Multi-workspace topologies add access complexity	Analysts may need cross-scope permissions and queries	Keep topology simple unless there is a real boundary need

Sources¶

https://learn.microsoft.com/en-us/azure/azure-monitor/fundamentals/data-sources
https://learn.microsoft.com/en-us/azure/azure-monitor/metrics/data-platform-metrics
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-platform-logs
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-workspace-overview
https://learn.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/diagnostic-settings
https://learn.microsoft.com/en-us/azure/azure-monitor/cost-usage
