Skip to content

Dedicated GPU Profiles

Dedicated GPU profiles are the Azure Container Apps option for GPU-backed workloads that need reserved hardware instead of serverless GPU allocation. Microsoft Learn does not label these dedicated GPU profiles as preview, but it does document important region and capacity limits.

Main Content

Capacity and region limits apply

Microsoft Learn states that GPU-enabled Dedicated profiles are available only in select regions and that capacity is allocated on a per-case basis. Plan capacity requests early and validate regional availability before you commit to a design.

Current GPU profile names

GPU model Current Dedicated profile names Allocation
A100 NC24-A100, NC48-A100, NC96-A100 Per node

Microsoft Learn also lists serverless Consumption GPU profile names separately:

  • Consumption-GPU-NC24-A100
  • Consumption-GPU-NC8as-T4

That means T4 currently appears in Consumption GPU profiles, not in the Dedicated GPU list reviewed for this guide.

Placement model

flowchart TD
    ENV[Workload profiles environment] --> GP[Dedicated GPU profile]
    GP --> APP1[Inference API]
    GP --> APP2[Batch scoring worker]
    APP1 --> N1[GPU-backed nodes]
    APP2 --> N1
    N1 --> B[Dedicated plan instance billing]

When Dedicated GPU is the better fit

Choose Dedicated GPU profiles when you need:

  • Stable GPU-backed inference capacity.
  • Multiple apps sharing the same dedicated GPU node pool.
  • A reserved environment design instead of per-replica serverless GPU billing.

Typical examples include:

  • Real-time model inference APIs.
  • Batch image or document processing.
  • GPU-heavy background workers with predictable utilization.

Cost considerations

Dedicated GPU placement changes the cost model:

  • Billing follows workload profile instances, not individual apps.
  • Extra instances add cost as profiles scale out.
  • Dedicated plan management charges apply when you use dedicated workload profiles.

Compare serverless GPU and dedicated GPU deliberately

If your workload is sporadic, Consumption GPU may fit better. If you need reserved GPU capacity or multiple apps sharing the same GPU pool, Dedicated GPU is usually the cleaner design.

See Also

Sources