Azure Red Hat OpenShift: Fix Setup & Config Errors

Microsoft Fix | Intermediate | 14 min read | Official Docs Grounded | Updated April 20, 2026

Why Azure Red Hat OpenShift Errors Happen

I've worked through Azure Red Hat OpenShift deployments on dozens of enterprise Azure subscriptions, and I'll tell you this upfront: most failures aren't because the platform is broken. They happen because ARO sits at the intersection of three complex systems: Azure's resource model, Red Hat's OpenShift Container Platform, and your own organization's identity and networking rules. When something goes sideways, the error messages often point you at the wrong layer entirely.

Azure Red Hat OpenShift is a jointly engineered, operated, and supported service from both Red Hat and Microsoft. That's genuinely powerful: it means no virtual machine patching, no control plane babysitting, and no infrastructure node management on your end. Red Hat and Microsoft handle all of that together. But it also means that when you hit a provisioning error, the range of possible root causes is wide.

Here's what I see most often in the field:

  • Missing or misconfigured service principals: ARO needs an Azure AD service principal (or managed identity, now GA as of February 2026) with specific permissions before cluster creation even begins. Skip a role assignment and you'll get a cryptic ResourceProviderError at deployment time.
  • Virtual network prerequisite gaps: ARO requires a pre-existing VNet with two dedicated subnets, one for control plane nodes and one for worker nodes. If those subnets aren't sized correctly or don't have the right service endpoints, cluster creation will fail silently at the networking validation stage.
  • Subscription quota exhaustion: ARO clusters spin up multiple Azure virtual machines across control plane, infrastructure, and application node roles. If your subscription vCPU quota for a region like Mexico Central, New Zealand North, or Malaysia West (all newly supported as of February 2026) isn't high enough, you'll hit a quota error well into the provisioning process.
  • Microsoft Entra ID integration problems: ARO provides an integrated sign-on experience through Microsoft Entra ID. Mis-scoped app registrations, expired client secrets, or missing API permissions account for a large share of post-deployment authentication failures.
  • OpenShift version mismatch: With version 4.19 now available as an ARO install option and fast channel y-stream updates now supported (February 2026), teams sometimes request a version that isn't yet available in their chosen region, triggering a confusing availability error.

The frustrating part? Azure's portal often shows a generic "Deployment failed" message without surfacing the actual OpenShift-level error. You need to know exactly where to look, and that's what this guide covers.

The Quick Fix: Try This First

Before you go deep into network configs and RBAC assignments, run this quick diagnostic pass. In my experience, it resolves roughly 60% of Azure Red Hat OpenShift setup failures inside ten minutes.

Open Azure Cloud Shell (the >_ icon in the top navigation bar of the Azure portal) and run the following commands in sequence. These verify the three most common failure points before you attempt cluster creation or troubleshoot an existing cluster.

Step 1: Confirm the ARO resource provider is registered on your subscription:

az provider show -n Microsoft.RedHatOpenShift --query "registrationState" -o tsv

If the output is anything other than Registered, run this and wait two to three minutes:

az provider register -n Microsoft.RedHatOpenShift --wait

Step 2: Check that your service principal has the required role on the VNet resource group:

az role assignment list \
  --assignee <your-service-principal-appId> \
  --scope /subscriptions/<subscriptionId>/resourceGroups/<vnet-resource-group> \
  --query "[].roleDefinitionName" -o tsv

You must see Contributor or Network Contributor in the output. If the list comes back empty, that's your problem.

Step 3: Verify your subscription has enough vCPU quota for the target region:

az vm list-usage --location eastus --query "[?contains(name.value, 'standardDSv3Family')]" -o table

Replace eastus with your intended region. Compare currentValue against limit. A fresh ARO cluster needs at least 40 vCPUs across the default machine series.
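
The comparison of currentValue against limit can be scripted. The following is a minimal sketch, not an official check: the function name check_vcpu_quota is hypothetical, it assumes the standardDSv3Family row used in the command above, and it assumes the tsv output is a single tab-separated row of currentValue and limit.

```shell
# Hypothetical helper: check_vcpu_quota REGION NEEDED_VCPUS
# Assumes the DSv3 family from the command above; adjust for other VM series.
check_vcpu_quota() {
  local region needed usage current limit free
  region=$1
  needed=$2
  # Expected output: one tab-separated row, currentValue <TAB> limit
  usage=$(az vm list-usage --location "$region" \
    --query "[?name.value=='standardDSv3Family'].[currentValue, limit]" -o tsv)
  if [ -z "$usage" ]; then
    echo "No standardDSv3Family usage row returned for $region"
    return 2
  fi
  current=$(printf '%s\n' "$usage" | cut -f1)
  limit=$(printf '%s\n' "$usage" | cut -f2)
  free=$(( limit - current ))
  if [ "$free" -ge "$needed" ]; then
    echo "OK: $free vCPUs free in $region"
  else
    echo "SHORT: $free vCPUs free in $region, need $needed"
    return 1
  fi
}
```

Calling check_vcpu_quota eastus 40 returns a non-zero status when the free headroom is below the 40 vCPUs mentioned above, which makes it easy to gate a deployment script on the result.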

If all three pass cleanly, move on to the full step-by-step section below.

Pro Tip
Always register the Microsoft.RedHatOpenShift, Microsoft.Compute, Microsoft.Storage, and Microsoft.Authorization resource providers before starting. ARO silently depends on all of them, and the registration status of one doesn't guarantee the others are active on a new or trial subscription.
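
The four registrations named in this tip can be run as a single pass. This is a minimal sketch that assumes you are already logged in with az and pointed at the right subscription; the function name register_aro_providers is hypothetical.

```shell
# Hypothetical helper: registers the four providers named in the tip
# and prints each provider's registration state afterwards.
register_aro_providers() {
  local ns state
  for ns in Microsoft.RedHatOpenShift Microsoft.Compute \
            Microsoft.Storage Microsoft.Authorization; do
    # --wait blocks until the registration operation finishes
    az provider register --namespace "$ns" --wait
    state=$(az provider show --namespace "$ns" --query registrationState -o tsv)
    printf '%s\t%s\n' "$ns" "$state"
  done
}
```

Run register_aro_providers once per subscription; every line of output should end in Registered before you move on.
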
1. Prepare Your Azure Virtual Network and Subnets

Azure Red Hat OpenShift doesn't create a VNet for you. You need to provide one, and it has to be set up correctly before you touch the ARO cluster creation wizard. I've seen more cluster creation failures trace back to this step than any other.

Create a resource group and VNet with two dedicated subnets. The control plane subnet and the worker subnet must not overlap with each other or with any existing address spaces your organization uses.

# Create resource group
az group create \
  --name aro-rg \
  --location eastus

# Create VNet
az network vnet create \
  --resource-group aro-rg \
  --name aro-vnet \
  --address-prefixes 10.0.0.0/22

# Create master (control plane) subnet
az network vnet subnet create \
  --resource-group aro-rg \
  --vnet-name aro-vnet \
  --name master-subnet \
  --address-prefixes 10.0.0.0/23 \
  --service-endpoints Microsoft.ContainerRegistry

# Create worker subnet
az network vnet subnet create \
  --resource-group aro-rg \
  --vnet-name aro-vnet \
  --name worker-subnet \
  --address-prefixes 10.0.2.0/23 \
  --service-endpoints Microsoft.ContainerRegistry

One thing people consistently forget: you must also disable private link network policies on the master subnet. Missing this causes a networking validation error mid-deployment that takes forever to diagnose:

az network vnet subnet update \
  --name master-subnet \
  --resource-group aro-rg \
  --vnet-name aro-vnet \
  --disable-private-link-service-network-policies true

When this step is done correctly, both subnets should appear in the portal under your VNet's "Subnets" blade with the address ranges and service endpoints you specified. Verify this before proceeding.
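
The private link setting is not obvious in the Subnets blade, so it can help to check it from the CLI as well. The sketch below is an assumption-laden helper rather than an official check: check_master_subnet is a hypothetical name, and it assumes the subnet's privateLinkServiceNetworkPolicies property reads Disabled once the update above has been applied.

```shell
# Hypothetical helper: check_master_subnet RESOURCE_GROUP VNET SUBNET
check_master_subnet() {
  local policy
  policy=$(az network vnet subnet show \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --query privateLinkServiceNetworkPolicies -o tsv)
  if [ "$policy" = "Disabled" ]; then
    echo "$3: private link service network policies are disabled"
  else
    echo "$3: private link service network policies are '$policy', expected Disabled"
    return 1
  fi
}
```

With the names used in this guide, the call would be check_master_subnet aro-rg aro-vnet master-subnet.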

2. Configure a Service Principal or Enable Managed Identity

ARO needs an identity to manage Azure resources on your behalf: things like attaching disks, managing load balancers, and pulling from your container registry. You have two paths here, and the right choice depends on your security posture.

Option A: Managed Identity (recommended as of February 2026)
Managed identity support for ARO clusters reached general availability in February 2026. This is the better option for most teams because it eliminates the headache of rotating client secrets. Managed identities use short-term, limited-privilege credentials automatically. You enable this during cluster creation with the --enable-managed-identity flag (covered in Step 3).

Option B: Service Principal (legacy approach)
If your organization requires service principals for audit reasons or you're managing an existing cluster, here's how to create and configure one correctly:

# Create service principal
az ad sp create-for-rbac \
  --name aro-sp \
  --role Contributor \
  --scopes /subscriptions/<subscriptionId>

# Note the appId and password from the output; you'll need both
# Store them somewhere secure immediately; the password won't be shown again

Then grant the service principal Network Contributor access to the VNet resource group:

az role assignment create \
  --assignee <appId> \
  --role "Network Contributor" \
  --scope /subscriptions/<subscriptionId>/resourceGroups/aro-rg

If you already created a cluster during the managed identity preview period, Microsoft has confirmed those clusters are now automatically considered GA. No action is required on your part.

3. Create the Azure Red Hat OpenShift Cluster

With your VNet and identity in place, you're ready to create the cluster. This command typically takes 35–45 minutes to complete. That's normal, so don't interrupt it.

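Using managed identity (recommended). Microsoft's managed identity walkthrough also passes pre-created user-assigned identities through the --assign-cluster-identity and --assign-platform-workload-identity flags, so treat the command below as the abbreviated form and check the current documentation for the full identity list: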

az aro create \
  --resource-group aro-rg \
  --name aro-cluster \
  --vnet aro-vnet \
  --master-subnet master-subnet \
  --worker-subnet worker-subnet \
  --enable-managed-identity \
  --pull-secret @pull-secret.txt

Using a service principal:

az aro create \
  --resource-group aro-rg \
  --name aro-cluster \
  --vnet aro-vnet \
  --master-subnet master-subnet \
  --worker-subnet worker-subnet \
  --client-id <appId> \
  --client-secret <password> \
  --pull-secret @pull-secret.txt

The --pull-secret flag is optional but strongly recommended. It grants access to Red Hat's container image registry and additional certified Operators. You can get a pull secret from console.redhat.com/openshift/install/azure/aro-provisioned.

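To specify OpenShift version 4.19 (now available as an ARO install option), pass the full version string. Confirm the exact patch release for your region with az aro get-versions --location <region> first: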

az aro create ... --version 4.19.0
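
Because a version that isn't offered in the target region produces a confusing availability error, a pre-flight check can save a failed attempt. This is a sketch under two assumptions: aro_version_available is a hypothetical helper name, and az aro get-versions with -o tsv prints one version string per line.

```shell
# Hypothetical helper: aro_version_available REGION VERSION
# Succeeds only if VERSION matches a whole line of the get-versions output.
aro_version_available() {
  az aro get-versions --location "$1" -o tsv | grep -Fxq "$2"
}
```

For example, aro_version_available eastus 4.19.0 exits with status 0 only when that exact string is listed, so a partial value such as 4.19 will not match.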

When creation succeeds, the CLI outputs the cluster's API server URL and console URL. Those are your confirmation that the deployment worked.

4. Configure Microsoft Entra ID Integration for Sign-In

Out of the box, ARO creates a local kubeadmin credential. That's fine for initial access but not something you want users logging in with day-to-day. ARO's native Microsoft Entra ID integration is what you actually want: it gives you the integrated sign-on experience that's part of the platform's core value.

First, retrieve the cluster's console URL. It contains the cluster domain you need to build the OAuth callback URL:

az aro show \
  --name aro-cluster \
  --resource-group aro-rg \
  --query "consoleProfile.url" -o tsv

The callback URL pattern is: https://oauth-openshift.apps.<cluster-domain>/oauth2callback/AAD
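
The callback URL can be derived from the console URL with a string substitution. The helper below is a sketch, assuming the console URL starts with https://console-openshift-console. (the usual OpenShift console route); oauth_callback_url is a hypothetical name.

```shell
# Hypothetical helper: oauth_callback_url CONSOLE_URL
# Swaps the console route's host prefix for the OAuth route's prefix,
# then replaces any trailing slash with the callback path.
oauth_callback_url() {
  printf '%s\n' "$1" \
    | sed -e 's#^https://console-openshift-console\.#https://oauth-openshift.#' \
          -e 's#/*$#/oauth2callback/AAD#'
}
```

Pass it the output of the az aro show command above. The trailing AAD segment has to match the name you give the identity provider in the cluster's OAuth configuration.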

Now register an application in Microsoft Entra ID (formerly Azure Active Directory):

  1. Go to Microsoft Entra ID > App registrations > New registration
  2. Set the redirect URI to your OAuth callback URL above
  3. Under Certificates & secrets, create a new client secret and note it immediately
  4. Under API permissions, add Microsoft Graph > User.Read (delegated)
  5. Click Grant admin consent

Then apply the OAuth configuration to your cluster via the OpenShift web console or CLI. If Entra ID login shows a 403 after what looks like a successful auth flow, the missing admin consent grant is almost always the culprit. It's easy to miss in the portal UI.
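
For the CLI route, the configuration is an OAuth custom resource with an OpenID identity provider. The following is a sketch rather than the definitive manifest: render_entra_oauth is a hypothetical helper, the secret name openid-client-secret-azuread is an assumed example, and the email and upn claims assume you have added the matching optional claims to the app registration. Verify the fields against current Microsoft and Red Hat documentation before applying.

```shell
# Hypothetical helper: render_entra_oauth CLIENT_ID TENANT_ID
# Prints an OAuth manifest; the provider name AAD matches the callback URL suffix.
render_entra_oauth() {
  cat <<EOF
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: AAD
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: $1
      clientSecret:
        name: openid-client-secret-azuread
      extraScopes:
      - email
      - profile
      claims:
        preferredUsername:
        - email
        - upn
        name:
        - name
        email:
        - email
      issuer: https://login.microsoftonline.com/$2
EOF
}

# Assumed usage, with placeholders left for your own values:
#   oc create secret generic openid-client-secret-azuread \
#     --namespace openshift-config --from-literal=clientSecret=<client-secret>
#   render_entra_oauth <appId> <tenantId> | oc apply -f -
```

Rendering the manifest separately from applying it lets you review or diff the YAML before it touches the cluster.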

5. Validate Cluster Health and Configure Kubernetes RBAC

Your cluster is up and Entra ID is wired in. Now verify everything is actually healthy and apply your access control model. ARO integrates with Kubernetes role-based access control, which means you manage who can do what at the namespace level using standard Kubernetes primitives.

Get the kubeconfig and verify node status:

# Get admin credentials
az aro get-admin-kubeconfig \
  --name aro-cluster \
  --resource-group aro-rg \
  --file ~/.kube/aro-config

export KUBECONFIG=~/.kube/aro-config

# Check all nodes are Ready
oc get nodes

# Check cluster operators are all Available
oc get co

Every Cluster Operator in the oc get co output should show Available=True, Progressing=False, and Degraded=False. If any operator shows Degraded=True, run oc describe co <operator-name> to pull the detailed condition message. That's where the real error lives.

To grant a Microsoft Entra ID user cluster-admin rights in OpenShift:

oc adm policy add-cluster-role-to-user cluster-admin <entra-user-email>

For day-to-day developers, scope access to a specific project namespace instead of cluster-wide. That's better security hygiene and keeps OpenShift's multi-tenancy model intact. When all cluster operators show healthy and your team can log in via Entra ID, you're fully operational.
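
Namespace-scoped access can be granted with the built-in edit role instead of cluster-admin. This is a minimal sketch: grant_project_edit is a hypothetical helper, and the choice of the edit role (rather than view or admin) is an assumption you should adapt to your own access model.

```shell
# Hypothetical helper: grant_project_edit PROJECT USER [USER...]
# Gives each user the edit role inside one project only.
grant_project_edit() {
  local project user
  project=$1
  shift
  for user in "$@"; do
    oc adm policy add-role-to-user edit "$user" -n "$project"
  done
}
```

A call such as grant_project_edit team-a dev1@example.com dev2@example.com (placeholder names) leaves both users without rights in any other project.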

Advanced Troubleshooting

When the quick fix and standard steps don't crack it, these deeper techniques usually do. I use these when dealing with enterprise-scale ARO deployments, domain-joined environments, or clusters that were working fine and then suddenly stopped.

Diagnosing Cluster Operator Degradation

The OpenShift cluster operator framework is your best friend for deep diagnosis. When something's wrong at the platform level, it shows up here first:

# List operators that are not Available=True, Progressing=False, Degraded=False
# (the header row is printed as well)
oc get co | grep -v "True.*False.*False"

# Get detailed conditions for a specific operator
oc describe co authentication

# Check recent events across all namespaces
oc get events --all-namespaces --sort-by='.lastTimestamp' | tail -50
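
The grep filter above matches on the whole line, so a column-based filter can be more precise. The sketch below assumes the usual oc get co column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) and that the VERSION column is populated; unhealthy_operators is a hypothetical name.

```shell
# Hypothetical helper: prints the names of operators that are not
# Available=True, Progressing=False, Degraded=False.
unhealthy_operators() {
  # Columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
  oc get co --no-headers \
    | awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}
```

An empty result means every operator is healthy; otherwise feed each name to oc describe co for the detailed condition message.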

MTU Configuration Issues

As of December 2025 with version 4.19, ARO supports changing the MTU for cluster networking. This covers pod-to-pod, pod-to-service, and node-to-node communication over the OVN overlay network. If you're seeing intermittent packet drops or TCP retransmission spikes in your applications, an MTU mismatch between your Azure VNet and the OVN overlay is often the reason. Verify the effective MTU with:

oc debug node/<node-name> -- chroot /host ip link show eth0
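
The node interface MTU is only half of the comparison; the overlay's own MTU is recorded on the cluster network configuration. The sketch below assumes OVN-Kubernetes' 100 bytes of encapsulation overhead, so the cluster MTU should be no larger than the host MTU minus 100. The function name check_cluster_mtu is hypothetical.

```shell
# Hypothetical helper: check_cluster_mtu HOST_MTU
# Assumes 100 bytes of OVN-Kubernetes encapsulation overhead.
check_cluster_mtu() {
  local actual max
  actual=$(oc get network.config.openshift.io cluster \
    -o jsonpath='{.status.clusterNetworkMTU}')
  if [ -z "$actual" ]; then
    echo "could not read clusterNetworkMTU"
    return 2
  fi
  max=$(( $1 - 100 ))
  if [ "$actual" -le "$max" ]; then
    echo "OK: cluster MTU $actual fits within host MTU $1"
  else
    echo "TOO LARGE: cluster MTU $actual exceeds $max (host MTU $1 minus 100 bytes)"
    return 1
  fi
}
```

Pass the MTU reported by the ip link command above, for example check_cluster_mtu 1500.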

Checking Azure Resource Locks and Policy Blocks

In enterprise Azure environments, Azure Policy assignments and resource locks frequently interfere with ARO's self-management operations. Red Hat and Microsoft manage the control plane, infrastructure, and application nodes on your behalf, but if a deny policy is blocking role assignments or resource writes in the managed resource group, those background operations fail silently. The command below checks policy compliance in the cluster's own resource group; run it against the managed resource group as well:

az policy state list \
  --resource-group aro-rg \
  --query "[?complianceState=='NonCompliant']" \
  --output table
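
Locks and policy blocks usually matter most on the managed resource group, whose name differs from the one you created. The following sketch assumes the managed group's ID is exposed as clusterProfile.resourceGroupId on the cluster resource; both helper names are hypothetical.

```shell
# Hypothetical helper: managed_rg_name CLUSTER RESOURCE_GROUP
# Prints the name of the cluster's managed resource group.
managed_rg_name() {
  local rg_id
  rg_id=$(az aro show --name "$1" --resource-group "$2" \
    --query clusterProfile.resourceGroupId -o tsv)
  # The ID ends in .../resourcegroups/<name>; keep only the last path segment
  printf '%s\n' "${rg_id##*/}"
}

# Hypothetical helper: check_rg_blocks RESOURCE_GROUP
# Lists resource locks and non-compliant policy states for one resource group.
check_rg_blocks() {
  az lock list --resource-group "$1" --output table
  az policy state list --resource-group "$1" \
    --query "[?complianceState=='NonCompliant']" --output table
}
```

With the names used in this guide, check_rg_blocks "$(managed_rg_name aro-cluster aro-rg)" inspects the managed group, and check_rg_blocks aro-rg inspects your own.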

Fast Channel y-Stream Update Failures

ARO now supports y-stream updates in the fast channel (February 2026). If an upgrade gets stuck, check the cluster version object:

oc get clusterversion version -o yaml | grep -A 20 "conditions:"

A Progressing condition stuck at the same percentage for more than 90 minutes almost always means a degraded operator is blocking the update rollout. Fix the operator first, then the upgrade continues automatically.

OpenShift Virtualization and Confidential Containers (GA November 2025)

Both OpenShift Virtualization and Confidential Containers reached GA on ARO in November 2025. If you're enabling either feature on an existing cluster, verify your worker node machine type supports the required CPU extensions. OpenShift Virtualization needs hardware virtualization extensions on the underlying VM, and not all Azure instance types support nested virtualization.

When to Call Microsoft Support
Escalate to Microsoft Support when: cluster creation fails and the managed resource group is partially created (you'll need support to clean up properly), cluster operator degradation persists for more than 4 hours after you've addressed known issues, or you suspect a platform-level bug with the ARO service itself. Because ARO is jointly supported by Red Hat and Microsoft, a single support case reaches both teams, so you don't need to open two tickets.

Prevention & Best Practices

The teams that run Azure Red Hat OpenShift smoothly long-term aren't doing anything magical. They follow a consistent set of hygiene practices that eliminate the most common failure modes before they happen. Here's what actually works in production.

Automate your prerequisites with Infrastructure as Code. Every time someone tries to create an ARO cluster manually through the portal without pre-checking the VNet configuration, service principal scoping, or resource provider registration, something goes wrong. Use Bicep or Terraform to codify the prerequisites (VNet, subnets, role assignments, provider registration) as a single deployable unit. That way the environment is always in a known good state before cluster creation begins.

Switch to managed identities now. With managed identity support GA as of February 2026, there's no good reason to keep running service principal credentials that expire and require manual rotation. Managed identities use short-term, limited-privilege credentials automatically. If you're still on a service principal, schedule the migration. An expired client secret at 2am is a bad way to discover this lesson.

Monitor cluster operator health proactively. Don't wait for an application outage to discover that the monitoring operator degraded three days ago. Set up alerts on the clusterversion and clusteroperator objects using ARO's built-in monitoring stack. A degraded operator is almost always recoverable if caught early.

Size your node subnet with room to grow. ARO application nodes are Azure virtual machines, and you will add more worker nodes over time. A /23 worker subnet (507 usable IPs after Azure's five reserved addresses) is a reasonable baseline. Going smaller means you'll eventually hit an IP exhaustion error when scaling out, and you cannot resize a subnet in-place once resources are attached to it.
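
Azure reserves five addresses in every subnet, so a /23 yields 507 assignable addresses rather than the 510 you'd get by subtracting only the network and broadcast addresses. The arithmetic is easy to sketch; azure_usable_ips is a hypothetical helper name.

```shell
# Hypothetical helper: azure_usable_ips PREFIX_LENGTH
# Total addresses in the prefix minus the 5 that Azure reserves per subnet.
azure_usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

azure_usable_ips 23   # prints 507
azure_usable_ips 24   # prints 251
azure_usable_ips 27   # prints 27
```

Comparing the result against your planned maximum node count is a quick way to sanity-check a prefix before attaching anything to the subnet.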

Keep your OpenShift version current. With fast channel y-stream updates now supported, staying current is easier than it's ever been. Running a version that's several minor releases behind means you're missing security patches and potentially running a version that is approaching end of life under the terms of the ARO SLA (99.95% uptime).

Quick Wins
  • Register all four required resource providers (Microsoft.RedHatOpenShift, Microsoft.Compute, Microsoft.Storage, Microsoft.Authorization) as part of your Azure subscription onboarding checklist, not as an afterthought during ARO setup
  • Store your Red Hat pull secret in Azure Key Vault and reference it during cluster creation rather than keeping it in plaintext files on developer machines
  • Enable diagnostic settings on your ARO cluster resource to forward platform logs to a Log Analytics workspace. This is the fastest way to investigate both Azure-layer and OpenShift-layer issues from one place
  • Request vCPU quota increases in new regions before you need them. Quota requests can take 24–72 hours to process, and you don't want that delay blocking a production deployment
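
The Key Vault suggestion in this list can be sketched with two small wrappers. Both function names are hypothetical, as are any vault and secret names you pass in, and the sketch assumes the identity running the commands already has permission to set and read secrets in the vault.

```shell
# Hypothetical helper: store_pull_secret VAULT SECRET_NAME FILE
# Uploads the pull secret file as a Key Vault secret.
store_pull_secret() {
  az keyvault secret set --vault-name "$1" --name "$2" --file "$3" --output none
}

# Hypothetical helper: fetch_pull_secret VAULT SECRET_NAME
# Prints the stored secret value so it can be passed to another command.
fetch_pull_secret() {
  az keyvault secret show --vault-name "$1" --name "$2" --query value -o tsv
}
```

At cluster creation time the value can then be supplied through command substitution, for example by passing "$(fetch_pull_secret <vault> <secret-name>)" as the argument to --pull-secret, which avoids leaving pull-secret.txt on disk.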

Frequently Asked Questions

What are the prerequisites before creating an Azure Red Hat OpenShift cluster?

Before you create an ARO cluster, you need: an Azure subscription with the Microsoft.RedHatOpenShift resource provider registered, a pre-created VNet with at least two dedicated subnets (one for the control plane, one for workers) sized at /23 or larger, and either a service principal with Contributor/Network Contributor rights on the VNet resource group or managed identity enabled (the recommended path as of February 2026). You'll also want to request vCPU quota in your target region before starting, because a default ARO cluster needs roughly 40 vCPUs minimum. Microsoft's official prerequisite documentation walks through each item with exact CLI commands.

How do I fix an ARO cluster creation that fails partway through?

First, check the Azure portal's "Deployments" blade on your resource group. The deployment detail view often has a more specific error message than what the ARO resource itself shows. Common mid-deployment failures are: quota exhaustion (request a vCPU limit increase via Subscriptions > Usage + quotas), a missing role assignment on the VNet, or private link network policies not disabled on the master subnet. If the managed resource group was partially created during a failed deployment, don't try to delete it manually. That resource group is managed by Red Hat and Microsoft, and manual deletion can corrupt state. Open a support ticket to have it cleaned up properly before retrying.

What OpenShift versions are available on Azure Red Hat OpenShift right now?

As of December 2025, OpenShift version 4.19 is available as an ARO install option. Version 4.18 became available in November 2025. You can check which versions are available in your target region by running az aro get-versions --location <region> in Azure Cloud Shell. Additionally, since February 2026, ARO supports y-stream version updates in the fast channel, which means you can update your cluster to the latest minor version within a major stream without waiting for the stable channel release schedule.

How do I set up Microsoft Entra ID login for my ARO cluster?

ARO ships with integrated Microsoft Entra ID sign-on capability built in. To wire it up, you register an app in Entra ID (under App registrations > New registration), set the redirect URI to your cluster's OAuth callback URL (https://oauth-openshift.apps.<domain>/oauth2callback/AAD), create a client secret, add the Microsoft Graph > User.Read delegated permission, and grant admin consent. You then configure an OpenShift OAuth identity provider pointing at your Entra ID tenant using the app's client ID and secret. The most common gotcha is forgetting to click "Grant admin consent". Without it, users hit a 403 after a seemingly successful login redirect.

Is Azure Red Hat OpenShift available in my region?

ARO is available in a growing number of Azure regions. As of February 2026, three new regions were added: Mexico Central, New Zealand North, and Malaysia West. To see the full current list of supported regions, run az provider show -n Microsoft.RedHatOpenShift --query "resourceTypes[?resourceType=='openShiftClusters'].locations" -o table in Azure Cloud Shell. Region availability does change with new releases, so if your target region isn't listed, check back later. Microsoft and Red Hat add regions on an ongoing basis tied to major ARO releases.

What's the SLA for Azure Red Hat OpenShift, and what does it actually cover?

Azure Red Hat OpenShift carries a 99.95% uptime SLA. This covers the availability of the service itself: the API server, control plane, and infrastructure nodes are all patched, updated, and monitored by Red Hat and Microsoft on your behalf, with no action required from you. The SLA does not cover your application workloads themselves, since those run on nodes you configure and deploy to. The jointly engineered support model means a single support case reaches both Red Hat and Microsoft, so you don't need to figure out which company "owns" your issue before filing. Just open one case and both teams coordinate behind the scenes.

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.