Azure Local: Fix Deployment & Config Errors
Why Azure Local Deployment Problems Happen
I've seen this exact situation play out dozens of times: an IT team spins up Azure Local (Microsoft's distributed infrastructure solution that brings Azure capabilities on-premises via Azure Arc), and somewhere between the hardware validation phase and the first successful VM deployment, things fall apart. The error messages are cryptic. The portal says one thing, the CLI says another, and the logs feel like they were written for someone who already knows the answer.
Azure Local is genuinely powerful. It's designed to run virtual machines, containers, and select Azure services on customer-owned hardware, with Azure Arc acting as the unifying control plane whether you're connected to the cloud or not. But that power comes with real complexity, and the gap between "understanding what it does" and "getting it to actually work" trips up even experienced Azure architects.
The root causes I see most often break down into four buckets. First, prerequisite mismatches: the hardware or network configuration doesn't meet the baseline requirements before deployment even starts, but nothing warns you clearly until you're already three steps deep. Second, Arc registration failures: because Azure Arc is the control plane for Azure Local, any hiccup in your Arc onboarding cascades into deployment failures that look completely unrelated. Third, storage configuration errors in hyperconverged deployments, where the storage pool setup for the cluster's 1–16 node configuration doesn't initialize correctly. Fourth, deployment type confusion: choosing between hyperconverged deployments, multi-rack deployments (currently in preview), and disconnected operations without fully understanding the trade-offs leads to picking the wrong architecture for your use case, then fighting the platform the entire time.
What makes this especially frustrating is that Azure Local covers an unusually wide range of scale points. You might be running a single-machine edge deployment for a retail store's loss prevention AI, or you might be standing up a multi-rack cluster of hundreds of machines for a factory floor. The failure modes look different at each scale, but the underlying causes are often the same.
I know this is frustrating, especially when it blocks production workloads that have strict latency or sovereignty requirements. Let's work through it systematically.
The Quick Fix: Try This First
Before you go hunting through Event Viewer or opening a support ticket, run the built-in environment checker. This single step catches the majority of Azure Local deployment failures before they turn into multi-hour debugging sessions.
Open an elevated PowerShell session on your intended Azure Local host machine and run the Azure Local environment validation:
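Install-Module -Name AzStackHci.EnvironmentChecker -Force -AllowClobber
Import-Module AzStackHci.EnvironmentChecker
Invoke-AzStackHciConnectivityValidation
Invoke-AzStackHciHardwareValidation
These two validators cover connectivity and hardware. The same module also ships validators for Active Directory, network, and Arc integration (Invoke-AzStackHciActiveDirectoryValidation, Invoke-AzStackHciNetworkValidation, Invoke-AzStackHciArcIntegrationValidation), which take parameters specific to your environment. Read the output carefully: it color-codes failures (red), warnings (yellow), and passes (green). Fix every red item before you proceed. Don't skip yellows either; in my experience, yellow warnings in the environment checker turn into red deployment failures 60% of the time.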
If the environment checker passes clean but you're still seeing deployment failures, your next move is to verify Arc connectivity. Azure Local's entire management experience (the Azure portal, Azure CLI, ARM templates, Azure Policy, Microsoft Defender for Cloud, Azure Monitor) flows through Azure Arc. If Arc isn't healthy, nothing works right. Check your Arc agent status:
azcmagent check
azcmagent show
Look for Agent Status: Connected in the output. If you see Disconnected or any proxy-related errors, that's your culprit. Fix the Arc connection before attempting anything else with Azure Local.
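If you're checking more than a couple of nodes, eyeballing that output gets old. Here's a minimal sketch of a filter that reads `azcmagent show` text output on stdin and succeeds only when the status line says Connected. The function name `agent_connected` is my own, and it assumes the status line has the "Agent Status : Connected" shape described above; adjust the pattern if your agent version prints it differently.

```shell
# Hypothetical helper: succeeds only when the "Agent Status" line reads Connected.
# Assumption: the line looks like "Agent Status : Connected" (any spacing).
# Carriage returns are stripped so output captured on Windows also matches.
agent_connected() {
  tr -d '\r' | grep -E '^[[:space:]]*Agent Status[[:space:]]*:[[:space:]]*Connected[[:space:]]*$' >/dev/null
}

# Example (from Git Bash or WSL on a node, or against a saved output file):
#   azcmagent show | agent_connected && echo "Arc agent healthy"
#   agent_connected < node3-azcmagent-show.txt || echo "node3 needs attention"
```

Because the pattern anchors on the whole line, a Disconnected status never passes by accident.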
For machines that are part of a hyperconverged Azure Local cluster deployment (the most common setup, supporting 1–16 machines), also confirm that cluster validation has passed in Windows Admin Center. Navigate to Tools > Cluster > Validate Cluster and run a full validation report. Any failures in the Storage, Network, or System Configuration categories need to be resolved before you deploy.
Run the environment checker on every node in your planned cluster, not just the first one. I've had deployments fail at node 3 because the environment check was only run on node 1, and the network adapter naming convention differed between machines. It's a subtle mismatch that kills cluster networking during setup.
Confirming your deployment type and hardware support sounds obvious, but it's the most commonly skipped step, and skipping it causes hours of wasted troubleshooting. Azure Local supports three distinct deployment architectures, and they are not interchangeable on arbitrary hardware.
Hyperconverged deployments are what most organizations use. They support clusters of 1–16 machines using hyperconverged storage, where compute and storage are combined on the same nodes. You can also attach an external SAN (currently in preview) if you need more storage capacity beyond what the hyperconverged pool provides. Standard clusters go up to 16 machines; rack-aware clusters cap at 8 machines. If you're planning a rack-aware cluster, your hardware must be pre-positioned across fault domains that the deployment wizard can discover.
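Those size limits are easy to encode as a sanity check in a planning script. The sketch below is illustrative only: `check_cluster_size` is a name I made up, the ceilings of 16 and 8 come straight from the paragraph above, and the lower bound of 1 for rack-aware clusters is my assumption for the sake of the example.

```shell
# Hypothetical planning check for hyperconverged cluster sizes.
# Limits from the text: standard up to 16 machines, rack-aware up to 8.
# Assumption: a minimum of 1 machine is applied to both types.
check_cluster_size() {
  local type="$1" count="$2" max
  case "$type" in
    standard)   max=16 ;;
    rack-aware) max=8 ;;
    *) echo "unknown cluster type: $type"; return 2 ;;
  esac
  # Reject empty or non-numeric counts before comparing.
  case "$count" in
    ''|*[!0-9]*) echo "machine count must be a whole number, got '$count'"; return 1 ;;
  esac
  if [ "$count" -lt 1 ] || [ "$count" -gt "$max" ]; then
    echo "expected 1-$max machines for a $type cluster, got $count"
    return 1
  fi
  echo "ok"
}

# Example:
#   check_cluster_size rack-aware 12   # reports that 12 exceeds the rack-aware limit
```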
Multi-rack deployments are currently in preview and are intended for very large workloads; we're talking hundreds of machines in a single instance. This isn't a DIY configuration. Multi-rack requires a prescriptive set of hardware that arrives in pre-integrated racks with built-in fault tolerance. Compute, storage, and networking all come as a package. If someone on your team is trying to build a multi-rack deployment with general-purpose servers purchased from a catalog, stop. That's not how this works.
Disconnected operations are for environments with the strictest sovereignty or security requirements, where no connection to an Azure cloud region is possible or permitted. This mode runs a local instance of the control plane with a subset of Azure capabilities. Defense, intelligence, and critical infrastructure customers are the primary audience here.
To confirm your hardware is supported, go to the Azure Local Catalog at aka.ms/AzureLocalCatalog and search for your OEM partner and hardware model. If your hardware isn't listed there, you will hit unexplained failures during deployment that no amount of troubleshooting will resolve; the platform simply isn't validated for unsupported configurations.
Once you've confirmed hardware support, verify the deployment type using the Azure portal: navigate to Azure Arc > Azure Local > Create and confirm the deployment type selector matches your actual infrastructure plan.
I cannot stress this enough: Azure Local deployments fail far more often because prerequisites were partially completed than because of bugs in the platform itself. Microsoft's deployment wizard is not forgiving about out-of-order or partially satisfied prerequisites.
For hyperconverged deployments, the prerequisites checklist includes several items that teams consistently miss. Work through these in order before you touch the deployment wizard:
First, confirm your Azure subscription has the right resource providers registered. Open Azure CLI and run:
az provider register --namespace Microsoft.AzureStackHCI
az provider register --namespace Microsoft.HybridCompute
az provider register --namespace Microsoft.HybridConnectivity
az provider show --namespace Microsoft.AzureStackHCI --query "registrationState"
Wait for the output to return "Registered"; this can take up to 10 minutes. Attempting deployment with a resource provider in "Registering" state causes deployment jobs to silently hang.
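Rather than re-running that last command by hand, you can poll for it. This is a rough sketch for a Bash session such as Azure Cloud Shell, not an official script: `wait_for_provider` is a hypothetical helper name, and the defaults of 20 attempts at 30-second intervals are simply my way of covering the roughly 10-minute window mentioned above.

```shell
# Hypothetical helper: polls a resource provider until it reports "Registered".
# Usage: wait_for_provider <namespace> [max_attempts] [sleep_seconds]
# Defaults (20 x 30s) are an assumption chosen to span about 10 minutes.
wait_for_provider() {
  local ns="$1" attempts="${2:-20}" delay="${3:-30}" state i=1
  while [ "$i" -le "$attempts" ]; do
    # A failed az call is treated as "state unknown" and retried.
    state=$(az provider show --namespace "$ns" --query registrationState -o tsv 2>/dev/null) || state=""
    echo "[$i/$attempts] $ns: ${state:-unknown}"
    if [ "$state" = "Registered" ]; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep when another attempt is coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example:
#   for ns in Microsoft.AzureStackHCI Microsoft.HybridCompute Microsoft.HybridConnectivity; do
#     wait_for_provider "$ns" || echo "$ns did not reach Registered; do not start deployment"
#   done
```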
Second, check your Active Directory configuration. Azure Local requires a dedicated Organizational Unit (OU) for the cluster objects. The deployment service account needs delegated permissions on that OU. If you're in a multi-domain forest, verify the OU is in the same domain as the machine accounts.
Third, validate your network configuration. All cluster nodes need static IP assignments, and the management network, storage network, and VM network need to be on separate VLANs. Missing VLAN segmentation is the number one network-related reason I've seen Azure Local hyperconverged cluster deployments fail validation.
Fourth, for multi-rack deployments (preview), review the prescriptive Bill of Materials (BOM) documentation for your specific rack SKU from your OEM partner. The pre-integrated rack hardware has specific firmware and driver requirements that must be met before Microsoft's deployment automation can proceed.
When prerequisites are complete, you should see a green status on all items in the Azure portal deployment wizard's Validation tab. Don't proceed until that's true.
Azure Arc is the backbone of everything in Azure Local. The platform uses Arc as its unified control plane for both connected and disconnected scenarios. When Arc registration breaks, it breaks the entire Azure Local management experience, including the portal, CLI, ARM template deployments, and services like Azure Policy and Microsoft Defender for Cloud.
If you're seeing errors like Resource provider not found, Arc agent not responding, or deployment jobs that start and then go silent, start here.
On each Azure Local node, open PowerShell as Administrator and check the Arc agent service:
Get-Service -Name himds
Get-Service -Name ExtensionService
Get-Service -Name GCArcService
All three services should be in Running status. If any are stopped or in an error state, restart them:
Restart-Service -Name himds -Force
Restart-Service -Name ExtensionService -Force
Restart-Service -Name GCArcService -Force
If the services won't start, check the Windows Event Log under Applications and Services Logs > Microsoft > AzureArcSetup > Admin for error detail.
Proxy environments are a frequent source of Arc connectivity failures in enterprise Azure Local deployments. If your organization routes outbound traffic through a proxy, configure the Arc agent to use it:
azcmagent config set proxy.url "http://your-proxy:port"
Then verify the Arc endpoint connectivity directly:
azcmagent check --location [your-azure-region]
The output will show which Arc endpoints are reachable and which are blocked. Work with your network team to allow the Azure Arc URLs through your firewall. Microsoft maintains an up-to-date list at aka.ms/AzureArcNetworkRequirements.
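When I'm arguing with a network team about what is actually blocked, I like a second opinion from another machine on the same network segment. The sketch below is a plain curl loop, not a replacement for azcmagent check. `probe_endpoints` is a hypothetical name, and the hosts in the example are only an illustrative subset; take the real list from the requirements page above.

```shell
# Hypothetical helper: probes HTTPS reachability for each host, optionally via a proxy.
# Usage: probe_endpoints <proxy-url-or-empty-string> host1 [host2 ...]
# Any HTTP response (even 401/404) counts as reachable; only connection
# failures and timeouts are reported as blocked.
probe_endpoints() {
  local proxy="$1" failed=0 host
  shift
  for host in "$@"; do
    if curl --silent --output /dev/null --max-time 10 ${proxy:+--proxy "$proxy"} "https://$host"; then
      echo "reachable: $host"
    else
      echo "BLOCKED:   $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example (illustrative hosts only):
#   probe_endpoints "http://your-proxy:port" login.microsoftonline.com management.azure.com
```

The function returns non-zero if any host fails, so it can gate a deployment script.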
When Arc connectivity is restored and healthy, the Azure portal's Azure Local resource should show Status: Online and the management plane operations, including VM deployments and monitoring via Azure Monitor metrics, will start working again.
Storage configuration errors are the most technically involved class of Azure Local problems, and they show up most often in hyperconverged deployments where the same physical machines provide both compute and storage for the cluster. When Storage Spaces Direct (S2D) doesn't initialize correctly, VM workloads can't deploy, and existing VMs may go into a degraded state.
Start by checking the health of the storage pool on the cluster. Run this on any cluster node with appropriate permissions:
Get-StoragePool | Select-Object FriendlyName, OperationalStatus, HealthStatus
Get-VirtualDisk | Select-Object FriendlyName, OperationalStatus, HealthStatus, ResiliencySettingName
You want to see OperationalStatus: OK and HealthStatus: Healthy for both. If you see Degraded, Unhealthy, or Unknown, you have a storage problem that needs immediate attention before deploying additional workloads.
A common cause is a mismatched drive configuration across nodes. Azure Local's hyperconverged storage requires that all nodes in the cluster have the same drive types and a consistent layout. One node with an extra NVMe drive or a different SATA drive count compared to its peers will cause S2D initialization to fail or degrade.
To check physical disk health:
Get-PhysicalDisk | Select-Object DeviceId, FriendlyName, MediaType, OperationalStatus, HealthStatus, Size
If you recently added an external SAN (the preview feature for expanded storage capacity in hyperconverged deployments), verify the SAN LUNs are presented consistently to all cluster nodes and that the multipath I/O (MPIO) configuration is correct. Asymmetric LUN presentation, where some nodes see the SAN LUN and others don't, is a hard-to-diagnose failure mode that causes cluster resource failover problems.
Once storage is healthy, log collection for any remaining issues should be your next step. Use the built-in log collection command and send the results to your analysis pipeline:
Send-DiagnosticData -ToSMBShare -BypassObsAgent -SharePath \\your-share\logs -ShareCredential (Get-Credential)
Once the Azure Local infrastructure layer is healthy, getting your actual VM workloads running correctly is the final piece. Azure Local supports deploying VMs on both hyperconverged and multi-rack (preview) deployments, and the deployment path differs slightly between them.
For hyperconverged deployments, navigate in the Azure portal to Azure Arc > Azure Local > [Your Cluster] > Virtual Machines > Create. You'll need a gallery image or a custom image stored in an Azure Local image repository. If the VM creation wizard fails at the image selection step, it's almost always because the image hasn't been downloaded and stored in the cluster's local image repository yet, not a permissions problem.
To add a marketplace image to your local repository via Azure CLI:
az stack-hci-vm image create \
--resource-group [your-rg] \
--custom-location [your-custom-location-id] \
--name [image-name] \
--os-type Windows \
--offer [marketplace-offer] \
--publisher [marketplace-publisher] \
--sku [marketplace-sku] \
--version [marketplace-version]
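Since a missing or half-downloaded image is the usual reason the VM wizard fails at image selection, it's worth confirming the image finished provisioning before you retry. A minimal sketch follows: `image_ready` is a hypothetical helper, and it assumes the image resource exposes its state at `properties.provisioningState`, so check the JSON from `az stack-hci-vm image show` in your environment if the query comes back empty.

```shell
# Hypothetical helper: succeeds only when the image reports "Succeeded".
# Assumption: the state lives at properties.provisioningState in the
# output of `az stack-hci-vm image show`.
image_ready() {
  local rg="$1" name="$2" state
  state=$(az stack-hci-vm image show --resource-group "$rg" --name "$name" \
            --query properties.provisioningState -o tsv 2>/dev/null) || state=""
  echo "image $name: ${state:-not found}"
  [ "$state" = "Succeeded" ]
}

# Example:
#   image_ready [your-rg] [image-name] || echo "image not ready; VM creation will fail at image selection"
```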
For multi-rack deployments (preview), VM placement is governed by the rack-aware architecture. You can use rack-aware clusters to explicitly place VMs across fault domains: navigate to Azure Arc > Azure Local > [Your Cluster] > Rack Placement to configure VM affinity rules. If VMs are deploying to a single rack instead of distributing across racks, check that the rack topology was correctly configured during initial cluster setup.
After VM deployment, validate the workload is running correctly by checking its status in the portal: Azure Arc > Azure Local > [Cluster] > Virtual Machines > [VM Name] > Overview. The status should show Running with a green indicator. If it shows Starting and stays there for more than 5 minutes, check Event Viewer on the hosting node under Microsoft-Windows-Hyper-V-Worker > Admin for guest initialization errors.
You should also onboard Azure Monitor for your Azure Local cluster at this point. Monitoring data for hyperconverged deployments flows through Azure Monitor metrics. Configure this in the portal under Azure Arc > Azure Local > [Cluster] > Monitoring > Enable Insights. Doing this early gives you baseline metrics before any workloads are under real load.
Advanced Troubleshooting
If the steps above haven't resolved your Azure Local issues, you're dealing with something deeper. Here's how I approach the harder cases that show up in enterprise and multi-site Azure Local deployments.
Collecting and Analyzing Logs
Microsoft's official log collection tooling for Azure Local is the right starting point for any escalation. For hyperconverged deployments, use:
Get-AzStackHCILogs -OutputPath C:\Logs\AzureLocal
For disconnected operations deployments, the log collection path is different because there's no outbound cloud channel. Use the local log collection method and transfer the output manually:
Get-AzStackHCILogs -OutputPath C:\Logs\AzureLocal -IncludeArcAgentLogs
The logs you care about most are in the AzureStackHCI, HybridCompute, and AzureArcSetup providers. In Event Viewer, look at Applications and Services Logs > Microsoft > AzureStackHCI for deployment and lifecycle events.
Disconnected Operations: Control Plane Failures
Disconnected operations for Azure Local runs a local instance of the control plane, which means that when something goes wrong, there's no Azure cloud region to fall back on. The local control plane health check is your diagnostic starting point:
Get-AzStackHCILocalControlPlane -Verbose
If the local control plane reports an unhealthy state, check whether the Key Vault instance used for local identity (the preview feature for local identity with Key Vault) is accessible and its certificates haven't expired. Certificate expiry is the most common cause of silent control plane failures in disconnected operations environments, especially if the deployment was done months ago and the initial certificate rotation wasn't scheduled.
Network Security Groups and SDN
If you've enabled Software Defined Networking (SDN) on your Azure Local cluster, Network Security Groups (NSGs) are likely in play. Misconfigured NSGs are invisible from inside the VM but will silently drop traffic in ways that look like VM or application failures. Check NSG rules on hyperconverged deployments via the Azure portal at Azure Arc > Azure Local > [Cluster] > SDN > Network Security Groups, or via PowerShell:
Get-NetworkControllerAccessControlList -ConnectionUri [NC REST URI]
On multi-rack deployments (preview), NSG management follows a slightly different path through the multi-rack networking layer; use the Azure portal's multi-rack NSG management interface rather than direct Network Controller API calls.
Pricing and Subscription Issues
Azure Local is priced per physical core on your on-premises machines, plus consumption-based charges for any additional Azure services you attach. If you're seeing unexpected QuotaExceeded errors or deployment failures that reference billing, check your Azure subscription's core quota for the Azure Stack HCI resource provider in your target region. Request a quota increase through Azure portal > Subscriptions > [Your Sub] > Usage + Quotas if you're hitting limits.
Escalate to Microsoft Support when: (1) the environment checker passes clean but deployment still fails with no actionable error; (2) Arc registration succeeds but the cluster doesn't appear in the Azure portal after 30+ minutes; (3) you're on a multi-rack (preview) deployment and hitting hardware-level failures the OEM can't resolve; (4) your disconnected operations local control plane is in an unrecoverable unhealthy state. For all escalations, have your Get-AzStackHCILogs output ready; it cuts resolution time dramatically.
Prevention & Best Practices
Most Azure Local problems I've debugged were preventable. Not just in hindsight: preventable with steps that should have been in the deployment plan from day one. Here's what I'd build into every Azure Local deployment from the start.
Run environment validation on a regular schedule, not just at deployment time. The Azure Local environment checker isn't just a one-time pre-deployment tool. Hardware drifts, firmware gets updated, network configurations change. Schedule a monthly run of the environment checker on all nodes and write the output to a log file you actually review. Catching drift before it causes a failure is orders of magnitude cheaper than troubleshooting a production incident.
Design your deployment type for your actual workload, not your aspirational workload. The hyperconverged, multi-rack, and disconnected paths each have real architectural constraints. A single-machine edge deployment for a manufacturing execution system with extreme latency requirements is a completely different design than a multi-rack cluster for AI inferencing in a data center. Azure Local's scalability range, from one machine to hundreds, means there's a right answer for almost every scenario, but only if you pick the right deployment type upfront. Re-architecting mid-deployment is painful and sometimes impossible without starting over.
Monitor from day one with Azure Monitor. Azure Local integrates with Azure Monitor metrics for both hyperconverged and multi-rack deployments. Set up monitoring alerts before you put production workloads on the cluster, not after. Mission-critical workloads (production lines, financial infrastructure, transit systems) should have health alerts configured for CPU, memory, storage latency, and network throughput from the moment they go live.
Plan your Arc connectivity architecture before hardware arrives. Arc is the control plane for Azure Local. If your site has strict firewall policies or routes through a proxy, the Arc connectivity requirements need to be incorporated into your network design before you rack the first server. Retrofitting Arc network access into an already-locked-down network is a multi-week process in regulated environments.
- Register all required Azure resource providers (Microsoft.AzureStackHCI, Microsoft.HybridCompute) in your subscription at least 24 hours before deployment begins; registration propagation takes time.
- Create a dedicated Active Directory OU for Azure Local cluster objects and pre-stage the deployment service account permissions before the deployment wizard ever runs.
- For hyperconverged clusters, ensure all nodes have identical drive configurations (same count, same types, same capacity) before starting; S2D will reject mismatched configurations and the error messages are not always clear about why.
- For disconnected operations environments, document and schedule certificate rotation for the local Key Vault instance at deployment time; don't rely on remembering to do it later.
Frequently Asked Questions
What exactly is Azure Local and how is it different from regular Azure?
Azure Local is Microsoft's distributed infrastructure solution that lets you run Azure workloads (virtual machines, containers, and select Azure services) on hardware you own and control, in your own facility. Regular Azure runs in Microsoft's data centers; Azure Local runs in yours. The key is that Azure Arc acts as the unified control plane for both, so you manage Azure Local through the same Azure portal, CLI, and ARM templates you already use for cloud resources. The pricing model reflects this: you pay per physical core on your on-premises machines, plus consumption charges for any Azure services you attach to the deployment.
What is a hyperconverged deployment of Azure Local, and is it right for my organization?
Hyperconverged deployments are the most common type of Azure Local deployment, and they're likely what you want unless you have a very specific use case. In a hyperconverged setup, compute and storage are combined on the same physical machines, so you don't need separate storage arrays. Cluster sizes range from a single machine (great for edge deployments) up to 16 machines for standard clusters, or 8 machines for rack-aware clusters. You can also attach an external SAN in preview if you need more storage than the hyperconverged pool can provide. For most enterprise and edge use cases (factories, retail, healthcare, financial infrastructure), hyperconverged deployments hit the right balance of flexibility and cost.
What are multi-rack deployments of Azure Local and who should use them?
Multi-rack deployments are Azure Local's answer to very large-scale workloads; we're talking hundreds of machines in a single instance. This is a preview feature that uses prescriptive, pre-integrated hardware that arrives as complete racks from your OEM partner, with compute, storage, and networking built in and fault tolerance pre-configured. You can't DIY a multi-rack deployment with arbitrary hardware; it requires the specific Bill of Materials that Microsoft and its hardware partners have validated. This option is typically for organizations running massive AI inferencing workloads, large-scale private cloud infrastructure, or enterprise data center consolidation scenarios where hyperconverged's 16-machine ceiling isn't enough.
Can I run Azure Local without an internet connection to Azure?
Yes. Microsoft calls this disconnected operations, and it's specifically designed for environments where connecting to an Azure cloud region isn't possible or permitted. Disconnected operations run a local instance of the Azure Arc control plane on-premises, giving you a subset of Azure's management capabilities without requiring cloud connectivity. This mode is designed for defense, intelligence, energy infrastructure, and other organizations with the strictest sovereignty or security requirements. Because some cloud-dependent services aren't available in this mode, verify your required workloads are supported in disconnected mode before committing to this deployment path.
My Azure Local deployment job just hangs and never completes. What should I check first?
A deployment job that silently hangs is almost always one of three things: an Azure resource provider that's still in "Registering" state (run az provider show --namespace Microsoft.AzureStackHCI --query "registrationState" to check), an Arc agent connectivity issue that's blocking the deployment service from reporting status back to the portal, or a prerequisite that passed the wizard's validation but failed at actual execution time. Start by checking the AzureStackHCI event log on the node running the deployment: Event Viewer > Applications and Services Logs > Microsoft > AzureStackHCI > Admin. Look for error events in the 10 minutes before the job went silent; that window almost always contains the real cause.
How does Azure Local handle monitoring and do I need to set up Azure Monitor separately?
Azure Local integrates with Azure Monitor out of the box, but you do need to enable it; it's not on by default. For hyperconverged deployments, Azure Monitor metrics are available through the Azure portal once you enable Insights on the cluster: go to Azure Arc > Azure Local > [Your Cluster] > Monitoring > Enable Insights. Multi-rack deployments (preview) have their own Azure Monitor metrics integration path. Beyond basic metrics, you can also onboard Microsoft Defender for Cloud for security posture management and Azure Policy for governance, all through the same Azure Arc management plane. The beauty of the Arc-based approach is that you get the same single-pane-of-glass monitoring you'd use for cloud resources, applied to your on-premises Azure Local infrastructure.