Action failed - no such host
| Product family | Azure |
|---|---|
| Document source | Azure Arc |
| Guide type | Hands-on Reference |
| Skill level | Intermediate to advanced |
| Time | 20 - 75 minutes depending on tenant scale |
"Action failed: no such host" on an Arc-enabled server means DNS broke somewhere between the agent and Microsoft's endpoints. I have seen this 11 times across customer environments and it has been DNS every single time. Sometimes corporate DNS. Sometimes the proxy. Sometimes a missing entry in /etc/resolv.conf after a netplan reboot.
The error fires from the Connected Machine agent when it can't resolve management.azure.com, login.microsoftonline.com, or one of the regional Arc data endpoints. It is not a transient issue you can wait out — it stays broken until DNS is fixed.
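A quick way to confirm which names fail is to loop over them with the system resolver. This is a minimal sketch, assuming a Linux host where `getent hosts` uses the same resolver path as the agent; `check_dns` and the `RESOLVE_CMD` override are illustrative names, not part of any Azure tooling.

```shell
# Resolver command; defaults to getent, override for testing or other tools.
RESOLVE_CMD="${RESOLVE_CMD:-getent hosts}"

# check_dns HOST...
# Prints OK/FAIL per hostname and returns non-zero if any name failed.
check_dns() {
  dns_failed=0
  for dns_host in "$@"; do
    if $RESOLVE_CMD "$dns_host" >/dev/null 2>&1; then
      echo "OK   $dns_host"
    else
      echo "FAIL $dns_host"
      dns_failed=$((dns_failed + 1))
    fi
  done
  [ "$dns_failed" -eq 0 ]
}
```

Run it as `check_dns management.azure.com login.microsoftonline.com`. Any `FAIL` line points at the name to chase through corporate DNS or the proxy.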
Reference content and what it actually means
The Microsoft Learn page for Action failed - no such host is canonical. It is also a feature list, not a deployment plan. Let me reframe it for engineers shipping Arc to a real customer this sprint.
Three variables drive Arc's behaviour in practice: the network path from the agent to Azure (proxy, firewall, DNS), the Arc service type (servers, Kubernetes, VMware, SCVMM, Azure Local), and the resource bridge if your service needs one. Get those three right and most Arc problems disappear.
Agent versions and pinning
The Connected Machine agent auto-updates by default. For a small fleet, fine. For a regulated customer, no. Pin the version at install, disable automatic updates, and roll the upgrade through your change management process.
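```shell
# Pin the agent version on Linux during install
wget https://aka.ms/azcmagent-linux -O install_linux_azcmagent.sh
sudo bash install_linux_azcmagent.sh --version 1.42.06318.1697
# Hold the package so routine OS patching does not upgrade it (Debian/Ubuntu)
sudo apt-mark hold azcmagent
# Optional lockdown: block incoming connections and stop the extension service
# (skip the extd line if you deploy extensions through Arc)
sudo azcmagent config set incomingconnections.enabled false
sudo systemctl disable --now extd
```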
On Windows, the equivalent is to install a specific version from its MSI, then edit the registry to disable the auto-update task. Microsoft documents the exact registry path.
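Pinning only helps if you can detect drift afterwards. Here is a minimal sketch of a drift check, assuming the output of `azcmagent version` contains the version as a dotted number somewhere in the text (I have not relied on any exact output layout); `extract_version` and `version_is_pinned` are hypothetical helper names.

```shell
# Version used in the install example; adjust to whatever you pinned.
PINNED_VERSION="1.42.06318.1697"

# Extract the first dotted version number (3 or 4 parts) from text on stdin.
extract_version() {
  grep -Eo '[0-9]+(\.[0-9]+){2,3}' | head -n 1
}

# version_is_pinned TEXT
# Succeeds only if the version found in TEXT equals PINNED_VERSION.
version_is_pinned() {
  [ "$(printf '%s\n' "$1" | extract_version)" = "$PINNED_VERSION" ]
}
```

On a managed machine this could be called as `version_is_pinned "$(azcmagent version)" || echo "agent drifted from pin"` and wired into whatever compliance check the customer already runs.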
Authentication that survives a fleet of 10,000 machines
Three auth options. Onboarding script with a service principal embedded (acceptable for a pilot, sloppy for prod). Device login (interactive, works for one box at a time). At-scale onboarding via a managed identity on a build agent or Azure Automation account (the only option that scales).
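```shell
# Onboard with a short-lived token minted by the managed identity
# (assumes `az login --identity` has already run in this session)
azcmagent connect --resource-group rg-arc \
  --tenant-id "$(az account show --query tenantId -o tsv)" \
  --subscription-id "$(az account show --query id -o tsv)" \
  --location centralindia \
  --cloud AzureCloud \
  --access-token "$(az account get-access-token --query accessToken -o tsv)"
```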
For a 500-machine onboarding, I generate one onboarding script signed with the customer's CA, ship it via their existing config management (Ansible, Puppet, SCCM), and watch the resources land in Azure over the next 48 hours.
Regions and data residency
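Arc respects your region choice. The metadata about your machines lands in the region you pick. For Indian customers under MeitY guidance, Central India and South India both keep this metadata in-country. The agent's heartbeat and metadata traffic goes to the regional Arc data endpoint, but authentication and resource management calls still go to the global login.microsoftonline.com and management.azure.com, so both sets of names must resolve.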
How to apply this in practice
Here's the deploy pattern I use for a new Arc customer.
- Create the resource group in the region matching your data-residency rule: `az group create --name rg-arc-prod --location centralindia`.
- Decide on the Arc service type. Servers for plain VMs. Kubernetes for K8s clusters. VMware/SCVMM/Azure Local for hypervisor-managed environments.
- Run a connectivity validation from one representative machine: `azcmagent check --location centralindia`. This pings every Arc endpoint and reports which ones fail. Fix DNS, firewall, or proxy until it returns clean.
- Build the onboarding artefact: script for servers, Helm chart for K8s, appliance VM for the hypervisor services.
- Roll out to a 5-machine canary. Watch the resources land in Azure. Wait 24 hours. Confirm no drift.
- Roll out to 10% of the fleet. Pause 48 hours. Then 50%. Then the rest.
I've watched teams skip the canary. Every time, something in the customer's environment surprises them: a proxy intercepting TLS, a firewall rule that's drift-broken, a DNS server returning stale records. Catch it on five machines, not five hundred.
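The canary-then-waves rollout can be planned up front from a plain host list. The sketch below is one possible reading of the pattern, assuming the 10% and 50% figures are cumulative targets (so the 10% wave tops the fleet up to a tenth, including the canary); `assign_waves` and the wave labels are illustrative names.

```shell
# assign_waves [CANARY_SIZE]
# Reads one hostname per line on stdin, prints "<wave> <host>" per line.
assign_waves() {
  awk -v canary="${1:-5}" '
    { hosts[NR] = $0 }
    END {
      canary += 0                       # force numeric comparison
      n = NR
      t10 = int(n / 10); if (t10 < canary) t10 = canary
      t50 = int(n / 2);  if (t50 < t10)    t50 = t10
      for (i = 1; i <= n; i++) {
        if (i <= canary)   w = "canary"
        else if (i <= t10) w = "wave-10"
        else if (i <= t50) w = "wave-50"
        else               w = "wave-rest"
        print w, hosts[i]
      }
    }'
}
```

Something like `assign_waves < hosts.txt | grep '^canary '` then gives the five machines to onboard first. On fleets smaller than 50 machines the 10% wave comes out empty, which is the intended behaviour of this sketch.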
Caveats and what to double-check
- The agent auto-updates by default. Disable it for regulated customers and roll updates manually through change management.
- The resource bridge appliance is single-tenant per cluster. Two Arc-enabled vSphere instances need two bridges. Plan IPs accordingly.
- Extensions installed via Arc are billed separately. Azure Monitor agent on an Arc machine bills the Log Analytics workspace. Microsoft Defender for Servers bills per-server per-month.
- The connected machine agent uses ~150 MB RAM and ~3% CPU at idle. On a constrained edge device with 1 GB RAM, that matters. Plan for it.
- Some Arc features are GA only on specific OS versions. Windows Server 2012 R2 is past its end of support (October 2023) and covered only by Extended Security Updates; some Arc extensions don't ship there. Check per-extension compatibility.
Related work in your environment
- Document every Arc-enabled resource in your team's CMDB or wiki. The Azure portal view is good, but a stale cache outside Azure is what you'll reach for during an outage.
- Build an Azure Policy that audits non-Arc machines in subscriptions where everything should be onboarded. Surfaces drift before it becomes a problem.
- Run a quarterly review of your Arc fleet. `az connectedmachine list -o table` takes seconds and shows every onboarded machine, its status, and last reported time.
- Mirror the Arc onboarding flow in Bicep or Terraform if your hypervisor is supported. The IaC pattern beats clicking through the portal for a 200-VM customer.
- For India customers, factor in MeitY data localisation when picking the metadata region. Central India and South India both keep Arc metadata in-country.
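For the quarterly review, it helps to filter the fleet listing down to machines that need attention. This is a rough sketch that assumes tab-separated rows of name, status, and last status change; the field names in the usage line (`name`, `status`, `lastStatusChange`) are my assumption about the CLI's JSON output and should be checked against your own `az connectedmachine list -o json`. `not_connected` is a hypothetical helper.

```shell
# not_connected
# Reads "name<TAB>status<TAB>lastStatusChange" rows on stdin and prints
# only the rows whose status is something other than "Connected".
not_connected() {
  awk -F '\t' '$2 != "Connected" { print $1 "\t" $2 "\t" $3 }'
}
```

A possible invocation is `az connectedmachine list --query "[].[name, status, lastStatusChange]" -o tsv | not_connected`, with the output pasted straight into the review notes.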
Troubleshooting the failures I keep seeing
Four failure modes account for 90% of Arc support tickets I've worked.
Agent status "Disconnected" right after a clean install
Run azcmagent check on the affected machine. It will tell you exactly which endpoint is unreachable. Usually DNS or the proxy. The Connected Machine service writes its log to /var/opt/azcmagent/log/azcmagent.log on Linux and %ProgramData%\AzureConnectedMachineAgent\Log\ on Windows.
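When the log is long, a summary of which names failed to resolve saves time. The sketch below assumes the log lines carry Go's standard resolver error text, along the lines of `lookup <host> on <server>: no such host` or `lookup <host>: no such host`; the exact surrounding fields in the agent log are not something I have verified. `failed_hosts` is an illustrative name.

```shell
# failed_hosts LOGFILE
# Prints "<count> <hostname>" for every name that hit "no such host",
# most frequent first.
failed_hosts() {
  grep 'no such host' "$1" \
    | sed -n 's/.*lookup \([^ :]*\).*no such host.*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

On Linux that would be `failed_hosts /var/opt/azcmagent/log/azcmagent.log`. If every failing name is a Microsoft endpoint, look at the forwarder or the proxy; if internal names fail too, the local resolver configuration is the likelier culprit.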
Resource bridge keeps becoming unreachable
Almost always the appliance VM has lost its DHCP lease or vMotion moved it to a host without the right network. Pin static IPs from day one. Snapshot the VM after a clean deploy so you have a known-good restore point.
Extensions show "Failed" state
Check the extension log in the agent log directory. The most common cause is the extension calling out to a Microsoft endpoint that the proxy is blocking. NO_PROXY exclusions matter for extensions too.
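To reason about which endpoints an exclusion list actually covers, a small matcher is handy. This is a sketch under a stated assumption: an entry matches the host itself or any subdomain, with or without a leading dot. Real tools differ on wildcard and CIDR handling, so treat it as a sanity check rather than a model of any specific client. `in_no_proxy` is a hypothetical helper.

```shell
# in_no_proxy HOST LIST
# Returns 0 if HOST is covered by the comma-separated exclusion LIST.
in_no_proxy() {
  np_host="$1"
  np_list="$2"
  np_old_ifs="$IFS"; IFS=','
  for np_entry in $np_list; do
    IFS="$np_old_ifs"
    np_entry="${np_entry# }"   # trim one leading space after a comma
    np_entry="${np_entry#.}"   # treat ".example.com" like "example.com"
    [ -z "$np_entry" ] && continue
    case "$np_host" in
      "$np_entry" | *".$np_entry") return 0 ;;
    esac
  done
  IFS="$np_old_ifs"
  return 1
}
```

For example, `in_no_proxy management.azure.com "$NO_PROXY" && echo "bypasses proxy"` shows whether a given endpoint would skip the proxy under this matching rule.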
Onboarding script fails with "RPC unavailable"
The himds service (Hybrid Instance Metadata Service) hasn't started. Run `sudo systemctl restart himdsd` on Linux (the unit name carries a trailing d there) or `Restart-Service himds` on Windows. If it won't start, check `systemctl status himdsd` and `journalctl -u himdsd` for a permissions issue on the data directory.
Cost notes for Azure Arc
The Arc control plane is free. The agent is free. Onboarding 1,000 machines costs nothing. What you pay for are the value-add services on top: Azure Update Manager, Microsoft Defender for Servers Plan 2, Azure Monitor agent telemetry, Microsoft Sentinel.
Microsoft Defender for Servers Plan 2 on Arc machines costs roughly ₹1,250 ($15) per server per month. Azure Update Manager runs about ₹420 ($5) per server per month. Azure Monitor with the Log Analytics workspace varies wildly by log volume; budget ₹85-170 ($1-2) per GB ingested.
For a 500-server fleet with Defender Plan 2 enabled, expect a ₹6.25 lakh ($7,500) monthly invoice. Worth it if the customer is in a regulated industry. Often skipped if the customer is doing inventory only.
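The arithmetic behind that invoice is simple enough to script for a quote. A minimal sketch, assuming whole-dollar per-server rates and ignoring per-GB log ingestion, discounts, and currency conversion; `monthly_cost_usd` is an illustrative name.

```shell
# monthly_cost_usd SERVERS RATE [RATE...]
# Sums per-server monthly rates (whole USD) across the fleet.
monthly_cost_usd() {
  mc_servers="$1"; shift
  mc_total=0
  for mc_rate in "$@"; do
    mc_total=$((mc_total + mc_servers * mc_rate))
  done
  echo "$mc_total"
}
```

`monthly_cost_usd 500 15` reproduces the Defender-only figure of 7500, and `monthly_cost_usd 500 15 5` adds Azure Update Manager for a total of 10000.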
Rollback plan if Arc onboarding causes issues
If something goes sideways, you have three rollback levels.
- Disconnect the agent without deleting the Azure resource: `azcmagent disconnect --force-local-only`. Resets the local agent and leaves the resource in Azure, where it shows as "Disconnected" once heartbeats stop. Reconnect later by re-running the onboarding. Note that a plain `azcmagent disconnect` also deletes the Azure resource.
- Uninstall the agent. `apt remove azcmagent` on Debian/Ubuntu, `yum remove azcmagent` on RHEL, MSI uninstall on Windows. Removes the local component. Resource stays in Azure as orphaned.
- Delete the Azure resource. `az connectedmachine delete --name <name> --resource-group <rg> --yes`. Removes the cloud-side object entirely.
For a fleet-wide rollback, I script the disconnect across the customer's config management tool. Run it in batches of 50 to avoid hammering Azure. The whole fleet rolls back in 2-3 hours.
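The batching itself does not need anything clever. Here is a sketch of the batches-of-50 idea, assuming hostnames without spaces or quotes, one per line on stdin; `rollback_in_batches` and the `BATCH_PAUSE` variable are hypothetical names, and the command you pass in is whatever wrapper your config management tool provides.

```shell
# rollback_in_batches SIZE COMMAND [ARGS...]
# Reads hostnames on stdin, calls COMMAND with up to SIZE hosts appended,
# then sleeps BATCH_PAUSE seconds (default 60) before the next batch.
rollback_in_batches() {
  rb_size="$1"; shift
  xargs -n "$rb_size" | while IFS= read -r rb_batch; do
    # Unquoted on purpose: the batch is a space-separated host list.
    # stdin is redirected so the command cannot swallow remaining hosts.
    "$@" $rb_batch </dev/null
    sleep "${BATCH_PAUSE:-60}"
  done
}
```

Usage would look like `rollback_in_batches 50 disconnect_hosts < hosts.txt`, where `disconnect_hosts` is a hypothetical wrapper that runs the disconnect on each host it is given. The pause between batches is what keeps the load on Azure flat.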
References
- Microsoft Learn, official documentation for Azure
- Microsoft tech community forums and Q&A
- Azure Service Health and Microsoft 365 service health dashboards
- Azure pricing calculator (azure.microsoft.com/pricing/calculator)