How to Troubleshoot Azure DNS: Fix Resolution Failures
Why Azure DNS Troubleshooting Is So Hard to Get Right
I've seen this exact situation on dozens of enterprise Azure deployments: a team spins up a new VM or App Service, connects it to a virtual network, and then… nothing resolves. The app can't reach its backend database by hostname. A microservice can't find the private endpoint it was configured to talk to. Or worse, everything worked fine in staging and broke silently in production. Azure DNS resolution failure doesn't always throw a screaming error. Sometimes it just silently returns NXDOMAIN and your logs fill up with timeouts that look like a network issue, not a DNS issue at all.
The root of the confusion is that Azure DNS is actually two very different systems sharing one brand name. There's Azure Public DNS, a globally distributed, authoritative DNS hosting service where you delegate your public domain zones. And there's Azure Private DNS, a name resolution service tied to virtual networks for internal resource discovery. Mix up which one applies to your scenario and you'll be chasing ghosts for hours.
Here's why Microsoft's own error messages don't help: Azure DNS doesn't surface failures at the portal level. You won't get a big red banner that says "DNS resolution is broken." Instead, you'll see connection timeouts in your application, failed health probes in your load balancer, or cryptic "Name or service not known" messages deep inside container logs. The DNS failure is invisible until you know exactly where to look.
Who sees Azure DNS problems most often? In my experience, it's three groups. First: teams that just configured a private DNS zone virtual network link but forgot to enable auto-registration or linked the wrong VNet. Second: organizations migrating on-premises workloads to Azure and setting a custom DNS server in their VNet configuration, which then breaks Azure-native resolution for services like Azure Storage, Azure SQL, and Key Vault private endpoints. Third: developers hitting Azure DNS propagation delay after updating an A record or CNAME and wondering why their changes aren't visible yet.
The good news is that every one of these scenarios has a clear diagnostic path. Let's walk through it systematically.
The Quick Fix: Try This First
Before diving into anything fancy, run a basic DNS resolution test from inside the affected VM or container. This takes 60 seconds and tells you immediately whether the problem is DNS configuration or something else entirely, like a firewall rule or a missing route.
Open a terminal on the affected Azure VM (or use the Azure portal's **Run Command** feature under the VM blade) and run:
nslookup <hostname-you-expect-to-resolve> 168.63.129.16
That IP, 168.63.129.16, is Azure's recursive resolver. It's a virtual IP present in every Azure virtual network and it's the default DNS server your VMs use unless you've overridden it. Querying it directly bypasses any custom DNS forwarder you may have configured, so it isolates whether Azure's own resolver can see the record.
If nslookup against 168.63.129.16 succeeds but your app still can't resolve the name, your custom DNS server configuration is the culprit: your VNet is pointed at a custom DNS server that isn't correctly forwarding queries back to 168.63.129.16 for Azure-internal names.
If nslookup against 168.63.129.16 fails with a "server failed" (SERVFAIL) or NXDOMAIN response, the problem is in your DNS zone configuration itself: the record is missing, the private zone isn't linked to this VNet, or the zone name doesn't match what your app is querying.
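The two outcomes above can be written down as a small triage helper. This is only a sketch: the function names are hypothetical, and the failure markers are assumptions about how common nslookup builds word their errors, so adjust them to what your platform actually prints.

```python
from enum import Enum

# Assumed substrings that nslookup prints on failure (wording differs by OS).
FAILURE_MARKERS = (
    "nxdomain",
    "non-existent domain",
    "servfail",
    "server failed",
    "timed out",
)


class Verdict(Enum):
    ZONE = "record missing, zone not linked, or zone name mismatch"
    FORWARDER = "custom DNS server is not forwarding to 168.63.129.16"
    NOT_DNS = "DNS looks fine; check firewall rules, routes, or app-level caching"


def azure_resolver_answered(nslookup_output: str) -> bool:
    """Return True when the captured nslookup output shows no known failure marker."""
    text = nslookup_output.lower()
    return not any(marker in text for marker in FAILURE_MARKERS)


def triage(azure_resolver_ok: bool, app_resolves: bool) -> Verdict:
    """Map the two quick-fix observations to the next place to look."""
    if not azure_resolver_ok:
        return Verdict.ZONE
    if not app_resolves:
        return Verdict.FORWARDER
    return Verdict.NOT_DNS
```

Feed `azure_resolver_answered` the text captured from the nslookup run against 168.63.129.16, then pass the result to `triage` along with whether the application itself can resolve the name.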
For public zones, also run this from your local machine to confirm global propagation:
nslookup -type=SOA yourdomain.com 8.8.8.8
If the SOA record shows Azure's nameservers (ns1-xx.azure-dns.com, etc.) and the TTL is sane, your zone delegation is intact. If you see your registrar's default nameservers instead, your NS records haven't been updated at the registrar. Making that change is a five-minute fix in your domain control panel, although the new delegation can take longer to become visible everywhere.
A missing or broken virtual network link is the single most common cause of Azure DNS not resolving for private endpoints and internal resources. A private DNS zone is useless if it isn't linked to the virtual network where your resources live.
In the Azure portal, navigate to Private DNS zones and select your zone (e.g., privatelink.blob.core.windows.net or your custom zone like internal.mycompany.com). In the left menu, click Virtual network links.
You'll see a list of VNets linked to this zone. Check two things for each link:
- The Link Status column should read Completed. If it says "Provisioning" or "Failed," the link itself is broken.
- The Auto-registration toggle: if you want VMs in this VNet to automatically register their private IPs as A records in this zone, this must be Enabled. If it's disabled, you need to manually create A records for each VM.
If your VNet isn't in the list at all, click + Add and link it. You'll need the subscription and VNet name. Give the link a descriptive name, something like link-prod-eastus-vnet01, so you can identify it later.
Using the CLI instead? Run:
az network private-dns link vnet list \
--resource-group <rg-name> \
--zone-name <zone-name> \
--output table
When it's working correctly, querying the zone from a VM in the linked VNet should return a valid A record. Run nslookup <record.yourzone.com> 168.63.129.16 again. You should now see a non-empty answer section with the IP address you expect.
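If you have many zones to check, the same two checks can be automated over the JSON form of the CLI output. The sketch below assumes you ran the list command with --output json and that each link exposes name, registrationEnabled, virtualNetworkLinkState, and virtualNetwork.id; confirm those field names against your own output before relying on it. The function name is hypothetical.

```python
import json


def find_link_problems(links_json: str, expected_vnet_id: str,
                       want_auto_registration: bool = False) -> list[str]:
    """Return human-readable problems for the links that target expected_vnet_id."""
    links = json.loads(links_json)
    # Resource IDs are compared case-insensitively.
    matching = [
        link for link in links
        if (link.get("virtualNetwork") or {}).get("id", "").lower()
        == expected_vnet_id.lower()
    ]
    if not matching:
        return [f"no link found for {expected_vnet_id}"]

    problems = []
    for link in matching:
        state = link.get("virtualNetworkLinkState")
        if state != "Completed":
            problems.append(f"{link.get('name')}: link state is {state}")
        if want_auto_registration and not link.get("registrationEnabled"):
            problems.append(f"{link.get('name')}: auto-registration is disabled")
    return problems
```

An empty list means the VNet is linked, the link is Completed, and (if requested) auto-registration is on.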
Custom DNS server settings trip up enterprises migrating from on-premises AD environments. When you set a custom DNS server on your Azure virtual network (typically your domain controller's IP), that custom server becomes the first-hop resolver for every VM in that VNet. If that server doesn't know how to forward queries for Azure-internal names back to 168.63.129.16, your private DNS zones stop working entirely.
Check your VNet DNS settings: go to Virtual networks, select your VNet, then click DNS servers in the left menu. If it says Custom and lists one or more IP addresses, those servers must be configured to conditionally forward all *.privatelink.* queries and any of your Azure private zone names to 168.63.129.16.
On a Windows DNS Server (typically your domain controller), add a conditional forwarder:
# Run on the DNS server itself (PowerShell, elevated)
Add-DnsServerConditionalForwarderZone `
-Name "privatelink.blob.core.windows.net" `
-MasterServers 168.63.129.16 `
-PassThru
# Repeat for each private DNS zone you use:
Add-DnsServerConditionalForwarderZone `
-Name "privatelink.database.windows.net" `
-MasterServers 168.63.129.16 `
-PassThru
If you're running BIND on a Linux DNS forwarder, add the equivalent zone blocks with forwarders { 168.63.129.16; };. After updating, restart the DNS service and test again from an affected VM. The resolution should succeed within seconds. There's no propagation delay for this type of change, since you're modifying the forwarder, not a public zone.
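For the BIND case, the zone blocks are repetitive enough that generating them is less error-prone than typing them. Here is a minimal sketch that renders a standard "type forward" zone stanza per private zone. The helper name is hypothetical, and where the output goes (for example an include file referenced from named.conf) depends on your distribution.

```python
AZURE_RESOLVER = "168.63.129.16"


def bind_forward_zone(zone_name: str, forwarder: str = AZURE_RESOLVER) -> str:
    """Render one BIND forward-only zone stanza for a private DNS zone."""
    return (
        f'zone "{zone_name}" {{\n'
        "    type forward;\n"
        "    forward only;\n"
        f"    forwarders {{ {forwarder}; }};\n"
        "};\n"
    )


if __name__ == "__main__":
    # Zone names taken from the PowerShell example above.
    for zone in ("privatelink.blob.core.windows.net",
                 "privatelink.database.windows.net"):
        print(bind_forward_zone(zone))
```

I'd run named-checkconf on the resulting file before restarting the service, so a typo doesn't take the forwarder down.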
One thing I want to call out explicitly: 168.63.129.16 is only reachable from within the Azure virtual network fabric. Your on-premises DNS servers cannot forward to it across ExpressRoute or VPN; that's a hard platform limitation. For hybrid scenarios you need an Azure DNS Private Resolver or a DNS forwarder VM sitting inside the Azure VNet.
Whether you're working with a public zone or a private zone, bad record data causes hard-to-diagnose Azure DNS resolution failures. The most common mistakes I see: CNAME records pointing to a hostname that no longer exists, A records still holding the old IP after a resource was redeployed, and TTLs set so high that stale data lingers for hours after a legitimate update.
For a public zone, open DNS zones in the portal, select your zone, and review the record sets. Every A record should have an IP you recognize. Every CNAME target should resolve to something real. Check the TTL column too: a TTL of 3600 (one hour) is fine for stable records, but if you're in the middle of a migration, temporarily drop critical records to 300 seconds so changes propagate in five minutes instead of an hour.
For a private zone, navigate to Private DNS zones and open the zone. Look at the record sets. If auto-registration is on, your VMs should appear automatically. If you're managing records manually, confirm the IP in each A record matches the current private IP of the resource, which can change if a VM is stopped and restarted and you didn't assign a static private IP.
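The manual comparison described above (record IP versus the resource's current private IP) is easy to script once you have both sides as plain mappings. A minimal sketch, assuming you have already exported the zone's A records and the current NIC addresses into dictionaries keyed by record name; the function name and the sample data are hypothetical.

```python
from typing import Dict, Optional, Tuple


def find_stale_a_records(
    zone_records: Dict[str, str],
    current_ips: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (recorded_ip, actual_ip)} for every record that no longer matches.

    actual_ip is None when no live resource with that name was found,
    which usually means the record is left over from a deleted resource.
    """
    stale = {}
    for name, recorded_ip in zone_records.items():
        actual_ip = current_ips.get(name)
        if actual_ip != recorded_ip:
            stale[name] = (recorded_ip, actual_ip)
    return stale


if __name__ == "__main__":
    zone = {"web01": "10.0.1.4", "db01": "10.0.1.5", "old01": "10.0.1.9"}
    live = {"web01": "10.0.1.4", "db01": "10.0.1.7"}
    print(find_stale_a_records(zone, live))
```

Anything the function returns is a candidate for the record update that follows.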
Fix a bad A record via CLI:
# For a public zone
# (quote the --set path so the shell doesn't expand the brackets; property casing
#  can differ between CLI versions, so check "record-set a show" output first)
az network dns record-set a update \
  --resource-group <rg> \
  --zone-name <zone> \
  --name <record-name> \
  --set "aRecords[0].ipv4Address=<correct-ip>"
# For a private zone
az network private-dns record-set a update \
  --resource-group <rg> \
  --zone-name <zone> \
  --name <record-name> \
  --set "aRecords[0].ipv4Address=<correct-ip>"
After updating, flush the DNS cache on the affected machine (ipconfig /flushdns on Windows; sudo resolvectl flush-caches on current Ubuntu releases, or sudo systemd-resolve --flush-caches on older ones) and test again. If the correct record appears in nslookup but your app still fails, the app itself may be caching DNS responses at the application layer, which is a separate problem entirely.
Private endpoints are where Azure DNS troubleshooting gets genuinely complicated. When you create a private endpoint for an Azure service, say, Azure Storage or Azure SQL, Azure automatically creates a DNS record that points the service's public FQDN to the private endpoint's private IP. But only if everything is wired up correctly. I've seen this break in three distinct ways.
Problem 1: The private DNS zone was not created or not linked. When you create a private endpoint in the portal, there's a "DNS" tab that asks whether you want to integrate with a private DNS zone. If someone clicked through that without enabling it, no zone was created and your service FQDN still resolves to its public IP, meaning traffic bypasses the private endpoint entirely, which also breaks things if you've enabled network access restrictions.
Check whether the private endpoint has a DNS configuration associated with it:
az network private-endpoint show \
--name <endpoint-name> \
--resource-group <rg> \
--query "customDnsConfigs" \
--output table
Problem 2: Multiple private endpoints for the same service type, linked to different zones. This happens in large organizations where different teams created private endpoints independently. You might have two zones named privatelink.blob.core.windows.net in different resource groups, each linked to different VNets, and the wrong one wins for a given query.
Problem 3: The FQDN mismatch. Your app might be querying mystorageaccount.blob.core.windows.net but the private DNS record is registered under the zone privatelink.blob.core.windows.net as mystorageaccount. The resolution chain has to correctly alias the public name through the CNAME to the private zone. If anything in that chain is broken, you get NXDOMAIN. Verify the CNAME chain with:
nslookup mystorageaccount.blob.core.windows.net 168.63.129.16
A healthy response shows a CNAME pointing to mystorageaccount.privatelink.blob.core.windows.net, which then resolves to the private endpoint's IP (an RFC 1918 address from your VNet's range, such as 10.x.x.x). If you see a public IP instead, the private DNS integration isn't working.
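The healthy-versus-broken distinction above comes down to two checks: is there a privatelink alias in the chain, and is the final address private? A small sketch of that classification follows. The function names and return strings are hypothetical, and it assumes your VNet uses RFC 1918 address space.

```python
import ipaddress

# RFC 1918 private ranges; adjust if your VNet uses other address space.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    """Return True when ip falls inside one of the RFC 1918 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def classify_private_endpoint_answer(cname_chain: list[str], ip: str) -> str:
    """Classify a lookup result from its CNAME chain and final address."""
    via_privatelink = any(".privatelink." in name.lower() for name in cname_chain)
    if via_privatelink and is_rfc1918(ip):
        return "private endpoint"
    if via_privatelink:
        return "privatelink alias present, but resolved to a public IP"
    return "no privatelink alias in the chain"
```

The middle result is the one worth alerting on: the alias exists, so a private endpoint was created, but the resolver you asked never consulted the private zone.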
Public Azure DNS zones are authoritative: Azure runs the nameservers, and when you make a change in the portal or via API, that change propagates to Azure's global DNS infrastructure almost immediately (typically within seconds to a few minutes). But "almost immediately" isn't "instantly," and TTLs on existing cached responses can make it feel like propagation is broken when it isn't.
Here's the reality of Azure DNS propagation delay: if your old A record had a TTL of 3600 and you updated the IP, any resolver that cached the old record won't check again for up to an hour. Azure's nameservers will return the new IP immediately, but resolvers that have the old data cached will keep using it until the TTL expires. This is expected DNS behavior, not a bug.
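The arithmetic is simple but worth writing down, because people tend to plug in the new record's TTL instead of the old one. In the worst case a resolver cached the old record an instant before your change, so it keeps serving stale data until the change time plus the old TTL. A minimal sketch (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone


def latest_cache_expiry(changed_at: datetime, old_ttl_seconds: int) -> datetime:
    """Worst-case moment at which every cache has dropped the old record."""
    return changed_at + timedelta(seconds=old_ttl_seconds)


if __name__ == "__main__":
    changed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Old TTL of one hour: stale answers are possible until 13:00 UTC.
    print(latest_cache_expiry(changed, 3600))
    # Old TTL of five minutes: stale answers are possible until 12:05 UTC.
    print(latest_cache_expiry(changed, 300))
```

Only after that moment does a wrong answer from a public resolver point to a real problem rather than an unexpired cache.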
What you actually need to check is whether the NS delegation at your registrar is correct. Many people forget this step after creating an Azure public DNS zone. Go to DNS zones in the portal, open your zone, and look at the NS record set at the apex. You'll see four nameservers like:
ns1-04.azure-dns.com.
ns2-04.azure-dns.net.
ns3-04.azure-dns.org.
ns4-04.azure-dns.info.
Log into your domain registrar (GoDaddy, Namecheap, Google Domains, etc.) and verify your domain's NS records match exactly these four values. If they don't, changes you make in Azure's DNS zone will never be visible to the public internet, because your domain is still delegating to your registrar's default nameservers. Update the NS records at the registrar and allow up to 48 hours for the registrar-level delegation change to propagate (this is the one part of DNS you genuinely can't speed up).
To verify from the command line:
# Check which nameservers the world sees for your domain
nslookup -type=NS yourdomain.com 8.8.8.8
# Confirm a specific record on Azure's nameservers directly
nslookup yourrecord.yourdomain.com ns1-04.azure-dns.com
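If the first command lists nameservers other than the four Azure assigned to your zone, delegation is the problem; fix it at the registrar. If the first command lists Azure's nameservers and the second returns the correct IP, but public resolvers still hand out the old value, you're in the TTL waiting game. If the second command fails, the record doesn't exist in the zone; go back and create it.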
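Comparing the two nameserver lists by eye is where mistakes creep in (a trailing dot, different casing, one entry missing). A small sketch that normalizes both lists and reports the difference; the function names are hypothetical and the inputs are whatever you copied from the portal and from the NS lookup.

```python
def normalize_ns(name: str) -> str:
    """Lowercase a nameserver name and drop surrounding whitespace and the trailing dot."""
    return name.strip().rstrip(".").lower()


def delegation_diff(azure_ns: list[str], public_ns: list[str]) -> dict[str, list[str]]:
    """Compare the zone's assigned nameservers with what public DNS reports."""
    azure = {normalize_ns(n) for n in azure_ns}
    public = {normalize_ns(n) for n in public_ns}
    return {
        "missing_at_registrar": sorted(azure - public),
        "unexpected_at_registrar": sorted(public - azure),
    }


if __name__ == "__main__":
    assigned = ["ns1-04.azure-dns.com.", "ns2-04.azure-dns.net.",
                "ns3-04.azure-dns.org.", "ns4-04.azure-dns.info."]
    seen = ["NS1-04.azure-dns.com", "ns2-04.azure-dns.net"]
    print(delegation_diff(assigned, seen))
```

Both lists empty means delegation matches; anything under "unexpected_at_registrar" is usually a leftover registrar default.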
Advanced Azure DNS Troubleshooting
Using Azure DNS Private Resolver for Hybrid Scenarios
If your organization runs workloads both on-premises and in Azure, connected via ExpressRoute or site-to-site VPN, you need a proper hybrid DNS architecture. The old approach of running a DNS forwarder VM in Azure works but requires you to manage that VM's availability, patching, and scaling. Azure DNS Private Resolver is the platform-native alternative and the one I recommend for new deployments.
The resolver sits inside your VNet and provides two endpoints: an inbound endpoint (lets on-premises DNS forward queries into Azure and resolve private DNS zones) and an outbound endpoint (lets Azure VMs forward queries for on-premises domains back to your on-prem DNS servers via forwarding rulesets). If your hybrid DNS is broken, first confirm that the resolver itself is in a Succeeded state, then check that each endpoint is provisioned and Succeeded as well:
az dns-resolver show \
--name <resolver-name> \
--resource-group <rg> \
--query "provisioningState"
Event Viewer and Network Watcher for DNS Failures
On Windows VMs, DNS client events land in the Event Viewer under Applications and Services Logs → Microsoft → Windows → DNS Client Events → Operational. Event ID 1014 means a name resolution timed out. Event ID 1016 means the resolution failed entirely. These events include the queried name and the DNS server that was asked, extremely useful for confirming which DNS server your VM is actually using versus what you think it's using.
Azure Network Watcher's IP flow verify tool can confirm whether port 53 traffic (UDP, plus TCP for large responses) is allowed between your VM and its configured DNS server. Navigate to Network Watcher → IP flow verify, select the VM, and specify port 53 with the DNS server IP as the destination. If it returns "Access denied," a Network Security Group rule is blocking DNS traffic. This is a surprisingly common misconfiguration when NSGs are applied at the subnet level.
DNS Resolution in AKS and Container Environments
Azure Kubernetes Service uses CoreDNS as its internal cluster DNS. When pods can't resolve Azure private DNS zone records, the fix is almost always a CoreDNS ConfigMap update to forward Azure-internal queries correctly. Check the current CoreDNS config:
kubectl get configmap coredns -n kube-system -o yaml
Look for a custom forward block. If you're using private DNS zones, you typically need to forward those specific zone names to 168.63.129.16 through the coredns-custom ConfigMap (in AKS, this ConfigMap extends the default configuration rather than replacing it).
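As a rough illustration of what such a coredns-custom ConfigMap can look like, the sketch below builds one as a JSON manifest (kubectl accepts JSON as well as YAML). It assumes the AKS convention that data keys ending in .server are loaded as extra CoreDNS server blocks; the helper name, the key naming, and the cache 30 setting are my own choices, so check the result against the current AKS CoreDNS customization guidance before applying it.

```python
import json


def coredns_custom_manifest(zones: list[str], forwarder: str = "168.63.129.16") -> dict:
    """Build a coredns-custom ConfigMap with one forwarding server block per zone."""
    data = {}
    for zone in zones:
        # Assumed AKS convention: keys ending in ".server" become server blocks.
        key = zone.replace(".", "-") + ".server"
        data[key] = (
            f"{zone}:53 {{\n"
            "    errors\n"
            "    cache 30\n"
            f"    forward . {forwarder}\n"
            "}\n"
        )
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "coredns-custom", "namespace": "kube-system"},
        "data": data,
    }


if __name__ == "__main__":
    manifest = coredns_custom_manifest(["privatelink.blob.core.windows.net"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to kubectl apply -f -. CoreDNS pods generally need a restart to pick up the change.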
Group Policy and DNS Suffix Search Order
In domain-joined Azure VMs managed through Active Directory, Group Policy can override the DNS suffix search list set by DHCP. If a policy pushes a suffix search order that doesn't include your private zone's domain name, short-name resolution breaks. Check this via: ipconfig /all and look at the "DNS Suffix Search List" output. If it's missing expected suffixes, trace the GPO responsible using gpresult /h gpresult.html and review the "Network/DNS Client" section.
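To see why a missing suffix breaks short names, it helps to spell out what the resolver actually tries. The sketch below is a simplification that covers single-label names only and ignores OS-specific details such as suffix devolution. The function names and the corp.contoso.com suffix are hypothetical.

```python
def candidate_fqdns(short_name: str, suffix_search_list: list[str]) -> list[str]:
    """List the FQDNs tried for a single-label name, in search-list order."""
    label = short_name.strip(".")
    return [f"{label}.{suffix.strip('.')}".lower() for suffix in suffix_search_list]


def short_name_can_reach(short_name: str, suffix_search_list: list[str],
                         expected_fqdn: str) -> bool:
    """Return True when the search list can expand short_name to expected_fqdn."""
    return expected_fqdn.strip(".").lower() in candidate_fqdns(short_name, suffix_search_list)


if __name__ == "__main__":
    gpo_suffixes = ["corp.contoso.com"]  # hypothetical list pushed by a GPO
    print(candidate_fqdns("db01", gpo_suffixes))
    # The private zone suffix is missing, so db01 never expands to the zone's FQDN.
    print(short_name_can_reach("db01", gpo_suffixes, "db01.internal.mycompany.com"))
```

If the second call prints False for a name you expect to work, the suffix list from ipconfig /all is the thing to fix, not the zone.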
If you've verified VNet links, checked record data, confirmed NS delegation, audited your custom DNS forwarding configuration, and DNS resolution is still failing, especially if it's intermittent or only affects certain record types, it's time to escalate. Intermittent DNS failures inside Azure often indicate a platform-level issue with Azure's recursive resolver infrastructure, which you genuinely cannot fix yourself. Open a support ticket with severity A or B (depending on production impact) and include the output of nslookup <name> 168.63.129.16, timestamps of failures, the affected VM's resource ID, and the VNet and private DNS zone names. Visit Microsoft Support to open a case directly.
Prevention & Best Practices
Most Azure DNS problems I've seen in production were entirely avoidable. The teams that get DNS right don't do anything heroic; they just make a few architectural decisions upfront that prevent entire categories of problems from ever appearing.
Deploy Azure DNS Private Resolver from day one in hybrid environments. If you know you're going to have on-premises-to-Azure connectivity, set up the resolver before you start deploying workloads. Retrofitting DNS architecture after 50 services are already running is painful. The resolver provides a clean, managed, highly available path for hybrid name resolution without any VMs to babysit.
Assign static private IPs to VMs that need stable DNS records. Auto-registration in private DNS zones records the private IP at the time of registration, but if you stop and deallocate a VM, it can come back with a different dynamic private IP, and any record that isn't updated goes stale. Either assign a static private IP in the VM's NIC settings or manage DNS records explicitly outside of auto-registration for anything production-critical.
Keep TTLs short during any migration or redeployment period. Drop your public zone TTLs to 300 seconds a few days before a planned change. After the change is stable and you're confident you won't be rolling back, raise them back to 3600. This discipline alone prevents hours of troubleshooting during cutover windows.
Use Azure Policy to enforce private DNS zone integration for private endpoints. There are built-in Azure Policy definitions that audit or deny the creation of private endpoints without corresponding private DNS zone integration. Enabling these in "Audit" mode first shows you existing gaps; switching to "Deny" prevents new problems from being introduced by other teams.
Document your DNS architecture explicitly. I've walked into so many environments where nobody knows why the custom DNS server is configured, what zones it's supposed to forward, or what happens if that VM goes down. A one-page diagram showing VNets, DNS servers, forwarding rules, and private zone links saves hours of forensic investigation when something breaks at 2 AM.
- Set Azure Monitor alerts on DNS-related NSG flow denials to catch port 53 blocks before they cause application incidents
- Use Resolve-DnsName (PowerShell) instead of nslookup for richer output including TTL, record type, and server used
- Tag all private DNS zones with the VNets they serve, which makes cross-team auditing much faster
- Run a quarterly DNS audit script that lists all private zones, their VNet links, and record counts, so you catch orphaned zones before they cause confusion
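A starting point for that quarterly audit might look like the sketch below. It assumes you feed it the JSON from az network private-dns zone list and that each zone exposes name, resourceGroup, numberOfVirtualNetworkLinks, and numberOfRecordSets; verify those field names against your CLI version. The function name and the "orphaned means zero links" rule are my own choices.

```python
import json


def audit_private_zones(zones_json: str) -> list[dict]:
    """Summarize each private zone and flag the ones with no VNet links."""
    report = []
    for zone in json.loads(zones_json):
        links = zone.get("numberOfVirtualNetworkLinks") or 0
        report.append({
            "zone": zone.get("name"),
            "resource_group": zone.get("resourceGroup"),
            "links": links,
            "record_sets": zone.get("numberOfRecordSets") or 0,
            # A zone that no VNet is linked to cannot answer anyone's queries.
            "orphaned": links == 0,
        })
    return report


def orphaned_zones(report: list[dict]) -> list[str]:
    """Return 'resource_group/zone' for every orphaned entry in the report."""
    return [f"{row['resource_group']}/{row['zone']}" for row in report if row["orphaned"]]
```

Including the resource group in the output matters because duplicate zone names across resource groups are exactly what this audit is meant to surface.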
Frequently Asked Questions
Why does nslookup work but my application still can't connect by hostname?
This almost always means your application is caching DNS results at the application layer, independent of the OS resolver. Java applications are notorious for this: the JVM caches successful lookups indefinitely when a security manager is installed, and for an implementation-specific period (30 seconds in common JDKs) otherwise, unless you set networkaddress.cache.ttl (for example, networkaddress.cache.ttl=60) in the JVM security properties. Node.js, .NET, and Python have similar behaviors depending on the HTTP client library being used. Try restarting the application process after confirming DNS works at the OS level. If a restart fixes it, implement a TTL-aware DNS caching policy in your application config.
Can I use the same private DNS zone across multiple subscriptions?
Yes, but the zone itself lives in one subscription; you can't replicate it across subscriptions automatically. What you can do is link a private DNS zone in Subscription A to virtual networks in Subscription B by creating a virtual network link that references the cross-subscription VNet. You'll need appropriate RBAC permissions (specifically Network Contributor on the VNet you're linking). Azure DNS Private Resolver with centralized hub-and-spoke DNS architecture is usually a cleaner approach for multi-subscription enterprises, keeping all DNS zones in a dedicated connectivity subscription.
My Azure DNS changes aren't propagating, how long should I actually wait?
Changes to records inside an Azure DNS zone typically reach Azure's authoritative nameservers within 60 seconds, sometimes faster. The wait you're experiencing is almost certainly the TTL of the previously cached record in whatever resolver sits between you and the authoritative nameserver. If the old record had a TTL of 3600, you're waiting up to an hour for that cache to expire. You can verify the current state of the Azure nameservers directly using nslookup yourrecord.yourdomain.com ns1-XX.azure-dns.com. If the correct answer comes back there, Azure's side is done and you're waiting on cache expiry elsewhere.
What's the difference between Azure DNS and Azure DNS Private Resolver?
Azure DNS is the hosting platform for both public and private DNS zones. It stores your records, and for resources inside Azure VNets it answers queries through the platform's built-in resolver at 168.63.129.16. Azure DNS Private Resolver is a separate, managed service that lets you extend that resolution capability bidirectionally across hybrid network boundaries. Think of it this way: Azure DNS is where the zone files live; Azure DNS Private Resolver is the managed forwarding infrastructure that connects those zones to networks that can't natively reach 168.63.129.16, like your on-premises data center.
Why does my private endpoint resolve to a public IP instead of a private IP?
This happens when the private DNS zone integration wasn't configured when the private endpoint was created, or when a custom DNS server in your VNet doesn't forward the relevant privatelink.* zone queries to 168.63.129.16. Azure services like Storage and SQL use a CNAME chain: the public FQDN resolves to a privatelink.* alias, which then resolves to either the public IP or the private endpoint IP depending on what's in that zone. If your DNS server never checks Azure's private zone for that alias, it falls through to the public IP. Fix your custom DNS forwarder or add the missing private DNS zone and link it to your VNet.
Can I use my own nameservers with an Azure public DNS zone?
No. Azure Public DNS is an authoritative hosting service, meaning Azure's nameservers are the authority for your zone. You delegate your domain to Azure's NS records at your registrar; you don't bring your own nameservers. If you need to host DNS on your own infrastructure, you'd run that separately and not use Azure DNS as the authoritative source. That said, you can absolutely host some subdomains in Azure DNS and delegate others elsewhere: use NS record delegation at the subdomain level to split the authority between providers.