How to Troubleshoot Azure Firewall

Microsoft Fix Intermediate 18 min read Official Docs Grounded Updated April 20, 2026

Why This Is Happening

Azure Firewall troubleshooting is one of those problems that can eat an entire afternoon, and I say that from direct experience. You've deployed what looks like a perfectly valid rule, refreshed the portal, waited the mandatory few minutes for propagation, and your traffic is still being dropped. No clear reason. No friendly error message pointing you to the offending rule. Just silent packet death.

I've seen this exact scenario on dozens of enterprise Azure environments, and the root causes almost always fall into one of six buckets.

Rule priority conflicts. Azure Firewall processes rule collections in priority order (valid values run from 100 to 65,000), and the lowest number wins. A network rule collection with priority 200 that allows TCP/443 can be silently overridden by a higher-priority collection at 100 that denies all outbound traffic. The portal doesn't warn you about this. You just see traffic drop.

DNAT rules not mapping correctly. Destination Network Address Translation (DNAT) rules must reference the specific firewall public IP the client connects to, the correct destination port, and a source range that includes the client. Miss any one of these and the rule never matches, so inbound connections fail completely with no useful error on the client side.

SNAT port exhaustion. When your firewall is handling high-volume outbound traffic, it can exhaust its available SNAT ports (2,496 per public IP address per backend instance, and a firewall always runs at least two instances). Symptoms look exactly like random connection failures. This hits hardest in batch processing workloads and large AKS clusters.

Asymmetric routing. If your UDR (User Defined Route) setup isn't pushing all traffic through the firewall symmetrically, meaning outbound goes through the firewall but return traffic takes a different path, the firewall drops the return packets because it has no session state for them. Stateful firewalls hate asymmetric traffic.

DNS resolution failures. FQDN-based rules only work if the firewall resolves names correctly, and FQDNs in network rules additionally require the firewall's DNS proxy feature to be enabled. When DNS proxy or the firewall's DNS servers are misconfigured, those FQDNs silently fail to resolve, causing rules that look correct on paper to block everything in practice. This is especially common after migrations from on-premises environments.

Firewall policy inheritance issues. If you're using Azure Firewall Manager with parent and child policies, rule inheritance can produce unexpected allow/deny combinations that are genuinely difficult to untangle from the portal alone.
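
To see why asymmetric routing is so destructive, here's a toy Python model of stateful inspection. It is not how Azure Firewall is built internally; the class and method names are hypothetical, and it only tracks a simple 4-tuple session table, which is the general idea behind any stateful firewall.

```python
# Toy model of stateful inspection (hypothetical; not Azure Firewall internals).
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


class StatefulFirewall:
    def __init__(self) -> None:
        # Outbound flows this firewall has seen: (src, sport, dst, dport).
        self.sessions: set[tuple[str, int, str, int]] = set()

    def outbound(self, pkt: Packet) -> None:
        # Assume policy allows the request; record the flow so replies can match it.
        self.sessions.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))

    def accepts_reply(self, pkt: Packet) -> bool:
        # A reply is accepted only if it is the reverse of a recorded flow.
        return (pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.sessions


request = Packet("10.2.1.5", 50000, "203.0.113.10", 443)
reply = Packet("203.0.113.10", 443, "10.2.1.5", 50000)

# Symmetric path: the request and its reply both traverse the same firewall.
symmetric = StatefulFirewall()
symmetric.outbound(request)

# Asymmetric path: the request bypassed this firewall, so only the reply arrives.
asymmetric = StatefulFirewall()

print("symmetric reply accepted:", symmetric.accepts_reply(reply))    # True
print("asymmetric reply accepted:", asymmetric.accepts_reply(reply))  # False
```

The second firewall has no record of the request, so from its point of view the reply is an unsolicited packet and gets dropped, which is exactly what a UDR gap produces.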

Here's the part Microsoft's own error messages don't tell you: most of these problems produce identical symptoms at the application layer, either a connection timeout or a refused connection. Diagnosing which of the six causes is actually responsible requires knowing where to look in the logs. That's exactly what this guide covers.


The Quick Fix: Try This First

I know this is frustrating, especially when traffic is down in a production environment and every minute matters. Before going deep into diagnostics, try the fastest sanity check first: verify your diagnostic logging is actually on and query the logs directly.

The single biggest reason Azure Firewall troubleshooting takes so long is that people spend time guessing at rules when the answer would be sitting right there in Log Analytics, if only diagnostic settings had been turned on.

Go to your Azure Firewall resource in the portal. In the left-hand menu, under Monitoring, click Diagnostic settings. You should see entries for AzureFirewallApplicationRule, AzureFirewallNetworkRule, and AzureFirewallDnsProxy. If those aren't being sent to a Log Analytics workspace, that's step one: enable them right now.

Once logs are flowing (give it 5–10 minutes on a fresh enablement), go to your Log Analytics workspace, click Logs, and run this query:

AzureDiagnostics
| where Category == "AzureFirewallNetworkRule" or Category == "AzureFirewallApplicationRule"
| where TimeGenerated > ago(30m)
| where msg_s contains "Deny"
| project TimeGenerated, msg_s, Category
| order by TimeGenerated desc
| take 50

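That query surfaces every deny action in the last 30 minutes across both rule types, sorted newest-first. Read the msg_s column. It tells you the protocol, the source IP and port, the destination and port, the action, and, where the entry includes them, the rule collection and rule that made the decision. An entry that says "No rule matched. Proceeding with default action" means nothing allowed the traffic and the firewall fell back to its default deny. That single piece of information cuts troubleshooting time from hours to minutes.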

If you see your traffic in those results, say, a deny on TCP from 10.2.1.5 to 52.160.0.0 on port 443, you know exactly where to create your fix. If you don't see your traffic in the deny logs at all, the firewall may not even be receiving the traffic, which points to a routing problem (UDR misconfiguration or missing association), not a rule problem.
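
If you'd rather triage a larger export (for example, the msg_s column saved to a file from Log Analytics), a small Python script can group denies by flow. This is a sketch that assumes the legacy msg_s wording shown in Microsoft's log samples ("TCP request from A:port to B:port. Action: Deny", optionally followed by "Rule Collection: ..." or "No rule matched"). The sample lines are illustrative and the function names are my own, so check the regex against your own logs before relying on it.

```python
import re
from collections import Counter
from typing import Optional

# Assumed legacy msg_s layout; adjust if your log entries are worded differently.
FLOW = re.compile(
    r"^(?P<proto>\S+) request from (?P<src>[^:\s]+):(?P<sport>\d+) "
    r"to (?P<dst>[^:\s]+):(?P<dport>\d+)\. Action: (?P<action>\w+)"
)
COLLECTION = re.compile(r"Rule Collection: (?P<collection>[^.]+)")


def parse_msg(msg: str) -> Optional[dict]:
    """Split one msg_s string into fields, or return None if it doesn't match."""
    m = FLOW.match(msg)
    if m is None:
        return None
    fields = m.groupdict()
    if "No rule matched" in msg:
        fields["collection"] = "(default action)"
    else:
        c = COLLECTION.search(msg)
        fields["collection"] = c.group("collection") if c else "(not in log)"
    return fields


def summarize_denies(messages) -> Counter:
    """Count denied flows by source IP, destination, port and deciding collection."""
    counts: Counter = Counter()
    for msg in messages:
        fields = parse_msg(msg)
        if fields and fields["action"] == "Deny":
            key = (fields["src"], fields["dst"], fields["dport"], fields["collection"])
            counts[key] += 1
    return counts


# Illustrative sample lines, not real log output.
sample = [
    "TCP request from 10.2.1.5:50432 to 52.160.0.10:443. Action: Deny",
    "TCP request from 10.2.1.5:50433 to 52.160.0.10:443. Action: Deny",
    "HTTPS request from 10.2.1.7:55640 to api.github.com:443. Action: Deny. "
    "No rule matched. Proceeding with default action",
    "HTTPS request from 10.2.1.7:55641 to www.microsoft.com:443. Action: Allow. "
    "Rule Collection: allow-web. Rule: ms-sites",
]

for flow, hits in summarize_denies(sample).most_common():
    print(hits, flow)
```

If your workspace sends logs to the resource-specific AZFW* tables instead of AzureDiagnostics, those fields are typically already separate columns and you won't need the regex at all.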

Pro Tip
Save the Log Analytics query above as a function in your workspace (click Save > Save as function and name it fw_denies). Next time traffic breaks, you're two clicks away from an answer instead of rebuilding the query from memory at 2am during an incident.

Step 1: Enable Diagnostic Logging and Verify Traffic Reaches the Firewall

Before touching a single rule, confirm two things: diagnostic logs are enabled, and the traffic you care about is actually hitting the firewall at all. These are separate questions and both matter.

Enable diagnostic logs: Navigate to your Azure Firewall in the portal. Under Monitoring > Diagnostic settings, click Add diagnostic setting. Check all four log categories: AzureFirewallApplicationRule, AzureFirewallNetworkRule, AzureFirewallDnsProxy, and AzureFirewallThreatIntel. Send them to your Log Analytics workspace. Also enable the AllMetrics option; you'll want SNAT utilization data in the workspace later.

Check if traffic reaches the firewall at all: In the Azure portal, go to Network Watcher > Next hop. Enter the source VM's subscription, resource group, virtual machine, network interface, source IP, and the destination IP you're trying to reach. Click Next hop.

The result should show VirtualAppliance with the next hop IP matching your Azure Firewall's private IP. If it shows Internet or None, your UDR isn't routing traffic through the firewall at all. Fix the route table before touching firewall rules.
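
When Next hop returns something unexpected, it usually comes down to longest-prefix matching: the most specific route that contains the destination wins. This Python sketch models just that rule over a hypothetical route table (the prefixes and the firewall IP 10.0.1.4 are made up). Real Azure route selection also breaks ties between equal-length prefixes by preferring user-defined routes over BGP routes over system routes, which this ignores; Network Watcher's Next hop remains the authoritative answer.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str, Optional[str]]  # (address prefix, next hop type, next hop IP)

# Hypothetical effective routes for a spoke subnet.
ROUTES: List[Route] = [
    ("0.0.0.0/0", "VirtualAppliance", "10.0.1.4"),   # default route to the firewall
    ("10.0.0.0/8", "VirtualAppliance", "10.0.1.4"),  # private ranges to the firewall
    ("52.160.0.0/16", "Internet", None),             # more specific: bypasses the firewall
]


def next_hop(destination: str, routes: List[Route] = ROUTES) -> Optional[Route]:
    """Return the matching route with the longest prefix, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


def via_firewall(destination: str, firewall_ip: str, routes: List[Route] = ROUTES) -> bool:
    """True if the winning route sends this destination to the firewall's private IP."""
    hop = next_hop(destination, routes)
    return hop is not None and hop[1] == "VirtualAppliance" and hop[2] == firewall_ip


for ip in ("8.8.8.8", "10.2.1.5", "52.160.0.10"):
    print(ip, next_hop(ip), via_firewall(ip, "10.0.1.4"))
```

The 52.160.0.10 case is the trap: one stray, more specific route quietly wins over the default route and the traffic never reaches the firewall.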

You can also use IP flow verify under Network Watcher to test whether a specific flow is allowed or denied at the NIC level. This rules out NSG conflicts before blaming the firewall.

If both checks pass (traffic reaches the firewall and logs are enabled), run the deny query from the Quick Fix section and note the exact source/destination/port combination being dropped. You'll need that for every step that follows.

Step 2: Audit Rule Collections for Priority Conflicts

Rule priority conflicts are the number-one cause of "my rule is there but traffic is still blocked" in Azure Firewall troubleshooting. The logic is simple but the portal makes it easy to miss: lower priority numbers are evaluated first, and the first matching rule wins, period. If a deny rule at priority 100 matches before your allow rule at priority 200, your traffic dies.

In the portal, open your Azure Firewall or Firewall Policy. Under Settings > Rules (or Rule collections if using Firewall Policy), sort collections by priority number ascending. Read through them top to bottom, exactly as the firewall does.

Look specifically for:

  • Any collection with a lower priority number than your allow rule that contains a broad deny or a catch-all deny-all at the bottom
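  • Network rule collections and application rule collections: network rules are always processed before application rules, regardless of priority numbers, and a matching network rule stops processing, so check both
  • Application rules that use FQDNs vs. network rules that use IPs: a broad network rule (for example, one allowing or denying all TCP/443) matches first, and the application rule you expected to apply never gets evaluated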

If using Azure Firewall Manager with a parent policy, the parent policy rules always take precedence over child policy rules, regardless of priority numbers within the child. This trips up a lot of teams who inherit a parent policy from a central networking team and then wonder why their child policy rules seem to be ignored.

Fix: adjust priority numbers so your allow rules have a lower (higher priority) number than any conflicting deny rules of the same rule type. Increment in steps of 100 (e.g., 100, 200, 300) to leave room for future insertions without renumbering everything.

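# Requires the azure-firewall CLI extension: az extension add --name azure-firewall
# List rule collection groups in processing order (lowest priority number first).
# Rule collections inside each group have their own priorities; check those too.
az network firewall policy rule-collection-group list \
  --policy-name MyFirewallPolicy \
  --resource-group MyResourceGroup \
  --query "sort_by(@, &priority)[].{Name:name, Priority:priority}" \
  --output table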
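
To make the evaluation order concrete, here's a simplified Python model of how Firewall Policy picks the deciding rule collection, based on Microsoft's documented processing logic: DNAT, then network, then application rules regardless of priority; parent-policy rule collection groups before child groups; then group priority and collection priority, lowest number first; first match wins, with a default deny if nothing matches. The matches predicates are hypothetical stand-ins for real rule matching, and threat intelligence (which is evaluated before any rules) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# DNAT rules are processed first, then network rules, then application rules.
TYPE_ORDER = {"dnat": 0, "network": 1, "application": 2}


@dataclass
class Collection:
    name: str
    rule_type: str                   # "dnat", "network" or "application"
    group_priority: int              # rule collection group priority (100-65000)
    priority: int                    # rule collection priority (100-65000)
    action: str                      # "Allow" or "Deny"
    matches: Callable[[dict], bool]  # hypothetical stand-in for the collection's rules
    from_parent: bool = False        # True if inherited from a parent policy


def evaluate(flow: dict, collections: List[Collection]) -> Tuple[str, str]:
    """Return (action, collection name) for the first collection that matches."""
    ordered = sorted(
        collections,
        key=lambda c: (
            TYPE_ORDER[c.rule_type],  # rule type order beats priority numbers
            not c.from_parent,        # parent policy groups before child groups
            c.group_priority,         # lower number is processed first
            c.priority,
        ),
    )
    for c in ordered:
        if c.matches(flow):
            return c.action, c.name   # processing stops at the first match
    return "Deny", "(default action)"


https_out = {"src": "10.2.1.5", "dst_port": 443}
collections = [
    Collection("allow-https", "network", 200, 200, "Allow", lambda f: f["dst_port"] == 443),
    Collection("deny-all-outbound", "network", 200, 100, "Deny", lambda f: True),
    Collection("allow-github", "application", 100, 100, "Allow", lambda f: f["dst_port"] == 443),
]

print(evaluate(https_out, collections))  # ('Deny', 'deny-all-outbound')
```

That single call shows both traps from this step: the deny-all collection wins on priority within the network rules, and the application collection never gets a look even though its group priority is 100, because network rules run first.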

Step 3: Fix DNAT Rule Misconfigurations for Inbound Traffic

If inbound connections are failing (say, you can't reach a service hosted in your Azure VNet from the internet), the problem is usually a DNAT rule issue. Azure Firewall troubleshooting for inbound traffic has a specific checklist that covers most cases.

Check 1: The DNAT rule references the correct public IP. If your firewall has multiple public IP addresses, the DNAT rule must reference the specific public IP that clients are connecting to. In the rule, under Destination, confirm the IP matches what your DNS record resolves to. Mismatch here means the rule never triggers.

Check 2: The rule's source and translated address are right. A matching DNAT rule implicitly allows the translated traffic, so you don't need a separate network rule for it, and the translated traffic isn't processed by your other network rules. What does break it is a Source field that doesn't include the client's IP (common when the rule was locked down to a specific range and a new client shows up), or a translated address and port that don't match the backend's actual IP and listener.

Check 3: The destination port in the DNAT rule matches client expectations. A DNAT rule that translates port 8080 on the public IP to port 80 on the backend will silently fail if clients are connecting to port 80 on the public IP, not 8080.
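
Checks 1 and 3, plus the source range, boil down to an exact-match test on the incoming packet. This Python sketch uses hypothetical field names (not the Azure resource schema) and documentation IP ranges to show why a client hitting the wrong public IP or port simply never matches.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DnatRule:
    # Illustrative field names, not the Azure Firewall resource schema.
    source: str           # allowed client range (CIDR)
    public_ip: str        # firewall public IP the rule listens on
    public_port: int      # port clients connect to
    translated_ip: str    # backend private IP
    translated_port: int  # backend listener port


def dnat_lookup(rules: List[DnatRule], client_ip: str, dest_ip: str,
                dest_port: int) -> Optional[Tuple[str, int]]:
    """Return the translated (ip, port) for the first matching rule, or None."""
    client = ipaddress.ip_address(client_ip)
    for rule in rules:
        if (client in ipaddress.ip_network(rule.source)
                and dest_ip == rule.public_ip
                and dest_port == rule.public_port):
            return rule.translated_ip, rule.translated_port
    return None  # no DNAT match: the firewall won't translate this connection


rules = [DnatRule("0.0.0.0/0", "203.0.113.10", 8080, "10.2.1.10", 80)]

print(dnat_lookup(rules, "198.51.100.7", "203.0.113.10", 8080))  # ('10.2.1.10', 80)
print(dnat_lookup(rules, "198.51.100.7", "203.0.113.10", 80))    # None: wrong port
print(dnat_lookup(rules, "198.51.100.7", "203.0.113.11", 8080))  # None: wrong public IP
```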

Test your DNAT configuration with a simple TCP connection test from outside Azure:

# From an external machine, test TCP reachability
Test-NetConnection -ComputerName <firewall-public-ip> -Port <dnat-port> -InformationLevel Detailed

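If TcpTestSucceeded comes back False, go back through the DNAT checks above, then check the NSG on the backend subnet, your backend VM's own OS firewall, and whether the application is actually listening on the translated port. If it comes back True but the application still fails, the TCP path through the firewall to the backend is working, so the problem lies in the application itself rather than in the firewall or the network path.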

Step 4: Diagnose and Fix SNAT Port Exhaustion

SNAT port exhaustion is sneaky. Connections work fine under normal load, then start failing randomly at peak times, and because the failures look like generic timeouts, the firewall is often the last thing people check. I've seen this tank production workloads in AKS clusters, Azure Batch jobs, and microservice environments that make a high volume of outbound connections.

First, check your SNAT utilization metric. In the portal, go to your Azure Firewall resource, click Monitoring > Metrics, add a metric for SNAT port utilization, and set the aggregation to Average over the last 24 hours. If you're regularly hitting 80%+ utilization, you're at risk. At 100%, new outbound connections fail.

Each public IP address on the firewall provides 2,496 SNAT ports per backend instance, and the firewall runs at least two instances. The fix is to add more public IP addresses to the firewall:

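# Add a new public IP to your Azure Firewall via PowerShell.
# Azure Firewall requires Standard SKU, statically allocated public IPs
# in the same region as the firewall.
$pip = New-AzPublicIpAddress `
  -Name "fw-pip-02" `
  -ResourceGroupName "MyResourceGroup" `
  -Location "eastus" `
  -AllocationMethod Static `
  -Sku Standard

# Attach the new IP to the existing firewall configuration, then push the update.
$azfw = Get-AzFirewall -Name "MyFirewall" -ResourceGroupName "MyResourceGroup"
$azfw.AddPublicIpAddress($pip)
Set-AzFirewall -AzureFirewall $azfw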

Each additional public IP adds another 2,496 SNAT ports per instance. For workloads making tens of thousands of concurrent outbound connections, you may need 5–10 public IPs. Alternatively, consider associating Azure NAT Gateway with the firewall subnet: NAT Gateway provides 64,512 SNAT ports per public IP and takes SNAT pressure off the firewall entirely for outbound scenarios.

After adding IPs, monitor the SNAT utilization metric again over 24–48 hours to confirm it drops to a safe level. You're looking for consistent utilization below 75%.
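
For sizing, the arithmetic is simple enough to script. This Python sketch uses the documented figures (2,496 ports per public IP per firewall instance, at least two instances; 64,512 ports per NAT Gateway public IP) and the 75% target above. It treats every concurrent outbound flow as consuming its own port, which is a conservative worst case, and it assumes the minimum instance count because you can't directly see how far the firewall has scaled out. The function names are hypothetical.

```python
import math

PORTS_PER_IP_PER_INSTANCE = 2_496  # Azure Firewall, per public IP per backend instance
NAT_GATEWAY_PORTS_PER_IP = 64_512  # Azure NAT Gateway, per public IP
MIN_INSTANCES = 2                  # the firewall never runs fewer instances than this


def firewall_snat_ports(public_ips: int, instances: int = MIN_INSTANCES) -> int:
    """Total SNAT ports across all firewall instances."""
    return PORTS_PER_IP_PER_INSTANCE * public_ips * instances


def nat_gateway_snat_ports(public_ips: int) -> int:
    """Total SNAT ports provided by a NAT Gateway with this many public IPs."""
    return NAT_GATEWAY_PORTS_PER_IP * public_ips


def public_ips_needed(concurrent_flows: int, target: float = 0.75,
                      instances: int = MIN_INSTANCES) -> int:
    """Smallest public IP count that keeps worst-case utilization at or below target."""
    usable_per_ip = PORTS_PER_IP_PER_INSTANCE * instances * target
    return math.ceil(concurrent_flows / usable_per_ip)


print(firewall_snat_ports(1))      # 4992
print(firewall_snat_ports(5))      # 24960
print(public_ips_needed(40_000))   # 11
print(nat_gateway_snat_ports(1))   # 64512
```

By this arithmetic, 40,000 concurrent flows at a 75% target already calls for 11 firewall public IPs, which is roughly where a single NAT Gateway IP starts to look attractive.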

Step 5: Fix DNS Proxy Failures Causing FQDN Rule Breakdowns

Rules in Azure Firewall that use FQDNs (like *.microsoft.com or api.github.com) depend on the firewall's DNS resolution being correct, and FQDNs in network rules only work at all when DNS proxy is enabled. If the DNS proxy is disabled or the firewall's DNS servers are misconfigured, the firewall can't resolve the FQDN to an IP, the rule never matches, and traffic gets blocked, even though the rule looks exactly right in the portal.

Check DNS proxy status first. In the portal, navigate to your Azure Firewall Policy (or directly to the firewall if you're using classic rules), then go to DNS Settings. Confirm that DNS Proxy is set to Enabled and that the DNS servers listed are reachable from the firewall subnet.

If you're using custom DNS servers (common in hybrid environments with on-premises DNS), verify the firewall can reach those servers. DNS proxy failures show up in the AzureFirewallDnsProxy log category, so query it directly:

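// contains is case-insensitive, so this catches SERVFAIL, NXDOMAIN, and "Error:" entries
AzureDiagnostics
| where Category == "AzureFirewallDnsProxy"
| where TimeGenerated > ago(1h)
| where msg_s contains "FAIL" or msg_s contains "NXDOMAIN" or msg_s contains "error"
| project TimeGenerated, msg_s
| order by TimeGenerated desc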

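If you see repeated SERVFAIL or NXDOMAIN responses for domains that should resolve, your custom DNS server isn't forwarding correctly. Common fix: add a conditional forwarder on your custom DNS server that forwards Azure-internal domains (*.azure.com, *.windows.net) to Azure DNS at 168.63.129.16. Note that 168.63.129.16 is only reachable from inside Azure, so this works when the custom DNS server runs in an Azure VNet; an on-premises server has to forward through a DNS server or resolver hosted in Azure instead.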
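
To see which names are failing rather than scrolling raw entries, a short Python script can tally response codes per queried name. It assumes the DNS proxy msg_s layout from Microsoft's sample entries ("DNS Request: client:port - id TYPE IN name. proto ... RCODE ..."); that token layout, the function names, and the sample lines are assumptions for illustration, so compare against your own export first.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

RCODES = {"NOERROR", "NXDOMAIN", "SERVFAIL", "REFUSED", "FORMERR", "NOTIMP"}


def tally_dns(messages: Iterable[str]) -> Dict[str, Counter]:
    """Map each queried name to a Counter of DNS response codes."""
    tally: Dict[str, Counter] = defaultdict(Counter)
    for msg in messages:
        if not msg.startswith("DNS Request:"):
            # Other entries (such as "Error: ..." upstream timeouts) are counted together.
            tally["(error entries)"]["ERROR"] += 1
            continue
        tokens = msg.split()
        if "IN" not in tokens or tokens.index("IN") + 1 >= len(tokens):
            continue  # unexpected layout; skip rather than guess
        name = tokens[tokens.index("IN") + 1].rstrip(".")
        rcode = next((t for t in tokens if t in RCODES), "UNKNOWN")
        tally[name][rcode] += 1
    return tally


def failing_names(tally: Dict[str, Counter], min_failures: int = 3) -> List[str]:
    """Names with at least min_failures SERVFAIL or NXDOMAIN responses."""
    return sorted(name for name, codes in tally.items()
                  if codes["SERVFAIL"] + codes["NXDOMAIN"] >= min_failures)


# Illustrative sample lines, not real log output.
sample = [
    "DNS Request: 10.2.1.5:48197 - 15676 A IN app.contoso.internal. udp 50 false 512 NXDOMAIN - 0 0.01s",
    "DNS Request: 10.2.1.5:48198 - 15677 A IN app.contoso.internal. udp 50 false 512 NXDOMAIN - 0 0.01s",
    "DNS Request: 10.2.1.6:48199 - 15678 A IN app.contoso.internal. udp 50 false 512 NXDOMAIN - 0 0.01s",
    "DNS Request: 10.2.1.5:48200 - 15679 A IN api.github.com. udp 40 false 512 NOERROR - 0 0.02s",
]

print(failing_names(tally_dns(sample)))  # ['app.contoso.internal']
```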

Also check that VMs on your protected subnets have their DNS server set to the firewall's private IP address. This is required for the DNS proxy to intercept resolution requests. If VMs are still using Azure-provided DNS (168.63.129.16) or a custom server directly, they bypass the proxy, the IPs they resolve can differ from the ones the firewall resolves, and FQDN-based network rules become unreliable. Configure this at the VNet DNS settings level under Virtual Network > DNS servers > Custom, entering the firewall's private IP; running VMs only pick up the change after a restart (or a DHCP lease renewal).
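
On Linux VMs you can confirm the change took effect by reading the resolver configuration. This Python sketch parses nameserver lines and checks whether the first one is the firewall's private IP (10.0.1.4 here is a placeholder). One assumption to watch: on distributions using systemd-resolved, /etc/resolv.conf usually lists the local stub 127.0.0.53, so read /run/systemd/resolve/resolv.conf instead to see the upstream servers. On Windows, ipconfig /all shows the same information.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return nameserver addresses in the order they appear."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # Comment lines start with '#' or ';' and never match "nameserver".
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_firewall_dns(resolv_conf_text: str, firewall_ip: str) -> bool:
    """True if the first (primary) nameserver is the firewall's private IP."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == firewall_ip


# Example usage on a VM (the path depends on the distribution, as noted above):
# with open("/run/systemd/resolve/resolv.conf") as f:
#     print(uses_firewall_dns(f.read(), "10.0.1.4"))

example = "# generated by DHCP\nsearch internal.cloudapp.net\nnameserver 10.0.1.4\n"
print(uses_firewall_dns(example, "10.0.1.4"))  # True
```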

Advanced Troubleshooting

When the standard Azure Firewall troubleshooting steps don't resolve the issue, it's time to go deeper. These techniques cover enterprise scenarios, policy inheritance chains, network-level diagnostics, and situations I've only ever seen in domain-joined or hub-spoke environments.

Use the Azure Firewall Workbook for visual diagnostics. In the portal, go to your Azure Firewall, click Monitoring > Workbooks, and open the built-in Azure Firewall Workbook. This pre-built dashboard gives you traffic trends, top blocked IPs, SNAT utilization over time, and threat intelligence hits, all on a single screen. It's genuinely one of the best built-in diagnostic tools in Azure and most teams don't know it exists. Set the time range to the window when issues occurred and look for spikes in deny events or SNAT exhaustion.

Investigate asymmetric routing with Connection Monitor. If you're seeing intermittent failures on established connections rather than new connection failures, asymmetric routing is a strong candidate. Use Network Watcher's Connection Monitor to set up continuous tests between source and destination. Asymmetric routing often shows up as high round-trip times or packet loss on established sessions. Validate every subnet involved in the traffic path has a UDR pointing to the firewall's private IP as the next hop, and that no route is more specific (longer prefix) than your intended firewall route.

Check forced tunneling configuration. If your environment uses forced tunneling (all internet-bound traffic routes back on-premises via ExpressRoute or VPN), Azure Firewall needs a dedicated management subnet (AzureFirewallManagementSubnet) with a separate public IP to maintain its own management traffic path. Without this, the firewall loses connectivity to Azure's control plane and stops working entirely. Verify the management subnet exists and is correctly associated.

Analyze Threat Intelligence blocks. Azure Firewall Standard and Premium include Threat Intelligence-based filtering that blocks traffic to/from known malicious IPs and domains when the mode is set to Alert and deny (the default mode only alerts). In some environments, legitimate traffic to third-party services gets caught by Threat Intel updates. Query the logs:

AzureDiagnostics
| where Category == "AzureFirewallThreatIntel"
| where TimeGenerated > ago(24h)
| project TimeGenerated, msg_s
| order by TimeGenerated desc

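If legitimate traffic is being blocked here, a higher-priority network rule won't help, because threat intelligence filtering is evaluated before any DNAT, network, or application rules. Instead, add the destination IP address or FQDN to the Threat Intelligence allowlist in your Firewall Policy so it's exempt from Threat Intel filtering.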

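Enterprise/domain-joined scenarios: In hub-spoke topologies managed through Azure Firewall Manager, verify that the peering and route settings match your design. With a VPN or ExpressRoute gateway in the hub, the hub-to-spoke peering normally has Allow Gateway Transit enabled and the spoke-to-hub peering has Use Remote Virtual Network's Gateway enabled, while the spoke route tables have gateway route propagation disabled so BGP-learned routes can't bypass the UDR pointing at the firewall. Misconfigured peering flags and route propagation settings are a silent traffic killer that shows no errors in firewall logs, because traffic never reaches the firewall in the first place.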

Event Viewer equivalent: Azure Activity Log. For firewall policy deployment failures (rule changes that don't propagate), check the Azure Activity Log for your firewall and its Firewall Policy. Filter for operations on Microsoft.Network/azureFirewalls and Microsoft.Network/firewallPolicies and look for failed write operations. These typically surface as HTTP 409 (conflict) or 412 (precondition failed) errors with a timestamp you can correlate to when the issue started.

When to Call Microsoft Support

Escalate to Microsoft Support if the firewall health state shows as Degraded in the portal for more than 15 minutes with no recent configuration change, if rule changes consistently fail to propagate after 30+ minutes, or if your firewall is dropping traffic that matches an explicit allow rule and the logs confirm the allow rule should be matching. These point to platform-level issues that you cannot resolve through configuration changes. Open a Severity A support ticket if production traffic is down; the guaranteed initial response time for Sev A depends on your support plan, so check yours before you need it.

Prevention & Best Practices

The best Azure Firewall troubleshooting session is the one you never have to do. After working through enough of these incidents, a clear set of practices separates environments that stay stable from the ones that generate a new ticket every other week.

Always use Azure Firewall Policy instead of classic rules. Classic rules are configured directly on the firewall resource and are harder to audit, version, and deploy consistently. Firewall Policy gives you a separate resource you can manage with ARM templates, Bicep, or Terraform, version-control in Git, and deploy through a pipeline with proper review. The moment you have two or more firewalls, Policy is non-negotiable.

Document your rule intent alongside the rule. Azure Firewall rules have a Description field. Fill it in, always. "Allow AKS nodes to pull from ACR" is infinitely more useful than a bare IP range six months later when you're trying to figure out if a rule is safe to delete. This sounds trivial until you're staring at 200 rules with no descriptions at 3am during an outage.

Set up Azure Monitor Alerts for key firewall metrics. Create an alert that fires when SNAT port utilization exceeds 80%. Create a second alert for Firewall Health State dropping below 100%. These two alerts alone would have prevented most of the emergency escalations I've seen. Go to Monitor > Alerts > Create > Alert rule and select your firewall as the scope.

Test rules in a staging environment before production. Use a secondary Firewall Policy in a non-production environment and mirror your rule changes there first. This catches priority conflicts and DNAT misconfigurations before they impact users. Bicep and Terraform both make it straightforward to maintain a staging policy that matches production minus a few environment-specific values.

Review and clean up rules quarterly. Firewall rule sets grow. Old rules for decommissioned services accumulate and make auditing harder. Schedule a quarterly review, even 30 minutes, to remove rules for services that no longer exist, consolidate overlapping rules, and verify that all rules still have a valid business justification.
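
A quick way to find consolidation candidates during that review is to look for rules that can never fire because an earlier, broader rule already covers every flow they would match. This Python sketch does that for a simplified, hypothetical network-rule shape (one source and one destination CIDR plus a port set, IPv4 only, all of the same rule type); real rules also carry protocols, IP Groups, service tags, and FQDNs, which it ignores. Because the earlier rule wins regardless of action, a covered rule is either redundant (same action) or dead (opposite action), and both deserve a look.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class NetRule:
    # Simplified, hypothetical rule shape for illustration only.
    name: str
    priority: int           # priority of the collection the rule belongs to
    source: str             # source CIDR
    destination: str        # destination CIDR
    ports: FrozenSet[int]   # destination ports


def covers(broad: NetRule, narrow: NetRule) -> bool:
    """True if every flow matched by `narrow` is also matched by `broad`."""
    return (
        ipaddress.ip_network(narrow.source).subnet_of(ipaddress.ip_network(broad.source))
        and ipaddress.ip_network(narrow.destination).subnet_of(
            ipaddress.ip_network(broad.destination))
        and narrow.ports <= broad.ports
    )


def shadowed(rules: List[NetRule]) -> List[Tuple[str, str]]:
    """Return (unreachable rule, earlier rule that covers it) pairs."""
    ordered = sorted(rules, key=lambda r: r.priority)  # stable: ties keep input order
    found = []
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            if covers(earlier, later):
                found.append((later.name, earlier.name))
                break
    return found


rules = [
    NetRule("allow-web-any", 100, "10.0.0.0/8", "0.0.0.0/0", frozenset({80, 443})),
    NetRule("allow-aks-https", 200, "10.2.0.0/16", "0.0.0.0/0", frozenset({443})),
    NetRule("allow-sql", 300, "10.2.0.0/16", "10.3.0.0/24", frozenset({1433})),
]

print(shadowed(rules))  # [('allow-aks-https', 'allow-web-any')]
```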

Quick Wins
  • Enable diagnostic logging on day one: never deploy a firewall without it pointed at a Log Analytics workspace
  • Add at least two public IP addresses to every production firewall to provide SNAT headroom from the start
  • Store all Firewall Policy definitions in source control (Git) and deploy via CI/CD pipeline to prevent manual drift
  • Set up the Azure Firewall Workbook as a pinned dashboard for your on-call team so traffic visibility is instant during incidents

Frequently Asked Questions

Why is Azure Firewall blocking traffic even though I have an allow rule for it?

The most common reason is a rule priority conflict: a deny rule with a lower priority number (evaluated first) is matching before your allow rule gets a chance. Open your Log Analytics workspace and run a query against AzureFirewallNetworkRule filtering on Deny to see which traffic is being denied and, where the log entry includes it, which rule collection made the decision. Check whether your allow rule's priority number is lower than the conflicting deny's. Also verify the rule is in the correct collection type (network vs. application) for the traffic you're trying to pass: network rules match on IP/port, application rules match on FQDN/protocol, and network rules are always processed before application rules, so a matching network rule stops evaluation before any application rule is checked.

How do I check what Azure Firewall is actually blocking right now?

If diagnostic settings are enabled and sending to Log Analytics, run this query in your workspace: AzureDiagnostics | where Category in ("AzureFirewallNetworkRule","AzureFirewallApplicationRule") | where msg_s contains "Deny" | order by TimeGenerated desc | take 100. The msg_s column gives you the protocol, source IP, destination, and port, plus the rule collection that made the deny decision when the entry includes it. If you haven't enabled diagnostic settings yet, go to your Firewall resource, then Monitoring > Diagnostic settings and add them; you need them before you can see anything useful.

My Azure Firewall DNAT rule isn't working and inbound connections time out. What's wrong?

Three things to check in order. First, confirm the DNAT rule's destination IP matches the specific public IP address the client is connecting to; if your firewall has multiple PIPs, the wrong one won't match. Second, make sure the rule's source range includes the client and its translated address and port match the backend; a matching DNAT rule implicitly allows the translated traffic, so no extra network rule is needed, but a too-narrow source means the rule never fires. Third, check that your backend VM's NSG allows inbound traffic on the translated port from the firewall's private IP subnet. All three need to be in place for inbound DNAT to work end-to-end.

Azure Firewall FQDN rules stopped working after I changed DNS settings. How do I fix it?

FQDNs in network rules require the DNS proxy to be enabled and working, and all FQDN-based rules depend on the firewall's DNS servers resolving correctly. Go to your Firewall Policy, click DNS Settings, and verify DNS Proxy is Enabled. Then check that your VMs' DNS server is set to the firewall's private IP; if VMs resolve DNS directly through Azure DNS or a custom server, they bypass the proxy and FQDN-based network rules become unreliable. Configure this at the VNet level under Virtual Network > DNS servers > Custom and enter the firewall's private IP. Finally, query the AzureFirewallDnsProxy log category for SERVFAIL, NXDOMAIN, or error messages to catch upstream DNS forwarding issues.

What does "SNAT port utilization" mean and why is my Azure Firewall dropping outbound connections randomly?

SNAT (Source Network Address Translation) is how the firewall maps outbound connections from private IPs to its public IPs. Each public IP provides 2,496 SNAT ports per backend instance (with a minimum of two instances), and each active outbound connection uses one port. When all ports are exhausted, new outbound connections fail with a timeout, which looks indistinguishable from a rule block. Check the SNAT port utilization metric in Azure Monitor under your firewall resource. If it regularly hits 80–100%, add more public IP addresses to the firewall (each adds another 2,496 ports per instance) or associate Azure NAT Gateway with the firewall subnet for higher-volume scenarios.

Azure Firewall health state shows "Degraded" in the portal. Is this a rule problem or a platform issue?

A Degraded health state almost never comes from rule misconfiguration; it points to infrastructure-level issues. Common causes include losing connectivity to the Azure management plane (often caused by forced tunneling without a dedicated management subnet), severe SNAT port exhaustion, resource health events affecting the underlying infrastructure, or a configuration deployment that got stuck mid-apply. First check Service Health > Resource health for your firewall resource to see if Azure is reporting an active incident. If the state persists beyond 15–20 minutes with no explanation in Activity Log or Service Health, open a support ticket immediately; this is a Severity A situation if traffic is impacted.


Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.