How to Troubleshoot Azure Virtual Network Issues

Microsoft Fix · Intermediate · 18 min read · Official Docs Grounded · Updated April 20, 2026

Why This Is Happening

I've seen this exact scenario play out hundreds of times in enterprise environments: you spin up a new VM in Azure, assign it to a Virtual Network, and then... nothing. The VM can't reach the internet, can't talk to another VM in the same VNet, or the application that worked perfectly in dev suddenly breaks in production. Your Azure Virtual Network troubleshooting journey starts here, and I want you to know upfront: it's almost never a random Azure glitch. There's always a reason, and it's almost always findable.

Azure Virtual Networks (VNets) are the foundation of your cloud network architecture. They give your resources a private, isolated IP space in the cloud. But that isolation, which is the whole point, also means that connectivity doesn't just "happen." Every packet has to navigate through Network Security Groups (NSGs), route tables, DNS resolution, peering configurations, and sometimes VPN or ExpressRoute gateways before it reaches its destination. Any one of those layers can silently drop traffic with no obvious error message.

Here's what I see causing the majority of Azure VNet connectivity issues in real deployments:

  • NSG rules blocking traffic: an inbound or outbound rule denies traffic that your app depends on. The default "DenyAllInbound" rule at priority 65500 catches everything not explicitly permitted.
  • Incorrect or missing User Defined Routes (UDRs): a custom route table attached to a subnet sends traffic to a firewall NVA (Network Virtual Appliance) that isn't configured to pass it through.
  • VNet Peering misconfiguration: peering connections show "Connected" on both sides but have "Allow forwarded traffic" or "Allow gateway transit" set incorrectly.
  • DNS resolution failures: VMs resolve names to the wrong IPs because the VNet DNS server setting points to the wrong address, or Private DNS Zones aren't linked to the correct VNet.
  • Overlapping address spaces: particularly common when companies expand their Azure footprint and two VNets end up with conflicting CIDR blocks, making peering impossible.
  • VPN Gateway or ExpressRoute circuit problems: BGP session drops, authentication failures, or IKEv2 negotiation mismatches cut off on-premises connectivity.

I know this is frustrating, especially when it blocks your work and you're staring at an error that says absolutely nothing useful. The good news is that Azure gives you surprisingly powerful diagnostic tools, and once you know the right sequence to use them, you can pinpoint most Azure VNet issues in under 15 minutes. Let's get into it.

The Quick Fix: Try This First

Before you go deep into logs and route tables, run Azure Network Watcher's IP Flow Verify tool. This is the single fastest way to determine whether a Network Security Group is the culprit. I've watched engineers spend two hours manually reading NSG rules when this tool gives you the answer in 30 seconds.

Here's how to do it:

  1. In the Azure Portal, search for Network Watcher in the top search bar and open it.
  2. In the left-hand menu under Network diagnostic tools, click IP flow verify.
  3. Fill in the fields:
    • Virtual machine: select the VM that's having connectivity problems
    • Network interface: select its NIC
    • Protocol: TCP or UDP depending on what you're testing
    • Direction: Inbound or Outbound
    • Local IP / Local port: the VM's IP and the port your app listens on
    • Remote IP / Remote port: the source you're testing from and its port
  4. Click Check.

The result will tell you either "Access allowed" or "Access denied", and critically, it tells you which specific NSG rule is responsible. If you see "DenyAllInbound" listed, that means no explicit allow rule exists for that traffic and the default deny is catching it. If you see a custom rule name, you know exactly which rule to edit.

If IP Flow Verify shows "Access allowed" but traffic still isn't getting through, that tells you the NSG isn't the problem, which is actually great news because it narrows your search immediately. The issue is then in routing, DNS, the application layer, or the destination endpoint itself. Move on to the step-by-step section to continue the investigation.
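
If you'd rather script this check than click through the Portal, the same test is available in the Azure CLI as az network watcher test-ip-flow. Here's a minimal sketch, assuming the Azure CLI is installed and logged in. The helper name check_ip_flow is hypothetical, and the resource names and IPs in the usage line are placeholders.

```shell
# Hypothetical helper: run IP Flow Verify for inbound TCP traffic to a VM.
# Usage: check_ip_flow <resource-group> <vm-name> <local-ip:port> <remote-ip:port>
check_ip_flow() {
  az network watcher test-ip-flow \
    --resource-group "$1" \
    --vm "$2" \
    --direction Inbound \
    --protocol TCP \
    --local "$3" \
    --remote "$4" \
    --query "{access: access, rule: ruleName}" \
    --output table
}
```

For example: check_ip_flow YourRG YourVM 10.0.0.4:443 203.0.113.10:50000. The output shows whether access is allowed or denied and the name of the NSG rule that matched.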

Pro Tip
Network Watcher must be enabled in the same region as your VNet. It's enabled automatically when you create certain resources, but if you're working in a less common Azure region, go to Network Watcher > Overview and make sure the region shows "Enabled." I've seen teams spend 20 minutes confused about why Network Watcher tools aren't working when it was simply disabled in that region the whole time.
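
The same check can be done from the Azure CLI. This is a rough sketch with hypothetical helper names; NetworkWatcherRG and eastus in the usage line are example values, so substitute your own.

```shell
# Hypothetical helper: list the regions where a Network Watcher instance exists.
list_network_watchers() {
  az network watcher list \
    --query "[].{name: name, region: location, state: provisioningState}" \
    --output table
}

# Hypothetical helper: enable Network Watcher in one region.
# Usage: enable_network_watcher <resource-group> <region>
enable_network_watcher() {
  az network watcher configure \
    --resource-group "$1" \
    --locations "$2" \
    --enabled true
}
```

For example: enable_network_watcher NetworkWatcherRG eastus.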

Step 1: Enable Network Watcher and Run Connection Troubleshoot

After IP Flow Verify, your next stop is Connection Troubleshoot, a more powerful tool that actually initiates a test connection and traces it through the network stack. This one catches problems that IP Flow Verify misses, particularly issues at the route table and gateway layer.

Navigate to Network Watcher > Connection troubleshoot in the Azure Portal. Configure it as follows:

  • Source type: Virtual machine
  • Virtual machine: the affected VM
  • Destination type: choose "IP address" for an IP test or "URI" to test HTTP/HTTPS endpoints
  • Destination IP / URI: enter what you're trying to reach
  • Destination port: the exact port (e.g., 443, 1433, 22)
  • Protocol: TCP

Click Check and wait 60–90 seconds. The results show you:

  • Reachability: Reachable or Unreachable
  • Average latency in milliseconds
  • Hop-by-hop trace, every network hop between source and destination, including where packets are being dropped

Pay close attention to the hop list. If you see a hop that says "Next hop type: None", that is your smoking gun. It means the routing table has no valid next hop for that destination, and packets are being silently dropped. Note down the exact hop details; you'll need them in Step 3 when we look at route tables.

If Connection Troubleshoot shows the VM can reach the destination but your application still can't, the problem is almost certainly at the application layer: a Windows Firewall rule, an application-level binding issue, or a TLS certificate mismatch. On a Windows VM, run Test-NetConnection -ComputerName <IP> -Port <port> from inside the VM to verify at the OS level.
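
Connection Troubleshoot also has a CLI counterpart, az network watcher test-connectivity, which is handy when you want to repeat the same test after each change. A sketch under the usual assumptions: the helper name check_connectivity is hypothetical, the names in the usage line are placeholders, and the source VM needs the Network Watcher agent extension installed.

```shell
# Hypothetical helper: test TCP reachability from a VM to a destination.
# Usage: check_connectivity <resource-group> <source-vm> <dest-address> <dest-port>
check_connectivity() {
  az network watcher test-connectivity \
    --resource-group "$1" \
    --source-resource "$2" \
    --dest-address "$3" \
    --dest-port "$4" \
    --protocol Tcp \
    --query "{status: connectionStatus, avgLatencyMs: avgLatencyInMs, hops: hops[].address}" \
    --output json
}
```

For example: check_connectivity YourRG YourVM 10.1.0.5 1433.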

Step 2: Audit NSG Rules on Both Subnet and NIC Levels

Here's something that trips up even experienced engineers: in Azure, NSGs can be attached at two different levels, the subnet and the individual NIC. Both sets of rules are evaluated, and either one can block traffic independently. For inbound traffic Azure applies the subnet NSG first and then the NIC NSG; for outbound traffic the order is reversed. Traffic has to be allowed by both to get through. I can't count how many times I've seen someone fix the subnet NSG and wonder why traffic is still blocked, only to find there's a NIC-level NSG with a conflicting rule.

To check the NSG rules that apply to a VM at both levels:

  1. Go to your VM in the Portal, click Networking in the left menu.
  2. You'll see two sections: one showing rules that apply to the NIC, and one showing rules applied via the subnet. Check both.
  3. Look at the Inbound port rules and Outbound port rules tabs.

Rules are evaluated in priority order (lowest number = highest priority, range 100–4096). The first matching rule wins. To add a new allow rule that overrides a deny:

  1. Click Add inbound port rule (or outbound).
  2. Set the Priority to a number lower than the deny rule you want to override.
  3. Set Source, Source port ranges, Destination, Service, and Action: Allow.
  4. Click Add.
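
The portal steps above map to a single az network nsg rule create call. A minimal sketch: the helper name allow_inbound_tcp is hypothetical, and the priority, source prefix, and port in the usage line are example values you'd replace with your own.

```shell
# Hypothetical helper: add an inbound TCP allow rule to an NSG.
# Usage: allow_inbound_tcp <resource-group> <nsg-name> <rule-name> <priority> <source-prefix> <port>
allow_inbound_tcp() {
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name "$3" \
    --priority "$4" \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$5" \
    --destination-port-ranges "$6"
}
```

For example: allow_inbound_tcp YourRG YourNSGName Allow-App-443 200 10.1.0.0/24 443. Pick a priority number lower than the deny rule you're overriding.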

For PowerShell fans, you can view all rules on an NSG like this:

$nsg = Get-AzNetworkSecurityGroup -Name "YourNSGName" -ResourceGroupName "YourRG"
$nsg.SecurityRules | Sort-Object Priority | Format-Table Name, Priority, Direction, Access, Protocol, SourcePortRange, DestinationPortRange

After saving the new rule, changes propagate within about 30 seconds. Test again with IP Flow Verify to confirm the correct rule is now matching.
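
Because subnet-level and NIC-level rules combine, it often helps to look at the merged result rather than one NSG at a time. The sketch below uses az network nic list-effective-nsg for that. The helper name show_effective_nsg is hypothetical, and the VM attached to the NIC needs to be running for the call to return data.

```shell
# Hypothetical helper: show the effective (subnet + NIC) security rules for a NIC.
# Usage: show_effective_nsg <resource-group> <nic-name>
show_effective_nsg() {
  az network nic list-effective-nsg \
    --resource-group "$1" \
    --name "$2" \
    --output json
}
```

For example: show_effective_nsg YourRG yourVMnic.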

Step 3: Inspect Effective Routes to Find Routing Conflicts

Routing is the second most common cause of Azure VNet connectivity issues after NSG misconfigurations. Even if traffic is permitted by the NSG, packets still need a valid path to the destination. Azure creates system routes automatically, but when you add custom route tables (User Defined Routes, or UDRs), those UDRs take precedence over system routes for the same address prefix, and the most specific matching prefix always wins. If a UDR is misconfigured, it can blackhole your traffic.

The fastest way to see exactly what routing decisions Azure is making for a specific NIC is through Effective Routes:

  1. Go to your VM, click Networking, then click on the NIC name.
  2. In the NIC blade, under Help, click Effective routes.
  3. You'll see a merged table of all routes that apply to that NIC: system routes, VNet peering routes, and your custom UDRs.

Look for routes where Next hop type is None: traffic matching these routes is silently dropped. Also watch for UDRs that send traffic to a Virtual Appliance IP that's unreachable or misconfigured. A common scenario: a UDR sends all traffic (0.0.0.0/0) to a firewall NVA, but IP forwarding isn't enabled on the NVA's NIC, or an NSG on the NVA's subnet is blocking the traffic before it can be inspected and forwarded.

Via PowerShell, you can pull effective routes for deeper scripted analysis:

Get-AzEffectiveRouteTable `
  -NetworkInterfaceName "yourVMnic" `
  -ResourceGroupName "yourRG" | Format-Table

If you find a bad UDR, go to Route tables in the Portal, open the offending table, click Routes, and edit or delete the problematic route. Changes here also propagate in about 30 seconds.
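
When you only care about one destination, Network Watcher's Next Hop check answers the question directly instead of making you read the whole route table. A sketch using az network watcher show-next-hop; the helper name show_next_hop is hypothetical and the IPs in the usage line are placeholders.

```shell
# Hypothetical helper: ask Azure which next hop it would use for one flow.
# Usage: show_next_hop <resource-group> <vm-name> <source-ip> <dest-ip>
show_next_hop() {
  az network watcher show-next-hop \
    --resource-group "$1" \
    --vm "$2" \
    --source-ip "$3" \
    --dest-ip "$4" \
    --output table
}
```

For example: show_next_hop YourRG YourVM 10.0.0.4 10.1.0.5. The result includes the next hop type, the next hop IP, and the ID of the route table responsible for the decision when a custom route applies.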

Step 4: Verify VNet Peering Status and Traffic Settings

Azure VNet peering is non-transitive by design. That one fact is behind more escalation tickets than I can count. If VNet A is peered with VNet B, and VNet B is peered with VNet C, VMs in VNet A cannot by default reach VMs in VNet C. Traffic doesn't automatically flow through VNet B. You need either a direct peering between A and C, or a hub-and-spoke architecture with a Network Virtual Appliance or Azure Firewall handling transit routing.

To diagnose a VNet peering that isn't working:

  1. Navigate to your VNet in the Portal.
  2. Click Peerings in the left menu.
  3. Check the Peering status column. It must show Connected on both ends of the peering. "Initiated" means the matching peering from the remote VNet hasn't been created yet; "Disconnected" means the peering on the other side was deleted.

For each peering, verify these settings match your intent:

  • Allow virtual network access: must be Enabled. If this is off, traffic between peered VNets is completely blocked.
  • Allow forwarded traffic: must be Enabled if you're routing traffic through an NVA or hub VNet.
  • Allow gateway transit / Use remote gateways: these work as a pair. Only enable "Use remote gateways" on the spoke VNet when the hub VNet has "Allow gateway transit" turned on and an active VPN or ExpressRoute gateway deployed.

A quick PowerShell check for peering state:

Get-AzVirtualNetworkPeering `
  -VirtualNetworkName "YourVNet" `
  -ResourceGroupName "YourRG" | Select-Object Name, PeeringState, AllowVirtualNetworkAccess, AllowForwardedTraffic

If the peering shows "Disconnected," the fix is to delete the peering on both sides and recreate it cleanly. There's no in-place repair for a broken peering state. I've tried; it doesn't work.
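
When the peering itself is healthy and only a traffic setting is wrong, you can flip that setting in place. The sketch below turns on forwarded traffic with az network vnet peering update; the helper name enable_forwarded_traffic is hypothetical and the names in the usage line are placeholders.

```shell
# Hypothetical helper: enable "Allow forwarded traffic" on an existing peering.
# Usage: enable_forwarded_traffic <resource-group> <vnet-name> <peering-name>
enable_forwarded_traffic() {
  az network vnet peering update \
    --resource-group "$1" \
    --vnet-name "$2" \
    --name "$3" \
    --set allowForwardedTraffic=true
}
```

For example: enable_forwarded_traffic YourRG YourVNet SpokeToHub. Remember that each side of a peering has its own settings, so you may need to run it against the peering in the other VNet as well.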

Step 5: Fix DNS Resolution Failures Within the VNet

DNS is the silent killer of Azure VNet deployments. Everything looks configured correctly (NSGs, routes, peering), but your application still can't connect because it's resolving the hostname to the wrong IP, or failing to resolve it at all. DNS resolution failures in a VNet usually come down to one of three issues: the wrong DNS server configured on the VNet, a Private DNS Zone that isn't linked to the VNet, or a custom DNS server that's down or unreachable.

First, check what DNS server your VNet is configured to use:

  1. Open your VNet in the Portal.
  2. Click DNS servers in the left menu.
  3. It should show either Default (Azure-provided) or a list of custom DNS server IPs.

If you're using Azure Private DNS Zones (which you should be for private endpoint connectivity), verify the zone is linked to the right VNet:

  1. Go to Private DNS zones in the Portal, open your zone.
  2. Click Virtual network links.
  3. Confirm your VNet appears in the list with status Completed and Auto-registration set appropriately.
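
Both portal checks above can be scripted. A sketch with two hypothetical helpers; the VNet and zone names in the usage lines are placeholders.

```shell
# Hypothetical helper: show the custom DNS servers configured on a VNet.
# An empty or null result means the VNet uses Azure-provided DNS.
# Usage: show_vnet_dns <resource-group> <vnet-name>
show_vnet_dns() {
  az network vnet show \
    --resource-group "$1" \
    --name "$2" \
    --query "dhcpOptions.dnsServers" \
    --output json
}

# Hypothetical helper: list the VNets linked to a Private DNS Zone.
# Usage: list_zone_links <resource-group> <zone-name>
list_zone_links() {
  az network private-dns link vnet list \
    --resource-group "$1" \
    --zone-name "$2" \
    --query "[].{link: name, vnet: virtualNetwork.id, state: virtualNetworkLinkState, autoRegistration: registrationEnabled}" \
    --output table
}
```

For example: show_vnet_dns YourRG YourVNet, then list_zone_links YourRG privatelink.database.windows.net.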

To test DNS resolution from inside a VM, RDP or SSH in and run:

# Windows PowerShell
Resolve-DnsName your-resource-name.privatelink.database.windows.net

# Linux / bash
nslookup your-resource-name.privatelink.database.windows.net
dig your-resource-name.privatelink.database.windows.net

If the name resolves to a public IP instead of a private IP in your VNet's address space (typically in the RFC 1918 ranges 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16), the Private DNS Zone link is missing or the private endpoint's A record wasn't created. If resolution fails entirely, the custom DNS server is probably unreachable. Check its NSG and make sure UDP and TCP port 53 are open inbound from within the VNet.
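
If you're checking many names, a small helper can classify the answers for you. This sketch only tests whether an IPv4 address falls in an RFC 1918 private range; the function name is_private_ipv4 is my own, and it assumes your VNet uses RFC 1918 space (it won't recognize other ranges you may have assigned).

```shell
# Succeeds (exit 0) when the IPv4 address is in an RFC 1918 private range:
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    192.168.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: is_private_ipv4 10.0.2.5 && echo "private" || echo "public". Feed it the address returned by nslookup or dig.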

After fixing DNS links, you don't need to restart the VM. DNS changes take effect within a few minutes as the TTL expires, but you can flush the DNS cache immediately with ipconfig /flushdns on Windows or sudo resolvectl flush-caches on Ubuntu (sudo systemd-resolve --flush-caches on older releases).

Advanced Troubleshooting

Using Network Watcher Packet Capture

When you need to see exactly what's on the wire, Azure Network Watcher packet capture is your answer. This captures traffic at the VM's NIC level and saves it as a .cap file to a storage account, so there's no need to install Wireshark on the VM itself or deal with OS-level firewall complications. I reach for this when the higher-level tools all say traffic is fine but the application still reports connection failures, which usually points to a TCP handshake issue or TLS negotiation problem.

To start a capture: go to Network Watcher > Packet capture > Add. Set the target VM, storage account destination, optional filters (e.g., only capture traffic on TCP port 443 from a specific remote IP), and a time limit. The capture file can then be downloaded and opened in Wireshark locally for analysis. Look for TCP RST packets (connection refused), SYN packets with no SYN-ACK response (firewall drop), or TLS alert handshake failures.
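
The same capture can be started from the CLI. This is a sketch rather than a full recipe: the helper name start_https_capture is hypothetical, the 300-second limit and the port 443 filter are example choices, and the target VM needs the Network Watcher agent extension installed.

```shell
# Hypothetical helper: capture TCP traffic to remote port 443 for up to 5 minutes.
# Usage: start_https_capture <resource-group> <vm-name> <capture-name> <storage-account>
start_https_capture() {
  az network watcher packet-capture create \
    --resource-group "$1" \
    --vm "$2" \
    --name "$3" \
    --storage-account "$4" \
    --time-limit 300 \
    --filters '[{"protocol":"TCP","remotePort":"443"}]'
}
```

For example: start_https_capture YourRG YourVM tls-debug YourStorageAccount. Keeping the filter tight makes the resulting file much easier to read in Wireshark.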

Event Viewer and Azure Monitor Logs

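For Windows VMs, Event Viewer can surface network-related errors that Azure's portal tools miss. Check Windows Logs > System for Event ID 4227 (TCP/IP failed to establish an outgoing connection because the local endpoint was recently used, which usually points to ephemeral port exhaustion) and Event ID 1014 (DNS name resolution timeout). For deeper Azure platform-level diagnostics, enable NSG Flow Logs in Network Watcher. These log every flow through your NSG to a storage account and can be queried through Azure Monitor or Traffic Analytics. Microsoft has announced that NSG flow logs are being retired in favor of virtual network flow logs, so check the current documentation before enabling them on a new deployment.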

To enable NSG flow logs:

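# Via Azure CLI
# --resource-group is needed because the NSG is given by name rather than by full resource ID
az network watcher flow-log create \
  --location eastus \
  --resource-group YourRG \
  --name MyFlowLog \
  --nsg YourNSGName \
  --storage-account YourStorageAccountId \
  --enabled true \
  --format JSON \
  --log-version 2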

Flow log version 2 includes throughput information (bytes and packets per flow), which helps identify whether a specific flow is being dropped or throttled.

VPN Gateway and ExpressRoute Diagnosis

VPN Gateway connection failures are particularly nasty because they can have dozens of root causes. The most common ones I see are IKE authentication mismatches (pre-shared key typo or certificate mismatch), Phase 1/Phase 2 policy misalignment between Azure and the on-premises VPN device, and BGP peering drops on ExpressRoute circuits.

For VPN Gateway, go to the gateway resource in the Portal and click VPN troubleshoot under Monitoring. Point it to a storage account and run the diagnostic. It generates a detailed report showing the IKE negotiation logs and exactly where the handshake failed. For ExpressRoute, check the circuit's BGP Summary under the connection resource. If the BGP peer state shows "Idle" or "Connect" instead of "Established," the session is down and needs to be investigated on both the provider side and the Microsoft edge router side.
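
Before digging into IKE logs, it's worth a quick look at what Azure thinks the tunnel state is. A sketch using az network vpn-connection show; the helper name show_vpn_status is hypothetical and the names in the usage line are placeholders.

```shell
# Hypothetical helper: show the status and byte counters of a VPN connection.
# Usage: show_vpn_status <resource-group> <connection-name>
show_vpn_status() {
  az network vpn-connection show \
    --resource-group "$1" \
    --name "$2" \
    --query "{status: connectionStatus, ingressBytes: ingressBytesTransferred, egressBytes: egressBytesTransferred}" \
    --output table
}
```

For example: show_vpn_status YourRG OnPremToAzure. A status other than Connected, or byte counters that stop moving, tells you the tunnel is the place to look.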

Overlapping Address Space Problems

If you're trying to peer two VNets with overlapping CIDR ranges, Azure will simply refuse to create the peering, and you'll get error code RemoteVnetAddressSpaceOverlap. The only fix is to re-address one of the VNets, which unfortunately requires deleting and recreating subnets (and migrating VMs out of them first). Plan address spaces carefully at the start using a proper IPAM approach. I can't stress this enough.
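
You can catch an overlap before Azure rejects the peering. The sketch below is a plain shell check, independent of Azure, that compares two IPv4 CIDR blocks; the function names ip_to_int and cidr_overlap are my own. It relies on one fact: two blocks overlap exactly when they agree on the bits covered by the shorter of the two prefixes.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change doesn't leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds (exit 0) when two IPv4 CIDR blocks overlap.
# Usage: cidr_overlap 10.0.0.0/16 10.0.1.0/24
cidr_overlap() {
  net1=$(ip_to_int "${1%/*}"); len1=${1#*/}
  net2=$(ip_to_int "${2%/*}"); len2=${2#*/}
  # Compare both networks under the shorter (less specific) prefix length.
  if [ "$len1" -lt "$len2" ]; then short=$len1; else short=$len2; fi
  mask=$(( (4294967295 << (32 - short)) & 4294967295 ))
  [ $(( net1 & mask )) -eq $(( net2 & mask )) ]
}
```

For example: cidr_overlap 10.0.0.0/16 10.0.1.0/24 && echo "overlap". Run it for each pair of address prefixes on the two VNets you plan to peer.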

When to Call Microsoft Support
Escalate to Microsoft if: your VPN Gateway diagnostics show correct configuration on both ends but the tunnel won't establish (could be a platform-level issue with that specific gateway SKU); an ExpressRoute circuit shows "Provisioned" but BGP is consistently dropping and your provider confirms their end is healthy; or Network Watcher tools are returning internal errors rather than results. Platform-layer bugs do happen occasionally, and Microsoft support has access to backend infrastructure diagnostics you simply don't have. Open a ticket at Microsoft Support with your subscription ID, resource IDs, and the output of your diagnostic runs already attached. It speeds things up significantly.

Prevention & Best Practices

The best Azure Virtual Network troubleshooting session is the one you never have to do. After working through countless networking incidents in Azure, here's what the teams with the fewest outages consistently do right.

Design your address space before you deploy anything. The CIDR blocks you pick on day one are very difficult to change later. Use a range like 10.0.0.0/8 and carve it up deliberately: different /16 blocks for different environments (dev, staging, prod), and different /24 blocks for different application tiers within each environment. Keep a spreadsheet or use an Azure-native IPAM tool to track allocations. Leave room to grow, because you'll always need more address space than you expect.

Follow the principle of least privilege with NSG rules. Never open port 22 or 3389 to 0.0.0.0/0. Use Azure Bastion for administrative access instead: it's cleaner, more auditable, and eliminates an entire class of attack surface. For internal traffic, use Azure Service Tags (like AzureLoadBalancer or VirtualNetwork) instead of hardcoded IP ranges so your rules stay valid as infrastructure changes.

Enable NSG Flow Logs and Traffic Analytics from the start, not after an incident happens. Having historical flow data when you're investigating "why did this break at 2am" is invaluable. Flow logs cost almost nothing at low traffic volumes and they've saved me hours of guesswork in post-incident reviews.

Tag everything. Every VNet, subnet, NSG, route table, and peering should have consistent tags for environment, application, owner, and cost center. When you're troubleshooting at 11pm and can't remember which NSG belongs to which application tier, good tagging is the difference between a 5-minute investigation and a 2-hour one.

Quick Wins
  • Enable Network Watcher in every region where you have resources. It's free and takes 30 seconds to turn on.
  • Set up Azure Monitor alerts on VPN Gateway tunnel drop events (TunnelEgressBytes = 0) so you know about connectivity failures before users report them.
  • Use Azure Policy to enforce that all new NSGs have flow logging enabled automatically. This prevents blind spots in new deployments.
  • Document your VNet peering topology in a diagram and keep it updated. This is especially important in hub-and-spoke architectures, where a single misconfigured transit route can break connectivity across dozens of spoke VNets.

Frequently Asked Questions

My two VMs are in the same VNet and same subnet. Why can't they ping each other?

Traffic between VMs in the same subnet normally follows the default VNet system route, but it still goes through NSG rules. Check both the subnet-level NSG and the NIC-level NSG on both VMs. The default AllowVnetInBound rule permits all protocols inside the VNet, ICMP included, so look specifically for a custom deny rule with a lower priority number that blocks ICMP or all traffic. The more common culprit is the firewall inside the VM itself: even if Azure's NSG allows ICMP, the Windows built-in firewall may be blocking "File and Printer Sharing (Echo Request)". Enable that rule under Control Panel > Windows Defender Firewall > Allow an app through Windows Firewall.

Azure VNet peering shows "Connected" on both sides but VMs still can't communicate. What's wrong?

Connected peering state confirms the peering relationship is valid, but it doesn't guarantee traffic can actually flow. The most common cause of this exact symptom is "Allow virtual network access" being disabled on one or both peering connections. Go into each peering and confirm that toggle is on. The second most common cause is NSG rules on the destination subnet that block traffic from the source VNet's IP range. Use IP Flow Verify (with the source IP from the other VNet) to confirm whether NSGs are the blocker. Also verify the source VM's route table doesn't have a UDR sending inter-VNet traffic somewhere unexpected.

My Azure private endpoint isn't resolving to its private IP and I keep getting the public IP instead. How do I fix it?

This is a Private DNS Zone linking issue almost every single time. The private endpoint creates an A record in a Private DNS Zone (e.g., privatelink.database.windows.net), but if that zone isn't linked to the VNet where your client VM lives, the VM's DNS query goes to Azure public DNS and gets the public IP. Go to the Private DNS Zone, click "Virtual network links," and add a link to your VNet. After adding the link, flush DNS on the client VM with ipconfig /flushdns and re-test. If you're using a custom DNS server on-premises, also confirm that its conditional forwarders for these zones point to a DNS forwarder inside the VNet (a DNS server VM or an Azure DNS Private Resolver inbound endpoint), which in turn forwards to Azure DNS at 168.63.129.16. That address is only reachable from inside Azure, so an on-premises server can't forward to it directly.

My Azure VPN Gateway site-to-site connection keeps dropping every few hours. How do I stop it?

Intermittent Azure VPN Gateway connection failures that happen on a regular schedule are almost always a Dead Peer Detection (DPD) or IKE re-key timeout issue. When the IKE Phase 2 SA lifetime expires, both sides need to renegotiate. If your on-premises device initiates the re-key but the Azure gateway is slow to respond (or vice versa), the tunnel drops briefly. First, run the VPN troubleshoot diagnostic in the Portal to get IKE logs and confirm the re-key is the trigger. Then, on your on-premises VPN device, ensure it's configured to initiate re-keys proactively before the SA expires rather than waiting for it to lapse. Also consider upgrading your VPN Gateway SKU to VpnGw2 or higher, since lower SKUs have CPU constraints that can delay IKE re-negotiation under load.

Can I change the address space of an existing Azure VNet without deleting it?

Yes, you can add additional address spaces to an existing VNet without any downtime. Go to the VNet in the Portal, click "Address space," and add the new CIDR range. Existing subnets and resources are unaffected. However, you cannot remove an address space that's currently in use by a subnet. If the VNet has peering connections, you can still change its address space, but the peered VNets won't learn the new range until you sync each peering afterwards (older guidance to delete and recreate the peering is no longer necessary). Shrinking the address space is generally more disruptive than expanding it, so plan ahead and allocate generously from the start.
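
From the CLI, the resize and the peering sync can be chained. Treat this as a sketch under assumptions: the helper name resize_and_sync is hypothetical, the names and prefixes in the usage line are placeholders, and you'd repeat the sync for every peering on the VNet and confirm the peering's sync status afterwards.

```shell
# Hypothetical helper: set a VNet's address prefixes, then sync one of its peerings.
# Usage: resize_and_sync <resource-group> <vnet-name> <peering-name> <prefix> [<prefix> ...]
resize_and_sync() {
  rg=$1; vnet=$2; peering=$3
  shift 3
  # --address-prefixes replaces the whole list, so pass existing and new prefixes together.
  az network vnet update \
    --resource-group "$rg" \
    --name "$vnet" \
    --address-prefixes "$@" &&
  az network vnet peering sync \
    --resource-group "$rg" \
    --vnet-name "$vnet" \
    --name "$peering"
}
```

For example: resize_and_sync YourRG YourVNet SpokeToHub 10.0.0.0/16 10.1.0.0/16.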

What's the difference between a Network Security Group and Azure Firewall? Do I need both?

NSGs are subnet/NIC-level filters that work on Layer 4 (TCP/UDP port and IP). They're stateful, free to use, and the right tool for basic traffic segmentation within and between VNets. Azure Firewall is a fully managed Layer 4 through Layer 7 firewall service deployed in its own subnet, capable of FQDN filtering, TLS inspection, threat intelligence feeds, and centralized policy management across multiple VNets in a hub-and-spoke topology. For simple deployments, NSGs alone may be sufficient. For production environments with compliance requirements, internet egress control, or complex multi-VNet architectures, Azure Firewall adds a layer of control that NSGs simply can't provide. The two are complementary, not mutually exclusive.

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.