Azure VPN Gateway Troubleshooting: Fix Connection Issues Fast

Microsoft Fix Intermediate 18 min read Official Docs Grounded Updated April 20, 2026

Why This Is Happening

I've seen this exact scenario play out at least a hundred times: you're an hour into your morning, coffee in hand, and suddenly your Azure VPN Gateway tunnel drops. Or worse , it never came up in the first place and you're staring at a Connection Status: Unknown message in the Azure Portal with zero explanation. I know this is frustrating, especially when on-premises workloads depend on that tunnel and your users are already filing tickets.

Azure VPN Gateway troubleshooting is genuinely complex because failures can originate from three completely different places , Azure's side, your on-premises device, or the network path between them, and the error messages almost never tell you which one is to blame. A generic "tunnel down" alert gives you nothing to work with.

Here's the reality of what actually breaks these tunnels:

IKE/IPsec negotiation failures are the single most common cause I see. Your on-premises VPN device and Azure negotiate a two-phase handshake (IKE Phase 1 and Phase 2). If the encryption algorithms, Diffie-Hellman groups, or lifetime values don't match on both ends, the tunnel dies at negotiation, silently. Event ID 20227 in Windows Server RRAS, or a "IKE authentication credentials are unacceptable" message in Azure diagnostics, usually signals this.

Shared key mismatches are embarrassingly common, especially after infrastructure-as-code deployments where a Terraform or Bicep template gets updated on one side but not the other. Azure won't tell you "wrong key", it just fails.

Overlapping address spaces are another silent killer. If your on-premises subnet and your Azure Virtual Network address space overlap even partially, routing breaks. Azure will accept the configuration without warning you.

BGP misconfiguration trips up many teams running active-active gateways. Wrong ASN values, missing peer IPs, or route advertisement issues cause the tunnel to connect but pass zero traffic, which is infuriating because the portal shows it as "Connected."

Gateway SKU limitations also catch people off guard. Basic SKU gateways don't support BGP, zone redundancy, or active-active mode. If you're hitting throughput ceilings or seeing dropped connections under load, the SKU is often the constraint.

Finally, NAT-T (NAT Traversal) issues arise when your on-premises device sits behind NAT, and the VPN device doesn't properly handle UDP port 4500 encapsulation. This is especially common with consumer-grade routers repurposed for small office setups connecting to Azure.

The good news: most Azure VPN Gateway connection failures are fixable without opening a support ticket, if you follow a structured diagnostic process. Let's do that now. Browse all Microsoft fix guides →

The Quick Fix, Try This First

Before you spend an hour in diagnostics, run the Azure VPN Gateway's built-in diagnostic tool. It's underused and catches the most common issues in under two minutes.

Open the Azure Portal, navigate to your Virtual Network Gateway, and in the left-hand menu select Diagnose and solve problems. Click VPN Gateway Connectivity. Select your connection from the dropdown and hit Run Diagnostics.

Azure runs a series of backend checks, IKE state, BGP peering, packet flow, and returns a plain-English summary. In my experience it correctly identifies the root cause about 65% of the time. If it pinpoints a shared key issue, configuration mismatch, or policy error, you can fix it right there and move on.

If the Portal tool doesn't resolve it, run this PowerShell command to reset the gateway. A reset forces Azure to reinitialize the IPsec tunnels from scratch and clears transient negotiation states that can get stuck, particularly after an Azure platform maintenance event:

Connect-AzAccount
Select-AzSubscription -SubscriptionId "your-subscription-id"

# Reset the VPN Gateway (takes 1-2 minutes)
Reset-AzVirtualNetworkGateway `
  -VirtualNetworkGateway (Get-AzVirtualNetworkGateway `
    -Name "your-gateway-name" `
    -ResourceGroupName "your-rg-name")

This is non-destructive. It doesn't delete your configuration, it simply kicks the gateway process. For active-active gateways, Azure resets each instance sequentially, so expect a brief interruption. Wait 3–5 minutes after the command completes before testing connectivity again.

If the tunnel comes back, great. Monitor it for 30 minutes. If it drops again, the reset masked a real configuration problem and you need the full troubleshooting process below.

Pro Tip
Always note the exact timestamp (UTC) when the tunnel dropped. Azure VPN Gateway diagnostic logs are queryable by time range in Log Analytics, and having a precise window cuts your log search from thousands of entries down to a handful. The Azure Portal shows connection history in UTC, don't confuse it with your local timezone when writing queries.
1
Enable VPN Diagnostics and Pull Gateway Logs

You can't fix what you can't see. The first real troubleshooting step is turning on diagnostic logging if it isn't already enabled, and surprisingly, many teams run Azure VPN Gateways in production without it.

In the Azure Portal, go to your Virtual Network GatewayMonitoringDiagnostic settings. Click Add diagnostic setting. Give it a name like vpn-diag-logs, then check all of the following log categories:

  • GatewayDiagnosticLog, gateway operational events, IKE negotiation details
  • TunnelDiagnosticLog, tunnel connect/disconnect events with reason codes
  • RouteDiagnosticLog, BGP route advertisements and withdrawals
  • IKEDiagnosticLog, raw IKE message exchanges (very verbose, essential for mismatch diagnosis)
  • P2SDiagnosticLog, if you're also running point-to-site VPN

Send these to a Log Analytics Workspace. Then run this Kusto query to see what happened around your failure window:

AzureDiagnostics
| where ResourceType == "VIRTUALNETWORKGATEWAYS"
| where Category == "TunnelDiagnosticLog"
| where TimeGenerated between (datetime("2026-04-20 08:00:00") .. datetime("2026-04-20 09:00:00"))
| project TimeGenerated, operationName_s, status_s, subCategory_s, message_s
| order by TimeGenerated asc

Look for status_s == "Failed" entries. The message_s field will contain strings like "IKE authentication credentials are unacceptable" (shared key mismatch), "No proposal chosen" (algorithm mismatch), or "Dead Peer Detection timeout" (network path issue or idle timeout on on-premises device). Those strings tell you exactly which step below to focus on.

If you see the tunnel came up successfully and then dropped after exactly 28,800 seconds (8 hours), that's an IKE Phase 1 lifetime rekey failure, common when one side doesn't support rekey and the session simply expires.

2
Verify IPsec/IKE Policy and Shared Key Match

This step resolves a massive proportion of Azure site-to-site VPN not connecting issues. Both sides of the tunnel must agree on the exact same cryptographic parameters. Azure is pickier than most people expect.

First, check the shared key. In the Portal, go to your connection resource (not the gateway, the Connection object) → Authentication typeShared Key → click the eye icon to reveal it. Copy it. Now log into your on-premises VPN device and confirm it matches character for character. Watch for leading/trailing spaces, which get silently pasted in from documentation.

Next, check your IPsec policy. If you're using a custom policy (rather than the Azure default), navigate to your ConnectionConfigurationIPsec/IKE policy. To view or set it via PowerShell:

$connection = Get-AzVirtualNetworkGatewayConnection `
  -Name "your-connection-name" `
  -ResourceGroupName "your-rg-name"

$connection.IpsecPolicies

The output shows your DH group, IKE encryption, IKE integrity, IPsec encryption, IPsec integrity, PFS group, and SA lifetime values. Take this list and compare it line-by-line against your on-premises device's Phase 1 and Phase 2 configuration. Every single value must match. Azure supports IKEv2 by default, if your on-premises device is configured for IKEv1 only, that's an immediate failure. Check your Connection's IKE Protocol setting and align it.

Common mismatches I see constantly: Azure configured for DHGroup2 while the on-premises Cisco ASA is set to group5. Or Azure using GCMAES256 for IPsec encryption while a Palo Alto is configured for AES-256-CBC. They're both AES-256, but they're not the same algorithm and the negotiation will silently fail.

If you set a matching custom policy on both sides and the tunnel comes up, you've found the problem. If you're unsure which algorithms your on-premises device supports, temporarily set the Azure connection to Default policy, this tells Azure to accept any algorithm the remote side proposes, which helps isolate whether a policy mismatch is the root cause.

3
Check Address Space Overlaps and Routing Configuration

The tunnel can be fully up, IKE negotiated, IPsec established, and still pass zero traffic. When that happens, the culprit is almost always routing. This is the scenario that makes people think Azure VPN is broken when the networking team has actually introduced an IP conflict.

In the Azure Portal, open your Virtual NetworkAddress space and note every CIDR block. Now open the Local Network Gateway associated with your connection and look at the Address space field there, this is supposed to represent your on-premises subnets. These two sets of address ranges must not overlap, even partially. A 10.0.0.0/8 in Azure and a 10.1.0.0/16 on-premises? That's a full overlap from Azure's routing perspective.

Run this to check effective routes on an Azure VM that should be routing through the tunnel:

Get-AzEffectiveRouteTable `
  -ResourceGroupName "your-rg-name" `
  -NetworkInterfaceName "your-vm-nic-name" `
  | Format-Table -AutoSize

Look for routes with NextHopType: VirtualNetworkGateway pointing to your on-premises subnets. If those routes are missing, Azure isn't sending traffic toward the tunnel at all. This happens when the Local Network Gateway address spaces aren't correctly defined, or when User Defined Routes (UDRs) on your subnet are overriding the gateway-propagated routes.

Check your subnet's route table: Virtual NetworkSubnets → click the subnet → Route table. If a UDR sends 0.0.0.0/0 to a firewall NVA, you need an explicit route for your on-premises CIDR pointing to the VPN Gateway. Also verify that Gateway route propagation is set to Enabled on your subnet's route table, disabling it is a common accident that kills VPN routing.

On your on-premises device, also verify that static routes or BGP advertisements point your Azure Virtual Network address space toward the VPN tunnel interface, not the default internet route.

4
Diagnose BGP Peering for Active-Active or Dynamic Routing Setups

If you're running BGP over your Azure VPN Gateway, which you absolutely should be for production workloads, BGP peering failures are a distinct failure mode from tunnel failures. The IPsec tunnel can be up while BGP is down, meaning no routes are being exchanged and traffic still doesn't flow.

Check BGP status with PowerShell:

$gateway = Get-AzVirtualNetworkGateway `
  -Name "your-gateway-name" `
  -ResourceGroupName "your-rg-name"

# Check BGP peer status
Get-AzVirtualNetworkGatewayBGPPeerStatus `
  -VirtualNetworkGateway $gateway

The output shows each BGP peer's State (should be Connected), Neighbor IP, LocalAddress, and Routes Received. If State shows Idle or Active (rather than Connected), BGP peering is failing.

The most common BGP misconfiguration: wrong ASN. Azure VPN Gateway uses its configured BGP ASN (find it under your gateway's Configuration blade). Your on-premises device's BGP configuration must list this exact ASN as the remote peer ASN. I've seen teams configure 65515 on Azure and 65000 on-premises, expecting Azure to be the "remote" side, they're setting up the peer relationship backwards.

Also verify the BGP peer IP addresses. For a standard VPN gateway, Azure's BGP peer IP is visible in the gateway's Configuration tab as the BGP peer IP address, it's always within the GatewaySubnet. Your on-premises device must be configured to peer with exactly this IP. For active-active gateways, there are two BGP peer IPs (one per gateway instance) and you must configure both peers on your on-premises device.

Check the advertised routes to confirm Azure is actually sending your VNet prefixes:

Get-AzVirtualNetworkGatewayAdvertisedRoute `
  -VirtualNetworkGateway $gateway `
  -Peer "your-on-premises-bgp-ip"

If this returns nothing, Azure isn't advertising routes to your peer, investigate whether your VNet address space is properly associated with the gateway and whether BGP is enabled on the connection object itself.

5
Test End-to-End Connectivity with Packet Capture

When the tunnel shows "Connected" in the portal but actual application traffic isn't flowing, you need packet-level visibility. Azure Network Watcher provides a built-in packet capture that runs directly on your gateway, no agent installation needed.

Start a packet capture on the gateway using the Azure Network Watcher VPN diagnostic:

$storageAccount = Get-AzStorageAccount `
  -ResourceGroupName "your-rg-name" `
  -Name "yourstorageaccount"

$sasUrl = New-AzStorageAccountSASToken `
  -Service Blob `
  -ResourceType Container,Object `
  -Permission "racw" `
  -ExpiryTime (Get-Date).AddHours(2) `
  -Context $storageAccount.Context

# Start VPN troubleshoot (captures packets and analyzes)
Start-AzVirtualNetworkGatewayConnectionVpnDeviceConfigScript `
  -ResourceGroupName "your-rg-name" `
  -Name "your-connection-name" `
  -DeviceVendor "Cisco" `
  -DeviceFamily "ASA" `
  -FirmwareVersion "8.4"

For direct packet capture, use Network Watcher's Packet capture feature on a VM in the VNet. Send a ping from an on-premises host to the Azure VM's private IP, and simultaneously run a packet capture filtered to that IP. If you see ICMP echo requests arriving at the Azure VM but no replies, the issue is the Azure VM's OS firewall (Windows Firewall, or a Linux iptables rule) blocking ICMP, not the VPN tunnel.

Use the IP flow verify tool in Network Watcher to quickly test whether Azure's Network Security Groups (NSGs) are blocking traffic: Network WatcherIP flow verify. Enter the VM, NIC, direction (Inbound), source IP (your on-premises host), destination port, and protocol. It'll tell you immediately if an NSG rule is dropping packets before they reach your application.

Also check your GatewaySubnet, it must not have an NSG attached. Applying an NSG to the GatewaySubnet is explicitly unsupported and will cause erratic VPN behavior. If you see an NSG on that subnet, remove it. This alone has unblocked dozens of troubled deployments I've worked on.

Advanced Troubleshooting

Point-to-Site VPN Not Connecting

Azure point-to-site VPN issues have their own distinct failure patterns. The most common: certificate authentication failures. When a P2S client gets Error 800 or Error 812, the root certificate stored in Azure doesn't match the certificate chain on the client machine.

In the Portal, go to your gateway → Point-to-site configurationRoot certificates. Verify the root certificate public data matches what's in your client's Trusted Root Certification Authorities store (certmgr.msc → Trusted Root Certification Authorities → Certificates). If you've recently rotated certs, you must re-download and reinstall the VPN client package, the package embeds the root cert thumbprint and doesn't update automatically.

For OpenVPN protocol connections (TCP port 443), check that your on-premises firewall isn't stripping or inspecting TLS traffic on that port. SSL inspection proxies frequently break OpenVPN handshakes.

Event Viewer for Windows RRAS Gateways

If your on-premises endpoint is a Windows Server running RRAS, Event Viewer is your best diagnostic source. Open Event ViewerApplications and Services LogsMicrosoftWindowsRasServerOperational. Key event IDs to look for:

  • Event ID 20227, The connection was terminated. Reason code in the event details tells you exactly why.
  • Event ID 20209, IKE authentication failed (usually shared key or certificate mismatch)
  • Event ID 20271, A connection attempt failed because of timeout, indicates Dead Peer Detection fired

Forced Tunneling and Split Tunneling Conflicts

If you've enabled forced tunneling (routing all internet traffic through on-premises via the VPN), you need a User Defined Route that explicitly sends Azure platform traffic (like 168.63.129.16/32, Azure's health probe and DNS IP) directly to the internet, not through the tunnel. Sending health probe traffic through a forced tunnel breaks VM extension operations and can trigger false VM health failures.

Checking Gateway Health in the Portal

Navigate to your gateway → Overview → look at the Gateway status field. If it shows Not Running or Updating, Azure is performing maintenance on your gateway instance. Do not attempt fixes during this window, wait for it to return to Running status. Azure maintenance events are typically 15–45 minutes. You'll also see the gateway status in Azure Service Health under resource health for your specific gateway resource ID.

When to Call Microsoft Support

Escalate to Microsoft Support when: the gateway status is stuck in "Not Running" for more than an hour after a reset; you see tunnel flapping (connecting and disconnecting repeatedly) with no configuration changes on your side; Log Analytics shows IKE negotiations succeeding but the connection object never transitions to "Connected"; or you're hitting throughput significantly below your SKU's advertised limit with no NSG or NVA in the path. Collect your diagnostic logs, the output of your BGP peer status commands, and the exact UTC timestamps of failure events before you call, it cuts resolution time dramatically.

Prevention & Best Practices

Fixing a broken Azure VPN Gateway tunnel is satisfying. Not breaking it in the first place is better. Here's what separates teams that have stable VPN connectivity from those who are constantly firefighting.

Use VpnGw2 SKU or higher for production workloads. The Basic SKU is a development/test SKU, it doesn't support BGP, zone redundancy, active-active mode, or custom IPsec policies. I've seen teams put production workloads on Basic SKUs and wonder why they keep hitting issues. The SKU upgrade requires gateway redeployment (expect 30–45 minutes of downtime), so plan for it now rather than during an incident.

Deploy active-active gateways with BGP for all production site-to-site connections. Active-active eliminates the single point of failure inherent in active-passive mode. Combined with BGP for dynamic route exchange, you get automatic failover between gateway instances without any manual intervention. This is the single biggest architectural improvement available to you.

Set up Azure Monitor alerts on your gateway. Create metric alerts for Tunnel Egress Bytes and Tunnel Ingress Bytes dropping to zero, that's your tunnel down alert. Also alert on BGP Peer Status going to 0. Configure these to fire into your operations team's channel immediately; don't wait for users to report connectivity problems.

Document your IPsec policy settings formally. Keep a shared document (or infrastructure-as-code comment block) with both sides of the tunnel's exact crypto settings. When someone changes the on-premises VPN device for a firmware upgrade and the tunnel drops at 2 AM, having that reference cuts the MTTR from hours to minutes.

Test IKE rekey behavior in a maintenance window before going live. Let the tunnel run for a full SA lifetime (default 3,600 seconds for Phase 2, 28,800 for Phase 1) and verify it renegotiates cleanly. On-premises devices that don't initiate rekey and instead wait for the remote side to do it can cause hour-long outages on a predictable schedule, which is one of the most confusing failure patterns to diagnose under pressure.

Quick Wins
  • Enable diagnostic logging on Day 1, you cannot retroactively pull logs from before the diagnostic setting was created
  • Store your VPN shared key in Azure Key Vault and reference it from your IaC templates, eliminates copy-paste key mismatches during deployments
  • Never apply NSGs to the GatewaySubnet, this is unsupported and causes intermittent, hard-to-diagnose failures
  • Pin your on-premises VPN device firmware version in change management, firmware updates silently change default crypto algorithm selections and break existing tunnels

Frequently Asked Questions

My Azure VPN Gateway shows "Connected" but I can't ping anything on-premises, what's wrong?

This almost always means the tunnel is up but routing isn't working. Start by checking the Local Network Gateway's address space, it needs to list every on-premises subnet you want to reach. Then run Get-AzEffectiveRouteTable on your Azure VM's NIC and confirm there's a VirtualNetworkGateway route for your on-premises subnets. Also check whether your on-premises VPN device has a route pointing Azure's VNet CIDR toward the tunnel interface, both sides need correct routing, not just Azure's side. Finally, check the on-premises firewall isn't blocking ICMP from Azure source IPs.

Why does my Azure VPN tunnel drop every 8 hours like clockwork?

Eight hours is 28,800 seconds, the default IKE Phase 1 (ISAKMP) SA lifetime. When this SA expires, both sides need to renegotiate. If one side (usually the on-premises device) doesn't initiate rekey and just waits, the SA expires without renewal and the tunnel drops. Fix this by ensuring your on-premises device's Phase 1 lifetime matches Azure's and that it's configured to initiate rekey before expiration (typically at 80% of the lifetime). You can also increase the Phase 1 lifetime in your Azure custom IPsec policy to reduce rekey frequency, valid values are 300 to 172,800 seconds.

How long does it take to reset an Azure VPN Gateway and will it affect other connections?

A gateway reset via Reset-AzVirtualNetworkGateway takes approximately 1–2 minutes to complete, but the tunnel may take an additional 3–5 minutes to re-establish after the gateway comes back. For active-passive gateways, there's a brief outage during the reset. For active-active gateways, Azure resets each instance one at a time, so if your on-premises device supports both tunnel endpoints, traffic can continue on the second instance during the reset, though this requires your on-premises device to be properly configured for active-active. All connections on the gateway are affected by the reset, not just the one you're troubleshooting.

Can I use the Azure VPN Gateway with a device that's behind NAT?

Yes, but your on-premises VPN device must support NAT-T (NAT Traversal) per RFC 3947. NAT-T encapsulates IPsec ESP packets inside UDP port 4500, which can traverse NAT. Make sure UDP ports 500 and 4500 are forwarded to your VPN device on the NAT router, and that IKEv2 with NAT-T is enabled on your device. Azure VPN Gateway automatically detects NAT and switches to NAT-T mode during IKE negotiation. Note that policy-based VPN gateways have very limited NAT-T support, route-based gateways handle this much better.

What's the difference between a route-based and policy-based Azure VPN Gateway and does it matter for troubleshooting?

It matters a lot. Policy-based gateways use static routing with specific traffic selectors, they only work with IKEv1, don't support BGP, active-active mode, or point-to-site VPN, and are limited to one site-to-site connection. If you're troubleshooting a policy-based gateway and seeing "No proposal chosen" errors, it's often because the traffic selector (the combination of local and remote subnet pairs) on Azure doesn't exactly match what your on-premises device is proposing. Route-based gateways use IKEv2 with a wildcard traffic selector (0.0.0.0/0 ↔ 0.0.0.0/0) and are far more flexible. For any new deployment, always use route-based.

Azure VPN Gateway diagnostic logs aren't showing up in Log Analytics, what should I check?

First, confirm that the diagnostic setting on the gateway is actually saved and lists the correct Log Analytics workspace, it's easy to navigate away before saving. Logs take 5–15 minutes to appear after the setting is created. Query the AzureDiagnostics table and filter by ResourceType == "VIRTUALNETWORKGATEWAYS", note that some categories like IKEDiagnosticLog only populate when there's active IKE negotiation happening, so trigger a connection attempt while checking. Also verify your Log Analytics workspace isn't in a different subscription from your gateway, which can cause silent ingestion failures if cross-subscription diagnostic settings aren't properly configured.

Related Microsoft Fix Guides

H
Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.