Azure VPN Gateway Not Working: Connectivity, Rules, and Routing Fixes
Why Azure VPN Gateway Not Working Is So Hard to Diagnose
I've seen this exact situation play out dozens of times: you set up an Azure VPN Gateway, everything looks green in the portal, and then your developers call saying they can't reach the on-premises SQL Server. Or worse, the tunnel was working fine for months, and it just stopped. No warning. No meaningful error in the portal. Just silence.
Azure VPN Gateway connectivity issues are frustrating precisely because the portal surface is deceiving. A gateway that shows "Connected" status can still be dropping packets, misrouting traffic, or silently failing on specific IP ranges. Microsoft's own documentation acknowledges this directly: VPN Gateway connections can fail for a wide variety of reasons, and the portal gives you almost none of the signal you need to narrow it down.
The root causes I see most often fall into five categories. First, misconfigured IKE/IPsec policies: your on-premises firewall negotiates one cipher suite, Azure expects another, and the tunnel never completes Phase 2. Second, BGP route exchange failures, where your on-premises routes aren't propagating correctly into Azure's route table. Third, gateway maintenance events causing brief disconnections that the tunnel never recovers from automatically. Fourth, point-to-site VPN problems where certificate trust chains break after a certificate rotation. And finally, plain-old throughput bottlenecks that look like connectivity problems but are actually bandwidth saturation on the gateway SKU.
What makes Azure VPN troubleshooting different from your typical on-premises firewall debug session is that you're working across a boundary you don't fully control. You own the on-premises side. Microsoft manages the Azure gateway infrastructure. That makes log correlation absolutely essential, which is exactly why Azure's diagnostic logging system exists, and why most admins aren't using it to its full potential.
The good news: once you know where to look, Azure gives you five distinct diagnostic log categories (GatewayDiagnosticLog, TunnelDiagnosticLog, RouteDiagnosticLog, IKEDiagnosticLog, and P2SDiagnosticLog), and each one tells a different part of the story. I'll walk you through exactly how to use each one, plus the configuration fixes that actually resolve the most common Azure VPN Gateway connection problems.
I know this is frustrating, especially when it's blocking production workloads or remote access for your entire team. Let's fix it.
The Quick Fix: Try This First
Before you spend an hour chasing logs, try a gateway reset. It sounds blunt, but it's actually the first thing Microsoft's own support team does, and it resolves a surprising number of Azure VPN Gateway not working scenarios, particularly those where the tunnel was healthy and then dropped unexpectedly.
Here's what to do:
- Go to the Azure portal and navigate to Virtual Network Gateways.
- Select your VPN gateway by name.
- In the left-hand blade, scroll down to Support + troubleshooting and click Reset.
- On the Reset page, leave the default option selected (reset the primary instance) and click Reset.
- Wait. This takes 5–15 minutes. The gateway will reboot the active instance. During this window, your tunnels will be down; that's expected and normal.
After the reset completes, check whether your site-to-site or point-to-site connection reconnects automatically. On the on-premises side, you may need to clear your VPN device's SA (Security Association) table and force a re-negotiation. On a Cisco ASA, that's clear crypto ipsec sa peer <Azure-public-IP>. On Palo Alto, you'd go to Network > IPSec Tunnels, select the tunnel, and click Restart.
Why does a reset help? When Azure performs maintenance on a gateway instance, or when a network glitch causes a DPD (Dead Peer Detection) timeout, the active gateway instance can end up in a state where it's technically running but not successfully processing tunnel traffic. The reset reboots the active instance, which forces a clean re-establishment of all tunnels. According to Microsoft's official documentation, this is expected behavior and is one of the recognized troubleshooting approaches.
If the tunnel comes back and stays stable, you're done, but I'd still recommend going through the diagnostic log steps below to understand what caused the original drop, so it doesn't happen again next week.
If the reset doesn't help, or if the tunnel comes back and drops again within minutes, you have a deeper configuration or policy issue. Keep reading.
You can't fix what you can't see. If you're running into Azure VPN Gateway connectivity issues and you don't have diagnostic logs enabled, you're essentially flying blind. This step is non-negotiable, and it takes about 3 minutes to set up.
First, you need a Log Analytics workspace. If you don't have one already:
- In the Azure portal, search for Log Analytics workspaces.
- Click + Create, choose your subscription and resource group, and give it a name like vpn-diagnostics-workspace.
- Select the same region as your VPN gateway; this matters for latency and data residency.
- Click Review + Create, then Create.
Now attach the workspace to your VPN gateway:
- Navigate to your Virtual Network Gateway in the portal.
- In the left blade, under Monitoring, click Diagnostic settings.
- Click + Add diagnostic setting.
- Give it a name. Then check all five log categories: GatewayDiagnosticLog, TunnelDiagnosticLog, RouteDiagnosticLog, IKEDiagnosticLog, and P2SDiagnosticLog.
- Under Destination details, check Send to Log Analytics workspace and select the workspace you just created.
- Click Save.
One important caveat: if you're using a policy-based VPN gateway (as opposed to route-based), only the GatewayDiagnosticLog and RouteDiagnosticLog will be available to you. The IKE and Tunnel logs are route-based gateway features.
New log data starts flowing within a few minutes of the first event. Historical data from before you enabled the setting is not available retroactively, which is another reason to set this up before you have a crisis, not during one.
When it's working correctly, you'll see data appearing in your workspace under Logs. Run AzureDiagnostics | take 10 in the query editor to confirm data is flowing.
The GatewayDiagnosticLog is your audit trail. Every time someone modifies the VPN gateway configuration, or Azure performs a maintenance update, an entry lands here. I've seen more than one outage traced back to a teammate who made a "quick change" to the Local Network Gateway address space at the worst possible time.
Open your Log Analytics workspace, click Logs, and run this query:
AzureDiagnostics
| where Category == "GatewayDiagnosticLog"
| project TimeGenerated, OperationName, Message, Resource, ResourceGroup
| sort by TimeGenerated asc
The columns you care about most:
- TimeGenerated: UTC timestamp of the event. Always convert to your local timezone when correlating with user-reported issues.
- OperationName: This tells you what actually happened. The key values to watch for are SetGatewayConfiguration (the gateway itself was modified), SetConnectionConfiguration (a connection or Local Network Gateway was changed), HostMaintenanceEvent (Azure did planned maintenance), and GatewayResourceMove (the gateway was moved, which is rare but disruptive).
- Message: The detailed result, whether the operation succeeded or failed, and any specifics.
Here's the diagnostic move that saves a ton of time: cross-reference your GatewayDiagnosticLog timestamps with your TunnelDiagnosticLog. If you see a SetGatewayConfiguration event at 14:32 UTC and a TunnelDisconnected event at 14:33 UTC, that's not a coincidence: the configuration change almost certainly caused the drop. This correlation is exactly what Microsoft's documentation recommends as the first step when investigating tunnel failures during change windows.
Keep in mind: there can be a few minutes of delay between when a change is made and when it appears in the log. If you're looking for a change that happened "just now," give it 5–10 minutes before concluding nothing was changed.
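The timestamp cross-reference described above can also be done offline once both result sets are exported. Here is a minimal Python sketch of that idea; the function name find_suspect_changes, the tuple layout, and the 10-minute window are assumptions chosen for illustration, not anything Azure provides.

```python
from datetime import datetime, timedelta


def find_suspect_changes(config_events, disconnect_times, window_minutes=10):
    """Pair each tunnel disconnect with config changes made shortly before it.

    config_events: list of (datetime, operation_name) from GatewayDiagnosticLog
    disconnect_times: list of datetime values from TunnelDiagnosticLog
    Returns a list of (disconnect_time, change_time, operation_name) tuples.
    """
    window = timedelta(minutes=window_minutes)
    suspects = []
    for drop in disconnect_times:
        for changed_at, operation in config_events:
            # Only a change made BEFORE the drop, within the window, is a plausible cause.
            if timedelta(0) <= drop - changed_at <= window:
                suspects.append((drop, changed_at, operation))
    return suspects


# Illustrative data matching the 14:32 / 14:33 example in the text.
changes = [(datetime(2026, 4, 20, 14, 32), "SetGatewayConfiguration")]
drops = [datetime(2026, 4, 20, 14, 33), datetime(2026, 4, 20, 18, 5)]
print(find_suspect_changes(changes, drops))
```

The window is deliberately generous because of the logging delay mentioned above; tighten it if it produces too many coincidental matches.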
If the GatewayDiagnosticLog shows no configuration changes around the time of the outage, move to the TunnelDiagnosticLog; the issue is likely infrastructure-side or IPsec-related.
The TunnelDiagnosticLog is the most useful table for understanding Azure VPN tunnel disconnection problems. It's lightweight, meaning you can query across days or even weeks of data quickly, and it tells you exactly when tunnels went up and down, from which gateway instance, and to which remote IP.
Run this query to see your tunnel history:
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| project TimeGenerated, OperationName, remoteIP_s, instance_s, Resource, ResourceGroup
| sort by TimeGenerated asc
If you have multiple tunnels (multiple on-premises sites), filter by the specific remote IP to reduce noise:
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where remoteIP_s == "203.0.113.45"
| project TimeGenerated, OperationName, remoteIP_s, instance_s, Resource, ResourceGroup
| sort by TimeGenerated asc
Here's how to interpret what you see in the instance_s column. Your VPN gateway has two instances: GatewayTenantWorker_IN_0 and GatewayTenantWorker_IN_1. These are the two redundant gateway instances that Azure runs for you automatically.
Three patterns to recognize:
- Disconnect on IN_0, reconnect on IN_1 within a few seconds: Gateway failover. Azure was doing maintenance on one instance and traffic moved to the other. This is normal and expected, but if your on-premises VPN device takes too long to detect the failover and reconnect, you get an extended outage. The fix here is tuning your DPD settings on the on-premises device.
- Disconnect and reconnect on the same instance within a few seconds: Network glitch or a DPD timeout triggered by the on-premises device. Check whether the on-premises device sent a delete notification or if the tunnel just timed out.
- Disconnect with no reconnection: The tunnel came down and never came back up. This points to an IKE negotiation failure on reconnect, so move to the IKEDiagnosticLog next.
When you find the timestamps of disconnection events here, note them down. You'll use them to query the IKEDiagnosticLog at exactly those moments for the detailed root cause.
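If you export the tunnel events, the three patterns above can be labeled mechanically. The following Python sketch is one way to do it; the function name classify_drop, the 30-second cutoff for "a few seconds", the extra "slow reconnect" label, and the assumption that reconnects are logged with OperationName TunnelConnected are all illustrative choices to check against your own data.

```python
from datetime import datetime, timedelta


def classify_drop(events, quick=timedelta(seconds=30)):
    """Classify the first disconnect in a time-ordered list of tunnel events.

    events: list of (datetime, operation_name, instance) tuples.
    Returns one of: "failover", "same-instance blip", "slow reconnect",
    "no reconnect", or "no drop".
    """
    for i, (when, operation, instance) in enumerate(events):
        if operation != "TunnelDisconnected":
            continue
        # Find the first connect event that follows this disconnect.
        for later_when, later_op, later_instance in events[i + 1:]:
            if later_op == "TunnelConnected":
                if later_when - when > quick:
                    return "slow reconnect"
                if later_instance != instance:
                    return "failover"
                return "same-instance blip"
        return "no reconnect"
    return "no drop"


t0 = datetime(2026, 4, 20, 14, 33, 0)
sample = [
    (t0, "TunnelDisconnected", "GatewayTenantWorker_IN_0"),
    (t0 + timedelta(seconds=5), "TunnelConnected", "GatewayTenantWorker_IN_1"),
]
print(classify_drop(sample))
```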
This is where the real diagnostic horsepower lives. The IKEDiagnosticLog captures detailed IPsec logging: Phase 1 and Phase 2 IKE negotiations, proposed and accepted cipher suites, SA lifetimes, and failure reasons. If your Azure VPN Gateway tunnel keeps dropping or won't connect at all, this log will tell you exactly why.
Query the IKE log around the timestamp you found in TunnelDiagnosticLog:
AzureDiagnostics
| where Category == "IKEDiagnosticLog"
| where TimeGenerated between (datetime("2026-04-20T14:30:00Z") .. datetime("2026-04-20T14:45:00Z"))
| project TimeGenerated, OperationName, Message, Resource
| sort by TimeGenerated asc
The most common failure messages you'll encounter and what they mean:
- "No proposal chosen": This is the #1 cause of persistent Azure site-to-site VPN connection failures. Your on-premises device and Azure are proposing incompatible encryption algorithms. Azure's default IKEv2 policy accepts a fixed set of proposals, which includes AES-256 with SHA-256 and DH Group 2 for Phase 1, and AES-256 with SHA-256 for Phase 2. If your firewall proposes something outside that set, say MD5 integrity or a DH group the default policy doesn't offer, the negotiation fails here, every time.
- "Authentication failed": In certificate-based setups, this usually means a cert mismatch. In PSK (pre-shared key) setups, it means the keys on both sides don't match. Go back to your VPN gateway in Azure and verify the shared key under Connections > [your connection] > Shared key, then compare it byte-for-byte with what's configured on the on-premises device.
- "IKE SA deleted" followed quickly by a new negotiation attempt: This is normal rekey behavior. If you see it happening every few minutes, it suggests an SA lifetime mismatch. Azure defaults to 28,800 seconds for Phase 1 and 27,000 seconds for Phase 2. If your on-premises device is set to a much shorter value, you'll see constant renegotiation.
To fix an IKE policy mismatch, navigate in the portal to your VPN connection, then under Settings find Configuration. You can enable a custom IPsec/IKE policy here and specify exact values to match your on-premises device. This is far more reliable than trying to get your firewall vendor to match Azure's defaults.
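Before changing anything, it helps to write both sides' settings down and compare them field by field. A minimal Python sketch of that comparison follows; the field names and every value in the two dictionaries are hypothetical placeholders, so substitute what your connection's custom policy and your firewall actually show.

```python
def diff_ike_policy(azure_policy, onprem_policy):
    """Return {field: (azure_value, onprem_value)} for every field that differs."""
    fields = sorted(set(azure_policy) | set(onprem_policy))
    return {
        name: (azure_policy.get(name), onprem_policy.get(name))
        for name in fields
        if azure_policy.get(name) != onprem_policy.get(name)
    }


# Hypothetical values for illustration only.
azure = {
    "ike_encryption": "AES256",
    "ike_integrity": "SHA256",
    "dh_group": "DHGroup14",
    "sa_lifetime_s": 27000,
}
onprem = {
    "ike_encryption": "AES256",
    "ike_integrity": "SHA1",
    "dh_group": "DHGroup14",
    "sa_lifetime_s": 3600,
}

for field, (azure_value, onprem_value) in diff_ike_policy(azure, onprem).items():
    print(f"{field}: Azure={azure_value} on-prem={onprem_value}")
```

An empty result means the two records agree; any field it lists is a candidate for the "no proposal chosen" or constant-rekey symptoms described above.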
After any IKE policy change, reset the VPN connection (not the whole gateway) by going to the connection and clicking Reset in the connection overview blade. Give it 2–3 minutes to re-establish.
If your tunnel is up (you can confirm this in TunnelDiagnosticLog) but traffic still isn't flowing between specific subnets, the problem is almost always routing. Either Azure isn't learning your on-premises routes, or your on-premises device isn't learning Azure's address space, or there's a route conflict somewhere in the middle.
The RouteDiagnosticLog traces all routing activity: static route additions and removals, BGP session events, and route updates. Run this query:
AzureDiagnostics
| where Category == "RouteDiagnosticLog"
| project TimeGenerated, OperationName, Message, Resource, ResourceGroup
| sort by TimeGenerated asc
The OperationName values tell you what's happening:
- BgpConnectedEvent / BgpDisconnectedEvent: Your BGP peer session is coming up and going down. Frequent flapping here points to a BGP timer mismatch or an unreachable BGP peer IP.
- BgpRouteUpdate: Routes are being advertised or withdrawn. Check the Message field for which prefixes are being exchanged. If your on-premises 192.168.10.0/24 isn't showing up here, Azure won't know how to route to it.
- StaticRouteUpdate: A static route was added or removed on the gateway. If you're using BGP, unexpected StaticRouteUpdate events can indicate someone manually added a static route that's conflicting with your BGP-learned routes.
Three routing fixes that resolve the majority of Azure VPN routing problems:
- Verify the Local Network Gateway address space: Navigate to Local Network Gateways, select yours, and check Address space. Every on-premises subnet you want to reach must be listed here. Missing a subnet? Add it and save. The tunnel will briefly renegotiate, then traffic to that subnet will start flowing.
- Check for overlapping address spaces: Azure Virtual Networks cannot overlap with each other or with your on-premises ranges. If your Azure VNet uses 10.1.0.0/16 and your on-premises network also uses 10.1.0.0/16, routing will be completely broken and Azure won't tell you why in any obvious error. Use the Effective Routes blade on a VM's network interface to see what routes Azure is actually using.
- BGP peer IP reachability: If you're running BGP (which I strongly recommend over static routes for any production Azure VPN setup), the BGP peer IPs on both sides must be reachable through the tunnel. Azure's BGP peer IP for a VPN gateway is visible in the gateway's Configuration blade under BGP settings. Make sure your on-premises BGP session is targeting that IP, not the gateway's public IP.
After any Local Network Gateway change or BGP configuration adjustment, allow 5–10 minutes for routes to propagate before testing again. Use the Effective Routes feature on a test VM in Azure to confirm the on-premises routes are visible before you do any ping testing.
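Address-space overlaps can be checked offline before you touch the portal. This is a minimal Python sketch using the standard library's ipaddress module; the function name find_overlaps and the prefix lists are illustrative, so feed it your real VNet and Local Network Gateway prefixes.

```python
import ipaddress


def find_overlaps(azure_prefixes, onprem_prefixes):
    """Return every (azure_prefix, onprem_prefix) pair whose ranges overlap."""
    pairs = []
    for azure_prefix in azure_prefixes:
        for onprem_prefix in onprem_prefixes:
            azure_net = ipaddress.ip_network(azure_prefix)
            onprem_net = ipaddress.ip_network(onprem_prefix)
            # overlaps() is true for identical ranges and for one range nested in the other.
            if azure_net.overlaps(onprem_net):
                pairs.append((azure_prefix, onprem_prefix))
    return pairs


vnet = ["10.1.0.0/16"]
onprem = ["10.1.0.0/16", "192.168.10.0/24"]
print(find_overlaps(vnet, onprem))
```

Note that a smaller on-premises subnet nested inside the VNet range (for example a /24 inside the /16) is reported too, which is the case that is easiest to miss by eye.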
Advanced Troubleshooting for Azure VPN Gateway
If the five steps above didn't get your Azure VPN Gateway working, you're dealing with something less common, but these scenarios do come up, especially in enterprise and domain-joined environments.
Validating VPN Throughput
Sometimes what looks like an Azure VPN Gateway not working problem is really a throughput issue. Users report "the VPN is broken" but what they mean is it's painfully slow. Azure explicitly supports throughput validation, and you should run it before declaring the gateway itself faulty.
The recommended tool is iPerf3. Install it on both an Azure VM and an on-premises machine, then run:
# On the Azure VM (server mode)
iperf3 -s
# On the on-premises machine (client mode)
iperf3 -c <azure-vm-private-ip> -t 30 -P 8
Compare the result against your gateway SKU's advertised throughput. A VpnGw1 supports up to 650 Mbps aggregate. If you're pushing close to that ceiling, you'll need to upgrade to VpnGw2 (1 Gbps) or higher. Upgrades can be done in place from the gateway's Configuration blade without recreating the gateway or its connections, though there's a brief interruption during the SKU change.
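To make "close to that ceiling" concrete, you can turn the iPerf3 result into a utilization figure. A small Python sketch follows, using only the two SKU figures quoted above; the 80% threshold and the function names are assumptions, not Microsoft guidance.

```python
# Aggregate throughput figures quoted in this article, in Mbps.
SKU_CEILING_MBPS = {"VpnGw1": 650, "VpnGw2": 1000}


def utilization(measured_mbps, sku):
    """Fraction of the SKU's advertised aggregate throughput that was measured."""
    return measured_mbps / SKU_CEILING_MBPS[sku]


def near_ceiling(measured_mbps, sku, threshold=0.8):
    """True when the measured rate is at or above the chosen fraction of the ceiling."""
    return utilization(measured_mbps, sku) >= threshold


measured = 600  # example: summed rate reported by the iperf3 client, in Mbps
print(f"{utilization(measured, 'VpnGw1'):.0%} of VpnGw1, near ceiling: {near_ceiling(measured, 'VpnGw1')}")
```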
Third-Party VPN Device Compatibility
If you're using a VPN device from Cisco, Palo Alto, Fortinet, Check Point, or another vendor, compatibility quirks account for a significant portion of Azure VPN Gateway connectivity issues. Microsoft maintains a list of validated VPN devices; if yours isn't on that list, technical support must come from the device vendor, not Microsoft. For validated devices, Microsoft provides exact configuration templates. Find them by navigating to your Virtual Network Gateway in the portal, then going to Connections > Download VPN device script. Select your device vendor and model to get a pre-built configuration file. This alone has resolved mismatches for me in cases where manual configuration had small errors.
Point-to-Site Certificate Troubleshooting
P2SDiagnosticLog is your friend for Azure point-to-site VPN problems. Run this:
AzureDiagnostics
| where Category == "P2SDiagnosticLog"
| project TimeGenerated, OperationName, Message, Resource
| sort by TimeGenerated desc
| take 50
The most common point-to-site failure is a certificate that's been revoked or whose root certificate has been removed from the gateway. In the portal, go to your VPN gateway, then Point-to-site configuration, and check the Root certificates section. If the root CA cert used to sign your client certs isn't listed there, upload it again. It must be in Base64-encoded X.509 format. After uploading, clients will need to re-download the VPN client package; the existing one won't pick up the change automatically.
Event Viewer on Windows VPN Clients
For point-to-site clients running Windows, don't ignore the local Event Viewer. Navigate to Event Viewer > Applications and Services Logs > Microsoft > Windows > RasClient > Operational. Event ID 20227 means the connection attempt failed; the Message field there tells you whether it's a credential issue, a certificate error, or a network-level block. Event ID 20271 is a successful connection. Seeing 20227 repeatedly with error code 789 specifically means an L2TP/IPsec configuration mismatch, so check whether the gateway and client are both configured for the same VPN protocol (IKEv2 vs SSTP vs OpenVPN).
If you've gone through all the diagnostic logs, confirmed the configuration is correct on both sides, and the gateway is still not establishing tunnels (especially if GatewayDiagnosticLog shows HostMaintenanceEvent entries or GatewayTenantPrimaryChanged events happening repeatedly), you may be dealing with an infrastructure-side issue on Azure's end. Open a support ticket at Microsoft Support. When you do, have your Log Analytics workspace ready and share the workspace ID; Microsoft support can query it directly, which dramatically speeds up their investigation. Initial response targets for Severity A tickets (production down) depend on your support plan, so check yours before you need it.
Prevention & Best Practices for Azure VPN Gateway
After you've fixed an Azure VPN Gateway not working situation, the next thing you want to do is make sure it doesn't happen again, or at minimum, make sure you catch it faster next time. These are the practices I'd put in place on every production Azure VPN deployment.
Set Up Alerts on Tunnel Disconnect Events
Don't wait for users to tell you the VPN is down. In Azure Monitor, create an alert rule on your Log Analytics workspace that fires whenever a TunnelDisconnected event appears in the TunnelDiagnosticLog. Navigate to Monitor > Alerts > + Create > Alert rule, set the signal type to Custom log search, and paste this condition query:
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where OperationName == "TunnelDisconnected"
Set the threshold to 1 event, evaluation frequency to 5 minutes, and route the action to an email or PagerDuty webhook. Now you'll know about tunnel drops before your users do.
Use BGP Instead of Static Routes
Static routes are fine for simple, single-site setups. But the moment you add a second on-premises location, a second ExpressRoute, or start changing on-premises subnets, static routes become a maintenance nightmare. BGP automatically propagates route changes and handles failover between redundant gateways far more gracefully. Plan for BGP when you create the gateway: it requires a route-based gateway on a SKU that supports BGP, and changing the gateway type later means recreating the gateway.
Document Your IKE Policy Settings
The single most common cause of Azure VPN Gateway connectivity issues I see in enterprise environments is an undocumented IKE policy change on the on-premises firewall. Someone upgrades the firmware, the defaults change, and suddenly Azure stops understanding the Phase 1 proposals. Keep a current record of your exact IKE Phase 1 and Phase 2 policy settings (algorithm, key length, DH group, SA lifetime) for both sides. Review it whenever you do a firmware upgrade on your VPN device.
Test Failover Behavior Regularly
Azure VPN Gateways have two instances for redundancy, but that redundancy only helps if your on-premises device handles the failover correctly. Once a quarter, deliberately trigger a gateway reset (in non-production hours) and time how long it takes for the tunnel to re-establish. If it takes more than 60 seconds, your DPD settings or BGP timers need tuning. Knowing your actual failover time in advance means no surprises during a real maintenance window.
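Timing the drill is simple arithmetic on the disconnect and reconnect timestamps from TunnelDiagnosticLog. A minimal Python sketch, with hypothetical function names and example timestamps, and the 60-second budget taken from the paragraph above:

```python
from datetime import datetime


def reconnect_seconds(disconnected_at, connected_at):
    """Seconds between the tunnel dropping and coming back."""
    return (connected_at - disconnected_at).total_seconds()


def needs_tuning(disconnected_at, connected_at, budget_seconds=60):
    """True when re-establishment took longer than the failover budget."""
    return reconnect_seconds(disconnected_at, connected_at) > budget_seconds


# Example timestamps from a hypothetical quarterly drill.
down = datetime(2026, 1, 10, 2, 0, 0)
up = datetime(2026, 1, 10, 2, 1, 35)
print(reconnect_seconds(down, up), needs_tuning(down, up))
```

Recording the result of each drill gives you a baseline, so a regression after a firmware upgrade shows up as a number rather than a surprise.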
- Enable all five diagnostic log categories on every VPN gateway, not just when you have a problem
- Create a Monitor alert on TunnelDisconnected events so you're notified within 5 minutes of any tunnel drop
- Upgrade to at least the VpnGw2 SKU for any production workload where more than 10 concurrent users access the tunnel
- Use Azure's Download VPN device script feature to get vendor-specific configuration templates, and stop hand-writing IKE configs
Frequently Asked Questions
My Azure VPN Gateway says "Connected" but I still can't ping anything on-premises. What's going on?
A "Connected" status in the portal only means the IKE Phase 1 and Phase 2 negotiations succeeded: the tunnel is up at the crypto layer. It doesn't mean traffic is actually being routed correctly through it. The most likely culprits are: (1) the destination subnet isn't listed in your Local Network Gateway's address space, so Azure doesn't know to send that traffic into the tunnel; (2) there's a Network Security Group (NSG) on the Azure VM's subnet or NIC that's blocking ICMP; or (3) the on-premises firewall is blocking return traffic from Azure's address space. Start with the Effective Routes blade on the Azure VM's NIC; if the on-premises route doesn't appear there, the Local Network Gateway address space is the problem.
How often should my Azure VPN tunnel disconnect for renegotiation? Is it normal for it to drop every few hours?
Some disconnections are normal and expected. IPsec SAs have lifetimes, and when they expire, the tunnel renegotiates, which causes a very brief (sub-second) interruption that most applications never notice. Azure's default SA lifetime is 28,800 seconds (8 hours) for Phase 1 and 27,000 seconds (7.5 hours) for Phase 2. If your tunnel is disconnecting more frequently than that, or if it's staying down for minutes rather than seconds during renegotiation, something is wrong. Check IKEDiagnosticLog around the disconnect times; you'll usually see a "no proposal chosen" or authentication failure that explains why the renegotiation is failing and forcing a longer outage.
I'm seeing GatewayTenantPrimaryChanged in my GatewayDiagnosticLog. Should I be worried?
Not necessarily. This event means Azure promoted the secondary gateway instance to primary, which is part of normal failover behavior during maintenance. It's the same infrastructure event that causes the "disconnect on IN_0, reconnect on IN_1" pattern you'll see in TunnelDiagnosticLog. Where it becomes a concern is if you see this event happening frequently (multiple times per week) without corresponding HostMaintenanceEvent entries. Frequent, unexplained primary changes can indicate an underlying infrastructure issue in your Azure region, and that's a case worth raising with Microsoft Support.
Can I use the same Azure VPN Gateway for both site-to-site and point-to-site connections at the same time?
Yes, absolutely, Azure VPN Gateways support both connection types simultaneously on the same gateway. The gateway handles each connection type independently, so a point-to-site client certificate issue won't affect your site-to-site tunnel and vice versa. The one constraint to watch is total bandwidth: all active connections share the gateway SKU's aggregate throughput limit. If you have heavy site-to-site traffic and many concurrent point-to-site users, you may hit the ceiling on a VpnGw1 (650 Mbps aggregate). Use the iPerf3 throughput test to measure your actual utilization before deciding whether to upgrade.
After I reset my VPN Gateway, how long does it take for the tunnel to reconnect?
The gateway reset itself takes roughly 5–15 minutes depending on the SKU. Once the gateway comes back online, the tunnel reconnection depends on which side initiates; typically your on-premises VPN device should detect the gateway is back and initiate a new IKE exchange automatically. This should complete within 60–90 seconds under normal conditions. If the tunnel doesn't reconnect within 5 minutes of the gateway completing its reset, manually clear the IKE/IPsec SAs on your on-premises device and force a re-initiation. If it still won't connect, the IKEDiagnosticLog will show you exactly what's failing in the new negotiation attempt.
My RouteDiagnosticLog shows BgpDisconnectedEvent every day at the same time. What would cause that?
A BGP session dropping at a consistent time every day is almost always caused by either a scheduled maintenance task on the on-premises side (a firewall policy push, a routing daemon restart, or a daily backup that temporarily spikes CPU on the VPN device), or a BGP keepalive timeout due to a brief latency spike at that time. Check whether the on-premises device has any scheduled jobs around that time. Also look at the BGP hold timer settings: if your on-premises device is configured with a hold timer of 90 seconds and the link has 1–2 second latency spikes, three missed keepalives will drop the session. Azure's default BGP timer is 60 seconds for keepalive and 180 seconds for hold time; match these on your on-premises device to improve stability.
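The hold-timer arithmetic is easy to sanity-check. BGP peers settle on the smaller of the two advertised hold times, and a session drops when no keepalive arrives within that period. A short Python sketch of the calculation follows; the function names are hypothetical, and the 30-second keepalive is an assumed value (one third of the 90-second hold timer from the example above).

```python
def negotiated_hold_time(local_hold_s, peer_hold_s):
    """BGP peers use the smaller of the two advertised hold times."""
    return min(local_hold_s, peer_hold_s)


def keepalives_per_hold(hold_s, keepalive_s):
    """Roughly how many consecutive keepalives can be lost before the hold timer expires."""
    return hold_s // keepalive_s


# On-premises device advertising 90 s, peer advertising 180 s.
hold = negotiated_hold_time(90, 180)
print(hold, keepalives_per_hold(hold, 30))
```

A shorter hold time on either side therefore governs the whole session, which is why matching timers on both ends matters.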