How to Fix Azure VPN Gateway Issues (2026)

Updated April 20, 2026

What's in This Guide

Why This Happens
The Quick Fix
Step-by-Step Solution
Advanced Troubleshooting
Prevention & Best Practices
FAQ

Why Your Azure VPN Gateway Is Failing

I've seen this situation play out dozens of times. An engineer spends two hours configuring an Azure VPN Gateway, hits "Deploy," waits 30–45 minutes for the provisioning to complete, and then the tunnel just won't come up. The Azure portal shows a green checkmark on the gateway itself, but the connection status reads "Unknown" or "Not Connected." No helpful error. No obvious next step. Just silence.

That's the thing about Azure VPN Gateway problems: they rarely give you a clean error message. You're left piecing together clues from the connection diagnostics, the on-premises VPN device logs, and your gateway configuration. It's genuinely frustrating, especially when this is blocking a production migration or a hybrid connectivity rollout your team has been planning for weeks.

So what actually goes wrong? The most common root causes I see in enterprise environments fall into a handful of categories:

Wrong SKU for the workload. This is the silent killer. Someone picks the Basic SKU because it sounds fine for a "small" deployment, not realizing it caps you at 10 Site-to-Site tunnels, doesn't support BGP, and has no zone redundancy at all. Later, when the business needs more tunnels or higher availability, you're stuck doing a full gateway replacement, which means downtime. The official Azure docs are explicit about this: the Basic SKU is dev-test only, not production.

Mismatched IKE/IPsec parameters. Azure VPN Gateway supports SSTP, IKEv2, and OpenVPN for Point-to-Site, and a range of IKE Phase 1 and Phase 2 policies for Site-to-Site. If your on-premises firewall sends a proposal Azure doesn't accept (a different encryption algorithm, a different Diffie-Hellman group, or a different lifetime), the tunnel negotiation fails silently. Phase 1 might complete and Phase 2 dies, or vice versa.

Address space overlap. Azure flat-out refuses to route traffic if your on-premises network and your Azure VNet share overlapping CIDR ranges. This is a configuration error that the portal sometimes warns about and sometimes doesn't, depending on where in the workflow you hit it.

Generation confusion. Azure VPN Gateway now has two generations: Generation 1 and Generation 2. They're not interchangeable. A Generation 2 VpnGw2 has meaningfully different throughput characteristics than a Generation 1 VpnGw2 (1.25 Gbps vs. 1 Gbps aggregate), and some features only exist on Generation 2. Picking the wrong generation can leave you undersized with no clear upgrade path.

Availability zone deployment gaps. Organizations that need zone-redundant gateways sometimes deploy standard (non-AZ) SKUs without realizing the distinction. A VpnGw3 and a VpnGw3AZ have the same throughput on paper, but only the AZ variant protects you from a datacenter-level failure within the region.

The good news: almost every one of these issues is fixable. Let's work through them.

The Quick Fix: Try This First

Before diving into deep configuration analysis, run through this sequence. It resolves roughly 60% of Azure VPN Gateway connection failures I see, and it takes under ten minutes.

Open the Azure portal and navigate to your Virtual Network Gateway resource. In the left menu under Monitoring, click Connection troubleshoot. Select the connection that's failing, pick a destination IP on your on-premises network, and click Check. Azure will run a diagnostic sequence and tell you exactly where in the handshake things are breaking down. This single tool has saved me hours of log-diving.

If the troubleshooter reports an IKE policy mismatch, go to your connection resource (the connection object, not the gateway), open Settings > Configuration, and check whether you're using a default or custom IPsec/IKE policy. If it's custom, compare every field against what your on-premises device is advertising. The most common mismatches I find are the DH Group (Azure defaults to DHGroup2 on Gen1, while DHGroup14 or higher is recommended on Gen2) and PFS Group settings that don't align.

If the troubleshooter reports the gateway is unreachable entirely, check your Gateway Subnet. Your VNet must have a subnet named exactly GatewaySubnet, with no variations and no differences in capitalization. It needs to be at minimum a /29, but Microsoft strongly recommends /27 or larger. If you deployed a /29 and you're now trying to add a second gateway instance for active-active, you'll run out of IPs and the deployment will fail with a cryptic error about insufficient subnet space.

Finally, confirm your Local Network Gateway has the correct on-premises public IP address and the correct address space for your on-premises network. This sounds basic, but I've seen it trip up experienced Azure architects. Any drift between what Azure expects and what's actually being sent from your VPN device will kill the tunnel.
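If you'd rather check these values from a terminal than click through the portal, the following is a minimal sketch using the Azure CLI. The two function names are my own (hypothetical helpers, not part of the CLI), and the arguments are whatever your connection, Local Network Gateway, and resource group are called.

```shell
# Sketch: print what Azure currently holds so you can compare it with the
# on-premises device. Function names are illustrative, not Azure CLI commands.

# Usage: show_connection_status CONNECTION_NAME RESOURCE_GROUP
show_connection_status() {
  az network vpn-connection show \
    --name "$1" \
    --resource-group "$2" \
    --query "{status:connectionStatus, ingressBytes:ingressBytesTransferred, egressBytes:egressBytesTransferred}" \
    --output table
}

# Usage: show_local_gateway LOCAL_GATEWAY_NAME RESOURCE_GROUP
show_local_gateway() {
  az network local-gateway show \
    --name "$1" \
    --resource-group "$2" \
    --query "{publicIp:gatewayIpAddress, prefixes:localNetworkAddressSpace.addressPrefixes}" \
    --output json
}
```

Compare the publicIp and prefixes values against the WAN address and LAN subnets configured on your VPN device; any difference is worth fixing before you dig deeper.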

Pro Tip

The Connection Troubleshoot tool in the Azure portal captures a 5-minute packet trace on the gateway side automatically. You can download these captures under Monitoring > Network Watcher and open them in Wireshark to see the exact IKE exchange, including which side is rejecting the proposal and why. Most engineers don't know this feature exists and spend hours staring at on-premises firewall logs instead.

Verify Your Gateway SKU Matches Your Requirements

This is the step most guides skip, and it's the one that bites you hardest later. Open the Azure portal, go to your Virtual Network Gateway, and note the SKU shown under Overview. Now compare it against what you actually need.

Here's how to think about it. If you're on the Basic SKU, stop. That SKU supports a maximum of 10 Site-to-Site tunnels, 128 Point-to-Site SSTP connections only (no IKEv2, no OpenVPN), and delivers roughly 100 Mbps aggregate throughput. It doesn't support BGP. It's not zone-redundant. Microsoft positions it explicitly as a dev-test SKU. If this is a production environment, you need to replace it.

For most production workloads, you want at minimum a Generation 2 VpnGw2. That gives you 30 S2S tunnels, up to 500 Point-to-Site connections (IKEv2 and OpenVPN both supported), 1.25 Gbps aggregate throughput, and BGP support. If you need zone redundancy, meaning your gateway survives a full availability zone failure, add the AZ suffix: VpnGw2AZ.

Need more than 100 S2S VPN tunnels? Per the official documentation, you should not be using VPN Gateway at all at that scale; the right answer is Azure Virtual WAN, which is architected for that many connections.

To resize a gateway SKU (within the same generation), go to Virtual Network Gateway > Configuration > SKU and select the new size. This triggers a redeployment and will cause a 20–30 minute interruption. Plan maintenance windows accordingly. Note: you cannot resize from Generation 1 to Generation 2; that requires deleting and recreating the gateway entirely.

# Check current gateway SKU via Azure CLI
az network vnet-gateway show \
  --name <YourGatewayName> \
  --resource-group <YourRG> \
  --query "sku" \
  --output table

If the command returns Basic, start planning your migration to a production SKU now.
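The same-generation resize described above can also be done from the CLI. This is a sketch only: resize_gateway is a hypothetical wrapper name, and the target SKU you pass must be one your gateway's generation actually supports.

```shell
# Sketch: resize a gateway to another SKU in the same generation.
# Expect the same interruption as a portal-driven resize.

# Usage: resize_gateway GATEWAY_NAME RESOURCE_GROUP NEW_SKU
resize_gateway() {
  az network vnet-gateway update \
    --name "$1" \
    --resource-group "$2" \
    --sku "$3"
}
```

For example, resize_gateway my-gateway my-rg VpnGw3 (both names are placeholders) would request a move to VpnGw3.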

Audit Your GatewaySubnet Configuration

The GatewaySubnet is where Azure deploys the actual gateway VMs, and it has stricter rules than any other subnet in your VNet. Getting this wrong causes deployment failures that present as generic "resource creation failed" errors with no useful detail.

In the Azure portal, go to your Virtual Network > Subnets and find the GatewaySubnet entry. Confirm two things: the name is exactly GatewaySubnet (case-sensitive, no spaces, no modifications), and the address range is appropriately sized.

A /29 gives you 8 IP addresses, of which Azure reserves 5, leaving you 3 usable. That's technically the minimum, but it's genuinely too small for anything but a basic single-gateway setup. Microsoft's recommendation, and mine, is /27 or larger. Here's why: active-active gateways require two gateway VM instances, each consuming an IP. Future features and scaling operations also consume IPs. A /27 gives you 32 addresses with 27 usable and leaves you room to grow without a destructive subnet resize.
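To make the arithmetic concrete, here is a small sketch that computes usable addresses for any prefix length. It assumes only what the paragraph above states (Azure reserves 5 addresses per subnet); usable_ips is an illustrative function name.

```shell
# Sketch: usable addresses in a subnet after the 5 that Azure reserves.

# Usage: usable_ips PREFIX_LENGTH
usable_ips() {
  total=$(( 1 << (32 - $1) ))   # total addresses in the subnet
  echo $(( total - 5 ))         # minus the 5 reserved addresses
}

usable_ips 29   # prints 3
usable_ips 27   # prints 27
```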

You also must not attach a Network Security Group (NSG) to the GatewaySubnet. I know that sounds counterintuitive from a security standpoint, but it's a hard platform requirement: an NSG here will break gateway connectivity silently, in ways that are very hard to diagnose.

# Verify GatewaySubnet address prefix via CLI
az network vnet subnet show \
  --vnet-name <YourVNetName> \
  --resource-group <YourRG> \
  --name GatewaySubnet \
  --query "{name:name, prefix:addressPrefix, nsg:networkSecurityGroup}" \
  --output json

If nsg returns anything other than null, remove it immediately. Go to Subnets > GatewaySubnet > Network security group, set it to None, and save. Then reset your gateway connection and retest.

Fix IKE and IPsec Policy Mismatches

This is the most technically involved step, and it's the one where engineers spend the most time. IKE policy mismatches cause the tunnel to fail during Phase 1 or Phase 2 negotiation, and the Azure portal gives you almost nothing to go on beyond "Connection failed."

Navigate to your Connection resource (not the gateway; find it under Connections in the portal). Go to Settings > Configuration. Under IPsec/IKE policy, switch from Default to Custom so you can explicitly control the parameters.

For Generation 2 gateways, Microsoft recommends these IKE Phase 1 and Phase 2 settings as a starting baseline for strong security with broad device compatibility:

IKE Phase 1 (Main Mode):
  Encryption:     AES256
  Integrity:      SHA256
  DH Group:       DHGroup14

IKE Phase 2 (Quick Mode / IPsec):
  Encryption:     AES256
  Integrity:      SHA256
  PFS Group:      PFS14
  SA Lifetime:    27000 seconds / 102400000 KB

Now compare these against your on-premises VPN device configuration. They must match exactly; Azure does not negotiate down automatically when using a custom policy. If your firewall is a Cisco ASA, Palo Alto, Fortinet, or Check Point, each has its own terminology for these parameters. What Azure calls DHGroup14 is typically written "group 14" in Cisco IOS and "dhgrp 14" in FortiOS.

After saving a custom policy change, the connection will reset. Wait 60–90 seconds and then check the connection status. If it moves to Connected, you found your issue. If it still fails, enable VPN Gateway diagnostic logs (under Monitoring > Diagnostic settings) and capture the IKE logs for the next connection attempt; they'll show the exact proposal exchange.
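The baseline listed above can be applied from the CLI as well as the portal. Treat this as a sketch under stated assumptions: the two function names are mine, the values are simply the baseline from this section, and you should confirm your on-premises device is configured identically before running it, because the connection resets when the policy changes.

```shell
# Sketch: apply the AES256 / SHA256 / DHGroup14 / PFS14 baseline as a custom
# IPsec/IKE policy on a connection, then read it back.

# Usage: apply_baseline_policy CONNECTION_NAME RESOURCE_GROUP
apply_baseline_policy() {
  az network vpn-connection ipsec-policy add \
    --connection-name "$1" \
    --resource-group "$2" \
    --ike-encryption AES256 \
    --ike-integrity SHA256 \
    --dh-group DHGroup14 \
    --ipsec-encryption AES256 \
    --ipsec-integrity SHA256 \
    --pfs-group PFS14 \
    --sa-lifetime 27000 \
    --sa-max-size 102400000
}

# Usage: show_policy CONNECTION_NAME RESOURCE_GROUP
show_policy() {
  az network vpn-connection ipsec-policy list \
    --connection-name "$1" \
    --resource-group "$2" \
    --output table
}
```

Running show_policy before and after the change gives you a record of exactly what was in place, which is handy for the runbook.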

Resolve Address Space Overlaps and BGP Configuration Errors

Address space overlap is a silent issue. Azure won't always warn you at creation time; sometimes it surfaces only when traffic fails to route correctly and you're left wondering why packets that should traverse the VPN tunnel are going nowhere.

The rule is simple: your Azure VNet address space and your on-premises network address space (configured in the Local Network Gateway) cannot overlap. Not even partially. A 10.0.0.0/16 Azure VNet and a 10.0.50.0/24 on-premises network will conflict because Azure sees the 10.0.0.0/16 as local and won't route 10.0.50.x traffic out the VPN tunnel.

To check this: open your Local Network Gateway resource and review the IP address ranges listed under Configuration. Then open your Virtual Network and check its address spaces. If there's any overlap, you need to either re-IP one side or implement NAT on the gateway (supported on VpnGw2 and above).
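When you have more than a handful of prefixes, eyeballing them for overlap gets error-prone. Below is a minimal sketch of an IPv4 overlap check in plain shell arithmetic. The function names are hypothetical, and it assumes well-formed CIDR input with no leading zeros in the octets.

```shell
# Sketch: detect whether two IPv4 CIDR blocks overlap.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Usage: cidrs_overlap CIDR_A CIDR_B   (exit status 0 means they overlap)
cidrs_overlap() {
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
  if [ "$len" -eq 0 ]; then return 0; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# The example from this section: a /16 VNet and a /24 on-premises range.
if cidrs_overlap 10.0.0.0/16 10.0.50.0/24; then
  echo "overlap"
fi
```

Run each Local Network Gateway prefix against each VNet address space; any pair that reports an overlap needs re-addressing or NAT.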

BGP has its own failure modes. It is not supported on the Basic SKU at all. On supported SKUs, enable it on both the Virtual Network Gateway and the Local Network Gateway. Each side needs a unique BGP ASN. Azure gateways use ASN 65515 by default; your on-premises device should use a different ASN, typically one from the 16-bit private range (64512–65534) that Azure does not reserve.

# Enable BGP on an existing Local Network Gateway
az network local-gateway update \
  --name <YourLocalGWName> \
  --resource-group <YourRG> \
  --asn 65010 \
  --bgp-peering-address <OnPremBGPPeerIP>
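Before committing an on-premises ASN, a quick sanity check can save a failed peering. This sketch assumes the ASNs Azure documents as reserved (8074, 8075, 12076, 65515, 65517, 65518, 65519, 65520) and the standard 16-bit private range of 64512–65534; check_onprem_asn is an illustrative name, and you should confirm the reserved list against current Azure documentation.

```shell
# Sketch: sanity-check an on-premises BGP ASN before configuring it.
# Exit status is non-zero only when the ASN is one Azure reserves.

# Usage: check_onprem_asn ASN
check_onprem_asn() {
  case "$1" in
    8074|8075|12076|65515|65517|65518|65519|65520)
      echo "ASN $1 is reserved by Azure"
      return 1 ;;
  esac
  if [ "$1" -ge 64512 ] && [ "$1" -le 65534 ]; then
    echo "ASN $1 is a usable 16-bit private ASN"
  else
    echo "ASN $1 is outside the 16-bit private range; confirm it is an ASN you own"
  fi
}

check_onprem_asn 65010
```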

After enabling BGP on both sides and recreating the connection with BGP enabled, verify peering is established. Once the BGP session is up, routes are learned automatically, so you no longer need to maintain static address prefixes in the Local Network Gateway. That's a major operational win for complex environments.
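To verify peering from the CLI rather than the portal, something like the following should work. The function names are mine; the underlying az subcommands report peer state and the routes the gateway has learned.

```shell
# Sketch: check BGP peer state and learned routes on the Azure gateway.

# Usage: show_bgp_peers GATEWAY_NAME RESOURCE_GROUP
show_bgp_peers() {
  az network vnet-gateway list-bgp-peer-status \
    --name "$1" \
    --resource-group "$2" \
    --output table
}

# Usage: show_learned_routes GATEWAY_NAME RESOURCE_GROUP
show_learned_routes() {
  az network vnet-gateway list-learned-routes \
    --name "$1" \
    --resource-group "$2" \
    --output table
}
```

A peer that never reaches the Connected state usually points back to an ASN or peering-address mismatch; a connected peer with no learned routes points to the on-premises side not advertising anything.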

Enable Zone Redundancy for Production High Availability

If you haven't deployed your Azure VPN Gateway in an Availability Zone, you're accepting a risk you probably don't know you're accepting. A standard (non-AZ) gateway is not zone-redundant; if the datacenter hosting it has an infrastructure event, your VPN connectivity goes down. For hybrid workloads that back business-critical systems, that's unacceptable.

Zone-redundant gateways (the SKUs ending in AZ: VpnGw1AZ through VpnGw3AZ on Generation 1 and VpnGw2AZ through VpnGw5AZ on Generation 2) deploy gateway instances across multiple availability zones within a region. This means a zone-level failure leaves your tunnel intact. Per the official documentation, this physically and logically separates the gateway infrastructure while maintaining connectivity from on-premises to Azure.

There's an important requirement for AZ gateways that catches people off guard: they require a Standard SKU public IP address (not Basic). Basic SKU public IPs don't support zone redundancy and cannot be used with AZ gateway SKUs. If you try to associate a Basic public IP with a VpnGw2AZ, the deployment will fail.

# Create a Standard SKU, zone-redundant public IP for an AZ gateway
az network public-ip create \
  --name <GatewayPIPName> \
  --resource-group <YourRG> \
  --sku Standard \
  --zone 1 2 3 \
  --allocation-method Static

Note that migrating from a non-AZ to an AZ SKU requires deleting and recreating the gateway; there's no in-place upgrade path. Plan for 45–60 minutes of downtime and coordinate with your networking team. The connection resources (Local Network Gateways, Connection objects) can be preserved and reattached to the new gateway, which reduces the reconfiguration burden significantly.

After the AZ gateway is up, verify its zone assignment in the Azure portal under Virtual Network Gateway > Overview. You should see availability zones listed; if the field is blank or says "None," something went wrong during deployment.

Advanced Troubleshooting

For engineers dealing with persistent issues that the standard steps haven't resolved, especially in enterprise, domain-joined, or multi-hub environments, here's where to look next.

Azure Monitor and Diagnostic Logs

The most underused troubleshooting tool for Azure VPN Gateway is its diagnostic log stream. Go to your Virtual Network Gateway, select Monitoring > Diagnostic settings, click Add diagnostic setting, and enable at minimum these log categories: GatewayDiagnosticLog, TunnelDiagnosticLog, RouteDiagnosticLog, and IKEDiagnosticLog. Route them to a Log Analytics workspace.
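The same diagnostic setting can be created from the CLI, which makes it easy to apply to every gateway consistently. This is a sketch: the function name and the setting name vpn-gateway-logs are my own choices, and both arguments are full Azure resource IDs that you supply.

```shell
# Sketch: send the four gateway log categories to a Log Analytics workspace.
# The setting name "vpn-gateway-logs" is an arbitrary example.

# Usage: enable_gateway_logs GATEWAY_RESOURCE_ID WORKSPACE_RESOURCE_ID
enable_gateway_logs() {
  az monitor diagnostic-settings create \
    --name vpn-gateway-logs \
    --resource "$1" \
    --workspace "$2" \
    --logs '[
      {"category": "GatewayDiagnosticLog", "enabled": true},
      {"category": "TunnelDiagnosticLog",  "enabled": true},
      {"category": "RouteDiagnosticLog",   "enabled": true},
      {"category": "IKEDiagnosticLog",     "enabled": true}
    ]'
}
```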

Once logs are flowing, run this KQL query to see IKE negotiation failures in real time:

AzureDiagnostics
| where ResourceType == "VIRTUALNETWORKGATEWAYS"
| where Category == "IKEDiagnosticLog"
| where Message contains "ERROR" or Message contains "FAILED"
| project TimeGenerated, Message, remoteIP_s
| order by TimeGenerated desc
| take 50

This query surfaces the exact IKE error strings, such as NO_PROPOSAL_CHOSEN (policy mismatch), AUTHENTICATION_FAILED (PSK or certificate issue), INVALID_ID_INFORMATION (identity or proxy ID mismatch), or TS_UNACCEPTABLE (traffic selector mismatch). Each of these maps to a specific configuration fix.

Forced Tunneling and Route Table Conflicts

If you've implemented forced tunneling (routing all internet-bound traffic from your VNet back through the VPN to on-premises), verify that your User Defined Routes (UDRs) aren't sending gateway management traffic down the tunnel too. Azure's gateway infrastructure needs to reach Azure management endpoints directly. A UDR on the GatewaySubnet with a 0.0.0.0/0 next-hop of "Virtual Appliance" is a common misconfiguration that causes the gateway to become unresponsive.
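A quick way to check for this misconfiguration is to see whether a route table is attached to the GatewaySubnet at all, and if so, whether it carries a default route. The helper names below are hypothetical; the queries use standard subnet and route properties.

```shell
# Sketch: look for a 0.0.0.0/0 user-defined route affecting the GatewaySubnet.

# Usage: show_gateway_subnet_route_table VNET_NAME RESOURCE_GROUP
# Prints the attached route table's resource ID, or nothing if none is attached.
show_gateway_subnet_route_table() {
  az network vnet subnet show \
    --vnet-name "$1" \
    --resource-group "$2" \
    --name GatewaySubnet \
    --query "routeTable.id" \
    --output tsv
}

# Usage: list_default_routes ROUTE_TABLE_NAME RESOURCE_GROUP
list_default_routes() {
  az network route-table route list \
    --route-table-name "$1" \
    --resource-group "$2" \
    --query "[?addressPrefix=='0.0.0.0/0'].{name:name, nextHop:nextHopType, nextHopIp:nextHopIpAddress}" \
    --output table
}
```

If the first command prints nothing, no route table is attached and this particular failure mode is ruled out.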

Active-Active Gateway Configuration

Active-active gateways provide higher availability by running two gateway instances simultaneously, each with its own public IP. Pricing for active-active is the same as active-passive per the official Azure documentation, so there's no cost reason not to use it for production. What you do need: two public IPs (both Standard SKU for AZ gateways), and your on-premises device must support BGP to properly handle the dual-tunnel setup. Without BGP, active-active becomes complicated to manage because you're manually maintaining which tunnel carries which traffic.

P2S Connection Failures

Point-to-Site connections failing for remote users usually trace back to one of three things: the VPN client configuration package is stale (regenerate and redistribute after any gateway certificate change), the root certificate has expired or been removed from the gateway configuration, or the client's split tunneling configuration is conflicting with corporate proxy settings. For OpenVPN-based P2S connections, the Azure VPN Client app on Windows logs detailed errors at %APPDATA%\Microsoft\AzureVpn\Logs.
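Two of those three causes can be checked from the CLI. The sketch below lists the root certificates currently registered on the gateway and requests a fresh client configuration package; the function names are illustrative, and the generate command returns a download URL rather than the package itself.

```shell
# Sketch: inspect P2S root certificates and regenerate the client package.

# Usage: list_p2s_root_certs GATEWAY_NAME RESOURCE_GROUP
list_p2s_root_certs() {
  az network vnet-gateway show \
    --name "$1" \
    --resource-group "$2" \
    --query "vpnClientConfiguration.vpnClientRootCertificates[].name" \
    --output tsv
}

# Usage: regenerate_vpn_client_package GATEWAY_NAME RESOURCE_GROUP
regenerate_vpn_client_package() {
  az network vnet-gateway vpn-client generate \
    --name "$1" \
    --resource-group "$2"
}
```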

When to Call Microsoft Support

If your gateway is showing a "Provisioning Failed" state that persists after a redeploy, if you're seeing packet loss on an established tunnel that diagnostic tools can't explain, or if a gateway SKU resize has left the resource in a stuck "Updating" state for more than two hours, stop troubleshooting and open a support ticket. These are platform-level issues that require backend access. Go directly to Microsoft Support, file a Severity B or A ticket depending on business impact, and have your gateway Resource ID and subscription ID ready. The Azure VPN Gateway team can pull internal telemetry that isn't exposed through any portal or CLI.

Prevention & Best Practices

Once you've fixed the immediate issue, here's how to make sure you don't find yourself back here in six months. These aren't generic best practices; they're the specific things I see mature Azure networking teams do differently from teams that are constantly firefighting VPN issues.

Get your SKU right at deployment time. Changing a gateway SKU mid-life is disruptive. Take 30 minutes before you deploy to model your tunnel count, P2S user count, throughput requirements, and HA needs against the SKU table. Generation 2 is almost always the right choice for new deployments in 2026. For anything production, use at minimum VpnGw2 (Gen2). If you're in a region with Availability Zones and need HA, use a VpnGw2AZ or higher.

Use BGP wherever possible. Static routing for VPN tunnels is operationally painful. Every on-premises subnet change requires a manual update to your Local Network Gateway. BGP eliminates that entirely: routes propagate automatically as your network changes. It's supported on all non-Basic SKUs.

Monitor proactively with Azure Monitor alerts. Set up metric alerts on your VPN Gateway for tunnel egress bytes (alert if it drops to zero unexpectedly), gateway health (alert on any unhealthy state), and P2S connection count if you're running remote access. A 5-minute alerting window catches outages before users do.
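As a starting point, the tunnel egress alert can be scripted. This is a sketch under assumptions: the alert name, threshold, and evaluation frequency are my own example values, and the function name is hypothetical. Tune the threshold to your traffic pattern, since a tunnel that is legitimately idle overnight would trigger it.

```shell
# Sketch: alert when total tunnel egress over a 5-minute window drops to zero.
# Alert name, threshold, and frequency are example values.

# Usage: create_tunnel_egress_alert GATEWAY_RESOURCE_ID RESOURCE_GROUP
create_tunnel_egress_alert() {
  az monitor metrics alert create \
    --name vpn-tunnel-egress-zero \
    --resource-group "$2" \
    --scopes "$1" \
    --condition "total TunnelEgressBytes < 1" \
    --window-size 5m \
    --evaluation-frequency 1m \
    --description "VPN tunnel egress dropped to zero"
}
```

Attach an action group separately so the alert actually notifies someone; without one it only shows up in the portal.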

Test your failover. If you've deployed an active-active or zone-redundant gateway, actually test what happens when one instance fails. Azure doesn't give you a button to simulate this in production, but you can validate the failover behavior in a staging VNet. Knowing your RTO is measured in seconds (for active-active with BGP) versus minutes (for active-passive) is information worth having before an incident.

Quick Wins

Enable diagnostic logs on Day 1. Retroactive log collection isn't possible, and you'll want them when something breaks.
Size your GatewaySubnet to /27 from the start. Resizing later is disruptive and sometimes impossible without recreating the VNet.
Document your IKE policy settings in your runbook. When a tunnel drops at 2am, you want those parameters one click away.
Set a calendar reminder to audit P2S root certificates 60 days before expiry. Expired gateway certificates lock out every remote user instantly.

Frequently Asked Questions

Why does my Azure VPN Gateway show "Connected" but traffic still doesn't pass?

A "Connected" status means the IKE tunnel negotiation completed successfully, but traffic can still fail if your traffic selectors don't match what's actually being sent, or if there's an address space overlap. The tunnel is up, but Azure doesn't know where to route the packets. Check your Local Network Gateway address prefixes and confirm they exactly match the subnets on your on-premises side. Also verify there's no NSG or UDR on the GatewaySubnet blocking traffic. Use the Connection Troubleshoot tool in Network Watcher and run a test with a specific source and destination IP to isolate where the packet is dropping.

How long does it actually take to deploy an Azure VPN Gateway?

Plan for 30–45 minutes for a standard gateway deployment, and up to 60 minutes for zone-redundant AZ SKUs. This is not a bug: Azure is provisioning two gateway VM instances, allocating public IPs, configuring internal routing, and performing health checks. There's no way to speed this up. Initiate the deployment and come back. If it's been more than 90 minutes and the status is still "Provisioning," that's when you open a support ticket, because it's likely a platform-side issue in your region.

Can I change my Azure VPN Gateway from Generation 1 to Generation 2?

No, you cannot upgrade a gateway between generations in place. Generation 1 and Generation 2 gateways are distinct resource types. To move from Gen1 to Gen2, you need to delete the existing gateway, create a new one with the Gen2 SKU, and reattach your connection resources. Your Local Network Gateway and Connection objects don't need to be deleted, just the Virtual Network Gateway itself. This process causes a connectivity outage of 45–60 minutes. Schedule it during a maintenance window and pre-document your connection settings, shared keys, and IKE policies so you can restore quickly.

What's the difference between VpnGw4 and VpnGw5, and is the upgrade worth it?

Both VpnGw4 and VpnGw5 are Generation 2 SKUs with support for up to 100 S2S tunnels and 128 P2S SSTP connections. The key differences are throughput and IKEv2/OpenVPN P2S capacity: VpnGw4 delivers 5 Gbps aggregate with up to 5,000 P2S connections, while VpnGw5 doubles that to 10 Gbps and 10,000 P2S connections. VpnGw5 also supports up to 6,700 VMs in the virtual network versus 5,300 for VpnGw4. If you're running large-scale remote access (thousands of P2S users) or pushing multi-gigabit throughput between Azure and on-premises, VpnGw5 makes sense. For most enterprise S2S-only scenarios, VpnGw4 is the right ceiling.

Does active-active Azure VPN Gateway cost more than active-passive?

No, and this surprises a lot of people. Per official Azure documentation, the compute cost of an active-active gateway setup is the same as active-passive. You do pay for two public IP addresses instead of one, but Standard SKU public IP costs are minimal. The only other cost is data transfer from both gateway instances, which in most deployments is nearly identical to what an active-passive gateway would carry anyway. Given that active-active with BGP delivers dramatically better failover times (seconds vs. minutes), there's almost no reason not to use it for production workloads on supported SKUs.

My P2S VPN connects fine but internet traffic isn't going through the tunnel. How do I fix that?

This is a split tunneling configuration issue. By default, Azure P2S VPN uses split tunneling: only traffic destined for the Azure VNet address space goes through the tunnel, and all other traffic (including internet) goes directly from the client. If you want all traffic to route through Azure (and then out through your on-premises or Azure firewall), you need to enable forced tunneling for P2S. This requires advertising a default route (0.0.0.0/0) from the gateway to P2S clients, which is configured differently depending on your protocol: for OpenVPN, it's set in the client configuration; for IKEv2, you need to configure a custom route in the gateway settings. After any route change, regenerate and redistribute the VPN client configuration package to all users.

Sai Kiran Pandrala
