Fix Azure Load Balancer Issues: Setup & Config Guide

Microsoft Fix Intermediate 14 min read Official Docs Grounded Updated April 20, 2026

What's in This Guide

Why This Happens
The Quick Fix
Step-by-Step Solution
Advanced Troubleshooting
Prevention & Best Practices
FAQ

Why Azure Load Balancer Problems Happen

I've seen Azure Load Balancer break production environments in ways that take hours to untangle, not because the service is unreliable, but because the configuration surface area is large and the error messages Azure gives you are almost offensively vague. "Backend unhealthy." Great. Thanks. That narrows it down to about fifteen possible causes.

Azure Load Balancer operates at Layer 4 of the OSI model: it works with raw TCP and UDP traffic, not HTTP application logic. That's an important distinction. It doesn't inspect request contents. It just routes packets to healthy backend pool instances based on rules you define. When traffic stops flowing, the problem is almost always in one of four places: your health probe configuration, your Network Security Group (NSG) rules, your backend pool membership, or your load-balancing rule definitions. Sometimes it's all four at once.

Here's the pattern I see most often. Someone sets up an Azure Load Balancer for the first time, gets the frontend IP configured correctly, creates a backend pool with two VMs, and then traffic just sits there. Nothing reaches the VMs. The reason, nine times out of ten: the health probes are being silently blocked by an NSG rule that nobody remembered to update. The load balancer sends its probe traffic to the backend VMs, the NSG drops it, the load balancer marks both VMs as unhealthy, and suddenly your backend pool is empty from the load balancer's perspective even though both VMs are running perfectly fine.

There's also a major shift in behavior between the Basic SKU and the Standard SKU that catches people off guard. Basic Load Balancer is open to the internet by default. Standard Load Balancer is closed: all inbound traffic is blocked unless you explicitly allow it through NSG rules. If you've been running on Basic and recently migrated (or built a new environment that defaulted to Standard), that security-by-default behavior will look exactly like a broken load balancer unless you know what's happening.

Speaking of Basic Load Balancer: it was retired on September 30, 2025. If you're still running Basic, your load balancer may already be in an unsupported state or facing service disruptions. Upgrading isn't optional at this point; it's urgent. This guide covers that migration in detail.

Finally, there's the Azure Load Balancer outbound connectivity trap. Many teams discover, after deploying, that their backend VMs behind a Standard internal load balancer can't reach the internet. That's expected behavior, not a bug. Standard Load Balancer doesn't provide outbound connectivity on its own for internally-load-balanced VMs. You need either a public load balancer with outbound rules, a NAT gateway, or explicitly configured outbound connectivity. None of this is obvious from the Azure portal UI.

I know this is frustrating, especially when a misconfigured health probe is silently killing traffic at 2am. Let's fix it.

The Quick Fix: Try This First

Before you go deep on configuration, do this one check. In the Azure portal, navigate to your Load Balancer resource, then look for Insights in the left-hand menu under "Monitoring." Click it. You'll see a dashboard with two critical metrics front and center: Data Path Availability (also called VipAvailability) and Health Probe Status (DipAvailability). These two numbers tell you almost everything you need to know immediately.

If Data Path Availability is 0%, traffic is not reaching the load balancer frontend at all. Check your frontend IP configuration and make sure the load-balancing rules are actually defined.

If Health Probe Status is 0% but Data Path Availability looks fine, your load balancer is receiving traffic, but it has nowhere to send it because all your backend VMs are failing health checks. This is the most common scenario. Go straight to Step 1 of this guide.

If both metrics show healthy numbers but your application still isn't working, the problem is downstream of the load balancer: inside the VMs themselves, in the application layer, or in a misconfigured port mapping. The load balancer is doing its job; something else is broken.
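The three cases above amount to a small decision table. Here is a minimal Python sketch of it, assuming you have already read the two percentages off the Insights dashboard. The function name `triage` and the fourth "partial" branch are illustrative additions, not part of any Azure tooling.

```python
def triage(data_path_availability: float, health_probe_status: float) -> str:
    """Map the two Insights percentages (0-100) to the next place to look."""
    if data_path_availability <= 0:
        return "frontend: check the frontend IP configuration and load-balancing rules"
    if health_probe_status <= 0:
        return "probes: every backend is failing health checks; check probes and NSG rules"
    if data_path_availability >= 100 and health_probe_status >= 100:
        return "downstream: look inside the VMs, the application, or the port mapping"
    # Not covered by the three cases in the text: some, but not all, instances fail.
    return "partial: split Health Probe Status by backend IP to find the failing instances"


for dpa, hps in [(0, 0), (100, 0), (100, 100), (100, 50)]:
    print(f"Data Path {dpa:>3}% | Health Probe {hps:>3}% -> {triage(dpa, hps)}")
```

Treat the thresholds as a starting point; real metrics fluctuate, so a reading of 99% is not necessarily a separate failure mode.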

For a fast command-line check, run this in Azure CLI:

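az network lb show \
  --resource-group myResourceGroup \
  --name myLoadBalancer \
  --query "{probes:probes[].{name:name, protocol:protocol, port:port}, rules:loadBalancingRules[].{name:name, pool:backendAddressPool.id, probe:probe.id}, backendPools:backendAddressPools[].name}" \
  --output json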

This gives you a snapshot of your probes, rules, and backend pools in one shot. If the probes array is empty, or the rules don't reference the correct backend pool, you've found your issue without clicking through a dozen portal screens.
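If you would rather not eyeball the references, a short script can check them. The sketch below is a hypothetical helper (`check_lb_references` is not an Azure function) that takes the full JSON from `az network lb show --output json` as a Python dict. It assumes the usual field names in that output (`probes`, `backendAddressPools`, `loadBalancingRules[].backendAddressPool.id`, `loadBalancingRules[].probe.id`); verify them against your own output before relying on it. The resource IDs in the sample are shortened placeholders.

```python
def check_lb_references(lb: dict) -> list:
    """Return a list of problems found in a load balancer definition."""
    problems = []
    # Resource IDs are compared case-insensitively.
    probe_ids = {p["id"].lower() for p in lb.get("probes") or []}
    pool_ids = {p["id"].lower() for p in lb.get("backendAddressPools") or []}
    if not probe_ids:
        problems.append("no health probes defined")
    for rule in lb.get("loadBalancingRules") or []:
        name = rule.get("name", "<unnamed>")
        pool = (rule.get("backendAddressPool") or {}).get("id", "").lower()
        probe = (rule.get("probe") or {}).get("id", "").lower()
        if pool not in pool_ids:
            problems.append(f"rule {name}: backend pool reference not found")
        if probe not in probe_ids:
            problems.append(f"rule {name}: probe reference not found")
    return problems


# Placeholder data standing in for real CLI output.
sample = {
    "probes": [{"id": "/lb/probes/myHTTPProbe"}],
    "backendAddressPools": [{"id": "/lb/backendAddressPools/myBackendPool"}],
    "loadBalancingRules": [
        {
            "name": "myHTTPRule",
            "backendAddressPool": {"id": "/lb/backendAddressPools/wrongPool"},
            "probe": {"id": "/lb/probes/myHTTPProbe"},
        }
    ],
}
print(check_lb_references(sample))
```

To use it on a real load balancer, save the CLI output to a file and pass `json.load(open("lb.json"))` to the function.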

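You can also list the members of a backend pool to confirm which instances the load balancer will probe (per-instance probe results themselves come from the Health Probe Status metric):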

az network lb address-pool show \
  --resource-group myResourceGroup \
  --lb-name myLoadBalancer \
  --name myBackendPool

Pro Tip

When you create a Standard Load Balancer health probe, the probe traffic itself originates from the IP address 168.63.129.16, Azure's internal platform IP. Your NSG must allow inbound traffic from this source address (or use the AzureLoadBalancer service tag as a shortcut). Miss this one rule and every single backend VM in your pool will appear unhealthy, no matter how perfectly your app is running.

Step 1: Diagnose and Fix Your Azure Load Balancer Health Probe Configuration

Health probes are how Azure Load Balancer decides which backend VMs are fit to receive traffic. If a VM stops responding to probes, it gets pulled from rotation. The tricky part is that "not responding to probes" and "the probe is being blocked by an NSG" look identical to the load balancer. Both situations result in the VM being marked unhealthy.

In the Azure portal, go to your Load Balancer → Settings → Health probes. Check the following for each probe:

Protocol: TCP probes just check if the port is open. HTTP/HTTPS probes expect a 200 OK response from a specific path. If you're using HTTP but your app returns 301 redirects, the probe will fail.
Port: Must match the actual port your application is listening on inside the VM, not the frontend port of the load balancer.
Interval and Unhealthy threshold: Default is 15-second intervals, 2 consecutive failures before marking unhealthy. If your app has a slow startup, increase the threshold.
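To see how interval and threshold interact, here is a minimal Python sketch of the "N consecutive failures" logic described above. Both function names are illustrative, and the timing is a rough estimate that ignores probe timeouts and platform jitter.

```python
def seconds_until_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough time for a dead instance to be pulled from rotation."""
    return interval_s * unhealthy_threshold


def first_unhealthy_probe(results, unhealthy_threshold):
    """Return the 1-based probe number at which the instance is marked unhealthy.

    results is a sequence of booleans, True meaning the probe succeeded.
    Returns None if the threshold of consecutive failures is never reached.
    """
    consecutive_failures = 0
    for number, succeeded in enumerate(results, start=1):
        consecutive_failures = 0 if succeeded else consecutive_failures + 1
        if consecutive_failures >= unhealthy_threshold:
            return number
    return None


print(seconds_until_unhealthy(15, 2))  # the defaults quoted above
print(seconds_until_unhealthy(10, 3))  # a shorter interval with a higher threshold
print(first_unhealthy_probe([True, False, True, False, False], 2))
```

A single failed probe between two successes resets the counter, which is why a higher threshold tolerates a slow or briefly busy application.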

To add a health probe via CLI (use az network lb probe update to change an existing one):

az network lb probe create \
  --resource-group myResourceGroup \
  --lb-name myLoadBalancer \
  --name myHTTPProbe \
  --protocol Http \
  --port 80 \
  --path /health \
  --interval 10 \
  --threshold 3

To test whether your application actually responds correctly on the probe port, SSH or RDP into one of your backend VMs and run:

# Linux
curl -v http://localhost:80/health

# Windows PowerShell
Invoke-WebRequest -Uri "http://localhost:80/health" -UseBasicParsing

If this returns anything other than HTTP 200, your application isn't responding correctly to the probe path, and the load balancer will never mark that VM healthy. Fix the app endpoint first, then re-check the Insights dashboard. The DipAvailability metric should climb back toward 100% within two probe intervals once the endpoint is healthy.
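The same check can be scripted so you can run it repeatedly on a backend VM. This is a minimal sketch using only the Python standard library; `probe_status` and `is_probe_healthy` are hypothetical helper names. Unlike some HTTP clients, `http.client` does not follow redirects, so a 301 shows up as a 301, which is what matters for the probe.

```python
import http.client


def probe_status(host: str, port: int, path: str, timeout: float = 5.0) -> int:
    """Send one GET without following redirects and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def is_probe_healthy(status: int) -> bool:
    """Per the text above, an HTTP probe passes only on a 200 response."""
    return status == 200


for code in (200, 301, 404, 503):
    print(code, "healthy" if is_probe_healthy(code) else "unhealthy")
```

On the VM itself, `is_probe_healthy(probe_status("localhost", 80, "/health"))` mirrors the curl test; a connection error raised by `probe_status` means nothing is listening on the probe port at all.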

Step 2: Fix NSG Rules Blocking Azure Load Balancer Traffic

This is where most Azure Load Balancer setup problems actually live. Standard Load Balancer is built on a Zero Trust security model: traffic is blocked by default unless you explicitly open it. Many people configure the load balancer rules perfectly and then forget that the NSG on the backend VMs' NICs (or the subnet NSG) is still locked down.

You need at minimum two inbound NSG rules on your backend instances:

A rule allowing your actual application traffic (e.g., TCP port 80 or 443 from any source)
A rule allowing health probe traffic from the AzureLoadBalancer service tag

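In the Azure portal: navigate to your NSG → Inbound security rules → click + Add. Create the health probe rule first since it's the most commonly missing one. Every NSG includes a default AllowAzureLoadBalancerInBound rule at priority 65001, so probes are only dropped when a custom deny rule with a higher priority (a lower number) sits above it. Give your explicit allow rule a priority that beats that deny.

Or do it with the CLI. Run both commands: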

# Allow application traffic (port 80 example)
az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myNSG \
  --name AllowHTTPInbound \
  --priority 100 \
  --direction Inbound \
  --source-address-prefixes '*' \
  --source-port-ranges '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges 80 \
  --protocol Tcp \
  --access Allow

# Allow health probe traffic from Azure Load Balancer
az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myNSG \
  --name AllowAzureLoadBalancerProbe \
  --priority 110 \
  --direction Inbound \
  --source-address-prefixes AzureLoadBalancer \
  --source-port-ranges '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges '*' \
  --protocol '*' \
  --access Allow

After adding these rules, wait about 60 seconds and check the Health Probe Status metric in Insights again. If it was sitting at 0% because of blocked probes, it should recover within a few probe intervals. If traffic is still not flowing, check whether an NSG is also applied at the subnet level. For inbound traffic, the subnet-level NSG is evaluated before the NIC-level NSG and traffic must be allowed by both, so a deny rule at the subnet level overrides any allow rule on the NIC.

One more thing: if your traffic still isn't getting distributed after the NSG rules look correct, check the Session persistence setting on your load-balancing rule. If it's set to "Client IP" or "Client IP and protocol," all traffic from the same source IP always goes to the same backend VM, which can look like uneven distribution, or like no distribution at all when you're testing from a single client.
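The subnet-versus-NIC interaction is easier to reason about with a toy model. The Python sketch below is a simplification under stated assumptions: rules match on literal source strings and single ports (no CIDR ranges or protocols), only two of the default rules are modelled (AllowAzureLoadBalancerInBound at 65001 and DenyAllInBound at 65500, with AllowVnetInBound left out), and the names `Rule`, `evaluate`, and `inbound_allowed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int  # lower number wins
    source: str    # "*" or a literal tag/address
    port: str      # "*" or a single port as text
    access: str    # "Allow" or "Deny"


# Two of the default inbound rules every NSG carries (simplified).
DEFAULT_RULES = [
    Rule(65001, "AzureLoadBalancer", "*", "Allow"),
    Rule(65500, "*", "*", "Deny"),
]


def evaluate(rules, source, port):
    """First matching rule by priority decides. None means no NSG is attached."""
    if rules is None:
        return "Allow"
    for rule in sorted(list(rules) + DEFAULT_RULES, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in ("*", str(port)):
            return rule.access
    return "Deny"


def inbound_allowed(subnet_rules, nic_rules, source, port):
    """Inbound traffic must pass the subnet NSG and then the NIC NSG."""
    return (evaluate(subnet_rules, source, port) == "Allow"
            and evaluate(nic_rules, source, port) == "Allow")


nic = [Rule(100, "*", "80", "Allow"), Rule(110, "AzureLoadBalancer", "*", "Allow")]
subnet_locked = [Rule(200, "*", "*", "Deny")]
subnet_fixed = [Rule(110, "AzureLoadBalancer", "*", "Allow")] + subnet_locked

print("probe, locked subnet:", inbound_allowed(subnet_locked, nic, "AzureLoadBalancer", 80))
print("probe, fixed subnet: ", inbound_allowed(subnet_fixed, nic, "AzureLoadBalancer", 80))
```

In the locked case the NIC rules are correct but never get a say, which is the situation that makes a healthy-looking NIC NSG so misleading.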

Step 3: Reconfigure Your Backend Pool to Include the Right VMs

An empty or incorrectly configured backend pool is a surprisingly common Azure Load Balancer configuration error. The load balancer has no VMs to send traffic to, so nothing works, but the portal might not make this obvious at a glance.

Go to your Load Balancer in the Azure portal → Settings → Backend pools. Click your pool and verify:

The correct VMs or Virtual Machine Scale Sets (VMSS) are listed
Their state shows as "Succeeded" (not "Failed" or "Updating")
The VMs are in the same virtual network as the load balancer
For NIC-based backend pools, the correct NIC and IP configuration is selected

You can add a VM to the backend pool via CLI like this:

# Get the NIC ID of the VM you want to add
NIC_ID=$(az vm show \
  --resource-group myResourceGroup \
  --name myBackendVM \
  --query "networkProfile.networkInterfaces[0].id" \
  --output tsv)

# The NIC name is the last segment of its resource ID
NIC_NAME="${NIC_ID##*/}"

# Add the NIC's IP config to the backend pool
# (ipconfig1 is a common default name; confirm it with `az network nic ip-config list`)
az network nic ip-config address-pool add \
  --resource-group myResourceGroup \
  --nic-name "$NIC_NAME" \
  --ip-config-name ipconfig1 \
  --lb-name myLoadBalancer \
  --address-pool myBackendPool

Keep in mind that Azure Load Balancer distributes inbound flows to backend pool instances according to its load-balancing algorithm: by default, a 5-tuple hash based on source IP, source port, destination IP, destination port, and IP protocol. This means you need at least two healthy VMs in the backend pool to actually see traffic distribution. With one VM, all traffic goes to that one instance, which is fine for availability, but not for load testing.

If you're running a Virtual Machine Scale Set, make sure the VMSS instances are not in a "Deallocating" or "Stopped (deallocated)" state. Deallocated instances are automatically removed from the backend pool and won't receive traffic. Check the VMSS instance health in the portal under the VMSS resource → Instances.
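A small simulation makes the 5-tuple behavior concrete. This Python sketch is illustrative only: Azure does not publish its hash function, so SHA-256 stands in for it, and `pick_backend` plus all the addresses are made up for the example.

```python
import hashlib
from collections import Counter


def pick_backend(flow, backends):
    """Choose a backend from a flow tuple.

    flow = (source IP, source port, destination IP, destination port, protocol)
    SHA-256 is a stand-in here, not the hash Azure actually uses.
    """
    key = "|".join(str(field) for field in flow)
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["10.0.0.4", "10.0.0.5"]
# One client opening 1,000 connections, each from a different source port.
flows = [("203.0.113.10", 50000 + i, "198.51.100.1", 80, "tcp") for i in range(1000)]

print("two backends:", Counter(pick_backend(f, backends) for f in flows))
print("one backend: ", Counter(pick_backend(f, backends[:1]) for f in flows))
```

Because the source port changes with every new connection, even a single client spreads across both backends, while packets within one connection always land on the same instance.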

Once VMs are properly added and the health probes pass, the Insights dashboard will show both Data Path Availability and Health Probe Status at or near 100%. That's your confirmation.

Step 4: Resolve Azure Load Balancer Outbound Connectivity Failures

Here's a scenario I hear about constantly: "My VMs behind the load balancer suddenly can't reach the internet. Outbound connections are failing with timeouts." This is one of the most confusing Azure Load Balancer issues because the problem isn't with the load balancer itself; it's with how Standard Load Balancer handles outbound SNAT (Source Network Address Translation).

When you attach VMs to a public Standard Load Balancer, Azure can provide outbound internet connectivity by translating the VMs' private IPs to the load balancer's public IP. But this only works if you have outbound rules configured, or if your load-balancing rules have "Outbound SNAT" enabled. Standard SKU does not create outbound rules for you.

For a Standard internal load balancer, there is no outbound internet connectivity at all by default. The internal LB only handles traffic within the virtual network. If your VMs need internet access, you need a separate solution: attach a public load balancer with outbound rules, deploy an Azure NAT Gateway on the subnet, or assign public IPs directly to the VMs.

To create an outbound rule on your public Standard Load Balancer via CLI:

# First, ensure you have a frontend IP configuration for outbound
az network lb outbound-rule create \
  --resource-group myResourceGroup \
  --lb-name myPublicLoadBalancer \
  --name myOutboundRule \
  --frontend-ip-configs myFrontendIP \
  --protocol All \
  --outbound-ports 10000 \
  --address-pool myBackendPool

If you're seeing SNAT port exhaustion errors, which show up as intermittent connection failures to external services, look at the SNAT Connection Count metric in Azure Monitor for your load balancer. A high number of failed SNAT connections indicates you're running out of SNAT ports. The fix is to either increase the number of outbound ports in your outbound rule, add more frontend IPs to multiply your SNAT port allocation, or move to a NAT Gateway which provides significantly more SNAT ports per VM.
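The trade-off between ports per instance and pool size is simple arithmetic. The sketch below assumes roughly 64,000 SNAT ports per frontend IP and that manual allocations must be a multiple of 8; check both figures against current Azure documentation. The function names are illustrative.

```python
PORTS_PER_FRONTEND_IP = 64_000  # approximate figure; an assumption in this sketch


def max_backend_instances(frontend_ips: int, ports_per_instance: int) -> int:
    """How many instances fit when each one gets a fixed port allocation."""
    if ports_per_instance <= 0 or ports_per_instance % 8 != 0:
        raise ValueError("ports_per_instance must be a positive multiple of 8")
    return (frontend_ips * PORTS_PER_FRONTEND_IP) // ports_per_instance


def ports_per_instance_for(frontend_ips: int, instances: int) -> int:
    """Largest per-instance allocation (rounded down to a multiple of 8)."""
    raw = (frontend_ips * PORTS_PER_FRONTEND_IP) // instances
    return raw - raw % 8


print(max_backend_instances(1, 10_000))   # the allocation used in the outbound rule above
print(max_backend_instances(2, 10_000))   # same allocation with a second frontend IP
print(ports_per_instance_for(1, 10))      # ten instances sharing one frontend IP
```

With 10,000 ports per instance, a single frontend IP covers only a handful of VMs, which is why adding frontend IPs or moving to a NAT Gateway comes up as soon as the pool grows.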

To check SNAT connection counts via PowerShell:

Get-AzMetric `
  -ResourceId "/subscriptions/{subId}/resourceGroups/myRG/providers/Microsoft.Network/loadBalancers/myLB" `
  -MetricName "SnatConnectionCount" `
  -TimeGrain 00:01:00 `
  -StartTime (Get-Date).AddHours(-1) `
  -EndTime (Get-Date)

Once you configure outbound rules correctly and the SNAT ports are no longer exhausted, outbound connections from your backend VMs will start working again immediately. No restart required.

Step 5: Migrate from Basic to Standard Azure Load Balancer Before It Breaks

I want to be direct about this: Basic Load Balancer was officially retired on September 30, 2025. If you're reading this after that date and you're still running Basic, Microsoft is no longer providing support, SLAs, or bug fixes for your load balancer. Your environment is in a precarious spot.

You cannot do an in-place SKU upgrade. Basic and Standard Load Balancers are architecturally different enough that you need to migrate resources, not just click an "upgrade" button. The good news: Microsoft provides a PowerShell migration module that automates most of this.

Key differences you need to understand before migrating:

Security model: Basic is open to the internet by default. Standard follows a Zero Trust model: closed by default, NSGs required.
Availability zones: Basic does not support availability zones. Standard supports zone-redundant and zonal deployments.
Backend pool size: Basic supports up to 300 instances. Standard supports up to 5,000.
Diagnostics: Standard has multidimensional metrics and Azure Monitor integration. Basic has minimal diagnostics.
Gateway Load Balancer chaining: Only available with Standard SKU.

To run the automated migration:

# Install the upgrade module (run in PowerShell as Administrator)
Install-Module -Name AzureBasicLoadBalancerUpgrade -Force

# Connect to your Azure account
Connect-AzAccount

# Run the upgrade (this creates a new Standard LB and migrates your config)
Start-AzBasicLoadBalancerUpgrade `
  -ResourceGroupName "myResourceGroup" `
  -BasicLoadBalancerName "myBasicLB" `
  -StandardLoadBalancerName "myNewStandardLB"

The script will pause and ask for confirmation before making changes. Review everything carefully, especially the NSG rules it recommends creating. After migration, verify your Health Probe Status and Data Path Availability metrics in Insights immediately. Because Standard LB is closed by default, if you didn't have NSG rules before, traffic will be blocked post-migration until you add them. That's the number one post-migration complaint: "migration worked but now nothing reaches my VMs." Add the AzureLoadBalancer service tag rule and your application port rule as described in Step 2, and you'll be back online.

Advanced Azure Load Balancer Troubleshooting

Using Azure Monitor Metrics for Deep Diagnostics

Azure Load Balancer exposes several multidimensional metrics through Azure Monitor that go far beyond what the Insights dashboard shows. If you're troubleshooting intermittent failures or trying to understand traffic patterns, these are your primary tools.

The most important metrics and what they tell you:

VipAvailability (Data Path Availability): Is the load balancer frontend reachable? If this drops, you have a platform-level issue; escalate to Microsoft Support immediately.
DipAvailability (Health Probe Status): What percentage of backend instances are passing health probes? Filter by backend port, backend IP, and frontend IP to isolate which specific VM is failing.
ByteCount / PacketCount: Is traffic actually flowing? Useful for confirming that load-balancing rules are correctly routing packets.
SnatConnectionCount: Are you running out of outbound SNAT ports? Filter by "Connection State = Failed" to see failed attempts specifically.

To pull DipAvailability per backend instance using Azure CLI:

az monitor metrics list \
  --resource "/subscriptions/{subId}/resourceGroups/myRG/providers/Microsoft.Network/loadBalancers/myLB" \
  --metric "DipAvailability" \
  --dimension BackendIPAddress BackendPort \
  --interval PT1M \
  --start-time 2026-04-20T00:00:00Z \
  --end-time 2026-04-20T01:00:00Z

Azure Load Balancer Health Event Logs

Standard Load Balancer emits health event logs that you can query through Azure Monitor Logs (Log Analytics). These logs record when backend instances transition between healthy and unhealthy states, which is invaluable for tracking down intermittent failures that cleared before you could investigate.

In the portal: go to your Load Balancer → Monitoring → Diagnostic settings → add a setting to send logs to a Log Analytics workspace. Then query them:

// Kusto query in Log Analytics
// (table and column names follow the ALBHealthEvent schema; confirm them in your workspace)
ALBHealthEvent
| where TimeGenerated > ago(24h)
| where _ResourceId contains "myLoadBalancer"
| project TimeGenerated, HealthEventType, Severity, Description
| order by TimeGenerated desc

Distribution Mode and Session Persistence Issues

Azure Load Balancer uses a 5-tuple hash by default to distribute connections. If you need session persistence (same client always hits the same backend VM), you can change the distribution mode to 2-tuple (source IP and destination IP, shown in the portal as "Client IP") or 3-tuple (source IP, destination IP, and protocol, shown as "Client IP and protocol"). But this reduces the effectiveness of load distribution. I've seen teams enable session persistence to fix one problem and then scratch their heads for days about why one VM is handling 80% of traffic. Check your distribution mode setting in the load-balancing rule under Session persistence in the portal.
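To see why persistence concentrates traffic, compare which fields feed the hash in each mode. This Python sketch is a model, not Azure's implementation: SHA-256 is a stand-in hash, and `MODE_FIELDS` and `pick_backend` are hypothetical names.

```python
import hashlib

# Indexes into (source IP, source port, destination IP, destination port, protocol).
MODE_FIELDS = {
    "5-tuple": (0, 1, 2, 3, 4),  # default
    "3-tuple": (0, 2, 4),        # "Client IP and protocol"
    "2-tuple": (0, 2),           # "Client IP"
}


def pick_backend(flow, backends, mode="5-tuple"):
    """Hash only the fields the chosen mode uses, then pick a backend."""
    key = "|".join(str(flow[i]) for i in MODE_FIELDS[mode])
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
# One client, 200 connections, each from a different source port.
flows = [("203.0.113.10", 50000 + i, "198.51.100.1", 443, "tcp") for i in range(200)]

for mode in MODE_FIELDS:
    used = {pick_backend(flow, backends, mode) for flow in flows}
    print(f"{mode}: one client reached {len(used)} backend(s)")
```

In the 2-tuple and 3-tuple modes the source port is ignored, so a busy client behind a single IP (a corporate NAT, for example) pins all of its connections to one VM.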

Cross-Region Load Balancer Connectivity

If you're using Azure's cross-region load balancing feature to distribute traffic across multiple Azure regions, there's an important architecture requirement: each regional load balancer must be a Standard SKU public load balancer, and the cross-region load balancer itself must be in the same tenant. If you're seeing traffic not reaching secondary regions, verify that the regional load balancers are correctly configured as backend pools for the global tier and that health probes at the global tier are passing.

When to Call Microsoft Support

Escalate to Microsoft Support immediately if any of these apply: your Data Path Availability (VipAvailability) metric drops below 100% with no changes on your side (this is a platform issue, not a configuration issue); you're seeing unexpected behavior after completing the Basic-to-Standard migration and all NSG rules look correct; or health event logs show backend instances failing probes with no application-level explanation. For these scenarios, open a severity A support ticket with your Load Balancer resource ID and the specific time range of the issue captured from Azure Monitor.

Prevention & Best Practices for Azure Load Balancer

Most of the Azure Load Balancer troubleshooting scenarios in this guide are preventable. The teams that never call me about broken load balancers follow a consistent set of practices from day one.

Start with Standard SKU, always. Basic Load Balancer is retired. There is no scenario today where you should deploy a new Basic Load Balancer. Standard gives you availability zone support, multidimensional metrics, better security defaults, and a much higher backend pool scale limit. The slightly higher cost is worth it by a significant margin.

Design your NSG rules before you deploy the load balancer. Map out every port that needs to be open, application traffic and health probe traffic, and create those NSG rules as part of your infrastructure-as-code template, not as an afterthought. Forgetting the AzureLoadBalancer service tag rule is the single most common cause of Azure Load Balancer health probe failures in production environments. Put it in your ARM template, Bicep file, or Terraform module and never think about it again.

Use availability zones when creating Standard Load Balancer. Deploying a zone-redundant Standard Load Balancer, paired with backend VMs spread across multiple availability zones, protects you from single-zone failures. The load balancer distributes traffic only to healthy instances in zones that are operational. Set this up during initial deployment; retrofitting zone-awareness into an existing deployment is significantly more complex.

Set up Azure Monitor alerts on DipAvailability. Create an alert rule that fires when the Health Probe Status metric drops below 100% for more than 5 minutes. This gives you early warning before users start noticing problems. The alert costs almost nothing and catches probe failures before they turn into full production incidents.
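The alert condition itself is a sustained-breach check. Here is a minimal Python sketch of that logic, assuming one DipAvailability sample per minute. The name `should_alert` is hypothetical, and real Azure Monitor rules aggregate over a window rather than testing every sample, so treat this as an approximation of the idea.

```python
def should_alert(samples, threshold=100.0, minutes=5):
    """Return True when the last `minutes` samples are all below the threshold.

    samples holds one Health Probe Status percentage per minute, oldest first.
    """
    if len(samples) < minutes:
        return False  # not enough history to call it sustained
    return all(value < threshold for value in samples[-minutes:])


print(should_alert([100, 100, 99, 98, 97, 96, 95]))  # sustained drop
print(should_alert([99, 99, 99, 99, 100]))           # recovered in the last minute
print(should_alert([0, 0]))                          # too little history
```

Requiring every sample in the window to breach keeps a single failed probe from paging anyone, at the cost of a few minutes of delay.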

Test your health probe endpoint actively. Before you go live, hit the probe endpoint from outside the VM, not just from localhost. I've seen cases where the app returned HTTP 200 on localhost but the external health probe was hitting a different path and getting 404s. The load balancer's perspective is what matters, not localhost.

Quick Wins

Always include the AzureLoadBalancer service tag NSG rule as a template default in every new deployment
Pin your Standard Load Balancer to availability zones at creation time; you cannot add zones to an existing LB without redeployment
Set up an Azure Monitor alert for DipAvailability dropping below 100%
Document your outbound SNAT port allocation and monitor SnatConnectionCount proactively; SNAT exhaustion sneaks up on you as traffic grows

Frequently Asked Questions About Azure Load Balancer

What is Azure Load Balancer and when should I use it instead of Application Gateway?

Azure Load Balancer works at Layer 4 of the OSI model: it distributes TCP and UDP traffic without looking at the content of the requests. It's the right choice when you need high-throughput, low-latency distribution of raw network traffic across VMs or scale sets and don't need HTTP-level features. Azure Application Gateway, by contrast, operates at Layer 7 and understands HTTP/HTTPS: it can do URL-based routing, SSL termination, and web application firewall (WAF) rules. If you're running a standard multi-tier app with web servers, use Application Gateway in front and Load Balancer for your backend tiers. If you're running something like a database cluster or a UDP-based service, Load Balancer is the right tool. The two services are in the same "Load Balancing and Content Delivery" category in Azure and are often used together in the same architecture.

Why are my backend VMs showing as unhealthy in Azure Load Balancer even though they're running fine?

Almost every time I see this, the culprit is an NSG rule blocking health probe traffic. Azure Load Balancer sends health probe packets from the internal platform IP 168.63.129.16. If your NSG doesn't have an inbound allow rule for the AzureLoadBalancer service tag, those probes get silently dropped and the load balancer marks your VMs unhealthy. The second most common cause is the health probe port or path not matching what the application is actually serving. For HTTP probes, the path must return an HTTP 200 status code, not a redirect. SSH or RDP into the VM and test the probe endpoint locally to confirm the app is responding correctly before blaming the load balancer.

My Basic Load Balancer stopped working. Is it because of the September 2025 retirement?

Yes, that's very likely the cause. Microsoft retired Basic Load Balancer on September 30, 2025, which means if you're past that date, your Basic Load Balancer is no longer covered by SLA and Microsoft may have begun restricting or decommissioning Basic LB resources. You need to migrate to Standard Load Balancer. Use the AzureBasicLoadBalancerUpgrade PowerShell module to automate most of the process; the module creates a new Standard Load Balancer with your existing configuration migrated over. After migration, remember to add NSG rules immediately since Standard Load Balancer is closed by default, unlike Basic which was open to the internet by default. Check Step 5 and Step 2 of this guide for the exact commands.

Does Azure Load Balancer store any of my traffic data or customer information?

No. Azure Load Balancer doesn't store customer data. All traffic processing happens in real time and packets are forwarded without any persistence of the payload content. This is partly a function of how it operates: at Layer 4, the load balancer passes through packets without decrypting or inspecting them, so there's nothing to store even if it wanted to. This is different from Azure Application Gateway, which terminates SSL and can inspect request content. For compliance purposes, Azure Load Balancer processing is ephemeral; there's no data-at-rest concern for the traffic itself, though your Azure Monitor metrics and diagnostic logs (which contain IP addresses and connection metadata) are stored according to your Log Analytics workspace retention settings.

What's the difference between a public load balancer and an internal load balancer in Azure?

A public Azure Load Balancer has a frontend IP address that's reachable from the internet. It handles inbound internet traffic to your VMs and can also provide outbound connectivity by translating your VMs' private IPs to the public frontend IP when they initiate connections to the internet. An internal (private) load balancer uses a private frontend IP address within your virtual network; it's only reachable from inside the VNet or from on-premises networks connected via VPN or ExpressRoute. Use an internal load balancer for services that should never be exposed to the internet, like database tiers, internal APIs, or microservices that only talk to other services in your network. Many architectures use both: a public load balancer for the web tier and an internal load balancer for backend services.

Can Azure Load Balancer handle millions of connections? What are the real scale limits?

Standard Azure Load Balancer is designed to scale to millions of flows for both TCP and UDP applications. This isn't marketing copy; it's the actual architecture. The service doesn't have a hard connection count limit that you'll hit in typical enterprise scenarios. The real limits you'll run into at scale are SNAT port exhaustion for outbound connections (each frontend IP provides approximately 64,000 SNAT ports, shared across backend VMs), and backend pool size (Standard supports up to 5,000 backend instances, versus 300 for the now-retired Basic SKU). For extremely high outbound connection volumes, Azure NAT Gateway is a better solution for SNAT since it hands out ports on demand from a shared pool rather than pre-allocating a fixed number to each backend instance. The load balancer itself won't be your bottleneck; your SNAT port allocation and your backend VM capacity will be.

Sai Kiran Pandrala

Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.