Fix Azure Virtual Machines: Setup, Errors & Best Practices

Microsoft Fix | Intermediate | 14 min read | Official Docs Grounded | Updated April 20, 2026

Why Azure Virtual Machines Break, and Why It's Never Obvious

I've worked with Azure virtual machines across dozens of enterprise environments, and there's one thing that remains true every single time: when something goes wrong, the Azure portal error message tells you almost nothing useful. You get a vague allocation failure, a grayed-out Deploy button, or a VM that simply won't start, and you're left staring at the screen wondering what you missed.

Here's the thing. Azure virtual machines aren't just a single resource. When you spin one up, Azure silently creates a whole ecosystem behind the scenes: a virtual network, a network interface card (NIC), an OS disk, a private IP address, and often a public IP and a network security group (NSG) on top of that. Every single one of those supporting resources has its own configuration requirements, its own pricing model, and its own failure modes. A misconfigured NSG blocks your SSH or RDP access. A VM size that isn't offered in your region triggers a deployment failure. Choosing a region that doesn't match where your data disk already lives causes a deployment error that Azure describes in the least helpful way possible.

The most common reasons I see Azure VM deployments fail or misbehave:

  • VM size unavailable in the selected region: not every VM size series (Dasv7, Dsv6, Fasv7) is available in every Azure region. This is probably the number-one cause of failed Azure VM deployments; it usually surfaces as a SkuNotAvailable error rather than a capacity-related allocation failure.
  • Network Security Group rules blocking connectivity: port 22 for SSH or port 3389 for RDP isn't open, and people assume the VM is broken when it's perfectly healthy.
  • Incorrect authentication setup: trying to log in with a password when SSH key authentication was configured, or vice versa.
  • Resource group and location mismatch: attaching a data disk or virtual network from a different region than the VM itself.
  • OS disk size underestimated: some images ship with an OS disk smaller than the common 127 GiB default, and workloads quickly run out of space.
  • Trusted Launch conflicts: since Trusted Launch as Default (TLaD) entered preview for Generation 2 VMs, some older tooling and custom images fail to deploy because they don't support Secure Boot or vTPM.

What makes all of this especially maddening is that Azure virtual machine deployment errors tend to surface at the end of a long provisioning process, sometimes 5 to 10 minutes in, so you waste time before getting a failure message. And when you're under pressure to get a dev environment or production workload running, that's genuinely frustrating.

I know this feeling well. The good news is that every one of these problems has a clear, documented fix. This guide walks you through them in order of likelihood, starting with the fastest wins.

The Quick Fix: Try This First

Before you dig into anything advanced, check these three things in order. In my experience, one of them resolves about 70% of Azure virtual machine problems without any deeper investigation needed.

1. Verify VM size availability in your region. Open the Azure portal, navigate to Virtual machines → Create → Azure virtual machine, and on the Basics tab look at the Size field. Click See all sizes. If your target size (for example, Standard_D4s_v3) is missing or grayed out, it isn't available to your subscription in the region you've selected, or it isn't compatible with the image you picked. Change your region (try East US 2 or West Europe) and check again.

2. Check your Network Security Group rules. In the Azure portal, go to Virtual machines → [your VM name] → Networking. Under Inbound port rules, confirm that the port you need is explicitly allowed: port 22 (SSH) for Linux VMs, port 3389 (RDP) for Windows VMs. If it isn't listed, click Add inbound port rule and add it. By default, an NSG's built-in rules deny all inbound traffic from the internet (they only allow traffic from within the virtual network and from Azure Load Balancer), so no port is reachable from outside unless you open it.

3. Confirm your authentication method matches what was configured. If the VM was created with an SSH public key, you cannot log in with a username and password. Run this Azure CLI command to verify the VM's authentication type:

az vm show --resource-group MyResourceGroup --name MyVM --query "osProfile.linuxConfiguration.disablePasswordAuthentication"

If this returns true, password authentication is off and you must use your private key. If you've lost the key, you'll need to reset access in the Azure portal under VM → Help → Reset password.
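If you check this across several VMs, it can help to wrap the query in a small function. A minimal sketch, assuming the Azure CLI is installed and you're signed in with az login; vm_login_mode is a hypothetical helper name, not part of the CLI:

```shell
# Hypothetical helper: classify how a Linux VM accepts logins.
# Assumes the Azure CLI is installed and you are signed in (az login).
vm_login_mode() {  # usage: vm_login_mode RESOURCE_GROUP VM_NAME
  # Same property as the query above; JSON output prints true/false unquoted.
  mode=$(az vm show --resource-group "$1" --name "$2" \
    --query "osProfile.linuxConfiguration.disablePasswordAuthentication" \
    --output json)
  case "$mode" in
    true)  echo "ssh-key-only" ;;      # passwords disabled: use the private key
    false) echo "password-allowed" ;;  # password login permitted (keys may also work)
    *)     echo "unknown" ;;           # e.g. a Windows VM, where the property is absent
  esac
}
```

For example, vm_login_mode MyResourceGroup MyVM prints ssh-key-only for a VM created with an SSH key and password login disabled.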

If you've checked all three and the VM still won't deploy or connect, move on to the step-by-step section below.

Pro Tip
When you create an Azure VM, write down the exact region, resource group name, VM size, and authentication method you used, even just in a notepad. Azure's resource dependency model means that swapping any one of these later involves cascading changes. I've watched teams spend hours on Azure VM configuration errors that would have been a 30-second fix if they'd kept a simple note at creation time.

Step 1: Select the Right Azure VM Size for Your Workload

This is where most Azure virtual machine setup problems begin. Picking a VM size isn't just about CPU count: Azure's size families target very different workload types, and deploying into the wrong family burns money and still underperforms.

Here's the breakdown you actually need:

  • Dasv7 / Dsv6: general purpose. Balanced CPU-to-memory ratio. Good for web servers, development environments, and small-to-medium databases.
  • Fasv7: compute optimized. High CPU-to-memory ratio. Use this for batch processing, gaming servers, and analytics workloads that are CPU-bound.

To check what sizes are available in your target region using Azure PowerShell:

Get-AzVMSize -Location "eastus2" | Where-Object { $_.Name -like "Standard_D*" } | Select-Object Name, NumberOfCores, MemoryInMB

Or with the Azure CLI:

az vm list-sizes --location eastus2 --output table
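Both commands list sizes for the region, but they don't make it obvious when a size exists there yet is restricted for your subscription. The sketch below uses az vm list-skus instead, whose --all flag keeps subscription-restricted sizes in the output so their restrictions field can be inspected. It assumes the Azure CLI is installed and you're signed in; size_status is a hypothetical helper name:

```shell
# Hypothetical helper: is SIZE offered to this subscription in REGION?
# Assumes the Azure CLI is installed and you are signed in (az login).
size_status() {  # usage: size_status REGION SIZE
  region="$1"; size="$2"
  # --all keeps sizes that are restricted for the subscription in the output,
  # so "restricted" can be told apart from "not offered in this region".
  name=$(az vm list-skus --location "$region" --size "$size" --all \
    --resource-type virtualMachines \
    --query "[?name=='$size'].name | [0]" --output tsv)
  if [ -z "$name" ]; then
    echo "not-offered"
    return 1
  fi
  # One reason code per line (e.g. NotAvailableForSubscription); empty if none.
  # Zone-level restrictions are reported too, even if other zones are fine.
  reasons=$(az vm list-skus --location "$region" --size "$size" --all \
    --resource-type virtualMachines \
    --query "[?name=='$size'].restrictions[].reasonCode" --output tsv)
  if [ -n "$reasons" ]; then
    echo "restricted: $(printf '%s' "$reasons" | tr '\n' ' ')"
    return 1
  fi
  echo "available"
}
```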

One thing people miss: the number of NICs you can attach to a VM is directly tied to its size. If you need multiple NICs for a complex networking setup, you have to pick a size large enough to support them; you can't add NICs beyond the limit set by the VM size. Check the Max NICs column in the documentation for the size series before you deploy.
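The NIC limit is also exposed by the Azure CLI as the MaxNetworkInterfaces capability in az vm list-skus output. A minimal sketch, assuming the Azure CLI is installed and you're signed in; max_nics and nic_fit are hypothetical helper names:

```shell
# Hypothetical helpers: look up a size's NIC limit and compare it with a plan.
# Assumes the Azure CLI is installed and you are signed in (az login).
max_nics() {  # usage: max_nics REGION SIZE
  # list-skus reports per-size capabilities; MaxNetworkInterfaces is the NIC cap.
  az vm list-skus --location "$1" --size "$2" --resource-type virtualMachines \
    --query "[?name=='$2'] | [0].capabilities[?name=='MaxNetworkInterfaces'].value | [0]" \
    --output tsv
}

nic_fit() {  # usage: nic_fit REGION SIZE NICS_NEEDED
  limit=$(max_nics "$1" "$2")
  case "$limit" in
    ''|*[!0-9]*) echo "unknown limit for $2 in $1"; return 2 ;;
  esac
  if [ "$3" -le "$limit" ]; then
    echo "ok: $3 of $limit NICs"
  else
    echo "too many: $2 allows $limit NICs, you need $3"
    return 1
  fi
}
```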

After selecting your size, the portal shows its estimated monthly cost next to the Size field on the Basics tab. That figure covers the VM size itself; disks are billed separately. If the estimate looks unexpectedly high, double-check whether you've picked a size with more cores than needed, and remember that a premium storage tier adds to the bill on top of it.

Step 2: Configure Network Security Group Rules Correctly

I cannot tell you how many "my Azure VM won't connect" tickets I've seen that were 100% caused by a misconfigured NSG. It's not a bug; it's by design. An NSG's default rules deny all inbound traffic from the internet, so you have to explicitly open every port you need.

Here's how to fix it properly. In the Azure portal, navigate to Virtual machines → [VM name] → Networking → Inbound port rules. Click Add inbound port rule.

For SSH access on Linux VMs:

Source: Any
Source port ranges: *
Destination: Any
Destination port ranges: 22
Protocol: TCP
Action: Allow
Priority: 300
Name: Allow-SSH

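For RDP access on Windows VMs, use port 3389 instead of 22. NSG rules are evaluated in priority order, lowest number first, and processing stops at the first rule that matches, so keep this rule's number low (300–400) to make sure it takes precedence over any deny rules with higher numbers. Outside a quick test, set Source to your own public IP address rather than Any; an SSH or RDP port open to the whole internet attracts constant brute-force attempts.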
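To make the evaluation order concrete, here is a deliberately simplified model of how an NSG picks a winner for inbound traffic: rules are checked in ascending priority number and processing stops at the first match. It only compares destination ports (a number or *) and ignores source, destination address, and protocol, so treat it as an illustration rather than a replica of Azure's engine; nsg_verdict and its rule-line format are hypothetical:

```shell
# Simplified model of NSG inbound evaluation (destination port only).
# Input lines on stdin: "<priority> <port-or-*> <Allow|Deny> <rule-name>"
nsg_verdict() {  # usage: nsg_verdict PORT < rules
  sort -n | awk -v p="$1" '
    # Lowest priority number first; the first rule matching the port wins.
    $2 == p || $2 == "*" { print $3 " (" $4 ")"; found = 1; exit }
    # Real NSGs always end with a catch-all DenyAllInBound rule; this is a fallback.
    END { if (!found) print "Deny (no match)" }'
}
```

For example, feeding it 300 22 Allow Allow-SSH and 65500 * Deny DenyAllInBound (Azure's built-in catch-all) yields Allow (Allow-SSH) for port 22 but Deny (DenyAllInBound) for port 3389, and adding 200 22 Deny Block-SSH makes port 22 come back denied.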

You can also manage NSG rules via Azure CLI:

az network nsg rule create \
  --resource-group MyResourceGroup \
  --nsg-name MyNSG \
  --name Allow-SSH \
  --protocol Tcp \
  --direction Inbound \
  --priority 300 \
  --source-address-prefixes '*' \
  --source-port-ranges '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges 22 \
  --access Allow

One important note: there's no charge for network security groups or their rules, so don't hold back on creating granular rules out of cost concern. Tighter, more specific rules are always better from a security standpoint.

If your rule is in place but you still can't connect, verify that the NSG is actually associated with the correct subnet or NIC. Go to Networking → Network Interface and check the Network security group field. If it shows "None," no NSG is attached to that NIC. Also keep in mind that when NSGs are associated with both the subnet and the NIC, inbound traffic has to be allowed by both.
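When both a subnet-level and a NIC-level NSG exist, inbound traffic must be allowed by both, so it helps to see both associations at once. A minimal sketch, assuming the Azure CLI is installed, you're signed in, and the NIC has a single IP configuration (only the first is checked); nsg_associations is a hypothetical helper name:

```shell
# Hypothetical helper: list the NSGs that filter traffic to a NIC.
# Assumes the Azure CLI is installed and you are signed in (az login).
nsg_associations() {  # usage: nsg_associations RESOURCE_GROUP NIC_NAME
  # NSG attached directly to the NIC (empty if none).
  nic_nsg=$(az network nic show --resource-group "$1" --name "$2" \
    --query "networkSecurityGroup.id" --output tsv)
  # Subnet of the NIC's first IP configuration.
  subnet_id=$(az network nic show --resource-group "$1" --name "$2" \
    --query "ipConfigurations[0].subnet.id" --output tsv)
  subnet_nsg=""
  if [ -n "$subnet_id" ]; then
    # NSG attached to that subnet (empty if none).
    subnet_nsg=$(az network vnet subnet show --ids "$subnet_id" \
      --query "networkSecurityGroup.id" --output tsv)
  fi
  echo "nic: ${nic_nsg:-none}"
  echo "subnet: ${subnet_nsg:-none}"
}
```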

Step 3: Set Up SSH Key or Password Authentication Properly

Azure gives you two authentication options for Linux VMs: SSH public key (recommended) or password. For Windows VMs, it's username and password. Mixing these up is an extremely common source of Azure VM connection errors.

If you're setting up SSH key authentication, Azure can generate the key pair for you during VM creation. In the portal, in the Administrator account section, select SSH public key, then choose Generate new key pair. Azure stores the public key on the VM and prompts you to download the private key (.pem file). You only get one chance to download it, so save it right away.

Save your private key and set the correct permissions on it immediately:

chmod 400 ~/Downloads/MyVM_key.pem

Then connect using:

ssh -i ~/Downloads/MyVM_key.pem azureuser@<public-ip-address>

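If you already have an SSH key pair, choose Use existing public key under SSH public key source and paste your public key instead. It should be a single line that starts with the key type (ssh-rsa for RSA keys, or ssh-ed25519 for Ed25519 keys, which Azure also accepts for Linux VMs) followed by the key string.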
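A common mistake is pasting the private key, or a truncated key, into the public key field. This quick local check looks only at the first line of a .pub file and confirms it starts with ssh-rsa or ssh-ed25519 followed by key data; it doesn't validate the key material itself. check_pubkey is a hypothetical helper name:

```shell
# Hypothetical helper: sanity-check a public key file before pasting it.
# Only inspects the first line; it does not verify the key material.
check_pubkey() {  # usage: check_pubkey PATH_TO_PUBLIC_KEY
  line=$(head -n 1 "$1") || return 1
  ktype=${line%% *}   # first space-separated token, e.g. ssh-ed25519
  rest=${line#* }     # everything after the first space
  case "$ktype" in
    ssh-rsa|ssh-ed25519) ;;
    *) echo "unexpected key type: $ktype"; return 1 ;;  # e.g. a private key header
  esac
  # No space at all means there's a type but no key string.
  if [ "$rest" = "$line" ] || [ -z "$rest" ]; then
    echo "missing key data"
    return 1
  fi
  echo "ok: $ktype"
}
```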

Lost your private key and locked out? Don't panic. In the Azure portal, go to VM → Help → Reset password. You can inject a new public key or reset the password from here without rebuilding the VM. Azure does this through the VMAccess extension, which needs the VM agent running inside the OS.

For Windows VMs, if you can't RDP in even with the correct password, verify under Networking that port 3389 is open, and also check that you're connecting to the VM's current public IP address: a dynamic public IP is released when the VM is deallocated and may change when it starts again, unless you've configured static allocation.

Step 4: Configure OS Disk and Data Disk Storage Correctly

Disk configuration is the most overlooked part of Azure virtual machine setup, at least until you hit a storage error or run out of space at 2 AM. Here's what you need to know.

Every Azure VM gets an OS disk, and many sizes also come with a local temporary disk (sizes with a lowercase d in the name, such as Ddsv5, include one, while many newer sizes without it, such as Dsv5, don't). Azure does not charge for local temporary disk storage, but that disk is ephemeral: anything written to it is gone when the VM is deallocated or moved to different hardware. Never use the temporary disk (usually mapped as /dev/sdb on Linux or the D: drive on Windows) for persistent data.

Windows images typically use a 127 GiB OS disk, while many Linux marketplace images ship with a smaller one (often 30 GiB). The OS disk is charged at the regular managed disk rate. To check your current OS disk size:

az vm show --resource-group MyResourceGroup --name MyVM --query "storageProfile.osDisk.diskSizeGb"

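If you need more OS disk space, you can grow it (managed disks can't be shrunk), but only while the VM is deallocated. The extra space also has to be claimed inside the guest OS afterward: extend the volume in Disk Management on Windows, or extend the partition and file system on Linux if the image doesn't do it automatically at boot. The resize itself looks like this: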

az vm deallocate --resource-group MyResourceGroup --name MyVM
az disk update --resource-group MyResourceGroup --name MyOSDiskName --size-gb 256
az vm start --resource-group MyResourceGroup --name MyVM
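The same resize, sketched as a function that looks up the OS disk's resource ID instead of requiring you to know its name, and that refuses sizes that aren't larger than the current one. It assumes the Azure CLI is installed and you're signed in; grow_os_disk is a hypothetical helper name, and it still deallocates the VM, so plan for downtime:

```shell
# Hypothetical helper: grow a VM's OS disk without looking up its name by hand.
# Assumes the Azure CLI is installed and you are signed in (az login).
grow_os_disk() {  # usage: grow_os_disk RESOURCE_GROUP VM_NAME NEW_SIZE_GB
  disk_id=$(az vm show --resource-group "$1" --name "$2" \
    --query "storageProfile.osDisk.managedDisk.id" --output tsv)
  current=$(az vm show --resource-group "$1" --name "$2" \
    --query "storageProfile.osDisk.diskSizeGb" --output tsv)
  case "$current" in
    ''|*[!0-9]*) echo "could not read current OS disk size"; return 2 ;;
  esac
  # Managed disks can grow but never shrink, so refuse anything not larger.
  if [ "$3" -le "$current" ]; then
    echo "refusing: $3 GiB is not larger than the current $current GiB"
    return 1
  fi
  # OS disks can only be resized while the VM is deallocated.
  az vm deallocate --resource-group "$1" --name "$2" &&
    az disk update --ids "$disk_id" --size-gb "$3" &&
    az vm start --resource-group "$1" --name "$2"
}
```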

For data disks, Microsoft's best practice is to keep your data on a separate disk from your OS. This matters enormously in production: if a VM fails and you need to rebuild it, you can detach the data disk and attach it to a new VM without losing anything. Set this up from the start rather than retrofitting it later. Go to VM → Disks → Add data disk to attach a new managed disk.

Choose Premium SSD for production workloads, Standard SSD for dev/test, and Standard HDD only for archival or infrequently accessed data. The pricing difference between tiers is visible on the managed disks pricing page in the Azure portal.
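Those tiers map to the managed disk SKU names Premium_LRS, StandardSSD_LRS, and Standard_LRS. A minimal sketch that picks a SKU from an environment label and attaches a new, empty data disk; it assumes the Azure CLI is installed and you're signed in, and the environment labels and helper names are hypothetical:

```shell
# Hypothetical helpers: choose a disk SKU per environment and attach a new disk.
# Assumes the Azure CLI is installed and you are signed in (az login).
disk_sku_for() {  # usage: disk_sku_for ENVIRONMENT
  case "$1" in
    prod)     echo Premium_LRS ;;      # Premium SSD for production
    dev|test) echo StandardSSD_LRS ;;  # Standard SSD for dev/test
    archive)  echo Standard_LRS ;;     # Standard HDD for rarely accessed data
    *)        echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

add_data_disk() {  # usage: add_data_disk RG VM DISK_NAME SIZE_GB ENVIRONMENT
  sku=$(disk_sku_for "$5") || return 1
  az vm disk attach --resource-group "$1" --vm-name "$2" --name "$3" \
    --new --size-gb "$4" --sku "$sku"
}
```

The new disk arrives unformatted, so you still need to partition, format, and mount it inside the guest OS.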

Step 5: Set Up Availability Zones and Region for Reliability

If you're running anything more than a throwaway dev VM, availability configuration is non-negotiable. Azure gives you two main options for Azure virtual machine high availability, and picking the wrong one, or skipping it entirely, means a single hardware failure takes down your workload.

Availability Zones are physically separated zones within a single Azure region, with independent power, cooling, and networking. If you deploy two or more VM instances across two or more Availability Zones in the same region, Azure guarantees at least 99.99% connectivity uptime by SLA. To configure this in the portal, on the Basics tab of VM creation, under Availability options, select Availability zone, then choose Zone 1, Zone 2, or Zone 3.

Virtual Machine Scale Sets are the right choice when you need to automatically scale the number of VM instances up or down based on demand or a schedule. Scale sets can also span multiple Availability Zones. Create a scale set via:

az vmss create \
  --resource-group MyResourceGroup \
  --name MyScaleSet \
  --image Ubuntu2204 \
  --upgrade-policy-mode automatic \
  --admin-username azureuser \
  --generate-ssh-keys \
  --zones 1 2 3

For region selection, use the Azure portal's VM creation wizard and pick a region close to your users to minimize latency. To list the region names available to your subscription, run:

az account list-locations --output table

That command only lists regions, so check size availability in each candidate region with az vm list-skus --location <region> --size <size>. Mismatched region choices (for example, creating a VM in East US but connecting it to a virtual network in West Europe) will cause deployment failures that are confusing to diagnose. Always keep all resources for a single workload in the same region unless you have a specific geo-redundancy reason to separate them.

After deployment, verify your availability configuration under VM → Overview → Availability zone. If it shows "No infrastructure redundancy required," the VM has no availability protection and a single rack failure could take it offline.
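You can run the same check from the Azure CLI: a VM's zones property lists its zone and is empty for a regional (non-zonal) VM. A minimal sketch, assuming the Azure CLI is installed and you're signed in; vm_zone is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a VM is pinned to an availability zone.
# Assumes the Azure CLI is installed and you are signed in (az login).
vm_zone() {  # usage: vm_zone RESOURCE_GROUP VM_NAME
  zone=$(az vm show --resource-group "$1" --name "$2" \
    --query "zones[0]" --output tsv)
  if [ -n "$zone" ]; then
    echo "zonal: zone $zone"
  else
    # No zone: the VM has no zone-level redundancy.
    echo "regional: no availability zone"
    return 1
  fi
}
```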

Advanced Troubleshooting for Azure Virtual Machines

If the five steps above didn't solve your problem, you're dealing with something deeper. Here's how I approach the harder Azure VM issues.

Allocation Failures

Azure VM allocation failures happen when Azure can't find physical hardware in your selected region and availability zone that matches your requested VM size. The error typically looks like: "Allocation failed. We do not have sufficient capacity for the requested VM size in this region." Your options in order of preference:

  1. Try a different availability zone in the same region (e.g., Zone 2 instead of Zone 1).
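  2. Try a different region entirely. To see whether the size is restricted for your subscription in a candidate region, run az vm list-skus --location eastus2 --size Standard_D4s_v3 --all --query "[].restrictions" (without --all, sizes restricted for your subscription are left out of the output).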
  3. Try a comparable VM size (e.g., Standard_D4s_v4 instead of Standard_D4s_v3).
  4. Create an on-demand capacity reservation for the size in your preferred region, or file a support ticket if you can't secure capacity there.

Trusted Launch and Generation 2 VM Issues

Trusted Launch is already the default security type for Generation 2 VMs created in the portal, and Trusted Launch as Default (TLaD), currently in preview, extends that default to other deployment methods, so new Generation 2 VMs increasingly start with Secure Boot and vTPM enabled. If you're deploying a custom or marketplace image that doesn't support these features, the deployment fails or the VM won't boot. Check under VM → Configuration → Security type. You can change this to Standard if your image doesn't support Trusted Launch, but do register for the TLaD preview to test your images, since Trusted Launch is set to become the default behavior for Generation 2 VMs.
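To check the security type of an existing VM without opening the portal, you can read securityProfile.securityType; when no security profile is set, the VM uses the Standard security type. A minimal sketch, assuming the Azure CLI is installed and you're signed in; security_type is a hypothetical helper name:

```shell
# Hypothetical helper: print a VM's security type (TrustedLaunch, Standard, ...).
# Assumes the Azure CLI is installed and you are signed in (az login).
security_type() {  # usage: security_type RESOURCE_GROUP VM_NAME
  t=$(az vm show --resource-group "$1" --name "$2" \
    --query "securityProfile.securityType" --output tsv)
  # An empty result means no security profile, i.e. the Standard type.
  echo "${t:-Standard}"
}
```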

Boot Diagnostics

When a VM won't start and you have no other way to see what's happening, enable boot diagnostics. Go to VM → Help → Boot diagnostics and enable it with a managed storage account. After enabling, restart the VM, then click Screenshot (or open the Serial log for Linux VMs) to see what the OS is actually displaying at boot time. This catches kernel panics, failed disk mounts, and misconfigurations that are otherwise completely invisible from the Azure portal.

Serial Console Access

If SSH and RDP are both unreachable, Azure Serial Console gives you direct access to the VM's serial port, bypassing the network entirely. Go to VM → Help → Serial console. It requires boot diagnostics to be enabled, and to log in you need a local account that has a password (an SSH key alone won't get you in at the console). Because it doesn't depend on the VM's network configuration, it's an invaluable last resort for Azure VM connection problems.

Activity Log for Deployment Errors

Every control-plane operation on an Azure resource (creating, updating, starting, stopping, or deleting it) is recorded in the Activity log. If your VM deployment failed silently, go to Resource group → Activity log, filter by the VM name and the last 24 hours, and look for entries with status Failed. The detailed error messages there are far more informative than anything shown in the portal's deployment summary.
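The same filter can be applied from the Azure CLI with az monitor activity-log list. A minimal sketch, assuming the Azure CLI is installed and you're signed in; failed_ops is a hypothetical helper name, and the offset defaults to 1d (the last 24 hours):

```shell
# Hypothetical helper: list failed operations in a resource group.
# Assumes the Azure CLI is installed and you are signed in (az login).
failed_ops() {  # usage: failed_ops RESOURCE_GROUP [OFFSET, e.g. 1d or 6h]
  az monitor activity-log list --resource-group "$1" --status Failed \
    --offset "${2:-1d}" \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, resource:resourceId}" \
    --output table
}
```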

Checking VM Agent Status

The Azure VM agent enables features like password reset, extension installation, and diagnostics. If it's not running, many management operations will silently fail. Check agent status via:

az vm get-instance-view --resource-group MyResourceGroup --name MyVM --query "instanceView.vmAgent.statuses"

A healthy agent reports the code ProvisioningState/succeeded with the display status Ready. If the agent is unhealthy on Linux, the most common fix is reinstalling the agent package (walinuxagent on Ubuntu and Debian, WALinuxAgent on RHEL-family distributions) via the serial console.
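For scripted checks it's easier to compare the agent's display status, which reads Ready for a healthy agent. A minimal sketch, assuming the Azure CLI is installed and you're signed in; agent_ready is a hypothetical helper name, and a deallocated VM reports no agent status at all:

```shell
# Hypothetical helper: succeed only if the VM agent reports Ready.
# Assumes the Azure CLI is installed and you are signed in (az login).
agent_ready() {  # usage: agent_ready RESOURCE_GROUP VM_NAME
  status=$(az vm get-instance-view --resource-group "$1" --name "$2" \
    --query "instanceView.vmAgent.statuses[0].displayStatus" --output tsv)
  if [ "$status" = "Ready" ]; then
    echo "agent ready"
  else
    echo "agent not ready: ${status:-no status reported}"
    return 1
  fi
}
```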

When to Call Microsoft Support
Escalate to Microsoft Support if: you're getting persistent allocation failures across multiple regions and sizes with no workaround; your VM enters an unrecoverable failed provisioning state that can't be deleted; you see unexpected charges on your bill for a VM you've already stopped (stopped VMs still incur compute charges, so you must deallocate them to stop billing); or Serial Console access is returning errors. For enterprise agreements, use Azure support plans that include 24/7 critical incident response.

Prevention & Best Practices for Azure Virtual Machines

Getting your Azure virtual machines right the first time saves a lot of pain. These aren't theoretical best practices; they're the things I consistently see skipped in organizations that then spend days firefighting avoidable outages.

Always separate OS and data disks from day one. I've already covered this, but it bears repeating: if your data lives on a separate managed disk, recovering from a VM failure is a data disk reattach operation, not a data recovery nightmare. This single decision protects you more than almost anything else.

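Use static public IP addresses for anything that isn't temporary. A dynamic public IP is released when the VM is deallocated, and the VM can come back with a different address when it starts again. If you have DNS records, firewall rules, or connection strings pointing to your VM's IP, a dynamic address will break them silently the next time the VM is deallocated and restarted. During VM creation, set the public IP to Static under Networking → Public IP → Create new → Assignment: Static. (Standard SKU public IPs are always static; only Basic SKU addresses can be dynamic.)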
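To audit an existing address, query its SKU and allocation method. A minimal sketch, assuming the Azure CLI is installed and you're signed in; ip_allocation is a hypothetical helper name. If it reports a dynamic address, az network public-ip update with --allocation-method Static is the command that changes it:

```shell
# Hypothetical helper: report a public IP's SKU and whether it is static.
# Assumes the Azure CLI is installed and you are signed in (az login).
ip_allocation() {  # usage: ip_allocation RESOURCE_GROUP PUBLIC_IP_NAME
  out=$(az network public-ip show --resource-group "$1" --name "$2" \
    --query "[sku.name, publicIpAllocationMethod]" --output tsv)
  # The two values come back separated by whitespace (tab or newline);
  # word splitting turns them into $1 (SKU) and $2 (allocation method).
  set -- $out
  case "$2" in
    Static)  echo "ok: $1 public IP, static" ;;
    Dynamic) echo "warning: $1 public IP is dynamic and can change after deallocation"; return 1 ;;
    *)       echo "unknown allocation method"; return 2 ;;
  esac
}
```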

Tag every resource at creation time. Azure resource tags are the difference between being able to understand your infrastructure six months from now and having a complete mystery on your hands. At minimum, tag VMs with environment (dev/staging/prod), owner, and project. Tags also make cost analysis in Azure Cost Management far more useful.

Implement auto-shutdown for non-production VMs. Azure charges for running VMs by the minute. Dev and test VMs that run overnight unnecessarily burn significant budget. Go to VM → Auto-shutdown and configure a daily shutdown time. This alone can cut dev VM costs by 60–70%.
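A quick way to sanity-check the savings figure: if auto-shutdown runs every evening and someone starts the VM only on working days, it runs 5 days × N hours out of a 168-hour week. The first function below does that arithmetic; the second wraps az vm auto-shutdown, whose --time value is an hhmm time in UTC. Both are hypothetical helpers, and the weekday-only schedule is an assumption:

```shell
# Percentage of a 168-hour week saved if the VM runs only HOURS per weekday
# (5 weekdays) and stays off overnight and at weekends.
weekly_savings_pct() {  # usage: weekly_savings_pct HOURS_PER_WEEKDAY
  awk -v h="$1" 'BEGIN { printf "%.1f\n", (1 - (h * 5) / 168) * 100 }'
}

# Daily auto-shutdown via the Azure CLI (assumes az is installed and signed in).
enable_auto_shutdown() {  # usage: enable_auto_shutdown RESOURCE_GROUP VM_NAME HHMM_UTC
  az vm auto-shutdown --resource-group "$1" --name "$2" --time "$3"
}
```

With 10 or 12 hours per weekday the first function returns 70.2 and 64.3, which lines up with the 60–70% range. Note that auto-shutdown only shuts the VM down; it never starts it again.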

Use Azure Hybrid Benefit if you have existing Windows Server licenses. This is a real cost-saver that a surprising number of organizations miss entirely. If your organization has Software Assurance coverage for Windows Server, you can apply those licenses to Azure VMs and dramatically reduce the OS licensing component of your Azure VM cost.

Plan your VM naming convention before you deploy anything. Changing a VM's name after creation requires rebuilding the VM. Name resources using a consistent schema from the start, for example: [project]-[env]-[region]-[role]-[number], like myapp-prod-eus2-web-01. For Windows VMs, remember that the computer (host) name is limited to 15 characters, so a name this long needs a shorter computer name. All related resources (NIC, disk, NSG) should follow the same convention.
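A naming rule is easier to enforce if a script checks it before anything is created. A minimal sketch of a validator for the [project]-[env]-[region]-[role]-[number] pattern; the allowed environment values (dev, staging, prod), the lowercase-alphanumeric segments, and the two-digit number are assumptions, and passing windows as the second argument also flags names longer than the 15-character Windows computer name limit. valid_vm_name is a hypothetical helper name:

```shell
# Hypothetical validator for [project]-[env]-[region]-[role]-[number] names.
valid_vm_name() {  # usage: valid_vm_name NAME [windows]
  name="$1"
  if ! printf '%s\n' "$name" |
      grep -Eq '^[a-z0-9]+-(dev|staging|prod)-[a-z0-9]+-[a-z0-9]+-[0-9]{2}$'; then
    echo "bad pattern: $name"
    return 1
  fi
  # Windows computer names are capped at 15 characters.
  if [ "$2" = "windows" ] && [ "${#name}" -gt 15 ]; then
    echo "too long for a Windows computer name (${#name} > 15): $name"
    return 1
  fi
  echo "ok: $name"
}
```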

Quick Wins
  • Enable auto-shutdown on all dev/test Azure VMs immediately to stop paying for idle compute
  • Attach data disks separately from the OS disk during initial VM creation; retrofitting this is painful
  • Set public IP addresses to Static before pointing any DNS or firewall rules at the VM
  • Register for the Trusted Launch as Default preview now to test your images before TLaD becomes the default for Generation 2 VMs

Frequently Asked Questions

What do I need to think about before creating an Azure virtual machine?

Before you deploy any Azure virtual machine, nail down six things: the resource naming convention you'll use (you can't easily rename a VM later), which Azure region best fits your users and compliance requirements, what VM size family matches your workload type, what OS you need and whether it requires a separate license, how you'll handle authentication (SSH key vs. password), and what related resources (virtual network, NSG, disks, public IP) you need alongside the VM. Skipping any of these decisions and "figuring it out later" usually means rebuilding the VM from scratch.

Why does my Azure VM show "Allocation failed" when I try to deploy?

Allocation failures mean Azure doesn't have available physical hardware in your selected region and availability zone that supports the VM size you've requested. This happens more with newer size series and in high-demand regions. Your fastest fix is to try a different availability zone in the same region, for example, switch from Zone 1 to Zone 2 or Zone 3. If that doesn't work, try an adjacent VM size (like Standard_D4s_v4 instead of Standard_D4s_v3). You can also check region restrictions for a specific SKU using az vm list-skus --location [region] --size [vmsize] --all; the --all flag includes sizes that are restricted for your subscription, which are otherwise omitted. If you need a specific size in a specific region consistently, create an on-demand capacity reservation for it, and open a support ticket if you can't secure capacity that way.

I stopped my Azure VM but I'm still being charged, why?

This is one of the most common Azure VM billing surprises: stopping a VM from inside the operating system (via shutdown -h now on Linux or Start → Shut Down on Windows) puts the VM in a "Stopped" state but does NOT deallocate it. Azure still charges you compute costs for a Stopped VM because the hardware is still reserved for you. The Azure CLI command az vm stop behaves the same way: it powers the VM off but keeps it allocated. To stop billing, you must deallocate the VM, either via the Azure portal by clicking Stop (which triggers deallocation), or via Azure CLI: az vm deallocate --resource-group MyRG --name MyVM. A deallocated VM shows as "Stopped (deallocated)" in the portal. Note: storage costs for the OS disk continue even when deallocated.

How do I connect to my Azure VM if I've lost the SSH private key?

You're not permanently locked out: Azure provides an access recovery path without rebuilding the VM. In the Azure portal, go to VM → Help → Reset password. You'll see two options: reset the SSH public key (inject a new public key you generate locally), or reset the password (switch to password authentication). Generate a new key pair locally with ssh-keygen -t rsa -b 4096 -f ~/new_vm_key, then paste the contents of new_vm_key.pub into the Reset password form. Azure injects it through the VMAccess extension, which relies on the VM agent. If the VM agent itself is broken, you'll need Azure Serial Console or disk swap recovery, which is a longer process but still doesn't require deleting the VM.
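If you prefer the command line, az vm user update performs the same kind of reset through the VMAccess extension. A minimal sketch, assuming the Azure CLI and the OpenSSH client are installed and you're signed in; reset_ssh_key is a hypothetical helper name, and it reuses the ~/new_vm_key path from the example above unless you pass another:

```shell
# Hypothetical helper: create a key pair if needed and push the public key.
# Assumes the Azure CLI and OpenSSH client are installed and you are signed in.
reset_ssh_key() {  # usage: reset_ssh_key RESOURCE_GROUP VM_NAME USERNAME [KEY_PATH]
  key="${4:-$HOME/new_vm_key}"
  if [ ! -f "$key.pub" ]; then
    # Same key type and size as the ssh-keygen example; prompts for a passphrase.
    ssh-keygen -t rsa -b 4096 -f "$key" || return 1
  fi
  # Runs through the VMAccess extension, so the VM agent must be healthy.
  az vm user update --resource-group "$1" --name "$2" \
    --username "$3" --ssh-key-value "$key.pub"
}
```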

What's the difference between stopping and deallocating an Azure virtual machine?

Stopping a VM via the OS leaves it in the "Stopped" state: the hardware stays reserved, compute charges keep running, the public IP is retained, and the VM is billed as if it were still running. Deallocating via the Azure portal or CLI releases the underlying hardware back to Azure's pool: compute charges stop, a dynamic public IP is released (static IPs are retained), and the VM shows as "Stopped (deallocated)." You're still charged for managed disk storage in the deallocated state. Always use the Azure portal's Stop button or az vm deallocate to stop incurring compute costs; never just shut down the OS.
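The portal's status labels correspond to PowerState codes in the VM's instance view, which makes the distinction easy to script. A minimal sketch, assuming the Azure CLI is installed and you're signed in; power_state is a hypothetical helper name:

```shell
# Hypothetical helper: translate a VM's PowerState code into billing terms.
# Assumes the Azure CLI is installed and you are signed in (az login).
power_state() {  # usage: power_state RESOURCE_GROUP VM_NAME
  code=$(az vm get-instance-view --resource-group "$1" --name "$2" \
    --query "instanceView.statuses[?starts_with(code, 'PowerState/')].code | [0]" \
    --output tsv)
  case "$code" in
    PowerState/deallocated) echo "deallocated: compute billing stopped (disks still billed)" ;;
    # Both an OS shutdown and 'az vm stop' end up in this state.
    PowerState/stopped)     echo "stopped but still allocated: compute still billed" ;;
    PowerState/running)     echo "running: compute billed" ;;
    *)                      echo "other: ${code:-unknown}" ;;
  esac
}
```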

How do Availability Zones protect my Azure VM and what uptime SLA do they provide?

Azure Availability Zones are physically separate groups of datacenters within a single Azure region, each with independent power, cooling, and networking infrastructure. When you deploy two or more Azure virtual machine instances across two or more Availability Zones in the same region, Azure's SLA guarantees VM connectivity to at least one instance 99.99% of the time. That's roughly 52 minutes of maximum downtime per year. Compare this to a single VM with no redundancy, which carries a 99.9% SLA (about 8.7 hours per year) only when all of its disks are Premium SSD or Ultra Disk; single VMs on Standard SSD or Standard HDD get lower SLAs. For anything running production workloads, deploying across Availability Zones is the minimum acceptable configuration; use Virtual Machine Scale Sets with zone spreading if you also need automatic scaling.
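The downtime figures follow from simple arithmetic: allowed downtime = (100 − SLA%) / 100 × minutes per year. A small helper that does the calculation for a 365-day year (sla_downtime is a hypothetical name):

```shell
# Maximum downtime per 365-day year permitted by an SLA percentage.
sla_downtime() {  # usage: sla_downtime PERCENT
  awk -v p="$1" 'BEGIN {
    mins = (100 - p) / 100 * 365 * 24 * 60   # minutes of allowed downtime
    printf "%.1f min (%.2f h)\n", mins, mins / 60
  }'
}
```

sla_downtime 99.99 prints 52.6 min (0.88 h) and sla_downtime 99.9 prints 525.6 min (8.76 h).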

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.