Azure Virtual Machines: Fix Setup & Config Errors
Why Azure Virtual Machine Problems Are So Common
I've helped dozens of teams spin up their first Azure virtual machine, and the story is almost always the same. Everything looks fine in the portal. You click Create, you wait through the deployment spinner, and then nothing. Just a cryptic error message that doesn't tell you what went wrong. Or worse, the VM deploys successfully but you can't connect to it, can't access your app, or the costs come back three times what you expected.
Azure VMs are genuinely powerful. They give you on-demand, scalable computing without the capital expense of owning physical hardware: you get the flexibility of virtualization without racking servers yourself. But that power comes with real complexity. Before you even click Create, Microsoft expects you to have already made decisions about your resource names, the geographic region where your VM will live, the right size for your workload, which operating system you need, how many VMs you might eventually run, how your VM will start up and what configuration it needs, and which supporting resources (virtual networks, NICs, IP addresses, NSGs, and disks) will sit alongside it.
Most guides skip straight to "click here, type that." This one doesn't. Because the real reason Azure VM setup fails, or costs you money you didn't expect, is almost always a planning decision that was skipped at the start, not a button you clicked wrong. I'm going to walk you through exactly what those decisions are, where they go wrong, and how to fix it when things break.
The most common pain points I see with Azure virtual machine deployment and configuration issues are: allocation failures when a region runs out of capacity for a specific VM size, network security group rules blocking SSH or RDP access, disk configuration errors where data gets placed on the OS disk instead of a separate data disk, VM size mismatches that cause performance problems or throttling, and availability zone misconfigurations that leave workloads exposed to single points of failure.
I know this is frustrating, especially when you're trying to get a dev environment up fast or when a production deployment is blocked.
The Quick Fix: Try This First
If your Azure VM won't deploy and you're seeing an AllocationFailed error, the fastest fix is almost always to change your target region or availability zone. This happens because Microsoft's datacenters have finite capacity, and specific VM sizes can be temporarily exhausted in a given region. It's not a problem with your account or your configuration; it's a supply issue on Azure's side.
Here's what you do. In the Azure portal, go to Virtual machines → Create → Azure virtual machine. On the Basics tab, find the Region dropdown. Instead of your original choice, try a nearby region. For example, if you selected East US, try East US 2 or Central US. Your data requirements may limit where you can go, so keep compliance in mind, but for dev/test environments this usually works immediately.
If changing the region isn't an option, the second fastest fix is to change the VM size series. If you requested a Dasv7-series VM and got an AllocationFailed error, try the equivalent Dsv6-series size with the same vCPU and memory profile. Microsoft frequently has more capacity available across different generation series in the same region.
For connectivity issues, specifically when you've deployed successfully but can't reach your VM, the fastest single fix is to check your Network Security Group rules. In the portal, navigate to your VM → Networking → Network settings. Look at your inbound port rules. If you're trying to SSH into a Linux VM, you need port 22 open. For RDP into a Windows VM, you need port 3389. If those rules are missing or set to Deny, that's your problem. Add an inbound rule with the destination port set to 22 or 3389, protocol TCP, action Allow, and a priority number lower than any Deny rule that matches the same traffic. Priorities range from 100 to 4096, and lower numbers are evaluated first. Where you can, restrict the source to your own IP address rather than Any.
Choosing a VM size is the decision that affects everything downstream: your monthly bill, your performance, and whether your app runs or crawls. Azure VM size determines processing power, memory, storage capacity, and network bandwidth all at once. Getting this wrong is expensive to undo.
Microsoft charges by the hour based on VM size and operating system. For partial hours, you only pay for the minutes used, but that hourly rate can swing wildly between size families. The Dasv7 series is Microsoft's current general-purpose option optimized for most production workloads. For CPU-heavy work such as video processing, large-scale batch jobs, or simulation, look at the Fasv7 series (compute optimized). For memory-heavy workloads like databases or in-memory caching, pick a memory-optimized size (the E-series families) with more RAM per vCPU rather than adding vCPUs.
To check available sizes for your target region, run this in Azure CLI:
az vm list-sizes --location eastus --output table
Or in PowerShell:
Get-AzVMSize -Location "eastus" | Format-Table Name, NumberOfCores, MemoryInMB, MaxDataDiskCount
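The output shows which sizes are offered in that region before you try to deploy, so you don't request a size that doesn't exist there. It does not show live capacity, so an AllocationFailed error is still possible for a listed size. To also see sizes that are restricted for your subscription, run az vm list-skus --location eastus --all --output table and check the Restrictions column.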
When you pick a size, also check the MaxDataDiskCount column. This limits how many data disks you can attach later. If you're building something that will grow, say, a database server that will need more storage over time, pick a size with headroom on that number from day one. You cannot change the max data disk count without resizing the entire VM.
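To make the headroom check concrete, here is a minimal sketch that filters a size listing by its data disk limit. The helper name sizes_with_disk_headroom and the three-column input (name, cores, max data disks) are assumptions for illustration, and the sample rows are made up rather than real Azure sizes.

```shell
# sizes_with_disk_headroom MIN
# Hypothetical helper: reads "name cores maxDataDiskCount" rows on stdin
# and prints the names of sizes that allow at least MIN data disks.
sizes_with_disk_headroom() {
  awk -v min="$1" '$3 + 0 >= min + 0 { print $1 }'
}

# Made-up sample rows standing in for real CLI output.
printf '%s\n' 'Size_Small 2 4' 'Size_Medium 4 8' 'Size_Large 8 16' |
  sizes_with_disk_headroom 8
# prints Size_Medium and Size_Large, one per line
```

Against a live subscription, the same filter could be fed from az vm list-sizes --location eastus --query "[].[name, numberOfCores, maxDataDiskCount]" --output tsv, assuming those field names match the JSON your CLI version returns.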
If this step works, you'll see your size appear in the Size selection screen inside the portal under Virtual machines → Create → Basics → Size → See all sizes without any grayed-out unavailability notices.
One of the most common Azure virtual machine configuration mistakes I see is people creating resources across multiple regions without realizing it. Your VM, your virtual network, your OS disk, and your NSG all need to live in the same region. If they don't, you either get a deployment error or you get unexpected data-transfer charges as traffic crosses regional boundaries.
When you create a VM, Azure calls the region you choose its location. The location specifies where your virtual hard disks are physically stored, which matters for both performance (latency to users) and compliance (data residency laws in the EU, healthcare regulations in the US, etc.).
To list all available locations in Azure CLI:
az account list-locations --output table
In PowerShell:
Get-AzLocation | Select-Object DisplayName, Location | Format-Table
Create your resource group in the same region as your VM. This keeps everything together and makes cleanup easier when you're done. In the portal: Resource groups → Create → select your Subscription → enter a Resource group name → choose your Region → Review + create.
Name your resources with a consistent convention from the start. Azure resource names can't be changed after creation for most resource types. A workable naming pattern is: [project]-[environment]-[resource-type]-[region]. For example: myapp-prod-vm-eastus. This matters more than it sounds when you're managing dozens of VMs and trying to find things in the portal at 2 AM during an incident.
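One way to keep that convention consistent is to generate names rather than type them. The sketch below assumes a hypothetical helper, build_resource_name, and an assumed house rule that names contain only lowercase letters, digits, and hyphens; actual Azure naming limits vary by resource type, so treat the validation as illustrative.

```shell
# build_resource_name PROJECT ENVIRONMENT RESOURCE_TYPE REGION
# Hypothetical helper (not an Azure CLI command) that joins the four parts
# with hyphens and lowercases the result.
build_resource_name() {
  name="$(printf '%s-%s-%s-%s' "$1" "$2" "$3" "$4" | tr '[:upper:]' '[:lower:]')"
  # Assumed house rule: only lowercase letters, digits, and hyphens.
  # Real Azure naming limits differ per resource type.
  case "$name" in
    *[!a-z0-9-]*)
      echo "invalid resource name: $name" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$name"
}

build_resource_name myapp prod vm eastus   # prints myapp-prod-vm-eastus
```

Because the helper returns a non-zero status for names that break the rule, a deployment script can stop before it creates a resource whose name can't be fixed later.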
If this step is done right, all your resources will show up under a single resource group, and you won't see cross-region charges on your bill.
The NSG is what controls which traffic can reach your Azure virtual machine and which gets blocked. There are no additional charges for NSGs in Azure, but a misconfigured NSG will make your VM completely unreachable, which is exactly what happens to most people on their first deployment.
Every VM gets a virtual network interface card (NIC) that connects it to a virtual network. An NSG can be associated with that NIC, with its subnet, or with both, and it applies port rules to traffic before it reaches the VM. When you create a VM through the portal and select "Allow selected ports," Azure creates an NSG rule for the port you chose. Problems start when people deploy through CLI or ARM templates with an NSG that has no explicit inbound rules: the NSG's built-in default rules deny all inbound traffic from the internet, so you end up locked out.
To check your current NSG rules for a specific VM in Azure CLI:
az network nsg list --resource-group myapp-prod-rg --output table
az network nsg rule list --nsg-name myapp-nsg --resource-group myapp-prod-rg --output table
To add an SSH rule if it's missing:
az network nsg rule create \
  --resource-group myapp-prod-rg \
  --nsg-name myapp-nsg \
  --name AllowSSH \
  --direction Inbound \
  --protocol tcp \
  --priority 1000 \
  --destination-port-range 22 \
  --access Allow
For RDP (Windows VMs), change --destination-port-range 22 to 3389.
Keep in mind the NIC count limit. How many NICs you can attach to a VM is determined by the VM's size. This is why sizing matters beyond just CPU and RAM. If you need multiple NICs for network segmentation or dual-homed configurations, check the NIC limit for your chosen size before deploying. You cannot add NICs beyond the VM's maximum without resizing first.
When this is configured correctly, you can SSH or RDP into your VM within seconds of it booting, with no timeouts and no connection refused errors.
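The rule about lower priority numbers winning is easier to see with a small model. The sketch below is a simplification under stated assumptions: first_matching_rule is a hypothetical helper, rules are plain "priority access port" rows, only single ports and "*" are handled, and the fallback of Deny stands in for the built-in rule that blocks inbound internet traffic.

```shell
# first_matching_rule PORT
# Hypothetical helper: reads "priority access port" rows on stdin and prints
# the access (Allow or Deny) of the matching rule with the lowest priority number.
# Simplification: handles single ports and "*" only, and falls back to Deny
# the way the built-in DenyAllInBound rule does for internet traffic.
first_matching_rule() {
  sort -n | awk -v port="$1" '
    $3 == port || $3 == "*" { print $2; found = 1; exit }
    END { if (!found) print "Deny" }
  '
}

# An Allow at priority 1000 loses to a Deny at priority 200 for the same port.
printf '%s\n' '1000 Allow 22' '200 Deny 22' | first_matching_rule 22   # prints Deny
```

This is why an Allow rule that looks correct can still leave you locked out: a Deny with a lower number for the same port is evaluated first, and evaluation stops at the first match.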
Keeping data off the OS disk is one of those best practices that sounds optional until you have a VM fail and lose everything. Microsoft's own documentation is direct about it: keep your data on a separate disk from your operating system. The reason is practical, and I've seen it matter in real incidents: if your VM fails and becomes unbootable, you can detach the data disk and attach it to a new VM. If your data was on the OS disk, recovering it is much harder, and it's gone if the OS disk is deleted along with the VM.
Here's how Azure billing works for disks. Your VM gets an OS disk (usually 127 GiB, though some images are smaller) and, on most sizes, a local temporary disk. Azure doesn't charge for temporary disk storage, but its contents are not durable: data there is lost when the VM is deallocated, resized, or moved to a different host, and you should not count on it surviving anything beyond a normal reboot. Do not store anything there that you need to keep. The OS disk is charged at the regular rate for managed disks.
For data storage, attach a Premium SSD or Standard HDD managed disk separately. Premium SSD (P-series) disks are significantly faster and are the right call for databases, app data, or anything latency-sensitive. Standard HDD is cheaper and fine for backups or archival data. You can compare prices on the Azure Managed Disks pricing page or in the Azure Pricing Calculator.
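To add a data disk to an existing VM in the portal: navigate to your VM → Disks → Create and attach a new disk. Set the disk type, size, and name. After attaching, you need to partition, format, and mount it inside the VM itself, because Azure attaches it as a raw disk. On Linux:
# List block devices; /dev/sdc is only an example, so confirm the new disk's name here
lsblk
# Create a partition (interactive: n, p, accept the defaults, then w)
sudo fdisk /dev/sdc
sudo mkfs.ext4 /dev/sdc1
sudo mkdir -p /mnt/data
sudo mount /dev/sdc1 /mnt/data
# Add an /etc/fstab entry (by UUID) if the mount should survive reboots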
On Windows, use Disk Management (diskmgmt.msc) to initialize and format the new disk after it appears in the VM.
If this is done right, you'll see two separate disk entries under your VM's Disks blade: one tagged as OS disk, one as data disk.
If your Azure VM hosts anything that matters (a production app, a customer-facing API, a database), you need to think about availability zones before you deploy, not after. This is one of those configuration decisions that cannot be easily changed after the fact.
Availability Zones are physically separated zones within an Azure region. They have independent power, cooling, and networking. Microsoft guarantees VM connectivity to at least one instance 99.99% of the time when you have two or more instances deployed across two or more Availability Zones in the same Azure region. The published SLA is lower for other layouts: 99.95% for two or more instances in an availability set, and 99.9% for a single instance that uses Premium SSD or Ultra Disk for all of its disks.
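To put those percentages in terms of downtime, here is a small arithmetic sketch. The helper name allowed_downtime_minutes is hypothetical, the 30-day month is an assumption, and 99.9 is used only as an example of a lower single-instance figure for comparison.

```shell
# allowed_downtime_minutes SLA_PERCENT
# Hypothetical helper: converts an SLA percentage into the downtime it
# permits in a 30-day month (43,200 minutes).
allowed_downtime_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.2f\n", (100 - sla) / 100 * 30 * 24 * 60 }'
}

allowed_downtime_minutes 99.99   # prints 4.32
allowed_downtime_minutes 99.9    # prints 43.20
```

Seen this way, the jump from three nines to four is the difference between roughly 43 minutes and roughly 4 minutes of permitted downtime per month.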
For workloads that need to scale horizontally (multiple identical VM instances handling load), Azure Virtual Machine Scale Sets are the right tool. Scale Sets let you create a group of load-balanced VMs where the instance count can increase or decrease automatically based on demand or a defined schedule. You can deploy Scale Set VMs across multiple availability zones, a single zone, or regionally.
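To check if your target region supports Availability Zones, list the VM sizes that can be zone-pinned there; the Zones column is empty when none can. The second command lists regions in the Recommended category, which is the category Microsoft designates for availability zone support:
az vm list-skus --location eastus --zone --resource-type virtualMachines --output table
az account list-locations --query "[?metadata.regionCategory=='Recommended']" --output table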
When creating a VM with zone pinning in CLI:
az vm create \
  --resource-group myapp-prod-rg \
  --name myapp-vm-zone1 \
  --image Ubuntu2204 \
  --zone 1 \
  --size Standard_D4as_v5 \
  --admin-username azureuser \
  --generate-ssh-keys
Deploy a second instance with --zone 2 and put them behind an Azure Load Balancer to achieve the 99.99% SLA. If you skip this and run a single-zone VM, you're exposed to the entire zone going down, which does happen, just rarely.
When this is right, you'll see the availability zone listed on your VM's Overview blade, showing which zone it's pinned to.
Advanced Troubleshooting for Azure Virtual Machine Errors
Diagnosing Azure VM Deployment Failures
When a VM deployment fails in the portal, the error dialog is usually too vague to act on. Go straight to the source. In the portal, navigate to your Resource Group → Deployments. Click the failed deployment. You'll see a detailed operation log with the actual error code and message. The two most common deployment error codes in my experience are:
- AllocationFailed: Azure cannot find capacity for your requested VM size in the target zone/region. Fix: change region, change availability zone, or change VM size series as covered above.
- OperationNotAllowed: this usually means you've hit a quota limit. Each Azure subscription has regional vCPU quotas per VM family, and the error message names the one you exceeded. Fix: go to Subscriptions → [your subscription] → Usage + quotas and request a quota increase. Increases for standard compute quotas are often approved within a few hours.
Trusted Launch and Generation 2 VM Issues
If you're creating new Generation 2 VMs, be aware that Microsoft has been rolling out Trusted Launch as default (TLaD) in preview. With this enabled, new Gen 2 VMs default to Trusted Launch with Secure Boot and vTPM enabled. If you're deploying older OS images or custom images that don't support Secure Boot, your VM may fail to boot. In the portal, you can avoid this during VM creation on the Basics tab: set Security type to Standard instead of Trusted launch, or keep Trusted launch and clear the Secure Boot option under Configure security features. If you want to participate in the TLaD preview and test these defaults before they become mandatory, you need to explicitly register for the preview through the Azure portal under Preview features.
SSH Key Authentication Problems
Azure can create and store public/private SSH key pairs for you automatically during VM creation. The public key goes into your VM; you keep the private key for SSH access. If the connection is refused even with the right NSG rules, the most common culprits are:
- Wrong file permissions on your private key file. SSH refuses to use keys with open permissions. Run chmod 600 ~/.ssh/mykey.pem on Linux/Mac.
- The wrong username. The admin username is set when the VM is created; the portal suggests azureuser by default unless you specified something else during creation.
- The VM is in a Stopped (deallocated) state, so it isn't running and any dynamic public IP has been released. Start it first.
If you've lost your private key, you can install a new public key from the VM's Reset password blade in the portal by choosing the SSH public key option.
Cost Troubleshooting: When Your Bill Is Higher Than Expected
Azure bills for VMs hourly (or by minute for partial hours) based on size and OS. But many people forget that storage is priced and charged separately. Your OS disk, data disks, public IP address, and data transfer egress all appear as separate line items. If you shut a VM down from inside the operating system, it goes to a Stopped state without being deallocated, and you continue to pay for the compute. To stop billing for compute, go to your VM in the portal and click Stop, which deallocates the VM; the status then reads Stopped (deallocated).
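The stopped-versus-deallocated distinction can be checked from a script as well as from the portal. The sketch below assumes a hypothetical helper, compute_billing_status, and assumes the power state strings are "VM stopped" and "VM deallocated", which is how the Azure CLI reports them in my experience; verify against your own output.

```shell
# compute_billing_status POWER_STATE
# Hypothetical helper: maps a VM power state string to whether compute is billed.
compute_billing_status() {
  case "$1" in
    "VM deallocated") echo "compute not billed" ;;
    "VM stopped")     echo "compute still billed: deallocate to stop charges" ;;
    *)                echo "compute billed" ;;
  esac
}

compute_billing_status "VM stopped"       # prints compute still billed: deallocate to stop charges
compute_billing_status "VM deallocated"   # prints compute not billed
```

The state string could come from az vm show --show-details --resource-group myapp-prod-rg --name myapp-vm-zone1 --query powerState --output tsv, and az vm deallocate with the same resource group and name releases the compute.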
Azure Hybrid Benefit can significantly reduce licensing costs if you have existing Windows Server or SQL Server licenses with Software Assurance. Enable it under your VM's Configuration blade → Azure Hybrid Benefit.
If you're hitting persistent AllocationFailed errors across multiple regions and sizes, you may have a subscription-level restriction that isn't visible in the portal. If quota increase requests are denied or taking more than 48 hours, or if your VM is unresponsive and Azure Diagnostics shows no actionable data, it's time to open a support ticket directly. Enterprise customers on Microsoft Unified Support (formerly Premier Support) get faster SLAs for VM availability issues. For all support options, visit Microsoft Support and choose Azure as the product area. Have your subscription ID and the failed deployment correlation ID ready. You can find the correlation ID in the failed deployment details in the portal.
Prevention & Best Practices for Azure Virtual Machines
The best Azure virtual machine is one that never needs troubleshooting. Most of the configuration errors and unexpected costs I've seen are avoidable with a small amount of upfront planning.
Start with the Well-Architected Framework. Microsoft publishes Azure Well-Architected Framework Virtual Machine considerations specifically to help you think through reliability, security, cost optimization, operational excellence, and performance efficiency before you deploy. It's not light reading, but the Virtual Machine section translates directly into deployment decisions. The same framework covers Disk Storage considerations separately, which is worth reading if you're architecting anything with significant storage requirements.
Use the Azure Quickstart templates from the Azure portal or GitHub. These are pre-validated ARM templates that give you a tested starting point for common scenarios such as web servers, database VMs, and dev environments. Starting from a Quickstart template rather than configuring everything by hand eliminates a whole class of configuration errors.
For teams managing multiple VMs, use Azure Virtual Machine Scale Sets from the start, even if you only need one instance today. Scale Sets give you the same management plane for one VM or a hundred, and they make adding availability zone coverage trivial later. Growing into a Scale Set after the fact is significantly harder than starting with one.
Tag every resource. In the portal, every VM, disk, NIC, and NSG can have key-value tags. Use tags like environment:production, owner:teamname, and project:myapp. Tags make cost attribution and bulk operations far easier when you're managing more than a handful of VMs. You can filter Cost Management reports by tag and create budget alerts per tag.
- Always set a budget alert in Azure Cost Management before deploying a new VM. Set the threshold at 80% of expected monthly cost so you get a warning before you're over budget.
- Enable Azure Update Manager (the successor to Azure Automation Update Management) at deployment time for automatic OS patching. Manual patching is the most skipped maintenance task and the one that gets exploited most often.
- Use Availability Zones for any VM that will run for more than a week. The 99.99% SLA versus a single-instance SLA is a meaningful difference for anything business-critical.
- Store SSH keys in Azure Key Vault rather than on your local machine. When a team member leaves or a laptop gets lost, you can rotate access without rebuilding the VM.
Frequently Asked Questions
What do I need to think about before creating an Azure virtual machine?
Before you click Create, you need firm answers to seven things: what you're naming all the resources (names can't be changed later for most resource types), which region/location will store your VM's disks, what VM size fits your workload's CPU/memory/storage/network requirements, whether you anticipate needing multiple VMs and how they'll scale, which operating system the VM needs to run, how the VM will be configured after it starts (startup scripts, extensions, cloud-init), and which supporting resources it needs (virtual network, NIC, NSG, IP addresses, and disks). Skipping even one of these up front is typically what causes problems you'll spend hours debugging later. The Azure Well-Architected Framework Virtual Machine considerations page is a solid checklist for production deployments.
Why does my Azure VM deployment keep failing with AllocationFailed?
AllocationFailed means Azure couldn't find capacity for your requested VM size in the region or availability zone you selected. This is a supply constraint on Azure's side, not a problem with your account or configuration. Your fastest options are: switch to a different region (try East US 2 if East US is failing), switch to a different availability zone within the same region, or try a different VM size series with equivalent specs, such as Dsv6 instead of Dasv7. If you need a specific region and size combination for compliance or latency reasons, open a support request. Microsoft can sometimes arrange capacity ahead of time for planned workloads.
I deployed my Azure VM successfully but I can't connect to it. What's wrong?
Nine times out of ten, this is a Network Security Group rule issue. In the portal, go to your VM → Networking → Network settings and check your inbound port rules. For Linux VMs, TCP port 22 must be allowed. For Windows VMs, TCP port 3389 must be allowed. If the rules are there but you're still blocked, check whether the source IP is correctly set. If it's set to your specific IP address and your IP has changed (common with home ISPs), update it. Also verify the VM is in a Running state and not Stopped: a Stopped (deallocated) VM loses its public IP unless you're using a static IP address reservation.
How much does an Azure virtual machine actually cost per month?
Azure charges an hourly price based on your VM's size and operating system. For partial hours, you only pay for the minutes used. But the VM compute cost is just one line item; storage is priced and charged completely separately. You're also paying for your OS disk (charged at the managed disk rate), any data disks you attach, a public IP address if you're using one, and outbound data transfer over a certain threshold. The Azure Hybrid Benefit can meaningfully cut costs if you have existing Windows Server or SQL Server licenses with Software Assurance. Use the Azure Pricing Calculator and the Virtual machines selector tool on the Azure website to get a full cost estimate before deploying, not after.
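As a rough illustration of how those line items add up, here is a minimal arithmetic sketch. The helper name estimate_monthly_cost is hypothetical, and every price in the example is a placeholder rather than a real Azure rate; substitute figures from the pricing calculator.

```shell
# estimate_monthly_cost HOURLY_RATE HOURS DISK_MONTHLY OTHER_MONTHLY
# Hypothetical helper: compute (rate x hours) plus separately billed items.
estimate_monthly_cost() {
  awk -v rate="$1" -v hours="$2" -v disks="$3" -v other="$4" \
    'BEGIN { printf "%.2f\n", rate * hours + disks + other }'
}

# Placeholder prices, not real Azure rates: 0.20/hour for 730 hours,
# 25.00 for disks, 5.00 for public IP and egress.
estimate_monthly_cost 0.20 730 25 5   # prints 176.00
```

Passing a smaller hours value shows the effect of deallocating a dev VM outside working hours, while the disk and IP terms stay the same because they are billed whether or not the VM is running.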
What's the difference between stopping and deallocating an Azure VM?
This one catches a lot of people off guard. If you "Stop" a VM from inside the operating system (like running shutdown now on Linux), the VM transitions to a Stopped state but Azure still charges you for the compute because the underlying hardware is still reserved for your VM. To actually stop billing for compute, you need to Deallocate the VM: in the portal, click the Stop button at the top of the VM blade, and Azure will ask you to confirm. When deallocated, you no longer pay for the VM's compute, though you still pay for the OS disk and any other attached storage. Note: a deallocated VM loses its dynamic public IP address, so if you need a consistent IP, assign a static (reserved) public IP before stopping.
How do I achieve 99.99% uptime SLA with Azure Virtual Machines?
Microsoft's 99.99% SLA for Azure VMs requires two specific conditions: you need two or more VM instances, and they must be deployed across two or more Availability Zones within the same Azure region. A single VM, even in an availability zone, only gets a lower SLA. The practical way to meet this for most apps is to deploy your VMs inside a Virtual Machine Scale Set with zone-redundant deployment and put them behind an Azure Load Balancer. This also gives you automatic instance scaling based on demand or schedule. For more on the exact SLA terms, check the SLA for Azure Virtual Machines page in Microsoft's official service descriptions.