How to Troubleshoot Azure Windows Virtual Machines
Why This Is Happening
I've seen this exact scenario play out more times than I can count: you spin up an Azure Windows VM, or you go to connect to one that was running fine yesterday, and suddenly you're staring at a spinning wheel, a cryptic error code, or worse, a completely blank screen in the Azure portal. Your work is blocked. Maybe a production workload is down. The pressure is on, and Microsoft's generic error messages aren't telling you anything useful.
Azure Virtual Machines running Windows can fail in a surprisingly wide variety of ways, and the root cause isn't always obvious. The Azure platform adds a layer of abstraction between you and the hardware, so when something goes wrong, you're troubleshooting both the Windows OS layer and the cloud infrastructure layer simultaneously. That's what makes Azure VM Windows troubleshooting genuinely tricky, even for experienced engineers.
Here are the most common reasons I see Azure Windows VMs misbehave:
- RDP connection refused (error 0x00000104, "Connection Failed"): The Network Security Group (NSG) is blocking port 3389, or the Remote Desktop service inside the VM stopped running.
- VM stuck in "Stopping" or "Starting" state: A failed Azure fabric update or a Windows shutdown hang left the VM in a limbo state that the portal can't resolve automatically.
- Boot failure / BSOD (Stop codes 0x0000007B, 0x000000EF, 0xC000000E): A Windows Update corrupted a boot-critical driver, or the OS disk's BCD store is damaged.
- High CPU / memory causing the VM to be unresponsive: A runaway process is consuming all resources, and because RDP itself is starved, you can't even get in to kill it.
- Disk I/O errors or disk full conditions: Azure managed disk throttling, a nearly full OS disk, or a corrupted file system.
- Windows activation failures (error 0xC004F074): The VM can't reach the Azure Key Management Service (KMS) endpoint azkms.core.windows.net on port 1688.
I know this is frustrating, especially when it blocks your work or affects your users. The good news is that Azure gives you powerful out-of-band tools like Boot Diagnostics, Serial Console, and VM Repair that let you fix Windows problems even when you can't get a normal RDP session. This guide walks you through all of them, in the order I'd personally attack the problem.
The Quick Fix: Try This First
Before you dive into Boot Diagnostics or disk repairs, do this first. It sounds too simple, but it resolves roughly 40% of Azure Windows VM problems I encounter, especially "VM won't start" and RDP failures.
Redeploy the VM from the Azure portal. This moves your VM to a different physical host node in the Azure datacenter while keeping your OS disk, data disks, and configuration. Two things do not survive a redeploy: anything stored on the temporary disk is lost, and dynamic IP addresses can change. Azure fabric issues (node failures, hypervisor bugs, corrupt VM agent state) tend to vanish when you move to fresh hardware.
- Open the Azure portal at portal.azure.com.
- Navigate to Virtual Machines → click your VM name.
- In the left sidebar, scroll to the Help section and click Redeploy + Reapply.
- Click the Redeploy button. The VM will go through a stop/deallocate/restart cycle on a new host. This takes 5–15 minutes.
While that's running, also check your Network Security Group. Many "RDP not working" calls I get are simply a misconfigured NSG. Go to your VM → Networking → Inbound port rules. You need an inbound rule allowing TCP port 3389 from your source IP (or from Any if this is a dev/test environment). NSG rules are processed in ascending priority-number order, so a lower number wins. If the allow rule is missing, or has a higher priority number than a Deny rule that matches the same traffic, RDP will never reach the VM, no matter how healthy Windows is inside.
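To make the priority logic concrete, here is a minimal Python sketch of how inbound rules are evaluated. It is a simplified model, not Azure's implementation: it matches on destination port only and ignores source, protocol, and the other default rules. The `Rule` class and `evaluate_inbound` function are names I made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int          # lower number = evaluated first
    port: Optional[int]    # None stands for "any port"
    access: str            # "Allow" or "Deny"


def evaluate_inbound(rules: List[Rule], port: int) -> Tuple[str, str]:
    """Return (access, rule name) of the first rule that matches the port."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.access, rule.name
    # Nothing matched: fall through to the built-in DenyAllInBound default rule.
    return "Deny", "DenyAllInBound"


# The deny rule has the lower number, so it is hit before the RDP allow rule.
broken = [
    Rule("DenyAll", priority=200, port=None, access="Deny"),
    Rule("AllowRDP", priority=300, port=3389, access="Allow"),
]
print(evaluate_inbound(broken, 3389))  # ('Deny', 'DenyAll')

# Giving the allow rule a lower number than the deny rule fixes it.
fixed = [
    Rule("DenyAll", priority=200, port=None, access="Deny"),
    Rule("AllowRDP", priority=100, port=3389, access="Allow"),
]
print(evaluate_inbound(fixed, 3389))   # ('Allow', 'AllowRDP')
```

The first matching rule ends the evaluation, which is why an allow rule sitting "behind" a broad deny never gets a chance to fire.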
If redeployment doesn't fix it, also try Reset Remote Access: VM → Help → Reset password → switch the mode dropdown to Reset configuration only → click Update. This resets the Remote Desktop Services configuration inside Windows without touching your password or data. I've seen this bring back dozens of VMs where RDP just inexplicably stopped accepting connections after a Windows Update.
Step 1: Check Boot Diagnostics
If the quick fix didn't work, the next thing I always do is pull up Boot Diagnostics. This is Azure's built-in screen capture of the VM's display output, essentially a screenshot of what you'd see if you had a physical monitor plugged in. It tells you immediately whether Windows is booting, stuck on a BSOD, or showing a login screen that just isn't reachable via RDP.
Navigate to your VM in the Azure portal → Help → Boot diagnostics → click Screenshot. Wait a few seconds for the screenshot to refresh. What you see will tell you which path to take:
- Windows login screen visible: Windows booted fine. Your problem is network/RDP-level, not OS-level. Jump to Step 2.
- Blue screen with a stop code (e.g., CRITICAL_PROCESS_DIED 0x000000EF or INACCESSIBLE_BOOT_DEVICE 0x0000007B): Windows is crashing before RDP starts. Jump to Step 3.
- Black screen or "Preparing Automatic Repair": Windows Recovery Environment (WinRE) kicked in. This usually means a bad Windows Update or corrupted boot files. Jump to Step 3 as well, since the repair VM workflow is where boot files get fixed.
- Loading screen spinning indefinitely: Windows is hung during startup, or the Azure Guest Agent isn't responding. Try Stop (Deallocate) and Start again before proceeding.
You can also open the serial log: on the Boot Diagnostics blade, click Serial log. This shows the raw text output of the Windows boot sequence, including driver loading messages and Windows Error Recovery entries. Look for lines containing BOOTMGR, winload, or disk error to pinpoint where the boot is failing.
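Serial logs can run long, so if you save one to a text file it helps to filter it. The sketch below is a small Python helper that searches for the three markers mentioned above, case-insensitively. The function name and the sample text are hypothetical; the sample is not real Azure serial log output.

```python
import re

# Markers named in the text above; matching is case-insensitive.
BOOT_MARKERS = ("BOOTMGR", "winload", "disk error")
_PATTERN = re.compile("|".join(re.escape(m) for m in BOOT_MARKERS), re.IGNORECASE)


def find_boot_markers(log_text):
    """Return (line_number, line) pairs for lines that mention a boot marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if _PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up sample text for illustration only.
    sample = "loading drivers\nwinload.exe failed to load\nA disk error occurred\n"
    for number, line in find_boot_markers(sample):
        print(f"{number}: {line}")
```

The line numbers make it easy to jump back into the full log and read what happened just before the first hit.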
If boot diagnostics is disabled on your VM, you'll see a message asking you to enable it. Enable it, then stop (deallocate) and restart the VM so it can capture the next boot cycle.
Step 2: Connect Through the Serial Console
If the Boot Diagnostics screenshot shows a Windows login screen but you still can't RDP in, Azure Serial Console is your best friend. It gives you a direct keyboard connection to the VM's serial port, with no network path to the VM required (it does need Boot Diagnostics enabled). You can run PowerShell commands, fix NSG-independent Windows Firewall rules, restart services, and more, all without an RDP session.
To access it: VM → Help → Serial console. Once you're at the Special Administration Console (SAC) prompt, type cmd and press Enter to create a command channel, then type ch -si 1 and press Enter to switch to it. Press Enter again and log in with your VM administrator credentials when prompted. That gives you a CMD session inside the VM.
Once inside, run these diagnostic commands:
:: Check if RDP service is running
sc query TermService
:: Check Windows Firewall status
netsh advfirewall show allprofiles
:: Verify RDP is enabled in the registry
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections
:: Check which IPs are allowed by Windows Firewall for RDP
netsh advfirewall firewall show rule name="Remote Desktop - User Mode (TCP-In)"
If fDenyTSConnections returns 0x1, RDP is disabled at the OS level. Fix it:
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections /t REG_DWORD /d 0 /f
net start TermService
If the Windows Firewall RDP rule is missing, add it back:
netsh advfirewall firewall add rule name="Remote Desktop" protocol=TCP dir=in localport=3389 action=allow
After running these, try RDP again from your local machine. If it works, you should see the Windows login screen appear normally in your RDP client.
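Before opening the RDP client, you can confirm from your local machine that the port is reachable at all. This is a minimal Python sketch using only the standard library. The function name is my own, and the IP address in the example is a documentation placeholder that you would replace with your VM's public IP.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only placeholder address.
    print(port_is_open("203.0.113.10", 3389, timeout=3.0))
```

A True result only proves that something accepted the TCP handshake on 3389. A False result points back at the NSG, the Windows Firewall, or the Remote Desktop service.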
Step 3: Repair the OS Disk with a Repair VM
This is the big one. If Boot Diagnostics showed a BSOD stop code, particularly 0x0000007B INACCESSIBLE_BOOT_DEVICE, 0x000000EF CRITICAL_PROCESS_DIED, or 0xC000000E, Windows itself is broken. You can't fix it from inside the VM because Windows can't even start. You need to attach the OS disk to a healthy "repair VM" as a data disk, fix the problem there, then reattach it.
Microsoft provides an Azure CLI extension that automates most of this. Install it first:
az extension add --name vm-repair
Then create the repair VM:
az vm repair create \
--resource-group MyResourceGroup \
--name MyBrokenVM \
--repair-username repairadmin \
--repair-password "YourSecureP@ssw0rd!" \
--verbose
This creates a new Windows Server VM, makes a copy of the broken VM's OS disk, and attaches that copy to the repair VM as a data disk. The drive letter varies: D: is normally the temporary disk on Azure Windows VMs, so the attached disk often shows up as F: or G:. The commands below use D: as a placeholder. RDP into the repair VM using the credentials you specified above.
Once inside the repair VM, open an elevated PowerShell prompt and run the following to fix the most common boot issue, a corrupted BCD store:
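# List volumes to find the drive letter of the attached OS disk
Get-Volume | Sort-Object DriveLetter | Format-Table DriveLetter, FileSystemLabel, Size
# Rebuild the boot files and BCD store from the attached Windows folder.
# Replace D: with your actual drive letter. /s must name the volume that holds
# the \Boot folder, which can be a separate System Reserved partition.
bcdboot D:\Windows /s D: /f ALL
# Confirm the store now lists a Windows Boot Loader entry (Generation 1 path shown)
bcdedit /store D:\Boot\BCD /enum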
# Run CHKDSK on the attached OS volume
chkdsk D: /f /r
For 0x0000007B specifically (often caused by a Windows Update that left driver or servicing changes half-applied), you may also need to revert the pending update actions on the offline image:
dism /image:D:\ /cleanup-image /revertpendingactions
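After repairs, run az vm repair restore --resource-group MyResourceGroup --name MyBrokenVM --verbose. It swaps the repaired disk in as the broken VM's OS disk and offers to delete the repair VM and its resources.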
Step 4: Recover an Unresponsive VM (High CPU or Memory)
If your Azure Windows VM is alive but completely unresponsive (RDP times out, Serial Console is sluggish), the culprit is almost always CPU or memory exhaustion. A runaway process is starving everything else, including the Azure Guest Agent that normally lets you manage the VM.
First, check Azure Monitor to confirm. In the portal: VM → Monitoring → Metrics. Add the metrics Percentage CPU and Available Memory Bytes. If CPU is pegged at 95–100% and memory is near zero, you have a resource exhaustion problem.
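If you export those two metrics, for example as CSV from the Metrics blade, a few lines of Python can apply the same rule of thumb across a whole time window instead of eyeballing the chart. This is a sketch under stated assumptions: the 95% CPU threshold comes from the paragraph above, while the 200 MiB memory floor and the "80% of samples" rule are arbitrary values I picked for illustration.

```python
def is_resource_exhausted(cpu_percent, available_memory_bytes,
                          cpu_threshold=95.0,
                          memory_floor_bytes=200 * 1024 ** 2,
                          min_fraction=0.8):
    """Flag CPU or memory exhaustion when most samples breach the threshold.

    cpu_percent: samples of the 'Percentage CPU' metric.
    available_memory_bytes: samples of the 'Available Memory Bytes' metric.
    """
    if not cpu_percent or not available_memory_bytes:
        raise ValueError("need at least one sample of each metric")
    hot = sum(1 for v in cpu_percent if v >= cpu_threshold)
    low = sum(1 for v in available_memory_bytes if v <= memory_floor_bytes)
    return {
        "cpu": hot / len(cpu_percent) >= min_fraction,
        "memory": low / len(available_memory_bytes) >= min_fraction,
    }


if __name__ == "__main__":
    cpu = [99.0, 100.0, 97.5, 98.2, 96.0]
    memory = [4 * 1024 ** 3] * 5  # 4 GiB free in every sample
    print(is_resource_exhausted(cpu, memory))  # {'cpu': True, 'memory': False}
```

Requiring most samples to breach the threshold avoids flagging a single short spike as exhaustion.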
The fastest way to kill a process without an RDP session is Run Command: VM → Operations → Run command → select RunPowerShellScript. This executes PowerShell directly via the Azure Guest Agent, bypassing RDP entirely.
# List top 10 CPU-hungry processes
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, CPU, WorkingSet
# Kill a specific offending process (replace 'MsMpEng' with your process name)
Stop-Process -Name "MsMpEng" -Force
# Or by PID
Stop-Process -Id 4521 -Force
# Check if Windows Update is the cause (a common culprit)
Get-Service wuauserv | Select-Object Status
Stop-Service wuauserv -Force
If Windows Defender's MsMpEng.exe is the offender (very common on freshly deployed VMs), it's typically doing an initial full disk scan. You can schedule it for off-hours or exclude the noisy paths via Run Command:
Add-MpPreference -ExclusionPath "C:\ProgramData\SomeApp"
Set-MpPreference -ScanScheduleDay 8 # 8 = never run the scheduled scan (0 means every day); revert once the VM is stable
After the immediate crisis is resolved, resize the VM to a larger SKU if the workload genuinely needs more resources. In the portal: VM → Availability + scale → Size → pick a size with more vCPUs or RAM. Azure can resize most VMs without data loss, though it requires a restart.
Step 5: Fix Windows Activation and the Azure Guest Agent
Two problems I see constantly on Azure Windows VMs that people overlook: Windows activation failures and a broken Azure Guest Agent. Both are silent killers. The VM appears to run fine, but activation errors eventually trigger Windows' reduced functionality mode, and a broken Guest Agent means you lose access to Run Command, VM extensions, and password resets.
Fixing Windows Activation (error 0xC004F074): Azure uses its own KMS server. Your VM needs to reach azkms.core.windows.net on TCP port 1688. If you have a custom DNS server or restrictive NSG rules, this traffic may be blocked. Run this from inside the VM (via Serial Console or Run Command) to test:
# Test KMS connectivity
Test-NetConnection -ComputerName azkms.core.windows.net -Port 1688
# Force re-activation pointing at Azure KMS
cscript C:\Windows\System32\slmgr.vbs /skms azkms.core.windows.net:1688
cscript C:\Windows\System32\slmgr.vbs /ato
# Check current activation status
cscript C:\Windows\System32\slmgr.vbs /dli
Fixing the Azure Guest Agent: If the portal shows your VM as running but Run Command fails with "VM agent not ready," the Guest Agent service (WindowsAzureGuestAgent) has crashed or corrupted its state. Fix it via Run Command if it's partially working, or via Serial Console:
# Check Guest Agent service status
Get-Service WindowsAzureGuestAgent
Get-Service RdAgent
# Restart both services
Restart-Service WindowsAzureGuestAgent -Force
Restart-Service RdAgent -Force
# If services won't start, check for corrupt agent files
Get-Item "C:\WindowsAzure\Packages\GuestAgent\*"
If the Guest Agent is thoroughly broken, the cleanest fix is to reinstall it. Download the latest Windows VM Agent MSI (Microsoft publishes it on the Azure/WindowsVMAgent GitHub releases page) and run it silently from Serial Console. You can also redeploy the VM (as described in the Quick Fix section); that often clears Guest Agent state problems. After a successful Guest Agent restart, the portal should show the agent status as "Ready" with extensions available again.
Advanced Troubleshooting
If the steps above haven't resolved your Azure Windows VM issue, you're dealing with something deeper. Here's how I approach the harder cases.
Reading Event Viewer Logs via the Repair VM
When you can't boot into Windows, attach the OS disk to a repair VM (as described in Step 3) and read the event logs offline. From an elevated PowerShell prompt on the repair VM:
# Read the System event log file straight from the attached disk (replace D: as needed).
# Level 1 = Critical, Level 2 = Error.
Get-WinEvent -FilterHashtable @{
    Path  = 'D:\Windows\System32\winevt\Logs\System.evtx'
    Level = 1, 2
} -MaxEvents 50 |
    Format-List TimeCreated, Id, Message
Event IDs to pay special attention to: 41 (kernel power failure, unexpected restart), 6008 (unexpected shutdown), 7023 (service terminated with error), and 1074 (initiated system restart). These tell you what crashed and when.
Group Policy and Domain-Joined VMs
If your Azure VM is domain-joined, Group Policy can override local settings in ways that cause bizarre failures. I've seen GPOs disable RDP, enforce firewall rules that block critical ports, and even interfere with Azure's own agent communications. To check what GPOs are applied:
gpresult /h C:\Temp\gpreport.html /f
Start-Process "C:\Temp\gpreport.html"
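If a GPO is the culprit, work with your Active Directory team to either modify the policy or create an exception for the affected VM's OU. For emergency recovery, you can load the VM's SYSTEM registry hive offline from the repair VM and change machine-level settings there. The example below sets DisablePasswordChange, which stops the machine account password from rotating. Note that it does not remove the VM from the domain or stop GPOs from applying, and that ControlSet001 is usually, but not always, the active control set: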
reg load HKLM\TempSystem D:\Windows\System32\config\SYSTEM
reg add "HKLM\TempSystem\ControlSet001\Services\Netlogon\Parameters" /v DisablePasswordChange /t REG_DWORD /d 1 /f
reg unload HKLM\TempSystem
Network-Level Diagnosis with Azure Network Watcher
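For persistent connectivity problems to Azure Windows VMs, Azure Network Watcher is underused. In the portal: search Network Watcher → Connection troubleshoot. Set the source as your VM, the destination as another VM or an external IP, protocol TCP, and the port you care about. This tests connectivity outbound from the source VM and reports where it fails. For inbound RDP, use IP flow verify instead: pick the VM and its NIC, set the direction to Inbound, protocol TCP, local port 3389, and your client's IP as the remote address. It reports whether the traffic is allowed or denied and names the NSG rule responsible, including rules you may have forgotten exist on the subnet NSG versus the NIC NSG.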
Disk Throttling and Storage Errors
Azure managed disks have IOPS and throughput limits based on disk size and SKU, and the VM size imposes its own caps on top. If your workload exceeds these limits, I/O gets throttled and you'll see retry warnings in the System event log (event ID 153: "The IO operation at logical block address X was retried"). Check disk metrics in Azure Monitor (VM → Monitoring → Metrics). If you're hitting throttling, upgrade to a larger or faster disk (Premium SSD or Ultra Disk), or enable read-only host caching on read-heavy disks.
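As a rough illustration of the arithmetic, the Python sketch below compares observed IOPS and throughput against a disk's provisioned limits. The figures in `DISK_LIMITS` are the baseline Premium SSD numbers as I remember them from Microsoft's published table, so treat them as assumptions and verify them against the current documentation. The function name is hypothetical.

```python
# Assumed baseline limits per Premium SSD size: (max IOPS, max MB/s).
# Verify against the current Azure managed disk documentation before relying on them.
DISK_LIMITS = {
    "P10": (500, 100),
    "P20": (2300, 150),
    "P30": (5000, 200),
}


def throttling_report(sku, observed_iops, observed_mbps):
    """Compare observed peak I/O against the disk's provisioned limits."""
    max_iops, max_mbps = DISK_LIMITS[sku]
    return {
        "iops_utilization": observed_iops / max_iops,
        "throughput_utilization": observed_mbps / max_mbps,
        # Sitting at or above either limit suggests the disk is being throttled.
        "likely_throttled": observed_iops >= max_iops or observed_mbps >= max_mbps,
    }


if __name__ == "__main__":
    print(throttling_report("P10", observed_iops=500, observed_mbps=40))
    print(throttling_report("P30", observed_iops=1200, observed_mbps=50))
```

A disk pinned at its IOPS ceiling while throughput stays low usually means many small I/Os, which is where a larger disk tier or read caching tends to help most.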
When to Call Microsoft Support
If you've worked through every step in this guide and your Azure Windows VM still won't recover, it's time to escalate. Specifically, reach out if: the VM is stuck in a "Failed" provisioning state for more than 30 minutes, Boot Diagnostics shows no screenshot at all (indicating a hypervisor-level problem), or you're seeing Azure fabric errors in the Activity Log that reference internal Azure components. Open a support case with Microsoft Support and include your VM's resource ID, the approximate time the issue started, and the Boot Diagnostics screenshot saved as evidence. Azure engineers can access host-level logs you can't see from the portal.
Prevention & Best Practices
The best Azure Windows VM troubleshooting session is the one you never have to do. After years of supporting enterprise Azure environments, these are the practices I push every team to adopt, not after the first outage, but before it.
Enable Boot Diagnostics on every VM at creation time. I cannot stress this enough. When a VM goes down at 2 AM and you're trying to figure out if it's a BSOD or a network issue, you will be grateful you turned this on. It's free for basic diagnostics (screenshots and serial logs) when pointed at a managed storage account. Enable it in the VM creation wizard under the Management tab.
Tag your NSGs and document your inbound rules. NSG misconfiguration is the number-one cause of "VM is fine, RDP just doesn't work" tickets. Maintain a spreadsheet or use Azure Policy to enforce that all NSGs have a description field filled in for each rule. When someone on your team adds a rule that accidentally blocks RDP, you'll know who added it and why.
Configure Azure VM Backup from day one. Azure Backup for VMs is application-consistent (using VSS snapshots) and costs pennies per GB per month. Go to your VM → Operations → Backup and set up a recovery services vault. If a Windows Update destroys your VM's boot capability, you can restore to a previous snapshot in 20 minutes instead of spending 3 hours on disk repair.
Monitor disk space proactively with Azure Monitor Alerts. A full OS disk will cause Windows to behave erratically: event log writes fail, temp files can't be created, and the paging file can't expand. Host-level platform metrics such as OS Disk Used Burst BPS Credits Percentage describe disk performance, not free space, so the alert needs guest OS data. Go to Azure Monitor → Alerts → + Create and base the alert on a custom Log Analytics query against the InsightsMetrics table that fires when free disk space drops below 10%.
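The 10% rule itself is simple enough to express in a few lines. Here is a minimal Python sketch that takes free and total bytes per volume, however you collect them, and returns the volumes below the threshold. The function name and the sample figures are made up for illustration.

```python
def low_space_volumes(volumes, threshold_percent=10.0):
    """Return {volume: free_percent} for volumes below the free-space threshold.

    volumes maps a volume name to a (free_bytes, total_bytes) tuple.
    """
    flagged = {}
    for name, (free_bytes, total_bytes) in volumes.items():
        if total_bytes <= 0:
            continue  # skip volumes that report no capacity
        free_percent = free_bytes / total_bytes * 100
        if free_percent < threshold_percent:
            flagged[name] = round(free_percent, 1)
    return flagged


if __name__ == "__main__":
    GIB = 1024 ** 3
    sample = {
        "C:": (6 * GIB, 127 * GIB),   # about 4.7% free
        "F:": (300 * GIB, 512 * GIB),
    }
    print(low_space_volumes(sample))  # {'C:': 4.7}
```

Alerting on a percentage rather than an absolute byte count keeps one rule usable across disks of different sizes.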
- Always use a static private IP for VMs running services. Dynamic IPs can change after a stop/deallocate cycle and break DNS records.
- Install the Azure Monitor Agent extension on every Windows VM so you get guest OS metrics (memory, disk) in addition to host-level metrics.
- Set the auto-shutdown schedule for dev/test VMs to avoid paying for VMs that get forgotten: go to VM → Operations → Auto-shutdown.
- Keep the Windows pagefile on D: (the temp disk) rather than C:. The temp disk is local SSD on most Azure VM sizes and significantly faster for pagefile I/O, reducing the chance of performance-induced unresponsiveness.
Frequently Asked Questions
My Azure Windows VM says "Running" in the portal but I can't RDP in. What's going on?
The "Running" status in the Azure portal reflects the hypervisor state, not the Windows OS state. The VM's hardware is on and the hypervisor is happy, but Windows itself may be stuck at a blue screen, waiting at a login prompt with a bad RDP config, or the Guest Agent may have crashed. Open Boot Diagnostics (VM → Help → Boot diagnostics → Screenshot) to see what Windows is actually showing on its virtual display. Nine times out of ten, this immediately tells you whether you have an OS-level problem or a network/RDP-level problem.
My Azure VM has been stuck in the "Stopping" state for over an hour. How do I force it to stop?
A VM stuck in "Stopping" is usually caused by Windows hanging during the shutdown sequence: a service that won't stop, or a failed update. Clicking Stop again in the portal rarely helps; it just keeps spinning. The fix is to force a power-off at the hypervisor level from the Azure CLI: run az vm stop --resource-group MyRG --name MyVM --skip-shutdown. The --skip-shutdown flag skips the graceful guest shutdown and is the equivalent of pulling the power cord. Follow it with az vm deallocate and then az vm start, using the same resource group and VM name. Data already written to your disks is preserved, but unsaved in-memory state will be lost.
After a Windows Update, my Azure VM won't boot and shows 0x0000007B. Is my data safe?
Yes, your data is almost certainly safe. Error 0x0000007B INACCESSIBLE_BOOT_DEVICE means Windows can't access its boot disk during startup, typically because a Windows Update removed or replaced a critical driver (often the StorPort or Stornvme driver) or left the update half-applied. Your files on C: and any data disks are untouched. Follow Step 3 in this guide: use the Azure VM Repair extension to attach the OS disk to a repair VM, then run the boot configuration and DISM repair commands against it. Most people have this fixed within 45 minutes without losing a single file.
How do I fix Azure Windows VM RDP error "Remote Desktop can't connect to the remote computer"?
This error has three common causes, and you need to rule them out in order. First, check your NSG inbound rules to make sure TCP 3389 is allowed from your source IP. The portal's Networking tab on the VM blade shows this clearly. Second, check Windows Firewall inside the VM using Serial Console (run netsh advfirewall show allprofiles) to confirm the RDP firewall rule is enabled. Third, verify the Remote Desktop service is running and that fDenyTSConnections is set to 0 in the registry. Both are checkable and fixable via Serial Console or Run Command without needing a working RDP session.
My Azure VM disk is full and Windows is acting weird. How do I clean it up without RDP?
Use Azure's Run Command feature (VM → Operations → Run command → RunPowerShellScript) to run disk cleanup commands without needing an RDP session. Start by finding what's eating space: Get-ChildItem C:\ -Recurse -ErrorAction SilentlyContinue | Sort-Object Length -Descending | Select-Object -First 20 FullName, Length. Common culprits on Azure VMs are Windows Update cache in C:\Windows\SoftwareDistribution\Download (safe to delete when Windows Update isn't running), IIS logs in C:\inetpub\logs, and Azure Diagnostics logs under C:\WindowsAzure\Logs. You can safely clear the SoftwareDistribution folder by stopping the wuauserv service first, then deleting the Download subfolder contents.
Can I troubleshoot an Azure Windows VM without the Azure CLI installed locally?
Absolutely. Almost everything in this guide can be done through the Azure portal's web UI, including Boot Diagnostics, Serial Console, Run Command, Redeploy, and Reset Remote Access, with no CLI required. The Azure CLI commands I've included are faster and more scriptable, and you don't need a local install to run them. Use Azure Cloud Shell: click the Cloud Shell icon (looks like a terminal prompt >_) in the top-right of the portal. It gives you a pre-authenticated Bash or PowerShell environment with the Azure CLI already installed, running in your browser.