Azure Virtual Machines Not Responding, Boot, RDP/SSH, and Disk Fixes

Microsoft Fix Intermediate 18 min read Official Docs Grounded Updated April 20, 2026

Why This Is Happening

You log into the Azure portal, click your VM, hit Connect, and nothing. Or maybe the VM shows as "Running" in the dashboard but still won't accept an RDP or SSH connection. Or it booted halfway and dropped into a GRUB rescue shell, staring back at you with a blinking cursor while your users are filing support tickets. I've seen every one of these scenarios, and I know how frustrating it is when Azure's error messages are vague to the point of being useless.

Azure Virtual Machines failing to respond falls into a handful of distinct root cause categories, and understanding which one you're facing is half the battle. Here's the breakdown:

Boot-level failures. These happen when the OS itself can't start. Common culprits include corrupted GRUB bootloader configuration, a bad kernel update that introduced a kernel panic on startup, UEFI boot failures tied to Secure Boot settings, and fstab misconfiguration that causes the filesystem to refuse to mount. After applying kernel patches or changing disk configurations without updating fstab properly, these are disturbingly common.

Network and connectivity failures. Your VM is actually running fine, you just can't reach it. This happens when the guest network adapter inside the VM gets disabled, when a firewall rule or Network Security Group (NSG) blocks port 3389 (RDP) or port 22 (SSH), or when the VM's public IP address has been deallocated and reassigned. Windows VMs also occasionally lose RDP connectivity because the Remote Desktop service crashes or gets disabled by a Group Policy change.

Authentication and credential failures. These show up as "authentication errors" during RDP or SSH login. On Windows VMs, this often traces back to NLA (Network Level Authentication) settings, a locked or expired account, or a misconfigured security policy. On Linux VMs, the single biggest culprit I see is SELinux misconfiguration, the VM boots fine, SSH is listening, but the connection gets silently dropped because SELinux is denying the sshd process access to its required files.

Disk and storage failures. Managed disk corruption, fstab entries pointing to a device that no longer exists, BitLocker key issues on Windows Generation 2 VMs, and Virtual Disk Management problems can all prevent a VM from starting or make it unresponsive mid-session.

Extension and agent failures. If the Azure Linux VM Agent or Windows VM Agent stops responding, Azure loses the ability to push commands and retrieve status from your VM. This can make a perfectly healthy VM look dead in the portal, no extension status, no heartbeat, no way to run scripts remotely.

The good news: every single one of these is fixable without losing your data. You just need to know which door to open first. Browse all Microsoft fix guides →

The Quick Fix, Try This First

Before you go deep into boot diagnostics and disk repairs, there's one move that resolves a surprising number of Azure Virtual Machines not responding situations in under two minutes: redeploy the VM to a new Azure host node.

Azure's physical host infrastructure occasionally has transient hardware or hypervisor issues that cause a VM to become unresponsive without any visible error in the guest OS. Redeployment moves your VM to a fresh host node, reattaches your disks, and restarts everything from a known-good hardware baseline. Your data is untouched.

Here's exactly how to do it:

  1. Open the Azure portal at portal.azure.com and sign in.
  2. Navigate to Virtual Machines in the left menu, then click your VM's name.
  3. In the VM's left-hand menu, scroll down to the Help section and click Redeploy + Reapply.
  4. On the blade that opens, click the Redeploy button. Confirm when prompted.
  5. Wait 5–10 minutes. The portal will show your VM status cycling through "Updating" and back to "Running."
  6. Try your RDP or SSH connection again.

If your VM is Windows-based and you've forgotten the local administrator password, which sometimes happens after a redeploy changes the hostname or triggers a policy refresh, you can reset credentials directly from the portal. Go to your VM → HelpReset Password. Choose Reset password mode, enter a new username/password, and click Update. No OS reinstall required.

For Linux VMs that are completely unreachable via SSH, try the Run Command feature under Operations in the VM menu. This lets you execute shell commands directly through the Azure agent without needing an SSH connection, so you can restart the sshd service, check its status, or even fix a broken config file, all from the portal.

Pro Tip
Before you redeploy, enable Boot Diagnostics first if it isn't already on, go to your VM → Support + TroubleshootingBoot diagnosticsEnable with managed storage account. The screenshot that Boot Diagnostics captures at startup is the single fastest way to diagnose a GRUB rescue loop, kernel panic, or fstab error without attaching the disk to another VM.
1
Run Azure Boot Diagnostics and Capture the Startup Screenshot

When your Azure Virtual Machine won't boot and you can't connect, you're essentially debugging blind, unless you use Boot Diagnostics. This feature captures a screenshot of whatever the VM is displaying on its virtual monitor at startup, and it also stores the serial console log. These two things together tell you almost everything you need to know about why the VM isn't responding.

To access it: open your VM in the Azure portal, scroll the left menu to Support + Troubleshooting, and click Boot diagnostics. If you see a screenshot showing a GRUB rescue prompt, a kernel panic trace, or a Windows blue screen with a stop code, you've already identified your problem category.

Common things you'll see and what they mean:

  • GRUB rescue> prompt, The bootloader can't find the kernel or its configuration files. Usually caused by a bad kernel update or a disk geometry change.
  • Kernel panic, not syncing, A kernel module failed to load, often after applying a kernel update that introduced a dependency issue.
  • Windows blue screen with INACCESSIBLE_BOOT_DEVICE, Windows can't find or read its system drive. Often triggered when Hyper-V drivers are disabled inside the guest.
  • BitLocker recovery screen, The VM's system drive is BitLocker-encrypted and the recovery key is being requested. This commonly appears after a hardware change or host migration.
  • Waiting for /dev/sda1 (or similar), A Linux fstab entry is pointing to a device that doesn't exist at that path, and the system is hung waiting for it to appear.

You can also open the Serial console from this same menu section. The serial console gives you an interactive terminal to the VM even when SSH and RDP are completely down, it's one of the most powerful tools in Azure's troubleshooting toolkit and it's under-used by most people who manage Azure VMs.

Once you know what category of failure you're dealing with, continue to the relevant step below.

2
Fix RDP Connection Failures on Windows Azure VMs

RDP not connecting is one of the most common Azure Virtual Machines problems I see reported. The error messages Windows gives you, "Remote Desktop can't connect to the remote computer" or "An authentication error has occurred (Code 0x80004005)", are almost deliberately unhelpful. Here's how to actually fix them.

Step one: check the NSG rules. In the Azure portal, navigate to your VM → Networking. Look at the inbound port rules for your Network Security Group. You need a rule allowing TCP traffic on port 3389 from your source IP. If that rule is missing, click Add inbound port rule and create it. This alone fixes probably 30% of RDP failures.

Step two: check if the VM guest NIC is disabled. Sometimes after an update or policy application, the network adapter inside the VM guest OS gets disabled. You can't fix this via RDP since you can't connect, but you can use the Run Command feature in the Azure portal (under Operations) to run a PowerShell command that re-enables it:

Get-NetAdapter | Where-Object {$_.Status -eq "Disabled"} | Enable-NetAdapter

Step three: diagnose with Event ID. If you can get partial access via the Azure Serial Console, check the Windows Event Viewer for specific RDP-related Event IDs. Event ID 1149 in the TerminalServices-RemoteConnectionManager log means authentication succeeded but the session failed to start. Event ID 4625 in the Security log means authentication failed, and the SubStatus code tells you exactly why (0xC000006D = bad credentials, 0xC0000064 = account doesn't exist, 0xC000006A = wrong password for existing account).

Step four: use the portal's Reset Password tool. Go to your VM → HelpReset Password. For authentication errors, sometimes selecting Reset configuration only (not changing the password) is enough, this resets the RDP configuration itself, re-enables the Remote Desktop service, and sets the firewall rules. It's the equivalent of the old "fix RDP" script, built right into the portal.

If none of these work, redeploy the VM to a new node as described in the Quick Fix section. RDP connectivity issues tied to the underlying host resolve immediately after redeploy.

3
Fix SSH Connection Failures on Linux Azure VMs

SSH failing on a Linux Azure Virtual Machine has its own distinct set of causes compared to Windows RDP issues. I want to walk you through the most impactful ones, especially SELinux, which bites people constantly and barely appears in generic troubleshooting guides.

SELinux misconfiguration is the silent killer of SSH sessions. Your SSH client will time out or get a "Connection refused" even though sshd is running and listening on port 22. SELinux is blocking sshd from accessing its required keys or socket files, and it does it silently by default. To diagnose this from the Azure Serial Console:

sudo getenforce
sudo ausearch -m avc -ts recent | grep sshd

If getenforce returns Enforcing and you see AVC denials for sshd, the fix is to restore the correct SELinux context on the SSH host key files:

sudo restorecon -R /etc/ssh/

Then restart sshd: sudo systemctl restart sshd. This resolves most SELinux-related SSH failures immediately.

Check your NSG just like you would for RDP. Go to your VM → Networking in the portal and verify that TCP port 22 is allowed inbound. Also check that the VM's internal firewalld or iptables rules aren't blocking it:

sudo firewall-cmd --list-all
sudo iptables -L INPUT -n -v | grep 22

Debian-specific SSH issues. If you're running a Debian Linux VM on Azure and SSH is failing after an OS update, check whether the update replaced /etc/ssh/sshd_config with a default version that doesn't match your configuration. The Azure Linux VM Agent for Debian has specific provisioning behavior you need to be aware of, after a fresh provisioning cycle, the host keys may be written to a different location than expected. Use Boot Diagnostics and the Serial Console to inspect /var/log/auth.log for the actual connection rejection reason.

VM Agent not responding. If Run Command in the portal also fails, the Azure Linux VM Agent (walinuxagent) may have stopped. From the Serial Console: sudo systemctl status walinuxagent. If it's not running, start it with sudo systemctl start walinuxagent and enable it to persist across reboots with sudo systemctl enable walinuxagent.

4
Resolve GRUB Rescue, Kernel Panic, and UEFI Boot Failures

This is where things get genuinely technical, but don't let that scare you. Boot failures on Azure Virtual Machines are very recoverable. The key is that you need to repair the OS disk while it's attached to a different VM (called a repair VM), since you obviously can't boot into a broken OS to fix it.

The Azure VM repair process:

  1. In the Azure portal, go to your broken VM and click Stop to deallocate it.
  2. Take a snapshot of the OS disk: VM → Disks → click the OS disk → Create snapshot. Name it something like broken-vm-os-snapshot and save it. This is your safety net.
  3. Create a new disk from that snapshot: go to DisksCreate → select Source: Snapshot and pick your snapshot.
  4. Spin up a small repair VM (same OS family, a working Debian or Ubuntu VM for Linux, a Windows Server VM for Windows) and attach the new disk to it as a data disk.

For GRUB rescue on Linux: Mount the attached OS disk inside your repair VM, then reinstall GRUB. Assuming the disk appears as /dev/sdc:

sudo mkdir /mnt/broken
sudo mount /dev/sdc1 /mnt/broken
sudo mount --bind /dev /mnt/broken/dev
sudo mount --bind /proc /mnt/broken/proc
sudo mount --bind /sys /mnt/broken/sys
sudo chroot /mnt/broken
grub-install /dev/sdc
update-grub
exit

For UEFI boot failures: Azure Generation 2 VMs use UEFI instead of BIOS. Boot failures here are often caused by a corrupted EFI System Partition (ESP). Azure's official documentation on Linux VM UEFI boot failures notes that the ESP must be intact and the bootloader entry must be correctly registered. Inside your chroot, check efibootmgr -v and recreate the boot entry if it's missing.

For Windows INACCESSIBLE_BOOT_DEVICE / disabled Hyper-V drivers: This happens when someone installs Windows updates inside the VM and a change disables the synthetic Hyper-V storage drivers that Azure depends on. Attach the OS disk to a repair VM and use the Offline Registry Editor to re-enable the storvsc and vmbus drivers in HKLM\SYSTEM\CurrentControlSet\Services, set their Start value to 0.

5
Fix fstab Errors and Managed Disk Problems

fstab errors are a rite of passage for anyone managing Linux Azure Virtual Machines. One wrong entry in /etc/fstab, a UUID that no longer matches, a mount point that doesn't exist, or a network share that times out, and your VM hangs at boot waiting for a device that never shows up. It will sit there for 90 seconds, then drop to emergency mode. You won't be able to SSH in at all.

Diagnosing the fstab error: Boot Diagnostics will show you a screen that says something like Dependency failed for /data. Job dev-disk-by\x2duuid-XXXXXXXX.device timed out. That UUID is the device that Linux can't find.

Fixing fstab from the Serial Console: If your VM has the Azure Serial Console enabled (and it should, enable it now if not), you can often get an emergency root shell directly. At the emergency mode prompt, type the root password or just press Enter if none is set. Then edit fstab:

sudo nano /etc/fstab

Find the offending line, it's usually a data disk entry with a stale UUID or a network mount. Either comment it out by adding # at the start, or correct the UUID. To find the correct UUID for your attached disks:

sudo blkid

Save the file (Ctrl+O, Enter, Ctrl+X in nano), then reboot: sudo reboot. The VM should come up clean.

Azure Managed Disk issues: If a data disk detaches unexpectedly or shows as unavailable in the guest OS, check the Azure portal first. Go to your VM → Disks. If the disk shows as "Unattached," reattach it from the portal by clicking Attach existing disk. If the disk shows attached but the guest can't see it, run sudo rescan-scsi-bus.sh (Linux) or use Device Manager to scan for hardware changes (Windows) to force the OS to detect the newly attached disk without rebooting.

Disk encryption (BitLocker) problems: On Windows VMs with Azure Disk Encryption enabled, if the VM can't unlock the OS drive at boot, you'll see a BitLocker recovery screen. The recovery key is stored in your Azure Key Vault. Go to your Key Vault in the portal, find the secret named after your VM, copy the 48-digit recovery key, and enter it at the BitLocker prompt via the Serial Console. After recovery, verify your Key Vault access policies are correctly configured so this doesn't recur.

Advanced Troubleshooting

When the standard steps don't resolve your Azure Virtual Machines issue, you're likely dealing with something at the infrastructure, policy, or extension layer. These are the scenarios I see most often in enterprise environments.

VM extensions stopping or reporting no status. If you're seeing "Extension status isn't reported" in the Azure portal, the VM Agent has lost communication with the Azure fabric. On Linux, check the agent log: sudo cat /var/log/waagent.log | tail -100. Look for HTTP errors or certificate failures. On Windows, check C:\WindowsAzure\Logs\WaAppAgent.log. A common enterprise cause is an internal firewall or proxy blocking the VM Agent's outbound HTTPS calls to the Azure host endpoint (168.63.129.16). This IP address must always be reachable from inside the VM, it's Azure's internal metadata and agent communication endpoint.

Generation 2 Linux VM access failures. Generation 2 VMs use UEFI and don't support legacy BIOS boot. If you're trying to access a Gen 2 Linux VM and it's failing to start, verify the VM's image supports Gen 2. Not all marketplace images do. Also verify that the OS disk's EFI partition is intact, Gen 2 VMs will refuse to boot entirely if the EFI System Partition is missing or corrupted, and there will be no fallback.

High CPU causing VM unresponsiveness. Sometimes a VM appears "not responding" but it's actually just pegged at 100% CPU. Azure provides Performance Diagnostics directly in the portal. Go to your VM → Support + TroubleshootingPerformance diagnostics. Run the diagnostic and it will analyze CPU, memory, disk I/O, and network bottlenecks inside the guest. On Windows VMs, high CPU issues are also diagnosable through Azure Monitor, look at the Percentage CPU metric under the VM's Metrics blade and correlate it with the time the VM became unresponsive.

Kernel changes breaking restart. A VM that can't restart after applying kernel changes is a specific and painful scenario. On Linux, the issue is almost always that the initramfs wasn't regenerated for the new kernel. Boot from the previous kernel (using the GRUB menu via Serial Console), then regenerate: sudo update-initramfs -u -k all (Debian/Ubuntu) or sudo dracut --force (RHEL/CentOS). On Windows, applying certain CUs can cause a failed restart loop, the Boot Diagnostics screenshot will show the stuck "Working on updates" screen. Giving it 30–40 minutes is sometimes legitimately the answer for large CU packages.

Isolating performance bottlenecks on Linux VMs. For Linux VMs that are responding but slowly, use Azure's built-in guidance on bottleneck isolation. Inside the guest: vmstat 1 10 for CPU and memory pressure, iostat -xz 1 10 for disk I/O wait, and sar -n DEV 1 10 for network utilization. Compare these against your Azure VM's advertised IOPS and throughput limits, it's surprisingly common to hit the disk IOPS ceiling on a Standard_DS2_v2 running a database workload.

When to Call Microsoft Support

If you've worked through all the steps above and your Azure Virtual Machine is still unresponsive, especially if Boot Diagnostics shows nothing (a blank screen at startup), if the Serial Console itself won't connect, or if the VM is stuck in a "Stopping" state for more than 20 minutes, it's time to open a support request. These symptoms often point to a problem at the Azure host infrastructure level that only Microsoft engineers can access. Go to Microsoft Support, log into the Azure portal, navigate to Help + SupportNew support request, and choose Technical as the issue type with Virtual Machine running Linux/Windows as the service. Make sure you have a support plan, Basic doesn't include technical support, so you'll need Developer, Standard, or higher.

Prevention & Best Practices

The best Azure Virtual Machines troubleshooting session is the one you never have to run. I've watched teams lose hours, sometimes days, to failures that were entirely avoidable. Here's what actually works for keeping Azure VMs healthy and responsive long-term.

Always enable Boot Diagnostics before you need it. I cannot stress this enough. When a VM is healthy, boot diagnostics feel unnecessary. The moment it fails to start and you don't have it enabled, you're debugging completely blind. Enable it on every VM, during provisioning, before it ever has a problem. It costs almost nothing and saves hours.

Keep the Azure VM Agent updated. Both the Windows Azure Guest Agent and the Linux VM Agent (walinuxagent) need to stay current. An outdated agent causes extension failures, status reporting gaps, and loss of portal-based management features like Run Command and Password Reset. For Linux VMs, check the agent version: waagent --version. The Azure documentation on the Linux VM Agent overview details the supported versions per distro, pin your VMs to supported image versions and agent packages.

Test fstab changes before rebooting. On Linux VMs, always validate your fstab changes before you reboot. The command sudo mount -a attempts to mount everything in fstab right now, without rebooting. If it throws an error, you've caught the mistake before it locks you out of the VM. This one habit prevents a huge percentage of Linux Azure VM boot failures.

Tag your VMs with maintenance windows and owner information. When a VM stops responding at 2 AM and nobody can figure out what it does or who manages it, recovery takes much longer. Use Azure resource tags: owner, environment, maintenance-window, criticality. These tags also drive Azure Monitor alert routing and can be used in Azure Policy to enforce backup and diagnostic settings.

Set up Azure Monitor alerts proactively. Configure alerts on Percentage CPU > 90% for 5 minutes, Disk Read Bytes/sec near the VM's IOPS limit, and Available Memory Bytes below a safe threshold. Getting an alert before a VM goes fully unresponsive gives you the window to act. Pair this with Azure's Troubleshoot performance issues through Monitoring documentation for recommended baseline thresholds per VM series.

Quick Wins
  • Enable Boot Diagnostics on every VM at creation time, make it a provisioning checklist item, not an afterthought.
  • Run sudo mount -a after every fstab edit on Linux VMs before rebooting, every single time without exception.
  • Verify NSG rules allow port 3389 (Windows) or port 22 (Linux) from your management IP range, and review these rules quarterly, they drift over time.
  • Keep a tested VM snapshot or disk snapshot before applying OS updates, kernel patches, or disk encryption changes on production VMs.

Frequently Asked Questions

My Azure VM shows "Running" in the portal but I still can't RDP into it, why?

The "Running" state in the Azure portal only tells you the VM's power state, not whether services inside the VM are healthy. The most common cause of this exact symptom is that the Remote Desktop service inside the Windows guest has crashed or been disabled, or the VM's guest network adapter has been disabled by a policy change. Open Boot Diagnostics in the portal to take a screenshot, if the VM's screen looks normal but RDP is failing, use the Reset Password tool and choose "Reset configuration only" to restore RDP settings without changing any passwords. Also check your Network Security Group rules to confirm port 3389 is open from your IP address.

How do I fix a Linux Azure VM stuck in GRUB rescue mode?

A GRUB rescue prompt means the bootloader can't locate the kernel files it needs to hand off to. You'll need to repair this from outside the failed VM using a repair VM. Stop your broken VM, snapshot its OS disk, create a new disk from the snapshot, attach it to a working Linux VM, chroot into it, and run grub-install followed by update-grub. Detach the repaired disk, swap it back as the OS disk on your original VM, and start it. The Azure Boot Diagnostics serial console log will show you the exact filesystem path that GRUB is trying to reach, which helps narrow down whether it's a partition table issue or a missing GRUB configuration file.

What's the fastest way to reset a forgotten admin password on an Azure Windows VM?

The fastest way is through the Azure portal itself, no need to attach disks or use any offline method. Go to your VM → HelpReset Password, select the Reset password mode, enter the username and your new password, then click Update. Azure pushes this change through the VM Agent within about 30 seconds and you can immediately try RDP with the new credentials. If the VM Agent isn't responding, you can use the offline method documented in Microsoft's official guides: stop the VM, attach the OS disk to a repair VM, edit the SAM registry hive to reset the account, then reattach it.

My Azure Linux VM won't boot after a kernel update, how do I recover without losing data?

First, check Boot Diagnostics in the Azure portal, the screenshot and serial log will tell you whether this is a kernel panic, a missing initramfs, or an fstab issue. If it's a kernel panic after an update, the safest recovery path is to boot from the previous kernel using GRUB. Connect to the Azure Serial Console, interrupt the GRUB countdown, select the previous kernel entry from the menu, and boot into it. Once you're in, you can either pin the old kernel, regenerate the initramfs for the new one with sudo update-initramfs -u -k all, or remove the broken kernel package entirely. Your data disks are separate from the OS disk in Azure and are never affected by a failed kernel boot.

Azure VM extensions are showing "Extension status isn't reported", what does that mean?

This error means the Azure VM Agent inside your guest OS has lost its ability to communicate with the Azure fabric and report back extension status. The agent communicates outbound over HTTPS to the Azure infrastructure IP 168.63.129.16, if your VM's internal firewall, a custom iptables rule, or a corporate proxy is blocking this address, the agent goes silent. Check the agent log (/var/log/waagent.log on Linux, C:\WindowsAzure\Logs\WaAppAgent.log on Windows) for HTTP connection errors. Ensure 168.63.129.16 on port 80 and 443 is reachable from inside the VM, restart the agent service, and the extensions should resume reporting within a few minutes.

How do I get help from Microsoft when my Azure VM is completely unresponsive and nothing works?

If you've exhausted self-service options, Boot Diagnostics shows nothing, Serial Console won't connect, and the VM has been stuck in "Stopping" or "Updating" for over 20 minutes, open a support request directly in the Azure portal under Help + SupportNew support request. Choose Technical as the issue type and select your VM service. You can also post to Microsoft Q&A using the tag azure-virtual-machines for community help. For tooling issues like problems with Azure CLI or PowerShell during VM management, open GitHub issues in the respective repositories, github.com/Azure/azure-cli/issues for CLI bugs and github.com/Azure/azure-powershell/issues for PowerShell module issues. Having a Standard or Premier support plan gets you direct Microsoft engineer access with SLA-backed response times.

Related Microsoft Fix Guides

H
Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.