Fix Azure Backup Issues, Setup, Errors & Configuration
Why Azure Backup Is Failing, And Why the Error Messages Don't Help
I've seen this play out dozens of times. An IT admin sets up Azure Backup, the portal says everything is configured, and then three weeks later someone asks for a restore, and the backups simply aren't there. Or you get an alert that an Azure Backup job failed with a vague error like UserErrorVmNotInDesirableState or ExtensionSnapshotFailedNoNetwork and you're left staring at it wondering where to even start.
Here's the honest truth: Azure Backup is genuinely powerful, but it has a lot of moving parts. You're dealing with agents, vault configurations, storage replication settings, network connectivity rules, and backup policies, all of which have to line up correctly. When one thing is off, the whole chain breaks. And Microsoft's error messages were clearly written by engineers for engineers, which means they're technically accurate and practically useless for most people.
The most common reasons Azure Backup jobs fail or never run at all:
- MARS agent not registered or outdated: the Microsoft Azure Recovery Services agent on Windows machines goes stale quickly, and an unregistered or version-mismatched agent will silently drop backup jobs
- VM backup extension failures: on Azure VMs, the snapshot extension (VMSnapshot for Windows, VMSnapshotLinux for Linux) can fail to install, time out, or conflict with other extensions
- Recovery Services vault misconfiguration: wrong storage replication type chosen at vault creation, or cross-region restore not enabled before you needed it
- Backup policy gaps: policies exist but aren't actually associated with resources, or retention settings are configured at the policy level but ignored because the resource was protected under a different policy
- Network and firewall rules blocking outbound traffic: Azure Backup requires outbound access to specific service endpoints, and many corporate networks block this without realizing it
- Subscription or resource group permission issues: the backup identity doesn't have the right RBAC role assigned on the target resource
Who sees these problems? Everyone. Small businesses running one or two Azure VMs who set up Azure Backup from a tutorial and never touched it again. Enterprise teams managing hundreds of workloads who assumed a backup policy covering 90% of resources meant 100%. Database admins trying to get SQL Server in Azure VMs backup running alongside their existing SQL Agent jobs. It affects all of them.
I know this is frustrating, especially when the Azure portal shows a green checkmark next to your vault and you still wake up to a failed backup alert at 3 AM. The good news is that almost every Azure Backup failure follows a recognizable pattern, and this guide walks through every one of them.
The Quick Fix: Try This First
Before you go deep into diagnostics, do this one check. It catches the majority of Azure Backup issues I see in the wild, especially for setups that "used to work" and suddenly stopped.
Go to the Azure portal, navigate to your Recovery Services vault, and click Backup Jobs in the left panel under Monitoring. Change the time filter to Last 7 days and look for any jobs with status Failed or Warning. Click the failed job, then click the View details link next to its error code. Azure will show you the actual failure reason and, critically, an Error ID.
Now take that Error ID and head to Backup Health (also under Monitoring in your vault). This dashboard aggregates failures across all protected items and categorizes them. If you see a cluster of the same error ID hitting multiple VMs or machines at once, you're almost certainly dealing with a network connectivity issue or an extension version problem, not a machine-specific failure.
If it's a single machine, run this PowerShell to trigger an on-demand backup of that item:
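# Load the vault and pass it explicitly with -VaultId rather than relying on a vault context
$vault = Get-AzRecoveryServicesVault -Name "YourVaultName" -ResourceGroupName "YourRG"
# Find the container and backup item for the VM
$container = Get-AzRecoveryServicesBackupContainer `
    -ContainerType "AzureVM" `
    -FriendlyName "YourVMName" `
    -VaultId $vault.ID
$item = Get-AzRecoveryServicesBackupItem `
    -Container $container `
    -WorkloadType AzureVM `
    -VaultId $vault.ID
# Trigger an on-demand backup job for that item
Backup-AzRecoveryServicesBackupItem -Item $item -VaultId $vault.ID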
This triggers an on-demand backup job. If it succeeds, scheduled backups often resume as well, because triggering a job can clear stale state in the extension. If it fails with a specific error, that error is now your entry point into the step-by-step fixes below.
Step 1: Verify Your Recovery Services Vault Configuration
The vault is the foundation of everything in Azure Backup. If it's misconfigured, no amount of agent reinstalls or policy tweaks will fix your Azure Backup issues. So start here.
In the Azure portal, search for Recovery Services vaults and open yours. In the left navigation, go to Properties. Under Backup Configuration, verify:
- Storage replication type: Microsoft recommends GRS (Geo-redundant storage) as the default because it replicates your backup data to a secondary region hundreds of miles away. LRS is cheaper but only protects against local hardware failures. ZRS (Zone-redundant storage) is the right choice if you need data residency within a single region with no downtime tolerance. You cannot change this after backups exist in the vault.
- Cross Region Restore: this must be explicitly enabled. It's off by default, and it is only available when the vault uses GRS. If you need to restore into a secondary region after a regional outage, flip this toggle to Enabled. Note: enabling this means your backup data costs will increase because the data is now accessible in the paired region.
- Soft Delete: should be enabled. By default, soft delete retains backup data for 14 days after deletion, protecting against accidental removal or ransomware attacks on your backup configuration.
Next, check that the vault is in the same region as the resources you're trying to back up. Azure Backup for VMs and managed disks requires the vault and the source resource to be in the same region. If they're in different regions, the vault can't protect the resource.
Finally, confirm RBAC assignments. Go to your vault → Access control (IAM) → Role assignments. The service principal or managed identity doing the backup work needs at minimum the Backup Contributor role on the vault and Virtual Machine Contributor (or equivalent) on the VMs being protected. Missing role assignments are a very common silent failure: the portal won't scream about it, but jobs will fail with UserErrorAccessDenied.
When this step is done correctly, you'll see all vault properties showing expected values and no role assignment gaps in IAM. Then move to agent-level checks.
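The vault checks in this step can also be scripted. The sketch below is a minimal Bash wrapper around the Azure CLI, assuming you are already signed in with az login. The function name show_vault_config and its arguments are illustrative, not part of any Microsoft tooling. It prints the vault's backup properties (look for the storage redundancy and cross-region restore values in the JSON) and then the role assignments that apply to the vault, including inherited ones.

```bash
# Illustrative helper (hypothetical name): print a vault's backup settings
# and the role assignments that apply to it, so gaps show up in one place.
show_vault_config() {
  local rg="$1" vault="$2" vault_id

  # Storage redundancy and cross-region restore settings for the vault
  az backup vault backup-properties show \
    --resource-group "$rg" \
    --name "$vault" \
    --output json

  # Resolve the vault's resource ID so it can be used as the IAM scope
  vault_id="$(az backup vault show \
    --resource-group "$rg" \
    --name "$vault" \
    --query id --output tsv)"

  # Role, principal type, and principal ID for each assignment on the vault
  az role assignment list \
    --scope "$vault_id" \
    --include-inherited \
    --query "[].[roleDefinitionName, principalType, principalId]" \
    --output tsv
}
```

Call it as show_vault_config YourRG YourVaultName. It only reads settings, so it is safe to run against a production vault; whether the values are the ones you want is still a judgment call.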
Step 2: Fix MARS Agent Registration and Connectivity
If you're backing up on-premises Windows machines, Windows Server, or individual files and folders from Azure VMs, you're using the Microsoft Azure Recovery Services (MARS) agent. Azure Backup MARS agent issues are among the most common complaints I see, particularly around registration failures and connectivity timeouts.
First, check the agent version. Open the MARS agent console on the affected machine: search for Microsoft Azure Backup in your Start menu. Go to Actions → About Microsoft Azure Backup. Note the version number and compare it against the current release. Outdated MARS agents can fail to connect without any obvious error. Download the latest agent from the Azure portal: go to your Recovery Services vault → + Backup → select On-premises → Files and folders → download the agent installer.
After updating, re-register the machine if it shows as Unregistered in your vault. You'll need the vault credentials file; download it from Recovery Services vault → Properties → Backup Credentials. The credentials file expires after 10 days, so generate a fresh one rather than reusing an old download.
Connectivity is the next thing to check. The MARS agent needs outbound HTTPS access on port 443 to these endpoints:
# Required outbound endpoints for MARS agent
*.backup.windowsazure.com
*.blob.core.windows.net
*.queue.core.windows.net
*.store.core.windows.net
dc.services.visualstudio.com
Run this command on the machine to test connectivity to the Azure Backup service URL for your region:
Test-NetConnection -ComputerName "pod01-manag1.backup.windowsazure.com" -Port 443
If TcpTestSucceeded returns False, your firewall or proxy is blocking the agent. Work with your network team to whitelist the required FQDN patterns. If your org uses a proxy server, configure the MARS agent to use it: in the agent console, go to Change Properties → Proxy Configuration and enter your proxy details.
Success looks like this: agent status shows as Registered in the vault's Backup Infrastructure → Protected Servers view, and a manual backup job completes without errors.
Step 3: Repair the Azure VM Backup Extension
For Azure VM backup, there's no agent to install manually. Azure Backup works through a VM extension that gets installed automatically when you first protect a VM. But this extension fails more often than you'd expect. When it does, you'll see error codes like ExtensionSnapshotFailedNoNetwork, ExtensionOperationFailed, or GuestAgentSnapshotTaskStatusError in your backup job details.
The extension involved is called VMSnapshot (Windows) or VMSnapshotLinux (Linux). Check its status by going to your VM in the Azure portal → Extensions + applications. Look for the snapshot extension and check its provisioning state. If it shows Failed or is stuck on Updating, delete it; Azure Backup will reinstall it fresh on the next backup job.
To delete and force reinstall via PowerShell:
# Remove the failed extension (use "VMSnapshotLinux" as the name on Linux VMs)
Remove-AzVMExtension `
    -ResourceGroupName "YourResourceGroup" `
    -VMName "YourVMName" `
    -Name "VMSnapshot" `
    -Force
# Trigger a backup job to reinstall the extension
$vault = Get-AzRecoveryServicesVault -Name "YourVaultName" -ResourceGroupName "YourRG"
$container = Get-AzRecoveryServicesBackupContainer -ContainerType "AzureVM" -FriendlyName "YourVMName" -VaultId $vault.ID
$item = Get-AzRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM -VaultId $vault.ID
Backup-AzRecoveryServicesBackupItem -Item $item -VaultId $vault.ID
The other critical dependency is the Azure VM Guest Agent. The snapshot extension requires the Guest Agent to be running and healthy. On Windows VMs, check this in Services (services.msc): the service is called Windows Azure Guest Agent and should be running. On Linux, check with sudo systemctl status walinuxagent (on some distributions, such as RHEL-based ones, the service is named waagent). If the Guest Agent is stopped or crashed, the backup extension can't function regardless of how the vault is configured.
Once the extension reinstalls and the Guest Agent is healthy, trigger another on-demand backup. You should see the job move through status: Transferring data to vault → Completed. That confirms the extension chain is working again.
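Both checks from this step (extension provisioning state and Guest Agent health) can be read from outside the VM. The following is a minimal Bash sketch using the Azure CLI; the function names unhealthy_extensions and guest_agent_status are hypothetical, and it assumes you are signed in with az login and have read access to the VM.

```bash
# Illustrative helpers (hypothetical names) for checking the backup
# extension chain on one VM with the Azure CLI.

# Print every extension whose provisioning state is not "Succeeded".
unhealthy_extensions() {
  local rg="$1" vm="$2"
  az vm extension list \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --query "[].[name, provisioningState]" \
    --output tsv |
    awk -F '\t' '$2 != "Succeeded" { print $1 "\t" $2 }'
}

# Print the Guest Agent status reported in the VM's instance view.
guest_agent_status() {
  local rg="$1" vm="$2"
  az vm get-instance-view \
    --resource-group "$rg" \
    --name "$vm" \
    --query "instanceView.vmAgent.statuses[0].displayStatus" \
    --output tsv
}
```

Run unhealthy_extensions YourResourceGroup YourVMName first. If the snapshot extension appears in the output, it is a candidate for the delete-and-reinstall procedure above. An empty result means every installed extension reports Succeeded, in which case the Guest Agent status is the next thing to look at.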
Step 4: Check Backup Policy Associations and Retention
One of the sneakiest Azure Backup problems I've encountered is when everything looks configured but backups are just not running on schedule. The culprit is almost always a broken or missing backup policy association. The vault has a policy. The VM exists. But they were never actually linked, or the link got broken during a resource move or subscription change.
In the Azure portal, go to your Recovery Services vault → Backup Items → select Azure Virtual Machine. You'll see a list of protected VMs. For each one, check the Last Backup Status column. If it says Warning or shows a date older than your policy frequency, click the item name → Backup Policy. Confirm the correct policy is assigned.
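The same check can be run for every protected VM in one pass. This is a minimal Bash sketch using the Azure CLI; the function name stale_backup_items and the cutoff argument are assumptions made for illustration. It assumes the last backup time is reported as an ISO 8601 UTC timestamp, so a plain string comparison against a cutoff in the same format is enough.

```bash
# Illustrative helper (hypothetical name): list protected Azure VMs whose
# last backup did not complete, or finished before a cutoff timestamp.
# The cutoff is an ISO 8601 UTC string such as 2024-03-01T00:00:00.
stale_backup_items() {
  local rg="$1" vault="$2" cutoff="$3"
  az backup item list \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --backup-management-type AzureIaasVM \
    --workload-type VM \
    --query "[].[properties.friendlyName, properties.lastBackupStatus, properties.lastBackupTime]" \
    --output tsv |
    awk -F '\t' -v cutoff="$cutoff" \
      '$2 != "Completed" || ($3 "") < (cutoff "") { print $1 "\t" $2 "\t" $3 }'
}
```

Pick a cutoff that matches your policy frequency, for example two days ago for a daily policy. Anything the function prints deserves a look at its policy assignment.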
To reassign a backup policy via PowerShell (which also fixes stale associations):
$vault = Get-AzRecoveryServicesVault -Name "YourVaultName" -ResourceGroupName "YourRG"
# Get the policy you want to assign
$policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy" -VaultId $vault.ID
# Get the VM item
$container = Get-AzRecoveryServicesBackupContainer -ContainerType "AzureVM" -FriendlyName "YourVMName" -VaultId $vault.ID
$item = Get-AzRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM -VaultId $vault.ID
# Re-enable protection with the policy
Enable-AzRecoveryServicesBackupProtection -Policy $policy -Item $item -VaultId $vault.ID
On the retention settings side: Azure Backup lets you retain data short-term and long-term through the policy. Short-term retention stores data in the vault's standard tier. For long-term retention, Azure Backup can move eligible recovery points to the Archive tier, which is significantly cheaper for data you need to keep for compliance but rarely access. If your backup policy retention looks correct but old recovery points seem to be missing from the standard tier, check whether Archive tiering rules are configured under your policy's Advanced Settings.
Also confirm that for databases like SQL Server in Azure VMs or SAP HANA, the backup policy includes a log backup schedule in addition to the full/differential schedule. Missing log backups mean you can't do point-in-time restores even if your full backups are healthy.
Step 5: Troubleshoot Restore Failures
Getting backup jobs to run is half the battle. Actually getting data back when you need it is the other half, and restore failures are their own category of pain. The most common Azure Backup restore failure scenarios are: restoring to the wrong region, restoring a VM to a VNet that no longer exists, and permission gaps that only surface at restore time because nobody tested restore since the initial setup.
For Azure VM restore failures, go to your vault → Backup Items → select the VM → Restore VM. Before choosing a recovery point, check the date range: you'll only see recovery points that fall within your policy's retention window. If the recovery point you need is in the Archive tier, you'll need to rehydrate it first, which takes hours rather than minutes (Microsoft documents up to about 15 hours at standard priority), before you can restore from it.
If restore is failing with UserErrorVnetNotFound or similar network errors, first confirm which recovery points are actually available:
# List available recovery points for a VM over the last 30 days
$vault = Get-AzRecoveryServicesVault -Name "YourVaultName" -ResourceGroupName "YourRG"
$container = Get-AzRecoveryServicesBackupContainer -ContainerType "AzureVM" -FriendlyName "YourVMName" -VaultId $vault.ID
$item = Get-AzRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM -VaultId $vault.ID
$startDate = (Get-Date).AddDays(-30)
$endDate = Get-Date
# The cmdlet expects the date range in UTC
Get-AzRecoveryServicesBackupRecoveryPoint `
    -Item $item `
    -StartDate $startDate.ToUniversalTime() `
    -EndDate $endDate.ToUniversalTime() `
    -VaultId $vault.ID
When the target VNet no longer exists (common after infrastructure changes), choose the Create new virtual machine restore option instead of Replace existing VM, and specify a valid VNet and subnet. Pick the network you actually want at restore time: a VM can't simply be moved to a different VNet later, so changing it afterward means recreating the VM from its disks.
Cross-region restore (restoring into a secondary region after a disaster) only works if you enabled it on the vault beforehand (see Step 1). If it was enabled, go to your vault → Backup Items → switch the region dropdown to your secondary region → select the VM → Restore. The recovery points available in the secondary region are typically 12-24 hours behind the primary region, which is expected behavior per the official Azure Backup support matrix for cross-region restore.
You'll know the restore completed successfully when the new VM appears in the target resource group with status Running and all disks are attached. For file-level restores using the MARS agent, success means the recovered files appear in the target directory you specified during the recovery wizard.
Advanced Troubleshooting for Azure Backup
If the steps above didn't resolve your Azure Backup problems, you're dealing with something deeper. Here's where I go when the straightforward fixes don't land.
Event Viewer Analysis on MARS Agent Machines
On any Windows machine running the MARS agent, the backup service writes detailed logs to Windows Event Viewer. Open Event Viewer (eventvwr.msc) and navigate to Applications and Services Logs → Microsoft Azure Backup → Operational (on many installations this channel appears as CloudBackup). Filter for Event IDs in the 3000–3999 range, which are Azure Backup-specific. Event ID 3114 usually indicates a connectivity failure. Event ID 3097 indicates a certificate problem. Event ID 3200 means the agent can't communicate with the vault. These IDs give you a precise starting point.
The MARS agent also maintains its own logs at:
C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog
C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\ScVSSRequestor_currX.errlog
Group Policy and Network Proxy Conflicts
In domain-joined environments, Group Policy can override proxy settings that the MARS agent is trying to use, and can also interfere with the certificate trust chain the agent needs for TLS connections to Azure endpoints. Run gpresult /H gpresult.html and open the HTML report to check what proxy and certificate policies are being applied to the machine. If you see policies enforcing SSL inspection (common in enterprises), you'll need to whitelist the Azure Backup FQDN patterns in your SSL inspection device so the agent's TLS connections aren't intercepted and re-signed with an internal CA that the agent doesn't trust.
Scripted Diagnostics for Multi-Machine Environments
If you're managing Azure Backup at scale across many VMs, use Backup Explorer in the Azure portal (under your vault → Backup Explorer) to get an aggregated view of all protected items, failed jobs, and items that have never had a successful backup. You can filter by subscription, vault, resource group, and policy. This surfaces "ghost" resources: items that appear protected but haven't had a successful backup in weeks.
For scriptable diagnostics, the Azure CLI offers this quick health check across an entire vault:
az backup job list \
    --resource-group YourRG \
    --vault-name YourVaultName \
    --status Failed \
    --output table
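To tell a vault-wide problem from a machine-specific one, it helps to count failures per protected item instead of reading the raw job list. A minimal Bash sketch follows; the function name failed_jobs_per_item is hypothetical, and it assumes each job's properties include the entityFriendlyName field that the CLI returns for VM backup jobs.

```bash
# Illustrative helper (hypothetical name): count failed backup jobs per
# protected item, most-affected item first. Output is "count<TAB>item".
failed_jobs_per_item() {
  local rg="$1" vault="$2"
  az backup job list \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --status Failed \
    --query "[].properties.entityFriendlyName" \
    --output tsv |
    sort | uniq -c | sort -rn |
    awk '{ print $1 "\t" $2 }'
}
```

Many items with one or two failures each points toward a shared cause such as networking or an extension version. One item with many failures points toward that machine.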
Azure Kubernetes Service (AKS) Backup Issues
AKS backup is one of the newer Azure Backup datasource types and has its own set of gotchas. The backup extension must be installed on the AKS cluster, and the job uses a Backup vault (a different resource type from the Recovery Services vault). Role assignments are the usual trap: per Microsoft's AKS backup prerequisites, the Backup vault's managed identity needs Reader access on the AKS cluster, and the backup extension's identity needs the Storage Blob Data Contributor role on the storage account used as the backup datastore. Check the current prerequisites for the full list, because a missing role assignment causes silent job failures.
Prevention & Best Practices for Azure Backup
The best Azure Backup troubleshooting session is the one you never have to run. Here's what I recommend setting up proactively, not because it sounds good on paper, but because I've seen each of these prevent real incidents.
Test your restores on a schedule. This sounds obvious but almost nobody does it consistently. Set a calendar reminder to do a file-level or VM restore test every quarter. Azure Backup's whole purpose is recovery: if you've never tested recovery, you don't actually know if you have backups. The test doesn't have to be full production scale; restoring a single file or spinning up a VM from a recovery point in a test resource group takes 20 minutes and confirms the entire chain is working.
Use Azure Policy to enforce backup coverage. Microsoft provides built-in Azure Policy definitions specifically for backup. The policy Configure backup on VMs without a given tag to an existing Recovery Services vault in the same location automatically enrolls new VMs in a backup policy based on location. This prevents the "we deployed a new VM and forgot to protect it" scenario that causes data loss.
Set up Azure Monitor alerts on backup failures. In your Recovery Services vault → Alerts → Create alert rule, configure alerts for Backup Health Events with signal Backup Alerts. Set severity to Sev1 for failures and route to your operations team via email or PagerDuty. The default built-in monitoring in the vault is good, but Azure Monitor alerts ensure your team is paged before a business user notices the backup gap.
Keep the MARS agent updated. The MARS agent releases updates that fix connectivity issues, add support for new Windows versions, and patch security vulnerabilities. Don't leave it on a version from two years ago and wonder why things stop working. Script the update into your standard server patching cycle.
Document your vault architecture before you need it. Know which vault covers which resources, what the storage replication type is, whether cross-region restore is enabled, and what your retention windows are. Having this documented in a runbook means that when something breaks at midnight, the on-call engineer doesn't have to reconstruct the architecture from scratch inside the portal.
- Enable soft delete on every Recovery Services vault: it's free and prevents catastrophic backup deletion by ransomware or human error
- Use GRS (geo-redundant storage) as your vault replication type unless you have a specific reason to use LRS: the cost difference is small compared to the recovery capability you gain
- Run az backup protection check-vm weekly via automation to catch any VMs that lost their backup protection after being moved between resource groups
- Tag all Recovery Services vaults with an owner and a business unit: this makes it significantly easier to track down who to contact when something fails in a large Azure environment
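The weekly protection check mentioned in the list above could look something like this. It is a minimal Bash sketch, not a finished automation: the function name unprotected_vms is hypothetical, and it assumes that az backup protection check-vm accepts a full VM resource ID through --vm and prints the protecting vault's ID, or nothing when the VM is unprotected. Older CLI versions took the ID through --vm-id instead.

```bash
# Illustrative helper (hypothetical name): print the resource ID of every
# VM in the current subscription that is not protected by any vault.
unprotected_vms() {
  local id vault_id
  az vm list --query "[].id" --output tsv |
    while IFS= read -r id; do
      [ -n "$id" ] || continue
      # Assumed behavior: prints the vault ID if protected, nothing otherwise
      vault_id="$(az backup protection check-vm --vm "$id" --output tsv </dev/null)"
      [ -n "$vault_id" ] || printf '%s\n' "$id"
    done
}
```

Scheduled from a pipeline or automation account, any non-empty output is the list of VMs to enroll or investigate.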
Frequently Asked Questions
What can I actually back up with Azure Backup?
The range is broader than most people realize. You can back up on-premises Windows machines using the MARS agent (files, folders, and system state), on-premises VMs running on Hyper-V or VMware via MABS or DPM, full Azure VMs running Windows or Linux, Azure Managed Disks, Azure Files shares, SQL Server databases running inside Azure VMs, SAP HANA databases on Azure VMs, Azure Database for PostgreSQL and MySQL Flexible Servers, Azure Blob Storage, Azure Kubernetes Service clusters, Azure Data Lake Storage, and Azure Elastic SAN. If it's running in or connected to Azure, there's very likely a supported backup path for it. The key is matching the right agent or extension to the right datasource type.
Why does Azure Backup show "Warning" status instead of "Completed" on my backup jobs?
A Warning status means the backup job technically succeeded but something non-fatal happened along the way. Typically, some files were skipped because they were locked by running processes, or the VM had some disk inconsistency that the snapshot had to work around. This is different from a Failed status, where no recovery point was created at all. For Warning jobs, Azure Backup still creates a usable recovery point, but it may not be 100% application-consistent. Check the job details for the specific warning message; if it's repeatedly warning about the same locked files, those files may need to be handled differently (for example, through application-consistent backup settings or VSS configuration on Windows).
Can I change the storage replication type (LRS vs GRS vs ZRS) after I've already started taking backups?
No, and this is one of the most painful constraints in Azure Backup. Once backup data exists in a Recovery Services vault, the storage replication type is locked. If you set it to LRS initially and later decide you need GRS for disaster recovery, you'd have to stop protecting all resources, delete all backup data from the vault, change the replication type, and then re-protect everything from scratch. This means losing your existing recovery point history. The only way to avoid this is to make the right choice at vault creation time. Microsoft's recommendation is GRS as the default, and I agree with that for any production workload.
How does Azure Backup handle data transfer costs, do I get charged for ingress/egress?
Azure Backup does not charge for inbound or outbound data transfer during normal backup and restore operations, which is one of its genuinely useful advantages over some competing solutions. You pay for the storage consumed by your backup data in the vault (priced per GB, varying by storage tier) and for the protected instance fee based on the size of the source resource. The one exception is the Azure Import/Export service: if you use it for the initial offline backup of a very large dataset, that initial seeding via physical drives does incur data transfer costs. Day-to-day incremental backups and restores over the network carry no transfer charges.
My Azure VM backup is failing with "ExtensionSnapshotFailedNoNetwork", what does that mean?
This error means the snapshot extension on your VM couldn't reach the Azure storage endpoint needed to write the snapshot. The most common cause is an NSG (Network Security Group) or Azure Firewall rule blocking the VM's outbound traffic to Azure Storage. Azure Backup requires outbound access on port 443 to your region's storage endpoints. You have two options: add an outbound NSG rule allowing HTTPS to the Storage service tag, or, the preferred approach for production environments, set up a private endpoint for your Recovery Services vault so backup traffic stays entirely within your VNet. Go to your vault → Private endpoint connections → + Private endpoint to set this up.
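For the first option, the NSG rule can be created from the Azure CLI. The sketch below is a minimal Bash wrapper; the function name allow_backup_storage_outbound, the rule name, and the default priority of 200 are all assumptions chosen for illustration, so pick a priority that fits your existing rule set.

```bash
# Illustrative helper (hypothetical name): add an outbound NSG rule that
# allows HTTPS from the virtual network to the Storage service tag.
allow_backup_storage_outbound() {
  local rg="$1" nsg="$2" priority="${3:-200}"
  az network nsg rule create \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name AllowStorageOutboundForBackup \
    --priority "$priority" \
    --direction Outbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes VirtualNetwork \
    --destination-address-prefixes Storage \
    --destination-port-ranges 443
}
```

Call it as allow_backup_storage_outbound YourRG YourNsgName, then trigger an on-demand backup to confirm the error clears. The rule only helps if its priority is a lower number than whichever rule is currently denying the traffic.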
What is the difference between a Recovery Services vault and a Backup vault in Azure?
These are two different resource types in Azure Backup and they're not interchangeable. The Recovery Services vault is the older, more mature resource type. It supports Azure VMs, on-premises machines via MARS and MABS, SQL Server in Azure VMs, SAP HANA, and Azure Files. The Backup vault is the newer resource type introduced for more modern datasources: Azure Disks, Azure Blobs (vaulted backup), Azure Database for PostgreSQL, Azure Kubernetes Service, Azure Data Lake Storage, and MySQL Flexible Server. If you're protecting an AKS cluster or using vaulted blob backup, you need a Backup vault. For everything else, you're almost certainly using a Recovery Services vault. Both appear in the Azure portal but under different resource types, so make sure you're looking at the right one when troubleshooting.