Azure Site Recovery Not Working, Diagnosed and Fixed (2026 Guide)
Why Azure Site Recovery Is Not Working
I've seen this exact scenario play out dozens of times across enterprise environments: you set up Azure Site Recovery, everything looks fine in the portal, and then nothing happens. Replication stalls. Import jobs hang. The agent throws version mismatch errors. Your backup window closes, and you're sitting there staring at a status screen that gives you almost no actionable information.
Azure Site Recovery troubleshooting is uniquely frustrating because the product sits at the intersection of on-premises infrastructure, Azure networking, the Recovery Services vault, and the Microsoft Azure Recovery Services (MARS) Agent, and any one of those layers can silently break the whole chain. When Azure Site Recovery is not working, the failure could be a name mismatch buried in an import job configuration, a stale agent version that Microsoft's cloud endpoint no longer accepts, or a bandwidth throttle that's quietly choking your replication threads down to a crawl.
There are three root-cause categories I see over and over:
1. Import job name mismatches during offline seeding. This one is sneaky. You configure offline seeding in the MAB console, you ship your drive to Azure, and then the job just... sits there. The portal says "Waiting for Azure Import Job to complete" indefinitely. The underlying cause is almost always that the import job name you typed into the MAB offline seeding wizard doesn't exactly match the job name you created in the Azure portal. One space, one capital letter difference, that's all it takes. The error log at C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog will show you a FATAL entry like Import job with name 'YourJobName' is not found, followed by a WARNING that the job is in pending state code 1b7afbe0. It looks like a networking problem. It isn't.
2. Recovery Services Agent version mismatch. Microsoft's Azure Backup service gets updated on the cloud side regularly, and if your on-premises MARS Agent falls behind, you'll start seeing jobs fail with error code 0x1FBD3 and a message telling you the Azure Backup service version and Recovery Services Agent versions don't match. This affects agents below version 2.0.9083.0 particularly hard. The error shows up in the Job Details dialog on the MARS console, the DPM console, or Azure Backup Server, wherever you're running backups from.
3. Network bandwidth misconfiguration. Not all Azure Site Recovery failures are hard errors. Some are slow-motion failures: replication that technically runs but never catches up because the agent's thread count and throttle settings are left at defaults that don't match your environment's bandwidth capacity. The default upload thread count is 4. On a beefy connection, that's leaving serious throughput on the table.
The error messages Microsoft surfaces for all three of these problems are technically accurate but maddeningly unhelpful for diagnosis. That's what this guide fixes.
The Quick Fix, Try This First
If your Azure Site Recovery job is failing right now and you need triage fast, start here. This single check resolves the majority of cases I encounter.
Check your Recovery Services Agent version first. Open the MARS Agent console on your server. Go to Help > About Microsoft Azure Recovery Services Agent. Note the version number. You need to be on at least version 2.0.9083.0. If you're below that, and you're seeing error code 0x1FBD3 in your failed job details, this is your problem.
Here's how to update it:
- Close the Microsoft Azure Recovery Services Agent console completely. Don't just minimize it, close it.
- If your server has unrestricted internet access, go directly to the Microsoft Download Center and download the latest MARS Agent installer. The direct link Microsoft publishes is referenced in the error message itself (https://go.microsoft.com/fwlink/?linkid=225925).
- If your server or its proxy has limited internet access, you need to configure firewall rules before the update will work; skip to Step 2 in the detailed section below for that path.
- Run the installer. It will upgrade in-place without touching your backup schedules or recovery points.
- Reopen the MARS console, go to the Jobs view, and manually trigger the failed backup job to confirm it runs clean.
If the version was already current, or if the update doesn't clear the issue, your problem is almost certainly one of the other two root causes: an import job name mismatch or a bandwidth configuration problem. Keep reading.
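The version check above can also be scripted. This is a minimal sketch, not an official tool: the function name `Test-AgentVersionCurrent` is hypothetical, and reading the version from `cbengine.exe` in the agent's `bin` folder is an assumption about where the agent stores it. The Help > About dialog remains the authoritative source.

```powershell
# Minimum agent version named in this guide.
$MinimumAgentVersion = [version]'2.0.9083.0'

# Hypothetical helper: returns $true when the installed version is at or above the minimum.
function Test-AgentVersionCurrent {
    param(
        [Parameter(Mandatory)] [version] $InstalledVersion,
        [version] $Minimum = $MinimumAgentVersion
    )
    # [version] compares each numeric part, so 2.0.10000.0 sorts after 2.0.9083.0,
    # which a plain string comparison would get wrong.
    return $InstalledVersion -ge $Minimum
}

# ASSUMPTION: cbengine.exe under the agent's bin folder carries the agent version.
$agentExe = 'C:\Program Files\Microsoft Azure Recovery Services Agent\bin\cbengine.exe'
if (Test-Path -LiteralPath $agentExe) {
    $info = (Get-Item -LiteralPath $agentExe).VersionInfo
    $installed = [version]::new($info.FileMajorPart, $info.FileMinorPart,
                                $info.FileBuildPart, $info.FilePrivatePart)
    if (Test-AgentVersionCurrent -InstalledVersion $installed) {
        "Agent $installed meets the minimum $MinimumAgentVersion."
    } else {
        "Agent $installed is below $MinimumAgentVersion; update before retrying the job."
    }
}
```

Casting to `[version]` rather than comparing strings is the design choice that matters here, because build numbers do not sort correctly as text.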
Before you go any further, check the error log at C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog. The MARS Agent writes detailed FATAL and WARNING entries there that the console UI never surfaces. In 80% of cases, that log file tells you the exact string that's mismatched, the exact error code that's blocking, and the exact function call that failed, giving you a far better starting point than the portal's generic status messages.
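If you'd rather not scroll through the log by hand, a short filter can pull out just the FATAL and WARNING lines. This is a rough sketch: the function name `Get-ErrlogFinding` is hypothetical, and it assumes those two words appear literally and in upper case in the log, as the entries quoted in this guide suggest.

```powershell
# Hypothetical helper: list the lines of a log file that contain FATAL or WARNING.
function Get-ErrlogFinding {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string[]] $Keyword = @('FATAL', 'WARNING')
    )
    # -SimpleMatch treats the keywords as literal text rather than regex;
    # -CaseSensitive skips incidental lower-case uses of the same words.
    Select-String -LiteralPath $Path -Pattern $Keyword -SimpleMatch -CaseSensitive |
        ForEach-Object {
            [pscustomobject]@{
                LineNumber = $_.LineNumber
                Text       = $_.Line.Trim()
            }
        }
}
```

For example, `Get-ErrlogFinding -Path 'C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog' | Select-Object -Last 20` would show the twenty most recent matching lines.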
Step 1: Check the Import Job Name
If your Azure Site Recovery troubleshooting starts with an offline backup that's stuck at "Waiting for Azure Import Job to complete," the fix is almost certainly a naming mismatch. I know how ridiculous that sounds when you're dealing with a backup infrastructure problem, but it's what Microsoft's documentation confirms, and it's what I've seen kill offline seeding jobs time and time again.
Here's what happens behind the scenes: when the MAB agent polls Azure for the import job status, it does an exact string lookup against the job name. If you typed "AzureImport-Prod-01" in the MAB wizard but created "azureimport-prod-01" (all lowercase) in the Azure portal, the lookup returns nothing, and the error log records that the job "is not found" even though the job exists perfectly fine in your subscription.
To diagnose this right now:
- Open C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog in Notepad or any text editor.
- Search for GetImportJobStatus. The FATAL line immediately after that function call will show you the exact job name the agent is trying to find.
- Log into the Azure portal, navigate to your subscription, and go to All services > Import/export jobs. Find your import job and note its exact name, character for character.
- If they don't match, even by a single character, you've found your problem.
Unfortunately, you cannot fix a name mismatch by editing the existing configuration. Microsoft's resolution is to remove the backup policy entirely using Stop protection and delete data, then schedule a completely new policy with a new import job name that you create fresh in the Azure portal first, before entering it in the MAB wizard. That order of operations matters. Create the Azure portal job first, copy the name exactly, then paste it into the MAB offline backup wizard.
When the fix is working, the "Waiting for Azure Import Job to complete" message will clear and your import job status in the portal will begin progressing through its normal states.
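Eyeballing two names for a stray space or a capital letter is error-prone, so it can help to let a script do the comparison. The sketch below is illustrative only: `Compare-ImportJobName` is a hypothetical helper, and you paste in both names yourself (one from the log, one from the portal). It does not query Azure.

```powershell
# Hypothetical helper: compare the job name from the agent log with the portal name
# using the same strictness as an exact string lookup, and hint at why they differ.
function Compare-ImportJobName {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $AgentName,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $PortalName
    )
    $ordinal    = [System.StringComparison]::Ordinal
    $ignoreCase = [System.StringComparison]::OrdinalIgnoreCase

    $exact = [string]::Equals($AgentName, $PortalName, $ordinal)
    $reason = if ($exact) {
        'match'
    } elseif ([string]::Equals($AgentName.Trim(), $PortalName.Trim(), $ordinal)) {
        'whitespace'      # leading or trailing spaces differ
    } elseif ([string]::Equals($AgentName, $PortalName, $ignoreCase)) {
        'case'            # only capitalization differs
    } else {
        'different'
    }
    [pscustomobject]@{ ExactMatch = $exact; Reason = $reason }
}
```

Running `Compare-ImportJobName -AgentName 'AzureImport-Prod-01' -PortalName 'azureimport-prod-01'` reports `ExactMatch = False` with the reason `case`, which is the scenario described above.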
Step 2: Update the Recovery Services Agent
Error code 0x1FBD3 ("Azure Backup service version and Microsoft Azure Recovery Services Agent do not match") means the cloud side of the service has moved on and your agent hasn't kept up. This shows up in the Errors tab of the Job Details dialog on the MARS console, on DPM, and on Azure Backup Server. It's not subtle once you know what to look for, but it's easy to miss if you're only watching the top-level job status.
If your server has open internet access, the fix is straightforward: close the MARS Agent console, download the latest installer from the Download Center, run it, reopen the console. Done.
But if you're in an environment with proxy restrictions or locked-down outbound firewall rules, which describes most enterprise setups, you need to configure firewall exceptions before the update will work. From an elevated command prompt, run the configuration command Microsoft provides for your environment's proxy settings. The exact syntax depends on your proxy configuration, but the key endpoints that the MARS Agent needs outbound HTTPS access to are the Azure Backup service endpoints for your region.
After updating, always verify the version change actually took effect:
- Open the MARS console.
- Go to Help > About Microsoft Azure Recovery Services Agent.
- Confirm the version number is at or above 2.0.9083.0.
- Go to the Schedule Backup view and manually run a backup job.
- Watch the Jobs tab. The job should show "Completed" with no errors in the Errors tab of the Job Details dialog.
If the job still fails with 0x1FBD3 after updating, Microsoft's guidance is to contact support directly. That combination means something more unusual is happening at the service validation layer, and it needs a support engineer with portal-side diagnostic access to resolve it.
Step 3: Raise the Upload Thread Count
This one doesn't throw errors; it just makes Azure Site Recovery replication painfully slow, which in practice means it "doesn't work" for your RPO requirements. If your Hyper-V host to Azure replication is consistently behind schedule, the first thing to check is the upload thread count in the registry.
The MARS Agent defaults to 4 upload threads per VM when replicating into Azure. On a modern connection with adequate bandwidth, that default is a serious bottleneck. Microsoft supports up to 32 threads, and in my experience, bumping this to 8 or 16 can double or triple your effective replication throughput on a 100Mbps+ connection.
Open Registry Editor (regedit.exe) on the Hyper-V host and navigate to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication
Look for a value named UploadThreadsPerVM. If it doesn't exist, create it:
Value Name: UploadThreadsPerVM
Value Type: REG_DWORD
Value Data: 8
Start with 8. Monitor your replication lag and network utilization for a few days. If you have headroom, push it to 16. Don't jump straight to 32 unless you've confirmed your upload bandwidth can absorb that load; you don't want replication threads saturating the link that your production VMs depend on for normal operations.
Changes to this registry value take effect without a reboot, but you may need to restart the MARS Agent service for the new thread count to be picked up. After the restart, check your replication status in the ASR portal. You should see replication progress moving faster than before.
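The same registry change can be scripted so it is applied identically on every host. Treat this as a sketch under stated assumptions: `Set-ReplicationThreadCount` is a hypothetical helper name, the upper bound of 32 comes from this guide, and the lower bound of 1 is my assumption. It must be run from an elevated session on the Hyper-V host.

```powershell
# Hypothetical helper: create or update a per-VM thread count value in the registry.
function Set-ReplicationThreadCount {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('UploadThreadsPerVM', 'DownloadThreadsPerVM')]
        [string] $Name,

        # Upper bound 32 is from this guide; the lower bound of 1 is an assumption.
        [Parameter(Mandatory)]
        [ValidateRange(1, 32)]
        [int] $Value
    )
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows Azure Backup\Replication'

    # Create the key if it does not exist yet.
    if (-not (Test-Path -Path $key)) {
        New-Item -Path $key -Force | Out-Null
    }

    # -Force creates the DWORD value or overwrites an existing one.
    New-ItemProperty -Path $key -Name $Name -PropertyType DWord -Value $Value -Force | Out-Null

    # Read the value back so the caller can confirm what was written.
    (Get-ItemProperty -Path $key -Name $Name).$Name
}
```

Usage would look like `Set-ReplicationThreadCount -Name UploadThreadsPerVM -Value 8`. If a service restart is needed, `Restart-Service -Name obengine` is one option, assuming `obengine` is the agent's service name on your host; confirm it with `Get-Service` first.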
Step 4: Raise the Download Thread Count for Failback
If your Azure Site Recovery failback (Azure to on-premises) is running slower than expected, there's a parallel registry setting that controls the download thread count for that direction of transfer. Most administrators know about the upload thread setting but miss this one entirely.
On the same Hyper-V host, in the same registry path:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication
Add or modify:
Value Name: DownloadThreadsPerVM
Value Type: REG_DWORD
Value Data: 8
The same principle applies here: the default is 4 and the maximum is 32. During a real failback, you want this set as high as your on-premises bandwidth can support, because every minute of recovery time has direct business impact.
One thing worth thinking about: these two registry values, UploadThreadsPerVM and DownloadThreadsPerVM, are per-VM values, meaning the thread count applies to each VM being replicated. If you have 10 VMs replicating simultaneously with 8 threads each, that's 80 concurrent threads. Do the math on your link capacity before you set these aggressively in a high-VM-density environment.
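To make that math concrete, here is a small calculation sketch. `Get-ReplicationThreadLoad` is a hypothetical helper, and the even split of link capacity across threads is a deliberate simplification; real throughput per thread will vary.

```powershell
# Hypothetical helper: total concurrent threads and the naive per-thread share of the link.
function Get-ReplicationThreadLoad {
    param(
        [Parameter(Mandatory)] [ValidateRange(1, 2147483647)] [int] $VmCount,
        [Parameter(Mandatory)] [ValidateRange(1, 32)] [int] $ThreadsPerVm,
        [Parameter(Mandatory)] [double] $LinkMbps
    )
    # Thread counts are per VM, so the totals multiply.
    $total = $VmCount * $ThreadsPerVm
    [pscustomobject]@{
        TotalThreads  = $total
        # Simplification: assume the link is shared evenly by every thread.
        MbpsPerThread = [math]::Round($LinkMbps / $total, 2)
    }
}
```

With the numbers from the paragraph above, `Get-ReplicationThreadLoad -VmCount 10 -ThreadsPerVm 8 -LinkMbps 100` gives 80 total threads and 1.25 Mbps per thread.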
After setting both values, it's a good discipline to document what you set and why in your runbook. Six months from now when someone else is troubleshooting Azure Site Recovery not working again, having these non-default registry values documented prevents a lot of confusion.
Step 5: Configure Bandwidth Throttling
Here's the flip side of the thread-count problem: sometimes Azure Site Recovery replication works fine, but it's hammering your internet connection and impacting production systems. The fix is bandwidth throttling, and you have two ways to set it up.
Method 1: MMC Snap-in (GUI approach)
- Open the Microsoft Management Console by running mmc.exe from an elevated command prompt.
- Go to File > Add/Remove Snap-in.
- Add Windows Server Backup for the local computer and click OK.
- In the left pane, expand Windows Server Backup and select Backup.
- In the Actions pane on the right, click Change Properties.
- Click the Throttling tab.
- Check Enable internet bandwidth usage throttling for backup operations.
- Set your work hours bandwidth and non-work hours bandwidth. Valid range is 512 Kbps to 1023 Mbps.
Method 2: PowerShell (scripted/automated approach)
This is what I recommend for any environment with more than a handful of hosts, because you can deploy it consistently across all machines. The Set-OBMachineSetting cmdlet handles all throttling configuration. Here's an example that throttles to 512 Kbps during business hours on Mondays and Tuesdays, and allows 2 Mbps the rest of the time:
$mon = [System.DayOfWeek]::Monday
$tue = [System.DayOfWeek]::Tuesday
Set-OBMachineSetting -WorkDay $mon, $tue -StartWorkHour "9:00:00" -EndWorkHour "18:00:00" -WorkHourBandwidth (512*1024) -NonWorkHourBandwidth (2048*1024)
If you want to remove all throttling and let the agent use maximum available bandwidth at all times:
Set-OBMachineSetting -NoThrottle
After running your chosen configuration, monitor network utilization on the host for several days. The goal is to find the sweet spot where replication stays current without impacting anything else on the link. There's no single right answer; it depends entirely on your available bandwidth and your RPO requirements.
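The `(512*1024)` arithmetic in the example above is easy to mistype, so a small converter with a range check can help when scripting this across hosts. This is a sketch: `ConvertTo-ThrottleValue` is a hypothetical helper, it simply repeats the Kbps-times-1024 arithmetic from the example, and the range check assumes the same 512 Kbps to 1023 Mbps limits quoted for the GUI apply to the cmdlet.

```powershell
# Hypothetical helper: turn a Kbps figure into the number passed to the bandwidth parameters.
function ConvertTo-ThrottleValue {
    param(
        [Parameter(Mandatory)]
        [ValidateRange(512, 1047552)]   # 512 Kbps up to 1023 Mbps (1023 * 1024 Kbps)
        [int] $Kbps
    )
    # Same arithmetic as the example above: Kbps * 1024.
    return [uint32]($Kbps * 1024)
}
```

It slots into the existing command as `-WorkHourBandwidth (ConvertTo-ThrottleValue -Kbps 512) -NonWorkHourBandwidth (ConvertTo-ThrottleValue -Kbps 2048)`, and rejects out-of-range values before anything is applied.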
Advanced Troubleshooting for Azure Site Recovery
If you've worked through all five steps and Azure Site Recovery is still not working, you're in deeper territory. Here's what to look at next.
Event Viewer analysis. The Windows Event Log holds a lot of detail that the MARS console never surfaces. Open Event Viewer and navigate to Applications and Services Logs > Microsoft > Windows > Backup > Operational. Filter for Error and Warning events in the timeframe when your failed jobs ran. You'll often find VSS writer failures, disk snapshot errors, or network timeout events that point directly at the underlying issue without any guessing.
Also check Windows Logs > Application and filter by source "MSExchangeRepl" or "MSSQLSERVER" if you're protecting Exchange or SQL workloads. Application-aware backups fail silently at the application layer more often than you'd expect, and the failure shows up in the application event log, not the backup log.
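The same Event Viewer filter can be expressed as a query, which is handy when you need to check several hosts. This is a sketch with one important assumption: the channel name `Microsoft-Windows-Backup` is my guess at the log behind the Backup > Operational path described above, so confirm it with `Get-WinEvent -ListLog *Backup*` on your host. `New-BackupEventFilter` is a hypothetical helper name.

```powershell
# Hypothetical helper: build a Get-WinEvent filter for errors and warnings in a time window.
function New-BackupEventFilter {
    param(
        [Parameter(Mandatory)] [datetime] $Start,
        [Parameter(Mandatory)] [datetime] $End,
        # ASSUMPTION: channel name for the Backup > Operational log; verify on your host.
        [string] $LogName = 'Microsoft-Windows-Backup'
    )
    @{
        LogName   = $LogName
        Level     = 2, 3        # 2 = Error, 3 = Warning
        StartTime = $Start
        EndTime   = $End
    }
}
```

A typical call would be `Get-WinEvent -FilterHashtable (New-BackupEventFilter -Start (Get-Date).AddDays(-1) -End (Get-Date)) -ErrorAction SilentlyContinue | Select-Object TimeCreated, Id, LevelDisplayName, Message`. The `-ErrorAction` switch is there because `Get-WinEvent` raises an error when no events match.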
Proxy and network-layer failures. In enterprise environments, the MARS Agent's outbound HTTPS calls often have to traverse a proxy. If the proxy is performing SSL inspection, it can break the certificate chain validation that the agent depends on. Check the CBEngineCurr.errlog for any SSL or certificate-related errors. They'll contain terms like "certificate validation failed" or show TLS handshake error codes. If you find these, you'll need to either whitelist Azure Backup endpoints at the proxy level or add the proxy's CA certificate to the machine's Trusted Root store.
Domain-joined and Group Policy considerations. In domain-joined environments, Group Policy can override the throttle settings you configure via MMC or PowerShell. If you set throttling and it keeps reverting, check your applied GPOs with gpresult /h gpresult.html and look for any policies touching Windows Backup or network QoS settings. An overly aggressive QoS policy can throttle replication traffic even when you've explicitly disabled throttling in the MARS settings.
Recovery Services vault configuration. Sometimes the issue isn't on-premises at all; it's a vault-side configuration problem. Log into the Azure portal, open your Recovery Services vault, and go to Backup Infrastructure > Protected Servers. Verify that your server appears there and its last contact time is recent. A server that shows "Contact Lost" indicates the agent isn't able to reach the vault endpoint, which is almost always a firewall or proxy issue on the on-premises side.
Re-registering the server with the vault. If the server shows as disconnected or if you've had to rebuild the MARS Agent installation, you may need to re-register the server with the Recovery Services vault. Download the vault registration credentials from the vault's Properties blade in the portal (these credentials expire after 48 hours, so always download fresh ones), then run the MARS Agent installation with those credentials to re-establish the trust relationship.
If you still see 0x1FBD3 after updating to the latest MARS Agent version, if your import job names match exactly but the offline seeding job remains stuck, or if re-registration with the vault fails with a generic error, these are signals to escalate. None of those scenarios is self-serviceable from the on-premises side; they require a support engineer with portal-level diagnostic access to the vault and the Azure Import/Export service. Open a support ticket at Microsoft Support and include your CBEngineCurr.errlog file, the MARS Agent version number, and the exact job name from both the MAB wizard and the Azure portal. That combination will get you past tier-1 quickly.
Prevention & Best Practices for Azure Site Recovery
Once you've resolved the immediate Azure Site Recovery not working problem, you want to stay out of this situation. Most of the failures I've described in this guide are entirely preventable with a few disciplined operational habits.
Keep the MARS Agent current, proactively. Microsoft updates the Azure Backup service on the cloud side without advance notice to on-premises administrators. If your agent version falls behind, you will eventually hit the version mismatch error. Set up a monthly maintenance task to check the agent version and update if needed. You can automate this by scripting a version check against the known current version number and triggering an update if the installed version is older.
Document your import job names in a shared location. The offline seeding name mismatch problem is 100% preventable if you treat import job names as a formal configuration item. Before you start any offline backup workflow: create the import job in the Azure portal first, record the exact name in your documentation system, then copy-paste that exact string into the MAB wizard. Never type it by hand twice.
Establish a bandwidth baseline before setting thread counts. Don't guess at registry thread settings. Before you touch UploadThreadsPerVM or DownloadThreadsPerVM, run a bandwidth test from the Hyper-V host to Azure during both peak and off-peak hours. Use that measured throughput to calculate how many threads you can support without impacting production traffic. A good starting rule: use no more than 50% of your available upload bandwidth for replication threads during business hours.
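One way to turn a measured baseline into a thread setting is sketched below. Everything here is illustrative: `Get-ThreadsPerVmBudget` is a hypothetical helper, and the per-thread throughput figure is something you would have to measure in your own environment, since this guide does not give one. The 50% share and the 32-thread cap come from the text above.

```powershell
# Hypothetical helper: estimate a per-VM thread count from a measured upload baseline.
function Get-ThreadsPerVmBudget {
    param(
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [double] $MeasuredUploadMbps,
        # ASSUMPTION: throughput one thread achieves; measure this yourself.
        [Parameter(Mandatory)] [ValidateScript({ $_ -gt 0 })] [double] $PerThreadMbps,
        [Parameter(Mandatory)] [ValidateRange(1, 2147483647)] [int] $VmCount,
        # Share of the link reserved for replication (the 50% rule of thumb).
        [ValidateRange(0.01, 1.0)] [double] $ReplicationShare = 0.5
    )
    $budgetMbps = $MeasuredUploadMbps * $ReplicationShare
    $threads = [math]::Floor($budgetMbps / $PerThreadMbps / $VmCount)

    # Clamp to at least 1 thread and at most the supported maximum of 32.
    [int][math]::Max(1, [math]::Min(32, $threads))
}
```

For example, a 200 Mbps measured upload, 2.5 Mbps per thread, and 5 VMs works out to 8 threads per VM: half the link is 100 Mbps, which supports 40 threads, spread across 5 VMs.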
Test your failover regularly, before you need it. Azure Site Recovery exists to protect you in a disaster. The worst time to discover that your failover doesn't work is during an actual disaster. Schedule quarterly test failovers using the non-disruptive test failover feature in the ASR portal. This validates end-to-end that your VMs can actually start up in Azure with the expected data, without impacting your production replication.
Monitor the CBEngineCurr.errlog proactively. Set up a simple scheduled task or monitoring rule that scans this log file for FATAL and WARNING entries daily and alerts your operations team. Catching a version mismatch or an import job error on day one is far better than finding out three weeks later that your backups silently failed while everyone assumed they were running fine.
- Set a monthly calendar reminder to check and update the MARS Agent version, takes 5 minutes and prevents the most common Azure Site Recovery failure mode
- Always create the Azure portal import job first, then copy-paste the exact name into the MAB offline seeding wizard, never type it twice
- Set UploadThreadsPerVM to 8 as a baseline on any host with >50 Mbps upload capacity, the default of 4 is consistently too conservative
- Use Set-OBMachineSetting -NoThrottle on off-hours replication windows to maximize throughput when production traffic is quiet
Frequently Asked Questions
Why does my Azure backup keep getting stuck at "Waiting for Azure Import Job to complete"?
This almost always means the import job name you entered in the MAB offline seeding configuration doesn't exactly match the job name you created in the Azure portal. The MARS Agent does an exact string lookup, and even a single character difference, a capital letter, a trailing space, a hyphen versus an underscore, causes the lookup to return nothing. Open C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog, search for GetImportJobStatus, and compare the job name in that log against what you see in the Azure portal. If they don't match, you'll need to stop the current protection policy (using "Stop protection and delete data"), create a fresh import job in the Azure portal, and set up a new backup policy with the correct matching name.
What does error code 0x1FBD3 mean in Azure Backup and how do I fix it?
Error code 0x1FBD3 means "Azure Backup service version and Microsoft Azure Recovery Services Agent do not match". In plain terms, your on-premises agent is too old for the current cloud service version. This specifically affects MARS Agent versions below 2.0.9083.0. The fix is to close the MARS Agent console, download the latest agent installer from the Microsoft Download Center (the error message itself contains the download link), install it, then reopen the console and retry the failed job. If you're in a restricted network environment, you'll need to configure firewall access to Azure endpoints before the download will work.
How do I speed up Azure Site Recovery replication that's falling behind?
The fastest lever you have is the UploadThreadsPerVM registry value. It lives at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication and defaults to 4. Setting it to 8 or 16 (max 32) directly increases how many concurrent upload threads the agent uses per VM, which can raise throughput substantially on connections with available headroom. Set it as a REG_DWORD value. You should also check whether bandwidth throttling is enabled via Set-OBMachineSetting or the MMC Throttling tab; an active throttle can cap your throughput no matter how many threads you configure.
Can I use PowerShell to configure Azure Site Recovery bandwidth throttling instead of the GUI?
Yes, and for anything beyond a single machine, PowerShell is the better path. The Set-OBMachineSetting cmdlet handles all throttle configuration. You can specify work days, start and end work hours, and separate bandwidth limits for work hours versus non-work hours. For example: Set-OBMachineSetting -WorkDay Monday, Wednesday -StartWorkHour "8:00:00" -EndWorkHour "18:00:00" -WorkHourBandwidth (512*1024) -NonWorkHourBandwidth (2048*1024). To remove throttling entirely and let the agent run at full speed, run Set-OBMachineSetting -NoThrottle. Changes apply to the machine the cmdlet runs on, so you'll need to run it on each Hyper-V host you want to configure.
What's the maximum value I can set for UploadThreadsPerVM and DownloadThreadsPerVM?
Microsoft supports a maximum of 32 for both UploadThreadsPerVM and DownloadThreadsPerVM. The default for both is 4. In practice, I'd recommend working up incrementally: start at 8, monitor your link utilization and replication lag for a few days, then step up to 16 if you have headroom. Keep in mind these thread counts apply per VM, so if you're replicating 20 VMs simultaneously with 16 threads each, you're asking the agent to manage 320 concurrent upload threads. That's fine if your bandwidth supports it, but on a constrained link it can hurt more than help.
Azure Site Recovery shows my server as "Contact Lost" in the portal, what does that mean?
"Contact Lost" in the Azure portal under your Recovery Services vault's Protected Servers view means the MARS Agent on that machine has stopped successfully phoning home to the vault endpoint. The most common causes are: a firewall rule change that blocked outbound HTTPS to Azure endpoints, a proxy configuration change that the agent wasn't updated to reflect, or the vault registration credentials expiring and not being renewed. Start by checking whether the MARS Agent service is still running on the machine, then verify outbound connectivity to Azure Backup endpoints from that host. If connectivity is fine but the status persists, try re-registering the server using fresh vault credentials downloaded from the vault's Properties blade in the Azure portal, those credentials are time-limited, so always download a new set rather than reusing old ones.