Azure Site Recovery Troubleshooting: Fix Every Common Error

Intermediate · 14 min read · Updated April 20, 2026

What's in This Guide

Why This Happens
The Quick Fix
Step-by-Step Solution
Advanced Troubleshooting
Prevention & Best Practices
FAQ

Why This Is Happening

I've seen this scenario play out more times than I can count. An IT admin sets up Azure Site Recovery, walks away feeling confident, and then three days later gets paged at 11 PM because replication has stalled, an import job is frozen, or the agent is throwing a cryptic mismatch error that the Azure portal does nothing to explain. Azure Site Recovery troubleshooting is one of the most frustrating areas of enterprise Azure management, not because the product is bad, but because its error messages are written for engineers who already know what went wrong.

The honest truth is that ASR failures fall into a surprisingly small number of root-cause buckets. The first, and by far the most common, is a version mismatch between the Microsoft Azure Recovery Services (MARS) Agent and the Azure Backup service itself. Microsoft updates the backend service regularly, and if your agent falls too far behind, you'll hit the dreaded 0x1FBD3 error and your backup jobs will fail entirely. The second bucket is offline seeding import job name mismatches: a subtle configuration mistake that stalls your entire offline backup workflow and leaves you staring at a "Waiting for Azure Import Job to complete" message that never resolves. The third bucket is network bandwidth misconfiguration, which doesn't cause outright failures but silently degrades replication performance and causes lag that only becomes obvious during an actual failover test.

Who sees these problems? Mostly mid-sized businesses running Hyper-V environments with VMM, organizations doing an on-premises-to-Azure migration, and enterprises that set up ASR years ago and haven't touched the agent since. If you're in an IT department where "if it ain't broke, don't fix it" is the operating philosophy, your agent is almost certainly outdated.

Microsoft's error messages don't help because they're often surfaced through multiple layers (the Recovery Services Agent console, the Azure portal, the DPM console, and the event log), and the message in each place tells a slightly different part of the story. None of them tells you the whole story in plain English. That's what this guide is for.


The Quick Fix: Try This First

If your Azure Site Recovery environment is broken right now and you need the fastest path to a working state, start here. In most cases (I'd estimate around 60%), the root cause is a stale MARS Agent version. Microsoft's backend gets updated, your agent gets left behind, and everything grinds to a halt.

Here's exactly what to do:

1. Close the Microsoft Azure Recovery Services Agent console entirely. Don't just minimize it. Right-click the taskbar icon and close it. You cannot update the agent while the console is running.
2. Open your browser and go to the Microsoft Download Center link for the latest MARS Agent: https://go.microsoft.com/fwlink/?linkid=225925. Download the installer.
3. Run the installer as Administrator. It will detect the existing agent and perform an in-place upgrade. You don't need to uninstall first.
4. Once the upgrade completes, reopen the Recovery Services Agent console and check the Jobs view. Trigger a manual backup by clicking Back Up Now in the Actions pane on the right side.
5. Watch the Errors tab in the Job Details dialog. If the 0x1FBD3 error is gone and the job progresses, you're done.

If the job fails with a different error code after the update, or if you were already running the latest agent version, keep reading. The sections below cover the other failure modes in detail.

Pro Tip

Before you do anything else, check the agent version number right inside the console. Open the Recovery Services Agent, click About Microsoft Azure Recovery Services Agent under the Help menu, and note the version number. Then cross-reference it against Microsoft's release notes. If your version is below 2.0.9083.0, an update is mandatory: that specific build introduced the backend compatibility layer that resolves the version mismatch error class.
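If you'd rather not eyeball dotted version numbers, the comparison can be sketched in a few lines of PowerShell. This is a minimal sketch, not an official check: Test-AgentVersionCurrent is a hypothetical helper name, and the threshold is simply the 2.0.9083.0 build mentioned above.

```powershell
# Minimum build cited in this guide; adjust it if Microsoft's release notes say otherwise.
$MinimumAgentVersion = [version]'2.0.9083.0'

# Hypothetical helper: pass in the version string copied from Help > About.
function Test-AgentVersionCurrent {
    param(
        [Parameter(Mandatory)][string]$InstalledVersion
    )
    # [version] compares each dotted component numerically, not as text.
    return ([version]$InstalledVersion -ge $MinimumAgentVersion)
}
```

Calling Test-AgentVersionCurrent -InstalledVersion '2.0.9040.0' returns False, which means that build is older than the threshold and should be updated.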

Update the Recovery Services Agent to the Latest Version

This is the fix for error code 0x1FBD3 and the "Azure Backup service version and Microsoft Azure Recovery Services Agent do not match" message. The error surfaces on the Errors tab of the Job Details dialog; you'll see it in the Recovery Services Agent console, in Data Protection Manager (DPM), or in Azure Backup Server, depending on your setup.

Before starting, confirm you actually need to do this. Navigate to the agent console, open the Actions pane on the right, and look for any update notification banner. If there's no banner, that doesn't mean you're current; Microsoft sometimes delays push notifications.

Close the console, then update via the direct download link:

https://go.microsoft.com/fwlink/?linkid=225925

If your server or its proxy has restricted internet access, which is common in enterprise environments, you'll need to configure firewall rules before the agent can communicate with the Azure backend. Open an elevated command prompt (right-click CMD, Run as Administrator) and run the firewall configuration command appropriate to your environment before attempting the download or re-registration.

After installing, reopen the console and look at the Jobs list. A successful agent update will show a green checkmark next to your next scheduled or manual backup. If the Errors tab is clean on the next run, you've resolved the version mismatch completely. If you're using System Center DPM or Azure Backup Server, apply the same agent update on those machines as well; the mismatch error can appear there too, and the resolution is identical.

Fix a Frozen Offline Seeding Import Job

This one is particularly nasty because it looks like a transient issue (the interface just says "Waiting for Azure Import Job to complete"), but it will sit there forever if you don't intervene. The symptom shows up in the MAB (Microsoft Azure Backup) interface and doesn't move no matter how long you wait.

The root cause is almost always a name mismatch. When you set up offline seeding, you entered an Azure Import Job Name in the local MAB configuration screen. When you then went to the Azure portal and actually created the import job, you used a slightly different name, maybe different capitalization, an extra space, or a different word entirely. Both names must be character-for-character identical. Azure's matching is case-sensitive.

You can confirm this is your problem by checking the error log at:

C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog

Open it in Notepad and search for the string GetImportJobStatus. If you see a line like:

FATAL GetImportJobStatus:: Import job with name 'ImportJobName' is not found

…that confirms the mismatch. The job name the agent is looking for doesn't exist in Azure under that exact name.
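Searching the log by hand works, but the same check can be scripted. The sketch below assumes the log line has the shape shown above; Get-MissingImportJobName is a hypothetical helper name, and the regular expression is my own reading of that one sample line, not a documented log format.

```powershell
# Hypothetical helper: returns each import job name the agent reported as "not found".
function Get-MissingImportJobName {
    param(
        [string]$LogPath = 'C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog'
    )
    # Capture the quoted job name from the GetImportJobStatus failure line.
    $pattern = "GetImportJobStatus::\s*Import job with name '(?<name>[^']*)' is not found"

    Select-String -Path $LogPath -Pattern $pattern |
        ForEach-Object { $_.Matches[0].Groups['name'].Value } |
        Select-Object -Unique
}
```

Run Get-MissingImportJobName with no arguments to read the default log path. Whatever name it prints is the one the agent is looking for, so compare it against the job name shown in the Azure portal.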

The fix requires removing the existing backup policy entirely using the Stop protection and delete data option, not just pausing it. Then schedule a fresh backup policy with corrected offline backup parameters, making absolutely sure the import job name you enter locally matches exactly what you create in the Azure portal. There is no in-place correction for this; the configuration must be rebuilt from scratch.
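Before you rebuild the policy, it's worth proving to yourself that the two names really are identical. Here's a small sketch of a case-sensitive, character-for-character comparison; Compare-ImportJobName is a hypothetical helper, and the verdict strings are just illustrative.

```powershell
# Hypothetical helper: explains how two import job names differ, if they do.
function Compare-ImportJobName {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$LocalName,
        [Parameter(Mandatory)][AllowEmptyString()][string]$PortalName
    )
    # -ceq is PowerShell's case-sensitive equality operator.
    if ($LocalName -ceq $PortalName) { return 'Match' }
    if ($LocalName.Trim() -ceq $PortalName.Trim()) { return 'Differs only by leading/trailing whitespace' }
    if ($LocalName -ieq $PortalName) { return 'Differs only by capitalization' }
    return 'Different names'
}
```

For example, Compare-ImportJobName -LocalName 'seed01' -PortalName 'Seed01' returns 'Differs only by capitalization', which is exactly the kind of mismatch that is easy to miss on screen.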

Increase Upload Threads to Improve Azure Replication Speed

If replication is working but painfully slow (data is moving but your RPO is suffering), the default thread configuration is almost certainly your problem. The MARS Agent ships with a default of 4 upload threads for replicating data into Azure, and 4 download threads for failover recovery. For most production environments, that's not enough.

Open Registry Editor (press Win + R, type regedit, press Enter). Navigate to this key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication

If the key doesn't exist, you'll need to create it. Right-click on Windows Azure Backup, select New > Key, and name it Replication.

Inside that key, create or modify the following DWORD value to increase upload throughput for replication into Azure:

Value Name: UploadThreadsPerVM
Value Type: REG_DWORD
Value Data: 8

The supported range is 1 through 32. Setting it to 8 is a solid starting point for most environments: double the default without going aggressive. After making this change, restart the Azure Recovery Services Agent service via Services.msc (look for Microsoft Azure Recovery Services Agent in the list). Monitor your replication lag over the next 48–72 hours through the Azure portal's replication health dashboard before tuning further upward.

You should see your replication delta shrink noticeably, often by 30–50%, within the first monitoring window after the service restarts.
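If you have more than a couple of hosts, clicking through regedit gets old fast. The same change can be sketched in PowerShell using the key, value name, and service display name given above. Run it from an elevated session, and treat it as a starting point rather than a vetted deployment script.

```powershell
# Run in an elevated PowerShell session on the host where the agent is installed.
$key = 'HKLM:\SOFTWARE\Microsoft\Windows Azure Backup\Replication'

# Create the Replication key if it does not exist yet.
if (-not (Test-Path -Path $key)) {
    New-Item -Path $key -Force | Out-Null
}

# -Force overwrites the value if it is already present.
New-ItemProperty -Path $key -Name 'UploadThreadsPerVM' -PropertyType DWord -Value 8 -Force | Out-Null

# Restart the agent service (matched by its display name) so the new value is picked up.
Get-Service -DisplayName 'Microsoft Azure Recovery Services Agent' | Restart-Service
```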

Tune Download Threads for Faster Azure Failover Recovery

The upload thread fix handles your day-to-day replication performance. But during an actual failover, or more likely a failover test during a DR drill, the bottleneck flips. Now you're pulling data from Azure back to on-premises, and the download thread count is what governs that speed. The default of 4 is equally conservative.

In the same registry location used for the upload thread setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication

Create or modify this value:

Value Name: DownloadThreadsPerVM
Value Type: REG_DWORD
Value Data: 8

Again, the maximum supported value is 32. I'd recommend keeping upload and download thread counts at parity unless you have a specific reason to diverge; it makes capacity planning and troubleshooting much simpler when both are tuned consistently.

One important note: increasing threads increases CPU and network utilization on the Hyper-V host where the agent is installed. Before setting this anywhere near 32, make sure you understand your host's headroom. Running a failover test during a quiet maintenance window is the right way to validate that your thread count doesn't create a resource contention problem on the host itself. If your Hyper-V hosts are consolidated and running near capacity, start at 8, not 16.

After the registry change, restart the agent service. You'll see the effect most clearly during your next failover test: measure the time from failover initiation to VM availability in the target environment before and after the change.
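To keep the two thread values consistent and inside the supported 1 to 32 range, a small wrapper can do the validation for you. This is a sketch under the assumptions already stated in this guide (same registry key, same value names); Set-ReplicationThreadCount is a hypothetical function name, not a built-in cmdlet.

```powershell
# Hypothetical helper: writes one of the two thread values, rejecting out-of-range input.
function Set-ReplicationThreadCount {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('UploadThreadsPerVM', 'DownloadThreadsPerVM')]
        [string]$Name,

        [Parameter(Mandatory)]
        [ValidateRange(1, 32)]
        [int]$Value
    )
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows Azure Backup\Replication'
    if (-not (Test-Path -Path $key)) { New-Item -Path $key -Force | Out-Null }
    New-ItemProperty -Path $key -Name $Name -PropertyType DWord -Value $Value -Force | Out-Null
}

# Set the download value, then read both values back to confirm they are at parity.
Set-ReplicationThreadCount -Name 'DownloadThreadsPerVM' -Value 8
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows Azure Backup\Replication' |
    Select-Object UploadThreadsPerVM, DownloadThreadsPerVM
```

Passing a value such as 64 fails parameter validation before anything is written, which is the point of wrapping the registry call.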

Configure Bandwidth Throttling to Protect Production Traffic

Running ASR replication without throttling on a production network is a gamble. Replication will use whatever bandwidth it can grab, and during a large initial sync or a day with a high change rate, it can crowd out your business-critical traffic. Here are both methods to get throttling in place.

Method 1: MMC Snap-in (quickest for one-off configuration)

Open the MMC console (press Win + R, type mmc, press Enter). Go to File > Add/Remove Snap-in and add Windows Server Backup for Local Computer. Expand the tree, click Backup, then in the Actions pane click Change Properties. On the Throttling tab, check Enable internet bandwidth usage throttling for backup operations. Set your work hours bandwidth and non-work hours bandwidth. The valid range is 512 Kbps to 1023 Mbps.

Method 2: PowerShell (best for scripted or remote deployment)

The Set-OBMachineSetting cmdlet handles this cleanly. To throttle bandwidth on Monday and Tuesday from 9 AM to 6 PM:

$mon = [System.DayOfWeek]::Monday
$tue = [System.DayOfWeek]::Tuesday
Set-OBMachineSetting -WorkDay $mon, $tue -StartWorkHour "9:00:00" -EndWorkHour "18:00:00" -WorkHourBandwidth (512*1024) -NonWorkHourBandwidth (2048*1024)

To remove all throttling from a server entirely:

Set-OBMachineSetting -NoThrottle
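The (512*1024) arithmetic in the example is easy to get wrong when you script it for many servers. The sketch below wraps it in a helper that follows the same multiply-by-1024 pattern and rejects anything outside the 512 Kbps to 1023 Mbps range quoted above. ConvertTo-ThrottleValue is a hypothetical name of my own.

```powershell
# Hypothetical helper: turns a Kbps target into the number passed to Set-OBMachineSetting,
# using the same Kbps * 1024 pattern as the example above.
function ConvertTo-ThrottleValue {
    param(
        [Parameter(Mandatory)][long]$Kbps
    )
    $minKbps = 512            # lower bound quoted in this guide
    $maxKbps = 1023 * 1024    # 1023 Mbps expressed in Kbps
    if ($Kbps -lt $minKbps -or $Kbps -gt $maxKbps) {
        throw "Bandwidth must be between $minKbps and $maxKbps Kbps."
    }
    return $Kbps * 1024
}

# Example use (commented out so the helper can be loaded without changing any settings):
# Set-OBMachineSetting -WorkDay $mon, $tue -StartWorkHour "9:00:00" -EndWorkHour "18:00:00" `
#     -WorkHourBandwidth (ConvertTo-ThrottleValue -Kbps 512) `
#     -NonWorkHourBandwidth (ConvertTo-ThrottleValue -Kbps 2048)
```

ConvertTo-ThrottleValue -Kbps 512 returns 524288, the same number as (512*1024) in the original command.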

After applying throttling settings, monitor actual network utilization from your switch or from Windows Performance Monitor (perfmon) using the Network Interface > Bytes Total/sec counter on the Hyper-V host. Validate that replication traffic stays within your defined limit during business hours before signing off on the configuration.
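If you don't want to sit in front of perfmon, the same counter can be sampled from PowerShell. This is a rough sketch: $limitKbps is a placeholder for whatever work-hours limit you configured, and the counter measures all traffic on each adapter, not replication traffic alone, so treat the result as an upper bound.

```powershell
# Placeholder: set this to the work-hours limit you configured, in Kbps.
$limitKbps = 512

# Sample every 5 seconds for one minute (12 samples) across all network adapters.
Get-Counter -Counter '\Network Interface(*)\Bytes Total/sec' -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object {
        foreach ($sample in $_.CounterSamples) {
            # The counter reports bytes per second; convert to kilobits per second.
            $kbps = [math]::Round($sample.CookedValue * 8 / 1024, 1)
            [pscustomobject]@{
                Time      = $_.Timestamp
                Adapter   = $sample.InstanceName
                Kbps      = $kbps
                OverLimit = $kbps -gt $limitKbps
            }
        }
    }
```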

Advanced Troubleshooting

When the standard fixes don't move the needle, you need to go deeper. Here's where experienced engineers actually spend their time during difficult Azure Site Recovery troubleshooting engagements.

Reading the CBEngineCurr.errlog File

The single most information-dense artifact for ASR diagnosis is the Recovery Services Agent error log. It lives at:

C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog

Open it in a text editor with search capability. The log uses a structured format: each line includes a timestamp, thread ID, source file, and correlation GUID. That GUID is critical because it lets you trace a single failed operation across multiple log lines. Search for the correlation GUID from your failed job's error message, then read every line tagged with that GUID in chronological order. You'll often find that the real error (a certificate issue, an authentication failure, a network socket error) appears several lines before the surface-level failure message that shows up in the console.
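Pulling out every line for one correlation GUID is a natural job for Select-String. The sketch below makes no assumptions about the column layout; it simply keeps the lines that contain the GUID, in file order. Get-LogLinesByCorrelationId is a hypothetical helper name.

```powershell
# Hypothetical helper: lists every log line containing the given correlation GUID.
function Get-LogLinesByCorrelationId {
    param(
        [Parameter(Mandatory)][string]$CorrelationId,
        [string]$LogPath = 'C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog'
    )
    # -SimpleMatch treats the GUID as literal text rather than a regular expression.
    Select-String -Path $LogPath -Pattern $CorrelationId -SimpleMatch |
        ForEach-Object { '{0,6}: {1}' -f $_.LineNumber, $_.Line }
}
```

Pass the GUID from the failed job's error message as -CorrelationId and read the output from top to bottom. The line numbers make it easy to jump back into the full log for context.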

Registry Configuration for Enterprise and Domain-Joined Servers

Domain-joined servers sometimes have Group Policy settings that conflict with the Recovery Services Agent's required network access. If your agent can't reach Azure endpoints, check whether Windows Firewall Group Policy is overriding local rules. The agent needs outbound HTTPS access (port 443) to *.backup.windowsazure.com, *.blob.core.windows.net, and related Microsoft CDN endpoints. Run gpresult /h gpresult.html from an elevated prompt, open the resulting HTML file, and look for any firewall policies under the Computer Configuration section.
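A quick way to tell a firewall problem from everything else is to test outbound port 443 directly. In the sketch below, both host names are placeholders I made up to fit the wildcard patterns above; substitute the endpoints your own vault and storage account use.

```powershell
# Placeholder host names: replace these with endpoints your environment actually uses.
$endpoints = @(
    'example.backup.windowsazure.com',       # hypothetical, matches *.backup.windowsazure.com
    'examplestorage.blob.core.windows.net'   # hypothetical, matches *.blob.core.windows.net
)

foreach ($endpoint in $endpoints) {
    # Test-NetConnection attempts a TCP connection to the given port.
    $result = Test-NetConnection -ComputerName $endpoint -Port 443 -WarningAction SilentlyContinue
    [pscustomobject]@{
        Endpoint     = $endpoint
        TcpReachable = $result.TcpTestSucceeded
    }
}
```

A False in the TcpReachable column for a real endpoint points at the firewall, proxy, or DNS rather than at the agent itself.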

Event Viewer Analysis

The Windows Event Log surfaces errors that the agent console doesn't always show. Open Event Viewer (eventvwr.msc) and navigate to Applications and Services Logs > Microsoft Azure Backup. Filter for Error-level events in the time window around your failed job. Pay particular attention to Event ID entries related to certificate validation; expired management certificates are a common silent killer in long-running ASR deployments.
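The same filtering can be done from PowerShell with Get-WinEvent. Two things in this sketch are assumptions on my part: the log name in $logName (check it against the output of the first command, because it may differ on your server) and the failure timestamp, which is purely illustrative.

```powershell
# Discover which backup-related event logs exist on this server.
Get-WinEvent -ListLog '*Backup*' -ErrorAction SilentlyContinue |
    Select-Object LogName, RecordCount

# Assumption: replace with the log name reported by the command above.
$logName = 'CloudBackup'

# Hypothetical failure time; use the timestamp of your own failed job.
$failureTime = Get-Date '2026-04-20 23:00'

# Level 2 is Error. Look one hour either side of the failure.
Get-WinEvent -FilterHashtable @{
    LogName   = $logName
    Level     = 2
    StartTime = $failureTime.AddHours(-1)
    EndTime   = $failureTime.AddHours(1)
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message
```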

Proxy Configuration Issues

In environments where all outbound internet traffic routes through a proxy server, the agent needs to be explicitly configured to use that proxy. This is done via the Recovery Services Agent console under Change Properties > Proxy Configuration, or via PowerShell with Set-OBMachineSetting. If your proxy requires authentication, the agent's service account needs those credentials stored correctly. A proxy misconfiguration typically manifests as a timeout error rather than an authentication error, which throws people off.
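For the PowerShell route, here is a sketch of what the proxy call can look like. The proxy address, port, and account name are placeholders. The parameter names are written as I recall them from the cmdlet's help, so confirm them with Get-Help Set-OBMachineSetting -Full on your server before relying on this.

```powershell
# Prompt for the proxy password so it never appears in the script or in history.
$proxyPassword = Read-Host -Prompt 'Proxy password' -AsSecureString

# Placeholder values: proxy address, port, and account are hypothetical.
Set-OBMachineSetting -ProxyServer 'http://proxy.contoso.example' `
                     -ProxyPort 8080 `
                     -ProxyUsername 'CONTOSO\svc-backup' `
                     -ProxyPassword $proxyPassword
```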

When to Call Microsoft Support

Escalate to Microsoft Support if: your agent is fully up to date and you're still hitting 0x1FBD3; your import job name is confirmed identical and the seeding job is still frozen after rebuilding the policy; or you're seeing certificate-related errors in the event log that persist after certificate renewal. Microsoft's support team has backend telemetry access that can pinpoint issues you genuinely cannot diagnose from the client side. Don't spend more than two hours on an issue that backend telemetry would resolve in twenty minutes.

Prevention & Best Practices

Every ASR issue I've described in this guide is preventable. The fixes are real and they work, but the goal is to never need them in the first place. Here's how production teams who run ASR well actually operate it.

Keep a MARS Agent update schedule. The agent doesn't auto-update in most configurations. You need to proactively check the Microsoft Download Center or subscribe to the Azure Updates RSS feed for Recovery Services announcements. Quarterly reviews at minimum; monthly is better. One stale update cycle is all it takes to end up with a version mismatch that blocks backup jobs.

Document your import job names before you create them. Offline seeding failures almost always come from hasty naming. Create a naming convention for import jobs, write it down in your runbook before you touch the Azure portal or the agent console, and copy-paste the name, never retype it, from your documentation into both the local configuration and the Azure portal. That single habit eliminates the entire category of mismatch failures.

Run DR drills that include failover performance measurement. Bandwidth throttling problems and thread count inadequacies only become visible under load. Schedule a failover test at least twice a year, measure the actual time-to-recovery, and use that data to tune your thread counts and throttle settings before you need them in a real incident.

Monitor replication health proactively in the Azure portal. The Recovery Services vault blade includes a replication health view that shows RPO lag, replication state, and last successful sync time. Set an Azure Monitor alert on replication health status so you get notified before a problem becomes a crisis.

Quick Wins

Set a calendar reminder every 90 days to check the MARS Agent version against the current release on the Microsoft Download Center
Copy-paste import job names, never retype them, to eliminate mismatch errors at their source
Configure an Azure Monitor alert on your Recovery Services vault for replication health degradation so you know before your RPO breaches SLA
Test your throttle settings during a non-critical replication window and verify with Performance Monitor before relying on them in production

Frequently Asked Questions

Why does Azure Site Recovery keep saying "Waiting for Azure Import Job to complete" and never finish?

This happens when the import job name you entered in the MAB offline seeding configuration doesn't exactly match the job name you created in the Azure portal, and "exactly" means case-sensitive, character-for-character. You can confirm by opening the error log at C:\Program Files\Microsoft Azure Recovery Services Agent\Temp\CBEngineCurr.errlog and searching for GetImportJobStatus. The fix is to remove the backup policy using Stop protection and delete data, then create a fresh policy with a corrected, carefully matched import job name. There's no shortcut; the configuration has to be rebuilt.

What does error 0x1FBD3 mean in Azure Backup and how do I fix it?

Error 0x1FBD3 means your Microsoft Azure Recovery Services Agent version is out of date and no longer compatible with the current Azure Backup service backend. You'll see it on the Errors tab of the Job Details dialog in the agent console, in DPM, or in Azure Backup Server. The fix is to close the agent console and install the latest version from https://go.microsoft.com/fwlink/?linkid=225925. If your agent version is below 2.0.9083.0, expect to hit this error class and update immediately.

How do I speed up Azure Site Recovery replication without affecting my production network?

Two registry values control replication speed: UploadThreadsPerVM for data moving into Azure, and DownloadThreadsPerVM for failover recovery. Both live under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Replication. The default is 4 for both; setting them to 8 is a safe starting point that typically improves replication performance significantly. To protect production traffic, pair this with bandwidth throttling via the MMC snap-in or the Set-OBMachineSetting PowerShell cmdlet, which lets you define different bandwidth limits for work hours and non-work hours.

Can I configure Azure Site Recovery bandwidth throttling using PowerShell instead of the GUI?

Yes, and for enterprise deployments it's the better approach because it's scriptable and repeatable. The Set-OBMachineSetting cmdlet handles all throttle configuration. To set throttling for specific days and hours, pass the -WorkDay, -StartWorkHour, -EndWorkHour, -WorkHourBandwidth, and -NonWorkHourBandwidth parameters. Bandwidth values are expressed in bits per second, so multiply your target Kbps by 1024. To remove all throttling from a server, run Set-OBMachineSetting -NoThrottle. The valid bandwidth range is 512 Kbps to 1023 Mbps.

Does increasing UploadThreadsPerVM beyond 8 actually help, or does it cause problems?

It depends entirely on your host's available CPU and network capacity. The maximum supported value is 32, but that doesn't mean 32 is right for your environment. Going too high on a consolidated Hyper-V host will create CPU contention that actually slows replication down while also degrading the performance of other VMs on that host. Start at 8, monitor for 72 hours, and increase in increments of 4 while watching CPU utilization via Performance Monitor. If CPU hits sustained peaks above 70% during replication, back off. Let your actual hardware be the guide, not a target number.
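That tuning loop is simple enough to write down. The sketch below encodes the rule from the answer above: step up by 4 when CPU is comfortable, back off by 4 when it isn't, and stay within 1 to 32. The definition of "sustained" (more than half the samples above the limit) is my own assumption, and Get-NextThreadCount is a hypothetical helper.

```powershell
# Hypothetical helper: suggests the next thread count from a set of CPU readings.
function Get-NextThreadCount {
    param(
        [Parameter(Mandatory)][ValidateRange(1, 32)][int]$CurrentThreads,
        [Parameter(Mandatory)][double[]]$CpuSamples,   # % Processor Time readings
        [double]$CpuLimit = 70
    )
    # Assumption: "sustained" means more than half of the samples exceed the limit.
    $overLimit = @($CpuSamples | Where-Object { $_ -gt $CpuLimit }).Count
    $sustained = $overLimit -gt ($CpuSamples.Count / 2)

    if ($sustained) {
        return [math]::Max(1, $CurrentThreads - 4)    # back off, never below 1
    }
    return [math]::Min(32, $CurrentThreads + 4)       # step up, never above 32
}

# Example of collecting samples on Windows (commented out; takes 30 minutes to run):
# $cpu = (Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 60 -MaxSamples 30).CounterSamples.CookedValue
# Get-NextThreadCount -CurrentThreads 8 -CpuSamples $cpu
```

With readings of 40, 50, and 45 at 8 threads the helper suggests 12. With readings of 80, 85, 75, and 60 it suggests dropping to 4.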

I updated the Recovery Services Agent but the version mismatch error is still appearing. What now?

First, verify the update actually took. Open the agent console, go to Help > About Microsoft Azure Recovery Services Agent, and confirm the version number changed. If it did and you're still seeing 0x1FBD3, check whether your server's proxy or firewall is blocking the agent's connection to Azure endpoints; a blocked connection can mimic a version mismatch error. Also check whether DPM or Azure Backup Server on other machines in your environment is still running an old agent version, since that can produce the same error in the console. If all agents are current and the error persists, escalate to Microsoft Support. They have backend telemetry that can identify service-side issues you can't see from the client.


Sai Kiran Pandrala

Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.