Azure Automation Not Working, Diagnosed and Fixed (2026 Guide)
Why Azure Automation Stops Working
You set up an Azure Automation runbook, it worked fine for weeks, and then one Monday morning, nothing. Jobs are suspended, runbooks aren't triggering, or the Hybrid Runbook Worker is sitting there completely unresponsive. I've seen this exact scenario on dozens of enterprise accounts, and the frustration is real. Your automation pipelines are blocked, your scheduled tasks are silently failing, and Azure's own error messages give you something vague like "The subscription cannot be found" or "Job action 'Activate' cannot be run", which tells you almost nothing useful.
Here's the thing: Azure Automation not working is almost never one single problem. It's usually one of four root causes stacking up, and the symptoms overlap enough that it's easy to chase the wrong fix first.
The four most common culprits:
- Hybrid Runbook Worker connectivity failures. The worker can't reach Azure endpoints, often due to firewall rule changes, expired certificates, or, especially since August 2024, organizations still running the retired agent-based worker instead of the extension-based model.
- Azure sandbox resource limits being exceeded. Azure sandboxes cap runbook jobs at 400 MB of memory and 1,000 concurrent network sockets. Hit either ceiling and your job gets suspended without a clear explanation. This catches people off guard when workloads gradually grow over time.
- Authentication and identity misconfiguration. Since Microsoft pushed hard on managed identities and away from Run As accounts, a lot of runbooks that relied on older auth methods just stop working. Multi-factor authentication enforcement policies can also silently break non-interactive runbook logins.
- Module incompatibility. Mixing Az and AzureRM modules in the same runbook, or having stale module versions cached in your Automation account, causes "Command not found" and "Cannot bind parameter" errors that are genuinely confusing if you don't know what to look for.
The good news? Every one of these is fixable. This guide walks through each scenario with exact commands, specific portal navigation steps, and the diagnostic tools Microsoft actually gives you but buries in the documentation. Let's get your Azure Automation runbooks working again.
The Quick Fix, Try This First
Before you go deep on advanced diagnostics, run this checklist. In my experience, about 60% of cases where Azure Automation stops working are resolved by one of these three actions in under ten minutes.
Step 1: Run the Test Cloud Connectivity tool. This is Microsoft's own built-in diagnostic specifically for Hybrid Runbook Worker connectivity issues, and most people don't know it exists. Go to your Automation account in the Azure portal, navigate to Hybrid worker groups under Process Automation in the left menu, select your worker group, and look for the Test Cloud Connectivity option. It validates that your worker can actually talk to Azure endpoints. If it fails here, you have a network or firewall problem, not an application problem, and you can stop chasing the wrong thing.
Step 2: Check whether your runbook is in Published state. This sounds obvious, but I've seen it waste hours of troubleshooting. You cannot start or schedule a runbook that is in Draft state. In the Azure portal, go to your Automation account → Runbooks under Process Automation → click the runbook → check the status at the top of the Overview pane. If it says Draft, click Publish. That's it.
Step 3: Verify your managed identity has the right permissions. If you see errors like "No permission", "The subscription cannot be found", or authentication failures, your runbook is almost certainly not using a managed identity, or the identity doesn't have the correct role assignment. Go to your Automation account → Identity under Account Settings → confirm the system-assigned managed identity is On. Then go to Azure role assignments and verify it has at least Contributor access on the target subscription or resource group.
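If you'd rather check the Step 2 publish state from PowerShell than click through every runbook, here is a minimal sketch. It assumes the Az.Automation module and an already authenticated session. The function name Get-UnpublishedRunbook and the 'rg-automation' / 'aa-prod' names are placeholders I made up, and my understanding is that the cmdlet reports State as 'New', 'Edit', or 'Published' rather than the portal's "Draft" wording.

```powershell
# Sketch: filter a list of runbook objects down to the ones that are not published.
# Get-UnpublishedRunbook is a hypothetical helper, not part of Az.Automation.
function Get-UnpublishedRunbook {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Runbook
    )
    # Assumption: State is 'New', 'Edit', or 'Published'; only 'Published'
    # runbooks can be started or scheduled.
    $Runbook |
        Where-Object { $_.State -ne 'Published' } |
        Select-Object Name, State
}

# Usage against a real account (placeholder names):
# $all = Get-AzAutomationRunbook -ResourceGroupName 'rg-automation' -AutomationAccountName 'aa-prod'
# Get-UnpublishedRunbook -Runbook $all
# Publish-AzAutomationRunbook -ResourceGroupName 'rg-automation' -AutomationAccountName 'aa-prod' -Name 'MyRunbook'
```

Keeping the filter separate from the Azure call means you can try it on a saved list of runbook objects without touching the account.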
If none of these three quick checks resolves the problem, continue to the step-by-step section below for deeper diagnostics.
If your runbook jobs are either not starting at all or your Hybrid Runbook Worker shows as unresponsive in the portal, start with connectivity. Connectivity problems are the single most common cause of Azure Automation runbook execution failures on hybrid workers, and they're entirely separate from your runbook code logic.
Open your Automation account in the Azure portal. In the left navigation under Process Automation, click Hybrid worker groups. Select the affected worker group, then click on the individual hybrid worker. From there, use the Hybrid Runbook Worker diagnostics option to run a connectivity check. This tool validates outbound network access to Azure Automation service endpoints, DNS resolution, and certificate validity.
While you're there, look at the Last seen timestamp for the worker. If it hasn't sent a heartbeat recently, the worker itself is down or disconnected, and the runbook code is irrelevant until the worker is back online. Check that the machine is powered on, has internet access, and that the Hybrid Worker service is actually running.
For Windows hybrid workers, open Event Viewer on the worker machine and check two places under Applications and Services Logs: Microsoft-SMA → Operational, and the Operations Manager log. Event ID 4502 in the Operations Manager log specifically indicates connectivity problems with the Azure Automation service. That event ID is your smoking gun for network-layer issues.
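The same Event ID 4502 lookup can be done from a PowerShell prompt on the worker. This is a minimal sketch: Get-WinEvent is the standard Windows cmdlet, the log name 'Operations Manager' comes from the paragraph above, and Get-HybridWorkerConnectivityEvent is a helper name I invented for illustration.

```powershell
# Sketch: pull recent Event ID 4502 entries from the Operations Manager log.
# Windows only. Get-HybridWorkerConnectivityEvent is a hypothetical helper name.
function Get-HybridWorkerConnectivityEvent {
    param(
        [int] $MaxEvents = 20
    )
    # SilentlyContinue keeps the call quiet when no matching events exist
    Get-WinEvent -FilterHashtable @{ LogName = 'Operations Manager'; Id = 4502 } `
                 -MaxEvents $MaxEvents -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}

# Usage on the worker machine:
# Get-HybridWorkerConnectivityEvent -MaxEvents 10 | Format-List
```

An empty result just means no 4502 events were logged, which points away from a network-layer cause.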
Also confirm you're running an extension-based hybrid worker, not the agent-based model. The agent-based user hybrid runbook worker was retired on August 31, 2024. If your organization hasn't migrated yet, that is why things stopped working. There's a migration guide in the official docs that walks through the upgrade process to extension-based hybrid workers.
Once connectivity is confirmed clean, move to Step 2. If connectivity is broken, resolve the network or firewall issue before continuing, because nothing else will work until the worker can reach Azure.
Runbook jobs in Azure Automation get suspended after three consecutive failed start attempts. If you're seeing jobs stuck in a Suspended state and you're running in an Azure sandbox (rather than on a Hybrid Worker), there's a good chance your runbook is hitting a resource ceiling rather than throwing an application error.
Azure sandboxes enforce hard limits: 400 MB of memory per job and 1,000 concurrent network sockets. These limits aren't negotiable in the sandbox environment. If your runbook processes large datasets, imports heavy modules, or opens many parallel connections, these limits get hit silently and the job gets killed without an obvious error message.
To address memory overuse, add explicit cleanup inside your runbook. Clear variables you no longer need:
# Clear a large variable when done with it
$largeDataset.Clear()
# Force immediate garbage collection
[GC]::Collect()
Beyond cleanup, consider these structural changes. Split large runbooks into smaller child runbooks that each handle a portion of the workload. Reduce the volume of data you pull into memory at once: for example, process records in batches of 500 rather than loading 50,000 rows in one go. Also watch your use of Write-Output inside loops, because excessive output generation consumes memory faster than people expect.
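To make the batching idea concrete, here is a minimal sketch of a paged loop that only ever holds one batch in memory. Invoke-InBatch and its -FetchPage / -ProcessPage script blocks are names I made up for illustration. In a real runbook, the fetch block would call whatever paged query your data source supports.

```powershell
# Sketch: process a large workload one page at a time instead of loading it all.
# Invoke-InBatch is a hypothetical helper; FetchPage receives (skip, take).
function Invoke-InBatch {
    param(
        [Parameter(Mandatory)] [int] $TotalCount,
        [Parameter(Mandatory)] [scriptblock] $FetchPage,
        [Parameter(Mandatory)] [scriptblock] $ProcessPage,
        [ValidateRange(1, 100000)] [int] $BatchSize = 500
    )
    $processed = 0
    for ($skip = 0; $skip -lt $TotalCount; $skip += $BatchSize) {
        $take = [Math]::Min($BatchSize, $TotalCount - $skip)
        # @() guarantees an array even when the page has a single item
        $page = @(& $FetchPage $skip $take)
        # Discard per-page output so it does not pile up in the job stream
        & $ProcessPage $page | Out-Null
        $processed += $page.Count
        $page = $null            # release the batch before fetching the next one
    }
    [GC]::Collect()
    $processed                   # total number of records handled
}

# Usage with a stand-in data source (numbers instead of real rows):
$total = Invoke-InBatch -TotalCount 1200 -BatchSize 500 `
    -FetchPage   { param($Skip, $Take) ($Skip + 1)..($Skip + $Take) } `
    -ProcessPage { param($Page) $null = $Page.Count }
Write-Output "Processed $total records"
```

The batch size is a knob, not a rule. 500 is just the figure from the paragraph above, and the right number depends on how heavy each record is.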
If the workload genuinely can't be trimmed below the sandbox limits, move the runbook execution to a Hybrid Runbook Worker. Hybrid workers don't have the same 400 MB sandbox memory constraint, which makes them the right choice for memory-intensive automation tasks.
After making changes, go to the failed job in the portal, review the Output and Errors streams, and confirm the job completes without hitting the limit. A successful run will show a Completed status in the job view under your runbook's Jobs tab.
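If you have many jobs to review, the same check can be scripted. This is a minimal sketch, assuming the Az.Automation module and job objects shaped like the output of Get-AzAutomationJob (JobId, RunbookName, Status, EndTime). Select-ProblemJob and the resource names are placeholders of mine.

```powershell
# Sketch: pick out failed or suspended jobs, newest first.
# Select-ProblemJob is a hypothetical helper, not part of Az.Automation.
function Select-ProblemJob {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Job
    )
    $Job |
        Where-Object { $_.Status -in 'Failed', 'Suspended' } |
        Sort-Object EndTime -Descending |
        Select-Object JobId, RunbookName, Status, EndTime
}

# Usage against a real account (placeholder names):
# $jobs = Get-AzAutomationJob -ResourceGroupName 'rg-automation' -AutomationAccountName 'aa-prod'
# foreach ($j in Select-ProblemJob -Job $jobs) {
#     Get-AzAutomationJobOutput -ResourceGroupName 'rg-automation' `
#         -AutomationAccountName 'aa-prod' -Id $j.JobId -Stream Error
# }
```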
This is the category of Azure Automation failure that causes the most confusion right now. Microsoft has been pushing the transition away from Run As accounts toward managed identities, and a lot of runbooks written years ago are now silently broken by those auth changes.
The clearest symptom is the error "The subscription cannot be found". Despite how it reads, this almost never means your subscription is actually missing. It means your runbook is not authenticated with a valid managed identity, or the identity doesn't have visibility to the subscription. Follow these steps:
In the Azure portal, go to your Automation account → Identity (under Account Settings) → System assigned tab. Confirm the status toggle is set to On. If it's off, turn it on and save. Then click Azure role assignments and verify there is a role assignment granting the managed identity access to the relevant subscription: at minimum, Contributor or a custom role with the permissions your runbook needs.
Inside your runbook script itself, authenticate explicitly with the managed identity:
Connect-AzAccount -Identity
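In practice I wrap that one-liner so a failed sign-in stops the runbook immediately instead of surfacing later as "The subscription cannot be found". This is a minimal sketch, assuming the Az.Accounts module is available in the Automation account. Connect-RunbookIdentity is a name I made up, and the subscription ID is whatever you pass in.

```powershell
# Sketch: sign in with the system-assigned managed identity and pin the subscription.
# Connect-RunbookIdentity is a hypothetical wrapper around real Az.Accounts cmdlets.
function Connect-RunbookIdentity {
    param(
        [string] $SubscriptionId   # optional; omit to keep the default subscription
    )
    # Make sure the runbook does not inherit a previously saved context
    Disable-AzContextAutosave -Scope Process | Out-Null
    try {
        $context = (Connect-AzAccount -Identity -ErrorAction Stop).Context
    }
    catch {
        throw "Managed identity sign-in failed: $($_.Exception.Message)"
    }
    if ($SubscriptionId) {
        $context = Set-AzContext -Subscription $SubscriptionId `
                                 -DefaultProfile $context -ErrorAction Stop
    }
    $context
}

# Usage at the top of a runbook (placeholder subscription ID):
# $ctx = Connect-RunbookIdentity -SubscriptionId '00000000-0000-0000-0000-000000000000'
# To list what the identity can reach, use the Object (principal) ID from the Identity blade:
# Get-AzRoleAssignment -ObjectId '<principal-id>' | Select-Object RoleDefinitionName, Scope
```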
If you're still seeing authentication failures and your organization enforces multi-factor authentication, look for the error "Strong authentication enrollment is required". Azure Automation runbooks running in a sandbox cannot interactively respond to MFA prompts. The fix here is to use managed identity authentication exclusively. The managed identity bypasses interactive MFA requirements because it authenticates at the platform level, not through a user credential flow.
One more thing: if your runbook calls webhooks and jobs suddenly stopped working, check whether your webhooks have expired. Webhook expiry is a common silent killer of previously working automation. In the portal, navigate to your runbook → Webhooks → check the expiration dates. Recreate any expired webhooks and update whatever downstream service was calling them.
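Checking expiry dates by hand gets old fast, so here is a minimal sketch that flags webhooks that have expired or will expire soon. It assumes webhook objects shaped like the output of Get-AzAutomationWebhook (Name, RunbookName, ExpiryTime). Get-ExpiringWebhook, the 30-day window, and the resource names are my own placeholders.

```powershell
# Sketch: report webhooks that are expired or expire within a given window.
# Get-ExpiringWebhook is a hypothetical helper, not part of Az.Automation.
function Get-ExpiringWebhook {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Webhook,
        [int] $WithinDays = 30,
        [datetime] $Now = (Get-Date)
    )
    $cutoff = $Now.AddDays($WithinDays)
    $Webhook |
        Where-Object { $_.ExpiryTime -le $cutoff } |
        Sort-Object ExpiryTime |
        ForEach-Object {
            [pscustomobject]@{
                Name        = $_.Name
                RunbookName = $_.RunbookName
                ExpiryTime  = $_.ExpiryTime
                Expired     = ($_.ExpiryTime -lt $Now)   # already past its expiry
            }
        }
}

# Usage against a real account (placeholder names):
# $hooks = Get-AzAutomationWebhook -ResourceGroupName 'rg-automation' -AutomationAccountName 'aa-prod'
# Get-ExpiringWebhook -Webhook $hooks -WithinDays 30
```

Remember that a webhook URL is only shown once at creation, so "recreate" really does mean issuing a new URL to the caller.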
Module problems generate some of the most confusing Azure Automation error messages. Errors like "Command not found", "Cannot bind parameter", or runbooks that work locally but fail in Azure are almost always a module version mismatch between your local PowerShell environment and what's installed in your Automation account.
First and most importantly: do not mix Az and AzureRM modules in the same runbook. This is not supported in Azure Automation. If your runbook imports both, it will fail. Pick one; the official recommendation is to use only Az modules going forward. Search your runbook code for any Import-Module AzureRM calls and remove them if you're already using Az cmdlets.
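That search is easy to script. This is a minimal sketch that scans runbook source text for AzureRM imports and AzureRm-style cmdlet names. Find-AzureRmUsage is a helper name I invented, and the pattern is a rough heuristic: it will also flag matches inside comments.

```powershell
# Sketch: find lines in a script that still reference AzureRM.
# Find-AzureRmUsage is a hypothetical helper; the regex is a heuristic.
function Find-AzureRmUsage {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string] $ScriptText
    )
    # Matches "Import-Module AzureRM..." and Verb-AzureRmNoun cmdlet names
    $pattern = '\b(Import-Module\s+AzureRM\S*|[A-Za-z]+-AzureRm[A-Za-z]+)'
    $lineNumber = 0
    foreach ($line in ($ScriptText -split "`r?`n")) {
        $lineNumber++
        if ($line -match $pattern) {
            [pscustomobject]@{ Line = $lineNumber; Text = $line.Trim() }
        }
    }
}

# Usage on an exported runbook file (placeholder path):
# Find-AzureRmUsage -ScriptText (Get-Content -Path '.\MyRunbook.ps1' -Raw)
```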
To update your Azure PowerShell modules in your Automation account, go to your Automation account → Modules under Shared Resources in the left panel → click Browse gallery to find and import updated versions, or click Update Azure Modules if the option is available. Module updates can take several minutes to complete.
After updating, check the module status. Each module should show as Available. A module stuck in Importing state for more than 15 minutes usually indicates a dependency chain issue. In that case, check which modules yours depends on and import those first.
If your runbook uses Python packages and you're on Linux hybrid workers, Python package issues can also cause runbook execution failures. Verify the Python packages your runbook depends on are installed on the hybrid worker machine itself, not just listed in requirements. They need to actually be importable in the Python environment the worker uses.
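The module status check can also be done from PowerShell. This is a minimal sketch, assuming the Az.Automation module. My understanding is that Get-AzAutomationModule reports a ProvisioningState of 'Succeeded' for modules the portal shows as Available, so treat that value as an assumption to verify in your account. Get-UnhealthyModule and the resource names are placeholders.

```powershell
# Sketch: list modules whose import has not finished successfully.
# Get-UnhealthyModule is a hypothetical helper, not part of Az.Automation.
function Get-UnhealthyModule {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Module
    )
    # Assumption: 'Succeeded' is the healthy state; anything else needs a look
    $Module |
        Where-Object { $_.ProvisioningState -ne 'Succeeded' } |
        Select-Object Name, Version, ProvisioningState
}

# Usage against a real account (placeholder names):
# $mods = Get-AzAutomationModule -ResourceGroupName 'rg-automation' -AutomationAccountName 'aa-prod'
# Get-UnhealthyModule -Module $mods
```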
A clean test after module updates: open your runbook in the portal → click Test pane → run with your test parameters. The test pane shows real-time output and lets you catch module errors before the runbook goes back into a scheduled production slot.
This one is specific to Linux hybrid workers and it's worth calling out separately because the fix is non-obvious and the symptom, a job showing Running indefinitely with no progress and no error, is deeply unhelpful on its own.
The root cause is a CPU quota restriction baked into the hwd.service systemd unit file. By default, the Hybrid Worker daemon is capped at CPUQuota=25%. On busy machines or for CPU-intensive runbooks, that ceiling causes the job to crawl to a halt while appearing to still be running.
Here's exactly how to fix it. SSH into your Linux hybrid worker machine and run:
sudo su
systemctl status hwd.service
Confirm the service is running, then open the unit file:
vi /lib/systemd/system/hwd.service
Find the line that reads CPUQuota=25% and change it to just CPUQuota= with nothing after the equals sign. Leaving it empty removes the restriction entirely and lets the worker use however much CPU the job actually needs. Save the file and exit vi (:wq), then reload the daemon and restart the service:
systemctl daemon-reload
systemctl restart hwd.service
After the restart, re-trigger your runbook job. You should see it progress normally instead of sitting frozen in Running state. If you also see unexpected password prompts when your runbook uses sudo commands on Linux, that's a separate issue: your sudoers configuration needs to allow passwordless sudo for the account the hybrid worker service runs under. Edit /etc/sudoers with visudo to add the appropriate NOPASSWD entry for the automation service account.
One more Linux-specific error to watch for: if your log file contains "The specified class does not exist", that's a class registration problem in the Python environment. The fix is documented under the "Class does not exist error" section of Microsoft's Linux Hybrid Runbook Worker troubleshooting guide.
Advanced Troubleshooting for Azure Automation Not Working
If the five steps above haven't resolved your Azure Automation issues, you're into more complex territory, usually involving enterprise environments, domain-joined machines, certificate problems, or edge cases in job execution behavior. Here's what to check.
Certificate errors on Hybrid Workers. The error "No certificate was found in the certificate store" appears when the machine registration certificate used by the hybrid worker is missing, expired, or in the wrong certificate store. This isn't something you can just regenerate manually without going through the proper re-registration process. Microsoft's hybrid worker troubleshooter has a dedicated "No Certificate Found" section. Run the offline version of the agent registration script on the affected machine first, as it checks prerequisites and often identifies the exact certificate state issue.
Machine already registered error. If you see "Machine is already registered" when trying to add a hybrid worker, the machine's previous registration wasn't properly cleaned up before the re-add attempt. The troubleshooting guide for "Unable to add a hybrid runbook worker" covers the cleanup steps. This typically involves removing the old registration entries and then re-running the registration.
Runbook execution behaves differently on hybrid worker vs. Azure sandbox. I've seen this confuse people: the same runbook works fine when tested in the Azure sandbox but acts differently on the hybrid worker. This is almost always an authentication difference. Azure sandbox runbooks have access to certain implicit authentication contexts that hybrid workers don't. On a hybrid worker, authentication is handled at the machine level. Review the Runbook permissions documentation specifically for hybrid scenarios, because the auth model is genuinely different between the two execution environments.
Event 4502 deep dive. If you found Event ID 4502 in the Operations Manager log on a Windows hybrid worker, open Event Viewer and look at the full event details, specifically the Description field. It will tell you whether the issue is certificate-related, endpoint-reachability-related, or a proxy configuration problem. For machines behind a proxy, you need to configure proxy settings for the Hybrid Worker service explicitly, because it doesn't pick up system proxy settings automatically in all configurations.
Runbooks that can't connect to Microsoft 365. Runbooks using Connect-MsolService and hitting a sandbox will fail to reach Microsoft 365 services due to sandbox network restrictions. The documented solution is to move this execution to a Hybrid Runbook Worker where outbound connectivity to Microsoft 365 endpoints is possible. This applies to any runbook that needs to connect to external SaaS services that aren't accessible from within Azure sandbox restrictions.
Recovering accidentally deleted runbooks. Azure Automation allows recovery of runbooks deleted within the past 29 days. If someone deleted a runbook that was central to your automation pipeline, it's not necessarily gone forever. You can restore it by running a PowerShell script as a job within your Automation account. The exact script and process are documented in Microsoft's "Restore a deleted runbook" guide.
If you've worked through all the steps in this guide and Azure Automation is still not working, especially if jobs are consistently failing at the infrastructure level rather than the runbook code level, or if you're seeing platform-level errors that don't match any documented error code, it's time to open a support ticket. Specifically, escalate if: the Test Cloud Connectivity tool passes but jobs still won't start; you've confirmed managed identity permissions are correct but still get subscription-not-found errors; or your Hybrid Worker shows as connected but never picks up jobs. Document the job IDs, the exact error text from all output streams, and the timestamp of the failures before you call. It speeds things up significantly. You can reach Microsoft Support directly, and for Azure-specific issues, a support ticket through the Azure portal itself routes faster than the general support site.
Prevention & Best Practices for Azure Automation
Once you've got everything running again, the goal is to make sure you're not back here in two weeks diagnosing the same Azure Automation runbook execution failures. These are the practices that actually make a difference based on what goes wrong most often in production environments.
Migrate to extension-based Hybrid Runbook Workers if you haven't already. The agent-based model was retired August 31, 2024. If you're still on it, you're running unsupported infrastructure. The extension-based model is more stable, gets security updates, and is what Microsoft's entire future roadmap is built around. Migration is documented and it's not a rip-and-replace: you can migrate worker by worker.
Use managed identities everywhere, for everything. Run As accounts were retired on September 30, 2023. Any runbook still using service principal credentials stored as variables is living on borrowed time. Managed identity authentication is more secure, doesn't have expiry dates to track, and eliminates an entire class of auth-failure scenarios. Make the switch proactively, not after an incident.
Set up output stream logging on all production runbooks. Runbooks that fail silently are the worst kind to debug. Add Write-Verbose statements at key checkpoints in your runbooks and enable verbose logging in your Automation account's diagnostic settings. Route these logs to a Log Analytics workspace so you have a queryable history of what happened and when, not just a pass/fail status.
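Here is a minimal sketch of what "Write-Verbose at key checkpoints" can look like in a runbook function. Invoke-CleanupTask and its parameter are hypothetical stand-ins for your own logic. Note that, as far as I know, Azure Automation only records the verbose stream when verbose logging is switched on for the runbook itself.

```powershell
# Sketch: a runbook-style function with verbose checkpoints.
# Invoke-CleanupTask is a hypothetical example; the real work is left as a comment.
function Invoke-CleanupTask {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string[]] $ResourceName
    )
    Write-Verbose "Starting cleanup for $($ResourceName.Count) resource(s)"
    $done = 0
    foreach ($name in $ResourceName) {
        Write-Verbose "Processing $name"
        # ... the actual per-resource work would go here ...
        $done++
    }
    Write-Verbose "Cleanup finished: $done resource(s) handled"
    $done   # return a count rather than chatty Write-Output inside the loop
}

# Usage: -Verbose surfaces the checkpoints when testing locally
# Invoke-CleanupTask -ResourceName 'vm-01', 'vm-02' -Verbose
```

Returning a single summary value and keeping the detail in the verbose stream keeps the Output stream small, which also helps with the sandbox memory ceiling discussed earlier.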
Monitor job status with alerts. In the Azure portal, go to your Automation account → Alerts under Monitoring → create an alert rule triggered by job status conditions. Set an alert for any job transitioning to Failed or Suspended state. Getting notified within minutes of a failure is dramatically better than discovering automation has been broken for three days when someone notices a task didn't run.
Test every runbook in the Test pane before publishing. The test pane runs the runbook in an interactive session where you can see output in real-time. It's much faster than publish-and-test-cycle for catching module issues and auth problems before they hit your production schedules.
- Set webhook expiration reminders in your calendar. Webhooks expire silently and kill automation with no error until something calls them.
- Keep a single module version list for your Automation account and review it quarterly. Stale modules are a slow-burn reliability problem.
- Never mix Az and AzureRM modules in the same runbook. Use only Az in new and migrated runbooks.
- For memory-heavy runbooks, call [GC]::Collect() and .Clear() on large objects proactively, before hitting the 400 MB sandbox limit.
Frequently Asked Questions
Why does my Azure Automation runbook say "The subscription cannot be found" even though the subscription exists?
This error almost always means your runbook is not authenticated with a managed identity that has visibility to the subscription. It's an auth problem, not an actual missing subscription. Go to your Automation account → Identity → confirm the system-assigned managed identity is enabled. Then check Azure role assignments and make sure the identity has Contributor or an appropriate role on the target subscription. Inside your runbook, make sure you're calling Connect-AzAccount -Identity before any resource operations. If you recently migrated from a Run As account to managed identity, double-check that the old credential references in the runbook code were fully replaced.
My runbook jobs keep getting suspended after three attempts. How do I stop this from happening?
Jobs get suspended after three failed start attempts, so the fix is in preventing the underlying failure, not in the retry count itself. The most common causes are exceeding the Azure sandbox memory limit of 400 MB, hitting the 1,000 concurrent network socket ceiling, or a module incompatibility causing an immediate crash on startup. Check the job's Errors output stream first, because that's where the actual failure reason hides. If it's a resource limit issue, move the runbook to a Hybrid Runbook Worker, which doesn't have the same sandbox constraints. If it's a module issue, update your Azure PowerShell modules in the Automation account's Modules section.
Can I use both Az and AzureRM modules in the same runbook?
No, this combination is explicitly not supported in Azure Automation. Running both in the same runbook will cause failures. The right move is to standardize on Az modules only, which is the current and supported path. Go through your runbook code, find any Import-Module AzureRM statements or AzureRM cmdlets like Get-AzureRmVM, and replace them with their Az equivalents (e.g., Get-AzVM). The Az module command names are different from AzureRM, so it's not always a straight find-and-replace. You may need to consult the Az module documentation for equivalent cmdlet names.
My Linux Hybrid Runbook Worker job shows "Running" but never finishes. What's happening?
This is almost certainly the CPUQuota=25% restriction in the hwd.service systemd unit file throttling the worker's CPU so aggressively that the job stalls out. SSH into the Linux hybrid worker, open /lib/systemd/system/hwd.service with vi or nano, and change the line CPUQuota=25% to CPUQuota= (blank after the equals sign, which removes the cap). Then run systemctl daemon-reload followed by systemctl restart hwd.service. After the service restarts, re-queue your runbook job and it should progress normally.
I accidentally deleted an Azure Automation runbook. Is it gone forever?
Not necessarily. Azure Automation can recover runbooks that were deleted within the past 29 days. You restore it by running a PowerShell script as a job directly in your Automation account. Microsoft's "Restore a deleted runbook" documentation has the exact script to use. Outside of that 29-day window, there's no platform-level recovery option, which is why exporting runbooks to a Git repository as part of your normal workflow is such a valuable habit. Source control gives you recovery options independent of the Azure platform's retention window.
What's the difference between running a runbook in Azure sandbox vs. on a Hybrid Runbook Worker, and when should I use each?
The Azure sandbox is Microsoft's managed execution environment: there is no infrastructure to maintain, but it comes with the 400 MB memory limit, the 1,000 socket limit, and restricted network access (it can't reach on-premises resources or many SaaS endpoints). A Hybrid Runbook Worker is a machine you manage that runs the runbook locally. It can access on-premises network resources, has no sandbox resource limits, and can connect to services the sandbox can't reach. Use the sandbox for lightweight, cloud-native operations. Use a Hybrid Worker for anything that needs to touch on-premises systems, process large data, open many parallel connections, or connect to external services like Microsoft 365 via Connect-MsolService.