If you've ever stared at a red error badge in your Azure DevOps pipeline or watched your App Service deployment spinner freeze at 73% before spitting out a cryptic error code, you're in good company. Azure deployments fail for dozens of reasons: misconfigured service principals, wrong runtime stacks, ZIP deploy timeouts, slot-swap snags, and everything in between. The good news is that almost every Azure deploy failure follows a pattern, and once you know the pattern, the fix is usually straightforward. This guide walks you through diagnosing and resolving the most common Azure deployment failures step by step, from quick wins to deep troubleshooting.
Why Azure Deployments Fail: The Big Picture
Before you start poking around log files, it helps to understand the three layers where Azure deployments can break down. Think of it like a three-story building: if the foundation is cracked, it doesn't matter how nice the top floor looks.
Layer 1: Authentication and Authorization. Your deployment pipeline needs a verified identity with the right permissions to push artifacts, write to storage, update App Service configurations, and trigger slot swaps. If the service principal is expired, the role assignment is missing, or the secret has rotated without being updated in your pipeline variables, the whole process stops before a single file moves.
Layer 2: Build and Artifact Packaging. Even if your credentials are perfect, a broken build, a missing environment variable, or a malformed deployment package will cause a failure at the artifact stage. This layer is often where developers look last, because the CI build itself may appear green while the artifact it produces is incomplete or misconfigured.
Layer 3: Runtime and Infrastructure. Sometimes the package deploys successfully but the application crashes on startup because the App Service plan is the wrong tier, the runtime version doesn't match, a connection string is wrong, or a required IP address is blocked by a network security group. These failures are the trickiest because the deployment pipeline reports success while your users see a 503.
Most Azure deploy problems live in Layer 1 or Layer 2. We'll tackle them in that order, then cover the sneakier Layer 3 scenarios in the advanced section.
Before You Do Anything: Collect the Evidence
Jumping straight into fixes without reading the error is the number one mistake people make. Azure gives you a lot of diagnostic data; you just have to know where to look.
In Azure DevOps, open the failed pipeline run, click the failing task, and expand the full log. Don't just read the last line. Scroll up until you find the first ERROR or FAILED entry; that's usually the root cause. Everything after it is often just cascading noise from the first failure.
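If you've downloaded a long log, the "find the first failure" habit is easy to script. Here's a minimal Python sketch, assuming the log is plain text and that failures are flagged with the words ERROR or FAILED. The function name and the sample log are hypothetical and only for illustration.

```python
import re
from typing import Optional, Tuple

# Assumption: failures are flagged with the words ERROR or FAILED (any case).
FAILURE_PATTERN = re.compile(r"\b(ERROR|FAILED)\b", re.IGNORECASE)


def first_failure(log_text: str) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, line) of the first failure entry, or None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if FAILURE_PATTERN.search(line):
            return number, line.strip()
    return None


# Hypothetical log excerpt, for illustration only.
sample_log = """\
Starting deployment
Uploading package
ERROR: authentication failed for service connection
ERROR: task exited with code 1
"""

print(first_failure(sample_log))
```

The second ERROR line is ignored on purpose: it's the cascading noise, not the cause.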
Go to your App Service in the Azure portal, navigate to Deployment Center, and click Logs. Each deployment attempt has its own log entry. Click the timestamp of the failed deployment and read the raw output. For ZIP deploy failures especially, this log often contains the real error message that the pipeline UI abstracts away.
If the deployment itself succeeded but the app isn't starting, go to App Service → Monitoring → App Service Logs. Enable Application Logging (Filesystem) at the Verbose level. Then pull the log stream (Log Stream in the left menu) and restart the app. Watch the output in real time. This gives you the startup exception stack trace that tells you exactly what's wrong.
The Kudu console (https://<your-app-name>.scm.azurewebsites.net) is your best friend for live debugging. From there you can browse the file system, run commands in a bash or PowerShell shell, and check the LogFiles/ directory for deployment and runtime logs, all without needing a VPN or SSH tunnel.
Step-by-Step Fix: The Most Common Azure Deploy Failures
An authorization error at this stage means your service principal exists and authenticated successfully, but it doesn't have the right Azure role to perform the deployment action. Here's how to fix it:
- In the Azure portal, navigate to the target resource (App Service, Resource Group, or Subscription) and click Access Control (IAM).
- Click Add → Add role assignment.
- Choose the Contributor role (or a more scoped custom role if you follow least-privilege). For App Service deployments, Website Contributor is usually sufficient.
- Assign the role to your service principal. Search by the service principal's display name or application ID.
- Click Save and re-run the pipeline.
If you're using a federated credential (OIDC) instead of a client secret, also verify that the federated credential's Subject field exactly matches the pipeline's subject claim. For Azure DevOps, this is typically sc://<org>/<project>/<service-connection-name>.
Service principal client secrets expire. If your pipeline was working and suddenly fails with an authentication error, the secret has almost certainly rotated or expired.
- In the Azure portal, go to Microsoft Entra ID (formerly Azure Active Directory) → App registrations and find the app registration behind your service principal.
- Click Certificates & secrets. Check the expiration date on your client secret. If it's expired, click New client secret, set an appropriate expiry (12 or 24 months), and copy the value immediately, because you won't see it again.
- In Azure DevOps, go to Project Settings → Service connections, find the affected service connection, click Edit, and paste the new secret value.
- Click Verify and save. Run a test pipeline to confirm authentication works.
A ZIP deploy that hangs or fails partway through usually means the App Service's SCM site is locked or another deployment is in progress. The file system locks prevent the new package from being extracted.
- In the Azure portal, go to your App Service and click Deployment Center → Settings. If you see an active deployment in progress, wait for it to complete or cancel it.
- If deployments keep hanging, go to Configuration → Application Settings and add or verify the setting WEBSITE_RUN_FROM_PACKAGE = 1. This tells Azure to mount your ZIP as a read-only package rather than extracting it to the wwwroot folder, which avoids file lock conflicts entirely.
- If WEBSITE_RUN_FROM_PACKAGE is already set to 1 and you're still hitting issues, the problem may be that the ZIP is being uploaded to blob storage and the SAS URL has expired. Re-run the pipeline to generate a fresh SAS URL.
Your pipeline task can't find the App Service. This usually means the name, resource group, or subscription in your pipeline YAML doesn't match what's actually in Azure.
- In your pipeline YAML, find the AzureWebApp or AzureRmWebAppDeployment task and check the appName, resourceGroupName, and azureSubscription parameters.
- In the Azure portal, navigate to the App Service and copy the exact name from the Overview tab. Names are case-insensitive in most contexts, but trailing spaces or typos are common culprits.
- Make sure your service connection is connected to the correct subscription. Go to Project Settings → Service connections and verify the subscription ID shown matches your target.
- If you're deploying to a deployment slot, make sure the slotName parameter is set correctly (e.g., staging) and that the slot actually exists in the Azure portal under Deployment slots.
The deployment pipeline says green, but your app is broken. This is a runtime issue, not a deployment issue, but it's the most stressful kind because your users see the error.
- Enable application logging as described in the evidence-collection section above (App Service → Monitoring → App Service Logs).
- Check App Service → Diagnose and solve problems. Azure's built-in diagnostics will run checks against your app and often identify the root cause automatically, such as missing connection strings, the wrong .NET runtime, or out-of-memory conditions.
- Go to Configuration → General settings and verify that the Stack settings (language, version) match what your app was built with. A Node.js 18 app deployed to an App Service configured for Node.js 14 will fail on startup.
- Check Configuration → Application settings and Connection strings. Missing required environment variables are the most common cause of startup crashes. Compare the required variables in your app's startup code against what's configured in the portal.
- If everything looks right in the portal, use the Kudu console to navigate to D:\home\LogFiles\Application and read the latest log file directly. It often contains an UnhandledException stack trace that doesn't appear anywhere else.
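The "compare required variables against what's configured" step in the list above is easy to get wrong by eye. Here's a minimal Python sketch of that comparison. It assumes you've already collected the required names from your startup code and exported the configured settings into a name-to-value mapping. The function name and the sample setting names are hypothetical.

```python
from typing import Iterable, List, Mapping


def missing_settings(required: Iterable[str], configured: Mapping[str, str]) -> List[str]:
    """Return required setting names that are absent or blank, sorted by name."""
    return sorted(
        name for name in set(required)
        if not configured.get(name, "").strip()
    )


# Hypothetical setting names and values, for illustration only.
required_names = ["DATABASE_URL", "REDIS_HOST", "API_KEY"]
configured_settings = {"DATABASE_URL": "Server=example;Database=app", "API_KEY": "  "}

print(missing_settings(required_names, configured_settings))
```

Treating a whitespace-only value as missing is a deliberate choice here, since a blank setting usually crashes startup just as reliably as an absent one.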
Slot swaps are supposed to give you zero-downtime deployments, but they can fail if swap warmup isn't configured correctly or if the app doesn't pass its health check.
- In the Azure portal, go to Deployment slots → Swap. If the swap is failing with a health check error, go to Health check under Monitoring and verify that the health check path returns a 200 status on both the staging and production slots.
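- If the swap completes but the production slot receives a broken build, your staging slot wasn't fully warmed up. Go to Configuration → Application settings on the staging slot and set WEBSITE_SWAP_WARMUP_PING_PATH (and optionally WEBSITE_SWAP_WARMUP_PING_STATUSES) so the swap waits for a valid response from your warm-up path, or add an applicationInitialization section to web.config on Windows plans.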
- To roll back after a bad swap, just swap again: the previous production version is now sitting in the staging slot, ready to be swapped back instantly.
Advanced Troubleshooting: When the Obvious Fixes Don't Work
If you've worked through all the steps above and the deployment is still failing, you're dealing with a more unusual configuration issue. Here are the advanced scenarios we see most often.
Private Endpoints and Network Security Groups Blocking SCM
If your App Service is deployed inside a virtual network with private endpoints enabled, your pipeline agent needs to be able to reach the SCM endpoint (*.scm.azurewebsites.net) over the private network. A Microsoft-hosted Azure DevOps agent running on the public internet cannot reach a private endpoint by default.
The fix is one of three things: use a self-hosted agent deployed inside the same VNet, configure a Private DNS zone so the SCM endpoint resolves to the private IP, or temporarily enable SCM Basic Auth and whitelist the agent's IP range in the App Service's Access restrictions. For production environments, the self-hosted agent approach is almost always the right answer for long-term maintainability.
Docker Container Deploy Failures on Azure Container Apps or AKS
Container deployments add two extra failure points: the registry push and the container pull. If your pipeline logs show manifest unknown or pull access denied, the issue is almost always that the managed identity or service principal attached to your Container Apps environment or AKS cluster doesn't have the AcrPull role on your Azure Container Registry.
Fix it by going to your ACR in the portal → Access Control (IAM) → Add role assignment, and assigning AcrPull to the managed identity of your Container App or the kubelet identity of your AKS cluster. Changes take effect within a few minutes, but you may need to trigger a new revision or pod restart to see the fix apply.
Terraform or Bicep Infrastructure Deploy Failures
If you're deploying infrastructure-as-code alongside your app and the IaC step fails, the most common causes are state file lock conflicts (for Terraform) and API throttling (for both). Terraform's azurerm backend locks state by taking a lease on the state blob in your Azure Storage account rather than by writing a separate lock file. If the lock is stale (the pipeline that acquired it is no longer running), run terraform force-unlock with the lock ID from the error message, or break the lease on the state blob in the Azure portal, and re-run. For API throttling errors (429 Too Many Requests), add retry logic in your pipeline and honor the Retry-After header in the response; Azure Resource Manager limits write requests per subscription (historically 1,200 per hour).
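If you end up writing the retry logic for 429 responses yourself, the shape is roughly the one below. This is a minimal Python sketch, not a drop-in client: it assumes your own code wraps the actual request in a callable that returns the HTTP status and the parsed Retry-After value in seconds (or None). The function name, the defaults, and the injectable sleep are all hypothetical choices.

```python
import random
import time
from typing import Callable, Optional, Tuple

# A call returns (status_code, retry_after_seconds or None).
Response = Tuple[int, Optional[float]]


def call_with_retry(call: Callable[[], Response], max_attempts: int = 5,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry `call` while it returns HTTP 429, backing off between attempts."""
    for attempt in range(max_attempts):
        status, retry_after = call()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        if retry_after is not None:
            # Prefer the server's Retry-After hint when one is provided.
            delay = retry_after
        else:
            # Otherwise back off exponentially, with jitter to avoid retry storms.
            delay = base_delay * 2 ** attempt + random.uniform(0, 1)
        sleep(delay)
    return 429
```

Passing sleep as a parameter keeps the function testable without real waiting; in a pipeline script you'd leave it at the default.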
GitHub Actions Deployments Failing with OIDC Token Issues
If you're using GitHub Actions with OIDC (federated credentials) rather than a stored client secret, the token exchange can fail if the audience or subject in the federated credential doesn't match what GitHub sends. Double-check that the subject in your Entra ID federated credential exactly matches the GitHub Actions context: for a branch deploy it's repo:<owner>/<repo>:ref:refs/heads/<branch>, and for an environment it's repo:<owner>/<repo>:environment:<environment-name>. A single character mismatch will cause a silent 401.
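Because the match has to be exact, it can help to build the expected subject from its parts and compare it to what's configured rather than eyeballing the two strings. Here's a minimal Python sketch, assuming only the two subject formats described above (branch and environment). The function names and the contoso/webapp values are hypothetical.

```python
from typing import Optional


def github_subject(owner: str, repo: str, *, branch: Optional[str] = None,
                   environment: Optional[str] = None) -> str:
    """Build the subject claim for a branch deploy or an environment deploy."""
    if (branch is None) == (environment is None):
        raise ValueError("pass exactly one of branch or environment")
    if environment is not None:
        return f"repo:{owner}/{repo}:environment:{environment}"
    return f"repo:{owner}/{repo}:ref:refs/heads/{branch}"


def first_mismatch(expected: str, configured: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical."""
    for index, (a, b) in enumerate(zip(expected, configured)):
        if a != b:
            return index
    if len(expected) != len(configured):
        # One string is a prefix of the other; the mismatch is where it ends.
        return min(len(expected), len(configured))
    return None


# Hypothetical values: the configured subject has "Main" instead of "main".
expected = github_subject("contoso", "webapp", branch="main")
configured = "repo:contoso/webapp:ref:refs/heads/Main"
print(first_mismatch(expected, configured))
```

The comparison is deliberately case-sensitive, since a difference in capitalization counts as a mismatch.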
Kudu Timeout During Large Deployments
Very large deployment packages (over 500 MB) can exceed the default SCM timeout of 600 seconds, causing the deploy to fail mid-extraction even though the upload completed. There are two solutions: either switch to WEBSITE_RUN_FROM_PACKAGE = 1 (which avoids extraction entirely by mounting the ZIP), or increase the SCM timeout by adding the application setting SCM_COMMAND_IDLE_TIMEOUT = 1800 to your App Service configuration.
Prevention: How to Make Azure Deployments Bulletproof
The best way to handle Azure deploy failures is to prevent them. Here are the practices that will dramatically reduce your incident rate.
Rotate and Monitor Service Principal Secrets Proactively
Don't wait for a secret to expire and break your production pipeline on a Friday afternoon. Set up an alert on your app registration's credential expiry date. One way to do this is a simple Logic App (or a scheduled script) that reads passwordCredentials via the Microsoft Graph API and sends an email when any secret is within 30 days of expiry. Better yet, migrate to federated credentials (OIDC) entirely: they have no secrets to rotate.
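The expiry check itself is only a few lines once you have the credential list in hand. Here's a minimal Python sketch, assuming you've already fetched the application's passwordCredentials as a list of dictionaries and that each endDateTime is an ISO 8601 timestamp of the form YYYY-MM-DDTHH:MM:SSZ. The function name and the sample credentials are hypothetical; fetching from Microsoft Graph and sending the notification are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expiring_secrets(password_credentials: List[Dict[str, str]], now: datetime,
                     within_days: int = 30) -> List[str]:
    """Return names of secrets that expire within `within_days` of `now`.

    Secrets that have already expired are included as well.
    """
    cutoff = now + timedelta(days=within_days)
    expiring = []
    for credential in password_credentials:
        # Assumed timestamp shape: "2025-01-15T00:00:00Z"; map "Z" to an explicit offset.
        end = datetime.fromisoformat(credential["endDateTime"].replace("Z", "+00:00"))
        if end <= cutoff:
            expiring.append(credential.get("displayName") or credential.get("keyId", "<unnamed>"))
    return expiring


# Hypothetical credential data, for illustration only.
sample_credentials = [
    {"displayName": "pipeline-secret", "endDateTime": "2025-01-15T00:00:00Z"},
    {"displayName": "long-lived", "endDateTime": "2026-01-01T00:00:00Z"},
]
print(expiring_secrets(sample_credentials, datetime(2025, 1, 1, tzinfo=timezone.utc)))
```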
Use Deployment Slots for Every Production Deployment
Never deploy directly to your production slot. Always deploy to a staging slot first, run a smoke test against the staging URL, and then perform the slot swap. This gives you an instant rollback option if something goes wrong, and it eliminates the brief period of downtime that occurs when a direct production deployment restarts your app.
Pin Your Runtime Versions Explicitly
Azure App Service occasionally auto-updates minor runtime versions, which can break apps that depend on a specific patch version. In your App Service configuration, always set the exact runtime version you need (e.g., WEBSITE_NODE_DEFAULT_VERSION = ~20 or the specific 20.11.0 if you need it). Also pin your pipeline task versions (AzureWebApp@1 → AzureWebApp@1.230.0) to avoid surprise breaking changes from task updates.
Validate Your Deployment Package Locally Before Pushing
If you're using ZIP deploy, extract the ZIP locally and verify its contents before every release. It sounds obvious, but many deployment failures are caused by a misconfigured .gitignore or .deployignore that accidentally excludes required files from the artifact. A quick unzip -l release.zip | head -50 before pushing saves a lot of debugging time.
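If you'd rather not inspect the listing by eye on every release, the same check can be scripted. Here's a minimal Python sketch using the standard zipfile module. The function name and the list of required paths are hypothetical, so substitute whatever files your app genuinely needs at startup.

```python
import io
import zipfile
from typing import Iterable, List


def missing_from_zip(zip_file, required_paths: Iterable[str]) -> List[str]:
    """Return the required paths that are not present in the archive."""
    with zipfile.ZipFile(zip_file) as archive:
        names = set(archive.namelist())
    return [path for path in required_paths if path not in names]


# Build a tiny in-memory archive to demonstrate; real use would pass a file path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("package.json", "{}")
    archive.writestr("dist/server.js", "// app entry point")

print(missing_from_zip(buffer, ["package.json", "dist/server.js", "web.config"]))
```

Failing the pipeline when this list is non-empty catches an over-eager ignore file before the package ever reaches Azure.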
Implement a Health Check Endpoint
Configure a /health endpoint in your app that returns a 200 only when all critical dependencies (database, cache, external APIs) are reachable. Then configure this as your App Service health check path. Azure will automatically remove unhealthy instances from the load balancer and prevent unhealthy slots from being swapped into production. This single change prevents an entire category of "deployed successfully but broken" incidents.
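The core of such an endpoint is independent of your web framework. Here's a minimal Python sketch of the aggregation logic, assuming each dependency check is a zero-argument function that returns True when the dependency is reachable. The function name, the check names, and the choice of 503 for the unhealthy case are illustrative assumptions; wiring it to a /health route is left to your framework.

```python
import json
from typing import Callable, Dict, Tuple


def health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check; return (200, body) only if all pass, else (503, body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as an unhealthy dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, json.dumps(results, sort_keys=True)


# Hypothetical checks; real ones would ping the database, cache, and so on.
status, body = health_response({"database": lambda: True, "cache": lambda: False})
print(status, body)
```

Returning the per-dependency results in the body makes the failing dependency visible when you hit the endpoint by hand.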
Use Infrastructure as Code for All Configuration
Manual portal changes to App Service configuration are the silent killers of deployment reliability. When someone changes a connection string in the portal and doesn't update the Bicep or Terraform template, the next infrastructure deployment will revert their change and break the app. Keep all configuration (application settings, connection strings, runtime versions, VNet integration settings) in your IaC, and treat the portal as read-only for configuration.
Frequently Asked Questions
When a deployment succeeds but the app keeps serving the old version, the cause is almost always that WEBSITE_RUN_FROM_PACKAGE is set to a blob storage URL rather than 1. When a URL is set, Azure loads the app from that static blob, and if the URL points to an older artifact, the new deployment has no effect. Go to Configuration → Application settings, find WEBSITE_RUN_FROM_PACKAGE, and either set it to 1 (to always use the latest deployed package) or update it to the new blob URL. Also check that your browser isn't serving a cached version: do a hard refresh with Ctrl+Shift+R and check the app in a private window.
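If you audit this setting across several apps, a small classifier makes the two modes explicit. Here's a minimal Python sketch, assuming you've already read the setting's value as a string. The function name and the returned labels are hypothetical.

```python
from typing import Optional


def run_from_package_mode(value: Optional[str]) -> str:
    """Classify a WEBSITE_RUN_FROM_PACKAGE value.

    Returns 'local' for 1, 'remote-url' for an http(s) URL,
    'disabled' for unset or 0, and 'unknown' for anything else.
    """
    setting = (value or "").strip()
    if setting == "1":
        return "local"
    if setting.lower().startswith(("http://", "https://")):
        return "remote-url"
    if setting in ("", "0"):
        return "disabled"
    return "unknown"


# Hypothetical URL, for illustration only.
print(run_from_package_mode("https://example.blob.core.windows.net/releases/app.zip"))
```

Any app that reports remote-url is a candidate for the stale-artifact problem described above.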
When a GitHub Actions deploy step can't find the package it's supposed to deploy, it usually means either your build step is failing silently and not producing the artifact the deploy step expects, or the path in your package parameter no longer matches where the build outputs its files. Check your workflow YAML for the path you're passing to the deploy action (often something like ./dist or ./publish). Then check the Actions run log from the build step: if the build step failed or produced output in a different directory, the deploy step won't find anything. Run the build locally and verify the output path matches what your workflow expects.
The cleanest way to deploy through multiple environments is to use Azure DevOps Environments with approval gates, or GitHub Actions Environments with required reviewers. In your pipeline YAML, define separate stages for each environment, each pointing to a different service connection and App Service name. Use variable groups (Azure DevOps) or environment secrets (GitHub Actions) to store environment-specific settings like connection strings and resource names. Set the production stage to require manual approval before it runs. This gives you a single pipeline YAML that deploys through all environments in sequence with appropriate gates.
Deployment Center is Azure's built-in continuous deployment feature that connects directly to a source code repository (GitHub, Azure Repos, Bitbucket) and deploys automatically on each push using Oryx or Kudu under the hood. Pipeline tasks (like AzureWebApp@1) are more flexible: they run as part of a broader CI/CD pipeline where you can run tests, build artifacts, manage approvals, and deploy to multiple environments in sequence. For anything beyond a simple personal project, pipeline tasks are strongly preferred because they give you full control over the deployment workflow. Using both at the same time can cause conflicts, so pick one approach and stick with it.
If you're using deployment slots, the rollback is instant: just swap the staging and production slots again. The previous production version is sitting in the staging slot waiting. If you're deploying directly without slots, Azure keeps a deployment history. Go to Deployment Center → Logs, find the last successful deployment, click the three-dot menu next to it, and select Redeploy. This redeploys the exact artifact from that deployment. For production systems, we strongly recommend always using slots specifically because of this rollback capability: a swap takes about 20–30 seconds versus the 2–5 minutes it takes to re-run a full pipeline.
An image pull failure means the Container App couldn't pull your container image from the registry. There are three things to check. First, verify the image name and tag in your Container App revision are exactly correct, including the registry hostname, repository name, and tag. Second, make sure the Container App's managed identity has the AcrPull role on your Azure Container Registry (or that you've provided valid registry credentials if you're using credential-based auth). Third, if your ACR has network restrictions enabled (private endpoint or firewall rules), ensure the Container Apps environment is allowed to reach it, typically by adding the Container Apps environment's subnet to the ACR's allowed virtual networks. The Azure portal's Activity log on the ACR resource will show 403 or 401 errors if it's a permissions issue.
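For the first check, splitting the image reference into its parts makes a wrong hostname or tag much easier to spot. Here's a minimal Python sketch for simple registry/repository:tag references. It doesn't handle digest references (those containing @sha256:), and the registry name in the example is hypothetical.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(reference: str) -> ImageRef:
    """Split 'registry/repository:tag' into parts (digest references are not handled)."""
    first, _, rest = reference.partition("/")
    # Common convention: the first component is a registry only if it looks like a host.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", reference
    repository, separator, tag = remainder.rpartition(":")
    if not separator or "/" in tag:
        # No tag present; fall back to the conventional default.
        repository, tag = remainder, "latest"
    return ImageRef(registry, repository, tag)


# Hypothetical registry and repository names, for illustration only.
print(parse_image_ref("myregistry.azurecr.io/team/api:1.4.2"))
```

Comparing the parsed registry against your ACR's login server, and the tag against what your pipeline pushed, covers the most common typos.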
You can deploy to Azure without storing any long-lived secrets in your pipeline, and you should. The modern approach is to use workload identity federation (OIDC), which lets your pipeline authenticate to Azure using a short-lived token issued by your CI/CD platform (GitHub Actions or Azure DevOps) without any stored secrets at all. In Microsoft Entra ID, you create an App Registration, add a federated credential that trusts tokens from your CI/CD platform, assign the appropriate roles to the service principal, and configure your pipeline to request an OIDC token at runtime. Neither the pipeline variables nor your repository ever stores a long-lived secret. This approach also eliminates the entire category of "expired secret" failures. Microsoft's documentation for this is under "Configure workload identity federation" in the Entra ID docs.