How to Fix Azure Jenkins Pipeline Errors
Why This Is Happening
I've seen this exact situation on dozens of Azure projects: a developer sets up a Jenkins pipeline to build Docker images, push them to Azure Container Registry (ACR), and deploy to Azure Kubernetes Service (AKS), and somewhere in that chain something silently breaks. The build passes but the push fails. Or the push succeeds but the pods keep running the old image after kubectl apply. Or everything appears to work but the load balancer never gets an external IP.
Azure Jenkins pipeline errors are maddening precisely because the failure messages are so generic. You get something like unauthorized: authentication required or Error response from daemon: Get https://myregistry.azurecr.io/v2/: unauthorized, and Jenkins just marks the stage red with no further context. Meanwhile the actual root cause (an expired service principal, a missing ACR role assignment, or a wrong image tag in the Jenkinsfile) is buried several layers deep.
Here's the short version of what goes wrong most often. Jenkins runs in its own execution context, which means it doesn't automatically inherit the Azure CLI session from your local terminal. Your local machine might already be authenticated with az acr login, but the Jenkins agent is not. That's why the pipeline fails on the Docker push step while the exact same commands work fine when you run them manually on your dev machine.
There's also a common image tagging problem. When developers set up Azure Jenkins pipeline integration for the first time, they frequently forget to replace the placeholder image reference in the Kubernetes manifest. The azure-vote-all-in-one-redis.yaml manifest ships with azuredocs/azure-vote-front as the front-end image reference on line 60. If you apply that manifest without pointing it at your own ACR login server, your AKS cluster will run Microsoft's public sample image instead of the one you built, or fail to pull anything at all when outbound internet access is restricted.
Then there's the watch problem. Jenkins pipelines don't handle interactive commands well. Running kubectl get service --watch inside a pipeline stage blocks indefinitely: --watch keeps streaming updates until someone presses Ctrl+C, and in a pipeline nobody ever will. I've seen this eat entire build queues in CI environments.
The good news: every one of these issues is fixable without rewriting your pipeline from scratch. This guide walks you through the exact sequence, grounded in how ACR authentication and AKS deployment actually work.
The Quick Fix: Try This First
If your Azure Jenkins pipeline is failing on the Docker push step with an authentication error, this single fix resolves it in the large majority of cases. The root cause is almost always that the Jenkins agent hasn't authenticated to your Azure Container Registry before attempting the push.
Open your Jenkinsfile and locate the stage where you're running docker push. Add an explicit ACR login step immediately before it:
```groovy
stage('Login to ACR') {
    steps {
        sh 'az acr login --name myacrregistry'
    }
}
stage('Push Image to ACR') {
    steps {
        sh 'docker push myacrregistry.azurecr.io/azure-vote-front:v1'
    }
}
```
Replace myacrregistry with your registry name only, not the full myacrregistry.azurecr.io login server. Behind the scenes, az acr login exchanges your Azure credentials for a registry token and stores it in the Docker client configuration of the user running the command on the Jenkins agent. Once that's done, the push command has the credentials it needs.
But there's a dependency here. The Jenkins agent must have the Azure CLI installed and must already be signed in to Azure, because az acr login reuses that CLI session. That usually means az login --identity on an Azure VM with a managed identity, or az login --service-principal. Whichever identity the agent signs in as needs the AcrPush role on your ACR instance. If az acr login itself fails, jump ahead to Step 2 in this guide to verify the service principal configuration.
If you're on a self-hosted Jenkins agent, also confirm that Docker is installed and that the jenkins user is in the docker group. On Linux agents this is a frequent oversight. Without that group membership the Jenkins service account can't reach the Docker socket, and you get a confusing "permission denied while trying to connect to the Docker daemon socket" error that looks nothing like an auth problem. After adding the user to the group, restart the agent process so the new membership takes effect.
Tip: tag images with a unique version (such as :v1 or :${BUILD_NUMBER}) rather than :latest before pushing to ACR. If the image reference in your manifest is identical from one build to the next, kubectl apply has nothing to roll out, so AKS keeps running the old pods even after a successful pipeline run. This is one of those maddening Azure Jenkins pipeline issues that takes hours to diagnose the first time you hit it.
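Here's a minimal sketch of that prerequisite: a POSIX shell function that a Jenkins sh step could source and call before the push. The variable names AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID are assumptions, so bind them from Jenkins credentials however you prefer. When any of them is missing, the sketch falls back to the agent VM's managed identity.

```shell
#!/bin/sh
# Sketch: sign the Azure CLI in on the Jenkins agent, then log Docker in to ACR.
# AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID are hypothetical names
# for values injected by Jenkins; if any is missing, use the VM's managed identity.
acr_cli_login() {
  local registry="$1"  # registry name only, e.g. myacrregistry (no .azurecr.io)
  if [ -n "${AZURE_CLIENT_ID:-}" ] && [ -n "${AZURE_CLIENT_SECRET:-}" ] &&
     [ -n "${AZURE_TENANT_ID:-}" ]; then
    az login --service-principal \
      --username "$AZURE_CLIENT_ID" \
      --password "$AZURE_CLIENT_SECRET" \
      --tenant "$AZURE_TENANT_ID" \
      --output none || return 1
  else
    az login --identity --output none || return 1
  fi
  # Exchanges the CLI session for a registry token and stores it in this
  # user's Docker client config, so a later docker push can authenticate.
  az acr login --name "$registry"
}
```

Each Jenkins sh step starts a fresh shell, so call the function and the docker push in the same step. Also note that --password puts the secret on the az command line, where other users on the agent can see it in the process list while the command runs. Prefer a managed identity where you have one.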
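Here's a quick pre-flight check for that situation, sketched as a shell function you could run at the start of the pipeline. It assumes a Linux agent and the conventional docker group name. The function name is illustrative.

```shell
#!/bin/sh
# Sketch (hypothetical helper): verify the agent's user can talk to the Docker
# daemon before any docker build/push runs. Assumes a Linux agent and the
# standard "docker" group.
check_docker_access() {
  local user
  user=$(id -un)
  # id -nG lists the groups of the current process; match "docker" exactly.
  if ! id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "$user is not in the docker group; add it with 'sudo usermod -aG docker $user' and restart the agent" >&2
    return 1
  fi
  # docker info only succeeds if the daemon is up and its socket is reachable.
  if ! docker info >/dev/null 2>&1; then
    echo "docker info failed for $user: daemon not running or socket not reachable" >&2
    return 1
  fi
  echo "docker access OK for $user"
}
```

The group check looks at the running process's groups. That is why a freshly added membership doesn't show up until the agent is restarted.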
Step 1: Tag the Image with Your ACR Login Server
Before your Azure Jenkins pipeline can push anything to Azure Container Registry, the image must be tagged with your full ACR login server name. This trips up almost every developer on first setup. Docker Hub images can be pushed without a registry hostname in the tag, but ACR images need it.
Your ACR login server follows the format <registryname>.azurecr.io. You need this full hostname in the tag, not just the registry name. In your Jenkinsfile, the build and tag stage should look like this:
```groovy
stage('Build and Tag') {
    steps {
        sh '''
            docker build -t azure-vote-front .
            docker tag azure-vote-front myacrregistry.azurecr.io/azure-vote-front:v1
        '''
    }
}
```
The docker tag command here takes the locally built image (azure-vote-front) and creates a new alias for it that includes the full ACR login server path and a version label. Without this step, when you run docker push, Docker has no idea which registry to target and will either fail or attempt to push to Docker Hub by default.
If you want to use the Jenkins build number as the version tag (recommended for traceability), swap :v1 for the build number. Inside a single-quoted sh step like the one above, write ${BUILD_NUMBER}: Jenkins exports it as an environment variable, and the shell expands it. The Groovy form ${env.BUILD_NUMBER} only works inside double-quoted Groovy strings. Either way, every build gets a unique tag in your ACR repository, which makes rollbacks much cleaner.
After this stage runs, the Jenkins console shows the docker build output. docker tag itself prints nothing when it succeeds. If you see invalid reference format, check the reference for spaces or uppercase letters. Docker image references must be lowercase, and ACR registry names are 5 to 50 alphanumeric characters (no hyphens or underscores).
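These naming rules are easy to check before Docker ever sees the reference. Below is a sketch of a helper that lowercases the registry name, rejects anything that isn't 5 to 50 alphanumeric characters, and assembles the full login-server reference. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): build <registry>.azurecr.io/<repo>:<tag>,
# enforcing ACR's 5-50 alphanumeric name rule and Docker's lowercase rule.
acr_image_ref() {
  local name repo="$2" tag="$3"
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    ''|*[!a-z0-9]*)
      echo "invalid registry name '$1': use letters and digits only" >&2
      return 1 ;;
  esac
  if [ "${#name}" -lt 5 ] || [ "${#name}" -gt 50 ]; then
    echo "invalid registry name '$1': must be 5 to 50 characters" >&2
    return 1
  fi
  printf '%s.azurecr.io/%s:%s\n' "$name" "$repo" "$tag"
}
```

In a build step this might be used as docker tag azure-vote-front "$(acr_image_ref myacrregistry azure-vote-front "$BUILD_NUMBER")". That way a bad name fails before the push instead of surfacing as invalid reference format.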
Step 2: Authenticate the Jenkins Agent to ACR
This is the most critical step in the entire Azure Jenkins pipeline setup, and it's where most pipeline failures actually live. The Jenkins agent needs to be authenticated to ACR before it can push images. There are two approaches: the Azure CLI (az acr login), or a service principal with docker login directly. Both work, so pick the one that fits your environment.
Option A: Azure CLI login (simpler for managed agents)
```groovy
stage('ACR Authentication') {
    steps {
        sh 'az acr login --name myacrregistry'
    }
}
```
This approach requires the Azure CLI to be installed on the Jenkins agent and the agent's managed identity or service principal to have the AcrPush role on the registry. To verify the role is assigned, go to the Azure Portal → your ACR resource → Access Control (IAM) → Role Assignments, and confirm your service principal appears under AcrPush.
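If you'd rather check from the command line than the portal, here's a sketch. It resolves the registry's resource ID and lists the role names assigned to an identity at that scope, including roles inherited from the resource group or subscription. The function name is illustrative. Pass it the service principal's appId or object ID.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed only if <assignee> holds AcrPush on
# <registry>. $1 = registry name, $2 = service principal appId or object ID.
has_acrpush_role() {
  local registry="$1" assignee="$2" acr_id roles
  acr_id=$(az acr show --name "$registry" --query id --output tsv) || return 1
  # --include-inherited also counts assignments made on the resource group
  # or subscription that contains the registry.
  roles=$(az role assignment list --assignee "$assignee" --scope "$acr_id" \
    --include-inherited --query "[].roleDefinitionName" --output tsv) || return 1
  printf '%s\n' "$roles" | grep -qx AcrPush
}
```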
Option B: Service principal credentials (more explicit, works anywhere)
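```groovy
stage('ACR Authentication') {
    environment {
        // Username/Password credential: exposes ACR_CREDS_USR and ACR_CREDS_PSW
        ACR_CREDS = credentials('acr-service-principal')
    }
    steps {
        // Single quotes: the shell (not Groovy) expands the secret, and
        // --password-stdin keeps it off the docker command line.
        sh 'echo "$ACR_CREDS_PSW" | docker login myacrregistry.azurecr.io -u "$ACR_CREDS_USR" --password-stdin'
    }
}
```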
For Option B, store your service principal's client ID and secret as a Jenkins Username/Password credential with the ID acr-service-principal. The service principal needs the built-in AcrPush role, not Owner or Contributor, and that role should be scoped narrowly to just the registry.
Once authentication succeeds, you'll see Login Succeeded in the Jenkins console. If you see unauthorized, a missing role assignment is the most likely cause, followed by a wrong or expired secret. After assigning the role, allow a few minutes (occasionally longer) for Azure RBAC propagation before retrying.
Step 3: Push the Image to ACR
With authentication in place and the image correctly tagged, the push step itself is straightforward. But there are a few pipeline-specific gotchas that can still cause it to fail even when everything looks right.
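Here's one way to create a service principal that holds only that role, scoped to a single registry, sketched with the Azure CLI. The service principal name is a placeholder you choose. The JSON output contains the client ID and secret you'd paste into the Jenkins credential, and the secret is shown only once.

```shell
#!/bin/sh
# Sketch (hypothetical helper): create a service principal with AcrPush scoped
# to one registry. $1 = registry name, $2 = display name for the new principal.
create_acr_push_sp() {
  local registry="$1" sp_name="$2" acr_id
  acr_id=$(az acr show --name "$registry" --query id --output tsv) || return 1
  # --scopes limits the role assignment to this registry only.
  az ad sp create-for-rbac --name "$sp_name" --role AcrPush --scopes "$acr_id" \
    --query "{clientId: appId, clientSecret: password}" --output json
}
```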
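Propagation delay is normal for a fresh role assignment, so a pipeline can retry the login for a bounded time instead of failing on the first attempt. Here's a generic retry helper sketched in POSIX shell. The function name and the attempt and delay values in the usage line are arbitrary choices.

```shell
#!/bin/sh
# Sketch (hypothetical helper): retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $n failed; sleeping ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, retry 10 30 az acr login --name myacrregistry keeps trying for roughly five minutes before failing the stage.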
```groovy
stage('Push to ACR') {
    steps {
        sh 'docker push myacrregistry.azurecr.io/azure-vote-front:v1'
    }
}
```
Replace myacrregistry.azurecr.io with your own login server, exactly as you used it in the tag step. For example, if your registry is named contosoprod, the push command becomes docker push contosoprod.azurecr.io/azure-vote-front:v1.
Watch the Jenkins console output carefully during this stage. You should see each image layer being pushed with a progress indicator, followed by a final line like v1: digest: sha256:abc123... size: 694. That digest confirmation means the push completed and ACR has accepted the image. No digest line means something interrupted the transfer.
If the push hangs indefinitely on large images, you're likely hitting a network timeout between the Jenkins agent and the ACR endpoint. This is common when the agent runs outside Azure (on-premises or in a different cloud). Check that outbound HTTPS traffic to *.azurecr.io on port 443 is not being blocked by a firewall or proxy. You can test connectivity directly from the agent with curl -I https://myacrregistry.azurecr.io/v2/. A 401 response means connectivity is fine: you're reaching the registry, just not yet authenticated. A connection timeout means a network-level block.
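The same check can be scripted so the pipeline explicitly reports the difference between "reachable but unauthenticated" and "blocked". Here's a sketch using curl's %{http_code} write-out, which prints 000 when no HTTP response arrives. The 10-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch (hypothetical helper): classify network reachability of an ACR
# endpoint from this agent. $1 = registry name (without .azurecr.io).
acr_connectivity() {
  local registry="$1" code
  # -s silences progress, -o /dev/null drops the body, -w prints only the status.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    "https://${registry}.azurecr.io/v2/")
  case "$code" in
    401|200) echo "reachable: HTTP $code from ${registry}.azurecr.io" ;;
    000|'')  echo "unreachable: no HTTP response (timeout, DNS, proxy or firewall)" >&2
             return 1 ;;
    *)       echo "unexpected HTTP $code from ${registry}.azurecr.io" >&2
             return 1 ;;
  esac
}
```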
Once the push succeeds, verify the image appeared in your registry by checking Azure Portal → Container registries → your registry → Repositories. The azure-vote-front repository should be listed with the v1 tag.
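You can also run this check from the agent itself, or from a later pipeline stage, with the Azure CLI. Here's a sketch that succeeds only if a given tag exists in a repository. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if <tag> exists in <repository> of
# <registry>. $1 = registry name, $2 = repository, $3 = tag.
acr_tag_exists() {
  local registry="$1" repo="$2" tag="$3"
  # tsv output lists one tag per line; -F -x requires an exact literal match.
  az acr repository show-tags --name "$registry" --repository "$repo" --output tsv \
    | grep -qxF "$tag"
}
```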
Step 4: Point the Kubernetes Manifest at Your ACR Image
This is the single most overlooked step in Azure Jenkins pipeline to AKS deployments. The azure-vote-all-in-one-redis.yaml manifest that ships with the Azure vote sample uses azuredocs/azure-vote-front as the image reference. That's a public sample image maintained by Microsoft, not your ACR registry. If you deploy the manifest unchanged, AKS pulls Microsoft's public image and completely ignores the one you just built and pushed.
Open the manifest file and navigate to approximately line 60. You'll find a containers section that looks like this:
```yaml
containers:
- name: azure-vote-front
  image: azuredocs/azure-vote-front
```
Change that image reference to point at your ACR registry:
```yaml
containers:
- name: azure-vote-front
  image: myacrregistry.azurecr.io/azure-vote-front:v1
```
In a Jenkins pipeline, you'll want to automate this substitution rather than manually editing the file each time. Use sed to do an in-place replacement as part of your deploy stage:
```groovy
stage('Update Manifest') {
    steps {
        // Replace the whole image: value on the azure-vote-front line, so an
        // existing registry prefix or tag can't survive the substitution.
        sh "sed -i 's|image: .*azure-vote-front.*|image: myacrregistry.azurecr.io/azure-vote-front:v1|' azure-vote-all-in-one-redis.yaml"
    }
}
```
Or, if you're parameterizing the version tag with the build number, pull it from an environment variable:
```groovy
sh "sed -i 's|image: .*azure-vote-front.*|image: myacrregistry.azurecr.io/azure-vote-front:${env.BUILD_NUMBER}|' azure-vote-all-in-one-redis.yaml"
```
After this stage, print the relevant lines to confirm the substitution worked: sh "grep 'image:' azure-vote-all-in-one-redis.yaml". If it still shows azuredocs, the sed pattern didn't match. Check for extra whitespace or a different quoting style in your YAML file.
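You can fold the substitution and the check into one step that fails the build, rather than relying on someone reading the console. Here's a sketch. It assumes GNU sed (standard on Linux agents) and that the front-end image line contains azure-vote-front. It rewrites the entire image: value, so any existing registry prefix or tag on that line is dropped. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): point the azure-vote-front container at an ACR
# image and verify the result.
# $1 = manifest path, $2 = full image reference,
#      e.g. myacrregistry.azurecr.io/azure-vote-front:42
# Assumes the reference contains no '|' or '&' (special in sed's replacement).
set_front_image() {
  local manifest="$1" image="$2"
  # GNU sed in-place edit; indentation before "image:" is preserved.
  sed -i "s|image: .*azure-vote-front.*|image: ${image}|" "$manifest" || return 1
  # Fail loudly if the expected reference is not present afterwards.
  if ! grep -qF "image: ${image}" "$manifest"; then
    echo "image substitution failed in $manifest" >&2
    return 1
  fi
}
```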
Step 5: Deploy to AKS and Verify the Load Balancer
With the manifest corrected, the deployment stage applies it to your AKS cluster and then verifies the load balancer came up successfully. This stage has a common Jenkins-specific problem: the --watch flag on kubectl get service blocks the pipeline indefinitely. Here's how to do it correctly.
```groovy
stage('Deploy to AKS') {
    steps {
        sh 'kubectl apply -f azure-vote-all-in-one-redis.yaml'
    }
}
stage('Verify Load Balancer') {
    steps {
        script {
            def externalIp = ''
            def attempts = 0
            // Poll instead of using --watch, which never exits on its own
            while (externalIp == '' && attempts < 20) {
                sleep(15)
                externalIp = sh(
                    script: "kubectl get service azure-vote-front -o jsonpath='{.status.loadBalancer.ingress[0].ip}'",
                    returnStdout: true
                ).trim()
                attempts++
            }
            if (externalIp) {
                echo "Application available at: http://${externalIp}"
            } else {
                error("Load balancer IP not assigned after 5 minutes")
            }
        }
    }
}
```
The kubectl apply command creates the Kubernetes load balancer service and tells AKS to schedule the pods. The verification loop polls every 15 seconds up to 20 times (5 minutes total), which is usually enough time for Azure to provision the external IP. This approach doesn't block indefinitely the way --watch does.
If your kubectl commands fail with "The connection to the server localhost:8080 was refused - did you specify the right host or port?", your Jenkins agent most likely hasn't been configured with the AKS kubeconfig. Run az aks get-credentials --resource-group myResourceGroup --name myAKSCluster on the agent to set it up, or add this as a pipeline step using an Azure CLI credential. The kubeconfig is written to ~/.kube/config for the user the agent runs as. It persists across builds on a long-lived agent, but ephemeral agents need the step on every run.
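On shared or ephemeral agents it can be cleaner to write the kubeconfig into the build workspace instead of ~/.kube/config, so parallel builds don't overwrite each other's credentials. Here's a sketch using the --file option of az aks get-credentials. The function name and the file path you pass are illustrative.

```shell
#!/bin/sh
# Sketch (hypothetical helper): fetch AKS credentials into a specific file and
# point kubectl at it.
# $1 = resource group, $2 = cluster name, $3 = kubeconfig path (e.g. in $WORKSPACE)
fetch_kubeconfig() {
  local rg="$1" cluster="$2" file="$3"
  az aks get-credentials --resource-group "$rg" --name "$cluster" \
    --file "$file" --overwrite-existing || return 1
  # kubectl reads KUBECONFIG before falling back to ~/.kube/config.
  export KUBECONFIG="$file"
}
```

The export only lasts for the current shell. Because each Jenkins sh step is a separate shell, run fetch_kubeconfig and the kubectl commands in the same step, or set KUBECONFIG in the Jenkinsfile's environment block instead.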
Advanced Troubleshooting
Service Principal Expiry and Rotation
I've seen more Azure Jenkins pipeline outages caused by expired service principals than by any other single issue. Secrets created with az ad sp create-for-rbac default to a one-year expiry in Microsoft Entra ID (formerly Azure Active Directory), and almost nobody sets a calendar reminder for that. One day the pipeline just starts failing with ClientAuthenticationError, or an AADSTS7000222 expired-secret error from az login, and everyone panics, thinking something changed in Azure.
To check the expiry on your current service principal, run this in Azure CLI:
```shell
az ad app credential list --id <appId> --query "[].{EndDate:endDateTime}" -o table
```
If the end date has passed or is close, create a new secret (for example with az ad app credential reset --id <appId> --append). Then paste it into the existing credential in Jenkins under Manage Jenkins → Credentials → your ACR credential. The pipeline picks up the new secret on the next run without any code changes.
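To turn that check into something a scheduled job can alert on, you can compute the days remaining from the endDateTime values. Here's a sketch that assumes GNU date, whose -d option parses the ISO 8601 timestamps the CLI returns. The optional second argument exists only to make the function testable, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print whole days from <now> (default: current
# time) until <end>. Negative output means the secret has already expired.
# Requires GNU date for -d parsing.
days_until_expiry() {
  local end_s now_s
  end_s=$(date -u -d "$1" +%s) || return 1
  now_s=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (end_s - now_s) / 86400 ))
}
```

For example, you could pipe az ad app credential list --id <appId> --query "[].endDateTime" -o tsv into a while read -r end loop that calls days_until_expiry "$end". The job then fails when any value drops below your chosen threshold.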
AKS Image Pull Secrets
When AKS can't pull your image from ACR, the pod gets stuck in the ImagePullBackOff state. This is a different failure from a failed pipeline push: the push succeeds and the manifest deploys, but the pods never start. Check pod status with:
```shell
kubectl describe pod -l app=azure-vote-front
```
Look for Failed to pull image in the Events section. The fix is to attach your ACR to your AKS cluster using the managed identity integration. In the Azure Portal, go to your AKS cluster → Settings → Integrations → Container Registry, and link it there. This grants the AKS kubelet identity the AcrPull role automatically, no image pull secret required.
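You can set up the same integration from the Azure CLI, which helps when you want it scripted. Here's a sketch. Attaching creates a role assignment, so the identity running it needs permission to assign roles on the registry (for example Owner or User Access Administrator). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): grant the AKS kubelet identity AcrPull on a
# registry, then verify. $1 = resource group, $2 = AKS cluster, $3 = registry
attach_acr_to_aks() {
  local rg="$1" cluster="$2" registry="$3"
  # --attach-acr creates the AcrPull role assignment for the kubelet identity.
  az aks update --resource-group "$rg" --name "$cluster" --attach-acr "$registry" || return 1
  # check-acr validates pull access from the cluster; it expects the FQDN.
  az aks check-acr --resource-group "$rg" --name "$cluster" --acr "${registry}.azurecr.io"
}
```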
Namespace and Context Mismatches
If kubectl apply reports success but you can't find the deployed resources, your kubeconfig context might be pointing at the wrong cluster or namespace. Always explicitly specify the namespace and kubeconfig context in your Jenkins pipeline rather than relying on defaults:
```shell
kubectl apply -f azure-vote-all-in-one-redis.yaml --namespace production --context myAKSCluster
```
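To make this the default rather than something each command has to remember, a pipeline can route kubectl through a small wrapper. Here's a sketch. KUBE_CONTEXT and KUBE_NAMESPACE are hypothetical variable names you'd set in the Jenkinsfile's environment block.

```shell
#!/bin/sh
# Sketch (hypothetical helper): run kubectl with an explicit context and
# namespace, refusing to fall back to the kubeconfig's current context.
kctl() {
  if [ -z "${KUBE_CONTEXT:-}" ] || [ -z "${KUBE_NAMESPACE:-}" ]; then
    echo "kctl: set KUBE_CONTEXT and KUBE_NAMESPACE first" >&2
    return 1
  fi
  kubectl --context "$KUBE_CONTEXT" --namespace "$KUBE_NAMESPACE" "$@"
}
```

With those variables set, kctl apply -f azure-vote-all-in-one-redis.yaml always targets the intended cluster and namespace.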
Event Viewer Equivalent: Azure Monitor Logs
For persistent pipeline issues that don't surface obvious errors, open Azure Monitor and go to the Log Analytics workspace linked to your AKS cluster. Query the ContainerLog table (or ContainerLogV2, if your Container insights configuration uses the newer schema) to see what the pods are actually outputting at runtime. It's the closest Azure equivalent to Windows Event Viewer, and it surfaces errors that never make it back to Jenkins.
If your ACR role assignments look correct, your service principal is valid, the image tags match, and you're still getting authentication failures on push, open a support ticket. You may have hit an ACR-specific networking issue, a geo-replication sync lag, or a permissions propagation problem that requires backend investigation. Go to Microsoft Support, select Azure, then Container Registry, and attach the output of az acr check-health --name myacrregistry --ignore-errors. That single command's output saves the support engineer a good deal of initial diagnosis.
Prevention & Best Practices
Once you've fixed a broken Azure Jenkins pipeline, the last thing you want is to fix it again next quarter. These practices come from managing CI/CD pipelines across dozens of Azure environments; the ones that stay stable long-term all share these habits.
Use build number tags, never latest. The :latest tag causes silent deployment failures. When every build pushes the same :latest reference, re-applying the manifest doesn't change the Deployment, so Kubernetes has nothing to roll out and AKS keeps running the old pods. Tag every image with ${BUILD_NUMBER} or a Git SHA, pass that tag through your pipeline as a variable, and use it consistently in both the docker tag command and the Kubernetes manifest substitution. You get full traceability and clean rollbacks for free.
Validate the manifest before applying it. Add a dry-run step before kubectl apply: kubectl apply --dry-run=client -f azure-vote-all-in-one-redis.yaml -o yaml. This catches manifest syntax errors and wrong API versions. With -o yaml it also prints the exact image reference that would be applied, so you can grep it to catch a substitution that failed silently. If the dry run passes and shows your ACR image reference, the real apply is very likely to succeed.
Store ACR credentials in Jenkins Credentials Manager, never in the Jenkinsfile. Hardcoding service principal secrets in a Jenkinsfile is a security risk, and hardcoding values that change, like registry names, is a maintenance burden. Use the credentials() binding. It masks secret values in console output and turns rotation into a one-stop update in Jenkins settings rather than a code change.
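Here's a sketch of choosing that tag in one place. It prefers Jenkins' BUILD_NUMBER and falls back to the short Git commit SHA when the script runs outside Jenkins. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): choose a unique image tag for this build.
image_tag() {
  if [ -n "${BUILD_NUMBER:-}" ]; then
    printf '%s\n' "$BUILD_NUMBER"   # set by Jenkins for every build
  else
    git rev-parse --short HEAD      # fallback: short SHA of the checked-out commit
  fi
}
```

Compute it once, for example TAG=$(image_tag), and reuse $TAG in both the docker tag command and the manifest substitution so the two can never drift apart.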
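You can wrap the dry run and the image check into a single gate that fails the stage. Here's a sketch. It assumes the agent already has a working kubeconfig, because client-side dry runs still query the API server to resolve resource types. It also assumes the front-end image lives under the repository name azure-vote-front. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): dry-run the manifest and require an ACR-hosted
# azure-vote-front image.
# $1 = manifest path, $2 = ACR login server, e.g. myacrregistry.azurecr.io
validate_manifest() {
  local manifest="$1" login_server="$2" rendered
  # --dry-run=client creates nothing on the cluster; -o yaml prints the objects.
  rendered=$(kubectl apply --dry-run=client -f "$manifest" -o yaml) || return 1
  if ! printf '%s\n' "$rendered" | grep -qF "image: ${login_server}/azure-vote-front:"; then
    echo "dry run OK, but azure-vote-front does not use an image from ${login_server}" >&2
    return 1
  fi
}
```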
Set a short pipeline timeout on the deploy stage. Wrap your deploy stage in a timeout(time: 10, unit: 'MINUTES') block. This prevents a hung kubectl command from tying up the Jenkins executor indefinitely and makes failures obvious rather than silent.
- Run az acr check-health --name <registry> monthly to catch connectivity and configuration drift before it breaks a build.
- Set ACR service principal secrets to a 2-year expiry, and add a calendar reminder at 22 months to rotate them before they expire.
- Pin the kubectl version on your Jenkins agents to the minor version of your AKS cluster. Kubernetes only supports kubectl within one minor version of the API server, and version skew causes unpredictable API errors.
- Enable ACR geo-replication (a Premium-tier feature) if your Jenkins agents and AKS clusters are in different Azure regions. It cuts push/pull latency and removes cross-region throttling as a failure mode.
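The version-pinning item is easy to enforce with a check at the start of the pipeline. Here's a sketch that compares minor versions. How you obtain the two version strings is up to you. One option is kubectl version --client -o json piped through jq for the client, and az aks show --resource-group <rg> --name <cluster> --query kubernetesVersion --output tsv for the cluster. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the absolute minor-version difference
# between two Kubernetes versions, e.g. v1.29.3 and 1.28.5 -> 1.
minor_skew() {
  local c="${1#v}" s="${2#v}" d
  c="${c#*.}"                      # drop the major version: 1.29.3 -> 29.3
  s="${s#*.}"
  d=$(( ${c%%.*} - ${s%%.*} ))     # minor(client) - minor(server)
  if [ "$d" -lt 0 ]; then d=$(( 0 - d )); fi
  echo "$d"
}
```

A pipeline guard could then be [ "$(minor_skew "$client" "$server")" -le 1 ] || exit 1, or -eq 0 if you want an exact match.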
Frequently Asked Questions
Why does my Azure Jenkins pipeline fail with "unauthorized: authentication required" even after az acr login?
This usually means az acr login ran on one Jenkins agent while docker push executed on a different agent in your pool. Docker stores registry credentials per machine (in the agent user's Docker config), so the agent that authenticated is not the one doing the push. Fix this by pinning the pipeline to a single agent with a node('acr-agent') block, or by running the login in the same stage as the push, for example with an explicit docker login using service principal credentials (Option B in Step 2). Also keep in mind that the token az acr login obtains is valid for three hours, so a long pipeline that logged in early can fail at the push. Finally, double-check that the AcrPush role is assigned to the service identity. A missing role assignment returns the same "unauthorized" error and looks identical to an expired token.
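kubectl apply says "configured" but my AKS pods are still running the old image. Why?
"Configured" only means Kubernetes accepted a change to at least one object in the file. It doesn't mean new pods were started. If the image reference in the Deployment's pod template is exactly the same string as before (:latest, or a reused tag like :v1), the Deployment itself is unchanged, so Kubernetes has nothing to roll out and the existing pods keep running. Even when pods are recreated, the default imagePullPolicy for any tag other than :latest is IfNotPresent. That means a node that has already cached that tag reuses its local copy. (For :latest or an untagged image, the default is already Always.) The clean fix is a unique tag per build: the changed image reference triggers a rolling update, and a new tag can't be served from a node's cache. Setting imagePullPolicy: Always or running kubectl rollout restart deployment/azure-vote-front are workarounds if you're stuck with a reused tag, but they're not a substitute for versioned tags in production. After applying, check which image is actually running with kubectl describe pod <pod-name> | grep Image.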
How do I find my ACR login server name to use in docker tag and docker push commands?
The fastest way is via Azure CLI: run az acr list --query "[].{name:name,loginServer:loginServer}" -o table and you'll see all your registries and their login server URLs in one shot. The login server always follows the pattern <registryname>.azurecr.io. You can also find it in the Azure Portal by navigating to Container registries → your registry → Overview, it's listed as "Login server" right at the top of the essentials panel. Store this value as a Jenkins environment variable at the top of your Jenkinsfile (e.g., ACR_SERVER = 'myacrregistry.azurecr.io') so you only have to update it in one place if you ever change registries.
My AKS load balancer service is stuck on "pending" external IP, how long should I wait?
Typically 2–5 minutes for standard Azure Load Balancer provisioning in most regions. If it's been more than 10 minutes and the external IP is still <pending>, something is wrong. Start with kubectl describe service azure-vote-front, whose Events section usually names the failure. Then check the AKS cluster's node resource group in the Azure Portal, which should contain a new public IP resource and load balancer being provisioned. If those resources aren't appearing, the cluster's identity may lack the permissions it needs to create them. The most common case is a missing Network Contributor role on a custom virtual network's subnet, or on the resource group of a pre-created static IP. Also check cluster events with kubectl get events --sort-by='.lastTimestamp' for error messages about IP allocation failures. Regional IP quota exhaustion is another culprit in enterprise subscriptions, so check your subscription's public IP quota in Azure Portal → Subscriptions → Usage + quotas.
Can I run the Azure Jenkins pipeline without installing the Azure CLI on the Jenkins agent?
Yes: use the service principal approach with direct docker login (Option B in Step 2). You'll need the service principal's client ID and client secret, both stored as Jenkins credentials. Piping the secret into docker login myregistry.azurecr.io -u <clientId> --password-stdin authenticates Docker directly against ACR with no Azure CLI dependency, and it keeps the secret off the command line, unlike -p. For the kubectl apply step, you can also avoid the Azure CLI by pre-generating the kubeconfig, storing it as a Jenkins secret file, and pointing the KUBECONFIG environment variable at it. If your cluster uses Microsoft Entra ID integration, that kubeconfig relies on the kubelogin plugin, so that binary still needs to be on the agent. This approach works well in air-gapped environments or on minimal Jenkins agents where installing the full Azure CLI toolchain isn't practical.
What's the difference between AcrPush and Contributor roles when setting up Jenkins ACR access?
Use AcrPush. It's a built-in Azure role that grants exactly what Jenkins needs: pushing (and pulling) images, and nothing else. Contributor is a much broader role that lets the service principal modify the registry's configuration, delete repositories, change firewall rules, and more. Giving a CI/CD service account Contributor access to ACR violates the principle of least privilege and creates a large blast radius if the service principal credentials are ever compromised. Stick with AcrPush for the push pipeline and AcrPull for the AKS pull identity. These two narrow roles cover the entire build-and-deploy workflow with minimal exposure.