How to Fix Azure Kubernetes Fleet Manager Issues

Microsoft Fix | Intermediate | 14 min read | Official Docs Grounded | Updated April 20, 2026

Why This Is Happening

I've seen this exact situation more times than I can count: you're managing a sprawling set of AKS clusters across multiple Azure regions and subscriptions, someone decides it's time to bring Azure Kubernetes Fleet Manager into the picture, and then the whole thing stalls out before you even finish the setup wizard. Permissions errors, member clusters refusing to join, hub cluster modes that behave differently than expected, update runs that just... don't run. It's genuinely maddening, especially when your team is counting on you to have multi-cluster orchestration working before the next release window.

Azure Kubernetes Fleet Manager is Microsoft's answer to a real problem: once you have more than two or three Kubernetes clusters, managing upgrades, workload placement, and networking coordination manually becomes a full-time job. Fleet Manager gives platform administrators a single control plane to handle all of it: safe rolling upgrades across member clusters, intelligent resource placement using cluster labels and properties, DNS-based load balancing, and centralized monitoring. When it works, it's genuinely transformative. When it doesn't, the error messages are cryptic and the surface area for things going wrong is enormous.

The most common reason Azure Kubernetes Fleet Manager setup fails is a permissions gap. Fleet Manager requires a very specific set of RBAC permissions, both on the Fleet resource itself and on every cluster you want to join as a member. Miss even one of those permission assignments, and you get vague authorization errors that don't tell you which specific permission is missing. Beyond permissions, the other big trip wire is choosing the wrong hub cluster mode at creation time. This is a one-way door: you pick "With hub cluster" or "Without hub cluster" when you create the Fleet resource, and that choice permanently determines which features you get. The "With hub cluster" mode enables multi-cluster resource placement, managed Fleet namespaces, and DNS load balancing, but it also costs more and adds latency. "Without hub cluster" mode is leaner but only gives you safe cluster upgrades and observability. Most teams pick the wrong one and don't realize it until they try to use a feature that isn't available in their mode.

There's also the version compatibility angle. Member clusters have to be on supported Kubernetes versions, and AKS clusters and Arc-enabled Kubernetes clusters each have their own validation requirements. If a cluster is running a deprecated version, Fleet Manager will either silently refuse to add it, or add it and then fail during update runs in ways that aren't obvious from the portal.

If you're dealing with any of these frustrations right now, you're in exactly the right place. This guide walks through every common failure mode, starting with the fastest fixes and working up to deeper troubleshooting.

The Quick Fix: Try This First

Before diving into anything complicated, the single fastest fix for most Azure Kubernetes Fleet Manager problems, especially "member cluster won't join" and "permission denied" errors, is to verify that your identity has every required permission assigned at the correct scope. There are three sets to check: one for the Fleet resource, one for each AKS member cluster, and one for Arc-enabled clusters.

Open the Azure portal and navigate to your subscription's Access control (IAM) blade. Click Check access, search for your user account or service principal, and verify the following permissions exist. You need all of these, not just some:

For the Fleet resource itself, your identity needs: Microsoft.ContainerService/fleets/read, Microsoft.ContainerService/fleets/write, Microsoft.ContainerService/fleets/members/read, Microsoft.ContainerService/fleets/members/write, Microsoft.ContainerService/fleetMemberships/read, and Microsoft.ContainerService/fleetMemberships/write.

For each AKS member cluster, you additionally need: Microsoft.ContainerService/managedClusters/read, Microsoft.ContainerService/managedClusters/write, and critically, Microsoft.ContainerService/managedClusters/listClusterUserCredential/action. That last one gets skipped constantly and it's almost always the reason cluster joins silently fail.

For Arc-enabled Kubernetes clusters (if applicable), you need: Microsoft.Kubernetes/connectedClusters/read, Microsoft.KubernetesConfiguration/extensions/read, Microsoft.KubernetesConfiguration/extensions/write, and Microsoft.KubernetesConfiguration/extensions/delete.

Once you've confirmed permissions are fully in place, go back to your Fleet Manager resource in the portal, navigate to Member clusters in the left sidebar, and try adding the cluster again. Nine times out of ten, this is the entire fix. The portal doesn't surface exactly which permission is missing (it just fails), so a systematic check is the only reliable approach.
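If you have exported the list of actions granted to an identity, the systematic check can be scripted. The following is a minimal Python sketch rather than an official tool: the missing_actions helper and the sample granted list are hypothetical, and the comparison only looks at action strings (with * wildcards). It ignores NotActions, deny assignments, and scope, so treat a clean result as a hint, not proof.

```python
from fnmatch import fnmatchcase

# Required actions, copied from the permission lists in this section.
FLEET_ACTIONS = [
    "Microsoft.ContainerService/fleets/read",
    "Microsoft.ContainerService/fleets/write",
    "Microsoft.ContainerService/fleets/members/read",
    "Microsoft.ContainerService/fleets/members/write",
    "Microsoft.ContainerService/fleetMemberships/read",
    "Microsoft.ContainerService/fleetMemberships/write",
]
AKS_MEMBER_ACTIONS = [
    "Microsoft.ContainerService/managedClusters/read",
    "Microsoft.ContainerService/managedClusters/write",
    "Microsoft.ContainerService/managedClusters/listClusterUserCredential/action",
]
ARC_MEMBER_ACTIONS = [
    "Microsoft.Kubernetes/connectedClusters/read",
    "Microsoft.KubernetesConfiguration/extensions/read",
    "Microsoft.KubernetesConfiguration/extensions/write",
    "Microsoft.KubernetesConfiguration/extensions/delete",
]


def missing_actions(granted, required):
    """Return the required actions that no granted pattern covers.

    Matching is case-insensitive and '*' in a granted pattern matches any text.
    """
    patterns = [g.lower() for g in granted]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Hypothetical grant list for illustration only.
granted = [
    "Microsoft.ContainerService/fleets/*",
    "Microsoft.ContainerService/managedClusters/read",
]

for label, required in [("Fleet", FLEET_ACTIONS), ("AKS member", AKS_MEMBER_ACTIONS)]:
    print(label, "missing:", missing_actions(granted, required))
```

Running it against each of the three lists gives you the exact gaps to close before retrying the join.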

Pro Tip
Assign permissions at the resource group scope rather than individual resource scope wherever possible. Fleet Manager often needs to enumerate clusters across a resource group during member discovery, and resource-level permissions alone can cause intermittent failures that are nearly impossible to reproduce consistently.

Step 1: Create the Fleet Manager Resource with the Right Hub Cluster Mode

This is the step that permanently shapes what your Fleet can and can't do, so get it right the first time. Sign into the Azure portal, select Create a resource from the home page, and search for Kubernetes Fleet Manager. Select Create > Kubernetes Fleet Manager from the results.

On the Basics tab, fill in your subscription, resource group, a meaningful Fleet name, and the target region. Then stop at Hub cluster mode and think carefully before clicking.

Choose With hub cluster if you need any of the following: deploying Kubernetes resources across multiple clusters using resource placement, managed Fleet namespaces with enforced quotas and network policies, DNS-based load balancing across cluster service endpoints, or staging resources from Git repositories using Automated Deployments. This mode spins up a dedicated hub cluster that acts as the control plane for placement decisions.

Choose Without hub cluster only if you exclusively need safe multi-cluster Kubernetes version and node image upgrades, plus centralized monitoring. It's lighter and cheaper, but you simply cannot add hub cluster features later without destroying and recreating the Fleet resource.

Once you've made that choice, click Next: Member clusters. Use the + Add button to add existing clusters; you can filter by name using the search box. For each member cluster you add, the portal generates a name automatically; edit it if you need something more descriptive. Critically, set an update group for each member cluster at this stage. Update groups control how Fleet Manager batches cluster upgrades: clusters in the same group are updated together, and groups are sequenced to protect production from a bad update hitting everything at once.

After adding members, click Next: Advanced. If you need a private hub cluster with API server VNet integration, check Private hub access and select your virtual network, cluster subnet, and API server subnet. You'll also assign or create a user-assigned managed identity here. Once done, add any resource tags you need and select Review + create > Create. Deployment typically takes three to five minutes.

Step 2: Verify Member Cluster Version Compatibility Before Joining

One of the most time-wasting errors in Azure Kubernetes Fleet Manager troubleshooting is discovering, after an hour of permission debugging, that the cluster itself is the problem, specifically that it's running an unsupported Kubernetes version. Fleet Manager enforces version compatibility for both AKS clusters and Arc-enabled Kubernetes clusters, and the validation happens at join time.

For AKS member clusters, check your current version against Microsoft's AKS version support policy. Open each cluster in the portal and navigate to Settings > Cluster configuration. The current Kubernetes version is displayed there. AKS follows an N-2 support model: only the three most recent minor versions are supported at any time. If your cluster is on a version outside that window, Fleet Manager's member join will fail or behave unpredictably.

You can also check this quickly using the Azure CLI:

az aks show \
  --resource-group <your-rg> \
  --name <your-cluster-name> \
  --query "kubernetesVersion" \
  --output tsv

Then compare against the supported versions list:

az aks get-versions \
  --location <your-region> \
  --output table
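To make the N-2 comparison less error-prone across many clusters, the check can be expressed in a few lines of Python. This is a sketch under stated assumptions: in_support_window is a hypothetical helper, the version numbers are illustrative rather than a current AKS list, and it treats the three newest minor versions in the list as the supported window (it does not account for preview or long-term-support versions).

```python
def minor_of(version):
    """Return (major, minor) as integers from a string such as '1.29.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def in_support_window(cluster_version, available_versions, window=3):
    """True if the cluster's minor version is among the newest `window` minors."""
    newest_minors = sorted(
        {minor_of(v) for v in available_versions}, reverse=True
    )[:window]
    return minor_of(cluster_version) in newest_minors


# Illustrative values; paste in the versions reported for your region instead.
available = ["1.31.2", "1.30.6", "1.30.5", "1.29.9", "1.28.15"]

for version in ["1.29.4", "1.28.3"]:
    state = "in window" if in_support_window(version, available) else "out of window"
    print(version, state)
```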

For Arc-enabled Kubernetes clusters, you need to check the Azure Arc-enabled Kubernetes validation matrix separately, because the supported versions are different from AKS. Navigate to the cluster in the portal under Azure Arc > Kubernetes clusters and confirm the connected agent version and Kubernetes distribution are in the validated list.

If you find a cluster that's out of support, upgrade it before attempting to join it to Fleet Manager. Trying to add an out-of-support cluster and then working backwards to figure out why it failed is a frustrating loop you don't need to be in. Once version compatibility is confirmed for all intended member clusters, add them through the Fleet Manager Member clusters blade; the joins should complete cleanly.

Step 3: Configure Update Groups and Run Your First Safe Multi-Cluster Update

Getting update runs working correctly is where most teams hit their second major wall with Azure Kubernetes Fleet Manager, right after the initial setup. The concept is straightforward: update runs let you roll Kubernetes version and node image upgrades across all member clusters in a controlled, sequenced way. But the configuration has to be right, or the run either refuses to start or executes in an unintended order.

Every member cluster must be assigned to an update group before it can participate in an update run. If you skipped that during setup, fix it now: go to your Fleet Manager resource, click Member clusters, select a cluster, and assign an update group name. Update group names are free-form strings, so use something meaningful like dev, staging, and production to reflect your rollout tiers.

Next, define a reusable update strategy. In the portal, navigate to Multi-cluster update > Update strategies and create a new strategy. A strategy defines stages; each stage contains one or more update groups, and stages execute sequentially. You might define Stage 1 as dev, Stage 2 as staging, and Stage 3 as production, with pause conditions between stages so you can verify each tier before the next one gets touched.
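Before launching a run, it can help to dry-run the stage and group mapping on paper. The sketch below is a hypothetical Python model of that mapping, not Fleet Manager's own logic: plan_update_order, the stage names, and the cluster names are all invented for illustration. It lists which clusters each stage would touch, in order, and flags clusters whose group is blank or not referenced by any stage (group names are compared exactly, including case).

```python
def plan_update_order(stages, members):
    """Model the rollout order implied by a strategy.

    stages:  ordered list of (stage_name, [update group names])
    members: dict of cluster name -> update group name (or None if unassigned)
    Returns (plan, skipped), where plan is a list of
    (stage_name, group, [clusters]) and skipped lists clusters no stage covers.
    """
    plan = []
    referenced = set()
    for stage_name, groups in stages:
        for group in groups:
            referenced.add(group)
            clusters = sorted(c for c, g in members.items() if g == group)
            plan.append((stage_name, group, clusters))
    skipped = sorted(c for c, g in members.items() if g not in referenced)
    return plan, skipped


# Hypothetical strategy and membership.
stages = [
    ("stage-1", ["dev"]),
    ("stage-2", ["staging"]),
    ("stage-3", ["production"]),
]
members = {
    "aks-dev-1": "dev",
    "aks-stg-1": "staging",
    "aks-prod-1": "production",
    "aks-prod-2": "Production",  # case mismatch: not covered by any stage
    "aks-new": None,             # never assigned a group
}

plan, skipped = plan_update_order(stages, members)
for stage_name, group, clusters in plan:
    print(stage_name, group, clusters)
print("not covered by the strategy:", skipped)
```

Anything that lands in the skipped list is a cluster the run would not reach, which is worth fixing before you rely on the strategy.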

With groups and a strategy in place, create an update run: go to Multi-cluster update > Update runs, click Create, select your strategy, choose the target Kubernetes version and node image type, and launch. You can monitor progress in real time from the update run detail view. Each stage and group shows its current state, and any cluster that fails will surface the reason inline.

If a run fails, check whether the target Kubernetes version is available in the cluster's region. Not all versions are available in all regions simultaneously, and Fleet Manager will fail the run if the target version isn't present in a member cluster's region, even if it's available elsewhere. Use az aks get-versions --location <region> to confirm availability per region before scheduling cross-region update runs.
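Once you have collected the per-region version lists, a small script can point out which member clusters would block a cross-region run. This is an illustrative Python sketch only: clusters_missing_target is a hypothetical helper, and the cluster names, regions, and versions are made up.

```python
def clusters_missing_target(target_version, cluster_regions, versions_by_region):
    """Group clusters by region where the target version is not listed.

    cluster_regions:    dict of cluster name -> region
    versions_by_region: dict of region -> list of available versions
    """
    missing = {}
    for cluster, region in sorted(cluster_regions.items()):
        if target_version not in versions_by_region.get(region, ()):
            missing.setdefault(region, []).append(cluster)
    return missing


# Hypothetical data gathered per region beforehand.
versions_by_region = {
    "eastus": ["1.30.6", "1.31.2"],
    "westeurope": ["1.30.6"],
}
cluster_regions = {
    "aks-prod-eus": "eastus",
    "aks-prod-weu": "westeurope",
    "aks-dev-weu": "westeurope",
}

print(clusters_missing_target("1.31.2", cluster_regions, versions_by_region))
```

An empty result means every member cluster's region lists the target version; a region with no data at all is treated as missing it.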

Step 4: Deploy Resources Across Member Clusters Using Resource Placement

This step only applies if you created a Fleet Manager with hub cluster mode enabled. If you chose without hub cluster, skip ahead to Step 5. Multi-cluster resource placement is one of the most powerful features Fleet Manager offers: the ability to deploy Kubernetes resources from the hub cluster out to member clusters based on cluster labels, properties, and cost signals.

First, access the Fleet Manager hub cluster's Kubernetes API. From your Fleet Manager resource in the portal, navigate to Hub cluster and follow the credential retrieval process to get kubeconfig access. You'll interact with the hub cluster using standard kubectl commands, because the hub is itself a Kubernetes cluster, just one that Fleet Manager manages for you.

To place resources, you create a ClusterResourcePlacement object on the hub cluster that specifies what to place and where. Here's the general shape of a placement manifest:

apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: my-app-placement
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: my-app-namespace
  policy:
    placementType: PickN
    numberOfClusters: 3
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  environment: production

Apply this to the hub cluster with kubectl apply, and Fleet Manager's placement controller handles propagating the selected resources to matching member clusters. To check placement status (which clusters received resources, which are pending, and which failed), run:

kubectl get clusterresourceplacement my-app-placement -o yaml

The status section will show per-cluster conditions. If a cluster shows Applied: False, the reason field usually tells you exactly what failed: a missing CRD, a namespace conflict, or an API version mismatch between hub and member. Fix the underlying issue on the member cluster and the placement controller will retry automatically.
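With more than a handful of member clusters, reading the status by eye gets tedious. The Python sketch below filters the object for failed applies, assuming you saved it with -o json instead of -o yaml. The field names used here (status.placementStatuses, clusterName, and conditions entries with type, status, reason, and message) are an assumption about the status layout, and the sample document, including its reason text, is invented; compare against your own output before relying on it.

```python
import json


def failed_applies(crp):
    """Return (cluster, reason, message) for each Applied=False condition."""
    failures = []
    for placement in crp.get("status", {}).get("placementStatuses", []):
        for cond in placement.get("conditions", []):
            if cond.get("type") == "Applied" and cond.get("status") == "False":
                failures.append(
                    (
                        placement.get("clusterName"),
                        cond.get("reason"),
                        cond.get("message"),
                    )
                )
    return failures


# Invented sample standing in for the saved JSON output.
sample = json.loads(
    """
    {
      "status": {
        "placementStatuses": [
          {"clusterName": "member-1",
           "conditions": [{"type": "Applied", "status": "True"}]},
          {"clusterName": "member-2",
           "conditions": [
             {"type": "Scheduled", "status": "True"},
             {"type": "Applied", "status": "False",
              "reason": "ExampleReason", "message": "example failure text"}
           ]}
        ]
      }
    }
    """
)

for cluster, reason, message in failed_applies(sample):
    print(cluster, reason, message)
```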

Step 5: Set Up Auto-Upgrade Profiles to Eliminate Manual Update Toil

Once your Fleet Manager is stable and your first manual update run has completed successfully, the next logical step is removing humans from the routine upgrade loop entirely. Azure Kubernetes Fleet Manager supports auto-upgrade profiles that automatically trigger version upgrades when Microsoft publishes new Kubernetes or node image versions. This is how mature platform engineering teams run Fleet Manager, not babysitting it with manual runs.

Navigate to your Fleet Manager resource, go to Multi-cluster update > Auto-upgrade profiles, and click Create. You'll define one or more profiles, each specifying what to auto-upgrade (Kubernetes version, node image, or both), which update strategy to use, and what channel to follow.

The channel options reflect how aggressively you want to track new versions. The Rapid channel picks up new versions as soon as they're published. Stable waits for versions that have been available for longer and have broader adoption signals. For production workloads, Stable is almost always the right choice; Rapid is fine for dev/test environments where you actually want to see new versions early.

If you've configured manual or automated approval gates on your update strategy stages (this is a preview feature), auto-upgrade profiles will pause at those gates exactly the same way manual runs do. That means you can have fully automated triggering but still require a human to click "approve" before the production stage runs. It's a good middle ground for teams that need some control without full manual toil.

One important operational note: auto-upgrade profiles and manual update runs coexist, but if both try to run simultaneously on the same member cluster, you'll get a conflict. Build a maintenance window policy into your update strategy so that auto-upgrade runs are scheduled during off-peak hours, and document this clearly so engineers don't kick off manual runs without checking whether an auto-upgrade is already in progress. You can see in-progress runs under Multi-cluster update > Update runs before starting anything manually.

Advanced Troubleshooting

When the standard fixes don't resolve your Azure Kubernetes Fleet Manager issue, it's time to dig into the deeper signal. Here's where experienced platform engineers look when something is genuinely broken.

Azure Activity Log and Resource Diagnostics

The Azure portal's Activity log for your Fleet Manager resource is your first stop for any failed operation. Navigate to your Fleet Manager resource and click Activity log in the left sidebar. Filter by Failed operations and expand the relevant entry. The JSON tab of the operation usually contains the specific authorization error, quota limit, or API error code that the portal UI hides behind a generic error message. Look for errorCode and message fields in the response body.

Fleet Agent Status on Member Clusters

When a member cluster joins Fleet Manager, an agent is installed on it. If that agent is unhealthy, the cluster will show as joined but updates and placements will silently fail. Check the agent status by running the following against each member cluster's API server:

kubectl get pods -n fleet-system
kubectl describe pod -n fleet-system -l app=fleet-member-agent

Look for CrashLoopBackOff, ImagePullBackOff, or OOMKilled status. CrashLoopBackOff on the fleet-member-agent almost always means a networking issue: the member cluster can't reach the Fleet Manager hub cluster endpoint. Check your network security groups, private endpoint configuration, and VNet peering if you're running a private hub cluster.
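If you are checking many member clusters, the same scan can be done on saved output. The Python sketch below assumes you captured the pod list as JSON (for example by adding -o json to the get pods command) and looks for the three reasons named above in the standard containerStatuses fields. The unhealthy_containers helper and the sample pod are hypothetical.

```python
PROBLEM_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "OOMKilled"}


def unhealthy_containers(pod_list):
    """Return (pod, container, reason) for containers in a known-bad state."""
    problems = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # A reason can show up in the current state or the last terminated state.
            reasons = {
                cs.get("state", {}).get("waiting", {}).get("reason"),
                cs.get("state", {}).get("terminated", {}).get("reason"),
                cs.get("lastState", {}).get("terminated", {}).get("reason"),
            }
            for reason in sorted(reasons & PROBLEM_REASONS):
                problems.append((pod_name, cs.get("name"), reason))
    return problems


# Invented sample standing in for the saved pod list.
sample_pods = {
    "items": [
        {
            "metadata": {"name": "example-agent-pod"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "agent",
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                        "lastState": {"terminated": {"reason": "OOMKilled"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "healthy-pod"},
            "status": {"containerStatuses": [{"name": "main", "state": {"running": {}}}]},
        },
    ]
}

for entry in unhealthy_containers(sample_pods):
    print(entry)
```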

Private Hub Cluster Networking Issues

If you enabled private hub access during Fleet Manager creation, member clusters must have network-level connectivity to the hub cluster's private API server endpoint. This is the most common cause of "cluster joined successfully but nothing works" behavior. Verify that VNet peering is correctly configured between the hub cluster's VNet and each member cluster's VNet. Also confirm that your DNS configuration correctly resolves the private hub API server FQDN: private DNS zones need to be linked to each member cluster's VNet, not just the hub's VNet.

Arc-Enabled Cluster Extension Failures

For Arc-enabled Kubernetes member clusters, Fleet Manager installs a Kubernetes configuration extension on the connected cluster. If this extension fails to install, the cluster won't function as a Fleet member. Check extension status:

az k8s-extension show \
  --resource-group <rg> \
  --cluster-name <cluster-name> \
  --cluster-type connectedClusters \
  --name microsoft.fleet.member \
  --output table

A Failed provisioning state here usually means either the Arc-enabled cluster doesn't meet the validation requirements, or the identity used to install the extension doesn't have the required Microsoft.KubernetesConfiguration/extensions/write permission on the connected cluster resource.

Update Run Stuck in "Running" State

Update runs that stay in "Running" indefinitely without progressing to the next stage are usually caused by an approval gate that nobody approved, or by a member cluster whose update is stuck waiting for node drain to complete. Check the update run detail view for any stages showing "WaitingForApproval" status. If you don't see that, navigate directly to the stuck member cluster in AKS and look at the node pool upgrade status. Sometimes a node is refusing to drain because of a pod disruption budget that doesn't allow any disruptions. Adjust the PDB temporarily or delete the blocking pod to unblock the drain.
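To find the pod disruption budgets most likely to block a drain, you can scan the PDB list on the stuck member cluster. This Python sketch assumes the list was saved as JSON (for example from kubectl get pdb --all-namespaces -o json) and reads the standard disruptionsAllowed and expectedPods status fields. The blocking_pdbs helper and the sample data are hypothetical.

```python
def blocking_pdbs(pdb_list):
    """Return (namespace, name) for PDBs that currently allow zero disruptions.

    PDBs that match no pods (expectedPods == 0) are ignored, since they
    cannot hold up a node drain.
    """
    blocked = []
    for pdb in pdb_list.get("items", []):
        meta = pdb.get("metadata", {})
        status = pdb.get("status", {})
        allows_none = status.get("disruptionsAllowed", 0) == 0
        has_pods = status.get("expectedPods", 0) > 0
        if allows_none and has_pods:
            blocked.append((meta.get("namespace"), meta.get("name")))
    return blocked


# Invented sample standing in for the saved PDB list.
sample_pdbs = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "checkout-pdb"},
         "status": {"disruptionsAllowed": 0, "expectedPods": 2}},
        {"metadata": {"namespace": "shop", "name": "catalog-pdb"},
         "status": {"disruptionsAllowed": 1, "expectedPods": 3}},
        {"metadata": {"namespace": "tools", "name": "unused-pdb"},
         "status": {"disruptionsAllowed": 0, "expectedPods": 0}},
    ]
}

print(blocking_pdbs(sample_pdbs))
```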

When to Call Microsoft Support
Escalate to Microsoft Support if you're seeing Fleet Manager hub cluster API server returning 5xx errors consistently, if the fleet-controller-manager pod in the hub cluster is crash-looping and you can't access the hub API at all, or if an auto-upgrade profile triggered an upgrade that violates your expected strategy order and you need to roll back. For preview features like Arc-enabled member clusters, manual and automated approval gates, or Automated Deployments from Git, open a support ticket early rather than spending hours debugging something that may be a platform bug.

Prevention & Best Practices

Most Azure Kubernetes Fleet Manager headaches are preventable. Teams that run Fleet Manager smoothly in production do a few things consistently from day one that the teams who struggle don't bother with until after something breaks.

Start with a deliberate hub cluster mode decision. Get the right people in the room before you create the Fleet resource: your platform engineering lead, your application teams, and ideally someone who has mapped out the full workload placement requirements for the next 12 months. That one-way door matters. Choosing without hub cluster to save money today, then discovering six months later that your application team needs ClusterResourcePlacement for multi-region failover, means tearing down the Fleet and rebuilding it. That conversation is painful. Have it before you click Create.

Build your update group structure to match your real environment tiers. Don't just create one group and put everything in it. A proper tiered structure (dev, staging, canary, production) with stages that have pause windows between them is what separates "we confidently upgrade across 20 clusters" from "we're afraid to touch the upgrade button." Define your update strategy as code using ARM templates or Terraform so it's version-controlled and reproducible. Update strategies are reusable across update runs, which means you can invest time in building a good one and then keep reusing it.

Monitor Fleet Manager cluster observability data proactively, not just when something breaks. Fleet Manager provides a centralized location to access monitoring data across all member clusters. Build dashboards that surface cluster version drift (clusters running significantly older versions than others), node image age, and agent health status. Alert on version drift before it becomes a compliance issue.

For Arc-enabled member clusters, keep the Arc agent and Fleet extension updated. Stale extensions are a common source of intermittent failures that are difficult to diagnose because they don't fail consistently; they degrade gradually.

Quick Wins
  • Document your hub cluster mode choice and the reasoning behind it in your team's runbook; this context is lost the moment the original engineer leaves
  • Assign update groups to every member cluster at join time; never leave them blank
  • Use a user-assigned managed identity (not system-assigned) for Fleet Manager so you can audit and rotate it independently
  • Test your update strategy with a low-stakes dev cluster run before the first time you run it against production; update strategies look correct on paper but often have stage sequencing surprises the first time through

Frequently Asked Questions

Can I change the hub cluster mode after creating my Fleet Manager?

No. This is a permanent choice you make at creation time. If you created a Fleet Manager without hub cluster and later need resource placement or managed Fleet namespaces, you have to delete the existing Fleet resource and create a new one with hub cluster mode enabled. Before you do that, document all your member cluster assignments and update group names so you can re-add them quickly. Your member clusters themselves don't need to be touched; they just need to be re-joined to the new Fleet resource. Plan this during a maintenance window, since all Fleet-managed operations will be unavailable during the transition.

Why does my member cluster show as joined but my update run skips it?

The most common reason is that the cluster wasn't assigned to an update group, or was assigned to a group that isn't included in the update strategy you're using. Check the member cluster's update group in the portal under Fleet Manager > Member clusters and verify that the group name exactly matches the group referenced in your update strategy. Group names are case-sensitive. Also confirm the cluster's Kubernetes version is below the target upgrade version. Fleet Manager won't attempt to "upgrade" a cluster that's already at or above the target version, which looks like skipping but is actually correct behavior.

How do I add clusters from a different Azure subscription to my Fleet Manager?

Fleet Manager explicitly supports joining AKS clusters across regions and subscriptions as member clusters. The key requirement is that your identity has the necessary permissions in both the subscription where Fleet Manager lives and the subscription where the member cluster lives. When you click + Add in the member clusters blade, change the subscription filter in the cluster picker to the target subscription. As long as permissions are in place across both subscriptions, the join will work. For service principals used in automation scenarios, make sure the principal has the required managedClusters read/write permissions in the member cluster's subscription, not just in the Fleet's home subscription.

What's the difference between cluster-scoped and namespace-scoped resource placement?

Cluster-scoped resource placement deploys cluster-level Kubernetes resources, like ClusterRoles, ClusterRoleBindings, and custom resource definitions, from the hub cluster out to member clusters. Namespace-scoped resource placement handles namespace-level resources like Deployments, Services, and ConfigMaps within specific namespaces. In practice, you'll use both: cluster-scoped placement to push infrastructure-level configs consistently across all clusters, and namespace-scoped placement to distribute application workloads to specific clusters based on placement policies. Both require hub cluster mode to be enabled on your Fleet Manager.

My auto-upgrade profile triggered but the upgrade failed on one cluster. Will it continue to others?

This depends on your update strategy configuration. If the failed cluster is in a stage that's configured to stop on failure (which is the default behavior), the entire update run will pause at that stage and wait for manual intervention. You'll see the run in a "Failed" or "Stopped" state in the Update runs view, with the failed cluster and its error reason visible inline. Fix the underlying issue on the failing cluster (most commonly a PodDisruptionBudget blocking node drain, or an unhealthy node that won't complete the cordon/drain cycle), then manually resume the update run from the portal. Subsequent stages, including production, will not proceed until you manually resume.

Can Azure Kubernetes Fleet Manager manage on-premises clusters alongside AKS?

Yes, with a caveat: Arc-enabled Kubernetes clusters running on-premises or in other clouds can be joined as Fleet Manager member clusters, but this feature is currently in preview. That means it's not covered by Microsoft's standard SLA and behavior can change with platform updates. To join an on-premises cluster, it must first be connected to Azure through Azure Arc, and it must pass the Arc-enabled Kubernetes validation requirements. Once connected and validated, you add it to Fleet Manager the same way you'd add any AKS cluster: through the + Add flow in the member clusters blade, filtering by connected cluster type instead of managed cluster type.

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.