Azure Operator Service Manager: Fix Setup & Config Errors

Microsoft Fix | Advanced | 18 min read | Official Docs Grounded | Updated April 20, 2026

Why This Is Happening

I've seen this exact situation play out across telecom engineering teams more times than I can count. You've been handed the task of getting Azure Operator Service Manager up and running, maybe to orchestrate a 5G core deployment, maybe to manage a complex multi-vendor network service across hybrid sites. You open the Azure portal, start clicking through the setup, and immediately hit a wall. Either the resource types don't appear, the CLI extension throws an unrecognized command error, or your deployment silently fails with no useful message explaining what went wrong.

Here's the core of it: Azure Operator Service Manager setup issues almost always trace back to one of three root causes. First, the Microsoft.HybridNetwork resource provider isn't registered on your subscription, and without that, nothing works. Full stop. Second, teams skip the role-based persona model entirely, assigning one person or one service principal to do everything, which breaks the expected workflow. Azure Operator Service Manager is deliberately architected around three distinct personas (publisher, designer, and operator), and collapsing those roles together causes permission errors that look completely unrelated to their real cause. Third, the AOSM CLI extension is either not installed, installed at an outdated version, or misconfigured for your specific network function type.

What makes this especially frustrating is that the Azure portal error messages for Azure Operator Service Manager deployment failures are notoriously vague. You'll see something like DeploymentFailed or ResourceNotFound without any pointer to the actual broken step. The Azure Activity Log does help, but only if you know where to look and what to filter for, which I'll cover in the advanced section.

The other thing that trips up even experienced Azure engineers is the onboarding sequence. Azure Operator Service Manager has a strict order of operations. You can't deploy a site network service before you have a network service design. You can't create a network service design without a published network function definition. Jumping steps, or assuming Azure will fill in gaps, results in cascading failures where the root cause is three steps back from where the error actually surfaces.

This guide walks you through the full fix sequence, from the first registration step to a working site network service deployment, including the AOSM CLI extension workflow for both containerized network functions (CNFs) and virtualized network functions (VNFs). Whether you're dealing with a fresh setup that won't start or a mid-deployment error that's blocking your rollout, you'll find the specific answer here.


The Quick Fix: Try This First

If Azure Operator Service Manager resources simply aren't visible in your subscription, or you're getting ResourceProviderNotRegistered errors the moment you try to create any AOSM resource, this is almost certainly the problem: the Microsoft.HybridNetwork resource provider is not registered on your Azure subscription.

This is a one-time step that many deployment guides bury in a footnote. It's the first thing I check whenever someone tells me AOSM "just isn't working." Here's exactly what to do:

Open the Azure portal and navigate to your subscription. In the left sidebar, scroll down to Settings and click Resource providers. In the search box, type HybridNetwork. You'll see Microsoft.HybridNetwork in the results. If the Status column shows NotRegistered, click the row to select it, then click the Register button at the top of the pane. Registration takes between 30 seconds and two minutes.

Alternatively, use the Azure CLI. Enterprise teams automating this across subscriptions should prefer this route. Run this command:

az provider register --namespace Microsoft.HybridNetwork

To check the current registration state before or after running this:

az provider show --namespace Microsoft.HybridNetwork --query "registrationState"

You're looking for "Registered" in the output. Once you see it, go back and retry whatever operation was failing. By my estimate, this single step unblocks about 60% of first-time AOSM setup issues.

If you're working in a large enterprise with multiple subscriptions, keep in mind this registration is per subscription. If your publisher, designer, and operator personas are operating across different subscriptions (which is a valid and recommended pattern for production environments), you'll need to register Microsoft.HybridNetwork in each one.
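For the multi-subscription case, a small loop is one way to avoid missing a subscription. This is a minimal sketch: register_hybridnetwork is a hypothetical helper name, the subscription IDs are placeholders, and it relies on the Azure CLI's global --subscription argument.

```shell
# Register Microsoft.HybridNetwork in each subscription passed as an argument.
# register_hybridnetwork is a hypothetical helper, not part of the Azure CLI.
register_hybridnetwork() {
  for sub in "$@"; do
    echo "Registering Microsoft.HybridNetwork in ${sub}"
    az provider register \
      --namespace Microsoft.HybridNetwork \
      --subscription "${sub}" || return 1
  done
}

# Example (placeholder IDs):
# register_hybridnetwork "PUBLISHER-SUB-ID" "DESIGNER-SUB-ID" "OPERATOR-SUB-ID"
```

The function stops at the first failed registration so a permissions problem in one subscription isn't buried in the output of the others.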

Pro Tip
Don't wait for the portal to confirm registration before testing. Use the CLI command above to poll registration state, because the portal UI sometimes shows a stale "Registering" status for several minutes after registration has actually completed. If the CLI shows "Registered," you're good to proceed.
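If you want to script the polling, something along these lines should work. It's a sketch under assumptions: wait_for_registration is a hypothetical helper, and the default attempt count and sleep interval are arbitrary values I picked, not numbers from the AOSM docs.

```shell
# Poll the provider's registration state until it reports "Registered".
# wait_for_registration is a hypothetical helper; attempts and interval
# defaults are arbitrary.
wait_for_registration() {
  namespace="${1:-Microsoft.HybridNetwork}"
  attempts="${2:-30}"
  interval="${3:-10}"
  i=1
  while [ "${i}" -le "${attempts}" ]; do
    state="$(az provider show --namespace "${namespace}" \
      --query registrationState --output tsv)"
    if [ "${state}" = "Registered" ]; then
      echo "Registered"
      return 0
    fi
    sleep "${interval}"
    i=$((i + 1))
  done
  echo "Timed out; last state: ${state}" >&2
  return 1
}

# Example: wait_for_registration Microsoft.HybridNetwork 30 10
```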

Step 1: Configure the Publisher Role and Create Your Network Function Definition

After confirming Microsoft.HybridNetwork is registered, your first real Azure Operator Service Manager task is setting up the publisher role correctly. The publisher is the persona responsible for onboarding the network function itself, creating what the official docs call a Network Function Definition (NFD). Get this wrong and everything downstream fails.

First, create a custom role scoped to the subscription or resource group that hosts your publisher resources. AOSM permissions are Azure RBAC permissions, so you create the role from the subscription's Access control (IAM) page (Add > Add custom role), not from the Roles and administrators blade in Microsoft Entra ID (formerly Azure Active Directory), which manages directory roles only. The AOSM publisher role needs permissions over Microsoft.HybridNetwork/publishers and the artifact store resources. Don't use the built-in Contributor role for this in production: it over-privileges the publisher and creates audit issues. The official documentation provides a Create a custom role how-to guide for this exact configuration.

Once your custom role exists, assign it to the service principal or managed identity that will act as your publisher. Go to Subscriptions > Your Subscription > Access control (IAM) > Add > Add role assignment, select your custom role, and assign it to the publisher identity.
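The same role creation and assignment can be scripted with az role definition create and az role assignment create. Treat the following as a sketch only: the role name, the single wildcard action, and both helper names are my assumptions for illustration, and the real action list should come from the official Create a custom role how-to.

```shell
# Write a custom role definition file, create the role, and assign it.
# ASSUMPTIONS: the role name, the Actions list, and the helper names are
# illustrative only; take the real action list from the official docs.
write_publisher_role() {
  subscription_id="$1"
  out_file="$2"
  cat > "${out_file}" <<EOF
{
  "Name": "AOSM Publisher (example)",
  "Description": "Illustrative publisher role for Azure Operator Service Manager",
  "Actions": [
    "Microsoft.HybridNetwork/publishers/*"
  ],
  "AssignableScopes": [
    "/subscriptions/${subscription_id}"
  ]
}
EOF
}

create_and_assign_publisher_role() {
  subscription_id="$1"
  principal_id="$2"
  role_file="$(mktemp)"
  write_publisher_role "${subscription_id}" "${role_file}"
  az role definition create --role-definition "@${role_file}" &&
    az role assignment create \
      --assignee "${principal_id}" \
      --role "AOSM Publisher (example)" \
      --scope "/subscriptions/${subscription_id}"
}
```

Keeping the definition in a file means it can live in source control and be reviewed like any other change to your permission model.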

Now create a user-assigned managed identity specifically for AOSM operations. In the Azure portal, search for Managed Identities, click Create, fill in the resource group and name, and note the Client ID. You'll need it repeatedly during onboarding. Assign this identity the publisher custom role you just created.
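From the CLI, az identity create does the same job and can print the client ID directly. A minimal sketch, assuming placeholder resource group and identity names; create_publisher_identity is a hypothetical helper.

```shell
# Create a user-assigned managed identity and print its client ID.
# create_publisher_identity is a hypothetical helper name.
create_publisher_identity() {
  resource_group="$1"
  identity_name="$2"
  az identity create \
    --resource-group "${resource_group}" \
    --name "${identity_name}" \
    --query clientId \
    --output tsv
}

# Example (placeholder names):
# CLIENT_ID="$(create_publisher_identity YOUR-RG aosm-publisher-identity)"
```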

With the role configured, you can begin the actual network function definition creation. For CNF onboarding, this means packaging your Helm charts; for VNF onboarding, this means your VHD images. The AOSM CLI extension handles both, and I'll cover the CLI path in Step 3. If the publisher creation step throws a 403 Forbidden error, the custom role assignment probably hasn't propagated yet. Wait five minutes and retry: Azure RBAC propagation is eventually consistent and occasionally slow.
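In automation, the wait-and-retry advice is easier to follow with a small wrapper. The sketch below is a generic helper of my own (retry is not an Azure CLI feature), and it retries on any non-zero exit rather than only on 403 responses, so wrap only commands that are safe to repeat.

```shell
# Retry a command a fixed number of times with a pause between attempts.
# retry is a hypothetical helper; it retries on ANY non-zero exit, not only
# on 403 responses, so use it only around idempotent commands.
retry() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay_seconds}s" >&2
    sleep "${delay_seconds}"
    attempt=$((attempt + 1))
  done
}

# Example: up to 5 attempts, 60 seconds apart
# retry 5 60 az aosm nfd publish --definition-type cnf --config-file input.jsonc
```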

If your step completed successfully, you'll see a new publisher resource appear under your resource group in the Azure portal with a status of Succeeded.

Step 2: Install and Configure the AOSM CLI Extension

The AOSM CLI extension is genuinely the fastest path through onboarding, both for CNFs and VNFs. But a lot of teams run into problems because they're either using an outdated version or they haven't set it up correctly for their network function type. Here's the clean installation sequence.

First, confirm you have the Azure CLI installed and are logged in to the correct subscription:

az --version
az account show

Make sure the subscription shown matches the one where you registered Microsoft.HybridNetwork. If not, switch subscriptions:

az account set --subscription "YOUR-SUBSCRIPTION-ID-OR-NAME"
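In scripts, it's worth failing early when the active subscription isn't the one you expect. A minimal sketch, assuming you know the expected subscription ID; assert_subscription is a hypothetical helper name.

```shell
# Fail early if the active subscription is not the one you expect.
# assert_subscription is a hypothetical helper name.
assert_subscription() {
  expected_id="$1"
  current_id="$(az account show --query id --output tsv)"
  if [ "${current_id}" != "${expected_id}" ]; then
    echo "Active subscription is ${current_id}, expected ${expected_id}" >&2
    return 1
  fi
}

# Example: assert_subscription "YOUR-SUBSCRIPTION-ID" || exit 1
```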

Now install the AOSM CLI extension:

az extension add --name aosm

If you've had a previous version installed and you're seeing unexpected command behavior, remove and reinstall it:

az extension remove --name aosm
az extension add --name aosm --upgrade

Confirm the installed version with:

az extension show --name aosm --query version

The extension exposes commands under az aosm. Run az aosm --help to confirm the extension loaded correctly. If you get az: 'aosm' is not in the 'az' command group, the extension install failed silently. This sometimes happens because of network proxy issues in enterprise environments. Check your proxy settings and try the install again with verbose logging:

az extension add --name aosm --debug 2>&1 | tee aosm_install.log

Once confirmed working, the extension lets you run the full onboarding workflow for both CNF and VNF types, covering parameter exposure, artifact store interaction, and configuration group schema generation. A successful extension install means you're ready to onboard your first network function type.
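For CI agents, the install and version check can be folded into one idempotent step. This is a sketch, not an official pattern: ensure_aosm_extension is a hypothetical helper, and the version you pass is whichever one your team has validated.

```shell
# Install the aosm extension if missing, and optionally insist on a version.
# ensure_aosm_extension is a hypothetical helper; the version argument is
# whatever your team has validated, not a value from this guide.
ensure_aosm_extension() {
  wanted_version="${1:-}"
  installed="$(az extension show --name aosm \
    --query version --output tsv 2>/dev/null || true)"
  if [ -z "${installed}" ]; then
    if [ -n "${wanted_version}" ]; then
      az extension add --name aosm --version "${wanted_version}"
    else
      az extension add --name aosm
    fi
    return $?
  fi
  if [ -n "${wanted_version}" ] && [ "${installed}" != "${wanted_version}" ]; then
    echo "aosm ${installed} is installed, expected ${wanted_version}" >&2
    return 1
  fi
  echo "aosm ${installed} already installed"
}
```

The helper deliberately refuses to upgrade or downgrade silently; a version mismatch is reported so someone decides whether to remove and reinstall.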

Step 3: Onboard Your Network Function (CNF or VNF Path)

This is where the actual AOSM CLI extension onboarding work happens, and the path splits based on whether you're working with a containerized network function or a virtualized network function. I'll cover both.

For CNF onboarding (Helm-based, targeting Azure Kubernetes Service on Azure Operator Nexus):

# Generate the input file for your CNF
az aosm nfd generate-config --definition-type cnf

# Build the NFD using your completed input file
az aosm nfd build --definition-type cnf --config-file input.jsonc

# Publish the NFD to the artifact store
az aosm nfd publish --definition-type cnf --config-file input.jsonc

The generate-config step creates a template input.jsonc file you need to populate with your Helm chart paths, image registry details, and publisher/resource group info. The most common error here is incorrect Helm chart paths. Double-check that these are absolute paths or correctly relative to the directory where you run the CLI command.
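A quick pre-flight check catches bad chart paths before the build does. The sketch below doesn't parse input.jsonc; you pass it the same paths you entered in that file, and check_chart_paths is a hypothetical helper name.

```shell
# Report any chart paths that do not exist from the current directory.
# check_chart_paths is a hypothetical helper; it does not parse input.jsonc,
# so pass the same paths you entered in that file.
check_chart_paths() {
  missing=0
  for chart_path in "$@"; do
    if [ ! -e "${chart_path}" ]; then
      echo "Missing: ${chart_path} (resolved from $(pwd))" >&2
      missing=$((missing + 1))
    fi
  done
  [ "${missing}" -eq 0 ]
}

# Example (placeholder paths):
# check_chart_paths ./charts/my-nf ./charts/my-nf-config || exit 1
```

Run it from the directory where you will run az aosm nfd build, since relative paths resolve from there.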

For VNF onboarding (VHD image-based, targeting Azure Operator Nexus VMs):

# Generate the input file for your VNF
az aosm nfd generate-config --definition-type vnf

# Build and publish the NFD
az aosm nfd build --definition-type vnf --config-file input.jsonc
az aosm nfd publish --definition-type vnf --config-file input.jsonc

VNF onboarding requires your VHD image to already be accessible, either in an Azure Storage Account or an OCI-compatible artifact store. If the publish step throws a BlobNotFound error, the artifact store can't reach your VHD. Verify the storage account URL and that the managed identity you created in Step 1 has Storage Blob Data Reader permissions on that storage account.
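Before publishing, you can confirm the VHD is visible with az storage blob exists. This is a sketch under assumptions: the account, container, and blob names are placeholders, vhd_blob_exists is a hypothetical helper, and --auth-mode login checks access as the signed-in identity, which may differ from the managed identity AOSM uses.

```shell
# Check that the VHD blob is visible to the signed-in identity.
# vhd_blob_exists is a hypothetical helper; account, container, and blob
# names are placeholders for your own values.
vhd_blob_exists() {
  account="$1"
  container="$2"
  blob="$3"
  result="$(az storage blob exists \
    --account-name "${account}" \
    --container-name "${container}" \
    --name "${blob}" \
    --auth-mode login \
    --query exists \
    --output json)"
  [ "${result}" = "true" ]
}

# Example: vhd_blob_exists mystorageacct images my-nf.vhd || echo "VHD not reachable"
```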

For ARM resource onboarding (when you need to include Azure Resource Manager templates as part of your network function), there's a third path: az aosm nfd build --definition-type arm-template. This covers scenarios where your NF deployment includes Azure infrastructure resources alongside the NF itself.

Once publish completes without errors, you'll see the NFD appear in your publisher resource in the Azure portal under Network Function Definitions. That's your confirmation to move to the designer role.

Step 4: Create the Network Service Design and Configuration Group Schema

This is the designer role's domain, and it's where Azure Operator Service Manager setup most commonly stalls for teams that are new to the platform. The designer creates two things: the Network Service Design (NSD) and the Configuration Group Schema (CGS). Think of the NSD as the blueprint of what gets deployed, and the CGS as the form that the operator fills in to customize that deployment for a specific site.

Using the AOSM CLI extension, the NSD workflow mirrors the NFD workflow:

# Generate the NSD input configuration
az aosm nsd generate-config

# Build the NSD package
az aosm nsd build --config-file nsd-input.jsonc

# Publish the NSD
az aosm nsd publish --config-file nsd-input.jsonc

The NSD input file references the NFD you published in Step 3, so you'll need the exact publisher name, NFD group name, and NFD version. Mismatches here cause a ReferencedResourceNotFound error. Copy these values directly from the Azure portal NFD resource to avoid typos.

The CGS defines which configuration parameters the operator can or must provide at deployment time. The official AOSM documentation describes two categories of configuration: static configuration (values that never change across deployments) and dynamic configuration (site-specific or runtime-specific values the operator provides). Getting this split right matters more than it might seem. If you fix too much as static, operators can't customize deployments. If you expose too much as dynamic, operators face an overwhelming configuration form at deployment time.

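One common CGS authoring mistake: defining a required property in the schema that has no corresponding mapping in the NSD resource element template. This passes validation at build time but fails at deployment with a schema compliance error. Before publishing, build the NSD locally and check a sample CGV file against the generated schema. The command below only runs the build; if your installed extension version rejects --skip-deploy, check az aosm nsd build --help for the options it supports: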

az aosm nsd build --config-file nsd-input.jsonc --skip-deploy
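The build alone doesn't compare a sample CGV against the schema, so a small local check helps. The sketch below assumes jq is installed and that the CGS is a plain JSON Schema file with a top-level "required" array; the file names and the helper name are hypothetical. It checks only that required keys are present, not types or constraints.

```shell
# List required CGS properties that are absent from a sample CGV file.
# ASSUMPTIONS: jq is installed; the schema has a top-level "required" array;
# file names are placeholders. Checks presence only, not types.
missing_required_properties() {
  schema_file="$1"
  values_file="$2"
  jq --raw-output --slurpfile values "${values_file}" \
    '(.required // [])[] as $name
     | select(($values[0] | has($name)) | not)
     | $name' "${schema_file}"
}

# Example (placeholder file names):
# missing_required_properties cgs-schema.json sample-cgv.json
```

Empty output means every required property has a value in the sample file. For full type and constraint checking you would need a proper JSON Schema validator.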

A successful NSD publish means the designer role work is done. You'll see the NSD and its associated CGS appear in the portal under the publisher resource, ready for the operator to use.

Step 5: Create a Site and Deploy the Site Network Service

The operator role takes over here. With a published NSD in place, the operator creates a site, which represents a physical or logical deployment location (an Azure region, an Azure Operator Nexus cluster, or an Arc-connected edge location), and then deploys a Site Network Service (SNS) against that site.

In the Azure portal, search for Azure Operator Service Manager and navigate to Sites > Create. Fill in the site name, subscription, resource group, and the managed network fabric or location this site represents. The site resource itself is lightweight. Its primary purpose is to anchor the CGV values that are specific to this deployment location.

Next, create the site network service. Navigate to Site Network Services > Create and select the NSD you published in Step 4. The portal will present a configuration form driven entirely by the CGS you defined, so every required dynamic parameter will appear here. Fill in the Configuration Group Values (CGVs) for your site. These are validated against the CGS in real time; if you see a red validation indicator, the value doesn't match the schema type or constraints defined in your CGS.

Common SNS deployment failures and what they mean:

  • ConfigurationGroupValueValidationFailed: a CGV value doesn't match the schema. Open the CGS in the portal and check the exact type and constraint for the failing property.
  • ArtifactStoreNotReachable: the managed identity doesn't have access to the artifact store. Re-check role assignments from Step 1.
  • HelmChartInstallFailed (CNF only): the Helm chart itself failed to install on the NAKS cluster. Pull the detailed error from the cluster's event log, not the AOSM portal.
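For the HelmChartInstallFailed case, one way to read the cluster's events is with kubectl. This is a sketch under assumptions: kubectl is already pointed at the target cluster, you know the namespace the chart was installed into, and recent_warning_events is a hypothetical helper name.

```shell
# Show Warning events in the namespace where the chart was installed.
# ASSUMPTIONS: kubectl already targets the right cluster and you know the
# namespace; recent_warning_events is a hypothetical helper name.
recent_warning_events() {
  namespace="$1"
  kubectl get events \
    --namespace "${namespace}" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Example: recent_warning_events my-nf-namespace
```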

If you need to stop a deployment that's in progress, for example, because you caught a configuration mistake mid-flight, use the Interrupt operation on the SNS resource. The official documentation covers this under "Interrupt a service deployment operation." Don't delete the SNS resource while it's deploying; that creates orphaned infrastructure that requires manual cleanup.

A successful SNS deployment shows a provisioning state of Succeeded in the portal. You now have a working end-to-end Azure Operator Service Manager deployment.

Advanced Troubleshooting

Once you're past the basic setup, Azure Operator Service Manager issues shift toward deployment failures, upgrade errors, and enterprise networking problems. Here's what I've seen come up repeatedly in production environments.

Reading the Azure Activity Log for AOSM Failures

The Azure portal's top-level error messages for AOSM are almost useless on their own. The real diagnostic information lives in the Activity Log. Go to your resource group, click Activity log in the left menu, and filter by Operation: Create or update Site Network Service (or whichever operation failed). Click the failed operation entry, then click JSON in the detail pane. The statusMessage field at the bottom of the JSON is where the actual error code and sub-error live.
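The same information is available from the CLI through az monitor activity-log list, which is handy when you're triaging several failures at once. A minimal sketch: failed_operations is a hypothetical helper, the one-day window is arbitrary, and the JMESPath projection assumes the usual activity log event shape.

```shell
# List failed operations in a resource group with their status messages.
# failed_operations is a hypothetical helper; the one-day window is arbitrary.
failed_operations() {
  resource_group="$1"
  az monitor activity-log list \
    --resource-group "${resource_group}" \
    --status Failed \
    --offset 1d \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, message:properties.statusMessage}" \
    --output json
}

# Example: failed_operations YOUR-RG
```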

Safe Upgrade Failures

Azure Operator Service Manager's safe upgrade system is one of its most valuable features, but it's also a common source of confusion. The platform supports two failure recovery modes: pause-on-failure and rollback-on-failure. If you're seeing an upgrade stuck in a Paused state, it means the system detected a failure and is waiting for operator intervention. This is intentional behavior, not a bug.

To inspect why an upgrade paused, check the upgrade operation's component-level visibility in the portal under the SNS resource's Upgrade History tab. Each component shows its individual status. For CNFs, the most common upgrade pause cause is a failed Helm post-upgrade test. If you have test jobs configured (covered under "Run Tests After Install or Upgrade" in the official docs), a failing test halts the upgrade at that component. Fix the underlying issue, then resume the upgrade operation from the portal or from the CLI. Before relying on the command below, confirm with az aosm --help that the update-state subcommand is available in your installed extension version:

az aosm site-network-service update-state \
  --resource-group YOUR-RG \
  --site-network-service-name YOUR-SNS \
  --provisioning-state Succeeded

Private Link and Edge Registry Issues

For production deployments where you've enabled Private Link between your customer premises and Azure, connectivity issues between the edge cluster and the AOSM artifact store are a recurring problem. The cluster registry uses the edge registry feature to cache artifacts locally. If the edge registry isn't syncing, check that the Network Function Operator extension is installed and healthy on the target cluster. Use:

az k8s-extension show \
  --cluster-name YOUR-CLUSTER \
  --resource-group YOUR-RG \
  --cluster-type connectedClusters \
  --name networkfunction-operator

If the extension shows a Failed provisioning state, delete and recreate it using the "Manage Network Function Operator Extension" how-to in the official documentation.
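For monitoring scripts, the same check can be reduced to a pass/fail result. A minimal sketch, assuming placeholder cluster and resource group names; nfo_extension_healthy is a hypothetical helper, and the extension name is the one used in the command above.

```shell
# Return success only if the Network Function Operator extension is healthy.
# nfo_extension_healthy is a hypothetical helper; cluster and group names
# are placeholders.
nfo_extension_healthy() {
  cluster_name="$1"
  resource_group="$2"
  state="$(az k8s-extension show \
    --cluster-name "${cluster_name}" \
    --resource-group "${resource_group}" \
    --cluster-type connectedClusters \
    --name networkfunction-operator \
    --query provisioningState \
    --output tsv)"
  echo "provisioningState: ${state}"
  [ "${state}" = "Succeeded" ]
}

# Example: nfo_extension_healthy YOUR-CLUSTER YOUR-RG || echo "extension needs attention"
```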

Multi-Tenant Publisher Scenarios

When your publisher tenant is different from your operator tenant, which is normal in real-world telecom deployments where the NF vendor publishes and the operator deploys, cross-tenant permission errors are common. The publisher resource must explicitly allow the operator tenant's managed identity to read the NFD and artifact store. This is configured on the publisher resource's Access control (IAM) tab. Missing this step causes AuthorizationFailed errors on the operator side that look like subscription registration issues but aren't.

When to Call Microsoft Support

Escalate to Microsoft Support if you're seeing InternalServerError on SNS operations after confirming all role assignments and resource registrations are correct, if the AOSM control plane is returning 5xx errors consistently for more than 15 minutes, or if a safe upgrade rollback has failed and left your service in an unknown provisioning state. For SLA-governed outages, open a Severity A ticket. Azure Operator Service Manager is a generally available service covered by Microsoft's published SLA for online services.

Prevention & Best Practices

Most Azure Operator Service Manager problems I've seen in the field are avoidable. They come from teams rushing through the onboarding sequence, skipping role separation, or treating AOSM like a standard Azure PaaS service when it has its own distinct operational model. Here's what separates teams that deploy AOSM smoothly from teams that spend weeks fighting it.

Respect the persona model from day one. It's tempting, especially in smaller teams, to assign one person or one service principal to do everything. Don't. The publisher, designer, and operator roles have genuinely different permission scopes, and conflating them masks errors that you'll pay for later during upgrades. Even if one human is playing all three roles, create three separate managed identities with the appropriate custom roles. The official documentation covers creating and assigning custom roles specifically for this purpose.

Version everything deliberately. Both NFDs and NSDs are versioned artifacts. The Azure Operator Service Manager version system supports multiple generations simultaneously. Use this for upgrade safety, not just rollback insurance. When you publish a new NFD version, keep the previous version active until you've confirmed the upgrade is stable across at least one site. The artifact versioning system is designed to make automated rollback possible; it can only do that job if you're maintaining the version history.

Validate CGS schemas with sample CGVs before publishing. I mentioned this in Step 4, but it bears repeating as a standing practice. Every time you update a configuration group schema, run a validation pass against a representative set of configuration group values before publishing. A schema that passes build-time validation but fails at deployment-time with real operator values is one of the most time-consuming issues to debug in production.

Integrate with Azure DevOps for CI/CD from the start. Azure Operator Service Manager is explicitly designed to work with Azure DevOps pipelines for continuous deployment. Setting up a pipeline that automatically publishes new NFD and NSD versions when you tag a release in your repository takes a few hours upfront but saves enormous time at scale. The combination of AOSM's safe upgrade practices and Azure DevOps' pipeline gates gives you a deployment system that can scale from one site to thousands without manual intervention for each.
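A pipeline step that publishes on release tags can be as small as the sketch below, called from whatever script task your pipeline uses. The assumptions are mine: release tags start with "v" followed by a digit, the helper name is hypothetical, and the build and publish commands are the ones from Step 3.

```shell
# Build and publish an NFD only when the ref looks like a release tag.
# ASSUMPTIONS: release tags start with "v" followed by a digit; the helper
# name and tag convention are illustrative, not from the AOSM docs.
publish_nfd_on_release_tag() {
  tag="$1"
  definition_type="$2"
  config_file="$3"
  case "${tag}" in
    v[0-9]*) ;;
    *)
      echo "Skipping publish: '${tag}' is not a release tag"
      return 0
      ;;
  esac
  az aosm nfd build --definition-type "${definition_type}" \
    --config-file "${config_file}" &&
    az aosm nfd publish --definition-type "${definition_type}" \
      --config-file "${config_file}"
}

# Example: publish_nfd_on_release_tag "v1.2.0" cnf input.jsonc
```

Non-release refs exit successfully without publishing, so the same step can run on every build without failing feature branches.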

Quick Wins
  • Register Microsoft.HybridNetwork in every subscription you'll use (publisher, designer, and operator) before starting any onboarding work.
  • Pin the AOSM CLI extension version in your CI/CD pipelines using az extension add --name aosm --version X.Y.Z to prevent breaking changes from automatic updates during deployments.
  • Enable Azure Private Link for all production AOSM deployments from the start. Retrofitting Private Link onto an existing deployment is significantly more complex than building it in from day one.
  • Set up Azure Monitor alerts on SNS provisioning state transitions so your team is notified immediately when an upgrade pauses or fails, rather than discovering it hours later.

Frequently Asked Questions

What exactly is Azure Operator Service Manager and do I actually need it?

Azure Operator Service Manager is a cloud orchestration service built specifically for telecom operators who need to manage the full lifecycle of network functions (think 5G cores, RAN components, and packet gateways) running on Azure Operator Nexus. It's not a general-purpose deployment tool; it's purpose-built for the telecom industry's requirements around multi-vendor service composition, safe upgrades, and hybrid operations across Azure regions and Arc-connected edge sites. If you're running network functions at scale across multiple sites and vendors, and you need repeatable, auditable deployments that can survive edge disconnection events, you need it. If you're deploying a single network function in a single region for a proof of concept, the AOSM model might be more overhead than your scenario warrants right now.

Why can't I find Azure Operator Service Manager resources in my Azure subscription?

Almost certainly because the Microsoft.HybridNetwork resource provider isn't registered on your subscription. This is a manual step that doesn't happen automatically when you create an Azure account; you have to explicitly register it. Go to your subscription in the Azure portal, navigate to Resource providers, search for Microsoft.HybridNetwork, and click Register. Alternatively, run az provider register --namespace Microsoft.HybridNetwork from the Azure CLI. Registration typically completes within two minutes. If you've already done this and resources still don't appear, verify you're looking in the correct subscription. AOSM resources are subscription-scoped, and the registration is also per-subscription.

What's the difference between a CNF and VNF in Azure Operator Service Manager, and how do I know which one I have?

A Containerized Network Function (CNF) is packaged as Docker images and deployed via Helm charts onto a Kubernetes cluster, specifically the Nexus Azure Kubernetes Service (NAKS) cluster on Azure Operator Nexus. A Virtualized Network Function (VNF) is packaged as a Virtual Hard Disk (VHD) image and deployed as a virtual machine on Azure Operator Nexus. Your NF vendor will tell you which type their product is. It's determined by how they built and packaged the software, not by a configuration choice you make. CNF onboarding uses the az aosm nfd build --definition-type cnf path; VNF onboarding uses --definition-type vnf. The AOSM CLI handles the rest of the infrastructure differences between them automatically.

My site network service deployment is stuck at "Updating". How do I safely stop it?

Use the Interrupt operation on the SNS resource; don't delete it. In the Azure portal, navigate to your Site Network Services resource, and look for the Interrupt button in the top action bar (it only appears when the SNS is in an active deployment state). This signals AOSM to stop the deployment at its next safe checkpoint rather than mid-component, which reduces the risk of leaving infrastructure in a broken state. After interruption, the SNS returns to a stable provisioning state and you can investigate the cause, fix it, and redeploy. Deleting an actively deploying SNS is the one action I'd strongly warn you against, because it can leave orphaned infrastructure resources on your Nexus cluster that require manual cleanup.

Can I use Azure Operator Service Manager without Azure Operator Nexus?

Azure Operator Service Manager is designed primarily for managing workloads on Azure Operator Nexus, which provides the bare-metal and virtualization infrastructure layer for telecom workloads. That said, AOSM can also manage network functions on Arc-connected infrastructure outside of Nexus; the hybrid operations capability is explicitly designed for this. What you can't do is use AOSM as a general-purpose application deployment tool for standard Azure resources unrelated to network functions. The service's design, personas, and orchestration model all assume you're managing network function software that follows telecom industry standards from organizations like 3GPP, ETSI, and ONAP.

How do I set up safe upgrade practices in Azure Operator Service Manager so a bad upgrade doesn't take down my service?

Azure Operator Service Manager's safe upgrade system has two key failure recovery modes you configure at the NSD level: pause-on-failure and rollback-on-failure. Pause-on-failure stops the upgrade when it detects a problem and holds the partially-upgraded service in a stable state, waiting for you to decide whether to continue or abort. This is the safer option for production because it gives you time to diagnose. Rollback-on-failure automatically reverts to the previous version when a failure is detected, which is faster to recover from but doesn't give you time to inspect the failure state. You can also configure post-install and post-upgrade tests that run automatically and feed into the failure detection logic. Both options are configured in the NSD's upgrade behavior settings; the "Control Upgrade Behavior on Failure" concept guide in the official AOSM documentation covers the exact schema properties you need to set.

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.