Azure Elastic SAN: Fix Setup, iSCSI & Config Errors
Why This Is Happening
If you've been fighting with Azure Elastic SAN for the last few hours (a volume that won't mount, an iSCSI session that drops randomly, or a deployment that errors out before it even finishes provisioning), I want you to know you're not alone. I've seen this exact pattern across dozens of Azure migrations, especially when teams move from traditional on-premises SAN hardware and assume Azure's version behaves the same way out of the box. It doesn't, and Microsoft's error messages rarely tell you why.
Azure Elastic SAN is a fully managed, cloud-native storage area network service. The core idea is elegant: instead of spinning up separate managed disks or storage accounts for each workload, you provision a single SAN and carve it into volumes that your VMs, Azure Kubernetes Service clusters, or Azure VMware Solution instances can all consume. The performance pool is shared across all those volumes, which is exactly what makes it cost-effective, and also exactly what creates most of the problems people run into.
Here's why things break. Azure Elastic SAN has a strict three-layer resource hierarchy: the SAN itself sits at the top, then volume groups in the middle, then individual volumes at the bottom. Miss a configuration step at any layer and the whole thing stops working. Volume group settings, especially virtual network rules, cascade down to every volume in that group. If you set a network rule that blocks your VM's subnet, none of your volumes will connect, and you'll get a silent iSCSI timeout instead of a clear error.
Naming requirements are another source of pain. The SAN name, volume group name, and volume name all have different character limits and rules. Get one character wrong (an uppercase letter, a double hyphen, or a name that's too short) and the Azure portal or CLI will reject the deployment with a validation error that doesn't always explain which field failed.
Performance issues are more subtle. Because the SAN's IOPS and throughput are distributed across all volumes, a single volume that's hammering the SAN can throttle everything else. And if a volume is too small, it won't be able to reach its theoretical ceiling of 80,000 IOPS even if the SAN has plenty of headroom. I've seen production databases get throttled because someone provisioned tiny volumes and couldn't figure out why the numbers didn't match the SAN specs.
The iSCSI connection itself is another common failure point. Azure Elastic SAN uses iSCSI to connect volumes to clients, which means your VMs need the iSCSI initiator configured correctly, your network security groups need to allow the right ports, and private endpoints need to be set up if you're not going through a public endpoint. Skip any of that and the volume simply won't appear on the OS side.
I know this is frustrating, especially when your team is mid-migration and a broken storage layer is blocking everything downstream. This guide walks through every common failure scenario, in order of how likely they are to be your problem.
The Quick Fix: Try This First
Before going deep into diagnostics, run through this checklist. In my experience, about 60% of Azure Elastic SAN connection and deployment problems come down to one of these four things being wrong.
Step 1: Check your naming. Open the Azure portal, go to your Elastic SAN resource, and hover over the SAN name, volume group name, and volume name. The SAN name must be 3–24 characters, lowercase letters, numbers, hyphens, and underscores only, and it must start and end with a letter or number. Volume group names follow the same rules but can be up to 63 characters. Volume names are also 3–63 characters with the same character set. If any name has an uppercase letter, consecutive hyphens, or starts/ends with a special character, the deployment will fail or the resource will sit in a broken state.
Step 2: Verify network rules on the volume group. In the portal, navigate to your volume group and open the Networking blade. Check which virtual networks are allowed. If your VM's subnet isn't listed, the iSCSI connection will silently fail. Add the subnet, hit Save, and wait about 60 seconds for propagation.
Step 3: Confirm iSCSI initiator is running on the client VM. On Windows, open Server Manager → Tools → iSCSI Initiator and make sure the service is running. On Linux, run:
sudo systemctl status iscsid
If it's not running, start it:
sudo systemctl enable --now iscsid
Step 4: Check redundancy type selection. Azure Elastic SAN supports only LRS (Locally Redundant Storage) and ZRS (Zone-Redundant Storage) redundancy types. If you're deploying via Bicep or ARM and accidentally specified GRS or GZRS, the deployment will fail. Fix the redundancy parameter and redeploy.
If all four of those check out and you're still stuck, keep reading. The step-by-step section covers every scenario in depth.
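Step 4 is easy to guard in a deployment script before anything reaches Azure. The sketch below uses a hypothetical helper, check_esan_sku (not part of any Azure tool), and assumes the two accepted SKU names are Premium_LRS and Premium_ZRS; confirm those against the allowed values in your own template.

```shell
#!/bin/sh
# Hypothetical pre-deployment guard: succeed only for redundancy SKUs
# that Elastic SAN accepts (assumed here: Premium_LRS and Premium_ZRS).
# usage: check_esan_sku <sku-name>
check_esan_sku() {
  case "$1" in
    Premium_LRS|Premium_ZRS)
      return 0
      ;;
    *)
      echo "unsupported Elastic SAN SKU: '$1' (expected Premium_LRS or Premium_ZRS)" >&2
      return 1
      ;;
  esac
}
```

Calling check_esan_sku "$SKU" || exit 1 ahead of the deployment command turns a late, vague deployment failure into an immediate, readable one.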
The most common reason an Azure Elastic SAN deployment fails silently, or gets stuck in a "Failed" provisioning state, is a naming violation. Microsoft's validation error messages don't always point directly at the offending field, so you end up staring at a generic "deployment failed" message in the Activity Log.
Here's what the rules actually are, layer by layer:
SAN-level naming: The name must be 3–24 characters. Allowed characters: lowercase letters (a–z), numbers (0–9), hyphens (-), and underscores (_). The name must begin and end with a letter or number. A hyphen or underscore must always be surrounded by alphanumeric characters, so my--san or my-san- are both invalid.
Volume group naming: The same pattern applies, but the length range is 3–63 characters. Underscores are the one character worth double-checking here against the current Azure naming rules before you rely on them.
Volume naming: Same 3–63 character range with the same allowed characters. The volume name matters beyond just provisioning: it becomes part of the iSCSI Qualified Name (IQN) that your client OS uses to identify the target. If the name changes after provisioning, your iSCSI sessions will break.
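If names are generated programmatically, it's worth checking them locally before the deployment runs. Here is a minimal sketch of the rules as listed above; valid_esan_name is a hypothetical helper, and it encodes this guide's description of the rules rather than Azure's own validator.

```shell
#!/bin/sh
# Hypothetical helper: validate a name against the rules listed above.
# usage: valid_esan_name <max-length> <name>
#   max-length is 24 for a SAN, 63 for a volume group or volume.
valid_esan_name() {
  # Length must be between 3 and the per-resource maximum.
  [ "${#2}" -ge 3 ] && [ "${#2}" -le "$1" ] || return 1
  # Lowercase alphanumeric runs joined by single hyphens or underscores.
  # This also forces an alphanumeric first and last character.
  printf '%s\n' "$2" | LC_ALL=C grep -Eq '^[a-z0-9]+([-_][a-z0-9]+)*$'
}
```

For example, valid_esan_name 24 "my-san" succeeds, while my--san, my-san-, and My-san all fail. The function returns only an exit status, so it drops straight into an if statement in a pipeline script.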
To audit your current names via Azure CLI, run:
az elastic-san show \
--resource-group <your-rg> \
--elastic-san-name <your-san-name> \
--query "{name:name, provisioningState:provisioningState}"
If provisioningState shows Failed, delete the resource and redeploy with a corrected name. You can't rename an Elastic SAN after creation. After a successful deployment, you should see provisioningState: Succeeded and the resource becomes visible in the portal under Storage → Elastic SANs.
Volume groups are the management layer that makes Azure Elastic SAN genuinely useful at scale. Any setting you apply to a volume group (a virtual network rule, a private endpoint, an encryption policy) is automatically inherited by every volume in that group. This is the right way to think about them: not as folders, but as policy containers.
To create a volume group in the portal: navigate to your Elastic SAN resource → select Volume groups in the left menu → click + Add volume group. Give it a compliant name, select your encryption settings, then move to the Networking tab.
On the Networking tab, you have two choices: allow access from selected virtual networks (recommended for most workloads) or allow public access. For production workloads, always select Selected virtual networks. Then click + Add existing virtual network and select the VNet and subnet where your compute resources live.
If you're using Azure CLI instead:
az elastic-san volume-group create \
--resource-group <your-rg> \
--elastic-san-name <your-san-name> \
--volume-group-name <your-vg-name> \
--network-acls '{virtual-network-rules:[{id:/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>,action:Allow}]}'
After saving, wait about 60 seconds before attempting to connect a volume. The network rule propagation isn't instant, and if you try to initiate an iSCSI session immediately, it will time out even if the config is correct. You'll know the volume group is ready when its status in the portal shows Provisioning state: Succeeded.
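Because rule propagation takes a little while, it can help to retry the first connection attempt rather than treat a single timeout as a failure. A minimal sketch follows; retry is a hypothetical helper name, and the attempt count and delay are yours to pick.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay-seconds> between failed attempts. Succeeds as soon as the
# command succeeds; fails if every attempt fails.
# usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  while :; do
    "$@" && return 0
    [ "$retry_n" -ge "$retry_attempts" ] && return 1
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}
```

On a Linux client this could wrap the discovery command, for example: retry 6 10 sudo iscsiadm --mode discovery --type sendtargets --portal <target-portal-ip>:3260. If all six attempts fail, the problem is more likely the configuration than the propagation delay.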
One thing that trips people up: if you add a volume to a group after the network rule is already set, the new volume inherits the rule automatically. You don't need to update anything at the volume level. This is a feature, not a gap. It's what allows you to manage hundreds of volumes from a single control plane.
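To confirm from a script that a given subnet really is covered by the group's rules, you can compare its resource ID against the list of allowed IDs. The sketch below is a hypothetical helper, subnet_allowed; it only does the comparison and expects the allowed IDs on standard input, one per line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the subnet resource ID given as $1
# appears in the list of allowed subnet IDs read from stdin (one per line).
# Azure resource IDs are case-insensitive, so both sides are lowercased
# before an exact whole-line comparison.
# usage: <command printing allowed IDs> | subnet_allowed <subnet-resource-id>
subnet_allowed() {
  subnet_want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  tr '[:upper:]' '[:lower:]' | grep -Fqx -- "$subnet_want"
}
```

One way to feed it, assuming the volume group's JSON exposes the rules under networkAcls.virtualNetworkRules[].id (check the actual output shape on your CLI version): az elastic-san volume-group show --resource-group <your-rg> --elastic-san-name <your-san-name> --volume-group-name <your-vg-name> --query "networkAcls.virtualNetworkRules[].id" --output tsv | subnet_allowed "<subnet-resource-id>".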
Connecting an Azure Elastic SAN volume to a Windows VM is a multi-step process, and each step has to be done in the right order. I've seen teams get the Azure side right and then get stuck entirely on the Windows iSCSI initiator configuration.
First, grab the iSCSI target IQN and storage target portal IP from the Azure portal. Navigate to your volume, click on it, and look at the Connect tab. You'll see the target portal address (an IP and port 3260) and the IQN string. Copy both. You'll need them on the VM.
On the Windows VM, open Server Manager → Tools → iSCSI Initiator. If this is the first time you've opened it, Windows will ask if you want to start the iSCSI service. Click Yes. Then:
- Go to the Discovery tab → click Discover Portal
- Enter the target portal IP address. Port is 3260.
- Click OK. The portal will appear in the list.
- Switch to the Targets tab. You should see the IQN of your volume appear as a Discovered Target.
- Select it → click Connect → check Add this connection to the list of Favorite Targets so it reconnects after reboot.
After connecting, open Disk Management (run diskmgmt.msc). The volume should appear as a new uninitialized disk. Initialize it, create a new simple volume, format it, and assign a drive letter.
If the target doesn't appear in the Targets tab after discovery, go back and check your NSG rules. Port 3260 (TCP) must be allowed outbound from the VM's NIC, and your volume group network rule must include the VM's subnet. If both are correct and it still doesn't show, confirm the volume is in Provisioning state: Succeeded. A volume that's still provisioning won't accept iSCSI connections.
Linux iSCSI configuration for Azure Elastic SAN follows a similar pattern to Windows but uses different tools. The open-iscsi package handles discovery and session management, and you'll interact with it through iscsiadm.
First, install and enable open-iscsi if it's not already present:
sudo apt-get update && sudo apt-get install -y open-iscsi # Ubuntu/Debian
sudo yum install -y iscsi-initiator-utils # RHEL/CentOS
sudo systemctl enable --now iscsid
Discover the iSCSI target using the portal IP from your volume's Connect tab:
sudo iscsiadm --mode discovery \
--type sendtargets \
--portal <target-portal-ip>:3260
This command should return the IQN of your volume. If it returns nothing or times out, your network rules or NSG are blocking port 3260. Once discovery succeeds, log in to the target:
sudo iscsiadm --mode node \
--targetname <volume-iqn> \
--portal <target-portal-ip>:3260 \
--login
To make the session persistent across reboots:
sudo iscsiadm --mode node \
--targetname <volume-iqn> \
--op update \
--name node.startup \
--value automatic
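After logging in, the volume appears as a new block device, typically /dev/sdb or similar. Confirm the device name with lsblk before going further, because mkfs erases whatever is on the device you point it at. Then format and mount it as you would any block device (the example below puts the filesystem directly on the device, without a partition table):
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/elasticsan
sudo mount /dev/sdb /mnt/elasticsan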
Add an entry to /etc/fstab to mount it automatically on reboot. Use the volume's UUID (found with blkid /dev/sdb) rather than the device name, since device assignments can shift.
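For the fstab entry itself, here is a minimal sketch. fstab_line is a hypothetical helper that only prints the line; the mount options are my suggestion rather than something Azure requires. _netdev defers the mount until networking is up, and nofail lets the VM finish booting even if the iSCSI target is unreachable.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for an iSCSI-backed
# ext4 filesystem identified by UUID. Nothing is written to disk here.
# usage: fstab_line <filesystem-uuid> <mount-point>
fstab_line() {
  printf 'UUID=%s %s ext4 defaults,_netdev,nofail 0 2\n' "$1" "$2"
}
```

A typical use would be fstab_line "$(sudo blkid -s UUID -o value /dev/sdb)" /mnt/elasticsan | sudo tee -a /etc/fstab, followed by sudo mount -a to confirm the entry parses before you rely on it at the next reboot.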
Performance problems with Azure Elastic SAN are almost always one of three things: the SAN's aggregate capacity is too low, individual volumes are undersized, or the VM SKU is creating a ceiling before the SAN even becomes the bottleneck.
Here's how the math works. The SAN's total IOPS and throughput scale with the base capacity and additional capacity units you provision. All volumes share this pool. Each individual volume can scale up to 80,000 IOPS, but only if the SAN has enough headroom and the volume itself is large enough. A 10 GiB volume hitting 80,000 IOPS is physically impossible; you need to size volumes appropriately for the workload they carry.
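The ceiling logic above comes down to taking the smallest of three numbers. The sketch below makes that explicit. volume_iops_ceiling is a hypothetical helper, and every rate is passed in by the caller; in particular, the per-GiB rate in the usage example is an illustrative placeholder, so take the real figure from the current Azure scale targets.

```shell
#!/bin/sh
# Hypothetical calculator: effective IOPS ceiling for one volume.
# All rates are caller-supplied; nothing is read from Azure.
# usage: volume_iops_ceiling <size-gib> <iops-per-gib> <per-volume-cap> <san-total-iops>
volume_iops_ceiling() {
  ceiling=$(( $1 * $2 ))                  # what the volume's size allows
  [ "$ceiling" -gt "$3" ] && ceiling=$3   # per-volume maximum
  [ "$ceiling" -gt "$4" ] && ceiling=$4   # SAN-wide shared pool
  echo "$ceiling"
}
```

With an assumed 750 IOPS per GiB, volume_iops_ceiling 10 750 80000 100000 prints 7500: the 10 GiB volume is limited by its own size long before the 80,000 per-volume cap or the SAN pool matters.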
To check your SAN's current performance allocation in the portal: open your Elastic SAN → Overview blade → look at the Performance section. You'll see total provisioned IOPS and throughput. Compare that against the sum of what your volumes are actually consuming. In Azure Monitor, you can pull per-volume IOPS metrics:
az monitor metrics list \
--resource <volume-resource-id> \
--metric "VolumeUsedIOPS" \
--interval PT1M \
--output table
If a single volume is consuming most of the SAN's IOPS budget, your other volumes will see degraded performance even though the SAN looks healthy from the outside. The fix is to either increase the SAN's total provisioned capacity (which increases the performance budget) or identify which workload is the hot volume and move it to a dedicated volume group.
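Once you have per-volume IOPS figures, spotting the hot volume is simple arithmetic. Here's a rough sketch, assuming you've already collected the numbers into lines of the form "<volume-name> <iops>" (that input format and the hot_volumes name are mine, not Azure's).

```shell
#!/bin/sh
# Hypothetical helper: read "<volume-name> <iops>" lines on stdin and
# print each volume whose share of the SAN's total IOPS budget is above
# the given percentage, along with that share.
# usage: hot_volumes <san-total-iops> <threshold-pct>
hot_volumes() {
  awk -v total="$1" -v pct="$2" '
    NF >= 2 && total > 0 && ($2 * 100 / total) > pct {
      printf "%s %.0f%%\n", $1, $2 * 100 / total
    }'
}
```

For example, printf 'db1 60000\nweb1 5000\n' | hot_volumes 80000 70 prints db1 75%, because db1 alone is using three quarters of an 80,000 IOPS budget while web1 stays well under the threshold.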
Also check the VM SKU. Elastic SAN traffic travels over the VM's network interface rather than its disk path, so an iSCSI-attached volume isn't held to the VM's standard disk IOPS limits, but the VM's network throughput cap still applies. If your VM is hitting its network ceiling, no amount of SAN tuning will help. Check the VM's network metrics in Azure Monitor alongside the volume metrics. Upgrading to a higher-tier VM SKU with a larger network bandwidth allocation is often the fastest fix for this class of problem.
Advanced Troubleshooting
Private Endpoint Configuration for Azure Elastic SAN
If your organization requires that storage traffic never traverse the public internet, you'll want to use private endpoints with Azure Elastic SAN rather than virtual network rules. Private endpoints work at the volume group level: you create one private endpoint per volume group, and all volumes in that group become accessible through it.
To create a private endpoint in the portal: navigate to your volume group → Networking blade → Private endpoint connections tab → click + Private endpoint. Walk through the wizard: select your VNet and subnet, choose whether to integrate with a private DNS zone (recommended, since this lets name resolution work automatically), and finish the wizard.
After creation, confirm the private endpoint shows Connection state: Approved. If it shows Pending, go to the private endpoint resource and manually approve it. Once approved, your VMs in the connected VNet can reach the volume through the private IP rather than a public endpoint.
Diagnosing Connection Drops with Event Viewer
Intermittent iSCSI disconnections are maddening. On Windows VMs, open Event Viewer → Applications and Services Logs → Microsoft → Windows → iSCSI → Operational. Look for Event ID 70 (session disconnected) and Event ID 9 (login failure). These will tell you whether the drops are network-layer (timeouts) or authentication-layer (credential or target mismatch).
On Linux, check dmesg and /var/log/messages (or journalctl -u iscsid on systemd distros). Look for lines containing iSCSI login failure or connection to target lost. Persistent drops with no clear error usually mean the iSCSI session timeouts and NOP-Out keepalive checks aren't configured tightly enough. Edit /etc/iscsi/iscsid.conf and set:
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 30
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
Restart iscsid after saving.
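One caveat: values in iscsid.conf act as defaults for node records created by later discoveries, so a target you discovered earlier keeps the settings stored in its own record. The sketch below prints the iscsiadm commands that would copy the same four values into an existing record. print_timeout_updates is a hypothetical helper, and it deliberately executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the iscsiadm commands that copy the timeout
# values above into an existing node record. Nothing is executed; review
# the output, then run it on the client.
# usage: print_timeout_updates <volume-iqn>
print_timeout_updates() {
  for kv in \
    "node.session.timeo.replacement_timeout=120" \
    "node.conn[0].timeo.login_timeout=30" \
    "node.conn[0].timeo.noop_out_interval=5" \
    "node.conn[0].timeo.noop_out_timeout=5"
  do
    # Split "name=value" and quote the name so [0] is never globbed.
    printf "sudo iscsiadm --mode node --targetname '%s' --op update --name '%s' --value %s\n" \
      "$1" "${kv%%=*}" "${kv#*=}"
  done
}
```

Run print_timeout_updates <volume-iqn> to see the four commands. As far as I know, updated values take effect the next time the session logs in, so plan a logout and login during a quiet window.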
AKS Integration: iSCSI CSI Driver vs. Azure Container Storage
When connecting Azure Elastic SAN to Azure Kubernetes Service, you have two paths: the Kubernetes iSCSI CSI driver or Azure Container Storage. The iSCSI CSI driver gives you more direct control and is a good fit if you already have an existing iSCSI workflow. Azure Container Storage is a higher-level abstraction that manages the SAN integration for you as a storage class inside AKS.
The most common AKS-related failure I see is the worker nodes not having the iSCSI initiator tools installed. For the CSI driver path, every node in the node pool needs open-iscsi installed. If you're using a custom node image, verify this before deployment. For Azure Container Storage, the add-on handles most of this, but you still need to make sure the AKS cluster's subnet is included in the volume group's network access rules.
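A quick way to check a node (or a custom node image during its build) is to look for the initiator tools on PATH. This is a minimal sketch; missing_tools is a hypothetical helper, and how you get a shell on the node is up to your own setup.

```shell
#!/bin/sh
# Hypothetical node check: print a line for each named tool that is not
# on PATH. Exit status is non-zero if at least one tool is missing.
# usage: missing_tools <tool> [tool...]
missing_tools() {
  missing_rc=0
  for missing_t in "$@"; do
    command -v "$missing_t" >/dev/null 2>&1 || {
      echo "missing: $missing_t"
      missing_rc=1
    }
  done
  return "$missing_rc"
}
```

On a node, missing_tools iscsiadm iscsid should print nothing and succeed. Any "missing:" line means the open-iscsi tooling isn't installed there.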
Checking Redundancy Type After Deployment
You can't change the redundancy type of an Elastic SAN after creation. LRS or ZRS is set at deploy time and is immutable. If you deployed with the wrong redundancy, you need to create a new SAN with the correct setting and migrate your volumes. There's no in-place upgrade path here. This is one of those decisions worth getting right the first time, which is why I always recommend ZRS for production workloads that need zone-level resilience.
Prevention & Best Practices
Most Azure Elastic SAN headaches are preventable. Here's what I tell every team before they go anywhere near a production deployment.
Plan your capacity before provisioning. Unlike managed disks where you pay per disk, with Azure Elastic SAN you're provisioning a shared performance pool upfront. Underprovisioning the SAN to save money means your volumes will hit performance ceilings during peak load, and you'll spend more time troubleshooting throttling than you saved. Do the math on your expected aggregate IOPS across all volumes before you provision. Build in at least 20% headroom.
Use volume groups as your policy unit. Don't create one volume group and dump everything into it. Group volumes by workload type or security boundary. Databases in one group, application file shares in another, backups in a third. Each group gets its own network rules and, if needed, its own private endpoint. This makes it easy to adjust access controls without accidentally locking out unrelated workloads.
Test iSCSI connectivity before attaching workloads. After connecting a volume, always run a quick I/O test before putting real data on it. On Linux, use fio. On Windows, use diskspd. Confirm you're getting the IOPS numbers you expect. If the numbers are way off at baseline, investigate now rather than after your application is relying on it.
Document your IQNs and keep them consistent. The volume name becomes part of the iSCSI IQN. If you ever need to re-provision a volume (after an accidental deletion, for example), give it the exact same name so your client configuration still works. Keep a record of all your volume names, their IQNs, and which clients are connected to them. A simple spreadsheet saves hours of pain during incident response.
Enable Azure Monitor alerts on volume IOPS. Set an alert that fires when any volume exceeds 70% of the SAN's total IOPS budget. This gives you early warning before users start noticing slowness. Navigate to Azure Monitor → Alerts → Create and select your Elastic SAN as the scope.
- Always verify volume group network rules are set to Selected virtual networks. Never leave production groups on public access
- Choose ZRS redundancy for any SAN serving production workloads across availability zones
- Keep volume names short and descriptive, because they become part of the iSCSI IQN and appear in OS-level disk management tools
- Take volume snapshots before any major configuration change. Snapshots are supported and give you a fast rollback path
Frequently Asked Questions
What is Azure Elastic SAN and how is it different from Azure Managed Disks?
Azure Elastic SAN is a fully managed cloud SAN that lets you provision a shared storage pool and carve it into volumes that multiple compute resources can use simultaneously: VMs, AKS clusters, and Azure VMware Solution. Managed Disks, by contrast, are dedicated block storage attached to a single VM. The big difference is cost model and scale: Elastic SAN is more cost-effective when you have a large number of IO-intensive workloads because you share performance across volumes rather than over-provisioning each disk individually. If you only have one or two VMs doing modest IO, managed disks are simpler. If you're running a fleet of databases, Elastic SAN pays for itself quickly.
What are the Azure Elastic SAN naming requirements, and what characters are actually allowed?
For the SAN itself, names must be 3–24 characters using only lowercase letters (a–z), numbers (0–9), hyphens, and underscores. The name must start and end with a letter or number, and every hyphen or underscore must have alphanumeric characters on both sides, so no consecutive special characters. Volume group names follow the same rules but can be up to 63 characters. Volume names are also 3–63 characters with the same constraints. Uppercase letters are never allowed at any level, and this is a common cause of silent deployment failures when names are generated programmatically.
Does Azure Elastic SAN encrypt data at rest and in transit?
Data at rest is encrypted: Azure Elastic SAN supports encryption at rest just like other Azure Storage services. Encryption in transit is a different story: as of the current documentation, encryption in transit is not supported by Azure Elastic SAN. This is an important consideration for compliance-sensitive workloads. If your organization requires encryption in transit for storage traffic, factor this into your architecture decisions and check the official Azure docs for any updates, as the feature support table notes that supported features can change over time.
Why won't my VM connect to the Azure Elastic SAN volume over iSCSI?
The three most common reasons are: the VM's subnet isn't included in the volume group's network access rules, the iSCSI service isn't running on the VM (check iscsid on Linux or the iSCSI Initiator service on Windows), or the volume is still in a provisioning state. Start by confirming the volume shows Provisioning state: Succeeded in the portal. Then verify the volume group networking settings include your VM's VNet and subnet. Finally, check that port 3260 TCP isn't blocked by a Network Security Group rule on the VM's NIC or subnet.
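To separate a network block from an iSCSI-level problem, a plain TCP reachability test against the target portal is a useful first probe. The sketch below uses a hypothetical helper, tcp_port_open, and assumes a Linux client with bash and the coreutils timeout command available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to <host>:<port> can
# be opened within <timeout-seconds> (default 5). Relies on bash's
# /dev/tcp pseudo-device and the coreutils timeout command.
# usage: tcp_port_open <host> <port> [timeout-seconds]
tcp_port_open() {
  # Inside the bash -c script, $0 is the host and $1 is the port.
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

If tcp_port_open <target-portal-ip> 3260 fails, look at NSG rules and the volume group's network rules first. If it succeeds but discovery or login still fails, the problem is more likely on the iSCSI side (initiator service, IQN, or volume state).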
How do I connect Azure Elastic SAN to Azure Kubernetes Service (AKS)?
You have two options. The first is Azure Container Storage, which is a managed add-on for AKS that abstracts the SAN integration and exposes volumes as Kubernetes storage classes. This is the easier path if you don't need fine-grained iSCSI control. The second is the Kubernetes iSCSI CSI driver, which gives you direct control but requires that every AKS worker node has open-iscsi installed and the iSCSI daemon running. Either way, make sure the AKS node pool subnet is allowed in your volume group's network access rules before attempting to connect. Without that, all connection attempts will time out.
What redundancy types does Azure Elastic SAN support, and can I change it after deployment?
Azure Elastic SAN supports LRS (Locally Redundant Storage) and ZRS (Zone-Redundant Storage). LRS replicates your data within a single datacenter; ZRS spreads it across availability zones within a region for higher resilience against zone-level failures. The critical thing to know: you cannot change the redundancy type after the SAN is created. It's set at provisioning time and is immutable. If you need to change it, you have to create a new Elastic SAN with the correct redundancy, migrate your data, and delete the old one. For production workloads, ZRS is almost always the right call.