How to Fix Azure Machine Learning Data Science Virtual Machine Errors

Updated April 20, 2026

Why This Is Happening

You spun up an Azure Machine Learning Data Science Virtual Machine, opened a Jupyter notebook, imported azureml.core, and got hit with an ImportError or a silent failure that gives you no useful hint. I've seen this exact scenario dozens of times across enterprise teams and individual researchers. The frustrating part? The error message rarely tells you the real problem.

Here's the thing: the Azure DSVM is a preconfigured VM image, not a fully managed service. That's actually the key distinction most people miss. When you're comparing a DSVM against an Azure Machine Learning Compute Instance, the Compute Instance is fully managed (Microsoft handles everything), while the DSVM is unmanaged: you're responsible for keeping your conda environments, SDK versions, and GPU drivers in sync. That independence is powerful, but it's also exactly why things break.

The most common failure points I see with DSVM AzureML API setup are:

  • Wrong conda environment activated. The DSVM ships with multiple pre-built conda environments: Python3.8-default, Python3.8-Tensorflow-Pytorch, and Python3.8-AzureML. If you're running your notebook in Python3.8-default and expecting the AzureML SDK to be there, it will either be missing or be a stale version.
  • AzureML SDK version mismatch. The image is built at a point in time. If your workspace was created or upgraded after the DSVM image was baked, you can hit API version incompatibilities that surface as AuthenticationException or workspace connection timeouts.
  • GPU not recognized by deep learning frameworks. On Ubuntu 20.04 DSVMs, the GPU support path is CUDA + cuDNN + NVIDIA Driver, all of which come preinstalled. But kernel updates can break the NVIDIA driver binding, and you'll only find out when PyTorch silently falls back to CPU mode.
  • SSH access misconfigured on a domain-joined VM. For enterprise setups, the NSG (Network Security Group) needs an explicit inbound rule for port 22. Many IT teams lock this down by default without telling you.
  • Jupyter kernel not matching your environment. You activate Python3.8-AzureML in terminal, but the Jupyter notebook is still pointed at Python3.8-default. The notebook kernel dropdown and the active conda environment are completely separate things.

I know this is frustrating, especially when you're in the middle of a training run or trying to hit a project deadline. The good news: every single one of these problems is fixable, and most of them come down to a few targeted commands.

One more thing to be clear on before we dive in: this guide covers the standalone DSVM, not Azure ML Compute Instances. The Compute Instance is fully managed and has built-in SSO and hosted notebooks out of the box. If you're on an enterprise team and you keep running into permission errors, your admin may have already set up Compute Instances for you, and switching over would solve most of your headaches without any of the fixes below.

The Quick Fix: Try This First

Before doing anything else, SSH into your DSVM (or open a terminal window if you're already connected via RDP on Windows Server) and make sure you're in the right conda environment. This single mistake accounts for about 60% of the AzureML API issues I troubleshoot.

On Ubuntu 20.04 DSVM, run:

conda activate Python3.8-AzureML
python -c "import azureml.core; print(azureml.core.VERSION)"

On Windows Server 2022 DSVM, open the Anaconda Prompt (not PowerShell, not cmd) and run the same commands. If this throws an error or prints a version older than 1.48, that's your problem right there.

Now update the SDK in that environment:

pip install --upgrade azureml-core azureml-dataset-runtime azureml-train-automl-client

Once the upgrade finishes, verify again:

python -c "import azureml.core; print(azureml.core.VERSION)"
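If you'd rather check from inside Python (for example, in the notebook you're debugging), here is a minimal sketch using the standard-library importlib.metadata module. It reports what the current interpreter sees, so run it with the environment's own python. The 1.48 threshold is the one this guide uses, and the version parser is a simplification that ignores pre-release and post-release tags.

```python
"""Report the azureml-core version visible to the current interpreter (sketch)."""
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '1.48.0' into (1, 48, 0): keep leading digits of each piece, stop at a piece with none."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def sdk_status(minimum: str = "1.48") -> str:
    """Return a one-line verdict on azureml-core for this interpreter."""
    try:
        installed = version("azureml-core")
    except PackageNotFoundError:
        return "azureml-core is not installed in this environment"
    if parse_version(installed) < parse_version(minimum):
        return f"azureml-core {installed} is older than {minimum}: upgrade it"
    return f"azureml-core {installed} meets the {minimum} minimum"


if __name__ == "__main__":
    print(sdk_status())
```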

If you're using Jupyter notebooks, you also need to make sure the notebook kernel is pointing at the right environment. In JupyterLab, go to Kernel → Change Kernel… and select Python3.8-AzureML from the dropdown. Save and re-run your notebook from the top. The majority of "AzureML API not working on DSVM" issues are resolved at this exact step.

If you're on Windows Server 2019 or 2022 DSVM and can't find the Anaconda Prompt, look in your Start Menu under Anaconda3 (64-bit). Don't use the regular PowerShell window: conda path variables aren't set there by default on the Windows DSVM image.

Pro Tip
Always check which Jupyter kernel your notebook is using before reporting an "SDK not found" error. The kernel name is displayed in the upper-right corner of JupyterLab and the upper-right of classic Jupyter Notebook. A kernel named "Python 3" with no suffix usually means you're in the base environment, not the AzureML-specific one, and that's the number one cause of false "broken DSVM" reports on Stack Overflow and the Azure forums.
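To confirm the kernel from inside the notebook itself, you can run a cell like the one below. It's a sketch that assumes conda's usual layout, where named environments live under <conda root>/envs/<name>; any other prefix is reported as base, which may mislabel a venv or a system Python.

```python
"""Print which Python environment the current Jupyter kernel is running in (sketch)."""
import os
import sys


def env_label(prefix: str) -> str:
    """Derive a conda-style environment name from an interpreter prefix.

    Named conda environments live under <conda root>/envs/<name>; any other
    prefix is reported as 'base'.
    """
    norm = os.path.normpath(prefix)
    if os.path.basename(os.path.dirname(norm)) == "envs":
        return os.path.basename(norm)
    return "base"


if __name__ == "__main__":
    # In a Jupyter cell __name__ is "__main__", so this also runs there.
    print("Interpreter:", sys.executable)
    print("Environment:", env_label(sys.prefix))
```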

Step 1: Verify Your DSVM Provisioned Successfully in the Azure Portal

Before touching any code, confirm the VM itself is healthy. Go to the Azure portal (portal.azure.com), navigate to Virtual Machines in the left nav, and find your DSVM. Click into it and check the Overview tab. The status should read Running. If it shows Stopped (deallocated), click Start and wait about 2–3 minutes before trying to connect.

While you're on the Overview tab, note the VM size. If you provisioned an N-series GPU VM (like Standard_NC6s_v3 or Standard_ND40rs_v2), verify under Size that it's actually a GPU SKU. A common mistake is selecting a D-series or B-series VM during setup; everything looks the same until you try to run a GPU workload and nothing accelerates. Azure free accounts explicitly don't support GPU-enabled VM SKUs, so if you're on a free trial and expecting GPU compute, that's a dead end without upgrading your subscription.

Next, check the Networking tab. Confirm that your inbound port rules allow:

  • Port 22 for SSH (Ubuntu DSVM)
  • Port 3389 for RDP (Windows Server DSVM)
  • Port 8000 for JupyterHub on the Ubuntu DSVM, or 8888 if you're exposing a standalone Jupyter server directly (only do this with proper NSG restrictions)
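Once the rules are in place, you can test them from your own machine rather than guessing. The sketch below only checks whether a TCP connection succeeds. The IP address is a placeholder from the documentation range, and a "closed/filtered" result can mean an NSG block, a local firewall, or simply a stopped VM.

```python
"""Check from your own machine whether the DSVM's inbound ports accept TCP connections (sketch)."""
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (typical of an NSG drop), or the host could not be resolved.
        return False


if __name__ == "__main__":
    vm_ip = "203.0.113.10"  # hypothetical placeholder: use your DSVM's public IP
    for port, purpose in [(22, "SSH"), (3389, "RDP")]:
        state = "open" if port_open(vm_ip, port) else "closed/filtered"
        print(f"{purpose:<4} port {port}: {state}")
```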

If the VM provisioning itself failed, the Azure portal will show a red error banner. Click through to the Activity Log and look for entries tagged Failed. The error code and message there will tell you whether it was a quota issue, a region capacity problem, or a permission gap on your subscription.

When the VM is confirmed Running with correct networking, proceed to Step 2. If the VM won't start at all, check your subscription's vCPU quota for the selected region: quota exhaustion is the most common provisioning failure for the larger N-series VM sizes used for deep learning on the DSVM.

Step 2: Connect to Your DSVM and Identify the Active Conda Environment

Once the VM is confirmed healthy, connect to it. On Ubuntu, open a terminal and SSH in using the admin credentials you set during VM creation:

ssh your-admin-username@<your-vm-public-ip>

On Windows Server 2022 or 2019 DSVM, use the Connect → RDP option from the Azure portal. Download the RDP file, open it, and sign in with your admin username and password from the VM creation step, not your Azure portal credentials, since those are separate.

Once connected, check what conda environments are available:

conda env list

You should see at least these three environments on a current DSVM image (Ubuntu 20.04, Windows Server 2019, or Windows Server 2022), although exact names can differ between image versions:

Python3.8-default
Python3.8-Tensorflow-Pytorch
Python3.8-AzureML

If any of these are missing, your DSVM image may be corrupted or older than the current environment layout. In that case, recreate the AzureML environment manually (for example, conda create -n Python3.8-AzureML python=3.8), activate it, and install the SDK as described in Step 3.
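If you want to script the environment check, conda env list --json returns a JSON object whose envs key lists environment paths. This sketch compares the last path component of each path against the three names this guide expects; adjust EXPECTED if your image uses different names.

```python
"""Compare `conda env list --json` output with the environments this guide expects (sketch)."""
import json
import os
import shutil
import subprocess

# Names as listed in this guide; other image versions may use different names.
EXPECTED = {"Python3.8-default", "Python3.8-Tensorflow-Pytorch", "Python3.8-AzureML"}


def env_names(conda_json: str) -> set:
    """Extract environment names (last path component) from `conda env list --json`."""
    paths = json.loads(conda_json).get("envs", [])
    return {os.path.basename(os.path.normpath(p)) for p in paths}


def missing_envs(conda_json: str, expected=EXPECTED) -> set:
    """Return the expected environment names that conda does not report."""
    return set(expected) - env_names(conda_json)


if __name__ == "__main__":
    conda = shutil.which("conda")
    if conda is None:
        print("conda is not on PATH; use an Anaconda Prompt or a login shell")
    else:
        result = subprocess.run([conda, "env", "list", "--json"],
                                capture_output=True, text=True, check=True)
        print("Missing:", sorted(missing_envs(result.stdout)) or "none")
```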

Activate the AzureML environment now:

conda activate Python3.8-AzureML

Run conda list | grep azureml to see all AzureML packages installed in this environment (on Windows, use conda list azureml, since grep isn't available in the Anaconda Prompt). A healthy output lists packages like azureml-core, azureml-dataset-runtime, azureml-train-core, and several others. If the list is empty, the environment exists but the AzureML packages were never installed; jump to Step 3.
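The same check can be done from Python, which has the advantage of inspecting exactly the interpreter a notebook kernel runs. This sketch relies only on importlib.metadata and treats any installed distribution whose name starts with azureml as part of the SDK.

```python
"""List azureml* packages visible to the current interpreter (sketch)."""
from importlib.metadata import distributions


def azureml_packages() -> dict:
    """Map each installed azureml* distribution name to its version, sorted by name."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("azureml"):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    pkgs = azureml_packages()
    if not pkgs:
        print("No azureml packages found in this interpreter; see Step 3")
    for name, ver in pkgs.items():
        print(f"{name}=={ver}")
```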

Step 3: Install or Reinstall the AzureML SDK in the Correct Environment

With Python3.8-AzureML activated, install the core AzureML packages. Don't install everything at once; start with what you actually need to avoid dependency conflicts:

pip install azureml-core
pip install azureml-dataset-runtime
pip install azureml-train-core
pip install azureml-mlflow

If you need AutoML support (and many DSVM users do), add:

pip install azureml-train-automl-client

If you're getting ERROR: pip's dependency resolver does not currently take into account all the packages that are installed, that's usually a conflict between the preinstalled packages in the conda environment and the newer SDK version. Fix it by pinning your install:

pip install "azureml-core>=1.48,<1.60" --force-reinstall

After installation finishes, do a quick sanity check:

python -c "
import azureml.core
from azureml.core import Workspace
print('AzureML SDK version:', azureml.core.VERSION)
print('Import successful')
"

If this runs without errors, the SDK is correctly installed. The next thing to test is whether you can actually connect to your Azure ML workspace. You'll need your subscription ID, resource group name, and workspace name, all available from the Azure portal under Machine Learning → your workspace → Overview. Keep those values handy for Step 4.

Step 4: Configure and Test Your Azure ML Workspace Connection from the DSVM

With the SDK installed, the next step is authenticating and connecting to your AzureML workspace. Create a Python script or run these lines in an interactive Python session:

from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# Interactive auth: opens a browser window for sign-in
auth = InteractiveLoginAuthentication(tenant_id="your-tenant-id")

ws = Workspace(
    subscription_id="your-subscription-id",
    resource_group="your-resource-group",
    workspace_name="your-workspace-name",
    auth=auth
)
print(ws.name, ws.location, ws.resource_group)

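If you're on a headless Ubuntu DSVM with no browser access, use device code authentication instead. One way is to sign in through the Azure CLI and let the SDK reuse that session:

az login --use-device-code
from azureml.core.authentication import AzureCliAuthentication
auth = AzureCliAuthentication()  # pass as auth=auth to Workspace(...)

az login prints a device code and a URL. Open that URL in any browser, enter the code, and sign in with your Azure account; the DSVM session is then authenticated. InteractiveLoginAuthentication also typically falls back to a device-code prompt in the terminal when no browser is available.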

After connecting, save a workspace config file so you don't need to re-authenticate every session:

ws.write_config(path="./", file_name="config.json")

In any later session, you can then load the workspace with:

from azureml.core import Workspace
ws = Workspace.from_config()
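Before relying on from_config(), it can help to sanity-check the file it will read. This sketch assumes the usual config.json layout with subscription_id, resource_group, and workspace_name keys, and only flags missing values or a subscription ID that isn't a GUID; it doesn't contact Azure.

```python
"""Sanity-check a workspace config.json before calling Workspace.from_config() (sketch)."""
import json
import os
import uuid

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def config_problems(text: str) -> list:
    """Return a list of problems with a config.json body; an empty list means it looks complete."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(cfg, dict):
        return ["top level is not a JSON object"]
    problems = [f"missing or empty '{key}'" for key in REQUIRED_KEYS if not cfg.get(key)]
    sub = cfg.get("subscription_id")
    if sub:
        try:
            uuid.UUID(str(sub))
        except ValueError:
            problems.append("subscription_id is not a GUID")
    return problems


if __name__ == "__main__":
    # Only the current directory is checked here; from_config() also searches parent folders.
    candidates = ["config.json", os.path.join(".azureml", "config.json")]
    found = [p for p in candidates if os.path.exists(p)]
    if not found:
        print("No config.json in the current directory")
    for path in found:
        with open(path, encoding="utf-8") as fh:
            print(path, "->", config_problems(fh.read()) or "looks complete")
```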

If you see ProjectSystemException: This request is not authorized to perform this operation, your Azure account doesn't have the Contributor or Owner role on the ML workspace. Have your Azure admin navigate to Azure portal → Machine Learning workspace → Access Control (IAM) and add your account as a Contributor.

Step 5: Validate GPU Detection and Register the Jupyter Kernel

If you're running on an N-series GPU VM, verify that PyTorch or TensorFlow can see the GPU. Both come preinstalled in the Python3.8-Tensorflow-Pytorch environment. Activate it and check:

conda activate Python3.8-Tensorflow-Pytorch
python -c "import torch; ok = torch.cuda.is_available(); print('CUDA available:', ok); print('GPU:', torch.cuda.get_device_name(0) if ok else 'none detected')"

You can also run the NVIDIA System Management Interface directly to confirm driver-level GPU visibility:

nvidia-smi

A healthy output shows your GPU model, driver version, CUDA version, and memory usage. If nvidia-smi returns command not found or NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver, the driver binding is broken, usually from a kernel update. This is the most common GPU issue on Ubuntu DSVMs after the OS applies automatic updates.
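To fold this check into a script, the sketch below separates the three outcomes: missing tool, failing driver, and healthy. It uses nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader); the particular field list is a choice made for this example.

```python
"""Classify NVIDIA driver health from nvidia-smi (sketch)."""
import shutil
import subprocess

QUERY = ["--query-gpu=name,driver_version,memory.total", "--format=csv,noheader"]


def parse_gpu_csv(csv_text: str) -> list:
    """Parse one GPU per line: '<name>, <driver_version>, <memory.total>'."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name attached to the name field
        name, driver, memory = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def gpu_status() -> str:
    """Return a one-line verdict: tool missing, driver failing, or the GPUs found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found: NVIDIA driver utilities are missing"
    proc = subprocess.run([exe, *QUERY], capture_output=True, text=True)
    if proc.returncode != 0:
        # Typical after a kernel update: the driver can no longer talk to the GPU.
        return "nvidia-smi failed: " + (proc.stderr or proc.stdout).strip()
    gpus = parse_gpu_csv(proc.stdout)
    if not gpus:
        return "nvidia-smi ran but listed no GPUs"
    return "; ".join(f"{g['name']} (driver {g['driver']}, {g['memory']})" for g in gpus)


if __name__ == "__main__":
    print(gpu_status())
```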

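To fix a broken NVIDIA driver after a kernel update on Ubuntu 20.04, reinstall the driver package. The 525 below is an example; check which driver package your image actually uses with dpkg -l | grep nvidia-driver and substitute that version: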

sudo apt-get install --reinstall nvidia-driver-525
sudo reboot

After the VM reboots, SSH back in and re-run nvidia-smi. It should return the GPU information now.

Finally, register your AzureML conda environment as a Jupyter kernel so you can select it in notebooks:

conda activate Python3.8-AzureML
python -m ipykernel install --user --name Python3.8-AzureML --display-name "Python 3.8 - AzureML"

Restart your Jupyter server, refresh the browser, and the kernel should now appear in the Kernel → Change Kernel… dropdown. Select it before running any AzureML notebook code.
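If the kernel still doesn't show up, check what Jupyter has actually registered. jupyter kernelspec list --json prints every kernel together with the command it launches. The sketch below looks up one name case-insensitively and reports the interpreter path, which should point inside the AzureML environment.

```python
"""Confirm the AzureML kernel is registered and see which interpreter it launches (sketch)."""
import json
import shutil
import subprocess


def find_kernel(kernelspec_json: str, wanted: str = "Python3.8-AzureML") -> dict:
    """Find a kernel by name (case-insensitive) in `jupyter kernelspec list --json` output."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    for name, info in specs.items():
        if name.lower() == wanted.lower():
            spec = info.get("spec", {})
            argv = spec.get("argv") or [""]
            return {"name": name, "display_name": spec.get("display_name"), "python": argv[0]}
    return {}


if __name__ == "__main__":
    jupyter = shutil.which("jupyter")
    if jupyter is None:
        print("jupyter is not on PATH in this environment")
    else:
        proc = subprocess.run([jupyter, "kernelspec", "list", "--json"],
                              capture_output=True, text=True)
        try:
            print(find_kernel(proc.stdout) or "Kernel not registered yet")
        except json.JSONDecodeError:
            print("Could not read kernelspec output:", proc.stderr.strip())
```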

Advanced Troubleshooting

If the five steps above didn't fully resolve your Azure Machine Learning Data Science Virtual Machine issues, here's where we go deeper. These scenarios usually come up in enterprise environments, domain-joined VMs, or situations where the DSVM has drifted significantly from its original image state.

Diagnosing AzureML SDK Errors with Verbose Logging

The AzureML SDK has a built-in logging system that most people never turn on. Before escalating anything, turn on verbose output:

import logging
logging.basicConfig(level=logging.DEBUG)
import azureml.core

This floods your console, but it shows where authentication is failing, which endpoint is timing out, and what HTTP status codes are being returned. Look for 401 Unauthorized (authentication problem), 403 Forbidden (RBAC/permissions problem), or 503 Service Unavailable (possibly a regional Azure incident; check status.azure.com).
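If the console output is too noisy, you can route the debug records to a file instead. This sketch assumes the SDK's loggers are named under azureml, and it adds urllib3 because that HTTP library (used underneath requests) logs each request line, including the status code, at DEBUG level. The status hints simply restate the triage above.

```python
"""Send SDK and HTTP debug logs to a file, and triage common status codes (sketch)."""
import logging

STATUS_HINTS = {
    401: "authentication problem (token missing, expired, or for the wrong tenant)",
    403: "RBAC/permissions problem on the workspace or subscription",
    503: "service unavailable; check status.azure.com for a regional incident",
}


def enable_debug_log(path: str = "azureml-debug.log",
                     logger_names=("azureml", "urllib3")) -> logging.Handler:
    """Attach one DEBUG-level file handler to each named logger tree and return it."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
    return handler


def explain_status(code: int) -> str:
    """Map an HTTP status code from the log to a first diagnosis."""
    return STATUS_HINTS.get(code, "not one of the common cases; read the surrounding log lines")
```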

Fixing Proxy and Network Connectivity Issues

In corporate networks, the DSVM may be behind a proxy that blocks outbound HTTPS to Azure endpoints. The AzureML SDK needs to reach login.microsoftonline.com (sign-in), management.azure.com (Azure Resource Manager), *.azureml.ms, and *.blob.core.windows.net. If any of these are blocked, set your proxy environment variables before running Python:

export HTTPS_PROXY=http://your-corporate-proxy:8080
export HTTP_PROXY=http://your-corporate-proxy:8080

On Windows Server DSVM, set these via System Properties → Advanced → Environment Variables as system-level variables, then restart the Jupyter server.
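If you can't change system-wide settings, the variables can also be set for a single Python process before the SDK is imported. This sketch sets both upper- and lowercase names (different tools read different cases) and returns what urllib.request.getproxies() now resolves; the proxy URL is a placeholder.

```python
"""Set proxy variables for the current Python process only (sketch)."""
import os
import urllib.request


def use_proxy(proxy_url: str) -> dict:
    """Export HTTP(S)_PROXY in both cases and return the proxies urllib now resolves."""
    for var in ("HTTPS_PROXY", "HTTP_PROXY", "https_proxy", "http_proxy"):
        os.environ[var] = proxy_url
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Hypothetical address: replace with your corporate proxy, then import azureml.core.
    print(use_proxy("http://your-corporate-proxy:8080"))
```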

Comparing DSVM vs. Azure ML Compute Instance for Enterprise Teams

I see a lot of enterprise teams bang their heads against DSVM permission issues that they'd never encounter on Compute Instances. Here's the core difference: the DSVM is an unmanaged VM with no built-in SSO, no built-in hosted notebooks, and no automatic environment management. You have to wire everything up yourself. Azure Machine Learning Compute Instances handle all of that natively, including built-in SSO and hosted notebooks, plus they're managed by the Azure ML service so you get automatic security patches without breaking your GPU drivers.

If your team is spending more than 30 minutes per week fighting DSVM environment problems, the honest answer is to migrate to Compute Instances. They support both Python and R, run Ubuntu, have SSH access, and can scale up to N-series GPU VMs. The only things you lose are RDP access (Compute Instances don't support RDP), preinstalled desktop apps like Power BI Desktop and Microsoft Office, and some preconfigured specialty tools such as Julia. For pure ML workloads, that trade-off is almost always worth it.

Event Log Analysis on Windows Server DSVM

On Windows Server 2019 or 2022 DSVM, open Event Viewer (eventvwr.msc) and navigate to Windows Logs → Application. Filter by Error level and look for source names like Python, conda, or pip. Event ID 1000 (Application Error) with a faulting module of python38.dll often indicates a corrupted Python installation. In that case, reinstall the environment from scratch using the conda environment YAML files stored in C:\dsvm\tools\.

When to Call Microsoft Support
Escalate to Microsoft Support if: (1) your DSVM fails to provision despite correct quota and permissions, since Azure infrastructure issues need Microsoft's backend team; (2) nvidia-smi fails even after a driver reinstall on a confirmed N-series VM, which can indicate a hardware-level GPU assignment failure in the Azure hypervisor; or (3) you're on an Enterprise Agreement and hitting workspace-level RBAC issues that your Azure admin can't resolve, since Microsoft's Azure ML team has dedicated support engineers for EA customers. For everything else in this guide, you should be able to self-resolve.

Prevention & Best Practices

Once your Azure Machine Learning Data Science Virtual Machine is working correctly, the goal is to keep it that way. The DSVM is an unmanaged image, meaning Microsoft gives you a great starting point, but long-term maintenance is on you. Here's what I recommend doing from day one.

Pin your conda environments before updating anything. Export your working environment configuration immediately after getting a clean setup:

conda activate Python3.8-AzureML
conda env export > azureml-env-backup.yml

Store that YAML file somewhere outside the VM (Azure Blob Storage works great). If a pip upgrade breaks your environment, you can recreate it exactly with conda env create -f azureml-env-backup.yml.

Disable automatic OS updates on Ubuntu DSVM if you're using GPU workloads. The number one killer of GPU driver stability on Ubuntu DSVM is unattended-upgrades applying a kernel update that breaks the NVIDIA driver binding. You can disable automatic kernel updates while keeping security patches:

sudo nano /etc/apt/apt.conf.d/50unattended-upgrades

Add linux-image and linux-headers to the Unattended-Upgrade::Package-Blacklist section. This keeps your kernel stable while still getting security patches for other packages.
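The entries go inside that block as quoted strings, each followed by a semicolon (for example, "linux-image";). To confirm the edit took, this sketch extracts the uncommented entries from the block. It assumes the stock file layout and ignores /* */ comments.

```python
"""List the packages unattended-upgrades is told to skip (sketch)."""
import os
import re

CONF = "/etc/apt/apt.conf.d/50unattended-upgrades"


def blacklisted(conf_text: str) -> list:
    """Return quoted, uncommented entries inside the Unattended-Upgrade::Package-Blacklist block."""
    block = re.search(r"Unattended-Upgrade::Package-Blacklist\s*\{(.*?)\};", conf_text, re.S)
    if block is None:
        return []
    entries = []
    for line in block.group(1).splitlines():
        code = line.split("//", 1)[0]  # drop apt-style // comments
        entries.extend(re.findall(r'"([^"]+)"', code))
    return entries


if __name__ == "__main__":
    if os.path.exists(CONF):
        with open(CONF, encoding="utf-8") as fh:
            present = blacklisted(fh.read())
        missing = [p for p in ("linux-image", "linux-headers") if p not in present]
        print("Blacklisted:", present)
        print("Still missing:", missing or "nothing")
    else:
        print(CONF, "not found (unattended-upgrades may not be installed)")
```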

Use the AzureML workspace config file, not hardcoded credentials. Never put subscription IDs, tenant IDs, or access keys directly in notebook cells. Always use ws = Workspace.from_config() pointing to a config.json that's excluded from any version control with a proper .gitignore.
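A small helper can make the .gitignore rule hard to forget. This sketch appends an exact config.json line if one isn't already there. Because the pattern has no slash, Git ignores any file named config.json at any depth (including .azureml/config.json), which may be broader than you want in repositories that track other config.json files.

```python
"""Make sure config.json stays out of version control (sketch)."""
from pathlib import Path


def ensure_gitignored(repo_root: str, pattern: str = "config.json") -> bool:
    """Add `pattern` to <repo_root>/.gitignore unless an identical line already exists.

    Returns True if the file was changed. Only exact lines are compared; wildcard
    rules that might already cover the file are not evaluated.
    """
    gitignore = Path(repo_root) / ".gitignore"
    lines = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    lines.append(pattern)
    gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Call it once per repository, for example ensure_gitignored("path/to/your/repo") with your own (here hypothetical) path.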

Choose the right DSVM flavor for your workload upfront. If you need deep learning with GPU, you need Ubuntu 20.04 DSVM with an N-series VM size. Windows Server DSVM gives you GPU support too, but deep learning on GPUs is specifically optimized on the Ubuntu edition. If you need the full pre-installed desktop software suite (Power BI Desktop, Microsoft Office 365, SQL Server Management Studio), that's Windows-only. Make this decision before provisioning: migrating between DSVM OS flavors means creating a new VM.

Quick Wins
  • Export conda environment YAML files to Azure Blob Storage right after initial setup; they're your recovery plan if environments break
  • Run nvidia-smi after every OS update to catch GPU driver breaks before they block your work
  • Always open Jupyter from within an activated conda environment rather than from a system-level Jupyter install
  • Set a VM auto-shutdown schedule in the Azure portal (VM → Auto-shutdown) to avoid runaway compute costs when you forget to deallocate

Frequently Asked Questions

What tools does the Azure Data Science Virtual Machine actually include?

The DSVM ships with a seriously deep toolset. On all three current flavors (Ubuntu 20.04, Windows Server 2019, Windows Server 2022) you get PyTorch, TensorFlow, XGBoost with CUDA support, Vowpal Wabbit, Apache Spark 3.1 standalone, Anaconda Python, CRAN-R with popular packages, Julia, JupyterLab, Visual Studio Code, Git, OpenJDK 11, and the full Azure SDK. On Ubuntu specifically you also get LightGBM with GPU and MPI support, H2O, CatBoost, Intel MKL, OpenCV, Dlib, NCCL, and Horovod, making Ubuntu the better choice for distributed deep learning workloads. Windows Server adds Microsoft Office 365, Power BI Desktop, Visual Studio 2019, SQL Server Management Studio, and Microsoft Teams. The full tool list is documented in Microsoft's official DSVM tools page, but the above covers 90% of what data scientists actually use day to day.

What's the difference between an Azure DSVM and an Azure Machine Learning Compute Instance?

The biggest difference is managed vs. unmanaged. A Compute Instance is fully managed by the Azure ML service: Microsoft handles security patches, environment updates, and built-in SSO. A DSVM is an unmanaged VM image you control entirely, which gives you more flexibility but also more responsibility. Compute Instances include hosted notebooks and collaboration features out of the box; the DSVM requires additional configuration for both. Compute Instances don't support RDP and don't have desktop apps like Office 365 or Power BI pre-installed. If your primary workload is ML model training and you want less infrastructure management overhead, Compute Instances win. If you need the broader software ecosystem, custom kernel-level configurations, or Windows desktop apps alongside your data science tooling, the DSVM is the right call.

Why is my Jupyter kernel showing "Python 3" instead of the AzureML environment?

This means the AzureML conda environment hasn't been registered as a Jupyter kernel yet, or the registration was lost after a Jupyter update. The fix is to activate the environment and register it manually: run conda activate Python3.8-AzureML, then python -m ipykernel install --user --name Python3.8-AzureML --display-name "Python 3.8 - AzureML". Restart your Jupyter server after running that command. Once you reload the browser, go to Kernel → Change Kernel… and you should see "Python 3.8 - AzureML" as an option. Always verify the active kernel before running any cell that imports from azureml.core; mismatched kernels are the most common cause of "module not found" errors on the DSVM.

How do I set up my DSVM for the first time after creating it in the Azure portal?

After provisioning finishes (typically 10–20 minutes), connect via SSH (Ubuntu) or RDP (Windows Server) using the admin credentials you entered during VM creation. On Windows, the installed tools are available from the Start menu as tiles and application shortcuts. On Ubuntu, open a terminal and run conda env list to confirm your three base environments are present, then activate Python3.8-AzureML and install any additional SDK packages you need. The Azure portal has a Connect button on your VM's Overview page that shows the exact SSH command and lets you download the RDP file. From there, explore tools via the Start menu on Windows or run ls /dsvm/tools/ on Ubuntu to see what's available.

Why can't I use GPU SKUs on my Azure free account with the DSVM?

Azure free accounts explicitly don't support GPU-enabled virtual machine SKUs. This is a hard platform limitation, not something you can work around. N-series VMs (the GPU-enabled NC, ND, and NV series) require a paid subscription. If you need GPU compute for deep learning on the DSVM, you'll need to either upgrade your Azure subscription to Pay-As-You-Go or set up an Enterprise Agreement. Once you're on a paid tier, you may also need to request a vCPU quota increase for N-series VMs in your target region, since those quotas start at zero by default in many regions. Go to Subscriptions → your subscription → Usage + quotas and request an increase for the specific N-series family you need.

How do I connect my DSVM to Azure Machine Learning and use it for experiment tracking?

The integration between the DSVM and Azure Machine Learning works through the Azure ML Python SDK, which is pre-installed in the Python3.8-AzureML conda environment. Start by activating that environment, then connect with Workspace.from_config() (if you have a config.json) or Workspace(subscription_id=..., resource_group=..., workspace_name=...) directly. From there you can create an experiment with experiment = Experiment(ws, "experiment-name"), log metrics with run = experiment.start_logging() and run.log("metric_name", value), register datasets, manage models, and submit training runs to remote compute clusters, all with your DSVM as the control plane. The DSVM also supports the Azure ML CLI, which you can use from the terminal to submit jobs without writing Python: az ml job create --file train.yml (pass --resource-group and --workspace-name, or set them as defaults with az configure). Integration with Azure Machine Learning is available across all three DSVM editions (Ubuntu, Windows Server 2019, and Windows Server 2022) via the Python SDK, CLI, and sample notebooks.
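As a concrete version of that flow, here is a minimal SDK v1 sketch that logs metrics to an experiment from the DSVM. It assumes azureml-core is installed in the active environment and that Workspace.from_config() can find your config.json. The experiment name is hypothetical, and the import sits inside the function so the module can be loaded where the SDK isn't available.

```python
"""Log a few metrics from the DSVM to an Azure ML experiment (SDK v1 sketch)."""


def log_smoke_test(metrics: dict, experiment_name: str = "dsvm-smoke-test") -> str:
    """Start an interactive run, log each metric, mark the run complete, and return its id.

    'dsvm-smoke-test' is a hypothetical experiment name; requires azureml-core
    and a config.json that Workspace.from_config() can find.
    """
    # Imported lazily so this file can be imported where the SDK isn't installed.
    from azureml.core import Experiment, Workspace

    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name=experiment_name)
    run = experiment.start_logging()
    try:
        for name, value in metrics.items():
            run.log(name, value)
    except Exception:
        run.fail()  # mark the run as failed in the workspace, then re-raise
        raise
    run.complete()
    return run.id
```

From a notebook on the Python3.8-AzureML kernel, a call such as log_smoke_test({"accuracy": 0.91}) (an illustrative value) should create a completed run you can open in Azure ML studio.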


Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.