How to Troubleshoot Azure Synapse Analytics Dedicated SQL
Why This Is Happening
I've spent years troubleshooting Azure Synapse Analytics Dedicated SQL pool issues across enterprise environments, and I can tell you this: it's one of the most deceptively complex services Microsoft offers. You've got a massively parallel processing (MPP) engine, a control node coordinating dozens of compute nodes, distributed storage via Azure Data Lake, and a connection layer sitting on top of all of it. When something breaks , and it will , the error messages you get are often maddeningly vague. "A connection attempt failed." "Query was cancelled." "The operation could not be completed." Thanks, Azure. Super helpful.
The reality is that Azure Synapse Analytics Dedicated SQL pool problems fall into a handful of distinct categories, each with their own root cause. Connection failures are the most common complaint I hear, and they're usually caused by one of three things: the pool is paused (yes, Dedicated SQL pools auto-pause by default and cost you nothing while paused, but they don't tell you that's why you can't connect), a firewall rule is missing or stale, or an Azure Active Directory (now Microsoft Entra ID) authentication token has expired. None of these produce a clear error message that points you directly at the fix.
Query performance issues are the second most common category. Your Azure Synapse dedicated SQL pool query timeout might stem from a resource class that's too small, outdated statistics causing the query optimizer to make terrible decisions, a workload group that's hit its concurrency slot ceiling, or a tempdb spill because your query ran out of in-memory space. Each of these needs a completely different fix.
Then there are the deeper infrastructure problems: DWU (Data Warehouse Unit) scaling stuck in a transitioning state, distributed query errors with error code 110802 or 104570, CETAS (CREATE EXTERNAL TABLE AS SELECT) failures due to storage account firewall rules, and deadlocks caused by incompatible transaction isolation levels across sessions.
Who sees these issues most? Primarily data engineers running ETL pipelines, analysts running ad-hoc queries against large fact tables, and Azure architects connecting BI tools like Power BI or Tableau to a dedicated SQL pool endpoint. If that's you, you're in the right place. Browse all Microsoft fix guides →
The frustrating thing about Azure Synapse Analytics Dedicated SQL troubleshooting is that Microsoft's built-in error messages rarely tell you why something failed, they tell you what failed, usually in language designed for developers reading documentation, not for engineers who just need to get their pipeline running again at 2 AM. This guide is built for the 2 AM scenario.
The Quick Fix, Try This First
Before you go deep into diagnostics, run through this fast checklist. I've seen it resolve about 60% of Azure Synapse Analytics Dedicated SQL issues in under five minutes.
Step 1: Is your pool actually running? Open the Azure Portal, navigate to your Synapse workspace, click SQL pools under the Analytics pools section, and check the Status column. If it says Paused, that's your entire problem. Click the pool name, then click Resume. It takes 3–5 minutes to come back online. While it's resuming, the status shows as Resuming, don't try to connect until it hits Online.
Step 2: Check your server firewall. Go to your Synapse workspace in the portal, click Networking in the left sidebar, and look at the firewall rules. If your client IP isn't listed, or if "Allow Azure services and resources to access this workspace" is toggled off, your connection will fail silently. Add your IP range and save.
Step 3: Test the connection string directly. Open SSMS (SQL Server Management Studio) or Azure Data Studio, and connect to your dedicated SQL pool endpoint, it looks like yourworkspace.sql.azuresynapse.net, not the serverless endpoint which uses yourworkspace-ondemand.sql.azuresynapse.net. I can't count how many times someone has been trying to run a workload against the wrong endpoint. Those two endpoints are completely different engines with different billing, different performance, and different SQL feature support.
Step 4: If queries are hanging, check concurrency slots. Run this in your pool:
SELECT * FROM sys.dm_pdw_waits WHERE state = 'queued';
If rows come back, your queries are queued waiting for concurrency slots. You'll need to either scale up your DWU, move queries to a smaller resource class, or wait for the current workload to finish.
This sounds obvious, but it trips up experienced engineers constantly. Azure Synapse Analytics Dedicated SQL pools have an auto-pause feature that's enabled by default during workspace creation. Unlike Azure SQL Database's serverless tier, which resumes in under a second, a Dedicated SQL pool takes 3 to 8 minutes to resume depending on its DWU tier and whether it has any pending distributed transactions to clean up.
To check and resume via the portal: navigate to portal.azure.com → Your Synapse Workspace → SQL pools (under Analytics pools) → Click your pool name → Overview → Resume button. If the Resume button is greyed out, the pool may already be in a transitioning state. Check the Activity Log for any stuck operations.
You can also resume via PowerShell, which is faster if you're scripting this:
Resume-AzSynapseSqlPool `
-WorkspaceName "yourworkspacename" `
-Name "yoursqlpoolname" `
-ResourceGroupName "yourresourcegroup"
Or via Azure CLI:
az synapse sql pool resume `
--name yoursqlpoolname `
--workspace-name yourworkspacename `
--resource-group yourresourcegroup
If the pool status is stuck on Scaling or Pausing for more than 30 minutes, that indicates a hung internal operation. You can't force-cancel it from the portal. Open a support ticket, this is a platform-side issue that requires Microsoft's backend team to reset the pool state. Do not try to delete and recreate the pool without backing up your data first.
Once the status reads Online, try your connection again. You should see a successful connection within 10 seconds. If the pool shows Online but connections still fail, proceed to Step 2.
Azure Synapse Analytics Dedicated SQL pool firewall rules live at the workspace level, not the pool level, this confuses a lot of people. If you've set firewall rules on the old SQL Server resource (for workspaces created before November 2020), those rules don't apply to newer workspace-managed endpoints.
Navigate to: Your Synapse Workspace → Networking → Firewall rules. You need to add a rule for your client's IP address. If you're connecting from a corporate network that uses NAT, you'll need your public egress IP, not your internal 192.168.x.x address. To find your public IP, open PowerShell on the client machine and run:
(Invoke-WebRequest -Uri "https://api.ipify.org").Content
For connections from Azure services, including Azure Data Factory, Azure Databricks, or Power BI gateways, toggle on "Allow Azure services and resources to access this workspace". This opens the Azure backbone network to your workspace. Note that this is a broad permission; if you're in a regulated environment, use managed private endpoints instead.
If your workspace is behind an Azure Virtual Network with a managed VNet integration, the setup is different. You need to create a managed private endpoint for the Dedicated SQL pool: go to Workspace → Managed private endpoints → + New, select the Azure Synapse Analytics (Dedicated SQL) target type, and approve the private endpoint connection request in the target resource's Private endpoint connections blade.
After updating firewall rules, wait about 60 seconds for changes to propagate. Then test connectivity using SSMS or this command-line test:
Test-NetConnection -ComputerName yourworkspace.sql.azuresynapse.net -Port 1433
You want to see TcpTestSucceeded: True. If you get False, the traffic is being blocked before it even reaches Azure, check your local firewall, NSG (Network Security Group) rules, and proxy settings.
Azure Synapse Analytics Dedicated SQL pool query performance problems almost always come back to one of four things: the wrong resource class, stale statistics, missing or incorrect distribution keys, or runaway tempdb usage. Let's tackle them systematically.
Check active requests and waits:
-- See all active and queued requests
SELECT r.request_id, r.status, r.submit_time,
r.resource_class, r.command
FROM sys.dm_pdw_exec_requests r
WHERE r.status NOT IN ('Completed','Failed','Cancelled')
ORDER BY r.submit_time;
Check if statistics are stale on your key tables. Outdated statistics are the single most common cause of Azure Synapse dedicated SQL pool slow query performance I've seen in production:
-- Find tables with no statistics or very old stats
SELECT OBJECT_NAME(object_id) AS table_name,
name AS stats_name,
stats_date(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE OBJECT_NAME(object_id) NOT LIKE 'sys%'
ORDER BY last_updated ASC;
Update stats on a problematic table:
UPDATE STATISTICS [dbo].[YourLargeTable];
Check resource class assignment: If a user is running large queries on the default smallrc resource class, they only get a tiny slice of memory. Move them up:
EXEC sp_addrolemember 'largerc', 'yourusername';
After these changes, re-run the problem query. For queries doing full table scans on large distributed tables, add OPTION (LABEL = 'your_query_label') to the end so you can track it specifically in DMVs. A noticeable improvement in execution time confirms the fix. If the query still hangs, proceed to Step 4.
Azure Synapse Analytics Dedicated SQL pools use a concurrency slot system. Every DWU tier gets a fixed number of concurrency slots, DW100c gives you 4, DW500c gives you 20, DW2000c gives you 80, and so on up to DW30000c with 120 slots. Every query consumes slots based on its resource class: smallrc uses 1 slot, mediumrc uses 4, largerc uses 8, and xlargerc uses 16. When all slots are taken, new queries queue up and wait.
This is a very common cause of the Azure Synapse dedicated SQL pool query timeout error that pipelines hit during peak hours.
Check your current slot usage:
SELECT SUM(CASE WHEN status = 'running'
THEN concurrency_slots_used ELSE 0 END) AS slots_in_use,
SUM(CASE WHEN status = 'queued'
THEN concurrency_slots_used ELSE 0 END) AS slots_queued
FROM sys.dm_pdw_resource_waits;
Identify which sessions are consuming the most slots:
SELECT session_id, request_id, concurrency_slots_used,
resource_class, command
FROM sys.dm_pdw_resource_waits
WHERE status = 'running'
ORDER BY concurrency_slots_used DESC;
If a runaway session is blocking everything else, you can kill it:
KILL [session_id];
For a more permanent fix, create a Workload Group with explicit resource limits and a Workload Classifier to route specific query types or users into it. This is the modern replacement for the older resource class system:
CREATE WORKLOAD GROUP ETLLoads
WITH (
MIN_PERCENTAGE_RESOURCE = 30,
CAP_PERCENTAGE_RESOURCE = 60,
REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10
);
CREATE WORKLOAD CLASSIFIER ETLClassifier
WITH (
WORKLOAD_GROUP = 'ETLLoads',
MEMBERNAME = 'etlserviceaccount'
);
Once the classifier is in place, queries from etlserviceaccount will be capped at 60% of resources and won't be able to starve out other users. You should see queued queries begin executing within the next few minutes after applying these limits.
I know this is frustrating, especially when authentication failures block your entire pipeline. The most common Azure Synapse Analytics Dedicated SQL authentication errors I've diagnosed fall into two buckets: wrong endpoint, and expired or misconfigured Entra ID (formerly Azure AD) tokens.
Correct connection string format for a Dedicated SQL pool:
Server=tcp:yourworkspace.sql.azuresynapse.net,1433;
Initial Catalog=YourDedicatedSQLPoolName;
Persist Security Info=False;
User ID=sqladmin;
Password=YourPassword;
MultipleActiveResultSets=False;
Encrypt=True;
TrustServerCertificate=False;
Connection Timeout=30;
Note: Initial Catalog must match your SQL pool name exactly, not your workspace name, not "master". Connecting to "master" on a Synapse dedicated endpoint will fail for most user accounts since only the SQL admin has access to master.
For Microsoft Entra ID (Azure AD) authentication failures, the most common cause is that the client app's service principal hasn't been added to the SQL pool as a contained database user. Run this as the SQL admin:
-- Run inside your Dedicated SQL pool database (not master)
CREATE USER [your-service-principal-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-service-principal-name];
ALTER ROLE db_datawriter ADD MEMBER [your-service-principal-name];
For token expiry issues in long-running ADF pipelines or notebooks, check if your linked service is using a managed identity or service principal with a secret that may have expired. In ADF, open the linked service, click Test connection, a failure here with error code UserErrorFailedToConnectToSqlServer or InvalidAuthenticationTokenTenant almost always means token or credential expiry.
After creating the contained user and verifying the connection string, re-test. A successful SSMS connection showing your database tables in Object Explorer confirms the auth fix worked.
Advanced Troubleshooting
If the steps above haven't resolved your Azure Synapse Analytics Dedicated SQL issue, you're dealing with something deeper. Here's where I take it when the standard fixes don't work.
Analyzing Distributed Query Failures with Error Codes 110802 and 104570
Error 110802 (An internal DMS error occurred that caused this operation to fail) and 104570 (A DMS component encountered an error) indicate a problem in the Data Movement Service, the engine that shuffles data between compute nodes during distributed queries. These usually surface when you're running CTAS (CREATE TABLE AS SELECT) operations or queries that require a large broadcast or shuffle operation.
Check the DMS error details:
SELECT request_id, step_index, operation_type,
status, error_id, PDMS_error_count,
end_compile_time
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID[your_request_id]'
ORDER BY step_index;
If PDMS_error_count is non-zero on a ShuffleMoveOperation or BroadcastMoveOperation step, the data movement itself is failing. Common causes: the source data has rows with null values in a distribution column (breaks hash distribution), or the result set is larger than the available memory buffer. Solution: explicitly specify a distribution key in your CTAS that avoids data skew, and ensure statistics are current.
Investigating with Event Viewer and Azure Monitor Logs
Azure Synapse diagnostic logs can be routed to a Log Analytics workspace. Once enabled, you can query them. In Azure Monitor Logs:
// Find all failed dedicated SQL pool requests in the last 4 hours
SynapseSqlPoolRequestSteps
| where TimeGenerated > ago(4h)
| where Status == "Failed"
| project TimeGenerated, RequestId, Command, Error
| order by TimeGenerated desc
Domain-Joined and Enterprise Scenarios
If your organization uses Entra ID Conditional Access policies, connections from non-compliant devices or from outside the corporate network may be blocked silently. The error you'll see client-side is AADSTS53003 or AADSTS50158. Check with your identity team, the fix is usually an exclusion policy for the Synapse workspace application ID in Conditional Access, or ensuring the connecting machine is Entra ID-joined and compliant.
For IP-restricted scenarios where you need to connect from an Azure SSIS Integration Runtime or a self-hosted IR behind a corporate proxy, you need to whitelist the IR's public IP in the Synapse workspace firewall. Get the IR's public IP from ADF Studio under Manage → Integration runtimes → [Your IR] → Nodes.
Tempdb Spills and OOM Errors
If your queries fail with Query processor ran out of internal resources and could not produce a query plan (error 8622) or you see tempdb usage climbing in DMVs, the fix is a larger resource class, or restructuring the query to use fewer intermediate result sets. Check tempdb usage per node:
SELECT pdw_node_id, used_page_count * 8.0 / 1024 AS used_mb
FROM sys.dm_db_partition_stats
WHERE OBJECT_ID = OBJECT_ID('tempdb..SYS_QUERY_STORE_QUERY_TEXT');
Escalate to Microsoft Support when: your pool is stuck in a Scaling or Pausing state for more than 30 minutes; you're seeing error 40613 (database unavailable) repeatedly despite the pool showing Online; you've hit a capacity limit and DWU scaling isn't available in your region; or you suspect data loss after a service interruption. Always note your workspace name, SQL pool name, the request IDs of failing queries (from sys.dm_pdw_exec_requests), and the approximate UTC time of the failure before you open the ticket, it saves hours.
Prevention & Best Practices
Once you've fixed the immediate Azure Synapse Analytics Dedicated SQL issue, here's how you make sure you're not back here next week.
Set up auto-pause and auto-resume thoughtfully. Auto-pause is great for dev and test environments where you're not running 24/7 workloads, but in production you often want the pool always on, the 3–8 minute resume delay breaks SLAs. If you're running scheduled pipelines, configure them to resume the pool explicitly as the first step using an Azure Data Factory Web activity calling the REST API, rather than relying on the auto-resume triggered by a failed connection attempt.
Keep statistics current. Schedule a weekly maintenance job that runs UPDATE STATISTICS on your largest and most-queried tables. Synapse doesn't auto-update stats the way SQL Server does, you have to manage this yourself. I recommend a simple stored procedure wrapped in an ADF pipeline triggered every Sunday at 2 AM.
Right-size your DWU tier proactively. Use Azure Monitor metrics, specifically DWU Used Percentage and Tempdb Space Used, to set alerts. If DWU utilization consistently hits 85%+ during business hours, scale up during those windows using an automation runbook. Scaling takes 1–2 minutes, is non-disruptive to active sessions (they complete normally), and you're only charged for what you use.
Design tables with distribution in mind. Hash-distribute your large fact tables on high-cardinality columns that appear frequently in JOIN conditions, usually a date key or customer ID. Round-robin is fine for staging tables. Never hash-distribute on a column with significant null values or low cardinality (like a boolean flag), this creates severe data skew that causes exactly the DMS failures described in the advanced section above.
Monitor your workload classifier rules. As teams grow and new service accounts are added, workload classifiers can become stale or have gaps. Review sys.workload_management_workload_classifiers quarterly to ensure every production service account is properly classified and won't accidentally run at smallrc on a heavy job.
- Enable diagnostic logging to Log Analytics on Day 1, retroactive log collection isn't possible and you'll always regret not having it during an incident
- Create a dedicated SQL login for each application workload rather than sharing the SQL admin account, this makes per-workload auditing and resource class assignment far easier
- Set an Azure Monitor alert on the
FailedConnectionsmetric with a threshold of 5 in 5 minutes, it'll catch pool pauses and auth failures before your users do - Tag all your CTAS and INSERT...SELECT statements with
OPTION (LABEL = 'pipeline_name_step'), when something hangs at 3 AM, you'll immediately know which pipeline step is the culprit without reverse-engineering the SQL hash
Frequently Asked Questions
Why can't I connect to my Azure Synapse Dedicated SQL pool even though it shows as Online in the portal?
The most common reason is a missing or incorrect firewall rule. Even when the pool is Online, Azure blocks all inbound TCP connections on port 1433 unless your source IP is explicitly whitelisted in the Synapse workspace Networking settings. Go to your Synapse workspace in the portal, click Networking, and add your current public IP address as a firewall rule. If you're connecting from an Azure service like ADF or Databricks, toggle on "Allow Azure services and resources to access this workspace." Save the change, wait 60 seconds, and retry. If still failing, run Test-NetConnection yourworkspace.sql.azuresynapse.net -Port 1433 from your client to confirm network-level connectivity.
My Azure Synapse dedicated SQL pool query has been running for hours, how do I find out what's wrong?
First, grab the request ID: run SELECT request_id, status, command FROM sys.dm_pdw_exec_requests WHERE status = 'running'. Then drill into the specific steps: SELECT * FROM sys.dm_pdw_request_steps WHERE request_id = 'QID[your_id]' ORDER BY step_index. Look for any step with status running for an unusually long time, particularly ShuffleMoveOperation or BroadcastMoveOperation steps, which indicate data movement bottlenecks. Check if statistics are stale on the tables involved (DBCC SHOW_STATISTICS), and verify the query isn't waiting on a concurrency slot by querying sys.dm_pdw_waits. If the query is genuinely stuck and blocking others, KILL [session_id] will cancel it without affecting other sessions.
What's the difference between Azure Synapse Dedicated SQL and Serverless SQL, and does it matter for troubleshooting?
It matters enormously. Dedicated SQL is a provisioned, always-allocated MPP cluster, you pay by the hour regardless of query activity (though you can pause it). Serverless SQL is a query-on-demand engine that reads directly from Azure Data Lake Storage and charges per terabyte scanned. They use different endpoints: Dedicated uses yourworkspace.sql.azuresynapse.net and Serverless uses yourworkspace-ondemand.sql.azuresynapse.net. They have different SQL feature support (Dedicated supports DML, DDL, transactions, and external tables; Serverless supports only read operations against external data sources), different performance tuning levers, and entirely different troubleshooting approaches. Most connection errors people think are Dedicated SQL problems turn out to be the person accidentally connecting to the wrong endpoint.
How do I fix "Login failed for user" errors when using Azure Active Directory / Entra ID with Synapse Dedicated SQL?
This almost always means the Entra ID user or service principal hasn't been provisioned as a contained database user inside the Dedicated SQL pool database. Connect as the SQL admin and run: CREATE USER [user@domain.com] FROM EXTERNAL PROVIDER; and then grant the appropriate role like ALTER ROLE db_datareader ADD MEMBER [user@domain.com]. For service principals, use the application display name (not the app ID). If the user already exists but login still fails, check your organization's Conditional Access policies, error codes like AADSTS53003 in the detailed error indicate the device or network isn't meeting a policy requirement, and you'll need your Azure AD admin to create a policy exclusion or compliant device enrollment.
My Azure Synapse Dedicated SQL pool is stuck "Scaling" and won't finish, what do I do?
A pool stuck in a Scaling state for more than 30 minutes indicates a hung internal platform operation, and there's unfortunately nothing you can do from the portal or CLI to unstick it. Azure is performing the scale operation on the backend infrastructure and has hit an issue completing the distributed transaction or state synchronization involved. Your first step is to check the Activity Log in the portal for any associated error events, occasionally a scale failure will log a specific infrastructure error code that support can act on faster. Then open a Microsoft support ticket via Help + support → New support request, selecting Azure Synapse Analytics as the service and Scaling as the problem type. Provide your workspace name, pool name, the DWU tier you're scaling from and to, and the exact UTC time the scaling started. Do not delete the pool, your data is in Azure Data Lake and is safe, but delete operations on a transitioning resource can sometimes compound the issue.
How do I prevent data skew in Azure Synapse Dedicated SQL pool and why does it cause such bad performance?
Data skew happens when your hash distribution column has uneven value distribution, for example, if 40% of your rows have the same customer ID, all those rows land on the same compute node while others sit idle. The result: that one overloaded node becomes a bottleneck for every query touching that table, and you'll see consistently slow Azure Synapse Analytics Dedicated SQL query performance that doesn't improve even when you add DWUs. Diagnose it with: DBCC PDW_SHOWSPACEUSED('dbo.YourTable'), if the row counts per distribution vary by more than 20%, you have skew. The fix is to recreate the table with a better distribution column using CTAS: CREATE TABLE dbo.YourTable_New WITH (DISTRIBUTION = HASH(BetterColumn)) AS SELECT * FROM dbo.YourTable. Choose a column with high cardinality (many unique values), low nulls, and frequent use in JOIN predicates. A good distribution key is the single highest-leverage performance decision you'll make in Synapse.