SQL Server Errors, Connection, Throttling, and Query Performance Fixes
Why This Is Happening
You opened SQL Server Management Studio this morning, fired up a query that ran fine last week, and now you're staring at a red error message. Or maybe your Analysis Services instance just crashed overnight and nobody knows why. Or your MDX queries are returning either wrong data or nothing at all. I've been in that exact seat, and I know how disorienting it is, especially when the error message reads like it was written to confuse you on purpose.
SQL Server errors span a wide range. Some are surface-level connection problems. Others are buried deep in transaction isolation mismatches, Analysis Services (SSAS) processing failures, or disk exhaustion events that look exactly like a crash but aren't quite. The tricky part is that SQL Server doesn't always tell you what went wrong, it tells you what it observed at the moment things fell apart, which is not the same thing.
Here's a breakdown of the most common root causes I see in production environments:
- Disk exhaustion during SSAS processing rollbacks. When a processing job fails, SQL Server Analysis Services has to roll back the transaction. Rolling back requires I/O. If your disk is already full or nearly full, that I/O fails, and SSAS has no safe path forward. The service shuts itself down because continuing to serve data in an inconsistent state would be worse than crashing. You'll see errors like "There is not enough space on the disk" written to the Application log, followed by what looks like a crash but is actually a controlled emergency shutdown. This affects SQL Server 2012, 2014, 2016, 2017 (on both Windows and Linux), and later.
- Dirty reads and transaction isolation conflicts. When multiple processes are reading and writing to the same data source simultaneously, you can hit locking issues, dirty reads, or phantom rows. Enabling the wrong isolation level, or failing to enable the right one, is the silent killer of many BI workloads built on top of Analysis Services.
- MDX queries returning wrong or empty results. This one bites everyone eventually. You use
LastChildto find the last time period with data, and instead of getting the last period that actually has records, you get the last period that exists in the dimension, which might be months ahead with zero data. The error isn't a crash, it's bad data, which is arguably worse. - Connection throttling and pool exhaustion. Too many concurrent connections hammering the same SQL Server instance will cause new connection requests to time out or get refused. This looks like a connection error to the application but the underlying cause is resource saturation.
What makes SQL Server errors particularly hard to diagnose is that the Application Event Log, SQL Server Error Log, and the actual error message you see in SSMS can all be pointing at slightly different moments of the same failure. You need to triangulate. I'll walk you through exactly how to do that.
I know this is frustrating, especially when it blocks a production report or a scheduled processing job. Let's fix it. Browse all Microsoft fix guides →
The Quick Fix, Try This First
Before you go deep into Event Viewer or start touching isolation level settings, run through this triage checklist. In my experience, one of these three things is the cause about 70% of the time.
1. Check available disk space on the SSAS data drive right now.
Open File Explorer or run this in PowerShell:
Get-PSDrive -PSProvider FileSystem | Select-Object Name, @{N='Free(GB)';E={[math]::Round($_.Free/1GB,2)}}, @{N='Used(GB)';E={[math]::Round($_.Used/1GB,2)}}
If any drive hosting your SSAS data directory is under 10–15% free, that's your problem. SSAS needs working space during processing, not just space for the final data, but extra room to write temporary files, run rollback I/O, and perform cleanup on restart. If you're hitting the disk space wall, free up space before you restart the service. Restarting SSAS on a full disk just crashes it again.
2. Check the Windows Application Event Log for SSAS errors.
Open Event Viewer (run eventvwr.msc), navigate to Windows Logs > Application, and filter by Source = MSSQLServerOLAPService. Look for event IDs in the 1000–1100 range. The specific messages to watch for are:
File system error: The following error occurred while writing to the file 'LazyWriter Stream': There is not enough space on the disk.
File system error: The background thread running lazy writer encountered an I/O error.
If you see those two lines, disk exhaustion caused your SSAS shutdown. That's confirmed. The fix is: free disk space, then restart the SSAS service. On restart, SSAS automatically runs a cleanup pass through its data directory, removing any partial or unconfirmed files from the failed processing job before it makes data available for querying. You don't have to manually clean anything, but you do have to give the disk space to work with.
3. Verify your MDX queries are using NonEmptyCrossJoin for time-based analysis.
If your issue is wrong query results rather than a crash, check whether your MDX is using LastChild or FirstChild to navigate time dimensions. Those functions return the first or last member that exists in the dimension structure, not the first or last member that actually has data. Switch to the pattern I'll cover in Step 3.
C:\Program Files\Microsoft SQL Server\MSAS[version].[instance]\OLAP\Data) for any .tmp or partial partition files left over from the failed job. After a successful restart and cleanup, those should be gone, but if SSAS crashed hard enough, you may need to manually remove orphaned temp files before the next processing run completes cleanly.
Once you've confirmed that disk exhaustion is causing your SSAS crashes, the immediate fix is straightforward but requires a specific order of operations. Don't skip steps here.
First, stop the SSAS service before you do anything else:
Stop-Service -Name "MSSQLServerOLAPService" -Force
Or go to Services (run services.msc), find SQL Server Analysis Services ([instance name]), right-click, and select Stop.
Next, free up disk space. The fastest targets are:
- Windows temp files, run
cleanmgr.exeas Administrator and check all boxes - Old SQL Server backup files (
.bak,.trn) that have already been archived offsite - SSAS processing log files in the OLAP\Log directory, safe to delete if they're from completed runs
- Windows Update cleanup, in Disk Cleanup, click Clean up system files to remove old update packages
Aim for at least 20% free on the target drive, not just enough to technically restart. SSAS processing jobs for large cubes can generate substantial temporary write load, and you need headroom.
Now restart the service:
Start-Service -Name "MSSQLServerOLAPService"
Watch the Application Event Log as SSAS starts. You should see a startup message followed by a cleanup enumeration event, this is SSAS scanning its data directory and removing any unconfirmed files from the failed job. If startup completes without new error events in the next 60 seconds, you're good. Run a test processing job on a small partition first before re-running the full job that originally failed.
If your SQL Server connection errors or query failures are related to blocking, dirty reads, or phantom rows, especially in Analysis Services workloads reading from a relational SQL Server data source, you need to look at your transaction isolation level. The default READ COMMITTED setting causes readers to block writers and vice versa, which kills performance under concurrent load.
The fix is enabling snapshot isolation on your source database. This gives readers a consistent point-in-time view of the data without blocking writers. Here's how to do it in SQL Server Management Studio.
Connect to your relational SQL Server instance (not the SSAS instance, the one that holds the actual source data). Then run:
ALTER DATABASE [YourDatabaseName]
SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE [YourDatabaseName]
SET ALLOW_SNAPSHOT_ISOLATION ON;
Replace [YourDatabaseName] with the actual name of the database your Analysis Services project reads from. The first statement enables Read Committed Snapshot Isolation (RCSI), which makes all READ COMMITTED reads non-blocking by default. The second allows connections to explicitly request snapshot isolation.
After running these statements, you need to wire Analysis Services to use the right provider. In Business Intelligence Development Studio (or Visual Studio with SSAS extensions), open your Analysis Services project, go to Solution Explorer, and either create a new data source (right-click Data Sources > New Data Source) or double-click your existing one. In the Connection Manager dialog, select Native OLE DB\SQL Native Client from the Provider list. Configure the server name and authentication, then test the connection using the Test Connection button before saving.
If it worked, you'll immediately see reduced wait times on your SSAS processing jobs and fewer blocking-related timeout errors in the Application log.
This one trips up even experienced SSAS developers. You write an MDX query to find the latest period with data, use LastChild, and your dashboard shows zeroes for the last few months because the query returned a period that exists in the dimension but has no actual fact data behind it.
Here's the problem in plain terms. The LastChild function returns the last child member of a dimension node, based on the dimension structure, not based on whether any measure values exist for that member. So if your Time dimension goes up to December 2025 but your fact table only has data through September 2025, LastChild will happily return December 2025 with a null or zero value.
Same issue exists with FirstChild going the other direction.
The correct approach uses HEAD() and TAIL() combined with NonEmptyCrossJoin() to filter down to members that actually have data:
To get the first time member with data:
SELECT HEAD(NonEmptyCrossJoin([Time].Members, 1), 1) ON COLUMNS
FROM [YourCube]
To get the last time member with data:
SELECT TAIL(NonEmptyCrossJoin([Time].Members, 1), 1) ON COLUMNS
FROM [YourCube]
If you need both the first and last members with data in a single query, useful for dynamic date range calculations, combine them with UNION():
SELECT
UNION(
HEAD(NonEmptyCrossJoin([Time].Members, 1), 1),
TAIL(NonEmptyCrossJoin([Time].Members, 1), 1)
) ON COLUMNS
FROM [YourCube]
NonEmptyCrossJoin() is doing the heavy lifting here, it filters out all dimension members that have no associated measure data before HEAD() or TAIL() picks from the resulting set. The result is always a member that has real data behind it. Run this in SSMS against your cube and compare the output to what your original LastChild query was returning, you'll likely see a different, correct member.
Connection throttling happens when SQL Server can't accept new connections because it's already managing too many, or because the connection pool on the application side has been exhausted. The symptoms feel like an outage, your app throws connection timeout errors, but the SQL Server instance itself is still running and serving existing connections just fine.
First, check your current connection count in SSMS:
SELECT
DB_NAME(dbid) AS DatabaseName,
COUNT(dbid) AS ConnectionCount,
loginame AS LoginName
FROM sys.sysprocesses
WHERE dbid > 0
GROUP BY dbid, loginame
ORDER BY ConnectionCount DESC;
This tells you exactly how many connections are open per database and per login. If one database or login account has an unusually high connection count, say, 200+ for a typical mid-size app, that's your throttle point.
Next, check the SQL Server maximum connections setting. In SSMS, right-click your server instance, go to Properties > Connections, and look at Maximum number of concurrent connections. A value of 0 means unlimited (governed by available memory), which is usually correct. If someone has set a hard cap here, connections above that number are refused.
On the application side, check your connection string for the Max Pool Size parameter. The ADO.NET default is 100 connections per pool. If your application is opening connections faster than it closes them, a very common leak pattern, the pool fills and new requests block until timeout. The fix on the app side is ensuring every connection is properly wrapped in a using block or explicitly closed in a finally clause.
To see which connections are idle and potentially leaked, run:
SELECT session_id, status, login_time, last_request_start_time, login_name, host_name, program_name
FROM sys.dm_exec_sessions
WHERE status = 'sleeping'
AND last_request_start_time < DATEADD(MINUTE, -30, GETDATE())
ORDER BY last_request_start_time ASC;
Connections sleeping for 30+ minutes are almost certainly leaked or orphaned. You can kill individual sessions with KILL [session_id] as a short-term fix, but the real fix is patching the application code.
Slow SQL Server queries are their own category of pain. The query ran in two seconds last month and now it takes two minutes. Nothing changed, or so you think. Usually something did change: data volume crossed a threshold, statistics went stale, or a plan was cached before an index was added and SQL Server is still using the old cached plan.
Start by pulling the actual execution plan in SSMS. Open your query, press Ctrl+M to enable Include Actual Execution Plan, then run it. Look for operators with a high Cost %, anything above 30% of the total query cost is worth examining. Table scans on large tables are the usual suspect.
Check for missing index recommendations that SQL Server surfaces right in the execution plan. They show up as green text: "Missing Index: CREATE INDEX...", don't ignore these. Create them in a test environment first, re-run the query, and verify the plan improves before applying to production.
If the plan itself looks fine but the query is still slow, your statistics are probably stale. Update them:
UPDATE STATISTICS [YourSchema].[YourTable] WITH FULLSCAN;
Or for an entire database:
EXEC sp_updatestats;
If you suspect a bad cached plan is the issue, especially after a major data change, you can clear the plan for a specific query without flushing the entire cache. First find the plan handle:
SELECT qs.plan_handle, qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.text LIKE '%YourTableName%';
Then remove just that plan:
DBCC FREEPROCCACHE ([paste plan_handle here]);
SQL Server will generate a fresh plan on the next execution. In most cases this alone resolves the performance regression without touching anything else.
Advanced Troubleshooting
If the steps above didn't resolve your SQL Server errors, you're dealing with something deeper. Here's how I approach the tougher cases.
SQL Server Error Log deep dive. The Windows Application Event Log shows you SSAS-level events, but SQL Server's own error log is more granular. You can read it in SSMS by going to Management > SQL Server Logs > Current or query it directly:
EXEC xp_readerrorlog 0, 1, NULL, NULL, NULL, NULL, 'desc';
This returns the current error log in descending time order. Look for severity 17+ errors, those indicate resource failures, not just informational messages. Severity 20+ means the connection was terminated. Severity 24 indicates hardware-level errors, which might mean disk or memory issues rather than SQL Server configuration problems.
Disk I/O performance under SSAS processing loads. When SSAS processing jobs are running, they generate significant sequential write I/O to the data directory. If that directory is on a slow disk, especially a shared SAN volume or a spinning disk with fragmentation, I/O latency spikes and processing jobs take much longer, which increases the window during which a disk-full condition could be hit. Check your disk latency with:
SELECT
DB_NAME(vfs.database_id) AS DatabaseName,
vfs.file_id,
mf.physical_name,
vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms,
vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY avg_write_ms DESC;
Average write latency above 20ms for SSAS data files indicates a storage bottleneck. Above 50ms and you'll see processing jobs timing out and failing.
Snapshot isolation at the enterprise level. In domain-joined environments with multiple SQL Server instances, enabling READ_COMMITTED_SNAPSHOT on a database requires that no other connections are active on that database at the moment the ALTER DATABASE command runs. In practice this means scheduling the change during a maintenance window. You can verify the current isolation settings with:
SELECT name,
is_read_committed_snapshot_on,
snapshot_isolation_state_desc
FROM sys.databases
WHERE name = 'YourDatabaseName';
MDX query performance in large SSAS cubes. NonEmptyCrossJoin() is accurate but not free, on very large Time dimensions with millions of members, it can be slow because it has to evaluate every member for data presence. If performance is a concern, consider pre-calculating the first and last periods with data as named sets or calculated members in your cube definition, and referencing those instead of running NonEmptyCrossJoin() on the fly in every query.
Connection issues after failover in Always On AG environments. If your SQL Server is part of an Always On Availability Group and you're seeing connection errors after a failover event, check that your application connection string includes MultiSubnetFailover=True and is pointing at the AG Listener name, not a specific node name. Connecting directly to a node name will fail if that node has become a secondary replica.
If SSAS keeps crashing after you've freed disk space and restarted clean, or if you're seeing severity 24 hardware errors, data directory corruption, or your cube stops being accessible after the cleanup pass, that's time to call in the professionals. Escalate to Microsoft Support if: the SSAS data directory contains files you can't delete because the service claims they're in use; the cleanup on restart doesn't complete; or your isolation level changes seem to have no effect and blocking is still occurring. Have your Event Viewer Application log exported and your SSAS error log from the OLAP\Log directory ready to share, it saves significant back-and-forth.
Prevention & Best Practices
Most SQL Server errors I've seen in production were preventable. Not with exotic solutions, with basic operational hygiene that gets deprioritized until something breaks. Here's what actually makes a difference.
Disk space monitoring with automated alerts. Set up a SQL Server Agent job or a Windows Task Scheduler script that checks free disk space on your SSAS data drive every hour and sends an alert when it drops below 20%. This is the single most effective prevention step for SSAS crash-on-rollback scenarios. You want to know about a disk space problem before the processing job runs, not after it fails and crashes the service. A simple PowerShell script that emails you when free space drops below a threshold costs about 15 minutes to set up and has saved me from many late-night emergencies.
Separate your SSAS data and log directories onto different drives. When disk exhaustion happens, having data and logs on the same volume means both fail simultaneously. If you split them, SSAS data on D:\, SSAS logs on E:\, a full log drive doesn't necessarily prevent the data writes needed for rollback, and vice versa. You can configure SSAS directory locations in the Analysis Services Properties dialog in SSMS.
Always test MDX queries against real data before deploying. The LastChild/FirstChild trap catches people because queries that look correct in development, where test data fills the dimension, silently return wrong results in production where the dimension extends beyond the data. Always test your time-navigation MDX against a production-sized dataset that mirrors the actual data coverage gap you'd expect in real use.
Review transaction isolation settings when adding new Analysis Services data sources. Every time a new SQL Server database gets wired up as an SSAS data source, check whether READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION are enabled. Make this part of your data source onboarding checklist. Doing it reactively after you start seeing blocking issues means doing it under pressure during a production outage.
Schedule index maintenance and statistics updates. Query performance degradation is almost always tied to stale statistics or index fragmentation on large tables. Set up a weekly maintenance plan using Ola Hallengren's SQL Server Maintenance Solution (the industry standard for this) or the native SSMS Maintenance Plan Wizard to rebuild fragmented indexes and update statistics during off-hours. The performance difference between maintained and unmaintained indexes on tables with millions of rows is not subtle.
- Set a SQL Server Agent alert on disk space at 80% and 90% full, give yourself two warning levels, not one
- Replace all
LastChild/FirstChildMDX patterns withTAIL(NonEmptyCrossJoin())equivalents as a standard practice - Enable
READ_COMMITTED_SNAPSHOTon all SSAS source databases at setup time, not as an emergency fix later - Keep 30%+ free space on your SSAS data directory volume, not 10%, not 15%, because processing jobs write working files that can temporarily double the data footprint during a large cube refresh
Frequently Asked Questions
Why does SSAS crash instead of just failing the processing job when disk runs out?
When a processing job fails, SSAS has to roll back the transaction to get back to a consistent state. That rollback operation itself requires writing to disk, it's not a read-only operation. When there's no disk space, the rollback I/O fails, and SSAS finds itself in a state where it can't safely roll back to consistency. Serving data in that inconsistent state would be dangerous, queries could return corrupt or partially-written data. So the service deliberately shuts itself down. It's not technically a crash; it's an emergency shutdown initiated internally. On restart, SSAS runs a cleanup enumeration of the data directory to remove any unconfirmed files before making data available again.
My SQL Server applies to 2012, 2014, 2016, 2017, does this disk exhaustion crash affect all of them?
Yes. This behavior affects SQL Server 2012 Enterprise, 2014 Enterprise, 2016 Enterprise, SQL Server 2017 on Windows (all editions), and SQL Server 2017 on Linux (all editions). The underlying reason, that rollback I/O failure forces a shutdown, is not version-specific, it's fundamental to how SSAS handles transactional consistency. The same pattern applies to later versions as well. The fix is the same regardless of your version: free disk space before restarting, ensure adequate headroom before running large processing jobs.
What's the difference between LastChild in MDX and using TAIL(NonEmptyCrossJoin()), why does it matter?
LastChild returns the last member in the dimension hierarchy based purely on the dimension's structure, it doesn't check whether there's any actual data (measure values) for that member. If your Time dimension is defined through December 2025 but your fact data only goes through September 2025, LastChild returns December 2025 with a null or zero value. TAIL(NonEmptyCrossJoin([Time].Members, 1), 1) filters the dimension members down to only those with actual data first, then picks the last one, so you get September 2025 with real numbers. This matters enormously for any calculation that bases its scope on "the latest period we have data for."
Will enabling READ_COMMITTED_SNAPSHOT on my database cause any downtime or data issues?
Enabling READ_COMMITTED_SNAPSHOT requires an exclusive moment on the database, specifically, no other connections can be active when the ALTER DATABASE command executes. In practice this means running it during a maintenance window when your application is offline or during off-hours. The change itself doesn't cause data loss or corruption. What it does is change how read queries see in-flight write transactions, they now see a version snapshot rather than blocking on the write lock. The trade-off is a small additional load on tempdb, which stores the row version data, so make sure tempdb is sized appropriately.
I see sleeping connections in sys.dm_exec_sessions that are 2–3 hours old. Is it safe to kill them?
Connections that have been sleeping for hours with no recent activity are almost certainly leaked or orphaned, the application opened them and never properly closed them. It's generally safe to kill them using KILL [session_id] in SSMS, but you should first verify what program and login they belong to by checking the program_name and login_name columns in sys.dm_exec_sessions. Killing a connection mid-transaction will roll back any open transaction on that session, which is the safe behavior. The long-term fix is patching the application code to properly close connections, otherwise the leaked connections will build back up again.
How do I tell if my MDX query is returning a member with no data vs. a member with a value of zero?
This is a subtle but important distinction. A member with no data returns NULL in MDX, while a legitimate zero value in your fact table returns 0. You can test this in SSMS MDX query editor using the IsEmpty() function, if IsEmpty([YourMeasure]) returns true, the member has no data at all. If it returns false but the value displays as 0, there's a genuine zero in the fact data. This distinction matters for time-navigation queries: NonEmptyCrossJoin() filters out NULL members (no data) but retains zero-value members. If your business logic needs to skip zero-value periods too, you'll need additional filtering in your MDX set expression.