How to Troubleshoot SQL Server: Fix Errors Fast

Microsoft Fix Intermediate 14 min read Official Docs Grounded Updated April 20, 2026

Why This Is Happening

I've seen this exact scenario play out on dozens of production floors: it's 9 AM, the sales team can't pull their morning reports, your ERP is throwing cryptic messages, and someone's already sent an "urgent" email with three question marks in the subject line. SQL Server is down , or at least something is very wrong , and you need answers now.

Here's the frustrating part. When you try to troubleshoot SQL Server, Microsoft's error messages are almost deliberately unhelpful. Error 18456, "Login failed for user", tells you almost nothing about whether the problem is a wrong password, a locked account, a firewall rule, or a mismatched authentication mode. Error 17, "SQL Server does not exist or access denied", could mean literally a dozen different things. The messages were written for engineers who already know the internals, not for the sysadmin who just inherited a legacy database server at 7 PM on a Friday.

So what actually causes SQL Server to misbehave? The real-world culprits break down into a few main buckets:

Connection and network issues account for probably 40% of the tickets I see. This covers everything from the SQL Server Browser service being stopped (which kills named-instance connections instantly), to Windows Firewall blocking TCP port 1433, to mismatched authentication modes between the client application and the server.

Authentication and permissions failures are the next big category. After a Windows or SQL Server update, service accounts can lose their rights. Group Policy pushes can reset local security policy. Someone disables SA and forgets to create an alternative sysadmin login. Mixed-mode authentication gets toggled during a migration and nobody tells the application team.

Performance and blocking problems show up differently, the server is technically "up," but queries are grinding, apps are timing out, and the ERRORLOG is filling with lock timeout messages. A single long-running transaction can block an entire table for minutes, cascading into application failures that look like an outage.

Corruption and storage issues are the scenarios that make your stomach drop. Error 824, 823, or 825 in the SQL Server error log means the storage subsystem returned unexpected data. These need immediate attention, they don't fix themselves.

The good news: most SQL Server problems follow predictable patterns, and once you know where to look, you can diagnose the root cause in under ten minutes. This guide walks you through exactly that process. Browse all Microsoft fix guides →

The Quick Fix, Try This First

Before you go deep on diagnostics, run through this 90-second checklist. It resolves about half of all SQL Server troubleshoot calls without any further work.

Open SQL Server Configuration Manager. Go to Start → search "SQL Server Configuration Manager", or press Win+R and type SQLServerManager16.msc (adjust the number to match your SQL version: 15 for SQL 2019, 14 for 2017, 13 for 2016). In the left pane, click SQL Server Services.

Check the status of two services specifically:

  • SQL Server (MSSQLSERVER), or SQL Server (INSTANCENAME) for a named instance. If it shows "Stopped," right-click and hit Start.
  • SQL Server Browser, this must be running if you're connecting to any named instance. It's stopped by default on many installs. Right-click → Start, then right-click → Properties → Service tab → set Start Mode to Automatic.

If both services are running and you still can't connect, check the TCP/IP protocol. In Configuration Manager, expand SQL Server Network ConfigurationProtocols for MSSQLSERVER. Make sure TCP/IP shows "Enabled." If it says Disabled, right-click → Enable, then restart the SQL Server service for it to take effect.

Next, quickly verify the firewall. Open an elevated PowerShell window and run:

Test-NetConnection -ComputerName YOUR_SQL_SERVER -Port 1433

If TcpTestSucceeded comes back False, the firewall is blocking you. Jump to Step 3 in the guide below for the exact fix.

Finally, open SQL Server Management Studio (SSMS) and attempt to connect using Windows Authentication locally on the server itself. If that works, the problem is network/firewall or authentication-mode related, not a database issue. If even local Windows Auth fails, you're looking at a service account or login problem.

Pro Tip
When SQL Server won't start at all, the Windows Event Viewer Application log and the SQL Server ERRORLOG file (located at C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Log\ERRORLOG) will always tell you the real reason, a corrupted system database, a missing file, a port conflict. Check these two places before reinstalling anything. Nine times out of ten, it's a two-line fix once you see the actual error message.
1
Read the SQL Server ERRORLOG Before Doing Anything Else

This is the single most important habit for anyone who needs to troubleshoot SQL Server effectively. The ERRORLOG file is SQL Server's own diary, it records every startup, shutdown, login failure, corruption event, and configuration change. Most people skip it and go straight to Googling the error code. Don't do that.

You can open the current ERRORLOG directly in SSMS. Connect to the server, expand ManagementSQL Server Logs, then double-click Current. Alternatively, find the file on disk, the default path for a default instance on SQL Server 2022 is:

C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Log\ERRORLOG

For named instances, replace MSSQL16.MSSQLSERVER with MSSQL16.YOURINSTANCENAME.

You can also read the log via T-SQL, which is useful when SSMS won't connect:

EXEC xp_readerrorlog 0, 1, NULL, NULL, NULL, NULL, 'DESC';

Scroll to the bottom of the current log and work backwards. You're looking for lines marked Error or Warning. Pay close attention to the timestamp, did errors start immediately after a Windows Update, a backup job, or a specific time of day?

Common patterns you'll spot:

  • Error: 18456, Severity: 14, State: 8, wrong password for SQL login
  • Error: 18456, Severity: 14, State: 11, valid login but no access to server
  • TDSSNIClient initialization failed, TCP/IP protocol issue at startup
  • Error: 9001, transaction log unavailable, often disk full
  • Error: 823 or 824, I/O error, potential disk corruption

Once you identify the error code and state number, you have exactly what you need to target the fix. The state number on a 18456 error alone narrows down the cause to a specific authentication failure reason, something the generic "Login failed" message never tells you.

2
Diagnose and Fix SQL Server Login Failures (Error 18456)

Error 18456 is the most searched SQL Server troubleshoot query on the internet, and for good reason, it's vague to the point of uselessness without the state code. Here's the breakdown of the states you'll actually encounter in the wild:

  • State 1: Generic failure, check ERRORLOG for the real state
  • State 5: Invalid username (user doesn't exist)
  • State 6: Attempt to use a Windows login with SQL authentication
  • State 7: Login disabled and wrong password
  • State 8: Correct login, wrong password
  • State 9: Invalid password format
  • State 11 or 12: Valid login but access to the server not granted
  • State 38: Login valid but database doesn't exist or access denied

For State 8 (wrong password), reset the password in SSMS: expand SecurityLogins, right-click the login → Properties → enter a new password → uncheck "Enforce password expiration" if this is a service account.

For State 11/12, the login exists but lacks server access. Run this in SSMS:

ALTER LOGIN [YourLoginName] ENABLE;
GO

If you're locked out entirely and need to recover sysadmin access, you'll need to restart SQL Server in single-user mode. Stop the SQL Server service, then start it from an elevated command prompt:

net start MSSQLSERVER /m"SQLCMD"

Then immediately connect with SQLCMD and add your Windows account as sysadmin:

sqlcmd -S . -Q "CREATE LOGIN [DOMAIN\YourUser] FROM WINDOWS; ALTER SERVER ROLE sysadmin ADD MEMBER [DOMAIN\YourUser];"

Restart the service normally afterward. If everything went correctly, you'll now be able to log in with full admin rights.

Also check authentication mode: In SSMS, right-click the server → PropertiesSecurity. Make sure it's set to SQL Server and Windows Authentication mode if your application uses SQL logins. A server in Windows-only mode silently rejects all SQL logins with a 18456 error.

3
Open Firewall Ports and Fix Network Connectivity

This one catches people constantly, especially after a Windows Server update, firewall rules can get reset or overridden by Group Policy. SQL Server listens on TCP port 1433 by default for the default instance. Named instances use dynamic ports by default, which is why the SQL Server Browser service (UDP port 1434) must be running for named-instance connections to work.

First, confirm what port your instance is actually listening on. Run this from an elevated PowerShell window on the SQL Server machine:

netstat -ano | findstr :1433

If you see a LISTENING entry, SQL Server is bound to that port. If nothing comes back, either the service isn't running or TCP/IP is disabled in Configuration Manager.

To add a Windows Firewall rule for SQL Server, open an elevated PowerShell session and run:

# Allow SQL Server default instance
New-NetFirewallRule -DisplayName "SQL Server Default Instance" `
  -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow

# Allow SQL Server Browser (needed for named instances)
New-NetFirewallRule -DisplayName "SQL Server Browser" `
  -Direction Inbound -Protocol UDP -LocalPort 1434 -Action Allow

If you're in an environment managed by Group Policy, a locally-created firewall rule may get wiped on the next policy refresh. In that case, the rule needs to be created at the GPO level, work with your domain admin to add it under Computer Configuration → Windows Settings → Security Settings → Windows Defender Firewall with Advanced Security → Inbound Rules.

For named instances using dynamic ports, you have two options: either assign a static port (recommended for production) via SQL Server Configuration Manager → TCP/IP Properties → IP Addresses tab → set TCP Port under IPALL to 1433 or another fixed port, or ensure the SQL Server Browser service is running and clients are using the instance name format SERVER\INSTANCENAME rather than a port number directly.

After the firewall rule is in place, rerun the Test-NetConnection command from the quick fix section. A TcpTestSucceeded: True result means you're clear at the network layer.

4
Identify and Kill Blocking Queries Causing Timeouts

When users complain that "SQL is slow" or applications are timing out even though the server appears healthy, blocking is almost always the culprit. A blocking chain happens when one session holds a lock on a resource and other sessions queue up waiting for it. If the blocking head has been sitting there long enough, you'll see cascading timeouts across your entire application.

The fastest way to spot blocking in SQL Server is this query, run it in SSMS while the problem is active:

SELECT 
    r.session_id,
    r.status,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time / 1000 AS wait_seconds,
    r.cpu_time,
    DB_NAME(r.database_id) AS database_name,
    SUBSTRING(st.text, (r.statement_start_offset/2)+1,
        ((CASE r.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE r.statement_end_offset
        END - r.statement_start_offset)/2)+1) AS statement_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;

The blocking_session_id column tells you which session is the blocker. Note that SPID number. Now look at what that session is doing:

DBCC INPUTBUFFER(BLOCKING_SPID_HERE);

If it's a long-running report, an open transaction that forgot to commit, or a backup job locking system tables, you have your answer. In an emergency where the blocking head is clearly stuck, you can terminate it:

KILL 52; -- replace 52 with the actual blocking SPID

I know it feels drastic, but KILL on a blocking head is generally safe, SQL Server will roll back any open transaction cleanly. The sessions that were waiting will immediately resume. For ongoing blocking issues, look at whether your application is leaving transactions open, whether missing indexes are causing full table scans that hold locks longer than necessary, and whether you should enable Read Committed Snapshot Isolation (RCSI) on the database to reduce reader/writer conflicts. Enable it with:

ALTER DATABASE YourDatabase 
SET READ_COMMITTED_SNAPSHOT ON;
5
Check Disk Space and Fix Transaction Log Full Errors

SQL Server error 9002, "The transaction log for database X is full due to LOG_BACKUP", is one of those problems that stops everything cold. When the transaction log fills up, SQL Server cannot write any new transactions. Inserts fail. Updates fail. The database effectively goes read-only. I know this is frustrating, especially when it hits production at the worst possible moment.

First, check the current log usage across all databases:

DBCC SQLPERF(LOGSPACE);

This returns each database's log file size and the percentage currently in use. Anything above 90% needs immediate attention.

Next, find out why the log isn't being reused. The log_reuse_wait_desc column in sys.databases tells you exactly what's holding the log space:

SELECT name, log_reuse_wait_desc, recovery_model_desc
FROM sys.databases;

Common values and their fixes:

  • LOG_BACKUP, transaction log backups are not running. Take an immediate log backup: BACKUP LOG YourDB TO DISK = 'NUL'; (for testing, use a real path in production). Then schedule regular log backups.
  • ACTIVE_TRANSACTION, an open transaction is preventing log truncation. Find and close it using the blocking query from Step 4.
  • NOTHING, the log is ready to reuse, it just hasn't been shrunk. Run DBCC SHRINKFILE (YourDB_log, 1024); to reclaim space (replace 1024 with your target size in MB). Don't over-shrink, the log will just grow again immediately, causing performance problems.

If the underlying disk volume is genuinely full (not just the log file), check overall disk space from PowerShell:

Get-PSDrive -PSProvider FileSystem | Select-Object Name, Used, Free

If the data drive itself is full, you need to either add a disk, move files, or expand the volume before SQL Server can operate normally. As a very short-term measure only, you can set the database to the Simple recovery model temporarily, which truncates the log automatically at each checkpoint, but this eliminates your ability to do point-in-time recovery, so it's genuinely only for emergencies while you fix the disk situation.

Advanced Troubleshooting

When the basics don't resolve it, you need to go deeper. Here's how to troubleshoot SQL Server problems at the enterprise and infrastructure level.

Using Extended Events and Profiler

SQL Server Profiler is deprecated but still works and is the fastest way to see exactly what's hitting your server in real time. Open it from SSMS → Tools → SQL Server Profiler. Start a new trace with the "TSQL_Duration" template, this captures all statements along with their execution duration, helping you spot slow queries instantly without touching production code.

For a more production-safe approach, use Extended Events. In SSMS, expand Management → Extended Events → Sessions → system_health. This always-on session captures deadlocks, long-running queries, and error conditions automatically. Right-click → Watch Live Data to see events in real time.

Diagnosing Deadlocks

Deadlocks generate an XML deadlock graph in the system_health Extended Events session. To pull the last few deadlocks:

SELECT xdr.value('@timestamp', 'datetime2') AS deadlock_time,
       xdr.query('.') AS deadlock_graph
FROM (
    SELECT CAST(target_data AS XML) AS target_data
    FROM sys.dm_xe_session_targets t
    JOIN sys.dm_xe_sessions s ON t.event_session_address = s.address
    WHERE s.name = 'system_health' AND t.target_name = 'ring_buffer'
) AS data
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') AS xdt(xdr)
ORDER BY deadlock_time DESC;

The XML shows you exactly which sessions were involved, which objects they were locking, and which query was the deadlock victim. With that information, you can add indexes to reduce lock duration or redesign transaction ordering to eliminate the circular dependency.

Group Policy and Domain-Joined Scenarios

In enterprise environments, Group Policy is often the silent saboteur. GPO can reset service account permissions, enforce Windows-only authentication, restrict TCP ports, or apply audit policies that fill the SQL Server error log with noise. Run gpresult /R /SCOPE COMPUTER on the SQL Server to see what policies are applied. Pay particular attention to policies under Computer Configuration → Windows Settings → Security Settings → Local Policies → User Rights Assignment, specifically "Log on as a service" and "Replace a process level token," both of which the SQL Server service account needs.

Registry, Service Account and Startup Parameters

SQL Server startup parameters (including the -T trace flags and the paths to master.mdf and the ERRORLOG) are stored in the registry at:

HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQLServer\Parameters

If someone has accidentally changed the path to master.mdf or the errorlog, SQL Server will refuse to start. Verify that -dC:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\DATA\master.mdf and the -e errorlog path both point to existing files.

Performance Monitor Counters to Watch

Add these perfmon counters (perfmon.exe) to a baseline collector: SQLServer:Buffer Manager → Buffer cache hit ratio (should stay above 95%), SQLServer:SQL Statistics → Batch Requests/sec, SQLServer:Locks → Lock Waits/sec, and Physical Disk → Avg. Disk sec/Read (above 20ms sustained is a storage problem). Comparing today's values against your baseline often pinpoints whether a degradation is query-related, memory-related, or storage-related within minutes.

When to Call Microsoft Support

If you're seeing errors 823, 824, or 825 (I/O errors and potential corruption), do not attempt further repairs without a verified backup in hand. Run DBCC CHECKDB (YourDatabase) WITH NO_INFOMSGS; and if it returns errors, stop right there. Corruption repair requires careful sequencing and the wrong move can make data unrecoverable. Similarly, if SQL Server refuses to start after you've verified the service account, network configuration, and startup parameters, and the ERRORLOG shows "SQL Server is unable to run" or master database errors, it's time to escalate to Microsoft Support. Enterprise support contracts include direct access to SQL Server engineering teams who have tools unavailable to the public.

Prevention & Best Practices

I've watched organizations spend weeks recovering from SQL Server crises that were entirely predictable. The fixes above will get you out of today's fire. What follows will stop next month's fire from starting.

Automate log backups without exception. If you're running Full or Bulk-Logged recovery model (which you should be for any production database), transaction log backups need to run on a schedule, typically every 15 to 60 minutes depending on your RPO requirements. Use SQL Server Agent jobs with maintenance plan wizards or Ola Hallengren's widely respected free maintenance solution scripts. A database in Full recovery with no log backups is a ticking clock toward log-full errors.

Set up SQL Server Agent alerts for critical error numbers. In SSMS → SQL Server Agent → Alerts, create alerts for severity levels 19-25 (these are always serious system-level errors) and for specific error numbers like 823, 824, 825, 9002, and 18456. Route them to an operator (SQL Server Agent → Operators) with an email address. Database Mail setup takes about 20 minutes and means you hear about problems before your users do.

Monitor disk space proactively, not reactively. Add a PowerShell-based scheduled task or SQL Server Agent job that alerts you when any drive holding SQL data files, log files, or the tempdb drops below 20% free space. The log-full scenario from Step 5 is entirely preventable with a simple threshold alert.

Keep statistics and indexes in shape. Outdated statistics cause the query optimizer to generate terrible execution plans, queries that should take milliseconds start taking minutes. Run index maintenance weekly (rebuild fragmented indexes, reorganize moderately fragmented ones) and update statistics monthly or more frequently on tables with heavy insert/update/delete activity.

Quick Wins
  • Enable Read Committed Snapshot Isolation (RCSI) on OLTP databases to drastically reduce reader-writer blocking without changing application code
  • Set max server memory in SQL Server properties, leaving it uncapped lets SQL Server consume all RAM, starving the OS and causing instability; a common rule of thumb is to leave 10-20% free for the OS
  • Create a dedicated SQL Server monitoring login with limited rights for your monitoring tools rather than using SA, this prevents a monitoring misconfiguration from locking out the SA account
  • Keep SQL Server patched to the latest Cumulative Update for your major version, many connection, performance, and corruption bugs are fixed in CUs that are not widely announced

Frequently Asked Questions

SQL Server was working fine yesterday, why did it suddenly stop after a Windows Update?

Windows Updates, especially cumulative updates and .NET framework updates, occasionally reset service account permissions, disable startup services, or trigger firewall policy resets. The first things to check are the SQL Server service account's "Log on as a service" right (in Local Security Policy → secpol.msc), whether the SQL Server service itself is set to Automatic startup, and whether any new firewall rules were pushed. Check Windows Event Viewer → System log around the time of the update for clues. In my experience, about 70% of these post-update failures are a service set to Manual start or a service account that lost its logon right during the update process.

How do I troubleshoot SQL Server when I can't connect to SSMS at all?

If SSMS won't connect remotely, first try connecting locally on the server itself using Windows Authentication and the server name as a period: . or localhost. If that works, it's a network or firewall problem, not a SQL Server problem. If local connection also fails, open the ERRORLOG file directly on disk (see Step 1 for the path) using Notepad or any text editor, this file is always readable even when SQL Server is in a broken state and will tell you exactly why the service won't accept connections. From there, SQLCMD.exe is your best friend for getting a basic command-line connection when SSMS won't cooperate.

What's the difference between SQL Server error state 5 and state 11 for error 18456?

State 5 means the login name itself doesn't exist on the SQL Server instance, the account was never created, was deleted, or you're connecting to the wrong server. State 11 means the login does exist but has been explicitly denied access or disabled. You'll fix State 5 by creating the login (CREATE LOGIN [name] FROM WINDOWS for a Windows account or CREATE LOGIN [name] WITH PASSWORD = 'pwd' for a SQL login). State 11 requires enabling or re-granting access: ALTER LOGIN [name] ENABLE;. The state codes are only visible in the SQL Server ERRORLOG, not in the error message returned to the client, which is an intentional security measure to avoid leaking account information to attackers.

My SQL queries are running much slower than normal but the server looks healthy, how do I find the problem?

Start with missing index recommendations, SQL Server tracks these automatically in sys.dm_db_missing_index_details. Run this to see the top candidates: SELECT TOP 20 * FROM sys.dm_db_missing_index_details ORDER BY (user_seeks + user_scans) DESC;. Then check for outdated statistics with DBCC SHOW_STATISTICS on your most-queried tables, and look for parameter sniffing problems (a query using a cached plan optimized for different data than it's currently running against). The Query Store feature, available from SQL Server 2016 onwards, is your best tool here, enable it with ALTER DATABASE YourDB SET QUERY_STORE = ON; and use the "Regressed Queries" report in SSMS to find queries that recently got slower.

Is it safe to run DBCC SHRINKFILE on a production database?

It's safe in the sense that it won't corrupt data, but it has real performance costs. Shrinking a data file causes severe index fragmentation, SQL Server has to physically move data pages to compact the file, which scrambles the logical order your indexes depend on. If you shrink a data file, plan to run an index rebuild on the affected database immediately afterward. For log files, shrinking is less harmful since indexes aren't stored there, and it's sometimes necessary after an unexpectedly large operation inflated the log. The real answer to repeated file-growth-and-shrink cycles is to provision appropriately sized files from the start with autogrowth events sized in MB, not percentages.

How do I troubleshoot SQL Server tempdb being full?

A full tempdb is one of those failures that looks like something else, you'll see random "could not allocate space" errors or even application timeouts with no obvious cause. Run this to see what's consuming tempdb right now: SELECT TOP 20 session_id, internal_objects_alloc_page_count, user_objects_alloc_page_count FROM sys.dm_db_task_space_usage ORDER BY (internal_objects_alloc_page_count + user_objects_alloc_page_count) DESC;. The top offenders are usually large sort operations that spill to disk, cursors processing millions of rows, or queries building enormous worktables. Immediate relief comes from killing the offending sessions. Long-term, add more tempdb data files (SQL Server recommends one per logical processor up to eight files), pre-size them to avoid constant autogrowth, and add the autogrowth files to your disk space monitoring.

Related Microsoft Fix Guides

H
Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.