How to Troubleshoot Windows Server: Complete Fix Guide

Microsoft Fix | Intermediate | 18 min read | Official Docs Grounded | Updated April 20, 2026

Why This Is Happening

I've seen this play out more times than I can count: it's 9 AM on a Monday, your production Windows Server is crawling, throwing cryptic errors, or simply not responding, and you're staring at a black console window wondering where to even start. Windows Server troubleshooting isn't one problem. It's dozens of possible problems wearing the same face.

The frustrating reality is that Windows Server's built-in error messages rarely tell you what actually went wrong. You get something like "The service failed to start due to a logon failure" or a blue screen with a generic stop code like 0x0000007E, and you're left piecing together a puzzle without the box. That's by design: these messages are written to cover every possible hardware and software configuration, so they end up being useful to almost no one.

Let me walk you through the real reasons these failures happen. Windows Server instability typically falls into one of six buckets:

  • Resource exhaustion: CPU pegged at 100%, RAM fully consumed, or the system drive under 10% free space. Windows Server needs breathing room (disk, memory, and CPU headroom) to operate reliably. When any of these is saturated, cascading failures follow fast.
  • Failed or corrupted Windows services: A core service like Windows Management Instrumentation (WMI), Remote Procedure Call (RPC), or Server service crashes silently. Other dependent services then topple like dominos.
  • Driver conflicts or outdated drivers: This is especially common after Patch Tuesday updates or hardware swaps. A bad NIC driver can kill network connectivity; a storage controller driver bug can trigger disk I/O errors showing up as Event ID 11 or 15 in the System log.
  • DNS and Active Directory replication failures: On domain-joined servers, broken DNS resolution or AD replication lag creates authentication timeouts, Group Policy application failures, and mysterious logon errors, all of which look completely unrelated on the surface.
  • Windows updates gone wrong: A partially applied cumulative update leaves the server in a broken state. You'll see error codes like 0x80070057 or 0x800706BE in Windows Update history.
  • Hardware faults: Failing RAM (single-bit errors that ECC may or may not catch), dying drives, or overheating CPUs cause intermittent, hard-to-reproduce crashes. These are the worst kind because they're inconsistent.

The good news: most Windows Server problems, whether you're on Server 2016, 2019, or 2022, follow a logical diagnostic chain. You don't need to guess. The server itself is recording what went wrong in Event Viewer, Performance Monitor, and system logs. You just need to know where to look and what to look for. That's exactly what this guide teaches you.

I know this is stressful, especially when users are hammering you with complaints or a business process is blocked. Take a breath. Work the problem systematically and you'll find it.

The Quick Fix: Try This First

Before you dive into deep diagnostics, there are three fast checks that resolve the majority of Windows Server issues I encounter in the field. Do these first, in order, before anything else.

Step 1: Check Event Viewer immediately. Press Win + R, type eventvwr.msc, and hit Enter. Expand Windows Logs and click System. Sort by "Level" to surface Critical and Error entries. Look at the timestamps: events logged right before the problem started usually point to the cause rather than a symptom. Write down the Event ID numbers. The most important ones to know cold: Event ID 41 (Kernel-Power, the system rebooted without a clean shutdown), Event ID 6008 (unexpected shutdown recorded on next boot), Event ID 7034 (service terminated unexpectedly), and Event ID 1000 (application crash with faulting module listed, found in the Application log).

Step 2: Check disk space right now. Open File Explorer and look at your C: drive. If it's under 15% free, that alone can cause services to fail, logs to stop writing, and Windows Update to break. On a Server Core installation, run this PowerShell one-liner:

Get-PSDrive -PSProvider FileSystem | Select-Object Name, Used, Free, @{N='%Free';E={[math]::Round($_.Free / ($_.Used + $_.Free) * 100, 1)}}

Anything under 15% free is a red flag. Clear temp files with the Disk Cleanup tool (cleanmgr.exe, available on Desktop Experience installations) before going further.

Step 3: Restart the Windows Management Instrumentation service. A staggering number of server management failures trace back to a hung WMI service. Open an elevated PowerShell prompt and run:

Stop-Service winmgmt -Force
Start-Service winmgmt
Get-Service winmgmt

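If the service comes back with Status: Running, check whether your original problem resolves. You'd be surprised how often this single fix clears Windows Server not responding symptoms, broken monitoring agents, and remote management failures. One caveat: -Force also stops every service that depends on WMI, and Start-Service winmgmt does not bring them back. Run Get-Service winmgmt -DependentServices and start any that are still stopped.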

If those three quick checks don't crack it, keep going. The step-by-step section below will get you there.

Pro Tip
When you open Event Viewer during an active incident, don't just look at the System log. Switch to Applications and Services Logs → Microsoft → Windows → Diagnostics-Performance → Operational. This log records slow boot and slow shutdown events with the exact process name and delay in milliseconds. It's one of the most underused diagnostic tools in the entire Windows Server toolkit, and it's been there since Server 2012.

1. Read the Event Viewer Logs Systematically

Event Viewer is the single most important tool for Windows Server troubleshooting. Most admins open it, see a wall of red and yellow, panic, and close it. Here's how to actually use it.

Open Event Viewer (Win + R, then eventvwr.msc). You want to work through three logs in order:

1. System Log: Right-click System under Windows Logs and select Filter Current Log. Set "Event level" to Critical and Error only. Set the time range to cover the period before your problem started. Click OK. Now sort by "Date and Time" descending. The event at the top of the list, or the cluster of events right before symptoms started, is almost always your culprit.

2. Application Log: Same process. Look specifically for Event ID 1000 (Application Error), which tells you the faulting application name and faulting module. If you see ntdll.dll as the faulting module, that often points to heap or memory corruption, which can originate in the application or in a third-party module it loads, so don't assume ntdll.dll itself is at fault.

3. Security Log: If users can't log in or you're seeing authentication errors, check here for Event ID 4625 (failed logon) and look at the "Sub Status" code. 0xC000006D means bad username/password; 0xC0000064 means the user doesn't exist at all; 0xC000006F means logon outside permitted hours. These codes tell you exactly where the authentication chain is breaking.
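To avoid looking these codes up by hand, here is a minimal sketch of a lookup helper. The function name Resolve-LogonSubStatus and the table variable are hypothetical; the first three entries come from the text above, and the remaining three are standard NTSTATUS values I've added for illustration.

```powershell
# Sub-status codes commonly seen in Event ID 4625.
# First three: from the guide. Last three: added for illustration.
$LogonSubStatus = @{
    '0xC000006D' = 'Bad username or authentication information'
    '0xC0000064' = 'User name does not exist'
    '0xC000006F' = 'Logon outside permitted hours'
    '0xC000006A' = 'Correct user name, wrong password'
    '0xC0000072' = 'Account disabled'
    '0xC0000234' = 'Account locked out'
}

function Resolve-LogonSubStatus {
    param([Parameter(Mandatory)][string]$Code)
    # Hashtable literals are case-insensitive, so 0xc000006d also matches.
    $key = $Code.Trim()
    if ($LogonSubStatus.ContainsKey($key)) { $LogonSubStatus[$key] }
    else { "Unknown sub status $key" }
}

# Usage on a server (reading the Security log requires elevation):
# Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4625 } -MaxEvents 20 | ForEach-Object {
#     $xml = [xml]$_.ToXml()
#     $sub = ($xml.Event.EventData.Data | Where-Object Name -eq 'SubStatus').'#text'
#     Resolve-LogonSubStatus -Code $sub
# }
```

Keeping the table separate from the function makes it easy to extend with codes you meet in your own environment.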

Once you have specific Event IDs, you can search them precisely. A record of Event ID 7023, for example, means a service terminated with an error, and the event detail will name the specific service. That gives you a concrete starting point instead of guessing.

If it worked: you'll have an Event ID and a timestamp that pins down what failed and when, giving you the exact thread to pull on.
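If you'd rather do the same filtering from PowerShell (handy on Server Core), the sketch below groups events by ID and provider so the repeat offenders float to the top. Get-EventSummary is a hypothetical helper name, and the 24-hour window in the usage comment is an assumption; adjust it to bracket your incident.

```powershell
function Get-EventSummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$InputObject,
        [int]$Top = 10
    )
    begin   { $all = New-Object System.Collections.Generic.List[object] }
    process { $all.Add($InputObject) }
    end {
        # Group by Event ID + provider, most frequent first.
        $all | Group-Object -Property Id, ProviderName |
            Sort-Object -Property Count -Descending |
            Select-Object -First $Top -Property Count,
                @{ N = 'EventId';  E = { $_.Group[0].Id } },
                @{ N = 'Provider'; E = { $_.Group[0].ProviderName } },
                @{ N = 'Latest';   E = { ($_.Group | Sort-Object TimeCreated -Descending |
                                           Select-Object -First 1).TimeCreated } }
    }
}

# Usage on the server: Critical (1) and Error (2) events from the last 24 hours.
# Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 1, 2; StartTime = (Get-Date).AddHours(-24) } |
#     Get-EventSummary -Top 10 | Format-Table -AutoSize
```

The function only reads Id, ProviderName, and TimeCreated, so it works on anything Get-WinEvent returns, including the Application log.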

2. Diagnose High CPU and Memory Usage

Windows Server high CPU or memory usage is one of the most common performance complaints, and it's almost never caused by what people first suspect. Let me show you how to find the real culprit in under five minutes.

Open Task Manager (Ctrl + Shift + Esc), click More details, then click the Details tab (not Processes; Details gives you individual process instances). Sort by CPU or Memory descending. If you see svchost.exe consuming high resources, that's a generic host for Windows services, so you need to drill deeper. Right-click the high-CPU svchost.exe instance and select Go to service(s). This highlights which service(s) are running inside that host process.

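For deeper analysis, open an elevated PowerShell prompt and run the command below. Note that the CPU column is cumulative processor time in seconds since each process started, not a live percentage, so long-running processes rank high even when they are idle right now:

Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU, WorkingSet64 | Format-Table -AutoSize

For memory specifically, list the largest working sets with:

Get-Process | Sort-Object WorkingSet64 -Descending | Select-Object -First 10 Name, Id, @{N='Memory(MB)';E={[math]::Round($_.WorkingSet64 / 1MB, 1)}} | Format-Table -AutoSize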

If a process is climbing in memory over hours and never releasing it, you're looking at a Windows Server memory leak, typically in a third-party service or an application running on the server. Identify the process name, then check its version and whether a patch exists.
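A single snapshot can't show a leak; you need two readings taken some time apart. Here is a minimal sketch that compares two Get-Process snapshots by process ID and reports growth. Compare-ProcessMemory is a hypothetical helper, and the one-hour gap in the usage comment is only an assumption about a reasonable interval.

```powershell
function Compare-ProcessMemory {
    param(
        [Parameter(Mandatory)][object[]]$Before,
        [Parameter(Mandatory)][object[]]$After
    )
    # Index the first snapshot by PID so each process is matched to itself.
    $baseline = @{}
    foreach ($p in $Before) { $baseline[[int]$p.Id] = [long]$p.WorkingSet64 }

    $rows = foreach ($p in $After) {
        $procId = [int]$p.Id
        if ($baseline.ContainsKey($procId)) {
            [pscustomobject]@{
                Name     = $p.Name
                Id       = $procId
                GrowthMB = [math]::Round([double]($p.WorkingSet64 - $baseline[$procId]) / 1MB, 1)
            }
        }
    }
    # Biggest growth first; processes that started or exited in between are ignored.
    $rows | Sort-Object -Property GrowthMB -Descending
}

# Usage:
# $first  = Get-Process | Select-Object Name, Id, WorkingSet64
# Start-Sleep -Seconds 3600    # or run the second snapshot from a scheduled task
# $second = Get-Process | Select-Object Name, Id, WorkingSet64
# Compare-ProcessMemory -Before $first -After $second | Select-Object -First 10
```

One large delta proves little on its own. Repeat the comparison a few times; a process that grows in every interval and never gives memory back is the one to investigate.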

For server-wide performance baselines, open Performance Monitor (Win + R, then perfmon.msc). Add counters for Processor\% Processor Time, Memory\Available MBytes, and PhysicalDisk\Avg. Disk Queue Length. A disk queue consistently above 2 per spindle is a serious I/O bottleneck.

If it worked: CPU drops back below 80% sustained, or you've identified the specific process responsible for memory growth.

3. Fix Failed or Stuck Windows Services

Windows Server service failures are responsible for a huge percentage of "the server is broken" calls I receive. A service failing to start, stopping unexpectedly, or hanging in a "Starting" state can bring down dependent services and make the whole system look broken when really only one component is misbehaving.

Open Services (Win + R, then services.msc). Sort by Status to surface all stopped services. Cross-reference with what you found in Event Viewer. Any service that should be running but shows as Stopped is worth investigating. Right-click a stopped service, choose Properties, and check the Recovery tab. This tells you what Windows is configured to do on first, second, and subsequent failures.

To restart a specific service from PowerShell with better error output, use:

Restart-Service -Name "wuauserv" -Force -Verbose

Replace wuauserv with the service name shown in the Properties dialog under "Service name" (not the Display name). If a service won't start, check its dependencies first:

Get-Service -Name "wuauserv" -RequiredServices

If any dependency is stopped, start it first. Service startups often fail silently because a dependency was missed. When the Server service itself fails (the one that enables file and printer sharing, service name: LanmanServer), Event ID 7023 or 7036 usually accompanies it in the System log, and the 7023 entry carries the error that tells you why.
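The dependency check can be scripted. This is a minimal sketch, assuming you only care about direct dependencies (it does not walk the tree recursively); Get-StoppedDependency is a hypothetical helper name.

```powershell
function Get-StoppedDependency {
    param([Parameter(Mandatory)]$Service)
    # RequiredServices lists the services this one directly depends on.
    $Service.RequiredServices | Where-Object { $_.Status -ne 'Running' }
}

# Usage on the server (elevated prompt):
# $svc = Get-Service -Name 'wuauserv'
# Get-StoppedDependency -Service $svc | Start-Service -Verbose
# Start-Service -Name $svc.Name -Verbose
```

Starting the dependencies explicitly with -Verbose shows you which one actually fails, instead of a generic error from the parent service.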

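For services stuck in the "Starting" state for more than 30 seconds, the nuclear option is to kill the hosting process. First find its process ID:

sc.exe queryex ServiceName

The PID field in the output is what you need. Kill it with taskkill /PID [PID] /F and restart the service cleanly. If the service runs inside a shared svchost.exe, killing that PID also takes down every other service in the same host, so check with tasklist /svc /fi "PID eq [PID]" first.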

If it worked: the service shows as Running in Services.msc and Event ID 7036 ("entered the running state") appears in your System log.

4. Troubleshoot Network Connectivity and DNS

Windows Server network connectivity issues are particularly brutal because they often manifest as completely unrelated symptoms: authentication failures, slow file access, RDP not working, applications timing out. Most of them trace back to DNS. I say this from years of field experience: when something mysterious is broken on a Windows Server, check DNS first.

Start with basic connectivity verification in an elevated PowerShell prompt:

# Test gateway reachability
Test-NetConnection -ComputerName 192.168.1.1 -InformationLevel Detailed

# Test DNS resolution
Resolve-DnsName -Name "dc01.yourdomain.local" -Type A

# Check which DNS servers are configured
Get-DnsClientServerAddress -AddressFamily IPv4

If DNS resolution fails for internal domain names, that's your problem. On a domain-joined server, the primary DNS server should always be a domain controller, never an external DNS like 8.8.8.8 or 1.1.1.1 as the first entry. External DNS as the primary DNS breaks Active Directory lookups entirely.
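To spot this misconfiguration quickly, here is a minimal sketch that flags public resolvers in an adapter's DNS list. Test-DnsServerList is a hypothetical helper; the default resolver list starts with the two addresses named above, and the other two are well-known secondaries I've added as an assumption about what you'd want to catch.

```powershell
function Test-DnsServerList {
    param(
        [Parameter(Mandatory)][string[]]$ServerAddress,
        # Public resolvers to flag. First two are from the guide; extend as needed.
        [string[]]$PublicResolver = @('8.8.8.8', '1.1.1.1', '8.8.4.4', '1.0.0.1')
    )
    for ($i = 0; $i -lt $ServerAddress.Count; $i++) {
        [pscustomobject]@{
            Position = $i + 1                 # 1 = primary DNS server
            Address  = $ServerAddress[$i]
            IsPublic = $PublicResolver -contains $ServerAddress[$i]
        }
    }
}

# Usage on a domain-joined server:
# Get-DnsClientServerAddress -AddressFamily IPv4 | Where-Object ServerAddresses | ForEach-Object {
#     Test-DnsServerList -ServerAddress $_.ServerAddresses | Where-Object IsPublic
# }
```

Any row with IsPublic set to True and Position 1 is the case described above: a public resolver sitting ahead of your domain controllers.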

For Windows Server RDP not working specifically, the most common causes are: the Remote Desktop Services service is stopped, the firewall is blocking port 3389, or the RDP listener is corrupted. Check the listener state:

qwinsta /server:localhost

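If you don't see rdp-tcp in the output with a state of "Listen", restart Remote Desktop Services first (Restart-Service TermService -Force, which drops active sessions) and confirm the firewall allows TCP 3389. The two commands below do not touch the RDP listener itself; they reset the TCP/IP stack and the Winsock catalog, so treat them as a last resort when the network stack as a whole looks damaged. They can discard custom TCP/IP settings such as static addresses and need a reboot to take effect, so record the adapter configuration and run them from the console: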

netsh int ip reset resetlog.txt
netsh winsock reset

For broader network adapter issues, check for errors on the NIC itself:

Get-NetAdapterStatistics | Select-Object Name, ReceivedUnicastPackets, ReceivedPacketErrors, ReceivedDiscardedPackets, OutboundPacketErrors, OutboundDiscardedPackets

Received errors or outbound discarded packets climbing over time indicate a NIC driver problem or physical layer issue (bad cable, switch port errors).

If it worked: Test-NetConnection returns PingSucceeded: True (and TcpTestSucceeded: True when you add -Port 3389), DNS resolves correctly, and RDP sessions connect without the dreaded "Remote Desktop can't connect to the remote computer" dialog.

5. Run System File Checker and DISM to Repair Corruption

One of the most overlooked causes of persistent Windows Server instability is Windows system file corruption. This can happen from a bad Windows Update, an unclean shutdown during patching, ransomware activity, or simply bit rot on aging storage. The good news is Windows Server has two built-in tools that can detect and repair this automatically.

Open an elevated Command Prompt or PowerShell as Administrator and run System File Checker first:

sfc /scannow

This scans all protected system files and replaces corrupted or missing files from a cached copy. The scan takes 10–20 minutes. When it finishes, it reports one of three results: no integrity violations found, found and repaired corruption, or found corruption it couldn't fix.

If SFC reports it couldn't repair some files, or if you want to repair the Windows image itself (which SFC pulls from), run DISM:

DISM /Online /Cleanup-Image /CheckHealth
DISM /Online /Cleanup-Image /ScanHealth
DISM /Online /Cleanup-Image /RestoreHealth

Run these in sequence. /CheckHealth is near-instant because it only reads a flag. /ScanHealth takes a few minutes and does a full scan. /RestoreHealth downloads and replaces corrupted components from Windows Update; it can take 15–45 minutes and needs either internet access or a mounted Windows Server ISO as a source.

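If your server can't reach Windows Update, specify a local source. Adjust the drive letter and the image index (the trailing :1) to match the edition you have installed; DISM /Get-WimInfo /WimFile:D:\Sources\Install.wim lists the available indexes: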

DISM /Online /Cleanup-Image /RestoreHealth /Source:WIM:D:\Sources\Install.wim:1 /LimitAccess

After DISM completes successfully, run sfc /scannow again. In my experience, the second SFC run after a successful DISM restore almost always comes back clean.

If it worked: SFC reports "Windows Resource Protection did not find any integrity violations" and the intermittent errors, crash events, or failed services that brought you here no longer appear in Event Viewer after a reboot.

Advanced Troubleshooting

If the five steps above haven't resolved your issue, you're dealing with something deeper. Here's how I approach the harder cases, the ones that take domain knowledge and patience.

Group Policy Processing Failures

On domain-joined Windows Servers, Group Policy failures cause a specific, frustrating category of problems: security settings not applying, software not deploying, login scripts not running. The Windows Server Group Policy troubleshooting process always starts with the same command:

gpresult /h C:\Temp\GPReport.html /f

Open the resulting HTML file in a browser. Look for any policies listed under "Denied GPOs" or policies with errors. The "Computer Configuration" section shows exactly which policies applied and which failed, with error reasons. A common culprit is the Group Policy Client service (gpsvc) failing. Check Event ID 1085, 1096, or 1129 in the System log (source GroupPolicy), then trace the full processing cycle in Applications and Services Logs → Microsoft → Windows → GroupPolicy → Operational.

Registry-Level Service Fixes

When a service refuses to start even after all the normal steps, the service's registry configuration may be corrupted. Each service entry lives at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\[ServiceName]

The key values to verify: Start (0=Boot, 1=System, 2=Automatic, 3=Manual, 4=Disabled), Type, and ImagePath (must point to the correct executable). A corrupted ImagePath (one pointing to a nonexistent path) is a common cause of Windows Server service startup failures that show as Event ID 7000 with error code 2 (file not found).

Before editing any registry key, always export it first: right-click the key → Export. That gives you a one-click rollback.
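You can inspect these values without opening regedit. The sketch below is read-only and changes nothing; ConvertTo-ServiceStartName is a hypothetical helper that maps the Start number to the names listed above, and wuauserv in the usage comment is just a placeholder service.

```powershell
function ConvertTo-ServiceStartName {
    param([Parameter(Mandatory)][int]$Start)
    # Mapping taken from the Start values listed in this section.
    switch ($Start) {
        0 { 'Boot' }
        1 { 'System' }
        2 { 'Automatic' }
        3 { 'Manual' }
        4 { 'Disabled' }
        default { "Unknown ($Start)" }
    }
}

# Usage on the server (read-only):
# $key = Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\wuauserv'
# ConvertTo-ServiceStartName -Start $key.Start
# $key.ImagePath    # confirm the executable it names actually exists
```

If Start reads Disabled for a service you expect to run, or ImagePath names a file that isn't there, you've found the reason for the Event ID 7000.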

Performance Monitor Data Collector Sets

For intermittent Windows Server slow performance issues that you can't catch in real time, set up a Data Collector Set to capture the problem when it recurs. Open Performance Monitor → expand Data Collector Sets → right-click User Defined → New → Data Collector Set. Choose "Create manually," add performance counters (Processor, Memory, PhysicalDisk, Network Interface), and configure it to start automatically. When the slowdown recurs, you'll have timestamped data showing exactly which resource was the bottleneck.

Analyzing Memory Dumps After a Crash

After a Windows Server blue screen (BSOD), a memory dump file is written to C:\Windows\Minidump\ (small memory dump) or C:\Windows\MEMORY.DMP (kernel, automatic, or complete memory dump, depending on the configured dump type). Install the Windows Debugging Tools (part of the Windows SDK), then open the dump in WinDbg and run:

!analyze -v

This outputs the stop code, the faulting driver or module, and a stack trace. The "BUGCHECK_STR" and "MODULE_NAME" fields are what you need. If the faulting module is a third-party driver (identifiable by a non-Microsoft filename like SomeSAN_driver.sys), update or roll back that driver immediately.

Active Directory Replication Health

For domain controllers specifically, replication failures are silent killers. Run this regularly:

repadmin /replsummary
repadmin /showrepl
dcdiag /test:replications /v

Any replication failures need immediate attention. A domain controller that has been out of replication for longer than the tombstone lifetime (180 days by default in forests created on Windows Server 2003 SP1 or later, 60 days in older forests) can reintroduce lingering objects and requires specific remediation steps.
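For a quick pass/fail view, repadmin can emit CSV that PowerShell can filter. This is a sketch under one assumption you should verify on your own domain controller: that the CSV header uses the column names shown below (run repadmin /showrepl * /csv once and check the first line). Get-ReplicationFailure is a hypothetical helper name.

```powershell
function Get-ReplicationFailure {
    param([Parameter(Mandatory)][string[]]$CsvLine)
    # Column names are assumed from repadmin's CSV header; verify them locally.
    $CsvLine | ConvertFrom-Csv |
        Where-Object { ($_.'Number of Failures' -as [int]) -gt 0 } |
        Select-Object -Property 'Source DSA', 'Destination DSA', 'Naming Context',
                                'Number of Failures', 'Last Failure Time'
}

# Usage on a domain controller:
# Get-ReplicationFailure -CsvLine (repadmin /showrepl * /csv) | Format-Table -AutoSize
```

No output means no replication link reported a failure count above zero. Rows whose failure count isn't numeric are skipped rather than raising an error, so still read the raw repadmin output if something looks off.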

When to Call Microsoft Support
If you're seeing consistent memory dump files with a Microsoft-signed driver as the faulting module, Active Directory corruption that dcdiag can't repair, or Windows Update failures that persist after DISM /RestoreHealth, it's time to escalate. These scenarios call for debugging resources and private symbols that go beyond what's practical to DIY. Contact Microsoft Support. For production server outages, the Premier/Unified Support track gets you an engineer on a live call fast. Have your System Center or Azure Arc data ready if you have it; it dramatically cuts diagnosis time.

Prevention & Best Practices

The best Windows Server troubleshooting is the kind you never have to do. After years of managing production Windows Server environments, here are the practices that actually move the needle on stability.

Baseline everything before you need it. Run Performance Monitor data collection on a healthy server and save those baselines. When something goes wrong, you'll have a "known good" comparison point. Without a baseline, you're guessing what "normal" CPU or memory usage looks like. This is especially important for Windows Server 2022 deployments where Secured-Core features can change baseline performance profiles significantly.

Patch methodically, not impulsively. Apply Cumulative Updates on a monthly cycle, not the day they release. Let Microsoft's telemetry catch any bad patches first. Test on a non-production server or VM before your production fleet. Windows Server Update Services (WSUS) lets you approve updates on a ring-based schedule: test servers first, production after a two-week soak period. This single practice eliminates the largest category of self-inflicted Windows Server crashes I see.

Monitor disk space with automated alerts. Set an alert (in your RMM tool, SCOM, Azure Monitor, or even a simple scheduled PowerShell script) to fire when any volume drops below 20% free. Disk space issues that are caught at 20% are a five-minute fix. The same issue caught at 2% free might mean a crashed SQL Server, corrupted logs, and a two-hour incident. The difference is enormous.
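The "simple scheduled PowerShell script" option might look something like this. It's a minimal sketch, not a finished monitoring solution: Get-LowSpaceVolume is a hypothetical helper, and how the alert is delivered (exit code, email, webhook) is left to whatever scheduler or RMM agent runs it.

```powershell
function Get-LowSpaceVolume {
    param(
        [Parameter(Mandatory)][object[]]$Drive,
        [double]$ThresholdPercent = 20
    )
    foreach ($d in $Drive) {
        $total = [double]$d.Used + [double]$d.Free
        if ($total -le 0) { continue }    # skip empty or unmounted drives
        $pctFree = [math]::Round($d.Free / $total * 100, 1)
        if ($pctFree -lt $ThresholdPercent) {
            [pscustomobject]@{ Name = $d.Name; PercentFree = $pctFree }
        }
    }
}

# Usage from a scheduled task:
# $low = Get-LowSpaceVolume -Drive (Get-PSDrive -PSProvider FileSystem)
# if ($low) { $low | Format-Table -AutoSize; exit 1 }   # non-zero exit lets the scheduler raise the alert
```

The threshold is a parameter so you can run a stricter check (say 10%) on volumes where 20% free is normal.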

Keep driver versions locked to known-good builds. Configure Windows Update to not automatically update drivers (use Group Policy: Computer Configuration → Administrative Templates → Windows Components → Windows Update → Do not include drivers with Windows Updates → Enabled). Driver updates from Windows Update have caused more Windows Server instability in enterprise environments than almost anything else. Manage driver updates manually, via vendor release notes, tested in a staging environment first.

Frequently Asked Questions

My Windows Server is stuck at "Applying computer settings" on login, how do I fix it?

This almost always means Group Policy processing is stalling, usually because of a network connectivity issue or a slow or unreachable domain controller. First, try pressing Ctrl + Alt + Del to see if the machine is responsive at all; sometimes it's just slow, not frozen. If it's genuinely stuck, boot into Safe Mode with Networking (F8 during POST or hold Shift at the restart menu), log in with a local administrator account, and run gpresult /h C:\Temp\gp.html to see which policy is failing. Check Event ID 1053 or 1058 in the System log. These are the most common Group Policy stuck-at-startup culprits, and the event text identifies what failed to process.

Windows Server keeps rebooting randomly, what's causing it?

Random reboots on Windows Server almost always trace back to one of three things: a hardware fault (check IPMI/iLO/iDRAC hardware logs if you have server-grade hardware; these record thermal events and memory errors that Windows never sees), an automatic restart configured after a BSOD (check System Properties → Advanced → Startup and Recovery; "Automatically restart" is often checked, hiding the actual blue screen stop code), or a failed Windows Update causing a restart loop. Check Event ID 41 (Kernel-Power, unexpected shutdown) and Event ID 1074 (clean, initiated restart with reason code) in your System log. Event ID 1074 includes the process that triggered the reboot, often wuauclt.exe or an update installer.

How do I fix "The trust relationship between this workstation and the primary domain failed" on a server?

This error means the server's computer account password in Active Directory is out of sync with the local copy the machine holds. The fastest fix that doesn't require a rejoin: on the affected server, log in with a local administrator account, open an elevated PowerShell prompt, and run Test-ComputerSecureChannel -Repair -Credential (Get-Credential). Enter domain admin credentials when prompted. This resets the secure channel between the machine and AD without removing the machine from the domain, preserving all local profiles and settings. If that fails, the traditional fix of removing and rejoining the domain works but is more disruptive. Always try the secure channel repair first.

Windows Server 2019 / 2022 is extremely slow after a Windows Update, what should I do?

After a Cumulative Update, Windows Server often runs background work, such as component cleanup by the Windows Modules Installer or the Software Protection Platform Service (sppsvc), that hammers CPU and disk. Give it 30–60 minutes after the post-patch reboot before concluding something is wrong; this is normal but poorly communicated by Microsoft. If it's still slow after an hour, check Task Manager's Details tab for TiWorker.exe or MsMpEng.exe (Windows Defender scanning new update files) consuming resources. If a specific update is truly to blame, check Windows Update history in Settings, identify the KB number, and search for Microsoft's known issues page for that KB; rollback instructions are published there when a bad patch ships. You can uninstall a specific update with: wusa.exe /uninstall /kb:XXXXXXX /quiet /norestart. On builds where the servicing stack update is bundled into the cumulative update, wusa.exe may refuse to remove it; in that case remove the cumulative update package with DISM /Online /Remove-Package instead.

How do I find out what caused a Windows Server blue screen after the fact?

After a BSOD reboot, Windows writes a minidump to C:\Windows\Minidump\, and each file is timestamped with the crash time. The tool WhoCrashed (by Resplendence) can analyze these minidump files without requiring the full Windows Debugging Tools setup. It's a two-minute install and gives you a plain-English summary of the stop code, faulting driver, and recommended action; check its license terms before using it on production or commercial systems. For production environments or complex crashes, install WinDbg (formerly WinDbg Preview) on an admin workstation, copy the minidump over, open it, and run !analyze -v in the command window. The output includes a "Probably caused by" line that names the driver or component. If the minidump folder is empty, your system may be configured to skip small dumps. Check System Properties → Advanced → Startup and Recovery → Write debugging information, and change it to "Small memory dump (256 KB)" or "Automatic memory dump."

Windows Server Event Viewer shows thousands of errors, do I need to fix all of them?

No, and this is a really important thing to understand about Windows Server administration. Event Viewer on any active Windows Server is never going to be clean. Informational events, occasional warnings, and even some errors are completely normal background noise generated by scheduled tasks, COM object timeouts, driver polling, and application health checks. The events that matter are: Critical entries, any errors that appeared immediately before a user-reported problem, errors that are repeating on a regular cadence, and Event IDs 41, 6008, 7034, 1000, and 4625. Everything else is usually safe to acknowledge and move on. Focus your energy on events correlated with actual symptoms. Don't chase every red icon, or you'll spend more time in Event Viewer than fixing real problems.

Sai Kiran Pandrala
Our team includes certified Microsoft engineers, Azure architects, and system administrators with 10+ years of enterprise IT experience. Every guide is written from hands-on troubleshooting, not guesswork. We test every fix before publishing.