Nvidia (Mellanox): OSPF neighbor stuck ExStart
By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30
| Vendor | Nvidia (Mellanox) |
|---|---|
| Operating system | Cumulus Linux / NVOS / SONiC |
| Category | Routing Issues |
| Skill level | Intermediate to advanced |
| DIY-able? | Yes with CLI access; some scenarios need Nvidia Enterprise Support + RMA. |
What this guide covers
Diagnose and fix an OSPF neighbor stuck in ExStart on an Nvidia (Mellanox) device (Cumulus Linux / NVOS / SONiC).
Most likely cause + fix
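An MTU mismatch is the most common cause. During ExStart the routers exchange Database Description (DBD) packets that carry the sender's interface MTU, and a router rejects a DBD packet whose advertised MTU is larger than its own interface can accept, so the adjacency never advances. Set both sides to the same IP MTU on the OSPF interface.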
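The rejection rule can be sketched in a few lines. This is a minimal illustration of the check described in RFC 2328 section 10.6, not vendor code: the function names `dbd_accepted` and `diagnose` are hypothetical, and the MTU values 1500 and 9216 are only examples.

```python
def dbd_accepted(receiver_mtu: int, advertised_mtu: int) -> bool:
    """Return True if a receiver would accept a Database Description packet.

    RFC 2328 section 10.6: the packet is rejected when its Interface MTU
    field is larger than the receiving interface can accept unfragmented.
    """
    return advertised_mtu <= receiver_mtu


def diagnose(mtu_a: int, mtu_b: int) -> str:
    """Describe what each side does with the other side's DBD packets."""
    a_accepts = dbd_accepted(mtu_a, mtu_b)
    b_accepts = dbd_accepted(mtu_b, mtu_a)
    if a_accepts and b_accepts:
        return "MTUs match; look for another cause"
    # Only the side with the smaller MTU rejects, so at most one side fails.
    rejecting = "A" if not a_accepts else "B"
    return f"side {rejecting} rejects the neighbor's DBD packets; align the MTUs"


print(diagnose(9216, 9216))
print(diagnose(1500, 9216))
```

Note that the side with the smaller MTU is the one that discards packets, which is why the adjacency can look healthy from one end and stalled from the other.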
Diagnostic CLI
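# NVUE (Cumulus Linux 5.x / NVOS): interface summary
nv show interface
# Kernel view of the MTU on the OSPF-facing port (swp1 is a placeholder)
ip link show dev swp1
# FRR (Cumulus Linux, and SONiC where FRR provides OSPF): adjacency state and per-interface OSPF settings
sudo vtysh -c "show ip ospf neighbor"
sudo vtysh -c "show ip ospf interface swp1"
# Routes learned through OSPF
sudo vtysh -c "show ip route ospf"
# Routing daemon log (default path on Cumulus Linux; may differ elsewhere)
sudo grep -i ospf /var/log/frr/frr.log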
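When many adjacencies are involved, the neighbor table can be filtered for stalled entries. The sketch below is a rough illustration under stated assumptions: `SAMPLE` is made-up text in the general shape of an FRR neighbor table (the real column layout varies by release), `stuck_neighbors` is a hypothetical helper, and treating Exchange as stalled alongside ExStart is an assumption rather than something this guide specifies.

```python
import re

# Illustrative sample only; real column layout varies by FRR release.
SAMPLE = """\
Neighbor ID     Pri State           Dead Time Address         Interface
10.0.0.2          1 Full/DR         35.1s     10.1.1.2        swp1:10.1.1.1
10.0.0.3          1 ExStart/DR      38.4s     10.1.2.2        swp2:10.1.2.1
"""

# A data row starts with a dotted-quad router ID, a priority, then the state.
ROW = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+\d+\s+(\S+)")
STUCK = ("ExStart", "Exchange")


def stuck_neighbors(text):
    """Return (router_id, state) pairs whose state is ExStart or Exchange."""
    found = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header or unrelated line
        router_id, state = match.groups()
        if state.split("/")[0] in STUCK:
            found.append((router_id, state))
    return found


print(stuck_neighbors(SAMPLE))  # [('10.0.0.3', 'ExStart/DR')]
```

A single sample proves little. Run the check a few times over a minute or so, because a neighbor that is converging normally passes through ExStart briefly.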
When the issue persists
- Capture cl-support (Cumulus) / show techsupport (SONiC) and open an Nvidia Enterprise Support case.
- Cross-reference https://docs.nvidia.com/networking/category/etmnetworkingdevices for known issues in your release.
Frequently asked questions
Will this work on my specific Cumulus Linux / NVOS / SONiC version?
The procedure reflects current Cumulus Linux / NVOS / SONiC behaviour. Older releases may need minor syntax adjustments; use the CLI help (? or tab-completion) to verify.
Should I open an Nvidia Enterprise Support case immediately?
Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.
Where can I find the Nvidia (Mellanox) official documentation?
See https://docs.nvidia.com/networking/ and search for the product family plus the feature name.
Is this procedure safe in production?
Test in a lab or maintenance window first. Capture pre-change state so you can roll back.
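One way to capture pre-change state is to save the output of a few read-only commands into a single timestamped text blob. The sketch below assumes a command set (`ip link show` plus two FRR `vtysh` queries) that may not match your platform or release, and the helper names `run` and `snapshot` are hypothetical.

```python
import subprocess
from datetime import datetime, timezone

# Assumed command set; adjust to what your platform and release support.
COMMANDS = [
    ["ip", "link", "show"],
    ["vtysh", "-c", "show ip ospf neighbor"],
    ["vtysh", "-c", "show running-config"],
]


def run(cmd):
    """Run one command and return its stdout and stderr as text."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:
        # The binary is missing or not executable on this host.
        return f"could not run command: {exc}\n"
    return result.stdout + result.stderr


def snapshot(commands=COMMANDS, runner=run):
    """Return one text blob with a UTC timestamp and each command's output."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = [f"# snapshot taken {stamp}"]
    for cmd in commands:
        parts.append(f"\n$ {' '.join(cmd)}")
        parts.append(runner(cmd))
    return "\n".join(parts)
```

Calling `snapshot()` before and after the change and comparing the two results shows exactly what moved. The `runner` parameter exists so the function can be exercised without touching a real device.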
Related guides
Related guides worth a look while you sort this one out:
- Nvidia (Mellanox): BGP neighbor stuck Active
- Nvidia (Mellanox): BGP neighbor stuck Idle
- Nvidia (Mellanox): OSPF duplicate router-id
- Nvidia (Mellanox): OSPF MTU mismatch
- Nvidia (Mellanox) SN2010 stuck at boot loader prompt: Diagnose & Fix
- Nvidia (Mellanox) SN2100 stuck at boot loader prompt: Diagnose & Fix
References
- Nvidia (Mellanox) support portal: https://enterprise-support.nvidia.com/
- Nvidia (Mellanox) knowledge base: https://docs.nvidia.com/networking/
- Nvidia (Mellanox) security advisories: https://www.nvidia.com/en-us/security/
- Open a case: https://enterprise-support.nvidia.com/s/createcase
Reference material, not professional advice. Validate against your specific Cumulus Linux / NVOS / SONiC version and test in a non-production environment before applying.
Why this matters for your day-to-day
An Nvidia device that's misbehaving costs more than the fix itself: lost productivity, missed calls, security risk, even safety risk in some categories. Treating the symptom quickly with a documented procedure is cheaper than letting it persist. The steps above are written to get you back to working in under an hour where possible, and to flag clearly when escalation is the right call.
Before you start
A few things to confirm so the Nvidia device fix goes cleanly:
- Latest firmware downloaded if you're going to update.
- Warranty and support contract status checked; opening sealed parts may void the warranty.
- Backup of current configuration (where applicable) taken.
- Spare parts on hand if you anticipate replacement.
- Adequate workspace, lighting, and time: rushing causes regressions.
Quick verification
Before you walk away from an Nvidia device fix, run through:
1. Reproduce the original trigger. Does the issue reappear?
2. Check the device's status / health screen for any new alerts.
3. Confirm paired devices (app, hub, controller) reconnected.
4. Save / commit any configuration changes per the device's normal workflow.
5. Note the change in your maintenance log with date + firmware version.
When to call Nvidia support instead
Escalate if:
- The same symptom returns within 24 hours of a clean fix.
- You see physical damage (burn marks, swollen battery, cracked PCB).
- The device is in warranty and a hardware replacement is the cheaper outcome.
- Repair requires specialised tools you don't own (alignment jigs, calibration software).
- Keeping the warranty intact by following the official path matters more than the time spent.
More frequently asked questions
How long does this fix usually take?
Most users complete the steps in 20-45 minutes the first time, and 5-10 minutes on subsequent runs once the menu paths are familiar.
Why is this happening on a brand-new unit?
Out-of-box defects do occur. If you've owned the device under 30 days and the symptom persists after a factory reset, escalate to the seller for replacement under DOA terms before opening a manufacturer support case.
What if my model isn't exactly the same revision?
Cross-check the model code on the rating plate against the manufacturer support page. Major firmware generations sometimes shift the menu path; the option is usually under a similarly-named section.
Is it safe to apply during business hours?
If the device is in production use, apply during a scheduled maintenance window. Most procedures need 2-15 minutes of downtime. Capture pre-change state so you can roll back if needed.
How often should I run preventive checks?
Quarterly for most consumer devices; monthly for production / commercial devices. Set a calendar reminder so the device stays healthy between issues.