Nvidia (Mellanox) SN3420: Upgrade Path to latest LTS / GA
By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30
| Vendor | Nvidia (Mellanox) |
|---|---|
| Operating system | Cumulus Linux / NVOS / SONiC |
| Category | Upgrade Paths |
| Skill level | Intermediate to advanced |
| DIY-able? | Yes with CLI access; some scenarios need Nvidia Enterprise Support + RMA. |
Image upgrades on Nvidia (Mellanox) platforms have one cardinal rule: verify the running image first. `nv show system` is the single most useful command in a change window because it tells you exactly what you are rolling back to if something breaks. The `nv` commands are NVUE syntax, as used on Cumulus Linux; on SONiC the equivalent check is `show version`.
Across the SN3420 family the upgrade syntax is `onie-nos-install /home/cumulus/cumulus-linux-5.x.img`. Note that `onie-nos-install` runs from the ONIE shell; from a running Cumulus Linux system, the image is staged with `sudo onie-install -a -i <image>` and installed on the next reboot. Pay attention to the activation step (the `-a` flag), because download and activation are separate transactions. Forgetting to activate is the single most common reason an 'upgrade' silently does nothing.
Nvidia Enterprise Support expects you to capture pre-upgrade state and have a console session open during the change window. Without both, a support case is hard to diagnose if the upgrade goes sideways.
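One way to capture pre-upgrade state is to save the output of each show command to its own file before the change. The sketch below is a minimal example: `capture_state` is a hypothetical helper (not an Nvidia tool), and the snapshot directory name is an assumption.

```shell
# Sketch: save the output of each command to its own file in a snapshot directory.
# capture_state is a hypothetical helper name, not part of Cumulus Linux or NVUE.
capture_state() {
    local outdir="$1"
    shift
    mkdir -p "$outdir" || return 1
    local cmd fname
    for cmd in "$@"; do
        # Build a safe file name, e.g. "nv show system" -> nv_show_system.txt
        fname="$(printf '%s' "$cmd" | tr -c 'A-Za-z0-9' '_').txt"
        # Keep stdout and stderr together; a failing command is noted, not fatal.
        bash -c "$cmd" > "$outdir/$fname" 2>&1 \
            || echo "command exited non-zero: $cmd" >> "$outdir/$fname"
    done
}

# Example for a change window (run on the switch; directory name is illustrative):
# capture_state "/home/cumulus/pre-upgrade-$(date +%Y%m%d-%H%M%S)" \
#     "nv show system" "nv show platform inventory"
```

Copy the snapshot directory off the switch before upgrading, since files left on the switch may not survive a fresh image install.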
What this guide covers
Upgrade procedure for Nvidia (Mellanox) SN3420 to latest LTS / GA (Cumulus Linux / NVOS / SONiC).
Notes specific to this combination
Verify the supported upgrade path in the Nvidia (Mellanox) release notes before proceeding. Some Cumulus Linux / NVOS / SONiC releases require an intermediate hop; some support direct upgrade.
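Before reading the release notes, it can help to confirm that the target really is newer than the running release. The sketch below compares two version strings with GNU `sort -V`. The helper names and the version numbers in the usage line are illustrative assumptions, and the check does not replace the release notes: it cannot tell you whether an intermediate hop is required.

```shell
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_target RUNNING TARGET: print a one-line verdict for the change record.
check_target() {
    local running="$1" target="$2"
    if version_lt "$running" "$target"; then
        echo "upgrade: $running -> $target (confirm the path in the release notes)"
    elif [ "$running" = "$target" ]; then
        echo "no-op: already on $target"
    else
        echo "downgrade: $target is older than $running"
    fi
}

# Usage (version strings are placeholders):
# check_target 5.9.1 5.10.0
```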
Step-by-step
- Verify current version: `nv show system`.
- Read the release notes for supported upgrade paths.
- Confirm minimum RAM / disk for the target release.
- Download target image; verify checksum.
- Schedule maintenance window.
- Back up running configuration.
- Copy image to local flash.
- Run `onie-nos-install /home/cumulus/cumulus-linux-5.x.img`.
- Reboot: `nv action reboot system`.
- Verify; `nv config save` if healthy.
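The checksum step in the list above can be sketched as follows. This assumes the published digest is SHA-256 and that `sha256sum` is available where you run the check; `verify_image` is a hypothetical helper name, and the expected digest must come from the download page, not from the file itself.

```shell
# verify_image FILE EXPECTED_SHA256: succeeds only if the file's SHA-256 matches.
# Assumption: the vendor publishes a SHA-256 digest for the image.
verify_image() {
    local file="$1" expected actual
    # Normalise the expected digest to lower case before comparing.
    expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $file"
    else
        echo "checksum MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (path from the steps above; the digest is a placeholder you must supply):
# verify_image /home/cumulus/cumulus-linux-5.x.img "<digest from the download page>"
```

A mismatch returns a non-zero status, so the function can gate the install step in a script.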
CLI / commands
nv show system
nv show platform inventory
onie-nos-install /home/cumulus/cumulus-linux-5.x.img
nv config save
Frequently asked questions
Will this work on my specific Cumulus Linux / NVOS / SONiC version?
The procedure reflects current Cumulus Linux / NVOS / SONiC behaviour. Older releases may need minor syntax adjustments; use the CLI help (? or tab-completion) to verify.
Should I open an Nvidia Enterprise Support case immediately?
Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.
Where can I find the Nvidia (Mellanox) official documentation?
At https://docs.nvidia.com/networking/. Search for the product family plus the feature name.
Is this procedure safe in production?
Test in a lab or maintenance window first. Capture pre-change state so you can roll back.
Related guides
Related guides worth a look while you sort this one out:
- Nvidia (Mellanox) SN2010: Upgrade Path to latest LTS / GA
- Nvidia (Mellanox) SN2100: Upgrade Path to latest LTS / GA
- Nvidia (Mellanox) SN2410: Upgrade Path to latest LTS / GA
- Nvidia (Mellanox) SN2700: Upgrade Path to latest LTS / GA
- Nvidia (Mellanox) SN3700: Upgrade Path to latest LTS / GA
- Nvidia (Mellanox) SN3420: Upgrade Path to latest hardening patch
References
- Nvidia (Mellanox) support portal: https://enterprise-support.nvidia.com/
- Nvidia (Mellanox) knowledge base: https://docs.nvidia.com/networking/
- Nvidia (Mellanox) security advisories: https://www.nvidia.com/en-us/security/
- Open a case: https://enterprise-support.nvidia.com/s/createcase
Reference material, not professional advice. Validate against your specific Cumulus Linux / NVOS / SONiC version and test in a non-production environment before applying.
Why this matters for your day-to-day
An Nvidia device that's misbehaving costs more than the fix itself: lost productivity, missed calls, security risk, even safety risk in some categories. Treating the symptom quickly with a documented procedure is cheaper than letting it persist. The steps above are written to get you back to a working state in under an hour where possible, and to flag clearly when escalation is the right call.
Before you start
A few things to confirm so the Nvidia device fix goes cleanly:
- Latest firmware downloaded if you're going to update.
- Warranty and support contract status checked; opening sealed parts may void the warranty.
- Backup of current configuration (where applicable) taken.
- Spare parts on hand if you anticipate replacement.
- Adequate workspace, lighting, and time: rushing causes regressions.
How to confirm it's actually fixed
On an Nvidia device, the test is rarely "reboot and see". Use this list:
- Active reproduction: trigger the original failure path on purpose.
- Indirect reproduction: do an activity that would expose the same subsystem.
- Status indicator review: every LED / display / app status should be green.
- 24-hour soak: leave the device under normal load overnight; check the next morning.
- Telemetry check: review the device or app's diagnostic log for new error entries.
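The telemetry check above can be partly automated by comparing a pre-change copy of a log with the current one. This is a rough sketch: the helper names are hypothetical, the match pattern is an assumption to tune for your logs, and the count is only meaningful if the log is append-only and was not rotated between the two copies.

```shell
# error_lines FILE: print lines that look like errors (case-insensitive match).
# The pattern is an assumption; adjust it to the messages your logs actually use.
error_lines() {
    grep -iE 'error|fail|critical' "$1" || true
}

# new_error_count BEFORE AFTER: how many error-looking lines were added
# between the pre-change copy (BEFORE) and the current log (AFTER).
new_error_count() {
    local before after
    before="$(error_lines "$1" | wc -l)"
    after="$(error_lines "$2" | wc -l)"
    echo $(( after - before ))
}

# Usage (file names are placeholders):
# new_error_count pre-upgrade/syslog.txt post-upgrade/syslog.txt
```

A non-zero result is a prompt to read the new entries, not a verdict on its own.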
Escalation guide
For an Nvidia device, the right escalation depends on impact:
- Cosmetic / minor: log a ticket via the Nvidia Enterprise Support portal. Response 1-3 business days.
- Mid-impact: phone support. Have your serial number ready.
- Critical (production down, safety issue): in-person dealer / TAC visit. Bring proof of purchase.
- Out of warranty: third-party repair shop with manufacturer-certified technicians.
More frequently asked questions
Why is this happening on a brand-new unit?
Out-of-box defects do occur. If you've owned the device under 30 days and the symptom persists after a factory reset, escalate to the seller for replacement under DOA terms before opening a manufacturer support case.
Does this affect other devices on my network?
The configuration change is local to this switch, but the reboot interrupts forwarding for every device attached to it. Plan for redundancy or downtime, and check the release notes for changes that could affect neighbours, such as routing behaviour.
What if the fault returns after a reboot?
A fault that keeps returning means one of three things: a hardware fault (escalate), a configuration that is being overwritten by a sync source such as an automation or provisioning system (check what pushes config to the switch), or a regression in a recent firmware update (roll back).
How long does this fix usually take?
Most users complete the steps in 20-45 minutes the first time, and 5-10 minutes on subsequent runs once the commands are familiar.
Are there safer alternatives for non-technical users?
Consumer-style guided troubleshooter apps do not apply to a data-center switch. If you are not comfortable at the CLI, have an experienced network engineer run the upgrade, or work through it with Nvidia Enterprise Support.