Nvidia (Mellanox) switch: asymmetric routing detected

By Sai Kiran Pandrala · Reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30

⚡ At a glance
Vendor: Nvidia (Mellanox)
Operating system: Cumulus Linux / NVOS / SONiC
Category: IP / Network Issue
Skill level: Intermediate to advanced
DIY-able: Yes, with CLI access; some scenarios need Nvidia Enterprise Support + RMA.

What this guide covers

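How to diagnose and fix asymmetric routing on an Nvidia (Mellanox) switch, where traffic in one direction of a flow takes a different path from the return traffic.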

Step-by-step

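  1. Confirm the symptom: a stateful firewall sees the TCP packets of a flow in only one direction.
  2. Identify the asymmetric leg by running traceroute from each end toward the other and comparing the hops.
  3. Fix with policy-based routing, or collapse to a single egress so that both directions use the same path.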

CLI / commands

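# List all interfaces
nv show interface
# Show details for a single port (swp1 as an example)
nv show interface swp1
# Show the hardware inventory
nv show platform inventory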

When the issue persists

Frequently asked questions

Will this work on my specific Cumulus Linux / NVOS / SONiC version?

The procedure reflects current Cumulus Linux / NVOS / SONiC behaviour. Older releases may need minor syntax adjustments; use the CLI help (? or tab completion) to verify.

Should I open an Nvidia Enterprise Support case immediately?

Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.

Where can I find the Nvidia (Mellanox) official documentation?

Start at https://docs.nvidia.com/networking/ and search for the product family plus the feature name.

Is this procedure safe in production?

Test in a lab or maintenance window first. Capture pre-change state so you can roll back.
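
One way to make the pre-change capture useful is to diff it against a capture taken after the change. The sketch below is only an illustration: `diff_snapshots` is a hypothetical helper, and the two route tables are made-up text in the style of Linux `ip route` output. How you capture the snapshots (saved command output, exported config) is left to your own workflow.

```python
import difflib


def diff_snapshots(before, after):
    """Return unified-diff lines between two captured text snapshots."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="pre-change",
            tofile="post-change",
            lineterm="",
        )
    )


# Hypothetical captures of a routing table, saved before and after the change.
pre_change = "10.0.0.0/24 via 192.0.2.1 dev swp1\n10.0.1.0/24 via 192.0.2.5 dev swp2"
post_change = "10.0.0.0/24 via 192.0.2.1 dev swp1\n10.0.1.0/24 via 192.0.2.1 dev swp1"

for line in diff_snapshots(pre_change, post_change):
    print(line)
```

Lines prefixed with `-` exist only in the pre-change capture and lines prefixed with `+` only in the post-change one, so the diff doubles as a checklist of what to restore on rollback. Identical snapshots produce no output.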

Reference material, not professional advice. Validate against your specific Cumulus Linux / NVOS / SONiC version and test in a non-production environment before applying.

Common patterns we see

When this symptom shows up on an Nvidia device, three patterns repeat:

1. A recent firmware update changed behavior: the symptom started within a week of an OTA push. Roll back or wait for the hotfix.
2. An environmental trigger: temperature, humidity, line voltage, or network changes. Look at what changed in the environment.
3. Cumulative wear: components like batteries, gaskets, and fans degrade over time. Replace the consumable rather than chasing a software fix.

Knowing which pattern applies saves time on the wrong fix.

Safety + preconditions

Before any work on an Nvidia device:

Quick verification

Before you walk away from an Nvidia device fix, run through:

1. Reproduce the original trigger and check whether the issue reappears.
2. Check the device's status / health screen for any new alerts.
3. Confirm paired devices (app, hub, controller) reconnected.
4. Save / commit any configuration changes per the device's normal workflow.
5. Note the change in your maintenance log with date + firmware version.

Escalation guide

For an Nvidia device, the right escalation depends on impact:

More frequently asked questions

What if the fault returns after a reboot?

A fault that keeps returning usually means one of three things: a hardware fault (escalate), a configuration that is being overwritten by a sync source (check cloud profiles), or a regression in a recent firmware update (roll back).

How long does this fix usually take?

Most users complete the steps in 20-45 minutes the first time, and 5-10 minutes on subsequent runs once the menu paths are familiar.

Are there safer alternatives for non-technical users?

Yes. The manufacturer's self-service troubleshooter (HP Smart, LG ThinQ, Samsung Members, or similar) usually walks through the same steps in a guided UI. Use that first if you're not comfortable with menu paths.

Should I update firmware first or last?

Update firmware first if a release note specifically mentions your symptom. Otherwise, finish the troubleshooting flow first, then update; that way you can isolate whether the update or the underlying fix solved it.

Can I roll this back if something breaks?

Yes, for software-level changes (firmware rollback, config rollback). Hardware changes are usually one-way. Always back up settings before starting.