How to configure MPLS LDP session protection on Nexus 9000
By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30
I run an MPLS core for an ESS Bengaluru bank customer where two PE routers sit on a long-haul DWDM circuit between Bengaluru and Mumbai. The link flapped twice last quarter. Both times, LDP tore down. Both times, traffic black-holed for the 35-second hello dead interval before LDP rebuilt. Session protection is the fix. It keeps the LDP session alive over targeted hellos while the direct link is down, provided the IGP still has another path to the peer loopback, so when the link comes back the label bindings are already there. On Nexus 9000, the command flow is a little different from what the lab guides assume.
Pre-requisites
- Nexus 9000 running NX-OS 10.2 or later.
- LDP already configured on the affected interfaces. Run `show mpls ldp neighbor` and confirm an existing peering.
- An IGP (OSPF or IS-IS) reachable to the peer loopback. Without it, targeted hellos have nowhere to land.
- TCP 646 and UDP 646 open between loopbacks: the LDP session runs over TCP, the targeted hellos over UDP. A bank customer of mine in Bengaluru had this blocked at a Palo Alto in the middle. Took two hours to find.
- Privileged EXEC access and an out-of-band console. Never test LDP changes over the data plane if the WAN is your only path home.
Step-by-step on Nexus 9000
- Open a console session. Putty 0.78 or SecureCRT 9.4 against the out-of-band port. Log the session to disk. If LDP tears down mid-change you will want the timestamped log.
- Confirm the existing LDP peer. `show mpls ldp neighbor | inc Peer`. Note the peer LDP ID and the discovery sources. You will reuse the peer loopback IP in the next step.
- Enable global session protection. Under the `mpls ldp configuration` context: `session protection`. By default this protects all directly-connected peers. To scope it, add `for <prefix-list>` at the end.
- (Optional) Bind by prefix list. NX-OS scopes LDP features with prefix lists rather than numbered ACLs. Build one listing the peer LDP IDs, for example `ip prefix-list LDP-PROTECT permit 10.0.0.2/32`, then `session protection for LDP-PROTECT duration 86400`. The duration is how long to hold the targeted hello after the directly-connected hello drops.
- Save and check. `copy running-config startup-config` (the NX-OS equivalent of `write memory`), then `show mpls ldp neighbor | inc Protection`. Expected line: `LDP Session Protection enabled, state: Ready`.
- Force a flap to test. If the maintenance window allows, shut/no-shut the LDP-enabled interface. Watch for `%LDP-5-NBRCHG: peer ... is UP` coming back without the full 90-second hold-down. If the targeted hello kept the session warm, recovery should be under 10 seconds.
- Log the change. Drop a note in the GeM-tender CMDB or your internal change record so the next person on call knows session protection is enabled and what the duration is.
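For repeatable rollouts, the scoped configuration from the steps above can be rendered from a list of peer LDP IDs. The Python sketch below is illustrative only. It assumes NX-OS-style syntax (`ip prefix-list`, then `session protection for <prefix-list> duration <seconds>` under `mpls ldp configuration`), and the prefix-list name `LDP-PROTECT` and the function name are hypothetical. Check the exact keywords against the command reference for your release before pasting anything into a box.

```python
import ipaddress


def build_session_protection_config(peer_ids, prefix_list="LDP-PROTECT", duration=86400):
    """Render CLI lines that scope LDP session protection to the given peers.

    Assumption: NX-OS-style prefix-list scoping. The prefix-list name is a
    placeholder chosen for this example, not something from a real device.
    """
    if not peer_ids:
        raise ValueError("at least one peer LDP ID is required")
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")

    lines = []
    for index, peer in enumerate(peer_ids, start=1):
        # IPv4Address raises ValueError on a malformed router ID.
        address = ipaddress.IPv4Address(peer)
        lines.append(f"ip prefix-list {prefix_list} seq {index * 5} permit {address}/32")

    lines.append("mpls ldp configuration")
    lines.append(f"  session protection for {prefix_list} duration {duration}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_session_protection_config(["10.0.0.2"])))
```

The function only builds text. Pushing it to the device, whether by hand or through Ansible, stays a separate and deliberate step.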
A deployment I shipped
Last quarter I shipped this on a Nexus 9000 pair for a private bank's Bengaluru-Mumbai MPLS core. The DWDM provider scheduled a 4 a.m. fibre splice. Without session protection the previous outage had taken 47 seconds of label reconvergence, long enough for the bank's payment switch to throw 12 timeouts. After we enabled mpls ldp session protection with an 86,400-second duration, the splice window passed and the targeted hello kept the session warm. show mpls ldp neighbor reported state Ready throughout. The payment switch saw zero retries. The change cost us 20 minutes of console time. We billed 4 hours under the SmartNet-style support contract because that is what the GeM-tender SLA committed.
How I verify the change actually works
- `show mpls ldp neighbor | inc Protection`: expect `state: Ready`.
- `show mpls ldp discovery`: confirm targeted hellos are active.
- `show mpls forwarding-table`: labels intact across a controlled flap.
- `debug mpls ldp session protection` for a brief verification window only. Disable when done.
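The first check lends itself to a small script when there are many peers. A minimal Python sketch, assuming the `show` output has already been captured as text (how you collect it is up to you) and that the state line reads `LDP Session Protection enabled, state: Ready` as quoted earlier. The function names are made up for illustration.

```python
import re

# Matches the state line quoted in this guide and captures the state word.
PROTECTION_RE = re.compile(r"LDP Session Protection enabled, state:\s*(\w+)")


def protection_states(show_output):
    """Return every session-protection state found in the captured text."""
    return PROTECTION_RE.findall(show_output)


def all_ready(show_output):
    """True only if at least one protected session exists and all are Ready."""
    states = protection_states(show_output)
    return bool(states) and all(state == "Ready" for state in states)


if __name__ == "__main__":
    captured = "LDP Session Protection enabled, state: Ready"
    print(protection_states(captured), all_ready(captured))
```

Returning `False` on empty output is deliberate: a capture with no protection line at all usually means the feature is not enabled for that peer, which should not pass as healthy.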
Gotchas I've eaten in production
- %LDP-5-NBRCHG keeps flapping. Session protection only kicks in if LDP is up before the link drops. If LDP never finished the initial neighbor discovery, you'll still see clean teardowns. Confirm with `show mpls ldp neighbor`.
- %LDP-5-SP indicates retry exhaustion. Either the duration expired or the underlying IGP took longer than expected to reconverge. Bump duration to 86400 if the IGP is slow.
- Targeted hello fails through a firewall. TCP 646 plus UDP 646 must be open between loopbacks. A Bengaluru bank had a Palo Alto silently dropping the targeted UDP hello. Two hours to find.
- %SYS-5-CONFIG_I shows config drift. Someone else edited the box. Always log who and when via TACACS+ accounting before declaring the session-protection behaviour is at fault.
Cost impact
| Line item | India (INR) | Global (USD) |
|---|---|---|
| SmartNet 8x5xNBD on the platform (annual) | ₹85,000 - ₹1.2 lakh | $1,050 - $1,500 |
| SmartNet 24x7x4 (annual) | ₹1.5 - 2 lakh | $1,900 - $2,500 |
| Putty 0.78 / SecureCRT 9.4 licence | Free / ₹8,200 perpetual | Free / $99 perpetual |
| Wireshark 4.2 (capture analysis) | Free | Free |
| Cisco DNA Center / Catalyst Center seat (per device-year, list) | ₹6,500 - ₹14,000 | $80 - $170 |
| Engineer time on-site (Bengaluru / Mumbai) | ₹2,200 - ₹3,800 per hour | $95 - $130 per hour |
Numbers are 2026 indicative ranges and depend on the SKU plus your reseller. Redington and Ingram Micro typically beat list by 8-14% for partner-managed renewals. GeM-tender pricing varies again; most government rate contracts include first-year SmartNet bundled into the hardware price.
Tooling I keep on the bench
- Putty 0.78 for the console session. Logging is on by default for every box I touch.
- SecureCRT 9.4 when the customer has tab-heavy sessions or needs tabbed scripting against a fleet.
- Wireshark 4.2 for any time the platform behaviour does not match the documentation. A 10-second capture answers what 30 minutes of `show` commands cannot.
- Cisco DNA Center / Catalyst Center for fleets above 30 devices. The compliance dashboard catches drift that an engineer never sees.
- Cisco Modeling Labs (CML 2.7) for pre-prod testing. ₹0 for personal use up to 20 nodes; commercial licence runs about ₹1.2 lakh annually.
- Ansible 2.16 for templated rollouts. The `cisco.ios` and `cisco.nxos` collections both handle the platforms in this guide.
How this interacts with other Cisco surfaces
Hardly any change on Nexus 9000 lives alone. The features in this guide ripple into adjacent boxes, sometimes within seconds, sometimes the next morning. Here is what I trace before I close a ticket.
Catalyst Center (DNAC) compliance
If the customer runs Catalyst Center, any out-of-band CLI edit will show as compliance drift inside 15 minutes. I either pre-stage the change as a template in the Network Design workflow, or I accept the drift flag and immediately re-sync the device state. Leaving the drift unresolved means the next compliance scan re-applies the previous template and silently wipes your change.
SD-WAN policy fabric
On a fabric router under vManage / Cisco SD-WAN control, CLI edits to features the controller manages get reverted on the next template push. The right move is to apply the change via a feature template, attach a CLI add-on for what the GUI does not cover, and push from vManage. If you are testing in isolation, detach the device from vManage first.
Identity Services Engine (ISE) RADIUS sessions
When the platform you are touching also acts as a NAS for 802.1X, every config save reloads the RADIUS subsystem briefly. Active wired sessions held by ISE can reauthenticate. Schedule the change outside the 9 a.m. login spike or use `aaa accounting update periodic 5` to keep stale sessions visible to ISE while the box settles.
Firepower / FTD inspection
If a Firepower NGIPS or FTD sits between the inside and outside zones, any new NAT flow needs an access-control rule allowing it. The control-plane change on the router does not automatically open the firewall. I keep a paired change request open on FMC so the rule lands in the same window.
Duo MFA for admin login
If admin logins are protected by Cisco Duo, plan for the push prompt during your change window. A Duo push that times out at the wrong moment can leave you locked out of the second box mid-change. I keep a parallel console session open before I touch any auth-related config.
Long-term monitoring I leave running
A clean change is one that still looks clean a month later. On Nexus 9000, I leave the following hooks in place after every deployment touched by this guide.
- SNMPv3 polling on the interfaces involved: CPU, memory, input / output bps, errors. PRTG or LibreNMS both work; the customer's existing NMS is usually fine.
- Syslog forwarding to a central collector. I prefer Graylog 5.2 with a dashboard that filters on `%LINEPROTO-5-UPDOWN`, `%SYS-5-CONFIG_I`, `%SPANTREE-2-RECV_PVID_ERR`, `%OSPF-4-ERRRCV`, and any platform-specific NAT / MPLS facility codes.
- NetFlow / IPFIX at low sample rate (1 in 1,000) to the customer's flow collector. Useful for proving that the NAT pool is being used the way the design intended.
- Monthly compliance scan via Catalyst Center or a manual `show running-config` diff against the change baseline. The diff catches silent edits.
- Quarterly review of SmartNet entitlement (₹85,000 - 2 lakh annual). Set a calendar reminder at least 60 days before expiry; renewal lead time on a GeM-tender customer can be 90 days.
None of these are heavy lifts. Combined, they catch the regressions that an ad-hoc show command will not. Customers who run them rarely call us about repeat incidents on the same change.
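The syslog filter from the list above is easy to prototype before building a dashboard. The Python sketch below counts watched mnemonics in a list of log lines. It assumes each message carries the usual `%FACILITY-SEVERITY-MNEMONIC:` tag somewhere in the line and makes no assumption about the timestamp or hostname format. The watch list combines the four mnemonics named above with the two LDP ones from the gotchas section; the function name is hypothetical.

```python
from collections import Counter

# Mnemonics named in this guide; extend with platform-specific ones as needed.
WATCHED_MNEMONICS = (
    "%LINEPROTO-5-UPDOWN",
    "%SYS-5-CONFIG_I",
    "%SPANTREE-2-RECV_PVID_ERR",
    "%OSPF-4-ERRRCV",
    "%LDP-5-NBRCHG",
    "%LDP-5-SP",
)


def count_watched(lines, watched=WATCHED_MNEMONICS):
    """Count how many log lines carry each watched mnemonic."""
    counts = Counter()
    for line in lines:
        for mnemonic in watched:
            # The trailing colon avoids matching a longer mnemonic by prefix.
            if f"{mnemonic}:" in line:
                counts[mnemonic] += 1
                break
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_watched.py < exported-syslog.txt
    for mnemonic, total in count_watched(sys.stdin).most_common():
        print(f"{total:6d}  {mnemonic}")
```

A non-zero count for `%LDP-5-NBRCHG` outside a change window is the one worth chasing first, since it means the session itself moved.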
More frequently asked questions
Can I roll back without a reload?
Yes, for every topic in this guide. The no-form of each command unwinds the change in real time. Run `show running-config` before and after so you can diff with VS Code or Notepad++ if anything looks off.
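A scripted diff does the same job as an editor and is easier to attach to the change record. A minimal Python sketch using the standard library's `difflib`, assuming the two `show running-config` captures are already saved as text. It skips blank lines and lines starting with `!`, which are comments in Cisco configs and often carry timestamps that would otherwise show up as noise. The function names are illustrative.

```python
import difflib


def _meaningful(config_text):
    """Drop blank lines and '!' comment lines, keep indentation intact."""
    return [
        line.rstrip()
        for line in config_text.splitlines()
        if line.strip() and not line.lstrip().startswith("!")
    ]


def config_diff(before, after):
    """Return unified-diff lines between two running-config captures."""
    return list(
        difflib.unified_diff(
            _meaningful(before),
            _meaningful(after),
            fromfile="pre-change",
            tofile="post-change",
            lineterm="",
        )
    )


if __name__ == "__main__":
    import sys

    # Usage: python config_diff.py pre-change.txt post-change.txt
    with open(sys.argv[1]) as pre, open(sys.argv[2]) as post:
        print("\n".join(config_diff(pre.read(), post.read())))
```

An empty result after a rollback is the evidence that the no-form really did restore the baseline.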
Does this break IPv6?
No. None of these features touch the IPv6 forwarding path. If you run dual-stack on Nexus 9000, IPv6 keeps its own LSDB, its own NAT (or NPTv6) state, and its own LDP context: they share nothing with IPv4 here.
What about IOS XE Stack-Wise V1/V2 mismatch?
Mixing Stack-Wise V1 and V2 members in the same stack is unsupported and reliably breaks NAT pool ownership. Replace the older member before configuring any of these features on a stacked Catalyst.
Is this safe to run during business hours?
Read-only verification is always safe. Config changes, even the no-op-looking ones, can disturb production. I schedule a 30-minute window with the customer, capture pre-change state, run the change, verify, and stop. A Comsys Mumbai-style runbook keeps this consistent across teams.
Will SmartNet TAC help if I get stuck?
Yes. With an active SmartNet (₹85,000 - 2 lakh annually depending on SKU and tier) TAC will accept a P3 ticket and review the running-config plus the relevant show outputs. Without SmartNet you can still post on the Cisco Community forum but expect community response speed, not SLA speed.
How do I avoid this becoming legacy debt?
Document the change in CMDB. Tag it with the project name. Add the verification commands to the runbook. Add a Catalyst Center compliance policy if you run one. The engineer who picks this up in 2028 will thank you.
What I do after the change is in
Three habits keep me sane after any production config change. First, I leave the console session logged in for 15 minutes and watch the syslog buffer. Second, I run `show logging | last 100` from a fresh session 24 hours later. Third, I ask the customer's NOC to confirm zero alerts during the window. The combination catches almost every regression before it becomes a Monday morning ticket.
On a Nexus 9000-class platform, the syslog patterns that I watch for are %LINEPROTO-5-UPDOWN on the affected interfaces, %SYS-5-CONFIG_I for unexpected re-edits, and %SPANTREE-2-RECV_PVID_ERR on the L2 underlay. If none of those show up in the next 48 hours, the change has settled.
If you came here because of a live outage, the fastest rollback is almost always the no-form of the commands above. Restore. Stabilise. Then reschedule the change for a quiet window. Production is not the time to be brave.
Related fixes
Related guides worth a look while you sort this one out:
- How to configure MPLS LDP session protection on AnyConnect Secure Client
- How to configure MPLS LDP session protection on ASR 1000
- How to configure MPLS LDP session protection on Catalyst 8300/8500
- How to configure MPLS LDP session protection on Catalyst 9200
- How to configure MPLS LDP session protection on Catalyst 9300
- How to configure MPLS LDP session protection on Catalyst 9400