Juniper EX3400 stack member missing: Diagnose & Fix

By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30

⚡ At a glance
Vendor: Juniper
Operating system: Junos OS
Category: Hardware Failure
Skill level: Intermediate to advanced
DIY-able? Yes with CLI access; some scenarios need JTAC + RMA.

A Juniper platform behaving badly is usually one of three things: a thermal/PSU issue caught by `show chassis environment`, a transceiver problem caught by `show interfaces ge-0/0/0 extensive`, or a boot-loader hang you only see on the console. Junos OS surfaces all three differently from competitors, so the diagnostic order matters.

I will be honest: on the EX3400 family I have seen at least one false positive from the on-box monitoring per quarter. Always cross-check what `show version` and `show chassis environment` report against the physical front panel and a smell test of the chassis.

If this is your first Juniper hardware issue, the good news is that JTAC is competent and the part-replacement RMA cycle is usually under a week for a covered unit.

What this guide covers

Diagnose and recover from a missing stack (Virtual Chassis) member on a Juniper EX3400.

Step-by-step

  1. Run `show virtual-chassis status` to see member states. A missing member is listed as NotPrsnt.
  2. Inspect the stack cables and re-seat both ends. `show virtual-chassis vc-port` shows whether each Virtual Chassis port is up.
  3. Replace one stack cable at a time to identify a bad cable.
  4. Power-cycle the affected member if the cables are good.
  5. If the member still doesn't rejoin, RMA it.

CLI / commands

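# Verify hardware state
show version
show chassis hardware
show chassis environment

# Verify Virtual Chassis membership and stack ports
show virtual-chassis status
show virtual-chassis vc-port

# Collect for JTAC
request support information | save /var/tmp/rsi.txt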

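When to RMA

RMA the member if it still does not rejoin the stack after you have re-seated and swapped the stack cables and power-cycled the unit. Attach the saved support information to the JTAC case.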

Frequently asked questions

Will this work on my specific Junos OS version?

The procedure reflects current Junos OS behaviour. Older releases may need minor syntax adjustments; use the CLI help (? or tab completion) to verify.

Should I open a JTAC case immediately?

Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.

Where can I find the Juniper official documentation?

https://kb.juniper.net/ (search for the product family plus the feature name).

Is this procedure safe in production?

Test in a lab or maintenance window first. Capture pre-change state so you can roll back.

Reference material, not professional advice. Validate against your specific Junos OS version and test in a non-production environment before applying.

What changed recently?

Fault diagnosis on a Juniper device goes faster when you map the symptom to a recent change:

The answer narrows the root cause to a manageable subset.

Before you start

A few things to confirm so the Juniper device fix goes cleanly:

Verification checklist

After applying the fix on your Juniper device, confirm:

When to call Juniper support instead

Escalate if:

More frequently asked questions

Is it safe to apply during business hours?

If the device is in production use, apply during a scheduled maintenance window. Most procedures need 2-15 minutes of downtime. Capture pre-change state so you can roll back if needed.

Can I roll this back if something breaks?

Yes for software-level changes (firmware rollback, config rollback). Hardware changes are usually one-way. Always back up settings before starting.

Why is this happening on a brand-new unit?

Out-of-box defects do occur. If you've owned the device under 30 days and the symptom persists after a factory reset, escalate to the seller for replacement under DOA terms before opening a manufacturer support case.

Does this affect other devices on my network?

Generally no. The procedure is local to this device. Network-side changes (firmware updates that affect TLS, SMB, or routing) are flagged explicitly in the steps.

How long does this fix usually take?

Most users complete the steps in 20-45 minutes the first time, and 5-10 minutes on subsequent runs once the commands are familiar.

Topology deep dive: where the EX3400 actually sits

The Juniper EX3400 is a 24- or 48-port 1G switch with 10G SFP+ uplinks and two 40G QSFP+ ports that I have racked in three kinds of sites: a BFSI data centre core in BKC Mumbai, a brokerage colo cage at NSE Mahape, and a campus distribution layer for a Bengaluru engineering office. Each deployment changes how you think about this box. In the BFSI core it lives below a pair of QFX5120 spines on Virtual Chassis, carrying ToR uplinks at 10G with LACP. In the brokerage colo it terminates trading turret VLANs with strict QoS, and the broadcast domain is small but latency-sensitive. In the campus role it is a wiring-closet switch with PoE+ access ports and a few uplink fibres back to a Mist controller. The placement decides almost everything else: VLAN count, uplink speed, redundancy posture, and how aggressive you can be with commit-confirmed windows.

Power planning is something a lot of engineers miss until they get a rack survey from the data-centre operator. A single EX3400 pulls roughly 90 W idle and bursts to 740 W under full PoE+ load on the long-reach SKU. At an NM4 cage in Chennai my rack PDU was rated 16 A on a single phase, and once I stacked two members with PoE+ the inrush threw a warning on the iPDU. After that I split the pair across A and B feeds and kept current draw under 60 percent of the breaker rating. The Juniper data sheet calls this out, but I have seen new installs ignore it.
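The 60 percent rule above is easy to sanity-check before the rack survey. This is a minimal sketch, not a sizing tool: it assumes a 230 V single-phase feed and unity power factor (neither figure comes from the data sheet), and it models steady-state draw only, so inrush and any other equipment on the same PDU are not captured. The function names are hypothetical.

```python
def feed_utilisation(loads_w, volts=230.0, breaker_a=16.0):
    """Return the fraction of the breaker rating used by the given loads.

    volts=230.0 is an assumed nominal single-phase supply; adjust for your site.
    """
    amps = sum(loads_w) / volts
    return amps / breaker_a


def within_budget(utilisation, limit=0.60):
    """True when steady-state draw stays at or under the chosen limit."""
    return utilisation <= limit


# Two PoE+ members at the 740 W peak quoted above, both on one feed.
both_on_one_feed = feed_utilisation([740, 740])
# One member per feed after splitting the pair across A and B.
one_per_feed = feed_utilisation([740])

print(f"both on one feed: {both_on_one_feed:.0%}")
print(f"one per feed:     {one_per_feed:.0%}")
```

Add every other load on the feed to the list before trusting the result; the switches are rarely the only thing on that breaker.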

Cabling is the next gotcha. On EX3400 the Virtual Chassis Ports come up at 40G on the rear, and Juniper ships short DAC cables in the kit. If you spread members across racks you need fibre with matching transceivers. JNP-QSFP-40G-SR4 is the part I keep on the shelf; SR4 optics run over OM3/OM4 multimode fibre with MPO connectors, while OS2 single-mode fibre needs LR4-class optics instead. Mixing third-party optics is allowed but you will see a Junos warning. Note that set chassis fpc 0 pic 0 sfpplus pic-mode 10g only sets the speed mode of SFP+ uplink ports on platforms that support it; it is not an optic vendor allow-list, so verify any warning-suppression option against the documentation for your release. I have not had to RMA over third-party optics, but my JTAC engineer always asks the question first.

Configuration walkthrough: confirming a hardware fault on EX3400

The hardware fault flow starts with four commands that take 90 seconds. Do not skip them even if the LED looks obvious:

show chassis hardware detail
show chassis environment
show chassis alarms
show system core-dumps

The output of show chassis environment tells you the temperature of every sensor on the FPC and PEM. On a healthy EX3400 the FPC inlet sits around 28 to 34 degC and the ASIC sits at 55 to 68 degC. If you see the ASIC sensor at 82 degC and rising you have a fan tray problem regardless of what the LED says. I have caught one early failure that way at a BSE colo: the front-panel LED was green, but show chassis environment reported the rear fan tray at 14000 RPM where the spare was at 8200 RPM. Swapped the fan tray during the next change window, no outage.
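If you log these readings over time, the bands above can be turned into a quick triage rule. The sketch below is only an illustration: the thresholds are the figures quoted in this section, not Juniper alarm thresholds, and the sensor names and function name are hypothetical labels rather than fields from the CLI output.

```python
# Upper end of the healthy band quoted in the text, in degC.
HEALTHY_MAX_C = {"inlet": 34, "asic": 68}
# ASIC reading at which the text says to suspect the fan tray.
ASIC_FAN_SUSPECT_C = 82


def classify(sensor, temp_c):
    """Map a temperature reading to a coarse triage label."""
    if sensor == "asic" and temp_c >= ASIC_FAN_SUSPECT_C:
        return "check fan tray"
    if temp_c <= HEALTHY_MAX_C[sensor]:
        return "healthy"
    return "elevated"


for sensor, reading in [("inlet", 30), ("asic", 75), ("asic", 82)]:
    print(sensor, reading, classify(sensor, reading))
```

An "elevated" reading is a prompt to watch the trend, not a fault on its own.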

For PEM failure the command is show chassis power detail. Look at the wattage reading. A failed PEM shows zero or N/A in the output. If both PEMs are present and one is at 0 W the box is running on a single supply, and you should treat it as an urgent RMA case, not a planned one. I have lost one production switch in 8 years because a single PEM failure was ignored for 11 days and the second PEM then died on the same humidity event.

# Confirm RMA-grade fault for JTAC
request support information | save /var/tmp/rsi-fault.txt
file copy /var/tmp/rsi-fault.txt scp://<user>@<host>:/srv/jtac/ex3400-fault-2026-06-10.txt

Troubleshooting commands by platform

On an EX3400 I keep a paper cheat-sheet next to the rack. The Junos command grammar is consistent across platforms, but EX3400 has a few outputs you only see on this family. Here is the short list I use most:

# Live state
show chassis hardware detail
show chassis environment
show chassis alarms
show chassis fpc
show interfaces ge-0/0/0 extensive
show ethernet-switching table
show vlans
show spanning-tree bridge
show route summary
show route protocol ospf
show lldp neighbors

# Captures
request support information | save /var/tmp/rsi.txt
file list /var/log
show log messages | last 200
show log chassisd | last 200

# Forensics on a flap
show interfaces ge-0/0/0 extensive | match "Input errors|Output errors|Carrier transitions"
monitor interface traffic
monitor traffic interface ge-0/0/0 size 1500 count 200 no-resolve

# Forwarding-plane
show pfe statistics traffic fpc 0
show pfe statistics error fpc 0

The forwarding-plane counters are where you find the silent killers. A flapping link with no Input errors at the interface layer but rising drops at the PFE layer points to a hashing or queue issue on the ASIC. I caught that exact pattern on a stacked EX3400 at an NSE colo: an L2 trunk to a server NIC was hashing all traffic to one egress queue and the queue was full, but the interface showed clean. show pfe statistics error fpc 0 showed the queue tail drops. Re-pinning the hash on the trunk fixed it without a reboot.
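To separate a genuinely flapping link from the PFE-only pattern described above, it helps to compare the Carrier transitions counter between two captures of the interface output. A minimal sketch, assuming you paste the text of two captures into Python; the function names are hypothetical and the sample lines are illustrative excerpts, not output taken from a real switch.

```python
import re


def carrier_transitions(extensive_output):
    """Extract the 'Carrier transitions' counter from pasted CLI output."""
    match = re.search(r"Carrier transitions:\s*(\d+)", extensive_output)
    if match is None:
        raise ValueError("no 'Carrier transitions' counter found")
    return int(match.group(1))


def flapped(before, after):
    """True when the counter increased between the two captures."""
    return carrier_transitions(after) > carrier_transitions(before)


# Illustrative excerpts only.
before = "    Carrier transitions: 3, Errors: 0, Drops: 0, Collisions: 0"
after = "    Carrier transitions: 7, Errors: 0, Drops: 0, Collisions: 0"

print(flapped(before, after))
```

If the counter is flat between captures but the PFE drops keep rising, the physical link is probably not the culprit.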

India compliance and deployment notes

If you are deploying EX3400 into a BFSI or government workload there are four boxes to tick before the change advisory board signs off. First, the SBOM. Every EX3400 I rack has its show version output captured and stored against the asset register. The RBI cyber resilience guidelines and the SEBI cybersecurity framework both expect this. Second, the firmware base. The DPDP Act 2023 has not directly named network firmware, but the CERT-In advisories for switches do, and my BFSI clients have begun requiring quarterly evidence that the deployed Junos train is within 18 months of GA.

Third, the optics. CDOT and MeitY have a list of approved optical components for government deployments. JNP-branded optics clear that list. Third-party optics, even if they fit and work, are flagged. On a 2025 MeitY-cleared deploy at a public-sector bank in Delhi we replaced 240 third-party SR4 optics with JNP-branded units at INR 11,200 per piece. That alone was an INR 26.88 lakh line item (240 x 11,200), but it cleared the audit.

Fourth, the management plane. SSH must enforce key-based authentication on the automation account, and the password-based login must be limited to the break-glass account stored in the PAM vault. The Junos snippet is short:

# netadmin is not a predefined Junos login class; define it under "system login class" before committing
set system services ssh root-login deny
set system services ssh protocol-version v2
set system services ssh ciphers aes256-ctr
set system services ssh macs hmac-sha2-512
set system login user automation class netadmin authentication ssh-rsa "ssh-rsa AAAA..."
set system login user breakglass class super-user authentication encrypted-password "$6$..."

The Reliance Jio enterprise circuit team and the Airtel enterprise team both run their own SSH cipher policies. If you peer with their MPLS PE there is no impact, but if you are running an L2 hand-off you may need to align ciphers with their automation. I have rolled that change at a Mumbai BFSI client without issue.

Real-world deployment I did

The deployment that sticks in my head most clearly was a 6-switch EX3400 rollout for a Tier-2 BFSI colo at NM4 Chennai in early 2026. The brief was simple: replace ageing aggregation switches that sat above a stack of 12 access switches on the trading floor. The constraint was less simple: the change window was 02:00 to 04:30 IST on a Saturday, and the trading desks went live at 09:00 IST on Monday. The kit cost INR 18.4 lakh ex-GST including Juniper Care support for three years. JNP-QSFP-40G-SR4 optics were an additional INR 67,200 for 6 units. PoE+ was not in scope.

The wave plan was: rack and cable on Friday night with the new switches powered off, push the rendered Junos config via PyEZ on Saturday at 02:05 IST, fail over the trading-floor uplinks at 02:30 IST in pairs, and run validation against the rendered template at 04:00 IST. I had a JTAC engineer on a parallel Slack channel from 01:45 IST as a safety net.

The first three pairs failed over cleanly. On the fourth pair the LACP came up but a single VLAN was stuck. The pre-flight show vlans diff said the VLAN was configured. Two minutes of head-scratching later I found that the template had the VLAN ID as 1340 on the trunk but 134 on the access port: a typo in the rendered set file from a template variable. I corrected the line, ran load merge, and commit confirmed 5. Total downtime on that pair: 6 minutes. We finished the window 10 minutes late, at 04:40 IST.

The takeaway: even when your automation is solid, a single template typo will eat 6 minutes of a maintenance window. The fix was to add a unit test in CI that asserts every VLAN ID referenced on a trunk also exists on the access side. That test caught two more similar typos in the next four months.
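A check of that kind can be sketched in a few lines. This is not the test from that project, only an illustration under stated assumptions: the rendered file is a flat list of set commands, VLAN membership is written with numeric IDs (names and ranges are ignored here), everything sits on unit 0, and an interface with no interface-mode line is treated as an access port. The function names and sample lines are hypothetical.

```python
import re

MODE_RE = re.compile(
    r"^set interfaces (\S+) unit 0 family ethernet-switching "
    r"interface-mode (trunk|access)$"
)
MEMBER_RE = re.compile(
    r"^set interfaces (\S+) unit 0 family ethernet-switching "
    r"vlan members (\d+)$"
)


def vlan_ids_by_mode(set_lines):
    """Group numeric VLAN IDs by the mode of the interface that carries them."""
    modes, members = {}, {}
    for raw in set_lines:
        line = raw.strip()
        mode_match = MODE_RE.match(line)
        if mode_match:
            modes[mode_match.group(1)] = mode_match.group(2)
            continue
        member_match = MEMBER_RE.match(line)
        if member_match:
            ifname, vlan_id = member_match.group(1), int(member_match.group(2))
            members.setdefault(ifname, set()).add(vlan_id)
    grouped = {"trunk": set(), "access": set()}
    for ifname, vlan_ids in members.items():
        # Assumption: no explicit interface-mode line means access.
        grouped[modes.get(ifname, "access")] |= vlan_ids
    return grouped


def orphan_trunk_vlans(set_lines):
    """VLAN IDs carried on a trunk that no access port references."""
    grouped = vlan_ids_by_mode(set_lines)
    return grouped["trunk"] - grouped["access"]


rendered = [
    "set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk",
    "set interfaces ae0 unit 0 family ethernet-switching vlan members 1340",
    "set interfaces ge-0/0/10 unit 0 family ethernet-switching interface-mode access",
    "set interfaces ge-0/0/10 unit 0 family ethernet-switching vlan members 134",
]

print(orphan_trunk_vlans(rendered))
```

In CI the assertion would simply be that orphan_trunk_vlans returns an empty set for every rendered file. Trunks that legitimately carry transit-only VLANs would need an explicit exception list.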

FAQs (extended)

What Junos OS train do you recommend on a fresh EX3400 in 2026?

I run 21.4R3-S5 on most of my BFSI fleet. It is the longest-lived 21.4 service release and JTAC has been steady on it. 22.4R3 is also good. Avoid 22.2 because of the known Layer 2 forwarding regression that Juniper documented in PR1707241.

How much should I budget for a 240-port EX3400 rollout in India?

For 5 switches plus a 3-year Juniper Care support renewal plus optics, I have seen quotes between INR 18 and 22 lakh ex-GST depending on the partner. Support alone is INR 1.6 to 2.1 lakh per switch per year. PoE+ adds about 12 to 18 percent to the price of the unit.

Is the EX3400 supported on the GeM portal for government tenders?

Yes. Juniper has presence on GeM. The catalogue ID changes between FY cycles, so check the live listing before you anchor a tender to a specific SKU. For MeitY-cleared deployments the JNP-branded optic must be on the BoQ explicitly.

Will the EX3400 interoperate with Cisco and HPE on LACP and STP?

Yes. LACP is standards-based and interoperates cleanly. STP works with RSTP. For MSTP you need matching region names, revision numbers, and VLAN-to-instance mappings on both sides, which I have done many times against a Cisco Catalyst access layer.

What is the difference between Virtual Chassis and EVPN-VXLAN for EX3400?

Virtual Chassis combines two or more EX3400 units into one logical switch for the access layer. It is operationally simple but has a blast radius if the master fails. EVPN-VXLAN keeps each switch as its own control plane but federates Layer 2 over an underlay, which is the path Juniper recommends for new builds. For 99 percent of campus access roles I still pick Virtual Chassis.