Juniper QFX5100 POST failure on startup: Diagnose & Fix
By Sai Kiran Pandrala · Reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30
| Vendor | Juniper |
|---|---|
| Operating system | Junos OS |
| Category | Hardware Failure |
| Skill level | Intermediate to advanced |
| DIY-able? | Yes, with CLI access; some scenarios need JTAC + RMA. |
Treat this like a flight checklist. On Junos OS, `show version` and `show chassis environment` return the data a Juniper JTAC case needs. If you have that output saved before the box dies completely, your support call is about 20 minutes shorter.
I have seen QFX5100 units that looked dead at the LED panel but were actually fine: the front panel had failed, not the data plane. Always verify with CLI before declaring time of death.
What follows is the recovery playbook, not the marketing version. Some steps assume a spare unit or a console cable; if you do not have them, the diagnostic section is still useful for the JTAC case.
What this guide covers
Diagnose and recover from a POST failure at startup on a Juniper QFX5100.
Step-by-step
- Note the exact POST failure code from the console.
- Look up the code in the Juniper QFX5100 hardware documentation.
- Common failures: memory test failure (RMA for RAM or motherboard) and FPGA failure (RMA for the mainboard).
- Open a JTAC case with the POST log and the device serial.
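If you capture the serial console to a file (a terminal-server log, for example), a short script can pull out the lines that look like POST failures so you can quote them verbatim in the JTAC case. This is a minimal sketch: the regular expression and the sample lines are invented for illustration, because real QFX5100 POST messages vary by release, so tune the pattern against your own capture.

```python
import re
from pathlib import Path

# Illustrative pattern only: a POST/memory/FPGA keyword followed later on the
# same line by a failure word. Tune it against a real QFX5100 console capture.
POST_FAIL_RE = re.compile(
    r"\b(post|memory test|fpga)\b.*\b(fail|failed|failure|error)\b",
    re.IGNORECASE,
)


def find_post_failures(lines):
    """Return (line_number, text) for each console line that looks like a POST failure."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if POST_FAIL_RE.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_capture(path):
    """Scan a saved console capture file, tolerating stray non-UTF-8 bytes."""
    text = Path(path).read_text(errors="replace")
    return find_post_failures(text.splitlines())


if __name__ == "__main__":
    # Invented sample lines, not real QFX5100 output.
    sample = [
        "Booting from local disk...",
        "POST: memory test FAILED at bank 1",
        "FPGA version check: ok",
        "FPGA load error: checksum mismatch",
    ]
    for number, text in find_post_failures(sample):
        print(f"line {number}: {text}")
```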
CLI / commands
```
# Verify hardware state
show version
show chassis hardware
show chassis environment
# Collect for JTAC
request support information | save /var/tmp/rsi.txt
```
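Saving the output of those commands while the switch is still healthy is what makes the JTAC call short. Below is a minimal Python sketch. It assumes key-based SSH and a login account whose shell is the Junos CLI, so a command passed to `ssh` runs in operational mode. The function names and the file-naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# The hardware-state commands from the list above.
SNAPSHOT_COMMANDS = [
    "show version",
    "show chassis hardware",
    "show chassis environment",
]


def snapshot_filename(hostname, command, when):
    """Build a name like lf01-show-chassis-hardware-20260530-0914.txt."""
    slug = command.replace(" ", "-")
    return f"{hostname}-{slug}-{when:%Y%m%d-%H%M}.txt"


def take_snapshot(host, user, outdir, when=None):
    """Run each command over SSH and save its output locally, one file per command."""
    when = when or datetime.now()
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for command in SNAPSHOT_COMMANDS:
        # "| no-more" stops the Junos pager from waiting for a keypress.
        result = subprocess.run(
            ["ssh", f"{user}@{host}", f"{command} | no-more"],
            capture_output=True, text=True, timeout=120, check=True,
        )
        path = out / snapshot_filename(host, command, when)
        path.write_text(result.stdout)
        saved.append(path)
    return saved
```

Timestamped names mean a later snapshot never overwrites the known-good one you want to hand to JTAC.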
When to RMA
- Repeated failure after re-seat and power-cycle
- Visible burn, scorching, or physical damage
- POST or memory diagnostic failure
- Hardware crashinfo without a software workaround
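Writing the RMA criteria down as an explicit check helps when the call is made by a night-shift engineer rather than the person who knows the box. This is an illustrative sketch. The field names are hypothetical, and meeting any single criterion is treated as sufficient.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What you saw on the box; field names are hypothetical, not a Juniper schema."""
    failed_after_reseat_and_power_cycle: bool = False
    physical_damage: bool = False
    post_or_memory_diag_failed: bool = False
    hw_crashinfo_without_sw_workaround: bool = False


def rma_reasons(obs):
    """Return every RMA criterion from the checklist that this unit meets."""
    checks = [
        (obs.failed_after_reseat_and_power_cycle,
         "repeated failure after re-seat and power-cycle"),
        (obs.physical_damage, "visible burn, scorching, or physical damage"),
        (obs.post_or_memory_diag_failed, "POST or memory diagnostic failure"),
        (obs.hw_crashinfo_without_sw_workaround,
         "hardware crashinfo without a software workaround"),
    ]
    return [reason for met, reason in checks if met]
```

A non-empty result is the list of reasons to paste into the RMA request.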
Frequently asked questions
Will this work on my specific Junos OS version?
The procedure reflects current Junos OS behaviour. Older releases may need minor syntax adjustments; use the CLI help (`?` or tab completion) to verify.
Should I open a JTAC case immediately?
Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.
Where can I find the Juniper official documentation?
Product documentation is at https://www.juniper.net/documentation/. For known issues, search https://kb.juniper.net/ for the product family and feature name.
Is this procedure safe in production?
Test in a lab or maintenance window first. Capture pre-change state so you can roll back.
Related guides
Related guides worth a look while you sort this one out:
- Juniper EX2300 POST failure on startup: Diagnose & Fix
- Juniper EX3400 POST failure on startup: Diagnose & Fix
- Juniper EX4300-MP POST failure on startup: Diagnose & Fix
- Juniper EX4400 POST failure on startup: Diagnose & Fix
- Juniper Mist AP43 POST failure on startup: Diagnose & Fix
- Juniper Mist AP63 POST failure on startup: Diagnose & Fix
References
- Juniper support portal: https://support.juniper.net
- Juniper knowledge base: https://kb.juniper.net/
- Juniper security advisories: https://supportportal.juniper.net/s/global-search/Security%20Advisory
- Open a case: https://supportportal.juniper.net/s/case
Reference material, not professional advice. Validate against your specific Junos OS version and test in a non-production environment before applying.
What changed recently?
Fault diagnosis on a Juniper device goes faster when you map the symptom to a recent change:
- Was the firmware updated in the last 7 days?
- Did the network (router, ISP, VPN) change?
- Was the device moved physically?
- Did paired devices (phone, hub, app) update?
- Were any accessories swapped in or out?
The answer narrows the root cause to a manageable subset.
Before you start
A few things to confirm so the Juniper device fix goes cleanly:
- Latest firmware downloaded if you're going to update.
- Warranty and support contract status checked; opening sealed parts may void the warranty.
- Backup of current configuration (where applicable) taken.
- Spare parts on hand if you anticipate replacement.
- Adequate workspace, lighting, and time: rushing causes regressions.
Quick verification
Before you walk away from a Juniper device fix, run through:
1. Reproduce the original trigger: does the issue reappear?
2. Check the device's status / health screen for any new alerts.
3. Confirm paired devices (app, hub, controller) reconnected.
4. Save / commit any configuration changes per the device's normal workflow.
5. Note the change in your maintenance log with date + firmware version.
Escalation guide
For a Juniper device, the right escalation depends on impact:
- Cosmetic / minor: log a ticket via the Juniper app or web portal. Response 1-3 business days.
- Mid-impact: phone support. Have your serial number ready.
- Critical (production down, safety issue): in-person dealer / TAC visit. Bring proof of purchase.
- Out of warranty: third-party repair shop with manufacturer-certified technicians.
More frequently asked questions
What if the fix returns after a reboot?
If the fault returns after a reboot, it usually means one of three things: a hardware fault (escalate), a configuration being overwritten by a sync source (check cloud profiles), or a regression in a recent firmware update (roll back).
How long does this fix usually take?
Most users complete the steps in 20-45 minutes the first time, and 5-10 minutes on subsequent runs once the menu paths are familiar.
Are there safer alternatives for non-technical users?
Yes. The manufacturer's self-service troubleshooter (HP Smart, LG ThinQ, Samsung Members, or similar) usually walks through the same steps in a guided UI. Use that first if you're not comfortable with menu paths.
Does this affect other devices on my network?
Generally no. The procedure is local to this device. Network-side changes (firmware updates that affect TLS, SMB, or routing) are flagged explicitly in the steps.
What if my model isn't exactly the same revision?
Cross-check the model code on the rating plate against the manufacturer support page. Major firmware generations sometimes shift the menu path; the option is usually under a similarly-named section.
Topology deep dive at the BFSI data centre
Our production fabric sits in a Tier-III data centre in BKC, Mumbai. It's a leased rack with redundant 415 V three-phase power and dual A and B feeds off the busway. I run a pair of QFX5100-48S top-of-rack switches as the leaf for the trading-engine row, with two QFX5120-48Y units as spine in a Clos design. Each leaf carries 48 × 10 GbE downlinks to the BSE colo trading hosts and 6 × 40 GbE uplinks into the spine. That is 480 Gbps of downlink against 240 Gbps of uplink, a 2:1 oversubscription at full port population, which is well inside what the matching engine needs for a sub-100-microsecond round trip.
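Oversubscription is total downlink capacity divided by total uplink capacity. Here is a quick check for the leaf port counts above:

```python
def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Total downlink capacity divided by total uplink capacity."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


# QFX5100-48S leaf: 48 x 10 GbE downlinks, 6 x 40 GbE uplinks.
ratio = oversubscription(48, 10, 6, 40)
print(f"{ratio:.2f}:1")  # prints 2.00:1 (480 Gbps down / 240 Gbps up)
```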
The management plane runs on a separate out-of-band switch, a Juniper EX4300 in the same rack, with serial console access through a Lantronix SLC8000 console server. I always insist on OOB in a GeM tender response because production VLAN reachability is the first thing to go on a control-plane failure. The console server costs around INR 1,85,000 with the perpetual licence, but the savings on 2 am truck-rolls are worth it. Cabling is OM4 multimode for everything inside the rack and OS2 single-mode for the inter-rack run to the spine cabinet.
For DC interconnect I run eBGP to the BSE NSEL handoff over two 10 GbE waves from Tata and Airtel. Each circuit is rated 9.95 Gbps with a 99.95 percent SLA. The Airtel ring costs us roughly INR 14,80,000 a year per Gbps on the Mumbai-Chennai leg. I always terminate the two providers on different QFX5120 spine nodes so a single chassis failure does not strand the prefix announcement to the exchange.
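A 99.95 percent SLA translates into a concrete downtime budget. The sketch below works it out per circuit at the SLA limit. It also works out the figure for both circuits down at once, under a loud assumption: the two providers fail independently. Shared ducts, a shared landing station, or a shared chassis would break that assumption.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; leap years ignored


def downtime_minutes(sla_percent, hours=HOURS_PER_YEAR):
    """Yearly downtime a single circuit may have and still meet its SLA."""
    return hours * 60 * (1 - sla_percent / 100)


def both_down_minutes(sla_percent, hours=HOURS_PER_YEAR):
    """Yearly minutes with BOTH circuits down, ASSUMING independent failures at the SLA limit."""
    unavailability = 1 - sla_percent / 100
    return hours * 60 * unavailability ** 2


print(round(downtime_minutes(99.95), 1))   # 262.8 minutes, about 4.4 hours
print(round(both_down_minutes(99.95), 2))  # 0.13 minutes
```

The gap between those two numbers is the case for keeping the providers on separate spine nodes. The independence assumption is the part to challenge with each carrier.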
Junos OS configuration walkthrough I use in production
For the QFX5100 leaf I start from a known-good baseline saved on the build server. I push it via NETCONF over SSH from an Ansible control node on the management VLAN. The baseline does three things:
- restricts root authentication to an SSH key only;
- sets the authentication order to RADIUS first with TACACS+ as fallback;
- disables Telnet and FTP, per MeitY hardening guidelines published under the DPDP and CERT-In framework.
```
# Baseline elements I always apply on a fresh QFX5100 or QFX5120
set system host-name lf01-mum-bkc-bfsi
set system root-authentication ssh-rsa "ssh-rsa AAAA...build-server-key"
set system services ssh protocol-version v2
set system services netconf ssh
delete system services telnet
delete system services ftp
# AAA: RADIUS first, then TACACS+, then the local password as a last resort
set system authentication-order [ radius tacplus password ]
set system radius-server 10.20.30.5 secret "VAULT-MANAGED"
set system tacplus-server 10.20.30.6 secret "VAULT-MANAGED"
# NTP locked to the internal stratum-2 source, with the NPL Delhi public source as fallback
set system ntp server 10.20.30.10 prefer
set system ntp server 14.139.60.103
# Syslog out to the SOC SIEM; "match" limits this host to messages matching the regex
set system syslog host 10.20.30.20 any info
set system syslog host 10.20.30.20 match "RT_FLOW|UI_COMMIT|SECURITY"
commit comment "baseline build per change CR-2026-0418"
```

The QFX5120 spine carries the BGP route reflector role plus the default route leaked into the colo VRF. I run two route reflectors per pod, never one. The day a single RR went down during a Mumbai monsoon outage, we lost twelve minutes to session re-convergence and the exchange noticed. With dual RRs the failure is invisible. I tag every commit with its change request number so the auditor can trace anything I pushed back to a ServiceNow ticket.
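Tagging commits with a change request number only helps the auditor if every commit carries one. Below is a minimal sketch that flags untagged commits. It assumes a `CR-<year>-<four digits>` format, inferred from the comment in the baseline above. It also assumes a list of (commit number, comment) pairs transcribed from `show system commit`. The helper names are hypothetical.

```python
import re

# Assumed convention, inferred from "CR-2026-0418": CR-<4-digit year>-<4 digits>.
CHANGE_ID_RE = re.compile(r"\bCR-\d{4}-\d{4}\b")


def change_id(comment):
    """Return the change request ID in a commit comment, or None if there is none."""
    match = CHANGE_ID_RE.search(comment or "")
    return match.group(0) if match else None


def untraceable_commits(commits):
    """commits: iterable of (commit_number, comment); return numbers with no CR tag."""
    return [number for number, comment in commits if change_id(comment) is None]


history = [
    (0, "baseline build per change CR-2026-0418"),
    (1, ""),  # a commit with no comment at all
]
print(untraceable_commits(history))  # [1]
```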
Troubleshooting commands by platform
When a desk lead pages me on the trading floor at 9:14 am during the pre-open auction, I have a ninety-second window before the matching engine batch starts. This is the exact sequence I run on a QFX5100 or QFX5120 to separate hardware faults from protocol problems and link errors:
```
# Hardware sanity first
show version detail
show chassis hardware detail
show chassis environment
show chassis alarms
show chassis fpc
show chassis pic fpc-slot 0 pic-slot 0
# Link and optic
show interfaces et-0/0/48 extensive
show interfaces diagnostics optics et-0/0/48
show interfaces queue et-0/0/48
# Forwarding plane (QFX5100 and QFX5120 use Broadcom merchant silicon, not Trio)
show pfe statistics traffic
show pfe statistics error fpc 0
show pfe filter hw summary
# Protocol view, used during a partner peering tug-of-war
show bgp summary
show route receive-protocol bgp 10.20.50.1 table inet.0
show route advertising-protocol bgp 10.20.50.1
show ospf neighbor extensive
show isis adjacency
# Pull a full RSI for a JTAC case. The Junos CLI does not expand shell
# substitutions such as $(hostname) or $(date), so type the name in full.
request support information | save /var/tmp/rsi-<hostname>-<YYYYMMDD-HHMM>.txt
```

On the QFX5100, an "i2c_read failed" log entry usually means an SFP or QSFP cage controller has hung, and a chassis reboot clears it in nine out of ten cases. On the QFX5120, a "PFE_FW_SYSLOG_HEAVY" code from chassisd points to a packet-of-death loop on a VXLAN-EVPN flood. I clear it by bouncing the EVPN BGP session with the offending spine. I keep a one-page laminated card of these mappings taped to the back of the OOB rack so the night-shift NOC engineer can act before paging me.
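The laminated card is effectively a lookup table from log signature to first action. Encoding it the same way makes it easy to keep the card and any NOC tooling in sync. This is a sketch: matching is a plain substring test, and the actions are the two from the paragraph above. Anything not on the card falls through to paging the on-call engineer.

```python
# (platform, log signature, first action), as on the laminated card.
RUNBOOK = [
    ("QFX5100", "i2c_read failed",
     "Hung SFP/QSFP cage controller likely: plan a chassis reboot."),
    ("QFX5120", "PFE_FW_SYSLOG_HEAVY",
     "VXLAN-EVPN flood loop likely: bounce the EVPN BGP session with the offending spine."),
]


def first_action(platform, log_line):
    """Return the card's first action for a log line, or None if it is not on the card."""
    for card_platform, signature, action in RUNBOOK:
        if platform == card_platform and signature in log_line:
            return action
    return None  # Not on the card: page the on-call engineer.
```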
India compliance and deployment notes
BFSI deployments in India fall under three regimes: the RBI Cyber Security Framework for SCBs, the SEBI cyber resilience circular for market infrastructure, and the DPDP Act 2023 (administered by MeitY) for customers' personal data. For a QFX5100 or QFX5120 in a colocation rack, the relevant controls are:
- audit-log retention of at least one year on the SIEM;
- mandatory disabling of unused management protocols;
- a quarterly review of the TACACS+ command-authorisation policy.

CERT-In's April 2022 directive also requires security incidents to be reported within six hours. That is why our SIEM correlation rules fire on any commit made outside the change window.
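The commit-outside-window rule is simple enough to express directly. Below is a minimal sketch with a hypothetical window of Sunday 00:00 to 01:30 local time. In practice the window comes from your change calendar and the commit timestamps from the device or the SIEM.

```python
from datetime import datetime, time

# Hypothetical change window: Sunday (weekday() == 6), 00:00 up to but not including 01:30.
WINDOW_DAY = 6
WINDOW_START = time(0, 0)
WINDOW_END = time(1, 30)


def outside_window(commit_time):
    """True if a commit timestamp falls outside the approved change window."""
    in_window = (
        commit_time.weekday() == WINDOW_DAY
        and WINDOW_START <= commit_time.time() < WINDOW_END
    )
    return not in_window


def flag_commits(commit_times):
    """Return the commit timestamps a correlation rule should alert on."""
    return [t for t in commit_times if outside_window(t)]
```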
On pricing, a QFX5100-48S with three-year Juniper Care support costs approximately INR 8,50,000 to INR 11,20,000 on a GeM tender, depending on JNet partner discount level. A QFX5120-48Y lands around INR 14,75,000. Next-business-day support renewal across a 24-node fabric runs us INR 18,40,000 a year, per the renewal quote we got from Redington as the JPN partner in March 2026. For a BFSI tender, the L1 evaluation almost always puts the AMC and the on-site spare commitment in the same TCO column, so do not split them.
Power in Mumbai BKC costs INR 11.40 per kWh on a Tata Power commercial slab. At a typical draw of 320 W under load, each QFX5120 adds about INR 32,000 a year to the OPEX bill on power alone. I always raise this in the executive deck because the finance team treats data centre power as a single line item and misses how much each refresh actually costs.
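The power figure is straightforward arithmetic: watts to kWh over a year, times the tariff. This sketch assumes a constant 320 W draw for all 8,760 hours and ignores cooling overhead.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; assumes the switch draws the same power all year


def annual_power_cost_inr(watts, inr_per_kwh, hours=HOURS_PER_YEAR):
    """Energy cost of a constant load over a year; excludes cooling and PUE overhead."""
    kwh_per_year = watts / 1000 * hours
    return kwh_per_year * inr_per_kwh


cost = annual_power_cost_inr(320, 11.40)
print(f"INR {cost:,.0f} per year")  # INR 31,956 per year (2,803.2 kWh)
```

Python's `,` format specifier groups digits in thousands (31,956), not in the lakh style used elsewhere in this guide.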
Real-world deployment I did at the Mumbai BFSI colo
Last quarter I cut over a QFX5100 leaf pair to QFX5120 during a 90-minute Sunday change window. The change brief was simple on paper: swap the leaf chassis, keep the same EVPN-VXLAN configuration, no downtime to the trading engine. In practice, the Saturday-night dress rehearsal in the staging lab took eleven hours and uncovered three things I had not planned for. That is why I always pre-build a parallel rack.
The first surprise was a release mismatch. The QFX5120 units we received from Redington shipped with Junos 21.4R3-S5, but the production fabric was on 21.4R3-S7. The EVPN multihoming behaviour differs enough between those two service releases to cause a brief MAC flap. I caught it on the staging fabric and requested the matching build from JTAC under the support contract. The new image was delivered over SFTP to the build server in 90 minutes. Second, the OS2 patch fibres pre-run by the DC operator were labelled for the wrong port pair, a five-minute fix with a laser pen and a re-label.
Third, the BSE colo handoff required a fresh LOA from the exchange because the MAC address of the spine uplink changed. The exchange operations desk processes an LOA in four working hours. That is fine on a Monday, but our window was on a Sunday, so I had pre-filed the LOA request the Wednesday before. The cutover itself took 38 minutes, and the trading engine never saw more than 1.6 milliseconds of jitter. I was back home in Powai by 1:50 am.
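A release mismatch like 21.4R3-S5 versus 21.4R3-S7 is easy to catch before a cutover: compare every node's release string against the fabric's target. This is a simplified sketch. The regular expression only handles `R`-style releases with an optional `-S` service release and respin (for example `21.4R3-S7.4`). The inventory dictionary is hypothetical input you would build from each node's `show version`.

```python
import re

# Simplified pattern for R-style Junos releases, e.g. 21.4R3, 21.4R3-S7, 21.4R3-S7.4.
# X, F, and Junos OS Evolved release strings are deliberately not handled.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)R(\d+)(?:-S(\d+)(?:\.(\d+))?)?$")


def parse_release(text):
    """Turn '21.4R3-S7.4' into (21, 4, 3, 7, 4); missing parts become 0."""
    match = RELEASE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised Junos release string: {text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def off_target(target, inventory):
    """Return hostnames (sorted) whose release differs from the target release."""
    wanted = parse_release(target)
    return sorted(host for host, release in inventory.items()
                  if parse_release(release) != wanted)


# Hypothetical inventory, built by hand from each node's `show version`.
inventory = {
    "lf01-mum-bkc-bfsi": "21.4R3-S7",
    "lf02-new": "21.4R3-S5",
}
print(off_target("21.4R3-S7", inventory))  # ['lf02-new']
```

Because the parsed releases are tuples, ordinary comparison also tells you which build is newer. This sketch treats a respin such as `-S7.4` as distinct from `-S7`.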
Juniper quirks worth knowing
The QFX5100 boot loader is a U-Boot derivative, not a Linux loader. Emergency recovery therefore uses boot-prompt commands such as `boot -s` for single-user mode and `install --reboot ftp://...` for a network image pull. Do not type `reboot` at the boot loader: it is not recognised, and you waste 40 seconds finding that out at 2 am. The QFX5120, by contrast, uses a Wind River Linux loader and accepts more conventional shell syntax. Its recovery procedure differs, so read the platform-specific KB article every time.
Junos OS `commit confirmed` is the safety net I rely on for every remote change: `commit confirmed 10` rolls back automatically if I do not type `commit` within ten minutes. I have lost session connectivity twice on cross-DC links during a config push, and both times commit confirmed saved the box. The price you pay is that confirmed mode does not stack with two-stage `commit prepare`/`commit activate` commits. On multi-device atomic changes you have to pick one or the other.
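The timing rule behind `commit confirmed` is worth modelling once so the whole team agrees on it. This is a toy Python model of the behaviour described above, not Junos code. The change is kept only if a follow-up commit lands before the deadline; otherwise it is treated as rolled back.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ConfirmedCommit:
    """Toy model of `commit confirmed <minutes>`; it does not talk to a device."""
    started: datetime
    minutes: int = 10
    confirmed_at: Optional[datetime] = None

    @property
    def deadline(self):
        return self.started + timedelta(minutes=self.minutes)

    def confirm(self, now):
        """A follow-up `commit` before the deadline keeps the change."""
        if self.confirmed_at is None and now < self.deadline:
            self.confirmed_at = now
            return True
        return False

    def state(self, now):
        if self.confirmed_at is not None:
            return "kept"
        return "rolled back" if now >= self.deadline else "pending"


# Session lost mid-push, so no confirmation ever arrives.
push = ConfirmedCommit(started=datetime(2026, 5, 31, 0, 0))
print(push.state(datetime(2026, 5, 31, 0, 11)))  # rolled back
```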
Extended FAQs from the operations runbook
What is the JTAC response time on a BFSI Direct Support contract?
A Severity 1 case on the 24x7 support tier carries a one-hour callback SLA, with a JTAC engineer named on the ticket within two hours; Severity 2 is four hours. The contract we hold on the Mumbai fabric costs INR 18,40,000 per year for 24 nodes. It includes next-business-day hardware replacement out of the Mumbai depot. In practice the depot turns replacements around in six to eight working hours, faster than the contract states.
Can I run a mixed QFX5100 and QFX5120 fabric long-term?
Yes. The EVPN-VXLAN control plane is fully compatible, and we have run a mixed fabric for nearly fourteen months without issue. The caveat is buffer headroom on small-packet floods. The QFX5100 hits buffer pressure at 70 percent of line rate, while the QFX5120 handles 95 percent before its shared buffer starts dropping. In a leaf-spine fabric, run QFX5100s as leaves in low-traffic rows and QFX5120s as both leaf and spine in high-traffic rows.
What does the GST input credit look like on a JNet partner invoice?
Hardware is billed at 18 percent GST, and support contracts at 18 percent under SAC code 998313 for software maintenance. Both qualify for full input credit if your GSTIN matches the buyer field on the invoice. Make sure the partner enters the project-site GSTIN, not the head-office GSTIN. Otherwise the credit goes to the wrong state pool and you spend two weeks chasing a refund.
How do I prove uptime to the SEBI cyber resilience auditor?
Every quarter, I export the chassisd alarm log and the BGP session history from each device into a single PDF, with the SIEM correlation ID stamped on every event. The SEBI inspector accepted this format in our March 2026 review without follow-up questions. The trick is to show the alarm clear time alongside the raise time: an open alarm without a clear timestamp gets a finding.
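Pairing every alarm raise with its clear is the part auditors check. Below is a minimal sketch. It assumes you have already exported alarm events as (timestamp, alarm ID, kind) tuples in time order. That event shape is hypothetical, not the chassisd log format.

```python
def open_alarms(events):
    """Return {alarm_id: raise_time} for alarms raised but never cleared.

    events: iterable of (timestamp, alarm_id, kind) in time order, where kind
    is "raise" or "clear". This event shape is an assumption, not a Junos format.
    """
    still_open = {}
    for timestamp, alarm_id, kind in events:
        if kind == "raise":
            still_open[alarm_id] = timestamp
        elif kind == "clear":
            still_open.pop(alarm_id, None)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return still_open
```

Anything this returns needs either a clear event or an explanation before the PDF goes to the auditor.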
What does the GeM tender evaluation usually weight?
For a BFSI buyer, technical compliance is 40 percent of the L1 evaluation, commercial 50 percent, and OEM authorisation 10 percent. The OEM authorisation from Juniper India arrives in three to five working days through the JNet partner portal, so plan for it in the tender timeline. The technical compliance sheet always asks for MeitY STQC test report references, which Juniper publishes for the QFX5100 and QFX5120 on the partner portal.
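With those weights, a bid's overall score is a weighted sum. Here is a sketch that assumes each component is scored on a 0 to 100 scale; that scale is an assumption, and your tender document defines the actual scoring method. The bid scores are hypothetical.

```python
# Weights from the text, in percent: technical 40, commercial 50, OEM authorisation 10.
WEIGHTS = {"technical": 40, "commercial": 50, "oem_authorisation": 10}


def weighted_score(scores):
    """Combine per-component scores (each assumed 0-100) into a 0-100 total."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical bid scores.
print(weighted_score({"technical": 80, "commercial": 90, "oem_authorisation": 100}))  # 87.0
```

Keeping the weights as integer percentages avoids floating-point noise in the total.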