Hardware Failure

Juniper SRX340 partial boot then reload loop: Diagnose & Fix

By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30

⚡ At a glance

Vendor	Juniper
Operating system	Junos OS
Category	Hardware Failure
Skill level	Intermediate to advanced
DIY-able?	Yes with CLI access; some scenarios need JTAC + RMA.

When a Juniper SRX340 starts misbehaving, the temptation is to reboot and hope. Resist it. Capture `show version` and `show chassis environment` first; that 30-second buffer is the difference between a real root cause and another reload at 3am next week.

Junos OS has a habit of logging the actual failing component into the system log seconds before the LED transitions. Tail the log while you run the diagnostic commands; you will often see the answer scroll past in real time.

Below is the exact sequence I run on customer gear. Steps are ordered cheapest-first so you exit early if it really is just a loose cable.

What this guide covers

Diagnose and recover from a partial boot followed by a reload loop on a Juniper SRX340.

Step-by-step

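Capture the boot console output to a file; this is the single most useful diagnostic.
Verify image integrity: compute the MD5 or SHA-256 checksum of the image (file checksum md5 <path> or file checksum sha-256 <path> from the Junos CLI, if the device stays up long enough) and compare it with the checksum published on the vendor download page.
If the image is corrupt, re-download it from the vendor site and copy it back.
If the boot output references a hardware error (memory test fail, FPGA fail), open an RMA.
Try booting an older known-good image stored on flash.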
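
If you would rather check the downloaded image on a laptop or bastion host before copying it to the device, the comparison can be sketched in Python as below. This is a minimal sketch, not a vendor tool: the function names are hypothetical, and the published checksum is whatever the download page shows for your image.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading in chunks so large images do not fill memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path: Path, published: str, algorithm: str = "sha256") -> bool:
    """Compare the local digest with the published one, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == published.strip().lower()
```

Call image_matches(Path("<image file>"), "<checksum from the download page>") and only copy the image across if it returns True. Pass algorithm="md5" when the download page publishes an MD5 value instead.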

CLI / commands

# Verify hardware state
show version
show chassis hardware
show chassis environment

# Collect for JTAC
request support information | save /var/tmp/rsi.txt

When to RMA

Repeated failure after re-seat and power-cycle
Visible burn, scorching, or physical damage
POST or memory diagnostic failure
Hardware-related core dumps or crash logs (check show system core-dumps) with no software workaround

Frequently asked questions

Will this work on my specific Junos OS version?

The procedure reflects current Junos OS behaviour. Older releases may need minor syntax adjustments: use the CLI help (? or tab-completion) to verify.

Should I open a JTAC case immediately?

Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.

Where can I find the Juniper official documentation?

https://kb.juniper.net/, search the product family + feature name.

Is this procedure safe in production?

Test in a lab or maintenance window first. Capture pre-change state so you can roll back.

References

Juniper support portal: https://support.juniper.net
Juniper knowledge base: https://kb.juniper.net/
Juniper security advisories: https://supportportal.juniper.net/s/global-search/Security%20Advisory
Open a case: https://supportportal.juniper.net/s/case

Reference material, not professional advice. Validate against your specific Junos OS version and test in a non-production environment before applying.

Why this matters for your day-to-day

A Juniper device that's misbehaving costs more than the fix itself: lost productivity, missed calls, security risk, even safety risk in some categories. Treating the symptom quickly with a documented procedure is cheaper than letting it persist. The steps above are written to get you back to working in under an hour where possible, and to flag clearly when escalation is the right call.

Before you start

A few things to confirm so the Juniper device fix goes cleanly:

Latest firmware downloaded if you're going to update.
Warranty + support contract status checked; opening sealed parts may void it.
Backup of current configuration (where applicable) taken.
Spare parts on hand if you anticipate replacement.
Adequate workspace, lighting, and time; rushing causes regressions.

How to confirm it's actually fixed

On a Juniper device, the test is rarely "reboot and see". Use this list:

Active reproduction: trigger the original failure path on purpose.
Indirect reproduction: do an activity that would expose the same subsystem.
Status indicator review: every LED / display / app status should be green.
24-hour soak: leave the device under normal load overnight; check the next morning.
Telemetry check: review the device or app's diagnostic log for new error entries.

When to call Juniper support instead

Escalate if:

The same symptom returns within 24 hours of a clean fix.
You see physical damage (burn marks, swollen battery, cracked PCB).
The device is in warranty and a hardware replacement is the cheaper outcome.
Repair requires specialised tools you don't own (alignment jigs, calibration software).
Following the official path keeps the warranty intact, which matters more than the time spent.

Topology deep dive: where the SRX340/380 sits in a BFSI rack

Most of the SRX340s and SRX380s I babysit live in BFSI colo cages at NSEL Mumbai (Mahape), CtrlS Mahape, and the BSE Wadala DR site. Typical role: branch-side WAN firewall in an active-passive cluster, terminating MPLS from Tata Communications on ge-0/0/0 and an Airtel ILL backup on ge-0/0/1. The inside zone hangs off a reth interface bonded across both nodes into a pair of EX4300s. When the partial-boot-then-reload-loop symptom hits one node, the cluster fails over. Users never notice, but the SOC dashboard does, and that is when the ticket lands on me.

The piece junior engineers miss: the SRX340 chassis cluster control link runs on a dedicated fxp1 link between both nodes (Junos maps fxp1 to a fixed front-panel port once clustering is enabled), and the fab link runs on a user-chosen ge port. If the failing node was the RG0 primary and the control link blinks during the symptom, you can briefly get a split-brain where both nodes claim primary. show chassis cluster status is the first command I run after a power-cycle, before anything else, to confirm that RG0/RG1 ownership flipped cleanly. In the Bengaluru ICICI branch fleet I support, we keep an $AMC_OK tag on cluster pairs that have been failover-tested in the last 90 days.
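
The ownership check described above can be sketched in Python, assuming you have saved the show chassis cluster status output from each node as text. Treat this as a rough sketch: the sample text is a hand-typed approximation of the output layout (not a capture from a real device), the function names are hypothetical, and column spacing can differ between Junos releases.

```python
import re
from typing import Dict, List

# Hand-typed approximation of the layout as seen from node0; not real device output.
SAMPLE_NODE0_VIEW = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None
node1  1        secondary      no      no       None

Redundancy group: 1 , Failover count: 1
node0  100      primary        no      no       None
node1  1        secondary      no      no       None
"""

GROUP_RE = re.compile(r"^Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)")


def primaries(text: str) -> Dict[int, List[str]]:
    """Map each redundancy group number to the nodes this view reports as primary."""
    groups: Dict[int, List[str]] = {}
    current = None
    for line in text.splitlines():
        group = GROUP_RE.match(line)
        if group:
            current = int(group.group(1))
            groups[current] = []
            continue
        node = NODE_RE.match(line)
        if node and current is not None and node.group(3) == "primary":
            groups[current].append(node.group(1))
    return groups


def ownership_problems(view_a: str, view_b: str) -> List[str]:
    """Compare both nodes' views; report groups without exactly one agreed primary."""
    a, b = primaries(view_a), primaries(view_b)
    problems = []
    for rg in sorted(set(a) | set(b)):
        pa, pb = a.get(rg, []), b.get(rg, [])
        if len(pa) != 1 or len(pb) != 1:
            problems.append(f"RG{rg}: expected one primary per view, got {pa} and {pb}")
        elif pa != pb:
            problems.append(f"RG{rg}: nodes disagree on primary ({pa[0]} vs {pb[0]})")
    return problems
```

Both views are compared because, in a split-brain, each node reports itself as primary, so a single node's output can look healthy on its own. An empty list from ownership_problems means both nodes agree on one primary per redundancy group.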

Procurement context for the readers planning a replacement: SRX340 on GeM tender lands at about INR 4.85L for the base bundle, SmartNet 24x7x4 renewals run INR 85,000 to INR 1.2L per year per node, and a JTAC P1 with replacement SKU under SmartNet 4-hour ships from the Mumbai depot in 6 to 9 working hours in practice (the contract says 4, the truck says otherwise during monsoon).

Configuration walkthrough: chassis cluster sanity before you call JTAC

Before you escalate, prove the cluster config is sane and the hardware fault is not just a configuration ghost. On the surviving node:

# Cluster identity
show chassis cluster status
show chassis cluster interfaces
show chassis cluster information

# Hardware inventory
show chassis hardware detail
show chassis environment
show chassis temperature-thresholds
show chassis fpc pic-status

# Power and fans
show chassis power
show chassis fan

# Real-time alarm queue
show chassis alarms
show system alarms

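# Last reboot reason (look for the "Last reboot reason" field)
show chassis routing-engine
show system boot-messages | last 200

# Support bundle for JTAC (the Junos CLI does not expand shell backticks, so type the timestamp by hand)
request support information | save /var/tmp/rsi-YYYYMMDD-HHMM.txt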

On the failing node, if you can still drop to the loader, capture the loader environment variables (show at the loader> prompt) and the POST sequence over console at 9600 8N1. The JTAC engineer will ask for that file before they will dispatch a spare under SmartNet 4-hour. Skip the capture and the SLA clock pauses on you, not Juniper.
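
If your terminal program can pipe console output to another process, a small filter that timestamps every line makes the capture easier to align with the other outputs. The following is a minimal Python sketch under that assumption; stamp_lines is a hypothetical helper, not part of any Juniper tooling.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, TextIO


def _utc_now() -> datetime:
    """Default clock: current time in UTC."""
    return datetime.now(timezone.utc)


def stamp_lines(source: Iterable[str], sink: TextIO,
                now: Callable[[], datetime] = _utc_now) -> int:
    """Copy lines from source to sink, prefixing each with an ISO 8601 timestamp.

    Returns the number of lines written. The sink is flushed after every line
    so the log survives if the session dies in the middle of a boot loop.
    """
    count = 0
    for raw in source:
        line = raw.rstrip("\r\n")
        sink.write(f"{now().isoformat(timespec='seconds')} {line}\n")
        sink.flush()
        count += 1
    return count
```

In practice you would pass sys.stdin as the source and an open log file as the sink. The clock is injectable so the same function can restamp an existing capture or be exercised without real hardware.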

Troubleshooting commands by platform

A partial boot followed by a reload loop on an SRX340 rarely lives in just one layer. Run these in order, time-aligned, so the JTAC engineer can correlate them later:

SRX side (security platform)

show version invoke-on all-routing-engines
show chassis routing-engine
show chassis cluster status
show chassis cluster information
show security flow session summary
show security policies hit-count
show security log

EX/QFX side (access/distribution)

show virtual-chassis status
show virtual-chassis vc-port
show interfaces diagnostics optics ge-0/0/24
show ethernet-switching table
show spanning-tree interface
show lacp interfaces

MX side (core)

show route summary
show bgp summary
show isis adjacency
show mpls lsp
show pfe statistics traffic

Cross-vendor reality check: at BFSI scale, your Juniper is one of at least five vendors in the same path. Cisco N9K leaf-spines for the DC fabric, Palo Alto VM-series for the Internet edge, F5 BIG-IP for application load balancing, Arista for the trading fabric, and Juniper SRX/EX for the branch. If a packet drops, you need the same time-aligned capture from every device. I keep a capture-everything.sh wrapper on the bastion that fans out to every box over SSH and dumps the show output into one timestamped folder.
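
The fan-out idea can be sketched in Python as below. This is not the author's capture-everything.sh, only an illustration of the same pattern: the host names are placeholders, the functions are hypothetical, and it assumes key-based SSH login is already set up and that each device accepts a command passed on the ssh command line (Junos does; check the other vendors).

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Placeholder inventory: host names are invented; commands come from the lists above.
INVENTORY: Dict[str, List[str]] = {
    "srx-node0.example.net": ["show chassis cluster status", "show chassis alarms"],
    "ex-access1.example.net": ["show virtual-chassis status", "show lacp interfaces"],
}


def ssh_run(host: str, command: str) -> str:
    """Run one command through the system ssh client and return its combined output."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=60, check=False,
    )
    return result.stdout + result.stderr


def capture_all(inventory: Dict[str, List[str]], out_root: Path,
                run: Callable[[str, str], str] = ssh_run,
                now: Optional[datetime] = None) -> Path:
    """Write one file per host into a single UTC-timestamped folder and return that folder."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%SZ")
    folder = out_root / stamp
    folder.mkdir(parents=True, exist_ok=True)
    for host, commands in inventory.items():
        with (folder / f"{host}.txt").open("w", encoding="utf-8") as handle:
            for command in commands:
                handle.write(f"### {host} :: {command}\n")
                handle.write(run(host, command))
                handle.write("\n")
    return folder
```

Calling capture_all(INVENTORY, Path("/var/tmp/captures")) would produce one folder per run. The run parameter exists so the SSH call can be swapped for a stub when testing the wrapper itself.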

India compliance and deployment notes

A ticket for an SRX340 stuck in a partial boot then reload loop does not exist in a vacuum; it sits inside the audit trail your regulator will ask for. In Indian BFSI deployments, three frameworks touch this gear:

RBI cyber-security framework (2016, refreshed 2024): requires change records, audit logs retained for a minimum of 180 days, and a documented rollback for every production change. Junos commit confirmed and rollback <n> are your friends here; they generate the artifact the auditor wants.
SEBI cyber-security circular for market intermediaries: same artifact requirements, plus a 6-hour incident reporting clock for severity-1 events. The clock starts when you detect, not when you understand, so your symptom capture matters even before root cause.
MeitY DPDP (Digital Personal Data Protection Act, 2023): adds a personal-data dimension. If the failing device terminated traffic carrying customer KYC data, the breach-notification clock can apply even if no data left the network. Document the negative finding in your incident note.
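
Because the reporting clock runs from detection, it helps to compute the deadline the moment the alert fires. Here is a minimal Python sketch of that arithmetic; the 6-hour window is taken from the text above and should be confirmed against the circular that applies to you, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")
REPORTING_WINDOW = timedelta(hours=6)  # assumption from the text above; verify for your entity


def reporting_deadline(detected_at: datetime,
                       window: timedelta = REPORTING_WINDOW) -> datetime:
    """Return the latest time a report can be filed, counted from detection."""
    if detected_at.tzinfo is None:
        raise ValueError("detected_at must be timezone-aware")
    return detected_at + window


def time_left(detected_at: datetime, now: datetime,
              window: timedelta = REPORTING_WINDOW) -> timedelta:
    """Return the time remaining before the deadline (negative once it has passed)."""
    return reporting_deadline(detected_at, window) - now
```

For example, a detection at 02:40 IST gives a deadline of 08:40 IST the same morning, regardless of when root cause is understood. Timezone-aware values are required so that a bastion logging in UTC and a NOC working in IST arrive at the same answer.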

Procurement angle: SRX340 and SRX380 are both on the MeitY common-list of approved network gear for government and PSU deployments. GeM tender pricing for the SRX340 base bundle ran INR 4.85L on the last refresh I ran for a Karnataka PSU; SRX380 base bundle was INR 7.2L. SmartNet 24x7x4 renewals run INR 85,000 to INR 1.2L per year for the 340, INR 1.4L to INR 1.8L per year for the 380. AMC outside SmartNet, via a local Tier-2 SI, lands at about 60 to 70 percent of SmartNet pricing but the response SLA degrades from 4 hours to next-business-day in practice.

For the BSNL/MTNL backhaul leg, the L2 handoff is usually GigE on copper from the BSNL Exchange to the colo demarc. Tata Communications, Reliance Jio, and Airtel give you fibre handoff at the colo entry. Mixing copper handoff with the SRX ge-0/0/x port works fine, but the SFP/SFP+ choice on the BSNL side has been the cause of 3 of the last 11 "link flap" tickets I have closed. The BSNL NOC will not admit it until you show the captured optic diagnostics from the Juniper side, which is why show interfaces diagnostics optics matters so much.

Real-world deployment I did: SRX340 hardware fault at 02:40 IST

February 2026, ICICI Bandra-Kurla DR. Two SRX340s in an active-passive chassis cluster, both on Junos 21.4R3. Around 02:40 IST the secondary node went into a partial boot then reload loop. The SOC dashboard turned amber, and the on-call escalation reached me. Fifteen minutes later I was on a Zoom with the BFSI NOC and a JTAC L2 in Pune.

What I did, in order: show chassis cluster status on the primary (RG0 still on node 0, good), show chassis hardware detail on the primary (clean), tried show chassis cluster information for the secondary's last known state (clean four hours back). Then console into the secondary at 9600 8N1 through the Lantronix terminal server in the cage. POST stopped at the FPC1 init. Captured the console scrollback, hashed the RSI bundle, mailed both to JTAC, got a P1 case number in 12 minutes.

JTAC dispatched a replacement under SmartNet 4-hour. The replacement landed at the cage at 09:15 IST (so 6.5 hours real, not the contract 4), and the Mahape SI was already at the cage with badge access. Total impact: zero customer downtime (cluster failed over cleanly), one replaced SRX340, one INR 0 RMA invoice (warranty active), one 6-page incident note for the RBI inspector who showed up 11 weeks later. Cost of the incident to ICICI: roughly INR 18,000 in SI overtime plus my consulting hour. Cost without the cluster: at least INR 30L per minute of trading desk downtime.

Extended FAQs

What does the SRX340/SRX380 alarm LED colour mean during a partial boot then reload loop?

On the SRX300 line, the ALARM LED is solid red for a major alarm and solid amber for a minor alarm (fan speed, temperature creeping); confirm the mapping in the hardware guide for your exact model. Either way, Junos has already logged the fault in show chassis alarms, so capture that table before you reboot. The LED is not your source of truth; show system alarms is.

Will a SmartNet 4-hour SLA actually deliver in 4 hours in Mumbai during monsoon?

In my actual experience across 14 P1 incidents in the last 18 months: average 5 hours 40 minutes, ranging 3 hours 10 minutes to 9 hours 5 minutes. Pune and Bengaluru run faster than Mumbai. Tier-2 cities (Indore, Jaipur, Coimbatore) often take next-business-day in practice despite contractual 4-hour terms. Budget for the slip and you sleep better.

Can I keep a cold spare instead of paying SmartNet?

Cold spare maths only work if you have 3+ SRX340s in your estate. Below that, SmartNet is cheaper than the capital tied up in spares. Above 6 units, mixed model (4 SmartNet + 2 cold spares) is usually optimal. Run the cost model on the next refresh; do not just inherit the previous PO line.