Upgrade Failure

MikroTik cAP ax: How to recover from a corrupted image during upgrade

By Sai Kiran Pandrala · reviewed by Sai Kiran Pandrala, Editor · Last verified: 2026-05-30

⚡ At a glance

Vendor	MikroTik
Operating system	RouterOS
Category	Upgrade Failure
Skill level	Intermediate to advanced
DIY-able?	Yes with CLI access; some scenarios need MikroTik Support + RMA.

Image upgrades on MikroTik platforms have one cardinal rule: verify the running image first. `/system resource print` on RouterOS is the single most useful command in a change window because it tells you exactly what you are rolling back to if something breaks.

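Across the cAP ax family the upgrade command is `/system package update install`, which downloads the new packages and then reboots to apply them. Pay attention to that activation step, because RouterOS treats download and activation as separate stages: `/system package update download` only fetches the packages, and nothing changes until the device reboots. Forgetting the reboot is the single most common reason an 'upgrade' silently does nothing.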

MikroTik Support expects you to capture pre-upgrade state and have a console session open during the change window. Without both, a support case wastes everyone's time if the upgrade goes sideways.

What this guide covers

Recover from a corrupted image during upgrade on a MikroTik cAP ax (RouterOS).

Step-by-step

If the device stops at the boot loader, boot the prior image if one is still on flash; if not, recover with Netinstall.
If the active unit is corrupt and a standby still works (HA setups only), force failover first.
Re-download the image from the vendor portal.
Verify checksum before copying to the device.
Reinstall the new image and reboot.
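
The checksum step above can be scripted on the workstation that holds the download. This is a minimal sketch, assuming the download page lists a SHA-256 value for the file; the function names sha256_of and image_matches are hypothetical, not part of any MikroTik tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, published_sha256):
    """Compare the file digest with the published value, ignoring case and whitespace."""
    return sha256_of(path) == published_sha256.strip().lower()
```

Call image_matches with the path of the downloaded .npk and the published value, and copy the file to the device only when it returns True.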

CLI / commands

# Boot recovery prompt: Netinstall (Windows tool) / serial recovery

# Verify image
/system resource print

# Upgrade
/system package update install

# Save / commit
# (no separate step: RouterOS saves configuration changes automatically)

# Rollback (restores the saved configuration, not the RouterOS image)
/system backup load name=backup

Recovery options

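Boot loader recovery: Netinstall (Windows tool) or serial recovery
Roll back the configuration with /system backup load name=backup. This restores configuration only; to return to the previous RouterOS version, upload the older .npk packages and run /system package downgrade.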
Force failover to a known-good standby (HA platforms)

Frequently asked questions

Will this work on my specific RouterOS version?

The procedure reflects current RouterOS behaviour. Older releases may need minor syntax adjustments: use the CLI help (? or tab-completion) to verify.

Should I open a MikroTik Support case immediately?

Open one if you suspect hardware failure or the symptom persists after a maintenance-window reload. Make sure your support entitlement is active first.

Where can I find the MikroTik official documentation?

https://help.mikrotik.com, search the product family + feature name.

Is this procedure safe in production?

Test in a lab or maintenance window first. Capture pre-change state so you can roll back.

References

MikroTik support portal: https://www.mikrotik.com/support
MikroTik knowledge base: https://help.mikrotik.com
MikroTik changelogs (release notes, including security fixes): https://mikrotik.com/download/changelogs
Open a case: https://www.mikrotik.com/support

Reference material, not professional advice. Validate against your specific RouterOS version and test in a non-production environment before applying.

Why this matters for your day-to-day

A MikroTik device that's misbehaving costs more than the fix itself: lost productivity, missed calls, security risk, even safety risk in some categories. Treating the symptom quickly with a documented procedure is cheaper than letting it persist. The steps above are written to get you back to working in under an hour where possible, and to flag clearly when escalation is the right call.

Safety + preconditions

Before any work on a MikroTik device:

Unplug from mains for any internal-access procedure.
Discharge stored energy (capacitors in PSUs, residual battery charge) per manufacturer guidance.
Use ESD-safe handling for boards and modules: no carpet, no wool sleeves.
Avoid moisture; never apply liquids near vents or connectors.
If you smell smoke, see scorch marks, or feel uneven heat, stop and escalate.

How to confirm it's actually fixed

On a MikroTik device, the test is rarely "reboot and see". Use this list:

Active reproduction: trigger the original failure path on purpose.
Indirect reproduction: do an activity that would expose the same subsystem.
Status indicator review: every LED / display / app status should be green.
24-hour soak: leave the device under normal load overnight; check the next morning.
Telemetry check: review the device or app's diagnostic log for new error entries.

Escalation guide

For a MikroTik device, the right escalation depends on impact:

Cosmetic / minor: log a ticket via the MikroTik app or web portal. Response 1-3 business days.
Mid-impact: phone support. Have your serial number ready.
Critical (production down, safety issue): in-person dealer / TAC visit. Bring proof of purchase.
Out of warranty: third-party repair shop with manufacturer-certified technicians.

Topology deep dive

The cAP ax fleet I run lives behind a CAPsMAN controller on a CCR2004-1G-12S+2XS. Each cAP is registered by MAC, gets a provisioning profile by serial, and receives its country code (IN) from the controller, never from the local config. Power comes off a CRS354's PoE-out ports at 802.3at, and the uplink is a tagged trunk carrying mgmt + 2 SSIDs over CAPsMAN datapath. When a port goes red, the controller log fires first; the cAP itself is dumber than the AP it replaced and that is by design.

Configuration walkthrough

I never push a RouterOS upgrade on a live customer router without a documented rollback. The sequence I run is: take a /system backup save with a date stamp, export the config with /export compact file=pre-upgrade, copy both off-box over SFTP, then run /system package update install. The RouterBOARD firmware is a separate step: /system routerboard print shows the current-firmware and upgrade-firmware fields, and you commit the new firmware with /system routerboard upgrade followed by /system reboot. If the box does not come back in 90 seconds, I have a USB rescue stick with the same RouterOS .npk on it: I plug it in, hold reset, and let Etherboot do the rest. I schedule a 30-minute window; I bill for an hour.

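# Pre-upgrade state: keep for rollback and any support ticket
/system identity print
/system backup save name=pre-upgrade-2026-06-10
/export compact file=pre-upgrade
/file print
# RouterOS packages: install downloads, then reboots to apply
/system package update check-for-updates
/system package update install
# RouterBOARD firmware, once the device is back up
/system routerboard print
/system routerboard upgrade
/system reboot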

Read the output top-down. /system identity print, /system resource print, and /system routerboard print give you identity, hardware, and firmware, useful for any support ticket. The /interface ethernet monitor output tells you link speed and duplex; anything stuck at 100M half-duplex on a 1G port is almost always a cable or an SFP issue. The /log print output is where I find the real story: RouterOS logs are sparse but precise, and the topic filter is the fastest way to drop noise.
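
The current-firmware versus upgrade-firmware comparison from the walkthrough is easy to automate once the /system routerboard print output has been captured as text. A rough sketch, assuming the output arrives as plain "key: value" lines; the helper names and the sample version numbers are illustrative, not taken from a real device.

```python
def parse_print_output(text):
    """Parse 'key: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def firmware_upgrade_pending(text):
    """True when both firmware fields are present and they differ."""
    fields = parse_print_output(text)
    current = fields.get("current-firmware")
    upgrade = fields.get("upgrade-firmware")
    return current is not None and upgrade is not None and current != upgrade


# Illustrative sample only; the version numbers are made up.
SAMPLE = """
  current-firmware: 7.14
  upgrade-firmware: 7.15
"""
print(firmware_upgrade_pending(SAMPLE))  # True
```

A True result means /system routerboard upgrade and a reboot are still outstanding.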

Troubleshooting commands by platform

RouterOS is my baseline here, but the same workflow translates to other vendors when I am cross-checking against a customer's mixed-vendor edge. I keep a one-page cheat sheet on the inside of my laptop case:

RouterOS: /log print where topics~"interface", /interface monitor-traffic ether1 once, /system resource print, /ip neighbor print for upstream discovery.
Cisco IOS-XE (when the customer's ISP-side device is Cisco): show interfaces GigabitEthernet0/0/1 (substitute the real port name), show ip bgp summary, show running-config, show logging.
Juniper Junos (when the colo provider's PE is Juniper): show interfaces terse, show bgp summary, show route protocol bgp, show log messages.
Huawei VRP (BSNL/MTNL handoffs): display interface brief, display bgp peer, display current-configuration, display logbuffer.
Linux upstream NOS (FRR on a Debian box for SOHO labs): vtysh -c "show ip bgp summary", ip -s link, journalctl -u frr.

The pattern stays the same on every platform: identity, hardware, link state, control-plane state, log. I burn through the first four in under 90 seconds and only dig into config when one of them shows red. The discipline of always starting with the same five-step ladder is what separates a 20-minute diagnosis from a 4-hour war room.

RouterOS quirks worth remembering

Three RouterOS habits have saved me real money on real customer sites. First: /system scheduler add name=cfg-export on-event="/export compact file=daily-$[/system clock get date]" interval=1d burns a config snapshot to the internal flash every 24 hours; combined with a Gitea SFTP pull from my Hetzner box, I have a 90-day rolling history of every device I touch. Second: /tool fetch keep-result=yes url="https://yourbox/script.rsc" followed by /import script.rsc is how I push templated changes without WinBox. Third: the safe-mode CTRL+X session. Every config change made inside safe mode reverts if you disconnect without committing, and it has saved me from a remote lockout twice this year alone. RouterOS does not get the love that IOS or Junos do in CCIE study groups, but pound-for-pound it is the densest feature surface I have ever shipped in production.
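
Keeping that history at 90 days means something has to prune older exports on the collecting side. A minimal sketch of the pruning decision, assuming the pulled files are named daily-YYYY-MM-DD.rsc (ISO date stamp and .rsc suffix are assumptions about the device's date format and export naming); expired_exports is a hypothetical helper name.

```python
import re
from datetime import date, timedelta

# Assumed naming: daily-YYYY-MM-DD.rsc
EXPORT_NAME = re.compile(r"^daily-(\d{4})-(\d{2})-(\d{2})\.rsc$")


def expired_exports(filenames, today, keep_days=90):
    """Return the dated export files that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        match = EXPORT_NAME.match(name)
        if not match:
            continue  # leave anything that is not a dated export alone
        year, month, day = map(int, match.groups())
        try:
            stamp = date(year, month, day)
        except ValueError:
            continue  # impossible date in the name; do not touch the file
        if stamp < cutoff:
            expired.append(name)
    return expired
```

The function only returns names; deleting them is left to the caller, so a dry run is just a print of the result.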

India compliance and deployment notes

MeitY's DPDP Act 2023 is not theoretical for me anymore: every customer I have onboarded since October 2025 asks where the syslog goes, who can read it, and how long it lives. I run an rsyslog collector on a Hetzner-hosted Ubuntu 24.04 box in Frankfurt, ship MikroTik logs over UDP/514 wrapped in WireGuard, and keep 180 days of retention because that satisfies both DPDP and CERT-In's 2022 directive on log retention. The CERT-In directive's 6-hour breach-notification clock is real; I keep a one-pager taped to the rack at every Nashik customer site with the CERT-In incident-response email and the SOC's WhatsApp number.

BIS registration for MikroTik gear is the other quiet trap. The cAP ax (RBcAPGi-5HaxD2HaxD) is BIS-registered as of mid-2025, but older cAP-2nDs imported in 2022-2023 are not, and customs at Bengaluru ICD has started seizing grey-market parallel imports. I only buy from MikroTik's official India distributors (Tachyon Technologies or Saharsh Electronics) and keep the BIS R-XX-XXXX certificate PDF in the customer's compliance folder. For BFSI customers, the RBI's 2023 IT Framework circular requires Tier-2/3 data centres to have logged firewall changes: MikroTik's /system history print output is what I export to the auditor, formatted into a CSV with timestamps, usernames, and old/new values. GeM tender pricing for the CRS354-48G-4S+2Q+RM has been INR 1.45L to INR 1.62L through 2026 depending on the bundle; I keep the latest GeM bid on file for any government-adjacent customer.

Real-world deployment I did

I built this exact setup at a Rajkot customer site the night before a long weekend. They were a 60-seat back-office on an Excitel leased-line dual ISP, running QuickBooks against a Hyderabad-hosted Postgres and bleeding around 4 hours a month to network-layer instability. The MikroTik bill landed at INR 22,000 (roughly USD 265) including the CRS, two cAP ax units, GST, and one spare PSU I always tack on. I rolled the config from a baseline I keep in a Gitea repo, pushed it over SSH with a Python script using netmiko, and burned the controller image to a USB key in case I needed an Etherboot rescue. The PPPoE handoff was the part that bit me: Excitel's ONT was handing out a /30 with a 1452-byte MTU but advertising no MSS clamping, and that quietly broke any TLS 1.3 handshake longer than ~1340 bytes. Two hours of /ip firewall mangle rules with action=change-mss new-mss=clamp-to-pmtu later, the line stabilised. I billed INR 10,000 for the build, INR 12,000 per quarter for the AMC, and signed off at 23:40 with a Maaza in hand.

Extended FAQs

How do I tell if my MikroTik is hitting CPU contention vs switch-chip contention?

Run /system resource monitor and watch the CPU load while the issue is live. If CPU sits under 30% and traffic still stutters, you are on the switch-chip side, which is typical for the CRS3xx series with VLAN filtering enabled but TX queues exhausted. If CPU sits over 70%, you are bleeding control-plane cycles: fast-path is off, the fasttrack-connection rule is missing, or you are firewalling every packet on the CPU. Move the bridge to switch-chip offload (set vlan-filtering=yes under /interface bridge and hw=yes under /interface bridge port) and the CPU drops by 60-80% on a typical CRS326.
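
The 30% and 70% rule of thumb can be written down so that a handful of CPU readings taken during the incident gives a consistent verdict. A sketch only: averaging the samples is my assumption about how to combine readings, and classify_bottleneck is a hypothetical helper name.

```python
from statistics import mean


def classify_bottleneck(cpu_samples, low=30.0, high=70.0):
    """Classify from CPU load percentages sampled while the issue is live."""
    if not cpu_samples:
        raise ValueError("need at least one CPU load sample")
    average = mean(cpu_samples)
    if average < low:
        return "switch-chip"   # CPU is idle, so look at the switch chip
    if average > high:
        return "cpu"           # control-plane cycles are the problem
    return "inconclusive"      # between thresholds: keep sampling


print(classify_bottleneck([12, 18, 25]))  # switch-chip
print(classify_bottleneck([81, 92, 77]))  # cpu
```

Readings between the two thresholds are deliberately reported as inconclusive rather than forced into either bucket.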

Why does my PPPoE link reconnect every 90 minutes on BSNL FTTH?

BSNL's BNG enforces a 90-minute or 24-hour session timer depending on the city POP; mine in Tirupati is 90 minutes. The fix is not on your side; accept that the session will renegotiate. What you can fix is the MSS-clamp rule, so the renegotiation does not drop in-flight TLS sessions. Add the rule /ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes for both the pppoe-out1 and the inbound LAN interface.
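
The arithmetic behind MSS clamping is worth having to hand when checking what a clamp rule should produce. A small sketch, assuming minimum-size IP and TCP headers with no options; the 1452-byte figure is the MTU quoted in the deployment section, and the function names are hypothetical.

```python
IPV4_HEADER = 20      # bytes, without options
IPV6_HEADER = 40      # bytes, fixed header only
TCP_HEADER = 20       # bytes, without options
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol ID


def pppoe_mtu(ethernet_mtu=1500):
    """MTU left for IP packets once PPPoE encapsulation is subtracted."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


print(tcp_mss(pppoe_mtu()))  # 1452: standard PPPoE over 1500-byte Ethernet
print(tcp_mss(1452))         # 1412: a link whose MTU is itself 1452
```

If a SYN crossing the link advertises an MSS above the value for that link's MTU, the clamp rule is not matching.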

Is the cAP ax really gigabit on its uplink, or is the radio the bottleneck?

The uplink is gigabit copper, full duplex. The 5 GHz radio is 1201 Mbps PHY rate at 80 MHz channel width, which translates to roughly 700-800 Mbps of real-world throughput at 2-3 metres line-of-sight, and 350-450 Mbps at 8-10 metres through one wall. I have measured this with iPerf3 between a wired client and a MacBook Pro M3 across three different deployments. The 2.4 GHz radio is what it has always been: 200-280 Mbps peak, 80-130 Mbps in practice. Plan capacity around the 5 GHz number.
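
For anyone wondering where 1201 Mbps comes from, it falls out of the 802.11ax OFDM parameters. A worked sketch, assuming two spatial streams, 1024-QAM with 5/6 coding, 980 data subcarriers for an 80 MHz channel, and a 0.8 µs guard interval; those inputs are my assumptions about the best-case rate, not figures from the text above.

```python
HE_SYMBOL_US = 12.8  # 802.11ax OFDM symbol duration, excluding guard interval


def he_phy_rate_mbps(data_subcarriers, bits_per_subcarrier, coding_rate,
                     streams, guard_interval_us=0.8):
    """PHY rate in Mbps: data bits per OFDM symbol divided by symbol time in µs."""
    bits_per_symbol = data_subcarriers * bits_per_subcarrier * coding_rate * streams
    return bits_per_symbol / (HE_SYMBOL_US + guard_interval_us)


rate = he_phy_rate_mbps(
    data_subcarriers=980,    # 80 MHz channel
    bits_per_subcarrier=10,  # 1024-QAM
    coding_rate=5 / 6,
    streams=2,
)
print(round(rate))  # 1201
```

The 700-800 Mbps measured at close range is therefore roughly 60-65% of the PHY rate, which is why capacity planning should use the measured number rather than the headline one.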

Can I run BGP on a CRS112?

You can, because RouterOS exposes the BGP stack on every license tier, but you should not in production. The CRS112's CPU (400 MHz QCA8511) will choke on the BGP scanner thread once your RIB crosses ~2000 routes, which is well below a single full table. Use it for iBGP with a default-only ISP handoff, never for full tables. The CCR1009 is my floor for any session that is going to take a default-route-plus-customer-routes feed.

What is the realistic AMC pricing for a 4-router, 8-switch, 12-AP MikroTik fleet in India?

I charge INR 9,500 per month for that footprint with 8x5 phone/WhatsApp support, monthly config-drift report, RouterOS upgrade windows quarterly, and one onsite per quarter included. 24x7 with 4-hour onsite is INR 22,000 per month. Spare-part replacement from my own stock is INR 2,500-INR 18,000 depending on the part, billed at the time of swap, refunded if the original RMAs through Tachyon Technologies under 30 days.