CarrierGatewayMissing on Wavelength + Local Zones: what causes it and how to fix
| Service | AWS Wavelength and Local Zones |
|---|---|
| Cloud | Amazon Web Services (AWS) |
| Guide type | Procedure |
| Skill level | Intermediate to advanced |
| Time | 15 - 60 minutes depending on account size |
When CarrierGatewayMissing on Wavelength + Local Zones, what causes it and how to fix bites you on AWS Wavelength and Local Zones, the first instinct is to open a ticket. Most of the time you do not have to. The steps below are the ones AWS Support would walk you through on the call.
What carriergatewaymissing on wavelength + local zones, what causes it and how to fix actually involves on AWS Wavelength and Local Zones
The CarrierGatewayMissing error from AWS typically surfaces with the message "No carrier gateway found for VPC". The error code itself is what you grep for in AWS re:Post or in AWS Support cases, not the human-readable line.
On Wavelength + Local Zones, this most often comes from one of three causes: a missing or restrictive IAM permission, a service-level limit you have hit, or a transient AWS-side capacity issue. The fix path differs by which.
The rest of this page is the structured fix path. Start with diagnose, then remediation, then the automation options so you do not have to do this by hand the next time it surfaces. Verify and safety sections at the end are the discipline that keeps the fix from regressing in production.
What you'll see
Check CloudWatch Logs for the calling service. Lambda, ECS, EKS, Step Functions, API Gateway, and most managed services write detailed traces to CloudWatch Logs under predictable log group names. Use CloudWatch Logs Insights with fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50 to surface the most recent failures.
Run aws sts get-caller-identity first. About one in five 'why does this not work' tickets are actually 'I am in the wrong account' or 'my session expired and the SDK is using stale creds'. The 5-second sanity check costs nothing and saves real time when the answer is that simple.
Diff against last known good. The last config change you made is the cause about three quarters of the time, even when the change should not have mattered. Use AWS Config history (or your Terraform / CloudFormation drift report) to see the actual delta between the resource state when it worked and when it broke. The change you remember is often not the only change that happened.
Solution-focused remediation path
When the fix involves a destructive operation (delete VPC endpoint, swap KMS key, rotate root credential), do it during a maintenance window with at least one teammate watching. Several AWS Wavelength and Local Zones operations have implicit dependencies that only show up when traffic starts flowing again. Document the rollback path before you start, not during the incident.
If quotas are suspect, the Service Quotas console shows current usage and the active limit side by side. Request increases through Service Quotas, not through Support tickets - quota dashboard requests usually approve faster (often within minutes for soft limits) and they are auditable in CloudTrail. Set up Service Quotas + CloudWatch alarms at 80 percent usage so you get notified before you hit the wall.
If you cannot reproduce the failure consistently, the cause is probably a race condition or a session-cache issue. Run the call with --profile set to a fresh STS session, in a different region you control, with a single concurrent request. If it works there but fails in your normal setup, the difference is the bug.
Automate this fix so you do not do it twice
Wire the fix into EventBridge for self-healing
If the failure mode is recurring, automate the remediation instead of the diagnosis. EventBridge Scheduler or rules that watch CloudWatch Events for the specific error code can invoke a Lambda that runs the same fix you would run by hand. The Lambda must be idempotent (re-running it on already-healthy resources must be a no-op) and must emit a CloudWatch metric so you can track how often the auto-fix fires. A spike in auto-fix invocations is itself a signal worth alerting on.
# EventBridge rule pattern (JSON)
{ "source": ["aws.wavelength"], "detail-type": ["AWS API Call via CloudTrail"], "detail": { "errorCode": ["AccessDenied", "ThrottlingException"] }
}Codify the fix in Terraform or CloudFormation
When you reach for the console to fix the same issue twice, the third occurrence should be solved in IaC, not in the console. Terraform's terraform import and CloudFormation's resource importer let you adopt the existing resource into state without recreating it. Lock the corrected attribute behind a variable so the next operator does not have to rediscover the value. Add a moved {} block or CloudFormation resource refactor to keep the diff clean.
Add a CloudWatch alarm so you know next time
The cheapest way to never see the same incident twice is a CloudWatch alarm on the metric that would have warned you. For AWS Wavelength and Local Zones, the relevant metrics live under AWS/wavelength namespace or under custom metrics published by your Lambda or ECS task. Set thresholds based on observed normal range plus one or two standard deviations, not on round-number guesses. CloudWatch anomaly-detection alarms remove the threshold-guessing problem entirely for metrics with regular seasonality.
Common traps
The most common pitfall when fixing this on AWS Wavelength and Local Zones is treating it as a one-off rather than as a recurring class of incident. The same misconfiguration tends to happen again after a deployment, a role rotation, or a region migration unless the fix is codified. Add a CloudFormation hook, Service Control Policy condition, or AWS Config rule that prevents the same misconfig from being introduced again. Documentation alone does not survive turnover.
Another common trap: confirming the fix on a single resource and assuming the fleet is healthy. Loop your check across every account, region, and IAM principal that could exhibit the same symptom. If you cannot enumerate the affected scope without a script, you do not yet understand the scope.
The repair
- Reproduce the original symptom path. If it still surfaces in any account or region or IAM role, you have not fixed it.
- Watch for 24 to 48 hours. AWS metrics and policy systems can mask issues with cached health for 6 to 12 hours, especially CloudFront and Route 53.
- Run a smoke test under realistic load. Happy-path tests miss race conditions and IAM session-cache issues.
- Capture the new state in a runbook so the next person on call does not have to rediscover this. Push it to Confluence or your team wiki, not into Slack.
- If the fix involved a permission change, run IAM Access Analyzer one more time to confirm you did not open a separate hole while closing this one.
Safety, rollback, blast radius
- Test in a non-production account if your environment has Control Tower or AWS Organizations. The cost of one sandbox account is cheaper than one rollback meeting.
- Export the existing config before changing it. Most AWS Wavelength and Local Zones resources support describe + export to JSON via CLI - capture that to source control before you start.
- Know your rollback path. Some AWS Wavelength and Local Zones operations are one-way (region migration, account-level feature opt-in, KMS key deletion past pending window). Confirm reversibility on the AWS doc before you commit.
- Be aware of cross-service impact. IAM role changes ripple to every service trusting that role. KMS key changes break every workload depending on that key. VPC endpoint changes affect every VPC consumer of that endpoint.
- Maintenance window discipline: if the change touches DNS, certificate rotation, or anything that emits TLS handshakes, line up a window with stakeholder notification, not a heroic mid-day swap.
FAQ
aws wavelength describe-... first, then commit it before you change anything. A few operations are one-way (KMS key deletion past the pending window, region migration, account closure). Check the AWS doc for the specific API before you commit.aws CLI or SDK calls - those almost always still work.References
- docs.aws.amazon.com - official documentation for AWS Wavelength and Local Zones
- AWS re:Post (formerly forums) - community Q&A with AWS-staff-verified answers
- AWS Health Dashboard at health.aws.amazon.com
- AWS Service Quotas console and AWS Well-Architected Tool
Related fixes
Related guides worth a look while you sort this one out:
- LocalZoneInstanceUnsupported on Wavelength + Local Zones. what causes it and how to fix
- WavelengthZoneNotEnabled on Wavelength + Local Zones, what causes it and how to fix
- How to choose between Wavelength Local Zones and Outposts for low latency
- How to deploy EKS workloads on Local Zones
- How to fix Availability Zone is not opted-in on Local Zones
- How to fix instance type not supported in this Local Zone launch failure