BILLING_DISABLED on Cloud Run, what causes it and how to fix
| Service | Cloud Run |
|---|---|
| Cloud | Google Cloud (GCP) |
| Guide type | Procedure |
| Skill level | Intermediate to advanced |
| Time | 15 - 60 minutes depending on account size |
BILLING_DISABLED on Cloud Run, what causes it and how to fix on Cloud Run sits in the most-reported issues list across r/aws, Google Cloud Community, and StackOverflow. The recovery path is mostly known, the Google Cloud docs just bury it under three layers of conceptual material.
What billing_disabled on cloud run, what causes it and how to fix actually involves on Cloud Run
The BILLING_DISABLED error from AWS typically surfaces with the message "Cloud Billing must be enabled to deploy to Cloud Run". The error code itself is what you grep for in AWS re:Post or in AWS Support cases, not the human-readable line.
On Cloud Run, this most often comes from one of three causes: a missing or restrictive IAM permission, a service-level limit you have hit, or a transient AWS-side capacity issue. The fix path differs by which.
The rest of this page is the structured fix path. Start with diagnose, then remediation, then the automation options so you do not have to do this by hand the next time it surfaces. Verify and safety sections at the end are the discipline that keeps the fix from regressing in production.
Diagnose first, fix second
Run gcloud auth list and gcloud config list first. About one in five 'why does this not work' tickets are actually 'I am in the wrong account' or 'my session expired and the SDK is using stale credentials or ADC pointed at the wrong project'. The 5-second sanity check costs nothing and saves real time when the answer is that simple.
Start by capturing the exact Google Cloud error string. The Cloud Console truncates messages in popups, but Cloud Logging keeps the full record in protoPayload.status and protoPayload.methodName. The camelCase error code (e.g. AccessDenied, InsufficientInstanceCapacity, ConditionalCheckFailedException) is the thing you grep for in Google Cloud Community and StackOverflow, not the human-readable sentence next to it. Paste the code into the re:Post search bar in quotes and you will usually land on at least one Google-staff-verified answer within the first three results.
Look at the Cloud Audit Log event for the failed call, even if you are not enrolled in Cloud Logging Log Router. The basic 90-day event history works for most diagnostic purposes and lives in the console under Cloud Audit Logs > Event history. Filter by event name (the API action) and time range; the event JSON shows the exact user identity, source IP, request parameters, and error code.
Solution-focused remediation path
If the issue points at IAM, do not start by adding * to a policy. Use IAM Policy Troubleshooter and IAM Recommender against the failed action to see the minimum scope. Adding * is the fastest way to fail your next Google Cloud Architecture Framework security review, and it usually does not even fix the issue because the explicit deny is often coming from a higher level (Org Policy, RCP, or permission boundary), not a missing allow.
When the fix involves a destructive operation (delete VPC endpoint, swap Cloud KMS key, rotate root credential), do it during a maintenance window with at least one teammate watching. Several Cloud Run operations have implicit dependencies that only show up when traffic starts flowing again. Document the rollback path before you start, not during the incident.
If networking is suspect, use Network Intelligence Connectivity Tests. It is the only tool that simulates the full ENI-to-ENI path including firewall rules, hierarchical firewall policies, routes, and VPC Service Controls perimeters in one call. Manual trace is slower and misses transitive issues. The analyzer charges $0.10 per analysis - cheaper than a 30-minute call with your network team.
Automate this fix so you do not do it twice
Automate the fix with the gcloud CLI
The CLI one-liner pattern for Cloud Run operations is roughly: gcloud cloud describe RESOURCE --format=json --filter ... to read state, gcloud cloud update RESOURCE --quiet to apply the change, and gcloud cloud describe RESOURCE --format=json --filter ... again to verify. Wrap it in a shell script that sets a region variable at the top and exits on first error with set -euo pipefail so a partial run does not leave the account in a half-fixed state.
# Template - replace placeholders with your account specifics
export GOOGLE_CLOUD_REGION=us-central1
export GOOGLE_CLOUD_PROJECT=prod-project
gcloud cloud describe RESOURCE --format=json --filter 'Resources[?Status==`FAILED`].[Id,Reason]' --output table
gcloud cloud modify-... --resource-id RESOURCE_ID --no-dry-run
gcloud cloud describe RESOURCE_ID --query 'Status'Codify the fix in Terraform or Deployment Manager
When you reach for the console to fix the same issue twice, the third occurrence should be solved in IaC, not in the console. Terraform's terraform import and Deployment Manager or Terraform's resource importer let you adopt the existing resource into state without recreating it. Lock the corrected attribute behind a variable so the next operator does not have to rediscover the value. Add a moved {} block or Deployment Manager or Terraform resource refactor to keep the diff clean.
Add a Workflows or Cloud Tasks Automation runbook
For multi-step fixes that include a manual approval, use Workflows runbook. Document the fix as a runbook with workflows.executions.approve steps where a human signs off and workflows.steps.callApi steps where the runbook calls the Google Cloud API. Approvers are notified by SNS; the runbook execution shows up in Cloud Audit Logs with the approver's identity attached. This makes audit trails easy and stops production fixes from being one-person operations.
Common pitfalls and what to watch for
The most common pitfall when fixing this on Cloud Run is treating it as a one-off rather than as a recurring class of incident. The same misconfiguration tends to happen again after a deployment, a role rotation, or a region migration unless the fix is codified. Add a Org Policy or VPC Service Controls constraint, Organization Policy condition, or Org Policy or VPC Service Controls rule that prevents the same misconfig from being introduced again. Documentation alone does not survive turnover.
Another common trap: confirming the fix on a single resource and assuming the fleet is healthy. Loop your check across every account, region, and IAM principal that could exhibit the same symptom. If you cannot enumerate the affected scope without a script, you do not yet understand the scope.
Verify the fix worked
- Reproduce the original symptom path. If it still surfaces in any account or region or IAM role or service account, you have not fixed it.
- Watch for 24 to 48 hours. Cloud Monitoring metrics and Cloud Asset Inventory can mask issues with cached health for 6 to 12 hours, especially Cloud CDN and Cloud DNS.
- Run a smoke test under realistic load. Happy-path tests miss race conditions and IAM session-cache issues.
- Capture the new state in a runbook so the next person on call does not have to rediscover this. Push it to Confluence or your team wiki, not into Slack.
- If the fix involved a permission change, run IAM Access Analyzer one more time to confirm you did not open a separate hole while closing this one.
Safety, rollback, blast radius
- Test in a non-production account if your environment has Resource Manager and Organization Policy or Cloud Resource Manager (organizations, folders, projects). The cost of one sandbox account is cheaper than one rollback meeting.
- Export the existing config before changing it. Most Cloud Run resources support describe + export to JSON via CLI - capture that to source control before you start.
- Know your rollback path. Some Cloud Run operations are one-way (region migration, account-level feature opt-in, Cloud KMS key deletion past pending window). Confirm reversibility on the Google Cloud doc before you commit.
- Be aware of cross-service impact. IAM role or service account changes ripple to every service trusting that role. Cloud KMS key changes break every workload depending on that key. VPC endpoint changes affect every VPC consumer of that endpoint.
- Maintenance window discipline: if the change touches DNS, certificate rotation, or anything that emits TLS handshakes, line up a window with stakeholder notification, not a heroic mid-day swap.
FAQ
gcloud cloud describe-... first, then commit it before you change anything. A few operations are one-way (Cloud KMS key deletion past the pending window, region migration, account closure). Check the Google Cloud doc for the specific API before you commit.aws CLI or SDK calls - those almost always still work.References
- docs.cloud.google.com - official documentation for Cloud Run
- Google Cloud Community - community Q&A with Google-staff-verified answers
- Cloud Service Health Dashboard at health.cloud.google.com
- Quotas page in Cloud Console (IAM & Admin > Quotas) and Architecture Framework checklists
Related fixes
Related guides worth a look while you sort this one out:
- CONTAINER_EXITED_EARLY on Cloud Run, what causes it and how to fix
- CONTAINER_FAILED_TO_START on Cloud Run. what causes it and how to fix
- CONTAINER_HEALTHCHECK_FAILED on Cloud Run: what causes it and how to fix
- CONTAINER_SIGKILL on Cloud Run, what causes it and how to fix
- DEPLOY_PERMISSION_DENIED on Cloud Run. what causes it and how to fix
- DIRECT_VPC_SUBNET_TOO_SMALL on Cloud Run, what causes it and how to fix