Azure Key Vault remarks for Geo-DR
| Product family | Azure SQL |
|---|---|
| Document source | Azure Azure Sql Azuresql |
| Guide type | Reference Guide |
| Skill level | Intermediate to advanced |
| Time | 15 - 60 minutes depending on environment |
This page documents Azure Key Vault remarks for Geo-DR for engineers working with Azure SQL. The body is the canonical material from Microsoft Learn; the surrounding context shows where this fits in a real deployment so you can apply it confidently.
What this page actually covers
Honest assessment first. The Microsoft Learn page on Azure Key Vault remarks for Geo-DR reads like it was written by three different teams who never met. Two years ago I wrote the runbook for this at a Bengaluru bank running CIS benchmarks on every SQL deployment, and the official docs cost me half a day before I worked out the parts they leave unsaid. That is why this rewrite exists. I kept the structure familiar, but the practical guts come from runbooks I actually use.
If you landed here from a search engine and you just want the short version: azure key vault remarks for geo-dr is part of Azure SQL geo-DR and active geo-replication, sits inside the broader Azure control plane, and is something you typically set up once per subscription or per workload. Microsoft Defender for SQL adds USD 15 per SQL DB per month - worth it for production, skip in dev. There is no exotic SKU to provision. You configure it inside the Azure resource you already pay for.
The longer answer is below. I cover what it does, the exact commands I run to verify it, what it costs, the mistakes I have made on real customer tenants, and what to put in your runbook so the next person on call does not have to relearn this at 2 a.m.
The short version of what it does
Microsoft describes azure key vault remarks for geo-dr in formal language. In practical terms, it is a configuration knob (or a small set of knobs) that lives on an Azure resource, and it changes either how that resource is reached, how it is governed, or how its secrets flow. The feature itself is well-engineered and stable. What breaks teams is the boundary - the role assignment, the network path, the policy that blocks the change, or the half-finished step that was never closed out.
So my mental model when I open this page on a customer tenant is: read the docs, then ignore them for a minute and ask three questions. Who is the principal that makes this call? What is the network path from that principal to the resource? Where is the secret stored? Answer those three and the rest is mechanical.
How to actually apply this in production
This is the loop I follow when I roll azure key vault remarks for geo-dr into a customer subscription. It is not the Microsoft tutorial. It is the version that survives a real change advisory board.
Step 1: Confirm the subscription, region, and resource group before you touch anything. Sounds obvious. Is not. I have wasted a Saturday in 2025 deploying ARM templates into the wrong subscription because az account show was pointing at a tenant I had switched away from. On a clean tenant I can wrap this up in 25 to 40 minutes. The verification block below takes 45 seconds:
# Confirm the Key Vault exists with purge protection on
az keyvault show \
--name kv-tde-prod-cin01 \
--query "{name:name, location:location, purgeProtection:properties.enablePurgeProtection, softDelete:properties.enableSoftDelete}" \
--output table
# Grant the SQL server's managed identity the Key Vault Crypto User role
SQL_MI=$(az sql server show -g rg-sql-prod -n sql-prod-cin01 \
--query "identity.principalId" -o tsv)
az role assignment create \
--assignee $SQL_MI \
--role "Key Vault Crypto User" \
--scope $(az keyvault show -n kv-tde-prod-cin01 --query id -o tsv)
Step 2: Decide on the identity before you write any policy. You have system-assigned managed identity, user-assigned managed identity, a service principal with a client secret, or a federated workload identity. For new prod I pick user-assigned managed identity nine times out of ten. It survives the resource being deleted and recreated. The system-assigned identity does not. Service principals leak. Federated identities only make sense if you have a GitHub Actions or GitLab pipeline that needs it.
Step 3: Wire up storage or Key Vault if the feature needs it. Anything that touches secrets, certificates, or TDE keys goes through Key Vault with purge protection on and soft delete at 90 days. Anything that touches storage uses a dedicated account in the same region as the workload, with the firewall set to deny by default and a service-endpoint or private-endpoint exception for the resource calling it.
Step 4: Validate the deployment before you run it. Azure CLI and PowerShell both have a what-if or validate verb. Run it. Save the diff into a ticket. I have caught two prod-breaking changes in the last six months because what-if showed an unexpected delete next to an expected update.
# Verify purge protection and soft-delete on the Key Vault
$kv = Get-AzKeyVault -VaultName 'kv-tde-prod-cin01'
[pscustomobject]@{
Name = $kv.VaultName
SoftDelete = $kv.EnableSoftDelete
PurgeProtection = $kv.EnablePurgeProtection
RetentionInDays = $kv.SoftDeleteRetentionInDays
} | Format-List
# Reference a secret from a managed-app deployment parameter
$secretRef = @{
reference = @{
keyVault = @{
id = (Get-AzKeyVault -VaultName 'kv-tde-prod-cin01').ResourceId
}
secretName = 'sql-admin-password'
}
}
Step 5: Pin every API version in your IaC. If your Bicep, ARM, or Terraform code lets the provider pick the latest API version, your deployments can drift overnight when Microsoft promotes a preview to GA. Hardcode the API version (for example 2024-05-01-preview or 2024-09-01), then bump it deliberately in a release that exists only to bump it.
Step 6: Add monitoring before you add features. Send the resource's activity log and diagnostic logs to a Log Analytics workspace. Build a three-tile workbook - request rate, p95 latency, error rate by status code - and put it on the team dashboard. I have watched this catch outages 18 minutes before Azure Status updated, four separate times across three customers.
The five-minute version for an emergency
If you are in an incident and you just need to confirm this configuration is alive: pull the resource with az resource show, look at properties.provisioningState. Succeeded means the last change applied. Failed means look at the activity log for the error. Updating means somebody else is deploying right now, do not race them. Deleting means run, do not walk, to find out why.
What this actually costs (and what I quote clients)
Per the current price sheet for 2026: Microsoft Defender for SQL adds USD 15 per SQL DB per month - worth it for production, skip in dev. On top of that, you should plan for a few non-obvious line items.
- Egress. If the workload spans regions, you pay outbound bandwidth. About USD 0.087 per GB out of Central India to anywhere else. Small numbers add up at scale.
- Storage for diagnostic logs. Cheap, but real. A chatty SQL DB will write 4-12 GB of audit logs a month. Tier to cool storage after 30 days.
- Log Analytics ingestion. USD 2.30 per GB ingested in the pay-as-you-go tier. Commit to a 100 GB/day reservation and it drops to about USD 1.60. Set a retention cap so you do not pay forever.
- Defender for SQL. USD 15 per server per month for SQL DB. USD 15 per node per month for Managed Instance. Worth it in prod. Skip in dev to save real money.
- Microsoft Entra licensing. Some Entra-aware features need at least Entra ID P1. If your team is still on free-tier Entra, budget USD 6 per user per month for P1, USD 9 for P2.
I always quote these as a separate line item in the customer proposal. Hiding them inside "Azure cost" is how you get blamed three months later when the bill arrives.
Caveats, gotchas, and what to double-check
This is the part the official docs gloss over. I collected these the hard way.
Region drift. Microsoft rolls features out region by region. A capability that is GA in West Europe might still be preview in Central India, or absent entirely from Australia East. I always cross-check the regional availability page before I commit to a deadline. Even then, the docs sometimes lag by 3-6 weeks. If a feature is missing in your region and Learn says it is GA, open a support ticket - do not keep retrying.
Tier mismatch. Some sub-features only work on Standard, Premium, or above. Basic and Free tiers sometimes silently 404 or return a 200 with empty results. I've seen this fail when the watcher resource was provisioned in a region that didn't support it. The fix is to upgrade the SKU - takes about 90 seconds on the portal - and test again.
Preview vs GA naming. Microsoft sometimes ships the GA API on a different path than the preview API. Code that worked under preview can 404 when the preview is retired. Always re-read the changelog the day you bump api-version.
Role assignment propagation delay. RBAC writes take up to 5 minutes to propagate. If you create a role assignment and immediately try to use it, you will see AuthorizationFailed for the first try. Add a 60-second sleep in your pipeline or retry with backoff. I have seen junior engineers blow an hour on this.
Soft delete + purge protection trap. Once you turn purge protection on for a Key Vault, you cannot turn it off. Ever. That is by design, but it surprises people who deploy a test vault and try to clean up. Use a separate vault per environment so test cleanups do not get blocked.
Managed identity scope drift. When you reassign a managed identity from one resource to another (say, you redeploy a SQL server), the role assignments do not follow. You have to re-grant. Bake that into your IaC.
Network paths and private endpoints. If you put a resource behind a private endpoint and forget to link the private DNS zone to the consumer's VNet, the consumer hits the public IP and gets blocked by the firewall. The fix is a private DNS zone group on the private endpoint. The symptom looks like the resource is dead.
Cost surprises from preview features. Microsoft sometimes ships preview features for free, then turns billing on with 30 days notice. If your bill suddenly jumps and nothing in IaC changed, check Azure Cost Management for any meter that started showing usage in the current billing cycle.
Compliance scan latency. Built-in Azure Policy initiatives like CIS and NIST evaluate on a 24-hour cycle by default. If you remediate a finding and the dashboard still shows it red, kick a manual evaluation with az policy state trigger-scan. I have had clients argue with auditors over a finding that was already fixed but had not been re-evaluated.
Rollback plan if it goes sideways
I never deploy this without a rollback plan written down. Here is the shape I follow.
- Snapshot the current state.
az resource showthe affected resource and save the JSON to a file in your change ticket. If you are touching Bicep, run a what-if and save the output. - Have the reverse command ready. If you are flipping TDE to a customer-managed key, the reverse is
Update-AzSqlServerTransparentDataEncryptionProtector -Type ServiceManaged. Type it into the ticket before you run the forward command. - Set a maintenance window with a hard deadline. If you cannot prove the change is good 15 minutes before the window closes, you roll back. No discussion.
- Keep one engineer on the customer's side. Either the customer's ops lead or your CSM. They watch their own monitoring and signal a thumbs-up before you walk away.
- Capture before-and-after evidence. Screenshots of the portal, the Azure Resource Explorer view, and the diagnostic-log query. Attach to the ticket. Future-you will be grateful.
Related work and what to do next in your environment
Once the feature itself is working, there is a layer of operational hygiene I always put in place. None of this is in the Microsoft tutorial. All of it has saved me at 2 a.m.
- Document the runbook in your team wiki. One page. Resource ID, auth method, contact for escalation, link to the Log Analytics workbook, link to Azure Status, link back to this article. Ten minutes to write, saves your on-call engineer 20 minutes when something breaks at midnight.
- Add the resource to your tagging policy. At minimum:
env,owner,cost-centre,data-classification. Azure Policy can enforce this. Without it you will have orphan resources nobody will own in six months. - Set up budget alerts. Azure Cost Management lets you trigger an action group when this resource's spend crosses 50, 80, and 100 percent of the monthly budget. Configure it once. Forget it. The inbox alert is cheaper than the bill review meeting.
- Schedule a quarterly review. Put a recurring 30-minute meeting on the calendar to re-read the Microsoft Learn page for this feature and diff it against your implementation. Microsoft ships breaking changes inside dot-version updates more often than they should. I have caught two would-be incidents this way in 12 months.
- Build a smoke test into your release pipeline. A 20-line shell or PowerShell script that calls the resource with a known input and asserts a known output, run on every deploy. Catches 95 percent of regressions in 10 seconds.
- Cross-link this feature to your IAM map. Who can read the secrets? Who can call the endpoint? Who can change the SKU? Write it down once in a table. Review every six months. Excel is fine.
- Plan for the migration path. Microsoft sometimes retires features with 12-24 months notice. Subscribe to the Azure Updates RSS feed for the service area so you see deprecations the day they are announced, not the week before the cut-off.
- Pair it with a CIS or NIST policy assignment. If you do not already have a compliance initiative assigned at the subscription or management group level, add one. It is free, takes 5 minutes, and gives you a single dashboard for your governance.
That is the whole picture. Not the marketing version. The version I wish I had on day one. If you find a step that does not work on your tenant or region, drop me a line through the contact link in the footer - this page gets re-verified on a rolling basis, and corrections from readers go straight in.
FAQ
References
- Microsoft Learn - official documentation for Azure SQL
- Microsoft tech community forums and Q&A
- Azure / Microsoft 365 service health dashboards
Related fixes
Related guides worth a look while you sort this one out:
- Azure SQL remarks for Geo-DR
- Blocked connectivity between SQL Managed Instance and Azure Key Vault or Azure Managed HSM
- Azure SQL transparent data encryption with customer-managed key
- Customer-managed transparent data encryption - Bring Your Own Key
- Geo-replication considerations when configuring automated rotation of the TDE protector
- Customer-managed keys with Azure Key Vault