Add the source transformation
| Product family | Azure Data Factory |
|---|---|
| Document source | Azure Data Factory |
| Guide type | Procedure Guide |
| Skill level | Intermediate to advanced |
| Time | 15 - 60 minutes depending on environment |
I keep this page within reach whenever a customer asks me about adding the source transformation in Azure Data Factory. Most teams I work with do not need a marketing tour. They need someone who has already burned a weekend on the same problem and can tell them what the docs leave out. In early March 2026 I sat with Priya, the platform architect at a SaaS team in HSR Layout, for ninety minutes pulling apart this exact topic, and I rewrote my notes afterwards into the article you are reading now.
This page is in my own voice. It mirrors the official Microsoft Learn reference for Azure Data Factory but adds the things I had to learn the hard way: what breaks in production, what the portal will not warn you about, what it costs in INR on the India price sheet, and the exact commands I now keep in my runbook. If you landed here from a Google search at 2 AM with a Sev-2 ticket open, jump to Rollback first and come back to the theory after the fire is out.
Quick context on me. I run a small consulting practice out of Delhi. Most of my Azure work is for mid-sized Indian customers - tenants between 50 and 800 users, three to twelve subscriptions, mostly Central India and South India regions, with a handful of UK South or East US workloads where data-residency rules allow it. The INR figures below were pulled from the Microsoft India price sheet on 31 May 2026. If you are billing in USD or EUR, the relative cost ratios still hold; only the currency conversion shifts.
What this actually means, in plain English
The Microsoft Learn page on adding the source transformation is technically accurate, but it is written for an audience that already knows the surrounding architecture. Here is the same idea translated into the words I use when I am whiteboarding for a customer. The source transformation sits at the boundary between the data plane (what your workload actually does at runtime) and the control plane (who in your tenant is allowed to configure it). When you get this part wrong, the symptom is rarely a clean error message. It is usually a silent half-failure that shows up later, when an auditor or a Sev-1 incident forces you to look hard at the configuration.
Two real symptoms I have seen this calendar year. One: a customer in Chennai thought their Azure Data Factory configuration was correct because the portal showed a green tick, only to find at restore time - or in their case at signing-verification time - that one identity scope had drifted. Two: a Bengaluru fintech kept ignoring a warning banner in the resource overview for six weeks; when the underlying preview expired, the workload broke during a Friday evening change window, the worst possible time to debug it. Both bugs are silly in hindsight. Both cost real money and real on-call hours.
The takeaway: the source transformation is not a setting you flick once and forget. It is part of a small set of Azure Data Factory controls that should be reviewed at least quarterly, and definitely after a tenant migration, a subscription move, a regional expansion, a compliance audit, or any personnel change on the cloud team.
Background you need before reading the official text
The Source transformation is the first node of any Mapping Data Flow. It reads from a dataset (which itself references a linked service) and produces a stream of rows for downstream transformations. The configuration choices that matter most: schema drift handling, column projection, and source partitioning.
For most workloads I enable schema drift, set explicit column projection only for the columns I care about, and let Spark pick partitioning unless the source is small enough that one partition is faster. Over-partitioning a small source is a common performance trap.
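To make those choices concrete, here is a minimal sketch of what a source with schema drift enabled and an explicit projection looks like in data flow script, the text form behind the Mapping Data Flow designer. The column names, the stream name `OrdersSource`, and the helper name `write_source_script` are placeholders I made up for illustration. Partitioning is left at the default, so no partitioning clause appears. The shell wrapper only writes the snippet to a local file so it can be pasted into a change ticket or compared against the script view in the designer.

```shell
# Write an illustrative data flow script for a source transformation to a file.
# Column names and the stream name OrdersSource are hypothetical.
write_source_script() {
  # $1 = output file path
  cat > "$1" <<'EOF'
source(output(
        orderId as string,
        customerId as string,
        amount as double
    ),
    allowSchemaDrift: true,
    validateSchema: false) ~> OrdersSource
EOF
}
```

Call it as `write_source_script source-snippet.txt`. The `output(...)` block is the explicit projection; `allowSchemaDrift: true` lets columns outside that projection flow through.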
My step-by-step walkthrough
What follows is the exact sequence I run on a clean environment. I keep it portal-first because most engineers prefer that path on the first read; the CLI equivalent comes after.
- Sign in to the Azure portal at portal.azure.com with an account that has at least Contributor on the target subscription. If you only have Reader, the portal will show a misleading "could not load" error rather than a clear permission error.
- Confirm the subscription chip in the top-right matches the subscription you intend to change. This is the single most common cause of "I changed the wrong resource" tickets I see.
- Navigate to the resource. Type the literal resource type into the global search ("Container registries", "Data factories", "CycleCloud", etc.). Bookmark it if you will revisit; the nav tree is too deep to walk every time.
- Open the property pane relevant to the source transformation. The pane name in the May 2026 portal layout usually mirrors the heading on Microsoft Learn. If the left nav does not match, search the literal phrase in the portal's command bar.
- Capture the current state before changing anything. Screenshot, paste into the change ticket, write one sentence describing the current setting in plain English. Cheapest rollback insurance you can buy.
- Apply the change. Most Azure Data Factory property changes show a confirmation modal with an impact summary. Read the modal; Microsoft has put real effort into making these accurate over the last year.
- Wait for the Azure Resource Manager confirmation. The portal shows a green tick once ARM accepts the change. ARM acceptance is not the same as data-plane propagation - some changes take up to fifteen minutes to be visible on every API surface.
- Verify in a second surface. If you changed it in the portal, confirm via az CLI or PowerShell. If you changed it via CLI, confirm in the portal. This catches the rare cases where the change failed silently on one plane.
The equivalent Azure CLI flow uses the resource-type-specific command groups. A representative sequence you can adapt:
# Sign in to the tenant and select the subscription you intend to inspect
az login --tenant your-tenant.onmicrosoft.com
az account set --subscription "Prod-Subscription"
# Show the current state of the resource (a data factory here; swap in your own type and names)
az resource show \
  --resource-group "rg-prod-southindia-01" \
  --name "your-resource-name" \
  --resource-type "Microsoft.DataFactory/factories" \
  --query "{ name: name, location: location, props: properties }" \
  --output jsonc
Replace the resource type and names with your own. If you prefer PowerShell, the equivalent Az module cmdlets mirror the CLI verbs - Get-AzResource, Set-AzResource, and the resource-type-specific ones like Get-AzContainerRegistry or Get-AzDataFactoryV2.
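The "capture the current state" step from the walkthrough can be scripted with the same `az resource show` call. This is a sketch under my own naming: `snapshot_resource`, `compare_snapshots`, and the `snapshot-<name>-<timestamp>.json` file pattern are hypothetical helpers, not part of the Azure CLI.

```shell
# Save the current state of a resource to a timestamped JSON file
# and print the file name. Arguments are placeholders:
# $1 = resource group, $2 = resource name, $3 = resource type
snapshot_resource() {
  snap_file="snapshot-$2-$(date +%Y%m%dT%H%M%S).json"
  az resource show \
    --resource-group "$1" \
    --name "$2" \
    --resource-type "$3" \
    --output json > "$snap_file" || return 1
  echo "$snap_file"
}

# Print the differences between two snapshots.
# Exit status 0 means the two files are identical.
compare_snapshots() {
  diff -u "$1" "$2"
}
```

Typical use: `before=$(snapshot_resource rg-prod-southindia-01 your-resource-name Microsoft.DataFactory/factories)`, make the change, take a second snapshot, then run `compare_snapshots "$before" "$after"` and paste the diff into the change ticket.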
What this costs in INR (and USD for reference)
I keep a small spreadsheet of Azure Data Factory costs that I refresh whenever Microsoft updates the India price sheet. Here are the numbers I am working with on 31 May 2026, rounded so they are easy to remember:
| Component | Indicative INR cost | Indicative USD cost | Notes |
|---|---|---|---|
| Container Registry - Basic SKU | ≈₹14 per day | ≈$0.167 per day | 10 GB storage included, 2 webhooks |
| Container Registry - Standard SKU | ≈₹56 per day | ≈$0.667 per day | 100 GB storage, 10 webhooks |
| Container Registry - Premium SKU | ≈₹140 per day | ≈$1.667 per day | 500 GB, geo-replication, private endpoints |
| Azure Data Factory - pipeline orchestration | ₹83 per 1000 activity runs | $1.00 per 1000 activity runs | External + internal activity runs |
| Azure Data Factory - data movement (Azure IR) | ₹20.75 per DIU-hour | $0.25 per DIU-hour | Default integration runtime |
| Azure Data Factory - SSIS IR (D4v3 node) | ≈₹17 per node-hour | ≈$0.205 per node-hour | SSIS package execution |
| CycleCloud compute (HBv3 spot, South India) | ≈₹45 per node-hour | ≈$0.54 per node-hour | Spot, 60% off on-demand |
| Copilot for Azure | Free (preview) | Free | Pricing TBA at GA |
For a representative small-tenant estate (one Premium ACR, one Data Factory with 50,000 activity runs per month, a small CycleCloud cluster with 4 nodes for 6 hours per weekday), my back-of-envelope is around ₹38,000 to ₹52,000 per month - roughly $460 to $625. Add geo-replication or cross-region storage if your DR plan needs it; expect 25-35% on top.
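For anyone who wants to re-run that arithmetic with their own volumes, here is a rough sketch using the table rates. It covers only the three line items named in the estate (Premium ACR, activity runs, CycleCloud spot nodes), so treat its total as a floor rather than a reproduction of the range above. The 30-day month and 22 weekdays are my assumptions, and `estimate_monthly_inr` is a hypothetical helper name.

```shell
# Rough monthly estimate in INR from the indicative rates in the table.
# $1 = activity runs per month, $2 = CycleCloud nodes, $3 = hours per weekday
estimate_monthly_inr() {
  awk -v runs="$1" -v nodes="$2" -v hours="$3" 'BEGIN {
    days = 30; weekdays = 22                  # assumed month shape
    acr     = 140 * days                      # Premium ACR at about INR 140 per day
    adf     = (runs / 1000) * 83              # INR 83 per 1000 activity runs
    compute = nodes * hours * weekdays * 45   # about INR 45 per spot node-hour
    printf "acr=%d adf=%d compute=%d total=%d\n", acr, adf, compute, acr + adf + compute
  }'
}

estimate_monthly_inr 50000 4 6
# prints: acr=4200 adf=4150 compute=23760 total=32110
```

Data movement, data flow compute, storage and egress are not in this sketch; add them from your own usage before quoting a number to a CFO.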
The line item that grows fastest if you stop watching: Data Factory activity runs in a pipeline that re-tries on every failure without an exponential back-off. I have seen one badly designed pipeline rack up ₹14,000 in activity-run charges in a weekend before anyone noticed.
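A minimal sketch of what I mean by exponential back-off, for whatever script or scheduler re-triggers the pipeline. It is a client-side wrapper, not a Data Factory setting. The function names, the 30-second base and the 900-second cap are my own illustrative choices.

```shell
# Print the wait in seconds before the next retry.
# $1 = attempt number (0-based), $2 = base seconds, $3 = cap seconds
backoff_delay() {
  bd_delay=$2
  bd_i=0
  # Double the delay once per attempt, stopping early once the cap is reached
  while [ "$bd_i" -lt "$1" ] && [ "$bd_delay" -lt "$3" ]; do
    bd_delay=$(( bd_delay * 2 ))
    bd_i=$(( bd_i + 1 ))
  done
  if [ "$bd_delay" -gt "$3" ]; then
    bd_delay=$3
  fi
  echo "$bd_delay"
}

# Run a command up to $1 times, sleeping with exponential back-off in between.
# Remaining arguments are the command to run.
retry_with_backoff() {
  rb_max=$1
  shift
  rb_attempt=0
  while [ "$rb_attempt" -lt "$rb_max" ]; do
    if "$@"; then
      return 0
    fi
    rb_attempt=$(( rb_attempt + 1 ))
    if [ "$rb_attempt" -lt "$rb_max" ]; then
      sleep "$(backoff_delay "$(( rb_attempt - 1 ))" "${BACKOFF_BASE:-30}" "${BACKOFF_CAP:-900}")"
    fi
  done
  return 1
}
```

With the defaults the waits run 30, 60, 120, 240, 480 seconds and then stay at 900. The hard limit on attempts matters as much as the growing delay: it is what stops a weekend of billed retries.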
If it breaks: rollback and recovery
Most Azure Data Factory changes are reversible, but the reversal path is not always obvious from the portal. Here is what I do in the three common "I just broke prod" scenarios.
Scenario 1: I changed a setting and the workload is failing
- Open the Activity log on the affected resource. Filter to the last 60 minutes. The most recent control-plane change is almost always the cause.
- Click into the change. Where Azure has recorded a diff, the Change history tab on the event shows the "before" and "after" property values.
- Revert the setting to the captured pre-change value. If you did not capture it (step 5 of the walkthrough above), the activity log entry itself gives you the original value within the last 90 days.
- Trigger a smoke test. For Data Factory that is a manual pipeline run; for Container Registry a docker push; for CycleCloud a small Slurm job. Confirm the smoke test passes end to end.
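The first step of Scenario 1 has a CLI equivalent in `az monitor activity-log list`. A sketch, assuming you filter by resource group; `recent_changes` is a hypothetical wrapper name and the projected columns are just the ones I find useful.

```shell
# List recent control-plane operations for a resource group.
# $1 = resource group, $2 = look-back window (for example 1h)
recent_changes() {
  az monitor activity-log list \
    --resource-group "$1" \
    --offset "$2" \
    --query "[].{time: eventTimestamp, operation: operationName.value, status: status.value, caller: caller}" \
    --output table
}
```

Run `recent_changes rg-prod-southindia-01 1h` and look for the most recent write operation that is not yours, or the one that is.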
Scenario 2: I deleted something I should not have
- Check if the resource type supports soft delete. Container Registry supports it for repositories (preview), Storage accounts do not at the account level, Data Factory does not. Each one has its own recovery story.
- If soft delete is not available, open a Microsoft Support ticket within 24 hours. Microsoft can sometimes restore from internal backups but this is not contractual.
- Document the incident in your runbook so the next person on call has the recovery path mapped out.
Scenario 3: I cannot get into the resource at all
- Check the resource lock on the resource and the parent resource group. A CanNotDelete lock (labelled Delete in the portal) blocks deletion; a ReadOnly lock also blocks configuration changes.
- Confirm your RBAC assignment is still in place. Entra group membership changes can take up to an hour to propagate.
- Try from a different network. Private endpoints can block portal access from outside the corporate VPN.
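The lock and RBAC checks in Scenario 3 can be run from the CLI when the portal itself is the thing you cannot reach. A sketch with hypothetical helper names (`list_lock_levels`, `list_roles`, `has_readonly_lock`); the underlying `az lock list` and `az role assignment list` commands are standard Azure CLI.

```shell
# Print one lock level per line (CanNotDelete or ReadOnly) for a resource group.
# $1 = resource group
list_lock_levels() {
  az lock list --resource-group "$1" --query "[].level" --output tsv
}

# Print the role names an identity holds at a scope, including inherited
# and group-based assignments.
# $1 = user or service principal (object ID or sign-in name), $2 = scope resource ID
list_roles() {
  az role assignment list --assignee "$1" --scope "$2" \
    --include-inherited --include-groups \
    --query "[].roleDefinitionName" --output tsv
}

# Succeeds when the lock levels on stdin include ReadOnly.
has_readonly_lock() {
  grep -qx 'ReadOnly'
}
```

For example, `list_lock_levels rg-prod-southindia-01 | has_readonly_lock && echo "ReadOnly lock present"`. An empty result from `list_roles` means the assignment is gone or has not propagated yet.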
How I verify it actually worked
The portal gives a green tick once the change is accepted. I do not trust that alone. My verification routine for any Azure Data Factory change has three steps and takes about ten minutes:
- Inspect via the alternate plane. If I changed it in the portal, I confirm via CLI; if I changed it via Terraform or Bicep, I confirm via the portal. Two surfaces, same answer, before I declare victory.
- Trigger an end-to-end smoke test. For Container Registry that is a docker push and pull. For Data Factory that is a manual pipeline trigger. For CycleCloud that is a small Slurm or SGE job. For Copilot that is a representative prompt. The smoke test must succeed end-to-end, not just kick off without an error.
- Confirm the activity log entry. Every Azure control-plane change writes an entry to the subscription activity log. Copy the operation ID into the change ticket so future auditors can map every change to a human-readable record.
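For the Data Factory smoke test, this is roughly how I script "succeeds end to end" rather than "kicked off without an error". It is a sketch that assumes the Azure CLI `datafactory` extension is installed. The function name, the 15-second poll interval and the 40-poll limit are my own placeholders.

```shell
# Trigger a pipeline run and wait for a terminal status.
# Requires the Azure CLI "datafactory" extension.
# $1 = resource group, $2 = factory name, $3 = pipeline name
smoke_test_pipeline() {
  st_run_id=$(az datafactory pipeline create-run \
    --resource-group "$1" --factory-name "$2" --name "$3" \
    --query runId --output tsv) || return 1
  st_tries=0
  while [ "$st_tries" -lt "${SMOKE_MAX_POLLS:-40}" ]; do
    st_status=$(az datafactory pipeline-run show \
      --resource-group "$1" --factory-name "$2" --run-id "$st_run_id" \
      --query status --output tsv)
    case "$st_status" in
      Succeeded)
        echo "run $st_run_id succeeded"
        return 0 ;;
      Failed|Cancelled)
        echo "run $st_run_id ended as $st_status"
        return 1 ;;
    esac
    # Still queued or in progress: wait and poll again
    st_tries=$(( st_tries + 1 ))
    sleep "${SMOKE_POLL_SECONDS:-15}"
  done
  echo "run $st_run_id did not finish within the polling window"
  return 1
}
```

The function returns non-zero on failure, cancellation or timeout, so it can gate the "declare victory" step in a change script.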
For ongoing monitoring, I wire alerts on the relevant resource metrics into the team's PagerDuty rotation. The alert text I use is plain: "Resource X in region Y has metric Z above threshold T - first responder, run runbook at /docs/runbooks/x". Short, actionable, no jargon.
Common pitfalls I see on real customer projects
- Treating the Microsoft Learn page as exhaustive. Learn pages cover the canonical case. Edge cases - regional unavailability, SKU-specific behaviour, preview features - are usually mentioned in a sub-heading but easy to skim. Grep the page for your specific SKU and region before committing.
- Skipping the smoke test. "It saved" is not the same as "it works". Every time I have skipped the smoke test on a customer project, I have regretted it within a week.
- Mixing dev/test and production in the same resource. Cost looks attractive in the moment; audit and RBAC pain show up later. Always separate by resource group at minimum, by subscription where the budget allows.
- Letting secrets live in inline Data Factory JSON or in CycleCloud template files. Use Key Vault. Use managed identity. If you would not put your AWS root password in Confluence, do not paste your SQL admin password into a pipeline definition.
- Ignoring quota limits. Every Azure subscription has soft and hard quotas. Hitting one of them at 3 AM during an autoscale event is one of the worst fault modes - capacity exists, but you cannot reach it. Pre-raise quota for the resources you know you will need.
- Pinning to preview features. Preview is preview. Microsoft can change the API without advance notice of breaking changes. Use preview for evaluation, never for production-critical paths.
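For the inline-secrets pitfall, a quick grep over an exported folder of pipeline and linked service JSON catches the worst offenders. This is a sketch only: the patterns are illustrative guesses of mine, not an exhaustive secret scanner, and `find_inline_secrets` is a hypothetical helper name. A Key Vault reference stores an object rather than a plain string, so it is not flagged.

```shell
# List files under a directory that appear to contain inline credentials.
# $1 = directory holding exported pipeline / linked service JSON
find_inline_secrets() {
  # Matches a "password" key with a plain string value, or an
  # AccountKey= / Password= pair inside a connection string.
  grep -rlEi '"password"[[:space:]]*:[[:space:]]*"[^"]+"|(AccountKey|Password)=[^;"]+' "$1"
}
```

Run `find_inline_secrets ./adf-export` before a review; an exit status of 1 with no output means nothing matched these particular patterns, which is not the same as proof that no secret is present.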
A real example from Chennai last month
I want to give one concrete story because abstract advice tends to slide off. Last month, I was helping a SaaS team in HSR Layout - mid-size technology shop, around 180 employees, two Azure subscriptions, one for production and one for non-prod. They had asked for a "platform review" because their cloud bill had crept past ₹3.6 lakh per month and the CFO wanted answers.
I have seen this fail when a team treats the source transformation as a check-once configuration and walks away. In this case, when I sat down with Rajesh from the ops team, we found that the original deployment had been done correctly - eighteen months earlier. Since then, two team members had left, the build pipeline had been rewritten, and three new resource groups had been added without anyone re-validating that the Azure Data Factory configuration still matched the new shape of the environment. The portal showed every resource as healthy. The configuration was technically valid. It was just no longer correct for the workload it was supposed to support.
The fix was unglamorous. For each of the affected resources, I confirmed the current intent with the workload owner, re-applied the right configuration, and added a quarterly review entry to their change calendar. The whole exercise took about nine hours over three days. The monthly bill dropped by ₹62,000 the next billing cycle, mostly from removing duplicate or stale resources we found along the way. The customer reinvested the saving in proper monitoring, which they had been putting off because of cost.
The lesson I draw, and which I now tell every customer at kick-off: every Azure Data Factory resource in any tenant older than twelve months has at least one piece of drift. Sometimes a dozen. The audit takes an afternoon and pays for itself within one billing cycle.
Wrap-up
Adding the source transformation is one small piece of a larger Azure Data Factory story. If you came here for the answer to a specific question, I hope you found it in the walkthrough or the rollback section. If you came here while planning a wider Azure Data Factory build, the cost table and the pitfalls list are the two parts I would re-read before writing your design doc.
The official Microsoft Learn page is linked in the References block at the bottom and is the source of record. This page exists because I wanted a version that reflected what actually happens on real customer tenants, not what the doc team had room to fit on the canonical page. Both have their place.
If you want to talk about a specific scenario, drop me an email. I usually reply within 24 hours, and I do not bill for the first conversation.