Fix Azure Synapse Analytics Setup Errors (Step-by-Step)
Why This Is Happening
You spin up an Azure Synapse Analytics workspace, open Synapse Studio with genuine excitement, paste in your first OPENROWSET query against a Parquet file , and immediately hit a wall. Maybe it's Login failed for user '<token-identified principal>'. Maybe your KQL script just times out. Maybe you created what you thought was a perfectly valid external data source and Synapse has no idea what you're talking about. I've seen this pattern on dozens of enterprise Azure onboarding projects, and the frustrating part is that Microsoft's error messages almost never tell you the actual root cause.
Azure Synapse Analytics is three different compute engines sitting inside one workspace , a serverless SQL pool, a dedicated SQL pool, and a Data Explorer pool, each with its own authentication model, permission system, and configuration quirks. That's genuinely powerful once it's working. But during setup, that same layered architecture means there are three different places where something can silently go wrong before you even run a query.
The most common failure point I see is identity and permissions. Azure Synapse serverless SQL pool doesn't work like a traditional SQL Server instance where you add a login and move on. It sits on top of Azure Data Lake Storage Gen2, authenticates through Microsoft Entra ID (formerly Azure Active Directory), and uses a permission model that spans both the Synapse workspace level and the storage account level. If your Entra ID admin isn't configured on the workspace, every token-based login will fail, including your own, with that cryptic principal error message.
The second big class of problems comes from the data exploration database setup. You cannot create custom objects like external data sources, credentials, or user-defined functions in the master database. Plenty of first-timers try, hit an error, and assume something is broken with the workspace. It's not broken, master just isn't meant for your objects. Once you create a dedicated exploration database with the right UTF-8 collation, things start behaving correctly.
And then there's collation. This one bites teams who are reading UTF-8 encoded Parquet files or querying Azure Cosmos DB containers. If your database was created without the _UTF8 suffix on the collation, text columns can come back garbled or truncated. It's subtle, it doesn't throw an obvious error, and it's fixed with one line of SQL that most setup guides skip entirely.
I know this is frustrating, especially when it's blocking a demo, a data migration deadline, or an executive dashboard that was supposed to be live yesterday. The good news is that every one of these problems has a clear, documented fix. Let's work through them together. Browse all Microsoft fix guides →
The Quick Fix, Try This First
If you're getting Login failed for user '<token-identified principal>' when running queries in Azure Synapse Analytics, this is almost always a missing Entra ID admin on the workspace. Here's how to fix it in under five minutes.
- Open the Azure Portal at portal.azure.com and search for your Synapse workspace name in the top search bar.
- In the left-side navigation of your workspace resource, scroll down to the Settings section and click Microsoft Entra ID.
- Click the Set admin button. A panel will slide out showing your organization's directory.
- Search for your account or a security group that should own admin access, select it, and click Select.
- Click Save at the top of the Microsoft Entra ID blade.
- Go back to Synapse Studio, open a new SQL script, and re-run your query.
In the vast majority of cases, that's it. The token-identified principal error vanishes immediately because Synapse can now resolve your Entra identity against the workspace. No workspace restart required, no waiting for replication.
If you're hitting a different error, maybe your OPENROWSET query against a Parquet file is returning File cannot be opened or your external data source isn't resolving, scroll down to Step 4 where I cover the full OPENROWSET configuration against Azure Data Lake Storage Gen2.
This is the foundation step. Without a Microsoft Entra ID admin configured on your Azure Synapse Analytics workspace, token-based authentication fails for serverless SQL pool queries, even for the account that created the workspace. It's a chicken-and-egg setup requirement that the provisioning wizard doesn't always make obvious.
Navigate to the Azure Portal and locate your Synapse workspace resource. Under Settings in the left panel, click Microsoft Entra ID. Hit Set admin and assign either your personal account or an appropriate admin group from your Entra tenant.
Once saved, return to Synapse Studio. In the top toolbar of any SQL script panel, verify the Connect to dropdown shows Built-in for serverless pool queries. Then try a minimal test query like:
SELECT TOP 1 *
FROM OPENROWSET(
BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
FORMAT = 'PARQUET'
) AS [result]
If your authentication is now correct, this query executes and returns a single row from the Parquet file. If you still see the login error, double-check that the account you assigned as Entra admin matches the account you're logged into Synapse Studio with. Browser session caching sometimes holds stale credentials, sign out of Synapse Studio and sign back in to force a fresh token exchange.
You should see a result grid with column headers derived from the Parquet schema. That's your confirmation that the identity layer is healthy.
Once authentication is working, the next thing to sort out is your database structure. The master database in Azure Synapse Analytics serverless SQL pool is a system database, you cannot create external data sources, database-scoped credentials, or custom views inside it. Trying to do so will give you a permission error that looks more serious than it actually is.
Switch to the master database context using the database dropdown in the Synapse Studio toolbar, then create a new database with this exact syntax:
CREATE DATABASE DataExplorationDB
COLLATE Latin1_General_100_BIN2_UTF8
The collation is not optional if you're working with Parquet files or Cosmos DB containers. Latin1_General_100_BIN2_UTF8 ensures UTF-8 encoded text is correctly converted to VARCHAR columns when you query those sources. Skipping this and using a default collation can cause garbled string data on non-ASCII characters, a bug that's infuriatingly hard to track down after the fact because the data looks almost right.
After the database is created, switch to it immediately:
USE DataExplorationDB
Or use the database dropdown in the toolbar to select DataExplorationDB. All subsequent object creation, external data sources, credentials, users, must happen in this context, not in master.
Once the USE statement runs without error, you'll see DataExplorationDB appear in the database dropdown. That confirms it's live and available for your exploration objects.
Hard-coding a full Azure Data Lake Storage Gen2 URL into every single OPENROWSET call works, but it's messy and brittle. If your storage account URL changes or you're managing dozens of queries, you want an external data source, a named reference that you maintain in one place.
With DataExplorationDB selected as your active database, create the external data source like this:
CREATE EXTERNAL DATA SOURCE ContosoLake
WITH ( LOCATION = 'https://contosolake.dfs.core.windows.net')
A few things worth understanding here. First, if you don't attach a database-scoped credential to this data source, Synapse will use the caller's own Entra identity to access the storage account. That works fine for personal exploration, but in a team environment where multiple analysts share the workspace, you'll want to attach a managed identity credential or a shared access signature so that permissions are predictable and auditable.
Second, the LOCATION value should point to the DFS endpoint (.dfs.core.windows.net), not the blob endpoint (.blob.core.windows.net). The DFS endpoint is the Azure Data Lake Storage Gen2 interface and is what enables hierarchical namespace operations. Using the blob endpoint sometimes works but can cause subtle issues with directory-level queries.
Once created, any query in this database can reference files with a relative path instead of a full URL:
SELECT TOP 100 *
FROM OPENROWSET(
BULK '/users/NYCTripSmall.parquet',
DATA_SOURCE = 'ContosoLake',
FORMAT = 'PARQUET'
) AS [result]
You'll see results instantly, same data, much cleaner query.
If you're setting up Azure Synapse Analytics for a team, analysts, data engineers, BI developers, you need to create individual database users rather than having everyone authenticate as the workspace admin. This is how you control who can read which data sources without giving everyone broad admin rights.
Start by creating a SQL login in the context of the master database. Switch back to master using the dropdown, then run:
CREATE LOGIN data_explorer WITH PASSWORD = 'My Very Strong Password 1234!';
Use a genuinely strong password here, Azure enforces complexity requirements and will reject anything too simple. Mix uppercase, lowercase, numbers, and special characters, and make sure it's at least 12 characters long.
Now switch back to DataExplorationDB and create the corresponding database user, then grant the permission that allows this user to run bulk read operations:
CREATE USER data_explorer FOR LOGIN data_explorer;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO data_explorer;
The ADMINISTER DATABASE BULK OPERATIONS permission is specifically what allows a user to run OPENROWSET queries in a serverless SQL pool database. Without it, any analyst login will get a permission denied error the moment they try to query external data, even if they have read access to the underlying storage account.
Test the new user by connecting with those credentials in a new browser session or by using a SQL client like Azure Data Studio. A successful SELECT TOP 10 via OPENROWSET confirms the permission grant worked.
For high-performance analytics workloads, large aggregations, joins across billions of rows, PowerBI reports that need sub-second response, the dedicated SQL pool is what you want, not the serverless pool. The dedicated pool uses clustered columnstore indexes and massively parallel processing to handle that kind of query load. But getting data into it requires a different approach: the COPY INTO command.
Here's a real example loading NYC taxi trip data from a Parquet file into a dedicated SQL pool table. This assumes you've already created the dbo.NYCTaxiTripSmall table as a heap table in SQLPOOL1:
COPY INTO dbo.NYCTaxiTripSmall
(VendorID 1, store_and_fwd_flag 4, RatecodeID 5, PULocationID 6, DOLocationID 7,
passenger_count 8, trip_distance 9, fare_amount 10, extra 11, mta_tax 12,
tip_amount 13, tolls_amount 14, ehail_fee 15, improvement_surcharge 16,
total_amount 17, payment_type 18, trip_type 19, congestion_surcharge 20)
FROM 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'
WITH (
FILE_TYPE = 'PARQUET',
MAXERRORS = 0,
IDENTITY_INSERT = 'OFF'
)
The column-to-ordinal mapping in the column list (the numbers after each column name) tells Synapse which column in the Parquet file maps to which table column. If your Parquet schema exactly matches your table schema in order and name, you can omit the mapping entirely, but explicit mapping is safer when you control the table definition independently of the source file schema.
This script typically finishes in under 60 seconds for a 2-million-row dataset. Once it completes, right-click the dbo.NYCTaxiTripSmall table in the Data hub under SQLPOOL1 and select New SQL Script > Select TOP 100 Rows to verify the data loaded correctly. If the table doesn't appear immediately, hit the refresh button on the Tables node, Synapse Studio doesn't auto-refresh the schema tree.
Advanced Troubleshooting
Setting Up a Data Explorer Pool for KQL Workloads
Azure Synapse Analytics also includes a Data Explorer pool, a Kusto-based engine designed for log analytics, time-series data, and operational telemetry. If your team works with event streams, IoT sensor data, or application logs, this is the compute engine you want. But it has its own setup sequence that's completely separate from the SQL pool workflow.
To create a Data Explorer pool, go to Synapse Studio and click Manage in the left-side panel, then select Data Explorer pools and click New. On the Basics tab, there are hard naming rules that are easy to violate: the pool name must be lowercase letters and numbers only, between 4 and 15 characters, and must start with a letter. Names like MyPool or data-explorer-1 will fail validation. Something like contosodataexplorer is exactly right.
For workload type, Compute optimized gives you a higher CPU-to-SSD ratio which performs well for most analytics queries. For development and testing, set Node size to Small (4 cores) to avoid unnecessary cost.
After the pool provisions, create a Data Explorer database by going to Data in the left panel, clicking + (Add new resource), and selecting Data Explorer database. Set your retention period (365 days is a reasonable default for most teams) and your cache period (31 days keeps frequently queried data in fast SSD/RAM storage).
Once the database is live, create a KQL script from the Develop panel. Use the Connect to dropdown to select your Data Explorer pool and the Use database dropdown to select your database. Then create a table and ingest sample data:
.create table StormEvents (StartTime: datetime, EndTime: datetime,
EpisodeId: int, EventId: int, State: string, EventType: string,
InjuriesDirect: int, InjuriesIndirect: int, DeathsDirect: int,
DeathsIndirect: int, DamageProperty: int, DamageCrops: int, Source: string,
BeginLocation: string, EndLocation: string, BeginLat: real, BeginLon: real,
EndLat: real, EndLon: real, EpisodeNarrative: string,
EventNarrative: string, StormSummary: dynamic)
.ingest into table StormEvents
'https://kustosamples.blob.core.windows.net/samplefiles/StormEvents.csv'
with (ignoreFirstRecord=true)
After ingestion, verify with a simple KQL query:
StormEvents
| sort by StartTime desc
| take 10
If you don't see the table in the left panel after creation, select the contosodataexplorer more menu and click Refresh, the schema tree is not live-updating.
Querying Data After Setting Up a Logical Data Warehouse
For teams building a relational layer on top of Azure Data Lake Storage, what Microsoft calls a Logical Data Warehouse, the serverless SQL pool is your engine. The pattern involves creating views in your DataExplorationDB that wrap OPENROWSET calls with DATA_SOURCE references. This lets downstream tools like Power BI connect to Synapse as if it were a traditional SQL Server, while all the actual data lives in Data Lake Storage Gen2. The key to making this perform well is defining the column types explicitly in your OPENROWSET calls using WITH clauses rather than letting Synapse infer them, schema inference adds latency on every query.
Pausing Your Dedicated SQL Pool to Control Costs
The dedicated SQL pool bills continuously whether or not you're running queries. During setup, testing, and off-hours, you should pause it. In Synapse Studio, go to Manage > SQL pools, hover over your dedicated pool, and click the Pause button. Confirm the pause in the dialog. The pool state changes to Paused within a few minutes. Resume it the same way when you need it again. Forgetting to pause during initial exploration is one of the fastest ways to generate an unexpected Azure bill.
Prevention & Best Practices
Most Azure Synapse Analytics setup problems are avoidable with a handful of configuration decisions made at workspace creation time rather than discovered through painful debugging later. Here's what I'd tell any team starting fresh.
Always assign your Entra ID admin before running a single query. This is step zero. It's not listed prominently in the quickstart UI flow, but skipping it means every subsequent authentication attempt will fail with a cryptic error. Make it a checklist item in your workspace provisioning runbook.
Create your data exploration database with the UTF-8 collation from the start. Changing a database's collation after the fact requires dropping and recreating all objects inside it. That's a painful refactor that's completely avoidable with one extra parameter in the CREATE DATABASE statement: COLLATE Latin1_General_100_BIN2_UTF8. Set it on day one.
Use external data sources instead of full storage URLs in queries. Hard-coded URLs in OPENROWSET calls become a maintenance problem at scale. When your storage account moves, gets renamed, or changes authentication, you want to update one external data source definition, not hunt through hundreds of SQL scripts.
Set up least-privilege users from the beginning. Give data analysts the ADMINISTER DATABASE BULK OPERATIONS permission in the exploration database and nothing else. Don't use the workspace admin account for day-to-day data work. This makes your security posture auditable and limits blast radius if credentials are ever compromised.
Tag and monitor your dedicated SQL pool costs proactively. The dedicated pool is the most expensive component of a Synapse workspace. Add a cost alert in Azure Cost Management at 80% of your monthly budget threshold, and build the pause/resume habit into your team's working practices. Synapse also supports scheduled auto-pause through Azure Automation if your team wants a safety net.
- Set your Entra ID admin in the Azure Portal before running any Synapse Studio queries, fixes the token-identified principal error instantly
- Always create exploration databases with
COLLATE Latin1_General_100_BIN2_UTF8to avoid UTF-8 text encoding issues with Parquet and Cosmos DB sources - Pause your dedicated SQL pool (SQLPOOL1) when not actively running production queries, it bills continuously even at idle
- Verify table creation in the Data Explorer pool by using the Refresh action on the contosodataexplorer node in the Data panel, the schema tree does not auto-update
Frequently Asked Questions
How do I fix "Login failed for user token-identified principal" in Azure Synapse Analytics?
This error means your Azure Synapse workspace doesn't have a Microsoft Entra ID admin configured, so it can't resolve your token-based identity. Go to the Azure Portal, find your Synapse workspace resource, click Settings > Microsoft Entra ID, then hit Set admin and assign your account or a group. Save the change, then sign out and back into Synapse Studio to force a fresh token. The error will be gone immediately, no workspace restart needed.
Why can't I create an external data source in the master database in Synapse?
The master database in Azure Synapse Analytics serverless SQL pool is a system database, and Microsoft explicitly prevents custom objects, external data sources, credentials, user functions, and views, from being created there. You need to create a separate database first using CREATE DATABASE YourDBName COLLATE Latin1_General_100_BIN2_UTF8, then switch to it with USE YourDBName before creating your external data source. This separation keeps system metadata clean and prevents accidental conflicts with system operations.
What collation should I use when creating a database in Azure Synapse serverless SQL pool?
Use Latin1_General_100_BIN2_UTF8. This collation correctly handles UTF-8 encoded text when reading from Parquet files and Azure Cosmos DB containers, converting it properly to VARCHAR columns. Without the _UTF8 suffix, non-ASCII characters in your data can come back garbled or incorrectly mapped, and the error is subtle enough that you might not notice it immediately. Set this at database creation time, changing it afterward requires dropping and recreating all database objects.
How long does it take to load data into a dedicated SQL pool with COPY INTO?
For a typical 2-million-row Parquet file of moderate size, the COPY INTO command finishes in under 60 seconds. Larger datasets naturally take longer, but the dedicated SQL pool's massively parallel processing architecture means it scales well, adding more DWUs (Data Warehouse Units) to the pool directly reduces load time for big ingestions. If your load is taking much longer than expected, check that MAXERRORS isn't masking silent row-level failures, and verify the Parquet file is accessible from the pool's network configuration.
What are the naming rules for a Data Explorer pool in Azure Synapse Analytics?
Data Explorer pool names are strict: lowercase letters and numbers only, between 4 and 15 characters, and must start with a letter. No hyphens, underscores, uppercase letters, or special characters are allowed. So contosodataexplorer works but Contoso-DE-Pool does not. The Azure Portal will show a validation error if you violate these rules during pool creation, so you'll catch it before provisioning starts, but it's helpful to know the rules upfront so you can pick a valid name on the first try.
How do I analyze data in Azure Storage after setting up Synapse Analytics?
Once your workspace is configured, you can query Azure Data Lake Storage Gen2 files directly from the serverless SQL pool using OPENROWSET with a FORMAT='PARQUET' parameter. Create an external data source pointing to your storage account's DFS endpoint, then use relative paths in your queries instead of full URLs. For dedicated SQL pool workloads, use COPY INTO to load files from storage into pool tables, then run standard T-SQL analytics queries against the loaded data. The Synapse Studio Data hub also lets you right-click any table and auto-generate SELECT TOP 100 scripts for quick exploration.