Microsoft Advertising Data Clean Room: Fix & Setup Guide
Why This Is Happening
I've seen this exact scenario play out at enterprise accounts more times than I can count: a media team spends weeks getting access to the Microsoft Advertising data clean room environment, finally gets their Spark SQL queries running, and then hits a wall. Maybe the exposure feed isn't returning the columns they expected. Maybe their audience segment data looks completely empty. Maybe the Customer Match activation just silently fails with no meaningful error message. Whatever the specific issue, it's maddening , especially when your campaign measurement timeline is slipping.
The Microsoft Advertising data clean room is genuinely different from other ad data products. It was built in a formal partnership with LiveRamp (previously known as Habu Clean Room), and that architectural lineage matters. You're not just querying a flat database, you're operating inside a privacy-compliant environment where the output rules, column availability, and data joins are constrained by design. That's not a bug. But it does mean that what works in a standard data warehouse query won't necessarily translate directly here.
The core purpose of this environment is to let enterprise advertisers combine their own first-party data with Microsoft's signals, including Search impressions and Audience impressions from the Microsoft Advertising Platform (MSAP), without either party ever exposing raw user-level data to the other. All outputs are delivered in aggregate form. That's a core privacy requirement, not a setting you can switch off. When something goes wrong, it's almost always because a team either misunderstands this constraint or runs into a configuration mismatch between their first-party data schema and one of the three core datasets: the Exposure feed, the Audience segments feed, or the Metadata tables.
The error messages in this environment are notoriously vague. A Spark SQL query that fails due to a column type mismatch might just return a generic execution error. An audience activation job that hits an API permissions wall might simply return no results rather than an explicit failure code. Microsoft's own UI for this integration doesn't surface granular diagnostics by default, you need to know where to look and what to look for.
Who sees these issues most? Typically, it's data engineers or analytics teams who are comfortable with SQL and ad platforms in general but are new to the specific constraints and schema of the Microsoft Advertising data clean room setup. Enterprise advertisers integrating first-party CRM data for audience matching, and agencies running cross-platform measurement reports on behalf of clients, also hit these problems frequently.
The good news: almost every issue I've seen in this environment falls into a predictable set of root causes. Schema mismatches, wrong column type assumptions, incorrect query scope for Microsoft Monetize-specific columns, and Customer Match API permission gaps account for the vast majority of problems. Work through this guide systematically and you'll cover all of them. Browse all Microsoft fix guides →
The Quick Fix, Try This First
Before you go deep on troubleshooting, run this sanity check. The single most common cause of broken queries and missing data in the Microsoft Advertising data clean room is querying columns that don't exist in your specific feed configuration, particularly the columns that are exclusive to either Microsoft Monetize or Microsoft Advertising Search campaigns.
Here's the thing: the Exposure feed has 27 columns, but not all 27 are available to every advertiser. Columns 2, 15, 17, 19, and 21 (specifically auction_id_64, advertiser_id, insertion_order_id, line_item_id, and creative_id) are specific to Microsoft Monetize. If your account isn't a Monetize account, referencing those columns in your Spark SQL query will cause the query to fail or return nulls, depending on how your environment handles schema validation.
Similarly, columns 26 and 27, bidded_keyword and query_id, are only populated for Search campaigns within Microsoft Advertising. If you're running display or audience campaigns exclusively, those fields will always be empty. Querying them expecting data is a common time-waster.
So the quick fix: audit your Spark SQL query and strip out any columns that don't apply to your account type or campaign type. Use only the core columns that are universally available: rguid, date_time, event_type, hashed_email, hashed_ip_address, geographic fields, publisher fields, advertiser_account_id, campaign_id, ad_group_id, ad_id, device fields, and postal_code. Run that stripped-down query first. If it returns data, you've confirmed the environment is working, then add specialized columns back one at a time to isolate which one is causing your issue.
date_time column in the Exposure feed is typed as a LONG (Unix timestamp), not a human-readable date string. I've seen teams waste hours wondering why their date filter returns nothing, they were comparing a LONG value against a string like '2025-09-15'. Always cast or convert: use FROM_UNIXTIME(date_time) in Spark SQL to get a readable timestamp, or compare against epoch integers directly. The Metadata tables, by contrast, store start_date and end_date as STRING type for Microsoft Advertising campaigns, so the type handling is inconsistent across the three feeds, and that trips up even experienced engineers.
Start by confirming exactly which interfaces your account has been provisioned for. The Microsoft Advertising data clean room exposes three distinct datasets: the Exposure feed, the Audience segments feed, and the Metadata tables. Your account may not have access to all three by default, access is provisioned at the enterprise customer level, and what was agreed in your onboarding determines your feed availability.
Log into your Microsoft Advertising account and navigate to your clean room environment via the LiveRamp interface you were provisioned with. Look at the available data assets in your workspace. You should see one or more of the three feeds listed. If Audience segments aren't showing up but your use case requires audience targeting and analytics, that's an access provisioning issue, not a query error.
For the Exposure feed specifically, confirm whether your account is a Microsoft Advertising account or a Microsoft Monetize account (or both). This distinction is critical. Monetize is the programmatic platform; standard Microsoft Advertising accounts cover Search and Audience campaigns. If you only have standard Microsoft Advertising access, your Exposure feed will not populate the Monetize-specific columns (2, 15, 17, 19, 21). Attempting to SELECT those columns won't throw a clean error in all environments, sometimes it just returns null without explanation.
Open your feed schema in the LiveRamp interface and compare it against the 27-column definition in Microsoft's documentation. If your schema shows fewer columns, that's expected based on your account configuration. Document exactly which columns are available to you before writing any queries, this saves significant debugging time downstream.
Once you've confirmed your provisioned columns, run a simple test query to verify data is flowing at all:
SELECT rguid, date_time, event_type, country
FROM exposure_feed
LIMIT 10
If this returns rows, your environment is active and connected. If it returns zero rows or an execution error, the issue is at the provisioning or connection layer, not your query logic, and you'll need to contact your Microsoft Advertising account representative.
The Exposure feed is the most used, and most broken, component in the Microsoft Advertising data clean room. It delivers granular, pseudonymized event-level data for ad measurement. The word "pseudonymized" is doing real work here: user identifiers are hashed, not raw. The hashed_email field (column 5) is a hashed email address, and hashed_ip_address (column 6) is the hashed IP. You cannot reverse these. Any query trying to join on plaintext email addresses against this feed will never return matches, your first-party data must be pre-hashed using the same hashing method before the join.
Confirm your hashing algorithm matches what Microsoft uses for its hashed_email column. If your CRM export uses SHA-256 lowercased and trimmed, but Microsoft's pipeline uses a different normalization, your join will silently return zero matches. Verify this with your Microsoft account team, they can confirm the exact hashing specification.
For the device_type_id (column 23) and device_type_name (column 24) columns, note that device_type_id is an INTEGER and device_type_name is a STRING. A common mistake is filtering on device type by joining an integer lookup table while accidentally casting to STRING in the WHERE clause:
-- Wrong, type mismatch causes silent filter failure
WHERE device_type_id = '3'
-- Correct
WHERE device_type_id = 3
The postal_code field (column 25) is typed as INTEGER, not STRING. US ZIP codes that start with zero, think anything in New England, will be stored as the numeric value (e.g., 1234 instead of 01234). If your geographic analysis depends on postal codes, pad the integer to 5 characters when joining against external postal code reference data:
SELECT LPAD(CAST(postal_code AS STRING), 5, '0') AS postal_code_padded
FROM exposure_feed
Once your column types are correct, your query success rate will improve dramatically. You should see consistent row counts matching your expected impression volume.
The Audience segments feed is beautifully simple on paper, just three columns: hashed_email, segment_id, and segment_name. In practice, getting meaningful data out of it requires getting the join logic exactly right, and most failures I see come down to one of two things: hashing inconsistency (same issue as the Exposure feed) or misunderstanding what the segment data actually represents.
The Audience segments feed categorizes users into Microsoft Advertising's In-Market Audience segments. These are Microsoft's own proprietary audience definitions, they're not custom segments you create. The segment_id is an INTEGER identifier for a specific Microsoft In-Market Audience segment, and segment_name is the STRING label for that segment. If you're expecting your own custom audience definitions to appear here, they won't. This feed is specifically for Microsoft's pre-built In-Market segments.
To check what segments are represented in your data:
SELECT segment_id, segment_name, COUNT(*) as user_count
FROM audience_segments
GROUP BY segment_id, segment_name
ORDER BY user_count DESC
If this query returns an empty result set and you know your account has active campaigns, the most likely cause is that the hashed_email values in the audience segments feed don't overlap with the universe of users in your provisioned data. This feed is populated based on Microsoft's own matching logic, you don't control which users appear in it.
To join audience segment data against your exposure data for campaign performance analysis:
SELECT e.advertiser_account_id, e.campaign_id, a.segment_name,
COUNT(DISTINCT e.hashed_email) as unique_users
FROM exposure_feed e
INNER JOIN audience_segments a ON e.hashed_email = a.hashed_email
WHERE e.event_type = 1 -- impressions
GROUP BY e.advertiser_account_id, e.campaign_id, a.segment_name
A successful result here means your join is working and you can now build audience insights reports. If the join returns zero rows despite both tables having data, the hashing mismatch is your culprit, escalate to your account team with a sample of the hashed values from both tables for them to diagnose.
The Metadata dataset is the glue that makes your Exposure feed data interpretable. Without it, you have a bunch of IDs, campaign_id, ad_group_id, advertiser_account_id, but no human-readable names. Joining Metadata lets you map those IDs to actual campaign names, ad group names, and advertiser names for reporting.
There are two separate Metadata schemas: one for Microsoft Advertising and one for Microsoft Monetize. This is a frequent source of confusion. If you're a standard Microsoft Advertising advertiser, use the Microsoft Advertising metadata table. If you're operating in Microsoft Monetize, use the Monetize metadata table. The schemas are different and the joining keys differ.
For Microsoft Advertising metadata, the primary join keys are advertiser_account_id, campaign_id, and ad_group_id, all typed as LONG. For Microsoft Monetize metadata, the keys are advertiser_id, insertion_order_id, and line_item_id, all typed as INTEGER. Getting the types wrong on these joins causes silent failures. LONG and INTEGER are not always interchangeable in Spark SQL depending on your environment configuration.
An important nuance: the end_date field in Microsoft Advertising metadata is marked as optional. Ongoing campaigns won't have an end date, and querying for end_date IS NOT NULL will exclude all currently active campaigns from your results. Account for this in any date-range filtering logic:
SELECT m.campaign_name, m.ad_group_name, m.start_date, m.end_date
FROM msad_metadata m
WHERE m.start_date >= '2025-01-01'
AND (m.end_date IS NULL OR m.end_date >= '2025-01-01')
When this join works correctly, your reports will show campaign names instead of numeric IDs, the first sign that your full data pipeline is functioning end-to-end.
Building a custom audience inside the data clean room and then activating it via the MSAP Customer Match API is the most powerful feature here, and the most technically demanding. If your audience build query runs successfully but the activation doesn't produce a usable audience in Microsoft Advertising, here are the specific things to check.
First, confirm that your output from the clean room query meets the aggregate output requirement. The environment is designed to deliver results in aggregate to safeguard user-level information, this isn't optional. If your query produces output at the individual user level (individual hashed emails without any aggregation), the system may strip or suppress those results before they reach the Customer Match API. Your query should aggregate to a cohort, not a list of individual identifiers.
Second, verify your MSAP Customer Match API credentials and permissions are correctly configured for your advertiser account. The API activation is a separate permission from clean room data access, you can have one without the other. In Microsoft Advertising, navigate to Tools > API Access and confirm Customer Match is enabled for your account. If it's not listed there, you need to request access from your Microsoft Advertising account representative.
Third, check the audience size threshold. Customer Match audiences below Microsoft's minimum size threshold won't activate, this is a privacy control. Microsoft doesn't publish an exact number, but from what I've seen in enterprise accounts, audiences under approximately 1,000 matched users are typically suppressed. If your target audience is small, this may be a fundamental constraint rather than a fixable error.
For Search campaign–specific activation, remember that columns 26 (bidded_keyword) and 27 (query_id) are only available for Search. If you're building a Search audience based on keyword targeting behavior, confirm these columns are populated in your exposure data before building activation logic on top of them:
SELECT COUNT(*) as search_events
FROM exposure_feed
WHERE bidded_keyword IS NOT NULL
A non-zero count confirms Search data is available. Zero means either you have no Search campaigns or Search data isn't in your provisioned feed.
Advanced Troubleshooting
If the step-by-step fixes above haven't resolved your issue, you're likely dealing with one of a handful of more complex scenarios. Let me walk through the ones I see most often in enterprise environments.
Agency Multi-Advertiser Environments
Agencies running clean room queries on behalf of multiple advertiser clients need to filter by advertiser_account_id (for Microsoft Advertising) or advertiser_id (for Microsoft Monetize) in every query. Without this filter, your results will commingle data across all advertisers in your clean room workspace. The advertiser_account_id field is a LONG type in both the Exposure feed and the Microsoft Advertising Metadata table, they should join cleanly. But double-check that the same advertiser has consistent IDs across both tables; in rare cases where accounts have been migrated or merged, the IDs can be out of sync.
Cross-Platform Measurement Report Failures
One of the headline use cases for the Microsoft Advertising data clean room is generating cross-platform measurement reports using Spark SQL. These typically join your Exposure feed against your own first-party conversion data. If these reports are returning unexpected zeros or wildly incorrect numbers, the most common cause is time zone misalignment. The date_time column in the Exposure feed stores Unix timestamps in UTC. Your own first-party conversion data may be stored in a local time zone. A 5-hour offset between UTC and US Eastern time can make it look like all your conversions happened the day after the impressions, which will make your attribution logic fail completely.
Convert your date_time values explicitly:
SELECT rguid,
FROM_UNIXTIME(date_time, 'yyyy-MM-dd HH:mm:ss') AS event_time_utc,
CONVERT_TIMEZONE('UTC', 'America/New_York',
FROM_UNIXTIME(date_time)) AS event_time_eastern
FROM exposure_feed
LIMIT 100
Spark SQL Execution Errors with No Clear Message
If your query fails with a generic execution error and no column-level detail, try breaking your query into smaller pieces and running each piece independently. Complex multi-table joins involving all three feeds simultaneously are the most common source of silent execution failures. Start with a simple two-table join (Exposure + Metadata), confirm it works, then add the Audience segments join. This isolates which join is causing the problem.
Also check for reserved word conflicts, Spark SQL has a specific set of reserved keywords that differ slightly from standard SQL. Column names like region (column 8 in the Exposure feed) can conflict with Spark built-ins in some environments. Wrap column names in backticks when in doubt:
SELECT `region`, `country`, COUNT(*) as impressions
FROM exposure_feed
GROUP BY `region`, `country`
Privacy Threshold Suppression
This one is genuinely frustrating because it's by design: the data clean room suppresses any aggregate output that falls below a minimum count threshold. If your query returns a table with some rows having null or zero counts despite your logic being correct, those are suppressed cells, the underlying data exists but isn't surfaced because it would allow inference about individual users. The fix here is not to circumvent the privacy controls (you can't and shouldn't), but to adjust your query granularity. Less granular groupings will produce larger aggregates that clear the suppression threshold.
If you've worked through all the steps above and your clean room environment still isn't returning data, or if you're seeing provisioning issues where feeds that should be available aren't showing up in your workspace, this is the point to escalate. Specifically, contact support if: your test query against a single table with LIMIT 10 returns zero rows; your Customer Match API activation has the correct permissions but audiences never appear in Microsoft Advertising; or you're seeing schema definitions in your workspace that don't match the official documentation column counts. Reach out via Microsoft Support and reference your clean room workspace ID and advertiser account ID in the initial ticket. The more specific you can be about which table, which column, and what query you're running, the faster the support team can route you to the right specialist.
Prevention & Best Practices
Getting the Microsoft Advertising data clean room working reliably over time means building good habits from day one. The issues I've described in this guide are almost entirely preventable with the right setup discipline.
The single biggest thing you can do is maintain a schema reference document for your specific account configuration. Before writing any production queries, document exactly which columns you have access to, their correct data types, and any account-specific caveats (Monetize vs. standard, Search vs. Audience). When someone new joins your team and starts writing queries, they have a reference that saves them from rediscovering the postal_code integer padding issue or the Unix timestamp gotcha.
Establish a query testing protocol. Every new query should be tested with a small LIMIT clause and a narrow date range before it runs over a full dataset. This catches type errors and schema issues without wasting compute resources or hitting rate limits in your clean room environment.
For teams doing first-party data matching via hashed_email, build a hash validation pipeline as part of your CRM export process. Generate a sample of known test email addresses, hash them using your pipeline, and verify that the output matches the expected format before any production data join. This is a one-time investment that prevents hours of debugging every time you refresh your first-party data.
Keep your LiveRamp workspace documentation current. The Microsoft Advertising data clean room was built in partnership with LiveRamp (formerly Habu Clean Room), and the platform does receive updates. Column additions, type changes, and new privacy controls can affect your queries without breaking them in obvious ways, they just return different results. Subscribe to Microsoft Advertising platform release notes and audit your queries whenever a data clean room update is announced.
- Always filter Exposure feed queries by a specific
date_timerange, querying the full table without a date filter is slow and expensive in Spark SQL environments. - Use
CAST(postal_code AS STRING)withLPADwhenever your analysis involves US ZIP codes to preserve leading zeros. - Pre-hash your first-party email data and validate against a small test cohort before any production audience matching job.
- Keep Monetize-specific columns (2, 15, 17, 19, 21) in a separate, clearly labeled query file so there's no ambiguity about which queries require a Monetize account.
Frequently Asked Questions
Why is my hashed_email join returning zero results even though both tables have data?
This almost always means the two sides of your join were hashed using different methods or normalizations. Microsoft's hashed_email field uses a specific hashing algorithm, your first-party data must be hashed the same way before the join will produce matches. Start by confirming with your Microsoft Advertising account team what exact hashing specification the clean room uses (algorithm, case normalization, whitespace trimming). Then verify your CRM export pipeline produces output in the identical format. Even a single difference, like one side hashing uppercase email addresses and the other hashing lowercase, will produce a complete mismatch with zero joined rows.
What's the difference between advertiser_account_id and advertiser_id in the Exposure feed?
These are identifiers for two different platforms. The advertiser_account_id (column 14) is a LONG type identifier for an advertiser within Microsoft Advertising, the standard search and audience advertising platform. The advertiser_id (column 15) is an INTEGER identifier specific to Microsoft Monetize, the programmatic platform. Column 15 is only populated for Microsoft Monetize accounts. If you're a standard Microsoft Advertising advertiser, column 15 will always be null in your feed, use advertiser_account_id exclusively for your joins and filters.
How do I find out which Microsoft In-Market Audience segments are available in my data?
Query the Audience segments feed directly with a simple group-by on segment_id and segment_name. Run SELECT segment_id, segment_name, COUNT(*) FROM audience_segments GROUP BY segment_id, segment_name ORDER BY COUNT(*) DESC to see all segments represented in your provisioned data along with the user count for each. Remember that these are Microsoft's predefined In-Market Audience segments, they're not custom audiences you build yourself. The segments available to you depend on what Microsoft's matching logic has identified for your user base.
Why are the bidded_keyword and query_id columns always empty in my Exposure feed?
Columns 26 (bidded_keyword) and 27 (query_id) are populated only for Search campaigns within Microsoft Advertising. If you're running display, native, or audience campaigns exclusively, these fields will always be null, that's expected behavior, not an error. If you do have active Search campaigns and these fields are still empty, confirm with your account team that your Search campaign data is included in your clean room data provisioning. Search impression data and audience impression data can sometimes be provisioned separately.
My Spark SQL query runs but some rows in the output have null or suppressed values, what's happening?
You're hitting the data clean room's privacy threshold suppression. The environment is designed to only surface data in aggregate, and any aggregate group that falls below a minimum user count threshold gets suppressed to prevent inference about individual users. This is a core privacy control and cannot be disabled. The solution is to reduce the granularity of your query, group by fewer dimensions, use broader geographic buckets (country instead of postal code, for example), or widen your time range. Larger aggregate groups are more likely to clear the suppression threshold and appear in your output.
Can I use the data clean room to build audiences for platforms outside of Microsoft Advertising?
The activation path built into the Microsoft Advertising data clean room is specifically the MSAP Customer Match API, this activates audiences back into Microsoft Advertising for campaign targeting. The clean room environment itself, built on the LiveRamp infrastructure, may have broader capabilities depending on your LiveRamp agreement, but the Microsoft-specific integration described in the official documentation is scoped to MSAP activation. If you need to activate audiences to third-party platforms, you'd need to confirm that capability directly with your LiveRamp account team, as it goes beyond the standard Microsoft Advertising data clean room integration.