Best practices for custom insights
| Product family | Azure Video Indexer |
|---|---|
| Document source | Azure Video Indexer |
| Guide type | Hands-on Reference |
| Skill level | Intermediate to advanced |
| Time | 20 - 75 minutes depending on tenant scale |
Azure Video Indexer (AVI) lets you train custom models for brands, people, and language. The custom insights surface is where you tell AVI "this person matters, this brand matters, this acronym means X in our context." Done well, you turn a generic video search index into a focused enterprise knowledge base. Done badly, you train on noise and degrade the out-of-the-box accuracy.
I built custom insights for a Bengaluru-based ed-tech company indexing 4,200 hours of lecture content. The custom person model, covering 60 instructors, needed 5-8 sample images per person. The custom language model, covering 1,400 domain terms, needed a curated phrase list. Two weeks of curation. Six months of solid retrieval.
Reference content and what it actually means
The Microsoft Learn page for Best practices for custom insights treats the topic as a checklist of recommendations. That is useful as a memory aid. It is not enough when you are picking between two approaches for your tenant. Here is the framing I use when I am the engineer on the hook for shipping it.
Azure Video Indexer is built on Azure AI Foundry's video analysis stack. Under the hood it runs a chain of models: speaker diarization, face detection, OCR, scene detection, sentiment, and topic extraction. The custom insights layer is your way of biasing those models toward your domain without retraining them from scratch.
What the dataset and prompt actually train
You are not retraining the base models. You are adding a thin custom layer on top: a vocabulary, a person list, a brand list, a focus prompt. The base recognition stays the same. The customisation adds confidence to terms the base model would have rendered as similar-sounding nonsense, or boosts the recall of entities the base model would have ignored.
That means two things. First, customisation cannot fix bad base recognition: if the audio quality is poor, the speaker is heavily accented, or the video is low resolution, no amount of custom insights helps. Second, get the source quality right before you spend time on customisation.
API versions and surfaces
AVI has two API surfaces: the v2 Classic API and the new v2 ARM-managed API. The ARM version is the one to build on now: it integrates with Azure RBAC, Private Link, and Bicep / Terraform. The classic API still works, but new features ship on ARM first.
```http
# Pin the AVI API version when calling
POST https://api.videoindexer.ai/<location>/Accounts/<accountId>/Videos/<videoId>/Index?api-version=2024-10-01-preview
Authorization: Bearer <arm-token>
Content-Type: application/json
```
How to apply this in practice
1. Provision the AVI account through ARM in the region closest to your media storage. `az ams account create` is not the same thing; AVI has its own resource type. Use the AVI ARM template or the portal Create flow.
2. Enable managed identity on the AVI account. Assign it Storage Blob Data Reader on the source storage account. Without this, your indexing jobs fail with a generic 403.
3. For custom person models: collect 5-8 images per person, ideally taken in different lighting and angles. Upload through the AVI portal or REST. Allow 10-15 minutes for the model to bake.
4. For custom language models: prepare a clean dataset. Aim for at least 100,000 words for a small domain, 1 million+ for a broad domain. Deduplicate. Remove HTML and markup. Normalise case.
5. Wire up a smoke test. Index a 5-minute representative video. Confirm your custom entities appear with confidence above 0.7. Iterate before scaling.
6. Monitor cost in Azure Cost Management. AVI bills per indexing minute, per model invoked. Custom model training is billed separately.
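The clean-up for the custom language dataset (deduplicate, strip markup, normalise case) is easy to script. Here is a minimal sketch using only the Python standard library. The function names are mine, not part of any AVI SDK, and lowercasing everything is an assumption about what "normalise case" should mean for your domain; if your acronyms are case-sensitive, adjust that step.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")   # crude HTML/XML tag matcher, fine for simple markup
_SPACE_RE = re.compile(r"\s+")


def clean_line(line: str) -> str:
    """Strip tags, decode HTML entities, collapse whitespace, lowercase."""
    text = _TAG_RE.sub(" ", line)
    text = html.unescape(text)
    return _SPACE_RE.sub(" ", text).strip().lower()


def prepare_dataset(lines):
    """Return (cleaned unique lines in first-seen order, total word count)."""
    seen = set()
    cleaned = []
    for line in lines:
        text = clean_line(line)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    word_count = sum(len(text.split()) for text in cleaned)
    return cleaned, word_count
```

Compare the returned word count against the 100,000-word floor before you upload anything.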
The smoke test in step 5 is the one I see teams skip. Without it you do not know whether your customisation worked until 8 hours of indexing is done and the bill is in.
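A smoke test like this can be a dozen lines. The sketch below assumes you have already pulled the detected entities out of the index response into a flat list of `{"name": ..., "confidence": ...}` dictionaries. That shape is hypothetical; map it from whatever the insights JSON looks like in your API version.

```python
def smoke_test(expected, detected, min_confidence=0.7):
    """Return the expected entities that did not clear the confidence bar.

    expected: iterable of entity names you trained the custom models on.
    detected: list of {"name": str, "confidence": float} (hypothetical shape).
    Result maps each failing name to its best score, or None if never detected.
    """
    best = {}
    for item in detected:
        key = item["name"].casefold()
        best[key] = max(best.get(key, 0.0), float(item["confidence"]))

    failures = {}
    for name in expected:
        score = best.get(name.casefold())
        # "above 0.7" is read strictly: a score of exactly 0.7 still fails.
        if score is None or score <= min_confidence:
            failures[name] = score
    return failures
```

An empty result means every expected entity showed up with enough confidence. Anything else is your to-do list before scaling.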
Caveats and what to double-check
- AVI custom models have ceilings: a custom person model holds up to about 1,000 people; a custom language model holds up to 50,000 unique terms. Plan around these.
- Some AVI features ship region-by-region. East US and West Europe have the broadest coverage. India regions (Central India, South India) trail by 3-6 months on preview features.
- The face recognition model is region-restricted in some jurisdictions for regulatory reasons. Confirm availability for your tenant region.
- Custom model training is billed per training hour. A medium custom language model trains in about 90 minutes. Budget ₹800 per training run.
- The portal UI lags the API by 4-6 weeks. If a setting works in REST but the portal cannot show it, that is normal. Trust the REST response.
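The ceilings above are worth checking before you start curating, not after. A small preflight sketch follows, using the limits quoted in this guide as constants. Treat them as my working numbers and confirm the current limits for your account; the 90% headroom warning is my own convention.

```python
MAX_PEOPLE = 1_000    # approximate custom person model ceiling quoted above
MAX_TERMS = 50_000    # custom language model unique-term ceiling quoted above


def check_ceilings(people_count, unique_terms, headroom=0.9):
    """Return warnings for counts over, or close to, the quoted ceilings."""
    warnings = []
    checks = (
        ("people", people_count, MAX_PEOPLE),
        ("language terms", unique_terms, MAX_TERMS),
    )
    for label, used, limit in checks:
        if used > limit:
            warnings.append(f"{label}: {used} exceeds the ceiling of {limit}")
        elif used >= headroom * limit:
            warnings.append(f"{label}: {used} is at or above {headroom:.0%} of {limit}")
    return warnings
```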
Related work in your environment
- Document the AVI account, region, custom models, and the team that owns each in your wiki. Custom models drift in usefulness over six months, so schedule a quarterly review.
- Pair AVI with Azure Storage lifecycle policies. Source videos can move to Cool tier after indexing completes. AVI does not need them online once indexed.
- If you serve indexed video to end users, layer Azure Front Door in front of the storage account. AVI does not host playback; that is a separate concern.
- For Indian regulated workloads, ensure the AVI account is in an Indian region and the source data has not transited outside. Confirm with your DPO.
- Mirror your AVI configuration in IaC. Custom model definitions can be exported and replayed: do this before they drift.
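For the storage lifecycle point, here is a sketch that generates an Azure Storage lifecycle management policy moving source videos to the Cool tier. Lifecycle rules are time-based, so they cannot know when indexing finished; the assumption here is that you pick a day count comfortably longer than your indexing backlog. The rule name and blob prefix are placeholders.

```python
import json


def cool_tier_policy(prefix, days_after_modification=30):
    """Build a lifecycle policy that tiers block blobs under `prefix` to Cool."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "avi-sources-to-cool",  # placeholder rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],  # "<container>/<path prefix>"
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": days_after_modification
                            }
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    # Save the output as policy.json and apply it through your IaC or the CLI.
    print(json.dumps(cool_tier_policy("source-videos/lectures"), indent=2))
```

Keeping the policy in code also covers the IaC bullet: the same file goes into the repository next to the AVI account definition.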
Troubleshooting the failures I keep seeing
Custom entities not appearing in results
Almost always a confidence threshold issue. AVI returns custom entity matches above a configurable confidence; the default is 0.5. If your entities appear in the raw insight stream but not the filtered results, lower the threshold and re-query. Confirm the entity is spelled the same way in your dataset and your query.
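When I triage this, I want to know which of the three causes I am looking at: never detected, spelled differently, or filtered by the threshold. A rough sketch, assuming the raw insight stream has been flattened into `{"name": ..., "confidence": ...}` dictionaries (a hypothetical shape, not the literal AVI response):

```python
def diagnose_entity(term, raw_matches, threshold=0.5):
    """Classify why `term` is missing from filtered results."""
    wanted = term.strip().casefold()
    exact = [m for m in raw_matches if m["name"] == term]
    loose = [m for m in raw_matches if m["name"].strip().casefold() == wanted]

    if not loose:
        return "not detected"
    if not exact:
        # Present in the raw stream, but not spelled the way the query spells it.
        return "spelling or case mismatch"
    best = max(m["confidence"] for m in exact)
    # Assumes only matches strictly above the threshold are returned.
    if best <= threshold:
        return f"below threshold (best {best:.2f})"
    return "visible"
```

Only the "below threshold" case is fixed by lowering the threshold. The other two send you back to the dataset.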
Indexing job stuck at 80%
The 80% mark is where the OCR and topic extraction phases run. A slow source video (high resolution, long duration, low-quality audio) can sit here for hours. Confirm the job is not actually failing by checking the status endpoint. If it has been stuck for more than 2x the video duration, cancel and re-submit at a lower resolution.
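The 2x rule is simple enough to put in a poller. A sketch of the decision only; the `state` strings are placeholders for whatever your status endpoint returns, and the actual cancel and re-submit calls are left out.

```python
from datetime import timedelta


def next_action(elapsed, video_duration, state, factor=2.0):
    """Decide what to do with an indexing job that looks stuck.

    elapsed, video_duration: datetime.timedelta values.
    state: status string from the job (values here are hypothetical).
    """
    normalised = state.strip().lower()
    if normalised == "failed":
        return "investigate failure"
    if normalised == "processed":
        return "done"
    if elapsed > factor * video_duration:
        return "cancel and re-submit at lower resolution"
    return "keep waiting"
```

For a one-hour lecture, `next_action(timedelta(hours=3), timedelta(hours=1), "Processing")` comes back with the cancel-and-re-submit answer, while 90 minutes of elapsed time still says to keep waiting.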
Cost spike after custom model rollout
Custom models invoke additional pipeline steps. The per-minute cost goes up. I have seen 1.4x for custom language only, 2.1x for custom language + custom person + custom brand combined. Forecast before rollout. Use Azure Cost Management budgets to alert on threshold breach.
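To forecast before rollout, I apply the multipliers I have observed to the base per-minute rate. A sketch; the multipliers are the ones quoted above from my own tenants, not published pricing, and the 80% alert fraction is an arbitrary starting point.

```python
# Observed cost multipliers from the paragraph above (not official pricing).
MULTIPLIERS = {
    "none": 1.0,
    "language": 1.4,
    "language+person+brand": 2.1,
}


def forecast_cost(minutes, base_rate_per_minute, customisation):
    """Estimated indexing cost for a batch, in the currency of the base rate."""
    return minutes * base_rate_per_minute * MULTIPLIERS[customisation]


def should_alert(forecast, budget, alert_fraction=0.8):
    """True when the forecast reaches the alert fraction of the budget."""
    return forecast >= alert_fraction * budget
```

Run it for the batch you plan to index, then set the Azure Cost Management budget and alert threshold from the result rather than from a guess.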
Cost notes
AVI has two pricing modes: trial (10 hours/month free) and pay-per-minute (about ₹0.85 per indexing minute at S0 at the time of writing, billed in 1-second increments). Custom language model training: roughly ₹400 per training hour. Custom person model training: free.
For the ed-tech I mentioned, indexing 4,200 hours at ₹0.85/minute equalled ₹2.14 lakh, spread over 6 weeks. Custom model training added ₹4,800. The retrieval value across the platform's 28,000 students was orders of magnitude higher.
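The arithmetic behind those figures, using the numbers from this section and `Decimal` to avoid float rounding on currency:

```python
from decimal import Decimal

HOURS_INDEXED = 4_200
RATE_PER_MINUTE = Decimal("0.85")   # INR per indexing minute, as quoted above
TRAINING_COST = Decimal("4800")     # INR, custom model training
LAKH = Decimal("100000")

indexing_minutes = HOURS_INDEXED * 60                  # hours to minutes
indexing_cost = indexing_minutes * RATE_PER_MINUTE     # INR
indexing_cost_lakh = indexing_cost / LAKH
total_cost = indexing_cost + TRAINING_COST

print(f"minutes indexed: {indexing_minutes}")
print(f"indexing cost:   INR {indexing_cost} ({indexing_cost_lakh} lakh)")
print(f"with training:   INR {total_cost}")
```

That is 252,000 minutes and ₹2,14,200 of indexing, or ₹2,19,000 once training is included.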
Rollback plan
If a custom model degrades your results, you have three options. Roll back to the prior model version (AVI keeps the last 3 versions). Disable the custom model on the index (one API call). Re-index the affected videos without customisation (full re-billing, slow).
I keep one prior version of every production custom model around for 14 days post-deployment. Twice this year I have used the rollback path. Both times the fix was a single API call.
```http
# Disable a custom language model on an account
PATCH https://api.videoindexer.ai/<location>/Accounts/<accountId>/Customization/Language/<modelId>?enable=false&api-version=2024-10-01-preview
Authorization: Bearer <arm-token>
```
Two lines of curl. Five seconds to take effect. The kind of rollback control I wish every Azure AI service offered.
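If you would rather keep the rollback in a script than in shell history, the same request can be built with the standard library. This sketch mirrors the request above exactly as written; I have not re-verified the route or the `enable` parameter against the current API reference, so treat both as assumptions. It only constructs the request and leaves sending it to you.

```python
import urllib.request
from urllib.parse import quote, urlencode

API_VERSION = "2024-10-01-preview"  # pinned, as in the request above


def disable_language_model_request(location, account_id, model_id, arm_token):
    """Build (but do not send) the PATCH that disables a custom language model."""
    path = "/".join(
        quote(part, safe="")
        for part in (location, "Accounts", account_id, "Customization", "Language", model_id)
    )
    query = urlencode({"enable": "false", "api-version": API_VERSION})
    return urllib.request.Request(
        f"https://api.videoindexer.ai/{path}?{query}",
        method="PATCH",
        headers={"Authorization": f"Bearer {arm_token}"},
    )


# To execute:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       print(response.status)
```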
References
- Microsoft Learn: official documentation for Azure Video Indexer
- Microsoft tech community forums and Q&A
- Azure Service Health and Microsoft 365 Service health dashboards
- Azure pricing calculator (azure.microsoft.com/pricing/calculator)