Search & Queries
Query types, scoring formulas, query rewriting, SAS token injection, and performance guidance
RAG DB exposes four query types through a unified search endpoint. Each type targets a different retrieval strategy, and all return results with a normalized relevance score.
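All four query types share the same request shape. As a quick illustration (not an official client; the endpoint path and authentication are deployment-specific and omitted), a helper that validates the `queryType` before sending might look like:

```python
# Minimal request-body builder for the unified search endpoint.
# Field names mirror the request examples on this page; how the body
# is POSTed (path, auth headers) depends on your deployment.

VALID_QUERY_TYPES = {"vector", "hybrid", "full_text", "semantic"}

def build_search_request(index_name: str, query: str,
                         query_type: str = "semantic", top: int = 10) -> dict:
    if query_type not in VALID_QUERY_TYPES:
        raise ValueError(f"unknown queryType: {query_type!r}")
    return {
        "indexName": index_name,
        "queryType": query_type,
        "query": query,
        "top": top,
    }
```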
Query Types
Vector Search
Vector search finds documents by computing cosine similarity between a query embedding and stored document embeddings. Best suited when you already have a pre-computed embedding or want pure semantic similarity without keyword matching.
When to use: You have a pre-embedded query vector, or you want deterministic similarity with no keyword component.
Request body:
```json
{
  "indexName": "my-index",
  "queryType": "vector",
  "query": "How do I configure TLS?",
  "top": 10
}
```

Scoring formula:

```
score = 1 - VectorDistance()
```

Cosine distance is used. A score closer to 1 means higher relevance; closer to 0 means less relevant.
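The relationship between cosine distance and the returned score can be sketched in a few lines of Python (a toy illustration; the actual distance computation happens inside the database):

```python
import math

def vector_score(query_emb, doc_emb):
    """score = 1 - cosine_distance, matching the formula above."""
    dot = sum(q * d for q, d in zip(query_emb, doc_emb))
    norm_q = math.sqrt(sum(q * q for q in query_emb))
    norm_d = math.sqrt(sum(d * d for d in doc_emb))
    cosine_distance = 1 - dot / (norm_q * norm_d)
    return 1 - cosine_distance
```

Identical directions yield a score of 1.0; orthogonal vectors yield 0.0.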
Example response:
```json
{
  "results": [
    {
      "id": "chunk-0042",
      "content": "To configure TLS, set the HTTPS_ENABLED flag...",
      "score": 0.9134,
      "metadata": {
        "source": "admin-guide.pdf",
        "page": 12
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}
```

Hybrid Search
Hybrid search combines vector similarity with full-text keyword matching using Reciprocal Rank Fusion (RRF). It retrieves candidates from both pipelines and merges the ranked lists into a single result set.
When to use: Mixed queries that contain both natural-language intent and specific keywords or identifiers (e.g., error codes, product names).
Request body:
```json
{
  "indexName": "my-index",
  "queryType": "hybrid",
  "query": "error ERR_TLS_CERT_INVALID in production",
  "top": 10
}
```

Scoring formula:

```
score = 1 / (rank + 60)
```

This is the standard Reciprocal Rank Fusion formula with k = 60. Each candidate receives a 1 / (rank + 60) contribution for every result list (vector and full-text) in which it appears, with rank counted from 1 within each list; the contributions are summed and the merged list is re-ranked by total score.
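The merge step can be sketched as follows (a minimal illustration of RRF over two ranked id lists, not the service's internal implementation):

```python
def rrf_merge(vector_ids, text_ids, k=60):
    """Merge two ranked id lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (rank + k) per candidate (rank is 1-based),
    and contributions are summed before re-ranking.
    """
    scores = {}
    for ranked in (vector_ids, text_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Note that a document ranked first in a single list scores only 1/61 ≈ 0.0164, which is why hybrid and full-text scores in the examples on this page are small absolute numbers.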
Example response:
```json
{
  "results": [
    {
      "id": "chunk-0099",
      "content": "ERR_TLS_CERT_INVALID is raised when the certificate chain...",
      "score": 0.0323,
      "metadata": {
        "source": "troubleshooting.md",
        "page": 3
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}
```

Full-Text Search
Full-text search uses BM25-style term matching with Reciprocal Rank Fusion scoring. It does not use embeddings. Best for exact keyword lookups, identifiers, or when you need deterministic term matching.
When to use: Keyword-heavy queries, exact identifiers, error codes, or when semantic meaning is less important than exact term presence.
Request body:
```json
{
  "indexName": "my-index",
  "queryType": "full_text",
  "query": "COSMOSDB_RETRY_TOTAL",
  "top": 10
}
```

Scoring formula:

```
score = 1 / (rank + 60)
```

BM25 ranks results by term frequency and inverse document frequency. The final score is derived from RRF over the BM25 rank list.
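For intuition, a simplified BM25 scorer looks like the sketch below (standard k1 and b defaults; the service's actual analyzer and parameters are internal and may differ):

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each doc (a list of tokens) against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        s = 0.0
        for t in query_terms:
            tf = d.count(t)
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

The raw BM25 value is used only for ordering; the score returned by the API comes from applying 1 / (rank + 60) to that ordering.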
Example response:
```json
{
  "results": [
    {
      "id": "chunk-0201",
      "content": "Set COSMOSDB_RETRY_TOTAL to control the maximum number of retries...",
      "score": 0.0164,
      "metadata": {
        "source": "configuration-reference.md",
        "page": 1
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}
```

Semantic Search
Semantic search uses AI-powered query rewriting to generate multiple reformulations of the original query, runs each through the vector pipeline, and aggregates results using a priority formula. This produces the highest-quality results for natural-language questions.
When to use: Natural-language questions, vague or ambiguous queries, queries with typos, or when maximum recall is important.
Request body:
```json
{
  "indexName": "my-index",
  "queryType": "semantic",
  "query": "how do I rotat API keeys?",
  "top": 10
}
```

Scoring formula:

```
priority = best_sim * 1.05 + 0.10 * avg_sim
```

Here `best_sim` is the highest similarity score for a chunk across all query rewrites, and `avg_sim` is the average similarity across the rewrites that returned the chunk. Chunks appearing in multiple rewrite results receive a boost through the average component.
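The deduplication and priority scoring across rewrites can be sketched like this (a minimal illustration of the formula above, not the service's internal code):

```python
def aggregate_rewrite_results(per_rewrite_hits):
    """per_rewrite_hits: one {chunk_id: similarity} dict per query rewrite.

    Deduplicates chunks across rewrites, then applies
    priority = best_sim * 1.05 + 0.10 * avg_sim.
    """
    sims = {}
    for hits in per_rewrite_hits:
        for chunk_id, sim in hits.items():
            sims.setdefault(chunk_id, []).append(sim)
    return {
        chunk_id: max(s) * 1.05 + 0.10 * (sum(s) / len(s))
        for chunk_id, s in sims.items()
    }
```

A chunk returned by two rewrites keeps its best similarity for the dominant term and gains a small bonus from the average, so multi-rewrite agreement ranks it above a chunk seen only once at the same similarity.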
Example response:
```json
{
  "results": [
    {
      "id": "chunk-0310",
      "content": "To rotate API keys, call POST /apiKeys/regenerate with...",
      "score": 0.9587,
      "metadata": {
        "source": "security-guide.pdf",
        "page": 7
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}
```

Query Rewriting
Semantic search includes a query rewriting stage powered by Azure OpenAI. Before executing the vector search, the system generates multiple reformulations of the original query.
What query rewriting does:
- Corrects typos — `"rotat API keeys"` becomes `"rotate API keys"`
- Expands synonyms — `"delete user"` may also search for `"remove user"`, `"deactivate account"`
- Preserves intent — reformulations maintain the original meaning
- Generates in the same language — if the query is in Spanish, rewrites are in Spanish
Each rewrite is executed as an independent vector search. Results are then deduplicated and scored using the priority formula described above.
SAS Token Injection
Search responses include auto-generated download URLs for source documents. These URLs use Azure User Delegation SAS tokens so that clients can download the original file directly from Azure Blob Storage without additional authentication.
How it works:
- RAG DB requests a user delegation key using the managed identity
- A SAS token is generated scoped to the specific blob container (read-only)
- The token is appended to the blob URL in the `downloadUrl` field
- Tokens are valid for up to 7 days (configurable via `SAS_TOKEN_TTL_DAYS`)
Key properties:
- Container-level scope — tokens grant access only to the container holding the source document, not the entire storage account
- User delegation — no storage account keys are used; the token is signed with Azure AD credentials
- Graceful degradation — if token generation fails, the `downloadUrl` field is omitted and the result is still returned
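The graceful-degradation behavior can be sketched as follows (the token generator argument is a stand-in for the real user-delegation SAS call, which requires Azure credentials):

```python
def attach_download_url(result: dict, blob_url: str, generate_sas_token) -> dict:
    """Append a SAS token to the blob URL; omit downloadUrl if generation fails."""
    try:
        token = generate_sas_token()
        result["downloadUrl"] = f"{blob_url}?{token}"
    except Exception:
        # Graceful degradation: the result is still returned, just without a URL.
        result.pop("downloadUrl", None)
    return result
```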
Response Headers
Every search response includes diagnostic headers:
| Header | Description |
|---|---|
| `x-request-charge` | Cosmos DB request units consumed by the query |
| `x-index-metrics` | Index utilization statistics (which indexes were hit) |
| `x-query-metrics` | Timing breakdown: parsing, execution, serialization |
These headers are useful for monitoring query cost and diagnosing performance issues.
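For monitoring, the headers can be pulled into a small diagnostics record; header names are taken from the table above, and since HTTP stacks vary in header casing, this sketch normalizes case:

```python
def parse_diagnostics(headers: dict) -> dict:
    """Extract the diagnostic headers from a search response, case-insensitively."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "request_charge": float(h.get("x-request-charge", "0")),
        "index_metrics": h.get("x-index-metrics"),
        "query_metrics": h.get("x-query-metrics"),
    }
```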
Performance Tips
| Scenario | Recommended Query Type |
|---|---|
| Natural-language questions from end users | Semantic — handles typos, ambiguity, and language variation |
| Mixed keyword + intent queries | Hybrid — combines the strengths of vector and full-text |
| Pre-embedded queries or direct similarity | Vector — no overhead from keyword matching |
| Exact identifiers, error codes, config keys | Full-Text — deterministic term matching without embedding |
General guidance:
- Keep `top` as low as practical — fewer results mean fewer request units consumed
- Use semantic search as the default for user-facing applications
- Use full-text search for internal tooling or exact lookups
- Monitor `x-request-charge` to understand query cost and optimize accordingly