RAG.DB

Search & Queries

Query types, scoring formulas, query rewriting, SAS token injection, and performance guidance

RAG DB exposes four query types through a unified search endpoint. Each type targets a different retrieval strategy, and all return results with a normalized relevance score.


Query Types

Vector search finds documents by computing cosine similarity between a query embedding and stored document embeddings. Best suited when you already have a pre-computed embedding or want pure semantic similarity without keyword matching.

When to use: You have a pre-embedded query vector, or you want deterministic similarity with no keyword component.

Request body:

{
  "indexName": "my-index",
  "queryType": "vector",
  "query": "How do I configure TLS?",
  "top": 10
}

Scoring formula:

score = 1 - VectorDistance()

Cosine distance is used. A score closer to 1 means higher relevance; closer to 0 means less relevant.
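The relationship between cosine distance and the relevance score can be illustrated with a small sketch (`vector_score` is a hypothetical helper, not part of the RAG DB API):

```python
import math

def vector_score(query_emb, doc_emb):
    """Relevance score as 1 - cosine distance.

    Since cosine distance = 1 - cosine similarity, the score is simply
    the cosine similarity between the two embeddings.
    """
    dot = sum(q * d for q, d in zip(query_emb, doc_emb))
    norm_q = math.sqrt(sum(q * q for q in query_emb))
    norm_d = math.sqrt(sum(d * d for d in doc_emb))
    return dot / (norm_q * norm_d)

print(vector_score([1.0, 0.0], [1.0, 0.0]))  # identical direction → 1.0
print(vector_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```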

Example response:

{
  "results": [
    {
      "id": "chunk-0042",
      "content": "To configure TLS, set the HTTPS_ENABLED flag...",
      "score": 0.9134,
      "metadata": {
        "source": "admin-guide.pdf",
        "page": 12
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}

Hybrid search combines vector similarity with full-text keyword matching using Reciprocal Rank Fusion (RRF). It retrieves candidates from both pipelines and merges the ranked lists into a single result set.

When to use: Mixed queries that contain both natural-language intent and specific keywords or identifiers (e.g., error codes, product names).

Request body:

{
  "indexName": "my-index",
  "queryType": "hybrid",
  "query": "error ERR_TLS_CERT_INVALID in production",
  "top": 10
}

Scoring formula:

score = 1 / (rank + 60)

This is the standard RRF formula. Each candidate receives a synthetic score based on its rank in the vector and full-text result lists, then the scores are summed and re-ranked.
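The fusion step can be sketched as follows (assuming 1-based ranks and the standard RRF constant k = 60; the engine's exact implementation may differ in details):

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each list contains document ids ordered best-first (rank 1, 2, ...).
    A document's fused score is the sum of 1 / (rank + k) over every
    list it appears in, so items ranked well by both pipelines rise.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_hits = ["chunk-0099", "chunk-0042", "chunk-0007"]
fulltext_hits = ["chunk-0099", "chunk-0007"]
fused = rrf_fuse([vector_hits, fulltext_hits])
# chunk-0099 is rank 1 in both lists: 1/61 + 1/61 ≈ 0.0328
```

This explains why hybrid scores are small absolute numbers (as in the 0.0323 example below): they are sums of reciprocal ranks, not similarity values.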

Example response:

{
  "results": [
    {
      "id": "chunk-0099",
      "content": "ERR_TLS_CERT_INVALID is raised when the certificate chain...",
      "score": 0.0323,
      "metadata": {
        "source": "troubleshooting.md",
        "page": 3
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}

Full-text search uses BM25-style term matching with Reciprocal Rank Fusion scoring. It does not use embeddings. Best for exact keyword lookups, identifiers, or when you need deterministic term matching.

When to use: Keyword-heavy queries, exact identifiers, error codes, or when semantic meaning is less important than exact term presence.

Request body:

{
  "indexName": "my-index",
  "queryType": "full_text",
  "query": "COSMOSDB_RETRY_TOTAL",
  "top": 10
}

Scoring formula:

score = 1 / (rank + 60)

BM25 ranks results by term frequency and inverse document frequency. The final score is derived from RRF over the BM25 rank list.
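The term-frequency / inverse-document-frequency intuition can be sketched with the textbook BM25 formula (the engine's exact variant and parameters k1, b are not documented here):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Textbook BM25 score of one document for a query (sketch only).

    `doc` and each corpus entry are lists of tokens. Rare terms get a
    higher IDF weight; repeated terms saturate via k1; long documents
    are penalized via the b length normalization.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```

A document containing the exact identifier scores above one that does not, regardless of any semantic closeness, which is what makes full-text search deterministic for identifier lookups.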

Example response:

{
  "results": [
    {
      "id": "chunk-0201",
      "content": "Set COSMOSDB_RETRY_TOTAL to control the maximum number of retries...",
      "score": 0.0164,
      "metadata": {
        "source": "configuration-reference.md",
        "page": 1
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}

Semantic search uses AI-powered query rewriting to generate multiple reformulations of the original query, runs each through the vector pipeline, and aggregates results using a priority formula. This produces the highest-quality results for natural-language questions.

When to use: Natural-language questions, vague or ambiguous queries, queries with typos, or when maximum recall is important.

Request body:

{
  "indexName": "my-index",
  "queryType": "semantic",
  "query": "how do I rotat API keeys?",
  "top": 10
}

Scoring formula:

priority = best_sim * 1.05 + 0.10 * avg_sim

Where best_sim is the highest similarity score for a chunk across all query rewrites, and avg_sim is the average similarity across rewrites that returned the chunk. Chunks appearing in multiple rewrite results receive a boost through the average component.
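The aggregation can be sketched directly from the formula (`priority` is an illustrative helper, not an API call):

```python
def priority(similarities):
    """Aggregate one chunk's similarity scores across query rewrites.

    `similarities` holds the chunk's similarity from each rewrite that
    returned it: best_sim is the maximum, avg_sim the mean over those
    rewrites.
    """
    best_sim = max(similarities)
    avg_sim = sum(similarities) / len(similarities)
    return best_sim * 1.05 + 0.10 * avg_sim

# A chunk returned by three rewrites with similarities 0.91, 0.89, 0.90:
print(priority([0.91, 0.89, 0.90]))  # ≈ 1.0455
```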

Example response:

{
  "results": [
    {
      "id": "chunk-0310",
      "content": "To rotate API keys, call POST /apiKeys/regenerate with...",
      "score": 0.9587,
      "metadata": {
        "source": "security-guide.pdf",
        "page": 7
      },
      "downloadUrl": "https://storage.blob.core.windows.net/..."
    }
  ]
}

Query Rewriting

Semantic search includes a query rewriting stage powered by Azure OpenAI. Before executing the vector search, the system generates multiple reformulations of the original query.

What query rewriting does:

  • Corrects typos — "rotat API keeys" becomes "rotate API keys"
  • Expands synonyms — "delete user" may also search for "remove user", "deactivate account"
  • Preserves intent — reformulations maintain the original meaning
  • Generates in the same language — if the query is in Spanish, rewrites are in Spanish

Each rewrite is executed as an independent vector search. Results are then deduplicated and scored using the priority formula described above.


SAS Token Injection

Search responses include auto-generated download URLs for source documents. These URLs use Azure User Delegation SAS tokens so that clients can download the original file directly from Azure Blob Storage without additional authentication.

How it works:

  1. RAG DB requests a user delegation key using the managed identity
  2. A SAS token is generated scoped to the specific blob container (read-only)
  3. The token is appended to the blob URL in the downloadUrl field
  4. Tokens are valid for up to 7 days (configurable via SAS_TOKEN_TTL_DAYS)

Key properties:

  • Container-level scope — tokens grant access only to the container holding the source document, not the entire storage account
  • User delegation — no storage account keys are used; the token is signed with Azure AD credentials
  • Graceful degradation — if token generation fails, the downloadUrl field is omitted and the result is still returned
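The URL assembly (step 3) and the graceful-degradation behavior can be sketched as follows; `generate_sas_token` is a hypothetical stand-in for the user-delegation SAS generation in step 2, not a real RAG DB or Azure SDK function:

```python
def attach_download_url(result, blob_url, generate_sas_token):
    """Append a SAS token to the blob URL; omit downloadUrl on failure.

    If token generation raises, the result is returned without a
    downloadUrl field rather than failing the whole search response.
    """
    try:
        token = generate_sas_token()
    except Exception:
        return result  # graceful degradation: result survives, URL omitted
    result["downloadUrl"] = f"{blob_url}?{token}"
    return result

hit = {"id": "chunk-0042", "score": 0.9134}
ok = attach_download_url(
    dict(hit),
    "https://storage.blob.core.windows.net/docs/admin-guide.pdf",
    lambda: "sv=2024&sig=abc",
)
print("downloadUrl" in ok)  # True
```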

Response Headers

Every search response includes diagnostic headers:

  • x-request-charge — Cosmos DB request units consumed by the query
  • x-index-metrics — Index utilization statistics (which indexes were hit)
  • x-query-metrics — Timing breakdown: parsing, execution, serialization

These headers are useful for monitoring query cost and diagnosing performance issues.
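Reading the request charge from a response could look like this sketch (assuming the header value is the RU count as a numeric string; `headers` is any mapping, such as the case-insensitive headers dict of an HTTP client):

```python
def query_cost(headers):
    """Return the Cosmos DB request charge from search response headers,
    or None when the header is absent."""
    charge = headers.get("x-request-charge")
    return float(charge) if charge is not None else None

print(query_cost({"x-request-charge": "3.52"}))  # 3.52
```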


Performance Tips

  • Natural-language questions from end users: Semantic — handles typos, ambiguity, and language variation
  • Mixed keyword + intent queries: Hybrid — combines the strengths of vector and full-text
  • Pre-embedded queries or direct similarity: Vector — no overhead from keyword matching
  • Exact identifiers, error codes, config keys: Full-Text — deterministic term matching without embedding

General guidance:

  • Keep top as low as practical — fewer results means fewer request units consumed
  • Use semantic search as the default for user-facing applications
  • Use full-text search for internal tooling or exact lookups
  • Monitor x-request-charge to understand query cost and optimize accordingly
