Architecture

System design, infrastructure components, data flow, Cosmos DB schema, multi-account scale-out, and infrastructure as code

Overview

RAG DB is a production-grade, event-driven indexing platform for Retrieval-Augmented Generation on Azure. It watches Azure Blob Storage containers for content changes, routes events through Service Bus, and fans out to isolated per-index processors that convert, chunk, embed, and store documents in Azure Cosmos DB with Vector Search and Full-Text Search enabled. The system is designed for multi-tenant workloads, automatic scaling, and zero-downtime recovery.


Main Components

Component | Technology | Responsibility
API Service | FastAPI on Azure Container Apps | Public/private HTTP API for index CRUD, file management, search queries, and event ingestion via Dapr
Event Dispatcher | Dapr pub/sub on Service Bus Queue | Receives blob events from Event Grid, resolves index ownership, and routes work to per-index topics
Index Processor | Python on Azure Container Apps | Dedicated per-index worker that downloads files, converts to markdown, chunks, generates embeddings, and upserts to Cosmos DB
CronJob | Azure Container Apps Job | Runs every 5 minutes to detect stale indexes, recover failed runs, and reconcile state
Infrastructure | Bicep + ARM templates | Declarative IaC for all Azure resources, with parameterized deployments for baseline and per-index stacks
ragdb_helper_library | Python package (PyPI) | Shared library used by the API and processors containing Cosmos DB clients, chunking logic, embedding utilities, and data models

End-to-End Flow

The complete nine-step pipeline from file upload to searchable content:

  1. Blob Storage — A file is created, renamed, or deleted in the watched storage account and container. Supported formats include PDF, DOCX, PPTX, XLSX, images, audio, video, and plain text.

  2. Event Grid System Topic — An Event Grid subscription on the storage account detects the change and fires a Microsoft.Storage.BlobCreated, BlobRenamed, or BlobDeleted event.

  3. Service Bus Queue — Event Grid delivers the event to the events-dispatcher Service Bus queue. Service Bus Premium guarantees ordering within a session and at-least-once delivery.

  4. API receives event via Dapr — The RAG DB API subscribes to the events-dispatcher queue through a Dapr pub/sub component. It deserializes the event payload and extracts the storage account, container, folder, and file name.

  5. Index resolution — The API queries the indexes_metadata database to find which Index owns the blob path. If no matching index exists, the event is dead-lettered for inspection.

  6. Publish to per-index topic — The API creates a processing message (containing file metadata, index configuration, and the desired action) and publishes it to the Service Bus Topic named after the indexId.

  7. Index Processor processes the file — The Container App ip-{indexId} receives the message via its Dapr subscription. It then:

    • Downloads the file from Blob Storage using a managed identity with Storage Blob Data Reader role
    • Converts the file to markdown using markitdown-pro (leveraging Document Intelligence for PDFs/images and Speech Services for audio/video)
    • Splits the markdown into overlapping chunks using configurable size and overlap parameters
    • Generates vector embeddings for each chunk via Azure OpenAI

  8. Upsert to Cosmos DB — The processor writes DocumentChunk records to the index-specific Cosmos container. Each record includes the text content, vector embedding, source metadata, and an ETag for optimistic concurrency control. Existing chunks for the same file are replaced atomically.

  9. Metadata updated — The processor updates the IndexFile status to indexed (or failed) and increments the IndexRun counters in the metadata database. Once all files in a run complete, the Index status transitions to succeeded.
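
Steps 4 and 5 above can be sketched as follows. This is a minimal illustration, not the actual RAG DB code: it assumes the event carries the blob URL in data.url (as Event Grid storage events do), and the BlobRef type, the resolve_index helper, the index record fields, and the longest-prefix ownership rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlparse


@dataclass(frozen=True)
class BlobRef:
    account: str
    container: str
    folder: str     # "" when the blob sits at the container root
    file_name: str


def parse_blob_url(url: str) -> BlobRef:
    """Split a blob URL into storage account, container, folder, and file name."""
    parsed = urlparse(url)
    account = parsed.netloc.split(".")[0]
    container, _, blob_path = unquote(parsed.path).lstrip("/").partition("/")
    folder, _, file_name = blob_path.rpartition("/")
    return BlobRef(account, container, folder, file_name)


def resolve_index(blob: BlobRef, indexes: list[dict]) -> Optional[str]:
    """Return the id of the index that owns the blob path, or None.

    Assumption: an index owns every blob under its folder, and the
    longest matching folder prefix wins when several indexes match.
    """
    best_id, best_len = None, -1
    for index in indexes:
        if (index["account"], index["container"]) != (blob.account, blob.container):
            continue
        prefix = index["folder"].strip("/")
        owns = (
            prefix == ""
            or blob.folder == prefix
            or blob.folder.startswith(prefix + "/")
        )
        if owns and len(prefix) > best_len:
            best_id, best_len = index["id"], len(prefix)
    return best_id


# Hypothetical event payload and index records, for illustration only.
event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "data": {"url": "https://acme.blob.core.windows.net/docs/hr/policies/leave.pdf"},
}
indexes = [
    {"id": "idx-hr", "account": "acme", "container": "docs", "folder": "hr"},
    {"id": "idx-legal", "account": "acme", "container": "docs", "folder": "legal"},
]

blob = parse_blob_url(event["data"]["url"])
index_id = resolve_index(blob, indexes)  # None would mean: dead-letter the event
```

When resolve_index returns an id, the API would publish the processing message to the topic of that name (step 6); a None result corresponds to the dead-letter path in step 5.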


Infrastructure Baseline

The following Azure services form the shared baseline infrastructure, deployed once per environment:

Service | Purpose
Azure Cosmos DB Account | NoSQL document store with VectorSearch and FullTextSearch capabilities enabled at the account level
Azure Service Bus Premium Namespace | Message broker for the events-dispatcher queue and all per-index topics
Azure Event Grid System Topic | Monitors the source Storage Account for blob lifecycle events
Azure Container Apps Environment (API) | Hosts the API Container App and CronJob with external ingress
Azure Container Apps Environment (Processors) | Dedicated environment for Index Processors with internal load balancer on a VNet
Azure OpenAI Service | Provides embedding model deployments (e.g., text-embedding-3-large)
Azure Key Vault | Centralized secret store for connection strings, API keys, and certificates
Azure Application Insights | Distributed tracing, structured logging, live metrics, and alerting
Azure Document Intelligence | Extracts structured text from PDFs, scanned documents, and images
Azure Speech Services | Transcribes audio and video files to text
Azure Container Registry | Stores Docker images for the API and Index Processor applications
Azure Storage Account | Source blob storage watched by Event Grid; also used for internal staging
Managed Identities + Role Assignments | System-assigned identities with Storage Blob Data Reader, Cosmos DB Data Contributor, ACR Pull, Key Vault Admin, and Service Bus Data Sender/Owner roles

Per-Index Deployment

When a new index is created, RAG DB provisions dedicated resources for isolation and independent scaling:

Service Bus Topic

A new topic is created in the Service Bus namespace, named after the indexId. The API publishes processing messages to this topic, and the index processor subscribes to it via Dapr.

Container App (ip-{indexId})

A dedicated Container App is deployed in the processors environment. It runs the Index Processor image from the Container Registry and is configured with:

  • A Dapr pub/sub component pointing at the index-specific topic
  • Auto-scaling rules based on the Service Bus topic message count (scales to zero when idle, scales up under load)
  • Workload profile constraints (CPU and memory limits)
  • Liveness and readiness probes

Cosmos DB Container

A new container is created in the appropriate Cosmos DB database (see Multi-Cosmos Scale-Out below). The container is configured with:

  • Vector index policy — DiskANN index on the /vector path with cosine distance function
  • Full-text index policy — Full-text indexes on /chunk and /name for BM25 keyword search
  • Partition key: /file_id, for efficient per-file operations
  • Indexing policy — Optimized includes/excludes for query patterns
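
The container settings listed above might translate into policy documents along these lines. This is a sketch under assumptions: the key names follow the Cosmos DB for NoSQL policy JSON as publicly documented, while the build_container_policies helper, the en-US language, the float32 data type, and the specific excluded paths are illustrative choices, not taken from RAG DB.

```python
def build_container_policies(dimensions: int = 3072) -> dict:
    """Build the three policy documents for a per-index chunk container."""
    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": "/vector",
                "dataType": "float32",          # assumed embedding precision
                "distanceFunction": "cosine",
                "dimensions": dimensions,
            }
        ]
    }
    full_text_policy = {
        "defaultLanguage": "en-US",             # assumed language
        "fullTextPaths": [
            {"path": "/chunk", "language": "en-US"},
            {"path": "/name", "language": "en-US"},
        ],
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular range index;
        # it is served by the dedicated vector index below.
        "excludedPaths": [{"path": "/vector/*"}, {"path": '/"_etag"/?'}],
        "vectorIndexes": [{"path": "/vector", "type": "diskANN"}],
        "fullTextIndexes": [{"path": "/chunk"}, {"path": "/name"}],
    }
    return {
        "partition_key_path": "/file_id",
        "vector_embedding_policy": vector_embedding_policy,
        "full_text_policy": full_text_policy,
        "indexing_policy": indexing_policy,
    }


policies = build_container_policies(dimensions=3072)
```

Recent versions of the azure-cosmos Python SDK appear to accept dictionaries of this shape as keyword arguments when creating a container; the exact parameter names should be checked against the SDK version in use.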

Key Vault Secrets

Index-specific secrets (Cosmos container name, topic name, configuration overrides) are written to Key Vault and referenced by the Container App as secret environment variables.


Cosmos DB Architecture

RAG DB uses two logical databases within each Cosmos DB account:

indexes_metadata Database

Stores all control-plane data. Contains four containers:

Container | Partition Key | Content
indexes | /id | Index definitions (name, storage path, status, configuration, Cosmos account assignment)
index_files | /indexId | File tracking records with status, content hash, and timestamps
index_runs | /indexId | Run records with counters (total, completed, failed, pending) and timing
cosmos_accounts | /id | Registry of available Cosmos DB accounts with current container counts

indexes Database (per-account)

Contains the actual document chunks. Each index gets its own container within this database. The container name matches the indexId.

Each DocumentChunk document has this structure:

  • id — unique chunk identifier
  • file_id — the source IndexFile ID (partition key)
  • chunk — the text content (full-text indexed)
  • name — source file name (full-text indexed)
  • vector — float array embedding (vector indexed)
  • metadata — page number, chunk index, source path, timestamps
  • _etag — Cosmos DB ETag for optimistic concurrency
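
To make the record shape concrete, here is a rough sketch of how overlapping chunks could be turned into DocumentChunk records. It is not the ragdb_helper_library implementation: character-based splitting, the id format, the default size and overlap, and the helper names split_with_overlap and build_chunks are assumptions. The _etag field is omitted because Cosmos DB assigns it on write.

```python
from dataclasses import dataclass, field


def split_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


@dataclass
class DocumentChunk:
    id: str
    file_id: str            # partition key
    chunk: str              # full-text indexed
    name: str               # full-text indexed
    vector: list[float]     # vector indexed
    metadata: dict = field(default_factory=dict)


def build_chunks(file_id: str, name: str, markdown: str,
                 size: int = 1000, overlap: int = 200) -> list[DocumentChunk]:
    """Create one record per chunk; vectors are filled in by the embedding step."""
    return [
        DocumentChunk(
            id=f"{file_id}-{i:05d}",   # hypothetical id scheme
            file_id=file_id,
            chunk=text,
            name=name,
            vector=[],
            metadata={"chunk_index": i},
        )
        for i, text in enumerate(split_with_overlap(markdown, size, overlap))
    ]


records = build_chunks("file-42", "handbook.md", "abcdefghij", size=4, overlap=2)
```

Because every record of a file shares the same file_id, all chunks of that file land in one logical partition, which is what makes per-file replacement and deletion cheap.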

Vector Index Configuration

  • Index type: DiskANN — provides high-recall approximate nearest-neighbor search with low memory overhead
  • Distance function: Cosine — measures semantic similarity between embedding vectors
  • Dimensions: Configured per deployment (typically 3072 for text-embedding-3-large)
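
For reference, cosine similarity between two embeddings is the dot product divided by the product of their lengths. The short sketch below computes it in plain Python for illustration; in RAG DB the comparison happens inside the Cosmos DB vector index, not in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1 regardless of magnitude, and orthogonal vectors score 0, which is why the metric suits embeddings whose direction carries the meaning.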

Full-Text Index Configuration

  • Indexed paths: /chunk and /name
  • Search type: BM25 full-text ranking
  • Used for: Hybrid queries that combine vector similarity with keyword matching for improved relevance
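
One common way to combine a vector ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). The sketch below shows the idea client-side for illustration only; how RAG DB actually weights the two signals is not specified here, and the function name, the constant k = 60, and the sample chunk ids are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["c1", "c2", "c3"]    # best-first by vector similarity
keyword_hits = ["c3", "c1", "c4"]   # best-first by BM25 score
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Chunks that appear near the top of both lists rise above chunks that score well on only one signal, which is the relevance gain hybrid search is after.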

Multi-Cosmos Scale-Out

Azure Cosmos DB limits how many containers an account can hold (by default, 500 databases and containers combined per account). RAG DB handles this with an automatic bin-packing and scale-out strategy:

  1. Account registry — The cosmos_accounts container in the metadata database tracks all available Cosmos DB accounts and their current container counts.

  2. Bin-packing on index creation — When a new index is created, the API queries the account registry to find the account with the most available capacity. The new index container is placed there.

  3. Automatic provisioning — When all registered accounts are nearing capacity (configurable threshold, e.g., 450 containers), the system flags the need for a new Cosmos DB account. A new account can be provisioned via the IaC pipeline and registered in the metadata store.

  4. Cross-account queries — The API and processors resolve which Cosmos account holds a given index by looking up the assignment in the indexes metadata container. Connection strings for each account are stored in Key Vault.

This approach allows RAG DB to scale to thousands of indexes across multiple Cosmos DB accounts without any single-account bottleneck.
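
The placement rule in steps 2 and 3 can be sketched as a small selection function. The record fields (id, container_count), the pick_account name, and the constants are hypothetical; the 450 threshold is the example value mentioned above.

```python
from typing import Optional

CONTAINER_LIMIT = 500       # per-account container limit described above
SCALE_OUT_THRESHOLD = 450   # example threshold from the text


def pick_account(
    accounts: list[dict],
    threshold: int = SCALE_OUT_THRESHOLD,
    limit: int = CONTAINER_LIMIT,
) -> tuple[Optional[str], bool]:
    """Choose where a new index container goes.

    Returns (account_id, needs_new_account). account_id is None when every
    registered account is full; the flag is True when all accounts have
    reached the threshold and a new account should be provisioned.
    """
    needs_new_account = all(a["container_count"] >= threshold for a in accounts)
    open_accounts = [a for a in accounts if a["container_count"] < limit]
    if not open_accounts:
        return None, True
    # Most available capacity == fewest containers; ties broken by id.
    best = min(open_accounts, key=lambda a: (a["container_count"], a["id"]))
    return best["id"], needs_new_account


registry = [
    {"id": "cosmos-1", "container_count": 480},
    {"id": "cosmos-2", "container_count": 120},
]
account_id, needs_new_account = pick_account(registry)
```

In a real deployment the count update would also need to be made safe against concurrent index creations, for example with an ETag-conditioned write on the registry record.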


Monitoring and Recovery

CronJob

A Container Apps Job runs every 5 minutes and performs the following:

  • Stale index detection — Identifies indexes that have been in provisioning or running status for longer than the configured timeout (default: 30 minutes)
  • Automatic recovery — Retries failed provisioning steps, re-publishes stuck processing messages, and resets zombie IndexFile records that are stuck in indexing status
  • State reconciliation — Compares the actual state of Azure resources (topics, container apps, Cosmos containers) with the expected state in the metadata database and flags or repairs drift
  • Run completion — Checks if all files in an active IndexRun have been processed and transitions the run and index to their final status
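
Two of these checks, stale index detection and run completion, are simple enough to sketch. The record fields (status, updated_at, and the run counters) mirror the metadata described earlier, but their exact names, the helper functions, and the rule that any failed file marks the run as failed are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IN_FLIGHT_STATUSES = {"provisioning", "running"}


def find_stale_indexes(
    indexes: list[dict],
    now: datetime,
    timeout: timedelta = timedelta(minutes=30),
) -> list[str]:
    """Return ids of indexes stuck in an in-flight status longer than timeout."""
    return [
        index["id"]
        for index in indexes
        if index["status"] in IN_FLIGHT_STATUSES
        and now - index["updated_at"] > timeout
    ]


def run_final_status(run: dict) -> Optional[str]:
    """Return the final status of a run, or None while files are still pending."""
    if run["pending"] > 0:
        return None
    # Assumption: a single failed file marks the whole run as failed.
    return "succeeded" if run["failed"] == 0 else "failed"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
indexes = [
    {"id": "a", "status": "running", "updated_at": now - timedelta(minutes=45)},
    {"id": "b", "status": "running", "updated_at": now - timedelta(minutes=10)},
    {"id": "c", "status": "succeeded", "updated_at": now - timedelta(hours=2)},
]
stale = find_stale_indexes(indexes, now)
```

Passing now as a parameter rather than reading the clock inside the function keeps the check deterministic and easy to test.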

Application Insights

All components emit structured logs and distributed traces to Application Insights:

  • API — request/response logging, event processing traces, error rates
  • Processors — per-file processing duration, embedding latency, chunk counts, failure reasons
  • CronJob — recovery actions taken, drift detected, stale indexes found

Custom dashboards and alerts can be configured for key metrics such as processing lag, error rates, and queue depth.


Infrastructure as Code

All infrastructure is defined declaratively and deployed through automated pipelines:

Bicep and ARM Templates

  • Baseline template — Deploys all shared infrastructure (Cosmos account, Service Bus namespace, Container Apps environments, Key Vault, monitoring, identity and role assignments)
  • Per-index template — Deploys index-specific resources (Service Bus Topic, Container App, Cosmos container, Key Vault secrets). Parameterized by indexId and called programmatically by the API during index creation

GitHub Actions

The repository includes 15 GitHub Actions workflows covering:

  • CI/CD — Build, test, and push Docker images to Container Registry on every merge to main
  • Baseline deployment — Provision or update shared infrastructure across dev, staging, and production environments
  • Per-index deployment — Triggered by the API or manually to deploy/update/delete index-specific resources
  • Database migrations — Apply schema changes to Cosmos DB metadata containers
  • Scheduled maintenance — Nightly cleanup of orphaned resources and cost optimization checks
  • Security scanning — Dependency audits, container image vulnerability scanning, and secret rotation reminders
