Architecture

System design, infrastructure components, data flow, Cosmos DB schema, multi-account scale-out, and infrastructure as code

Overview

RAG DB is a production-grade, event-driven indexing platform for Retrieval-Augmented Generation on Azure. It watches Azure Blob Storage containers for content changes, routes events through Service Bus, and fans out to isolated per-index processors that convert, chunk, embed, and store documents in Azure Cosmos DB with Vector Search and Full-Text Search enabled. The system is designed for multi-tenant workloads, automatic scaling, and zero-downtime recovery.

Main Components

Component	Technology	Responsibility
API Service	FastAPI on Azure Container Apps	Public/private HTTP API for index CRUD, file management, search queries, and event ingestion via Dapr
Event Dispatcher	Dapr pub/sub on Service Bus Queue	Receives blob events from Event Grid, resolves index ownership, and routes work to per-index topics
Index Processor	Python on Azure Container Apps	Dedicated per-index worker that downloads files, converts to markdown, chunks, generates embeddings, and upserts to Cosmos DB
CronJob	Azure Container Apps Job	Runs every 5 minutes to detect stale indexes, recover failed runs, and reconcile state
Infrastructure	Bicep + ARM templates	Declarative IaC for all Azure resources, with parameterized deployments for baseline and per-index stacks
ragdb_helper_library	Python package (PyPI)	Shared library used by the API and processors containing Cosmos DB clients, chunking logic, embedding utilities, and data models

End-to-End Flow

The complete nine-step pipeline from file upload to searchable content:

Blob Storage — A file is created, renamed, or deleted in the watched storage account and container. Supported formats include PDF, DOCX, PPTX, XLSX, images, audio, video, and plain text.
Event Grid System Topic — An Event Grid subscription on the storage account detects the change and fires a Microsoft.Storage.BlobCreated, BlobRenamed, or BlobDeleted event.
Service Bus Queue — Event Grid delivers the event to the events-dispatcher Service Bus queue. Service Bus Premium guarantees ordering within a session and at-least-once delivery.
API receives event via Dapr — The RAG DB API subscribes to the events-dispatcher queue through a Dapr pub/sub component. It deserializes the event payload and extracts the storage account, container, folder, and file name.
Index resolution — The API queries the indexes_metadata database to find which Index owns the blob path. If no matching index exists, the event is dead-lettered for inspection.
Publish to per-index topic — The API creates a processing message (containing file metadata, index configuration, and the desired action) and publishes it to the Service Bus Topic named after the indexId.
Index Processor processes the file — The Container App ip-{indexId} receives the message via its Dapr subscription. It then:
- Downloads the file from Blob Storage using a managed identity with Storage Blob Data Reader role
- Converts the file to markdown using markitdown-pro (leveraging Document Intelligence for PDFs/images and Speech Services for audio/video)
- Splits the markdown into overlapping chunks using configurable size and overlap parameters
- Generates vector embeddings for each chunk via Azure OpenAI
Upsert to Cosmos DB — The processor writes DocumentChunk records to the index-specific Cosmos container. Each record includes the text content, vector embedding, source metadata, and an ETag for optimistic concurrency control. Existing chunks for the same file are replaced atomically.
Metadata updated — The processor updates the IndexFile status to indexed (or failed) and increments the IndexRun counters in the metadata database. Once all files in a run complete, the Index status transitions to succeeded.
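Steps 4 and 5 above (payload parsing and index resolution) can be sketched as follows. This is a minimal illustration, not the API's actual code. It assumes the event uses the Event Grid schema, where `topic` ends with the storage account name and `subject` has the form `/blobServices/default/containers/{container}/blobs/{path}`. The `BlobRef` type, the index fields (`account`, `container`, `folder`), and the longest-prefix matching rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

SUBJECT_PREFIX = "/blobServices/default/containers/"


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical value object for the parts extracted from a blob event."""
    account: str
    container: str
    folder: str      # "" when the blob sits at the container root
    file_name: str


def parse_blob_event(event: dict) -> BlobRef:
    """Extract account, container, folder, and file name from a storage event."""
    # The topic is a resource ID ending in .../storageAccounts/<account>.
    account = event["topic"].rstrip("/").rsplit("/", 1)[-1]
    subject = event["subject"]
    if not subject.startswith(SUBJECT_PREFIX):
        raise ValueError(f"unexpected subject: {subject!r}")
    container, sep, blob_path = subject[len(SUBJECT_PREFIX):].partition("/blobs/")
    if not sep or not blob_path:
        raise ValueError(f"subject has no blob path: {subject!r}")
    folder, _, file_name = blob_path.rpartition("/")
    return BlobRef(account, container, folder, file_name)


def resolve_index(blob: BlobRef, indexes: list[dict]) -> Optional[str]:
    """Return the id of the index whose folder is the longest matching prefix.

    Returns None when no index owns the path (the caller would dead-letter).
    """
    best_id, best_len = None, -1
    for idx in indexes:
        if idx["account"] != blob.account or idx["container"] != blob.container:
            continue
        prefix = idx["folder"].strip("/")
        inside = (
            prefix == ""
            or blob.folder == prefix
            or blob.folder.startswith(prefix + "/")
        )
        if inside and len(prefix) > best_len:
            best_id, best_len = idx["id"], len(prefix)
    return best_id
```

Matching on whole path segments (`prefix + "/"`) keeps a folder such as `hr-archive` from being claimed by an index registered for `hr`.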

Infrastructure Baseline

The following Azure services form the shared baseline infrastructure, deployed once per environment:

Service	Purpose
Azure Cosmos DB Account	NoSQL document store with VectorSearch and FullTextSearch capabilities enabled at the account level
Azure Service Bus Premium Namespace	Message broker for the events-dispatcher queue and all per-index topics
Azure Event Grid System Topic	Monitors the source Storage Account for blob lifecycle events
Azure Container Apps Environment (API)	Hosts the API Container App and CronJob with external ingress
Azure Container Apps Environment (Processors)	Dedicated environment for Index Processors with internal load balancer on a VNet
Azure OpenAI Service	Provides embedding model deployments (e.g., `text-embedding-3-large`)
Azure Key Vault	Centralized secret store for connection strings, API keys, and certificates
Azure Application Insights	Distributed tracing, structured logging, live metrics, and alerting
Azure Document Intelligence	Extracts structured text from PDFs, scanned documents, and images
Azure Speech Services	Transcribes audio and video files to text
Azure Container Registry	Stores Docker images for the API and Index Processor applications
Azure Storage Account	Source blob storage watched by Event Grid; also used for internal staging
Managed Identities + Role Assignments	System-assigned identities with Storage Blob Data Reader, Cosmos DB Data Contributor, ACR Pull, Key Vault Admin, and Service Bus Data Sender/Owner roles

Per-Index Deployment

When a new index is created, RAG DB provisions dedicated resources for isolation and independent scaling:

Service Bus Topic

A new topic is created in the Service Bus namespace, named after the indexId. The API publishes processing messages to this topic, and the index processor subscribes to it via Dapr.
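As a rough sketch of that publish step, the snippet below builds a processing message and the request the API would send to its Dapr sidecar. It uses the Dapr HTTP publish route (`/v1.0/publish/{pubsubname}/{topic}`) on the sidecar's default port 3500. The message field names, the action names, and the pub/sub component name are hypothetical. The function only prepares the request and does not send it.

```python
import json
from urllib.parse import quote

DAPR_HTTP_PORT = 3500                   # default HTTP port of the Dapr sidecar
ALLOWED_ACTIONS = {"upsert", "delete"}  # hypothetical action names


def build_processing_message(index_id, blob_url, action, chunk_size, chunk_overlap):
    """Assemble the message body; all field names here are illustrative."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return {
        "indexId": index_id,
        "action": action,
        "file": {"url": blob_url},
        "config": {"chunkSize": chunk_size, "chunkOverlap": chunk_overlap},
    }


def dapr_publish_request(pubsub_name, index_id, message):
    """Return (url, headers, body) for a Dapr publish call to the index topic."""
    # The topic is named after the indexId, so the id doubles as the topic name.
    url = (
        f"http://localhost:{DAPR_HTTP_PORT}/v1.0/publish/"
        f"{quote(pubsub_name, safe='')}/{quote(index_id, safe='')}"
    )
    headers = {"Content-Type": "application/json"}
    body = json.dumps(message).encode("utf-8")
    return url, headers, body
```

The returned tuple can be handed to any HTTP client as a POST request; keeping request construction separate from sending makes the routing logic easy to unit test.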

Container App (`ip-{indexId}`)

A dedicated Container App is deployed in the processors environment. It runs the Index Processor image from the Container Registry and is configured with:

A Dapr pub/sub component pointing at the index-specific topic
Auto-scaling rules based on the Service Bus topic message count (scales to zero when idle, scales up under load)
Workload profile constraints (CPU and memory limits)
Liveness and readiness probes

Cosmos DB Container

A new container is created in the appropriate Cosmos DB database (see Multi-Cosmos Scale-Out below). The container is configured with:

Vector index policy — DiskANN index on the /vector path with cosine distance function
Full-text index policy — Full-text indexes on /chunk and /name for BM25 keyword search
Partition key — /file_id for efficient per-file operations
Indexing policy — Optimized includes/excludes for query patterns
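The container settings listed above could be expressed roughly as the following policy documents. This is a sketch under assumptions: the field names mirror the policy JSON documented for Azure Cosmos DB for NoSQL and should be checked against the current documentation, while the helper name, the `float32` data type, the `en-US` language, and the excluded paths are illustrative choices rather than the project's actual template.

```python
def build_container_policies(dimensions: int = 3072, language: str = "en-US") -> dict:
    """Return the policy documents for one index container as plain dicts."""
    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": "/vector",
                "dataType": "float32",          # assumed embedding data type
                "distanceFunction": "cosine",
                "dimensions": dimensions,
            }
        ]
    }
    full_text_policy = {
        "defaultLanguage": language,
        "fullTextPaths": [
            {"path": "/chunk", "language": language},
            {"path": "/name", "language": language},
        ],
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # Keep the raw embedding out of the regular range index; it is
        # served by the vector index declared below instead.
        "excludedPaths": [{"path": "/vector/*"}, {"path": '/"_etag"/?'}],
        "vectorIndexes": [{"path": "/vector", "type": "diskANN"}],
        "fullTextIndexes": [{"path": "/chunk"}, {"path": "/name"}],
    }
    return {
        "partition_key_path": "/file_id",
        "vector_embedding_policy": vector_embedding_policy,
        "full_text_policy": full_text_policy,
        "indexing_policy": indexing_policy,
    }
```

The `dimensions` value has to match the embedding model deployment, since a mismatch between the policy and the stored vectors would cause writes or queries to fail.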

Key Vault Secrets

Index-specific secrets (Cosmos container name, topic name, configuration overrides) are written to Key Vault and referenced by the Container App as secret environment variables.

Cosmos DB Architecture

RAG DB uses two logical databases within each Cosmos DB account:

`indexes_metadata` Database

Stores all control-plane data in four containers:

Container	Partition Key	Content
`indexes`	`/id`	Index definitions (name, storage path, status, configuration, Cosmos account assignment)
`index_files`	`/indexId`	File tracking records with status, content hash, and timestamps
`index_runs`	`/indexId`	Run records with counters (total, completed, failed, pending) and timing
`cosmos_accounts`	`/id`	Registry of available Cosmos DB accounts with current container counts

`indexes` Database (per-account)

Holds the document chunks themselves. Each index gets its own container within this database, and the container name matches the indexId.

Each DocumentChunk document has this structure:

id — unique chunk identifier
file_id — the source IndexFile ID (partition key)
chunk — the text content (full-text indexed)
name — source file name (full-text indexed)
vector — float array embedding (vector indexed)
metadata — page number, chunk index, source path, timestamps
_etag — Cosmos DB ETag for optimistic concurrency
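To make the record shape concrete, here is a minimal sketch of a `DocumentChunk` model together with a character-based overlapping splitter that produces such records. The real `ragdb_helper_library` models and chunking logic may differ. The `split_with_overlap` and `build_chunks` helpers, the `{file_id}-{index}` id format, and the `embed` callable are assumptions for illustration. `_etag` is omitted because Cosmos DB assigns it on write.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class DocumentChunk:
    id: str
    file_id: str                  # partition key
    chunk: str                    # full-text indexed
    name: str                     # full-text indexed
    vector: list[float]           # vector indexed
    metadata: dict = field(default_factory=dict)


def split_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_chunks(
    file_id: str,
    name: str,
    markdown: str,
    embed: Callable[[str], list[float]],
    size: int = 1000,
    overlap: int = 100,
) -> list[DocumentChunk]:
    """Create one DocumentChunk per window, with deterministic ids."""
    return [
        DocumentChunk(
            id=f"{file_id}-{i:05d}",
            file_id=file_id,
            chunk=piece,
            name=name,
            vector=embed(piece),
            metadata={"chunk_index": i},
        )
        for i, piece in enumerate(split_with_overlap(markdown, size, overlap))
    ]
```

Calling `asdict(chunk)` yields a plain dictionary suitable for an upsert. Deterministic ids mean that re-indexing an unchanged file overwrites the same documents instead of creating duplicates.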

Vector Index Configuration

Index type: DiskANN — provides high-recall approximate nearest-neighbor search with low memory overhead
Distance function: Cosine — measures semantic similarity between embedding vectors
Dimensions: Configured per deployment (typically 3072 for text-embedding-3-large)

Full-Text Index Configuration

Indexed paths: /chunk and /name
Search type: BM25 full-text ranking
Used for: Hybrid queries that combine vector similarity with keyword matching for improved relevance
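The text above does not say how the vector and keyword rankings are merged. One common technique, which Azure Cosmos DB also exposes for hybrid ranking, is Reciprocal Rank Fusion (RRF). The plain-Python sketch below illustrates the idea only; it is not the query RAG DB issues, and the constant `k = 60` is a conventional default assumed here.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked id lists into one list ordered by fused score.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based), and the scores are summed across lists.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]    # ordered by vector similarity
keyword_hits = ["b", "d", "a"]   # ordered by BM25 score
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A chunk that ranks reasonably well in both lists (`b` here) outranks one that tops only a single list, which is why hybrid search tends to surface results that are both semantically close and keyword-relevant.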

Multi-Cosmos Scale-Out

By default, Azure Cosmos DB limits each account to 500 databases and containers combined. RAG DB works within this limit using an automatic bin-packing and scale-out strategy:

Account registry — The cosmos_accounts container in the metadata database tracks all available Cosmos DB accounts and their current container counts.
Bin-packing on index creation — When a new index is created, the API queries the account registry to find the account with the most available capacity. The new index container is placed there.
Automatic provisioning — When all registered accounts are nearing capacity (configurable threshold, e.g., 450 containers), the system flags the need for a new Cosmos DB account. A new account can be provisioned via the IaC pipeline and registered in the metadata store.
Cross-account queries — The API and processors resolve which Cosmos account holds a given index by looking up the assignment in the indexes metadata container. Connection strings for each account are stored in Key Vault.
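The placement rule described above can be sketched as follows. This is an illustrative model, not the API's implementation: the `CosmosAccount` type, the `pick_account` helper, and the tie-breaking by account id are assumptions, while the 500-container limit and the 450 threshold come from the text.

```python
from dataclasses import dataclass
from typing import Optional

CONTAINER_LIMIT = 500      # per-account limit cited above
DEFAULT_THRESHOLD = 450    # example "nearing capacity" threshold


@dataclass
class CosmosAccount:
    id: str
    container_count: int


def pick_account(
    accounts: list[CosmosAccount],
    threshold: int = DEFAULT_THRESHOLD,
) -> tuple[Optional[CosmosAccount], bool]:
    """Return (account to place the new container in, needs_new_account flag).

    The account with the fewest containers has the most free capacity.
    The flag is raised when every registered account is at or above the
    threshold, including the case where no account is registered at all.
    """
    needs_new_account = all(a.container_count >= threshold for a in accounts)
    usable = [a for a in accounts if a.container_count < CONTAINER_LIMIT]
    if not usable:
        return None, True
    best = min(usable, key=lambda a: (a.container_count, a.id))
    return best, needs_new_account
```

Raising the flag before the hard limit is reached leaves headroom to provision and register a new account while index creation continues on the existing ones.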

This approach allows RAG DB to scale to thousands of indexes across multiple Cosmos DB accounts without any single-account bottleneck.

Monitoring and Recovery

CronJob

A Container Apps Job runs every 5 minutes and performs the following:

Stale index detection — Identifies indexes that have been in provisioning or running status for longer than the configured timeout (default: 30 minutes)
Automatic recovery — Retries failed provisioning steps, re-publishes stuck processing messages, and resets zombie IndexFile records that are stuck in indexing status
State reconciliation — Compares the actual state of Azure resources (topics, container apps, Cosmos containers) with the expected state in the metadata database and flags or repairs drift
Run completion — Checks if all files in an active IndexRun have been processed and transitions the run and index to their final status
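Two of these checks, stale index detection and run completion, reduce to simple predicates. The sketch below assumes each index record carries a `status` and a timezone-aware `updated_at` timestamp, and that a run with any failed file ends as `failed`; those field names and that rule are assumptions, not confirmed behavior of the CronJob.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IN_PROGRESS_STATUSES = {"provisioning", "running"}
DEFAULT_TIMEOUT = timedelta(minutes=30)


def find_stale_indexes(
    indexes: list[dict],
    now: datetime,
    timeout: timedelta = DEFAULT_TIMEOUT,
) -> list[str]:
    """Return ids of indexes stuck in an in-progress status past the timeout."""
    return [
        idx["id"]
        for idx in indexes
        if idx["status"] in IN_PROGRESS_STATUSES
        and now - idx["updated_at"] > timeout
    ]


def final_run_status(total: int, completed: int, failed: int) -> Optional[str]:
    """Return the run's final status, or None while files are still pending."""
    if completed + failed < total:
        return None
    return "failed" if failed else "succeeded"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": "a", "status": "running", "updated_at": now - timedelta(minutes=45)},
    {"id": "b", "status": "running", "updated_at": now - timedelta(minutes=5)},
    {"id": "c", "status": "succeeded", "updated_at": now - timedelta(hours=3)},
]
stale = find_stale_indexes(records, now)
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.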

Application Insights

All components emit structured logs and distributed traces to Application Insights:

API — request/response logging, event processing traces, error rates
Processors — per-file processing duration, embedding latency, chunk counts, failure reasons
CronJob — recovery actions taken, drift detected, stale indexes found

Custom dashboards and alerts can be configured for key metrics such as processing lag, error rates, and queue depth.

Infrastructure as Code

All infrastructure is defined declaratively and deployed through automated pipelines:

Bicep and ARM Templates

Baseline template — Deploys all shared infrastructure (Cosmos account, Service Bus namespace, Container Apps environments, Key Vault, monitoring, identity and role assignments)
Per-index template — Deploys index-specific resources (Service Bus Topic, Container App, Cosmos container, Key Vault secrets). Parameterized by indexId and called programmatically by the API during index creation

GitHub Actions

The repository includes 15 GitHub Actions workflows covering:

CI/CD — Build, test, and push Docker images to Container Registry on every merge to main
Baseline deployment — Provision or update shared infrastructure across dev, staging, and production environments
Per-index deployment — Triggered by the API or manually to deploy/update/delete index-specific resources
Database migrations — Apply schema changes to Cosmos DB metadata containers
Scheduled maintenance — Nightly cleanup of orphaned resources and cost optimization checks
Security scanning — Dependency audits, container image vulnerability scanning, and secret rotation reminders
