# Getting Started
Prerequisites, architecture overview, key concepts, and a quick start guide for RAG DB
RAG DB is a fully managed indexing platform for Retrieval-Augmented Generation on Azure. It watches your blob storage for new files, automatically converts, chunks, and embeds them, then stores searchable document chunks in Azure Cosmos DB with hybrid vector and full-text search.
This guide covers what you need, how the system works end to end, the core data model, and how to create your first index and run your first query.
## Prerequisites
RAG DB depends on the following Azure services. All of them must be provisioned in your subscription before deployment.
| Service | Purpose |
|---|---|
| Azure Cosmos DB | Document store with VectorSearch and FullTextSearch capabilities enabled |
| Azure Service Bus Premium | Ordered, reliable message routing between components (queues and per-index topics) |
| Azure Event Grid | System Topic on your Storage Account to detect blob changes in real time |
| Azure Container Apps | Hosts the API, per-index processors, and the scheduled CronJob |
| Azure OpenAI Service | Generates vector embeddings for document chunks |
| Azure Key Vault | Stores connection strings, API keys, and managed-identity secrets |
| Azure Application Insights | Distributed tracing, logging, and performance monitoring |
| Azure Document Intelligence | Extracts structured content from PDFs, Office documents, and images |
| Azure Speech Services | Transcribes audio and video files to text before indexing |
| Azure Container Registry | Hosts the Docker images for the API and Index Processor apps |
| Azure Storage Account (Blob) | Source of truth for the files you want indexed |
You will also need an Auth0 tenant (or an equivalent OIDC provider) if you plan to use token-based authentication; alternatively, you can use API key authentication.
## How It Works
RAG DB follows a fully event-driven architecture. Here is the complete nine-step flow from file upload to searchable content:
1. **Files uploaded to Blob Storage** — A user or automated process uploads files (PDFs, Office docs, images, audio, etc.) to a designated blob container and folder.
2. **Event Grid detects the change** — An Event Grid System Topic on the storage account fires a `BlobCreated`, `BlobRenamed`, or `BlobDeleted` event.
3. **Event delivered to Service Bus Queue** — Event Grid routes the event to the `events-dispatcher` Service Bus queue, guaranteeing at-least-once delivery.
4. **API receives the event via Dapr** — The RAG DB API (FastAPI) subscribes to the queue through a Dapr pub/sub component. It deserializes the event and extracts the blob path.
5. **API resolves the target index** — Based on the storage account, container, and folder in the event payload, the API looks up which Index owns that path in the `indexes_metadata` database.
6. **API publishes work to the per-index topic** — The API creates a processing message and publishes it to the Service Bus Topic named after the `indexId`. This fans out work so each index is processed independently.
7. **Index Processor downloads, converts, chunks, and embeds** — The dedicated Container App for that index (`ip-{indexId}`) picks up the message, downloads the file from blob storage, converts it to markdown using `markitdown-pro` (with Document Intelligence and Speech Services for complex formats), splits the content into chunks, and generates vector embeddings via Azure OpenAI.
8. **Chunks upserted to Cosmos DB** — The processor writes `DocumentChunk` records (each containing the text, vector embedding, and metadata) into the index-specific Cosmos DB container using optimistic concurrency.
9. **Metadata updated** — The processor updates the `IndexFile` status and `IndexRun` counters in the metadata database, marking files as `indexed` and recording success/failure counts.
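The path-to-index lookup in step 5 can be sketched as a longest-prefix match over registered indexes. A minimal illustration in Python (the `IndexRecord` shape and field names here are hypothetical, not the actual `indexes_metadata` schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexRecord:
    # Illustrative fields; the real metadata schema may differ.
    index_id: str
    storage_account: str
    container: str
    folder: str

def resolve_index(account: str, container: str, blob_path: str,
                  indexes: list[IndexRecord]) -> Optional[IndexRecord]:
    """Pick the index whose (account, container, folder) owns this blob.

    When folders are nested, the most specific (longest) prefix wins.
    """
    candidates = [
        ix for ix in indexes
        if ix.storage_account == account
        and ix.container == container
        and blob_path.startswith(ix.folder.rstrip("/") + "/")
    ]
    return max(candidates, key=lambda ix: len(ix.folder), default=None)

indexes = [
    IndexRecord("idx-1", "mystorageaccount", "documents", "knowledge-base"),
    IndexRecord("idx-2", "mystorageaccount", "documents", "knowledge-base/legal"),
]
hit = resolve_index("mystorageaccount", "documents",
                    "knowledge-base/legal/contract.pdf", indexes)
# Longest-prefix match wins, so this resolves to idx-2.
```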
After this pipeline completes, all content is immediately searchable through the RAG DB query APIs using hybrid vector and full-text search.
## Key Concepts
### Index
An Index is the top-level resource. It represents a logical collection of documents tied to a specific blob storage location. Each index gets its own Cosmos DB container, Service Bus Topic, and Container App processor.
When you create an index, it moves through a lifecycle tracked by its status:
| Status | Meaning |
|---|---|
| `provisioning` | Azure infrastructure is being deployed (topic, container app, Cosmos container) |
| `running` | Actively processing files |
| `succeeded` | All known files have been indexed successfully |
| `failed` | One or more critical errors occurred during provisioning or processing |
| `updating` | Configuration change in progress (e.g., scaling, redeployment) |
| `deleting` | Teardown in progress — topic, container app, and data being removed |
### IndexFile
An IndexFile tracks a single source file within an index. It records the blob path, content hash, and processing state.
| Status | Meaning |
|---|---|
| `pending` | File detected but not yet picked up by the processor |
| `indexing` | Processor is actively converting, chunking, and embedding |
| `indexed` | Successfully processed — chunks are searchable |
| `failed` | Processing failed (conversion error, embedding timeout, etc.) |
| `delete_pending` | Deletion requested but not yet executed |
| `deleting` | Chunks are being removed from Cosmos DB |
| `not_supported` | File type is not supported for conversion |
### IndexRun
An IndexRun represents a batch processing session for an index. It maintains counters that track progress:
- `total` — number of files in the run
- `completed` — files successfully indexed
- `failed` — files that encountered errors
- `pending` — files still waiting to be processed
Runs also record timestamps for start, last activity, and completion.
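Deriving progress from these counters is straightforward; a small helper, illustrative only and not part of the RAG DB API:

```python
def run_progress(total: int, completed: int, failed: int) -> dict:
    """Derive pending count and percent complete from IndexRun counters."""
    pending = total - completed - failed
    processed = completed + failed
    percent = (processed / total * 100) if total else 100.0
    return {"pending": pending, "percent_complete": round(percent, 1)}

run_progress(total=40, completed=30, failed=2)
# → {'pending': 8, 'percent_complete': 80.0}
```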
### DocumentChunk
A DocumentChunk is the atomic unit of searchable content. Each chunk contains:
- `chunk` — the text content of the chunk
- `name` — the source file name
- `vector` — a float array (embedding) generated by Azure OpenAI, stored with a DiskANN vector index for fast approximate nearest-neighbor search
- `metadata` — page number, chunk index, source path, and other contextual information
Cosmos DB indexes both the `vector` field (for semantic search) and the `chunk` and `name` fields (for full-text keyword search), enabling hybrid queries.
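A hybrid query fuses the vector and keyword rankings into one result list. A common fusion strategy is Reciprocal Rank Fusion (RRF); whether RAG DB uses RRF specifically is an assumption here, but the idea can be sketched as:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk ids via Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) across the lists it appears in;
    k=60 is the conventional damping constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # ranked by embedding similarity
keyword_hits = ["c1", "c9", "c3"]  # ranked by full-text relevance
rrf([vector_hits, keyword_hits])
# c1 and c3 appear in both lists, so they outrank c7 and c9
```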
## Quick Start
### 1. Create an Index
Send a POST request to create a new index pointing at your blob storage location:
```bash
curl -X POST https://your-ragdb-api.azurecontainerapps.io/api/v1/indexes \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "name": "my-knowledge-base",
    "storageAccount": "mystorageaccount",
    "blobContainer": "documents",
    "blobFolder": "knowledge-base"
  }'
```

The response includes the `indexId`. The index status will be `provisioning` while RAG DB deploys the required infrastructure (typically 1-3 minutes).
### 2. Upload Files to Blob Storage
Upload your documents to the blob path you specified:
```bash
az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination documents \
  --destination-path knowledge-base \
  --source ./my-local-docs/
```

Event Grid will automatically detect the uploads and trigger the indexing pipeline. You can monitor progress by polling the index status or checking `IndexRun` counters.
### 3. Search Your Content
Once files reach `indexed` status, run a semantic search:
```bash
curl -X POST https://your-ragdb-api.azurecontainerapps.io/api/v1/indexes/{indexId}/query/semantic \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "query": "How does the billing system handle refunds?",
    "top_k": 5
  }'
```

The API returns the most relevant chunks ranked by hybrid vector and full-text similarity, ready to be passed as context to your LLM.
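The returned chunks are typically stitched into an LLM prompt. A sketch, assuming each result exposes `name` and `chunk` fields mirroring the `DocumentChunk` model (the actual response envelope may differ):

```python
def build_context(results: list[dict], max_chars: int = 4000) -> str:
    """Concatenate retrieved chunks into a prompt context block.

    Assumes each result carries 'name' and 'chunk' fields; stops adding
    chunks once the character budget would be exceeded.
    """
    parts: list[str] = []
    used = 0
    for r in results:
        block = f"[source: {r['name']}]\n{r['chunk']}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)

results = [
    {"name": "billing.pdf", "chunk": "Refunds are issued within 5 days."},
    {"name": "faq.md", "chunk": "Contact support to start a refund."},
]
context = build_context(results)
```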
## Authentication
RAG DB supports two authentication methods:
- **Auth0 login** — Obtain a JWT token via your Auth0 tenant and pass it as a Bearer token in the `Authorization` header. This is the recommended approach for interactive applications and dashboards.
- **API key** — Pass your key in the `x-api-key` request header. This is simpler and well-suited for server-to-server integrations and scripts.
Both methods are supported on all API endpoints. See the API Reference for details on each endpoint.