# Getting Started
Prerequisites, architecture overview, key concepts, and a quick start guide for RAG DB
RAG DB is a fully managed indexing platform for Retrieval-Augmented Generation on Azure. It watches your blob storage for new files, automatically converts, chunks, and embeds them, then stores searchable document chunks in Azure Cosmos DB with hybrid vector and full-text search.
This guide covers what you need, how the system works end to end, the core data model, and how to create your first index and run your first query.
## Prerequisites
RAG DB depends on the following Azure services. All of them must be provisioned in your subscription before deployment.
| Service | Purpose |
|---|---|
| Azure Cosmos DB | Document store with VectorSearch and FullTextSearch capabilities enabled |
| Azure Service Bus Premium | Ordered, reliable message routing between components (queues and per-index topics) |
| Azure Event Grid | System Topic on your Storage Account to detect blob changes in real time |
| Azure Container Apps | Hosts the API, per-index processors, and the scheduled CronJob |
| Azure OpenAI Service | Generates vector embeddings for document chunks |
| Azure Key Vault | Stores connection strings, API keys, and managed-identity secrets |
| Azure Application Insights | Distributed tracing, logging, and performance monitoring |
| Azure Document Intelligence | Extracts structured content from PDFs, Office documents, and images |
| Azure Speech Services | Transcribes audio and video files to text before indexing |
| Azure Container Registry | Hosts the Docker images for the API and Index Processor apps |
| Azure Storage Account (Blob) | Source of truth for the files you want indexed |
You will also need an Auth0 tenant (or an equivalent OIDC provider) if you plan to use token-based authentication; alternatively, you can use API key authentication.
## How It Works
RAG DB follows a fully event-driven architecture. Here is the complete nine-step flow from file upload to searchable content:
1. **Files uploaded to Blob Storage** — A user or automated process uploads files (PDFs, Office docs, images, audio, etc.) to a designated blob container and folder.
2. **Event Grid detects the change** — An Event Grid System Topic on the storage account fires a `BlobCreated`, `BlobRenamed`, or `BlobDeleted` event.
3. **Event delivered to Service Bus Queue** — Event Grid routes the event to the `events-dispatcher` Service Bus queue, guaranteeing at-least-once delivery.
4. **API receives the event via Dapr** — The RAG DB API (FastAPI) subscribes to the queue through a Dapr pub/sub component. It deserializes the event and extracts the blob path.
5. **API resolves the target index** — Based on the storage account, container, and folder in the event payload, the API looks up which Index owns that path in the `indexes_metadata` database.
6. **API publishes work to the per-index topic** — The API creates a processing message and publishes it to the Service Bus Topic named after the `indexId`. This fans out work so each index is processed independently.
7. **Index Processor downloads, converts, chunks, and embeds** — The dedicated Container App for that index (`ip-{indexId}`) picks up the message, downloads the file from blob storage, converts it to markdown using `markitdown-pro` (with Document Intelligence and Speech Services for complex formats), splits the content into chunks, and generates vector embeddings via Azure OpenAI.
8. **Chunks upserted to Cosmos DB** — The processor writes `DocumentChunk` records (each containing the text, vector embedding, and metadata) into the index-specific Cosmos DB container using optimistic concurrency.
9. **Metadata updated** — The processor updates the `IndexFile` status and `IndexRun` counters in the metadata database, marking files as `indexed` and recording success/failure counts.
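The path-to-index lookup in step 5 can be sketched as a longest-prefix match over registered indexes. A minimal illustration in Python (the `IndexRecord` shape and field names here are hypothetical, not the actual `indexes_metadata` schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexRecord:
    # Illustrative fields; the real metadata schema may differ.
    index_id: str
    storage_account: str
    container: str
    folder: str

def resolve_index(account: str, container: str, blob_path: str,
                  indexes: list[IndexRecord]) -> Optional[IndexRecord]:
    """Pick the index whose (account, container, folder) owns this blob.

    When folders are nested, the most specific (longest) prefix wins.
    """
    candidates = [
        ix for ix in indexes
        if ix.storage_account == account
        and ix.container == container
        and blob_path.startswith(ix.folder.rstrip("/") + "/")
    ]
    return max(candidates, key=lambda ix: len(ix.folder), default=None)

indexes = [
    IndexRecord("idx-1", "mystorageaccount", "documents", "knowledge-base"),
    IndexRecord("idx-2", "mystorageaccount", "documents", "knowledge-base/legal"),
]
hit = resolve_index("mystorageaccount", "documents",
                    "knowledge-base/legal/contract.pdf", indexes)
# Longest-prefix match wins, so this resolves to idx-2.
```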
After this pipeline completes, all content is immediately searchable through the RAG DB query APIs using hybrid vector and full-text search.
## Key Concepts
### Index
An Index is the top-level resource. It represents a logical collection of documents tied to a specific blob storage location. Each index gets its own Cosmos DB container, Service Bus Topic, and Container App processor.
When you create an index, it moves through a lifecycle tracked by its status:
| Status | Meaning |
|---|---|
| `provisioning` | Azure infrastructure is being deployed (topic, container app, Cosmos container) |
| `running` | Actively processing files |
| `succeeded` | All known files have been indexed successfully |
| `failed` | One or more critical errors occurred during provisioning or processing |
| `updating` | Configuration change in progress (e.g., scaling, redeployment) |
| `deleting` | Teardown in progress — topic, container app, and data being removed |
### IndexFile
An IndexFile tracks a single source file within an index. It records the blob path, content hash, and processing state.
| Status | Meaning |
|---|---|
| `pending` | File detected but not yet picked up by the processor |
| `indexing` | Processor is actively converting, chunking, and embedding |
| `indexed` | Successfully processed — chunks are searchable |
| `failed` | Processing failed (conversion error, embedding timeout, etc.) |
| `delete_pending` | Deletion requested but not yet executed |
| `deleting` | Chunks are being removed from Cosmos DB |
| `not_supported` | File type is not supported for conversion |
### IndexRun
An IndexRun represents a batch processing session for an index. It maintains counters that track progress:
- `total` — number of files in the run
- `completed` — files successfully indexed
- `failed` — files that encountered errors
- `pending` — files still waiting to be processed
Runs also record timestamps for start, last activity, and completion.
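Deriving progress from these counters is straightforward; a small helper, illustrative only and not part of the RAG DB API:

```python
def run_progress(total: int, completed: int, failed: int) -> dict:
    """Derive pending count and percent complete from IndexRun counters."""
    pending = total - completed - failed
    processed = completed + failed
    percent = (processed / total * 100) if total else 100.0
    return {"pending": pending, "percent_complete": round(percent, 1)}

run_progress(total=40, completed=30, failed=2)
# → {'pending': 8, 'percent_complete': 80.0}
```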
### DocumentChunk
A DocumentChunk is the atomic unit of searchable content. Each chunk contains:
- `chunk` — the text content of the chunk
- `name` — the source file name
- `vector` — a float array (embedding) generated by Azure OpenAI, stored with a DiskANN vector index for fast approximate nearest-neighbor search
- `metadata` — page number, chunk index, source path, and other contextual information
Cosmos DB indexes both the `vector` field (for semantic search) and the `chunk` and `name` fields (for full-text keyword search), enabling hybrid queries.
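A hybrid query fuses the vector and keyword rankings into one result list. A common fusion strategy is Reciprocal Rank Fusion (RRF); whether RAG DB uses RRF specifically is an assumption here, but the idea can be sketched as:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk ids via Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) across the lists it appears in;
    k=60 is the conventional damping constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # ranked by embedding similarity
keyword_hits = ["c1", "c9", "c3"]  # ranked by full-text relevance
rrf([vector_hits, keyword_hits])
# c1 and c3 appear in both lists, so they outrank c7 and c9
```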
## Quick Start
### 1. Create an Index
Send a POST request to create a new index pointing at your blob storage location:
```bash
curl -X POST https://your-ragdb-api.azurecontainerapps.io/api/v1/indexes \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "name": "my-knowledge-base",
    "storageAccount": "mystorageaccount",
    "blobContainer": "documents",
    "blobFolder": "knowledge-base"
  }'
```

The response includes the `indexId`. The index status will be `provisioning` while RAG DB deploys the required infrastructure (typically 1-3 minutes).
### 2. Upload Files to Blob Storage
Upload your documents to the blob path you specified:
```bash
az storage blob upload-batch \
  --account-name mystorageaccount \
  --destination documents \
  --destination-path knowledge-base \
  --source ./my-local-docs/
```

Event Grid will automatically detect the uploads and trigger the indexing pipeline. You can monitor progress by polling the index status or checking `IndexRun` counters.
### 3. Search Your Content
Once files reach `indexed` status, run a semantic search:
```bash
curl -X POST https://your-ragdb-api.azurecontainerapps.io/api/v1/indexes/{indexId}/query/semantic \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "query": "How does the billing system handle refunds?",
    "top_k": 5
  }'
```

The API returns the most relevant chunks ranked by hybrid vector and full-text similarity, ready to be passed as context to your LLM.
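The returned chunks are typically stitched into an LLM prompt. A sketch, assuming each result exposes `name` and `chunk` fields mirroring the `DocumentChunk` model (the actual response envelope may differ):

```python
def build_context(results: list[dict], max_chars: int = 4000) -> str:
    """Concatenate retrieved chunks into a prompt context block.

    Assumes each result carries 'name' and 'chunk' fields; stops adding
    chunks once the character budget would be exceeded.
    """
    parts: list[str] = []
    used = 0
    for r in results:
        block = f"[source: {r['name']}]\n{r['chunk']}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)

results = [
    {"name": "billing.pdf", "chunk": "Refunds are issued within 5 days."},
    {"name": "faq.md", "chunk": "Contact support to start a refund."},
]
context = build_context(results)
```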
## Authentication
RAG DB supports two authentication methods:
- **Auth0 login** — Obtain a JWT token via your Auth0 tenant and pass it as a Bearer token in the `Authorization` header. This is the recommended approach for interactive applications and dashboards.
- **API key** — Pass your key in the `x-api-key` request header. This is simpler and well-suited for server-to-server integrations and scripts.
Both methods are supported on all API endpoints. See the API Reference for details on each endpoint.