RAG

Retrieval-augmented generation with pgvector — multi-tenant vector store + generic use cases for indexing, retrieval, and stats.

The @app/rag package wraps a pgvector table and the @app/ai embedding provider into four small use cases the rest of the codebase can call: EmbedAndStoreUseCase, RetrieveRelevantContextUseCase, DeleteEmbeddingsBySourceUseCase, GetEmbeddingStatsUseCase.

It is opt-in. The boilerplate ships with a stock postgres:16-alpine image; pgvector and the migration only matter if you actually use the package.

When to install

Install @app/rag when you want to:

Embed domain text (notes, descriptions, transcripts) and search it later by semantic similarity.
Inject "relevant context" snippets into an LLM prompt at turn time (classic RAG pattern — pairs cleanly with the assistant package).
Power an admin observability surface that shows per-org index size.

If you only need streaming chat completions, skip this — @app/ai is enough.

Switching to pgvector

In docker-compose.dev.yml, swap the Postgres image:

postgres:
  # image: postgres:16-alpine                  # default
  image: pgvector/pgvector:pg16                # required for @app/rag

The data directory is compatible — no volume reset needed.

Migration

Copy the reference SQL from packages/rag/prisma/migrations/0001_rag_init/ into your apps/server/prisma/migrations/ directory (with a fresh timestamp), then:

cd apps/server && bun prisma migrate dev

Add the matching Embedding model to schema.prisma — the snippet is in the package README.

Wiring example

import { OpenAiEmbeddingProvider } from '@app/ai';
import {
  PgvectorEmbeddingStore,
  EmbedAndStoreUseCase,
  RetrieveRelevantContextUseCase,
} from '@app/rag';

const embeddings = new OpenAiEmbeddingProvider({
  apiKey: env.OPENAI_API_KEY,
  model: 'text-embedding-3-small',
  expectedDimension: 1536,
});
const store = new PgvectorEmbeddingStore(prisma);

const indexer = new EmbedAndStoreUseCase(embeddings, store);
await indexer.execute({
  organizationId: 'org_123',
  sourceType: 'note',
  sourceId: 'note_42',
  snippet: 'Quarterly review with Acme Corp scheduled for May 12.',
});

const retriever = new RetrieveRelevantContextUseCase(embeddings, store);
const hits = await retriever.execute({
  organizationId: 'org_123',
  query: 'what meetings do I have with Acme?',
  topK: 5,
});

Multi-tenant safety

organizationId is required on every read and write. The pgvector store filters by it on every SELECT. Add an integration test on the consumer side that inserts overlapping rows across two orgs and asserts no leakage — the package can't enforce this without your DB, so it's your contract to keep.

Vector dimension

The migration ships vector(1536) to match text-embedding-3-small. To use a larger model (e.g. text-embedding-3-large → 3072), edit the migration column width, recreate the IVFFlat index, and pass { dimension: 3072 } to the store + expectedDimension: 3072 to the embedding provider so a future model swap fails loud rather than corrupting the index.

Listener pattern

Pair EmbedAndStoreUseCase with your domain events to keep the index fresh. When a note/task/etc. is created or updated, build a snippet string and call indexer.execute(...). When it's deleted, call DeleteEmbeddingsBySourceUseCase so stale rows don't keep showing up in search.

The assistant package (coming next) provides a generator for these listeners.