Supported components

This section lists all available components and shows how to select them in the configuration file. If you want to know what a specific component does and what role it plays in the system, take a look at Architecture. And if you don’t find the component you are looking for, feel free to add it and open a pull request!

Readers

Readers parse documents so that they can be added to the system’s database. Currently, text in the following file formats is supported:

  • PDF

Splitters

Splitters take the text during ingestion and split it into chunks. Currently, only a recursive text splitter is supported, so there is no option to select a splitter in the configuration file yet.

Vector Databases

The database stores your embeddings and (references to) your document chunks. RAG Core supports both local and remote databases. The latter is a database hosted, for example, by a cloud provider.

Local Databases

Config key: provider

| Database | Config value | Requires |
|----------|--------------|----------|
| Chroma | "chroma" | base_dir in config |
Remote Databases

Config key: provider

| Database | Config value | Requires |
|----------|--------------|----------|
| Pinecone | "pinecone" | base_url in config, PINECONE_API_KEY environment variable |
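Selecting Pinecone could look like the following sketch; the section name and the example URL are placeholders, while the provider key, the base_url entry, and the PINECONE_API_KEY environment variable come from the table above.

```yaml
# Hypothetical section name; "provider" and "base_url" come from the table above.
vector_database:
  provider: "pinecone"
  base_url: "https://example-index.svc.pinecone.io"  # placeholder URL for your Pinecone index
# In addition, export PINECONE_API_KEY in the environment before starting the system.
```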

Embeddings

The embedding model is used to create a vector representation of your document chunks and queries. Currently, the following remote embedding model families are supported.

Config key: provider

| Embedding model family | Config value | Requires |
|------------------------|--------------|----------|
| OpenAI | "openai" | OPENAI_API_KEY environment variable |
| Azure OpenAI | "azure" | AZURE_OPENAI_API_KEY environment variable |
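As a sketch, choosing an embedding model family might look like this, again assuming a YAML-style configuration file; the embeddings section name is illustrative, and the API key is read from the environment rather than from the config file.

```yaml
# Hypothetical section name; only the "provider" key and its values are documented above.
embeddings:
  provider: "openai"   # or "azure" for Azure OpenAI
# Requires OPENAI_API_KEY (or AZURE_OPENAI_API_KEY) to be set as an environment variable.
```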

LLMs

The Large Language Model generates a response in natural language using the retrieved chunks and a prompt.

Config key: provider

| LLM model family | Config value | Requires |
|------------------|--------------|----------|
| OpenAI | "openai" | OPENAI_API_KEY environment variable |
| Azure OpenAI | "azure" | AZURE_OPENAI_API_KEY environment variable |
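Analogously, the LLM family could be selected as sketched below; the llm section name is an assumption, while the provider values and environment variables are taken from the table above.

```yaml
# Hypothetical section name; "provider" values and environment variables as listed above.
llm:
  provider: "azure"    # or "openai"
# Requires AZURE_OPENAI_API_KEY (or OPENAI_API_KEY) as an environment variable.
```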