API reference
Documentation of the different classes, starting from the app layer down to the model layer.
App
- class ragcore.app.RAGCore(config: str | None = None, log_level='DEBUG', file_logging=False)
Bases:
AbstractApp
Retrieval-Augmented Generation Core lets you create RAG applications with a configuration and a few lines of code.
RAGCore is a library that simplifies the implementation of Retrieval-Augmented Generation (RAG) applications. It combines a large language model with a knowledge store, allowing users to generate responses based on retrieved information.
- Usage:
Create an instance of RAGCore with the desired configuration.
Add documents to the knowledge store using the add method.
Query the system with natural language using the query method.
Retrieve generated responses.
- Example:
# Instantiate RAGCore
rag_instance = RAGCore(config='path/to/config.yaml')

# Add a document to the knowledge store
rag_instance.add(path='path/to/my_book.pdf')

# Query the system
query = 'Tell me about the topic.'
response = rag_instance.query(query=query)

# Print the content string of the generated response
print(response.content)

# List the titles and contents of the documents on which the response is based
for doc in response.documents:
    print(doc.title, " | ", doc.content)

# List all documents in the database
print(rag_instance.get_titles())

# Remove the document
rag_instance.delete(title="my_book")
- Configuration:
RAGCore relies on a configuration file (default name config.yaml) to customize its behavior. For more information, refer to the Configuration section of the documentation at https://daved01.github.io/ragcore.
- Attributes:
database_service: The service for handling database interactions.
document_service: The service for handling document interactions.
llm_service: The service for handling large language model interactions.
configuration: An AppConfig containing the configuration.
- add(path: str, user: str | None = None) None
Adds a document to the database.
Adds the document in the path to the database. The filename, without the file extension, becomes the document title. For example, the path data/documents/my_book.pdf adds the book to the database with the title my_book. Before it is added, the document is split into overlapping chunks as specified in the config file. Then, using the embedding model, vector representations are created which are then added to the database.
- Args:
path: A string to the file location.
user: An optional string to identify a user.
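The title derivation described above can be sketched with the standard library. This is a minimal illustration, assuming the title is simply the file stem; title_from_path is a hypothetical helper and not part of RAGCore.

```python
from pathlib import Path


def title_from_path(path: str) -> str:
    """Return the file name without its extension (hypothetical helper)."""
    return Path(path).stem


print(title_from_path("data/documents/my_book.pdf"))  # my_book
```

Note that Path.stem strips only the last extension, so a name with two extensions keeps the first one.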
- delete(title: str, user: str | None = None) None
Deletes a collection from the database.
Given a title, all documents with that title, also called a collection, are deleted from the database.
- Args:
title: The title of the collection to remove from the database.
user: An optional string to identify a user.
- get_titles(user: str | None = None) TitlesResponse
Gets the document titles in the database.
If a user identifier is given, the titles owned by this user are returned. If no user is given, the titles of the main collection are returned. The titles are sorted in alphabetical order.
- Args:
user: An optional string to identify the owner.
- Returns:
TitlesResponse: Object with an optional list of alphabetically sorted string titles and an optional user.
- query(query: str, user: str | None = None) QueryResponse
Queries the database with a query.
Queries the database and makes an LLM request with the prompt and the context provided by the database.
- Args:
query: The query string to query the database with.
user: An optional string to identify a user.
- Returns:
A QueryResponse object. The field content contains the response string, or None if a response could not be generated. The field documents is a list of documents of type Document on which the response is based.
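Because content may be None, callers usually check it before use. The sketch below shows one way to do that. The two dataclasses are stand-ins that only mirror the documented fields so the example runs without RAGCore installed, and format_response is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional, Sequence


@dataclass
class Document:
    """Stand-in mirroring the documented Document fields."""
    content: str
    title: str
    metadata: Mapping[str, Any] = field(default_factory=dict)


@dataclass
class QueryResponse:
    """Stand-in mirroring the documented QueryResponse fields."""
    content: Optional[str]
    documents: Sequence[Document]
    user: Optional[str] = None


def format_response(response: QueryResponse) -> str:
    """Render a response, handling the case where no answer was generated."""
    if response.content is None:
        return "No response could be generated."
    titles = sorted({doc.title for doc in response.documents})
    if not titles:
        return response.content
    return f"{response.content} (sources: {', '.join(titles)})"
```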
- class ragcore.app.base_app.AbstractApp(log_level='DEBUG', file_logging=False)
Bases:
object
Abstract base app for RAG Core.
Defines the required methods and sets up the logger.
- abstract add(path: str, user: str | None = None) None
Adds a document to the database.
- abstract delete(title: str, user: str | None = None) None
Removes a document from the database.
- abstract get_titles(user: str | None = None) TitlesResponse
Lists all titles owned by the user in the database, sorted in alphabetical order.
- initialize_logger(log_level: str) Logger
Creates and configures a logger instance for console and file logging.
Sets the log level and defines the format of the log statements.
- Args:
log_level: A string to set the log level.
- Returns:
Logger: The logger instance for the app.
- logger: Logger
- abstract query(query: str, user: str | None = None) QueryResponse
Runs a query against a database.
Services
- class ragcore.services.database_service.DatabaseService(logger: Logger, config: DatabaseConfiguration, embedding_config: EmbeddingConfiguration)
Bases:
object
Handles database interactions.
The DatabaseService class provides methods for interacting with the database, allowing you to perform operations such as adding documents, querying information, and managing the knowledge store. Based on the configuration, either a local or a remote database is managed by the service. The interactions with the database are implemented in the model layer.
- Attributes:
logger: A logger instance
base_path: The path to the local database as a string.
name: The name of the database
num_search_results: The number of results which are returned for a query.
embedding_config: A configuration for the embedding, usually the embedding part of the config file.
- add_documents(documents: list[Document], user: str | None = None) None
Adds documents to an existing database.
Documents must have metadata, and the metadata must have a title specified. Adding documents with the same title multiple times is not possible.
- Args:
documents: A list of documents of type Document.
user: An optional string to identify a user.
- delete_documents(title: str, user: str | None = None) None
Deletes all documents with the title title from the database.
The title matching is case sensitive.
- Args:
title: The title of the documents to be deleted.
user: An optional string to identify a user.
- get_titles(user: str | None = None) list[str | None]
Get the document titles for the user in the database, sorted in alphabetical order.
- Args:
user: An optional string to identify a user.
- Returns:
A list of alphabetically sorted strings with the titles, or an empty list.
- initialize_local_database() None
Initializes a local database.
To initialize a local database, the DatabaseService must have the attributes base_path and provider set. If the base path does not exist it is created.
- initialize_remote_database() None
Initializes a remote database.
A remote database often requires that a base_url is set.
- query(query: str, user: str | None = None) list[Document] | None
Query the database with a query.
The instantiated database is queried with the given query string and returns a list of documents for a query.
- Args:
query: A query as a string.
user: An optional string to identify a user.
- Returns:
A list of documents or None.
- class ragcore.services.document_service.DocumentService(logger: Logger)
Bases:
object
Handles document interactions.
The DocumentService class provides methods for creating and processing documents so that they can be stored in the database.
There are two properties related to documents. When a text is first loaded into the system, for example from a PDF file, it is parsed into pages, which has one Document per page. This list should then be split into overlapping chunks using the method split_pages. The resulting splits are then available in the documents property.
- Attributes:
logger: A logger instance.
pages: A list of Document, representing text which has not been split into chunks.
documents: A list of Document of overlapping chunks.
- load_texts(path: str) None
Loads text from a file into memory so it can be processed further.
Usually, you load text from a file so that it can be split into chunks and ingested into the database.
- Currently supported file formats are:
PDF
- Args:
path: The string path to a file to be loaded into the service as pages.
- split_pages(chunk_size: int, chunk_overlap: int) None
Splits pages into overlapping chunks and stores them in documents.
Must have loaded text with the load_texts method prior to splitting it.
- Args:
chunk_size: The size of the chunks.
chunk_overlap: The overlap of the chunks.
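To illustrate how chunk_size and chunk_overlap interact, here is a minimal fixed-window sketch. It assumes both values are measured in characters, which this reference does not state, and it is deliberately simpler than the recursive text splitter the library uses. split_into_chunks is a hypothetical helper.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more chunks.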
- class ragcore.services.llm_service.LLMService(logger: Logger, config: LLMConfiguration)
Bases:
object
Initializes a Large Language Model and handles requests made to it.
- Currently supported providers are:
OpenAI
AzureOpenAI
- Attributes:
logger: A logger instance.
llm_provider: The provider for the LLM.
llm_model: The name of the LLM, as specified by the provider.
llm_config: A configuration for the LLM.
- create_prompt(question: str, contexts: list[Document]) str
Creates the prompt which is used to make the request.
The prompt is created from the prompt template and a concatenation of the document chunks as strings.
- Args:
question: A question as string.
contexts: List of Document. Typically the grounding information for the question from the database.
- Returns:
A prompt as a string.
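The combination of a template with concatenated chunks can be sketched as follows. The template wording, the separator, and the build_prompt helper are assumptions made for illustration; the actual template used by RAGCore is not shown in this reference.

```python
# Hypothetical template; the library's real template may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt(question: str, contexts: list[str]) -> str:
    """Join the chunk texts and fill them into the template."""
    context_str = "\n\n".join(contexts)
    return PROMPT_TEMPLATE.format(context=context_str, question=question)


print(build_prompt("What is RAG?", ["First chunk.", "Second chunk."]))
```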
- initialize_llm()
Initializes the selected Large Language Model from the specified provider.
Make sure to have environment variables set as required by the selected provider.
- Examples:
OpenAI:
OPENAI_API_KEY
AzureOpenAI:
AZURE_OPENAI_API_KEY
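A quick check for the required variable before initializing the LLM might look like the sketch below. The variable names come from the list above. The provider labels used as dictionary keys and the missing_api_key helper are assumptions for illustration and may not match the values expected in the config file.

```python
import os
from typing import Mapping, Optional

# Environment variable per provider, as listed above.
REQUIRED_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "AzureOpenAI": "AZURE_OPENAI_API_KEY",
}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable, or None if it is available."""
    env = os.environ if env is None else env
    name = REQUIRED_ENV_VARS[provider]
    return None if env.get(name) else name


missing = missing_api_key("OpenAI")
if missing is not None:
    print(f"Please set {missing} before initializing the LLM.")
```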
- make_llm_request(prompt: str) str | None
Makes a request to the initialized Large Language Model.
- Args:
prompt: A prompt as a string for the request.
- Returns:
The response from the LLM as a string or None if no response could be generated.
- class ragcore.services.text_splitter_service.TextSplitterService(chunk_size: int, chunk_overlap: int)
Bases:
object
Handles splitting of text.
To split the documents, a recursive text splitter is used.
- Attributes:
chunk_size: The size of the chunks.
chunk_overlap: The overlap of the chunks.
Models
- class ragcore.models.database_model.BaseVectorDatabaseModel
Bases:
ABC
Abstract Base Class for vector database models.
The BaseVectorDatabaseModel defines the interface for vector database models, serving as a base class for concrete implementations. Subclasses must implement the abstract methods to provide functionality for adding, deleting, querying, and retrieving the number of documents in the database.
- abstract add_documents(documents: list[Document], user: str | None = None) bool
Adds documents to the database.
- Args:
documents: A list of documents of type Document to be added to the database.
user: An optional string to identify a user.
- Returns:
True if documents have been added, False otherwise.
- abstract delete_documents(title: str, user: str | None = None) bool
Deletes all documents with title title from the database.
- Args:
title: The title of the documents to be deleted.
user: An optional string to identify a user.
- Returns:
True if documents have been deleted, False otherwise.
- abstract get_number_of_documents(user: str | None = None) int
Returns the total number of documents in the database.
- abstract get_titles(user: str | None = None) list[str | None]
Returns the titles owned by the user.
- Args:
user: An optional string to identify a user.
- Returns:
A list of strings with the titles, or an empty list.
- class ragcore.models.database_model.BaseLocalVectorDatabaseModel(persist_directory: str | None, embedding_function: BaseEmbedding)
Bases:
BaseVectorDatabaseModel
Base class for local databases.
- Attributes:
persist_directory: Path to a folder in which the local database should be created.
embedding_function: Embedding of type BaseEmbedding to be used to create vector representations of inputs.
- class ragcore.models.database_model.ChromaDatabase(persist_directory: str, num_search_results: int, embedding_function: BaseEmbedding)
Bases:
BaseLocalVectorDatabaseModel
Chroma database.
Chroma allows you to create collections, which are groups of documents. In this class, a single collection is used for all documents.
For more information on Chroma, see: https://www.trychroma.com.
- Attributes:
persist_directory: Path to a folder in which the local database should be created.
num_search_results: The number of results to be returned for a query.
embedding_function: Embedding of type BaseEmbedding to be used to create vector representations of inputs.
- add_documents(documents: list[Document], user: str | None = None) bool
Adds documents to the Chroma database.
In the database, each ID must be unique.
To prevent documents from the same source, say the same PDF file, from being added more than once, we check if a document with the same title already exists in the database. Only if it does not can the documents be added.
- Args:
documents: A list of documents.
user: An optional string to identify a user.
- Returns:
True if the document has been added, False otherwise.
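The duplicate check described above can be illustrated with a small in-memory stand-in. This is not the Chroma client: the InMemoryStore class, its simplified signature, and the title-index ID format are all assumptions chosen to show the title check and the unique-ID requirement.

```python
class InMemoryStore:
    """Toy store illustrating the title-based duplicate check."""

    def __init__(self) -> None:
        # Maps a unique ID to the stored chunk and its title.
        self._entries: dict[str, dict[str, str]] = {}

    def add_documents(self, title: str, chunks: list[str]) -> bool:
        """Add chunks under a title; refuse if the title already exists."""
        if not chunks:
            return False
        if any(entry["title"] == title for entry in self._entries.values()):
            return False  # same source was already ingested
        for index, content in enumerate(chunks):
            self._entries[f"{title}-{index}"] = {"title": title, "content": content}
        return True

    def count(self) -> int:
        return len(self._entries)


store = InMemoryStore()
print(store.add_documents("my_book", ["page one", "page two"]))  # True
print(store.add_documents("my_book", ["page one"]))  # False
```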
- delete_documents(title: str, user: str | None = None) bool
Deletes all documents with the given title.
- Args:
title: The title of the documents to be deleted.
user: An optional string to identify a user.
- Returns:
True if documents have been deleted, False otherwise.
- get_number_of_documents(user: str | None = None) int
Returns the number of documents in the collection.
We use only one collection, so getting all documents in the database is equal to getting all documents in the main collection.
- Args:
user: An optional string to identify a user.
- Returns:
The number of documents in the database.
- get_titles(user: str | None = None) list[str | None]
Returns the titles which are owned by the user.
- query(query: str, user: str | None = None) list[Document] | None
Queries the database with a query.
To perform the query on the database, a vector representation of the query is created first.
- Args:
query: A query to query the database with.
user: An optional string to identify a user.
- Returns:
A list of results from the database, or None if no results could be retrieved.
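The embed-then-search flow can be sketched with a brute-force similarity search. Everything here is a stand-in: the letter-frequency embedding replaces a real embedding model, and the linear scan replaces the index that Chroma maintains internally. The names embed, cosine, and search are hypothetical.

```python
import math


def embed(text: str) -> list[float]:
    """Toy embedding: letter frequencies (stand-in for a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], num_search_results: int) -> list[str]:
    """Embed the query, then return the most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:num_search_results]


print(search("cat", ["dog", "cat", "tact"], num_search_results=2))
# ['cat', 'tact']
```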
- class ragcore.models.database_model.PineconeDatabase(base_url: str, num_search_results: int, embedding_function: BaseEmbedding)
Bases:
BaseVectorDatabaseModel
Pinecone database.
To use it, make sure you have your API key under the name PINECONE_API_KEY available as an environment variable.
This implementation uses the Pinecone SDK and the REST API. There is also a gRPC version of the Python client with potential for higher upsert speeds, which could be investigated: https://docs.pinecone.io/docs/upsert-data.
For more information on Pinecone, see: https://www.pinecone.io.
- Attributes:
base_url: The url pointing to your Pinecone instance.
num_search_results: The number of results to be returned for a query.
embedding_function: Embedding of type BaseEmbedding to be used to create vector representations of inputs.
- add_documents(documents: list[Document], user: str | None = None) bool
Adds documents to the database.
Pinecone is heavily based on IDs while other features are currently missing. For example, it is not possible to search the database by metadata without a vector to find titles. That is why we create IDs in the format <title>#<UID>. It is possible to filter results by IDs and ID prefixes using the REST API.
- Args:
documents: A list of documents of type Document to be added to the database.
user: An optional string to identify a user.
- Returns:
True if documents have been added, False otherwise.
- delete_documents(title: str, user: str | None = None) bool
Deletes all documents with title title from the database.
- Args:
title: The title of the documents to be deleted.
user: An optional string to identify a user.
- Returns:
True if documents have been deleted, False otherwise.
- get_number_of_documents(user: str | None = None) int
Returns the total number of documents in the database.
- Args:
user: An optional string to identify a user.
- Returns:
The number of documents owned by the user.
- get_titles(user: str | None = None) list[str | None]
Returns the titles owned by the user.
Currently, this method does not exist in the SDK. Additionally, it is not possible to return the metadata along with the vectors from the API endpoint. That is why we extract the titles from the IDs.
- Args:
user: An optional string to identify a user.
- Returns:
A list of strings with the titles, or an empty list.
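Building IDs in the <title>#<UID> format and recovering titles from them can be sketched as follows. The use of uuid4 for the UID part, the alphabetical sorting, and the helper names make_id and titles_from_ids are assumptions; the reference does not specify how the UID is generated.

```python
import uuid


def make_id(title: str) -> str:
    """Build an ID of the form <title>#<UID> (UID scheme is an assumption)."""
    return f"{title}#{uuid.uuid4().hex}"


def titles_from_ids(ids: list[str]) -> list[str]:
    """Recover the distinct titles from a list of IDs."""
    # Split on the last '#', so titles that contain '#' survive intact.
    return sorted({vector_id.rsplit("#", 1)[0] for vector_id in ids})


ids = [make_id("my_book"), make_id("my_book"), make_id("notes")]
print(titles_from_ids(ids))  # ['my_book', 'notes']
```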
- class ragcore.models.document_model.Document(content: str, title: str, metadata: Mapping[str, Any])
Bases:
object
Model for documents.
- Attributes:
content: The content of the document, as a string.
title: The title of the document.
metadata: A mapping for the metadata.
- content: str
- metadata: Mapping[str, Any]
- title: str
- class ragcore.models.app_model.QueryResponse(content: str | None, documents: Sequence[Document | None], user: str | None)
Bases:
object
Model for query responses.
- Attributes:
content: String with the response, None if no response could be generated.
documents: Sequence of documents on which the response is based. Empty list if response is None.
user: An optional string to identify a user.
- content: str | None
- user: str | None
- class ragcore.models.app_model.TitlesResponse(user: str | None, contents: list[str | None])
Bases:
object
Model for document title responses.
- Attributes:
user: The owner of the titles.
contents: The list of title strings.
- contents: list[str | None]
- user: str | None
- class ragcore.models.document_loader_model.PDFLoader(file_path: str)
Bases:
object
Class for the PDF loader.
- Attributes:
file_path: The path to the PDF file to be loaded.
- load_and_split(title: str) list[Document]
Loads a PDF file with the specified title from the file path.
The created metadata contains the field title which is taken from the arguments. Typically, the title is the file name without the file extension.
- Args:
title: The title of the documents.
- Returns:
A list of documents.
- class ragcore.models.embedding_model.BaseEmbedding
Bases:
ABC
Abstract Base Class for embeddings.
The BaseEmbedding defines the interface for embedding models, serving as a base class for concrete implementations. Subclasses must implement the abstract method to provide functionality for embedding a list of strings.
- abstract embed_texts(texts: list[str]) list[list[float]]
Creates a list of embedding vectors for a list of text strings.
- Args:
texts: A list of strings to create embeddings from.
- Returns:
A list of embedding vectors.
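A concrete subclass only needs to implement embed_texts. The sketch below redefines a stand-in BaseEmbedding with the documented signature so it runs without RAGCore installed. LengthEmbedding is a hypothetical toy model, useful for tests but not for real retrieval.

```python
from abc import ABC, abstractmethod


class BaseEmbedding(ABC):
    """Stand-in mirroring the documented interface."""

    @abstractmethod
    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        """Create one embedding vector per input text."""


class LengthEmbedding(BaseEmbedding):
    """Toy embedding: [character count, word count] for each text."""

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text)), float(len(text.split()))] for text in texts]


print(LengthEmbedding().embed_texts(["hello world", ""]))
# [[11.0, 2.0], [0.0, 0.0]]
```

Instantiating the abstract base directly raises a TypeError, which is how the ABC enforces that subclasses provide the method.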
- class ragcore.models.embedding_model.BaseOpenAIEmbeddings(client: OpenAI | AzureOpenAI)
Bases:
BaseEmbedding
Base class for OpenAI and AzureOpenAI embeddings.
A class to implement the embedding method embed_texts which is the same for both OpenAI embedding models and AzureOpenAI models.
- Attributes:
client: The client for the embedding provider, either OpenAI or AzureOpenAI.
- client: OpenAI | AzureOpenAI
- embed_texts(texts: list[str]) list[list[float]]
Create embedding vectors using the selected client.
- Args:
texts: A list of text strings.
- Returns:
A list of embedding vectors, one vector for each text element.
- model: str
- class ragcore.models.embedding_model.AzureOpenAIEmbedding(model: str, api_version: str, endpoint: str)
Bases:
BaseOpenAIEmbeddings
Class for Azure OpenAI embedding models.
Note that you must have your API key for Azure OpenAI, AZURE_OPENAI_API_KEY, set.
For more information see: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models
- Attributes:
model: The string for the model which should be used, as specified by Azure OpenAI.
api_version: The version string of the deployment.
endpoint: The endpoint of the deployment.
- class ragcore.models.embedding_model.OpenAIEmbedding(model: str)
Bases:
BaseOpenAIEmbeddings
Class for OpenAI embedding models.
Note that you must have your API key for OpenAI, OPENAI_API_KEY, set.
For more information see: https://platform.openai.com/docs/guides/embeddings
- Attributes:
model: The string for the model which should be used, as specified by OpenAI.
- class ragcore.models.llm_model.BaseLLMModel(llm_provider: str, llm_model: str, llm_config: dict[str, str] | None)
Bases:
ABC
Abstract Base Class for Large Language Model models.
The BaseLLMModel defines the interface for LLM models, serving as a base class for concrete implementations. Subclasses must implement the abstract method to provide functionality for generating responses for a given text input.
- Attributes:
llm_provider: The provider of the LLM model.
llm_model: The name of the LLM as a string, as specified by the provider.
llm_config: Configuration for the LLM.
- abstract request(text: str) str
Perform a request to an LLM and return the response.
- Args:
text: A string with the request for the LLM.
- Returns:
A response from the llm as a string.
- class ragcore.models.llm_model.AzureOpenAIModel(llm_provider: str, llm_model: str, llm_config: dict[str, str] | None)
Bases:
BaseLLMModel
Class to interact with Azure OpenAI LLMs.
Make sure to have your API key set in the environment as AZURE_OPENAI_API_KEY.
For more information, see: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
- request(text: str) str
Perform a request with an Azure OpenAI LLM.
- Args:
text: The text string for the request.
- Returns:
The response string from the LLM.
- class ragcore.models.llm_model.OpenAIModel(llm_provider: str, llm_model: str, llm_config: dict[str, str] | None)
Bases:
BaseLLMModel
Class to interact with OpenAI LLMs.
Make sure to have your API key set in the environment as OPENAI_API_KEY.
For more information, see: https://platform.openai.com/docs/guides/text-generation
- request(text: str) str
Perform a request with an OpenAI LLM.
- Args:
text: The text string for the request.
- Returns:
The response string from the LLM.
- class ragcore.models.prompt_model.PromptGenerator
Bases:
object
Class to manage prompt templates and to generate prompts.
A prompt is generated from a template and two inputs. One input is the question, for example a question about some content in a document. The second input is a context string, which is a concatenated string of context which should be used in the prompt.
- get_prompt(question: str, context_str: str) str
Creates a prompt from the template, the question, and the context.
Typically, the question is the user input and the context is retrieved from a database.
- Args:
question: A question as a string.
context_str: A string with all context which should be part of the prompt.
- Returns:
A prompt as a string to be used for an LLM.
Data Transfer Objects
- class ragcore.dto.document_dto.DocumentDTO(content: str, title: str, metadata: Mapping[str, Any])
Bases:
object
Class for Document Data Transfer Objects to convert to RAG Core documents.
- Example:
# Instantiate a DTO
dto = DocumentDTO(content="This is my text", title="Great text", metadata={"title": "Great text", "page": "1"})

# Now you can create a ragcore document
doc = dto.to_ragcore()
- Attributes:
content: A string for the page content.
title: The title of the document as a string.
metadata: A mapping for metadata.
- to_langchain()
Converts to a LangChain document type.
- to_ragcore()
Converts to a RAG Core document type.