Compendium Keeper

Compendium Keeper indexes Compendium data (generated by Compendium Scribe) into a vector database such as Pinecone to power Retrieval-Augmented Generation (RAG) workflows.

Features

Index concepts from a Compendium into a vector store.
Embed concept content, questions, and keywords using OpenAI embeddings.
Store embeddings and metadata in a vector database for quick retrieval.
Support additional vector databases through an extensible architecture (currently only Pinecone is implemented).
Handle both .compendium.pickle and .compendium.xml file formats.

Requirements

Python 3.12+
Compendium Scribe must be installed.
OpenAI API key.
Pinecone API key and environment (if using Pinecone).

Installation

Clone the Repository

git clone https://github.com/yourusername/compendiumkeeper.git
cd compendiumkeeper

Install Dependencies

Ensure you have PDM installed. Then run:

pdm install

Configuration

Create a .env file in the root directory of the project to store your API keys and configuration. You can use the provided .env.example as a template.

Example `.env` File

# .env.example

# OpenAI API Key for generating embeddings
OPENAI_API_KEY=sk-your-openai-api-key

# Pinecone API Key and Environment
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=us-east-1-aws

Copy .env.example to .env and replace the placeholder values with your actual API keys.
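As a quick sanity check before indexing, the configuration can be validated with a few lines of Python. This is a minimal sketch, not Compendium Keeper's own loader: `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles simple KEY=VALUE lines and # comments like those in the example file.

```python
# Keys taken from the example .env file in this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical, incomplete configuration used only for illustration.
example = """
# OpenAI API Key for generating embeddings
OPENAI_API_KEY=sk-your-openai-api-key

# Pinecone API Key and Environment
PINECONE_API_KEY=your-pinecone-api-key
"""
print(missing_keys(parse_env(example)))  # ['PINECONE_ENVIRONMENT']
```

To check a real file, pass `Path(".env").read_text()` (from `pathlib`) to `parse_env` instead of the inline string.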

Usage

Generate a Compendium using Compendium Scribe

compendium-scribe-create-compendium --domain "Cell Biology"

This produces files like cell_biology_2024-12-05.compendium.pickle and cell_biology_2024-12-05.compendium.xml.

Index the Compendium

Use the --compendium-file option to specify the Compendium file (pickle or XML).
You must also specify the vector database index name using the --index-name option.

Ensure your .env file is properly configured with the necessary API keys.

Index from a Pickle File

pdm run compendium-keeper index --compendium-file cell_biology_2024-12-05.compendium.pickle --index-name my_knowledge_index

Index from an XML File

pdm run compendium-keeper index --compendium-file cell_biology_2024-12-05.compendium.xml --index-name my_knowledge_index
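The two commands differ only in the file extension, so the loader has to choose a parser from the file name. How Compendium Keeper does this internally is not shown in this README; the sketch below is one plausible approach, and `detect_format` is a hypothetical name.

```python
from pathlib import Path

# The two double extensions this README lists as supported.
SUPPORTED_SUFFIXES = {
    ".compendium.pickle": "pickle",
    ".compendium.xml": "xml",
}


def detect_format(compendium_file: str) -> str:
    """Return 'pickle' or 'xml' based on the file name's double extension."""
    name = Path(compendium_file).name
    for suffix, fmt in SUPPORTED_SUFFIXES.items():
        if name.endswith(suffix):
            return fmt
    raise ValueError(f"Unsupported Compendium file: {name!r}")


print(detect_format("cell_biology_2024-12-05.compendium.pickle"))  # pickle
print(detect_format("cell_biology_2024-12-05.compendium.xml"))     # xml
```

Matching on the full double extension, rather than on `Path.suffix` alone, rejects unrelated `.pickle` or `.xml` files.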

Verify Indexing

After a successful run, the CLI prints a confirmation message reporting the number of concepts indexed, for example:

Indexed 25 concepts from domain 'Cell Biology' into index 'my_knowledge_index'.
Indexing complete!

Combine Multiple Compendia

To create a single knowledge base that spans multiple Compendia, repeat the indexing process for each Compendium, using the same --index-name.

For example:

pdm run compendium-keeper index --compendium-file django_2024-12-10.compendium.pickle --index-name all_python_knowledge
pdm run compendium-keeper index --compendium-file flask_2024-12-10.compendium.xml --index-name all_python_knowledge

This will merge the knowledge from multiple Compendia into the same vector database index.
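When many Compendia share one index, the repeated invocations can be generated from a list of files. The sketch below only builds and prints the commands; `build_index_commands` is a hypothetical helper, and the CLI flags are the ones shown in this README.

```python
import shlex


def build_index_commands(compendium_files, index_name):
    """Build one `compendium-keeper index` invocation per Compendium file."""
    return [
        ["pdm", "run", "compendium-keeper", "index",
         "--compendium-file", path, "--index-name", index_name]
        for path in compendium_files
    ]


commands = build_index_commands(
    ["django_2024-12-10.compendium.pickle", "flask_2024-12-10.compendium.xml"],
    "all_python_knowledge",
)
for command in commands:
    # Print a copy-pasteable shell line for each command.
    print(shlex.join(command))
    # To execute instead of printing: subprocess.run(command, check=True)
```

Keeping each command as a list of arguments avoids shell-quoting problems if a file name contains spaces.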

Extensibility

Multiple Vector Databases: The architecture allows for adding support for other vector databases (e.g., Weaviate, ChromaDB) by implementing new classes in the vector_db/ directory.
Custom Embedding Strategies: Modify or extend utils.py to customize how embeddings are generated or processed.
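To give a feel for what a new backend might involve, here is a minimal sketch of an abstract base class with a toy in-memory implementation. This README names a VectorDatabase abstract base class but does not show its interface, so the method names `upsert` and `query`, their signatures, and `InMemoryVectorDatabase` are all assumptions made for illustration.

```python
import math
from abc import ABC, abstractmethod


class VectorDatabase(ABC):
    """Hypothetical interface; the project's real base class may differ."""

    @abstractmethod
    def upsert(self, item_id: str, vector: list[float], metadata: dict) -> None:
        """Insert or overwrite one embedding and its metadata."""

    @abstractmethod
    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Return up to top_k (item_id, similarity) pairs, best match first."""


class InMemoryVectorDatabase(VectorDatabase):
    """Toy backend that keeps everything in a dict and ranks by cosine similarity."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, item_id, vector, metadata):
        self._items[item_id] = (vector, metadata)

    def query(self, vector, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [
            (item_id, cosine(vector, stored))
            for item_id, (stored, _) in self._items.items()
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


db = InMemoryVectorDatabase()
db.upsert("mitosis", [1.0, 0.0], {"domain": "Cell Biology"})
db.upsert("osmosis", [0.0, 1.0], {"domain": "Cell Biology"})
print(db.query([0.9, 0.1], top_k=1)[0][0])  # mitosis
```

A real backend would replace the dict with calls to the database client while keeping the same two methods, so the indexing code would not need to change.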

Developer Workflow

Set Up Environment Variables

Create a .env file as described above.

Generate a Compendium

Use Compendium Scribe to generate a Compendium in pickle or XML format.

Index with Compendium Keeper

Run the indexing command to upload embeddings to your chosen vector database.

Troubleshooting

Missing API Keys

Ensure that your .env file contains all required API keys. The CLI will notify you if any are missing.

Unsupported Vector DB

Currently, only Pinecone is supported. To add support for another vector database, implement a new class in vector_db/ adhering to the VectorDatabase abstract base class.

File Format Issues

Ensure that the --compendium-file you provide ends in either .compendium.pickle or .compendium.xml. Files with other extensions are not supported.

API Rate Limits

Be mindful of OpenAI's API rate limits when indexing large Compendia. Consider implementing batching or rate limiting if necessary.
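Batching and retrying with exponential backoff could be sketched as follows. Nothing here is part of Compendium Keeper: `batched`, `embed_with_backoff`, and `fake_embed` are hypothetical names, `fake_embed` stands in for a real embedding call, and `RuntimeError` is a placeholder for whatever rate-limit exception the API client raises.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_with_backoff(batch, embed_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call embed_fn(batch), retrying with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        try:
            return embed_fn(batch)
        except RuntimeError:  # placeholder for the client's rate-limit error
            if attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


def fake_embed(batch):
    """Placeholder embedding function: one single-number vector per text."""
    return [[float(len(text))] for text in batch]


texts = ["nucleus", "ribosome", "golgi apparatus", "lysosome", "cytoskeleton"]
vectors = []
for batch in batched(texts, batch_size=2):
    vectors.extend(embed_with_backoff(batch, fake_embed))
print(len(vectors))  # 5
```

Sending several texts per request reduces the number of API calls, and the growing delay gives the rate limit time to reset before a retry.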

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

Compendium Keeper is released under the MIT License.
