How to only add_{texts,documents} if they aren't already in the vector store? #232

Open
nathaniel-brough opened this issue Oct 31, 2023 · 2 comments

@nathaniel-brough

First of all I'd like to thank you for all your hard work. This is a great set of tools, and I'm really enjoying using them. Let me first describe my use case.

I have a directory of files that I read and convert to embeddings/vectors with llm-chain-qdrant. This works great! Where I'm running into trouble is that some of these files are edited intermittently. If I update 1 file, I should really only need to add 1 new vector to the vector store. However, since there is no way to compare the documents already in the vector store with the files that have changed on disk, you end up having to compute/fetch a new embedding for every file anyway, which is done here. I'd like to propose adding a third method to the VectorStore trait, something like document_exists(). This would let you add only the documents that aren't already in the VectorStore.
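A minimal sketch of the proposal, assuming a simplified, synchronous trait. The names VectorStore, add_texts, document_exists, content_id, InMemoryStore and add_missing_texts below are hypothetical stand-ins for illustration, not llm-chain's actual (async, generic) API. Documents are identified by a hash of their contents, so an edited file gets a new ID and is the only one embedded again. Note that std's DefaultHasher is not guaranteed to be stable across Rust releases, so a persistent store would want a stable hash such as SHA-256 instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a vector store trait.
pub trait VectorStore {
    /// Embeds and stores the texts, returning their IDs.
    fn add_texts(&mut self, texts: Vec<String>) -> Vec<String>;
    /// Proposed addition: reports whether a document with this ID is stored.
    fn document_exists(&self, id: &str) -> bool;
}

/// Hypothetical helper: derives an ID from the document contents.
/// DefaultHasher is not stable across Rust versions; use e.g. SHA-256 for persistence.
pub fn content_id(text: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Toy store that only records IDs and counts how many texts it "embedded".
#[derive(Default)]
pub struct InMemoryStore {
    ids: HashSet<String>,
    pub embedded: usize,
}

impl VectorStore for InMemoryStore {
    fn add_texts(&mut self, texts: Vec<String>) -> Vec<String> {
        texts
            .iter()
            .map(|t| {
                // A real store would compute and persist an embedding here.
                self.embedded += 1;
                let id = content_id(t);
                self.ids.insert(id.clone());
                id
            })
            .collect()
    }

    fn document_exists(&self, id: &str) -> bool {
        self.ids.contains(id)
    }
}

/// Hypothetical helper: only embeds texts whose content ID is not yet stored.
pub fn add_missing_texts<S: VectorStore>(store: &mut S, texts: Vec<String>) -> Vec<String> {
    let missing: Vec<String> = texts
        .into_iter()
        .filter(|t| !store.document_exists(&content_id(t)))
        .collect();
    store.add_texts(missing)
}

fn main() {
    let mut store = InMemoryStore::default();
    add_missing_texts(&mut store, vec!["a.txt v1".into(), "b.txt v1".into()]);
    // Second pass: a.txt is unchanged, b.txt was edited, so only one text is embedded.
    let added = add_missing_texts(&mut store, vec!["a.txt v1".into(), "b.txt v2".into()]);
    println!("{} {}", added.len(), store.embedded); // prints "1 3"
}
```

In this sketch the vector for the old version of b.txt stays in the store; removing stale entries would need a separate delete operation, which is outside what the proposal covers.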

I have a workaround in the works that queries the database directly, but I think this would be a lot tidier. What are your thoughts on this?

@nathaniel-brough
Author

Just FYI, I'm happy to work on this. It might take me a while to update all the trait implementations, though, and I'd like to get some expert eyes on the idea before I start.
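On the concern about updating every implementation: one option (an assumption here, not something agreed in this thread) is to give the new method a default body, so existing implementations compile unchanged and each backend can opt in later. The trait and the LegacyStore/IndexedStore types below are hypothetical, simplified stand-ins, not llm-chain's real types.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a vector store trait.
pub trait VectorStore {
    fn add_texts(&mut self, texts: Vec<String>) -> Vec<String>;

    /// Proposed method with a default body: backends that cannot check yet
    /// report `false`, so callers fall back to today's behaviour (always add).
    fn document_exists(&self, _id: &str) -> bool {
        false
    }
}

/// An existing backend that does not override the new method.
pub struct LegacyStore {
    pub stored: Vec<String>,
}

impl VectorStore for LegacyStore {
    fn add_texts(&mut self, texts: Vec<String>) -> Vec<String> {
        // IDs are just positions in the vector for this toy backend.
        let start = self.stored.len();
        self.stored.extend(texts);
        (start..self.stored.len()).map(|i| i.to_string()).collect()
    }
}

/// A backend that opts in by overriding the default (the text itself is the ID here).
pub struct IndexedStore {
    pub ids: HashSet<String>,
}

impl VectorStore for IndexedStore {
    fn add_texts(&mut self, texts: Vec<String>) -> Vec<String> {
        for t in &texts {
            self.ids.insert(t.clone());
        }
        texts
    }

    fn document_exists(&self, id: &str) -> bool {
        self.ids.contains(id)
    }
}

fn main() {
    let legacy = LegacyStore { stored: vec!["doc".into()] };
    // LegacyStore never overrides document_exists, so the default applies.
    println!("{}", legacy.document_exists("doc")); // prints "false"
}
```

The trade-off is that a default of `false` silently keeps the old "always re-embed" behaviour for backends that have not been updated, rather than failing loudly.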

@williamhogman
Contributor

Hey :)

Yeah, I think that makes sense; let's try to get it added. Let me know on Discord if you need any help/pair programming :)
