First of all I'd like to thank you for all your hard work. This is a great set of tools, and I'm really enjoying using them. Let me first describe my use case.
I have a directory of files that I read and convert to embeddings/vectors with llm-chain-qdrant. This works great! However, I run into issues because some of these files are edited intermittently. If I update one file, I should only need to add one new vector to the vector store. But since there is no way to compare which files are already in the vector store against which files have changed on disk, you end up having to compute/fetch a new embedding anyway, which is done here. I'd like to propose adding a third method to the VectorStore trait, something like `document_exists()`. That would let you add only the documents that aren't already in the VectorStore.
I have a workaround in the works that queries the database directly, but I think this would be a lot tidier. What are your thoughts on this?
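To make the proposal concrete, here is a rough sketch of what the check could look like. Everything in it is an assumption for illustration: `SimpleVectorStore`, `InMemoryStore`, `content_hash` and `sync_document` are hypothetical names, and the trait is a simplified synchronous stand-in, not the actual llm-chain `VectorStore` trait or its signatures. The sketch also assumes that "exists" means "same document id with the same content hash", so an edited file counts as missing and gets re-embedded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for a vector store trait.
/// This is NOT the real llm-chain `VectorStore` trait.
pub trait SimpleVectorStore {
    /// Stores (or overwrites) the embedding for the document `id`.
    fn add_document(&mut self, id: &str, content_hash: u64, embedding: Vec<f32>);

    /// Proposed addition: true if `id` is already stored with this exact content hash.
    fn document_exists(&self, id: &str, content_hash: u64) -> bool;
}

/// Toy in-memory store, used only to exercise the trait.
#[derive(Default)]
pub struct InMemoryStore {
    entries: HashMap<String, (u64, Vec<f32>)>,
}

impl SimpleVectorStore for InMemoryStore {
    fn add_document(&mut self, id: &str, content_hash: u64, embedding: Vec<f32>) {
        self.entries.insert(id.to_string(), (content_hash, embedding));
    }

    fn document_exists(&self, id: &str, content_hash: u64) -> bool {
        matches!(self.entries.get(id), Some((h, _)) if *h == content_hash)
    }
}

/// Hashes the file contents. `DefaultHasher` is not guaranteed to be stable
/// across Rust releases, so a persistent store would want a stable hash instead.
pub fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Embeds and stores `text` only when it is new or changed.
/// Returns true if an embedding was computed, false if the call was skipped.
pub fn sync_document<S, F>(store: &mut S, id: &str, text: &str, embed: F) -> bool
where
    S: SimpleVectorStore,
    F: FnOnce(&str) -> Vec<f32>,
{
    let hash = content_hash(text);
    if store.document_exists(id, hash) {
        return false;
    }
    store.add_document(id, hash, embed(text));
    true
}
```

With something like this, a directory scan would call `sync_document` once per file, and the embedding closure would only run for files whose contents changed since the last run. For a Qdrant-backed store, the id and hash would presumably live in the point payload, but that part is not worked out here.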
Just FYI, I'm happy to work on this. It might take me a while to update all the trait implementations, and I'd like to get some expert eyes on it before I start.