How to get ONLY the original documents' chunks? #54
Comments
you can set |
Thanks for the response, @parthsarthi03. Alright, I'll try it and get back to you. One question: if I set |
Yes, it will act as naive RAG. Looking back at your original question: do you want to use RAPTOR's tree traversal method and just filter for the last layer? The setting I mentioned will only restrict retrieval to that layer, effectively doing naive RAG.
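To make the "naive RAG" behaviour concrete, here is a minimal sketch of flat retrieval over leaf chunks only: every original chunk is scored against the query and the top k are returned, with no tree involved. This is not RAPTOR's code; `cosine`, `flat_top_k`, and the toy two-dimensional embeddings are hypothetical names and data chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_top_k(
    query_vec: Sequence[float],
    leaf_chunks: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Rank the original chunks by similarity to the query and return the k best texts."""
    ranked = sorted(leaf_chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data (hypothetical): each entry is (chunk text, embedding).
leaf_chunks = [
    ("install guide", [1.0, 0.0]),
    ("api tutorial", [0.6, 0.8]),
    ("changelog", [0.0, 1.0]),
]
top = flat_top_k([1.0, 0.1], leaf_chunks, k=2)
print(top)
```

Because only leaf chunks are scored, no summary text can reach the context, but the hierarchical traversal is lost as well.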
@parthsarthi03 |
Ah, okay, that is a bit harder but doable. You'll have to add the following filter before the line referenced below (raptor/raptor/tree_retriever.py, line 249 in 7da1d48) to keep only the leaf nodes:
selected_nodes = [node for node in selected_nodes if node.index in self.tree.leaf_nodes]
This should filter the selected nodes down to the leaf nodes. Let me know if you run into any issues.
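The effect of that filter can be tried in isolation. The sketch below is a self-contained illustration, not the library's code: `Node`, `Tree`, and `keep_leaf_nodes` are hypothetical stand-ins, and it assumes `leaf_nodes` is a mapping from node index to node, so the `in` check tests the index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    # Hypothetical stand-in for a tree node: its index, text, and child indices.
    index: int
    text: str
    children: Set[int] = field(default_factory=set)


@dataclass
class Tree:
    # Hypothetical stand-in: leaf_nodes holds only the original document chunks.
    all_nodes: Dict[int, Node]
    leaf_nodes: Dict[int, Node]


def keep_leaf_nodes(selected_nodes: List[Node], tree: Tree) -> List[Node]:
    """Drop summary nodes, keeping only nodes whose index is a leaf index."""
    return [node for node in selected_nodes if node.index in tree.leaf_nodes]


# Two original chunks and one summary node built on top of them.
leaves = {0: Node(0, "chunk A, see https://example.com/a"), 1: Node(1, "chunk B")}
summary = Node(2, "summary of A and B", children={0, 1})
tree = Tree(all_nodes={**leaves, 2: summary}, leaf_nodes=leaves)

# Pretend the traversal selected the summary plus both leaves.
selected = [summary, leaves[0], leaves[1]]
leaf_only = keep_leaf_nodes(selected, tree)
context_chunks = [node.text for node in leaf_only]
print(context_chunks)
```

Since the leaf text is passed through untouched, URLs in the original chunks reach the context verbatim. One trade-off to keep in mind: if the traversal selected mostly summary nodes, the filtered list may be much shorter than the requested number of nodes.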
Thanks @parthsarthi03, I can now get the original doc chunks with the tree traversal method. I have a few questions:
During retrieval I ran
context, __ = RA.retrieve(question)
to inspect the context, since I was not getting the desired response. I noticed that the context passed to
qa_model
to answer the question, via
self.qa_model.answer_question(context, question)
does not consist of the actual text chunks only. The node list also contains summarized text when we do
self.context_chunks = [node.text for node in node_list]
I wonder how I can get only the nodes for the actual chunks, since I want to pass only the original context text to the QA model.
My docs contain lots of tutorial URLs (which I need in the response) in various places. But every time I get a response, the URLs are garbled, broken, or missing. So I dug in and found that the QA model is not receiving only the actual chunks.
Is there any way to retrieve context that consists only of the original portions of the documents?
Thanks.