Replies: 2 comments
-
Hi @greghobby, there is a tutorial for 2.0 that shows you how to add a metadata field with a classifier: https://haystack.deepset.ai/tutorials/39_embedding_metadata_for_improved_retrieval, and use that to improve retrieval. Would that help? The feedback about the utility functions is very helpful, thank you. Are there any other ones that you were using before?
-
There seems to be a distinction between the two tutorials mentioned: the one I shared and the one you shared.
The one I mentioned from 1.x classifies each document according to type and adds that as a metadata field when documents are indexed. This is the sort of machine learning process I am looking to run on each document at index time. Perhaps it could have been done beforehand in a separate process to enrich the document, but that is what caught my eye in this tutorial.
The 2.0 tutorial you linked takes, if I am reading it correctly, the url field, an already known piece of information for each document, and puts that in as meta. I don't see the classification portion where you find out hidden information about the document, such as whether it is a document about music. I see it using metadata of the kind I used in my previous 1.x project: things like date created, document title, document length, and so on, which are already readily available for each document.
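To make the distinction concrete, here is a minimal, library-agnostic sketch of what I mean by classification at index time. It does not use the Haystack API: `Doc`, `LABEL_KEYWORDS`, `classify`, and `enrich_at_index_time` are hypothetical names made up for illustration, and the keyword lookup is only a toy stand-in for a real machine learning classifier. The point is that `url` is known up front, while `classification` is derived from the content before the document is written to the store.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the Haystack Document class)."""
    content: str
    meta: dict = field(default_factory=dict)


# Toy keyword lists standing in for a trained classifier (assumption).
LABEL_KEYWORDS = {
    "music": {"guitar", "album", "song", "concert"},
    "sports": {"match", "goal", "league", "tournament"},
}


def classify(text: str) -> str:
    """Return the label whose keywords overlap most with the text, else 'unknown'."""
    words = set(text.lower().split())
    scores = {label: len(words & kws) for label, kws in LABEL_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def enrich_at_index_time(docs):
    """Add a derived 'classification' field to each document's metadata."""
    for doc in docs:
        doc.meta["classification"] = classify(doc.content)
    return docs


docs = [
    # 'url' is readily available metadata; 'classification' is not.
    Doc("The band released a new album and announced a concert",
        {"url": "https://example.com/a"}),
    Doc("The league match ended with a late goal",
        {"url": "https://example.com/b"}),
]
indexed = enrich_at_index_time(docs)
```

After this step each document carries both kinds of metadata, so a retriever could filter on the derived field just as it would on `url`.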
…-Greg
From: mrm1001
Sent: Friday, April 26, 2024 5:26 AM
Subject: Re: [deepset-ai/haystack] What is the current equivalent of fetch_archive_from_http? (Discussion #7573)
-
I have been going through all the tutorials hoping to gain insight into how to do things the 2.0 way, after doing some project development in 1.x in 2023. It appears that the current tutorials are a mix of 1.x and 2.x, and not all of them may work or be relevant in 2.0. (The next and previous links at the bottom of each tutorial sometimes point to tutorials that are missing from the list on the left-hand side.)
One such example is "Tutorial: Document Classification at Index Time". Just last week I was discussing doing something like this with a colleague (though the discussion did not revolve around any particular library or solution): trying to extract explicit keywords or topics at index time. However, the function fetch_archive_from_http is no longer in haystack_utils; I saw it was removed about 5 months ago. So my question is: is something like this still around? Or is there a 2.0 way of doing this? I would love to see a 2.0 tutorial on what this tutorial is trying to cover, if it still makes sense in the current Haystack way of doing things.
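In the meantime, a download-and-extract helper of this kind can be approximated with the Python standard library alone. The following is a rough sketch under my own assumptions, not the removed Haystack function: the name `fetch_archive`, its return value, and the choice to skip the download when the output directory already has content are all mine.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_archive(url: str, output_dir: str) -> bool:
    """Download an archive from `url` and unpack it into `output_dir`.

    Returns False (and does nothing) if `output_dir` already contains files,
    True after a successful download and extraction.
    """
    out = Path(output_dir)
    if out.is_dir() and any(out.iterdir()):
        return False  # data already present, skip the download
    out.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so the archive format can be
    # inferred from its extension (.zip, .tar.gz, ...).
    filename = Path(urlparse(url).path).name
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / filename
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as fh:
            shutil.copyfileobj(response, fh)
        shutil.unpack_archive(str(archive_path), str(out))
    return True
```

It would be called as `fetch_archive(url, "data/my_docs")`, where the URL and directory are placeholders. Only archives from a trusted source should be unpacked this way, since extraction writes whatever paths the archive contains.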