What is the current equivalent of fetch_archive_from_http? #7573

greghobby · 2024-04-22T13:18:38Z

I have been going through all the tutorials hoping to gain insight into how to do things the 2.0 way, after doing some project development in 1.x in 2023. The current tutorials appear to be a mix of 1.x and 2.x, and not all of them may work or be relevant in 2.0. (The next and previous links at the bottom of each tutorial sometimes lead to tutorials that are missing from the list on the left-hand side.)

One such example is "Tutorial: Document Classification at Index Time". Just last week I was discussing something like this with a colleague (though the discussion did not revolve around any particular library or solution): trying to extract explicit keywords or topics at index time. However, the function fetch_archive_from_http is no longer in haystack.utils; I saw it was removed about 5 months ago.

So my question: is there something like this still around? Or is there a 2.0 way of doing this? I would love to see a 2.0 tutorial on what this tutorial is trying to cover, if it still makes sense in the current Haystack way of doing things.
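
For reference, the download-and-unpack step that the 1.x tutorial relied on can be approximated with the Python standard library alone. The following is only a minimal sketch, not a Haystack API: the name `fetch_archive` and its parameters are hypothetical, and skipping the download when the target directory already has content is a design choice of this sketch. It assumes the URL path ends in a recognizable archive extension (such as `.zip` or `.tar.gz`) and that the archive comes from a trusted source.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_archive(url: str, output_dir: str) -> bool:
    """Download an archive from `url` and unpack it into `output_dir`.

    Hypothetical helper, not part of Haystack. Returns False without
    downloading anything if `output_dir` already contains files.
    """
    target = Path(output_dir)
    if target.is_dir() and any(target.iterdir()):
        return False
    target.mkdir(parents=True, exist_ok=True)

    # Keep the original file name so shutil can infer the archive
    # format (.zip, .tar.gz, ...) from its extension.
    archive_name = Path(urlparse(url).path).name
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / archive_name
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
            shutil.copyfileobj(response, out)
        shutil.unpack_archive(str(archive_path), str(target))
    return True
```

Calling `fetch_archive(some_url, "data/my_tutorial")` with a placeholder URL would leave the extracted files in `data/my_tutorial`, ready to be read and converted into documents.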

mrm1001 · 2024-04-26T09:26:01Z

Hi @greghobby, there is a tutorial for 2.0 that shows you how to add a metadata field with a classifier and use that to improve retrieval: https://haystack.deepset.ai/tutorials/39_embedding_metadata_for_improved_retrieval. Would that help?

The feedback about the utility functions is very helpful, thank you. Are there any other ones that you were using before?

greghobby · 2024-04-26T11:08:59Z

There seems to be a distinction between the two tutorials, the one I shared and the one you shared. The one I mentioned from 1.x classifies each document according to type and adds that as a metadata field when documents are indexed. This is the sort of machine learning process I am looking to run on documents at index time. Perhaps it could be done beforehand in a separate process that enriches the document, but that is what caught my eye in this tutorial. The 2.0 tutorial you linked takes, if I am reading it correctly, the url field, an already known piece of information for each document, and puts that in as meta. I don't see the classification portion, where you discover hidden information about the document, such as whether it is a document about music. What I see it using is the kind of metadata I used in my previous 1.x project: things like date created, document title, document length, and so on, which are already readily available for the document.
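
To make the distinction concrete, here is a rough sketch of what I mean by classification at index time. It is plain Python, not Haystack code: `Doc` is a hypothetical stand-in for a document class, `keyword_classifier` is a toy placeholder for a real ML model, and the `topic` field name is made up. In 2.x I would assume this logic has to sit in some step that runs before the documents are written to the store, but that wiring is an assumption on my part and is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document with content and metadata."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def keyword_classifier(text: str) -> str:
    """Toy classifier; a real setup would call an ML model here."""
    lowered = text.lower()
    if any(word in lowered for word in ("guitar", "album", "concert")):
        return "music"
    if any(word in lowered for word in ("match", "league", "goal")):
        return "sports"
    return "other"


def classify_at_index_time(
    docs: List[Doc],
    classify: Callable[[str], str],
    meta_field: str = "topic",
) -> List[Doc]:
    """Derive a label from each document's content and store it in its metadata."""
    for doc in docs:
        doc.meta[meta_field] = classify(doc.content)
    return docs


docs = classify_at_index_time(
    [
        Doc("The band released a new album last week."),
        Doc("The league final was decided by a late goal."),
        Doc("Quarterly revenue grew by three percent."),
    ],
    keyword_classifier,
)
print([doc.meta["topic"] for doc in docs])  # ['music', 'sports', 'other']
```

The point is that the `topic` value is derived from the document content itself, unlike a url or creation date, which is already known before indexing.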

…

-Greg