Could you make sure the HTML selectors exist on that page?
Also, could you make sure that the base URLs of those links are specified in the start_urls section?
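One way to check the first point outside the browser is to probe the raw HTML the server returns, since developer tools evaluate selectors against the DOM after JavaScript has run, which may differ from what a scraper sees when it does not render JavaScript. The following is only a minimal sketch using the Python standard library; the names SelectorProbe and count_matches, the sample markup, and the tag/class pair are all hypothetical, and it only handles a simple "tag with class" check, not full CSS or XPath.

```python
from html.parser import HTMLParser


class SelectorProbe(HTMLParser):
    """Counts start tags with a given tag name and CSS class (hypothetical helper)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or valueless, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.matches += 1


def count_matches(html, tag, css_class):
    """Return how many <tag class="... css_class ..."> elements occur in html."""
    probe = SelectorProbe(tag, css_class)
    probe.feed(html)
    probe.close()
    return probe.matches


# Illustrative markup only; in practice this would be the page source
# fetched without a browser (for example with urllib.request.urlopen).
sample_html = '<div class="content"><h1 class="title">Intro</h1><p>Text</p></div>'
h1_hits = count_matches(sample_html, "h1", "title")
h2_hits = count_matches(sample_html, "h2", "title")
print(h1_hits, h2_hits)
```

If a count is zero on the raw source but the selector matches in the browser, the element is probably being added or changed after page load.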
Yes, I have configured it. The selectors match when I evaluate them as XPath expressions in the Chrome console. I also tried using BeautifulSoup to compress the HTML source, which works around the problem, but I'm not sure what the root cause is.
Here is the code:
Description
Hi, I encountered a problem. After executing the scraper, I found that the content of some links cannot be crawled. The logs show 0 records. I have tried many methods, but it still cannot be crawled.
Here is a snapshot of the logs:
Steps to reproduce
Here is part of my config:
Expected Behavior
I hope to crawl the content of all the links in the configuration into Typesense.
Actual Behavior
Content cannot be searched
Metadata
Typesense Version: possibly 0.24; I don't know how to check the version
OS: x86_64 GNU/Linux
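On checking the version: a Typesense server reports it through its debug endpoint, so a small script can read it. This is a sketch under assumptions, not a definitive method: the helper names, the host and port, and the sample response body are placeholders, and the response is assumed to be a JSON object with a "version" field.

```python
import json
import urllib.request


def build_debug_request(base_url, api_key):
    """Build a GET request for the server's /debug endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/debug",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )


def parse_version(body):
    """Extract the "version" field from a /debug JSON response body."""
    return json.loads(body).get("version")


def fetch_version(base_url, api_key):
    """Query a running server; base_url and api_key are deployment-specific."""
    request = build_debug_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_version(response.read().decode("utf-8"))


# Assumed response shape, used here only to exercise the parser offline.
sample_body = '{"state": 1, "version": "0.24.0"}'
print(parse_version(sample_body))
```

Against a live server this would be called as fetch_version("http://localhost:8108", "<your API key>"), with both values replaced by those of the actual deployment.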