-
-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve update cache process #353
Comments
It's also possible that there are some performance gains to be had in the search index calculation process, but I haven't looked too closely at it to know. |
I'd prefer 2 or 3. Or even take a deeper look into the search index build process and look if there are any low hanging fruits there as @sbrl says. The search index was a one-off feature added by a contributor and wasn't really touched after that. So I won't be surprised if there are some simple gains to reap. |
Honestly, building a search index on the end-user machine is wasting resources. Each time cache is updated (tarball downloaded and extracted) it needs to get rebuilt so why don't we simply prebuild it and ship it together with pages in the first place? |
I like that. I'd also prefer to have a default build and a minimal build with no search index for users who don't use search at all. And for this one, we can dynamically build the index if the user decides to do a search at some point. |
Or simply download it at page search time. |
I don't think so, each client should use the data structure that it finds as a better fit for the language and platform. Also, that sounds like a hard dependency that I don't see compatible with this project. |
Um, I don't see how prebuilding search index for node client using GH actions changes that?
What kind of dependency we are talking about? |
Do you want to add that process in the https://github.com/tldr-pages/tldr repository when a PR is merged to
What if we want changes, small or big, in this index file? Should https://github.com/tldr-pages/tldr use a version system for every update of the index file? Sounds a too specific and hard dependency between repositories. |
Yes. They are free to add their own workflows or continue building index locally?
You can't really tweak the search index in the way you describe it. It is built by the selected library and meant to be consumed by a matching client, there is no room for tweaking. Regarding coupling, we already assume pages tree layout which is a far stronger assumption than this one IMO. |
Search indexes can get big, and are also usually stored in a format that's fairly specific to the language you're using / your use case in order to improve performance, so I'm unsure of the value / wisdom of generating the index in the main tldr repo. |
Actual behavior
Updating this client’s cache takes ~41 seconds.
Loading the pages' cache takes
< 1 second
.The cache for the
--search
argument takes the rest of the time.Explaining the build index process
The process reads every page and builds a dictionary
{key: count}
withkey
as the word andcount
indicating how many times that word appears on the given page.Options to improve this process
--search
is executed and thesearch-corpus.json
file isn't created. Then clean the file when the index is updated.search-corpus.json
.Environment
The text was updated successfully, but these errors were encountered: