optimization: use sendfile in create and extract #33
Comments
Hah, I was complaining about this to Elliot a few days ago, since we're using Tar.jl for the rr traces and the tar'ing up step is too slow. For some numbers, on my benchmark Tar.jl uses 60% of one core in addition to 100% of gzip. Regular |
And indeed using a faster compressor, like |
Using a bigger buffer would be pretty easy—currently it's 512 bytes, which is very small. But it feels very unnecessary to use a buffer here at all. Do we have an API that exposes sendfile? The other issue is when Tar.jl is used with TranscodingStreams and CodecZlib, in which case the destination (for create) or source (for extract) is not a real file handle anyway and what we'd want ideally is a way to have TranscodingStreams send the data directly to the output stream. |
I would also be ok with not using TranscodingStreams in performance-sensitive situations, creating JLLs for gzip and co instead (for portability) and then using sendfile to send data to/from the external gzip process without needing to pass through Julia's user space at all. |
I think I have it in a branch already, I just never opened the PR |
Having those as external programs via JLL would be nice in any case because doing compression/decompression via a pipe is often both efficient and convenient. |
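To make the pipe approach concrete, here is a minimal sketch, assuming a `gzip` binary on the `PATH` (a JLL-provided one would be substituted in the same place). `create_compressed` and its `compressor` keyword are hypothetical names for illustration, not part of Tar.jl.

```julia
using Tar

# Hypothetical helper (not part of Tar.jl): stream a tarball of `dir` into an
# external compressor process, which writes its output to the file `out`.
# Assumes the compressor reads from stdin and writes to stdout.
function create_compressed(dir::AbstractString, out::AbstractString;
                           compressor::Cmd = `gzip -c`)
    # `open(cmd, "w")` gives us an IO connected to the process's stdin and
    # waits for the process to finish when the block returns.
    open(pipeline(compressor, stdout = out), "w") do io
        Tar.create(dir, io)
    end
    return out
end
```

Note that the tar data still passes through Julia's user space on its way into the pipe here; only the compression happens out of process, so this does not by itself give the sendfile behavior discussed above.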
Seems to be about 6x faster than gzip, but now bottlenecked on JuliaIO/Tar.jl#33.
Makes creating a tarball and compressing it with `zstdmt` about 6x faster (30s vs 5s). Raw `tar` is still about 20% faster, but we'd probably need #33 to make up the difference.
* Increase default buffer size. Makes creating a tarball and compressing it with `zstdmt` about 6x faster (30s vs 5s). Raw `tar` is still about 20% faster, but we'd probably need #33 to make up the difference.
* Buffer for extract also
* 1.3 compat
It would be faster to use sendfile or equivalent for the data transfer part of tarball creation and extraction instead of a user-space buffered read/write loop. Relevant code that should be optimized:
https://github.com/JuliaIO/Tar.jl/blob/b8bd833254b48428f1ce0bf4/src/create.jl#L225-L231
https://github.com/JuliaIO/Tar.jl/blob/b8bd833254b48428f1ce0bf4/src/extract.jl#L288-L293
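For reference, the kind of user-space buffered read/write loop being discussed looks roughly like the sketch below. This is not the code at the links above; `copy_data` and the 2 MiB default buffer are assumptions for illustration. Every byte is read into a Julia array and written back out, which is the copy that sendfile would avoid.

```julia
# Hypothetical helper (not Tar.jl's actual code): copy exactly `size` bytes
# from `src` to `dst` through a reusable user-space buffer.
function copy_data(dst::IO, src::IO, size::Integer;
                   buf::Vector{UInt8} = Vector{UInt8}(undef, 2 * 1024^2))
    remaining = Int(size)
    while remaining > 0
        # Never ask for more than the buffer holds or than is left to copy.
        n = min(remaining, length(buf))
        # readbytes! returns the number of bytes actually read.
        got = readbytes!(src, buf, n)
        got == 0 && throw(EOFError())
        write(dst, view(buf, 1:got))
        remaining -= got
    end
    return Int(size)
end
```

With a 512-byte buffer this loop runs once per tar block; a larger buffer cuts the number of read/write calls but leaves the extra copy through user space in place.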