Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a docstring for VecUnroll #72

Merged
merged 1 commit into from
Sep 24, 2021

Conversation

timholy
Copy link
Contributor

@timholy timholy commented Sep 24, 2021

Closes #71

@timholy
Copy link
Contributor Author

timholy commented Sep 24, 2021

Please feel free to edit at will!

@timholy timholy mentioned this pull request Sep 24, 2021
@codecov
Copy link

codecov bot commented Sep 24, 2021

Codecov Report

Merging #72 (9db58d7) into master (bbff9c2) will decrease coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
- Coverage   56.33%   56.30%   -0.04%     
==========================================
  Files          33       33              
  Lines        5742     5742              
==========================================
- Hits         3235     3233       -2     
- Misses       2507     2509       +2     
Impacted Files Coverage Δ
src/VectorizationBase.jl 67.74% <ø> (ø)
src/llvm_intrin/vector_ops.jl 72.33% <0.00%> (-0.34%) ⬇️
src/vecunroll/memory.jl 48.64% <0.00%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bbff9c2...9db58d7. Read the comment docs.

@chriselrod
Copy link
Member

The other advantage is that it allows interleaving instructions, which can take better advantage of a CPU's OOO:

julia> using VectorizationBase, SLEEFPirates

julia> vx = Vec(ntuple(_ -> randn(), pick_vector_width(Float64))...)
Vec{8, Float64}<-0.41777997881994483, -0.2778225158869686, 1.0569948436617753, 0.5176464757147176, 0.2323833794303069, -0.7993648787803678, -0.9685718051710357, 0.12727761225970358>

julia> vxu = VecUnroll((vx,vx,vx,vx));

julia> @btime exp2($(Ref(vx))[])
  4.348 ns (0 allocations: 0 bytes)
Vec{8, Float64}<0.7485756477631421, 0.8248350157271293, 2.0805930967801856, 1.431617888324136, 1.174774112297017, 0.5746020803268009, 0.5110116881622035, 1.0922306992794324>

julia> @btime exp2($(Ref(vxu))[])
  9.461 ns (0 allocations: 0 bytes)
4 x Vec{8, Float64}
Vec{8, Float64}<0.7485756477631421, 0.8248350157271293, 2.0805930967801856, 1.431617888324136, 1.174774112297017, 0.5746020803268009, 0.5110116881622035, 1.0922306992794324>
Vec{8, Float64}<0.7485756477631421, 0.8248350157271293, 2.0805930967801856, 1.431617888324136, 1.174774112297017, 0.5746020803268009, 0.5110116881622035, 1.0922306992794324>
Vec{8, Float64}<0.7485756477631421, 0.8248350157271293, 2.0805930967801856, 1.431617888324136, 1.174774112297017, 0.5746020803268009, 0.5110116881622035, 1.0922306992794324>
Vec{8, Float64}<0.7485756477631421, 0.8248350157271293, 2.0805930967801856, 1.431617888324136, 1.174774112297017, 0.5746020803268009, 0.5110116881622035, 1.0922306992794324>

julia> @btime sin($(Ref(vx))[])
  8.536 ns (0 allocations: 0 bytes)
Vec{8, Float64}<-0.40573237311580634, -0.2742623121079716, 0.8708824084181056, 0.49483633008673605, 0.2302974902834125, -0.7169134530311316, -0.8240775144694545, 0.1269342496262836>

julia> @btime sin($(Ref(vxu))[])
  21.154 ns (0 allocations: 0 bytes)
4 x Vec{8, Float64}
Vec{8, Float64}<-0.40573237311580634, -0.2742623121079716, 0.8708824084181056, 0.49483633008673605, 0.2302974902834125, -0.7169134530311316, -0.8240775144694545, 0.1269342496262836>
Vec{8, Float64}<-0.40573237311580634, -0.2742623121079716, 0.8708824084181056, 0.49483633008673605, 0.2302974902834125, -0.7169134530311316, -0.8240775144694545, 0.1269342496262836>
Vec{8, Float64}<-0.40573237311580634, -0.2742623121079716, 0.8708824084181056, 0.49483633008673605, 0.2302974902834125, -0.7169134530311316, -0.8240775144694545, 0.1269342496262836>
Vec{8, Float64}<-0.40573237311580634, -0.2742623121079716, 0.8708824084181056, 0.49483633008673605, 0.2302974902834125, -0.7169134530311316, -0.8240775144694545, 0.1269342496262836>

In these example, 4x the work takes less than 2.5x the time.

@chriselrod chriselrod enabled auto-merge (squash) September 24, 2021 11:31
@timholy
Copy link
Contributor Author

timholy commented Sep 24, 2021

Do you want me to change the docstring in some way? I do use the word interleave but don't talk about the performance optimizations (which, as always, are cool). I could basically paste your demo, but I worry about making it quite long. This version emphasizes defining it without going into extensive detail about using it. The balance may not be quite right, though.

@chriselrod chriselrod disabled auto-merge September 24, 2021 17:08
@chriselrod chriselrod merged commit 6204a53 into JuliaSIMD:master Sep 24, 2021
@chriselrod
Copy link
Member

The test that failed is bwap on random floating point values, which is likely to (correctly) produce NaNs, so I've updated the test on master to use isequal instead of ==.
So I went ahead and merged, thanks!

Do you want me to change the docstring in some way? I do use the word interleave but don't talk about the performance optimizations (which, as always, are cool). I could basically paste your demo, but I worry about making it quite long.

I agree, that'd be long.
It already mentions some optimizations, so users may assume that it can help in other contexts.

This version emphasizes defining it without going into extensive detail about using it. The balance may not be quite right, though.

It should be usable wherever other AbstractSIMDs are, so ideally not much explanation is needed and focusing on construction is the way to go.

@timholy timholy deleted the teh/doc_vecunroll branch September 24, 2021 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docstring for VecUnroll?
2 participants