Docstring for VecUnroll? #71
In case it helps, I pushed JuliaImages/ImageFiltering.jl#229 so you can see where I was going.
A
For example:
julia> using VectorizationBase
julia> x = [(R = rand(Float32), G = rand(Float32), B = rand(Float32)) for _ in 1:100];
julia> x[1:4]
4-element Vector{NamedTuple{(:R, :G, :B), Tuple{Float32, Float32, Float32}}}:
(R = 0.26760215, G = 0.71913034, B = 0.21303588)
(R = 0.23587197, G = 0.13380599, B = 0.32123268)
(R = 0.15133655, G = 0.95029974, B = 0.515872)
(R = 0.14430124, G = 0.5951618, B = 0.30713272)
julia> vload(stridedpointer(reinterpret(reshape, Float32, x)), Unroll{1,1,3,2,Int(pick_vector_width(Float32)),zero(UInt),1}((1,1)))
3 x Vec{16, Float32}
Vec{16, Float32}<0.26760215f0, 0.23587197f0, 0.15133655f0, 0.14430124f0, 0.96873146f0, 0.87005985f0, 0.77632064f0, 0.7556457f0, 0.5719081f0, 0.9102745f0, 0.857614f0, 0.77330655f0, 0.492265f0, 0.38813847f0, 0.12650585f0, 0.6479362f0>
Vec{16, Float32}<0.71913034f0, 0.13380599f0, 0.95029974f0, 0.5951618f0, 0.84525603f0, 0.91485614f0, 0.598403f0, 0.8754583f0, 0.05103159f0, 0.08334786f0, 0.669178f0, 0.07402092f0, 0.47366828f0, 0.9429586f0, 0.8576969f0, 0.8917303f0>
Vec{16, Float32}<0.21303588f0, 0.32123268f0, 0.515872f0, 0.30713272f0, 0.13002992f0, 0.33429003f0, 0.35313886f0, 0.76046395f0, 0.7992017f0, 0.8299498f0, 0.6705442f0, 0.674591f0, 0.6830538f0, 0.03794396f0, 0.21829528f0, 0.8817978f0>
julia> typeof(ans)
VecUnroll{2, 16, Float32, Vec{16, Float32}}
So the In this case, the first 4 elements of the first of these
0.26760215f0, 0.23587197f0, 0.15133655f0, 0.14430124f0
Matching the first 4 This is going to be faster than 3 loads, because 3 independent loads would require 3 gather instructions, because the I'm emphasizing this here as it is probably an important use case if you're using arrays of structs (e.g. arrays of colors) instead of structs of arrays.
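The memory layout behind this can be inspected with Base Julia alone. The following is a minimal sketch, not VectorizationBase's implementation; the names `px`, `A`, and `flat` are made up for illustration, and the pixel values are chosen so that the position in memory is easy to read off.

```julia
# Array of structs: each element stores its R, G, B fields next to each other.
px = [(R = Float32(3i - 2), G = Float32(3i - 1), B = Float32(3i)) for i in 1:8]

# View the same memory as a 3 × 8 matrix of Float32 (needs Julia 1.6 or later).
A = reinterpret(reshape, Float32, px)
size(A)   # channels along dimension 1, pixels along dimension 2

# Column-major flattening gives the order R1, G1, B1, R2, G2, B2, ...
flat = vec(collect(A))

# Consecutive R values sit 3 elements apart in memory, not next to each other.
r_from_stride = flat[1:3:end]
r_from_fields = [p.R for p in px]
r_from_stride == r_from_fields
```

Because each channel is strided rather than contiguous, loading one channel on its own cannot be a plain contiguous vector load, which is the situation the single unrolled load above is meant to handle.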
It can be used a little like one, e.g.:
julia> VecUnroll((1.0, 2.0, 3.0, 4.0))
4 x Float64
1.0
2.0
3.0
4.0
julia> abs2(ans)
4 x Float64
1.0
4.0
9.0
16.0
julia> typeof(ans)
VecUnroll{3, 1, Float64, Float64}
They're not
julia> x = VecUnroll((1.0, 2.0, 3.0, 4.0));
julia> @btime exp($(Ref(x))[])
5.317 ns (0 allocations: 0 bytes)
4 x Float64
2.7182818284590455
7.3890560989306495
20.085536923187668
54.59815003314424
julia> t = (1.0, 2.0, 3.0, 4.0);
julia> @btime exp.($(Ref(t))[])
19.657 ns (0 allocations: 0 bytes)
(2.718281828459045, 7.38905609893065, 20.085536923187668, 54.598150033144236)
I'd have to look more at the PR, but are you wanting LoopVectorization to support I'm still (very slowly) working on rewriting LoopVectorization. Conceivably, one could add specific support for
As background, we have two key abstractions that, in my opinion, make JuliaImages the nicest platform for writing generic image processing code (we aim to unify biomedical imaging and computer vision, whereas the vast majority of suites are firmly in one camp or another). Since I suspect I may start pestering you a lot, perhaps it makes sense to spend a little bit of time dragging you through a brief introduction/motivation. Before introducing the two key abstractions, let me acknowledge that they are centered on behavior rather than representation, and do not in and of themselves get in the way of the array-of-structs vs struct-of-arrays issue. So this is not an argument against that viewpoint. With that background, the two key abstractions are:
The first of these is important for both correctness and generalizability:
The second abstraction allows us to divorce meaning from representation. In most suites, "white" is 255 if you're using UInt8 and 1.0 if you're using These are very simple, low-level abstractions, and they enable us to write a lot of code that is flexible, correct, and short. But they both focus on getting away from native hardware types and hence pose a challenge to a package like LoopVectorization. So let me now describe a couple of the things that might be needed to build the bridge:
If you're writing a lot of color functions, the easiest thing may be to write a transform like you did for
@turbo for i in eachindex(imga)
    r = imga.r[i]
    g = imga.g[i]
    b = imga.b[i]
    # do something with r, g, b
end
and it isn't obvious to me that this has a real advantage over
imga = reinterpret(reshape, Float32, img)
for i in eachindex(img) # should work and effectively drop trailing dims
    r = imga[1,i]
    g = imga[2,i]
    b = imga[3,i]
    # do something with r, g, b
end
in terms of convenience. But there are potential performance differences based on memory layout. If The latter, assuming the underlying memory is an array of structs, is reasonably efficient: the data will be loaded/stored in contiguous chunks (fast), and then shuffled into separate vectors of Of course, doing the above manually means also manually scaling the kernels because you're still working with native types...
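As a rough illustration of the two loop styles, here is a sketch in Base Julia with plain `for` loops (no `@turbo`). The pixel data, the `soa` container, and the channel average are all hypothetical stand-ins for "do something with r, g, b"; named tuples are used in place of a real RGB type.

```julia
# Hypothetical pixel data stored as an array of structs.
img = [(r = rand(Float32), g = rand(Float32), b = rand(Float32)) for _ in 1:16]

# Style 1: struct of arrays, one contiguous vector per channel.
soa = (r = [p.r for p in img], g = [p.g for p in img], b = [p.b for p in img])
out1 = similar(soa.r)
for i in eachindex(soa.r)
    out1[i] = (soa.r[i] + soa.g[i] + soa.b[i]) / 3
end

# Style 2: keep the array of structs and view it as a 3 × N Float32 matrix.
imga = reinterpret(reshape, Float32, img)
out2 = Vector{Float32}(undef, length(img))
for i in eachindex(img)
    out2[i] = (imga[1, i] + imga[2, i] + imga[3, i]) / 3
end

# Same arithmetic on the same values, so the results agree.
out1 == out2
```

The two loops are equally short to write; the difference is only in how the channel values are laid out in memory and therefore how they can be loaded.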
Yeah, we can add an array dimension and make corresponding adjustments to other inputs. It's a little less pretty, but the performance advantages are compelling. Moreover, the main benefit from our abstractions is "communication about intent with the user," and once that has been achieved we can afford a bit of less-than-pretty specialization. Thanks for the consultation! In gratitude/fair exchange, an attempted docstring for
I'm looking into providing support for multichannel colors in ImageFiltering. As you may know, JuliaImages provides real RGB types that encode the color of a pixel without adding an array dimension to do it. Naturally, these aren't natively supported by VectorizationBase. Obviously, I can reinterpret(reshape, Float32 #=or whatever=#, img), but everything gets a lot uglier if you have to add array dimensions. I am guessing that VecUnroll is kind of like an SVector, is that right? If so, what do the parameters "mean"? Or if that's not the case, is there a good solution for supporting the equivalent of an NTuple{N,T} where T<:NativeTypes?