-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simple 3- and 4-arg *
methods
#919
base: master
Are you sure you want to change the base?
Conversation
I think ideally 3- and 4-arg methods of |
src/matrix_multiply.jl
Outdated
@@ -15,11 +15,15 @@ import LinearAlgebra: BlasFloat, matprod, mul! | |||
@inline *(A::StaticArray{Tuple{N,1},<:Any,2}, B::Adjoint{<:Any,<:StaticVector}) where {N} = vec(A) * B | |||
@inline *(A::StaticArray{Tuple{N,1},<:Any,2}, B::Transpose{<:Any,<:StaticVector}) where {N} = vec(A) * B | |||
|
|||
# Avoid LinearAlgebra._quad_matmul's order calculation on equal sizes | |||
@inline *(A::StaticMatrix{N,N}, B::StaticMatrix{N,N}, C::StaticMatrix{N,N}) where {N} = (A*B)*C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also use StaticMatMulLike
here to be a bit more generic, although currently StaticMatMulLike
doesn't swap sizes for adjoints and transposes so that would have to be fixed when working on the cost model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that StaticArrays.StaticMatMulLike{3,10}
is a union of three different sizes, is that deliberate or an oversight? Seems like it really ought to be StaticSquareLike{3}
for the all the square ones, and StaticMatLike{3,10}
allowing non-square but always size(A) == (3,10)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately I ended up not using the size part of StaticMatMulLike
so the fact that s1
and s2
don't correspond to sizes is essentially an oversight.
BTW, I think you should reduce the cost of multiplying by diagonal matrices (and probably also things like bidiagonal or tridiagonal) in that PR to base. |
The base PR doesn't do anything for special matrices, since they are all square. But the cost of checking their sizes unnecessarily is usually negligible -- only tiny SMatrices are fast enough that it matters, which is the case this PR wants to fix. I agree that things like Diagonal * Diagonal * Diagonal could be further optimised, there's quite a large set of possibilities and I haven't given any thought to how that might work. I didn't know about this, but For non-square matrices, I agree you could pull the cost calculation into a generated function. I think I'm going to leave that for another PR, or for someone else if they need it. This one just wants to avoid slowing down a common case. |
I'm a bit surprised the cost computation isn't compiled away, actually, given that sizes of SMatrices are known at compile time and that this is all that is required to compute the costs. |
Yes, right, optimization of computation of products of two small matrices was complicated enough, and developing a reliable set of heuristics for products of more matrices would be even harder. Anyway, it's indeed interesting that constant propagation didn't manage to simplify size checks here. |
I was also surprised it didn't get compiled away. Maybe it just needs more things inlined, if I try Even with that change, the re-ordering doesn't always speed up SMatrix chains. Here's my test, sometimes one is faster, sometimes the other. Unless the cost calculation is taking as long as the matrix multiplication, it seems that this shows that the optimal order isn't in fact optimal -- so a different cost model would be better for SMatrices?
|
So maybe let's no worry too much now about reordering of multiplication for static matrices? Just eliminating run-time size check would definitely be enough.
That indicates that the overhead may be just from the single additional call, hard to tell without looking at |
The
Whether the existing size check is a good idea overall or not will depend on the distribution of non-square cases in "overall". My |
This PR repairs a small slowdown due to JuliaLang/julia#37898. When all matrices are the same size, that wastes a few ns thinking about their sizes, while this PR restores the left-to-right behaviour:
When the matrices are of different sizes, things aren't so clear-cut. Sometimes the re-ordering does speed things up, sometimes it doesn't.