Support for CUDA (New Subpackage) #156
Conversation
Codecov Report
@@ Coverage Diff @@
## master #156 +/- ##
==========================================
- Coverage 90.00% 89.81% -0.19%
==========================================
Files 8 8
Lines 220 216 -4
==========================================
- Hits 198 194 -4
Misses 22 22
Continue to review full report at Codecov.
I believe it's better to leave this to @timholy (I guess you're in the same lab?), so I'll just leave some general comments here.
On the code design: because it's very likely that we won't have CUDA support for all methods for a long time (will we?), I think a better way to design the code is to opt in to CUDA support only when we know the interpolation method has CUDA support, and otherwise fall back to the CPU. We could have a private cuwarp! function and then call cuwarp! from warp! when certain conditions are met (i.e., when we do support CUDA).
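The opt-in pattern described above could be sketched roughly as follows. All names here (`has_cuda_impl`, `cuwarp!`, `_cpu_warp!`) are hypothetical illustrations, not the package's actual API; this is just a sketch of dispatching to a CUDA path only for supported combinations:

```julia
using CUDA, Interpolations  # assumed dependencies

# Trait: does this array/interpolation combination have a CUDA implementation?
# Default to `false` so every method falls back to the CPU path.
has_cuda_impl(::AbstractArray, itp) = false
# Hypothetical opt-in for linear B-spline interpolation on CuArrays:
has_cuda_impl(::CuArray, ::BSpline{Linear}) = true

function warp!(out, img, tform, itp)
    if out isa CuArray && has_cuda_impl(out, itp)
        return cuwarp!(out, img, tform, itp)   # private CUDA path
    end
    return _cpu_warp!(out, img, tform, itp)    # generic CPU fallback
end
```

New interpolation methods would gain GPU support simply by adding a `has_cuda_impl` method and the corresponding `cuwarp!` kernel, without touching the public `warp!` entry point.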
On performance, I'd be very interested to see whether this matches or beats the NNlibCUDA upsampling kernels: https://github.com/FluxML/NNlibCUDA.jl/blob/da73f077371a5f85660f6e7620099d74ddd11b6e/src/upsample.jl#L152-L194
img_inds = map(out->out.I, CartesianIndices(axes(out.parent)))
tform_offset = offset_calc(out, in_offsets, tform)
tformindex = CuArray(tform_offset.(SVector.(img_inds)))
This calculates all image indices on the CPU in a single thread and then copies them to the GPU. I believe this is unnecessarily inefficient -- I think we need to figure out how to generate the target indices directly on the GPU, perhaps by writing a small CUDA kernel for this.
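One way to avoid the CPU round-trip, sketched below, is to broadcast over `CartesianIndices` into a preallocated `CuArray`: the index range is lazy and isbits, so CUDA.jl can evaluate it inside a single fused GPU kernel rather than materializing every index on the host. `tform_offset` follows the snippet above, and `SVector{2,Float64}` assumes a 2-D transform; this is illustrative, not the PR's actual implementation:

```julia
using CUDA, StaticArrays

# Allocate the destination for the transformed indices directly on the GPU.
tformindex = CuArray{SVector{2,Float64}}(undef, size(out))

# Broadcasting over CartesianIndices is lazy: each GPU thread computes its own
# index, converts it to an SVector, and applies the transform -- no CPU-side
# array of indices and no host-to-device copy of the inputs.
tformindex .= tform_offset.(SVector.(Tuple.(CartesianIndices(tformindex))))
```

A handwritten CUDA kernel (via `@cuda`) would allow further fusion with the interpolation step itself, but the broadcast form is a simple first cut.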
FWIW, we agree this needs additional work but it is not a high priority, as it turns out that warping is not a real bottleneck in the analysis we were planning to optimize. To be continued later. In the meantime, let's leave this open in case it inspires someone else.
While working on improving the speed of my image-registration analysis, I noticed that ImageTransformations doesn't currently support GPU computing with CuArrays. I've therefore made a first attempt at a GPU version of ImageTransformations.warp that supports CuArrays (optionally wrapped in OffsetArrays) and defaults to linear B-spline interpolation.
I'm still working on getting this to use the actual Interpolations package to support more functionality, and my test suite is still quite incomplete. I'm also not sure whether this works on grayscale or color images yet. Any help or feedback would be greatly appreciated.