
adding x87 long double type support #186

Open
i3roly opened this issue Apr 15, 2024 · 3 comments

@i3roly

i3roly commented Apr 15, 2024

Hi all,

I have recently learned about the long double type, which on x86 processors is the 80-bit x87 extended-precision format, and I am keen to use it.

I have reached out to the Intel Math Kernel Library (MKL) team about adding long double support in a future release, which they agreed to (but without giving an ETA).

In anticipation of that day, I wanted to ask whether you would add this data type to the library.

@afni-rickr, not sure if this is too ugly for your tastes, but given your penchant for things like this, I thought you might prefer it to a less rigorous English description of the format:

https://www.intel.com/content/www/us/en/content-details/786447/floating-point-reference-sheet-for-intel-architecture.html

It is the "extended format" shown in the second-to-last column on the right.
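For reference, the extended format on that sheet has a sign bit $s$, a 15-bit biased exponent $e$ (bias 16383), the explicit integer bit $j$ (the J-bit), and a 63-bit fraction $f$, read here as an integer. Finite values decode roughly as follows (a sketch; pseudo-denormals and the invalid "unnormal" encodings are left out):

$$
x =
\begin{cases}
(-1)^{s}\, 2^{\,e-16383}\left(j + f \cdot 2^{-63}\right), & 1 \le e \le 32766,\; j = 1 \quad \text{(normal)} \\
(-1)^{s}\, 2^{\,1-16383}\left(j + f \cdot 2^{-63}\right), & e = 0,\; j = 0 \quad \text{(zero or denormal)}
\end{cases}
$$

So the significand carries 64 bits ($j$ plus $f$), against 53 for double, and $e = 32767$ is reserved for infinities and NaNs.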

@afni-rickr
Contributor

Hi @i3roly,

Thanks for the link. The extended format has an explicit J-bit (the integer bit of the significand, which is implicit in the 32- and 64-bit formats); it is 0 for denormal values, such as those produced by underflow. I had not seen that before. Anyway...
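To make the J-bit concrete, here is a minimal sketch that pulls the fields out of the 10 data bytes of an extended value and classifies it. It assumes little-endian byte order as stored on x86 (bytes 0-7 are the 64-bit significand with the J-bit on top, bytes 8-9 are sign and exponent); the struct, enum, and function names are hypothetical, not part of nifticlib.

```c
#include <stdint.h>

/* Fields of an x87 80-bit extended value: 1 sign bit, 15 exponent bits,
   1 explicit J-bit, 63 fraction bits. */
struct x87_fields {
    int sign;           /* 1 if negative */
    unsigned exponent;  /* biased exponent, 0..32767 */
    int jbit;           /* explicit integer bit of the significand */
    uint64_t fraction;  /* low 63 bits of the significand */
};

enum x87_kind {
    X87_ZERO, X87_DENORMAL, X87_PSEUDO_DENORMAL,
    X87_NORMAL, X87_UNNORMAL, X87_INF_OR_NAN
};

/* Decode 10 little-endian bytes (as laid out in memory on x86). */
struct x87_fields x87_decode(const unsigned char b[10])
{
    uint64_t significand = 0;
    for (int i = 7; i >= 0; i--)          /* byte 7 is most significant */
        significand = (significand << 8) | b[i];
    unsigned se = (unsigned)b[8] | ((unsigned)b[9] << 8);

    struct x87_fields f;
    f.sign = (int)(se >> 15);
    f.exponent = se & 0x7FFFu;
    f.jbit = (int)(significand >> 63);
    f.fraction = significand & 0x7FFFFFFFFFFFFFFFull;
    return f;
}

/* Rough classification driven by the exponent and the J-bit. */
enum x87_kind x87_classify(struct x87_fields f)
{
    if (f.exponent == 0x7FFFu)
        return X87_INF_OR_NAN;            /* J-bit details ignored here */
    if (f.exponent == 0) {
        if (f.jbit)
            return X87_PSEUDO_DENORMAL;   /* J = 1 with exponent 0 */
        return f.fraction == 0 ? X87_ZERO : X87_DENORMAL;
    }
    /* A cleared J-bit with a nonzero exponent is an "unnormal",
       treated as an invalid operand on the 80387 and later. */
    return f.jbit ? X87_NORMAL : X87_UNNORMAL;
}
```

For example, 1.0 is stored with exponent 0x3FFF and significand 0x8000000000000000, so it decodes with the J-bit set and classifies as normal, while the smallest positive denormal has exponent 0, J-bit 0, and fraction 1.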
After a little reading, this seems a touch unclear. Yes, on x86 CPUs (Intel, for example) long double is actually processed using those 80-bit extended floats. However, with GCC and Clang on x86-64 it is stored in 128 bits (16 bytes), where only the lower 10 bytes hold the actual float and the rest is padding.
Is your goal to save 6/16 of the memory by compacting an array into 10-byte floats? That would be messy to program: note that sizeof(long double[10]) is 160 there, not 100. The library could have a 10-byte (80-bit) type, but all of the work would then be on you to unpack and repack on every read or assignment, which would probably make the code pretty slow.
What would be your purpose(s) in using that type?
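To give a sense of the unpack/repack work described above, here is a minimal sketch that copies the 10 data bytes of each long double into a compact buffer and back. It assumes little-endian x86 with the x87 format, which it checks via LDBL_MANT_DIG and LDBL_MAX_EXP from <float.h>; the function names are hypothetical, not nifticlib API.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Bytes of an x87 extended value that actually carry data. */
#define X87_DATA_BYTES 10

/* 1 if long double on this build looks like the x87 80-bit format
   (64-bit significand, 15-bit exponent). */
static int long_double_is_x87(void)
{
    return LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384 &&
           sizeof(long double) >= X87_DATA_BYTES;
}

/* Copy n long doubles into a compact buffer of n * 10 bytes.
   Assumes little-endian x86, where the value lives in the lowest 10 bytes
   of each (padded) long double. Returns 0 on success, -1 otherwise. */
int pack_x87(const long double *src, size_t n, unsigned char *dst)
{
    if (!long_double_is_x87())
        return -1;
    for (size_t i = 0; i < n; i++)
        memcpy(dst + i * X87_DATA_BYTES, &src[i], X87_DATA_BYTES);
    return 0;
}

/* Expand n compact 10-byte values back into full long doubles. */
int unpack_x87(const unsigned char *src, size_t n, long double *dst)
{
    if (!long_double_is_x87())
        return -1;
    for (size_t i = 0; i < n; i++) {
        memset(&dst[i], 0, sizeof dst[i]);   /* clear the padding bytes */
        memcpy(&dst[i], src + i * X87_DATA_BYTES, X87_DATA_BYTES);
    }
    return 0;
}
```

With 10 values the compact buffer is 100 bytes instead of 160, but every element access now goes through a memcpy, which is the cost that makes compact arrays unattractive.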

@i3roly
Author

i3roly commented Apr 15, 2024

Hi Rick,

Long double should be available on AMD CPUs as well, if I understand this document correctly: https://community.amd.com/sdtpp67534/attachments/sdtpp67534/opencl-discussions/6370/1/FLOATING-POINT%20ARITHMETIC%20IN%20AMD%20PROCESSORS.pdf

All I want is a little more precision. I thought it might not be much work on your end, but it sounds like that isn't the case.

When I reconstruct an image, I currently output 64-bit (double) precision at each voxel.

Since x86 CPUs can compute with 80 bits, and my method would benefit from that in stage 1 of its computation (calculating the reconstruction weights, which summarise the local relationship between a voxel and its neighbours with a single double-precision number per neighbouring voxel), I think it is a good idea to finish the computation using the same type.

In short: I thought it may be beneficial to preserve the precision of the original computation by having this type available. Double precision may well turn out to be enough in the end, but I don't want to lose precision if I can afford not to.
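A small illustration of what finishing in the same type buys, not the author's method: the weighted sum below keeps its running total in long double and rounds to double only once at the end. The inputs are contrived to expose the difference, and the function names are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* 1 if long double carries more significand bits than double
   (64 for x87 extended, 53 for double). */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Weighted sum accumulated entirely in double. */
double weighted_sum_double(const double *w, const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];
    return acc;
}

/* Same sum, but products and the running total stay in long double
   and are rounded to double only once, when returning. */
double weighted_sum_long_double(const double *w, const double *x, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += (long double)w[i] * (long double)x[i];
    return (double)acc;
}
```

With all weights equal to 1 and inputs {1e16, 1, -1e16}, the double version loses the +1 to rounding and returns 0.0 on typical x86-64 builds, while the long double version returns 1.0 wherever long double has at least 64 significand bits.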

Keep in mind this is for the future, so it's not urgent, but I do hope the MKL team will make the entire API long-double friendly by year-end.

@afni-rickr
Contributor

afni-rickr commented Apr 17, 2024

My interpretation is that the difficulty would be more on your end, unless you simply waste the 6/16 of memory that long doubles leave untouched. Wasting that memory would make the coding much simpler: with GCC and Clang on x86-64, sizeof(long double) is 16, so the compiler reads and writes 128 bits per value even though only 80 hold data. If you do not attempt to squeeze memory, it should be valid to store 128 bits per value and process them as "long double". Only the lowest 80 bits of each number would be used, wasting the rest, but it should give the desired increase in precision.

And note that this would not even require a nifticlib update. Just store the data as FLOAT128 with nbyper 16, and process the received data as long double on your end. Essentially, you treat each long double as a 16-byte value. If you try that, first verify it with some simple examples. But that is how it seems to me.
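A minimal sketch of such a check, assuming a raw voxel buffer with 16 bytes per value. In nifticlib that buffer would presumably be nim->data with the datatype set to NIFTI_TYPE_FLOAT128, but the sketch deliberately avoids nifticlib calls, and the helper names are hypothetical.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* 1 if long double is the x87 80-bit format held in 16-byte slots,
   i.e. the case where each 16-byte FLOAT128 voxel can be read directly
   as a long double (GCC/Clang on x86-64). Returns 0 elsewhere,
   e.g. with 12-byte slots on 32-bit x86. */
int long_double_fits_float128_slot(void)
{
    return sizeof(long double) == 16 && LDBL_MANT_DIG == 64;
}

/* Read voxel i from a raw buffer with 16 bytes per voxel (nbyper 16).
   Only meaningful when long_double_fits_float128_slot() returns 1. */
long double float128_voxel_as_long_double(const void *data, size_t i)
{
    long double v;
    memcpy(&v, (const unsigned char *)data + 16 * i, sizeof v);
    return v;
}

/* Store v into voxel i of such a buffer. */
void long_double_to_float128_voxel(void *data, size_t i, long double v)
{
    memcpy((unsigned char *)data + 16 * i, &v, sizeof v);
}
```

Using memcpy rather than casting the buffer pointer sidesteps alignment and aliasing concerns if the voxel data comes from an arbitrary byte buffer.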
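Note also that if you move a dataset from one CPU, compiler, or platform to another, this could become unreliable, depending on how each handles long double: for example, MSVC treats long double as a 64-bit double, and AArch64 Linux uses a 128-bit IEEE quad format. *Danger, Will Robinson!*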
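One hedge against that, sketched here as an assumption rather than anything nifticlib provides: decode the x87 bytes explicitly, so that data written on x86 can at least be read back as double on any host, whatever long double means there. It assumes each value's 10 data bytes (the lowest 10 of each 16-byte slot) are available in little-endian order; the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Convert the 10 data bytes of a little-endian x87 extended value to a
   double on any host. Precision drops from 64 to 53 significand bits,
   magnitudes outside double's range become +/-inf or 0, and NaN payloads
   are not preserved. */
double x87_bytes_to_double(const unsigned char b[10])
{
    uint64_t significand = 0;
    for (int i = 7; i >= 0; i--)          /* byte 7 is most significant */
        significand = (significand << 8) | b[i];
    unsigned se = (unsigned)b[8] | ((unsigned)b[9] << 8);
    int negative = (se >> 15) != 0;
    int exponent = (int)(se & 0x7FFFu);

    double mag;
    if (exponent == 0x7FFF) {
        /* Fraction bits (below the J-bit) zero means infinity. */
        mag = (significand << 1) == 0 ? INFINITY : NAN;
    } else {
        /* value = significand * 2^(e - 16383 - 63); denormals use e = 1 */
        int e = (exponent == 0) ? 1 : exponent;
        mag = ldexp((double)significand, e - 16383 - 63);
    }
    return negative ? -mag : mag;
}
```

Going through an explicit decoder costs a little speed, but it makes the on-disk bytes, rather than the reading machine's long double, the source of truth.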
