-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AP_Math: prevent FPE in SITL when limitting accel, vectors #28981
base: master
Are you sure you want to change the base?
Conversation
Shouldn't be this done in safe_sqrt ? I would assume that we have some other place that can have this issue |
Well... we really want to know about where this actually does happen. This PR isn't really a satisfactory solution to what we're seeing IMO - it would be nice to trace back the maths to work out how the rhs ens up larger than the lhs in here. #28969 is the issue if you missed it |
I made a Geogebra to visualise the maths. https://www.geogebra.org/calculator/ktxy2uyf I don't see any issue with the maths. Must be floating point stuff. If I were to guess it would be Maybe we need a new method |
This comment was marked as outdated.
This comment was marked as outdated.
Pete is close, the problem is imprecision in The vector is
I'm not sure how to guarantee the vector length quantity is actually <= the limit after the math in |
Doing some research and fiddling suggests this is the correct solution. We need to audit other uses of It's not feasible due to the vagaries of floating point math to guarantee it's always equal or under. |
libraries/AP_Math/control.cpp
Outdated
@@ -387,7 +387,7 @@ bool limit_accel_xy(const Vector2f& vel, Vector2f& accel, float accel_max) | |||
if (accel_cross.limit_length(accel_max)) { | |||
accel_dir = 0.0; | |||
} else { | |||
float accel_max_dir = safe_sqrt(sq(accel_max) - accel_cross.length_squared()); | |||
float accel_max_dir = safe_sqrt(MAX(0, sq(accel_max) - accel_cross.length_squared())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float accel_max_dir = safe_sqrt(MAX(0, sq(accel_max) - accel_cross.length_squared())); | |
// limit_length can't absolutely guarantee this subtraction won't be slightly negative | |
float accel_max_dir = safe_sqrt(MAX(0, sq(accel_max) - accel_cross.length_squared())); |
We could limit to the smaller of the the existing and the squared so we always conservative. On the whole I tend to think know we know what is going on we probably don't need to panic. I'd even be tempted not to fix this case so we can see if a recent change had made this more likely. We do have a test for the function, maybe we should add some cases there. ardupilot/libraries/AP_Math/tests/test_vector2.cpp Lines 125 to 131 in 5726c03
|
Closes #28969 |
I don't actually think And it has happened again here: https://github.com/ArduPilot/ardupilot/actions/runs/12572260649/job/35044375709?pr=28857 |
Co-authored-by: Leonard Hall <[email protected]> the cross-product code can produce something slightly negative. Stop passing that into safe_sqrt; it isn't *that* safe on SITL!
in this corner case we *do* need to scale the vector
this can avoid problems where floating point precision means that post-square-rooting the lengths are considered equal. Subtracting the squares after making such a check can lead to attempting to take the square root of a negative number
6d764d3
to
a73e086
Compare
I've written a test for this now so other people can play with things in gdb easily. I've also pushed up a patch reverting the original fix and proposing an alternative fix. Basically stop using the apples comparison to determine if we do the oranges maths - use a single fruit in both cases. |
Well, that's coincidence... third is presumed to be enemy action... I've had a quick look through the stack and code changes and haven't found any smoking gun. @robertlong13 changed things up in this test a bit recently be adding in the ICE frames. Is it possible that the slightly different mission is causing this? 3982d57 |
So this both fixes the problem and doesn't. The problem is fixed because we won't execute a square root with a negative number; we now trust that the limit function says it wanted to limit the vector and then set But it doesn't fix the problem because the vector is still in some sense too long. Calling I think adding a The current fix is sort of worse because now
Yes, I expect so. It just happens to sometimes hit this exact point now. Here's the enemy action: https://github.com/ArduPilot/ardupilot/actions/runs/12581405966/job/35065761383?pr=28991 . The numbers are again identical. |
Another failure: https://github.com/ArduPilot/ardupilot/actions/runs/12591859248/job/35097592628?pr=28996 Going to try to replicate locally and see if that mission change was the instigator. Still prefer the |
I was able to replicate it (running only the FlyEachFrame test and quadplane-tilt frame) but sometimes it took 50-60 tries and sometimes it was one or two. It would be hard to say what particular commit did it. I imagine it's kind of dependent on system load and stuff too. |
Now it's infected copter: https://github.com/ArduPilot/ardupilot/actions/runs/12616584539/job/35158079748?pr=28857
|
We're probably going to need fire by the end of this. |
Co-authored-by: Leonard Hall [email protected]
the cross-product code can produce something slightly negative. Stop passing that into safe_sqrt; it isn't that safe on SITL!