-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Division by zero in Probability\Distribution\Continuous\Continuous #429
Comments
Hi, Thanks for your interest in MathPHP. Let's see if we can figure out what is going on. Can you provide the values for Thanks, |
Sure thing! $points = [[5,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1]];
$x = 5.0;
$regression = new MathPHP\Statistics\Regression\Linear($points);
$regression->ci($x, 0.05); That |
Turns out |
What a weird dataset. The slope in this inverse function can only be zero on a T distribution when x is positive or negative infinity‽ I’ll have to think about the best way to handle this, probably throw an exception. The fact that your dataset is basically one data point, repeated many many times, makes t very large (or very small) at any other x value.
edit: this is all wrong, the confidence interval within reasonable values is finite and reasonable. I didn’t see that there were some (8,0) points mixed in. |
What value of X are you attempting to evaluate the confidence interval at? |
R code
|
I was just testing it at 5, but maybe you were using some really large or small number. If you were using 1E100 I would offer different advice than if you used 5.
At a bare minimum this function should return an exception when the slope is zero, this would allow you to catch it and make a decision, like not plotting the line, or failing gracefully. I have not testing the library yet so I don’t know where the failure is actually happening. |
This unit test will reproduce the issue. public function testBugIssue429()
{
// Given
$points = [[5,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1]];
$x = 5.0;
// And
$regression = new Linear($points);
// When
$ci = $regression->ci($x, 0.05);
} It produces this stack trace:
The division by zero happens when computing the guess in the Continues while ($dif > $tolerance) {
$y = $this->cdf($guess);
// Since the CDF is the integral of the PDF, the PDF is the derivative of the CDF
$slope = $this->pdf($guess);
$del_y = $target - $y;
$guess = $del_y / $slope + $guess; // <---- this is where the division by zero happens With the test inputs, these are the values at the time of the crash:
|
So the question becomes, why is the pdf 0 at t=0? |
The issue looks like it is in the StudentT distribution's PDF. This unit test using the data from this bug currently fails, producing 0 instead of the expected result. public function testBugIssue429StudentTPdf()
{
// Given
$v = 341;
$studentT = new StudentT($v);
// When
$t = 0;
$pdf = $studentT->pdf($t);
// Then
$expected = 0.3986499;
$this->assertEqualsWithDelta($expected, $pdf, 0.001);
}
R code that generated the test data result: > library(stats)
> v <- 341
> t <- 0
> dt(t, v)
[1] 0.3986499 This online calculator also confirms the result: https://keisan.casio.com/exec/system/1180573203 The StudentT's PDF is calculated using the gamma special function. The denominator is calculating gamma(341/2) which is producing infinity, which is where the 0 value seems to be coming from. For the same value, R produces an extremely large number, which for all intents and purposes, is the same as infinity. > gamma(341/2)
[1] 5.562092e+305 Is there an alternate way to calculate the StudentT PDF? I'm not familiar with this or gamma, but numbers this high are basically at the limit of what PHP can handle on a 64-bit machine: php > echo \PHP_INT_MAX;
9223372036854775807
php > echo \PHP_FLOAT_MAX;
1.7976931348623E+308 @Beakerboy when you have time can you investigate this further? Seems like StudentT PDF is supposed to be able to produce a result here, and maybe being smarter with calculating and applying the gamma function arithmetic, or calculating it some other way, might lead to a better result? Regardless, the StudentT PDF unit tests don't have any test data with values over 10, so maybe that would be a good place to start: add some higher values and see where it breaks down. I'll add some additional unit tests and see how high it can go and still get reasonable results matching R. |
I added additional test cases in the develop branch for the StudentT |
In theory, at this height of degrees of freedom the standard normal could probably be used instead of T with minimal loss of precision. I’ll see how R does their T calculations. |
R Source for dt function: |
Looks like I didn't push the new test cases. Here they are for reference. /**
* @return array [t, ν, pdf]
* Generated with R dt(t, ν) from package stats
*/
public function dataProviderForPdf(): array
{
return [
[-4, 1, 0.01872411],
[-3, 1, 0.03183099],
[-2, 1, 0.06366198],
[-1, 1, 0.1591549],
[0, 1, 0.3183099],
[1, 1, 0.1591549],
[2, 1, 0.06366198],
[3, 1, 0.03183099],
[4, 1, 0.01872411],
[5, 1, 0.01224269],
[10, 1, 0.003151583],
[-4, 2, 0.01309457],
[-3, 2, 0.02741012],
[-2, 2, 0.06804138],
[-1, 2, 0.1924501],
[0, 2, 0.3535534],
[1, 2, 0.1924501],
[2, 2, 0.06804138],
[3, 2, 0.02741012],
[4, 2, 0.01309457],
[5, 2, 0.007127781],
[10, 2, 0.0009707329],
[-4, 6, 0.004054578],
[-3, 6, 0.01549193],
[-2, 6, 0.06403612],
[-1, 6, 0.2231423],
[0, 6, 0.3827328],
[1, 6, 0.2231423],
[2, 6, 0.06403612],
[3, 6, 0.01549193],
[4, 6, 0.004054578],
[5, 6, 0.001220841],
[10, 6, 1.651408e-05],
[-10, 10, 7.284686e-07],
[-5, 10, 0.0003960011],
[-2, 10, 0.06114577],
[-1, 10, 0.230362],
[0, 10, 0.3891084],
[1, 10, 0.230362],
[2, 10, 0.06114577],
[5, 10, 0.0003960011],
[10, 10, 7.284686e-07],
[-10, 20, 2.660085e-09],
[-5, 20, 7.898911e-05],
[-2, 20, 0.05808722],
[-1, 20, 0.2360456],
[0, 20, 0.3939886],
[1, 20, 0.2360456],
[2, 20, 0.05808722],
[5, 20, 7.898911e-05],
[-10, 20, 2.660085e-09],
[0, 50, 0.3969527],
[1, 50, 0.2395711],
[5, 50, 1.283547e-05],
[0, 100, 0.3979462],
[1, 100, 0.2407659],
[5, 100, 5.080058e-06],
[0, 200, 0.3984439],
[1, 200, 0.2413671],
[5, 200, 2.88097e-06],
[0, 250, 0.3985435],
[1, 250, 0.2414876],
[5, 250, 2.545035e-06],
[0, 300, 0.39861],
[1, 300, 0.241568],
[5, 300, 2.338036e-06],
[0, 301, 0.3986111],
// Anything past 302 for v and INF comes back.
// [0, 302, 0.3986122],
// [0, 303, 0.3986133],
// [0, 305, 0.3986154],
// [0, 310, 0.3986207],
//
// [0, 320, 0.3986307],
// [1, 320, 0.2415931],
// [5, 320, 2.276028e-06],
//
// [5, 320, 2.276028e-06],
// [5, 325, 2.261896e-06],
// [5, 330, 2.248256e-06],
// [5, 331, 2.245584e-06],
// [5, 332, 2.242931e-06],
// [5, 333, 2.240297e-06],
// [5, 334, 2.23768e-06],
// [5, 335, 2.235082e-06],
//
// [0, 341, 0.3986499],
// [0, 351, 0.3986582],
// [0, 361, 0.3986661],
// [0, 371, 0.3986735],
// [0, 381, 0.3986806],
// [0, 400, 0.398693],
];
} |
A pull request has been submitted which should fix this issue. It needs to be cleaned up to clarify what it is doing, but it should work for you while we make it pretty. |
Thanks for providing a fix so quickly. The StudentT PDF is now always producing answers for any input, which is a great improvement. I have a question, in @drjayvee's original input, it was previously crashing, and no longer crashes, which is great. However, what is the expected result? public function testBugIssue429()
{
// Given
$points = [[5,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,0],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1],[8,1]];
$x = 5.0;
// And
$regression = new Linear($points);
// When
$ci = $regression->ci($x, 0.05);
// Then
// ??
} Right now the confidence interval produces NAN, and it looks like the NAN comes from the CDF computation in the inverse. Is this the expected result? It's certainly better than crashing, but I wasn't sure if this was the expected result for that dataset or not. Thoughts? |
I have r code above that should give the correct result. I’ll probably just port the cdf and inverse functions from the C sources that R uses as well. I think I have some idea of what the functions are doing after reading the source paper. I found a couple typos though which makes it harder. |
will need to:
|
This has been fixed in Version 2.5.0. You should not get any more division by zero errors. Many thanks to @Beakerboy for numerous improvements in the special functions that drive these distributions. |
Thanks a bunch! I'm upgrading as we speak. |
Contiuous.php:39 can trigger a division by zero Warning.
Specifically, this happens in our project with the following call
(new MathPHP\Statistics\Regression\Linear($points))->ci($x, 0.05);
.Call stack:
I'm sorry I can't contribute a patch for this. There's a reason we're using a math library ;p
The text was updated successfully, but these errors were encountered: