The way we handle certain TODO-ed tests in our test suite is IMO sub-optimal.
Consider t/run/todo.t. If I build an unthreaded perl on Linux (e.g., on Ubuntu 24.04 LTS) and run that program through the harness, I get:
```
$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -
ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
TODO passed: 3-4
Files=1, Tests=6, 0 wallclock secs ( 0.00 usr 0.00 sys + 0.02 cusr 0.01 csys = 0.03 CPU)
Result: PASS
```
4 out of 6 unit tests were marked TODO, but 2 of those tests were reported as TODO passed. I looked at that result and thought, "We should un-TODO those two tests." (Indeed, test 6 above had once been a TODO-ed test.) That led me to spend several hours preparing #22862. As part of the preparation on that p.r., I successfully tested my branch on an unthreaded build on Linux and a threaded build on FreeBSD. However, once I created the p.r., it failed many of its test runs in our GH CI setup. (See https://github.com/Perl/perl5/pull/22862/checks.) Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback, which, after several more hours of work, led me back to t/run/todo.t, only this time run on a -DDEBUGGING build.
```
$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -
ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6, 2 wallclock secs ( 0.00 usr 0.00 sys + 0.02 cusr 0.01 csys = 0.03 CPU)
Result: PASS
```
4 out of 6 unit tests remain TODO-ed, but each of them actually FAILs. The file as a whole PASSes because the failing tests have been TODO-ed. No tests are reported as TODO passed.
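To make the harness's bookkeeping concrete, here is a rough sketch in Python (not Perl, and not Test::Harness's actual code) of how the two transcripts above end up classified. It assumes a simplified TAP grammar: a numbered `ok`/`not ok` line whose description may end in a `# TODO` directive, where an escaped `\#` (as in test 1) belongs to the description rather than starting a directive. The function `classify` and the lists `NON_DEBUGGING` and `DEBUGGING` are hypothetical names; the sample lines themselves are copied from the output above.

```python
import re

# Simplified TAP test line (assumption: every test line is numbered, as in
# the run/todo.t transcripts). The harness's trailing bare "ok" is its
# per-file verdict, not a TAP test line, so it deliberately does not match.
TEST_LINE = re.compile(r'^(not )?ok (\d+)(.*)$')

# A TODO directive starts at an unescaped '#'; '\#' is a literal hash
# inside the description (e.g. "GH \#16894").
TODO_DIRECTIVE = re.compile(r'(?<!\\)#\s*TODO\b', re.IGNORECASE)


def classify(tap_lines):
    """Return (failed, todo_passed, todo_failed) lists of test numbers."""
    failed, todo_passed, todo_failed = [], [], []
    for line in tap_lines:
        m = TEST_LINE.match(line)
        if m is None:
            continue  # not a numbered test line
        passed = m.group(1) is None
        number = int(m.group(2))
        if TODO_DIRECTIVE.search(m.group(3)):
            # TODO tests never count as failures; a passing one is "TODO passed".
            (todo_passed if passed else todo_failed).append(number)
        elif not passed:
            failed.append(number)
    return failed, todo_passed, todo_failed


# TAP lines copied from the two transcripts above.
NON_DEBUGGING = [
    r'ok 1 - GH \#16894',
    r'not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250',
    'ok 3 - No assertion failure # TODO GH 16876',
    'ok 4 - No assertion failure # TODO GH 16952',
    'not ok 5 - No assertion failure # TODO GH 16971',
    'ok 6 - No assertion failure',
]
DEBUGGING = [
    r'ok 1 - GH \#16894',
    r'not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250',
    'not ok 3 - No assertion failure # TODO GH 16876',
    'not ok 4 - No assertion failure # TODO GH 16952',
    'not ok 5 - No assertion failure # TODO GH 16971',
    'ok 6 - No assertion failure',
]

for name, lines in (("non-DEBUGGING", NON_DEBUGGING), ("DEBUGGING", DEBUGGING)):
    failed, todo_passed, todo_failed = classify(lines)
    print(f"{name}: failed={failed} todo_passed={todo_passed} todo_failed={todo_failed}")
# prints:
# non-DEBUGGING: failed=[] todo_passed=[3, 4] todo_failed=[2, 5]
# DEBUGGING: failed=[] todo_passed=[] todo_failed=[2, 3, 4, 5]
```

Run against both transcripts, tests 3 and 4 move from "TODO passed" to ordinary TODO failures, while the file passes either way. Nothing in the TAP lines themselves records that the outcome depends on how perl was configured.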
Now, you have to be fairly familiar with our test suite to recognize that if a test has "No assertion failure" in its description (label), its PASS/FAIL status on -DDEBUGGING builds is ... (how shall we put it?) ... unresolved, because the assertions in question are only enabled in -DDEBUGGING builds. Such a unit test cannot really be un-TODO-ed until its code passes on both non-debugging and debugging builds. But if someone sees such tests reported as TODO passed, they are likely to expend considerable effort (as I did this weekend) un-TODO-ing them prematurely. Many TODO-ed tests are, of course, classified as such because they fail on both non-debugging and debugging builds. But how should we indicate that a particular unit test may not be ready to be un-TODO-ed even if, on some builds, it is reported as TODO passed?