Suboptimal handling of TODO-ed tests #22863

jkeenan · 2024-12-15T22:24:23Z

The way we handle certain TODO-ed tests in our test suite is, IMO, suboptimal.

Consider t/run/todo.t. If I build an unthreaded perl on Linux (e.g., on Ubuntu 24.04 LTS) and run that program through the harness, I get:

$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.

Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
  TODO passed:   3-4
Files=1, Tests=6,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests were marked TODO -- but 2 of those tests were reported as "TODO passed". I looked at that result and thought, "We should un-TODO those two tests." (Indeed, test 6 above had once been a TODO-ed test.) That led me to spend several hours preparing #22862. As part of the preparation on that p.r., I successfully tested my branch on an unthreaded build on Linux and a threaded build on FreeBSD. However, once I created the p.r., it failed many of its test runs in our GH CI setup. (See https://github.com/Perl/perl5/pull/22862/checks.) Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback, which, after several more hours of work, led me back to t/run/todo.t -- only this time run on a -DDEBUGGING build.

$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6,  2 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests remain TODO-ed, but each of them actually FAILs. The file as a whole PASSes because the failing tests have been TODO-ed. No tests are reported as TODO passed.
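
To make the difference between the two runs easier to see at a glance, here is a minimal sketch of a shell helper that reads verbose TAP output on standard input and lists which TODO-ed tests passed and which failed. The function name todo_summary is hypothetical (it is not part of the perl5 tree), and the sketch assumes the TODO directive appears exactly as " # TODO" after the test description, as in the output above.

```shell
# Hypothetical helper, not part of the perl5 tree.
# Reads TAP on stdin and lists TODO-ed tests by outcome.
todo_summary() {
    awk '
        # A TAP test line is "ok N ..." or "not ok N ...".
        # Only lines carrying an unescaped " # TODO" directive are counted;
        # an escaped "\#" inside a description does not match.
        /^(not )?ok [0-9]+/ && / # TODO/ {
            if ($1 == "not") {
                failed = failed " " $3   # fields: not, ok, N
            } else {
                passed = passed " " $2   # fields: ok, N
            }
        }
        END {
            print "TODO passed:" passed
            print "TODO failed:" failed
        }
    '
}
```

Fed the six test lines from the non-debugging run, this prints "TODO passed: 3 4" and "TODO failed: 2 5". Fed the lines from the -DDEBUGGING run, it prints an empty "TODO passed:" line and "TODO failed: 2 3 4 5".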

Now, you have to be fairly familiar with our test suite to recognize that if a test has "No assertion failure" in its description (label), that means its PASS/FAIL status on -DDEBUGGING builds is ... (how shall we put it?) ... unresolved. Such a unit test cannot really be un-TODO-ed until its code passes on both non-debugging and debugging builds. But if someone sees tests reported as "TODO passed", they are likely to expend considerable effort (as I did this weekend) un-TODO-ing them prematurely. Many TODO-ed tests are, of course, classified as such because they are failing against both non-debugging and debugging builds. But how should we indicate that a particular unit test may not be ready to be un-TODO-ed even if, on some builds, it is reported as "TODO passed"?
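
Until there is a way to mark such tests, one stopgap is to compare the TAP output from both kinds of build before un-TODO-ing anything. The following is only a sketch: the function name todo_build_diff is hypothetical, and it assumes the verbose harness output from each build has already been saved to a file. It reports every TODO-ed test whose outcome differs between the two files.

```shell
# Hypothetical helper, not part of the perl5 tree.
# Usage: todo_build_diff FIRST.tap SECOND.tap
# Prints TODO-ed tests whose pass/fail outcome differs between the files.
todo_build_diff() {
    awk '
        # Consider only TAP test lines that carry a " # TODO" directive.
        /^(not )?ok [0-9]+/ && / # TODO/ {
            n      = ($1 == "not") ? $3 : $2
            status = ($1 == "not") ? "fail" : "pass"
            if (FNR == NR) {
                # Still reading the first file: remember the outcome.
                first[n] = status
            } else if ((n in first) && first[n] != status) {
                # Second file: report a disagreement with the first.
                print "test " n ": " first[n] " on " ARGV[1] ", " status " on " FILENAME
            }
        }
    ' "$1" "$2"
}
```

Given the non-debugging output as the first file and the -DDEBUGGING output as the second, this would flag tests 3 and 4 as "pass" on the first and "fail" on the second, which are exactly the tests that look ready to un-TODO on one build but are not.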


jkeenan added the Needs Triage label Dec 15, 2024

jkeenan mentioned this issue Dec 15, 2024

t/run/todo.t: Reposition formerly TODO-ed tests which are now passing #22862

Closed
