Suboptimal handling of TODO-ed tests #22863

Open
jkeenan opened this issue Dec 15, 2024 · 0 comments

The way we handle certain TODO-ed tests in our test suite is, IMO, suboptimal.

Consider t/run/todo.t. If I build an unthreaded perl on Linux (e.g., on Ubuntu 24.04 LTS) and run that program through the harness, I get:

$ sh ./Configure -des -Dusedevel && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.

Test Summary Report
-------------------
run/todo.t (Wstat: 0 Tests: 6 Failed: 0)
  TODO passed:   3-4
Files=1, Tests=6,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests were marked TODO -- but 2 of those 4 were reported as TODO passed. I looked at that result and thought, "We should un-TODO those two tests." (Indeed, test 6 above had once been a TODO-ed test.) That led me to spend several hours preparing #22862. While preparing that pull request, I successfully tested my branch on an unthreaded build on Linux and a threaded build on FreeBSD. However, once I created the pull request, it failed many of its test runs in our GH CI setup. (See https://github.com/Perl/perl5/pull/22862/checks.) Fortunately, our long-lived smoke-testing setup (http://perl.develop-help.com/?b=smoke-me%2Fjkeenan%2Freposition-todo-pass-tests-20241215) gave me valuable feedback, which, after several more hours of work, led me back to t/run/todo.t -- only this time run on a -DDEBUGGING build.

$ sh ./Configure -des -Dusedevel -DDEBUGGING && make test_prep
...
$ cd t; ./perl harness -v run/todo.t; cd -

ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
All tests successful.
Files=1, Tests=6,  2 wallclock secs ( 0.00 usr  0.00 sys +  0.02 cusr  0.01 csys =  0.03 CPU)
Result: PASS

4 out of 6 unit tests remain TODO-ed, but each of them actually FAILs. The file as a whole PASSes because the failing tests have been TODO-ed. No tests are reported as TODO passed.

Now, you have to be fairly familiar with our test suite to recognize that if a test has "No assertion failure" in its description (label), that means its PASS/FAIL status on -DDEBUGGING builds is ... (how shall we put it?) ... unresolved. Such a unit test cannot really be un-TODO-ed until its code passes on both non-debugging and debugging builds. But if someone sees tests reported as TODO passed, they are likely to expend considerable effort (as I did this weekend) un-TODO-ing them prematurely. Many TODO-ed tests are, of course, classified as such because they are failing against both non-debugging and debugging builds. But how should we indicate that a particular unit test may not be ready to be un-TODO-ed even if, on some builds, it is reported as TODO passed?
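
One way to at least detect such tests mechanically is to compare the TODO results of the same test file across builds. What follows is only a minimal sketch in Python, not anything that exists in the perl5 test suite: it assumes the verbose harness output of each build has been captured as text, and the names `todo_results`, `classify_todo_tests` and the build labels are hypothetical. The sample data is the harness output quoted earlier in this report.

```python
import re

# A TAP test line looks like "ok 3 - description # TODO reason"
# or "not ok 5 - description # TODO reason".
TAP_LINE = re.compile(r"^(not ok|ok)\s+(\d+)\b(.*)$")
# A TODO directive is an unescaped '#' followed by "TODO".
TODO_DIRECTIVE = re.compile(r"(?<!\\)#\s*TODO\b", re.IGNORECASE)


def todo_results(tap_text):
    """Return {test number: passed?} for the TODO-ed tests in one TAP stream."""
    results = {}
    for line in tap_text.splitlines():
        match = TAP_LINE.match(line.strip())
        if not match:
            continue  # harness summary lines, the trailing bare "ok", etc.
        status, number, rest = match.groups()
        if TODO_DIRECTIVE.search(rest):
            results[int(number)] = (status == "ok")
    return results


def classify_todo_tests(runs):
    """runs maps a build label (hypothetical) to that build's TAP text."""
    per_build = {label: todo_results(text) for label, text in runs.items()}
    numbers = sorted(set().union(*(r.keys() for r in per_build.values())))
    report = {}
    for n in numbers:
        outcomes = [r[n] for r in per_build.values() if n in r]
        if all(outcomes):
            report[n] = "passes on every build"
        elif any(outcomes):
            report[n] = "passes on some builds only"
        else:
            report[n] = "fails on every build"
    return report


# Output of the unthreaded, non-debugging build, as quoted in this report.
NON_DEBUGGING = r"""ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
ok 3 - No assertion failure # TODO GH 16876
ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
"""

# Output of the -DDEBUGGING build, as quoted in this report.
DEBUGGING = r"""ok 1 - GH \#16894
not ok 2 - "abcde5678" =~ / b (*pla:.*(*plb:(*plb:(.{4}))? (.... # TODO GH 16250
not ok 3 - No assertion failure # TODO GH 16876
not ok 4 - No assertion failure # TODO GH 16952
not ok 5 - No assertion failure # TODO GH 16971
ok 6 - No assertion failure
ok
"""

if __name__ == "__main__":
    runs = {"non-debugging": NON_DEBUGGING, "debugging": DEBUGGING}
    for number, verdict in classify_todo_tests(runs).items():
        print(f"test {number}: {verdict}")
```

With the two runs from this report, tests 3 and 4 land in the "passes on some builds only" group, which is exactly the group that should not be un-TODO-ed yet. A comparison like this only flags such tests after the fact; it does not answer how the test file itself should mark them.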
