pkg/lwip: fix race in async sock API with event #21093

maribu · 2024-12-16T19:19:36Z

Contribution description

In TCP server mode, the sock_tcp_t sockets are managed by the network stack and can be reused once a previous connection is no longer in use. However, an event may still be posted in the event queue when the socket is reused. Wiping the socket sets the next pointer in that event to NULL, which causes the event handler fetching that event to crash.
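To make the failure mode concrete, here is a minimal, self-contained sketch of it. All `demo_*` names are hypothetical stand-ins rather than the RIOT or lwIP types, and the queue is assumed to be an intrusive circular list in which a queued event links to its successor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an intrusive event: next is NULL when not queued. */
typedef struct demo_event {
    struct demo_event *next;
    void (*handler)(struct demo_event *);
} demo_event_t;

/* Hypothetical circular queue: last->next is the first queued event. */
typedef struct {
    demo_event_t *last;
} demo_queue_t;

/* Hypothetical socket that embeds its event, like an async sock does. */
typedef struct {
    int conn_id;
    demo_event_t event;
} demo_sock_t;

/* Append an event to the queue unless it is already queued. */
static void demo_post(demo_queue_t *q, demo_event_t *e)
{
    if (e->next != NULL) {
        return;
    }
    if (q->last != NULL) {
        e->next = q->last->next;
        q->last->next = e;
    }
    else {
        e->next = e;
    }
    q->last = e;
}

/* True if fetching the first event would dereference a NULL next pointer. */
static bool demo_queue_is_corrupt(const demo_queue_t *q)
{
    return (q->last != NULL) && (q->last->next == NULL);
}

/* Reuse a socket without cancelling its pending event first. */
static bool demo_wipe_while_queued(void)
{
    demo_queue_t q = { NULL };
    demo_sock_t sock = { .conn_id = 1 };

    demo_post(&q, &sock.event);      /* event is now pending */
    memset(&sock, 0, sizeof(sock));  /* socket wiped for reuse */

    return demo_queue_is_corrupt(&q);
}
```

In this sketch the queue still points at the wiped event, so a consumer following `next` would hit NULL, which matches the crash signature described above.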

This adds an event_cancel() in two places:

1. Just before reusing the socket
2. During sock_tcp_disconnect()

The former catches issues in server mode, e.g. when a connection has been closed (e.g. due to a timeout) and the socket is reused before a pending event (e.g. a timeout event) has been processed.

The latter may be an issue on the client side, e.g. when a sock_tcp_t was allocated on the stack and goes out of scope after sock_tcp_disconnect() but before the event handler has run.
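The two call sites can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: every `demo_*` name is a hypothetical stand-in, and `demo_cancel()` only mimics what a cancel operation on an intrusive circular queue has to do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; next is NULL when the event is not queued. */
typedef struct demo_event {
    struct demo_event *next;
} demo_event_t;

typedef struct {
    demo_event_t *last;   /* circular list: last->next is the first event */
} demo_queue_t;

typedef struct {
    int conn_id;
    demo_event_t event;
} demo_sock_t;

static void demo_post(demo_queue_t *q, demo_event_t *e)
{
    if (e->next != NULL) {
        return;                        /* already queued */
    }
    if (q->last != NULL) {
        e->next = q->last->next;
        q->last->next = e;
    }
    else {
        e->next = e;
    }
    q->last = e;
}

/* Unlink e from q if it is queued there; otherwise do nothing. */
static void demo_cancel(demo_queue_t *q, demo_event_t *e)
{
    if ((q->last == NULL) || (e->next == NULL)) {
        return;
    }
    demo_event_t *prev = q->last;
    do {
        if (prev->next == e) {
            if (prev == e) {
                q->last = NULL;        /* e was the only queued event */
            }
            else {
                prev->next = e->next;
                if (q->last == e) {
                    q->last = prev;
                }
            }
            e->next = NULL;
            return;
        }
        prev = prev->next;
    } while (prev != q->last);
}

/* Server side: cancel the pending event before wiping a pooled socket. */
static void demo_sock_reuse(demo_sock_t *sock, demo_queue_t *q, int conn_id)
{
    demo_cancel(q, &sock->event);
    memset(sock, 0, sizeof(*sock));
    sock->conn_id = conn_id;
}

/* Client side: cancel on disconnect so the socket may go out of scope. */
static void demo_sock_disconnect(demo_sock_t *sock, demo_queue_t *q)
{
    demo_cancel(q, &sock->event);
    sock->conn_id = -1;
}

static bool demo_reuse_is_safe(void)
{
    demo_queue_t q = { NULL };
    demo_sock_t a = { .conn_id = 1 };
    demo_sock_t b = { .conn_id = 2 };

    demo_post(&q, &a.event);
    demo_post(&q, &b.event);
    demo_sock_reuse(&a, &q, 3);

    /* Only b's event remains queued, linked to itself. */
    return (q.last == &b.event) && (b.event.next == &b.event)
           && (a.event.next == NULL);
}

static bool demo_disconnect_is_safe(void)
{
    demo_queue_t q = { NULL };
    demo_sock_t s = { .conn_id = 1 };

    demo_post(&q, &s.event);
    demo_sock_disconnect(&s, &q);

    return q.last == NULL;             /* nothing left pointing at s */
}
```

The sketch is single-threaded; in a real stack the cancel and the wipe would additionally need to be protected against a concurrent post, which is not modeled here.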

Testing procedure

I was able to see this bug in the wild in my work-in-progress PR that adds CoAP over TCP to nanocoap. But that code is not in good shape yet and has other issues.

A test that puts an async TCP server with lots of connections through its paces would likely be a good demonstrator.

Issues/PRs references

None


maribu · 2024-12-16T19:22:25Z

I'm not super confident in this PR in regard to regressions. My WIP CoAP over TCP server still behaves oddly. However, I was no longer able to crash it with this applied.

riot-ci · 2024-12-16T19:47:03Z

Murdock results

✔️ PASSED

8ac3a43 pkg/lwip: fix race in async sock API with event

Success	Failures	Total	Runtime
10249	0	10249	16m:38s

Artifacts

Documentation preview

maribu · 2024-12-17T14:48:46Z

Aaand I've seen another segfault, again with the event queue pointing to the event in a sock_tcp_t and the next ptr in there being NULL.

yarrick · 2024-12-19T19:26:30Z

I don't really have any insight in how this part of the code works.

miri64 · 2024-12-20T10:59:59Z

> Aaand I've seen another segfault, again with the event queue pointing to the event in a sock_tcp_t and the next ptr in there being NULL.

Since you are working on network interface code and the segfault happens a few layers up, maybe the packet times out and is deleted while it is still in the queue?

maribu added the "Type: bug" and "CI: ready for build" labels Dec 16, 2024

github-actions bot added the "Area: network" and "Area: pkg" labels Dec 16, 2024

maribu requested review from benpicco and mguetschow December 16, 2024 19:23

benpicco requested a review from yarrick December 16, 2024 20:19
