-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/lwip: fix race in async sock API with event #21093
base: master
Are you sure you want to change the base?
Conversation
In TCP server mode, the sock_tcp_t sockets are managed by the network stack and can be reused if a previous connection is no longer in used. However, an event may still be posted in the event queue when the socket is reused. Wiping it will result in the `next` pointer in that event to be NULL, which will cause the event handler fetching that event to crash. This adds an `event_cancel()` at two places: 1. Just before reusing the socket 2. During sock_tcp_disconnect() The former catches issues in server mode e.g. when a connect has been closed (e.g. due to timeout) and is reused before a pending event (e.g. a timeout event) has been processed. The letter may be an issue on client side. E.g. when `sock_tcp_t` was allocated on stack and goes out of scope after `sock_tcp_disconnect` but before the event handler was run.
I'm not super confident in this PR in regard to regressions. My WIP CoAP over TCP server still behaves oddly. I was unable to crash it anymore with this applied. |
Aaand if seen another segfault, again with the event queue pointing to the event in a |
I don't really have any insight in how this part of the code works. |
Since you are working on network interface code and the segfault happens a few layers up, maybe the packet times out and is deleted while it is still in the queue? |
Contribution description
In TCP server mode, the
sock_tcp_t
sockets are managed by the network stack and can be reused if a previous connection is no longer in used. However, an event may still be posted in the event queue when the socket is reused. Wiping it will result in thenext
pointer in that event to be NULL, which will cause the event handler fetching that event to crash.This adds an
event_cancel()
at two places:The former catches issues in server mode e.g. when a connect has been closed (e.g. due to timeout) and is reused before a pending event (e.g. a timeout event) has been processed.
The letter may be an issue on client side. E.g. when
sock_tcp_t
was allocated on stack and goes out of scope aftersock_tcp_disconnect
but before the event handler was run.Testing procedure
I was able to see this bug in the wild in my work-in-progress PR that adds CoAP over TCP to nanocoap. But that code is not in a well shape now and has other issues.
A test that puts an async TCP server with lots of connections through the paces would likely be a good demonstrator.
Issues/PRs references
None