Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf_loop Dependency Causes Errors in Kernel Versions < 5.17 When Extending TCP Payload with extend_skb Hook #1439

Open
tomatopunk opened this issue Dec 10, 2024 · 3 comments

Comments

@tomatopunk
Copy link
Contributor

tomatopunk commented Dec 10, 2024

extend_skb hook extends the TCP payload. When writing new additional data to the payload, the bpf_loop function is used, but this function only exists in kernel versions greater than 5.17.

beyla/bpf/tc_http_tp.c

Lines 402 to 407 in d2a8a83

static __always_inline void
move_data(unsigned char *dst, unsigned char *src, const unsigned char *end, u32 size) {
struct memmove_loop_ctx memmove_loop_ctx = {dst, src, end, size};
bpf_loop(size, memmove_loop, &memmove_loop_ctx, 0);
}

If we enable BEYLA_BPF_TC_L7_CP:true and use a kernel version lower than 5.17, an error occurs.

Considering that the current long-term versions include 5.15, 5.10, 5.4, and 4.19, this issue should be considered a bug and needs to be addressed.

By the way, I've already discussed this with @grcevski on Slack. To prevent others from encountering the same issue, I'm leaving a backup here. Please feel free to close this issue if it has been resolved.

@rafaelroquetto
Copy link
Contributor

Hi @tomatopunk, thanks for bringing this up. Yeah, unfortunately the dependency on bpf_loop hinders us a bit, we will try to do without it in a few places, but in this particular case it's really not feasible.

If you look here, you will see that we do print out an error, but we are missing a return, which causes us to still try to load the http tracer and fail with a big kernel verifier error message (which I assume is what you are seeing). If you feel like, you can raise a PR adding a return statement there, so that the error message becomes more evident, and I can review it, or if you prefer, I can raise it myself as well.

I also wanted to note that BEYLA_BPF_TC_CP now also supports L7 context propagation. This is all new functionality that needs further testing, but all things going according to plan, we may elect to remove BEYLA_BPF_TC_L7_CP altogether and make BEYLA_BPF_TC_CP the canonical way of doing it.

Another thing I'd like to point out in case it helps, is that RedHat based kernels tend to have all of these functionalities (such as bpf_loop and other eBPF related patches) backported.

@tomatopunk
Copy link
Contributor Author

Hello team,

I completely agree with your point: bpf_loop and other patches should be ported. However, this function is a 2017 patch with no clear plan for porting. Therefore, I think that if we decide to use bpf_loop, we should highlight this issue more prominently, such as including the minimum kernel version in the documentation.

Furthermore, the dependency on bpf_loop isn't just here. For example, in #1413, unfortunately, there's also an edge-back issue when searching for the traceparent header, which might require using bpf_loop. Internally, what is our view on such issues? Should we strive for compatibility or abandon it, setting 5.17 as the minimum kernel version?

@rafaelroquetto
Copy link
Contributor

Hi @tomatopunk ,

I agree with you - I've added improving documentation to my TODO list.

With regards to searching the traceparent header, it's one of those cases in which we depend on bpf_loop. There may be another way of accomplishing that, but for the moment, the current implementation is constrained to kernels >= 5.17 - this has always been the case for detecting the trace parent and I don't think we ever supported this functionality on older kernels.

Internally, what is our view on such issues? Should we strive for compatibility or abandon it, setting 5.17 as the minimum kernel version?

We have plans to continue supporting kernels older than 5.17. As you said yourself, a lot of people are using older kernels (including some of our customers), therefore making establishing 5.17 as a minimum kernel version not feasible. Obviously those people don't have access to trace context propagation.

To sum up, there are two action items here:

  1. improve documentation
  2. mitigate the use of bpf_loop -> unfortunately this is not trivial when dealing with dynamically sized buffers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants