Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate context through TCP packets #1161

Merged
merged 6 commits into from
Sep 12, 2024
Merged

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Sep 11, 2024

This PR finishes the work required to propagate context with trace context injection through TCP packets. The feature is disabled by default. I'm proposing the PR to ensure we can pass all integration tests without this enabled.

To complete the work, I plan to follow-up with another PR that will do the following:

  • Add option to disable black-box context propagation
  • Add tests without black-box context propagation and TCP packet propagation.

Relates to #871

@@ -175,12 +175,13 @@ const u8 ip4ip6_prefix[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff};

#ifdef BPF_DEBUG
static __always_inline void dbg_print_http_connection_info(connection_info_t *info) {
bpf_dbg_printk("[http] s_h = %llx, s_l = %llx, d_h = %llx, d_l = %llx, s_port=%d, d_port=%d",
bpf_dbg_printk("[conn] s_h = %llx, s_l = %llx, s_port=%d",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split this so that we can use it on older kernels where bpf_printk allowed for at most 3 arguments.

@@ -186,6 +186,33 @@ int BPF_KPROBE(kprobe_tcp_connect, struct sock *sk) {

bpf_dbg_printk("=== tcp connect %llx ===", id);

tp_info_pid_t *tp_p = tp_buf();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connect runs before the SYN packet is sent. We use this opportunity to setup a trace context information for the connection. We'll later query the trace information in tc_egress, and serialize it on the TCP packet.

if (tcp_syn(&tcp) && !tcp_ack(&tcp)) {
bpf_skb_load_bytes(skb, tcp.hdr_len, &buf, 4);
if (skb->len - tcp.hdr_len < sizeof(tp_info_pid_t)) {
bpf_printk("SYN packet without tp info");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these prints will have to be removed, but keeping them for now since this is disabled. bpf_dbg_printk doesnt' work because we need to fix it to not use bpf_get_pid_tid, but bpf_get_task info

make_tp_string(tp_buf, &tp.tp);
bpf_printk("tp: %s", tp_buf);

bpf_map_update_elem(&incoming_trace_map, &conn, &tp, BPF_ANY);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we receive a traceID over the wire (TCP packet) we store it for later to be used by the trace code.

@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.35%. Comparing base (77cba46) to head (06ddc8d).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1161      +/-   ##
==========================================
- Coverage   81.07%   78.35%   -2.73%     
==========================================
  Files         136      136              
  Lines       11416    11416              
==========================================
- Hits         9256     8945     -311     
- Misses       1636     1920     +284     
- Partials      524      551      +27     
Flag Coverage Δ
integration-test ?
k8s-integration-test 58.81% <ø> (-0.12%) ⬇️
oats-test 35.77% <ø> (ø)
unittests 53.23% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -196,10 +204,16 @@ static __always_inline void get_or_create_trace_info(http_connection_metadata_t

if (meta) {
if (meta->type == EVENT_HTTP_CLIENT) {
tp_info_pid_t *in_tp = bpf_map_lookup_elem(&outgoing_trace_map, conn);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change the client code only looked for a server trace and if it doesn't find it it will generate the trace information later. Now we look if the TCP egress has setup trace info for us, if we find it we mark as having trace info, we must not regenerate it. If we have our own TCP egress info, but there's server incoming request that wrapped this client call, then we'll overwrite this info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually put that in a comment

tp_info_pid_t *existing_tp = trace_info_for_connection(conn);

if (correlated_requests(tp_p, existing_tp)) {
tp_info_pid_t *existing_tp = bpf_map_lookup_elem(&incoming_trace_map, conn);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For server requests, we first look for TCP info and then we fall back to black-box info.

Copy link
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, awesome! Very clever!

My comments are very minor (pedantic) nits, feel free to ignore.

@@ -232,7 +259,7 @@ int BPF_KRETPROBE(kretprobe_sys_connect, int fd)
sort_connection_info(&info.p_conn.conn);
info.p_conn.pid = pid_from_pid_tgid(id);
info.orig_dport = orig_dport;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: trailing whitespace


if (tcp_syn(&tcp) && !tcp_ack(&tcp)) {
bpf_skb_load_bytes(skb, tcp.hdr_len, &buf, 4);
if (skb->len - tcp.hdr_len < sizeof(tp_info_pid_t)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking into this, and stumbled upon this check in read_sk_buff()

 if ((skb->len - tcp->hdr_len) < 0) { // less than 0 is a packet we can't parse

both len and hdr_len are unsigned, so I think, to complement this check, we should change read_sk_buff() to

 if ((skb->len - tcp->hdr_len) > skb->len) { // underflow, hdr_len is > len, packet too big

or even better

if (tcp->hdr_len > skb->len) { // package too big

bpf/k_tracer.c Outdated
Comment on lines 775 to 776
bpf_printk("tot_len = %d, new_tot_len = %d", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len));
bpf_printk("h_proto = %d, skb->len = %d", tcp.h_proto, skb->len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bpf_printk("tot_len = %d, new_tot_len = %d", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len));
bpf_printk("h_proto = %d, skb->len = %d", tcp.h_proto, skb->len);
bpf_printk("tot_len = %u, new_tot_len = %u", bpf_ntohs(tcp.tot_len), bpf_ntohs(new_tot_len));
bpf_printk("h_proto = %u, skb->len = %u", tcp.h_proto, skb->len);

@@ -196,10 +204,16 @@ static __always_inline void get_or_create_trace_info(http_connection_metadata_t

if (meta) {
if (meta->type == EVENT_HTTP_CLIENT) {
tp_info_pid_t *in_tp = bpf_map_lookup_elem(&outgoing_trace_map, conn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually put that in a comment

bpf_printk("SYN packed len = %d, offset = %d, hdr_len %d", skb->len, len, tcp.hdr_len);
tp_info_pid_t tp = {0};

bpf_skb_load_bytes(skb, tcp.hdr_len, &tp, sizeof(tp_info_pid_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it may be a good idea to have tp_info_pid_t store some sort of magic number - it's highly unlikely, but if someone stores the exact same amount of bytes there, we will be fooled. Who knows how many other people are doing crazy things like this?

Something like [0xb3l4][tp_info_pid_t]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I initially thought of this and said to myself, so if they do, we'll read something random and it's no worse than generating new ID. However, on a second thought, the value someone else puts in could be not random, but a constant and it will make all of our traceIDs be the same. I took your advice to put an ID on the PID (we don't use this field for TCP and it's later overwritten with -1). So I'll check the PID on ingress and if it doesn't match our magic value we'll not take in the packet.

@@ -703,17 +741,25 @@ int app_egress(struct __sk_buff *skb) {
return 0;
}

if (!tcp_syn(&tcp) || tcp_ack(&tcp)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment such as:

// handle SYN only, ignore SYN+ACK packets

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! As Rafael suggests, I'd write your GitHub explanations as actual code comments.

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@grcevski grcevski merged commit 6a94e5e into grafana:main Sep 12, 2024
9 checks passed
@grcevski grcevski deleted the propagate_tc_tcp branch September 12, 2024 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants