BPF: translation of program contexts

Question

I was looking at the different types of BPF program, and noticed that for different program types the context is being passed differently.

Example:

For program type BPF_PROG_TYPE_SOCK_OPS, an object of type struct bpf_sock_ops_kern is passed. However, the BPF program of this type takes a reference to struct bpf_sock_ops. Why is it done this way and where is the "translation" from bpf_sock_ops_kern to bpf_sock_ops?
For program type BPF_PROG_TYPE_CGROUP_SKB, an object of type struct sk_buff is passed (e.g., in __cgroup_bpf_run_filter_skb), but the BPF program expects a minimized version, struct __sk_buff.

So I looked at the struct bpf_verifier_ops function callbacks, but they seem to only adjust the offsets in BPF instructions, as they are called by the BPF verifier.

I'd be glad if someone could shed light on how the BPF context is defined. Thanks.

For the first one, bpf_sock_ops_kern is just a subset of bpf_sock_ops. To convert, sock_filter_convert_ctx_access only needs to advance the pointer past the first sk field. The verifier will then ensure that fields after the union are not accessed. I haven't looked into the second case yet. — pchaigno
Okay, so for the second one: bpf_convert_ctx_access matches on each possible required offset on __sk_buff, one by one, and converts them to the equivalent offset in the sk_buff object. Does that answer your question? I'll make a proper answer if that's the case. — pchaigno
I don't think it has to do with performance. These data structures are used to limit the access of BPF programs to a handful of fields; only fields that programs should be able to access are in the mirror structures (e.g., bpf_sock_ops and __sk_buff). For example, you can see the process for __sk_buff described by Alexei here, with more details in the PATCH description. — pchaigno
As far as I can tell, pchaigno is right: struct __sk_buff has little to do with performance and is used mostly for simplicity, to offer a cleaner interface to BPF users (it only offers the fields that can be accessed from BPF). Accesses to it are converted in the verifier with bpf_convert_ctx_access, as mentioned already. Then you have additional checks in net/core/filter.c (for networking) to make sure the user can read from, and possibly write to, each of the fields of the struct. See the tc_cls_act_is_valid_access() function, for example. (I'm less familiar with the tracing bits.) — Qeole
@pchaigno, thanks for the responses! You can make an official answer, which I can accept, so that others can benefit :-) You can probably incorporate Qeole's comment, as it is also useful. — Mark

pchaigno · Accepted Answer · 2018-03-06T10:26:00

The mirror objects (e.g., struct bpf_sock_ops) passed as arguments expose a subset of the original object(s)'s fields to the BPF program. A mirror structure can also have fields from several different original structures; in that case, the mirror object serves as an aggregate. Passing the original object(s) to the BPF program would also be misleading, as users could think they have access to all fields. For example, they could think they have access to bpf_sock_ops_kern.sk when that's actually not the case.

The verifier then converts accesses to the mirror object into accesses to the original object(s), before the program is executed for the first time. There's a conversion function for each type of mirror object (e.g., sock_ops_convert_ctx_access for the conversion of accesses to struct bpf_sock_ops). For each field of the mirror object (i.e., for each offset), the conversion function rewrites the load or store instruction so that it uses the offset of the original field.
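To make the rewriting step more concrete, here is a minimal user-space sketch of the idea. It is not kernel code: struct demo_kern, struct demo_mirror, struct demo_insn, demo_convert_ctx_access and demo_run_load are all hypothetical names invented for this illustration, and the toy "instruction" only carries an offset. The real conversion functions emit actual BPF instructions instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "original" object, as the kernel side would hold it. */
struct demo_kern {
    void    *private_ptr;   /* deliberately not exposed to the program */
    uint32_t op;
    uint32_t reply;
};

/* Hypothetical mirror object: the only layout the program gets to see. */
struct demo_mirror {
    uint32_t op;
    uint32_t reply;
};

/* Toy stand-in for a 4-byte load: dst = *(u32 *)(ctx + off). */
struct demo_insn {
    int16_t off;
};

/*
 * Rewrite one load in place: match on the offset in the mirror layout and
 * replace it with the offset of the same field in the original layout.
 * Returns 0 on success, -1 if the offset is not an exposed field.
 */
static int demo_convert_ctx_access(struct demo_insn *insn)
{
    switch (insn->off) {
    case offsetof(struct demo_mirror, op):
        insn->off = offsetof(struct demo_kern, op);
        return 0;
    case offsetof(struct demo_mirror, reply):
        insn->off = offsetof(struct demo_kern, reply);
        return 0;
    default:
        return -1;
    }
}

/* "Execute" the rewritten load against the original object. */
static uint32_t demo_run_load(const struct demo_insn *insn,
                              const struct demo_kern *ctx)
{
    uint32_t val;

    memcpy(&val, (const char *)ctx + insn->off, sizeof(val));
    return val;
}
```

The program is written against struct demo_mirror, but after the rewrite the load reads directly from struct demo_kern, so no mirror object ever has to be built or copied at run time.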

Note that not all original fields are necessarily in the same object. For example, in the mirror object struct bpf_sock_ops, the fields op and family are retrieved from bpf_sock_ops_kern.op and bpf_sock_ops_kern.sk->skc_family respectively.
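The family case is worth sketching, because a single access to the mirror object has to become two loads: one to fetch the sk pointer, and one to read the field behind it. Below is a rough user-space model of that expansion. Every name in it (struct demo_sock, struct demo_ops_kern, struct demo_ops, struct demo_step, demo_expand, demo_exec) is made up for the example, and the layouts are simplified stand-ins, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the original objects. */
struct demo_sock {
    uint16_t family;
};

struct demo_ops_kern {
    struct demo_sock *sk;   /* not exposed directly */
    uint32_t          op;
};

/* Simplified stand-in for the mirror object. */
struct demo_ops {
    uint32_t op;
    uint32_t family;
};

/* One toy load step: read `size` bytes at (current base + off). */
struct demo_step {
    size_t off;
    size_t size;
};

/*
 * Expand one access to the mirror object into one or two load steps.
 * Returns the number of steps written to out[], or -1 for an offset
 * that is not an exposed field.
 */
static int demo_expand(size_t mirror_off, struct demo_step out[2])
{
    if (mirror_off == offsetof(struct demo_ops, op)) {
        out[0] = (struct demo_step){ offsetof(struct demo_ops_kern, op),
                                     sizeof(uint32_t) };
        return 1;
    }
    if (mirror_off == offsetof(struct demo_ops, family)) {
        /* Step 1: load the sk pointer. Step 2: load family through it. */
        out[0] = (struct demo_step){ offsetof(struct demo_ops_kern, sk),
                                     sizeof(struct demo_sock *) };
        out[1] = (struct demo_step){ offsetof(struct demo_sock, family),
                                     sizeof(uint16_t) };
        return 2;
    }
    return -1;
}

/* Run the steps (n >= 1): follow pointers, then load the final value. */
static uint32_t demo_exec(const struct demo_ops_kern *ctx,
                          const struct demo_step *steps, int n)
{
    const char *base = (const char *)ctx;
    const struct demo_step *last = &steps[n - 1];
    int i;

    for (i = 0; i + 1 < n; i++) {
        /* Intermediate step: assumes all object pointers share one
         * representation, as on mainstream platforms. */
        const char *next;

        memcpy(&next, base + steps[i].off, sizeof(next));
        base = next;
    }

    if (last->size == sizeof(uint16_t)) {
        uint16_t v16;

        memcpy(&v16, base + last->off, sizeof(v16));
        return v16;     /* zero-extended to 32 bits */
    } else {
        uint32_t v32;

        memcpy(&v32, base + last->off, sizeof(v32));
        return v32;
    }
}
```

From the program's point of view, op and family are both plain 32-bit fields of the mirror struct; only the rewritten access sequence knows that one of them lives behind a pointer.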
