lib: cgen: group same-key maps using bitmasks #520
pzmarzly wants to merge 4 commits into facebook:main
Conversation

@claude review

Claude review of PR #520 (a7ebb96): Must fix / Suggestions / Nits
qdeslandes left a comment
Thanks for working on this, I think it's a good first implementation, but there are a few things that could be improved to simplify it.
When grouping sets, using a dedicated structure would be helpful:
```c
struct bf_set_group {
    key;  /* key format shared by the grouped sets */
    sets; /* sets using that key format */
};
```
When looping over sets, you can ignore the empty ones, then for non-empty ones find whether a set group already exists and add the set there. You don't necessarily need a "leader".
Fixups can hold a pointer to a set directly, it will be easier to resolve them, instead of relying on a set index. This is possible as the fixup (and bf_program) aren't serialized, they're only temporary objects (which was not the case before).
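A minimal sketch of that idea (all names hypothetical, not bpfilter's actual API): the fixup stores a `struct bf_set *` directly, so resolving it is a pointer dereference instead of an index lookup into the program's set array.

```c
#include <stddef.h>

/* Hypothetical types, for illustration only -- not bpfilter's actual API. */
struct bf_set {
    size_t elem_size; /* key size, stands in for the real set contents */
};

struct bf_fixup {
    size_t insn_idx;          /* instruction to patch */
    const struct bf_set *set; /* direct pointer: trivially resolved, and
                               * safe because fixups (and bf_program) are
                               * temporary objects, never serialized */
};

/* Resolving a fixup no longer needs the program's set index/array. */
static const struct bf_set *bf_fixup_resolve(const struct bf_fixup *fixup)
{
    return fixup->set;
}
```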
Force-pushed from a49eca0 to 7f30d9d
That's amazing. Trying this. @claude review
```c
if (r)
    return bf_err_r(r, "failed to add set elements to the map");
memset(value, 0, value_size);
(void)bf_bpf_map_lookup_elem(new_map->fd, elem, value);
```
Claude: must-fix: For LPM trie maps, bf_bpf_map_lookup_elem performs longest-prefix matching, not exact-key matching. When inserting an element from set B (e.g. 192.168.1.0/24), the lookup may return the value for a shorter prefix from set A (e.g. 192.168.0.0/16) that was already inserted. The subsequent OR would incorrectly propagate set A's bit into set B's entry.
This means a packet matching 192.168.1.0/24 would appear to belong to both set A and set B, producing false positives. Consider either:
- Skipping the lookup for LPM trie groups and always writing only the current set's bit, or
- Preventing LPM trie sets from being grouped together when their prefixes can overlap.
> This means a packet matching 192.168.1.0/24 would appear to belong to both set A and set B
I mean, logically it does belong to both sets, so your example is not the best. However, thanks for bringing up LPM maps. They have footguns. Let's exclude them from grouping up.
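The footgun is easy to reproduce in a userspace model of longest-prefix matching (a sketch, not bpfilter code): when set B's `192.168.1.0/24` is inserted with a read-modify-write, the lookup matches set A's already-inserted `192.168.0.0/16` and ORs set A's bit into set B's new entry.

```c
#include <stdint.h>
#include <stddef.h>

struct lpm_entry {
    uint32_t prefix;  /* IPv4 prefix as a host-order integer (demo only) */
    int plen;         /* prefix length */
    uint8_t bitmask;  /* which sets of the group the entry belongs to */
};

/* Longest-prefix match, mimicking BPF_FUNC_map_lookup_elem on an LPM trie. */
static const struct lpm_entry *lpm_lookup(const struct lpm_entry *t, size_t n,
                                          uint32_t ip)
{
    const struct lpm_entry *best = NULL;

    for (size_t i = 0; i < n; ++i) {
        uint32_t mask = t[i].plen ? ~0u << (32 - t[i].plen) : 0;

        if ((ip & mask) == (t[i].prefix & mask) &&
            (!best || t[i].plen > best->plen))
            best = &t[i];
    }

    return best;
}
```

With set A (`0x01`) holding `192.168.0.0/16`, looking up `192.168.1.0` before inserting set B's (`0x02`) `/24` returns the `/16` entry, and the OR produces `0x03` instead of `0x02`.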
Force-pushed from 1794c42 to 4629064
Force-pushed from 4629064 to a7ebb96
```diff
 (void)snprintf(name, BPF_OBJ_NAME_LEN, _BF_SET_MAP_PREFIX "%04x",
-               (uint8_t)set_idx++);
+               (uint16_t)map_idx++);
 r = bf_map_new_from_set(&map, name, set);
```
Claude: suggestion: The (void)bf_bpf_map_lookup_elem(...) discards the return value unconditionally. If the syscall fails for a reason other than ENOENT (e.g., EFAULT from a bad fd), the error is silently swallowed and the bitmask is built from zeroed data, which could cause silent incorrect behavior. Consider at least logging at debug level on unexpected failure:
```c
int lr = bf_bpf_map_lookup_elem(new_map->fd, elem->data, value);
if (lr && lr != -ENOENT)
    bf_dbg("unexpected map lookup error: %d", lr);
```

```c
bf_list_foreach (&group->sets, set_node) {
    const struct bf_set *set = bf_list_node_get_data(set_node);
    ...
    bf_hashset_foreach (&set->elems, elem) {
```
Claude: suggestion: The non-owning alias map_ref is stored before ownership transfer and assigned to group->map after the push. This is correct today, but fragile: if future code adds an early return between bf_list_push and group->map = map_ref, the group's map pointer will be left as NULL with no indication of what went wrong. Consider assigning group->map immediately after the successful push, or adding a brief comment noting the dependency.
```c
return r;
continue;
bf_list_foreach (&new_prog->set_groups, group_node) {
    struct bf_set_group *group = bf_list_node_get_data(group_node);
```
Claude: suggestion: total_elems sums all elements across all sets in the group, but keys shared across sets will occupy a single map entry. For groups with heavily overlapping sets this over-estimates the BPF map's max_entries, wasting kernel memory. Safe but worth a brief comment documenting the trade-off (or a cap to prevent requesting an excessively large map from the kernel).
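For a sense of the over-estimate: if the group's keys were deduplicated first (sorted here, for simplicity), the true entry count is the size of the union, not the sum (a sketch; names hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Count distinct keys across two sorted key arrays: the number of map
 * entries a merged map actually needs, versus the na + nb upper bound. */
static size_t count_union(const uint32_t *a, size_t na,
                          const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i] == b[j]) {
            ++i;
            ++j;
        } else if (a[i] < b[j]) {
            ++i;
        } else {
            ++j;
        }
        ++n; /* one distinct key consumed per iteration */
    }

    return n + (na - i) + (nb - j);
}
```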
```c
@@ -71,10 +71,13 @@ int bf_map_new(struct bf_map **map, const char *name, enum bf_map_type type,
 * also as filename when pinning the map to the system. Can't be NULL or
 * empty.
 * @param set Set to create the map from. Can't be NULL.
```
Claude: nit: The @param set documentation reads "Set to create the map from" but the set is now used only for its key format (elem_size, use_trie), not its elements. Consider updating to: "Set whose key format defines the map's key size and type."
```c
 * the actual entry count may be smaller, but we can err on the
 * safe side. */
bf_list_foreach (&group->sets, set_node) {
    const struct bf_set *set = bf_list_node_get_data(set_node);
```
Claude: suggestion: The read-modify-write loop performs 2 syscalls (lookup + update) per element per set. For single-set groups (the common case and all LPM trie groups), the bitmask is always 0x01, so the lookup is pure overhead. Consider a fast path: when n_sets == 1, skip the lookup and unconditionally write a static 0x01 value, or use bf_bpf_map_update_batch as the old code did. This avoids a per-element syscall regression for chains that do not benefit from grouping.
I think bf_bpf_map_update_batch is the right approach here: we can use bf_hashset to prepare the final bf_map shape on the userspace side. I'll try it later. However, it's not a dealbreaker if we don't get it now — even with 2 syscalls per item, it only slows down load_maps 2-3x in microbenchmarks.
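A sketch of the single-set fast path suggested above (hypothetical helper, not bpfilter's actual code; `prev` stands in for the looked-up value, all zeroes when the key is new):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Build the value to write for one element of the set at position set_pos
 * within its group of n_sets sets. */
static void build_mask(uint8_t *value, size_t value_size,
                       const uint8_t *prev, size_t set_pos, size_t n_sets)
{
    if (n_sets == 1) {
        /* fast path: the mask is always 0x01, no lookup syscall needed */
        memset(value, 0, value_size);
        value[0] = 0x01;
        return;
    }

    /* read-modify-write path: OR this set's bit into the existing mask */
    memcpy(value, prev, value_size);
    value[set_pos / 8] |= (uint8_t)(1u << (set_pos % 8));
}
```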
Closes #381.
We're grouping all sets in a chain by key format. Then, sets of the same format will turn into a single BPF map, with bitmasks in its values that indicate which sets the element appears in. With the 64 maps per program limit, we should now be able to have 64 unique key formats per program, which is much less likely to hit than just 64 sets.
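On the lookup side, the rule for a given set only has to test its bit in the grouped map's value (a hypothetical C helper standing in for the generated BPF instructions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the element whose map value is `value` belongs to the set at
 * position set_pos within its group. */
static bool elem_in_set(const uint8_t *value, size_t set_pos)
{
    return value[set_pos / 8] & (1u << (set_pos % 8));
}
```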
There is an exception for LPM tries though.
`BPF_FUNC_map_lookup_elem` on an LPM trie performs longest-prefix match. So for a ruleset like:

If we grouped `a` and `b` together, and got a packet from `192.168.1.1`, `BPF_FUNC_map_lookup_elem` would return set `a` (key: `192.168.1.0/24`, value: `0x01`) and the packet would be accepted. Therefore, LPM-based sets are excluded from merging.