
lib: cgen: group same-key maps using bitmasks#520

Open
pzmarzly wants to merge 4 commits into facebook:main from pzmarzly:push-ytzrrosklspt

Conversation

@pzmarzly
Contributor

@pzmarzly pzmarzly commented Apr 21, 2026

Closes #381.

We're grouping all sets in a chain by key format. Sets sharing a format are then merged into a single BPF map, whose values are bitmasks indicating which of the sets each element appears in. With the 64-maps-per-program limit, we can now have 64 unique key formats per program, a limit much harder to hit than 64 sets.


There is an exception for LPM tries, though: BPF_FUNC_map_lookup_elem on an LPM trie performs a longest-prefix match. So for a ruleset like:

chain demo BF_HOOK_XDP ACCEPT
    set a (ip4.snet) in { 192.168.1.0/24 }
    set b (ip4.snet) in { 192.168.0.0/16 }
    rule (ip4.snet) in b counter DROP

If we grouped a and b together and received a packet from 192.168.1.1, BPF_FUNC_map_lookup_elem would return the longest-prefix entry for set a (key: 192.168.1.0/24, value: 0x01), the "in b" test would fail, and the packet would be accepted instead of hitting the DROP rule.

Therefore, LPM-based sets are excluded from merging.

@pzmarzly pzmarzly requested a review from qdeslandes as a code owner April 21, 2026 15:00
@meta-cla meta-cla Bot added the cla signed label Apr 21, 2026
Comment thread src/libbpfilter/cgen/matcher/set.c Outdated
@pzmarzly
Contributor Author

@claude review

@github-actions

github-actions Bot commented Apr 21, 2026

Claude review of PR #520 (a7ebb96)

Must fix

  • assert after dereference (src/libbpfilter/cgen/program.c:693) — fixed in v2: assert is now before any dereference
  • missing NULL check on bf_vector_get (src/libbpfilter/cgen/program.c:868) — no longer applicable: code was restructured
  • missing assert in cleanup function (src/libbpfilter/cgen/program.c:103) — fixed: _bf_set_group_free now has assert(group) at the top
  • LPM trie read-modify-write correctness (src/libbpfilter/cgen/program.c:918) — addressed by design: LPM trie sets are excluded from grouping (each gets its own single-set group where the value is always 0x01)

Suggestions

  • BPF_ALU64_IMM to BPF_ALU32_IMM (src/libbpfilter/cgen/matcher/set.c:85) — fixed in v2: now uses BPF_ALU32_IMM
  • duplicated group computation (src/libbpfilter/cgen/matcher/set.c:35) — fixed in v2: grouping now centralized in _bf_program_build_set_groups
  • consolidate parallel arrays (src/libbpfilter/cgen/program.c:700) — no longer applicable: restructured to use bf_set_group
  • NULL set name in format strings — fixed: uses ?: operator throughout (set->name ?: "<anonymous>")
  • total_elems overestimate (src/libbpfilter/cgen/program.c:925) — total_elems over-counts when keys overlap across sets; safe but wastes kernel map capacity
  • group->map assignment ordering (src/libbpfilter/cgen/program.c:959) — non-owning alias stored before ownership transfer; fragile against future early returns
  • e2e isolation test missing exit check — fixed: ping exit code is verified via || { echo ...; exit 1; }
  • per-element syscall regression (src/libbpfilter/cgen/program.c:940) — the read-modify-write loop does 2 syscalls per element; for single-set groups (the common case) this is pure overhead vs. the old batch-update path
  • silently swallowed lookup error (src/libbpfilter/cgen/program.c:945) — bf_bpf_map_lookup_elem return value is discarded; unexpected errors (not ENOENT) go unnoticed

Nits

  • unnecessary calloc cast (src/libbpfilter/cgen/program.c:721) — no longer applicable
  • %lu for size_t (src/libbpfilter/cgen/program.c:857) — pre-existing pattern throughout codebase; not a regression
  • unnecessary bf_list_ops_free cast — no longer applicable: no cast present in current diff
  • set_size param naming (src/libbpfilter/cgen/prog/map.h:74) — fixed: now uses n_elems and value_size
  • @param set doc misleading (src/libbpfilter/cgen/prog/map.h:73) — doc says "Set to create the map from" but the set is now only used for key format, not elements

Workflow run

Comment thread src/libbpfilter/cgen/program.c
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread src/libbpfilter/cgen/matcher/set.c Outdated
Comment thread src/libbpfilter/cgen/matcher/set.c Outdated
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread src/libbpfilter/cgen/prog/map.h Outdated
Contributor

@qdeslandes qdeslandes left a comment


Thanks for working on this. I think it's a good first implementation, but there are a few things that could be improved to simplify it.

When grouping sets, using a dedicated structure would be helpful:

struct bf_set_group {
    key;
    sets;
};

When looping over sets, you can ignore the empty ones; for each non-empty one, check whether a matching set group already exists and add the set there. You don't necessarily need a "leader".

Fixups can hold a pointer to a set directly, it will be easier to resolve them, instead of relying on a set index. This is possible as the fixup (and bf_program) aren't serialized, they're only temporary objects (which was not the case before).

Comment thread src/libbpfilter/cgen/matcher/set.c Outdated
@pzmarzly pzmarzly force-pushed the push-ytzrrosklspt branch 4 times, most recently from a49eca0 to 7f30d9d Compare April 22, 2026 15:06
@pzmarzly
Contributor Author

Fixups can hold a pointer to a set directly, it will be easier to resolve them, instead of relying on a set index. This is possible as the fixup (and bf_program) aren't serialized, they're only temporary objects (which was not the case before).

That's amazing. Trying this.

@claude review

Comment thread src/libbpfilter/cgen/program.c
Comment thread src/libbpfilter/cgen/program.c Outdated
if (r)
return bf_err_r(r, "failed to add set elements to the map");
memset(value, 0, value_size);
(void)bf_bpf_map_lookup_elem(new_map->fd, elem, value);

Claude: must-fix: For LPM trie maps, bf_bpf_map_lookup_elem performs longest-prefix matching, not exact-key matching. When inserting an element from set B (e.g. 192.168.1.0/24), the lookup may return the value for a shorter prefix from set A (e.g. 192.168.0.0/16) that was already inserted. The subsequent OR would incorrectly propagate set A's bit into set B's entry.

This means a packet matching 192.168.1.0/24 would appear to belong to both set A and set B, producing false positives. Consider either:

  • Skipping the lookup for LPM trie groups and always writing only the current set's bit, or
  • Preventing LPM trie sets from being grouped together when their prefixes can overlap.

Contributor Author


This means a packet matching 192.168.1.0/24 would appear to belong to both set A and set B

I mean, logically it does belong to both sets, so your example is not the best. However, thanks for bringing up LPM maps; they have footguns. Let's exclude them from grouping.

Comment thread src/libbpfilter/cgen/matcher/set.c Outdated
Comment thread src/libbpfilter/cgen/fixup.c Outdated
Comment thread src/libbpfilter/cgen/program.c
Comment thread src/libbpfilter/cgen/program.c
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread tests/e2e/parsing/set_group.sh Outdated
Comment thread src/libbpfilter/cgen/program.c Outdated
Comment thread src/libbpfilter/cgen/prog/map.h Outdated
@pzmarzly pzmarzly marked this pull request as draft April 22, 2026 15:41
@pzmarzly pzmarzly force-pushed the push-ytzrrosklspt branch 2 times, most recently from 1794c42 to 4629064 Compare May 5, 2026 15:50
@pzmarzly pzmarzly force-pushed the push-ytzrrosklspt branch from 4629064 to a7ebb96 Compare May 5, 2026 16:51
@pzmarzly pzmarzly marked this pull request as ready for review May 5, 2026 16:55
(void)snprintf(name, BPF_OBJ_NAME_LEN, _BF_SET_MAP_PREFIX "%04x",
(uint8_t)set_idx++);
r = bf_map_new_from_set(&map, name, set);
(uint16_t)map_idx++);

Claude: suggestion: The (void)bf_bpf_map_lookup_elem(...) discards the return value unconditionally. If the syscall fails for a reason other than ENOENT (e.g., EFAULT from a bad fd), the error is silently swallowed and the bitmask is built from zeroed data, which could cause silent incorrect behavior. Consider at least logging at debug level on unexpected failure:

int lr = bf_bpf_map_lookup_elem(new_map->fd, elem->data, value);
if (lr && lr != -ENOENT)
    bf_dbg("unexpected map lookup error: %d", lr);

bf_list_foreach (&group->sets, set_node) {
const struct bf_set *set = bf_list_node_get_data(set_node);

bf_hashset_foreach (&set->elems, elem) {

Claude: suggestion: The non-owning alias map_ref is stored before ownership transfer and assigned to group->map after the push. This is correct today, but fragile: if future code adds an early return between bf_list_push and group->map = map_ref, the group's map pointer will be left as NULL with no indication of what went wrong. Consider assigning group->map immediately after the successful push, or adding a brief comment noting the dependency.

return r;
continue;
bf_list_foreach (&new_prog->set_groups, group_node) {
struct bf_set_group *group = bf_list_node_get_data(group_node);

Claude: suggestion: total_elems sums all elements across all sets in the group, but keys shared across sets will occupy a single map entry. For groups with heavily overlapping sets this over-estimates the BPF map's max_entries, wasting kernel memory. Safe but worth a brief comment documenting the trade-off (or a cap to prevent requesting an excessively large map from the kernel).

@@ -71,10 +71,13 @@ int bf_map_new(struct bf_map **map, const char *name, enum bf_map_type type,
* also as filename when pinning the map to the system. Can't be NULL or
* empty.
* @param set Set to create the map from. Can't be NULL.

Claude: nit: The @param set documentation reads "Set to create the map from" but the set is now used only for its key format (elem_size, use_trie), not its elements. Consider updating to: "Set whose key format defines the map's key size and type."

* the actual entry count may be smaller, but we can err on the
* safe side. */
bf_list_foreach (&group->sets, set_node) {
const struct bf_set *set = bf_list_node_get_data(set_node);

Claude: suggestion: The read-modify-write loop performs 2 syscalls (lookup + update) per element per set. For single-set groups (the common case and all LPM trie groups), the bitmask is always 0x01, so the lookup is pure overhead. Consider a fast path: when n_sets == 1, skip the lookup and unconditionally write a static 0x01 value, or use bf_bpf_map_update_batch as the old code did. This avoids a per-element syscall regression for chains that do not benefit from grouping.

Contributor Author


I think bf_bpf_map_update_batch is the right approach here; we can use bf_hashset to prepare the final bf_map shape on the userspace side. I'll try it later. However, it's not a dealbreaker if we don't get it now: even with 2 syscalls per item, it only slows down load_maps 2-3x in microbenchmarks.



Development

Successfully merging this pull request may close these issues.

Merge maps filtering on sets with identical keys

2 participants