Question: Do PCRE2 leftmost-first semantics include capture group positions? #1336

@wareya

What version of regex are you using?

1.12.x and also older

Describe the bug at a high level.

See title: "Question: Do PCRE2 leftmost-first semantics include capture groups?"

I've been testing regex implementations for differences in capture behavior because I'm trying to figure out how best to handle tie-breaking in a lockstep parallel NFA simulation. I'm running into some strange differences from PCRE2 in automata-driven crates, and I can't tell whether they would be considered bugs worth reporting. If they aren't, I want to avoid dropping a ton of supposed bugs on here for no reason. I vaguely remember from working on my own regex implementation that handling quantified nullable groups was a headache even in backtracking land.

Random example: on the regex (|.)*(a+b) (yes, really) with the input axaaab, every engine matches the entire input string, but rust/regex and re2 report the capture groups as axa[a][ab], while PCRE2, C#, etc. report ax[][aaab].
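For anyone who wants to poke at this, here is a minimal sketch that runs the example through Python's standard `re` module and prints the capture spans in the bracket notation used above. Python is not one of the engines tested in this report; it is used here only because it is a readily available backtracking engine, and whether its result agrees with PCRE2 is something to check rather than assume. The helper names `group_spans` and `render` are made up for illustration, and the explicit spans at the end are my reading of the notation (axa[a][ab] as group 1 = 3..4 and group 2 = 4..6, ax[][aaab] as group 1 = 2..2 and group 2 = 2..6).

```python
import re

PATTERN = r"(|.)*(a+b)"   # regex from the report
HAYSTACK = "axaaab"       # input from the report


def group_spans(pattern, haystack):
    """Return (start, end) for the overall match and for each capture group.

    A group that did not participate is reported as (-1, -1), which is
    what re.Match.span() returns in that case.
    """
    m = re.search(pattern, haystack)
    if m is None:
        return None
    return [m.span(i) for i in range(m.re.groups + 1)]


def render(haystack, spans):
    """Render capture spans in the report's bracket notation.

    `spans` holds the capture groups only (not group 0) and is assumed to
    be ordered and non-overlapping, which holds for both results quoted
    in the report.
    """
    out, pos = [], 0
    for start, end in spans:
        out.append(haystack[pos:start])              # text outside any group
        out.append("[" + haystack[start:end] + "]")  # the group itself
        pos = end
    out.append(haystack[pos:])
    return "".join(out)


if __name__ == "__main__":
    spans = group_spans(PATTERN, HAYSTACK)
    print("overall match:", spans[0])
    print("captures:", render(HAYSTACK, spans[1:]))
    # The two results quoted in the report, written as explicit spans
    # (assumed interpretation of the bracket notation):
    print(render(HAYSTACK, [(3, 4), (4, 6)]))  # axa[a][ab]
    print(render(HAYSTACK, [(2, 2), (2, 6)]))  # ax[][aaab]
```

Comparing spans rather than captured text matters here: an empty group 1 at offset 2 and an empty group 1 at some other offset would look identical as strings.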
