
Feature: Iterator over matching and non-matching parts of a haystack #1296

@LunarLambda


An API that is the union of `find_iter` (matching parts) and `split` (non-matching parts):

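```rust
// I don't care about the exact naming or structure. Please do not bikeshed these.

// and the equivalent in `bytes`
// (Debug + PartialEq are needed for the assert_eq! example below)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Piece<'a> {
    Matching(&'a str /* or `Match`? */),
    NotMatching(&'a str),
}

impl<'a> Piece<'a> {
    pub fn as_str(&self) -> &'a str {
        match *self {
            Piece::Matching(s) | Piece::NotMatching(s) => s,
        }
    }
}

impl Regex {
    // proposed signature only; body omitted
    pub fn pieces<'h>(&self, haystack: &'h str) -> impl Iterator<Item = Piece<'h>>;
}
```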

such that

```rust
let r = Regex::new("%.").unwrap();
let text = String::from("Hello, world: %s %d");

let mut p = r.pieces(&text);
assert_eq!(p.next(), Some(Piece::NotMatching("Hello, world: ")));
assert_eq!(p.next(), Some(Piece::Matching("%s")));
assert_eq!(p.next(), Some(Piece::NotMatching(" ")));
assert_eq!(p.next(), Some(Piece::Matching("%d")));
assert_eq!(p.next(), None);

// roundtrip property:
assert_eq!(text, r.pieces(&text).map(|p| p.as_str()).collect::<String>());
```

I think this could be built as a wrapper around `find_iter`, but the code would be quite clunky, so it would be beneficial to implement it in `Regex` proper.
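For comparison, a wrapper of the kind described above might look roughly like this. It is a minimal sketch, not part of the regex crate: `Piece`, `Pieces`, and `pieces` are hypothetical names, and to stay dependency-free it takes the match byte ranges as a plain iterator instead of a `Regex`. With the regex crate, those ranges would presumably come from `re.find_iter(haystack).map(|m| m.range())`.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the proposed `Piece` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Piece<'a> {
    Matching(&'a str),
    NotMatching(&'a str),
}

impl<'a> Piece<'a> {
    pub fn as_str(&self) -> &'a str {
        match *self {
            Piece::Matching(s) | Piece::NotMatching(s) => s,
        }
    }
}

/// Hypothetical iterator that interleaves match ranges with the gaps between them.
/// Assumes `matches` yields non-overlapping, ascending byte ranges that lie on
/// char boundaries, as `Regex::find_iter` does.
pub struct Pieces<'h, I> {
    haystack: &'h str,
    matches: I,
    last_end: usize,               // byte offset where the previous piece ended
    pending: Option<Range<usize>>, // match held back while its leading gap is emitted
    done: bool,                    // set once `matches` is exhausted
}

impl<'h, I: Iterator<Item = Range<usize>>> Iterator for Pieces<'h, I> {
    type Item = Piece<'h>;

    fn next(&mut self) -> Option<Piece<'h>> {
        let h = self.haystack;
        // A match deferred on the previous call goes out first.
        if let Some(m) = self.pending.take() {
            self.last_end = m.end;
            return Some(Piece::Matching(&h[m]));
        }
        if self.done {
            return None;
        }
        match self.matches.next() {
            // Text lies between the previous piece and this match:
            // emit the gap now and the match on the next call.
            Some(m) if m.start > self.last_end => {
                let gap = &h[self.last_end..m.start];
                self.pending = Some(m);
                Some(Piece::NotMatching(gap))
            }
            // The match starts right where the previous piece ended.
            Some(m) => {
                self.last_end = m.end;
                Some(Piece::Matching(&h[m]))
            }
            // No more matches: emit any trailing non-matching text once.
            None => {
                self.done = true;
                if self.last_end < h.len() {
                    Some(Piece::NotMatching(&h[self.last_end..]))
                } else {
                    None
                }
            }
        }
    }
}

/// Hypothetical constructor; `matches` are byte ranges into `haystack`.
pub fn pieces<'h, I>(haystack: &'h str, matches: I) -> Pieces<'h, I::IntoIter>
where
    I: IntoIterator<Item = Range<usize>>,
{
    Pieces { haystack, matches: matches.into_iter(), last_end: 0, pending: None, done: false }
}
```

Because `find_iter` yields non-overlapping matches in ascending order, the wrapper only needs to remember where the previous piece ended and hold a match back by one call when a gap precedes it. Empty matches (e.g. from `a*`) come out as `Piece::Matching("")`, which keeps the roundtrip property intact. The bookkeeping is small but easy to get wrong, which is part of the case for providing it in the crate.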

Use cases that need both the matching and non-matching pieces of text are pretty common: format strings, control sequences, UTF-8 handling, or text alternating with non-text. These currently require awkward constructions such as manually tracking slice sub-ranges, `memchr` loops, or custom parsers written with libraries like `nom`.
