Struct RegexSet

pub struct RegexSet { /* private fields */ }

Match multiple, possibly overlapping, regexes in a single search.

A regex set corresponds to the union of zero or more regular expressions. That is, a regex set will match a haystack when at least one of its constituent regexes matches. A regex set as it's formulated here provides a touch more power: it will also report which regular expressions in the set match. Indeed, this is the key difference between regex sets and a single Regex with many alternates, since only one alternate can match at a time.

For example, consider regular expressions to match email addresses and domains: [a-z]+@[a-z]+\.(com|org|net) and [a-z]+\.(com|org|net). If a regex set is constructed from those regexes, then searching the haystack foo@example.com will report both regexes as matching. Of course, one could accomplish this by compiling each regex on its own and doing two searches over the haystack. The key advantage of using a regex set is that it will report the matching regexes using a single pass through the haystack. If one has hundreds or thousands of regexes to match repeatedly (like a URL router for a complex web application or a user agent matcher), then a regex set can realize huge performance gains.

Unlike the top-level RegexSet, this RegexSet searches haystacks with type &[u8] instead of &str. Consequently, this RegexSet is permitted to match invalid UTF-8.

§Limitations

Regex sets are limited to answering the following two questions:

1. Does any regex in the set match?
2. If so, which regexes in the set match?

As with the main Regex type, it is cheaper to ask (1) instead of (2) since the matching engines can stop after the first match is found.

You cannot directly extract Match or Captures objects from a regex set. If you need these operations, the recommended approach is to compile each pattern in the set independently and scan the exact same haystack a second time with those independently compiled patterns:

use regex::bytes::{Regex, RegexSet};

let patterns = ["foo", "bar"];
// Both patterns will match different ranges of this string.
let hay = b"barfoo";

// Compile a set matching any of our patterns.
let set = RegexSet::new(patterns).unwrap();
// Compile each pattern independently.
let regexes: Vec<_> = set
    .patterns()
    .iter()
    .map(|pat| Regex::new(pat).unwrap())
    .collect();

// Match against the whole set first and identify the individual
// matching patterns.
let matches: Vec<&[u8]> = set
    .matches(hay)
    .into_iter()
    // Dereference the match index to get the corresponding
    // compiled pattern.
    .map(|index| &regexes[index])
    // To get match locations or any other info, we then have to search the
    // exact same haystack again, using our separately-compiled pattern.
    .map(|re| re.find(hay).unwrap().as_bytes())
    .collect();

// Matches arrive in the order the constituent patterns were declared,
// not the order they appear in the haystack.
assert_eq!(vec![&b"foo"[..], &b"bar"[..]], matches);

§Performance

A RegexSet has the same performance characteristics as Regex. Namely, search takes O(m * n) time, where m is proportional to the size of the regex set and n is proportional to the length of the haystack.

§Trait implementations

The Default trait is implemented for RegexSet. The default value is an empty set. An empty set can also be explicitly constructed via RegexSet::empty.

§Example

This shows how the above two regexes (for matching email addresses and domains) might work:

use regex::bytes::RegexSet;

let set = RegexSet::new(&[
    r"[a-z]+@[a-z]+\.(com|org|net)",
    r"[a-z]+\.(com|org|net)",
]).unwrap();

// Ask whether any regexes in the set match.
assert!(set.is_match(b"foo@example.com"));

// Identify which regexes in the set match.
let matches: Vec<_> = set.matches(b"foo@example.com").into_iter().collect();
assert_eq!(vec![0, 1], matches);

// Try again, but with a haystack that only matches one of the regexes.
let matches: Vec<_> = set.matches(b"example.com").into_iter().collect();
assert_eq!(vec![1], matches);

// Try again, but with a haystack that doesn't match any regex in the set.
let matches: Vec<_> = set.matches(b"example").into_iter().collect();
assert!(matches.is_empty());

Note that it would be possible to adapt the above example to using Regex with an expression like:

(?P<email>[a-z]+@(?P<email_domain>[a-z]+[.](com|org|net)))|(?P<domain>[a-z]+[.](com|org|net))

After a match, one could then inspect the capture groups to figure out which alternates matched. The problem is that it is hard to make this approach scale when there are many regexes since the overlap between each alternate isn’t always obvious to reason about.

Implementations§

impl RegexSet

pub fn new<I, S>(exprs: I) -> Result<RegexSet, Error>
where S: AsRef<str>, I: IntoIterator<Item = S>,

§Example

pub fn empty() -> RegexSet

§Example

pub fn is_match(&self, haystack: &[u8]) -> bool

§Example

pub fn is_match_at(&self, haystack: &[u8], start: usize) -> bool

§Panics

§Example

pub fn matches(&self, haystack: &[u8]) -> SetMatches

§Example

pub fn matches_at(&self, haystack: &[u8], start: usize) -> SetMatches

§Panics

§Example

pub fn len(&self) -> usize

§Example

pub fn is_empty(&self) -> bool

§Example

pub fn patterns(&self) -> &[String]

§Example

Trait Implementations§

impl Clone for RegexSet

fn clone(&self) -> RegexSet

fn clone_from(&mut self, source: &Self)

impl Debug for RegexSet

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for RegexSet

fn default() -> Self

Auto Trait Implementations§

impl Freeze for RegexSet

impl RefUnwindSafe for RegexSet

impl Send for RegexSet

impl Sync for RegexSet

impl Unpin for RegexSet

impl UnwindSafe for RegexSet

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
