Struct Regex

pub struct Regex { /* private fields */ }

A compiled regular expression for searching haystacks of arbitrary bytes (&[u8]).

A Regex can be used to search haystacks, split haystacks into substrings or replace substrings in a haystack with a different substring. All searching is done with an implicit (?s:.)*? at the beginning and end of a pattern. To force an expression to match the whole haystack (or a prefix or a suffix), you must use an anchor like ^ or $ (or \A and \z).

Like the Regex type in the parent module, matches with this regex return byte offsets into the haystack. Unlike the parent Regex type, these byte offsets may not correspond to UTF-8 sequence boundaries since the regexes in this module can match arbitrary bytes.

The only methods that allocate new byte strings are the string replacement methods. All other methods (searching and splitting) return borrowed references into the haystack given.

§Example

Find the offsets of a US phone number:

use regex::bytes::Regex;

let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
let m = re.find(b"phone: 111-222-3333").unwrap();
assert_eq!(7..19, m.range());

§Example: extracting capture groups

A common way to use regexes is with capture groups. That is, instead of just looking for matches of an entire regex, parentheses are used to create groups that represent part of the match.

For example, consider a haystack with multiple lines, where each line has three whitespace-delimited fields: the second field is expected to be a number and the third a boolean. To make this convenient, we use the Captures::extract API to put the strings that match each group into a fixed-size array:

use regex::bytes::Regex;

let hay = b"
rabbit         54 true
groundhog 2 true
does not match
fox   109    false
";
let re = Regex::new(r"(?m)^\s*(\S+)\s+([0-9]+)\s+(true|false)\s*$").unwrap();
let mut fields: Vec<(&[u8], i64, bool)> = vec![];
for (_, [f1, f2, f3]) in re.captures_iter(hay).map(|caps| caps.extract()) {
    // These unwraps are OK because our pattern is written in a way where
    // all matches for f2 and f3 will be valid UTF-8.
    let f2 = std::str::from_utf8(f2).unwrap();
    let f3 = std::str::from_utf8(f3).unwrap();
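    // The parses cannot fail for this haystack: f2 is a short run of ASCII
    // digits that fits in an i64 and f3 is either "true" or "false".
    fields.push((f1, f2.parse().unwrap(), f3.parse().unwrap()));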
}
assert_eq!(fields, vec![
    (&b"rabbit"[..], 54, true),
    (&b"groundhog"[..], 2, true),
    (&b"fox"[..], 109, false),
]);

§Example: matching invalid UTF-8

One of the reasons for searching &[u8] haystacks is that the &[u8] might not be valid UTF-8. Indeed, with a bytes::Regex, patterns that match invalid UTF-8 are explicitly allowed. Here’s one example that looks for valid UTF-8 fields that might be separated by invalid UTF-8. In this case, we use (?s-u:.), which matches any byte. Attempting to use it in a top-level Regex (regex::Regex) will result in the regex failing to compile. Notice also that we use . with Unicode mode enabled, in which case only valid UTF-8 is matched. In this way, we can build one pattern where some parts only match valid UTF-8 while other parts are more permissive.

use regex::bytes::Regex;

// F0 9F 92 A9 is the UTF-8 encoding for a Pile of Poo.
let hay = b"\xFF\xFFfoo\xFF\xFF\xFF\xF0\x9F\x92\xA9\xFF";
// An equivalent to '(?s-u:.)' is '(?-u:[\x00-\xFF])'.
let re = Regex::new(r"(?s)(?-u:.)*?(?<f1>.+)(?-u:.)*?(?<f2>.+)").unwrap();
let caps = re.captures(hay).unwrap();
assert_eq!(&caps["f1"], &b"foo"[..]);
assert_eq!(&caps["f2"], "💩".as_bytes());

Implementations§

impl Regex

pub fn new(re: &str) -> Result<Regex, Error>

§Errors

§Example

pub fn is_match(&self, haystack: &[u8]) -> bool

§Example

pub fn find<'h>(&self, haystack: &'h [u8]) -> Option<Match<'h>>

§Example

pub fn find_iter<'r, 'h>(&'r self, haystack: &'h [u8]) -> Matches<'r, 'h>

§Time complexity

§Example

pub fn captures<'h>(&self, haystack: &'h [u8]) -> Option<Captures<'h>>

§Example

pub fn captures_iter<'r, 'h>(&'r self, haystack: &'h [u8]) -> CaptureMatches<'r, 'h>

§Time complexity

§Example

pub fn split<'r, 'h>(&'r self, haystack: &'h [u8]) -> Split<'r, 'h>

§Time complexity

§Example

§Example: more cases

pub fn splitn<'r, 'h>(&'r self, haystack: &'h [u8], limit: usize) -> SplitN<'r, 'h>

§Time complexity

§Example

§Examples: more cases

pub fn replace<'h, R: Replacer>(&self, haystack: &'h [u8], rep: R) -> Cow<'h, [u8]>

§Replacement string syntax

§Example

pub fn replace_all<'h, R: Replacer>(&self, haystack: &'h [u8], rep: R) -> Cow<'h, [u8]>

§Time complexity

§Fallibility

§Example

pub fn replacen<'h, R: Replacer>(&self, haystack: &'h [u8], limit: usize, rep: R) -> Cow<'h, [u8]>

§Time complexity

§Fallibility

§Example

impl Regex

pub fn shortest_match(&self, haystack: &[u8]) -> Option<usize>

§Example

pub fn shortest_match_at(&self, haystack: &[u8], start: usize) -> Option<usize>

§Panics

§Example

pub fn is_match_at(&self, haystack: &[u8], start: usize) -> bool

§Panics

§Example

pub fn find_at<'h>(&self, haystack: &'h [u8], start: usize) -> Option<Match<'h>>

§Panics

§Example

pub fn captures_at<'h>(&self, haystack: &'h [u8], start: usize) -> Option<Captures<'h>>

§Panics

§Example

pub fn captures_read<'h>(&self, locs: &mut CaptureLocations, haystack: &'h [u8]) -> Option<Match<'h>>

§Example

pub fn captures_read_at<'h>(&self, locs: &mut CaptureLocations, haystack: &'h [u8], start: usize) -> Option<Match<'h>>

§Panics

§Example

impl Regex

pub fn as_str(&self) -> &str

§Example

pub fn capture_names(&self) -> CaptureNames<'_>

§Example

pub fn captures_len(&self) -> usize

§Example

pub fn static_captures_len(&self) -> Option<usize>

§Example

pub fn capture_locations(&self) -> CaptureLocations

§Example

Trait Implementations§

impl Clone for Regex

fn clone(&self) -> Regex

fn clone_from(&mut self, source: &Self)

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Display for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl FromStr for Regex

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T> ToOwned for T
where T: Clone,

impl<T> ToString for T
where T: Display + ?Sized,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,