Struct regex::bytes::Captures

pub struct Captures<'h> { /* private fields */ }

Represents the capture groups for a single match.

Capture groups refer to parts of a regex enclosed in parentheses. They can be optionally named. The purpose of capture groups is to be able to reference different parts of a match based on the original pattern. In essence, a Captures is a container of Match values for each group that participated in a regex match. Each Match can be looked up by either its capture group index or name (if it has one).

For example, say you want to match the individual letters in a 5-letter word:

(?<first>\w)(\w)(?:\w)\w(?<last>\w)

This regex has 4 capture groups:

  • The group at index 0 corresponds to the overall match. It is always present in every match and never has a name.
  • The group at index 1 with name first corresponding to the first letter.
  • The group at index 2 with no name corresponding to the second letter.
  • The group at index 3 with name last corresponding to the fifth and last letter.

Notice that (?:\w) was not listed above as a capture group despite it being enclosed in parentheses. That’s because (?:pattern) is a special syntax that permits grouping but without capturing. The reason for not treating it as a capture is that tracking and reporting capture groups requires additional state that may lead to slower searches. So using as few capture groups as possible can help performance. (Although the difference in performance of a couple of capture groups is likely immaterial.)

Values with this type are created by Regex::captures or Regex::captures_iter.

'h is the lifetime of the haystack that these captures were matched from.

§Example

use regex::bytes::Regex;

let re = Regex::new(r"(?<first>\w)(\w)(?:\w)\w(?<last>\w)").unwrap();
let caps = re.captures(b"toady").unwrap();
assert_eq!(b"toady", &caps[0]);
assert_eq!(b"t", &caps["first"]);
assert_eq!(b"o", &caps[2]);
assert_eq!(b"y", &caps["last"]);

Implementations§

impl<'h> Captures<'h>

pub fn get(&self, i: usize) -> Option<Match<'h>>

Returns the Match associated with the capture group at index i. If i does not correspond to a capture group, or if the capture group did not participate in the match, then None is returned.

When i == 0, this is guaranteed to return a non-None value.

§Examples

Get the substring that matched with a default of an empty string if the group didn’t participate in the match:

use regex::bytes::Regex;

let re = Regex::new(r"[a-z]+(?:([0-9]+)|([A-Z]+))").unwrap();
let caps = re.captures(b"abc123").unwrap();

let substr1 = caps.get(1).map_or(&b""[..], |m| m.as_bytes());
let substr2 = caps.get(2).map_or(&b""[..], |m| m.as_bytes());
assert_eq!(substr1, b"123");
assert_eq!(substr2, b"");

pub fn name(&self, name: &str) -> Option<Match<'h>>

Returns the Match associated with the capture group named name. If name isn’t a valid capture group or it refers to a group that didn’t match, then None is returned.

Note that unlike caps["name"], this returns a Match whose lifetime matches the lifetime of the haystack in this Captures value. Conversely, the substring returned by caps["name"] has a lifetime of the Captures value, which is likely shorter than the lifetime of the haystack. In some cases, it may be necessary to use this method to access the matching substring instead of the caps["name"] notation.
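The lifetime difference described above can be modelled without the crate. The following is a minimal sketch, assuming a hypothetical `ToyCaptures` type (it is not the crate's `Captures`, and `first_byte` is an illustrative helper) that stores byte ranges into a haystack. Its `get` method returns slices tied to the haystack lifetime `'h`, while its `Index` implementation can only return slices tied to the borrow of `self`, because that is the signature `Index::index` requires.

```rust
use std::ops::Index;

/// A toy stand-in for `Captures` (hypothetical): byte ranges into a haystack.
struct ToyCaptures<'h> {
    haystack: &'h [u8],
    spans: Vec<Option<(usize, usize)>>,
}

impl<'h> ToyCaptures<'h> {
    /// Like `Captures::get`, the returned slice borrows from the haystack (`'h`).
    fn get(&self, i: usize) -> Option<&'h [u8]> {
        let (start, end) = (*self.spans.get(i)?)?;
        Some(&self.haystack[start..end])
    }
}

impl<'h> Index<usize> for ToyCaptures<'h> {
    type Output = [u8];

    /// `Index::index` ties the returned reference to the borrow of `self`.
    fn index(&self, i: usize) -> &[u8] {
        self.get(i).expect("no group at this index")
    }
}

/// Compiles because `get` yields a `&'h [u8]` that outlives the local `caps`.
fn first_byte<'h>(haystack: &'h [u8]) -> Option<&'h [u8]> {
    let whole = Some((0, haystack.len()));
    let first = if haystack.is_empty() { None } else { Some((0, 1)) };
    let caps = ToyCaptures { haystack, spans: vec![whole, first] };
    // Returning `Some(&caps[1])` here would be rejected by the borrow checker,
    // since that reference cannot outlive `caps`.
    caps.get(1)
}
```

The same reasoning suggests reaching for `name` (or `get`) whenever a matched substring has to be returned from the function that created the `Captures` value.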

§Examples

Get the substring that matched with a default of an empty string if the group didn’t participate in the match:

use regex::bytes::Regex;

let re = Regex::new(
    r"[a-z]+(?:(?<numbers>[0-9]+)|(?<letters>[A-Z]+))",
).unwrap();
let caps = re.captures(b"abc123").unwrap();

let numbers = caps.name("numbers").map_or(&b""[..], |m| m.as_bytes());
let letters = caps.name("letters").map_or(&b""[..], |m| m.as_bytes());
assert_eq!(numbers, b"123");
assert_eq!(letters, b"");

pub fn extract<const N: usize>(&self) -> (&'h [u8], [&'h [u8]; N])

This is a convenience routine for extracting the substrings corresponding to matching capture groups.

This returns a tuple where the first element corresponds to the full substring of the haystack that matched the regex. The second element is an array of substrings, with each corresponding to the substring that matched for a particular capture group.

§Panics

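This panics if the number of possible matching groups in this Captures value is not fixed to N in all circumstances. More precisely, this routine only works when N is equal to the number of explicit capture groups, that is, the value of Regex::static_captures_len minus one (that count also includes the implicit group at index 0, which extract returns separately).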

Stated more plainly, if the number of matching capture groups in a regex can vary from match to match, then this function always panics.

For example, (a)(b)|(c) could produce two matching capture groups or one matching capture group for any given match. Therefore, one cannot use extract with such a pattern.

But a pattern like (a)(b)|(c)(d) can be used with extract because the number of capture groups in every match is always equivalent, even if the capture indices in each match are not.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
let Some((full, [year, month, day])) =
    re.captures(hay).map(|caps| caps.extract()) else { return };
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
assert_eq!(b"03", month);
assert_eq!(b"14", day);
§Example: iteration

This example shows how to use this method when iterating over all Captures matches in a haystack.

use regex::bytes::Regex;

let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
let hay = b"1973-01-05, 1975-08-25 and 1980-10-18";

let mut dates: Vec<(&[u8], &[u8], &[u8])> = vec![];
for (_, [y, m, d]) in re.captures_iter(hay).map(|c| c.extract()) {
    dates.push((y, m, d));
}
assert_eq!(dates, vec![
    (&b"1973"[..], &b"01"[..], &b"05"[..]),
    (&b"1975"[..], &b"08"[..], &b"25"[..]),
    (&b"1980"[..], &b"10"[..], &b"18"[..]),
]);
§Example: parsing different formats

This API is particularly useful when you need to extract a particular value that might occur in a different format. Consider, for example, an identifier that might be in double quotes or single quotes:

use regex::bytes::Regex;

let re = Regex::new(r#"id:(?:"([^"]+)"|'([^']+)')"#).unwrap();
let hay = br#"The first is id:"foo" and the second is id:'bar'."#;
let mut ids = vec![];
for (_, [id]) in re.captures_iter(hay).map(|c| c.extract()) {
    ids.push(id);
}
assert_eq!(ids, vec![b"foo", b"bar"]);

pub fn expand(&self, replacement: &[u8], dst: &mut Vec<u8>)

Expands all instances of $ref in replacement to the corresponding capture group, and writes them to the dst buffer given. A ref can be a capture group index or a name. If ref doesn’t refer to a capture group that participated in the match, then it is replaced with the empty string.

§Format

The format of the replacement string supports two different kinds of capture references: unbraced and braced.

For the unbraced format, the format supported is $ref where ref can be any sequence of characters in the class [0-9A-Za-z_]. ref is always the longest possible parse. So for example, $1a corresponds to the capture group named 1a and not the capture group at index 1. If ref matches ^[0-9]+$, then it is treated as a capture group index itself and not a name.

For the braced format, the format supported is ${ref} where ref can be any sequence of bytes except for }. If no closing brace occurs, then it is not considered a capture reference. As with the unbraced format, if ref matches ^[0-9]+$, then it is treated as a capture group index and not a name.

The braced format is useful for exerting precise control over the name of the capture reference. For example, ${1}a corresponds to the capture group reference 1 followed by the letter a, whereas $1a (as mentioned above) corresponds to the capture group reference 1a. The braced format is also useful for expressing capture group names that use characters not supported by the unbraced format. For example, ${foo[bar].baz} refers to the capture group named foo[bar].baz.

If a capture group reference is found and it does not refer to a valid capture group, then it will be replaced with the empty string.

To write a literal $, use $$.
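The parsing rules above can be sketched as a small scanner. This is an illustration of the rules as stated in this section, not the crate's implementation; the `Ref` type and the `classify` and `find_refs` functions are hypothetical names introduced only for this example. The scanner lists the references found in a replacement string without performing any expansion.

```rust
/// A parsed capture reference (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Ref<'a> {
    Index(usize),
    Name(&'a [u8]),
}

/// Treats `raw` as an index if it is all ASCII digits, otherwise as a name.
fn classify(raw: &[u8]) -> Ref<'_> {
    let digits = !raw.is_empty() && raw.iter().all(|b| b.is_ascii_digit());
    if digits {
        // ASCII digits are valid UTF-8, so parsing can only fail on overflow.
        let parsed = std::str::from_utf8(raw).ok().and_then(|s| s.parse::<usize>().ok());
        if let Some(i) = parsed {
            return Ref::Index(i);
        }
    }
    Ref::Name(raw)
}

/// Lists the capture references in `replacement`, in order of appearance.
fn find_refs(replacement: &[u8]) -> Vec<Ref<'_>> {
    let mut refs = Vec::new();
    let mut i = 0;
    while i < replacement.len() {
        if replacement[i] != b'$' {
            i += 1;
            continue;
        }
        let rest = &replacement[i + 1..];
        if rest.first() == Some(&b'$') {
            // `$$` is a literal dollar sign, not a reference.
            i += 2;
        } else if rest.first() == Some(&b'{') {
            // Braced: everything up to the next `}`. Without a closing
            // brace, this is not a capture reference.
            match rest.iter().position(|&b| b == b'}') {
                Some(end) => {
                    refs.push(classify(&rest[1..end]));
                    i += 1 + end + 1;
                }
                None => i += 1,
            }
        } else {
            // Unbraced: the longest possible run of [0-9A-Za-z_].
            let len = rest
                .iter()
                .take_while(|&&b| b.is_ascii_alphanumeric() || b == b'_')
                .count();
            if len > 0 {
                refs.push(classify(&rest[..len]));
            }
            i += 1 + len;
        }
    }
    refs
}
```

Under these rules, `$1a` yields a single reference to the name `1a`, while `${1}a` yields a reference to index 1 followed by a literal `a`.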

§Example
use regex::bytes::Regex;

let re = Regex::new(
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
).unwrap();
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
let caps = re.captures(hay).unwrap();

let mut dst = vec![];
caps.expand(b"year=$year, month=$month, day=$day", &mut dst);
assert_eq!(dst, b"year=2010, month=03, day=14");

pub fn iter<'c>(&'c self) -> SubCaptureMatches<'c, 'h>

Returns an iterator over all capture groups. This includes both matching and non-matching groups.

The iterator always yields at least one matching group: the first group (at index 0) with no name. Subsequent groups are returned in the order of their opening parenthesis in the regex.

The elements yielded have type Option<Match<'h>>, where a non-None value is present if the capture group matches.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"(\w)(\d)?(\w)").unwrap();
let caps = re.captures(b"AZ").unwrap();

let mut it = caps.iter();
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"AZ"[..]));
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"A"[..]));
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), None);
assert_eq!(it.next().unwrap().map(|m| m.as_bytes()), Some(&b"Z"[..]));
assert_eq!(it.next(), None);

pub fn len(&self) -> usize

Returns the total number of capture groups. This includes both matching and non-matching groups.

The length returned is always equivalent to the number of elements yielded by Captures::iter. Consequently, the length is always greater than zero since every Captures value always includes the match for the entire regex.

§Example
use regex::bytes::Regex;

let re = Regex::new(r"(\w)(\d)?(\w)").unwrap();
let caps = re.captures(b"AZ").unwrap();
assert_eq!(caps.len(), 4);

Trait Implementations§

impl<'h> Debug for Captures<'h>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'h, 'n> Index<&'n str> for Captures<'h>

Get a matching capture group’s haystack substring by name.

The haystack substring returned can’t outlive the Captures object if this method is used, because of how Index is defined (normally a[i] is part of a and can’t outlive it). To work around this limitation, use Captures::name instead.

'h is the lifetime of the matched haystack, but the lifetime of the &[u8] returned by this implementation is the lifetime of the Captures value itself.

'n is the lifetime of the group name used to index the Captures value.

§Panics

If there is no matching group at the given name.

type Output = [u8]

The returned type after indexing.

fn index<'a>(&'a self, name: &'n str) -> &'a [u8]

Performs the indexing (container[index]) operation.

impl<'h> Index<usize> for Captures<'h>

Get a matching capture group’s haystack substring by index.

The haystack substring returned can’t outlive the Captures object if this method is used, because of how Index is defined (normally a[i] is part of a and can’t outlive it). To work around this limitation, use Captures::get instead.

'h is the lifetime of the matched haystack, but the lifetime of the &[u8] returned by this implementation is the lifetime of the Captures value itself.

§Panics

If there is no matching group at the given index.

type Output = [u8]

The returned type after indexing.

fn index<'a>(&'a self, i: usize) -> &'a [u8]

Performs the indexing (container[index]) operation.

Auto Trait Implementations§

impl<'h> Freeze for Captures<'h>

impl<'h> RefUnwindSafe for Captures<'h>

impl<'h> Send for Captures<'h>

impl<'h> Sync for Captures<'h>

impl<'h> Unpin for Captures<'h>

impl<'h> UnwindSafe for Captures<'h>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.