Struct regex_lite::RegexBuilder
pub struct RegexBuilder { /* private fields */ }
A configurable builder for a `Regex`.
This builder can be used to programmatically set flags such as `i` (case insensitive) and `x` (for verbose mode). This builder can also be used to configure things like a size limit on the compiled regular expression.
Implementations§
impl RegexBuilder
pub fn new(pattern: &str) -> RegexBuilder
Create a new builder with a default configuration for the given pattern.
If the pattern is invalid or exceeds the configured size limits, then an error will be returned when `RegexBuilder::build` is called.
pub fn build(&self) -> Result<Regex, Error>
Compiles the pattern given to `RegexBuilder::new` with the configuration set on this builder.
If the pattern isn’t a valid regex or if a configured size limit was exceeded, then an error is returned.
pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder
This configures whether to enable ASCII case insensitive matching for the entire pattern.
This setting can also be configured using the inline flag `i` in the pattern. For example, `(?i:foo)` matches `foo` case insensitively while `(?-i:foo)` matches `foo` case sensitively.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo(?-i:bar)quux")
    .case_insensitive(true)
    .build()
    .unwrap();
assert!(re.is_match("FoObarQuUx"));
// Even though case insensitive matching is enabled in the builder,
// it can be locally disabled within the pattern. In this case,
// `bar` is matched case sensitively.
assert!(!re.is_match("fooBARquux"));
pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder
This configures multi-line mode for the entire pattern.
Enabling multi-line mode changes the behavior of the `^` and `$` anchor assertions. Instead of only matching at the beginning and end of a haystack, respectively, multi-line mode causes them to match at the beginning and end of a line in addition to the beginning and end of a haystack. More precisely, `^` will match at the position immediately following a `\n` and `$` will match at the position immediately preceding a `\n`.
The behavior of this option is impacted by the `RegexBuilder::crlf` setting. Namely, CRLF mode treats both `\r` and `\n` as line terminators, but `^` and `$` never match at the position between a `\r` and a `\n`.
This setting can also be configured using the inline flag `m` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .build()
    .unwrap();
assert_eq!(Some(1..4), re.find("\nfoo\n").map(|m| m.range()));
pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder
This configures dot-matches-new-line mode for the entire pattern.
Perhaps surprisingly, the default behavior for `.` is not to match any character, but rather, to match any character except for the line terminator (which is `\n` by default). When this mode is enabled, the behavior changes such that `.` truly matches any character.
This setting can also be configured using the inline flag `s` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"foo.bar")
    .dot_matches_new_line(true)
    .build()
    .unwrap();
let hay = "foo\nbar";
assert_eq!(Some("foo\nbar"), re.find(hay).map(|m| m.as_str()));
pub fn crlf(&mut self, yes: bool) -> &mut RegexBuilder
This configures CRLF mode for the entire pattern.
When CRLF mode is enabled, both `\r` (“carriage return” or CR for short) and `\n` (“line feed” or LF for short) are treated as line terminators. This results in the following:
- Unless dot-matches-new-line mode is enabled, `.` will now match any character except for `\n` and `\r`.
- When multi-line mode is enabled, `^` will match immediately following a `\n` or a `\r`. Similarly, `$` will match immediately preceding a `\n` or a `\r`. Neither `^` nor `$` will ever match between `\r` and `\n`.
This setting can also be configured using the inline flag `R` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^foo$")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\nfoo\r\n";
// If CRLF mode weren't enabled here, then '$' wouldn't match
// immediately after 'foo', and thus no match would be found.
assert_eq!(Some("foo"), re.find(hay).map(|m| m.as_str()));
This example demonstrates that `^` will never match at a position between `\r` and `\n`. (`$` will similarly not match between a `\r` and a `\n`.)
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"^")
    .multi_line(true)
    .crlf(true)
    .build()
    .unwrap();
let hay = "\r\n\r\n";
let ranges: Vec<_> = re.find_iter(hay).map(|m| m.range()).collect();
assert_eq!(ranges, vec![0..0, 2..2, 4..4]);
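The rule for where `^` may match in multi-line mode can also be restated without the regex engine. The function below is only an illustrative model of the documented rule, written with the standard library; it is not how regex-lite implements anchors, and the name `line_start_positions` is an assumption.

```rust
/// Models the positions at which `^` may match in multi-line mode,
/// following the rules described for `\n` and, in CRLF mode, `\r`.
/// Illustrative only; this is not regex-lite's implementation.
fn line_start_positions(hay: &str, crlf: bool) -> Vec<usize> {
    let bytes = hay.as_bytes();
    // `^` always matches at the very start of the haystack.
    let mut positions = vec![0];
    for i in 1..=bytes.len() {
        let prev = bytes[i - 1];
        let after_lf = prev == b'\n';
        // In CRLF mode a lone `\r` also ends a line, but the gap inside
        // a `\r\n` pair never counts as a line start.
        let after_lone_cr = crlf && prev == b'\r' && bytes.get(i) != Some(&b'\n');
        if after_lf || after_lone_cr {
            positions.push(i);
        }
    }
    positions
}
```

For the haystack `"\r\n\r\n"` with CRLF mode on, the model gives positions 0, 2 and 4, which agrees with the ranges asserted in the example above.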
pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder
This configures swap-greed mode for the entire pattern.
When swap-greed mode is enabled, patterns like `a+` will become non-greedy and patterns like `a+?` will become greedy. In other words, the meanings of `a+` and `a+?` are switched.
This setting can also be configured using the inline flag `U` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let re = RegexBuilder::new(r"a+")
    .swap_greed(true)
    .build()
    .unwrap();
assert_eq!(Some("a"), re.find("aaa").map(|m| m.as_str()));
pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder
This configures verbose mode for the entire pattern.
When enabled, whitespace will be treated as insignificant in the pattern and `#` can be used to start a comment until the next new line.
Normally, in most places in a pattern, whitespace is treated literally. For example, ` +` (a space followed by `+`) will match one or more ASCII space characters.
When verbose mode is enabled, `\#` can be used to match a literal `#` and `\ ` (a backslash followed by a space) can be used to match a literal ASCII space character.
Verbose mode lets a regex be spread over multiple lines and annotated with comments, which can make it easier to read.
This setting can also be configured using the inline flag `x` in the pattern.
The default for this is `false`.
§Example
use regex_lite::RegexBuilder;

let pat = r"
    \b
    (?<first>[A-Z]\w*)  # always start with uppercase letter
    \s+                 # whitespace should separate names
    (?:                 # middle name can be an initial!
        (?:(?<initial>[A-Z])\.|(?<middle>[A-Z]\w*))
        \s+
    )?
    (?<last>[A-Z]\w*)
    \b
";
let re = RegexBuilder::new(pat)
    .ignore_whitespace(true)
    .build()
    .unwrap();

let caps = re.captures("Harry Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry J. Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(Some("J"), caps.name("initial").map(|m| m.as_str()));
assert_eq!(None, caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);

let caps = re.captures("Harry James Potter").unwrap();
assert_eq!("Harry", &caps["first"]);
// Since a middle name/initial isn't required for an overall match,
// we can't assume that 'initial' or 'middle' will be populated!
assert_eq!(None, caps.name("initial").map(|m| m.as_str()));
assert_eq!(Some("James"), caps.name("middle").map(|m| m.as_str()));
assert_eq!("Potter", &caps["last"]);
pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder
Sets the approximate size limit, in bytes, of the compiled regex.
This roughly corresponds to the amount of heap memory, in bytes, occupied by a single regex. If the regex would otherwise approximately exceed this limit, then compiling that regex will fail.
The main utility of a method like this is to avoid compiling regexes that use an unexpected amount of resources, such as time and memory. Even if the memory usage of a large regex is acceptable, its search time may not be. Namely, worst case time complexity for search is `O(m * n)`, where `m ~ len(pattern)` and `n ~ len(haystack)`. That is, search time depends, in part, on the size of the compiled regex. This means that putting a limit on the size of the regex limits how much a regex can impact search time.
The default for this is some reasonable number that permits most patterns to compile successfully.
§Example
use regex_lite::RegexBuilder;
assert!(RegexBuilder::new(r"\w").size_limit(100).build().is_err());
pub fn nest_limit(&mut self, limit: u32) -> &mut RegexBuilder
Set the nesting limit for this parser.
The nesting limit controls how deep the abstract syntax tree is allowed to be. If the AST exceeds the given limit (e.g., with too many nested groups), then an error is returned by the parser.
The purpose of this limit is to act as a heuristic to prevent stack overflow for consumers that do structural induction on an AST using explicit recursion. While this crate never does this (instead using constant stack space and moving the call stack to the heap), other crates may.
This limit is not checked until the entire AST is parsed. Therefore, if callers want to put a limit on the amount of heap space used, then they should impose a limit on the length, in bytes, of the concrete pattern string. In particular, this is viable since this parser implementation will limit itself to heap space proportional to the length of the pattern string. See also the untrusted inputs section in the top-level crate documentation for more information about this.
Note that a nest limit of `0` will return a nest limit error for most patterns but not all. For example, a nest limit of `0` permits `a` but not `ab`, since `ab` requires an explicit concatenation, which results in a nest depth of `1`. In general, a nest limit is not something that manifests in an obvious way in the concrete syntax, therefore, it should not be used in a granular way.
§Example
use regex_lite::RegexBuilder;
assert!(RegexBuilder::new(r"").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"a").nest_limit(0).build().is_ok());
assert!(RegexBuilder::new(r"(a)").nest_limit(0).build().is_err());