aho_corasick/
automaton.rs

Help
/*!
Provides [`Automaton`] trait for abstracting over Aho-Corasick automata.

The `Automaton` trait provides a way to write generic code over any
Aho-Corasick automaton. It also provides access to lower level APIs that
permit walking the state transitions of an Aho-Corasick automaton manually.
*/

use alloc::{string::String, vec::Vec};

use crate::util::{
    error::MatchError,
    primitives::PatternID,
    search::{Anchored, Input, Match, MatchKind, Span},
};

pub use crate::util::{
    prefilter::{Candidate, Prefilter},
    primitives::{StateID, StateIDError},
};

/// We seal the `Automaton` trait for now. It's a big trait, and it's
/// conceivable that I might want to add new required methods, and sealing the
/// trait permits doing that in a backwards compatible fashion. On other the
/// hand, if you have a solid use case for implementing the trait yourself,
/// please file an issue and we can discuss it. This was *mostly* done as a
/// conservative step.
pub(crate) mod private {
    pub trait Sealed {}
}
impl private::Sealed for crate::nfa::noncontiguous::NFA {}
impl private::Sealed for crate::nfa::contiguous::NFA {}
impl private::Sealed for crate::dfa::DFA {}

impl<'a, T: private::Sealed + ?Sized> private::Sealed for &'a T {}

/// A trait that abstracts over Aho-Corasick automata.
///
/// This trait primarily exists for niche use cases such as:
///
/// * Using an NFA or DFA directly, bypassing the top-level
/// [`AhoCorasick`](crate::AhoCorasick) searcher. Currently, these include
/// [`noncontiguous::NFA`](crate::nfa::noncontiguous::NFA),
/// [`contiguous::NFA`](crate::nfa::contiguous::NFA) and
/// [`dfa::DFA`](crate::dfa::DFA).
/// * Implementing your own custom search routine by walking the automaton
/// yourself. This might be useful for implementing search on non-contiguous
/// strings or streams.
///
/// For most use cases, it is not expected that users will need
/// to use or even know about this trait. Indeed, the top level
/// [`AhoCorasick`](crate::AhoCorasick) searcher does not expose any details
/// about this trait, nor does it implement it itself.
///
/// Note that this trait defines a number of default methods, such as
/// [`Automaton::try_find`] and [`Automaton::try_find_iter`], which implement
/// higher level search routines in terms of the lower level automata API.
///
/// # Sealed
///
/// Currently, this trait is sealed. That means users of this crate can write
/// generic routines over this trait but cannot implement it themselves. This
/// restriction may be lifted in the future, but sealing the trait permits
/// adding new required methods in a backwards compatible fashion.
///
/// # Special states
///
/// This trait encodes a notion of "special" states in an automaton. Namely,
/// a state is treated as special if it is a dead, match or start state:
///
/// * A dead state is a state that cannot be left once entered. All transitions
/// on a dead state lead back to itself. The dead state is meant to be treated
/// as a sentinel indicating that the search should stop and return a match if
/// one has been found, and nothing otherwise.
/// * A match state is a state that indicates one or more patterns have
/// matched. Depending on the [`MatchKind`] of the automaton, a search may
/// stop once a match is seen, or it may continue looking for matches until
/// it enters a dead state or sees the end of the haystack.
/// * A start state is a state that a search begins in. It is useful to know
/// when a search enters a start state because it may mean that a prefilter can
/// be used to skip ahead and quickly look for candidate matches. Unlike dead
/// and match states, it is never necessary to explicitly handle start states
/// for correctness. Indeed, in this crate, implementations of `Automaton`
/// will only treat start states as "special" when a prefilter is enabled and
/// active. Otherwise, treating it as special has no purpose and winds up
/// slowing down the overall search because it results in ping-ponging between
/// the main state transition and the "special" state logic.
///
/// Since checking whether a state is special by doing three different
/// checks would be too expensive inside a fast search loop, the
/// [`Automaton::is_special`] method is provided for quickly checking whether
/// the state is special. The `Automaton::is_dead`, `Automaton::is_match` and
/// `Automaton::is_start` predicates can then be used to determine which kind
/// of special state it is.
///
/// # Panics
///
/// Most of the APIs on this trait should panic or give incorrect results
/// if invalid inputs are given to it. For example, `Automaton::next_state`
/// has unspecified behavior if the state ID given to it is not a valid
/// state ID for the underlying automaton. Valid state IDs can only be
/// retrieved in one of two ways: calling `Automaton::start_state` or calling
/// `Automaton::next_state` with a valid state ID.
///
/// # Safety
///
/// This trait is not safe to implement so that code may rely on the
/// correctness of implementations of this trait to avoid undefined behavior.
/// The primary correctness guarantees are:
///
/// * `Automaton::start_state` always returns a valid state ID or an error or
/// panics.
/// * `Automaton::next_state`, when given a valid state ID, always returns
/// a valid state ID for all values of `anchored` and `byte`, or otherwise
/// panics.
///
/// In general, the rest of the methods on `Automaton` need to uphold their
/// contracts as well. For example, `Automaton::is_dead` should only returns
/// true if the given state ID is actually a dead state.
///
/// Note that currently this crate does not rely on the safety property defined
/// here to avoid undefined behavior. Instead, this was done to make it
/// _possible_ to do in the future.
///
/// # Example
///
/// This example shows how one might implement a basic but correct search
/// routine. We keep things simple by not using prefilters or worrying about
/// anchored searches, but do make sure our search is correct for all possible
/// [`MatchKind`] semantics. (The comments in the code below note the parts
/// that are needed to support certain `MatchKind` semantics.)
///
/// ```
/// use aho_corasick::{
///     automaton::Automaton,
///     nfa::noncontiguous::NFA,
///     Anchored, Match, MatchError, MatchKind,
/// };
///
/// // Run an unanchored search for 'aut' in 'haystack'. Return the first match
/// // seen according to the automaton's match semantics. This returns an error
/// // if the given automaton does not support unanchored searches.
/// fn find<A: Automaton>(
///     aut: A,
///     haystack: &[u8],
/// ) -> Result<Option<Match>, MatchError> {
///     let mut sid = aut.start_state(Anchored::No)?;
///     let mut at = 0;
///     let mut mat = None;
///     let get_match = |sid, at| {
///         let pid = aut.match_pattern(sid, 0);
///         let len = aut.pattern_len(pid);
///         Match::new(pid, (at - len)..at)
///     };
///     // Start states can be match states!
///     if aut.is_match(sid) {
///         mat = Some(get_match(sid, at));
///         // Standard semantics require matches to be reported as soon as
///         // they're seen. Otherwise, we continue until we see a dead state
///         // or the end of the haystack.
///         if matches!(aut.match_kind(), MatchKind::Standard) {
///             return Ok(mat);
///         }
///     }
///     while at < haystack.len() {
///         sid = aut.next_state(Anchored::No, sid, haystack[at]);
///         if aut.is_special(sid) {
///             if aut.is_dead(sid) {
///                 return Ok(mat);
///             } else if aut.is_match(sid) {
///                 mat = Some(get_match(sid, at + 1));
///                 // As above, standard semantics require that we return
///                 // immediately once a match is found.
///                 if matches!(aut.match_kind(), MatchKind::Standard) {
///                     return Ok(mat);
///                 }
///             }
///         }
///         at += 1;
///     }
///     Ok(mat)
/// }
///
/// // Show that it works for standard searches.
/// let nfa = NFA::new(&["samwise", "sam"]).unwrap();
/// assert_eq!(Some(Match::must(1, 0..3)), find(&nfa, b"samwise")?);
///
/// // But also works when using leftmost-first. Notice how the match result
/// // has changed!
/// let nfa = NFA::builder()
///     .match_kind(MatchKind::LeftmostFirst)
///     .build(&["samwise", "sam"])
///     .unwrap();
/// assert_eq!(Some(Match::must(0, 0..7)), find(&nfa, b"samwise")?);
///
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub unsafe trait Automaton: private::Sealed {
    /// Returns the starting state for the given anchor mode.
    ///
    /// Upon success, the state ID returned is guaranteed to be valid for
    /// this automaton.
    ///
    /// # Errors
    ///
    /// This returns an error when the given search configuration is not
    /// supported by the underlying automaton. For example, if the underlying
    /// automaton only supports unanchored searches but the given configuration
    /// was set to an anchored search, then this must return an error.
    fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>;

    /// Performs a state transition from `sid` for `byte` and returns the next
    /// state.
    ///
    /// `anchored` should be [`Anchored::Yes`] when executing an anchored
    /// search and [`Anchored::No`] otherwise. For some implementations of
    /// `Automaton`, it is required to know whether the search is anchored
    /// or not in order to avoid following failure transitions. Other
    /// implementations may ignore `anchored` altogether and depend on
    /// `Automaton::start_state` returning a state that walks a different path
    /// through the automaton depending on whether the search is anchored or
    /// not.
    ///
    /// # Panics
    ///
    /// This routine may panic or return incorrect results when the given state
    /// ID is invalid. A state ID is valid if and only if:
    ///
    /// 1. It came from a call to `Automaton::start_state`, or
    /// 2. It came from a previous call to `Automaton::next_state` with a
    /// valid state ID.
    ///
    /// Implementations must treat all possible values of `byte` as valid.
    ///
    /// Implementations may panic on unsupported values of `anchored`, but are
    /// not required to do so.
    fn next_state(
        &self,
        anchored: Anchored,
        sid: StateID,
        byte: u8,
    ) -> StateID;

    /// Returns true if the given ID represents a "special" state. A special
    /// state is a dead, match or start state.
    ///
    /// Note that implementations may choose to return false when the given ID
    /// corresponds to a start state. Namely, it always correct to treat start
    /// states as non-special. Implementations must return true for states that
    /// are dead or contain matches.
    ///
    /// This has unspecified behavior when given an invalid state ID.
    fn is_special(&self, sid: StateID) -> bool;

    /// Returns true if the given ID represents a dead state.
    ///
    /// A dead state is a type of "sink" in a finite state machine. It
    /// corresponds to a state whose transitions all loop back to itself. That
    /// is, once entered, it can never be left. In practice, it serves as a
    /// sentinel indicating that the search should terminate.
    ///
    /// This has unspecified behavior when given an invalid state ID.
    fn is_dead(&self, sid: StateID) -> bool;

    /// Returns true if the given ID represents a match state.
    ///
    /// A match state is always associated with one or more pattern IDs that
    /// matched at the position in the haystack when the match state was
    /// entered. When a match state is entered, the match semantics dictate
    /// whether it should be returned immediately (for `MatchKind::Standard`)
    /// or if the search should continue (for `MatchKind::LeftmostFirst` and
    /// `MatchKind::LeftmostLongest`) until a dead state is seen or the end of
    /// the haystack has been reached.
    ///
    /// This has unspecified behavior when given an invalid state ID.
    fn is_match(&self, sid: StateID) -> bool;

    /// Returns true if the given ID represents a start state.
    ///
    /// While it is never incorrect to ignore start states during a search
    /// (except for the start of the search of course), knowing whether one has
    /// entered a start state can be useful for certain classes of performance
    /// optimizations. For example, if one is in a start state, it may be legal
    /// to try to skip ahead and look for match candidates more quickly than
    /// would otherwise be accomplished by walking the automaton.
    ///
    /// Implementations of `Automaton` in this crate "unspecialize" start
    /// states when a prefilter is not active or enabled. In this case, it
    /// is possible for `Automaton::is_special(sid)` to return false while
    /// `Automaton::is_start(sid)` returns true.
    ///
    /// This has unspecified behavior when given an invalid state ID.
    fn is_start(&self, sid: StateID) -> bool;

    /// Returns the match semantics that this automaton was built with.
    fn match_kind(&self) -> MatchKind;

    /// Returns the total number of matches for the given state ID.
    ///
    /// This has unspecified behavior if the given ID does not refer to a match
    /// state.
    fn match_len(&self, sid: StateID) -> usize;

    /// Returns the pattern ID for the match state given by `sid` at the
    /// `index` given.
    ///
    /// Typically, `index` is only ever greater than `0` when implementing an
    /// overlapping search. Otherwise, it's likely that your search only cares
    /// about reporting the first pattern ID in a match state.
    ///
    /// This has unspecified behavior if the given ID does not refer to a match
    /// state, or if the index is greater than or equal to the total number of
    /// matches in this match state.
    fn match_pattern(&self, sid: StateID, index: usize) -> PatternID;

    /// Returns the total number of patterns compiled into this automaton.
    fn patterns_len(&self) -> usize;

    /// Returns the length of the pattern for the given ID.
    ///
    /// This has unspecified behavior when given an invalid pattern
    /// ID. A pattern ID is valid if and only if it is less than
    /// `Automaton::patterns_len`.
    fn pattern_len(&self, pid: PatternID) -> usize;

    /// Returns the length, in bytes, of the shortest pattern in this
    /// automaton.
    fn min_pattern_len(&self) -> usize;

    /// Returns the length, in bytes, of the longest pattern in this automaton.
    fn max_pattern_len(&self) -> usize;

    /// Returns the heap memory usage, in bytes, used by this automaton.
    fn memory_usage(&self) -> usize;

    /// Returns a prefilter, if available, that can be used to accelerate
    /// searches for this automaton.
    ///
    /// The typical way this is used is when the start state is entered during
    /// a search. When that happens, one can use a prefilter to skip ahead and
    /// look for candidate matches without having to walk the automaton on the
    /// bytes between candidates.
    ///
    /// Typically a prefilter is only available when there are a small (<100)
    /// number of patterns built into the automaton.
    fn prefilter(&self) -> Option<&Prefilter>;

    /// Executes a non-overlapping search with this automaton using the given
    /// configuration.
    ///
    /// See
    /// [`AhoCorasick::try_find`](crate::AhoCorasick::try_find)
    /// for more documentation and examples.
    fn try_find(
        &self,
        input: &Input<'_>,
    ) -> Result<Option<Match>, MatchError> {
        try_find_fwd(&self, input)
    }

    /// Executes a overlapping search with this automaton using the given
    /// configuration.
    ///
    /// See
    /// [`AhoCorasick::try_find_overlapping`](crate::AhoCorasick::try_find_overlapping)
    /// for more documentation and examples.
    fn try_find_overlapping(
        &self,
        input: &Input<'_>,
        state: &mut OverlappingState,
    ) -> Result<(), MatchError> {
        try_find_overlapping_fwd(&self, input, state)
    }

    /// Returns an iterator of non-overlapping matches with this automaton
    /// using the given configuration.
    ///
    /// See
    /// [`AhoCorasick::try_find_iter`](crate::AhoCorasick::try_find_iter)
    /// for more documentation and examples.
    fn try_find_iter<'a, 'h>(
        &'a self,
        input: Input<'h>,
    ) -> Result<FindIter<'a, 'h, Self>, MatchError>
    where
        Self: Sized,
    {
        FindIter::new(self, input)
    }

    /// Returns an iterator of overlapping matches with this automaton
    /// using the given configuration.
    ///
    /// See
    /// [`AhoCorasick::try_find_overlapping_iter`](crate::AhoCorasick::try_find_overlapping_iter)
    /// for more documentation and examples.
    fn try_find_overlapping_iter<'a, 'h>(
        &'a self,
        input: Input<'h>,
    ) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>
    where
        Self: Sized,
    {
        if !self.match_kind().is_standard() {
            return Err(MatchError::unsupported_overlapping(
                self.match_kind(),
            ));
        }
        //  We might consider lifting this restriction. The reason why I added
        // it was to ban the combination of "anchored search" and "overlapping
        // iteration." The match semantics aren't totally clear in that case.
        // Should we allow *any* matches that are adjacent to *any* previous
        // match? Or only following the most recent one? Or only matches
        // that start at the beginning of the search? We might also elect to
        // just keep this restriction in place, as callers should be able to
        // implement it themselves if they want to.
        if input.get_anchored().is_anchored() {
            return Err(MatchError::invalid_input_anchored());
        }
        let _ = self.start_state(input.get_anchored())?;
        let state = OverlappingState::start();
        Ok(FindOverlappingIter { aut: self, input, state })
    }

    /// Replaces all non-overlapping matches in `haystack` with
    /// strings from `replace_with` depending on the pattern that
    /// matched. The `replace_with` slice must have length equal to
    /// `Automaton::patterns_len`.
    ///
    /// See
    /// [`AhoCorasick::try_replace_all`](crate::AhoCorasick::try_replace_all)
    /// for more documentation and examples.
    fn try_replace_all<B>(
        &self,
        haystack: &str,
        replace_with: &[B],
    ) -> Result<String, MatchError>
    where
        Self: Sized,
        B: AsRef<str>,
    {
        assert_eq!(
            replace_with.len(),
            self.patterns_len(),
            "replace_all requires a replacement for every pattern \
             in the automaton"
        );
        let mut dst = String::with_capacity(haystack.len());
        self.try_replace_all_with(haystack, &mut dst, |mat, _, dst| {
            dst.push_str(replace_with[mat.pattern()].as_ref());
            true
        })?;
        Ok(dst)
    }

    /// Replaces all non-overlapping matches in `haystack` with
    /// strings from `replace_with` depending on the pattern that
    /// matched. The `replace_with` slice must have length equal to
    /// `Automaton::patterns_len`.
    ///
    /// See
    /// [`AhoCorasick::try_replace_all_bytes`](crate::AhoCorasick::try_replace_all_bytes)
    /// for more documentation and examples.
    fn try_replace_all_bytes<B>(
        &self,
        haystack: &[u8],
        replace_with: &[B],
    ) -> Result<Vec<u8>, MatchError>
    where
        Self: Sized,
        B: AsRef<[u8]>,
    {
        assert_eq!(
            replace_with.len(),
            self.patterns_len(),
            "replace_all requires a replacement for every pattern \
             in the automaton"
        );
        let mut dst = Vec::with_capacity(haystack.len());
        self.try_replace_all_with_bytes(haystack, &mut dst, |mat, _, dst| {
            dst.extend(replace_with[mat.pattern()].as_ref());
            true
        })?;
        Ok(dst)
    }

    /// Replaces all non-overlapping matches in `haystack` by calling the
    /// `replace_with` closure given.
    ///
    /// See
    /// [`AhoCorasick::try_replace_all_with`](crate::AhoCorasick::try_replace_all_with)
    /// for more documentation and examples.
    fn try_replace_all_with<F>(
        &self,
        haystack: &str,
        dst: &mut String,
        mut replace_with: F,
    ) -> Result<(), MatchError>
    where
        Self: Sized,
        F: FnMut(&Match, &str, &mut String) -> bool,
    {
        let mut last_match = 0;
        for m in self.try_find_iter(Input::new(haystack))? {
            // Since there are no restrictions on what kinds of patterns are
            // in an Aho-Corasick automaton, we might get matches that split
            // a codepoint, or even matches of a partial codepoint. When that
            // happens, we just skip the match.
            if !haystack.is_char_boundary(m.start())
                || !haystack.is_char_boundary(m.end())
            {
                continue;
            }
            dst.push_str(&haystack[last_match..m.start()]);
            last_match = m.end();
            if !replace_with(&m, &haystack[m.start()..m.end()], dst) {
                break;
            };
        }
        dst.push_str(&haystack[last_match..]);
        Ok(())
    }

    /// Replaces all non-overlapping matches in `haystack` by calling the
    /// `replace_with` closure given.
    ///
    /// See
    /// [`AhoCorasick::try_replace_all_with_bytes`](crate::AhoCorasick::try_replace_all_with_bytes)
    /// for more documentation and examples.
    fn try_replace_all_with_bytes<F>(
        &self,
        haystack: &[u8],
        dst: &mut Vec<u8>,
        mut replace_with: F,
    ) -> Result<(), MatchError>
    where
        Self: Sized,
        F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,
    {
        let mut last_match = 0;
        for m in self.try_find_iter(Input::new(haystack))? {
            dst.extend(&haystack[last_match..m.start()]);
            last_match = m.end();
            if !replace_with(&m, &haystack[m.start()..m.end()], dst) {
                break;
            };
        }
        dst.extend(&haystack[last_match..]);
        Ok(())
    }

    /// Returns an iterator of non-overlapping matches with this automaton
    /// from the stream given.
    ///
    /// See
    /// [`AhoCorasick::try_stream_find_iter`](crate::AhoCorasick::try_stream_find_iter)
    /// for more documentation and examples.
    #[cfg(feature = "std")]
    fn try_stream_find_iter<'a, R: std::io::Read>(
        &'a self,
        rdr: R,
    ) -> Result<StreamFindIter<'a, Self, R>, MatchError>
    where
        Self: Sized,
    {
        Ok(StreamFindIter { it: StreamChunkIter::new(self, rdr)? })
    }

    /// Replaces all non-overlapping matches in `rdr` with strings from
    /// `replace_with` depending on the pattern that matched, and writes the
    /// result to `wtr`. The `replace_with` slice must have length equal to
    /// `Automaton::patterns_len`.
    ///
    /// See
    /// [`AhoCorasick::try_stream_replace_all`](crate::AhoCorasick::try_stream_replace_all)
    /// for more documentation and examples.
    #[cfg(feature = "std")]
    fn try_stream_replace_all<R, W, B>(
        &self,
        rdr: R,
        wtr: W,
        replace_with: &[B],
    ) -> std::io::Result<()>
    where
        Self: Sized,
        R: std::io::Read,
        W: std::io::Write,
        B: AsRef<[u8]>,
    {
        assert_eq!(
            replace_with.len(),
            self.patterns_len(),
            "streaming replace_all requires a replacement for every pattern \
             in the automaton",
        );
        self.try_stream_replace_all_with(rdr, wtr, |mat, _, wtr| {
            wtr.write_all(replace_with[mat.pattern()].as_ref())
        })
    }

    /// Replaces all non-overlapping matches in `rdr` by calling the
    /// `replace_with` closure given and writing the result to `wtr`.
    ///
    /// See
    /// [`AhoCorasick::try_stream_replace_all_with`](crate::AhoCorasick::try_stream_replace_all_with)
    /// for more documentation and examples.
    #[cfg(feature = "std")]
    fn try_stream_replace_all_with<R, W, F>(
        &self,
        rdr: R,
        mut wtr: W,
        mut replace_with: F,
    ) -> std::io::Result<()>
    where
        Self: Sized,
        R: std::io::Read,
        W: std::io::Write,
        F: FnMut(&Match, &[u8], &mut W) -> std::io::Result<()>,
    {
        let mut it = StreamChunkIter::new(self, rdr).map_err(|e| {
            let kind = std::io::ErrorKind::Other;
            std::io::Error::new(kind, e)
        })?;
        while let Some(result) = it.next() {
            let chunk = result?;
            match chunk {
                StreamChunk::NonMatch { bytes, .. } => {
                    wtr.write_all(bytes)?;
                }
                StreamChunk::Match { bytes, mat } => {
                    replace_with(&mat, bytes, &mut wtr)?;
                }
            }
        }
        Ok(())
    }
}

// SAFETY: This just defers to the underlying 'AcAutomaton' and thus inherits
// its safety properties.
unsafe impl<'a, A: Automaton + ?Sized> Automaton for &'a A {
    #[inline(always)]
    fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError> {
        (**self).start_state(anchored)
    }

    #[inline(always)]
    fn next_state(
        &self,
        anchored: Anchored,
        sid: StateID,
        byte: u8,
    ) -> StateID {
        (**self).next_state(anchored, sid, byte)
    }

    #[inline(always)]
    fn is_special(&self, sid: StateID) -> bool {
        (**self).is_special(sid)
    }

    #[inline(always)]
    fn is_dead(&self, sid: StateID) -> bool {
        (**self).is_dead(sid)
    }

    #[inline(always)]
    fn is_match(&self, sid: StateID) -> bool {
        (**self).is_match(sid)
    }

    #[inline(always)]
    fn is_start(&self, sid: StateID) -> bool {
        (**self).is_start(sid)
    }

    #[inline(always)]
    fn match_kind(&self) -> MatchKind {
        (**self).match_kind()
    }

    #[inline(always)]
    fn match_len(&self, sid: StateID) -> usize {
        (**self).match_len(sid)
    }

    #[inline(always)]
    fn match_pattern(&self, sid: StateID, index: usize) -> PatternID {
        (**self).match_pattern(sid, index)
    }

    #[inline(always)]
    fn patterns_len(&self) -> usize {
        (**self).patterns_len()
    }

    #[inline(always)]
    fn pattern_len(&self, pid: PatternID) -> usize {
        (**self).pattern_len(pid)
    }

    #[inline(always)]
    fn min_pattern_len(&self) -> usize {
        (**self).min_pattern_len()
    }

    #[inline(always)]
    fn max_pattern_len(&self) -> usize {
        (**self).max_pattern_len()
    }

    #[inline(always)]
    fn memory_usage(&self) -> usize {
        (**self).memory_usage()
    }

    #[inline(always)]
    fn prefilter(&self) -> Option<&Prefilter> {
        (**self).prefilter()
    }
}

/// Represents the current state of an overlapping search.
///
/// This is used for overlapping searches since they need to know something
/// about the previous search. For example, when multiple patterns match at the
/// same position, this state tracks the last reported pattern so that the next
/// search knows whether to report another matching pattern or continue with
/// the search at the next position. Additionally, it also tracks which state
/// the last search call terminated in and the current offset of the search
/// in the haystack.
///
/// This type provides limited introspection capabilities. The only thing a
/// caller can do is construct it and pass it around to permit search routines
/// to use it to track state, and to ask whether a match has been found.
///
/// Callers should always provide a fresh state constructed via
/// [`OverlappingState::start`] when starting a new search. That same state
/// should be reused for subsequent searches on the same `Input`. The state
/// given will advance through the haystack itself. Callers can detect the end
/// of a search when neither an error nor a match is returned.
///
/// # Example
///
/// This example shows how to manually iterate over all overlapping matches. If
/// you need this, you might consider using
/// [`AhoCorasick::find_overlapping_iter`](crate::AhoCorasick::find_overlapping_iter)
/// instead, but this shows how to correctly use an `OverlappingState`.
///
/// ```
/// use aho_corasick::{
///     automaton::OverlappingState,
///     AhoCorasick, Input, Match,
/// };
///
/// let patterns = &["append", "appendage", "app"];
/// let haystack = "append the app to the appendage";
///
/// let ac = AhoCorasick::new(patterns).unwrap();
/// let mut state = OverlappingState::start();
/// let mut matches = vec![];
///
/// loop {
///     ac.find_overlapping(haystack, &mut state);
///     let mat = match state.get_match() {
///         None => break,
///         Some(mat) => mat,
///     };
///     matches.push(mat);
/// }
/// let expected = vec![
///     Match::must(2, 0..3),
///     Match::must(0, 0..6),
///     Match::must(2, 11..14),
///     Match::must(2, 22..25),
///     Match::must(0, 22..28),
///     Match::must(1, 22..31),
/// ];
/// assert_eq!(expected, matches);
/// ```
#[derive(Clone, Debug)]
pub struct OverlappingState {
    /// The match reported by the most recent overlapping search to use this
    /// state.
    ///
    /// If a search does not find any matches, then it is expected to clear
    /// this value.
    mat: Option<Match>,
    /// The state ID of the state at which the search was in when the call
    /// terminated. When this is a match state, `last_match` must be set to a
    /// non-None value.
    ///
    /// A `None` value indicates the start state of the corresponding
    /// automaton. We cannot use the actual ID, since any one automaton may
    /// have many start states, and which one is in use depends on search-time
    /// factors (such as whether the search is anchored or not).
    id: Option<StateID>,
    /// The position of the search.
    ///
    /// When `id` is None (i.e., we are starting a search), this is set to
    /// the beginning of the search as given by the caller regardless of its
    /// current value. Subsequent calls to an overlapping search pick up at
    /// this offset.
    at: usize,
    /// The index into the matching patterns of the next match to report if the
    /// current state is a match state. Note that this may be 1 greater than
    /// the total number of matches to report for the current match state. (In
    /// which case, no more matches should be reported at the current position
    /// and the search should advance to the next position.)
    next_match_index: Option<usize>,
}

impl OverlappingState {
    /// Create a new overlapping state that begins at the start state.
    pub fn start() -> OverlappingState {
        OverlappingState { mat: None, id: None, at: 0, next_match_index: None }
    }

    /// Return the match result of the most recent search to execute with this
    /// state.
    ///
    /// Every search will clear this result automatically, such that if no
    /// match is found, this will always correctly report `None`.
    pub fn get_match(&self) -> Option<Match> {
        self.mat
    }
}

/// An iterator of non-overlapping matches in a particular haystack.
///
/// This iterator yields matches according to the [`MatchKind`] used by this
/// automaton.
///
/// This iterator is constructed via the [`Automaton::try_find_iter`] method.
///
/// The type variable `A` refers to the implementation of the [`Automaton`]
/// trait used to execute the search.
///
/// The lifetime `'a` refers to the lifetime of the [`Automaton`]
/// implementation.
///
/// The lifetime `'h` refers to the lifetime of the haystack being searched.
#[derive(Debug)]
pub struct FindIter<'a, 'h, A> {
    /// The automaton used to drive the search.
    aut: &'a A,
    /// The input parameters to give to each search call.
    ///
    /// The start position of the search is mutated during iteration.
    input: Input<'h>,
    /// Records the end offset of the most recent match. This is necessary to
    /// handle a corner case for preventing empty matches from overlapping with
    /// the ending bounds of a prior match.
    last_match_end: Option<usize>,
}

impl<'a, 'h, A: Automaton> FindIter<'a, 'h, A> {
    /// Creates a new non-overlapping iterator. If the given automaton would
    /// return an error on a search with the given input configuration, then
    /// that error is returned here.
    fn new(
        aut: &'a A,
        input: Input<'h>,
    ) -> Result<FindIter<'a, 'h, A>, MatchError> {
        // The only way this search can fail is if we cannot retrieve the start
        // state. e.g., Asking for an anchored search when only unanchored
        // searches are supported.
        let _ = aut.start_state(input.get_anchored())?;
        Ok(FindIter { aut, input, last_match_end: None })
    }

    /// Executes a search and returns a match if one is found.
    ///
    /// This does not advance the input forward. It just executes a search
    /// based on the current configuration/offsets.
    fn search(&self) -> Option<Match> {
        // The unwrap is OK here because we check at iterator construction time
        // that no subsequent search call (using the same configuration) will
        // ever return an error.
        self.aut
            .try_find(&self.input)
            .expect("already checked that no match error can occur")
    }

    /// Handles the special case of an empty match by ensuring that 1) the
    /// iterator always advances and 2) empty matches never overlap with other
    /// matches.
    ///
    /// (1) is necessary because we principally make progress by setting the
    /// starting location of the next search to the ending location of the last
    /// match. But if a match is empty, then this results in a search that does
    /// not advance and thus does not terminate.
    ///
    /// (2) is not strictly necessary, but makes intuitive sense and matches
    /// the presiding behavior of most general purpose regex engines.
    /// (Obviously this crate isn't a regex engine, but we choose to match
    /// their semantics.) The "intuitive sense" here is that we want to report
    /// NON-overlapping matches. So for example, given the patterns 'a' and
    /// '' (an empty string) against the haystack 'a', without the special
    /// handling, you'd get the matches [0, 1) and [1, 1), where the latter
    /// overlaps with the end bounds of the former.
    ///
    /// Note that we mark this cold and forcefully prevent inlining because
    /// handling empty matches like this is extremely rare and does require
    /// quite a bit of code, comparatively. Keeping this code out of the main
    /// iterator function keeps it smaller and more amenable to inlining
    /// itself.
    #[cold]
    #[inline(never)]
    fn handle_overlapping_empty_match(
        &mut self,
        mut m: Match,
    ) -> Option<Match> {
        assert!(m.is_empty());
        if Some(m.end()) == self.last_match_end {
            self.input.set_start(self.input.start().checked_add(1).unwrap());
            m = self.search()?;
        }
        Some(m)
    }
}

impl<'a, 'h, A: Automaton> Iterator for FindIter<'a, 'h, A> {
    type Item = Match;

    #[inline(always)]
    fn next(&mut self) -> Option<Match> {
        let mut m = self.search()?;
        if m.is_empty() {
            m = self.handle_overlapping_empty_match(m)?;
        }
        self.input.set_start(m.end());
        self.last_match_end = Some(m.end());
        Some(m)
    }
}

/// An iterator of overlapping matches in a particular haystack.
///
/// This iterator will report all possible matches in a particular haystack,
/// even when the matches overlap.
///
/// This iterator is constructed via the
/// [`Automaton::try_find_overlapping_iter`] method.
///
/// The type variable `A` refers to the implementation of the [`Automaton`]
/// trait used to execute the search.
///
/// The lifetime `'a` refers to the lifetime of the [`Automaton`]
/// implementation.
///
/// The lifetime `'h` refers to the lifetime of the haystack being searched.
#[derive(Debug)]
pub struct FindOverlappingIter<'a, 'h, A> {
    aut: &'a A,
    input: Input<'h>,
    state: OverlappingState,
}

impl<'a, 'h, A: Automaton> Iterator for FindOverlappingIter<'a, 'h, A> {
    type Item = Match;

    #[inline(always)]
    fn next(&mut self) -> Option<Match> {
        self.aut
            .try_find_overlapping(&self.input, &mut self.state)
            .expect("already checked that no match error can occur here");
        self.state.get_match()
    }
}

/// An iterator that reports matches in a stream.
///
/// This iterator yields elements of type `io::Result<Match>`, where an error
/// is reported if there was a problem reading from the underlying stream.
/// The iterator terminates only when the underlying stream reaches `EOF`.
///
/// This iterator is constructed via the [`Automaton::try_stream_find_iter`]
/// method.
///
/// The type variable `A` refers to the implementation of the [`Automaton`]
/// trait used to execute the search.
///
/// The type variable `R` refers to the `io::Read` stream that is being read
/// from.
///
/// The lifetime `'a` refers to the lifetime of the [`Automaton`]
/// implementation.
#[cfg(feature = "std")]
#[derive(Debug)]
pub struct StreamFindIter<'a, A, R> {
    it: StreamChunkIter<'a, A, R>,
}

#[cfg(feature = "std")]
impl<'a, A: Automaton, R: std::io::Read> Iterator
    for StreamFindIter<'a, A, R>
{
    type Item = std::io::Result<Match>;

    fn next(&mut self) -> Option<std::io::Result<Match>> {
        loop {
            match self.it.next() {
                None => return None,
                Some(Err(err)) => return Some(Err(err)),
                Some(Ok(StreamChunk::NonMatch { .. })) => {}
                Some(Ok(StreamChunk::Match { mat, .. })) => {
                    return Some(Ok(mat));
                }
            }
        }
    }
}

/// An iterator that reports matches in a stream.
///
/// (This doesn't actually implement the `Iterator` trait because it returns
/// something with a lifetime attached to a buffer it owns, but that's OK. It
/// still has a `next` method and is iterator-like enough to be fine.)
///
/// This iterator yields elements of type `io::Result<StreamChunk>`, where
/// an error is reported if there was a problem reading from the underlying
/// stream. The iterator terminates only when the underlying stream reaches
/// `EOF`.
///
/// The idea here is that each chunk represents either a match or a non-match,
/// and if you concatenated all of the chunks together, you'd reproduce the
/// entire contents of the stream, byte-for-byte.
///
/// This chunk machinery is a bit complicated and it isn't strictly required
/// for a stream searcher that just reports matches. But we do need something
/// like this to deal with the "replacement" API, which needs to know which
/// chunks it can copy and which it needs to replace.
#[cfg(feature = "std")]
#[derive(Debug)]
struct StreamChunkIter<'a, A, R> {
    /// The underlying automaton to do the search.
    aut: &'a A,
    /// The source of bytes we read from.
    rdr: R,
    /// A roll buffer for managing bytes from `rdr`. Basically, this is used
    /// to handle the case of a match that is split by two different
    /// calls to `rdr.read()`. This isn't strictly needed if all we needed to
    /// do was report matches, but here we are reporting chunks of non-matches
    /// and matches and in order to do that, we really just cannot treat our
    /// stream as non-overlapping blocks of bytes. We need to permit some
    /// overlap while we retain bytes from a previous `read` call in memory.
    buf: crate::util::buffer::Buffer,
    /// The unanchored starting state of this automaton.
    start: StateID,
    /// The state of the automaton.
    sid: StateID,
    /// The absolute position over the entire stream.
    absolute_pos: usize,
    /// The position we're currently at within `buf`.
    buffer_pos: usize,
    /// The buffer position of the end of the bytes that we last returned
    /// to the caller. Basically, whenever we find a match, we look to see if
    /// there is a difference between where the match started and the position
    /// of the last byte we returned to the caller. If there's a difference,
    /// then we need to return a 'NonMatch' chunk.
    buffer_reported_pos: usize,
}

#[cfg(feature = "std")]
impl<'a, A: Automaton, R: std::io::Read> StreamChunkIter<'a, A, R> {
    fn new(
        aut: &'a A,
        rdr: R,
    ) -> Result<StreamChunkIter<'a, A, R>, MatchError> {
        // This restriction is a carry-over from older versions of this crate.
        // I didn't have the bandwidth to think through how to handle, say,
        // leftmost-first or leftmost-longest matching, but... it should be
        // possible? The main problem is that once you see a match state in
        // leftmost-first semantics, you can't just stop at that point and
        // report a match. You have to keep going until you either hit a dead
        // state or EOF. So how do you know when you'll hit a dead state? Well,
        // you don't. With Aho-Corasick, I believe you can put a bound on it
        // and say, "once a match has been seen, you'll need to scan forward at
        // most N bytes" where N=aut.max_pattern_len().
        //
        // Which is fine, but it does mean that state about whether we're still
        // looking for a dead state or not needs to persist across buffer
        // refills. Which this code doesn't really handle. It does preserve
        // *some* state across buffer refills, basically ensuring that a match
        // span is always in memory.
        if !aut.match_kind().is_standard() {
            return Err(MatchError::unsupported_stream(aut.match_kind()));
        }
        // This is kind of a cop-out, but empty matches are SUPER annoying.
        // If we know they can't happen (which is what we enforce here), then
        // it makes a lot of logic much simpler. With that said, I'm open to
        // supporting this case, but we need to define proper semantics for it
        // first. It wasn't totally clear to me what it should do at the time
        // of writing, so I decided to just be conservative.
        //
        // It also seems like a very weird case to support anyway. Why search a
        // stream if you're just going to get a match at every position?
        //
        // ¯\_(ツ)_/¯
        if aut.min_pattern_len() == 0 {
            return Err(MatchError::unsupported_empty());
        }
        let start = aut.start_state(Anchored::No)?;
        Ok(StreamChunkIter {
            aut,
            rdr,
            buf: crate::util::buffer::Buffer::new(aut.max_pattern_len()),
            start,
            sid: start,
            absolute_pos: 0,
            buffer_pos: 0,
            buffer_reported_pos: 0,
        })
    }

    fn next(&mut self) -> Option<std::io::Result<StreamChunk>> {
        // This code is pretty gnarly. It IS simpler than the equivalent code
        // in the previous aho-corasick release, in part because we inline
        // automaton traversal here and also in part because we have abdicated
        // support for automatons that contain an empty pattern.
        //
        // I suspect this code could be made a bit simpler by designing a
        // better buffer abstraction.
        //
        // But in general, this code is basically write-only. So you'll need
        // to go through it step-by-step to grok it. One of the key bits of
        // complexity is tracking a few different offsets. 'buffer_pos' is
        // where we are in the buffer for search. 'buffer_reported_pos' is the
        // position immediately following the last byte in the buffer that
        // we've returned to the caller. And 'absolute_pos' is the overall
        // current absolute position of the search in the entire stream, and
        // this is what match spans are reported in terms of.
        loop {
            if self.aut.is_match(self.sid) {
                let mat = self.get_match();
                if let Some(r) = self.get_non_match_chunk(mat) {
                    self.buffer_reported_pos += r.len();
                    let bytes = &self.buf.buffer()[r];
                    return Some(Ok(StreamChunk::NonMatch { bytes }));
                }
                self.sid = self.start;
                let r = self.get_match_chunk(mat);
                self.buffer_reported_pos += r.len();
                let bytes = &self.buf.buffer()[r];
                return Some(Ok(StreamChunk::Match { bytes, mat }));
            }
            if self.buffer_pos >= self.buf.buffer().len() {
                if let Some(r) = self.get_pre_roll_non_match_chunk() {
                    self.buffer_reported_pos += r.len();
                    let bytes = &self.buf.buffer()[r];
                    return Some(Ok(StreamChunk::NonMatch { bytes }));
                }
                if self.buf.buffer().len() >= self.buf.min_buffer_len() {
                    self.buffer_pos = self.buf.min_buffer_len();
                    self.buffer_reported_pos -=
                        self.buf.buffer().len() - self.buf.min_buffer_len();
                    self.buf.roll();
                }
                match self.buf.fill(&mut self.rdr) {
                    Err(err) => return Some(Err(err)),
                    Ok(true) => {}
                    Ok(false) => {
                        // We've hit EOF, but if there are still some
                        // unreported bytes remaining, return them now.
                        if let Some(r) = self.get_eof_non_match_chunk() {
                            self.buffer_reported_pos += r.len();
                            let bytes = &self.buf.buffer()[r];
                            return Some(Ok(StreamChunk::NonMatch { bytes }));
                        }
                        // We've reported everything!
                        return None;
                    }
                }
            }
            let start = self.absolute_pos;
            for &byte in self.buf.buffer()[self.buffer_pos..].iter() {
                self.sid = self.aut.next_state(Anchored::No, self.sid, byte);
                self.absolute_pos += 1;
                if self.aut.is_match(self.sid) {
                    break;
                }
            }
            self.buffer_pos += self.absolute_pos - start;
        }
    }

    /// Return a match chunk for the given match. It is assumed that the match
    /// ends at the current `buffer_pos`.
    fn get_match_chunk(&self, mat: Match) -> core::ops::Range<usize> {
        let start = self.buffer_pos - mat.len();
        let end = self.buffer_pos;
        start..end
    }

    /// Return a non-match chunk, if necessary, just before reporting a match.
    /// This returns `None` if there is nothing to report. Otherwise, this
    /// assumes that the given match ends at the current `buffer_pos`.
    fn get_non_match_chunk(
        &self,
        mat: Match,
    ) -> Option<core::ops::Range<usize>> {
        let buffer_mat_start = self.buffer_pos - mat.len();
        if buffer_mat_start > self.buffer_reported_pos {
            let start = self.buffer_reported_pos;
            let end = buffer_mat_start;
            return Some(start..end);
        }
        None
    }

    /// Look for any bytes that should be reported as a non-match just before
    /// rolling the buffer.
    ///
    /// Note that this only reports bytes up to `buffer.len() -
    /// min_buffer_len`, as it's not possible to know whether the bytes
    /// following that will participate in a match or not.
    fn get_pre_roll_non_match_chunk(&self) -> Option<core::ops::Range<usize>> {
        let end =
            self.buf.buffer().len().saturating_sub(self.buf.min_buffer_len());
        if self.buffer_reported_pos < end {
            return Some(self.buffer_reported_pos..end);
        }
        None
    }

    /// Return any unreported bytes as a non-match up to the end of the buffer.
    ///
    /// This should only be called when the entire contents of the buffer have
    /// been searched and EOF has been hit when trying to fill the buffer.
    fn get_eof_non_match_chunk(&self) -> Option<core::ops::Range<usize>> {
        if self.buffer_reported_pos < self.buf.buffer().len() {
            return Some(self.buffer_reported_pos..self.buf.buffer().len());
        }
        None
    }

    /// Return the match at the current position for the current state.
    ///
    /// This panics if `self.aut.is_match(self.sid)` isn't true.
    fn get_match(&self) -> Match {
        get_match(self.aut, self.sid, 0, self.absolute_pos)
    }
}

/// A single chunk yielded by the stream chunk iterator.
///
/// The `'r` lifetime refers to the lifetime of the stream chunk iterator.
#[cfg(feature = "std")]
#[derive(Debug)]
enum StreamChunk<'r> {
    /// A chunk that does not contain any matches.
    NonMatch { bytes: &'r [u8] },
    /// A chunk that precisely contains a match.
    Match { bytes: &'r [u8], mat: Match },
}

#[inline(never)]
pub(crate) fn try_find_fwd<A: Automaton + ?Sized>(
    aut: &A,
    input: &Input<'_>,
) -> Result<Option<Match>, MatchError> {
    if input.is_done() {
        return Ok(None);
    }
    let earliest = aut.match_kind().is_standard() || input.get_earliest();
    if input.get_anchored().is_anchored() {
        try_find_fwd_imp(aut, input, None, Anchored::Yes, earliest)
    } else if let Some(pre) = aut.prefilter() {
        if earliest {
            try_find_fwd_imp(aut, input, Some(pre), Anchored::No, true)
        } else {
            try_find_fwd_imp(aut, input, Some(pre), Anchored::No, false)
        }
    } else {
        if earliest {
            try_find_fwd_imp(aut, input, None, Anchored::No, true)
        } else {
            try_find_fwd_imp(aut, input, None, Anchored::No, false)
        }
    }
}

#[inline(always)]
fn try_find_fwd_imp<A: Automaton + ?Sized>(
    aut: &A,
    input: &Input<'_>,
    pre: Option<&Prefilter>,
    anchored: Anchored,
    earliest: bool,
) -> Result<Option<Match>, MatchError> {
    let mut sid = aut.start_state(input.get_anchored())?;
    let mut at = input.start();
    let mut mat = None;
    if aut.is_match(sid) {
        mat = Some(get_match(aut, sid, 0, at));
        if earliest {
            return Ok(mat);
        }
    }
    if let Some(pre) = pre {
        match pre.find_in(input.haystack(), input.get_span()) {
            Candidate::None => return Ok(None),
            Candidate::Match(m) => return Ok(Some(m)),
            Candidate::PossibleStartOfMatch(i) => {
                at = i;
            }
        }
    }
    while at < input.end() {
        // I've tried unrolling this loop and eliding bounds checks, but no
        // matter what I did, I could not observe a consistent improvement on
        // any benchmark I could devise. (If someone wants to re-litigate this,
        // the way to do it is to add an 'next_state_unchecked' method to the
        // 'Automaton' trait with a default impl that uses 'next_state'. Then
        // use 'aut.next_state_unchecked' here and implement it on DFA using
        // unchecked slice index acces.)
        sid = aut.next_state(anchored, sid, input.haystack()[at]);
        if aut.is_special(sid) {
            if aut.is_dead(sid) {
                return Ok(mat);
            } else if aut.is_match(sid) {
                // We use 'at + 1' here because the match state is entered
                // at the last byte of the pattern. Since we use half-open
                // intervals, the end of the range of the match is one past the
                // last byte.
                let m = get_match(aut, sid, 0, at + 1);
                // For the automata in this crate, we make a size trade off
                // where we reuse the same automaton for both anchored and
                // unanchored searches. We achieve this, principally, by simply
                // not following failure transitions while computing the next
                // state. Instead, if we fail to find the next state, we return
                // a dead state, which instructs the search to stop. (This
                // is why 'next_state' needs to know whether the search is
                // anchored or not.) In addition, we have different start
                // states for anchored and unanchored searches. The latter has
                // a self-loop where as the former does not.
                //
                // In this way, we can use the same trie to execute both
                // anchored and unanchored searches. There is a catch though.
                // When building an Aho-Corasick automaton for unanchored
                // searches, we copy matches from match states to other states
                // (which would otherwise not be match states) if they are
                // reachable via a failure transition. In the case of an
                // anchored search, we *specifically* do not want to report
                // these matches because they represent matches that start past
                // the beginning of the search.
                //
                // Now we could tweak the automaton somehow to differentiate
                // anchored from unanchored match states, but this would make
                // 'aut.is_match' and potentially 'aut.is_special' slower. And
                // also make the automaton itself more complex.
                //
                // Instead, we insert a special hack: if the search is
                // anchored, we simply ignore matches that don't begin at
                // the start of the search. This is not quite ideal, but we
                // do specialize this function in such a way that unanchored
                // searches don't pay for this additional branch. While this
                // might cause a search to continue on for more than it
                // otherwise optimally would, it will be no more than the
                // longest pattern in the automaton. The reason for this is
                // that we ensure we don't follow failure transitions during
                // an anchored search. Combined with using a different anchored
                // starting state with no self-loop, we guarantee that we'll
                // at worst move through a number of transitions equal to the
                // longest pattern.
                //
                // Now for DFAs, the whole point of them is to eliminate
                // failure transitions entirely. So there is no way to say "if
                // it's an anchored search don't follow failure transitions."
                // Instead, we actually have to build two entirely separate
                // automatons into the transition table. One with failure
                // transitions built into it and another that is effectively
                // just an encoding of the base trie into a transition table.
                // DFAs still need this check though, because the match states
                // still carry matches only reachable via a failure transition.
                // Why? Because removing them seems difficult, although I
                // haven't given it a lot of thought.
                if !(anchored.is_anchored() && m.start() > input.start()) {
                    mat = Some(m);
                    if earliest {
                        return Ok(mat);
                    }
                }
            } else if let Some(pre) = pre {
                // If we're here, we know it's a special state that is not a
                // dead or a match state AND that a prefilter is active. Thus,
                // it must be a start state.
                debug_assert!(aut.is_start(sid));
                // We don't care about 'Candidate::Match' here because if such
                // a match were possible, it would have been returned above
                // when we run the prefilter before walking the automaton.
                let span = Span::from(at..input.end());
                match pre.find_in(input.haystack(), span).into_option() {
                    None => return Ok(None),
                    Some(i) => {
                        if i > at {
                            at = i;
                            continue;
                        }
                    }
                }
            } else {
                // When pre.is_none(), then starting states should not be
                // treated as special. That is, without a prefilter, is_special
                // should only return true when the state is a dead or a match
                // state.
                //
                // It is possible to execute a search without a prefilter even
                // when the underlying searcher has one: an anchored search.
                // But in this case, the automaton makes it impossible to move
                // back to the start state by construction, and thus, we should
                // never reach this branch.
                debug_assert!(false, "unreachable");
            }
        }
        at += 1;
    }
    Ok(mat)
}

#[inline(never)]
fn try_find_overlapping_fwd<A: Automaton + ?Sized>(
    aut: &A,
    input: &Input<'_>,
    state: &mut OverlappingState,
) -> Result<(), MatchError> {
    state.mat = None;
    if input.is_done() {
        return Ok(());
    }
    // Searching with a pattern ID is always anchored, so we should only ever
    // use a prefilter when no pattern ID is given.
    if aut.prefilter().is_some() && !input.get_anchored().is_anchored() {
        let pre = aut.prefilter().unwrap();
        try_find_overlapping_fwd_imp(aut, input, Some(pre), state)
    } else {
        try_find_overlapping_fwd_imp(aut, input, None, state)
    }
}

#[inline(always)]
fn try_find_overlapping_fwd_imp<A: Automaton + ?Sized>(
    aut: &A,
    input: &Input<'_>,
    pre: Option<&Prefilter>,
    state: &mut OverlappingState,
) -> Result<(), MatchError> {
    let mut sid = match state.id {
        None => {
            let sid = aut.start_state(input.get_anchored())?;
            // Handle the case where the start state is a match state. That is,
            // the empty string is in our automaton. We report every match we
            // can here before moving on and updating 'state.at' and 'state.id'
            // to find more matches in other parts of the haystack.
            if aut.is_match(sid) {
                let i = state.next_match_index.unwrap_or(0);
                let len = aut.match_len(sid);
                if i < len {
                    state.next_match_index = Some(i + 1);
                    state.mat = Some(get_match(aut, sid, i, input.start()));
                    return Ok(());
                }
            }
            state.at = input.start();
            state.id = Some(sid);
            state.next_match_index = None;
            state.mat = None;
            sid
        }
        Some(sid) => {
            // If we still have matches left to report in this state then
            // report them until we've exhausted them. Only after that do we
            // advance to the next offset in the haystack.
            if let Some(i) = state.next_match_index {
                let len = aut.match_len(sid);
                if i < len {
                    state.next_match_index = Some(i + 1);
                    state.mat = Some(get_match(aut, sid, i, state.at + 1));
                    return Ok(());
                }
                // Once we've reported all matches at a given position, we need
                // to advance the search to the next position.
                state.at += 1;
                state.next_match_index = None;
                state.mat = None;
            }
            sid
        }
    };
    while state.at < input.end() {
        sid = aut.next_state(
            input.get_anchored(),
            sid,
            input.haystack()[state.at],
        );
        if aut.is_special(sid) {
            state.id = Some(sid);
            if aut.is_dead(sid) {
                return Ok(());
            } else if aut.is_match(sid) {
                state.next_match_index = Some(1);
                state.mat = Some(get_match(aut, sid, 0, state.at + 1));
                return Ok(());
            } else if let Some(pre) = pre {
                // If we're here, we know it's a special state that is not a
                // dead or a match state AND that a prefilter is active. Thus,
                // it must be a start state.
                debug_assert!(aut.is_start(sid));
                let span = Span::from(state.at..input.end());
                match pre.find_in(input.haystack(), span).into_option() {
                    None => return Ok(()),
                    Some(i) => {
                        if i > state.at {
                            state.at = i;
                            continue;
                        }
                    }
                }
            } else {
                // When pre.is_none(), then starting states should not be
                // treated as special. That is, without a prefilter, is_special
                // should only return true when the state is a dead or a match
                // state.
                //
                // ... except for one special case: in stream searching, we
                // currently call overlapping search with a 'None' prefilter,
                // regardless of whether one exists or not, because stream
                // searching can't currently deal with prefilters correctly in
                // all cases.
            }
        }
        state.at += 1;
    }
    state.id = Some(sid);
    Ok(())
}

#[inline(always)]
fn get_match<A: Automaton + ?Sized>(
    aut: &A,
    sid: StateID,
    index: usize,
    at: usize,
) -> Match {
    let pid = aut.match_pattern(sid, index);
    let len = aut.pattern_len(pid);
    Match::new(pid, (at - len)..at)
}

/// Write a prefix "state" indicator for fmt::Debug impls. It always writes
/// exactly two printable bytes to the given formatter.
///
/// Specifically, this tries to succinctly distinguish the different types of
/// states: dead states, start states and match states. It even accounts for
/// the possible overlappings of different state types. (The only possible
/// overlapping is that of match and start states.)
pub(crate) fn fmt_state_indicator<A: Automaton>(
    f: &mut core::fmt::Formatter<'_>,
    aut: A,
    id: StateID,
) -> core::fmt::Result {
    if aut.is_dead(id) {
        write!(f, "D ")?;
    } else if aut.is_match(id) {
        if aut.is_start(id) {
            write!(f, "*>")?;
        } else {
            write!(f, "* ")?;
        }
    } else if aut.is_start(id) {
        write!(f, " >")?;
    } else {
        write!(f, "  ")?;
    }
    Ok(())
}

/// Return an iterator of transitions in a sparse format given an iterator
/// of all explicitly defined transitions. The iterator yields ranges of
/// transitions, such that any adjacent transitions mapped to the same
/// state are combined into a single range.
pub(crate) fn sparse_transitions<'a>(
    mut it: impl Iterator<Item = (u8, StateID)> + 'a,
) -> impl Iterator<Item = (u8, u8, StateID)> + 'a {
    let mut cur: Option<(u8, u8, StateID)> = None;
    core::iter::from_fn(move || {
        while let Some((class, next)) = it.next() {
            let (prev_start, prev_end, prev_next) = match cur {
                Some(x) => x,
                None => {
                    cur = Some((class, class, next));
                    continue;
                }
            };
            if prev_next == next {
                cur = Some((prev_start, class, prev_next));
            } else {
                cur = Some((class, class, next));
                return Some((prev_start, prev_end, prev_next));
            }
        }
        if let Some((start, end, next)) = cur.take() {
            return Some((start, end, next));
        }
        None
    })
}
aho_corasick/automaton.rs

aho_corasick/
automaton.rs