Struct Decoder

pub struct Decoder { /* private fields */ }

A converter that decodes a byte stream into Unicode according to a character encoding in a streaming (incremental) manner.

The various decode_* methods take an input buffer (src) and an output buffer (dst), both of which are caller-allocated. There are variants for both UTF-8 and UTF-16 output buffers.

A decode_* method decodes bytes from src into Unicode characters stored into dst until one of the following three things happens:

A malformed byte sequence is encountered (*_without_replacement variants only).
The output buffer has been filled so near capacity that the decoder cannot be sure that processing an additional byte of input wouldn’t cause so much output that the output buffer would overflow.
All the input bytes have been processed.

The decode_* method then returns a tuple consisting of: a status indicating which of the three reasons caused the return; the number of input bytes read; the number of output code units written (u8 when decoding into UTF-8 and u16 when decoding into UTF-16), except when decoding into a String, whose change in length indicates this; and, for the variants that perform replacement, a boolean indicating whether an error was replaced with the REPLACEMENT CHARACTER during the call.

The number of bytes “written” is what’s logically written. Garbage may be written in the output buffer beyond the point logically written to. Therefore, if you wish to decode into an &mut str, you should use the methods that take an &mut str argument instead of the ones that take an &mut [u8] argument. The former take care of overwriting the trailing garbage to ensure the UTF-8 validity of the &mut str as a whole, but the latter don’t.
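The point about trailing garbage can be illustrated with a minimal sketch. The helper name logical_output is hypothetical and not part of this API; written stands for the code-unit count reported by a decode call into an &mut [u8] buffer. The sketch only reads the logically written prefix and ignores whatever lies beyond it.

```rust
/// Hypothetical helper (not part of the crate): interpret only the
/// logically written prefix of `dst` as UTF-8.
/// `written` is assumed to be the count returned by a decode call.
fn logical_output(dst: &[u8], written: usize) -> Result<&str, std::str::Utf8Error> {
    // Bytes at index `written` and beyond may be garbage, so they are
    // never inspected here.
    std::str::from_utf8(&dst[..written])
}
```

Validating the whole buffer instead of the prefix may fail even though the decode call itself succeeded, which is why the &mut str variants exist.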

In the case of the *_without_replacement variants, the status is a DecoderResult enumeration (possibilities Malformed, OutputFull and InputEmpty corresponding to the three cases listed above).

In the case of methods whose name does not end with _without_replacement, the status is a CoderResult enumeration (possibilities OutputFull and InputEmpty). Malformed sequences are automatically replaced with the REPLACEMENT CHARACTER, and errors do not cause the methods to return early.
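To make the three-way status concrete, here is a toy model written with the standard library only. ToyResult and toy_decode_ascii are hypothetical stand-ins, not the crate's DecoderResult or any of its methods; the toy treats every byte of 0x80 or above as malformed and mirrors only the shape of the (status, read, written) return value.

```rust
/// Hypothetical stand-in for the three-way status described above.
#[derive(Debug, PartialEq)]
enum ToyResult {
    InputEmpty,
    OutputFull,
    Malformed,
}

/// Toy ASCII-to-UTF-8 step returning (status, bytes read, bytes written).
fn toy_decode_ascii(src: &[u8], dst: &mut [u8]) -> (ToyResult, usize, usize) {
    let mut read = 0;
    let mut written = 0;
    while read < src.len() {
        if written == dst.len() {
            // No room left for the next character.
            return (ToyResult::OutputFull, read, written);
        }
        let byte = src[read];
        read += 1;
        if byte >= 0x80 {
            // The offending byte counts as read so the caller can resume after it.
            return (ToyResult::Malformed, read, written);
        }
        dst[written] = byte;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}
```

A caller would match on the first tuple element and use the two counts to slice src and dst before the next call.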

When decoding to UTF-8, the output buffer must have at least 4 bytes of space. When decoding to UTF-16, the output buffer must have at least two UTF-16 code units (u16) of space.

When decoding to UTF-8 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_utf8_buffer_length_without_replacement(). When decoding to UTF-8 with replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf8_buffer_length(). When decoding to UTF-16 with or without replacement, the length of the output buffer that guarantees the methods not to return indicating that more output space is needed is given by max_utf16_buffer_length().
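The max_*_buffer_length methods return Option<usize>, so callers need to handle the None case before allocating. The sketch below shows that pattern with a hypothetical sizing rule: it assumes an encoding in which each input byte expands to at most three UTF-8 bytes, and applies the 4-byte minimum stated above. The names toy_max_utf8_len and alloc_output are illustrative, and the crate's actual formulas are not reproduced here.

```rust
/// Minimum UTF-8 output space, as stated in the text above.
const MIN_UTF8_OUTPUT_BYTES: usize = 4;

/// Hypothetical worst-case bound: assumes at most 3 output bytes per input byte.
/// Returns None when the multiplication would overflow usize.
fn toy_max_utf8_len(byte_length: usize) -> Option<usize> {
    let worst_case = byte_length.checked_mul(3)?;
    Some(worst_case.max(MIN_UTF8_OUTPUT_BYTES))
}

/// Allocates a zeroed output buffer, or returns None if no bound can be computed.
fn alloc_output(byte_length: usize) -> Option<Vec<u8>> {
    toy_max_utf8_len(byte_length).map(|len| vec![0u8; len])
}
```

With a buffer sized this way, a single call can consume the whole input without running out of output space, at the cost of over-allocating for mostly-ASCII input.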

The output written into dst is guaranteed to be valid UTF-8 or UTF-16, and the output after each decode_* call is guaranteed to consist of complete characters. (I.e. the code unit sequence for the last character is guaranteed not to be split across output buffers.)

The boolean argument last indicates that the end of the stream is reached when all the bytes in src have been consumed.

A Decoder object can be used to incrementally decode a byte stream.

During the processing of a single stream, the caller must call decode_* zero or more times with last set to false and then call decode_* at least once with last set to true. If a call with last set to true returns InputEmpty, the processing of the stream has ended. Otherwise, the caller must call decode_* again with last set to true (or treat a Malformed result as a fatal error).

Once the stream has ended, the Decoder object must not be used anymore. That is, you need to create another one to process another stream.

When the decoder returns OutputFull or the decoder returns Malformed and the caller does not wish to treat it as a fatal error, the input buffer src may not have been completely consumed. In that case, the caller must pass the unconsumed contents of src to decode_* again upon the next call.
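The calling protocol described above (chunks with last set to false, a final chunk with last set to true, and re-feeding unconsumed input) can be sketched as a loop. This is a toy model using only the standard library: ToyResult, toy_decode_ascii and decode_stream are hypothetical names, the toy decoder is stateless ASCII and therefore ignores last, and the output buffer is deliberately tiny to force OutputFull.

```rust
/// Hypothetical stand-in for the three-way status.
#[derive(Debug, PartialEq)]
enum ToyResult {
    InputEmpty,
    OutputFull,
    Malformed,
}

/// Toy ASCII step; `_last` is accepted only to mirror the call shape.
fn toy_decode_ascii(src: &[u8], dst: &mut [u8], _last: bool) -> (ToyResult, usize, usize) {
    let (mut read, mut written) = (0, 0);
    while read < src.len() {
        if written == dst.len() {
            return (ToyResult::OutputFull, read, written);
        }
        let byte = src[read];
        read += 1;
        if byte >= 0x80 {
            return (ToyResult::Malformed, read, written);
        }
        dst[written] = byte;
        written += 1;
    }
    (ToyResult::InputEmpty, read, written)
}

/// Decodes all chunks, treating Malformed as non-fatal.
/// Assumes at least one chunk so that a final call with `last == true` happens.
fn decode_stream(chunks: &[&[u8]]) -> String {
    let mut out = String::new();
    let mut buf = [0u8; 4];
    for (i, chunk) in chunks.iter().enumerate() {
        let last = i + 1 == chunks.len();
        let mut pending: &[u8] = chunk;
        loop {
            let (status, read, written) = toy_decode_ascii(pending, &mut buf, last);
            // Only the logically written prefix is used; the toy emits ASCII only.
            out.push_str(std::str::from_utf8(&buf[..written]).expect("toy output is ASCII"));
            // Keep whatever was not consumed for the next call.
            pending = &pending[read..];
            match status {
                ToyResult::InputEmpty => break,
                ToyResult::OutputFull => continue,
                ToyResult::Malformed => out.push('\u{FFFD}'),
            }
        }
    }
    out
}
```

The key step is the re-slicing of pending by read after every call, which is what passes the unconsumed remainder of src back to the decoder.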

§Infinite loops

When converting with a fixed-size output buffer that is too small to accommodate one character or (when applicable) one numeric character reference of output, an infinite loop ensues. When converting with a fixed-size output buffer, it generally makes sense to make the buffer fairly large (e.g. a couple of kilobytes).
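One defensive option is to detect a stalled call in the caller's loop. The sketch below rests on an assumption that is not stated in the text above: that a too-small buffer shows up as a call reporting a full output buffer while consuming no input and producing no output. The name is_stalled is hypothetical.

```rust
/// Hypothetical guard: true when a call reported a full output buffer
/// yet consumed no input and wrote no output (assumed stall condition).
fn is_stalled(output_full: bool, read: usize, written: usize) -> bool {
    output_full && read == 0 && written == 0
}
```

A loop that sees this condition could stop and report an undersized buffer rather than retry with the same arguments.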

Implementations§

impl Decoder

pub fn encoding(&self) -> &'static Encoding

pub fn max_utf8_buffer_length(&self, byte_length: usize) -> Option<usize>

pub fn max_utf8_buffer_length_without_replacement(&self, byte_length: usize) -> Option<usize>

pub fn decode_to_utf8(&mut self, src: &[u8], dst: &mut [u8], last: bool) -> (CoderResult, usize, usize, bool)

pub fn decode_to_str(&mut self, src: &[u8], dst: &mut str, last: bool) -> (CoderResult, usize, usize, bool)

pub fn decode_to_string(&mut self, src: &[u8], dst: &mut String, last: bool) -> (CoderResult, usize, bool)

pub fn decode_to_utf8_without_replacement(&mut self, src: &[u8], dst: &mut [u8], last: bool) -> (DecoderResult, usize, usize)

pub fn decode_to_str_without_replacement(&mut self, src: &[u8], dst: &mut str, last: bool) -> (DecoderResult, usize, usize)

pub fn decode_to_string_without_replacement(&mut self, src: &[u8], dst: &mut String, last: bool) -> (DecoderResult, usize)

pub fn max_utf16_buffer_length(&self, byte_length: usize) -> Option<usize>

pub fn decode_to_utf16(&mut self, src: &[u8], dst: &mut [u16], last: bool) -> (CoderResult, usize, usize, bool)

pub fn decode_to_utf16_without_replacement(&mut self, src: &[u8], dst: &mut [u16], last: bool) -> (DecoderResult, usize, usize)

pub fn latin1_byte_compatible_up_to(&self, bytes: &[u8]) -> Option<usize>

Auto Trait Implementations§

impl Freeze for Decoder

impl RefUnwindSafe for Decoder

impl Send for Decoder

impl Sync for Decoder

impl Unpin for Decoder

impl UnwindSafe for Decoder

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
