Struct Encoder

pub struct Encoder { /* private fields */ }

A converter that encodes a Unicode stream into bytes according to a character encoding in a streaming (incremental) manner.

The various encode_* methods take an input buffer (src) and an output buffer (dst), both of which are caller-allocated. There are variants for both UTF-8 and UTF-16 input buffers.

An encode_* method encodes characters from src into bytes stored into dst until one of the following three things happens:

An unmappable character is encountered (*_without_replacement variants only).
The output buffer has been filled so near capacity that the encoder cannot be sure that processing an additional character of input wouldn’t cause so much output that the output buffer would overflow.
All the input characters have been processed.

The encode_* method then returns a tuple of a status indicating which one of the three reasons to return happened, how many input code units (u8 when encoding from UTF-8 and u16 when encoding from UTF-16) were read, how many output bytes were written (except when encoding into Vec<u8>, whose length change indicates this), and in the case of the variants that perform replacement, a boolean indicating whether an unmappable character was replaced with a numeric character reference during the call.

The number of bytes “written” is what’s logically written. Garbage may be written in the output buffer beyond the point logically written to.

In the case of the methods whose name ends with *_without_replacement, the status is an EncoderResult enumeration (possibilities Unmappable, OutputFull and InputEmpty corresponding to the three cases listed above).

In the case of methods whose name does not end with *_without_replacement, unmappable characters are automatically replaced with the corresponding numeric character references and unmappable characters do not cause the methods to return early.
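To make the shape of a numeric character reference concrete, here is a std-only toy model. It is not the library's implementation: the function name ascii_with_ncr is hypothetical, the target is plain US-ASCII, and the decimal form `&#NNNN;` is assumed for the reference. The returned boolean plays the role of the "had replacements" flag described above.

```rust
/// Toy model (not the library's code): encode `src` as US-ASCII and
/// replace every non-ASCII character with a decimal numeric character
/// reference. Returns the bytes and whether any replacement happened.
fn ascii_with_ncr(src: &str) -> (Vec<u8>, bool) {
    let mut out = Vec::with_capacity(src.len());
    let mut had_replacements = false;
    for c in src.chars() {
        if c.is_ascii() {
            // ASCII characters map to themselves.
            out.push(c as u8);
        } else {
            // Unmappable in this toy target: emit "&#<scalar value>;".
            out.extend_from_slice(format!("&#{};", c as u32).as_bytes());
            had_replacements = true;
        }
    }
    (out, had_replacements)
}
```

For instance, ascii_with_ncr("5 € ok") produces the bytes of "5 &#8364; ok" together with true, because U+20AC is 8364 in decimal.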

When encoding from UTF-8 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_buffer_length_from_utf8_without_replacement(). When encoding from UTF-8 with replacement, max_buffer_length_from_utf8_if_no_unmappables() gives the output buffer length that provides the same guarantee in the absence of unmappable characters.

When encoding from UTF-16 without replacement, the methods are guaranteed not to return indicating that more output space is needed if the length of the output buffer is at least the length returned by max_buffer_length_from_utf16_without_replacement(). When encoding from UTF-16 with replacement, max_buffer_length_from_utf16_if_no_unmappables() gives the output buffer length that provides the same guarantee in the absence of unmappable characters.

When encoding with replacement, applications are not expected to size the buffer for the worst case ahead of time but to resize the buffer if there are unmappable characters. This is why max length queries are only available for the case where there are no unmappable characters.

When encoding from UTF-8, each src buffer must be valid UTF-8. (When calling from Rust, the type system takes care of this.) When encoding from UTF-16, unpaired surrogates in the input are treated as U+FFFD REPLACEMENT CHARACTERS. Therefore, in order for astral characters not to turn into a pair of REPLACEMENT CHARACTERS, the caller must ensure that surrogate pairs are not split across input buffer boundaries.
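The effect of splitting a surrogate pair can be reproduced with the standard library alone. In this sketch, decode_lossy mimics the "unpaired surrogate becomes U+FFFD" rule using char::decode_utf16, and safe_split is a hypothetical helper that moves a proposed buffer boundary back by one code unit when it would land between a high and a low surrogate. Neither function is part of the Encoder API.

```rust
/// Decodes UTF-16 code units, mapping each unpaired surrogate to U+FFFD.
fn decode_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Returns `want` (clamped to the input length), or `want - 1` if
/// splitting at `want` would separate a high surrogate from the low
/// surrogate that follows it.
fn safe_split(units: &[u16], want: usize) -> usize {
    let want = want.min(units.len());
    if want > 0 && want < units.len() {
        let prev_is_high = (0xD800..=0xDBFF).contains(&units[want - 1]);
        let next_is_low = (0xDC00..=0xDFFF).contains(&units[want]);
        if prev_is_high && next_is_low {
            return want - 1;
        }
    }
    want
}
```

For the code units of "a😀b", cutting at index 2 yields buffers that decode to "a\u{FFFD}" and "\u{FFFD}b", whereas safe_split moves the cut to index 1 so that the pair stays together in the second buffer.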

After an encode_* call returns, the output produced so far, taken as a whole from the start of the stream, is guaranteed to consist of a valid byte sequence in the target encoding. (I.e. the code unit sequence for a character is guaranteed not to be split across output buffers. However, due to the stateful nature of ISO-2022-JP, the stream needs to be considered from the start for it to be valid. For other encodings, the validity holds on a per-output buffer basis.)

The boolean argument last indicates that the end of the stream is reached when all the characters in src have been consumed. This argument is needed for ISO-2022-JP and is ignored for other encodings.

An Encoder object can be used to incrementally encode a Unicode stream into a byte stream.

During the processing of a single stream, the caller must call encode_* zero or more times with last set to false and then call encode_* at least once with last set to true. If a call with last set to true returns InputEmpty, the processing of the stream has ended. Otherwise, the caller must call encode_* again with last set to true (or treat an Unmappable result as a fatal error).

Once the stream has ended, the Encoder object must not be used anymore. That is, you need to create another one to process another stream.

When the encoder returns OutputFull or the encoder returns Unmappable and the caller does not wish to treat it as a fatal error, the input buffer src may not have been completely consumed. In that case, the caller must pass the unconsumed contents of src to encode_* again upon the next call.

§Infinite loops

When converting with a fixed-size output buffer whose size is too small to accommodate one character of output, an infinite loop ensues. When converting with a fixed-size output buffer, it generally makes sense to make the buffer fairly large (e.g. a couple of kilobytes).

Implementations§

impl Encoder

pub fn encoding(&self) -> &'static Encoding

pub fn has_pending_state(&self) -> bool

pub fn max_buffer_length_from_utf8_if_no_unmappables(&self, byte_length: usize) -> Option<usize>

pub fn max_buffer_length_from_utf8_without_replacement(&self, byte_length: usize) -> Option<usize>

pub fn encode_from_utf8(&mut self, src: &str, dst: &mut [u8], last: bool) -> (CoderResult, usize, usize, bool)

pub fn encode_from_utf8_to_vec(&mut self, src: &str, dst: &mut Vec<u8>, last: bool) -> (CoderResult, usize, bool)

pub fn encode_from_utf8_without_replacement(&mut self, src: &str, dst: &mut [u8], last: bool) -> (EncoderResult, usize, usize)

pub fn encode_from_utf8_to_vec_without_replacement(&mut self, src: &str, dst: &mut Vec<u8>, last: bool) -> (EncoderResult, usize)

pub fn max_buffer_length_from_utf16_if_no_unmappables(&self, u16_length: usize) -> Option<usize>

pub fn max_buffer_length_from_utf16_without_replacement(&self, u16_length: usize) -> Option<usize>

pub fn encode_from_utf16(&mut self, src: &[u16], dst: &mut [u8], last: bool) -> (CoderResult, usize, usize, bool)

pub fn encode_from_utf16_without_replacement(&mut self, src: &[u16], dst: &mut [u8], last: bool) -> (EncoderResult, usize, usize)

Auto Trait Implementations§

impl Freeze for Encoder

impl RefUnwindSafe for Encoder

impl Send for Encoder

impl Sync for Encoder

impl Unpin for Encoder

impl UnwindSafe for Encoder

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
