pub unsafe extern "C" fn ZDICT_trainFromBuffer(
    dictBuffer: *mut c_void,
    dictBufferCapacity: usize,
    samplesBuffer: *const c_void,
    samplesSizes: *const usize,
    nbSamples: c_uint,
) -> usize
ZDICT_trainFromBuffer():
Train a dictionary from an array of samples.
Redirects to ZDICT_optimizeTrainFromBuffer_fastCover(), single-threaded, with d=8, steps=4,
f=20, and accel=1.
Samples must be stored concatenated in a single flat buffer samplesBuffer,
supplied with an array of sizes samplesSizes, providing the size of each sample, in order.
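The layout described above can be sketched in safe Rust as follows. The helper name flatten_samples is hypothetical and is not part of the bindings; it only illustrates how a flat buffer and a parallel array of sizes might be prepared before their pointers are passed as samplesBuffer and samplesSizes.

```rust
/// Concatenate samples into one flat buffer and record each sample's size,
/// in the same order. `flatten_samples` is an illustrative helper, not a
/// function provided by the library.
fn flatten_samples(samples: &[&[u8]]) -> (Vec<u8>, Vec<usize>) {
    let total: usize = samples.iter().map(|s| s.len()).sum();
    let mut buffer = Vec::with_capacity(total);
    let mut sizes = Vec::with_capacity(samples.len());
    for sample in samples {
        // Samples are stored back to back, with no separator or padding.
        buffer.extend_from_slice(sample);
        sizes.push(sample.len());
    }
    (buffer, sizes)
}
```

The sum of the sizes always equals the length of the flat buffer, and the number of entries in the sizes array is the value to pass as nbSamples.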
The resulting dictionary will be saved into dictBuffer.
@return: size of dictionary stored into dictBuffer (<= dictBufferCapacity)
or an error code, which can be tested with ZDICT_isError().
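A minimal sketch of a safe wrapper around this call and its return-value check is shown below. To keep the example compilable without linking the C library, the training and error-test functions are passed in as function pointers. The TrainFn alias mirrors the signature documented above; the IsErrorFn alias assumes ZDICT_isError takes a usize and returns a c_uint, which should be checked against the actual bindings. The name train_dictionary is hypothetical.

```rust
use std::os::raw::{c_uint, c_void};

/// Same shape as `ZDICT_trainFromBuffer` in the signature above.
type TrainFn =
    unsafe extern "C" fn(*mut c_void, usize, *const c_void, *const usize, c_uint) -> usize;
/// Assumed shape of `ZDICT_isError`: non-zero means `code` is an error.
type IsErrorFn = unsafe extern "C" fn(usize) -> c_uint;

/// Hypothetical wrapper: returns the trained dictionary, or the raw error code.
fn train_dictionary(
    train: TrainFn,
    is_error: IsErrorFn,
    samples: &[u8],
    sizes: &[usize],
    capacity: usize,
) -> Result<Vec<u8>, usize> {
    // The sizes must describe the flat buffer exactly, or the C side
    // would read past the end of `samples`.
    let total = sizes.iter().try_fold(0usize, |acc, &s| acc.checked_add(s));
    assert_eq!(total, Some(samples.len()), "sizes do not match the sample buffer");
    assert!(sizes.len() <= c_uint::MAX as usize, "too many samples");

    let mut dict = vec![0u8; capacity];
    // SAFETY: `dict` is writable for `dict.len()` bytes, `samples` holds the
    // concatenated samples, and `sizes` has exactly `sizes.len()` entries.
    let written = unsafe {
        train(
            dict.as_mut_ptr() as *mut c_void,
            dict.len(),
            samples.as_ptr() as *const c_void,
            sizes.as_ptr(),
            sizes.len() as c_uint,
        )
    };
    // SAFETY: the error test only inspects the integer it is given.
    if unsafe { is_error(written) } != 0 {
        return Err(written);
    }
    // On success the return value is the dictionary size (<= capacity).
    dict.truncate(written);
    Ok(dict)
}
```

With the real bindings in scope, the first two arguments would presumably be ZDICT_trainFromBuffer and ZDICT_isError themselves. On Err, the note below applies: falling back to compression without a dictionary is the expected course.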
Note: Dictionary training will fail if there are not enough samples to construct a
dictionary, or if most of the samples are too small (8 bytes is the lower limit).
If dictionary training fails, you should use zstd without a dictionary, as the dictionary
would’ve been ineffective anyway. If you believe your samples would benefit from a dictionary,
please open an issue with details, and we can look into it.
Note: ZDICT_trainFromBuffer()’s memory usage is about 6 MB.
Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
It’s possible to select a smaller or larger size, just by specifying dictBufferCapacity.
In general, it’s recommended to provide a few thousand samples, though this can vary a lot.
It’s recommended that the total size of all samples be about 100 times the target size of the dictionary.
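The sizing guidance above can be turned into a quick pre-flight check. The sketch below is illustrative only: the constants restate the rules of thumb from this page (~100 KB dictionary, samples totalling about 100 times the dictionary size, 8 bytes as the lower limit for a sample), and the names TrainingReport and check_training_set are hypothetical. The library's actual failure conditions may differ.

```rust
/// Rule-of-thumb values taken from the documentation text above.
const TYPICAL_DICT_SIZE: usize = 100 * 1024;
const SAMPLES_TO_DICT_RATIO: usize = 100;
const MIN_SAMPLE_SIZE: usize = 8;

/// Hypothetical summary of how a sample set compares with the guidance.
#[derive(Debug, PartialEq)]
struct TrainingReport {
    total_size: usize,
    recommended_total: usize,
    small_samples: usize,
    enough_data: bool,
}

/// Compare the sample sizes against the recommendations for a dictionary
/// of `dict_capacity` bytes.
fn check_training_set(sizes: &[usize], dict_capacity: usize) -> TrainingReport {
    let total_size: usize = sizes.iter().sum();
    let recommended_total = dict_capacity.saturating_mul(SAMPLES_TO_DICT_RATIO);
    // Samples below the lower limit contribute little to training.
    let small_samples = sizes.iter().filter(|&&s| s < MIN_SAMPLE_SIZE).count();
    TrainingReport {
        total_size,
        recommended_total,
        small_samples,
        enough_data: total_size >= recommended_total,
    }
}
```

For the typical ~100 KB dictionary, this check asks for roughly 10 MB of samples in total. A report with enough_data set to false is a hint to gather more samples or request a smaller dictionary, not a guarantee that training will fail.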