Module arc_swap::docs::performance


Performance characteristics.

There are several performance advantages of ArcSwap over RwLock.
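For comparison, a common lock-based way to share a swappable value is an `RwLock<Arc<T>>`. The std-only sketch below shows that baseline. The `LockedSwap` type is hypothetical and not part of arc_swap. Every reader takes the read lock and clones the `Arc`, and every writer takes the exclusive lock to replace it.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical lock-based baseline (not part of arc_swap): a shared value
/// that readers can load and writers can replace.
struct LockedSwap<T> {
    inner: RwLock<Arc<T>>,
}

impl<T> LockedSwap<T> {
    fn new(value: T) -> Self {
        LockedSwap { inner: RwLock::new(Arc::new(value)) }
    }

    /// Each reader acquires the read lock, clones the Arc (touching its
    /// shared reference count) and releases the lock again.
    fn load(&self) -> Arc<T> {
        let guard = self.inner.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Writers acquire the exclusive lock and replace the pointer. Readers
    /// that already hold a clone keep the old value alive.
    fn store(&self, value: T) {
        *self.inner.write().unwrap() = Arc::new(value);
    }
}

fn main() {
    let config = LockedSwap::new(String::from("v1"));
    let old = config.load();
    config.store(String::from("v2"));
    println!("{} -> {}", old, config.load()); // prints "v1 -> v2"
}
```

Both readers and writers go through the lock's internal state, so readers touch shared state even when nobody is writing.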

§Lock-free readers

All the read operations are always lock-free. Most of the time, they are actually wait-free. They fall back to being merely lock-free only occasionally, with at least usize::MAX / 4 wait-free accesses in between.
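The difference between lock-free and wait-free can be illustrated with plain std atomics. This is a general illustration and not arc_swap's algorithm. A compare-and-swap retry loop is lock-free: some thread always makes progress, but an individual thread may keep losing the race and retrying. A single `fetch_add` completes in a bounded number of steps on hardware with a native atomic add (for example x86). On some other architectures it may itself compile to a retry loop.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free increment: the CAS fails whenever another thread changed the
/// value in between, and this thread then retries with the fresh value.
fn increment_lock_free(counter: &AtomicUsize) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::Relaxed,
            Ordering::Relaxed,
        ) {
            Ok(_) => return,
            Err(actual) => current = actual, // lost the race, try again
        }
    }
}

/// Increment with one atomic read-modify-write: wait-free where the CPU
/// provides a native atomic add instruction.
fn increment_wait_free(counter: &AtomicUsize) {
    counter.fetch_add(1, Ordering::Relaxed);
}

/// Runs `threads` threads that each call `f` `per_thread` times.
fn run(threads: usize, per_thread: usize, f: fn(&AtomicUsize)) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    f(&counter);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", run(4, 10_000, increment_lock_free)); // prints 40000
    println!("{}", run(4, 10_000, increment_wait_free)); // prints 40000
}
```

Both variants produce the same result. They differ only in the per-thread progress guarantee.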

Writers are lock-free.

Whenever the documentation talks about contention in the context of ArcSwap, it means contention on the CPU level: multiple cores having to access the same cache line. This slows things down (compared to each core accessing its own cache line), but eventual progress is still guaranteed, and the cost is significantly lower than parking threads, as happens with mutex-style contention.
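CPU-level contention can be observed without any locks. The std-only sketch below compares two approaches. In the first, all threads increment one shared atomic counter, so a single cache line bounces between cores. In the second, each thread increments its own counter, padded to 64 bytes. That padding size is an assumption; 64 bytes is a common cache-line size, but not a universal one. Both approaches are correct and never block. Only the timings differ, and by how much depends on the machine.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Instant;

/// Aligned (and therefore padded) to an assumed 64-byte cache line, so that
/// counters of different threads do not share a line.
#[repr(align(64))]
struct Padded(AtomicUsize);

const THREADS: usize = 4;
const ITERS: usize = 250_000;

/// Every thread updates the same atomic: lock-free, but contended.
fn shared() -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..THREADS {
            s.spawn(|| {
                for _ in 0..ITERS {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

/// Every thread updates only its own cache line; totals are summed at the end.
fn per_thread() -> usize {
    let counters: Vec<Padded> = (0..THREADS).map(|_| Padded(AtomicUsize::new(0))).collect();
    thread::scope(|s| {
        for c in &counters {
            s.spawn(move || {
                for _ in 0..ITERS {
                    c.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}

fn main() {
    let start = Instant::now();
    let a = shared();
    println!("shared counter:      {} in {:?}", a, start.elapsed());

    let start = Instant::now();
    let b = per_thread();
    println!("per-thread counters: {} in {:?}", b, start.elapsed());
}
```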

§Speeds

The baseline speed of read operations is similar to using an uncontended Mutex. However, load suffers no contention from other read operations and only slight contention during updates. The load_full operation is additionally contended only on the reference count of the Arc inside. So, in general, while Mutex rapidly loses performance when actively used by multiple threads at once, and RwLock is slow to start with, ArcSwap mostly keeps its performance even when read by many threads in parallel.
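The extra contention of load_full comes from the `Arc` reference count itself. The std-only sketch below uses only `std::sync::Arc` and does not touch arc_swap. It contrasts borrowing, which leaves the count untouched, with taking owned handles, which is what load_full returns. Each clone and each drop performs an atomic update on the single counter stored next to the data. Threads doing this concurrently therefore share one cache line. `clone_storm` is a hypothetical helper.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: `threads` threads repeatedly take an owned clone of
/// the same Arc and drop it again, returning the total number of elements seen.
fn clone_storm(shared: &Arc<Vec<i32>>, threads: usize, iters: usize) -> usize {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(move || {
                let mut seen = 0;
                for _ in 0..iters {
                    // Atomic increment of the shared count...
                    let owned = Arc::clone(shared);
                    seen += owned.len();
                    // ...and an atomic decrement when `owned` goes out of scope here.
                }
                seen
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);

    // Borrowing: reading through a reference does not touch the count.
    let borrowed: &Vec<i32> = &shared;
    assert_eq!(Arc::strong_count(&shared), 1);

    let total = clone_storm(&shared, 4, 1_000);
    // All temporary clones are gone again afterwards.
    println!("{} {} {}", borrowed.len(), total, Arc::strong_count(&shared)); // prints "3 12000 1"
}
```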

Write operations are considered expensive. A write operation is more expensive than access to an uncontended Mutex and on some architectures even slower than uncontended RwLock. However, it is faster than either under contention.

There are some (very unscientific) benchmarks within the source code of the library, and the documentation of DefaultStrategy lists some numbers measured on my computer.

The exact numbers are highly dependent on the machine used (both the absolute numbers and the relative differences between data structures). Not only does the architecture have a huge impact (e.g. x86 vs. ARM), but so does the particular CPU: AMD vs. Intel, or even two different Intel processors. Therefore, if raw speed matters more to you than the wait-free guarantees, you're advised to do your own measurements.
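A minimal starting point for such measurements could look like the std-only harness below. It times several reader threads against a Mutex and against an RwLock. The constants and the `measure` helper are illustrative assumptions, not the library's benchmark suite. To include ArcSwap, add the arc_swap crate as a dependency and measure a closure such as `|| **swap.load()`, assuming `swap` is an `ArcSwap<u64>`.

```rust
use std::sync::{Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

const READERS: usize = 4;
const READS: usize = 100_000;

/// Runs READERS threads, each calling `read` READS times. Returns the elapsed
/// wall-clock time and the sum of all values read. The sum keeps the work from
/// being optimized away and doubles as a sanity check.
fn measure<F>(read: F) -> (Duration, u64)
where
    F: Fn() -> u64 + Sync,
{
    let read = &read;
    let start = Instant::now();
    let total = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..READERS {
            handles.push(s.spawn(move || (0..READS).map(|_| read()).sum::<u64>()));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    });
    (start.elapsed(), total)
}

fn main() {
    let mutex = Mutex::new(1u64);
    let rwlock = RwLock::new(1u64);

    let (t_mutex, n_mutex) = measure(|| *mutex.lock().unwrap());
    let (t_rwlock, n_rwlock) = measure(|| *rwlock.read().unwrap());

    // Absolute and relative timings will differ from machine to machine.
    println!("Mutex:  {} reads in {:?}", n_mutex, t_mutex);
    println!("RwLock: {} reads in {:?}", n_rwlock, t_rwlock);
}
```

A single run like this is noisy. Repeating it and varying READERS gives a more useful picture than one number.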

Further speed improvements may be gained by using the Cache.

§Consistency

The combination of wait-free readers and no contention between concurrent loads gives the synchronization mechanism consistent performance characteristics. This might be important for soft-realtime applications (though the CPU-level contention caused by a recent update/write operation might be problematic for some hard-realtime cases).

§Choosing the right reading operation

There are several load operations available. While the general go-to one should be load, there may be situations in which the others are a better match.

A load usually only borrows the instance from the shared ArcSwap. This makes it faster, because different threads don't contend on the reference count. There are two situations in which this borrow isn't possible. The first is when the content gets changed: all existing Guards are then promoted to contain an owned instance. The promotion is done by the writer, but the readers still need to decrement the reference count of the old instance when they no longer use it, contending on the count.

The other situation stems from the internal implementation. The number of borrows each thread can hold at any one time (across all Guards) is limited. If this limit is exceeded, an owned instance is created instead.
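The borrow-slot behaviour described above can be modelled in a few lines. The code below is a hypothetical, single-threaded toy and not arc_swap's actual implementation. `Reader`, `ToyGuard` and the slot count are illustrative names. While a slot is free, a load borrows the shared value without touching its reference count. Once all slots are occupied, a load falls back to an owned clone, and dropping a borrowed guard frees its slot again.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical toy model (NOT arc_swap's implementation): a reader with a
/// fixed number of borrow slots.
struct Reader<'a, T> {
    shared: &'a Arc<T>,
    free_slots: Cell<usize>,
}

enum ToyGuard<'r, 'a, T> {
    /// Occupies one of the reader's slots until dropped.
    Borrowed(&'r Reader<'a, T>),
    /// Owns a clone of the Arc (reference count incremented).
    Owned(Arc<T>),
}

impl<'a, T> Reader<'a, T> {
    fn new(shared: &'a Arc<T>, slots: usize) -> Self {
        Reader { shared, free_slots: Cell::new(slots) }
    }

    fn load(&self) -> ToyGuard<'_, 'a, T> {
        let free = self.free_slots.get();
        if free > 0 {
            self.free_slots.set(free - 1); // take a slot, no refcount traffic
            ToyGuard::Borrowed(self)
        } else {
            ToyGuard::Owned(Arc::clone(self.shared)) // no slot left: clone
        }
    }
}

impl<T> Deref for ToyGuard<'_, '_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        match self {
            ToyGuard::Borrowed(reader) => &**reader.shared,
            ToyGuard::Owned(arc) => &**arc,
        }
    }
}

impl<T> Drop for ToyGuard<'_, '_, T> {
    fn drop(&mut self) {
        // Dropping a borrowed guard returns its slot for later loads.
        if let ToyGuard::Borrowed(reader) = self {
            reader.free_slots.set(reader.free_slots.get() + 1);
        }
    }
}

fn main() {
    let shared = Arc::new(42);
    let reader = Reader::new(&shared, 2);
    let a = reader.load(); // borrowed
    let b = reader.load(); // borrowed
    let c = reader.load(); // owned clone
    println!("{} {}", *a + *b + *c, Arc::strong_count(&shared)); // prints "126 2"
}
```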

Therefore, if you intend to hold onto the loaded value for an extended time span, you may prefer load_full. It loads the pointer instance (Arc) without borrowing. This is slower because of the possible contention on the reference count, but it doesn't consume one of the borrow slots, so subsequent loads are more likely to find a slot available. Similarly, if some API needs an owned Arc, load_full is more convenient and potentially faster than first loading and then cloning that Arc.
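A typical case for an owned Arc is handing the value to something that outlives the current scope, such as a spawned thread. The std-only sketch below uses an `RwLock<Arc<_>>` as a stand-in for the shared value; with ArcSwap, the owned handle would come from load_full. `start_worker` is a hypothetical API that requires ownership.

```rust
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};

/// Hypothetical long-running task that needs to own its configuration.
/// `thread::spawn` requires a 'static closure, so a borrowed guard cannot be
/// moved in, but an owned Arc can be kept for as long as needed.
fn start_worker(config: Arc<String>) -> JoinHandle<usize> {
    thread::spawn(move || config.len())
}

fn main() {
    // Stand-in for a shared, swappable value.
    let current = RwLock::new(Arc::new(String::from("old config")));

    // Analogue of load_full: clone the Arc once; the read lock is released
    // at the end of this statement.
    let owned = Arc::clone(&*current.read().unwrap());
    let worker = start_worker(owned);

    // Replacing the shared value does not affect the worker's copy.
    *current.write().unwrap() = Arc::new(String::from("new"));
    println!("{}", worker.join().unwrap()); // prints 10
}
```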

Additionally, it is possible to use a Cache for a further speed improvement, at the cost of a less comfortable API and possibly keeping older values alive for longer than necessary.
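The trade-off of a cache can be shown with a small std-only model. `Source` and `ToyCache` are hypothetical and not arc_swap's `ArcSwap` or `Cache`. The source bumps a generation counter on every store. The cache keeps its own Arc, and on each load it only compares generations with one atomic load. It takes the lock and clones a fresh Arc only when the generation has changed. As a consequence, the old value stays alive inside the cache until its next load.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical shared source: a value plus a generation counter that is
/// bumped on every store.
struct Source<T> {
    value: RwLock<Arc<T>>,
    generation: AtomicUsize,
}

impl<T> Source<T> {
    fn new(value: T) -> Self {
        Source { value: RwLock::new(Arc::new(value)), generation: AtomicUsize::new(0) }
    }

    fn store(&self, value: T) {
        let mut slot = self.value.write().unwrap();
        *slot = Arc::new(value);
        // Bumped while holding the write lock, so `snapshot` sees a matching pair.
        self.generation.fetch_add(1, Ordering::Relaxed);
    }

    /// Slow path: read the value and its generation together under the lock.
    fn snapshot(&self) -> (usize, Arc<T>) {
        let slot = self.value.read().unwrap();
        (self.generation.load(Ordering::Relaxed), Arc::clone(&*slot))
    }
}

/// Hypothetical per-thread cache holding its own Arc.
struct ToyCache<'s, T> {
    source: &'s Source<T>,
    generation: usize,
    cached: Arc<T>,
}

impl<'s, T> ToyCache<'s, T> {
    fn new(source: &'s Source<T>) -> Self {
        let (generation, cached) = source.snapshot();
        ToyCache { source, generation, cached }
    }

    /// Needs `&mut self` because it may replace the cached Arc.
    fn load(&mut self) -> &Arc<T> {
        // Fast path: one atomic load, no lock and no refcount traffic.
        if self.source.generation.load(Ordering::Relaxed) != self.generation {
            let (generation, value) = self.source.snapshot();
            self.generation = generation;
            self.cached = value; // only now is the previous value released
        }
        &self.cached
    }
}

fn main() {
    let source = Source::new(String::from("v1"));
    let mut cache = ToyCache::new(&source);
    let old = Arc::downgrade(cache.load());

    source.store(String::from("v2"));
    // The cache still keeps "v1" alive until its next load.
    println!("{}", old.upgrade().is_some()); // prints "true"
    println!("{}", cache.load()); // prints "v2"
    println!("{}", old.upgrade().is_some()); // prints "false"
}
```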