Concurrency in Swift: Reader Writer Lock

Dmytro Anokhin
9 min read · Sep 5, 2018


When it comes to protecting critical sections, both Objective-C and Swift provide little to no language support. Luckily, frameworks give us tools to choose from. In this article I will demonstrate how to solve the readers–writers problem using GCD and pthread (POSIX threads) in Swift.

To understand the difference we will dive a bit into basic principles and how things work under the hood. And in the end I collected some measurements to see which solution performs better.

The Readers–Writers Problem

What is the readers–writers problem? It occurs when multiple threads share a resource (a variable or a data structure in memory), where some threads may write and some may read. When multiple threads read from the shared resource simultaneously, nothing wrong can happen. But when one thread writes, the others must wait, otherwise we end up with corrupted data.

In other words, a solution to the readers–writers problem allows concurrent reads but serial writes.

I find this a very common problem. One example is building a cache, where we store objects downloaded from the network and retrieve them later. Such a cache must be thread safe, and supporting simultaneous reads allows accessing objects concurrently.

Read Write Lock

A classic solution to synchronization problems is a lock. But NSLock (and the other locks in Foundation) will not solve the problem, because it blocks both reads and writes. This is where the system API comes in handy: pthread_rwlock_t is a reader–writer lock, a classic solution to the problem.

Using pthread_rwlock_t in Swift may look a bit clumsy, but it is actually fairly simple. It has a few pitfalls related to how Swift imports C APIs, but nothing we cannot work with.
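A minimal wrapper might look like this (the class and method names here are illustrative, not a definitive implementation). Note that the lock lives in a class: pthread locks must never be copied, so reference semantics are essential.

```swift
import Darwin

/// A minimal reader–writer lock backed by pthread_rwlock_t.
/// Wrapped in a class so the underlying lock is never copied.
final class ReadWriteLock {

    private var lock = pthread_rwlock_t()

    init() {
        // Initialize with default attributes; a non-zero result is a programmer error.
        let result = pthread_rwlock_init(&lock, nil)
        precondition(result == 0, "Failed to initialize pthread_rwlock_t")
    }

    deinit {
        // The lock must be destroyed exactly once, after all users are done.
        pthread_rwlock_destroy(&lock)
    }

    /// Acquire the lock for reading; multiple readers may hold it at once.
    func readLock() {
        pthread_rwlock_rdlock(&lock)
    }

    /// Acquire the lock for writing; exclusive with readers and other writers.
    func writeLock() {
        pthread_rwlock_wrlock(&lock)
    }

    func unlock() {
        pthread_rwlock_unlock(&lock)
    }
}
```

Usage is the familiar lock/unlock dance: call readLock() around reads, writeLock() around writes, and always balance with unlock().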

As you can see, pthread_rwlock_t is a simple API, but one prone to errors. Make sure to initialize and destroy it correctly, never copy it, and you're good to go.

Grand Central Dispatch

Since its introduction, Grand Central Dispatch has become the definitive solution for concurrent programming on Apple platforms. GCD executes code concurrently, provides scheduling and synchronization functions, and seamlessly integrates with language features and APIs.

GCD provides and manages queues to which the app can submit tasks. This abstraction hides thread management and lets us focus on building sequences of tasks.

When it comes to protecting critical sections, GCD offers barriers. A barrier is a task that executes exclusively on its queue: while the barrier task runs, all other tasks in the queue wait.

Using a concurrent queue allows multiple simultaneous reads. The write task uses the .barrier flag to make it exclusive. A write can also be performed asynchronously so it does not block the calling thread.
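A minimal sketch of this pattern (the type and queue label are illustrative):

```swift
import Dispatch

/// A value protected by a private concurrent queue:
/// reads run in parallel, writes run exclusively via a barrier.
final class BarrierProtected<Value> {

    private var value: Value
    // A dedicated concurrent queue; never use a global queue for barriers.
    private let queue = DispatchQueue(label: "com.example.barrier",
                                      attributes: .concurrent)

    init(_ value: Value) {
        self.value = value
    }

    /// Concurrent read: many readers can execute simultaneously.
    func read<R>(_ block: (Value) -> R) -> R {
        return queue.sync { block(value) }
    }

    /// Exclusive write: the .barrier flag waits for in-flight reads,
    /// runs alone, then lets the queue resume concurrent execution.
    /// Dispatched async so the caller does not block.
    func write(_ block: @escaping (inout Value) -> Void) {
        queue.async(flags: .barrier) { block(&self.value) }
    }
}
```

Because barrier tasks order everything submitted after them, a read enqueued after a write is guaranteed to observe that write.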

Note: always create a separate concurrent queue. Using barriers on a serial queue is pointless. Moreover, the global queues in GCD are shared resources and cannot be blocked with a barrier.

Lock vs GCD

To compare pthread_rwlock_t and GCD .barrier we must understand their approaches to protecting shared memory. Let's see how they work.

Atomic Operations

Atomic means indivisible, uninterruptible. An atomic operation executes as a single step, without interference, and either succeeds or fails without partial or side effects.

We can distinguish primitive and high-level atomic operations. High-level is a very broad term and can mean something as complex as a database transaction. For the purpose of this article I want to discuss primitive atomic operations.

Primitive atomic operations are provided by the OS or by CPU instructions. Some examples are loading, storing, and incrementing a value at a memory address. On macOS and iOS the common ones were the OSAtomic functions, a group of functions for atomically reading and updating values. These are now deprecated in favour of the C11 standard header stdatomic.h.

Compare-and-Swap (Compare and Exchange)

One example of a primitive atomic operation is compare-and-swap. This operation takes an old value and compares it with the current value in memory; if old == current, it writes a new value. The current value may have been changed by another thread in the meantime, and if that is the case the operation fails. By repeating this operation we can implement synchronization primitives, such as mutexes and semaphores.

This is quite hard to demonstrate in Swift because we need to work with pointers. So, just to visualize the logic, consider this code:
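This sketch is deliberately simplified and NOT atomic: a real compare-and-swap is a single CPU instruction (or a stdatomic call), while this Swift version can itself be interrupted between the read and the write. It only visualizes the logic.

```swift
/// Simplified, non-atomic sketch of compare-and-swap:
/// writes `new` only if the value in memory still equals `old`.
func compareAndSwap(old: Int, new: Int, at pointer: UnsafeMutablePointer<Int>) -> Bool {
    if pointer.pointee == old {
        pointer.pointee = new
        return true
    }
    return false
}

/// Typical usage pattern: read the current value (first read), compute
/// a new one, then attempt the swap, which re-reads the value (second
/// read) and fails if another thread changed it in between. Retry on failure.
func increment(_ value: UnsafeMutablePointer<Int>) {
    while true {
        let old = value.pointee                         // first read
        if compareAndSwap(old: old, new: old + 1, at: value) {  // second read inside
            return
        }
        // Another thread won the race; try again.
    }
}
```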

Here compareAndSwap fails when the value read the first time is not equal to the value read the second time. This can happen if another thread changed it, in which case we need to repeat the attempt. Looking at this as an attempt to acquire a lock shows why locks are expensive: in the worst case the loop can spin indefinitely (a livelock).

GCD Barriers

When a barrier task is enqueued, the tasks in the queue split into three groups: tasks enqueued before the barrier, the barrier task itself, and tasks enqueued after. Tasks enqueued before the barrier execute as scheduled, concurrently (there is not much point in using barriers on serial queues). The barrier task then executes exclusively. Afterwards the queue continues executing concurrently.

This eliminates the overhead of acquiring a lock. But this approach has a different issue.

Context Switch

GCD manages how tasks submitted to queues are executed. There is no guaranteed relationship between a queue and a thread in GCD. Serial or concurrent, different tasks can be executed by different threads.

Work submitted to dispatch queues is executed on a pool of threads fully managed by the system. No guarantee is made as to the thread on which a task executes.

For this comparison it is important to keep in mind that there is a pool of threads and a task can be executed on one (thread from the pool) without our control.

We must also understand that a single CPU core runs a single thread at a time (putting aside hyper-threading and other simultaneous multithreading technologies for now). And the number of CPU cores is usually smaller than the number of threads and tasks to be executed. To run a different thread, the CPU must perform a context switch: the process of storing the current CPU state (registers, memory mappings, various tables) and loading the state for the new thread. As you may expect, this has a performance cost.

Because the task queues and the pool of threads are both managed by Grand Central Dispatch, we can expect fewer context switches.

Comparison

From what we have discussed so far, comparing pthread_rwlock_t and GCD .barrier is really comparing the overhead of acquiring a lock against the overhead of context switches. Or: how well GCD minimizes context switches versus how cheaply pthread_rwlock_t acquires its lock.

For the comparison I started by declaring the ThreadSafeContainer protocol:
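A declaration along these lines would fit the description below; the exact associated type and closure shapes are assumptions.

```swift
/// Class-only protocol (AnyObject) to enforce reference semantics, so the
/// container is never copied when shared between threads.
protocol ThreadSafeContainer: AnyObject {
    associatedtype Value

    /// Concurrent read; the closure receives the protected value.
    func read<R>(_ block: (Value) -> R) -> R

    /// Exclusive write; the closure can mutate the protected value.
    func write(_ block: @escaping (inout Value) -> Void)
}
```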

This is a class-only protocol, to prevent value semantics (copying when passing between multiple threads). The read and write functions use closures to help me measure time.

Two classes implement ThreadSafeContainer to solve the readers–writers problem, using pthread_rwlock_t and GCD .barrier. There is also a class that protects the critical section with NSLock and serves as a benchmark. The GCD variant uses the synchronous dispatch function for test purposes; I will compare it with async later on.

To help me run tests I created an app that lets me vary parameters: the number of read/write iterations, durations and delays, etc. The app has versions for iOS and macOS, so I can see how CPU architecture makes a difference.

I measured the overall time of test execution and the times to acquire locks. Tests were repeated 100 and 1000 times. The top and bottom 10% were dropped before taking the average. Measuring multithreaded code introduces a certain noise in the results, due to the additional synchronization needed besides the locks being measured. This is fine as long as the noise is equally distributed and we are interested in relative numbers.

Hardware I’m using is: MacBook Pro 15" (Core i7), iPhone X (Apple A11 Bionic), and iPad Mini 2 (Apple A7).

Average Case

I imagine the average case as one where the number of writers is smaller than the number of readers. Think of touching a cache while the user scrolls a list. Here are the results for 3 writers and 10 readers.

GCD .barrier and pthread_rwlock_t show a difference of less than 1%, which can be neglected. The difference with NSLock, however, is much larger.

Best Case

The best case is a single writer with multiple readers. These are the results for 1 writer and 10 readers:

We see a minor difference between GCD .barrier and pthread_rwlock_t, while NSLock is slower on writes and 10× slower on reads.

Worst Case

The worst case is when the number of writers becomes greater than or equal to the number of readers. At that point the readers–writers problem degenerates into a plain critical-section synchronization issue.

These are the results for 5 writers and 5 readers.

The difference between GCD and pthread is insignificant, yet both are 23% faster than NSLock.

The performance of the three synchronization mechanisms gets even closer in the case of 10 writers and 1 reader.

Here NSLock is 1–2% faster than GCD and pthread, but I would call this insignificant considering it is a corner case.

The Case with Different Timings

I also want to see how differences in timing affect performance. This is the setup for the test:

+--------------------+-------+
| Writers            | 3     |
| Readers            | 10    |
| Max Write Delay    | 2 ms  |
| Max Read Delay     | 1 ms  |
| Max Write Duration | 1 ms  |
| Max Read Duration  | 0 ms  |
+--------------------+-------+

Acquiring the read lock takes up to 48% longer, which correlates with the longer duration of the write operation. NSLock is 17–20% slower, and the difference between GCD and pthread is still around 1%.

iOS vs macOS

For this comparison I took average case with 3 writers and 10 readers.

macOS is 5–10% faster than iOS at acquiring the lock. On macOS, GCD demonstrates about 3% better performance than pthread, while on iOS the difference is less than 0.5%.

How does the CPU affect performance?

On iPhone X and MacBook Pro the difference is minimal. But what if we take different hardware? In this test I used iPhone X and iPad Mini 2.

The iPad Mini 2 is of course slower than the iPhone X, but only by about 16–20%.

The iPad Mini 2 demonstrates slightly better performance for GCD, about 2%, in both write and read operations. But I would say this difference is minimal.

Can GCD be slower than pthread?

So far we have seen that the difference between GCD .barrier and pthread_rwlock_t is insignificant. Yet here is a case when GCD is 20% slower:

GCD prioritizes and schedules tasks according to quality of service (DispatchQoS). This case was measured with user-interactive quality of service class tasks running in parallel. It can happen when you have a high volume of user-interaction events.

What if we use async?

One benefit of using GCD is the ability to dispatch tasks asynchronously. This can be very handy for write operations.
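For example, a hypothetical cache type (the name and subscript API are illustrative) can dispatch its barrier write asynchronously, so the setter returns immediately while reads stay synchronous:

```swift
import Dispatch

/// Hypothetical cache demonstrating asynchronous barrier writes.
final class Cache {

    private var storage: [String: Int] = [:]
    private let queue = DispatchQueue(label: "com.example.cache",
                                      attributes: .concurrent)

    subscript(key: String) -> Int? {
        get {
            // Synchronous concurrent read.
            return queue.sync { storage[key] }
        }
        set {
            // Asynchronous exclusive write: the caller does not wait;
            // the barrier runs once in-flight reads complete, and any
            // read enqueued afterwards observes the new value.
            queue.async(flags: .barrier) { self.storage[key] = newValue }
        }
    }
}
```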

Here we see an uneven distribution for GCD: write is 71% faster than pthread, while read is 48% slower.

As you can see, solving the readers–writers problem can boost the performance of multithreaded code significantly.

The difference in performance between GCD .barrier and pthread_rwlock_t is in most cases insignificant and can be neglected.

Other queues and the quality-of-service levels managed by GCD may affect performance. Meanwhile, the pthread API is more prone to programming errors and looks clumsy in Swift.

Another question to ask is: will you benefit from dispatching tasks asynchronously?

For 98% of cases GCD .barrier looks like the way to go. I would consider pthread_rwlock_t in a subsystem that must depend as little as possible on other threads and higher-priority tasks (such as user input), or where parallelism is limited.

This is a quite old talk (from the year after GCD was introduced on iOS, if I'm not mistaken), but it describes the usage of .barrier well: Mastering Grand Central Dispatch.

I will later open source the app I used to collect test data so stay tuned.

Thank you for reading. If you like the article, please follow me and share.


And hope to see you next time 😊



Dmytro Anokhin

iOS Developer, here to share best practices learned through my experience. You can find me on Twitter: https://twitter.com/dmytroanokhin