Introducing c10::Synchronized<T>, a safe abstraction for mutex-based concurrency

Author(s): Dhruv Matani (@dhrubird), Jacob Szwejbka, Pavithran Ramachandran (@pavithran), Chen Lai

c10::Synchronized<T> is heavily inspired by folly::Synchronized and borrows many of its good ideas. Here’s what folly::Synchronized claims to be:

folly/Synchronized.h introduces a simple abstraction for mutex-based concurrency. It replaces convoluted, unwieldy, and just plain wrong code with simple constructs that are easy to get right and difficult to get wrong.

c10::Synchronized<T> is the same, but in the PyTorch/c10 codebase. Additionally, it provides just the bare minimum API needed to write thread-safe and concurrency-aware code in a way that’s hard to get wrong.

Let’s dive into the details.

Motivation

When data structures and containers (basically variables) may be accessed and/or updated from multiple threads concurrently, you want to protect them with a mutex so that their internal state does not get corrupted. For example, if two or more threads write to the same std::vector<T>, a mutex is needed to serialize those writes. See this page to learn more about why to use a mutex.

Okay, now that we are convinced that using a mutex to protect access to shared resources is a desirable thing, let’s see how one may do it naively.

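#include <mutex>
#include <vector>

// Global vector of integers, plus the mutex that is meant to guard it
std::vector<int> v;
std::mutex m;

void called_from_multiple_threads(int element) {
  // Hold the mutex for the duration of the insertion
  std::lock_guard<std::mutex> guard(m);
  v.push_back(element);
}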

The code above prevents simultaneous unsafe insertion (push_back) of elements into the vector v.
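
To see the guard doing its job, here is a small self-contained sketch that drives the same pattern from several threads. The helper name run_writers and the thread and element counts are assumptions made for illustration; they do not come from the PyTorch codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Shared state, protected by convention only: callers must lock m first.
std::vector<int> v;
std::mutex m;

void called_from_multiple_threads(int element) {
  std::lock_guard<std::mutex> guard(m);
  v.push_back(element);
}

// Hypothetical driver: starts num_threads writers, each appending
// per_thread elements, then returns the final size of v.
std::size_t run_writers(int num_threads, int per_thread) {
  assert(num_threads > 0 && per_thread >= 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        called_from_multiple_threads(i);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  // Read the size under the same mutex, for consistency with the writers.
  std::lock_guard<std::mutex> guard(m);
  return v.size();
}
```

Starting from an empty vector, run_writers(4, 1000) should report 4000 elements, because every push_back is serialized by the mutex.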

However, this relies on the developer remembering to construct a std::lock_guard on m before touching the vector v. If the developer forgets even once, all bets are off: concurrent push_back calls on the same vector are a data race, which is undefined behavior.

Solution

The idea is to force the mutex (m) and the vector (v) to live together in one object, so that one cannot easily get a handle on the vector (v) without holding a lock on the mutex (m).

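#include <c10/util/Synchronized.h>
#include <vector>

// The variable v below is safe to access using the withLock<T> method.
c10::Synchronized<std::vector<int>> v;

void called_from_multiple_threads(int element) {
  // The callback receives the protected vector; the name vec avoids shadowing v.
  v.withLock([element](std::vector<int> &vec) {
    vec.push_back(element);
  });
}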

The withLock<T> method accepts a callback that will be invoked with the mutex safely held. This way, every caller is forced (by the abstraction) to hold a lock on the mutex before they can update the shared data structure, here the vector of integers.
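
To make the mechanism concrete, here is a minimal sketch of how such a wrapper could be written. The class name SimpleSynchronized and the helpers append, current_size, and usage_example are hypothetical and exist only for this illustration; this is not the actual c10::Synchronized<T> implementation, which may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the idea behind c10::Synchronized<T>.
template <typename T>
class SimpleSynchronized {
 public:
  SimpleSynchronized() = default;
  explicit SimpleSynchronized(T data) : data_(std::move(data)) {}

  // Non-copyable: the mutex and the data it guards stay together.
  SimpleSynchronized(const SimpleSynchronized&) = delete;
  SimpleSynchronized& operator=(const SimpleSynchronized&) = delete;

  // Invokes cb(data_) while holding the mutex and returns cb's result.
  template <typename CB>
  auto withLock(CB&& cb) -> decltype(cb(std::declval<T&>())) {
    std::lock_guard<std::mutex> guard(mutex_);
    return cb(data_);
  }

 private:
  std::mutex mutex_;
  T data_;  // only reachable through withLock
};

SimpleSynchronized<std::vector<int>> shared;

void append(int element) {
  shared.withLock([element](std::vector<int>& vec) { vec.push_back(element); });
}

std::size_t current_size() {
  return shared.withLock([](std::vector<int>& vec) { return vec.size(); });
}

// Usage sketch: two appends on a fresh container leave two elements.
void usage_example() {
  append(1);
  append(2);
  assert(current_size() == 2);
}
```

Because data_ is private, the only route to the vector goes through withLock, so forgetting the lock is no longer possible by accident. A callback can still leak a pointer or reference to the data out of the locked region, so the protection is against mistakes rather than determined misuse.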

Impact on PyTorch

PyTorch currently has 160 instances of std::lock_guard in the codebase that can be replaced with the use of c10::Synchronized<T>.

Gaps compared to folly::Synchronized<T>

As mentioned above, c10::Synchronized<T> doesn’t implement the complete API implemented by folly::Synchronized<T>. Notable absences are:

  1. No way to write a single-line method call on T, for example data.lock()->push_back(...)
  2. No way to use read-write locks via wlock() or rlock()
  3. No way to upgrade locks
  4. No support for acquireLocked(), which is able to lock multiple mutexes simultaneously and safely when using multiple c10::Synchronized<T> objects

These are definitely implementable as additions to the current API but don’t seem to be critical for the basic functionality that is currently provided.
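
As an illustration of how the first gap might be closed, the sketch below adds a lock() method that returns a small RAII proxy, so that a call like data.lock()->push_back(...) holds the mutex for the duration of that one expression. The names SynchronizedWithLock, LockedPtr, numbers, and usage_example are hypothetical; this is neither the folly implementation nor a proposed c10 API.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper that supports single-line locked method calls.
template <typename T>
class SynchronizedWithLock {
 public:
  // RAII proxy: acquires the mutex on construction, releases it on destruction.
  class LockedPtr {
   public:
    LockedPtr(std::mutex& m, T& data) : guard_(m), data_(&data) {}
    T* operator->() { return data_; }
    T& operator*() { return *data_; }

   private:
    std::unique_lock<std::mutex> guard_;
    T* data_;
  };

  // The returned proxy keeps the mutex locked for as long as it is alive.
  LockedPtr lock() { return LockedPtr(mutex_, data_); }

 private:
  std::mutex mutex_;
  T data_;
};

SynchronizedWithLock<std::vector<int>> numbers;

// Usage sketch: each statement takes and releases the lock on its own.
void usage_example() {
  numbers.lock()->push_back(7);
  numbers.lock()->push_back(8);
  assert(numbers.lock()->size() == 2);
}
```

The temporary proxy lives until the end of the full expression, which is what makes the one-liner safe. The flip side is that calling lock() twice inside a single expression would try to lock the same std::mutex twice on one thread, which is undefined behavior and typically deadlocks, so a real design would need to document or guard against that.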

Thanks

To Edward Yang (@ezyang) for reaching out with a crash in the model tracer that led to the discovery of a missing mutex, and the subsequent discussion that led to the implementation of this (missing) abstraction for safe concurrent access to shared data structures.
