Author(s): Dhruv Matani (@dhrubird), Jacob Szwejbka, Pavithran Ramachandran (@pavithran), Chen Lai
c10::Synchronized<T> is heavily inspired by folly::Synchronized and borrows a lot of the good ideas from there. Here’s what folly::Synchronized claims to be:
folly/Synchronized.h introduces a simple abstraction for mutex-based concurrency. It replaces convoluted, unwieldy, and just plain wrong code with simple constructs that are easy to get right and difficult to get wrong.
c10::Synchronized<T> brings the same idea to the PyTorch/c10 codebase. However, it provides only the bare minimum API needed to write thread-safe, concurrency-aware code in a way that’s hard to get wrong.
Let’s dive into the details.
Motivation
When data structures and containers (basically, variables) may be accessed and/or updated from multiple threads concurrently, you want to protect them with a mutex so that their internal state doesn’t get corrupted. For example, if two or more threads write to a std::vector<T>, you should use a mutex to keep the vector’s internal state consistent. See this page to learn more about why to use a mutex.
Okay, now that we are convinced that using a mutex to protect access to shared resources is a desirable thing, let’s see how one may do it naively.
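#include <mutex>
#include <vector>

// Global vector of integers, and the mutex that guards it
std::vector<int> v;
std::mutex m;

void called_from_multiple_threads(int element) {
  // Hold m for the rest of this scope before touching v
  std::lock_guard<std::mutex> guard(m);
  v.push_back(element);
}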
The code above prevents simultaneous unsafe insertion (push_back) of elements into the vector v.
However, you can see that this relies on the developer remembering to create the std::lock_guard<T> before inserting into vector v. If the developer forgets, then all bets are off.
Solution
The idea is to force the mutex (m) and the vector (v) to be co-homed, so that one cannot easily get a handle on the vector (v) without holding a lock on the mutex (m).
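#include <vector>
#include <c10/util/Synchronized.h>

// The variable v below can only be accessed through its withLock() method.
c10::Synchronized<std::vector<int>> v;

void called_from_multiple_threads(int element) {
  // The callback receives the vector with the mutex already held
  v.withLock([element](std::vector<int>& vec) {
    vec.push_back(element);
  });
}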
The withLock() method accepts a callback that is invoked with the mutex safely held. This way, every caller is forced (by this abstraction) to hold the mutex before it can update the shared data structure, which here is the vector of integers.
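To make the mechanism concrete, here is a minimal sketch of how a Synchronized-style wrapper can be built on a private std::mutex and a withLock() that takes a callback. This is not the c10 source: the class name SynchronizedSketch and the helper parallel_push_count are hypothetical, and the sketch assumes C++11 or later.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical Synchronized-style wrapper for illustration only;
// this is NOT the actual c10::Synchronized implementation.
template <typename T>
class SynchronizedSketch {
 public:
  SynchronizedSketch() = default;
  explicit SynchronizedSketch(T initial) : data_(std::move(initial)) {}

  // The mutex cannot be copied, so neither can the wrapper.
  SynchronizedSketch(const SynchronizedSketch&) = delete;
  SynchronizedSketch& operator=(const SynchronizedSketch&) = delete;

  // Runs cb(data_) while holding the mutex and returns whatever cb returns.
  // std::lock_guard releases the mutex on return, even if cb throws.
  template <typename CB>
  auto withLock(CB&& cb) -> decltype(std::forward<CB>(cb)(std::declval<T&>())) {
    std::lock_guard<std::mutex> guard(mutex_);
    return std::forward<CB>(cb)(data_);
  }

  // Read-only variant for const wrappers; the callback sees a const T&.
  template <typename CB>
  auto withLock(CB&& cb) const
      -> decltype(std::forward<CB>(cb)(std::declval<const T&>())) {
    std::lock_guard<std::mutex> guard(mutex_);
    return std::forward<CB>(cb)(data_);
  }

 private:
  mutable std::mutex mutex_;  // mutable so the const overload can lock it
  T data_{};                  // only reachable from inside withLock
};

// Hypothetical usage: num_threads threads each append per_thread elements
// to one shared vector, touching it only through withLock.
std::size_t parallel_push_count(int num_threads, int per_thread) {
  SynchronizedSketch<std::vector<int>> v;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&v, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        v.withLock([t](std::vector<int>& vec) { vec.push_back(t); });
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  return v.withLock([](const std::vector<int>& vec) { return vec.size(); });
}
```

Because both the mutex and the data are private, the only route to the data is through withLock(), which is what co-homing means in practice. Note that nothing stops a callback from leaking a pointer or reference to the data past the end of the lock, so the pattern makes misuse harder rather than impossible.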
Impact on PyTorch
PyTorch currently has 160 instances of std::lock_guard in the codebase that can be replaced with c10::Synchronized<T>.
Gaps compared to folly::Synchronized<T>
As mentioned above, c10::Synchronized<T> doesn’t implement the complete API implemented by folly::Synchronized<T>. Notable absences are:
- No way to write a single-line method call on T, for example data.lock()->push_back(...)
- No way to use read-write locks via wlock() or rlock()
- No way to upgrade locks
- No support for acquireLocked(), which safely locks multiple mutexes at once, as needed when using multiple c10::Synchronized<T> objects together
These are definitely implementable as additions to the current API but don’t seem to be critical for the basic functionality that is currently provided.
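As a rough illustration of how two of the missing pieces might be layered on top, the sketch below adds a lock() method that returns a lock-holding proxy (enabling single-line calls) and a helper that locks two wrappers at once. SynchronizedWithProxy, LockedPtr and withBothLocked are hypothetical names, loosely modeled on folly’s lock() and acquireLocked(); they are neither folly’s exact API nor anything c10 provides. The sketch assumes C++17 for std::scoped_lock.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical extension for illustration only; not part of c10, and not
// folly's exact API. Requires C++17 for std::scoped_lock.
template <typename T>
class SynchronizedWithProxy {
 public:
  // Proxy that holds the lock for its whole lifetime and forwards to T.
  class LockedPtr {
   public:
    LockedPtr(std::mutex& m, T& data) : guard_(m), data_(&data) {}
    T* operator->() { return data_; }
    T& operator*() { return *data_; }

   private:
    std::unique_lock<std::mutex> guard_;  // unlocks when the proxy dies
    T* data_;
  };

  // Allows single-line calls such as s.lock()->push_back(x); the temporary
  // proxy, and with it the lock, lives until the end of the full expression.
  LockedPtr lock() { return LockedPtr(mutex_, data_); }

  // Locks two distinct wrappers together and runs cb(a_data, b_data).
  // std::scoped_lock uses a deadlock-avoidance algorithm, so concurrent
  // calls with the arguments in opposite orders cannot deadlock.
  // Passing the same object twice is undefined behavior (double lock).
  template <typename CB>
  friend auto withBothLocked(SynchronizedWithProxy& a, SynchronizedWithProxy& b,
                             CB&& cb) {
    std::scoped_lock guard(a.mutex_, b.mutex_);
    return std::forward<CB>(cb)(a.data_, b.data_);
  }

 private:
  std::mutex mutex_;
  T data_{};
};
```

For example, a.lock()->push_back(1) holds a’s mutex only for that one statement, and withBothLocked(a, b, cb) lets cb move elements from one vector to the other while both mutexes are held. Read-write locks and lock upgrades would additionally need a std::shared_mutex-style lock and are left out of this sketch.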
Thanks
To Edward Yang (@ezyang) for reaching out with a crash in the model tracer that led to the discovery of a missing mutex, and the subsequent discussion that led to the implementation of this (missing) abstraction for safe concurrent access to shared data structures.