There is a convenient interface for stacks of IValue
s represented as std::vector<IValue>
in ATen’s stack.h. Unfortunately, torch::jit::pop()
in particular can easily be a performance problem relative to using the native std::vector API. For your convenience, here is pop()
:
static inline IValue pop(Stack& stack) {
auto r = std::move(stack.back());
stack.pop_back();
return r;
}
There is nothing wrong with this implementation; it does what it needs to do in order to implement the pop()
interface. The problem is that even if pop()
is inlined, the compiler cannot help it out as much as would be needed. The C++ copy elision rules only permit copies (and thus moves, which are just optimized copies) to be elided in specific circumstances, which typical use of pop()
does not qualify for. Let’s take a look at some examples, which I’ve also set up on Compiler Explorer so that we can easily see how many copy and move operations are happening.
Just pop() a stack
First, let’s suppose you want to remove the last value from a Stack
. You could do it with pop()
:
void justPop(Stack& stack) {
pop(stack);
}
or you could use std::vector::pop_back()
directly:
void justPopManual(Stack& stack) {
stack.pop_back();
}
The problem is that, in the first case, you are creating a new move of a Value
, and that cannot be elided even though the Value
is unused! If we go look on Compiler Explorer, we can see that justPop
does one Value
move construction and two Value
destructions, whereas justPopManual
does just one Value
destruction.
Pop one value and push another
Let’s suppose that we want to pop a value off the stack and push another value:
void popAndPush(Stack& stack) {
auto v = pop(stack);
// Some accessor on Value that returns another Value; compare IValue::toObject()
stack.push_back(v.subValue());
}
We can do this better by not popping or pushing at all, but rather using assigment:
void popAndPushManual(Stack& stack) {
stack.back() = stack.back().subValue();
}
Again checking assembly to confirm, popAndPush
does two move constructions and 3 Value
destructor calls (and emits code to conditionally extend the vector), whereas popAndPushManual
does one move assignment and one Value
destructor call.
Extracting multiple items from the stack
Let’s say we want to grab a few items from the end of the stack:
void multiGet(Stack& stack, std::vector<Value>& out, int N) {
for (int ii = 0; ii < N; ii++) {
out[ii] = pop(stack);
}
}
Again, we can do better by moving off the stack without popping and then erasing the end of the stack:
void multiGetManual(Stack& stack, std::vector<Value>& out, int N) {
for (int ii = 0; ii < N; ii++) {
out[ii] = std::move(*(stack.end() - 1 - ii));
}
stack.erase(stack.end() - N, stack.end());
}
multiGet
does 1 move construction, 1 move assignment, and 2 destructor calls per loop iteration. multiGetManual
does 1 move assignment and 1 destructor call.
Why does this matter?
Most concretely, all of these cases show up in the JIT interpreter implementation: justPop
is DROP
, popAndPush
is GET_ATTR
, and multiGet
is STOREN
. [PyTorch][JIT] Full pass at push/pop fixes in interpreter by swolchok · Pull Request #54197 · pytorch/pytorch · GitHub attempts to fix these.
In general, the stack.h
interface is used elsewhere in the codebase as well. In my opinion, we should stop using it in performance-sensitive paths in favor of direct std::vector
access.