I’m writing this post to refute the following possibly-conventional wisdom: "Tensor is represented as an intrusive_ptr<TensorImpl>, which is just a single pointer under the hood. Passing primitive types like pointers by value is efficient, and therefore so is passing Tensor by value: we copy the underlying pointer into a register, and (per intrusive_ptr's copy constructor) we bump the reference count. You might want to avoid the reference count bump by passing const Tensor&, but that’s not a clear win, because now we’re passing a pointer to a Tensor around, which is a double indirection."
The reality is that passing by const Tensor& is a pure win. Let’s see why.
Where the conventional wisdom comes from
Let’s consider a trivial struct.
struct Trivial {
  int x;
};
If we pass it by value, it goes into a register. If we pass it by reference, it gets pushed onto the stack if necessary, and a pointer to the stack-allocated Trivial instance is passed in a register.
Code:
int iTakeTrivialByValue(Trivial x) {
  return x.x;
}
int iTakeTrivialByRef(const Trivial& x) {
  return x.x;
}
Cleaned-up assembly:
at::iTakeTrivialByValue(at::Trivial):
  pushq %rbp
  movq %rsp, %rbp
  movl %edi, %eax
  popq %rbp
  retq
at::iTakeTrivialByRef(at::Trivial const&):
  pushq %rbp
  movq %rsp, %rbp
  movl (%rdi), %eax
  popq %rbp
  retq
The only difference between the two functions is the third instruction: the by-value implementation just returns the value in our argument register (%edi), whereas the by-reference implementation has to load it from memory.
In other words, Trivial follows the conventional wisdom: small structs passed by value go in registers, just as would happen if we passed the underlying int instead of wrapping it in a Trivial.
The problem: the Itanium C++ ABI
The Itanium C++ ABI specifies, among other things, the way C++ types are passed to and returned from functions at the machine instruction level. Specifically, types that have a non-trivial destructor, copy constructor, or move constructor (the document calls this “non-trivial for purposes of calls”) must be pushed onto the stack and passed by reference, even if they would otherwise be eligible for passing in registers.
Our dear friend Tensor is clearly non-trivial for purposes of calls: its destructor does a reference count decrement on its underlying intrusive_ptr. So, it has to be passed by reference under the hood: passing by value still costs us the indirection, plus a reference count bump whenever the argument has to be copied. Therefore, we can always do better by passing by reference ourselves.
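To make this concrete without pulling in PyTorch, here is a minimal sketch using a hypothetical Handle class as a stand-in for Tensor and a hypothetical Impl as a stand-in for TensorImpl (neither is the real implementation; they only mimic the one-pointer layout and the reference counting). The standard type traits used here are only an approximation of the ABI's "non-trivial for purposes of calls" rule, but they agree with it for these two types.

```cpp
#include <cassert>
#include <type_traits>

// Same shape as the Trivial struct from earlier in the post.
struct Trivial {
  int x;
};

// Hypothetical stand-in for TensorImpl: an object that carries its own
// reference count.
struct Impl {
  int refcount = 0;
};

// Hypothetical stand-in for Tensor: one pointer wide, but with a
// user-provided copy constructor and destructor, like intrusive_ptr.
class Handle {
 public:
  explicit Handle(Impl* impl) : impl_(impl) { ++impl_->refcount; }
  Handle(const Handle& other) : impl_(other.impl_) { ++impl_->refcount; }
  Handle& operator=(const Handle&) = delete;
  ~Handle() { --impl_->refcount; }

  int use_count() const { return impl_->refcount; }

 private:
  Impl* impl_;
};

// Both types are small enough to fit in a single register...
static_assert(sizeof(Trivial) == sizeof(int), "Trivial is int-sized");
static_assert(sizeof(Handle) == sizeof(void*), "Handle is pointer-sized");

// ...but only Trivial has trivial copy and destroy operations.
static_assert(std::is_trivially_copyable<Trivial>::value,
              "Trivial can be copied bitwise");
static_assert(std::is_trivially_destructible<Trivial>::value,
              "Trivial needs no destructor call");
static_assert(!std::is_trivially_copy_constructible<Handle>::value,
              "copying a Handle runs user code");
static_assert(!std::is_trivially_destructible<Handle>::value,
              "destroying a Handle runs user code");

// By value: the caller copies the Handle (bumping the count) before the call.
int countByValue(Handle h) { return h.use_count(); }

// By reference: no copy is made, so the count is untouched.
int countByRef(const Handle& h) { return h.use_count(); }
```

With a single live Handle, countByValue observes a count of 2 during the call and countByRef observes 1, so the by-value version pays for an extra increment and decrement on top of whatever indirection the ABI imposes.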
What about if I want to retain an owning reference to the Tensor?
It’s probably OK to pass by value in this case in order to avoid having both a const Tensor& and Tensor&& overload.
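Here is a rough sketch of what that single by-value overload might look like. Handle, Impl, and Holder are hypothetical names invented for this illustration (Handle plays the role of Tensor), and the sketch assumes the type has a cheap move constructor that steals the pointer without touching the reference count.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for TensorImpl.
struct Impl {
  int refcount = 0;
};

// Hypothetical stand-in for Tensor, now with a move constructor.
class Handle {
 public:
  explicit Handle(Impl* impl) : impl_(impl) { ++impl_->refcount; }
  Handle(const Handle& other) : impl_(other.impl_) {
    if (impl_) ++impl_->refcount;
  }
  // Moving steals the pointer and leaves the reference count alone.
  Handle(Handle&& other) noexcept : impl_(other.impl_) {
    other.impl_ = nullptr;
  }
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (impl_) --impl_->refcount;
  }

 private:
  Impl* impl_;
};

// One constructor covers both lvalue and rvalue callers: the parameter is
// copy- or move-constructed at the call site, then moved into the member.
class Holder {
 public:
  explicit Holder(Handle h) : held_(std::move(h)) {}

 private:
  Handle held_;
};
```

Passing an lvalue costs one reference count bump, which we needed anyway in order to own the object, and passing an rvalue costs none. The price relative to the two-overload version is one extra move, which is just a pointer copy in this sketch.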