We shouldn't feel bad about passing `Tensor` by reference

I’m writing this post to refute the following possibly-conventional wisdom: "Tensor is represented as an intrusive_ptr<TensorImpl>, which is just a single pointer under the hood. Passing primitive types like pointers by value is efficient, and therefore so is passing Tensor by value: we copy the underlying pointer into a register, and (per intrusive_ptr's copy constructor) we bump the reference count. You might want to avoid the reference count bump by passing const Tensor&, but that’s not a clear win, because now we’re passing a pointer to a Tensor around, which is a double indirection."

The reality is that passing by const Tensor& is a pure win. Let’s see why.

Where the conventional wisdom comes from

Let’s consider a trivial struct.

struct Trivial {
  int x;
};

If we pass it by value, it goes into a register. If we pass it by reference, it gets pushed onto the stack if necessary, and a pointer to the stack-allocated Trivial instance is passed in a register.

Code:

int iTakeTrivialByValue(Trivial x) {
  return x.x;
}

int iTakeTrivialByRef(const Trivial& x) {
  return x.x;
}

Cleaned-up assembly:

at::iTakeTrivialByValue(at::Trivial):
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, %eax
        popq    %rbp
        retq
at::iTakeTrivialByRef(at::Trivial const&):
        pushq   %rbp
        movq    %rsp, %rbp
        movl    (%rdi), %eax
        popq    %rbp
        retq

The only difference between the two functions is the third instruction: the by-value implementation just returns the value in our argument register (%edi), whereas the by-reference implementation has to load it from memory.

In other words, Trivial follows the conventional wisdom: small structs passed by value go in registers, just as would happen if we passed the underlying int instead of wrapping it in a Trivial.

The problem: the Itanium C++ ABI

The Itanium C++ ABI specifies, among other things, how C++ types are passed to and returned from functions at the machine instruction level. Specifically, types that have a non-trivial destructor, copy constructor, or move constructor (the document calls this “non-trivial for purposes of calls”) must be pushed onto the stack and passed by reference, even if they would otherwise be eligible for passing in registers.

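Our dear friend Tensor is clearly non-trivial for purposes of calls: its destructor does a reference count decrement on its underlying intrusive_ptr. So, it has to be passed by reference under the hood, even when we write a by-value parameter. Passing by value therefore doesn’t avoid the double indirection; when the argument is an lvalue, it just adds a reference count increment and a matching decrement on top of it. We can always do better by passing by reference ourselves.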
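To make this concrete without pulling in PyTorch, here is a minimal sketch using a hypothetical `Handle` class as a stand-in for Tensor (the names `Handle`, `Counted`, `useCountByValue`, and `useCountByRef` are made up for illustration; this is not the real intrusive_ptr). The standard type traits used here are only a rough proxy for the ABI's "non-trivial for purposes of calls" rule, but they show the distinction: `Handle` is pointer-sized like Tensor, yet its copy constructor and destructor are non-trivial.

```cpp
#include <cassert>
#include <type_traits>

struct Trivial {
  int x;
};

// Hypothetical control block; stands in for the refcount inside TensorImpl.
struct Counted {
  int refcount = 1;
};

// Hypothetical stand-in for Tensor: one pointer, refcounted copy and destroy.
class Handle {
 public:
  Handle() : p_(new Counted) {}
  Handle(const Handle& other) : p_(other.p_) { ++p_->refcount; }
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (--p_->refcount == 0) delete p_;
  }
  int use_count() const { return p_->refcount; }

 private:
  Counted* p_;
};

// Handle is "just a pointer" in size...
static_assert(sizeof(Handle) == sizeof(void*), "Handle is one pointer wide");
// ...but, unlike Trivial, copying and destroying it runs user code.
static_assert(std::is_trivially_copy_constructible<Trivial>::value, "");
static_assert(std::is_trivially_destructible<Trivial>::value, "");
static_assert(!std::is_trivially_copy_constructible<Handle>::value, "");
static_assert(!std::is_trivially_destructible<Handle>::value, "");

// By value: the caller makes a copy first, so the count is bumped.
int useCountByValue(Handle h) { return h.use_count(); }

// By reference: no copy, no refcount traffic.
int useCountByRef(const Handle& h) { return h.use_count(); }
```

Calling `useCountByValue` on an lvalue observes a bumped count inside the callee, while `useCountByRef` observes the original count. Either way the callee reaches the handle through an address, which is the point of the section above.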

What if I want to retain an owning reference to the Tensor?

It’s probably OK to pass by value in this case, and then move the parameter into its final home, in order to avoid having both a const Tensor& and a Tensor&& overload.
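A minimal sketch of that pattern, again using a hypothetical `Handle` stand-in rather than the real Tensor (`Handle`, `Counted`, and `Holder` are made-up names, and this version of `Handle` adds a move constructor): a single by-value constructor serves both lvalue and rvalue callers, and the parameter is moved into the member.

```cpp
#include <cassert>
#include <utility>

// Hypothetical control block; stands in for the refcount inside TensorImpl.
struct Counted {
  int refcount = 1;
};

// Hypothetical stand-in for Tensor with both copy and move construction.
class Handle {
 public:
  Handle() : p_(new Counted) {}
  Handle(const Handle& other) : p_(other.p_) {
    if (p_) ++p_->refcount;
  }
  Handle(Handle&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (p_ && --p_->refcount == 0) delete p_;
  }
  // Returns 0 for a moved-from (empty) handle.
  int use_count() const { return p_ ? p_->refcount : 0; }

 private:
  Counted* p_;
};

// One constructor instead of a const Handle& / Handle&& overload pair.
class Holder {
 public:
  explicit Holder(Handle h) : held_(std::move(h)) {}
  int use_count() const { return held_.use_count(); }

 private:
  Handle held_;
};
```

With an lvalue argument this costs one refcount bump (the copy into the parameter) plus a move; with an rvalue argument it costs two moves and no refcount traffic. Since the callee needs its own owning reference anyway, the bump in the lvalue case is not wasted.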
