Working With `c10::IValue` Efficiently

c10::IValue (I think the “I” stands for “interpreter”) is a tagged union that allows you to hold a wide variety of PyTorch C++ types, including Tensor, primitive integer and floating-point types, Tuple, String, lists, dictionaries, and more. It functions similarly to PyObject in CPython: code that needs to work generically with PyTorch values (e.g., for handling operator arguments in the dispatcher) tends to hold them as IValues.

IValue has two main storage modes: it holds primitive types by value, and it holds non-primitive types via c10::intrusive_ptr, which is our alternative to std::shared_ptr that does intrusive reference counting. Using it to contain primitive types is fairly efficient; it is 16 bytes in size (so 1 extra pointer of overhead) and accesses require a tag check (though we could add unsafe accessors if needed), but that’s about it.

Reference counting is where the problems start. First, we get all the usual inefficiencies of reference counting on top of the same overheads as for primitive IValues. In addition, the const (non-moving) accessors for reference-counted types (e.g., IValue::toString()) return a new intrusive_ptr, which means that each call to one of these accessors necessarily entails an atomic reference count increment/decrement pair.

Ref Accessors

To alleviate the problems with accessing reference counted types, IValue has a smattering of toFooRef accessors that return direct references instead (e.g., toStringRef, toObjectRef, and toTupleRef). These should be used instead of the non-ref accessors whenever possible. (Thankfully, IValue::toTensor also returns a direct Tensor reference.)


Tuples

IValue Tuples are heap-allocated (TupleElements, std::shared_ptr<Type>) pairs. TupleElements is a custom “small array”: it is a union of either a 3-element IValue array or a std::vector<IValue>; this saves a separate heap allocation for Tuples whose size is 3 or less. (In the future, we could potentially import llvm::TrailingObjects to c10 and use it to improve on this by consolidating the space for the tuple elements right into the Tuple itself – this would bring efficiency closer in line with Python tuples by avoiding wasted space for 0/1/2-element tuples and saving 2 or 3 pointers of space for 4-or-more-element tuples as well.)

The most efficient way to create small tuples is to use the Tuple constructor overloads that take 1, 2, or 3 IValues directly; these do a placement new directly into the inline storage. You will get a similar result if you pass a std::initializer_list with 1, 2, or 3 elements, but either the compiler or the CPU at runtime will have to do more work and you’ll have to type two extra characters as well, so don’t do that!

Unless you are working with code that really, truly wants a std::vector<IValue>, you should not need to use TupleElements::vec(), because TupleElements supports front, back, operator[], begin, end, empty, and size. Unfortunately, we also have to have TupleElements::operator std::vector<IValue>() because Tuple::elements() used to return std::vector<IValue> and there is just too much existing code that wants to do std::vector<IValue> elems = someTuple.elements();. A linter that detects and flags uses of these operator overloads would probably help with efficiency!


Strings

IValue string storage is straightforward: c10::ivalue::ConstantString is a reference-counted wrapper for a std::string that exists so that it can inherit from c10::intrusive_ptr_target. In addition to IValue::toString (slow, avoid!) and IValue::toStringRef, IValue::toStringView returns a c10::string_view for code that needs it.

Lists and Dicts

IValue's native storage for lists and dicts is c10::GenericList (an alias for c10::List<IValue>) and c10::GenericDict (an alias for c10::Dict<IValue, IValue>). The template parameters for c10::List and c10::Dict control type checking and conversions, not the underlying representation, which is always a container of IValues. Some important consequences:

  • Accessing any kind of std::vector or std::unordered_map from an IValue (e.g., with IValue::toIntVector) involves a copy.
  • Putting any kind of std::vector or std::unordered_map into an IValue involves a copy.
  • Because they use IValues, boxed calls to operators that take std::vector or std::unordered_map as arguments or return them involve the same copies as outlined in the previous two points for their arguments.
  • If sizeof(T) < sizeof(IValue), List<T> uses more memory than std::vector<T> because the underlying storage is always std::vector<IValue>.

While c10::List is not directly part of c10::IValue, it has some efficiency pitfalls of its own to be aware of. c10::List is an intrusive_ptr to a (std::vector<IValue>, std::shared_ptr<Type>) pair. Because there is an invariant that this pointer is never null, there is no way to efficiently move construct c10::List (the move constructor was deleted because it was slower than the copy constructor). List operations also use IValue::to internally, and IValue::to will often (but not always! see ivalue_to.h) use IValue's slow intrusive_ptr-creating accessors. Similar concerns apply to c10::Dict.

MaybeOwned: Borrowed IValues!

While it is not used directly anywhere yet, c10::MaybeOwned (see also MaybeOwned<Tensor> — PyTorch master documentation) supports efficiently borrowing IValues. It may be useful if you really need to minimize reference count overhead.

Tagged pointers

I have a prototype-quality implementation of a tagged pointer representation of IValue, but I haven’t committed it because I wasn’t able to measure a significant speedup on any particular workloads and it added extra costs to IValue destruction. If there are workloads where reducing the size of IValue to 8 bytes would be useful, we could try it again.
