c10::IValue (I think the “I” stands for “interpreter”) is a tagged union that allows you to hold a wide variety of PyTorch C++ types, including Tensor, primitive integer and floating-point types, Tuple, String, lists, dictionaries, and more. It functions similarly to PyObject in CPython: code that needs to work generically with PyTorch values (e.g., for handling operator arguments in the dispatcher) tends to hold them as IValues.
IValue has two main storage modes: it holds primitive types by value, and it holds non-primitive types via c10::intrusive_ptr, which is our alternative to std::shared_ptr that does intrusive reference counting. Using it to contain primitive types is fairly efficient; it is 16 bytes in size (so 1 extra pointer of overhead) and accesses require a tag check (though we could add unsafe accessors if needed), but that’s about it.
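The primitive storage mode can be illustrated with a stripped-down, self-contained sketch (this is not the real implementation, which lives in aten/src/ATen/core/ivalue.h and has many more tags): an 8-byte payload union plus a tag word, giving 16 bytes total, with a tag check on every access.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of IValue's layout: an 8-byte payload plus a tag.
// The real class has many more tags and uses c10::intrusive_ptr for
// non-primitive payloads.
struct MiniIValue {
  enum class Tag : uint64_t { None, Int, Double };
  union Payload {
    int64_t as_int;
    double as_double;
  } payload;
  Tag tag;

  MiniIValue() : tag(Tag::None) { payload.as_int = 0; }
  explicit MiniIValue(int64_t i) : tag(Tag::Int) { payload.as_int = i; }
  explicit MiniIValue(double d) : tag(Tag::Double) { payload.as_double = d; }

  bool isInt() const { return tag == Tag::Int; }
  int64_t toInt() const {
    assert(isInt());  // the tag check that every access pays for
    return payload.as_int;
  }
};

static_assert(sizeof(MiniIValue) == 16,
              "one pointer of overhead over a bare int64_t");
```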
Reference counting is where the problems start. First, we get all the inefficiencies of reference counting as well as the same overheads as for primitive IValues. In addition, the const (non-moving) accessors for reference-counted types (e.g., IValue::toString()) return a new intrusive_ptr, which means that each call to one of these accessors necessarily entails an atomic reference count increment/decrement pair.
Ref Accessors
To alleviate the problems with accessing reference-counted types, IValue has a smattering of toFooRef accessors that return direct references instead (e.g., toStringRef, toObjectRef, and toTupleRef). These should be used instead of the non-ref accessors whenever possible. (Thankfully, IValue::toTensor also returns a direct Tensor reference.)
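Why the Ref accessors matter can be shown with a self-contained sketch, using std::shared_ptr as a stand-in for c10::intrusive_ptr: the non-ref accessor hands back a new owning pointer (an atomic increment now, a decrement when it dies), while the ref accessor just returns a reference.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Sketch only: std::shared_ptr stands in for c10::intrusive_ptr.
struct MiniStringIValue {
  std::shared_ptr<std::string> str;

  explicit MiniStringIValue(std::string s)
      : str(std::make_shared<std::string>(std::move(s))) {}

  // Analogous to IValue::toString(): returns a new owning pointer,
  // so every call does a refcount increment (and later a decrement).
  std::shared_ptr<std::string> toString() const { return str; }

  // Analogous to IValue::toStringRef(): no refcount traffic at all.
  const std::string& toStringRef() const { return *str; }
};
```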
Tuples
IValue Tuples are heap-allocated (TupleElements, std::shared_ptr<Type>) pairs. TupleElements is a custom “small array”: it is a union of either a 3-element IValue array or a std::vector<IValue>; this saves a separate heap allocation for Tuples whose size is 3 or less. (In the future, we could potentially import llvm::TrailingObjects to c10 and use it to improve on this by consolidating the space for the tuple elements right into the Tuple itself – this would bring efficiency closer in line with Python tuples by avoiding wasted space for 0/1/2-element tuples and saving 2 or 3 pointers of space for 4-or-more-element tuples as well.)
The most efficient way to create small tuples is to use the Tuple constructor overloads that take 1, 2, or 3 IValues directly; these do a placement new directly into the inline storage. You will get a similar result if you pass a std::initializer_list with 1, 2, or 3 elements, but either the compiler or the CPU at runtime will have to do more work and you’ll have to type two extra characters as well, so don’t do that!
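The small-array trick can be sketched as follows (illustrative only; the real TupleElements stores IValues and lives in ivalue.h): up to three elements live in inline storage, while larger sizes fall back to a heap-allocated vector constructed with placement new into the union.

```cpp
#include <cassert>
#include <initializer_list>
#include <new>
#include <vector>

// Illustrative TupleElements-style small array for ints (the real one
// holds IValues): <= 3 elements are stored inline, avoiding a second
// heap allocation for small tuples.
class SmallElems {
  union {
    int inline_[3];
    std::vector<int> vec_;
  };
  bool inlined_;
  unsigned size_;

 public:
  SmallElems(std::initializer_list<int> xs) {
    size_ = static_cast<unsigned>(xs.size());
    if (xs.size() <= 3) {
      inlined_ = true;
      unsigned i = 0;
      for (int x : xs) inline_[i++] = x;  // writes activate the inline array
    } else {
      inlined_ = false;
      new (&vec_) std::vector<int>(xs);  // placement new of the vector member
    }
  }
  ~SmallElems() {
    if (!inlined_) vec_.~vector();  // only the vector needs destruction
  }
  SmallElems(const SmallElems&) = delete;
  SmallElems& operator=(const SmallElems&) = delete;

  unsigned size() const { return size_; }
  bool usesInlineStorage() const { return inlined_; }
  int operator[](unsigned i) const { return inlined_ ? inline_[i] : vec_[i]; }
};
```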
Unless you are working with code that really, truly wants a std::vector<IValue>, you should not need to use TupleElements::vec(), because TupleElements supports front, back, operator[], begin, end, empty, and size. Unfortunately, we also have to have TupleElements::operator std::vector<IValue>() because Tuple::elements() used to return std::vector<IValue> and there is just too much existing code that wants to write std::vector<IValue> elems = someTuple.elements();. A linter that detects and flags uses of these operator overloads would probably help with efficiency!
Strings
IValue string storage is straightforward: c10::ivalue::ConstantString is a reference-counted wrapper for a std::string that exists so that it can inherit from c10::intrusive_ptr_target. In addition to IValue::toString (slow, avoid!) and IValue::toStringRef, IValue::toStringView returns a c10::string_view for code that needs it.
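A minimal sketch of this arrangement (std::shared_ptr again standing in for c10::intrusive_ptr, and std::string_view for c10::string_view): the wrapper's only job is to make a std::string heap-allocated and reference-countable, with cheap reference and view accessors.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

// Sketch of ConstantString: a heap-allocated, reference-counted wrapper
// around std::string. (The real class inherits from
// c10::intrusive_ptr_target; std::shared_ptr is a stand-in here.)
class MiniConstantString {
  std::string str_;
  explicit MiniConstantString(std::string s) : str_(std::move(s)) {}

 public:
  static std::shared_ptr<MiniConstantString> create(std::string s) {
    return std::shared_ptr<MiniConstantString>(
        new MiniConstantString(std::move(s)));
  }
  const std::string& string() const { return str_; }  // like toStringRef
  std::string_view view() const { return str_; }      // like toStringView
};
```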
Lists and Dicts
IValue's native storage for lists and dicts is c10::GenericList (an alias for c10::List<IValue>) and c10::GenericDict (an alias for c10::Dict<IValue, IValue>). The template parameters for c10::List and c10::Dict control type checking and conversions, not the underlying representation, which is always a container of IValues. Some important consequences:
- Accessing any kind of std::vector or std::unordered_map from an IValue (e.g., with IValue::toIntVector) involves a copy.
- Putting any kind of std::vector or std::unordered_map into an IValue involves a copy.
- Because they use IValues, boxed calls to operators that take std::vector or std::unordered_map as arguments or return them involve the same copies as outlined in the previous two points for their arguments.
- If sizeof(T) < sizeof(IValue), List<T> uses more memory than std::vector<T> because the underlying storage is always std::vector<IValue>.
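The copy and memory costs above can be made concrete with a small sketch (a 16-byte Boxed struct stands in for IValue; toIntVector here is an analogue, not the real API): unboxing into a std::vector<int32_t> is an element-by-element copy, and the boxed backing store is 4x larger than the unboxed one for 4-byte elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for IValue: 8-byte payload + tag word, 16 bytes total.
struct Boxed {
  int64_t payload;
  int64_t tag;
};

// Analogue of IValue::toIntVector: unboxing is an element-by-element copy
// into a freshly allocated vector.
std::vector<int32_t> toIntVector(const std::vector<Boxed>& boxed) {
  std::vector<int32_t> out;
  out.reserve(boxed.size());
  for (const Boxed& b : boxed) {
    out.push_back(static_cast<int32_t>(b.payload));
  }
  return out;
}
```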
While c10::List is not directly part of c10::IValue, there are also some efficiency pitfalls to be aware of with it. c10::List is an intrusive_ptr to a (std::vector<IValue>, std::shared_ptr<Type>) pair. Because there is an invariant that this pointer is never null, there is no way to efficiently move construct a c10::List (the move constructor was deleted because it was slower than the copy constructor). List operations also use IValue::to internally, and IValue::to will often (but not always! see ivalue_to.h) use IValue's slow intrusive_ptr-creating accessors. Similar concerns apply to c10::Dict.
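The never-null invariant and its consequence for moves can be sketched like this (std::shared_ptr standing in for intrusive_ptr): a cheap move would leave the source holding a null pointer, breaking the invariant, so "moving" would have to allocate a replacement pair, and copying (a refcount bump) is all that's left.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch of c10::List: a never-null pointer to shared element storage.
class MiniList {
  std::shared_ptr<std::vector<int>> impl_;  // invariant: never null

 public:
  MiniList() : impl_(std::make_shared<std::vector<int>>()) {}

  // Copying is just a refcount bump; a "real" move would have to leave
  // impl_ null (violating the invariant) or allocate a replacement,
  // which is why c10::List deleted its move constructor.
  MiniList(const MiniList&) = default;
  MiniList(MiniList&&) = delete;

  void push_back(int x) { impl_->push_back(x); }
  int get(std::size_t i) const { return (*impl_)[i]; }
  long useCount() const { return impl_.use_count(); }
};
```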
MaybeOwned: Borrowed IValues!
While it is not used directly anywhere yet, c10::MaybeOwned (see also MaybeOwned<Tensor> — PyTorch master documentation) supports efficiently borrowing IValues. It may be useful if you really need to minimize reference count overhead.
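The borrowed/owned distinction can be sketched with a simplified holder (the real template is c10::MaybeOwned and is more careful about storage and lifetime): borrowing skips all refcount and copy traffic, at the cost of requiring that the borrower not outlive the owner.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified borrowed-or-owned holder; the real one is c10::MaybeOwned.
// Borrowing avoids refcount/copy traffic entirely, but the borrower
// must not outlive the value it borrows from.
template <typename T>
class MiniMaybeOwned {
  const T* borrow_ = nullptr;  // non-null when borrowing
  T own_{};                    // used when owning
  bool owned_;

  MiniMaybeOwned() : owned_(false) {}

 public:
  static MiniMaybeOwned borrowed(const T& t) {
    MiniMaybeOwned m;
    m.borrow_ = &t;
    m.owned_ = false;
    return m;
  }
  static MiniMaybeOwned owned(T t) {
    MiniMaybeOwned m;
    m.own_ = std::move(t);
    m.owned_ = true;
    return m;
  }
  const T& operator*() const { return owned_ ? own_ : *borrow_; }
  bool isOwned() const { return owned_; }
};
```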
Tagged pointers
I have a prototype-quality implementation of a tagged pointer representation of IValue, but I haven’t committed it because I wasn’t able to measure a significant speedup on any particular workloads and it added extra costs to IValue destruction. If there are workloads where reducing the size of IValue to 8 bytes would be useful, we could try it again.
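The general idea can be sketched with low-bit pointer tagging (this is illustrative only, not the prototype's actual scheme): because heap pointers are aligned, the low bit of a valid pointer is always zero and can be repurposed to distinguish an inline integer from a pointer, shrinking the whole value to 8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative low-bit tagging (not the prototype's actual scheme):
// heap allocations are at least 2-byte aligned, so bit 0 of a valid
// pointer is always 0 and can serve as an "inline int" tag.
class TaggedValue {
  uintptr_t bits_;

 public:
  static TaggedValue fromInt(intptr_t i) {
    TaggedValue v;
    v.bits_ = (static_cast<uintptr_t>(i) << 1) | 1;  // tag bit set
    return v;
  }
  static TaggedValue fromPointer(const std::string* p) {
    TaggedValue v;
    v.bits_ = reinterpret_cast<uintptr_t>(p);  // tag bit clear
    return v;
  }
  bool isInt() const { return bits_ & 1; }
  intptr_t toInt() const { return static_cast<intptr_t>(bits_) >> 1; }
  const std::string* toPointer() const {
    return reinterpret_cast<const std::string*>(bits_);
  }
};

static_assert(sizeof(TaggedValue) == sizeof(void*), "fits in one pointer");
```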