PyTorch 1.11 dev release notes

A bit delayed, but here it is: the 1.11 release contains quite a few commits, including some that are interesting for people who develop within PyTorch.
You can find a curated list of these changes below:

Developers

Python API

  • OpInfo improvements (an illustrative OpInfo sketch follows this list):
    • More operators now have OpInfo tests:
      • Added OpInfo for nn.functional.batch_norm (#63218),
      • Added OpInfo for torch.argsort (#65454)
      • Added OpInfo for torch.repeat_interleave (#65455)
      • Added OpInfo for 2d fft functions (#66128)
      • Added OpInfos for avg_pooling (#64214)
      • Added OpInfo for torch.bucketize (#65821)
      • Added OpInfos for isfinite, isinf, isposinf, isneginf, isnan, isreal (#66400)
      • Added OpInfo for torch.nn.functional.pairwise_distance (#65460)
      • Added OpInfo for torch.nn.pixel_shuffle (#65467)
      • Added OpInfo for torch.nn.pixel_unshuffle (#65468)
      • Added OpInfo for torch.bincount (#65796)
      • Added OpInfo for norm ops (#67442, #68526)
      • Added OpInfo for torch.nn.functional.gaussian_nll_loss (#67356)
      • Added OpInfo for nn.functional.hinge_embedding_loss (#67381)
      • Added OpInfo for nn.functional.gaussian_nll_loss (#67376)
      • Added OpInfo for nn.functional.poisson_nll_loss (#67371)
      • Added OpInfo for nn.functional.ctc_loss (#67464)
      • Added OpInfo for nn.functional.cosine_embedding_loss (#67465)
      • Added OpInfo for adaptive_max_pool (#67405)
      • Added OpInfo for logical_or, logical_and, logical_xor (#67178)
      • Added OpInfo for torch.allclose (#68023)
      • Added OpInfo for nn.functional.cross_entropy (#63547)
      • Added OpInfo for torch.nn.bilinear and torch.nn.glu (#67478)
      • Added OpInfo for torch.histc (#67452)
      • Added OpInfos for stft, istft, fftshift, ifftshift (#68198)
      • Added OpInfos for parcel Elementwise Binary II (#68085)
      • Added OpInfo for torch.linalg.tensorsolve (#68810)
      • Added OpInfo for torch.nn.functional.kl_div (#65469)
      • Added OpInfo for torch.diagflat (#65680)
      • Added OpInfos for some Tensor dtype conversion methods (#64282)
      • Added OpInfo for *_like functions (#65941)
      • Added OpInfo for torch.unique and torch.unique_consecutive (#67529)
      • Added OpInfo for new_* functions and some *_like functions (#67357)
      • Added OpInfo for torch.nonzero (#67459)
      • Added OpInfos for torch.atleast_{1d, 2d, 3d} (#67355)
      • Added OpInfo for embedding_bag (#67252)
      • Added OpInfos for combinations, cartesian_prod, sum_to_size, ldexp, and as_strided (#68853)
      • Added OpInfos for misc nn.functional operators (#68922)
      • Added OpInfo tests for (svd|pca)_lowrank (#69107)
      • Added OpInfo for nn.functional.dropout2d, revise sample inputs for dropout (#67891)
      • Added OpInfos for normal, bernoulli, multinomial (#66358)
      • Added OpInfos for flatten, column_stack (#69237)
    • Other improvements to OpInfo testing:
      • Added inplace_variant for resize_ OpInfo (#66135)
      • Added reference vs. noncontiguous OpInfo test (#67434)
      • Split channels_last test cases for tensor conversion OpInfos (#67368)
      • Removed OpInfo non-contiguous inputs (#67677)
      • Improved OpInfo test for norm ops: made inputs independent
      • Used dtypes instead of dtypesIfCPU in OpInfos (#68732)
      • Fixed gradient OpInfo for Python 3.10 (#68113)
      • Converted more OpInfo sample_input_funcs to generators (#69976)
      • Updated poisson_nll_loss OpInfo samples (#70300)
      • Removed unnecessary skips in rsub OpInfo (#69973)
      • Merged index_{add,fill,copy,select} OpInfo sampling (#68184)
      • Labeled more elementwise binary operators correctly as BinaryUfuncInfos (#71622)
      • Deactivated the tracking of gradients in sampling functions within OpInfos (#68522)
      • Removed special FX OpInfo list (#67520)
  • More informative messages for None type comparisons (#69802)
  • Killed the test_torch.py mixin and created test_scatter_gather_ops (#71691)
  • Relaxed tolerance on ROCm test_noncontiguous_samples_matmul (#67593)
  • Added support for automated error and warning testing (#67354)
  • Skip forward-over-reverse gradgrad check for pinv singular on CUDA (#70123)
  • Made the meta tensor data access error message more expressive in assert_close (#68802)
  • Removed skips from determinant tests (#70034)
  • Refactored repetitions into TorchVersion._cmp_wrapper (#71344)
  • Expect test_fn_fwgrad_bwgrad to fail because forward AD is not implemented (#71944)
  • Some Python tensor subclass improvements (a minimal wrapper-subclass sketch follows this list):
    • Added Tensor._make_wrapper_subclass (#65340)
    • getitem: Ensure Tensor subclasses are not treated as tuples (#67202)
    • Fixed _make_wrapper_subclass's storage_offset handling (#68268)
    • Make empty and *_like factory functions respect tensor subclasses (#65677)
    • Make new_empty/new_ones/new_zeros/new_full respect subclass (#65169)
    • Ensure that “None” tensors in python map to “undefined” tensors in C++ (#67793)
  • Rationalized API exports in torch_python (#68095)
  • Removed tensor.data usage from a few places in internals (#65389)
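
For readers less familiar with the OpInfo machinery referenced above, here is a rough sketch of what an OpInfo entry and its sample-inputs function look like. The OpInfo classes live under torch.testing._internal and are not a public API, so the exact import paths and constructor keywords may differ from what is currently in the tree; sample_inputs_pairwise_distance below is an illustrative name, not the in-tree helper.

```python
# Illustrative OpInfo sketch -- not the exact in-tree definition.
import torch
from torch.testing._internal.common_methods_invocations import OpInfo, SampleInput
from torch.testing._internal.common_dtype import floating_types


def sample_inputs_pairwise_distance(op_info, device, dtype, requires_grad, **kwargs):
    # Since #69976, many sample_inputs_funcs yield SampleInputs lazily.
    def make(shape):
        return torch.randn(shape, device=device, dtype=dtype,
                           requires_grad=requires_grad)

    yield SampleInput(make((3, 5)), args=(make((3, 5)),))
    yield SampleInput(make((3, 5)), args=(make((3, 5)),),
                      kwargs=dict(p=1.5, eps=1e-4, keepdim=True))


# A hypothetical entry as it might appear in the op_db list in
# common_methods_invocations.py; the test generators pick it up from there.
pairwise_distance_opinfo = OpInfo(
    'nn.functional.pairwise_distance',
    dtypes=floating_types(),
    sample_inputs_func=sample_inputs_pairwise_distance,
    supports_out=False,
)
```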
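
The tensor subclass items above revolve around the "wrapper subclass" pattern. Below is a minimal sketch of that pattern, assuming the private helpers Tensor._make_wrapper_subclass (#65340) and __torch_dispatch__; both are internal and their signatures may change between releases.

```python
# Minimal "wrapper" tensor subclass sketch using private internals; treat the
# exact signatures as assumptions.
import torch
from torch.utils._pytree import tree_map


class LoggingTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # Create a wrapper with the same metadata as `elem` but no storage of its own.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), dtype=elem.dtype, device=elem.device)

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.elem if isinstance(t, LoggingTensor) else t

        print(f"dispatch: {func}")
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        # Re-wrap plain tensor outputs so the subclass propagates through ops.
        return tree_map(
            lambda t: LoggingTensor(t) if isinstance(t, torch.Tensor) else t, out)


x = LoggingTensor(torch.randn(3))
y = x + 1   # prints the dispatched aten op before returning a LoggingTensor
```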

C++ API

  • Convolution consolidation:
    • Factored backend routing logic out of convolution forward (#67790)
    • General convolution_backward function (#69044, #70112, #71489, #71490, #71491, #69584, #67283, #70661)
    • Removed finput, fgrad_input, columns, and ones from slow{2,3}d and slow{2,3}d_transpose signatures (#68897, #68898, #68899)
    • Removed backward ops for: cuDNN convolution, cuDNN transposed convolution, deprecated cuDNN convolution, miopen convolution, miopen transposed convolution, miopen depthwise convolution, slow dilated 2d convolution, slow 2d transposed convolution, slow 3d convolution, slow dilated 3d convolution, mkldnn convolution, slow 3d transposed convolution, 2d depthwise convolution, 3d depthwise convolution, NNPACK spatial convolution (#69901, #69902, #71128, #69987, #70063, #70064, #70067, #70333, #69978, #70068, #70467, #69933, #70461, #70462, #70305)
  • Removed TH/THC logic (#68127, #68556, #69040, #69041, #65942, #69929, #67940)
  • Added tanh_backward to AT symbols (#70071)
  • Improved documentation of comparison internals (#68977)
  • Added isUndefined to ExclusivelyOwnedTraits debug msg (#70638)
  • Removed buggy ExclusivelyOwnedTraits (#70647)
  • Generated aten_interned_strings.h automatically (#69407)
  • empty_strided: Factored out generic implementation (#70614)
  • empty_meta: Added functions that don’t depend on Tensor (#70615)
  • Consolidated the overloads of TensorImpl::shallow_copy_and_detach (#68953)
  • Improved storage assertion of Tensor’s enforce_invariants (#70380)
  • Fixed docs in aten’s native folder (#71395)
  • Used new_empty in dropout (#72078)
  • Simplified TensorImpl size check and fixed error message (#72070)
  • Added output_mask argument to grid_sampler_2d_backward (#66068)
  • Avoided no-op shared_ptr dtor when constructing tuple (#69337)
  • slow_conv2d grad_weight: call gemm directly (#65726)
  • Made handle_torch_function_no_python_arg_parser public (#66054)
  • slow_conv3d: Avoided dispatch in parallel region (#65737)
  • slow_conv3d grad_input: Avoided dispatch in parallel region (#65757)
  • slow_conv3d: Used at::sum for grad_bias accumulation (#65758)
  • TBB: Use static partitioner to match OpenMP scheduling (#65327)
  • Move intraop_launch_future from Parallel.h (#64166)
  • slow_conv3d grad_weight: call gemm directly (#65759)
  • -Wextra fix for TensorShape.cpp (#66320)
  • Add InplaceOrView boxed kernel (#63878)
  • Used at::native::is_nonzero in a few places to skip an unnecessary dispatch trip (#67195)
  • Added tags for inplace view ops in native_functions.yaml (#65412)
  • Fixed C++ BatchNorm pretty_print() with optional momentum (#67335)
  • Inserted check for PyObject_IsInstance in THPVariableCheck (#67588)
  • Added SiLU backward Aten symbol (#67665)
  • Bumped dlpack.h to latest version (#65047)
  • Removed WindowsTorchApiMacro.h in favor of Export.h (#69585)
  • Added macro to register CPU kernel for all arch types (#70332)
  • Used c10::irange instead of raw for loops around the codebase (#70326)

Autograd

  • Forward AD can now be tested in gradcheck and OpInfos without also testing backward AD (#65040); see the sketch after this list
  • Extended OpInfo and gradgradcheck to test forward-over-reverse Hessian-vector products (#69740)
  • Extended OpInfo and gradcheck to test batched forward grad (#66294)
  • Enabled warning tests for nondeterministic backward functions (#66736)
  • Extended autograd functional benchmarking to run vectorized tasks (#67045)
  • Disallowed requires_grad=True in OpInfo’s make_tensor function for integral inputs (#67149)
  • Made autograd codegen for differentiable outputs safer to use (#65823)
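
Related to the first item in this list: a sketch of checking forward-mode AD with gradcheck while skipping backward-mode checks. The keyword names check_forward_ad / check_backward_ad are taken as described around #65040 and are an assumption here; if they have shifted, consult the current torch.autograd.gradcheck signature.

```python
# Hedged sketch: exercise forward-mode AD only in gradcheck.
import torch
from torch.autograd import gradcheck


def f(x):
    return (x.exp() * x).sum()


x = torch.randn(4, dtype=torch.double, requires_grad=True)

# check_backward_ad=False skips reverse-mode checks; the undefined-grad and
# batched-grad checks are backward-mode-only, so they are disabled here too.
assert gradcheck(
    f, (x,),
    check_forward_ad=True,
    check_backward_ad=False,
    check_undefined_grad=False,
    check_batched_grad=False,
)
```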

Build

  • Improved disable name match (#71499)
  • Made permission errors more human readable when using setup.py (#66492)

torch.nn

  • Added testing across memory_format types to ModuleInfos (#69317)
  • Added private _masked_softmax function (#69268, #69272, #69924)
  • Added native_dropout (#63937)
  • F.interpolate: Removed JIT FC tweaks for the antialias flag and nearest-exact mode (#71937); see the usage sketch after this list
  • F.pad: Replaced empty() with new_empty() (#68565)
  • F.softmax: Changed dtype to support TorchScript and MyPy (#68336)
  • nn.BatchNorm*d: Incremented num_batches_tracked in place for improved graph safety (#70444)
  • nn.Embedding: Passed arguments of embedding as named arguments (#67574)
  • nn.FractionalMaxPool2d: Fixed to index correct _random_samples dimension when provided (#70031)
  • nn.{GRU, LSTM, RNN}: Fixed links to docs in comments (#68828)
  • nn.Module: Added private _stateless API (#61447, #68969)
  • nn.modules.utils.{_single,_pair,_triple,_quadruple}: Populated __name__ (#70459)
  • nn.Parameter: Used torch.empty() instead of torch.tensor() (#66486)
  • optim: Updated CODEOWNERS (#65773)
  • optim.Optimizer: Integrated multi_tensor zero_grad into base class (#69936)
  • Refactored cuDNN convolution memory format and conv-bias-relu code (#65594)
  • Testing
    • Set cuDNN deterministic flag for test_conv_double_backward_cuda (#69941)
    • Increased tolerance for test_adadelta (#69919)
    • Set test owner for nn tests (#66850)
    • Changed test_conv_large parameter initialization (#71521)
    • Obliviated ALL_TENSORTYPES and ALL_TENSORTYPES2 (#71153)
    • Removed repeat test for types in test_nn.py (#70872)
    • Tweaked rel_tol for test_adadelta (#71880)
    • Added no-input-grad-needed cases to test_grid_sample (#66071)
    • Added OpInfo entries for nn.functional.{conv1d, linear} (#67747, #65498)
    • Added host-side memory requirement for test_softmax_64bit_indexing (#67922)
    • Made @dtypes mandatory when using @dtypesIf (#68186)
    • Added testing for complex non-vanilla SGD (#66261)
    • Skipped failing tests in test_nn.py if compiled without LAPACK (#70913)
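
Since a couple of the F.interpolate items above reference the antialias flag and the nearest-exact mode, here is a quick usage sketch. Antialiasing is only supported for some modes (bilinear/bicubic downsampling), so verify the exact coverage against the docs.

```python
# F.interpolate: antialiased bilinear downsampling and the "nearest-exact" mode.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)

down_aa = F.interpolate(x, size=(32, 32), mode="bilinear",
                        align_corners=False, antialias=True)
down_ne = F.interpolate(x, size=(32, 32), mode="nearest-exact")
```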

torch.fx

  • Supported type annotations in operator_support.py (#65136)
  • Added algo recorder/replayer to lower.py (#68194)
  • Traced asserts with fx by looking at bytecode (#70960)
  • Fixed type checking errors in node.py (#68124)

AMD

  • Updated ROCm build to avoid relying on CUDA_VERSION or HIP_VERSION macros (#65610)

CUDA

  • Moved ATen/CUDAGeneratorImpl.h to ATen/cuda (#71224)
  • empty_cuda: Added functions that don’t depend on Tensor (#70616)
  • Minor ScanKernels.cu cleanup (#65350)
  • Removed THC ScalarConvert (#65471)
  • Removed THCTensor.cu and THCTensorCopy.cu (#65491)
  • Removed THCDeviceTensor (#65744)
  • Migrated THCIntegerDivider.cuh to ATen (#65745)
  • Added workaround for nvcc header dependencies bug (#62550)
  • Removed accscalar from i0 and i0e (#67048)
  • Moved some cub templates out of the header file (#67650)
  • Exposed more CUDA/CuDNN info to at::Context and BC stage 1 (#68146)
  • Added ModuleInfo-based CPU / GPU parity tests (#68097)
  • Added ModuleInfo-based device transfer tests (#68092)
  • Updated CUDA memory leak check to verify against driver API and print more diagnostic information (#69556)
  • Fixed build on latest main branch of thrust (#69985)
  • Split cuda: list cpp files that go in _cu library explicitly (#69082)

Dispatcher

  • Made detach re-dispatch like a regular PyTorch operator (#71707)
  • index_backward: used out-of-place index_put if any input is subclass (#71779)
  • Some bug fixes to the external codegen pipeline to make it easier for external backends to use it (#69950, #69951, #69949)
  • Made several ops that are implemented as composites in C++ “compliant”: previously they did not play well with custom tensor subclasses, but now they should. Testing logic was added in #65819:
    • binary_cross_entropy backward (#70198)
    • quantile and nanquantile (#70894)
    • linalg.{matrix_power, inv, cholesky} (#69437)
    • index_copy, index_fill, masked_scatter, masked_fill (#71751)
    • index_put (#71765)
    • gather_backward (#71766)

Mobile

  • Split the upgrader test to a separate file and cover mobile part (#70090)
  • Removed unused variable in applyUpgrader (#70261)
  • Better error message when training attribute is not found (#68103)
  • Disabled miopen test for convolution on mobile (#66564)
  • Bumped up iOS CocoaPods version to 1.10.0 (#67058)
  • Lite interpreter naming for android nightly publishing (#68651)
  • Set test owner for mobile tests (#66829)
  • Added ownership to more edge tests (#67859)
  • Skipped compiledWithCuDNN() call for mobile to avoid segfault (#71775)
  • System-specific adjustments to make unit tests work (#65245)
  • Updated mobile observer API for inference metadata logging (#65451)
  • Made the error message of missing ops to be more specific (#71294)
  • Exposed is_metal_available in header (#68942)
  • Removed unused function in import (#65865)
  • TensorExprKernel: support custom-class constants (#68856)
  • Moved all serialize/deserialize files to a separate target (#66805)
  • Added Backport test (#67824)
  • Exposed methods and compilation unit (#66854)
  • Populated operator_input_sizes (#68542)
  • Updated generated header to use flatbuffer v1.12 (#71279)
  • Refactored flatbuffer loader to allow overriding how IValues are parsed (#71661)
  • Removed StringView from RecordFunction interface (1/2) (#68410)
  • Moved upgraders from python to cpp (#70593)
  • Moved bytecode generation to python (#71681)
  • Made upgrader test model generation more robust (#72030)
  • Created convenience wrapper for dynamic type constructors (#71457)
  • Enabled upgraders in TS server (#70539)
  • Added a helper to produce html with a single call in model_dump (#66005)
  • Skipped writing version during backport (#65842)
  • Moved TypeParser class definition to header file (#65976)
  • Updated bytecode version compatibility check (#67417)
  • Added complete type name in error message when fail to export model (#67750)
  • Added old models and unittest (#67726)
  • Updated upgrader codegen with latest change (#70293)
  • Used hypothesis for better test input data and broader coverage (#70263)
  • Removed version compare as they are decoupled now (#71461)
  • Automated model generating process (#70629)
  • Moved generated keyword out of gen_mobile_upgraders.py (#71938)
  • Used upgrader_mobile.cpp as the reference for codegen unittest (#71930)
  • Added type check in the compatibility API (#63129); see the mobile sketch after this list
  • Promoted missing ops for delegated models (#66052)
  • Used at::native::is_nonzero in promoted ops to improve portability (#67097)
  • Set actual output type, remove ambiguity from compile_spec names (#67209)
  • Set kernel func name from compiler (#67229)
  • Used irange for loops (#66747)
  • Added control stack frame to lite interpreter (#65963)
  • Implemented torch::jit::Function for mobile functions (#65970)
  • Loaded interface methods to corresponding ClassTypes (#65971)
  • Removed usage of shared_ptr (#68037)
  • Created DynamicType for OptionalType in mobile (#68137)
  • Polymorphic IValue::type() for DynamicType (#70120)
  • Do not reuse mobile type parser for all unpicklers (#71048)
  • Migrated {TupleType, ListType} to DynamicType (#70205, #70212)
  • Added a check to ensure profiler_edge is only added when use_kineto is on (#67494)
  • Removed double pragma once directive in the generated code (#65620)
  • Added mobile upgrader (#67728, #67729, #67730, #67731)
  • Introduced multiple improvements for operator versioning:
    • Fixed some bugs in the operator upgrader (#71578, #70161, #70225)
    • Used a more robust way of extracting min and max versions
    • Ensured initialization thread safety
  • Supported indirect method CALL in lite interpreter (bytecode)
    • Enabled CALL instruction in lite interpreter (#65964)
    • Enabled lite interpreter to correctly handle INTERFACE_CALL instruction (#65972)
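
For context on the lite interpreter, bytecode version, and compatibility items above, here is a sketch of the basic mobile flow using the semi-private helpers in torch.jit.mobile. The underscore-prefixed helpers are internal and may move or change between releases.

```python
# Export a model for the lite interpreter and inspect it with the
# (semi-private) mobile helpers.
import torch
from torch.jit.mobile import _load_for_lite_interpreter, _get_model_bytecode_version


class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()


scripted = torch.jit.script(M())
scripted._save_for_lite_interpreter("m.ptl")   # mobile/lite format

loaded = _load_for_lite_interpreter("m.ptl")
print(loaded(torch.randn(2)))

# Bytecode-version inspection, as used by the compatibility checks and upgraders.
print(_get_model_bytecode_version("m.ptl"))
```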

Distributed

  • torch.distributed

    • Cleaned up DDP SPMD in reducer.cpp (#64113)
    • Changed type and name of local_used_maps to reflect that it is only one map (#65380)
    • Updated ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
    • Disabled NCCL health check (#67668)
    • Fixed object-based collectives for debug mode (#68223)
    • Revised the socket implementation of c10d (#68226)
    • Enabled desync root cause analysis for NCCL (#68310)
    • Added a missing precondition to the DistributedSampler docstring (#70104); see the usage sketch after this list
  • DistributedDataParallel

    • Track models with sync bn (#66680)
    • Log API Usage for tracking (#66038)
  • torch.distributed.rpc

    • Fixed type checking errors in options.py (#68056)
    • Added API usage to torch.RPC (#67515)
    • Added API usage logging for several other RPC APIs. (#67722)
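
Regarding the DistributedSampler docstring item above: the precondition worth calling out is that set_epoch() must be called at the start of every epoch so shuffling differs across epochs while staying consistent across replicas. Whether that is the exact precondition added in #70104 is an assumption; the sketch below simply shows the documented usage. num_replicas and rank are passed explicitly only so the snippet runs without initializing a process group.

```python
# DistributedSampler usage sketch with the per-epoch set_epoch() call.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(100, dtype=torch.float32))
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, sampler=sampler, batch_size=8)

for epoch in range(3):
    sampler.set_epoch(epoch)   # required for proper shuffling across epochs
    for (batch,) in loader:
        pass
```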

TorchScript

  • Fixed cases where errors were not thrown for XNNPACK ops, the JIT graph executor, and CUDA tensor lowering, due to uses of TORCH_CHECK / TORCH_INTERNAL_ASSERT without a condition (#71879, #71767, #71778)
  • Fixed an Android SDK compilation warning when -D_FORTIFY_SOURCE=2 was used (#65222)
  • Suppressed additional warnings when compiling Caffe2 headers (#71370)

Quantization

  • More informative error messages from fbgemm embedding spmdm call (#65186)
  • Changed observer FQNs generated in prepare step (#65420)
  • Made FixedQParam ops work for dtypes other than quint8 (#65484)
  • Added op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
  • Replaced conv_p with convolution_op in qnnpack (#65783)
  • Fixed the hypothesis test for topk (#66057)
  • Removed hypothesis from qtopk (#66158)
  • Shape propagation for quantization (#66343)
  • Updated observer_fqn to not depend on node.name (#66767)
  • Updated qnnpack to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
  • Added pass to duplicate dequant nodes with multi use (#67118)
  • Removed asymmetrical padding parameters in qnnpack (#67102)
  • Added out-variant for quantized::linear_dynamic_fp16 (#67663)
  • Replaced copy_ with data_ptr() since input Tensor’s dtype is guaranteed to be float (#67788)
  • Refactored quantized op tests to combine test classes (#68282)
  • In the q_avgpool operator, loop over the batch dimension inside the operator (#66819)
  • Added additional string to search cpu flags for vnni detection (#67686)
  • Refactored handling of FixedQParams operators (#68143)
  • Made the FakeQuant zero_point dtype match the observer for embedding QAT (#68390)
  • Removed warning for quantized Tensor in __dir__ (#69265)
  • Moved pattern type definition to ao/quantization/utils.py (#68769)
  • Refactored fusion to use the new Pattern format (#68770)
  • Changed the output type of convert to torch.nn.Module (#69959); see the FX graph mode quantization sketch after this list
  • In FX graph mode quantization, allow duplicate named_modules during fbgemm lowering (#70927)
  • Added explanation of quantized comparison strategy in assert_close (#68911)
  • Added quantized input tensor data type checks (#71218)
  • Added a guard against shapes for qnnpack qadd (#71219)
  • Templatized activationLimits function (#71220)
  • Removed unused allow list arguments from propagate_qconfig and helper (#71104)
  • Supported non-partial functions in qconfig comparison (#68067)
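
Several of the items above touch FX graph mode quantization, so here is a minimal prepare/calibrate/convert sketch for orientation. The call pattern shown (a qconfig_dict passed directly to prepare_fx) matches this release; later releases changed the expected arguments, so treat this as a sketch rather than a stable recipe.

```python
# Minimal FX graph mode post-training quantization flow.
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)


m = M().eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(m, qconfig_dict)   # insert observers
prepared(torch.randn(8, 4))              # calibrate on representative data
quantized = convert_fx(prepared)         # a torch.nn.Module (#69959)
print(quantized)
```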

ONNX

  • Suppressed ONNX Runtime warnings in tests (#67804)
  • Fixed CUDA test case (#64378)
  • Added links to the developer documentation in the wiki (#71609)

torch.package

  • Added a simple backwards compatibility check for torch.package (#66739)