PyTorch 1.9 dev release notes

anjali411 · June 17, 2021, 3:49pm

The PyTorch 1.9 release contains quite a few commits that are not user-facing but are of interest to people compiling from source or developing low-level extensions for PyTorch. Here is a non-exhaustive list of the most important ones.

Python API

Added cpu_kernel_multiple_outputs to help developers implement new torch functions that return two or more tensors conveniently (#51097)
Support auto generation of device check (#56872)
Fix bug in self.assertExpectedInline (#55149)
Protect destructors of Python bindings that can be kept alive by C++ objects (#57488)
Testing related commits:
- See the Developer Wiki article “Writing tests in PyTorch 1.9” for details on significant testing improvements
  - A prototype torch.testing module (see its documentation here) has been added to facilitate testing libraries built using PyTorch
    - It currently has one function, torch.testing.assert_close, which can be useful when comparing PyTorch tensors (add your feedback to its RFC here!)
    - Request more features by filing issues on PyTorch’s GitHub
- OpInfo coverage and testing continues to expand!
  - test_ops.py now verifies the out= argument works correctly for operators with OpInfos (#53259)
  - OpInfos now support sample inputs with tensorlist arguments (#54922)
  - OpInfos can now be wrapped in a lambda for grad and gradgrad checks (#54914)
  - OpInfos can now handle sample inputs where the input is broadcast (#55771)
- [ROCm] Setting TEST_WITH_ROCM now skips tests that don’t use GPUs (#55069)
- A new test case method, assertWarnsOnceRegex, can be used to test warnings that are usually thrown only once per process (#52387)
- make_tensor(), the test suite’s go-to mechanism for constructing a random tensor, now supports a discontiguous kwarg (#51985)
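
As a rough illustration of the torch.testing.assert_close function mentioned above, here is a minimal sketch. It assumes a PyTorch build where the function is available under that name and relies on its default, dtype-dependent tolerances; the tensors and variable names are made up for the example.

```python
import torch

# Hypothetical reference values and a result that differs only by rounding noise.
expected = torch.tensor([1.0, 2.0, 3.0])
actual = expected + 1e-7

# Passes silently: the difference is within the default float32 tolerances.
torch.testing.assert_close(actual, expected)

# A larger difference is reported as an AssertionError.
try:
    torch.testing.assert_close(expected + 0.1, expected)
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False

print("mismatch detected:", mismatch_detected)
```

Explicit rtol and atol keyword arguments can be passed together when the defaults are not appropriate for a given test.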

Distributed

torch.distributed.rpc: Adds a parameter server benchmark for RPC to torch/benchmarks/distributed. (#57454)
torch.distributed.nn.RemoteModule: Improve typing for RemoteModule (#58012)
torch.distributed.rpc: Assert that GIL is not held in blocking destructors in RPC (#57030)
Add logging when store_based_barrier succeeds (#57711)
torch.distributed.rpc: Allow specifying a set of devices for CUDAFuture (#56515)
torch.distributed.nn.RemoteModule: Replace Python Pickler with internal RPC pickler for RemoteModule (#58019)
torch.distributed.rpc: Make CUDAFuture handle any kind of device type (#57051)
torch.distributed: Remove deprecated use of torch.LongTensor, torch.ByteTensor in distributed APIs (#55861)
torch.distributed: Join work clean up thread before aborting communicators (#55444)
DistributedDataParallel: Log use of uneven inputs API (#54919)
DistributedDataParallel: Deduplicate shared params before constructing Reducer in DDP (#53279)
torch.distributed: Log nccl_async_error_handling (#52965)
torch.distributed.rpc: Reduce logging verbosity in tensorpipe agent (#51784, #51785)
torch.distributed: Log nccl debug level in ProcessGroupNCCL (#52803)
torch.distributed.rpc: make pickler/unpickler pluggable in RPC (#53050)
torch.distributed: make the pickler in distributed_c10d pluggable (#53060)
DistributedDataParallel: log newly added construction and runtime stats at randomly selected iterations (#51394)
torch.distributed.rpc: Fix flaky TestTrainingLoop - TestE2ETensorPipe (#51939)
DistributedDataParallel: Ensure local_used_maps_tmp is distinct from local_used_maps_[i] (#54474)
DistributedDataParallel: Declare NamedTuple at top level to fix typing (#53273)
torch.distributed: Combine backtrace print in test logging into one string to avoid interleaving (#56961)

torch.nn

Reenable test_nn tests for Windows (#52051)
Replace type().backend() with device() (#52558)
Remove annoying warnings from common_nn.py (#55982)
Fix __torch_function__ tests. (#54492)
Fixes new tf32 failures in test_nn.py (#52871)
Enable test cases in test_nn.py for ROCm (#52836)
Fix compiler warnings from conv.h (#56181)
Update upsample tests in test_nn.py to test for memory_format (#53665)
Lowering NLLLoss/CrossEntropyLoss to ATen code (#53789)
Refactor multi_head_attention_forward (#56674)
Convert type annotations in nn/functional.py to py3 syntax (#53656)
Migrates some of test_nn.py from assertEqualIgnoreTypes to assertEqual (#57642)
Removes unused RReLU code (#57672)
Disable TestComplexity.test_nn_module_test in fbcode (#56677)
Make convolution_overrideable default implementation raise NotImplementedError (#54707)
Remove ddp_gpu_size field from SyncBatchNorm (#55946)
Remove _specify_ddp_gpu_num method from SyncBatchNorm (#56425)
Check exception messages in embedding_bag_proxy unit test (5a1191d050)
Remove legacy constructor calls from _torch_ folder. (#53889)

C++ Frontend

Add NoOpDeviceGuardImpl (#53142)
Lower ReLU6 to ATen (#52723)
Prevent VS from emitting ambiguous symbol errors (#53490)
Devirtualize TensorImpl is_contiguous (#55333)
Update expand_size API to match expand_inplace (#55246)
Put llvmMathExtras in c10 namespace (#55886)
Move flatten_dense_tensors and unflatten_dense_tensors to Native (#58006)

Autograd

Forward AD: Added systematic testing via gradcheck and OpInfos (#57633, #57701)
torch.autograd.gradcheck: fast_mode is now enabled by default for tests (#55699, #55237)
Update autograd kernels and tracing codegen to use redispatch API (#51363, #52009)
Move view handling logic to gen_inplace_or_view_type.py (#53341)
Add getters for attributes on autograd Node (#55225, #53205, #56499, #52451)
Eliminate global usage of torch.set_default_dtype in test_autograd (#56446)
Use _WeakTensorRef over weakref in test_autograd.py (#55726)
Move view and inplace handling to a separate key (#53342)
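
For readers who want to try the fast gradcheck mode outside the test suite, the following is a minimal sketch. It assumes the fast_mode keyword of torch.autograd.gradcheck is passed explicitly (the note above only says it is the default for tests); the function scaled_sin is a made-up example.

```python
import torch
from torch.autograd import gradcheck


def scaled_sin(x):
    # Hypothetical function under test; any differentiable function works here.
    return torch.sin(x) * 2.0


# gradcheck compares analytical and numerical gradients, so double precision
# inputs are used to keep the finite-difference error small.
x = torch.randn(4, dtype=torch.double, requires_grad=True)

fast_ok = gradcheck(scaled_sin, (x,), fast_mode=True)
slow_ok = gradcheck(scaled_sin, (x,), fast_mode=False)

print("fast mode passed:", fast_ok)
print("slow mode passed:", slow_ok)
```

Both calls should agree for a correctly implemented function; the fast mode is intended to do less work per check than the slow one.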

Complex Numbers

Added complex support for torch.testing.assert_(equal|close) (#57162).
Fixed NVCC related build warnings for complex operations in PyTorch (#55142).
Add eager and jit variant consistency tests for torch.cfloat tensor type (#54854).
Fixed complex mean and reduction tests that weren’t being properly run (#55640).
[ROCm] Added missing template declarations for complex BLAS (#52472).

CUDA

Kernel launch checks for aten/src/ATen (#52185)
Add more kernel launch checks (#53286)
Final kernel launch checks (#54214)
Fix nvcc warnings (#55367)
irange for Indexing.cu (#57479)
reduce number of randperm template instantiations (#58362)
Enforce kernel launch checks (#58116)
fix comments in ATenNVRTC.h (#57318)

AMD

Generalize HIP-specific launch bounds to apply to CUDA (#56143)

Composability

Dispatcher passes computed dispatch keys to kernels (#49354)
Add TORCH_CHECK_NOT_IMPLEMENTED/c10::NotImplementedError; make dispatch use it (#53377)
Refactor tensor_new.cpp to use TensorOptions instead of DispatchKey (#54034)
Add Tensor::is_cpu, genericize TensorIterator (#54079)
Migrate about 100 kernels to C10 full dispatcher (#54109)
Rename XPLAT_MOBILE_BUILD to TEMPLATE_SELECTIVE_BUILD (#54217)
Migrate kernels with Tensor? to C10 full dispatcher (#54263)
Delete all unnecessary singular Math entries (#54436)
Rename Math to CompositeImplicitAutograd (#54466)
Rename DefaultBackend to CompositeExplicitAutograd (#54470)
Migrate kernels with TensorOptions to C10 full dispatcher (#54539)
Expose ops present in dispatcher via Dispatcher::getAllOpNames() (#54791)
Make redispatch functions callable from out of tree extensions (#54966)
Remove use_c10_dispatcher option (#54969)
Provide a method ObservedOperators::getUnobservedOperatorList() so that model tracer can empty it out during tracing (#55017)
Support needsOutputs for RecordFunction and ObserverUtil improvements (#55012)
Strict typecheck all files in tools/codegen (#55227)
Add MaybeOwned::operator*() && (#55244)
Allow copy operations on MaybeOwned (#55419)
Remove non-const TensorIterator::tensor() method (#55420)
Make as_strided_ use const ref for mutable tensors (#55875)
Generate xla codegen in-tree (#56601)
HABANA Device registration key and Autograd key addition (#57094)
Refactor autocast to be extensible for devices (#57104)
Add pybind type caster for c10::Device (#57292)
Make c10::TempFile non-copyable but movable (#57308)
Fix string_view::equals_ compilation by CUDA-11.3 (#57322)
Delete move constructor on TensorImpl (#58048)
structured kernels - error check when structured_delegate is not marked structured (#52227)
fix RegistrationDeclarations.yaml, now that we codegen composite kernels for structured functional/inplace ops (#56307)

TorchScript

Remove output_args from ReduceOp (#52187)
Remove ReduceOp::accumulator (#52196)
Fix memory dependencies computation to not look at reduction output args (#52170)
Update rfactor to not use ReduceOp->output_args() (#52177)
Add an initialization expression to Reduce() (#53751)
Add IRVerifier (#52901)
Add index verifier for Store (#53137)
Add new APIs to get loops corresponding to a Buf (#53778)
Remove Dropout during frozen optimization (#51589)
Add pure list-producing ops to alias analysis (#51999)
Remove DepTracker from LoopNest (#52405)
Use graph executor to run forward on a gradient (#52136)
Support casted_batch_one_hot_lengths with 4-arg to (#53215)
Enable ClipRangesGatherRangesX2SigridHash fusion for SigridHashPrecompute (#53324)
Convert to to to_copy (#53524)
Fuse SigridTransforms + ListUnpack (#53920)
Use reshape when possible in broadcasting (#53326)
Lazily initialize AliasDb constant prop (#54640)
Fix freezing with MKLDNN tensors (#54632)
Add EliminateExceptions pass (#54730)
Lazily initialize AliasDb and add changed status to CSE (#54776)
Make transformations return whether graph is modified (#54777)
Change resize_as_ to resize_ (#55098)
Update to short forms of splitWithTail / splitWithMask (#55542)
Patch requires_grad on DifferentiableGraph (#55701)
Replace AutoNonVariableTypeMode with InferenceMode in static runtime (#55731)
Move tensor implicit conversions to test_builtins.py (#55532)
Redesign Rfactor loopnest transformation. (#55324)
Remove mask field from Load and Store classes (#55825)
Switch type of tensors_ from Tensor to Buf (#56318)
Merge ivalue::Future’s markCompleted and markCompletedWithDataPtrs (#56512)
Don’t lift tensor constants from fusion groups (#56756)
Use c10::ScalarType instead of tensorexpr::ScalarType (#56825)
Use JIT Plug-in for coverage to cover JIT’d functions and methods (#56310)
Add all pools, Batchnorm and Tanh (i.e. all ideeped MKLDNN ops) to MKLDNNFuser (#56541)
Inline hooks in ivalue::Future (#57354)
Add a pass for annotating a graph with input types derived from sample inputs (#57076)
Add a pass for removing a first (self) argument from a graph if it is unused (#57169)
Remove dtype_ and add buf_ fields to CodeGen::BufferArg. (#57382)
Add tests for custom state_dict save/load methods in TorchScript (#57886)
Add schema check to aten::repeat and fb::fast_gather (#58106)
Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
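
Several of the items above touch module freezing (for example, removing Dropout during frozen optimization). The sketch below shows one way to freeze a scripted module and check that it still produces the same output. The Net module is hypothetical, and whether a dropout node remains in the frozen graph depends on the PyTorch version, so the example only prints the graph for inspection rather than asserting on its contents.

```python
import torch


class Net(torch.nn.Module):
    # Hypothetical module used only for this illustration.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)
        self.dropout = torch.nn.Dropout(p=0.5)

    def forward(self, x):
        return self.dropout(self.linear(x))


# Freezing requires a module in eval mode, where dropout is a no-op.
scripted = torch.jit.script(Net().eval())
frozen = torch.jit.freeze(scripted)

example = torch.randn(3, 4)
outputs_match = torch.allclose(frozen(example), scripted(example))

print("outputs match:", outputs_match)
print(frozen.graph)  # inspect which ops remain after freezing
```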

Mobile

Check in Gradle wrapper for easier pytorch_android builds. (#51067)

torch.fx

Hoist custom class .so loading into setUp (#52883)
Test forward reference annotations (#53713)
Add TestConstFold coverage to test_fx (#54072)
Fix logic in TestFX.test_get_torch_func_signature_exhaustive (#54510)
Test tracing into all the standard torch.nn.functional (#55550)
Add more model symbolic tracing tests from torchvision (#55744)
Make stack trace testing less strict (#58088)
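
To give a sense of what tracing into torch.nn.functional means in the tests above, here is a minimal sketch using torch.fx.symbolic_trace. The TinyModel module is a made-up example, not one of the models used in the test suite.

```python
import torch
import torch.fx
import torch.nn.functional as F


class TinyModel(torch.nn.Module):
    # Hypothetical module that calls a torch.nn.functional op directly.
    def forward(self, x):
        return F.relu(x) + 1.0


model = TinyModel()
traced = torch.fx.symbolic_trace(model)

# Collect the targets of all call_function nodes recorded in the graph.
targets = [node.target for node in traced.graph.nodes if node.op == "call_function"]

x = torch.randn(3)
same_output = torch.equal(traced(x), model(x))

print(traced.graph)
print("same output:", same_output)
```

The functional call shows up as a call_function node in the traced graph, and the traced module can be called like the original one.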

Quantization

Make bundled inputs work with quantized zero inputs (#47407)
Call native resize_/resize_as_ as much as possible (#53425)
Use expect_contiguous in quantized::linear fbgemm version (#58221)
Add pass in convert to fold quant-dequant sequence (#54860)
Add support for one value being quantized with different qconfigs (#53586)
Store dtype, axis as literals in the graph (#54624)
add _remove_qconfig flag to convert_fx (#53166)
Get first linear use of quantize_per_tensor for FQN (#54859)
Factoring out the list of no_observers (#50459)
Enable test for non quantized input for add/mul (#52412)
Guard the supported quantization type for add/mul (#52413)
Enable test for non quantized input for cat (#52414)
Merge add and mul handler (#52651)
Refactoring binary op tests to split int8 and float16 tests (#52807, #53020)
Remove redundant code (#54073)
Change activation_post_process_map to track the observer name instead (#54643)
Separate handling Copy operator to a helper function (#54644, #55429)
Factor out insert_observers_for_model to a separate function (#54733, #55307)
Add shape to nontensor op list (#55529)
fx quant:
- clean up nit in insert_observer (#57367)
- readability improvements on observer functions (#57368)
- move output obs logic to QuantizeHandler (#57377)
- move input_output_observed to qhandler (#57388)
- remove FixedQParamsOpQuantizeHandler from quantize.py (#57393)
- remove unnecessary quants arguments (#57399)
- remove find_quants from convert (#57402)
- refactor observer insertion (4f50fdc2a3)
- remove matching hack for binary qhandler (#57470)
- clean up names of quantize handlers (#53614)
Benchmark for torch.ops.quantized.linear_prepack_fp16 operator (#52229)
Add Per Tensor Quantization Support to FXIRImporter (#55405)
Hide warnings for deprecated quantization APIs (#56291)
Remove “Sparsity” from the function names (#56555)

ONNX

cmake: fix ONNX_NAMESPACE if USE_SYSTEM_ONNX (#54973)
Link onnx_library when BUILD_TEST=0 for Windows (#51937)
Fix onnx/constant_fold.cpp compilation on Windows (#55770) (#56167)

Misc

Updated PyBind to official v2.6.2 tag (#52304)
Added gdb special command to print tensors (#54339)
Numpy dependency is now only checked when using Numpy features (#52794)
don’t set the same C++ and C standards twice (#51832)
Fix cmake_minimum_require in libshm (#58306)
