The PyTorch 1.8 release contains quite a few commits that are not user facing but are interesting to people compiling from source or developing low-level extensions for PyTorch. Here is a non-exhaustive list of the most important ones.
Python API
- Remove `PyCFunction` casts as much as possible (#46227)
- Clean up the use of Flake8 in GitHub CI (#46740)
- Refactor `test_torch.py` to be fewer than 10k lines (#47356)
- Refactor OpInfo testing to support custom SampleInputs, and add `addmm` to the `op_db` to test it (#48627)
- Review memory overlap checks for advanced indexing operations (#48651)
- Remove unused `six` code for Python 2/3 compatibility (#48077)
- Create a test framework for sparse operators (#48488)
- Add an internal gradcheck wrapper in `torch.testing._internal` that changes the default values of some flags (#51133) (a usage sketch follows this list)
- Fix `assertEqual`'s handling of NumPy array inputs (#48217)
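The gradcheck wrapper above pins defaults that the public `torch.autograd.gradcheck` API takes as keyword arguments. A minimal sketch of exercising those flags directly through the public API; the function under test and the specific flag values are illustrative and not taken from #51133:

```python
import torch
from torch.autograd import gradcheck

# Illustrative function under test; double precision keeps the
# finite-difference comparison inside gradcheck numerically reliable.
def fn(x):
    return (x * x).sum()

x = torch.randn(4, dtype=torch.double, requires_grad=True)

# eps, atol, and raise_exception are the kind of flags a test-suite
# wrapper might set to project-wide defaults.
assert gradcheck(fn, (x,), eps=1e-6, atol=1e-5, raise_exception=True)
```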
Build
- Switch PyTorch Selective Build (Custom Build) to use the SelectiveBuilder abstraction (#45722)
- Set USE_DISTRIBUTED OFF when libuv is not installed (#45554)
- Compress NVCC flags for Windows (#45842)
- Conditional requirement for py3.6 only (#46932)
- Improve libuv detection on Windows (#48571)
- Fix `Python.h` discovery logic on some macOS platforms (#51586)
- Bring `fast_nvcc.py` to PyTorch OSS to speed up CUDA compilation (#48934)
- Expose CXX_FLAGS at runtime via `torch.__config__._cxx_flags()` (#47861) (a usage sketch follows this list)
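As a quick note on the last item, the flags can be read back at runtime. A minimal sketch:

```python
import torch

# Prints the CXX_FLAGS string this PyTorch build was compiled with (#47861).
print(torch.__config__._cxx_flags())
```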
Distributed
- Let `RpcAgent::send()` return `JitFuture` (#49906)
- Use correct signatures for `METH_NOARGS` (#45528)
- Move Python-independent c10d implementations to `torch/lib` (#47309)
- Completely remove the `FutureMessage` type and its usage (#50029)
- Use a store-based barrier in `init_process_group` (#49419) (a usage sketch follows this list)
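For the `init_process_group` change, the barrier is internal to process-group setup, so the user-facing call is unchanged. A minimal single-process sketch of that call, with placeholder rendezvous settings:

```python
import os
import torch.distributed as dist

if dist.is_available():
    # Placeholder rendezvous settings for a single local process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # init_process_group now waits on a store-based barrier before returning,
    # instead of issuing a collective across ranks.
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    dist.destroy_process_group()
```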
CUDA
- Add CUDA BFloat16 infrastructure (#44925) (a short Python sketch follows this list)
- [CUDA graphs] Add CUDA RNG-safe graph capture and replay bindings (#48875)
- Move CUDA kernel check to c10 (#48277, #48615)
- Further cuDNN refactoring (#50827)
- Skip CUDA `test_cholesky_solve_batched_many_batches` due to an illegal memory access (#48999)
- Bump up the CUDA OOM test memory size (#48029)
- Refactor cudnn convolution (#49109)
- Refactor CuFFTConfig to not use tensor objects (#46909)
- Remove DataPtr extractor from CUDAFuture (#48840)
- Allow ROCm CI to use a non-default stream (#48424)
- Expand the test of torch.bmm on CUDA (#47124)
- Use the latest philox_cuda_state API for stochastic rounding (#51004)
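To see the BFloat16 infrastructure from Python, a minimal sketch, assuming a CUDA-enabled build and that the ops involved have BFloat16 CUDA kernels:

```python
import torch

if torch.cuda.is_available():
    # Create a BFloat16 tensor on the GPU and run a simple elementwise op.
    x = torch.arange(8, device="cuda", dtype=torch.float32).to(torch.bfloat16)
    y = x * 2
    print(y.dtype, y.device)
```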
Dispatcher
- Make duplicate def() calls an error in the dispatcher (#48098)
- Support string argument defaults in native_functions.yaml (#45665)
- Add alias dispatch key DefaultBackend (#45718)
- Support DefaultBackend keyword in native_functions.yaml (#45719)
- Rename legacy_dispatcher to native (#45974)
- Refactor dispatcher and native to use Signature structure (#45990)
- Update VariableTypeManual.cpp to not use catchAllKernel (#46353)
- Remove catchAllKernel (#46354)
- Delete TypeDefault call code generation logic in VariableType (#47000)
- Faithful out arguments (#47712)
- Add autograd data model to codegen (#48249)
- Remove unnecessary includes so that `TensorIterator.h` can be included from `NativeFunctions.h` without causing cycles (#48728)
- Create a new key for `threadLocalDebugInfo` (#48762)
- Unregister backward and requires_grad ops from Autograd backend key (#49613)
- Move generator state APIs to ATen (#49589)
- Split out RegisterDispatchKey to its own file (#51508)
Misc
- [vmap] OpInfo-based tests now test that batched gradient computation with our vmap prototype works. (#50818)
- [vmap] NewModuleTest and CriterionTest now test that batched gradient computation with our vmap prototype works. (#50739, #50740, #50744)
- [TorchScript] Update JIT triage project board workflow (#45613)
- [TorchScript] Reformat ivalue_inl.h and ivalue.h (#46174)
- [Complex] Add hashing logic for c10::complex (#51441)
- [Complex] Only run complex gradcheck in TestOpInfo based tests when complex is supported (#49018)
- [Quantization] Add quantization triage bot script (#45622)
- [ONNX] Fix onnx test-reports path in CI (#47315)
- [Metal] Add macOS unit tests for Metal ops (#50312)