PyTorch 1.11 dev release notes

(Continued)

CUDA

  • Moved ATen/CUDAGeneratorImpl.h to ATen/cuda (#71224)
  • empty_cuda: Added functions that don’t depend on Tensor (#70616)
  • Minor ScanKernels.cu cleanup (#65350)
  • Removed THC ScalarConvert (#65471)
  • Removed THCTensor.cu and THCTensorCopy.cu (#65491)
  • Removed THCDeviceTensor (#65744)
  • Migrated THCIntegerDivider.cuh to ATen (#65745)
  • Added workaround for nvcc header dependencies bug (#62550)
  • Removed accscalar from i0 and i0e (#67048)
  • Moved some cub templates out of the header file (#67650)
  • Exposed more CUDA/CuDNN info to at::Context and BC stage 1 (#68146)
  • Added ModuleInfo-based CPU / GPU parity tests (#68097)
  • Added ModuleInfo-based device transfer tests (#68092)
  • Updated the CUDA memory leak check to verify against the driver API and print more diagnostic information (#69556); a sketch of the allocator-vs-driver distinction follows this list
  • Fixed build on latest main branch of thrust (#69985)
  • Split CUDA library: explicitly listed the .cpp files that go into the _cu library (#69082)
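
The memory leak check item above distinguishes the caching allocator's bookkeeping from what the driver actually holds. A rough sketch of that distinction, not the test-suite implementation; the snapshot helper is illustrative:

    import torch

    def snapshot(device=0):
        # Caching-allocator view: bytes currently handed out to live tensors.
        allocator_bytes = torch.cuda.memory_allocated(device)
        # Driver view (cudaMemGetInfo): total minus free bytes on the device.
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)
        return allocator_bytes, total_bytes - free_bytes

    if torch.cuda.is_available():
        alloc_before, driver_before = snapshot()
        x = torch.empty(1024, 1024, device="cuda")
        del x
        alloc_after, driver_after = snapshot()
        # The allocator count returns to its previous value, while driver-side usage
        # can remain higher because the freed block stays cached; a leak check that
        # only consults the allocator would miss memory acquired outside of it.
        print(alloc_before == alloc_after, driver_after >= driver_before)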

Dispatcher

  • Made detach re-dispatch like a regular PyTorch operator (#71707)
  • index_backward: used out-of-place index_put if any input is a subclass (#71779)
  • Fixed some bugs in the external codegen pipeline to make it easier for external backends to use (#69950, #69951, #69949)
  • Made several ops that are implemented as composites in C++ “composite compliant”: previously they did not play well with custom tensor subclasses, and now they should (see the sketch after this list). Testing logic was added in #65819.
    • binary_cross_entropy backward (#70198)
    • quantile and nanquantile (#70894)
    • linalg.{matrix_power, inv, cholesky} (#69437)
    • index_copy, index_fill, masked_scatter, masked_fill (#71751)
    • index_put (#71765)
    • gather_backward (#71766)
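
As an illustration of what “composite compliant” means in practice, here is a minimal __torch_dispatch__ wrapper subclass; LoggingTensor is hypothetical (modeled loosely on the testing helpers), not a PyTorch API:

    import torch
    from torch.utils._pytree import tree_map

    class LoggingTensor(torch.Tensor):
        # Wraps a real tensor in `elem` and logs every ATen call it sees.
        @staticmethod
        def __new__(cls, elem):
            return torch.Tensor._make_wrapper_subclass(cls, elem.size(), dtype=elem.dtype)

        def __init__(self, elem):
            self.elem = elem

        @classmethod
        def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
            print(f"dispatching {func}")
            unwrap = lambda t: t.elem if isinstance(t, LoggingTensor) else t
            out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
            wrap = lambda t: LoggingTensor(t) if isinstance(t, torch.Tensor) else t
            return tree_map(wrap, out)

    t = LoggingTensor(torch.zeros(4))
    idx = LoggingTensor(torch.tensor([1, 2]))
    val = LoggingTensor(torch.ones(2))
    # A compliant composite decomposes into calls the subclass can intercept,
    # rather than e.g. mutating the wrapped tensor behind its back.
    torch.index_put(t, (idx,), val)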

Mobile

  • Split the upgrader test into a separate file and covered the mobile part (#70090)
  • Removed unused variable in applyUpgrader (#70261)
  • Better error message when training attribute is not found (#68103)
  • Disabled miopen test for convolution on mobile (#66564)
  • Bumped up iOS CocoaPods version to 1.10.0 (#67058)
  • Lite interpreter naming for Android nightly publishing (#68651)
  • Set test owner for mobile tests (#66829)
  • Added ownership to more edge tests (#67859)
  • Skipped compiledWithCuDNN() call for mobile to avoid segfault (#71775)
  • Made system-specific adjustments so that unit tests work (#65245)
  • Updated mobile observer API for inference metadata logging (#65451)
  • Made the error message for missing ops more specific (#71294)
  • Exposed is_metal_available in header (#68942)
  • Removed unused function in import (#65865)
  • TensorExprKernel: support custom-class constants (#68856)
  • Moved all serialize/deserialize files to a separate target (#66805)
  • Added Backport test (#67824)
  • Exposed methods and compilation unit (#66854)
  • Populated operator_input_sizes (#68542)
  • Updated generated header to use flatbuffer v1.12 (#71279)
  • Refactored flatbuffer loader to allow overriding how IValues are parsed (#71661)
  • Removed StringView from RecordFunction interface (1/2) (#68410)
  • Moved upgraders from python to cpp (#70593)
  • Moved bytecode generation to python (#71681)
  • Made upgrader test model generation more robust (#72030)
  • Created a convenience wrapper for dynamic type constructors (#71457)
  • Enabled upgraders in TS server (#70539)
  • Added a helper to produce html with a single call in model_dump (#66005)
  • Skipped writing version during backport (#65842)
  • Moved TypeParser class definition to header file (#65976)
  • Updated the bytecode version compatibility check (#67417); see the lite interpreter sketch after this list
  • Added the complete type name to the error message when model export fails (#67750)
  • Added old models and unittest (#67726)
  • Updated upgrader codegen with latest change (#70293)
  • Used hypothesis for better test input data and broader coverage (#70263)
  • Removed the version comparison, as the versions are now decoupled (#71461)
  • Automated model generating process (#70629)
  • Moved generated keyword out of gen_mobile_upgraders.py (#71938)
  • Used upgrader_mobile.cpp as the reference for codegen unittest (#71930)
  • Added type check in compatibility api (#63129)
  • Promoted missing ops for delegated models (#66052)
  • Used at::native::is_nonzero in promoted ops to improve portability (#67097)
  • Set actual output type, remove ambiguity from compile_spec names (#67209)
  • Set kernel func name from compiler (#67229)
  • Used irange for loops (#66747)
  • Added control stack frame to lite interpreter (#65963)
  • Implemented torch::jit::Function for mobile functions (#65970)
  • Loaded interface methods to corresponding ClassTypes (#65971)
  • Removed usage of shared_ptr (#68037)
  • Created DynamicType for OptionalType in mobile (#68137)
  • Polymorphic IValue::type() for DynamicType (#70120)
  • Do not reuse mobile type parser for all unpicklers (#71048)
  • Migrated {TupleType, ListType} to DynamicType (#70205, #70212)
  • Added a check to ensure profiler_edge is only added when use_kineto is on (#67494)
  • Removed double pragma once directive in the generated code (#65620)
  • Added mobile upgrader (#67728, #67729, #67730, #67731)
  • Introduced multiple improvements for operator versioning
  • Fixed some bugs in the operator upgrader (#71578, #70161, #70225)
    • Used more robust way of extracting min and max versions
    • Ensured initialization thread safety
  • Supported indirect method CALL in lite interpreter (bytecode)
    • Enabled the CALL instruction in the lite interpreter (#65964)
    • Enabled lite interpreter to correctly handle INTERFACE_CALL instruction (#65972)
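
Many of the items above revolve around the lite interpreter's bytecode format and its versioning. A rough sketch of that flow; the helpers under torch.jit.mobile are private and may change between releases:

    import torch
    from torch.jit.mobile import _load_for_lite_interpreter, _get_model_bytecode_version

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x)

    scripted = torch.jit.script(M())
    scripted._save_for_lite_interpreter("m.ptl")    # serialize bytecode for the lite interpreter

    # The bytecode version is what the compatibility check and the upgraders reason about.
    print(_get_model_bytecode_version("m.ptl"))

    mobile_module = _load_for_lite_interpreter("m.ptl")
    print(mobile_module(torch.randn(3)))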

Distributed

  • torch.distributed

    • Cleaned up DDP SPMD in reducer.cpp (#64113)
    • Changed type and name of local_used_maps to reflect that it is only one map (#65380)
    • Updated ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
    • Disabled NCCL health check (#67668)
    • Fixed object-based collectives for debug mode (#68223)
    • Revised the socket implementation of c10d (#68226)
    • Enabled desync root cause analysis for NCCL (#68310)
    • Added a missing precondition to the DistributedSampler docstring (#70104); see the set_epoch() sketch after this section
  • DistributedDataParallel

    • Tracked models with SyncBatchNorm (#66680)
    • Logged API usage for tracking (#66038)
  • torch.distributed.rpc

    • Fixed type checking errors in options.py (#68056)
    • Added API usage to torch.RPC (#67515)
    • Added API usage logging for several other RPC APIs (#67722)
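
The precondition added to the DistributedSampler docstring is that set_epoch() must be called at the start of every epoch, otherwise the same shuffling order is reused. A minimal sketch, with rank and world size hard-coded so it runs without a process group:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    dataset = TensorDataset(torch.arange(100))
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    for epoch in range(3):
        sampler.set_epoch(epoch)   # reseeds the shuffle; without this, every epoch yields the same order
        for batch in loader:
            pass                   # training step goes here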

TorchScript

  • Fixed cases where errors were not thrown in XNNPACK ops, the JIT graph executor, and CUDA tensor lowering; these were caused by uses of TORCH_CHECK / TORCH_INTERNAL_ASSERT without a condition (#71879, #71767, #71778)
  • Fixed an Android SDK compilation warning when -D_FORTIFY_SOURCE=2 was used (#65222)
  • Suppressed additional warnings when compiling Caffe2 headers (#71370)

Quantization

  • Made error messages from the fbgemm embedding spmdm call more informative (#65186)
  • Changed observer FQNs generated in prepare step (#65420)
  • Made FixedQParam ops work for dtypes other than quint8 (#65484)
  • Added op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
  • Replaced conv_p with convolution_op in qnnpack (#65783)
  • Fixed the hypothesis test for topk (#66057)
  • Removed hypothesis from qtopk (#66158)
  • Shape propagation for quantization (#66343)
  • Updated observer_fqn to not depend on node.name (#66767)
  • Updated qnnpack to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
  • Added pass to duplicate dequant nodes with multi use (#67118)
  • Removed asymmetrical padding parameters in qnnpack (#67102)
  • Added out-variant for quantized::linear_dynamic_fp16 (#67663)
  • Replaced copy_ with data_ptr() since input Tensor’s dtype is guaranteed to be float (#67788)
  • Refactored quantized op tests to combine test classes (#68282)
  • In the q_avgpool operator, looped over the batch dimension inside the operator (#66819)
  • Added additional string to search cpu flags for vnni detection (#67686)
  • Refactored handling of FixedQParams operators (#68143)
  • Made the FakeQuant zero_point dtype match the observer's for embedding QAT (#68390)
  • Removed warning for quantized Tensor in __dir__ (#69265)
  • Moved pattern type definition to ao/quantization/utils.py (#68769)
  • Refactored fusion to use the new Pattern format (#68770)
  • Changed the type of the output of convert to torch.nn.Module (#69959); see the FX graph mode sketch after this list
  • In FX graph mode quantization, allow duplicate named_modules during fbgemm lowering (#70927)
  • Added explanation of quantized comparison strategy in assert_close (#68911)
  • Added quantized input tensor data type checks (#71218)
  • Added a guard against shapes for qnnpack qadd (#71219)
  • Templatized activationLimits function (#71220)
  • Removed unused allow list arguments from propagate_qconfig and helper (#71104)
  • Supported non-partial functions in qconfig comparison (#68067)
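
Several of the items above refer to the prepare and convert steps of FX graph mode quantization. A rough sketch of that flow using the qconfig_dict signature from this release line (later releases take a QConfigMapping and example inputs); the module M is illustrative:

    import torch
    from torch.ao.quantization import get_default_qconfig
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)

        def forward(self, x):
            return self.linear(x)

    model = M().eval()
    qconfig_dict = {"": get_default_qconfig("fbgemm")}
    prepared = prepare_fx(model, qconfig_dict)   # inserts observers; observer FQNs come from this step
    prepared(torch.randn(2, 4))                  # calibration
    quantized = convert_fx(prepared)             # returns a torch.nn.Module containing quantized ops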

ONNX

  • Suppressed ONNX Runtime warnings in tests (#67804)
  • Fixed CUDA test case (#64378)
  • Added links to the developer documentation in the wiki (#71609)

torch.package

  • Added a simple backwards compatibility check for torch.package (#66739); a sketch of the save/load round trip it guards follows
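
A minimal sketch of the save/load round trip such a check has to keep working; the package and resource names are illustrative:

    import io
    import torch
    from torch.package import PackageExporter, PackageImporter

    buffer = io.BytesIO()
    with PackageExporter(buffer) as exporter:
        exporter.extern(["torch", "torch.**"])   # rely on the installed torch rather than packaging it
        exporter.save_pickle("my_pkg", "model.pkl", torch.nn.Linear(2, 2))

    buffer.seek(0)
    loaded = PackageImporter(buffer).load_pickle("my_pkg", "model.pkl")
    print(type(loaded))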