CUDA
- Moved ATen/CUDAGeneratorImpl.h to ATen/cuda (#71224)
- empty_cuda: Added functions that don’t depend on Tensor (#70616)
- Minor ScanKernels.cu cleanup (#65350)
- Removed THC ScalarConvert (#65471)
- Removed THCTensor.cu and THCTensorCopy.cu (#65491)
- Removed THCDeviceTensor (#65744)
- Migrated THCIntegerDivider.cuh to ATen (#65745)
- Added workaround for nvcc header dependencies bug (#62550)
- Removed accscalar from i0 and i0e (#67048)
- Moved some cub templates out of the header file (#67650)
- Exposed more CUDA/cuDNN info to at::Context (BC stage 1) (#68146)
- Added ModuleInfo-based CPU / GPU parity tests (#68097)
- Added ModuleInfo-based device transfer tests (#68092)
- Updated the CUDA memory leak check to verify against the driver API and print more diagnostic information (#69556); a sketch of the idea follows this list
- Fixed build on latest main branch of thrust (#69985)
- Split CUDA build: explicitly list the .cpp files that go into the _cu library (#69082)
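For reference, the gist of the updated leak check in #69556 is to compare what the caching allocator thinks is live with what the driver reports, and surface both when they disagree. A minimal sketch, not the actual test harness (the helper name and message format are invented here; `torch.cuda.mem_get_info` provides the driver-level view):

```python
import torch

def assert_no_cuda_leak(fn):
    # Invented helper name; sketches the idea behind the updated check.
    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated()     # caching-allocator view
    free_before, _total = torch.cuda.mem_get_info()  # driver (cudaMemGetInfo) view

    fn()

    torch.cuda.synchronize()
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated()
    free_after, _ = torch.cuda.mem_get_info()
    if alloc_after > alloc_before or free_after < free_before:
        raise AssertionError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver free {free_before} -> {free_after} bytes")
```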
Dispatcher
- Made `detach` re-dispatch like a regular PyTorch operator (#71707)
- `index_backward`: used out-of-place `index_put` if any input is a subclass (#71779)
- Some bug fixes to the external codegen pipeline to make it easier for external backends to use it (#69950, #69951, #69949)
- Made several ops that are implemented as composite ops in C++ “compliant”: previously they did not play well with custom tensor subclasses, and now they should (testing logic added in #65819; a sketch of such a subclass follows this list)
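Composite compliance matters for tensor subclasses that intercept operators via `__torch_dispatch__`. A minimal sketch of such a subclass (the class name and logging behavior are invented for illustration, and wrapping/unwrapping is reduced to the bare minimum):

```python
import torch

class LoggingTensor(torch.Tensor):
    """Toy subclass that prints every aten op it sees."""
    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_subclass(cls, elem)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(f"aten op seen by the subclass: {func}")
        def unwrap(t):
            # Strip the subclass before re-running the op (rewrapping elided).
            return t.as_subclass(torch.Tensor) if isinstance(t, LoggingTensor) else t
        return func(*[unwrap(a) for a in args],
                    **{k: unwrap(v) for k, v in (kwargs or {}).items()})

x = LoggingTensor(torch.randn(3))
y = x.detach()  # after #71707, aten::detach is dispatched here like any other op
z = x + 1       # compliant composite ops decompose into aten calls visible here
```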
Mobile
- Split the upgrader test to a separate file and cover mobile part (#70090)
- Removed unused variable in applyUpgrader (#70261)
- Better error message when training attribute is not found (#68103)
- Disabled miopen test for convolution on mobile (#66564)
- Bumped up iOS CocoaPods version to 1.10.0 (#67058)
- Lite interpreter naming for android nightly publishing (#68651)
- Set test owner for mobile tests (#66829)
- Added ownership to more edge tests (#67859)
- Skipped compiledWithCuDNN() call for mobile to avoid segfault (#71775)
- Made system-specific adjustments so unit tests work (#65245)
- Updated mobile observer API for inference metadata logging (#65451)
- Made the error message of missing ops to be more specific (#71294)
- Exposed is_metal_available in header (#68942)
- Removed unused function in import (#65865)
- TensorExprKernel: support custom-class constants (#68856)
- Moved all serialize/deserialize files to a separate target (#66805)
- Added Backport test (#67824)
- Exposed methods and compilation unit (#66854)
- Populated operator_input_sizes (#68542)
- Updated generated header to use flatbuffer v1.12 (#71279)
- Refactored flatbuffer loader to allow overriding how IValues are parsed (#71661)
- Removed StringView from RecordFunction interface (1/2) (#68410)
- Moved upgraders from python to cpp (#70593)
- Moved bytecode generation to python (#71681)
- Made upgrader test model generation more robust (#72030)
- Created convenience wrapper for dynamic type constructors (#71457)
- Enabled upgraders in TS server (#70539)
- Added a helper to produce html with a single call in model_dump (#66005)
- Skipped writing version during backport (#65842)
- Moved TypeParser class definition to header file (#65976)
- Updated bytecode version compatibility check (#67417)
- Added complete type name in error message when fail to export model (#67750)
- Added old models and unittest (#67726)
- Updated upgrader codegen with latest change (#70293)
- Used hypothesis for better test input data and broader coverage (#70263)
- Removed version compare as they are decoupled now (#71461)
- Automated model generating process (#70629)
- Moved generated keyword out of gen_mobile_upgraders.py (#71938)
- Used upgrader_mobile.cpp as the reference for codegen unittest (#71930)
- Added type check in compatibility api (#63129)
- Promoted missing ops for delegated models (#66052)
- Used at::native::is_nonzero in promoted ops to improve portability (#67097)
- Set actual output type, remove ambiguity from compile_spec names (#67209)
- Set kernel func name from compiler (#67229)
- Used irange for loops (#66747)
- Added control stack frame to lite interpreter (#65963)
- Implemented torch::jit::Function for mobile functions (#65970)
- Loaded interface methods to corresponding ClassTypes (#65971)
- Removed usage of shared_ptr (#68037)
- Created DynamicType for OptionalType in mobile (#68137)
- Polymorphic IValue::type() for DynamicType (#70120)
- Stopped reusing the mobile type parser for all unpicklers (#71048)
- Migrated {TupleType, ListType} to DynamicType (#70205, #70212)
- Added a check to ensure profiler_edge is only added when use_kineto is on (#67494)
- Removed double pragma once directive in the generated code (#65620)
- Added mobile upgrader (#67728, #67729, #67730, #67731)
- Introduced multiple improvements for operator versioning (see the upgrader sketch after this list):
  - Fixed some bugs in the operator upgrader (#71578, #70161, #70225)
  - Used a more robust way of extracting min and max versions
  - Ensured initialization thread safety
  - Supported indirect method CALL in the lite interpreter (bytecode)
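For context, an operator upgrader is a small TorchScript function that reproduces an operator's old semantics when a model serialized at an older operator version runs on a newer runtime. In Python terms, the `aten::div` upgrader looks roughly like the sketch below (the real upgraders are TorchScript embedded in C++ under torch/csrc/jit/operator_upgraders; this is a paraphrase, not the exact source):

```python
import torch

def div_Tensor_0_3(self: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    # Before operator version 4, aten::div truncated on integral inputs; the
    # upgrader reproduces that old behavior for models serialized back then.
    if self.is_floating_point() or other.is_floating_point():
        return self.true_divide(other)
    return self.divide(other, rounding_mode="trunc")
```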
Distributed
- `torch.distributed`
  - Cleaned up DDP SPMD in reducer.cpp (#64113)
  - Changed the type and name of local_used_maps to reflect that it is only one map (#65380)
  - Updated ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
  - Disabled NCCL health check (#67668)
  - Fixed object-based collectives for debug mode (#68223)
  - Revised the socket implementation of c10d (#68226)
  - Enabled desync root cause analysis for NCCL (#68310)
  - Added a missing precondition to the `DistributedSampler` docstring (#70104); a usage sketch follows this list
- `DistributedDataParallel`
- `torch.distributed.rpc`
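The precondition in question: when shuffling, `DistributedSampler.set_epoch()` must be called at the start of each epoch, otherwise every epoch sees the same ordering. A minimal sketch (the dataset and loop body are placeholders; `num_replicas=1, rank=0` is passed so the snippet runs without an initialized process group):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(100, 8))  # placeholder dataset
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # the documented precondition: reseed shuffling per epoch
    for batch in loader:
        pass  # training step goes here
```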
TorchScript
- Fixed cases where errors were not thrown for XNNPACK ops, the JIT graph executor, and CUDA tensor lowering. These were due to uses of `TORCH_CHECK`/`TORCH_INTERNAL_ASSERT` without a condition argument (#71879, #71767, #71778)
- Fixed an Android SDK compilation warning when -D_FORTIFY_SOURCE=2 was used (#65222)
- Suppressed additional warnings when compiling Caffe2 headers (#71370)
Quantization
- More informative error messages from fbgemm embedding spmdm call (#65186)
- Changed observer FQNs generated in prepare step (#65420)
- Made FixedQParam ops work for dtypes other than quint8 (#65484)
- Added op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
- Replaced conv_p with convolution_op in qnnpack (#65783)
- Fixed the hypothesis test for topk (#66057)
- Removed hypothesis from qtopk (#66158)
- Shape propagation for quantization (#66343)
- Updated observer_fqn to not depend on node.name (#66767)
- Updated qnnpack to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
- Added pass to duplicate dequant nodes with multi use (#67118)
- Removed asymmetrical padding parameters in qnnpack (#67102)
- Added out-variant for quantized::linear_dynamic_fp16 (#67663)
- Replaced copy_ with data_ptr() since input Tensor’s dtype is guaranteed to be float (#67788)
- Refactored quantized op tests to combine test classes (#68282)
- In q_avgpool operator, loop over batch dimension inside operators (#66819)
- Added additional string to search cpu flags for vnni detection (#67686)
- Refactored handling of FixedQParams operators (#68143)
- Set FakeQuant zeropoint dtype matches observer for embedding QAT (#68390)
- Removed warning for quantized Tensor in `__dir__` (#69265)
- Moved pattern type definition to ao/quantization/utils.py (#68769)
- Refactored fusion to use the new Pattern format (#68770)
- Changed the type for output of convert to be torch.nn.Module (#69959)
- In FX graph mode quantization, allow duplicate named_modules during fbgemm lowering (#70927)
- Added an explanation of the quantized comparison strategy in assert_close (#68911); a usage sketch follows this list
- Added quantized input tensor data type checks (#71218)
- Added a guard against shapes for qnnpack qadd (#71219)
- Templatized activationLimits function (#71220)
- Removed unused allow list arguments from propagate_qconfig and helper (#71104)
- Supported non-partial functions in qconfig comparison (#68067)
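Roughly, `torch.testing.assert_close` handles quantized tensors by checking the quantization metadata and comparing dequantized values; the snippet below is a minimal illustration of the call, not the exact wording of the documented strategy:

```python
import torch

a = torch.quantize_per_tensor(torch.tensor([1.0, 2.0, 3.0]),
                              scale=0.1, zero_point=0, dtype=torch.qint8)
b = torch.quantize_per_tensor(torch.tensor([1.0, 2.0, 3.0]),
                              scale=0.1, zero_point=0, dtype=torch.qint8)

# Passes: matching quantization parameters and matching (dequantized) values.
torch.testing.assert_close(a, b)
```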
ONNX
- Suppressed ONNX Runtime warnings in tests (#67804)
- Fixed CUDA test case (#64378)
- Added links to the developer documentation in the wiki (#71609)
torch.package
- Added a simple backwards compatibility check for `torch.package` (#66739); a minimal usage sketch follows
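For reference, the core `torch.package` round trip whose compatibility is being guarded looks like this (the file name and model are placeholders):

```python
import torch
from torch.package import PackageExporter, PackageImporter

# Save a model into a self-contained package; "torch.**" is extern'd so the
# package references the caller's torch installation instead of bundling it.
with PackageExporter("linear.pt") as exporter:
    exporter.extern("torch.**")
    exporter.save_pickle("model", "model.pkl", torch.nn.Linear(4, 2))

importer = PackageImporter("linear.pt")
model = importer.load_pickle("model", "model.pkl")
```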