PyTorch 1.11 dev release notes

(Continued)

CUDA

  • Moved ATen/CUDAGeneratorImpl.h to ATen/cuda (#71224)
  • empty_cuda: Added functions that don’t depend on Tensor (#70616)
  • Minor ScanKernels.cu cleanup (#65350)
  • Removed THC ScalarConvert (#65471)
  • Removed THCTensor.cu and THCTensorCopy.cu (#65491)
  • Removed THCDeviceTensor (#65744)
  • Migrated THCIntegerDivider.cuh to ATen (#65745)
  • Added workaround for nvcc header dependencies bug (#62550)
  • Removed accscalar from i0 and i0e (#67048)
  • Moved some cub templates out of the header file (#67650)
  • Exposed more CUDA/CuDNN info to at::Context and BC stage 1 (#68146)
  • Added ModuleInfo-based CPU / GPU parity tests (#68097)
  • Added ModuleInfo-based device transfer tests (#68092)
  • Updated the CUDA memory leak check to verify against the driver API and print more diagnostic information (#69556); a sketch of the allocator-vs-driver distinction follows this list
  • Fixed build on latest main branch of thrust (#69985)
  • Split CUDA library: explicitly listed the .cpp files that go into the _cu library (#69082)
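
The memory leak check item above distinguishes the caching allocator's bookkeeping from what the driver actually holds. A rough sketch of that distinction, not the test-suite implementation; the snapshot helper is illustrative:

    import torch

    def snapshot(device=0):
        # Caching-allocator view: bytes currently handed out to live tensors.
        allocator_bytes = torch.cuda.memory_allocated(device)
        # Driver view (cudaMemGetInfo): total minus free bytes on the device.
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)
        return allocator_bytes, total_bytes - free_bytes

    if torch.cuda.is_available():
        alloc_before, driver_before = snapshot()
        x = torch.empty(1024, 1024, device="cuda")
        del x
        alloc_after, driver_after = snapshot()
        # The allocator count returns to its previous value, while driver-side usage
        # can remain higher because the freed block stays cached; a leak check that
        # only consults the allocator would miss memory acquired outside of it.
        print(alloc_before == alloc_after, driver_after >= driver_before)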

Dispatcher

  • Made detach re-dispatch like a regular PyTorch operator (#71707)
  • index_backward: used out-of-place index_put if any input is a subclass (#71779)
  • Fixed some bugs in the external codegen pipeline to make it easier for external backends to use (#69950, #69951, #69949)
  • Made several ops that are implemented as composites in C++ “composite compliant”: previously they did not play well with custom tensor subclasses, and now they should (see the sketch after this list). Testing logic was added in #65819.
    • binary_cross_entropy backward (#70198)
    • quantile and nanquantile (#70894)
    • linalg.{matrix_power, inv, cholesky} (#69437)
    • index_copy, index_fill, masked_scatter, masked_fill (#71751)
    • index_put (#71765)
    • gather_backward (#71766)
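
As an illustration of what “composite compliant” means in practice, here is a minimal __torch_dispatch__ wrapper subclass; LoggingTensor is hypothetical (modeled loosely on the testing helpers), not a PyTorch API:

    import torch
    from torch.utils._pytree import tree_map

    class LoggingTensor(torch.Tensor):
        # Wraps a real tensor in `elem` and logs every ATen call it sees.
        @staticmethod
        def __new__(cls, elem):
            return torch.Tensor._make_wrapper_subclass(cls, elem.size(), dtype=elem.dtype)

        def __init__(self, elem):
            self.elem = elem

        @classmethod
        def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
            print(f"dispatching {func}")
            unwrap = lambda t: t.elem if isinstance(t, LoggingTensor) else t
            out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
            wrap = lambda t: LoggingTensor(t) if isinstance(t, torch.Tensor) else t
            return tree_map(wrap, out)

    t = LoggingTensor(torch.zeros(4))
    idx = LoggingTensor(torch.tensor([1, 2]))
    val = LoggingTensor(torch.ones(2))
    # A compliant composite decomposes into calls the subclass can intercept,
    # rather than e.g. mutating the wrapped tensor behind its back.
    torch.index_put(t, (idx,), val)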

Mobile

  • Split the upgrader test into a separate file and covered the mobile part (#70090)
  • Removed unused variable in applyUpgrader (#70261)
  • Better error message when training attribute is not found (#68103)
  • Disabled miopen test for convolution on mobile (#66564)
  • Bumped up iOS CocoaPods version to 1.10.0 (#67058)
  • Lite interpreter naming for Android nightly publishing (#68651)
  • Set test owner for mobile tests (#66829)
  • Added ownership to more edge tests (#67859)
  • Skipped compiledWithCuDNN() call for mobile to avoid segfault (#71775)
  • Made system-specific adjustments so that unit tests work (#65245)
  • Updated mobile observer API for inference metadata logging (#65451)
  • Made the error message for missing ops more specific (#71294)
  • Exposed is_metal_available in header (#68942)
  • Removed unused function in import (#65865)
  • TensorExprKernel: support custom-class constants (#68856)
  • Moved all serialize/deserialize files to a separate target (#66805)
  • Added Backport test (#67824)
  • Exposed methods and compilation unit (#66854)
  • Populated operator_input_sizes (#68542)
  • Updated generated header to use flatbuffer v1.12 (#71279)
  • Refactored flatbuffer loader to allow overriding how IValues are parsed (#71661)
  • Removed StringView from RecordFunction interface (1/2) (#68410)
  • Moved upgraders from python to cpp (#70593)
  • Moved bytecode generation to python (#71681)
  • Made upgrader test model generation more robust (#72030)
  • Created a convenience wrapper for dynamic type constructors (#71457)
  • Enabled upgraders in TS server (#70539)
  • Added a helper to produce html with a single call in model_dump (#66005)
  • Skipped writing version during backport (#65842)
  • Moved TypeParser class definition to header file (#65976)
  • Updated the bytecode version compatibility check (#67417); see the lite interpreter sketch after this list
  • Added the complete type name to the error message when model export fails (#67750)
  • Added old models and unittest (#67726)
  • Updated upgrader codegen with latest change (#70293)
  • Used hypothesis for better test input data and broader coverage (#70263)
  • Removed the version comparison, as the versions are now decoupled (#71461)
  • Automated model generating process (#70629)
  • Moved generated keyword out of gen_mobile_upgraders.py (#71938)
  • Used upgrader_mobile.cpp as the reference for codegen unittest (#71930)
  • Added type check in compatibility api (#63129)
  • Promoted missing ops for delegated models (#66052)
  • Used at::native::is_nonzero in promoted ops to improve portability (#67097)
  • Set actual output type, remove ambiguity from compile_spec names (#67209)
  • Set kernel func name from compiler (#67229)
  • Used irange for loops (#66747)
  • Added control stack frame to lite interpreter (#65963)
  • Implemented torch::jit::Function for mobile functions (#65970)
  • Loaded interface methods to corresponding ClassTypes (#65971)
  • Removed usage of shared_ptr (#68037)
  • Created DynamicType for OptionalType in mobile (#68137)
  • Polymorphic IValue::type() for DynamicType (#70120)
  • Do not reuse mobile type parser for all unpicklers (#71048)
  • Migrated {TupleType, ListType} to DynamicType (#70205, #70212)
  • Added a check to ensure profiler_edge is only added when use_kineto is on (#67494)
  • Removed double pragma once directive in the generated code (#65620)
  • Added mobile upgrader (#67728, #67729, #67730, #67731)
  • Introduced multiple improvements for operator versioning
  • Fixed some bugs in the operator upgrader (#71578, #70161, #70225)
    • Used more robust way of extracting min and max versions
    • Ensured initialization thread safety
  • Supported indirect method CALL in lite interpreter (bytecode)
    • Enabled the CALL instruction in the lite interpreter (#65964)
    • Enabled lite interpreter to correctly handle INTERFACE_CALL instruction (#65972)
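
Many of the items above revolve around the lite interpreter's bytecode format and its versioning. A rough sketch of that flow; the helpers under torch.jit.mobile are private and may change between releases:

    import torch
    from torch.jit.mobile import _load_for_lite_interpreter, _get_model_bytecode_version

    class M(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x)

    scripted = torch.jit.script(M())
    scripted._save_for_lite_interpreter("m.ptl")    # serialize bytecode for the lite interpreter

    # The bytecode version is what the compatibility check and the upgraders reason about.
    print(_get_model_bytecode_version("m.ptl"))

    mobile_module = _load_for_lite_interpreter("m.ptl")
    print(mobile_module(torch.randn(3)))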

Distributed

  • torch.distributed

    • Cleaned up DDP SPMD in reducer.cpp (#64113)
    • Changed type and name of local_used_maps to reflect that it is only one map (#65380)
    • Updated ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
    • Disabled NCCL health check (#67668)
    • Fixed object-based collectives for debug mode (#68223)
    • Revised the socket implementation of c10d (#68226)
    • Enabled desync root cause analysis for NCCL (#68310)
    • Added a missing precondition to the DistributedSampler docstring (#70104); see the set_epoch() sketch after this section
  • DistributedDataParallel

    • Tracked models with SyncBatchNorm (#66680)
    • Logged API usage for tracking (#66038)
  • torch.distributed.rpc

    • Fixed type checking errors in options.py (#68056)
    • Added API usage to torch.RPC (#67515)
    • Added API usage logging for several other RPC APIs (#67722)
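
The precondition added to the DistributedSampler docstring is that set_epoch() must be called at the start of every epoch, otherwise the same shuffling order is reused. A minimal sketch, with rank and world size hard-coded so it runs without a process group:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    dataset = TensorDataset(torch.arange(100))
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    for epoch in range(3):
        sampler.set_epoch(epoch)   # reseeds the shuffle; without this, every epoch yields the same order
        for batch in loader:
            pass                   # training step goes here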

TorchScript

  • Fixed cases where errors were not thrown in XNNPACK ops, the JIT graph executor, and CUDA tensor lowering; these were caused by uses of TORCH_CHECK / TORCH_INTERNAL_ASSERT without a condition (#71879, #71767, #71778)
  • Fixed an Android SDK compilation warning when -D_FORTIFY_SOURCE=2 was used (#65222)
  • Suppressed additional warnings when compiling Caffe2 headers (#71370)

Quantization

  • Made error messages from the fbgemm embedding spmdm call more informative (#65186)
  • Changed observer FQNs generated in prepare step (#65420)
  • Made FixedQParam ops work for dtypes other than quint8 (#65484)
  • Added op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
  • Replaced conv_p with convolution_op in qnnpack (#65783)
  • Fixed the hypothesis test for topk (#66057)
  • Removed hypothesis from qtopk (#66158)
  • Shape propagation for quantization (#66343)
  • Updated observer_fqn to not depend on node.name (#66767)
  • Updated qnnpack to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
  • Added pass to duplicate dequant nodes with multi use (#67118)
  • Removed asymmetrical padding parameters in qnnpack (#67102)
  • Added out-variant for quantized::linear_dynamic_fp16 (#67663)
  • Replaced copy_ with data_ptr() since input Tensor’s dtype is guaranteed to be float (#67788)
  • Refactored quantized op tests to combine test classes (#68282)
  • In the q_avgpool operator, looped over the batch dimension inside the operator (#66819)
  • Added additional string to search cpu flags for vnni detection (#67686)
  • Refactored handling of FixedQParams operators (#68143)
  • Made the FakeQuant zero_point dtype match the observer's for embedding QAT (#68390)
  • Removed warning for quantized Tensor in __dir__ (#69265)
  • Moved pattern type definition to ao/quantization/utils.py (#68769)
  • Refactored fusion to use the new Pattern format (#68770)
  • Changed the type of the output of convert to torch.nn.Module (#69959); see the FX graph mode sketch after this list
  • In FX graph mode quantization, allow duplicate named_modules during fbgemm lowering (#70927)
  • Added explanation of quantized comparison strategy in assert_close (#68911)
  • Added quantized input tensor data type checks (#71218)
  • Added a guard against shapes for qnnpack qadd (#71219)
  • Templatized activationLimits function (#71220)
  • Removed unused allow list arguments from propagate_qconfig and helper (#71104)
  • Supported non-partial functions in qconfig comparison (#68067)
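
Several of the items above refer to the prepare and convert steps of FX graph mode quantization. A rough sketch of that flow using the qconfig_dict signature from this release line (later releases take a QConfigMapping and example inputs); the module M is illustrative:

    import torch
    from torch.ao.quantization import get_default_qconfig
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)

        def forward(self, x):
            return self.linear(x)

    model = M().eval()
    qconfig_dict = {"": get_default_qconfig("fbgemm")}
    prepared = prepare_fx(model, qconfig_dict)   # inserts observers; observer FQNs come from this step
    prepared(torch.randn(2, 4))                  # calibration
    quantized = convert_fx(prepared)             # returns a torch.nn.Module containing quantized ops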

ONNX

  • Suppressed ONNX Runtime warnings in tests (#67804)
  • Fixed CUDA test case (#64378)
  • Added links to the developer documentation in the wiki (#71609)

torch.package

  • Added a simple backwards compatibility check for torch.package (#66739); a sketch of the save/load round trip it guards follows
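
A minimal sketch of the save/load round trip such a check has to keep working; the package and resource names are illustrative:

    import io
    import torch
    from torch.package import PackageExporter, PackageImporter

    buffer = io.BytesIO()
    with PackageExporter(buffer) as exporter:
        exporter.extern(["torch", "torch.**"])   # rely on the installed torch rather than packaging it
        exporter.save_pickle("my_pkg", "model.pkl", torch.nn.Linear(2, 2))

    buffer.seek(0)
    loaded = PackageImporter(buffer).load_pickle("my_pkg", "model.pkl")
    print(type(loaded))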