Add byteswapping when saving a model on big endian platforms

I’m working on saving and loading model files in a byte order-consistent way, because zip file serialization endianness detection is broken (see Issue #65300 · pytorch/pytorch · GitHub).

I have created a PR:

The PR’s objective is to ensure that saved model files are always treated as little endian. So far I have corrected the byte order when loading on a big endian platform by swapping the bytes once after they are read into memory. I believe I have also located the spot where the byteswap should happen when saving on a big endian platform: pytorch/serialization.py at e70b5d64f40feaed5d7fb98ce150f6ca3362150a · pytorch/pytorch · GitHub
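To illustrate the idea, here is a minimal sketch (using numpy on a plain tensor, not the PR’s actual code inside torch.save): on a big endian host, swap the bytes of every element before writing, so the file on disk is always little endian.

```python
import sys

import torch


def to_little_endian_bytes(tensor: torch.Tensor) -> bytes:
    """Return the tensor's raw data as little-endian bytes."""
    arr = tensor.detach().cpu().numpy()
    if sys.byteorder == "big":
        # byteswap() returns a copy with the bytes of every element reversed,
        # so tobytes() below yields the little-endian byte pattern.
        arr = arr.byteswap()
    return arr.tobytes()


# The same bytes are produced regardless of the host's byte order.
payload = to_little_endian_bytes(torch.arange(4, dtype=torch.int32))
```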

However, I’m not sure how to retrieve the size of each element before the line linked above. Byteswapping is impossible without knowing the element size, since the swap has to be done per element. Could someone help, please? Thanks in advance.
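For context, in ordinary code the per-element size is easy to obtain from a tensor or from a dtype; the open question is how to get the equivalent information at that exact point in serialization.py, where (as far as I can tell) only the raw storage is in hand:

```python
import torch

# From a tensor: element_size() gives the size in bytes of one element.
t = torch.zeros(3, dtype=torch.float64)
print(t.element_size())  # 8

# From a dtype alone, via a zero-dimensional tensor of that dtype.
print(torch.empty((), dtype=torch.int16).element_size())  # 2
```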
