PyTorch Faster/Mask RCNN resizes images badly

Hello,
I am new here. I have been working as a software developer for 21 years, and I think I found an issue while using PyTorch Faster/Mask RCNN.
Deep down in GeneralizedRCNNTransform (transform.py, lines 39-43), PyTorch decides whether an image needs to be resized.

```python
def _resize_image_and_masks(image, self_min_size, self_max_size, target):
    # type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
    im_shape = torch.tensor(image.shape[-2:])
    min_size = float(torch.min(im_shape))
    max_size = float(torch.max(im_shape))
    scale_factor = self_min_size / min_size
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size
    image = torch.nn.functional.interpolate(
        image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
        align_corners=False)[0]
    ...
```

Four parameters are used here to decide on the scale_factor:
- min_size and max_size, the smaller and larger dimension of the image;
- self_min_size and self_max_size, which are defined during initialization of the backbone and default to 800 and 1333.

If I now pass in an image that fits perfectly well, for example 832 x 1333, this algorithm sets scale_factor = 800/832 = 0.96153…, so the image gets resized even though it already fits into the backbone. I debugged this, set the scale_factor to 1.0 in line 44, and it worked like a charm without resizing anything.
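The scale-factor selection can be reproduced in isolation. Below is a pure-Python sketch of the same arithmetic (no torch needed); `compute_scale_factor` is my own illustrative name, and 800/1333 are the torchvision defaults mentioned above:

```python
def compute_scale_factor(h, w, self_min_size=800.0, self_max_size=1333.0):
    """Mimic the scale_factor selection in _resize_image_and_masks."""
    min_size = float(min(h, w))
    max_size = float(max(h, w))
    scale_factor = self_min_size / min_size
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size
    return scale_factor

# An 832 x 1333 image already fits into the 800/1333 bounds,
# yet the computed factor is 800/832, not 1.0, so it gets resized anyway:
print(compute_scale_factor(832, 1333))  # 0.9615384615384616
```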

I believe the algorithm should be changed to:

```python
def _resize_image_and_masks(image, self_min_size, self_max_size, target):
    # type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
    im_shape = torch.tensor(image.shape[-2:])
    min_size = float(torch.min(im_shape))
    max_size = float(torch.max(im_shape))
    if min_size < self_min_size or max_size > self_max_size:  # avoid rescaling if not required
        scale_factor = self_min_size / min_size
        if max_size * scale_factor > self_max_size:
            scale_factor = self_max_size / max_size
        image = torch.nn.functional.interpolate(
            image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
            align_corners=False)[0]
    ...
```


What do the pros in this forum think about this?

P.S.: this call to interpolate with a float scale_factor produces a very annoying warning that should be addressed:

```
UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
```
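One way to sidestep that warning entirely (just a sketch of the idea, not necessarily the fix torchvision adopted) is to derive an integer output size yourself and pass `size=` to interpolate instead of a float `scale_factor`. The helper name `output_size` below is my own:

```python
import math

def output_size(h, w, scale_factor):
    # Roughly what recompute_scale_factor=True does internally: turn the float
    # factor into concrete integer output dimensions before resampling.
    return (int(math.floor(h * scale_factor)), int(math.floor(w * scale_factor)))

# With explicit dimensions in hand, one could call
#   torch.nn.functional.interpolate(image[None], size=output_size(h, w, sf),
#                                   mode='bilinear', align_corners=False)
# and the float-scale_factor warning never fires.
print(output_size(832, 1333, 800.0 / 832.0))  # (800, 1281)
```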

To illustrate my change, here are all the test cases:

| what | dimensions | current | proposed (if changed) |
|---|---|---|---|
| both too big | 1500 x 4000 | 500 x 1333 | |
| one too big, one too small | 4000 x 500 | 1333 x 166 | |
| one too big, one ok | 4000 x 900 | 1333 x 333 | |
| perfect min, one ok | 800 x 1000 | 800 x 1000 | |
| perfect min, perfect max | 800 x 1333 | 800 x 1333 | |
| both ok | 900 x 1000 | 800 x 888 | CHANGED: 900 x 1000 |
| one ok, perfect max | 900 x 1333 | 800 x 1184 | CHANGED: 900 x 1333 |
| one too small, one ok | 500 x 1000 | 666 x 1333 | |
| both too small | 500 x 200 | 1333 x 533 | |
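The table can be checked mechanically. Here is a small pure-Python sketch (assuming the 800/1333 defaults; `current_sf` and `proposed_sf` are my own illustrative names) confirming that only the rows where the image already fits come out differently:

```python
def current_sf(h, w, lo=800.0, hi=1333.0):
    # scale factor as computed today
    sf = lo / min(h, w)
    if max(h, w) * sf > hi:
        sf = hi / max(h, w)
    return sf

def proposed_sf(h, w, lo=800.0, hi=1333.0):
    # skip resizing entirely when the image already fits the bounds
    if min(h, w) >= lo and max(h, w) <= hi:
        return 1.0
    return current_sf(h, w, lo, hi)

cases = [(1500, 4000), (4000, 500), (4000, 900), (800, 1000), (800, 1333),
         (900, 1000), (900, 1333), (500, 1000), (500, 200)]
changed = [(h, w) for h, w in cases if current_sf(h, w) != proposed_sf(h, w)]
print(changed)  # [(900, 1000), (900, 1333)]
```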


Can someone follow up on this? Even though the code for _resize_image_and_masks (vision/transform.py at db3ead1656a0ec9d523ca6b07c68394edf87b45a · pytorch/vision · GitHub) has changed since the original post, it still seems to always rescale the image so that one side has length min_size or max_size, even when both of the original image's sides lie between those bounds. Is this the intended behavior? If so, that is not how most people online interpret it, and the documentation is unclear for a key part of the image loading pipeline. It is also not clear to me why this would be a useful default.