Hello

I am new here. I have been working as a software developer for 21 years, and I think I have found an issue while using PyTorch's Faster/Mask RCNN.

Deep down in GeneralizedRCNNTransform (transform.py@39-43), PyTorch decides whether an image needs to be resized:

```python
import torch
from torch import Tensor
from typing import Dict, Optional, Tuple


def _resize_image_and_masks(image, self_min_size, self_max_size, target):
    # type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
    im_shape = torch.tensor(image.shape[-2:])
    min_size = float(torch.min(im_shape))  # shorter side of the image
    max_size = float(torch.max(im_shape))  # longer side of the image
    # Scale the shorter side to self_min_size ...
    scale_factor = self_min_size / min_size
    # ... unless that would make the longer side exceed self_max_size.
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size
    # The image is always interpolated, whatever the value of scale_factor.
    image = torch.nn.functional.interpolate(
        image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
        align_corners=False)[0]
    ...  # (rest of the function omitted)
```

Four values determine the scale_factor here:

min_size and max_size, the shorter and longer dimensions of the image;

self_min_size and self_max_size, which come from the min_size and max_size arguments given when the model is constructed (FasterRCNN/MaskRCNN pass them on to GeneralizedRCNNTransform) and default to 800 and 1333.

If I pass in an image that already fits, for example 832 x 1333, this code sets scale_factor = 800/832 = 0.96153…, so the image is downscaled even though it is already within the configured limits (shorter side at least 800, longer side at most 1333). While debugging, I forced scale_factor to 1.0 in line 44, and it worked like a charm without resizing anything.
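To make the arithmetic concrete, here is a minimal pure-Python sketch of the current rule (no torch needed). The function name current_scale_factor is hypothetical, not part of torchvision, and the defaults are the 800/1333 values mentioned earlier; it only mirrors how scale_factor is computed, not the interpolation itself.

```python
def current_scale_factor(height, width, self_min_size=800.0, self_max_size=1333.0):
    """Hypothetical helper mirroring the scale_factor logic of the current code."""
    min_size = float(min(height, width))  # shorter side
    max_size = float(max(height, width))  # longer side
    # Scale the shorter side to self_min_size ...
    scale_factor = self_min_size / min_size
    # ... but cap the longer side at self_max_size.
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size
    return scale_factor


# An 832 x 1333 image is shrunk even though it already fits the limits.
print(current_scale_factor(832, 1333))  # about 0.9615
```

Only an image whose shorter side is exactly 800 (and whose longer side is at most 1333) gets a factor of exactly 1.0 under this rule; everything else is rescaled.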

I believe the algorithm should be changed to:

```python
import torch
from torch import Tensor
from typing import Dict, Optional, Tuple


def _resize_image_and_masks(image, self_min_size, self_max_size, target):
    # type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
    im_shape = torch.tensor(image.shape[-2:])
    min_size = float(torch.min(im_shape))  # shorter side of the image
    max_size = float(torch.max(im_shape))  # longer side of the image
    # Default: keep the original size (also keeps scale_factor defined
    # for any code after the if block that uses it).
    scale_factor = 1.0
    if min_size < self_min_size or max_size > self_max_size:  # avoid rescaling if not required
        scale_factor = self_min_size / min_size
        if max_size * scale_factor > self_max_size:
            scale_factor = self_max_size / max_size
        image = torch.nn.functional.interpolate(
            image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
            align_corners=False)[0]
    ...  # (rest of the function omitted)
```
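The decision logic of the proposal can be isolated into a small pure-Python sketch for checking a few sizes by hand. proposed_scale_factor is a hypothetical name, not a torchvision function, and scale_factor defaults to 1.0 so that images already within the limits are left alone.

```python
def proposed_scale_factor(height, width, self_min_size=800.0, self_max_size=1333.0):
    """Hypothetical helper: rescale only if the image violates the size limits."""
    min_size = float(min(height, width))  # shorter side
    max_size = float(max(height, width))  # longer side
    scale_factor = 1.0  # image already fits: keep it as is
    if min_size < self_min_size or max_size > self_max_size:
        # Same rule as before, applied only when needed.
        scale_factor = self_min_size / min_size
        if max_size * scale_factor > self_max_size:
            scale_factor = self_max_size / max_size
    return scale_factor


for h, w in [(832, 1333), (600, 900), (1000, 2000)]:
    print((h, w), proposed_scale_factor(h, w))
```

Here the 832 x 1333 image keeps a factor of 1.0, while the 600 x 900 image is still upscaled to a shorter side of 800 and the 1000 x 2000 image is still capped at a longer side of 1333. One side effect worth weighing: an image such as 1000 x 1200 would no longer be normalized to a shorter side of 800, so input sizes would vary more than with the current rule.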


What do the pros in this forum think about this?

P.S.: Calling interpolate with a float scale_factor also triggers a very annoying warning that should be addressed:

```
UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.

warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
```
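One possible way to sidestep the warning, sketched under the assumption of PyTorch 1.6 behaviour: compute the output size yourself and pass size= instead of scale_factor=. As far as I can tell from the 1.6 source, the warning is only raised when a float scale_factor is given and recompute_scale_factor is left unset, and with recompute_scale_factor=True the output size is the floor of each side times the scale factor. The helper output_size below is hypothetical and simply reproduces that floor computation.

```python
import math


def output_size(height, width, scale_factor):
    """Hypothetical helper: floor(side * scale_factor) for each spatial side."""
    return (math.floor(height * scale_factor), math.floor(width * scale_factor))


# Usage with torch (commented out so this snippet runs without torch installed):
# image = torch.nn.functional.interpolate(
#     image[None], size=output_size(h, w, scale_factor), mode='bilinear',
#     align_corners=False)[0]
print(output_size(832, 1333, 800 / 832))  # (800, 1281)
```

Passing an explicit size keeps the resulting shape identical to the recompute_scale_factor=True path (assuming the floor behaviour described above) while never touching the scale_factor code path that emits the warning.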