I am new here. I have worked as a software developer for 21 years, and I think I found an issue while using PyTorch Faster/Mask RCNN.
Deep down in GeneralizedRCNNTransform (transform.py@39-43), PyTorch decides whether an image needs to be resized.
def _resize_image_and_masks(image, self_min_size, self_max_size, target):
____# type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
____im_shape = torch.tensor(image.shape[-2:])
____min_size = float(torch.min(im_shape))
____max_size = float(torch.max(im_shape))
____scale_factor = self_min_size / min_size
____if max_size * scale_factor > self_max_size:
________scale_factor = self_max_size / max_size
____image = torch.nn.functional.interpolate(
________image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
Four values are used here to determine the scale_factor:
min_size and max_size, which are the smaller and larger dimension of the image
self_min_size and self_max_size, which are set when the model is initialized and default to 800 and 1333.
If I now pass in an image that already fits, for example 832 x 1333, this algorithm sets scale_factor=800/832=0.96153… So the image gets resized even though it already fits within the configured size limits. I debugged this and set the scale_factor to 1.0 in line 44, and it worked perfectly without resizing anything.
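To make the arithmetic concrete, here is a minimal pure-Python sketch of the scale factor decision from the snippet above. The helper name current_scale_factor is hypothetical and not part of torchvision. It takes plain height and width values instead of a tensor and only mirrors the lines quoted in this post.

```python
def current_scale_factor(height, width, self_min_size=800.0, self_max_size=1333.0):
    """Mirror the scale_factor logic quoted above (hypothetical helper, no torch needed)."""
    min_size = float(min(height, width))
    max_size = float(max(height, width))
    # Always scale so that the smaller side matches self_min_size ...
    scale_factor = self_min_size / min_size
    # ... unless that would push the larger side past self_max_size.
    if max_size * scale_factor > self_max_size:
        scale_factor = self_max_size / max_size
    return scale_factor


# The 832 x 1333 image from this post: the factor is below 1.0, so it is shrunk.
print(current_scale_factor(832, 1333))
# A long, thin image: here the self_max_size limit decides the factor.
print(current_scale_factor(600, 2000))
```

Note that there is no branch in which the factor is simply left at 1.0, which is why an image that already fits still gets interpolated.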
I believe the algorithm should be changed to:
def _resize_image_and_masks(image, self_min_size, self_max_size, target):
____# type: (Tensor, float, float, Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]
____im_shape = torch.tensor(image.shape[-2:])
____min_size = float(torch.min(im_shape))
____max_size = float(torch.max(im_shape))
____if min_size < self_min_size or max_size > self_max_size: # avoid rescaling if not required
________scale_factor = self_min_size / min_size
________if max_size * scale_factor > self_max_size:
____________scale_factor = self_max_size / max_size
________image = torch.nn.functional.interpolate(
____________image[None], scale_factor=scale_factor, mode='bilinear', recompute_scale_factor=True,
(Sorry for the bad formatting. The post interface removed all indentation, so I replaced the leading spaces with _.)
What do the pros in this forum think about this?
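The proposed condition can be tried out without torch using a small pure-Python sketch. The helper name proposed_scale_factor is hypothetical, and returning 1.0 here stands for "skip the interpolate call"; this is only an illustration of the proposal, not torchvision code.

```python
def proposed_scale_factor(height, width, self_min_size=800.0, self_max_size=1333.0):
    """Scale factor under the proposed guard (hypothetical helper, no torch needed)."""
    min_size = float(min(height, width))
    max_size = float(max(height, width))
    # Only rescale when the image is too small or too large.
    if min_size < self_min_size or max_size > self_max_size:
        scale_factor = self_min_size / min_size
        if max_size * scale_factor > self_max_size:
            scale_factor = self_max_size / max_size
        return scale_factor
    # Image already lies within both limits: leave it untouched.
    return 1.0


print(proposed_scale_factor(832, 1333))   # image from this post, no resize
print(proposed_scale_factor(480, 640))    # too small, still scaled up
print(proposed_scale_factor(1000, 1200))  # within both limits, no resize
```

One consequence worth weighing: an image such as 1000 x 1200 is also left alone under this guard, whereas the original logic would scale it by 800/1000 = 0.8. So the change affects every image whose sides both lie between the two limits, not only exact fits.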
P.S.: This call to interpolate with a float scale_factor also triggers a very annoying warning that should be addressed:
UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
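Until this is addressed in the library, the warning can be hidden on the user side with the standard warnings module. This is only a workaround sketch: the helper name silence_interpolate_warning is hypothetical, and the filter assumes the message keeps starting with the text quoted above.

```python
import warnings

# Start of the warning text quoted above. filterwarnings treats this as a
# regular expression that must match the beginning of the message.
INTERPOLATE_WARNING_PREFIX = "The default behavior for interpolate/upsample"


def silence_interpolate_warning():
    """Ignore only the interpolate/upsample UserWarning (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        message=INTERPOLATE_WARNING_PREFIX,
        category=UserWarning,
    )


# Call once before running the model.
silence_interpolate_warning()
```

Filtering on the message prefix keeps other UserWarnings visible, which seems safer than ignoring the whole category.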