TorchAudio has integrated FFmpeg and exposes many of the features it offers, such as:
- audio, video, and image decoding through a unified interface
- preprocessing, such as resampling and scaling
- GPU video decoding using NVDEC
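As a rough illustration of the GPU decoding feature listed above, here is a minimal sketch. It assumes a TorchAudio release that ships `torchaudio.io.StreamReader`, linked against an FFmpeg build with NVDEC enabled, and an H.264 input. The helper names `nvdec_stream_kwargs` and `decode_first_chunk` are hypothetical and exist only for this example.

```python
from typing import Any, Dict


def nvdec_stream_kwargs(
    decoder: str = "h264_cuvid",
    device_index: int = 0,
    frames_per_chunk: int = 8,
) -> Dict[str, Any]:
    """Build keyword arguments for StreamReader.add_video_stream that select NVDEC.

    Hypothetical helper; the defaults are assumptions chosen for illustration.
    """
    if device_index < 0:
        raise ValueError("device_index must be non-negative")
    return {
        "frames_per_chunk": frames_per_chunk,
        "decoder": decoder,                   # FFmpeg's NVDEC-backed H.264 decoder
        "hw_accel": f"cuda:{device_index}",   # keep decoded frames on this GPU
    }


def decode_first_chunk(src: str, **overrides: Any):
    """Decode the first chunk of `src` on the GPU and return it as a CUDA tensor."""
    # Imported lazily so the helper above can be used without TorchAudio installed.
    from torchaudio.io import StreamReader

    reader = StreamReader(src)
    reader.add_video_stream(**nvdec_stream_kwargs(**overrides))
    reader.fill_buffer()
    (video,) = reader.pop_chunks()
    return video
```

Calling `decode_first_chunk("input.mp4")` (the file name is a placeholder) should return a tensor that already lives on `cuda:0`, so no host-to-device copy is needed afterwards.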
We are looking into ways to take advantage of these features and improve I/O performance in training.
As a first step, to understand how these features behave, we looked into GPU video decoding.
We found that the majority of the added GPU memory consumption comes from FFmpeg’s own CUDA context.
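One way to observe this kind of overhead is sketched below; it is not necessarily how the measurement above was taken. Memory held by a CUDA context created outside PyTorch does not pass through PyTorch's caching allocator, so the sketch compares device-wide usage from `torch.cuda.mem_get_info` before and after a piece of work. The names `used_bytes`, `measure_gpu_delta`, and `torch_probe` are hypothetical helpers for this example.

```python
from typing import Callable, Tuple

# A probe returns (free_bytes, total_bytes) for one device.
MemProbe = Callable[[], Tuple[int, int]]


def used_bytes(probe: MemProbe) -> int:
    """Device-wide memory in use, regardless of which library allocated it."""
    free, total = probe()
    return total - free


def measure_gpu_delta(work: Callable[[], object], probe: MemProbe) -> int:
    """Return how many bytes of device memory `work` added while its result is alive."""
    before = used_bytes(probe)
    result = work()  # hold the result so its memory is still counted below
    after = used_bytes(probe)
    del result
    return after - before


def torch_probe(device: int = 0) -> MemProbe:
    """Probe backed by torch.cuda.mem_get_info (requires PyTorch with CUDA)."""
    import torch

    # Initialize PyTorch's CUDA state up front so its own context is already
    # included in the "before" reading and does not inflate the delta.
    torch.cuda.init()
    return lambda: torch.cuda.mem_get_info(device)
```

For example, `measure_gpu_delta(lambda: decode(src), torch_probe(0))`, where `decode` stands for any GPU decoding call, gives the extra device memory attributable to decoding, including context overhead. Other processes using the same GPU would disturb the reading, so this is only indicative.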
Following this, we are looking into a way to share PyTorch’s CUDA context with FFmpeg, so that GPU video decoding uses less additional GPU memory.
PoC for using the primary device context (shared with PyTorch): "PoC: Share CUcontext" by mthrok, Pull Request #3371 in pytorch/audio on GitHub.