Automatic out-of-tree backend loading

Hello,

Currently I use the following somewhat ugly snippet to load the OpenCL backend shared object:

if r.device.startswith('opencl'):
    torch.ops.load_library("build/libpt_ocl.so")
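
To make the current workaround concrete, here is a minimal sketch of a helper that hides the check and loads the library only once. The names `ensure_backend_loaded` and `_loaded`, and the injectable `loader` argument, are hypothetical and only for illustration; the only PyTorch call assumed is `torch.ops.load_library`, as in the snippet above.

```python
# Hypothetical helper, not part of PyTorch: load the backend library on demand.
_loaded = set()  # library paths that have already been loaded


def ensure_backend_loaded(device, library="build/libpt_ocl.so", loader=None):
    """Load `library` the first time an 'opencl' device is requested.

    Returns True if the library was loaded by this call, False otherwise.
    """
    if not str(device).startswith("opencl"):
        return False  # not an OpenCL device, nothing to do
    if library in _loaded:
        return False  # already loaded earlier
    if loader is None:
        import torch  # imported lazily so the helper itself has no hard dependency
        loader = torch.ops.load_library
    loader(library)
    _loaded.add(library)
    return True
```

Calling `ensure_backend_loaded(r.device)` before creating tensors would replace the explicit `if`, but it still has to be called by hand, which is exactly what I would like to avoid.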

Is there any location in the PyTorch tree where I can put my backend shared object/DLL so that it is loaded automatically? Loading it this way is also how I access various low-level backend functions via the ops library, for example to start low-level profiling or to access cache metadata/operations.

For example:

torch.ops.oclops.prof_start(device)

Ideally, I’d like to access it the same way it is done today for CUDA, for example:

torch.cuda.empty_cache()
torch.opencl.empty_cache()

Or alternatively, it would be great to have some generic op like:

torch.backend_from_device(device).empty_cache()
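
To illustrate the generic lookup I have in mind, here is a rough pure-Python sketch. Everything in it is hypothetical: `register_backend`, `backend_from_device`, and the `_BACKENDS` registry are not PyTorch APIs, and the `SimpleNamespace` is only a stand-in for a real backend module that would wrap calls such as those under `torch.ops.oclops`.

```python
from types import SimpleNamespace

# Hypothetical registry: device type string -> object exposing backend helpers.
_BACKENDS = {}


def register_backend(device_type, module):
    """Associate a device type such as 'opencl' with its helper module."""
    _BACKENDS[device_type] = module


def backend_from_device(device):
    """Return the helper module registered for the device's type."""
    # "opencl:0" -> "opencl"; str() lets device-like objects be passed too.
    device_type = str(device).split(":", 1)[0]
    if device_type not in _BACKENDS:
        raise RuntimeError(f"no backend registered for device type {device_type!r}")
    return _BACKENDS[device_type]


# Stand-in backend that only records which helper was called.
calls = []
register_backend("opencl", SimpleNamespace(empty_cache=lambda: calls.append("empty_cache")))

backend_from_device("opencl:0").empty_cache()
```

With something like this, user code would not need to know which backend a device belongs to; the open question is whether an out-of-tree backend can hook into such a mechanism inside `torch` itself.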

Any thoughts? Is this even doable for an out-of-tree backend?