I have a problem when running this simple repro under torch.compile:
import torch
import habana_frameworks.torch  # assumed: registers the "hpu" device and "hpu_backend" in our environment

def test_empty_tensor_hpu():
    def create_empty_tensor(output, device_type):
        torch.empty(1024, device=device_type, out=output)
        return output * output

    fn_hpu = torch.compile(create_empty_tensor, backend="hpu_backend")
    out_tensor = torch.ones(1024, device="hpu")
    result_h = fn_hpu(out_tensor, "hpu")
    print(result_h)
I get this error:
RuntimeError: Cannot access data pointer of Tensor (e.g. FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html
While executing %result : [num_users=1] = call_function[target=torch.empty](args = (), kwargs = {size: [1024], dtype: torch.float32, device: hpu, layout: torch.strided, out: %l_op_inputs_dict_out_})
(C++ and Python call stacks omitted)
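For reference, the workaround the error message points at would look roughly like this. This is only a minimal sketch, and the op name mylib::fill_empty is made up for illustration:

import torch

# Hedged sketch of the "opaque custom op" workaround the error suggests.
# The library/op name "mylib::fill_empty" is hypothetical.
@torch.library.custom_op("mylib::fill_empty", mutates_args={"out"})
def fill_empty(out: torch.Tensor) -> None:
    # Runs eagerly under the compiled graph, so the real kernel
    # never sees a FakeTensor/FunctionalTensor.
    torch.empty(out.shape, device=out.device, out=out)

@fill_empty.register_fake
def _(out: torch.Tensor) -> None:
    # Nothing to allocate at trace time; the op only mutates `out`.
    return None

def create_empty_tensor(output, device_type):
    fill_empty(output)
    return output * output

Wrapping it this way keeps tracing from descending into the kernel, but since the repro only uses a stock op, I suspect the real issue is in our backend, as described below.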
The issue is that our backend receives a FunctionalTensor for the resize_ op while tracing under FakeTensorMode, which of course throws this error when the kernel tries to access its data_ptr. Comparing the resize_ dispatch on cpu and hpu, both appear to be handled the same way. Should our backend be expected to handle functional tensors, or do we have a problem with how we register our ops? Is there a good way to debug this?
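In case it's useful, here is the small probe I've been using to confirm what actually reaches the kernel. torch._C._is_functional_tensor is a private helper I found in the functorch internals, so treat the probe as an assumption rather than an official API:

import torch
from torch._subclasses.fake_tensor import FakeTensor

def describe_tensor(t, tag=""):
    # Debug probe only: report whether `t` is fake/functional before any
    # data_ptr() access. _is_functional_tensor is private API (assumption).
    print(
        tag,
        type(t).__name__,
        "fake=", isinstance(t, FakeTensor),
        "functional=", torch._C._is_functional_tensor(t),
    )

# Running the repro with TORCH_LOGS="+dynamo,aot" also prints the traced
# graphs, which helps locate where the resize_ call enters the picture.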