Thanks a lot for sharing this! I’m excited to see the data that shows the dispatcher’s role in the trace overhead.
We would appreciate any suggestions, feedback, etc. Also, would there be interest in merging a feature-complete version of this patch at some point?
I would consider two points before trying to merge this:
- Is this going to be useful / add additional value on top of using torchdynamo + lazytensor? The current proposal (though it is early) is to support a mode where dynamo associates a lazily traced program with its own guards, making it safe to skip lazy tracing on iteration 2 and jump directly to the compiled computation as long as dynamo’s guards pass (a rough sketch of this flow is below the list). With this approach we can skip not only the dispatcher overhead but all of the trace overhead, and even some of the overhead originating in Python.
- What would it take to make this 100% safe/consistent with eager PyTorch behavior? Mainly, there are probably cases where there is non-linear code (i.e., more than a straight pass-through) between the THPVariable_foo binding and the underlying foo operator. In those cases, jumping directly to the lazy trace could make the lazy tensor behave differently from eager (a toy illustration is below the list). Can we avoid this and keep things consistent?
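To make the first point concrete, here is a minimal sketch of the guard-then-replay flow, assuming a cache keyed by the traced function. The names (`GuardedComputation`, `run`, `step`) are hypothetical, and `torch.jit.trace` is only a stand-in for the lazy-tensor backend's trace/compile step; this is not the actual torchdynamo or LTC API.

```python
import torch

class GuardedComputation:
    def __init__(self, guards, compiled_fn):
        self.guards = guards            # predicates over the inputs (shapes, dtypes, ...)
        self.compiled_fn = compiled_fn

    def guards_pass(self, args):
        return all(g(args) for g in self.guards)

_cache = {}

def run(fn, *args):
    entry = _cache.get(fn)
    if entry is not None and entry.guards_pass(args):
        # iteration >= 2 and the guards still hold: jump straight to the cached
        # compiled computation, skipping both lazy tracing and the dispatcher
        return entry.compiled_fn(*args)

    # first iteration (or a guard failed): trace/compile and record the guards
    # under which that trace is valid
    guards = [
        lambda a, shapes=tuple(x.shape for x in args): tuple(x.shape for x in a) == shapes,
        lambda a, dtypes=tuple(x.dtype for x in args): tuple(x.dtype for x in a) == dtypes,
    ]
    compiled = torch.jit.trace(fn, args)    # stand-in for the lazy-tensor compile
    _cache[fn] = GuardedComputation(guards, compiled)
    return compiled(*args)

# usage: tracing only happens on iteration 1; later iterations replay the cache
def step(x, w):
    return torch.relu(x @ w)

x, w = torch.randn(4, 8), torch.randn(8, 2)
for _ in range(3):
    out = run(step, x, w)
```

The point is just that once the guards (here, shapes and dtypes) are recorded on iteration 1, iterations 2+ can bypass tracing entirely whenever those guards still hold.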
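And a toy illustration of the second point, with the binding-level logic faked in Python (in reality it lives in C++ between THPVariable_foo and the kernel); the function name and the particular check are made up:

```python
import torch

def eager_foo(x):
    # stand-in for logic that sits between the THPVariable_foo binding and the
    # dispatched kernel, e.g. an input-dependent check that never enters the trace
    if x.numel() == 0:
        raise RuntimeError("foo: expected a non-empty tensor")
    return torch.relu(x)   # only this part ends up in the lazy trace

# Eager behavior: eager_foo(torch.empty(0)) raises.
# A cached trace that captured only torch.relu would instead return an empty
# tensor silently; that divergence is the consistency question above.
```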