Balance "Usability over Performance" in a concentrating ML Ecosystem

Thanks, Alban, for posting this. It should help clarify the strategy both for users who want performance and for developers working on fast kernels for various hardware.