Happy to report that it is working without errors on GH200 systems.
- Tested single-node training
- Tested multi-node training
I am not able to use InfiniBand, but I think that is more likely due to my own network configuration than to torch.