Training support is not finished yet, but I don’t see anything about optimizers that would prevent them from being captured with minor tweaks.
Regarding single whole-program graphs: TorchDynamo often produces a single graph, but there is no guarantee you will get a whole-program graph, and that is not the goal. The design philosophy is mixed-mode execution that works with Python and prioritizes preserving the usability of PyTorch. Tons of things will result in graph breaks, including: converting tensors to Python types (e.g. Tensor.item, Tensor.tolist, torch.any, etc.); calling external C libraries (e.g. numpy); printing/logging; data-dependent control flow (e.g. early stopping in a training loop); constructing custom Python classes; and more (a small example is sketched below). If you absolutely require whole-program graphs above all else, then a different approach, like AOT tracing or Lazy Tensors, might be a better fit.
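
To make that concrete, here is a minimal sketch of the kind of code that triggers graph breaks. It assumes a PyTorch build where TorchDynamo is exposed through torch.compile; the function f and its contents are made up for illustration. Dynamo compiles the tensor work on either side of each break and falls back to regular Python for the pieces in between, so the code still runs correctly, just as multiple graphs rather than one.

```python
import torch

def f(x):
    y = x.sin()
    # .item() pulls a Python float out of a tensor -> graph break,
    # and the `if` below branches on that data-dependent Python value
    if y.sum().item() > 0:
        y = y * 2
    # printing is a Python side effect Dynamo does not capture -> graph break
    print("intermediate shape:", y.shape)
    return y.cos()

compiled_f = torch.compile(f)        # TorchDynamo with the default backend
out = compiled_f(torch.randn(8))     # runs fine, just not as a single graph
```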