Ulysses Sequence Parallelism: Training with Million-Token Contexts

Details in article.

Source: Hugging Face Blog