Title: TorchTitan - a PyTorch Native Platform for Training Foundation Models
Speaker: Chien-Chin Huang (Meta) [webpage: https://scholar.google.com/citations?user=vMMXCLAAAAAJ&hl=en]
Time: 10:00 am, November 21 (Friday), 2025
Location: Ryder Hall 156
Online link: provided upon request or see the seminar email.

Abstract:

TorchTitan is a PyTorch-native open-source platform (GitHub: https://github.com/pytorch/torchtitan) designed for scalable and flexible training of generative AI models. Tightly integrated with PyTorch’s distributed stack, TorchTitan offers efficient optimizations and modular configurations, and showcases elastic training of LLMs with composable 4-D/5-D parallelism. Moreover, TorchTitan provides extensible abstractions for experimenting with new model architectures (e.g., diffusion models) and infrastructure techniques (e.g., a compiler-first FSDP implementation), while keeping the codebase clean and minimal.

Bio:

Chien-Chin Huang is a software engineer at Meta on the PyTorch Distributed team. His primary focus is distributed training at scale. He has contributed to several projects, including Distributed Checkpointing (DCP), Fully Sharded Data Parallel (FSDP), asynchronous Tensor Parallelism (TP), Context Parallelism (CP), and TorchTitan. Before joining Meta, he completed a Ph.D. at NYU, where his research also centered on distributed training.