Hi, I am Yan Bai.

I currently work at NVIDIA, where I focus on large-scale training systems and RL infrastructure, especially Megatron-Core, distributed parallel training, MoE, long-context training, and systems problems in reinforcement learning training frameworks.

I contributed full Megatron-Core support to veRL, making it the first public RL framework with DeepSeek V3 support. I also distilled my Megatron-Core experience into mbridge, an open-source bridge project on GitHub.

I built a Megatron memory estimator that quickly estimates training memory usage under different parallel configurations. A related introduction is available on the NVIDIA technical blog: "Explore using the Megatron-Core training framework to improve GPU memory efficiency in large model training."

This blog collects notes on distributed training, RL infrastructure, model systems, experiments, and lessons learned from building and debugging real training stacks.