June 1, 2026

Some (non-technical) details on training neural networks

Notes from training a 500M-parameter transformer mostly from scratch