W&B Training - Weights & Biases Documentation

Use W&B Training for serverless post-training of large language models (LLMs), including both reinforcement learning (RL) and supervised fine-tuning (SFT). W&B Training is now in public preview.

Serverless RL: Improve model reliability on multi-turn, agentic tasks while increasing speed and reducing costs. RL is a training technique in which models learn to improve their behavior through feedback on their outputs.
Serverless SFT: Fine-tune models using curated datasets for distillation, teaching output style and format, or warming up before RL.

W&B Training includes integration with:

ART, a flexible fine-tuning framework.
RULER, a universal verifier.
A fully managed backend on CoreWeave Cloud.

To get started, complete the prerequisites, then see the Serverless RL quickstart or the Serverless SFT docs to learn how to post-train your models.

Serverless RL

Serverless SFT

API Reference
