Use Serverless RL to post-train LLMs that learn new behaviors and improve reliability, speed, and costs when performing multi-turn agentic tasks. Serverless RL is now in public preview. W&B provisions the training infrastructure (on CoreWeave) for you while allowing full flexibility in your environment’s setup. You get instant access to a managed training cluster that elastically auto-scales to dozens of GPUs. Serverless RL splits RL workflows into inference and training phases and multiplexes them across jobs to increase GPU utilization and reduce your training time and costs. Serverless RL is ideal for tasks like:Documentation Index
Fetch the complete documentation index at: https://wb-21fd5541-mintlify-style-consistency-1776283399.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
- Voice agents
- Deep research assistants
- On-prem models
- Content marketing analysis agents
Why Serverless RL?
Reinforcement learning (RL) is a set of powerful training techniques that you can use in many kinds of training setups, including on GPUs that you own or rent directly. Serverless RL can provide the following advantages in your RL post-training:- Lower training costs: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to 0 when you’re not actively training, Serverless RL reduces training costs significantly.
- Faster training time: By splitting inference requests across many GPUs and immediately provisioning training infrastructure when you need it, Serverless RL speeds up your training jobs and lets you iterate faster.
- Automatic deployment: Serverless RL automatically deploys every checkpoint you train, so you do not need to manually set up hosting infrastructure. You can access and test trained models immediately in local, staging, or production environments.
How Serverless RL uses W&B services
Serverless RL uses a combination of the following W&B components to operate:- Inference: To run your models
- Models: To track performance metrics during the LoRA adapter’s training
- Artifacts: To store and version the LoRA adapters
- Weave (optional): To gain observability into how the model responds at each step of the training loop